Invest in Hardware
Often, the most significant improvements in RiverWare model performance can be obtained by investing in hardware. When investing in hardware, consider the following, in order of significance:
• Purchase enough physical memory (RAM) to eliminate paging.
• If completely eliminating paging is impossible, purchase multiple faster disks.
• If making multiple runs, purchase additional machines or use multiple CPUs.
• When purchasing a new machine, look at the following attributes:
– Memory size
– Cache size
– Bus speed
– Disk speed
– CPU speed
Install More Physical Memory (RAM)
If your model is causing paging, then installing more RAM will lead to less paging and faster run times. Most modern operating systems allow programs to access more memory locations than are available to that program in physical memory. When RiverWare attempts to access a location that is not in physical memory, a page fault occurs and the operating system brings the page containing that location into physical memory. Accessing a location that is already in physical memory is likely to be orders of magnitude faster than accessing one that must first be read from disk, so reducing the number of page faults during a RiverWare run will decrease the run time.
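The effect of page faults on average access time can be illustrated with a short calculation. The following is a minimal sketch, not a measurement: the latencies (100 ns for physical memory, 10 ms for a disk read) are assumed round numbers chosen only to show the orders-of-magnitude gap, and the function name is hypothetical.

```python
def effective_access_time_ns(fault_rate, ram_ns=100.0, disk_ns=10_000_000.0):
    """Average time per memory access when a fraction of accesses page-fault.

    fault_rate: fraction of accesses (0.0 to 1.0) that cause a hard page fault.
    ram_ns, disk_ns: ASSUMED latencies, for illustration only.
    """
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * ram_ns + fault_rate * disk_ns


if __name__ == "__main__":
    for rate in (0.0, 1e-6, 1e-4):
        avg = effective_access_time_ns(rate)
        print(f"fault rate {rate:g}: average access {avg:.2f} ns")
```

Under these assumed latencies, even one fault in every ten thousand accesses makes the average access roughly eleven times slower than a run with no faults.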
To determine if your model leads to page faults and the associated disk I/O delays, use the Windows perfmon tool to display the Memory: Pages/Sec counter and the PhysicalDisk: % Disk Time counter for the disk that contains your system page file. See Windows perfmon Utility for details.
These counters will be non-zero when the system is processing (hard) page faults. You will also notice that when RiverWare is doing significant I/O, the CPU Usage for the RiverWare process as displayed in the Windows Task Manager will be less than full (for a single CPU, less than 100%).
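If you record the Memory: Pages/Sec counter during a run, a short script can summarize how much of the run involved paging. The sketch below assumes you have already collected the counter readings as a list of numbers (for example, copied from a perfmon log); the function name and the sample values are hypothetical and are not taken from an actual RiverWare run.

```python
def paging_summary(pages_per_sec):
    """Summarize a series of Memory: Pages/Sec readings.

    A reading above zero means the system was processing hard page
    faults during that sampling interval.
    """
    samples = list(pages_per_sec)
    if not samples:
        raise ValueError("at least one sample is required")
    paging_samples = [s for s in samples if s > 0]
    return {
        "samples": len(samples),
        "fraction_paging": len(paging_samples) / len(samples),
        "peak_pages_per_sec": max(samples),
        "mean_pages_per_sec": sum(samples) / len(samples),
    }


if __name__ == "__main__":
    # Illustrative readings only, one per sampling interval.
    readings = [0, 0, 120, 340, 0, 85, 0, 0]
    for name, value in paging_summary(readings).items():
        print(f"{name}: {value}")
```

A fraction_paging value near zero suggests the model fits in physical memory; a large value suggests that more RAM is likely to reduce run time.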
Figure 1.10 shows a sample model run.
Note: RiverWare Virtual Bytes is 11.1 GB.
Figure 1.10 Sample model run
How much RAM do you need? If your model consistently uses 11 GB of memory, then make sure you have at least 12 GB of RAM available for it, and preferably more. With enough RAM, all of the memory the model requires is served from physical memory rather than supplemented by paging. To determine how much memory is used during a model run, look at the perfmon utility. The Process: Virtual Bytes counter shows the number of bytes used by the riverware process. This is the total space that RiverWare is using. To fit RiverWare completely in memory, the amount of RAM should be larger than this value plus enough for the operating system.
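The sizing rule above can be written as a small calculation. This is a minimal sketch under stated assumptions: the operating system reserve (2 GB) and the 10% headroom are illustrative defaults, not RiverWare recommendations, the function name is hypothetical, and "GB" is treated as 1024^3 bytes.

```python
import math

GIB = 1024 ** 3  # bytes per GB, taking 1 GB = 1024^3 bytes (assumption)


def recommended_ram_gb(virtual_bytes, os_reserve_gb=2.0, headroom=0.10):
    """Estimate the RAM (whole GB) needed to keep a run out of the page file.

    virtual_bytes: Process: Virtual Bytes reading for the riverware process.
    os_reserve_gb: ASSUMED memory kept free for the operating system.
    headroom:      ASSUMED safety margin on top of the measured usage.
    """
    needed_bytes = virtual_bytes * (1.0 + headroom) + os_reserve_gb * GIB
    return math.ceil(needed_bytes / GIB)


if __name__ == "__main__":
    # 11.1 GB is the Virtual Bytes value noted for the sample run.
    print(recommended_ram_gb(int(11.1 * GIB)))
```

With no headroom and a 1 GB reserve, a model that uses 11 GB yields the 12 GB minimum mentioned above; the defaults give a more conservative figure.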
Purchase Additional, Faster Disks
If it is impossible for you to eliminate paging during a model run, one strategy is to make the paging itself faster. To do this, purchase faster disks.
Note: This refers to physical disks, not Windows drive letters, which are really partitions on physical disks. Partitioning a single drive does not improve performance.
It also helps if the different operations running on the machine do not compete for use of the same disk. For this reason it helps to have your paging files on different physical disks from your normal data and programs. See Adjust the Size or Location of Your Paging File for details.
Use Additional CPUs for Multiple Runs
If you are making multiple runs, for example for an alternatives analysis, and you have access to multiple machines (or a single machine with multiple CPUs), then you can run multiple RiverWare processes simultaneously. For example, if you are completing an alternatives analysis and are ready for three production runs, use three machines to run alternatives 1, 2 and 3 in parallel. The entire analysis will then take about as long as a single run, which should be roughly three times faster than running the alternatives in series. Batch processing can help to manage many different parallel runs; see Batch Mode and RiverWare Command Language in Automation Tools. Also, see Distributed Concurrent Runs in Solution Approaches for details about running traces of a Concurrent Multiple Run in parallel on different processors on the same computer.
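On a single machine with multiple CPUs, independent runs can be started as separate operating system processes and left to run side by side. The following is a minimal sketch of that idea only: the function name is hypothetical, and the commands shown are placeholders that start the Python interpreter. The actual RiverWare batch command line is not shown here; consult the Batch Mode documentation for it and substitute it for the placeholder.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(commands, max_workers=None):
    """Run each command as its own process; return exit codes in input order.

    commands: list of argument lists, one per run.
    max_workers: how many processes may run at once (default: all of them).
    """
    commands = list(commands)
    if not commands:
        return []

    def run_one(command):
        # Each call blocks one worker thread while its process runs.
        return subprocess.run(command, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers or len(commands)) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # PLACEHOLDER commands standing in for three alternative runs.
    placeholder = [sys.executable, "-c", "print('alternative run finished')"]
    print(run_in_parallel([placeholder, placeholder, placeholder]))
```

Setting max_workers to the number of CPUs (or fewer, if memory is limited) keeps the runs from competing for the same processor.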
Also remember that a machine with multiple CPUs will speed execution of parallel runs but the total physical memory is shared among all CPUs and can limit run speed. In that regard, it may still be better to run parallel runs on completely separate machines.
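Because physical memory is shared, the number of runs that a single machine can execute at once without paging is limited by memory as well as by CPU count. The sketch below shows that calculation; the function name and the 2 GB operating system reserve are assumptions made for illustration.

```python
def max_concurrent_runs(total_ram_gb, per_run_gb, cpu_count, os_reserve_gb=2.0):
    """Number of simultaneous runs that fit in RAM, capped by the CPU count.

    per_run_gb:    memory used by one run (for example, its Virtual Bytes).
    os_reserve_gb: ASSUMED memory kept free for the operating system.
    """
    if per_run_gb <= 0 or cpu_count < 1:
        raise ValueError("per_run_gb must be positive and cpu_count at least 1")
    runs_by_memory = int((total_ram_gb - os_reserve_gb) // per_run_gb)
    return max(0, min(cpu_count, runs_by_memory))


if __name__ == "__main__":
    # Illustrative machine: 32 GB of RAM, 8 CPUs, 11.1 GB per run.
    print(max_concurrent_runs(32, 11.1, 8))
```

In this illustration only two runs fit in memory even though eight CPUs are available, which is the situation in which separate machines may be the better choice.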
Purchase a Bigger, Faster Machine
Above we have presented some approaches to upgrade existing hardware so that RiverWare performs better. Often it may be more cost effective to replace hardware with new machines. When selecting a new machine look at the following attributes, in order: physical memory size, cache size, bus speed, disk speed, and CPU speed.
Physical Memory
Purchase as much physical memory (RAM) as you can. Having plenty of physical memory allows you to “fit” all computations and data allocations into memory and prevent disk paging. Typical RAM configurations range from 8 GB up to 64 GB or more. For example, training machines at CADSWES have 8 GB of RAM while development machines have 64 GB of RAM.
Most operating systems configured for “servers” allow large banks of main memory to be used. At CADSWES we have not yet experimented with Windows server configurations, so we cannot attest to the potential speed benefits of purchasing such a machine, nor can we guarantee that there will not be attendant problems. Nevertheless, this option is worth considering.
Cache Size
Data accesses are faster if the data reside in memory than if they reside on disk. Modern computers further speed up memory accesses by employing caches of faster memory. Cache memory is physically different from main memory, making it more expensive, and cache memory physically resides closer to the CPU (the heart of the computer) than does main memory.
Thus, if the data needed resides in a memory cache, the access will be faster than if it resides in main memory. Consequently, you will benefit from having the largest caches available on the machines you purchase. Some architectures have Level 1 (L1), Level 2 (L2) and Level 3 (L3) caches. Some have only L1 and L2, and often only the lower-level (L2, L3) cache sizes are configurable when you purchase a machine.
Front-side Bus Speed
Accesses to main memory are slower than accesses to cache memory. This is partly because the cycle time of the cache memory is lower than that of main memory, and partly because signals have to travel farther from main memory than from cache memory. On many architectures, data (and instructions) flowing from main memory to the CPU have to travel across a central system bus. Consequently, the speed of the bus determines the speed of main-memory accesses, and a good rule of thumb is to purchase the fastest bus you can get. Most Intel-based computers call this bus the “Front-Side Bus”.
CPU
Most object-oriented applications, RiverWare included, are not limited by CPU speed, due to the nature of their memory access patterns. Consequently, CPU speed is not the most important factor in purchasing a new machine as most new machines have adequate CPU speeds.
One of the ways modern computers are made more powerful is by giving them more than one CPU. Some applications are multi-threaded, which allows them to run somewhat faster by using more than one CPU in parallel. RiverWare is not multi-threaded. However, you might still benefit from having multiple CPUs because the operating system typically runs more than one program, even if you have nothing but RiverWare running. Having another CPU can reduce the amount of time that RiverWare has to give up the CPU and its cached data for use by another process. If you tend to use your machine for things other than running RiverWare (that is to say, you do not have a dedicated machine for RiverWare), then you most certainly can benefit from additional CPUs.
Note: In so-called “multi-core” machines, two or more CPUs might share some cache, but be unable to share all of the cache. For example, a four-core machine might in fact be two two-core machines, where two CPUs share half the L2 cache and the other two CPUs share the other half of the L2 cache. This kind of architecture might be labeled as having, say, 8 MB of L2 cache, when in fact it has two 4 MB L2 caches (2 x 4 MB). In this example, the most L2 cache that RiverWare could use is 4 MB. This also means that a process scheduled on the CPU that shares cache with the RiverWare CPU can have a negative impact on RiverWare performance, whereas the processes scheduled on the other CPUs have no effect on the cache available to RiverWare. The upshot: a two-CPU machine with caches dedicated to each CPU might perform better than a two-CPU machine with shared cache, all else being equal.