
CSIM 19/OptQuest : Using CSIM 19/OptQuest to Configure an Online Server
Combining Simulation and Optimization
An online server is a computer application executing on a computer system (the host) which is attached to a network. The server receives requests from other systems which are attached to the network. Many of these servers are operated in a commercial environment. Examples of commercial online servers include:
- A database management system (DBMS) handling requests to retrieve insurance policies in an insurance company
- An airline reservation system handling inquiries about the availability of seats on airline flights
- A world wide web (www) server handling requests for html pages from browsers on the internet, and
- An online retailer, offering items (such as books, computers, or general merchandise) for sale.
Most of these systems must deliver the requested information or conclude a commercial transaction in a timely manner. If this does not happen, the client (the requester) may become dissatisfied with the service being offered. In many cases, this dissatisfaction can result in lost business or decreased revenues for the commercial site.
Managing such a commercial online system can present special problems to the system manager. The demand for the online service can vary widely; anomalous factors such as advertising campaigns, mentions in product reviews and seasonal changes can cause spikes in the request arrival rates. As a result, the system must be configured with reserve processing capacity, so that service levels do not degrade significantly when arrival rates increase dramatically. Still, most of these systems operate within strict financial constraints. Thus, the manager's job can often be viewed as continually balancing the need for increased processing capability against the need to operate within the budget constraints for the system.
System Simulation and Optimization
System simulation has been used by system managers to aid in making decisions about configuring online systems to handle projected workloads. A valid model of an online system can be extremely useful for predicting system performance (as reflected in transaction response times) as the workload varies. In particular, the model can predict response times as transaction arrival rates spike to unprecedented levels.
CSIM 19 is a simulation library which has been used to model online systems. CSIM 19 offers many features which enable the system modeler to construct robust models of large, complex systems. "CSIM 18 -- The Simulation Engine," Schwetman, 1996 [Schw96], has a description of CSIM and its use as a simulation engine.
While having a model can help make better decisions regarding configuring a system to handle a projected workload, actually developing the different configurations which could be used can be a difficult task. A component which can guide the search for the best possible configuration has not been part of the set of tools available to system modelers and managers.
Recent research has led to the development of a tool, OptQuest [GKLa96], that can be used to guide the search for the best possible system configuration to achieve stated performance goals. In addition, this tool can be used to find the least expensive configuration which will meet a pre-specified performance goal. The combination of this tool with CSIM 19 is called CSIM 19/OptQuest.
The remainder of this paper demonstrates this powerful combination of simulation and optimization software to create a tool which will enable better decisions in the management of online systems.
System Model
The CSIM 19/OptQuest configuration selection tool is described using a model of an online computer system. The system consists of a server node connected to a local area network (LAN). In this model, clients submit transactions to the system; the transaction generation rate is set at 20 transactions per second. This model does not attempt to model any networks beyond the LAN, so the transaction response times are just the time spent inside the host system: from the time the transaction enters the LAN until the time that the last of the requested data has left the LAN. The LAN transfer rate is a parameter of the model.
The system has a limit of 32 simultaneous network connections. There is a time-out limit of 10 seconds associated with these connections; if the connection cannot be made within the time-out limit, the transaction is rejected.
The system consists of one or more CPUs with shared memory and one or more disk drives. The system processes a transaction via a sequence of cycles, where a cycle consists of a CPU usage interval, followed by a disk usage interval and a LAN usage interval. One property of the system is the file cache with the corresponding file cache hit rate, expressed as the probability of finding a requested file in the cache. If a file is found in the cache, all of the disk accesses for that transaction are skipped. The amount of main memory in the system determines the file cache hit rate.
The number of CPUs and the number of disk drives are adjustable parameters of a system configuration, as are the network transfer rate and the amount of main memory. The ranges of these four parameters are summarized in Table 1. The file cache hit rates are shown in Table 2, and the LAN transfer rates are summarized in Table 3.
| Table 1: Summary of Adjustable Parameters | |||
| Parameter | Minimum | Maximum | |
| Number of CPUs | 1 | 4 | |
| Number of disk drives | 1 | 16 | |
| Memory size index | 0 | 3 | |
| LAN transfer rate index | 0 | 2 | |
| Table 2: File Cache Hit Rates | |
| Mem size index | File cache hit rate |
| 0 | 0.00 |
| 1 | 0.50 |
| 2 | 0.85 |
| 3 | 0.99 |
| Table 3: LAN Transfer Rate | |
| LAN rate index | Transfer rate (Mbits/sec) |
| 0 | 10.0 |
| 1 | 100.0 |
| 2 | 1000.0 |
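As a rough illustration of how the four adjustable parameters in Tables 1 to 3 fit together, the following Python sketch represents one configuration and checks it against the stated ranges. The class name, field names, and helper function are hypothetical and are not part of CSIM 19 or OptQuest; only the numeric values come from the tables.

```python
from dataclasses import dataclass

# Lookup tables copied from Tables 2 and 3.
FILE_CACHE_HIT_RATE = (0.00, 0.50, 0.85, 0.99)  # indexed by memory size index
LAN_RATE_MBITS = (10.0, 100.0, 1000.0)          # indexed by LAN rate index

# Inclusive (minimum, maximum) bounds copied from Table 1.
BOUNDS = {
    "cpus": (1, 4),
    "disks": (1, 16),
    "mem_index": (0, 3),
    "lan_index": (0, 2),
}


@dataclass(frozen=True)
class Configuration:
    """One candidate system configuration (hypothetical helper type)."""
    cpus: int
    disks: int
    mem_index: int
    lan_index: int

    def __post_init__(self):
        # Reject any value outside the ranges of Table 1.
        for name, (low, high) in BOUNDS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")

    @property
    def hit_rate(self):
        return FILE_CACHE_HIT_RATE[self.mem_index]

    @property
    def lan_rate_mbits(self):
        return LAN_RATE_MBITS[self.lan_index]


def search_space_size():
    """Number of distinct configurations allowed by Table 1."""
    size = 1
    for low, high in BOUNDS.values():
        size *= high - low + 1
    return size


print(search_space_size())
```

With these ranges there are 4 x 16 x 4 x 3 = 768 possible configurations, which gives a sense of the size of the space that a search procedure has to cover.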
One feature of the model is that as more CPUs are included in a configuration, the processing rate of each individual CPU decreases. The purpose of this is to model the interference experienced among multiple CPUs as they compete for access to main memory. The base CPU processing rate is set at 250 million instructions per second (MIPS); the rates for multiple CPUs are shown in Table 4.
| Table 4: CPU Processing Rates | |
| Number of CPUs | MIPS per CPU |
| 1 | 250 |
| 2 | 240 |
| 3 | 225 |
| 4 | 200 |
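The effect of the per-CPU slowdown in Table 4 on total processing capacity can be worked out directly. The short Python sketch below multiplies the number of CPUs by the per-CPU rate; the function name is illustrative only, and treating aggregate capacity as a simple product is an assumption made for this example.

```python
# Per-CPU processing rate (MIPS) by number of CPUs, copied from Table 4.
MIPS_PER_CPU = {1: 250, 2: 240, 3: 225, 4: 200}


def aggregate_mips(cpus):
    """Total MIPS of a configuration, assuming capacity simply adds up."""
    return cpus * MIPS_PER_CPU[cpus]


for n in sorted(MIPS_PER_CPU):
    total = aggregate_mips(n)
    speedup = total / aggregate_mips(1)
    print(f"{n} CPU(s): {total} MIPS total, {speedup:.2f}x one CPU")
```

Under that assumption the totals are 250, 480, 675 and 800 MIPS, so four CPUs provide 3.2 times the capacity of one CPU rather than four times.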
Each of the adjustable parameters has an associated cost factor, consisting of a base cost and an incremental cost. The cost of a configuration is computed as the sum, over the four parameters, of the base cost plus the incremental cost for each additional unit or factor included in the configuration. These cost factors are summarized in Table 5.
| Table 5: Cost Factors | ||
| Component | Base cost | Incremental cost |
| CPU | 0 | 1000.00 |
| Disks | 0 | 2500.00 |
| Mem. Size factor | 0 | 250.00 |
| LAN transfer rate factor | 5000.0 | 2000.0 |
Other important parameters of the model are as shown in Table 6.
| Table 6: Other System Model Parameters | | |
| Parameter | Value | Units |
| Disk time per block | 0.15 | sec |
| Disk block size | 512 | bytes |
| Number disk blocks/trans (min, max, mode) | 1, 100, 10 | blocks |
| Network block sizes (10 mbps, 100 mbps, 1000 mbps) | 1500, 1500, 64000 | bytes |
| CPU instr/disk block | 0.50 | mips |
| Transaction interarrival time | 0.05 | sec |
| Maximum trans reject rate | 0.05 | |
| Min. processing rate (model #1) | 15.0 | tps |
| Max cost (model #2) | 21,000.00 | |
| Number of network connections | 32 | |
| Connection time-out interval | 10.00 | sec |
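To make the interaction between the file cache and the disk workload concrete, here is a minimal Python sketch of how the number of disk blocks for one transaction might be drawn. It assumes that the (min, max, mode) entry in Table 6 describes a triangular distribution and that block counts are rounded to whole blocks; neither detail is stated in this note, and the function name is hypothetical.

```python
import random

# (min, max, mode) for disk blocks per transaction, copied from Table 6.
BLOCKS_MIN, BLOCKS_MAX, BLOCKS_MODE = 1, 100, 10


def disk_blocks_for_transaction(hit_rate, rng):
    """Return the number of disk blocks one transaction reads.

    On a file cache hit all disk accesses are skipped, so the result is 0.
    On a miss the count is drawn from an assumed triangular distribution.
    """
    if rng.random() < hit_rate:
        return 0
    # random.triangular takes its arguments in the order (low, high, mode).
    return round(rng.triangular(BLOCKS_MIN, BLOCKS_MAX, BLOCKS_MODE))


rng = random.Random(2024)
sample = [disk_blocks_for_transaction(0.85, rng) for _ in range(10)]
print(sample)
```

With a hit rate of 0.85 (memory size index 2), most entries in the sample are 0, which is why adding memory can substitute for adding disk drives in this model.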
CSIM 19 has a mechanism for automatically controlling the run length for an execution of a model [Schw97]. The mechanism allows the program to specify a confidence level and relative error for a parameter; it then runs the model until the specified accuracy at the specified confidence level is achieved. In the runs reported in this paper, the mean transaction response time was the parameter used to control the run length. A relative error of 0.05 and a confidence level of 0.90 were specified and achieved for all of the runs.
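The idea behind this kind of stopping rule can be sketched in a few lines of Python. The sketch below is not the CSIM 19 algorithm described in [Schw97]; it is a generic batch-means rule with a normal approximation, and the function names and the exponential stand-in for the model are assumptions made purely for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def run_until_accurate(sample_batch_mean, rel_error=0.05, confidence=0.90,
                       min_batches=10, max_batches=10_000):
    """Collect batch means until CI half-width / mean <= rel_error."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    batches = []
    while len(batches) < max_batches:
        batches.append(sample_batch_mean())
        if len(batches) < min_batches:
            continue
        center = mean(batches)
        half_width = z * stdev(batches) / len(batches) ** 0.5
        if center != 0 and half_width / abs(center) <= rel_error:
            return center, half_width, len(batches)
    raise RuntimeError("accuracy target not reached")


# Stand-in for a simulation: each batch is the mean of 200 exponentially
# distributed "response times" with a true mean of 0.5 seconds.
rng = random.Random(42)


def one_batch():
    return mean(rng.expovariate(2.0) for _ in range(200))


center, half_width, n = run_until_accurate(one_batch)
print(f"mean={center:.3f} +/- {half_width:.3f} after {n} batches")
```

A rule of this kind explains why run times can differ widely between configurations: a configuration whose response times are highly variable needs more observations before the relative error target is met.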
Optimization
The OptQuest optimization tool has been adapted to work with models constructed with the CSIM 19 simulation library. OptQuest manages a search for better configurations of a system model. The steps in this search can be summarized as follows [Lagu99]:
1. Set up the search procedure:
a. Define the configuration variables, along with their limits
b. Define the requirements, along with their limits
c. Define the problem, including the optimization criteria and the number of iterations to be used
d. Initialize the search
2. For the specified number of iterations, do the following steps:
a. Get the next configuration from the search manager
b. Run the system model for the specified configuration
c. Store the values of the optimization criteria and the requirements produced by the model
3. When the specified number of iterations have been performed, get the "best" configuration and the corresponding values of the optimization criteria and requirements.
The techniques used to guide the search are based on "tabu search", developed by Glover et al. [GKLa96]. The use of the combined CSIM 19/OptQuest package is illustrated in the example in the following section.
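The overall shape of the search steps listed above can be sketched as a simple loop. The Python sketch below is not the OptQuest API and does not implement tabu search: uniform random sampling stands in for the search manager, and the model is a made-up toy function. All names and the toy cost and throughput formulas are hypothetical.

```python
import random

# Step 1a: configuration variables and their limits (ranges from Table 1).
BOUNDS = {"cpus": (1, 4), "disks": (1, 16),
          "mem_index": (0, 3), "lan_index": (0, 2)}


def search(run_model, is_feasible, objective, iterations, rng):
    """Return (score, config, results) for the best feasible run, or None."""
    best = None
    for _ in range(iterations):
        # Step 2a: get the next configuration. Random sampling is used here
        # as a placeholder for the guided search performed by OptQuest.
        config = {name: rng.randint(low, high)
                  for name, (low, high) in BOUNDS.items()}
        # Step 2b: run the system model for this configuration.
        results = run_model(config)
        # Step 2c: keep the result if it meets the requirements and improves
        # on the best objective value seen so far (smaller is better).
        if is_feasible(results):
            score = objective(results)
            if best is None or score < best[0]:
                best = (score, config, results)
    # Step 3: report the best configuration found.
    return best


def toy_model(config):
    """Made-up stand-in for a simulation run; not the model in this note."""
    return {"cost": 1000 * config["cpus"] + 100 * config["disks"],
            "tput": 4.0 * config["cpus"] + 0.5 * config["disks"]}


best = search(toy_model,
              is_feasible=lambda r: r["tput"] > 10.0,
              objective=lambda r: r["cost"],
              iterations=20,
              rng=random.Random(7))
print(best)
```

Passing the requirement check and the objective as separate functions mirrors the separation between requirements and optimization criteria in the steps above, so the same loop can serve both of the problems studied in the next section.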
Selecting the Best Configuration for an Online Server
The first issue addressed with the model was to find the least expensive configuration which had a system processing rate of greater than 15 transactions per second and a transaction rejection rate of less than 5% of the submitted transactions.
The sequence of configurations selected by OptQuest is shown in Table 7. It can be seen that the first configuration selected was a minimal configuration, consisting of one CPU, one disk drive, a file cache hit rate of 0.00 and the slowest network device. The cost of this configuration is $10,750.00, but the throughput rate is 1.87 transactions per second (below the minimum requirement of 15) and the rejection rate is 91% (above the maximum of 5%). The second configuration is the maximum configuration, consisting of four CPUs, 16 disk drives, the best hit rate (0.99) and the fastest network device. This configuration costs $29,000.00 and has an acceptable throughput rate (20.06 transactions per second) and a rejection rate of 0%.
| Table 7 - First Model: Sequence of Configurations and Results | ||||||||
| Run | CPUs | Disks | Hit Rate | Net Rate | Cost | Tput | Resp Tm | RejRate |
| 1 | 1 | 1 | 0.00 | 10.00 | 10750.00 | 1.87 | 17.02 | 0.91 |
| 2 | 4 | 16 | 0.99 | 1000.00 | 29000.00 | 20.06 | 0.10 | 0.00 |
| 3 | 3 | 9 | 0.85 | 100.00 | 21750.00 | 20.13 | 0.19 | 0.00 |
| 4 | 1 | 7 | 0.00 | 1000.00 | 16250.00 | 11.06 | 2.87 | 0.43 |
| 5 | 3 | 15 | 0.00 | 10.00 | 16250.00 | 19.46 | 1.61 | 0.01 |
| 6 | 1 | 3 | 0.99 | 100.00 | 20750.00 | 14.16 | 2.24 | 0.27 |
| 7 | 4 | 16 | 0.99 | 10.00 | 25000.00 | 20.12 | 0.12 | 0.00 |
| 8 | 4 | 4 | 0.00 | 100.00 | 16500.00 | 6.83 | 4.66 | 0.65 |
| 9 | 3 | 13 | 0.00 | 1000.00 | 19750.00 | 17.71 | 1.77 | 0.08 |
| 10 | 1 | 15 | 0.85 | 100.00 | 21250.00 | 14.27 | 2.20 | 0.26 |
| 11 | 3 | 13 | 0.50 | 10.00 | 18250.00 | 20.74 | 0.59 | 0.00 |
| 12 | 3 | 11 | 0.50 | 100.00 | 19750.00 | 19.69 | 0.59 | 0.00 |
| 13 | 3 | 1 | 0.99 | 1000.00 | 24250.00 | 19.99 | 0.10 | 0.00 |
| 14 | 3 | 5 | 0.99 | 1000.00 | 25250.00 | 20.69 | 0.10 | 0.00 |
| 15 | 3 | 16 | 0.00 | 10.00 | 16500.00 | 19.67 | 1.55 | 0.00 |
| 16 | 3 | 16 | 0.50 | 10.00 | 19000.00 | 20.46 | 0.53 | 0.00 |
| 17 | 3 | 15 | 0.50 | 100.00 | 20750.00 | 20.45 | 0.49 | 0.00 |
| 18 | 4 | 16 | 0.85 | 100.00 | 24500.00 | 19.96 | 0.19 | 0.00 |
| 19 | 2 | 8 | 0.00 | 10.00 | 13500.00 | 12.16 | 2.62 | 0.39 |
| 20 | 2 | 14 | 0.00 | 100.00 | 18000.00 | 18.45 | 1.70 | 0.06 |
| Best(5) | 3 | 15 | 0.00 | 10.00 | 16250.00 | 19.46 | 0.00 | 0.01 |
The search continues, with OptQuest selecting different configurations, each leading to different results. After 20 iterations (the number specified), the best acceptable results are shown on the last line of Table 7. It can be seen that OptQuest was trying to trade off disk drives against main memory (cache hit rates). Because main memory is more expensive than disk drives (in this model), it finally selected a configuration with 15 disk drives and the minimal amount of main memory, along with three CPUs and the slowest network device. The cost of the optimized system is $16,250.00, and the throughput rate is 19.46 tps, while the most expensive system costs $29,000.00 and delivers a throughput of only 20.06 tps. The optimized solution costs only 56% of the fully-loaded system price, but delivers 97% of the performance.
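The selection of run 5 and the two percentages quoted above can be checked against the data in Table 7. The following Python sketch applies the two requirements of the first problem to each run and picks the cheapest run that passes; the tuple layout and names are chosen for this example only.

```python
# (run, cost, throughput in tps, rejection rate), copied from Table 7.
TABLE_7 = [
    (1, 10750.0, 1.87, 0.91), (2, 29000.0, 20.06, 0.00),
    (3, 21750.0, 20.13, 0.00), (4, 16250.0, 11.06, 0.43),
    (5, 16250.0, 19.46, 0.01), (6, 20750.0, 14.16, 0.27),
    (7, 25000.0, 20.12, 0.00), (8, 16500.0, 6.83, 0.65),
    (9, 19750.0, 17.71, 0.08), (10, 21250.0, 14.27, 0.26),
    (11, 18250.0, 20.74, 0.00), (12, 19750.0, 19.69, 0.00),
    (13, 24250.0, 19.99, 0.00), (14, 25250.0, 20.69, 0.00),
    (15, 16500.0, 19.67, 0.00), (16, 19000.0, 20.46, 0.00),
    (17, 20750.0, 20.45, 0.00), (18, 24500.0, 19.96, 0.00),
    (19, 13500.0, 12.16, 0.39), (20, 18000.0, 18.45, 0.06),
]

MIN_TPUT = 15.0    # minimum processing rate for model #1 (Table 6)
MAX_REJECT = 0.05  # maximum transaction reject rate (Table 6)


def meets_requirements(row):
    _, _, tput, rej = row
    return tput > MIN_TPUT and rej < MAX_REJECT


# Least expensive run that satisfies both requirements.
best = min((row for row in TABLE_7 if meets_requirements(row)),
           key=lambda row: row[1])
full = TABLE_7[1]  # run 2, the maximum configuration

cost_ratio = best[1] / full[1]
tput_ratio = best[2] / full[2]
print(f"best run: {best[0]}")
print(f"cost: {cost_ratio:.0%} of maximum configuration")
print(f"throughput: {tput_ratio:.0%} of maximum configuration")
```

Note that run 4 has the same cost as run 5 but fails both requirements, and that runs 9 and 20 are excluded only because their rejection rates (8% and 6%) exceed the 5% limit.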
The second issue addressed with the model was to find the fastest configuration, where speed is measured in terms of the average transaction response time, with the requirements that the configuration cost less than $21,000.00 and the transaction rejection rate be less than 5%. The sequence of configurations selected by OptQuest to solve this problem is shown in Table 8.
| Table 8 - Second Model: Sequence of Configurations and Results | ||||||||
| Run | CPUs | Disks | Hit Rate | Net Rate | Cost | Tput | Resp Tm | RejRate |
| 1 | 1 | 1 | 0.00 | 10.00 | 10750.00 | 1.87 | 17.02 | 0.91 |
| 2 | 4 | 16 | 0.99 | 1000.00 | 29000.00 | 20.06 | 0.10 | 0.00 |
| 3 | 3 | 9 | 0.85 | 100.00 | 21750.00 | 20.13 | 0.19 | 0.00 |
| 4 | 1 | 7 | 0.00 | 1000.00 | 16250.00 | 11.06 | 2.87 | 0.43 |
| 5 | 3 | 15 | 0.00 | 10.00 | 16250.00 | 19.46 | 1.61 | 0.01 |
| 6 | 1 | 3 | 0.99 | 100.00 | 20750.00 | 14.16 | 2.24 | 0.27 |
| 7 | 4 | 16 | 0.99 | 10.00 | 25000.00 | 20.12 | 0.12 | 0.00 |
| 8 | 4 | 4 | 0.00 | 100.00 | 16500.00 | 6.83 | 4.66 | 0.65 |
| 9 | 3 | 13 | 0.00 | 1000.00 | 19750.00 | 17.71 | 1.77 | 0.08 |
| 10 | 1 | 15 | 0.85 | 100.00 | 21250.00 | 14.27 | 2.20 | 0.26 |
| 11 | 3 | 15 | 0.50 | 100.00 | 20750.00 | 19.63 | 0.50 | 0.00 |
| 12 | 4 | 16 | 0.85 | 100.00 | 24500.00 | 20.47 | 0.19 | 0.00 |
| 13 | 2 | 8 | 0.00 | 10.00 | 13500.00 | 12.39 | 2.55 | 0.36 |
| 14 | 3 | 16 | 0.50 | 10.00 | 19000.00 | 19.96 | 0.52 | 0.00 |
| 15 | 3 | 13 | 0.50 | 10.00 | 18250.00 | 20.21 | 0.55 | 0.00 |
| 16 | 3 | 11 | 0.50 | 100.00 | 19750.00 | 19.90 | 0.59 | 0.00 |
| 17 | 3 | 1 | 0.99 | 1000.00 | 24250.00 | 20.35 | 0.10 | 0.00 |
| 18 | 3 | 5 | 0.99 | 1000.00 | 25250.00 | 19.76 | 0.10 | 0.00 |
| 19 | 3 | 16 | 0.00 | 10.00 | 16500.00 | 19.66 | 1.52 | 0.00 |
| 20 | 3 | 14 | 0.00 | 100.00 | 18000.00 | 18.45 | 1.71 | 0.04 |
| Best(11) | 3 | 15 | 0.50 | 100.00 | 20750.00 | 0.00 | 0.50 | 0.00 |
In the second model, OptQuest finds that the best solution after 20 iterations costs $20,750.00 and has an average response time of 0.50 seconds and a rejection rate of 0%. This configuration has three CPUs and 15 disk drives (as in the configuration found in the first model), but it has more main memory (with a file cache hit rate of 0.50) and a faster network device (100 Mbits/sec).
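The same kind of check can be applied to Table 8, this time minimizing response time subject to the cost and rejection rate limits. The Python sketch below is an illustration only; the tuple layout and names are assumptions of this example.

```python
# (run, cost, mean response time in sec, rejection rate), copied from Table 8.
TABLE_8 = [
    (1, 10750.0, 17.02, 0.91), (2, 29000.0, 0.10, 0.00),
    (3, 21750.0, 0.19, 0.00), (4, 16250.0, 2.87, 0.43),
    (5, 16250.0, 1.61, 0.01), (6, 20750.0, 2.24, 0.27),
    (7, 25000.0, 0.12, 0.00), (8, 16500.0, 4.66, 0.65),
    (9, 19750.0, 1.77, 0.08), (10, 21250.0, 2.20, 0.26),
    (11, 20750.0, 0.50, 0.00), (12, 24500.0, 0.19, 0.00),
    (13, 13500.0, 2.55, 0.36), (14, 19000.0, 0.52, 0.00),
    (15, 18250.0, 0.55, 0.00), (16, 19750.0, 0.59, 0.00),
    (17, 24250.0, 0.10, 0.00), (18, 25250.0, 0.10, 0.00),
    (19, 16500.0, 1.52, 0.00), (20, 18000.0, 1.71, 0.04),
]

MAX_COST = 21000.0  # maximum cost for model #2 (Table 6)
MAX_REJECT = 0.05   # maximum transaction reject rate (Table 6)


def meets_requirements(row):
    _, cost, _, rej = row
    return cost < MAX_COST and rej < MAX_REJECT


# Fastest run (smallest mean response time) within the requirements.
best = min((row for row in TABLE_8 if meets_requirements(row)),
           key=lambda row: row[2])

# Runs that are faster than the best acceptable run but exceed the budget.
over_budget = [row[0] for row in TABLE_8
               if row[2] < best[2] and row[1] >= MAX_COST]

print(f"best run: {best[0]} with response time {best[2]:.2f} sec")
print(f"faster runs rejected on cost: {over_budget}")
```

Six of the 20 runs achieve a lower response time than run 11, but all of them cost more than the limit; the cheapest of these, run 3, is $750.00 over the budget.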
The CPU execution times required to get the results for the first model are shown in Table 9. These were obtained on a 300 MHz PC with 128 megabytes of main memory. The total time for the 20 runs is 110.4 seconds of CPU time.
| Table 9: CPU Times (seconds) for 20 Runs Summarized in Table 7 | |
| Run | CPU Time |
| 1 | 6.828 |
| 2 | 2.547 |
| 3 | 7.375 |
| 4 | 5.281 |
| 5 | 5.250 |
| 6 | 3.266 |
| 7 | 1.828 |
| 8 | 5.844 |
| 9 | 4.593 |
| 10 | 3.407 |
| 11 | 3.000 |
| 12 | 3.922 |
| 13 | 1.438 |
| 14 | 1.437 |
| 15 | 20.078 |
| 16 | 2.937 |
| 17 | 2.563 |
| 18 | 1.843 |
| 19 | 21.953 |
| 20 | 5.000 |
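The total quoted above can be reproduced from Table 9 with a short Python calculation. The variable names are illustrative only.

```python
# CPU time in seconds for runs 1 to 20, copied from Table 9.
CPU_TIMES = [
    6.828, 2.547, 7.375, 5.281, 5.250, 3.266, 1.828, 5.844, 4.593, 3.407,
    3.000, 3.922, 1.438, 1.437, 20.078, 2.937, 2.563, 1.843, 21.953, 5.000,
]

total = sum(CPU_TIMES)
average = total / len(CPU_TIMES)

# Share of the total taken by the two longest runs.
two_longest = sorted(CPU_TIMES, reverse=True)[:2]
share = sum(two_longest) / total

print(f"total:   {total:.1f} sec")
print(f"average: {average:.2f} sec per run")
print(f"two longest runs: {share:.0%} of total")
```

The two longest runs (runs 15 and 19) account for about 38% of the total CPU time, so the cost of a search of this kind depends heavily on a few slowly converging configurations.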
Summary
This application note has illustrated the use of a new product, CSIM 19/OptQuest, which can be used to find the best configuration for a system with many configuration options. Two selection problems were analyzed:
1. Find the least expensive system which meets a specified performance requirement, and
2. Find the most powerful system which costs less than a specified cost requirement.
This product is based on CSIM, a widely used simulation package, and OptQuest, a well-known optimization package from OptTek Systems that has been tailored to work with CSIM for this product. The results from the example show that the search procedure implemented in OptQuest rapidly leads to the selection of a good, acceptable configuration.
For more information on this product, contact Mesquite Software at info@mesquite.com.
Acknowledgements
CSIM is copyrighted by Microelectronics and Computer Technology Corporation (MCC). CSIM 19 is supported and marketed by Mesquite Software under license from MCC. Dr. Jeff Brumfield developed the data collection and presentation functions, including the run-length control algorithm, in CSIM.
OptQuest is copyrighted by Optimization Technologies, Inc.
List of References
[GKLa96] Glover, F., Kelly, J. and M. Laguna, "New Advances and Applications of Combining Simulation and Optimization", Proceedings of the 1996 Winter Simulation Conference, ed. J. Charnes, D. Morrice, D. Brunner and J. Swain, Dec. 8 - 11, 1996, pp. 144 - 152.
[Lagu99] Laguna, H., OptQuest Callable Library for C (Windows) Applications, Optimization Technologies, Inc., Boulder, CO., 1999.
[Mesq97] User's Guide, CSIM 18 C++ Version, Mesquite Software, Inc., Austin, TX, 1997.
[Schw96] Schwetman, H., "CSIM 18 - the Simulation Engine", Proceedings of the 1996 Winter Simulation Conference, ed. J. Charnes, D. Morrice, D. Brunner and J. Swain, Dec. 8 - 11, 1996, pp. 517 - 521.
[Schw97] Schwetman, H., "Data Analysis and Automatic Run-Length Control in CSIM 18", Proceedings of the 1997 Winter Simulation Conference, ed. S. Andradottir, K. Healy, D. Withers, and B. Nelson, Dec. 7 - 10, 1997, pp. 687 - 692.

