An online server is a computer application
executing on a computer system (the
host) which is attached to a network.
The server receives requests from other
systems which are attached to the network.
Many of these servers are operated in
a commercial environment. Examples of
commercial online servers include:
- A database management system (DBMS) handling requests to retrieve insurance policies in an insurance company
- An airline reservation system handling inquiries about the availability of seats on airline flights
- A World Wide Web (WWW) server handling requests for HTML pages from browsers on the Internet, and
- An online retailer, offering items (such as books, computers, or general merchandise) for sale.
Most of these systems must deliver
the requested information or conclude
a commercial transaction in a timely
manner. If this does not happen, the
client (the requester) may become dissatisfied
with the service being offered. In many
cases, this dissatisfaction can result
in lost business or decreased revenues
for the commercial site.
Managing such commercial online systems
can present some special problems to
the system manager. The demand for the
online service can vary widely; anomalous
factors such as advertising campaigns,
mentions in product reviews and seasonal
changes can stimulate spikes in the
request arrival rates. As a result,
the system must be configured with reserve
processing capacity, so that service
levels do not degrade significantly
as arrival rates increase dramatically.
Still, most of these systems operate
within strict financial constraints.
Thus, the manager's job can often be
viewed as continually having to balance
the need for increased processing capability
and the need to operate within the budget
constraints for the system.
System simulation has been used by
system managers to aid in making decisions
about configuring online systems to
handle projected workloads. A valid
model of an online system can be extremely
useful for predicting system performance
(as reflected in transaction response
times) as the workload varies. In particular,
the model can predict response times
as transaction arrival rates spike to
unprecedented levels.
CSIM 19 is a simulation
library which has been used to model online systems. CSIM 19 offers
many features which enable the system modeler to construct robust
models of large, complex systems. "CSIM
18 -- The Simulation Engine," Schwetman, 1996 [Schw96],
has a description of CSIM and its use as a simulation engine.
While having a model can help make
better decisions regarding configuring
a system to handle a projected workload,
actually developing the different configurations
which could be used can be a difficult
task. A component which can guide the
search for the best possible configuration
has not been part of the set of tools
available to system modelers and managers.
Recent research has led to the development
of a tool (OptQuest [GKLa96])
that can be used to guide the search
for the best possible system configuration
to achieve stated performance goals.
In addition, this tool can be used to
find the least expensive configuration
which will meet a pre-specified performance
goal. This tool is called CSIM 19/OptQuest.
The remainder of this paper demonstrates
this powerful combination of simulation
and optimization software to create
a tool which will enable better decisions
in the management of online systems.
The CSIM 19/OptQuest configuration
selection tool is described using a
model of an online computer system.
The system consists of a server node
connected to a local area network (LAN).
In this model, clients submit transactions
to the system; the transaction generation
rate is set at 20 transactions per
second. This model does not attempt
to model any networks beyond the LAN
shown above, so the transaction response
times are just the time required inside
of the host system: from the time the
transaction enters the LAN until the
time that the last of the requested
data has left the LAN. The LAN transfer
rate is a parameter of the model.
The system has a limit
of 32 simultaneous network connections. There is a time-out limit
of 10 seconds associated with these connections; if the connection
cannot be made within the time-out limit, the transaction is rejected. The
system consists of one or more CPUs with shared memory and one
or more disk drives. The system processes a transaction via a
sequence of cycles, where a cycle consists of a CPU usage interval,
followed by a disk usage interval and a LAN usage interval. One
property of the system is the file cache with the corresponding
file cache hit rate, expressed as the probability of finding a
requested file in the cache. If a file is found in the cache,
all of the disk accesses for that transaction are skipped. The
amount of main memory in the system determines the file cache
hit rate.
The number of CPUs and
the number of disk drives are adjustable parameters of a system configuration,
as are the network transfer rate and the amount of main memory.
The ranges of these four groups of parameters are summarized in
Table 1. The file cache hit rates are shown in Table 2, and the
LAN transfer rates are summarized in Table 3.
| Table 1: Summary of Adjustable Parameters |
| Parameter | Minimum | Maximum |
| Number of CPUs | 1 | 4 |
| Number of disk drives | 1 | 16 |
| Memory size index | 0 | 3 |
| LAN transfer rate index | 0 | 2 |
| Table 2: File Cache Hit Rates |
| Mem size index | File cache hit rate |
| 0 | 0.00 |
| 1 | 0.50 |
| 2 | 0.85 |
| 3 | 0.99 |
| Table 3: LAN Transfer Rate |
| LAN rate index | Transfer rate (Mbits/sec) |
| 0 | 10.0 |
| 1 | 100.0 |
| 2 | 1000.0 |
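To give a feel for what the three LAN rates mean, the sketch below converts a block size into a transfer time. It assumes that 1 Mbit is 10^6 bits and ignores protocol overhead and contention; the function name and the 1500-byte example block are illustrative choices, not part of the model code.

```python
# LAN transfer rates in Mbits/sec, indexed as in Table 3.
LAN_RATE_MBITS = {0: 10.0, 1: 100.0, 2: 1000.0}


def lan_transfer_time(num_bytes: int, lan_index: int) -> float:
    """Seconds needed to move num_bytes over the LAN (assumes 1 Mbit = 1e6 bits)."""
    bits = num_bytes * 8
    return bits / (LAN_RATE_MBITS[lan_index] * 1e6)


# Example: time to send one 1500-byte block at each rate.
for index in sorted(LAN_RATE_MBITS):
    seconds = lan_transfer_time(1500, index)
    print(f"index {index}: {seconds * 1e3:.4f} ms per 1500-byte block")
```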
One feature of the model
is that as more CPUs are included in a configuration, the processing
rate of each individual CPU decreases. The purpose of this is
to model the interference experienced by multiple CPUs as they
compete for access to main memory. The base CPU processing rate
is set at 250 million instructions per second (MIPS); the rates
for multiple CPUs are shown in Table 4.
| Table 4: CPU Processing Rates |
| Number of CPUs | MIPS per CPU |
| 1 | 250 |
| 2 | 240 |
| 3 | 225 |
| 4 | 200 |
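The effect of the per-CPU slowdown on total capacity follows directly from Table 4. The short sketch below multiplies the CPU count by the per-CPU rate; the function name is hypothetical.

```python
# MIPS per CPU for each CPU count, copied from Table 4.
MIPS_PER_CPU = {1: 250, 2: 240, 3: 225, 4: 200}


def aggregate_mips(num_cpus: int) -> int:
    """Total processing rate of a configuration with num_cpus CPUs."""
    return num_cpus * MIPS_PER_CPU[num_cpus]


for n in sorted(MIPS_PER_CPU):
    total = aggregate_mips(n)
    speedup = total / aggregate_mips(1)
    print(f"{n} CPU(s): {total} MIPS total, {speedup:.2f}x one CPU")
```

Under these rates, four CPUs provide 800 MIPS in total, which is 3.2 times the capacity of a single CPU rather than 4 times.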
Each of the adjustable
parameters has an associated cost factor, consisting of a base
cost plus an incremental cost for each additional unit or factor.
The cost of a configuration is the sum of these costs over all
of the components included in the configuration. The cost factors
are summarized in Table 5.
| Table 5: Cost Factors |
| Component | Base cost | Incremental cost |
| CPU | 0 | 1000.00 |
| Disks | 0 | 2500.00 |
| Mem. Size factor | 0 | 250.00 |
| LAN transfer rate factor | 5000.0 | 2000.0 |
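A minimal sketch of the cost computation follows. It is an inference from the reported results, not the authors' code: the costs listed in Table 7 are reproduced only if the disk and memory increments of Table 5 are interchanged (250.00 per disk drive, 2500.00 per memory step) and if the memory and LAN factors are taken to be the index plus one. Both points are assumptions, and the function name is hypothetical.

```python
def configuration_cost(cpus: int, disks: int, mem_index: int, lan_index: int) -> float:
    """Cost of a configuration under the assumptions stated in the lead-in."""
    cpu_cost = 1000.0 * cpus
    # Assumption: 250.00 per disk drive (Table 5 prints 2500.00 for disks).
    disk_cost = 250.0 * disks
    # Assumption: 2500.00 per memory step, with factor = index + 1.
    mem_cost = 2500.0 * (mem_index + 1)
    # Base cost of 5000.00 plus 2000.00 per step; assumption: factor = index + 1.
    lan_cost = 5000.0 + 2000.0 * (lan_index + 1)
    return cpu_cost + disk_cost + mem_cost + lan_cost


# Configurations taken from runs 1, 2, 3 and 5 of Table 7.
for config in [(1, 1, 0, 0), (4, 16, 3, 2), (3, 9, 2, 1), (3, 15, 0, 0)]:
    print(config, configuration_cost(*config))
```

With these assumptions the four configurations come out at 10750.00, 29000.00, 21750.00 and 16250.00, matching the Cost column of Table 7 for those runs.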
Other important parameters of the model
are as shown in Table 6.
| Table 6 - Other System Model Parameters |
| Parameter | | | Value | Units |
| Disk time per block | | | 0.15 | sec |
| Disk block size | | | 512 | bytes |
| Number disk blocks/trans (min, max, mode) | 1 | 100 | 10 | blocks |
| Network block sizes (10, 100, 1000 Mbits/sec) | 1500 | 1500 | 64000 | bytes |
| CPU instr/disk block | | | 0.50 | mips |
| Transaction interarrival time | | | 0.05 | sec |
| Maximum trans reject rate | 0.05 | | | |
| Min. processing rate (model #1) | 15.0 | | | tps |
| Max cost (model #2) | 21,000.00 | | | |
| Number of network connections | 32 | | | |
| Connection time-out interval | 10.00 | | | sec |
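The interaction between the file cache and the disk demand of a transaction can be sketched as follows. The (min, max, mode) entry in Table 6 suggests a triangular distribution for the number of disk blocks; that reading is an assumption, as are the function name and the rounding to whole blocks. A cache hit skips all disk accesses for the transaction, as described earlier.

```python
import random

# File cache hit rates by memory size index (Table 2).
HIT_RATE = {0: 0.00, 1: 0.50, 2: 0.85, 3: 0.99}
# Disk blocks per transaction: min, max, mode (Table 6).
MIN_BLOCKS, MAX_BLOCKS, MODE_BLOCKS = 1, 100, 10


def sample_disk_blocks(mem_index: int, rng: random.Random) -> int:
    """Number of disk blocks one transaction reads; 0 on a file cache hit."""
    if rng.random() < HIT_RATE[mem_index]:
        return 0  # file found in cache: all disk accesses are skipped
    # Assumption: triangular distribution, rounded to a whole number of blocks.
    return round(rng.triangular(MIN_BLOCKS, MAX_BLOCKS, MODE_BLOCKS))


rng = random.Random(42)
for mem_index in sorted(HIT_RATE):
    blocks = [sample_disk_blocks(mem_index, rng) for _ in range(10_000)]
    skipped = sum(b == 0 for b in blocks) / len(blocks)
    mean_blocks = sum(blocks) / len(blocks)
    print(f"mem index {mem_index}: {skipped:.2f} skipped, {mean_blocks:.1f} blocks/trans")
```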
CSIM 19 has a mechanism for automatically
controlling the run length for an execution
of a model [Schw97].
The mechanism allows the program to
specify a confidence level and relative
error for a parameter; it then runs
the model until the accuracy at the
specified confidence level is achieved.
In the runs reported in this paper,
the mean transaction response time was
the parameter used to control the run
length. A relative error of 0.05 and
a confidence level of 0.90 were specified
and achieved for all of the runs.
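The idea behind this stopping rule can be sketched as follows. This is not CSIM's algorithm: it assumes independent observations and a normal approximation, whereas response times from a simulation are usually correlated. The function name, the batch size and the exponential test data are all illustrative assumptions.

```python
import random
import statistics


def run_until_accurate(sample, rel_error=0.05, confidence=0.90,
                       batch=50, max_n=200_000):
    """Draw observations until the confidence-interval half-width is within
    rel_error of the mean, or max_n observations have been taken."""
    # Two-sided normal quantile for the requested confidence level.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    values = []
    while True:
        values.extend(sample() for _ in range(batch))
        mean = statistics.fmean(values)
        half_width = z * statistics.stdev(values) / len(values) ** 0.5
        if half_width <= rel_error * abs(mean) or len(values) >= max_n:
            return mean, half_width, len(values)


# Illustrative data: exponential "response times" with a mean of 0.5 sec.
rng = random.Random(1)
mean, half_width, n = run_until_accurate(lambda: rng.expovariate(2.0))
print(f"mean={mean:.3f} +/- {half_width:.3f} after {n} observations")
```

The number of observations needed grows with the variability of the measured quantity, which is one reason the run times of individual configurations can differ widely.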
The OptQuest optimization tool has
been adapted to work with models constructed
with the CSIM 19 simulation library.
OptQuest manages a search for better
configurations of a system model. The
steps in this search can be summarized
as follows [Lagu99]:
1. Set up the search procedure:
   a. Define the configuration variables, along with their limits
   b. Define the requirements, along with their limits
   c. Define the problem, including the optimization criteria and the number of iterations to be used
   d. Initialize the search
2. For the specified number of iterations, do the following steps:
   a. Get the next configuration from the search manager
   b. Run the system model for the specified configuration
   c. Store the values of the optimization criteria and the requirements produced by the model
3. When the specified number of iterations have been performed, get the "best" configuration and the corresponding values of the optimization criteria and requirements.
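The structure of this loop can be sketched in a few lines. The sketch does not use the OptQuest library: the search manager is replaced by uniform random sampling, and the system model by an invented toy function, so every name below is hypothetical. Only the variable limits are taken from Table 1.

```python
import random

# Step 1a: configuration variables and their limits (Table 1).
LIMITS = {"cpus": (1, 4), "disks": (1, 16), "mem_index": (0, 3), "lan_index": (0, 2)}


def propose(rng):
    """Stand-in for the search manager: a uniformly random configuration."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in LIMITS.items()}


def toy_model(config):
    """Hypothetical stand-in for one simulation run (not the CSIM model).

    Returns (objective, requirement): an invented cost to minimize and an
    invented capacity score that must reach a threshold.
    """
    objective = 4 * config["cpus"] + config["disks"]
    requirement = config["cpus"] * config["disks"]
    return objective, requirement


def search(model, requirement_ok, iterations, seed=0):
    rng = random.Random(seed)                      # step 1d: initialize
    history = []
    for _ in range(iterations):                    # step 2
        config = propose(rng)                      # 2a: next configuration
        objective, requirement = model(config)     # 2b: run the model
        history.append((config, objective, requirement))  # 2c: store results
    # Step 3: best configuration among those meeting the requirement.
    feasible = [h for h in history if requirement_ok(h[2])]
    best = min(feasible, key=lambda h: h[1]) if feasible else None
    return best, history


best, history = search(toy_model, lambda r: r >= 20, iterations=50, seed=1)
print("best:", best)
```

A guided search differs from this sketch in step 2a, where earlier results influence which configuration is tried next.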
The techniques used to guide the search
are based on "tabu search", developed
by Glover et al. [GKLa96].
The use of the combined CSIM 19/OptQuest
package is illustrated in the example
in the following section.
The first issue addressed with the
model was to find the least expensive
configuration which had a system processing
rate of greater than 15 transactions
per second and a transaction rejection
rate of less than 5% of the submitted
transactions.
The sequence of configurations
selected by OptQuest is shown in Table 7. It can be seen that
the first configuration selected was a minimal configuration,
consisting of one CPU, one disk drive, a file cache hit rate of
0.00 and the slowest network device. The cost of this configuration
is $10,750.00, but the throughput rate is 1.87 transactions per
sec. (below the minimum requirement of 15) and the rejection rate
is 91% (above the maximum of 5%). The second configuration is
the maximum configuration, consisting of four CPUs, 16 disk drives,
the best hit rate (0.99) and the fastest network device. This
configuration costs $29,000.00 and has an acceptable throughput
rate (20.06 transactions per sec.) and a rejection rate of 0%.
| Table 7 - First Model: Sequence of Configurations and Results |
| Run | CPUs | Disks | Hit Rate | Net Rate (Mbits/sec) | Cost ($) | Tput (tps) | Resp Tm (sec) | RejRate |
| 1 | 1 | 1 | 0.00 | 10.00 | 10750.00 | 1.87 | 17.02 | 0.91 |
| 2 | 4 | 16 | 0.99 | 1000.00 | 29000.00 | 20.06 | 0.10 | 0.00 |
| 3 | 3 | 9 | 0.85 | 100.00 | 21750.00 | 20.13 | 0.19 | 0.00 |
| 4 | 1 | 7 | 0.00 | 1000.00 | 16250.00 | 11.06 | 2.87 | 0.43 |
| 5 | 3 | 15 | 0.00 | 10.00 | 16250.00 | 19.46 | 1.61 | 0.01 |
| 6 | 1 | 3 | 0.99 | 100.00 | 20750.00 | 14.16 | 2.24 | 0.27 |
| 7 | 4 | 16 | 0.99 | 10.00 | 25000.00 | 20.12 | 0.12 | 0.00 |
| 8 | 4 | 4 | 0.00 | 100.00 | 16500.00 | 6.83 | 4.66 | 0.65 |
| 9 | 3 | 13 | 0.00 | 1000.00 | 19750.00 | 17.71 | 1.77 | 0.08 |
| 10 | 1 | 15 | 0.85 | 100.00 | 21250.00 | 14.27 | 2.20 | 0.26 |
| 11 | 3 | 13 | 0.50 | 10.00 | 18250.00 | 20.74 | 0.59 | 0.00 |
| 12 | 3 | 11 | 0.50 | 100.00 | 19750.00 | 19.69 | 0.59 | 0.00 |
| 13 | 3 | 1 | 0.99 | 1000.00 | 24250.00 | 19.99 | 0.10 | 0.00 |
| 14 | 3 | 5 | 0.99 | 1000.00 | 25250.00 | 20.69 | 0.10 | 0.00 |
| 15 | 3 | 16 | 0.00 | 10.00 | 16500.00 | 19.67 | 1.55 | 0.00 |
| 16 | 3 | 16 | 0.50 | 10.00 | 19000.00 | 20.46 | 0.53 | 0.00 |
| 17 | 3 | 15 | 0.50 | 100.00 | 20750.00 | 20.45 | 0.49 | 0.00 |
| 18 | 4 | 16 | 0.85 | 100.00 | 24500.00 | 19.96 | 0.19 | 0.00 |
| 19 | 2 | 8 | 0.00 | 10.00 | 13500.00 | 12.16 | 2.62 | 0.39 |
| 20 | 2 | 14 | 0.00 | 100.00 | 18000.00 | 18.45 | 1.70 | 0.06 |
| Best(5) | 3 | 15 | 0.00 | 10.00 | 16250.00 | 19.46 | 0.00 | 0.01 |
The search continues,
with OptQuest selecting different configurations, each leading
to different results. After 20 iterations (the number specified),
the best acceptable results are shown on the last line of Table
7. It can be seen that OptQuest was trying to trade off disk drives
and main memory (cache hit rates). Because main memory is more
expensive than disk drives (in this model), it finally selected
a configuration with 15 disk drives and the minimal amount of
main memory, along with three CPUs and the slowest network device.
The cost of the optimized system is $16,250.00, and the throughput
rate is 19.46 tps, while the most expensive system costs $29,000
and delivers a throughput of only 20.06 tps. The optimized
solution costs only 56% of the fully-loaded system price, but
delivers 97% of the performance.
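The selection on the last line of Table 7 and the two percentages above can be checked with a short script. This is a sketch that only filters the tabulated results; it says nothing about how OptQuest works internally. The tuples copy the Run, Cost, Tput and RejRate columns of Table 7, and the variable names are illustrative.

```python
# (run, cost, throughput in tps, rejection rate), copied from Table 7.
RUNS = [
    (1, 10750.0, 1.87, 0.91), (2, 29000.0, 20.06, 0.00),
    (3, 21750.0, 20.13, 0.00), (4, 16250.0, 11.06, 0.43),
    (5, 16250.0, 19.46, 0.01), (6, 20750.0, 14.16, 0.27),
    (7, 25000.0, 20.12, 0.00), (8, 16500.0, 6.83, 0.65),
    (9, 19750.0, 17.71, 0.08), (10, 21250.0, 14.27, 0.26),
    (11, 18250.0, 20.74, 0.00), (12, 19750.0, 19.69, 0.00),
    (13, 24250.0, 19.99, 0.00), (14, 25250.0, 20.69, 0.00),
    (15, 16500.0, 19.67, 0.00), (16, 19000.0, 20.46, 0.00),
    (17, 20750.0, 20.45, 0.00), (18, 24500.0, 19.96, 0.00),
    (19, 13500.0, 12.16, 0.39), (20, 18000.0, 18.45, 0.06),
]

MIN_TPUT = 15.0      # minimum processing rate (model #1)
MAX_REJECT = 0.05    # maximum transaction rejection rate

# Keep the runs that meet both requirements, then take the cheapest.
feasible = [r for r in RUNS if r[2] > MIN_TPUT and r[3] < MAX_REJECT]
best = min(feasible, key=lambda r: r[1])
full = RUNS[1]       # run 2, the fully loaded configuration

cost_ratio = best[1] / full[1]
tput_ratio = best[2] / full[2]
print(f"best run: {best[0]}, cost ${best[1]:,.2f}, {best[2]} tps")
print(f"cost ratio {cost_ratio:.0%}, throughput ratio {tput_ratio:.0%}")
```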
The second issue addressed with the
model was to try to find the fastest
configuration, where speed is measured
in terms of the average transaction
response time, with the requirements
that the configuration cost less than
$21,000.00 and the transaction rejection rate
be less than 5%. The sequence of configurations
selected by OptQuest to solve this
problem is shown in Table 8.
| Table 8 - Second Model: Sequence of Configurations and Results |
| Run | CPUs | Disks | Hit Rate | Net Rate (Mbits/sec) | Cost ($) | Tput (tps) | Resp Tm (sec) | RejRate |
| 1 | 1 | 1 | 0.00 | 10.00 | 10750.00 | 1.87 | 17.02 | 0.91 |
| 2 | 4 | 16 | 0.99 | 1000.00 | 29000.00 | 20.06 | 0.10 | 0.00 |
| 3 | 3 | 9 | 0.85 | 100.00 | 21750.00 | 20.13 | 0.19 | 0.00 |
| 4 | 1 | 7 | 0.00 | 1000.00 | 16250.00 | 11.06 | 2.87 | 0.43 |
| 5 | 3 | 15 | 0.00 | 10.00 | 16250.00 | 19.46 | 1.61 | 0.01 |
| 6 | 1 | 3 | 0.99 | 100.00 | 20750.00 | 14.16 | 2.24 | 0.27 |
| 7 | 4 | 16 | 0.99 | 10.00 | 25000.00 | 20.12 | 0.12 | 0.00 |
| 8 | 4 | 4 | 0.00 | 100.00 | 16500.00 | 6.83 | 4.66 | 0.65 |
| 9 | 3 | 13 | 0.00 | 1000.00 | 19750.00 | 17.71 | 1.77 | 0.08 |
| 10 | 1 | 15 | 0.85 | 100.00 | 21250.00 | 14.27 | 2.20 | 0.26 |
| 11 | 3 | 15 | 0.50 | 100.00 | 20750.00 | 19.63 | 0.50 | 0.00 |
| 12 | 4 | 16 | 0.85 | 100.00 | 24500.00 | 20.47 | 0.19 | 0.00 |
| 13 | 2 | 8 | 0.00 | 10.00 | 13500.00 | 12.39 | 2.55 | 0.36 |
| 14 | 3 | 16 | 0.50 | 10.00 | 19000.00 | 19.96 | 0.52 | 0.00 |
| 15 | 3 | 13 | 0.50 | 10.00 | 18250.00 | 20.21 | 0.55 | 0.00 |
| 16 | 3 | 11 | 0.50 | 100.00 | 19750.00 | 19.90 | 0.59 | 0.00 |
| 17 | 3 | 1 | 0.99 | 1000.00 | 24250.00 | 20.35 | 0.10 | 0.00 |
| 18 | 3 | 5 | 0.99 | 1000.00 | 25250.00 | 19.76 | 0.10 | 0.00 |
| 19 | 3 | 16 | 0.00 | 10.00 | 16500.00 | 19.66 | 1.52 | 0.00 |
| 20 | 3 | 14 | 0.00 | 100.00 | 18000.00 | 18.45 | 1.71 | 0.04 |
| Best(11) | 3 | 15 | 0.50 | 100.00 | 20750.00 | 0.00 | 0.50 | 0.00 |
In the second model,
OptQuest finds that the best solution after 20 iterations costs
$20,750.00 and has an average response time of 0.50 seconds and
a rejection rate of 0%. This configuration has three CPUs and 15
disk drives (as in the configuration found in the first model),
but it has more main memory (with a file cache
hit rate of 0.50).
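For the second model the roles of cost and performance are reversed: cost becomes a requirement and response time the quantity to minimize. The sketch below ranks the runs of Table 8 that meet both requirements; it copies the Run, Cost, Resp Tm and RejRate columns and uses illustrative names.

```python
# (run, cost, mean response time in sec, rejection rate), copied from Table 8.
RUNS_2 = [
    (1, 10750.0, 17.02, 0.91), (2, 29000.0, 0.10, 0.00),
    (3, 21750.0, 0.19, 0.00), (4, 16250.0, 2.87, 0.43),
    (5, 16250.0, 1.61, 0.01), (6, 20750.0, 2.24, 0.27),
    (7, 25000.0, 0.12, 0.00), (8, 16500.0, 4.66, 0.65),
    (9, 19750.0, 1.77, 0.08), (10, 21250.0, 2.20, 0.26),
    (11, 20750.0, 0.50, 0.00), (12, 24500.0, 0.19, 0.00),
    (13, 13500.0, 2.55, 0.36), (14, 19000.0, 0.52, 0.00),
    (15, 18250.0, 0.55, 0.00), (16, 19750.0, 0.59, 0.00),
    (17, 24250.0, 0.10, 0.00), (18, 25250.0, 0.10, 0.00),
    (19, 16500.0, 1.52, 0.00), (20, 18000.0, 1.71, 0.04),
]

MAX_COST = 21000.0   # maximum cost (model #2)
MAX_REJECT_2 = 0.05  # maximum transaction rejection rate

# Runs within budget and rejection limit, fastest first.
ranked = sorted(
    (r for r in RUNS_2 if r[1] < MAX_COST and r[3] < MAX_REJECT_2),
    key=lambda r: r[2],
)
for run, cost, resp, rej in ranked[:3]:
    print(f"run {run}: {resp:.2f} sec at ${cost:,.2f}")
```

The three fastest admissible runs are 11, 14 and 15, with response times of 0.50, 0.52 and 0.55 seconds, so the runner-up is only slightly slower at a cost of $19,000.00.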
The CPU execution times required to
get the results for the first model
are shown in Table 9. These were obtained
on a 300 MHz PC with 128 megabytes of
main memory. The total time for the
20 runs is 110.4 seconds of CPU time.
| Table 9: CPU Times (seconds) for 20 Runs Summarized in Table 7 |
| Run | CPU Time |
| 1 | 6.828 |
| 2 | 2.547 |
| 3 | 7.375 |
| 4 | 5.281 |
| 5 | 5.250 |
| 6 | 3.266 |
| 7 | 1.828 |
| 8 | 5.844 |
| 9 | 4.593 |
| 10 | 3.407 |
| 11 | 3.000 |
| 12 | 3.922 |
| 13 | 1.438 |
| 14 | 1.437 |
| 15 | 20.078 |
| 16 | 2.937 |
| 17 | 2.563 |
| 18 | 1.843 |
| 19 | 21.953 |
| 20 | 5.000 |
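The total quoted above can be confirmed by summing the entries of Table 9. The short sketch below also reports the mean and the slowest run; the variable names are illustrative.

```python
# CPU time in seconds for runs 1 to 20, copied from Table 9.
CPU_TIMES = [
    6.828, 2.547, 7.375, 5.281, 5.250, 3.266, 1.828, 5.844, 4.593, 3.407,
    3.000, 3.922, 1.438, 1.437, 20.078, 2.937, 2.563, 1.843, 21.953, 5.000,
]

total = sum(CPU_TIMES)
mean_time = total / len(CPU_TIMES)
slowest_run = CPU_TIMES.index(max(CPU_TIMES)) + 1  # runs are numbered from 1

print(f"total {total:.1f} sec, mean {mean_time:.2f} sec, slowest run {slowest_run}")
```

Runs 15 and 19 together account for about 42 of the 110.4 seconds.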
This application note has illustrated
the use of a new product, CSIM 19/OptQuest,
which can be used to find the best configuration
for a system with many configuration
options. Two selection problems were
analyzed:
1. Find the least
expensive system which meets a specified
performance requirement, and
2. Find the most powerful system which
costs less than a specified cost requirement.
This product is based on CSIM, a widely
used simulation package, and OptQuest, a well-known optimization
package from OptTek Systems
that has been tailored to work with CSIM for this product. The
results from the example show that the search procedure implemented
in OptQuest leads to a rapid selection of an acceptable and good
configuration.
For more information
on this product, contact Mesquite Software at info@mesquite.com.
CSIM is copyrighted by Microelectronics
and Computer Technology Corporation
(MCC). CSIM 19 is supported and marketed
by Mesquite Software under license from
MCC. Dr. Jeff Brumfield developed the
data collection and presentation functions,
including the run-length control algorithm,
in CSIM.
OptQuest is copyrighted by Optimization
Technologies, Inc.
[GKLa96] Glover,
F., Kelly, J. and M. Laguna, "New
Advances and Applications of Combining
Simulation and Optimization", Proceedings
of the 1996 Winter Simulation Conference,
ed. J. Charnes, D. Morrice, D. Brunner
and J. Swain, Dec. 8 - 11, 1996, pp. 144
- 152.
[Lagu99] Laguna,
H., OptQuest Callable Library for C
(Windows) Applications, Optimization
Technologies, Inc., Boulder, CO., 1999.
[Mesq97] User's Guide, CSIM 18 C++
Version, Mesquite Software, Inc., Austin,
TX, 1997.
[Schw96] Schwetman,
H., "CSIM 18 - the Simulation Engine",
Proceedings of the 1996 Winter Simulation
Conference, ed. J. Charnes, D. Morrice,
D. Brunner and J. Swain, Dec. 8 - 11,
1996, pp. 517 - 521.
[Schw97] Schwetman,
H., "Data Analysis and Automatic
Run-Length Control in CSIM 18",
Proceedings of the 1997 Winter Simulation
Conference, ed. S. Andradottir, K. Healy,
D. Withers, and B. Nelson, Dec. 7 -
10, 1997, pp. 687 - 692.