Introduction
A manufacturing or service system can be studied as a Markovian model because of random factors such as customer order time or amount, job processing time, and/or the reliability of machines or servers.
Buzacott and Shanthikumar (1993) proposed a systematic treatment of modeling stochastic manufacturing systems. For stochastic service systems, a typical example is a large call center.
Koole and Mandelbaum (2002) surveyed several stochastic models on telephone call centers. Both types of stochastic systems can be modeled as a waiting line or a queueing model of Markovian type. The main distinction between a manufacturing and a service system is usually the scale (or the size) of the system, which is measured by the number of servers. Manufacturing systems are typically involved with a small number of servers, whereas service systems, such as call centers, have a large number of servers. Such a difference entails different modeling approaches. In this paper, we briefly review the modeling hierarchy for analyzing stochastic manufacturing or service systems. We consider three levels of modeling–micro-, meso-, and macro-scopic models–and identify the advantages and disadvantages of each model type.
This paper is structured as follows: Section 2 defines the three levels of modeling and presents the related literature. Section 3 provides typical example of each modeling type in detail (for full details, we refer to the related literature). Section 4 concludes the paper.
Stochastic modeling hierarchy
The main approach to modeling a stochastic system is to formulate a Markovian process for the state variable of interest. For example, a production system in a make-to-order mode is a typical setting. Assuming that the production process for a product is triggered by a randomly arriving customer order, such a random arrival process is modeled as a Poisson process. If the production time of completing the ordered product is also random, the model generally has independent and identically distributed (i.i.d.) service random variables. Considering a production system with c facilities (machines), we can model the process of fulfilling these orders as a basic M/G/c queue. We generally call each service request as a “customer” and the service provider as a “server”. Numerous factors, such as customers with different priorities, servers subject to failures or absence, customer reneging or balking, and customer retrial or feedback, can definitely be considered. We intend to model the details of the system dynamics of this system type and treat the queue length or the waiting time as the main performance measure. Therefore, as long as the i.i.d. service times and/or Poisson process arrivals can be justified, we can obtain the stationary distribution of the queue length or waiting time by developing a Markovian process.
Considering that the system’s scale is small (only a few servers) and the details of the system dynamics are desired, we call this type of queueing system as a micro-scopic model. This type of models is mainly continuous-time Markov chains (CTMCs). The main approach is to generalize the exponential distribution to phase-type (PH) distribution to model the time interval and to generalize the Poisson process to Markov arrival process (MAP) to model the counting (or arrival) process. With these two generalizations, we can theoretically model the G/G/
c queue with any desired accuracy if the number of
c servers is not large (small-scale system) because a PH-distributed random variable can estimate any arbitrarily distributed non-negative variable (like service time) and MAP can approximate any arbitrary counting process (like customer arrivals) (
Neuts, 1981;
He, 2014). The cost of using this approximation approach is the increase of the state space of the CTMC. For example, the basic birth-and-death (BD) process for M/M/
c queue will be extended to quasi-birth-and-death (QBD) process for MAP/PH/
c queue. Although the dense property of PH distribution and MAP enables us to develop the CTMC for a G/G/
c queue with any desired accuracy, the “curse of dimensionality” or the explosion of the state space will make the model an NP-hard problem for a system with a large number of servers. However, given that most manufacturing systems are of small scales, the micro-scopic model still works well and produces detailed performance measure, such as the stationary distribution of the queue length for these stable systems.
As mentioned earlier, service systems such as call centers are usually of large scale with hundreds or thousands of servers. Clearly, the micro-scopic model does not work. The main issue in such a system is on how the system can be staffed to ensure that sufficient level of customer service can be achieved. One feature of this type of systems is that the traffic is heavy and most of the time servers are busy or even overloaded with customer abandonments (this condition is in sharp contrast to a manufacturing system receiving customer orders). Therefore, the adoption of heavy-traffic methods for the approximation of performance measures is an appropriate approach. Another characteristic of the service system is the time-varying arrival rate. If the system manager is only concerned with the average (first moment) performance measure such as mean queue length or mean waiting time, then the macro-scopic model with appropriate time and space scaling, also called fluid model, is appropriate. However, for a one-stage queueing system (like M/G/
c type), the fluid model does not capture the random variability of the system dynamics. For a stable system (the arrival rate is less than the service rate), the fluid limit becomes zero; for an unstable system (the arrival rate is greater than the service rate), the fluid limit becomes a linear function of time with slope of
, where
and
are the arrival rate and service rate, respectively. To capture the randomness of the queue length process or to stochastically refine the fluid model, the meso-scopic model needs to be considered by using the diffusion space scaling. This two-level modeling hierarchy was first proposed and discussed by
Chen and Mandelbaum (1994). Heavy-traffic conditions can be identified by taking the limit via space and time scaling.
Whitt (2002) presented different modes of determining the heavy-traffic limit.
The advantage of using macro- or meso-scopic models is that the approximation is increasingly improving when the system scale is becoming large and the traffic load is becoming heavy. These models will thus complement the micro-scopic models and are appropriate in analyzing large-scale service systems such as call centers or large queue networks.
Examples of micro-, meso-, and marco-scopic models
A micro-scopic model of a production system
As mentioned earlier, the micro-scopic model is appropriate for analyzing manufacturing systems with a small number of servers. We present a practical example in hot-rolling process for slabs in the steel industry. When a customer order arrives, it will trigger several slabs (order size) to proceed to the hot-rolling process. Given that order arrivals are random, the requests for the sets of slabs for the hot-rolling process (called jobs or customers) are also random and can be modeled as a Poisson process with arrival rate
. We only consider the first stage of this hot-rolling process which is “pre-heating”. These jobs triggered by customer orders will thus be pre-heated by a few pre-heating furnaces. Real data analysis (
Jia et al., 2017) reveals that pre-heating times follow a PH distribution (best-fit distribution). Owing to the limited nature of this operation, the process can be modeled as an M/PH/
c-type multi-server queue. The pre-heating system usually consists of only two to four furnaces (e.g., expensive and large facilities in Shanghai Baogang Steel Company). We can use the micro-scopic model to analyze this pre-heating operation and estimate the expected delay time of customer order and other related performance measures. The system can be formulated as QBD process if we define the state space as follows:
Define:
: The number of customers in the system at time t;
: The phase of the PH service time at time t for server i.
Assuming that the service time’s PH distribution is represented as with m phases, the system state space is where ; ; . We also assume that the PH representation of the service time is and with where S is a transition rate matrix among the m phases (for example, from State i to State j, the transition rate should be ) and e is the column vector of 1’s (all elements are 1) with appropriate dimension.
Figure 1 shows the state transition diagram of the QBD process for this operation, and Fig. 2 presents a specific example with three servers (furnaces). In these diagrams (without showing the service time phase), we use the first number in the state node to represent the number of customers in the system and “0” to represent all servers that are on (it can be set to 1 to represent a model with changeable staffing levels as in
Jia et al. (2017)). Each server node below each state node represents the service phase of each busy server.
We present more details about the QBD model for a three-server case.
The infinitesimal generator can be written as follows:
where
is the Kronecker multiplication operation. With this model, we can obtain the stationary queue length distribution as a matrix geometric solution (
Neuts, 1981). In this model,
, where
and
is the stationary probability vector for level
i. If we have
D levels of boundary states, then we can find stationary distribution as
,
, where
R is the rate matrix which is the minimal nonnegative solution to the equation quadratic matrix equation
, which can be numerically solved by using iteration algorithm (See
Neuts, 1981).
The boundary state probability vectors can be obtained by solving a finite number of equations and the normalization condition
. In this case, the rate matrix can be computed by using several alternative algorithms available (
Neuts, 1981;
Latouche and Ramaswami, 1999). Many important performance measures, such as expected queue length and the probability that the queue length exceeds a critical level, can be obtained with the queue length distribution. We can also handle both low-traffic and heavy-traffic situations. By contrast, the meso- or marco-scopic models may fail to produce useful performance measures for low-traffic conditions. However, the disadvantage is that the size of the state space is exponentially increasing with the number of servers. That is, for a
c-server system with PH-distributed service time of
m phases, the QBD process will have
mc states. Dealing with a large number of servers for micro-scopic models is thus like an NP-hard problem.
For a queueing model developed for the steel industry, we aim to determine the optimal number of servers. If we turn on an excessive number of reheating furnaces, then the productivity will be increased, but the company will also have the idle cost, which will increase the total cost of the hot rolling process. By contrast, when the number of reheating furnaces is excessively small, the production cost will be reduced, but the order waiting time will increase or the waiting cost will be high. Therefore, a tradeoff between the furnace idle cost and the order wait cost must be considered to determine the optimal number of furnaces. The micro-scopic model can help decision makers by providing the performance information of the system.
Jia et al. (2017) presented a sophisticated model based on the M/PH/
c in the steel industry.
Meso-scopic and macro-scopic models for a service system
Although the PH distribution and MAP turn several general stochastic systems into Markovian ones, the micro-scopic model can only treat small-scale systems such as a single-stage queue with only a few number of servers. For large-scale service systems, the micro-scopic model is impractical to use because of the “curse of dimensionality” mentioned in the previous section. A large-scale system usually implies a single-stage queue with a large number of servers such as call centers or a queueing network with many service stations of either a single server or many servers. Furthermore, customers can be either of single or multiple classes, and servers can be either homogeneous or heterogeneous. For these large-scale systems, we have to adopt the meso- or marco-scopic models. In this subsection, we provide two simple but typical examples.
The first example is to develop the meso-scopic model for a single-stage service process via reflected Brownian motion (RBM)-based approximations. In this model, the building block is the G/G/1 queue. After approximating the performance measure of G/G/1, we extend it to G/G/
s system. Let
be the number of customers that arrived from time 0 to
t, and
the number of customers the server would process from time 0 to
t if the server was busy during (0,
t). Given that the interarrival times and service times are i.i.d., the processes
and
are renewal processes. Therefore, for a large
t,
is estimated to be normally distributed with mean
and variance
, where
is the squared coefficient of variation for the interarrival time. In addition, for a large
t,
is normally distributed with mean
and variance
, where
is the squared coefficient of variation for the service time. Approximation (for some large constant T) of
and
by Gaussian processes, in particular, Brownian motions (BMs), is practical. This approach is a reasonable approximation and can be justified rigorously by appropriately adopting a scaling argument as done by
Chen and Yao (2001). A stochastic model based on BM or generally a diffusion process is called a meso-scopic model as it is an approximation under heavy-traffic condition but retains the random variation component. We first present the relevant performance measures for the queue in terms of
and
.
Let denote the number of customers in the G/G/1 queue at time t with , a given finite constant number of customers initially. To present in terms of and , the duration in which the server was busy and idle during (0 t) needs to be determined. Thus, let B(t) and I(t), respectively, denote the total time the server has been busy and idle from time 0 to t. We emphasize that the server is work conserving, which means that the server would be idle if and only if the system has no customers. This condition is equivalent to . Thus, we can present an expression for as . By using the centering, we can re-write the expression as where and . Computing the expected value and variance of U(t) for large t yields , because U(t) is approximated by a drifted BM via Donsker’s theorem. Although E[U(t)] is exact for any t, Var[U(t)] is reasonable only for a large t, that is, in the asymptotic case. and can be specifically determined by satisfying the following conditions:
Furthermore, they can be expressed in terms of U(t) as follows:
The proof of these relations can be found in the work of
Chen and Yao (2001). On the basis of the characteristics of this result,
is called the reflected process of
U(
t) and
V(
t) is the regulator of
U(
t). The
process is then a corresponding RBM. Using heavy-traffic approximations for queueing process, we focus on the workload process (which is inherently continuous) as an RBM. Considering that the RBM is a continuous function of the BM, according to continuous mapping theorem, we can conclude that
will approach the queue length process limit. By denoting
W(
t) as the workload at time
t,
Chen and Yao (2001) used the approximation to relate the workload in the system to the number in the system. Evidently, if
is an RBM with initial state
, drift
, and variance
, then
is also an RBM with initial state
, drift
, and variance
when
. For the G/G/1 queue described earlier, considering that we approximated the workload process
as an RBM, we have the expected workload in steady state as
, which is the approximation to the expected wait time of a customer. Other performance measures such as
Lq,
W, and
L, can then be obtained by using
,
and
. Such an approximation only needs the mean and variance of the interarrival time and service time instead of the entire distribution. This G/G/1 queue approximation can be extended to the multi-server case or G/G/
s queue. Under the first come first serve (FCFS) discipline in a stable system, the average total system time can be approximated by
where
. This approximation is appropriate for G/M/
s systems and can be proved using fluid and diffusion scaling (
Halfin and Whitt, 1981). In addition, an empirical approximation for G/G/
s queues (originally developed for M/G/
s queues) is
where
should be chosen such that
Hence, we briefly describe the diffusion approximation to a multi-server queue under heavy-traffic condition (extensive literature is available in this area;
Whitt, 2002). The main idea is to obtain the stochastic process limits as approximations of the actual stochastic processes. These limits are either deterministic processes called fluid limits or stochastic processes called diffusion limits that use appropriate scaling methods for time and space. Clearly, fluid limits correspond to macro-scopic models and diffusion limits correspond to meso-scopic models. We start with the fluid limit scaling. Let
A(
t) be the number of arrivals into a system during (0
t). The average arrival rate is
l, which is expressed as
based on the strong law of large numbers (SLLN).
To obtain the fluid limit of the discrete arrival process , we define
for any
n>0 and
t≥0. We show that as
,
, which we refer to as the fluid limit. This result can be seen as the functional strong law of large numbers (FSLLN), a generalization of the SLLN. Similarly, we define
as
for any
n>0 and
t≥0. We would like to study
as
, which we will call diffusion scaling. Such a scaling is similar to magnifying the fluid space scaling by a factor
after centering. In this way, some random fluctuations around the center can be captured.
converges to a BM with drift 0 and variance term
as
. Such a result is called the FCLT, also called Donsker’s theorem, a generalization of central limit theorem. Similar to the arrival process, the service time process when scaled in a similar manner also converges to a BM. Thus, approximate expressions for the distribution of waiting times or system times can be obtained using the diffusion approximation when the traffic intensity is close to one (heavy-traffic approximations).
The main goal in a diffusion approximation is to obtain the transient or steady-state distribution of a stochastic process
(this value is usually the number in the system process
but we will keep it more generic in this case). For a detailed description, see
Gautam (2012). The process is scaled by a factor “
n” across time and
across “space”. Thus, we define
The term
is the deterministic fluid model of the stochastic process
by fluid scaling. Usually,
or is a heuristic approximation for it, indicating that the fluid model is focused on the first moment value. Therefore, the fluid model is called a macro-scopic model. Assuming that the deterministic fluid limit
exists and can be computed, the meso-scopic model is mainly intended to study
, which captures the randomness feature of the process. The application of appropriate space scaling reveals that as
, the stochastic process
converges to a diffusion process. A diffusion process is a continuous-time stochastic process with almost certainly continuous sample paths and satisfies the Markov property. Examples of diffusion processes are BM, Ornstein-Uhlenbeck process, Brownian bridge process, and branching process. The diffusion process can be expressed as
. For detailed information, see
Whitt (2004). The key idea of diffusion approximation is to start by using the properties of
such as the stationary distribution of
. For large
n,
is approximately equal in distribution to
. We can thus estimate the distribution for
using
such that
.
We use an M/M/
s queue to illustrate the development of meso-scopic model first.
Whitt (2004) has presented general cases such as the G/G/
s or M/G/
s queue. For M/M/
s queues, the Markov property leads to diffusion processes. However, in the G/G/
s queue, although the marginal distribution at any time in steady state converges to Gaussian, the process itself may not be a diffusion process (because the Markov property would not be satisfied). For such an M/M/
s queue, let
X(
t) be the number of customers in the system at time
t. We are interested in applying diffusion scaling to the stochastic process
. Further, we define
for any
n>0 and
t≥0. As a heuristic approximation for the fluid model, we use the well-known steady-state system size
L for the M/M/
s queue
Given that , for large (nt), we use the approximation . We can thus study the diffusion scaled process as
by increasing
n and show that this process converges to a diffusion process under three different scalings for
n.
In the first scaling, we consider a sequence of M/M/
s queues where
m and s are held constant and only
l is increased to ensure that
r approaches 1. We can use the scale
to ensure that
n increases as
r increases. This condition is a special case of general scaling called by
Halfin and Whitt (1981) as the regime.
It can be shown that process converges to a diffusion process as n is scaled. This scaling can be used when the system has high traffic intensity but not a large number of servers.
The next scaling is to fix r but increase l and s. In this scaling, we consider a sequence of M/M/s queues where m and r are held constant but l and s are increased to ensure that s approaches . We use the scale n = s to ensure that n increases as s increases. Again, it can be shown that
process converges to a diffusion process as
n is scaled. This scaling would be more powerful if the scaled time is also available, that is, plotted
instead of
. This scaling could be used when the system has a large number of servers but not a high traffic intensity.
The third scaling is based on
Halfin and Whitt’s (1981) regime. We use the scale
to ensure that
n increases as
s increases and let
in such a way that
is held constant where
. Under this scaling,
process converges to a diffusion process. This scaling would be more powerful if we were to have scaled time as well, that is, plotted
instead of
. This scaling is practical to use when the system has both high traffic intensity and a large number of servers (which is typical in inbound call centers). Such a system is called quality and efficiency driven. Evidently, the first two scalings are special cases of this scaling. Clearly,
converges to zero because
would be
L.
Halfin and Whitt (1981) presented the probability behavior for
as
and
. If we define
and use the scale
, then
, where
q is a constant satisfying 0<
q<1. We provide an intuition for the first one and subsequently describe
q. For
, the process converges to an RBM with negative drift (because for
, the CTMC is a BD process with constant parameters
l and
sm, which is a random walk that converges to a BM upon scaling). However, for
, the process converges to an Ornstein-Uhlenbeck process (because for
, the CTMC is a BD process with parameters
l and
, which is a random walk that converges to an Ornstein-Uhlenbeck process upon scaling). Such a property also holds for G/M/
s queues. Therefore, the probability in which
can be obtained as
and
. Thus, under the scaling
,
where
j(
x) is the probability that a standard normal random variable is less than
x. Thus, with probability
q, an arriving customer in steady state will experience a delay. Notably,
q can be obtained with the other two scalings considered earlier. In those cases,
q would only be 0 or 1. In particular, if we fix
s and increase
l, then
q approaches 1 (that is, an arriving request with probability 1 will be delayed for service to begin). Such a case is said to be in efficiency driven (ED) regime. By contrast, if we fix
r but increase
l and
s, then
q approaches zero (that is, an arriving customer with probability 1 will find a free server). This case is said to be in quality driven regime. Aside from these probability measures, we can also obtain the expected queue length for either the stationary or transient system (
Halfin and Whitt, 1981). A major advantage of the meso-scopic model is that we can obtain major performance measures for large-scale service systems consisting hundreds of servers under heavy-traffic conditions. These systems are practical in the service sector.
Next, we discuss the macro-scopic models or fluid models. The fluid model itself is useful in studying the stability of networks and analyzing queues with time-varying arrivals. For a traditional single-stage multi-server queue, because of the first moment nature, the stable, critically loaded, and overloaded cases correspond to simple queue length results of zero, the same as the initial content, and linearly increase with time, respectively (
Whitt, 2002). Thus, to provide an interesting case, we consider a multi-server queue with a special feature, namely, overloaded with customer abandonment behavior. We use a Markovian M/M/
s/
r+
M queue as an example to demonstrate the benefit of using the macro-scopic model or fluid deterministic model, in which
r+
M represents a finite wait room
r-
s and the time to abandon is exponentially distributed with rate
. The analysis is based on
Whitt (2004), and we only present a few simple fluid model results. Overloaded queue is also called a queue under the ED regime, as mentioned earlier. Consider a sequence of M/M/
s/
r+
M queues indexed by
s, which is the number of servers and we would use to scale. In particular,
and
are the scaled arrival rate and system capacity, respectively. However, the service rate
m and abandonment rate
are not scaled. In addition, the traffic intensity
r is not scaled and remains fixed for the entire sequence of queues with
r>1.
Define
where
to ensure that asymptotically no arriving customers are rejected due to a full system (
Whitt, 2004). With this scaling, we can obtain the meso-scopic model by using the diffusion scaling as presented above. Let
be the number of customers in the system at time
t when
s servers are used. We define the diffusion term as follows:
where
is a deterministic marco-scopic model of
. We use a heuristic approximation for the deterministic quantity
, which is where
tends to linger around in the ED regime. In particular, we select
as an “equilibrium” point where the system growth rate equals the shrinkage rate. Thus, we have
as the solution to the flow balance equation
by estimating that
must be greater than
s (as
results in
for any
). We have the following:
where the last equality is achieved by using the expressions for
and
q. Thus, we represent the diffusion term as
Whitt (2004) showed that the stochastic process
as
converges to an Ornstein-Uhlenbeck diffusion process. In this case, we focus on the fluid model only. Let
be the expected steady-state number of customers waiting in queue. Under heavy traffic conditions, for large
t, we have
. Thus,
. We can also obtain the abandonment probability, which is denoted by
. The relation
shows that
. Such simple performance measures indicate the advantage of using the macro-scopic model. We only present a small sample of the fluid model results in this case. Detailed studies on fluid models can be found in
Whitt (2004;
2005;
2006). In theory, when the scaling conditions (i.e., heavy-traffic condition) are satisfied, the meso- and micro-scopic models must generate acceptable approximations. In practice, we can use simulations to verify the accuracy of using these approximations.
In contrast to these meso-scopic and micro-scopic models, the micro-scopic model cannot handle a system with a large number of servers and its performance measures can be obtained only by numerical approach.
The examples in this section demonstrate the strengths and weaknesses of each type of modeling in this stochastic modeling hierarchy. Different levels of models complement one another to better represent the stochastic manufacturing and service systems in practice.
Conclusions
We present different levels of modeling stochastic systems subject to congestions (queues). Micro-scopic models are practical in analyzing manufacturing systems with a small number of production facilities (servers). Using PH distribution and MAP to model the random service times and the customer arrivals makes continuous-time Markov chain flexible in modeling a real-world system. In theory, the PH distribution can approximate any non-negative continuous distribution and the MAP can approximate any non-renewal process. In practice, this condition implies that real data can be used to estimate the parameters of PH distribution and MAP to make the model fit the real system well. Another benefit of using the micro-scopic model is that we can obtain the stationary distribution of the system size. However, this level of modeling is not appropriate for analyzing large-scale systems, such as a call centers with hundreds of servers or a queue network with many nodes or service stations. For these large-scale service systems, we can rely on meso-scopic and macro-scopic models. For these two levels, we consider the scaled processes of system size or customer waiting time and obtain the performance measures based on FLLNs and FCLTs. The macro-scopic model is the deterministic fluid model. The meso-scopic model becomes the diffusion model by adding the stochastic refinements to the fluid model. With the meso-scopic model, we can obtain probability-based performance measures as the macro-scopic model only provides first moment-based performance measures. Given that these two higher levels of modeling are based on stochastic process limits, they can be used as powerful tools to study large-scale service systems. In addition, several closed-form formulas for major performance measures may be obtained. Finally, the meso- or macro-scopic models can be used to analyze not only stationary systems but also transient systems. In practice, the appropriate level of modeling should be selected depending on the situation.
The Author(s) 2017. Published by Higher Education Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0)