Thursday, December 3, 2020

Lecture M (2020-12-01): Final Exam Review

This lecture is a cumulative review of material from the semester and is meant to serve as a study guide for students preparing for the upcoming final exam. Topics range from modeling fundamentals (what is the purpose of a model in general?) to the specifics of designing statistical experiments with stochastic simulations.



Tuesday, November 24, 2020

Lecture L (2020-11-24): Course Wrap-Up

 In this wrap-up lecture, we finish the treatment of Variance Reduction Techniques (VRT's) for stochastic simulation. We cover (or review) Common Random Numbers (CRN's), Control Variates (CV's), Antithetic Variates (AV's), and Importance Sampling. The lecture ends with some brief big-picture comments about the lifelong learning process of simulation modeling.



Thursday, November 19, 2020

Lecture K2 (2020-11-19): Variance Reduction Techniques, Part 2 - AVs and Importance Sampling

In this lecture, we review different forms of Variance Reduction Techniques (VRT's) for stochastic simulation, which attempt to re-design simulation experiments to control for sources of variance and thus increase statistical power when making an estimate with a small number of replications. We start with common random numbers (CRN's) and Control Variates. We then pivot to discussing Antithetic Variates. The goal was to also cover Importance Sampling, but due to time constraints that topic will be saved for the next lecture period.
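
As a quick illustration of the antithetic-variates idea (not taken from the lecture itself), the sketch below estimates E[exp(U)] for U uniform on (0, 1) and compares a plain Monte Carlo estimate to an antithetic estimate built from complementary pairs (u, 1 - u); the integrand and all numbers are hypothetical choices for the example.

```python
# Minimal sketch (not from the lecture): antithetic variates for
# estimating E[exp(U)] with U ~ Uniform(0, 1). Because the integrand is
# monotone, the pair (u, 1 - u) is negatively correlated, reducing variance.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Plain Monte Carlo with n independent draws.
u = rng.random(n)
plain = np.exp(u)

# Antithetic variates: n/2 draws, each paired with its complement.
u_half = rng.random(n // 2)
antithetic = 0.5 * (np.exp(u_half) + np.exp(1.0 - u_half))

print("plain MC estimate:  ", plain.mean(), "+/-", plain.std(ddof=1) / np.sqrt(n))
print("antithetic estimate:", antithetic.mean(),
      "+/-", antithetic.std(ddof=1) / np.sqrt(n // 2))
```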



Tuesday, November 17, 2020

Lecture K1 (2020-11-17): Variance Reduction Techniques, Part 1 - CRN's and Control Variates

This lecture primarily finishes the coverage of estimation of relative performance by walking through the three different 2-sample mean tests (the paired-difference t-test, the pooled-variance t-test, and Welch's unpooled-variance t-test) and the assumptions required to use each. Confidence intervals for each of the mean differences are defined, requiring formulas for the standard error of the mean and the degrees of freedom in each of the three experimental cases. We also briefly discuss how to extend this to more than 2 systems (with ANOVA's and post-hoc tests), but due to time constraints we shift into an introduction of the related topic of Variance Reduction Techniques (VRT's). We introduce common random numbers (CRN's), but the full discussion of CRN's and Control Variates is saved until the next lecture.
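
For concreteness, here is a minimal sketch (with made-up data) of how the three 2-sample mean tests above might be run in Python with scipy.stats; the arrays sys_a and sys_b stand in for replication outputs from two hypothetical system configurations.

```python
# Minimal sketch (illustration only, hypothetical data): the three
# 2-sample mean tests applied to simulated outputs from two systems.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sys_a = rng.normal(10.0, 2.0, size=20)   # replications of system A
sys_b = rng.normal(11.0, 2.5, size=20)   # replications of system B

# Paired-difference t-test: valid when replications are paired,
# e.g., via common random numbers (CRN's).
print(stats.ttest_rel(sys_a, sys_b))

# Pooled-variance t-test: independent samples, equal variances assumed.
print(stats.ttest_ind(sys_a, sys_b, equal_var=True))

# Welch's unpooled-variance t-test: independent samples, unequal variances.
print(stats.ttest_ind(sys_a, sys_b, equal_var=False))
```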



Friday, November 13, 2020

Lecture J4 (2020-11-12): Estimation of Relative Performance

In this lecture, we move from estimation of absolute performance from simulation studies to estimation of relative performance. We start by connecting confidence intervals to linear regression, as an alternative application of one-sample confidence intervals. We review the use of one-sample confidence intervals for relative performance estimation, and then we pivot to discussing the visualization of 2-sample tests with confidence intervals.



Wednesday, November 11, 2020

Lecture J3 (2020-11-10): Estimation of Absolute Performance, Part 3

This lecture continues to discuss issues related to estimating absolute performance from transient and steady-state simulations (of terminating and non-terminating systems, respectively). We continue to emphasize the importance and utility of interval estimates (over point estimates). We then move on to discuss experimental methodologies useful for steady-state simulations, particularly those related to eliminating estimator bias and reducing computational time.



Thursday, November 5, 2020

Lecture J2 (2020-11-05): Estimation of Absolute Performance, Part 2

In this lecture, we continue our discussion of the use of performance estimation strategies for absolute performance (particularly in the case of transient simulation models of terminating systems). We review the sources of variation within and across replications in a simulation study, followed by a definition of common point estimators (for mean as well as quantile estimation), and then we define measures of estimator variance (e.g., standard error of the mean, SEM). That allows us to introduce interval estimation, particularly confidence interval estimation, which represents the interval of hypotheses that would fail to be rejected by a one-sample t-test. We discuss interval estimation in terms of quantile estimation as well. Finally, we conclude with a method of using constraints on interval half width to guide how many simulation replications are needed for a given simulation study.
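
A small sketch of the half-width idea, using hypothetical pilot replications: compute a t-based confidence interval, then estimate how many replications would be needed to shrink the half-width to a target h. The numbers, the target h, and the use of the normal critical value as a first approximation are all assumptions made for the example.

```python
# Minimal sketch (hypothetical numbers): a t-based confidence interval for
# a mean, and a rough rule for the number of replications needed to reach
# a target half-width h using a pilot estimate of the standard deviation.
import numpy as np
from scipy import stats

pilot = np.array([12.1, 9.8, 11.4, 10.9, 12.7, 10.2, 11.8, 10.5])  # pilot replications
n0 = len(pilot)
mean, s = pilot.mean(), pilot.std(ddof=1)
sem = s / np.sqrt(n0)                      # standard error of the mean

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n0 - 1)
half_width = t_crit * sem
print(f"95% CI: {mean:.2f} +/- {half_width:.2f}")

# Replications needed so the half-width is at most h (starting from the
# pilot s and, as a first approximation, the normal critical value).
h = 0.5
z = stats.norm.ppf(1 - alpha / 2)
n_needed = int(np.ceil((z * s / h) ** 2))
print("approximate replications needed:", n_needed)
```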



Tuesday, November 3, 2020

Lecture J1 (2020-11-03): Estimation of Absolute Performance, Part 1

In this lecture, we continue to discuss hypothesis testing -- introducing parametric, non-parametric, exact, and non-exact tests and reviewing the assumptions behind many popular parametric tests (like the t-test and ANOVA) and non-exact tests (like the Chi-square test). We then move to discuss the multiple-comparisons problem ("fishing") and the Bonferroni correction. We end with an introduction to the estimation of absolute performance in simulated systems.



Thursday, October 29, 2020

Lecture I (2020-10-29): Statistical Reflections

In this lecture, we review the basics of hypothesis testing (type-I error, type-II error, statistical power) and the fundamental processes underlying hypothesis testing that relate these quantities to one another. We then dig deeper into the assumptions necessary for using parametric tests, like the Student's t-test, and non-exact parametric tests, like the Chi-square test (e.g., what the "continuity assumption" is with regard to the Chi-square test and the related inference).


Wednesday, October 28, 2020

Lecture H (2020-10-27): Verification, Validation, and Calibration of Simulation Models

In this lecture, we revisit some basics of hypothesis testing and then go on to introduce verification, validation, and calibration in the context of simulation models. This will ultimately move us away from goodness-of-fit tests of input models toward hypothesis tests of output performance (e.g., to detect differences between different simulation scenarios and to confirm that simulations of real-world scenarios match our expectations from real-world data).



Thursday, October 22, 2020

Lecture G3 (2020-10-22): Input Modeling, Part 3

In this lecture, we continue our discussion of statistically rigorous methods for input modeling in simulation of stochastic systems. We first cover the basics of hypothesis testing, including a review of type-I error (alpha), p-values, and how they relate to critical values for goodness-of-fit tests (like Chi-squared and KS). We then review Q-Q and P-P probability plots to identify candidate families for input models from collected data. Then we discuss how maximum likelihood estimation (MLE) provides a bridge from summary statistics to mathematically justifiable choices for parameter values of the distributions we have chosen. Next time, we will discuss Chi-square testing and KS testing as applied to general probability distributions (i.e., not just as tests for uniformity).
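
As a hedged illustration of the MLE step (not the lecture's own example), the sketch below fits exponential and gamma candidates to made-up "service time" data with scipy.stats and compares the resulting log-likelihoods.

```python
# Minimal sketch (hypothetical data): maximum likelihood fits of candidate
# input models with scipy.stats, compared by log-likelihood.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
service_times = rng.gamma(shape=2.0, scale=1.5, size=200)  # stand-in for measured data

# MLE for an exponential model (location fixed at 0).
loc_e, scale_e = stats.expon.fit(service_times, floc=0)
ll_expon = np.sum(stats.expon.logpdf(service_times, loc=loc_e, scale=scale_e))

# MLE for a gamma model (location fixed at 0).
a, loc_g, scale_g = stats.gamma.fit(service_times, floc=0)
ll_gamma = np.sum(stats.gamma.logpdf(service_times, a, loc=loc_g, scale=scale_g))

print("exponential log-likelihood:", ll_expon)
print("gamma log-likelihood:      ", ll_gamma)
```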



Tuesday, October 20, 2020

Lecture G2 (2020-10-20): Input Modeling, Part 2

This is the second part in a unit on input modeling for simulating stochastic systems (stochastic simulation). In this part, we describe how to start making sense of data collected from real-world systems. We start with an example that builds a model of a single-server, single-channel queue based on summary statistics alone and demonstrate that the resulting model is a poor fit for a realistic system. We then use a histogram to reveal insights into how the system can be re-structured to be more realistic while also requiring simpler input models. This leads into a discussion of how to build histograms so that they are maximally insightful.



Tuesday, October 13, 2020

Lecture F2 (2020-10-13): Review Before Midterm Retake

In this lecture period, we discuss student-generated questions as a means of reviewing for the upcoming midterm retake. Most of the discussion centers around the solution set from the first midterm.



Thursday, October 8, 2020

Lecture G1 (2020-10-08): Input Modeling, Part 1

In this lecture, we re-motivate the topic of Input Modeling in stochastic simulation. Input modeling is the process of choosing probabilistic models that represent realistic variation, producing behavior that is statistically similar to measured data even though the probabilistic models leave out the real-world details underlying that variation. This lecture focuses primarily on issues related to collecting data (when to use old data, data censoring, correlated variables, multi-modal distributions, etc.).



Thursday, October 1, 2020

Lecture F (2020-10-01): Midterm Review

Midterm review lecture for Fall 2020. The review starts with an introduction to inverse-transform sampling for discrete random variables (including sampling from empirical CDF's).
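
A minimal sketch of discrete inverse-transform sampling, assuming a small hypothetical demand distribution: build the CDF and map each uniform random number to the smallest value whose cumulative probability is at least that number.

```python
# Minimal sketch (illustration only): inverse-transform sampling from a
# discrete distribution via its cumulative distribution function.
import numpy as np

values = np.array([0, 1, 2, 3])          # possible demand values (hypothetical)
probs = np.array([0.1, 0.4, 0.3, 0.2])   # their probabilities
cdf = np.cumsum(probs)

rng = np.random.default_rng(7)
u = rng.random(10)                         # U(0,1) random numbers
samples = values[np.searchsorted(cdf, u)]  # smallest value whose CDF >= u
print(samples)
```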



Tuesday, September 29, 2020

Lecture E2 (2020-09-29): Random-Variate Generation

In this lecture, we review random-number generation and tests of uniformity and independence. We then focus on random-variate generation (for stochastic simulation) using inverse-transform sampling.
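
As a small illustration (with an assumed rate parameter), the sketch below applies inverse-transform sampling to generate exponential variates from uniform random numbers using X = -ln(1 - U) / lambda.

```python
# Minimal sketch (illustration only): inverse-transform sampling for an
# exponential random variable with rate lam, using X = -ln(1 - U) / lam.
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0                          # rate parameter (hypothetical)
u = rng.random(100_000)            # U(0,1) random numbers
x = -np.log(1.0 - u) / lam         # exponential variates

print("sample mean:", x.mean(), "(theoretical mean:", 1.0 / lam, ")")
```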



Wednesday, September 23, 2020

Lecture E1 (2020-09-24): Random Number Generation

This lecture covers random number generation. The topic is motivated by the need to generate samples from arbitrary random variables, which can be accomplished by transforming random numbers uniformly distributed between 0 and 1. We describe the key properties of a good pseudo-random number generator (uniformity and independence), discuss some historical random number generators, and then introduce a more modern pseudo-random number generator. We close with descriptions of tests for uniformity (Chi-square and Kolmogorov-Smirnov) and independence (autocorrelation and runs tests).
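
To make the ideas concrete, here is a hedged sketch of a small linear congruential generator together with a Chi-square test of uniformity on its output; the LCG constants and bin count are illustrative choices, not a recommendation for any particular generator.

```python
# Minimal sketch (illustration only): a small linear congruential generator
# (LCG) and a Chi-square test of uniformity on its output. The constants
# below are illustrative, not the ones used by any particular library.
import numpy as np
from scipy import stats

def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Generate n pseudo-random numbers in [0, 1) with an LCG."""
    x = seed
    out = np.empty(n)
    for i in range(n):
        x = (a * x + c) % m
        out[i] = x / m
    return out

u = lcg(seed=12345, n=10_000)

# Chi-square test of uniformity: bin the samples and compare observed
# counts against the equal expected counts under U(0,1).
observed, _ = np.histogram(u, bins=10, range=(0.0, 1.0))
print(stats.chisquare(observed))  # equal expected frequencies by default
```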



Tuesday, September 22, 2020

Lecture D2 (2020-09-22): Probabilistic Models

In this lecture, we review the motivations for stochastic modeling in discrete event system simulation. We also review the basics of probability theory (specifically probability spaces, random variables, probability density functions, probability mass functions, cumulative distribution functions, and moments including expected value (first moment/mean) and variance). We then describe several popular continuous and discrete random variables used in input modeling for stochastic simulation.



Thursday, September 17, 2020

Lecture D1 (2020-09-17): Probability and Random Variables

In this lecture, we introduce basic concepts from probability theory that will be useful as we move toward input modeling for Discrete Event System simulation modeling. Our introduction starts with a brief acknowledgment of measure theory and then a definition of random variables, sample spaces, events, and probability measures. We cover the discrete random variable, the continuous random variable, and the related probability mass and probability density functions. We pivot to discuss cumulative distribution functions and several applications of moments (expected value, mean, variance, standard deviation, etc.). Throughout the lecture, we use the analogy of probability as a kind of weight of a set of mutually exclusive outcomes.

Tuesday, September 15, 2020

Lecture C2 (2020-09-15): Beyond DES Simulation – SDM, ABM, and NetLogo (plus post-Lab3 discussion)

In this lecture, we first discuss results from Lab 3 on Monte Carlo simulations. Then we transition to motivate other forms of simulation outside of Discrete Event System simulation, such as System Dynamics Modeling and Agent-Based Modeling. This allows for introducing NetLogo, which is the subject of the upcoming Lab 4.



Thursday, September 10, 2020

Lecture C1 (2020-09-10): Basic Simulation Tools and Techniques

In this lecture, we describe how simple computational tools such as spreadsheets can be used not only for Discrete Event System simulation but also for Monte Carlo simulation. This provides an opportunity to describe different classes of inventory problems (newsvendor, order-up-to, etc.) that we may see again in later lectures. Ultimately, we describe how, although spreadsheets can be very powerful, they are not practical for simulating complex systems that may need to be reconfigured during the analysis and design process. Thus, we mostly use tools like spreadsheets for analysis of data from simulations that are implemented within other tools and frameworks.
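
As an illustration of the kind of Monte Carlo calculation a spreadsheet model would perform row by row, here is a hedged Python sketch of a newsvendor problem; the demand model, order quantity, and per-paper economics are all made-up numbers.

```python
# Minimal sketch (hypothetical numbers): a Monte Carlo simulation of the
# newsvendor problem, analogous to a spreadsheet model with one row per day.
import numpy as np

rng = np.random.default_rng(5)
n_days = 10_000
order_qty = 120                           # papers ordered each day
cost, price, salvage = 0.33, 0.50, 0.05   # per-paper economics (illustrative)

demand = rng.poisson(lam=100, size=n_days)   # hypothetical daily demand model
sold = np.minimum(demand, order_qty)
leftover = order_qty - sold

profit = price * sold + salvage * leftover - cost * order_qty
print("estimated mean daily profit:", profit.mean())
print("standard error:             ", profit.std(ddof=1) / np.sqrt(n_days))
```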



Wednesday, September 9, 2020

Discrete-Event-System Simulation Modeling Terms: Guide and Example

This page serves as an archive of a resource posted on the course's official Learning Management System. It provides assistance in understanding the meaning of common terms used in Discrete Event System simulation modeling – resource, activity, entity, attribute, event, state, state variable, input, input modeling, and output/performance metric. It starts with a resource-centric perspective on the definitions of each, and then it pivots to a concrete example applying each term to a tour-bus modeling problem.


Basic Guide for Identifying Discrete-Event-System Model Components

When building a discrete-event-system simulation model, it is best to start by identifying the resources of the real-world system that is being modeled. Once the resources are identified, how all other components contribute becomes clearer. Try using this guide to identify each of the discrete-event-system vocabulary terms in a model. An EXAMPLE is given in the next section below.

RESOURCES: What are the limiting resources in the system that might cause delays when they are unavailable?

  • ACTIVITIES: How long will a resource be unavailable at one time? This duration of time (and the probability distribution that describes it) is an activity.
    • ENTITIES: The entities of the system are those that move from resource to resource and either wait for resources to become available (delay) or wait within a resource for completion of the activity associated with that resource when combined with that entity. Note that within a simulation model, resources are fixed parts of a process and entities move from resource to resource.
    • ATTRIBUTES: The expected length of an activity may depend only upon the resource type and the attributes of the entities within the resource during the activity. Some entities will require more processing time than others. Some entities will require special processing based on their individual history (e.g., some entities might receive prioritization based on the attribute of the time they entered the system).
      • The length of the activity does not depend upon the state of the system outside of that resource. This is why activities are sometimes called "unconditional waits" – the waiting time distribution can be known without "conditioning" on other parameters in the system.
      • In contrast, a delay is sometimes called a "conditional wait" because the delay depends upon what is going on elsewhere in the system. For example, in order to guess the time an entity is going to wait in a queue, it is important to know all of the details about every entity ahead of it in the queue. So waiting in a queue is a delay while being serviced by the server at the head of the queue is an activity because the average service time depends only upon the server (the resource) and the entity being served.

  • EVENTS: The instants that mark the beginning and ending of an activity (such as the instant a resource becomes busy and the instant the resource becomes available again) are called events. When we program discrete-event-system simulations, the programming logic that executes to evolve the system forward in time is always triggered by an event; the simulation jumps from one event to another, skipping all time in between.
    • In a queueing system, the end of service for one entity might also be the start of service to the entity waiting in a queue behind the first entity. We still consider the instant in between the two service durations as an event that separates one activity (the first entity's time being served) from the immediately following activity (the second entity's time being served). A resource that is never idle is still moving from one activity to another; it is not simply processing one large activity.
    • STATE: At each event, the system may change its state. In fact, state changes can only occur at events. The state is a quantified snapshot of the system at any particular time (e.g., "5 resources are being used and 30 customers are waiting for service").
      • STATE VARIABLE: Within the program of the simulation, we keep track of the system state using a state variable. A group of state variables together is sometimes called a state vector.
        • These state variables are discrete (or discrete-time) because they only change at countable instants of time (the events of the discrete event system).

INPUTS and INPUT MODELING: Note that the same simulation model can have very different outputs if the probability distributions associated with each activity change (e.g., if some activities are made to be slower or faster than others or even if the distribution of times changes its shape). For this reason, the probability distributions of the activities are called the inputs of discrete-event-system simulation models, and the choice of probability distribution to best match reality is known as input modeling.

OUTPUTS and PERFORMANCE METRICS: To determine the total amount of delay that an entity will experience during its lifetime while waiting for resources to become available, the system must be simulated. Whereas activities can be estimated from data about individual components of a system running in isolation, delays have to be studied experimentally by running a system with different strings of entities entering and leaving the system. So activities are inputs (i.e., determined by technical information known before the simulation) and delays are outputs.

Furthermore, to estimate how well a system is performing, statistics must be taken across a large number of entities and/or over an ensemble of simulation runs. During a simulation run, data may need to be aggregated over simulation time in order to calculate a performance metric (e.g., the average delay experienced by all entities during the simulation run). The variables in the program that aggregate this information over time may look superficially like state variables, but they are not state variables because they are not a snapshot at any instant of time; instead, they are a compressed record of data from all time before.

For example, to calculate the average time spent in the system for an entity type in a queueing-system simulation, we may need to add up the time spent in the system over all entities as they exit; that summation variable has to be stored in the program and updated each time an entity leaves the simulation, but it does not capture the state of the system and it will not be used to guide simulation logic. State variables can trigger simulation logic, whereas performance metrics only keep track of information that can be used later to assess how well the simulated system is performing.
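
To tie the terms together, here is a hedged sketch (not part of the original guide) of an event-scheduling simulation of a single-server queue: the event calendar drives the simulation from event to event, state variables record the snapshot of the system, and accumulator variables collect the delay data needed for a performance metric. The exponential interarrival and service models are assumptions made for the example.

```python
# Minimal sketch (illustration only): an event-scheduling simulation of a
# single-server queue, showing events, state variables, and a performance
# metric (average delay in queue). The exponential inputs are assumptions.
import heapq
import random

random.seed(0)
ARRIVAL_RATE, SERVICE_RATE = 1.0, 1.25   # input models (rates)
HORIZON = 10_000.0                       # simulated time to run

calendar = []                            # event calendar: (time, kind)
heapq.heappush(calendar, (random.expovariate(ARRIVAL_RATE), "arrival"))

queue = []                               # state: arrival times of waiting entities
server_busy = False                      # state variable
total_delay, n_served = 0.0, 0           # performance-metric accumulators

while calendar:
    now, kind = heapq.heappop(calendar)  # jump to the next event
    if now > HORIZON:
        break
    if kind == "arrival":
        # Schedule the next arrival (bootstrapping the arrival process).
        heapq.heappush(calendar, (now + random.expovariate(ARRIVAL_RATE), "arrival"))
        if server_busy:
            queue.append(now)            # entity waits (a delay begins)
        else:
            server_busy = True           # activity begins immediately
            heapq.heappush(calendar, (now + random.expovariate(SERVICE_RATE), "departure"))
            n_served += 1                # this entity's delay in queue is zero
    else:  # departure event
        if queue:
            arrived = queue.pop(0)
            total_delay += now - arrived # record this entity's delay in queue
            n_served += 1
            heapq.heappush(calendar, (now + random.expovariate(SERVICE_RATE), "departure"))
        else:
            server_busy = False

print("average delay in queue:", total_delay / n_served)
```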

SPECIAL NOTES: Real DES models are complex connections of multiple resources together, with entities potentially flowing back through resources that have previously been used. Furthermore, although activity durations are generally determined only by the resource type, entity type, and entity attributes and not by the state of the system, real-world operations may require introducing some state-dependence into the activity distributions. For example, each individual cashier may work more quickly when it is clear that many customers are waiting in a queue. The key point is that the activity duration is determined by statistically consistent properties local to the resource. The major difference between an activity and a delay is that an activity can be programmed ahead of time whereas a delay can only be measured after a simulation is run.


EXAMPLE: Discrete-Event-System Modeling of a Tour-Bus Operation

RESOURCES: The city touring company only has two tour buses. When customers arrive, they have to wait on a tour bus to be available. So the tour buses are the resources (and the customers are the entities).

  • ACTIVITIES: The two resources are used in slightly different ways. Bus 1 is used for 1-hour tours of a small section of the city (each bus-1 tour activity is roughly one hour long). Bus 2 is used for longer 2-hour tours and special events where customers can rent the tour buses for longer periods of time (bus-2 tour activities are 2 hours long for some customers or, in the case of a special set of customers, as long as the customer would like).
    • ENTITIES: The customers who arrive to take tours are the entities because they are the ones who potentially have to wait for the resources to become available, and they will move to use a resource (take a tour) and then move on to other things (leave the system). They move from resource to resource while the resources (the buses) stay within the system to become available for new entities later.
    • ATTRIBUTES: The desired tour type is an attribute of the customer (an entity). A customer might come for a 1-hour tour, a 2-hour tour, or be part of a special party. In the case of a special party, the numerical value of the tour duration is itself an attribute of the customer.

  • EVENTS: A tour (an activity) could potentially start when a customer arrives. Even if a tour has already started and no buses are available, the number of customers waiting will change when a new customer arrives. So customer arrivals are events. Likewise, the instants when tours end are also events because they mark the end of activities.
    • STATE: At a given instant of time, the state of the system might be that there are 5 customers waiting for 1-hour tours, 2 customers waiting for 2-hour tours, and a special party waiting on a 5-hour tour, all while both buses are currently busy providing other tours.
      • STATE VARIABLE: When we simulate the tour-bus system, we would have variables that keep track of the number of customers waiting for 1-hour tours, the number waiting for 2-hour tours, the number waiting on special tours, and the number and type of buses that are busy. Knowing these state variables would allow us to program the logic that must be triggered at each event.

INPUTS and INPUT MODELING: The inputs to the tour-bus system are the probabilistic models of the activity durations – the distribution of durations in between customer arrivals of each type, and the distribution of how long each tour type actually takes (because there will be variance around the average tour duration). Choosing these distributions so as to match a real tour-bus system is the process of input modeling.

OUTPUTS and PERFORMANCE METRICS: Each customer who arrives at the touring company has to wait in line for their preferred tour type, take the tour, and then exit the system. Whereas the length of the tour (once on the bus) is known ahead of time, the length of the wait before getting on the bus can only be known from simulating a system with a potentially long line of customers before the focal customer. So that customer's waiting time (or that customer's total time) can be viewed as an output of the system because it can only be known after simulating (whereas the tour duration activity is an input because it can be predicted without knowing how many customers are waiting in line). The operator of the touring company might care about the average (or maximum) waiting time across all customers. This average is a performance metric as it uses data accumulated over multiple customers within a single simulation run (where the customers arrive at random times and have a random preference of tour type). Calculating this performance metric may require introducing variables into the simulation programming that accumulate information from customers as they leave the system. Performance metrics can also be taken across ensembles of simulation runs where each run produces a different string of outputs because the activity durations are drawn randomly (although each activity duration comes from a consistent statistical distribution).

SPECIAL NOTES: In a real-world tour-bus system, some fraction of customers who leave one tour might immediately wish to wait for a new tour (e.g., after a 1-hour tour of one part of the city, a customer might want to start a 2-hour tour of another part of the city). So even a system as simple as a tour-bus operation might realistically be modeled as a network of multiple kinds of resources with different entities following different routes. The key performance metrics could be the average (or maximum) amount of time that any customer waits for a bus, or possibly the average (or maximum) amount of time that any customer spends in the system while both waiting for a bus and on a tour. These performance metrics cannot be calculated without simulating the delays incurred while waiting for resources to be available. That's the key difference between activities and delays – the duration of a tour can be estimated without running a simulation, but the time for one customer to wait on a tour bus to become available depends upon the customers that arrived before, and thus it requires executing the simulation. Activities are inputs, while delays are outputs. Performance metrics aggregate delay information across many customers and/or many simulation runs (where each simulation run has a different random set of delays as outputs).

Tuesday, September 8, 2020

Lecture B3 (2020-09-08): DES Examples II (and post-lab discussion for Lab 2)

In this lecture, we first spend some time reviewing the results of Lab 2 – a lab focused on the hand simulation of an inventory-management problem. That review leads to a discussion of statistical blocking and Common Random Numbers (CRNs) and how they relate to paired t-tests. Ultimately, we pivot back to talking about hand simulation in general and introduce how to use a spreadsheet to execute a discrete-event system simulation using relationships between entities (as opposed to manually executing state transitions at each simulated event).



Thursday, September 3, 2020

Lecture B2 (2020-09-03): DES Examples I

This lecture continues our introduction to Discrete Event System simulation (DESS). We cover examples of different ways to choose entities, resources, and activities for systems based upon the operations research question of interest. We then further describe the process of hand simulating a DES simulation model.



Tuesday, September 1, 2020

Lecture B1 (2020-09-01): Fundamentals of Discrete-Event Simulation

In this lecture, we continue the introduction to Discrete Event System (DES) simulation fundamentals. This includes revisiting the differences between entities, attributes, resources, state vectors, and output metrics. We discuss how "stochastic" modeling uses randomness to simplify building simulation models by substituting fine-grained deterministic details with random characterizations that have similar variability. With this in mind, we describe activities as the "inputs" to systems (and "input modeling" as a process of choosing the right probabilistic distributions for those inputs) and delays as the "outputs" of systems. Performance measures aggregate over the outputs of many simulation runs, which are required because each simulation run has a variable output. We close the lecture with an introduction to the event-scheduling world view, where simulations move from event to event, using logic at each event to schedule new events in the future on the "event calendar."



Thursday, August 27, 2020

Lecture A2 (2020-08-27): Introduction to Simulation Modeling

In this lecture, we introduce the three different kinds of simulation modeling (system dynamics modeling, agent-based modeling, and discrete event system simulation) and how they differ in the kinds of questions they help answer, the way they are programmed, and the computational resources that they require. We then introduce the fundamental concepts required for discrete event system modeling and start to discuss aspects of stochastic simulation and input modeling.



Tuesday, August 25, 2020

Lecture A1 (2020-08-25): Introduction to Modeling

This lecture provides an introduction to modeling and how simulation is used within industrial engineering and operations research to gain insights into complex socio-technological systems.


Thursday, August 20, 2020

Lecture 0 (2020-08-20): Introduction to the Course and Its Policies

 Introduction to the course and the course policies that will be used in the Fall 2020 semester.

