Sample size is super important in statistics. It is hard to make a generalization about behavior from one person. For example, I like horror books, but that doesn’t mean everyone does. And sometimes it is just too impractical to run experiments to collect data, like rolling a die 500 times. To get around this problem, statisticians and students create **simulation statistics**.

## Is Simulation Real?

Simulation statistics is the use of artificially generated data to test a hypothesis or a statistical method. Whenever a new statistical method is developed or applied, it rests on assumptions that need to be tested and confirmed, and statisticians use simulated data to do that.

There are several advantages to using simulated data. First, it is cheap, because it relies on randomly generated numbers rather than data that have to be collected. Second, it is much faster than traditional data collection, so tests can be run more quickly. Best of all, if the hypothesis or model is sound, the results of a simulation can approximate real-world results.

The best part of simulation statistics is also one of its disadvantages: it only approximates real-world results, so you should take those results with a grain of salt when you run the data.

## How Do I Do Simulation Statistics?

Although different statistical tests require slightly different methods for generating simulation statistics, all simulation models follow the same general steps. Let’s take rolling a single fair die as an example: how likely is it that I roll a six?

### 1. Define Outcomes

Before we even move the mouse on our data generator, we have to define the possible outcomes. In our example, a roll can produce any one of six outcomes: A = {1, 2, 3, 4, 5, 6}.
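In code, step 1 amounts to writing down the sample space. Here is a minimal Python sketch (Python is our choice for illustration; the steps themselves don't depend on any particular tool). The event "roll a 6" is stored as a subset of that space, and the names `outcomes` and `desired` are just illustrative.

```python
# Sample space for one roll of a fair six-sided die
outcomes = {1, 2, 3, 4, 5, 6}

# The event the question asks about: rolling a six
desired = {6}

print(sorted(outcomes))     # [1, 2, 3, 4, 5, 6]
print(desired <= outcomes)  # True: the event is a subset of the sample space
```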

### 2. Calculate the Probability of the Desired Outcome

In many cases in statistics, the probability of an outcome can be tricky to calculate, but in every case we need a probability for the desired outcome. Based on our question, the desired outcome is rolling a 6. Because each face of a fair die is equally likely, I calculate that probability by dividing the number of desired outcomes by the total number of possible outcomes.

P(6) = 1 ÷ 6 ≈ 0.167
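The same calculation can be done exactly with Python's standard `fractions` module, which keeps 1/6 as a fraction instead of a rounded decimal. In this small sketch the illustrative names `outcomes` and `desired` are redefined so the snippet runs on its own.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
desired = {6}

# Classical probability: favorable outcomes / total equally likely outcomes
p_theoretical = Fraction(len(desired), len(outcomes))

print(p_theoretical)                    # 1/6
print(round(float(p_theoretical), 3))   # 0.167
```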

### 3. Generate Random Numbers

Randomness is exactly what we want here: if the numbers are truly random, any pattern that does show up in the results sticks out. To simulate data, we generate random values within our parameters, which for a die means whole numbers from 1 to 6. You can create a dataset of any size, but for the sake of expediency let’s generate a measly 500 data points using a random number generator.
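Here is a sketch of step 3 using Python's built-in `random` module. The seed value 2024 is an arbitrary, hypothetical choice that only makes the run reproducible; use `random.Random()` with no seed to get different numbers each time.

```python
import random

N_ROLLS = 500  # sample size used in this example
SEED = 2024    # hypothetical seed, chosen only so runs are reproducible

rng = random.Random(SEED)
# Each roll is an integer drawn uniformly from 1 to 6 inclusive
rolls = [rng.randint(1, 6) for _ in range(N_ROLLS)]

print(len(rolls))   # 500
print(rolls[:10])   # first ten simulated rolls
```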

### 4. Choose a Value

Now choose the value you care about (here, 6), go through the random numbers, and record how many times it occurs.
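Counting the chosen value is short work with `collections.Counter`. This sketch regenerates 500 rolls with the same hypothetical seed so it runs standalone; the count it produces depends on the random numbers drawn, so it will generally differ from any other run's count.

```python
import random
from collections import Counter

rng = random.Random(2024)  # hypothetical seed for reproducibility
rolls = [rng.randint(1, 6) for _ in range(500)]

# Tally every face, then pull out the count for the chosen value (6)
counts = Counter(rolls)
sixes = counts[6]

print(sixes)
print(sum(counts.values()))  # 500: every roll is counted exactly once
```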

### 5. Analyze the Data for Patterns

At this point, you have your dataset and you have counted the desired outcomes, so you can calculate the empirical probability: the number of times the desired outcome occurred divided by the total number of trials.
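The empirical probability is just a count divided by the number of trials. The helper `empirical_probability` below is a hypothetical name for illustration, not a library function. The first print reproduces the arithmetic for 73 sixes in 500 rolls, and the rest applies the same formula to a freshly simulated sample.

```python
import random

def empirical_probability(rolls, value):
    """Share of simulated rolls equal to `value`: count / number of trials."""
    return sum(1 for r in rolls if r == value) / len(rolls)

# 73 sixes out of 500 rolls
print(73 / 500)  # 0.146

# The same formula applied to a fresh simulated sample
rng = random.Random(2024)  # hypothetical seed, for reproducibility only
rolls = [rng.randint(1, 6) for _ in range(500)]
p_hat = empirical_probability(rolls, 6)
print(round(p_hat, 3))
```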

In our simulated data, the number 6 occurs 73 times out of 500 rolls, so the empirical probability is

P(6) = 73 ÷ 500 = 0.146

This is reasonably close to our theoretical probability of 1/6 (about 0.167). The simulated result is consistent with the theoretical one: it supports, though on its own it cannot prove, the idea that you have a 1/6 chance of rolling a 6 with a fair die.
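To see the "grain of salt" in action, you can repeat the simulation. In this sketch, the function `estimate_p_six` and the seeds are hypothetical, chosen for illustration. Five runs of 500 rolls give somewhat different estimates, and larger samples tend to land closer to 1/6, as the law of large numbers predicts. The exact numbers depend on the seeds.

```python
import random

def estimate_p_six(n_rolls, seed):
    """Empirical probability of rolling a 6 in n_rolls simulated fair-die rolls."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls

# Same sample size, different (hypothetical) seeds: the estimate wobbles
small_runs = [estimate_p_six(500, seed) for seed in range(5)]
print([round(p, 3) for p in small_runs])

# Bigger samples tend to land closer to the theoretical 1/6 (about 0.167)
for n in (500, 5_000, 50_000):
    p_hat = estimate_p_six(n, seed=1)
    print(n, round(p_hat, 3), "gap:", round(abs(p_hat - 1 / 6), 3))
```

Printing the gap next to each sample size makes the trend easy to read, although any single seed can buck it.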

## The Takeaway

Simulation statistics is a quick, efficient, and cost-effective method of generating and analyzing data. Using simulated data has both advantages and disadvantages, the main drawback being that it can only approximate real results. Though there are different simulation techniques, the overall process is the same. I hope to see any questions you have below. Happy statistics!
