What is a **stochastic process**? Put simply, a stochastic process describes the movement of a random variable through time. The random variable could be the closing price of a stock, the financial status of a gambler playing roulette, the position of a gas particle moving through a fluid, or the sum of a series of dice rolls. With each of these processes, even if we know the value of our random variable **now** (i.e., the Dow Jones closed yesterday at 23,857.71; the gambler playing roulette is currently up $1,000; the sum of two rolls of a die is 9), we cannot know for certain what value these variables will take at some future time *t*, since each of the processes described involves some degree of randomness. Indeed, the word “stochastic” comes from the Greek *stokhastikos*, which means “to aim at; guess.”

With a stochastic process, we don’t know exactly where our variable will be tomorrow, or the next day, or the next week. But if we know the position of an object today, and we know something of the possible moves it can make, and the probabilities for each possible move, we can draw some conclusions about the process, and determine which future positions are most probable.

### Simple stochastic process: the random walk

Perhaps the simplest stochastic process is the **random walk**. Let’s imagine the following scenario.

Bill and Amy are bored and decide to gamble, because, well, why not? They devise a game in which they flip a coin, and if the coin lands heads, Amy gives Bill a dollar. If the coin lands tails, however, Bill gives Amy a dollar. They flip the coin 100 times, and the coin lands in the following order:

HTTTTTTTHHHHHTTHHHH…. And so on.

Let’s now imagine we’re Bill, and we’re considering our profit or loss after a given number of flips of the coin. From the above sequence of flips, our profit and loss would vary over time, with each flip of the coin changing our financial status. We can visualize this with the following table:

Flip Number | Outcome | Profit/Loss |
---|---|---|
1 | H | +1 |
2 | T | 0 |
3 | T | -1 |
4 | T | -2 |
5 | T | -3 |
6 | T | -4 |
7 | T | -5 |
8 | T | -6 |
9 | H | -5 |
10 | H | -4 |
11 | H | -3 |
12 | H | -2 |
13 | H | -1 |
14 | T | -2 |
15 | T | -3 |
16 | H | -2 |
17 | H | -1 |
18 | H | 0 |
19 | H | +1 |

The series of values above constitutes a **random walk along the integer number line**. This is because the win or loss on any given coin flip is one dollar, so Bill’s profit/loss will always be an integer value. (For more on random walks, check out this article of mine that focuses exclusively on the topic.)
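The profit/loss column above can be reproduced in a few lines of Python (a minimal sketch: heads adds a dollar to Bill’s running total, tails subtracts one):

```python
# Bill's profit/loss: +1 for each heads, -1 for each tails, accumulated flip by flip.
flips = "HTTTTTTTHHHHHTTHHHH"  # the 19 flips tabulated above

profit = 0
history = []
for f in flips:
    profit += 1 if f == "H" else -1
    history.append(profit)

print(history)
# [1, 0, -1, -2, -3, -4, -5, -6, -5, -4, -3, -2, -1, -2, -3, -2, -1, 0, 1]
```

The running total in `history` is exactly the walk along the integer number line described above.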

Bill’s profit or loss is a **stochastic process** governed by the value that a random variable takes (heads or tails) over time. Now, perhaps intuitively, you might think to yourself: “Why bother playing this game? It’s 50/50, and, on average, no one is going to win or lose any money.” You’d certainly be correct in thinking this; the **expected value** of this random walk (that is, our average profit or loss after playing the game for an infinitely long time) is indeed zero. Of course, that doesn’t stop people from gambling—even when they’re up against a casino that has an edge. Why? Well, because in reality we can’t and don’t play the game for an infinite amount of time. We play for a finite amount of time, and within that finite time frame, you can and often will win or lose money. To illustrate this, let’s look at a plot of the stochastic process that is Bill’s wallet.
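The zero expected value can also be checked empirically (a sketch using only Python’s standard library; the seed, number of games, and trial count here are arbitrary choices):

```python
import random

def play(n_flips, rng):
    """Bill's final profit after n_flips fair coin flips (+$1 heads, -$1 tails)."""
    return sum(rng.choice((1, -1)) for _ in range(n_flips))

rng = random.Random(42)  # fixed seed so the run is reproducible
finals = [play(100, rng) for _ in range(10_000)]

mean = sum(finals) / len(finals)
print(f"mean final profit over 10,000 games: {mean:+.2f}")  # lands close to 0
```

Individual games wander well away from zero, but averaged over many games the result hovers near the expected value.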

### Plotting a stochastic process

Here is a plot of Bill’s profit/loss after 100 coin flips. (The outcomes of the flips were randomly generated using Microsoft Excel.)

Note that the greatest profit Bill ever has is $7, and the biggest loss he ever incurs along this interval is -$6. Now let’s look at what happens if Bill and Amy play the game for 500 flips of the coin.

Now Bill’s greatest loss occurs early, when he goes down $11. He later hits his maximum profit of $29 somewhere around flip number 200. The important thing to note is that as we expand our time frame, we see greater deviation from the long-term expected value of zero profit.

For good measure, let’s finish by looking at a 1000 coin flip simulation.

Now, in this particular game, it didn’t go so well for Bill. He went down early, and never recovered, reaching a maximum loss of almost $55 towards the end of the 1000 coin flips. Once again, as we increased the time frame, we saw even greater deviations from the expected value.
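This growth in deviation can be seen in a single simulated path (a sketch: any fixed seed gives one particular path, so the exact dollar figures below will differ from the Excel plots):

```python
import random

# One sample path of 1000 flips; track the largest deviation from zero
# seen so far and report it at a few checkpoints.
rng = random.Random(0)
profit = 0
max_dev = 0
max_dev_at = {}

for flip in range(1, 1001):
    profit += rng.choice((1, -1))
    max_dev = max(max_dev, abs(profit))
    if flip in (100, 500, 1000):
        max_dev_at[flip] = max_dev
        print(f"after {flip:4d} flips: largest deviation so far = ${max_dev}")
```

The reported deviation can only grow as the time frame expands, mirroring what we saw across the 100-, 500-, and 1000-flip plots.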

### Stochastic processes: Boundary values

As we’ve now seen, stochastic processes involve randomness, and thus, given where a system is today, we cannot know for certain where it will end up at some future time. However, we can know some basic parameters that help define where the system *might* be at some future time, *t*.

The first parameter is called a **boundary value**. Given a finite amount of time, a boundary value is some threshold number beyond which the stochastic process *cannot* go. For example, let’s say Bill and Amy flip the coin five times. The boundary values are then +5 and -5; with only 5 flips of the coin, it is impossible that Bill could have won more than 5 dollars, or lost more than 5 dollars. Hence, +5 and -5 define the boundaries of the random walk after 5 flips.
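A quick simulation confirms the boundary values (a sketch; the seed and trial counts are arbitrary):

```python
import random

# Sanity check: each flip moves the profit by exactly $1, so after n flips
# the profit can never lie outside [-n, +n].
rng = random.Random(7)

def final_profit(n, rng):
    return sum(rng.choice((1, -1)) for _ in range(n))

results = {n: [final_profit(n, rng) for _ in range(1000)] for n in (5, 100)}
assert all(-n <= p <= n for n, ps in results.items() for p in ps)
print("no simulated walk ever crossed its boundary values")
```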

Now, for 100 flips, the boundary values will be +100 and -100. However, it’s extremely unlikely that the random walk will get anywhere near those boundary values, since reaching +100 would require the coin to land heads on all 100 flips (and reaching -100 would require all tails), each an event with probability (½)^{100}.

Thus, with large values of *n*, the boundary values don’t really give us a good indication of how far our random walk might stray from the expected value of zero. For that, we need to look at root-mean-square distance.

### Simple random walk: root-mean-square distance

For the symmetrical random walk we’ve been describing (it’s symmetrical because the probabilities of gaining or losing a dollar are equal), we can rather easily calculate approximately how far our walk will be from zero after a given number of coin flips, *n*. We simply take the square root of *n* to find what is called the **root-mean-square distance**.

For example, after 25 flips of the coin, we would expect to be about √25 or $5 away from zero (in either direction). Therefore, after 25 flips, we would expect either Bill or Amy to have won approximately $5. Of course, this won’t always be the case, but on average it’s what we would expect. After 100 flips, we would expect someone to be up $10, and so on.

To illustrate this, here is another simulation of 100 coin flips, along with a plot of y = √(*n*).

Notice how the random walk tends to stay within the bounds of √(*n*). It is this tendency that makes the root-mean-square distance a more useful metric than boundary value for a random walk such as this, especially as *n* gets large.
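The √(*n*) relationship can be checked empirically (a sketch, not a derivation: we average the squared final position over many simulated games and compare its square root against √(*n*); the seed and trial count are arbitrary):

```python
import math
import random

# Empirical check: the root-mean-square of the final position after
# n fair +/-$1 flips should come out close to sqrt(n).
rng = random.Random(1)

def final_profit(n, rng):
    return sum(rng.choice((1, -1)) for _ in range(n))

n, trials = 100, 20_000
mean_square = sum(final_profit(n, rng) ** 2 for _ in range(trials)) / trials
rms = math.sqrt(mean_square)
print(f"empirical RMS after {n} flips: {rms:.2f}  (sqrt(n) = {math.sqrt(n):.0f})")
```

With enough trials the empirical figure lands very close to √100 = 10, matching the $10 estimate given above for 100 flips.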

While we won’t show it here, check out this page from MIT if you want to see the derivation of root-mean-square distance.

### Stochastic Processes: Conclusion

A stochastic process describes the values a random variable takes through time. Many real-world phenomena, such as stock price movements, are stochastic processes and can be modelled as such. As we have seen, the simplest stochastic process is a symmetric random walk.

Still have questions on stochastic processes? Check out our statistics blog and videos!
