Hypothesis testing is literally my favorite part of conducting statistical analyses. Hypothesis testing refers to the process of generating a clear and testable question, collecting and analyzing appropriate data, and drawing an inference that answers your question. Of course, this process involves a few steps.
What is a hypothesis in the first place?
A hypothesis is a possible explanation for patterns that you may observe in nature or people. For example, I think that people drink more coffee in the morning than in the afternoon. With this statement, I am stating that I think there is a difference between drinking coffee in the morning and afternoon.
Usually, you generate a hypothesis from previous research, either your own or someone else’s. I say usually because some non-quantitative research methods use inductive reasoning and gather data to come up with a possible explanation. As I said earlier, statistics typically use data to verify a model rather than come up with one.
A good hypothesis will propose a relationship between two or more variables.
In my coffee example, I propose that the time of day affects the amount of coffee that people drink. Time of day is one variable and the number of coffees drunk is another. In my case, I think the time of day is the cause of the coffee drinking, so the time of day is the independent variable and coffee drinking is the dependent variable.
Once you have a clear hypothesis, you can design an experiment or data collection technique to look for the pattern you predict. My hypothesis predicts that people drink more coffee based on the time of day, so I should collect data on how much coffee people drink in the morning and how much they drink in the afternoon.
Once you have data, you can analyze it for patterns. If you see the pattern that you predicted with your hypothesis, then your hypothesis is supported, and you need to be able to state why. If your data doesn't show the pattern you predicted, then you state that and revise your hypothesis to reflect the pattern in the data.
The tricky part about hypothesis testing is determining whether the pattern or model that you find is not only different, but different enough to mean something. Enter the null hypothesis.
A null hypothesis (H0) is a hypothesis that predicts no difference or pattern between the variables. In our example, the null hypothesis would be that there is no relationship between the amount of coffee people drink and the time of day. If my hypothesis is not right, then the null hypothesis is right, and vice versa.
To determine if the statistical model that you come up with is different enough, we compare it to another sample or the population. In the case of the tired coffee drinkers, we would compare the average number of coffee drinks in the morning to the average number of coffee drinks in the afternoon.
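As a concrete illustration, here is a minimal sketch of that comparison in Python, using SciPy's independent-samples t-test. The cup counts are entirely hypothetical; they just stand in for the morning and afternoon measurements described above.

```python
# A hypothetical two-sample comparison, in the spirit of the coffee example.
# The data are made up for illustration.
from scipy import stats

morning = [3, 2, 4, 3, 5, 4, 3, 4, 2, 3]    # cups per person (hypothetical)
afternoon = [1, 2, 1, 0, 2, 1, 1, 2, 1, 0]  # cups per person (hypothetical)

# Compare the two sample means with an independent-samples t-test.
t_stat, p_value = stats.ttest_ind(morning, afternoon)

print(f"morning mean:   {sum(morning) / len(morning):.2f}")
print(f"afternoon mean: {sum(afternoon) / len(afternoon):.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With data like these, the morning mean sits well above the afternoon mean, and the t-test quantifies how likely a gap that size would be if the two groups really came from the same population.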
Let’s look at two examples. In this first example, the black line represents the number of people that drink coffee in the afternoon and the blue line represents the number of people that drink coffee in the morning.
In this example, the mean (highest point) of the morning drinkers overlaps with a major section of the afternoon drinkers. In this case, there is no significant difference between the average number of coffees drunk in the morning and those drunk in the afternoon, so you fail to reject the null hypothesis.
In this second example, the black line again represents the number of coffees drunk in the afternoon, and the red line represents the number of coffees drunk in the morning.
In this example, the average number of coffees drunk in the morning (the red line) only overlaps with the top 0.01% of the afternoon curve. Since it only overlaps a very small portion of the afternoon curve, we can confidently say that the number of coffees drunk in the morning is significantly different from the number drunk in the afternoon. This means that you reject the null hypothesis in favor of your hypothesis.
But how different is different enough?
Determining whether the sample means are different enough from each other or different from the population involves even more statistics. It involves determining how your sample (or samples) compares to the population in terms of confidence.
For the sake of convenience, we are going to discuss a normal distribution in terms of z-scores for a second.
In the image above, the area under the curve between -1.96 and 1.96 represents 95% of the sample data. In other words, any mean that falls in that region counts as "normal": there is a 95% chance of observing a value somewhere in that range.
Anything that falls into the top or bottom 2.5% statistically qualifies as significantly different. That is, any mean that falls in those regions would have at most a 5% chance of occurring if it were "normal." These regions are calculated based on confidence intervals, as discussed in another post.
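If you want to see where those ±1.96 cutoffs come from, SciPy's standard normal distribution can compute them directly. This is just a quick sketch, not part of the original example:

```python
# Find the z-scores that bound the middle 95% of a standard normal curve.
from scipy.stats import norm

lower = norm.ppf(0.025)  # cutoff below which the bottom 2.5% falls
upper = norm.ppf(0.975)  # cutoff above which the top 2.5% falls

print(f"95% of z-scores fall between {lower:.2f} and {upper:.2f}")
# prints "95% of z-scores fall between -1.96 and 1.96"
```

`norm.ppf` is the inverse of the cumulative distribution function, so feeding it 0.975 returns the z-score with 97.5% of the curve below it.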
Now, let’s take a step back to our coffee example.
In this example, the mean of the red line (the average number of coffees in the morning) falls above the top 2.5% mark of the black line (the average number of coffees in the afternoon). This means that the mean number of coffees drunk in the morning is significantly different from the mean number drunk in the afternoon.
Since it is significantly different, you reject the null hypothesis. Remember that the null hypothesis stated that there would be no significant difference.
Our alternative hypothesis, by contrast, stated that there would be a difference. Since the data suggest that there is one, you reject the null hypothesis and accept the alternative hypothesis.
Could my hypothesis be wrong?
Yes, you could be wrong. Your data could have been a bizarre fluke, or random errors could have occurred in your sample. One common error is that the sample you selected does not accurately represent the overall population. This is why we typically use a test statistic to determine whether the difference we found could plausibly be due to chance. Test statistics include things like the t-test and the F-test.
These test statistics produce a probability, or p-value, which represents the probability of observing a result at least as extreme as yours if the null hypothesis were true. The cutoff for significance, called the significance level, is usually set at 5% at most.
In everyday terms, this means that there is at most a 5% chance that a difference as large as the one in your sample would appear if there were really no difference at all.
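One way to build intuition for that statement is a small permutation simulation: assume the null hypothesis is true (so the morning/afternoon labels are meaningless), reshuffle the labels many times, and count how often chance alone reproduces a gap as large as the one observed. The cup counts below are hypothetical.

```python
# A permutation sketch of what a p-value means, using made-up coffee data.
import random

random.seed(42)

morning = [3, 2, 4, 3, 5, 4, 3, 4, 2, 3]    # hypothetical cups per person
afternoon = [1, 2, 1, 0, 2, 1, 1, 2, 1, 0]  # hypothetical cups per person

observed = sum(morning) / len(morning) - sum(afternoon) / len(afternoon)

# Under the null hypothesis, the labels don't matter: shuffle everyone
# together and see how often chance alone matches the observed gap.
pooled = morning + afternoon
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:10]) / 10 - sum(pooled[10:]) / 10
    if abs(diff) >= observed:
        extreme += 1

p_value = extreme / trials
print(f"permutation p-value: {p_value:.4f}")
```

A tiny p-value here says that almost no random relabeling produces a morning/afternoon gap as big as the one in the (hypothetical) data, which is exactly the evidence you need to reject the null hypothesis.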
I like to think of my errors as either false positives or false negatives. In statistics, these are called Type I and Type II errors, respectively. Both types of error are discussed in detail in another post.
Hypothesis testing is an exercise in logic. This logic compares not only the means of samples and populations but also their variance and standard error. Thankfully, statistics allows us to compare these properties in meaningful and relatively unbiased ways.