To begin, a couple GMAT CR questions, variations on a theme.
1) Colfax Beta-80 is a rare genetic defect found primarily in people of Scandinavian descent. Over 97% of known carriers of this defect are citizens of, or are direct descendants of immigrants from, Denmark, Norway, and Sweden. People who carry the Colfax Beta-80 defect are at substantially higher risk for contracting Lupus and related autoimmune diseases.
Assuming the statements above are true, which of the following can be inferred from them?
(A) People from Denmark are at a higher risk for Lupus than people of other, non-Scandinavian countries.
(B) Genetic engineering that eradicated this genetic defect would constitute a de facto cure for Lupus.
(C) Finding a cure for Lupus would eliminate most of the health threats associated with the Colfax Beta-80 defect.
(D) A person not of Scandinavian descent born with the Colfax Beta-80 defect is more likely to contract Lupus than is a Scandinavian who is born without this defect.
(E) The majority of people who contract Lupus are either Scandinavian or of Scandinavian descent.
2) In social science research, “highest education level attained” would refer to the most advanced grade or degree achieved by an individual — for some individuals, it may be a grade in grade school, and for other individuals, it may be a Bachelor’s Degree, a Master’s Degree, or Ph.D. (which is considered the highest education level). A recent study has shown a strong correlation between highest education level attained and proficiency in chess. Another result, studied at many points throughout the 20th century, shows a marked positive correlation between highest education level attained and income level.
Assuming the statements above are true, what conclusion can be drawn from them?
(A) If one practices chess enough to raise one’s proficiency, one has a good chance of raising one’s income level.
(B) It is possible that a person who has attained only a sixth grade level of education could earn more than a person who has a Ph. D.
(C) If Jane has a Ph. D., and Chris has not finished his undergraduate degree, then Jane will usually beat Chris in chess.
(D) The average salary for people who have completed three-year Master’s Programs is higher than the average salary of people who have completed two-year Master’s Programs.
(E) An individual’s proficiency at chess rises consistently during that individual’s years of school, and levels off once that individual has finished her years of formal education.
Reasoning with populations
Folks who have not studied statistics tend to fall into some predictable mistakes when thinking about issues involving correlation. After all, correlation is a very sophisticated idea. Most educated people have heard this word and have a vague idea (e.g., if A goes up, B goes up), but this vague popular understanding lends itself to some obvious misunderstandings that the GMAT loves to exploit.
Mistake #1: Correlation & Causality
Any veteran of Statistics has probably heard the mantra: Correlation does not imply causality. This is tricky, because of course, the converse is true: causality does, in fact, imply correlation. If A reliably causes B, then whenever you find A, you will be likely to find B. For example, smoking causes a large collection of undesirable conditions, including lung cancer, emphysema, and heart disease, and sure enough, it is highly correlated with each of these.
The catch, though, is that two things, A and B, can be correlated even though A does not cause B. For example, A & B would be highly correlated if they were both responses to the same underlying cause: beer sales and ice cream sales are highly correlated, not because folks like having beer a la mode, but because another cause, hot weather, drives both. There are other more complicated relationships we will not explore here in which A & B would tend to show up together (that is, they would be correlated), but none of these would be a relationship in which one is causing the other.
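To make the common-cause idea concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the temperature range, the sales formulas, and the noise levels are assumptions, not real data. In this toy model, temperature drives both beer sales and ice cream sales, and neither product has any effect on the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_sales(days=1000, seed=0):
    """Toy model (all numbers hypothetical): temperature drives both sales figures."""
    rng = random.Random(seed)
    temps = [rng.uniform(0, 35) for _ in range(days)]
    # Each product depends on temperature plus its own independent noise.
    # Neither formula mentions the other product.
    beer = [50 + 3 * t + rng.gauss(0, 10) for t in temps]
    ice_cream = [20 + 4 * t + rng.gauss(0, 10) for t in temps]
    return temps, beer, ice_cream


temps, beer, ice_cream = simulate_sales()

# Across all days, the two sales figures move together.
r_all = pearson(beer, ice_cream)

# Hold the common cause roughly fixed: look only at days with similar weather.
similar_days = [(b, i) for t, b, i in zip(temps, beer, ice_cream) if 20 <= t < 22]
r_similar = pearson([b for b, _ in similar_days], [i for _, i in similar_days])

print(f"all days: r = {r_all:.2f}")
print(f"days between 20 and 22 degrees: r = {r_similar:.2f}")
```

Across all days the correlation should come out strong, while among days with nearly the same temperature it should mostly vanish. That is the signature of a common cause: once hot weather is held fixed, beer sales tell you almost nothing about ice cream sales.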
Another way to say this is: correlation is relatively easy to demonstrate. All you need is broad sociological or epidemiological data, and you can show correlation. Anyone with a data set and statistical software can demonstrate correlation. By contrast, demonstrating causality is often a major scientific achievement, sometimes worthy of a Nobel Prize. To demonstrate that A causes B, one would need to show dozens and dozens of conditions are met, only the most elementary of which is that A is correlated with B.
In any question about correlation, the GMAT loves incorrect answers that blur the distinction between correlation and causality.
Mistake #2: The Problem of Scope
A correlation is something that exists across a whole population. In the natural sciences, and especially in the physical sciences, one can get extremely tight correlations, such that all the data points lie exactly on a straight line. In that case, the correlation is true not only at the population level, but also at the level of individual points: if one point is higher in A, that point must be higher in B.
When the GMAT talks about correlation, mostly this will not be in the context of the natural sciences. Instead, it will be in the context of the social sciences. Human populations are messy. There’s always a ton of random fluctuation involved in anything you measure about people, and this makes the social sciences considerably less precise than the natural sciences. A correlation in the social sciences is something that’s true in a population-wide view, but when the scope shifts to individual-to-individual comparisons, the statistical noise is too great to discern any reliable pattern.
For example, one well-measured social science study demonstrated the correlation of income and height. If one steps back and looks at the whole population, one can discern a mild relationship: on average, taller people are slightly more likely to have higher salaries than are shorter people. At the level of whole populations, at the level of probabilities, this relationship holds. Now, switch to the individual level. It’s sheer nonsense to say that, if Alex is taller than Bert, then Alex must be richer than Bert. It’s trivially easy to find single examples of poor tall people and rich short people. The correlation is something that is true in the population-view, but at the level of individuals, it’s virtually meaningless, except as a very weak probability statement. Folks not familiar with statistics forget this, and get almost “fundamentalist” in their interpretation of correlation, as if the fact that A is correlated with B means that in every single instance that A goes up, it absolutely must be true that B goes up. The GMAT loves to prey on this kind of misconception.
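Here is a rough sketch of that population-versus-individual gap. The numbers are invented for illustration and are not taken from any real study: heights, the small income bump per centimeter, and the large random noise are all assumptions of this toy model (which can even produce a few negative incomes).

```python
import random


def simulate_people(n=20000, seed=1):
    """Toy population (hypothetical numbers): income depends weakly on height."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        height = rng.gauss(170, 10)  # centimeters
        # A small height effect buried under a lot of individual variation.
        income = 40000 + 300 * (height - 170) + rng.gauss(0, 20000)
        people.append((height, income))
    return people


people = simulate_people()

# Population view: compare average income of the taller and shorter halves.
tall = [income for height, income in people if height >= 170]
short = [income for height, income in people if height < 170]
mean_tall = sum(tall) / len(tall)
mean_short = sum(short) / len(short)

# Individual view: pick two random people and ask whether the taller one is richer.
rng = random.Random(2)
pairs = 10000
taller_is_richer = 0
for _ in range(pairs):
    (h1, i1), (h2, i2) = rng.sample(people, 2)
    taller_income, shorter_income = (i1, i2) if h1 > h2 else (i2, i1)
    if taller_income > shorter_income:
        taller_is_richer += 1
share = taller_is_richer / pairs

print(f"mean income, taller half:  {mean_tall:.0f}")
print(f"mean income, shorter half: {mean_short:.0f}")
print(f"share of random pairs where the taller person is richer: {share:.1%}")
```

In this model the taller half earns more on average, so the population-level pattern is real. In any single Alex-versus-Bert comparison, though, the taller person should turn out richer only a little more often than a coin flip would predict.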
If reading this post gave you any insights into the nature of correlation, you might give those questions at the top another look before reading the explanations below. If you would like to share any insights or ask a question, let us know in the comment section at the bottom!
Solutions to the Practice Questions
1) The prompt tells us that Colfax Beta-80 is a genetic defect. Most of the folks who have this defect are Scandinavian, but we don’t know what percent of Scandinavians have this defect. It may be a substantial portion, but that’s unlikely because the defect is “rare.” Much more likely: there may be only a couple hundred people in the whole world who have this defect, and 97% of this couple hundred are from Scandinavia. That is a large percentage among those with the defect, but not a large percentage among the Scandinavian population as a whole. A couple answer choices conflate these two percentages.
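To see how far apart those two percentages can be, here is a quick back-of-the-envelope sketch. Only the 97% figure comes from the prompt. The carrier count of 200 and the population of 21 million are assumptions chosen purely for illustration (the population figure is a rough estimate for Denmark, Norway, and Sweden combined, and ignores descendants living elsewhere).

```python
# Hypothetical figure: the prompt says only that the defect is "rare".
carriers_worldwide = 200
# From the prompt: over 97% of known carriers are Scandinavian.
share_of_carriers_scandinavian = 0.97
# Rough assumption for the combined population of Denmark, Norway, and Sweden.
scandinavian_population = 21_000_000

scandinavian_carriers = carriers_worldwide * share_of_carriers_scandinavian

# Percentage #1: of the people with the defect, how many are Scandinavian?
p_scandinavian_given_carrier = scandinavian_carriers / carriers_worldwide

# Percentage #2: of Scandinavians, how many have the defect?
p_carrier_given_scandinavian = scandinavian_carriers / scandinavian_population

print(f"Scandinavian, given carrier: {p_scandinavian_given_carrier:.0%}")
print(f"Carrier, given Scandinavian: {p_carrier_given_scandinavian:.5%}")
```

Under these made-up numbers, the first percentage is 97% while the second is roughly one in a hundred thousand. The exact values don't matter; the point is that one tells you almost nothing about the other.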
Anyone with this defect is at higher risk for Lupus and other autoimmune diseases.
(D) is the credited answer. This more or less restates the information of the last sentence. Anyone with this defect (Scandinavian or not) has a substantially higher risk of Lupus compared to anyone without the defect (Scandinavian or not).
(A) & (E) play on the misunderstanding about what the 97% implies. Most folks with this genetic defect are Scandinavian, but that doesn’t imply that most Scandinavian people have this defect. People with the defect are at higher risk for Lupus, but that doesn’t mean large sections of the Scandinavian population are at risk for Lupus.
(B) is wrong because, while we are told this genetic defect causes susceptibility to Lupus, we don’t know what other factors might cause or contribute to Lupus. Just because we eliminate this one factor does not mean we would eliminate everything in the world that could possibly contribute to the onset of Lupus.
(C) is wrong because, while we are told this genetic defect raises the risk of Lupus, we are also told it raises the risk of related autoimmune diseases. Even if we had a cure for Lupus, these other autoimmune diseases would still pose health threats to carriers of the defect.
2) This question presents two correlations, education level with chess, and education level with income. We would do well to remember both errors mentioned above.
(B) is the credited answer. In the population view, higher education level is correlated, on average, with higher income, but this doesn’t apply at the individual level. Indeed, despite the overall population pattern, it would certainly be possible to find someone with a sixth-grade education who struck a fortune and therefore was richer than many people with Ph.D.’s. It wouldn’t be likely, if we picked a random person with a sixth-grade education and a random Ph.D., but it would be possible.
(A) plays on the correlation-causality fallacy. Chess is correlated with education level, but doesn’t “cause” education level. Education level is correlated with income, but doesn’t singlehandedly “cause” income. There is no reason to conclude what (A) says.
(C) plays on the fallacy of scope. Yes, there’s a correlation in the overall population, but just because Jane has a Ph.D. and Chris doesn’t even have a B.A., we can’t automatically assume that Jane is better at chess.
(D) is tricky. The “education level” variable implies the idea of “length of time being educated”, but that’s not explicitly part of the variable. The question very clearly says one of the last three categories is “Master’s Degree”, so all master’s degrees would fall into this category, irrespective of the duration of the program.
(E) also plays on the correlation-causality fallacy. In general, folks who are more proficient at chess are more likely to pursue higher degrees, but that doesn’t mean that, year by year through their schooling, they are steadily learning more about chess. In other words, the education does not strictly “cause” the proficiency in chess.