Boxplots are one data format you may see on the GRE Data Interpretation questions. First, try these practice questions.

(*The following diagram applies to questions #1-3*)

The following boxplot shows the 2012 season runs batted in (RBIs) of 280 American League batters (the top 280 batters in terms of number of plate appearances).

1) What is the size of the IQR of this distribution?

- 25
- 47
- 56
- 83
- 140

2) How many AL hitters had more than 25 RBIs in 2012?

- 9
- 56
- 83
- 114
- 140

3) B. J. Upton of the Tampa Bay Rays had 78 RBIs in 2012; this is the 90th percentile value for this data set. How many players had between 56 and 78 RBIs?

- 14
- 22
- 28
- 34
- 42

(BTW, that maximum value of 139 RBIs belongs to Miguel Cabrera, from his extraordinary 2012 Triple Crown season.)

## Five-number summary

In this previous post, I discussed the idea of quartiles and IQR, tools that statisticians use to “chunk” a data set. Sometimes, statisticians add the median, minimum, and maximum to these to create something called the “**five-number summary**”:

1. maximum

2. third quartile, Q3, the 75th percentile

3. median, 50th percentile

4. first quartile, Q1, the 25th percentile

5. minimum

The beauty of the five-number summary is that it divides the entire data set into quarters: between any two adjacent numbers in the five-number summary lies 25% of the data.
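If you want to compute a five-number summary yourself, here is a minimal Python sketch. One assumption to flag: textbooks and software use slightly different rules for locating Q1 and Q3, so this sketch picks one common convention (`statistics.quantiles` with `method="inclusive"`), which may not match every other tool exactly. The function name `five_number_summary` is just an illustrative choice.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Quartile conventions vary; this sketch assumes the "inclusive" method
    of statistics.quantiles, which interpolates between data points.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 3.0, 5.0, 7.0, 9)
```

With nine evenly spaced values, each pair of adjacent summary numbers brackets the same share of the list, which is exactly the "quarters" idea described above.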

## A visual approach

Because statisticians, like all human beings, are highly visual folks, they created a visual way to display the five-number summary. This visual form is called a **boxplot**. Boxplots were popularized by the brilliant statistician John Tukey in his 1977 book *Exploratory Data Analysis*. The five vertical lines represent the five numbers of the five-number summary, and the “box” in the middle, from Q1 to Q3, represents the IQR, i.e., the middle 50% of the data. Between any two adjacent vertical lines lie 25% of the data points.
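To make the layout concrete, here is a rough text-only sketch of a horizontal boxplot built from a five-number summary. The function `text_boxplot` and its characters (`|` for the five lines, `=` for the box, `-` for the arms) are hypothetical choices for illustration only; real plotting libraries often draw the arms differently, for example by marking outliers separately.

```python
def text_boxplot(summary, lo, hi, width=60):
    """Render a horizontal boxplot as one line of text.

    summary: (minimum, Q1, median, Q3, maximum)
    lo, hi: the range of the number line being drawn
    """
    def col(x):
        # Map a data value to a character column from 0 to width - 1.
        return round((x - lo) / (hi - lo) * (width - 1))

    mn, q1, med, q3, mx = (col(v) for v in summary)
    chars = [" "] * width
    for i in range(mn, mx + 1):
        chars[i] = "-"  # the lower and upper "arms"
    for i in range(q1, q3 + 1):
        chars[i] = "="  # the box: Q1 to Q3, the middle 50%
    for i in (mn, q1, med, q3, mx):
        chars[i] = "|"  # the five vertical lines
    return "".join(chars)

print(text_boxplot((0, 10, 20, 30, 40), 0, 40, width=41))
```

Each of the four segments between adjacent `|` marks holds 25% of the data, no matter how wide it is drawn.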

## Strikeouts

Here’s an example of a boxplot using real baseball data. The data here are the 2012 season strikeout totals of all National League pitchers who pitched at least 75 innings that season.

Half of all the NL pitchers here struck out between Q1 = 83 and Q3 = 161 batters in the year; these are the pitchers in the IQR, the big blue box in the center. Only 25% of the pitchers in this group struck out fewer than 83 batters: this “bottom 25%” is on the “lower arm”, from 38 to 83. Only 25% of these pitchers struck out more than 161 batters in the 2012 season: this “top 25%” is on the “upper arm”, from 161 to 230. (BTW, that maximum value, 230 strikeouts, is R. A. Dickey, the knuckleball star pitcher of the NY Mets!) On a Data Interpretation question, ETS could give you a boxplot and expect you to read all the five-number summary information (including percentiles) from it.
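A quick check on the strikeout numbers quoted above shows why the segments look so different even though each one holds the same share of pitchers. This sketch uses only the values stated in the text (the median isn't needed here).

```python
# Values read from the strikeout boxplot, as stated in the text.
minimum, q1, q3, maximum = 38, 83, 161, 230

iqr = q3 - q1               # width of the box (middle 50% of pitchers)
lower_arm = q1 - minimum    # width of the lower arm (bottom 25%)
upper_arm = maximum - q3    # width of the upper arm (top 25%)

print(iqr, lower_arm, upper_arm)  # 78 45 69
```

The upper arm (69 strikeouts wide) is longer than the lower arm (45 wide), but both contain 25% of the pitchers; a longer segment means those totals are more spread out, not that there are more pitchers in it.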

If you found the practice questions difficult the first time around, you may want to go back and give them another look, before you read the solutions below.

## Practice questions explanations

1) The IQR is the distance from Q1 to Q3. From the boxplot, we read that Q1 = 9 and Q3 = 56, and the difference between them is 56 – 9 = **47**. Answer = **B**

2) From the boxplot, we read that 25 RBIs is the median, so that number divides the list in half. There are 280 hitters on this list: half must be above the median, and half below. Therefore, there are **140** hitters above the median value of 25 RBIs. Answer = **E**.
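The key fact in this solution is that, with an even number of values, the median is the average of the two middle values, so it splits the list exactly in half. A small sketch with made-up RBI totals (hypothetical numbers, not real data), assuming the two middle values differ so that nobody sits exactly on the median:

```python
import statistics

# Hypothetical RBI totals for six batters; the two middle values are 24 and 26.
rbis = [5, 12, 24, 26, 40, 61]

median = statistics.median(rbis)             # average of 24 and 26 -> 25.0
above = sum(1 for r in rbis if r > median)   # batters strictly above the median
below = sum(1 for r in rbis if r < median)   # batters strictly below the median

print(median, above, below)  # 25.0 3 3
```

Scaled up to 280 batters, the same split gives 140 above and 140 below.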

3) Upton, at 78 RBIs, is the 90th percentile. From the boxplot, we read that 56 RBIs is Q3, i.e., the 75th percentile. Between the 75th percentile and the 90th percentile is 15% of the list. There are 280 hitters on the list, so 15% of 280 = 0.15 × 280 = 42. There are **42** hitters between 56 RBIs and 78 RBIs. Answer = **E**.
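All three solutions come down to simple arithmetic on values read off the chart. Here is a minimal sketch; `players_between` is a hypothetical helper name, and like the solutions it assumes each percentile marks an exact share of the 280 batters.

```python
def players_between(p_low, p_high, n):
    """Number of players between two percentiles in a list of n players.

    Assumes percentiles are exact, as the practice questions do.
    """
    return (p_high - p_low) * n / 100

# Question 1: the IQR is Q3 - Q1, read off the boxplot.
print(56 - 9)                         # 47
# Question 2: above the median means from the 50th to the 100th percentile.
print(players_between(50, 100, 280))  # 140.0
# Question 3: from Q3 (75th percentile) up to Upton (90th percentile).
print(players_between(75, 90, 280))   # 42.0
```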

Hello Magoosh,

While preparing for the GRE, I came across a problem that involved interpreting a very symmetrical boxplot, with one of the options being a set of five or six equal entries (e.g. 3, 3, 3, 3, 3, 3). While this clearly isn’t the answer, I would very much like to know what the boxplot of such a set would look like.

Thanks in advance.

Hi Shahriar,

This isn’t a boxplot as much as a line on a piece of graphing paper 🙂 The boxplot is drawn by finding the median, the 25th and 75th percentiles, and the highest and lowest numbers in the data set. If we have a data set where all of the numbers are the same, then all of these values are the same. There would be no ‘boxplot’ as you think of them, because all of the lines would be on the same number. It wouldn’t make sense to draw a boxplot for a data set like that, because the boxplot gives us no information: it would just be a vertical line above the “3” on the number line.
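You can see this directly by computing the five-number summary of such a set. This sketch uses Python’s `statistics.quantiles` with `method="inclusive"` as one possible quartile convention; with identical values, every convention gives the same result.

```python
import statistics

data = [3, 3, 3, 3, 3, 3]
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
summary = (min(data), q1, median, q3, max(data))

print(summary)          # (3, 3.0, 3.0, 3.0, 3)
print("IQR:", q3 - q1)  # IQR: 0.0
```

All five numbers coincide and the IQR is zero, so the box and both arms collapse into a single vertical line at 3.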

I was reading the About Mike McGarry page and it said: “despite having no obvious cranial deficiency, he insists on rooting for the NY Mets.” I didn’t understand this sentence. Can you please explain whether this was a joke, or was it sarcasm? English is not my native language, so perhaps it was meant as a joke?

Thanks!

Hi Saad,

Happy to clarify! It is saying, “Even though he has no mental problems, he still supports the NY Mets team.” This implies that the author thinks the NY Mets are not a good team to root for, so the fact that someone still insists on supporting them might indicate that person is a little crazy. I hope that helps! 🙂

For the second question, can’t 114 be an answer, since we don’t know how many scored 25?

Hi James,

Since we have an even number of batters, we know that exactly 50% of the batters are above the median and exactly 50% of the batters are below the median. The median here is actually the average of the two batters in the middle, so for the purposes of this graph and the data it represents, nobody is actually on the median of 25. We can also work out exactly what percentile has 114 batters above it. 114 is about 40% of 280, which means that the top 114 batters make up the top 40% of the list, i.e., the batters above the 60th percentile. So while the answer choice of 114 covers the top 40% of all batters, it leaves out the batters between the 50th and 60th percentiles, who also have more than 25 RBIs, so it can’t be our answer.

In the GRE exam, is the level of difficulty of this topic similar to the questions above?

Hi Ann,

Good question! 🙂

While I would say that these questions are definitely similar to the level of the real GRE questions, it may be helpful to also know that statistics questions are only moderately common, so I wouldn’t expect, say, five boxplot questions to show up on your exam. You may want to check out this GRE quant section breakdown post for more information.

I hope that helps! 🙂

How did we come to the conclusion that Upton, at 78 RBIs, is the 90th percentile?

I am confused

I could be wrong here, but I believe this is information we can’t get from the boxplot itself. It is given in the problem, so we can just take it as true without worrying about how or why.

As for the rest of the problem, you know that 56 RBIs is the 75th percentile just by the definition of a boxplot.

Therefore 90 - 75 = 15% of hitters had between 56 RBIs (the 75th percentile, given by the boxplot) and 78 RBIs (the 90th percentile, given in the question).

Thank you, Mike. I have a question related to the math section. Is my score based only on the number of questions I answer correctly, or on the number of correct answers plus the difficulty of the questions? So, if I get a long question, like a combinations problem or a long geometry problem, is it better to skip it (even if I know how to solve it) to save time for two less time-consuming questions? Or is it better to solve it as long as I know the technique?

Dear Heba,

I’m going to refer you to a blog that my buddy Chris wrote:

http://magoosh.com/gre/2012/pacing-on-the-gre-math-sections/

Let me know if that doesn’t answer your questions.

Mike 🙂