Box plot for Excel 2007

Keywords: Boxplot, box plot, stem and leaf plots, Excel 2007, how to make

Version: Excel 2007

Since the previous entries, I have received quite a few questions about box plots in Excel 2007, so I decided to describe one way to create decent-looking box plots in Excel 2007. In my example I start with a data set containing six samples with ten replicates each, and from this I want to create a box plot showing the extremes, the median and the quartiles.

[Image: Box plot for Excel 2007]

I create five new rows (12-16) for max, 3rd quartile, median, 1st quartile and min, and then calculate the statistics in cells B12:B16:

=MAX(B2:B10)
=PERCENTILE(B2:B10,0.75)
=MEDIAN(B2:B10)
=PERCENTILE(B2:B10,0.25)
=MIN(B2:B10)

Then copy to cells C12:G16.
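
As a cross-check outside Excel, the same five statistics can be sketched in Python. This is only an illustration and not part of the original workbook: the function name five_number_summary and the sample values are made up, and the sketch assumes that statistics.quantiles with method="inclusive" interpolates the same way as Excel's PERCENTILE function.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (max, 3rd quartile, median, 1st quartile, min).

    The order mirrors rows 12-16 of the worksheet described above.
    """
    # method="inclusive" interpolates linearly between data points,
    # which is assumed here to match Excel's PERCENTILE.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values), q3, median(values), q1, min(values)


# Hypothetical replicate values for one sample column (not from the article).
sample = [2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.5, 6.1, 7.3]
print(five_number_summary(sample))
```

If the Python numbers and the worksheet numbers disagree, the first thing to check is whether the formula ranges really cover every replicate.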

[Image: Box plot for Excel 2007]

Since we will “trick” Excel into drawing a box plot with a stacked column chart, we have to modify our data slightly. The first segment of the stacked column will be invisible and ends where the 2nd quartile begins, at the 1st quartile value ( =PERCENTILE(B2:B10,0.25) ). The next segment is the 2nd quartile (median - 1st quartile, or B14-B15). The third segment is the 3rd quartile (3rd quartile - median, or B13-B14). The lengths of the whiskers representing the min and max values are calculated as 1st quartile - min, or B15-B16, and max - 3rd quartile, or B12-B13.

These values are calculated in a new range, see image below.
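
The arithmetic behind the modified range can be sketched in Python as follows. The function name stacked_segments and the dictionary keys are made up for illustration; only the subtractions come from the description above, and the sketch assumes all values are positive.

```python
def stacked_segments(maximum, q3, med, q1, minimum):
    """Turn the five statistics into the values the stacked chart needs."""
    return {
        "hidden_base": q1,             # invisible segment, ends at the 1st quartile
        "lower_box": med - q1,         # 2nd quartile, B14-B15
        "upper_box": q3 - med,         # 3rd quartile, B13-B14
        "whisker_down": q1 - minimum,  # lower whisker length, B15-B16
        "whisker_up": maximum - q3,    # upper whisker length, B12-B13
    }


# Hypothetical statistics for one sample (not from the article).
segments = stacked_segments(maximum=9, q3=7, med=5, q1=3, minimum=1)
print(segments)
```

A quick sanity check is that the hidden base plus the two box segments adds up to the 3rd quartile, which is where the top of the stacked column should end.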

[Image: Box plot for Excel 2007]

Now I’m ready to insert the chart. I select the range B19:G21 (see image below) and choose a 2-D stacked column chart from the Insert tab (Charts group, Column).

[Image: Box plot for Excel 2007]

Next we add the whiskers. Select the second segment, click Chart Tools –> Layout –> Error Bars –> More Error Bars Options, set Display Direction to Minus, set Error Amount to Custom and click the Specify Value button. Leave the Positive Error Value as is and select the range containing the min values for the Negative Error Value.

Repeat for the max value whiskers. The chart now should look like the one in the image below.

[Image: Box plot for Excel 2007]

To make the chart a bit neater, right-click the lower segment series (the green series in the image), select Format Data Series and set its fill to No fill to make it invisible. Format the rest of the chart to your liking. Done!

[Image: Box plot for Excel 2007]

Good luck, and enjoy your new Box plots.

Comments (12)

praveen, December 22nd, 2007 at 3:01 pm

Dear Sir,
I have been trying to find a way to draw a continuous graph, for programs, on a continuous timeline.
For example, the TV channel programming starts at 6 am and goes on until 12 midnight. I would like to plot two variables on a bar chart: the x coordinate of each bar should represent the duration of the program and the y coordinate should represent the viewership.
My question is: would it be possible in Excel?
Are there any solutions for this?
I would be much obliged if you could help me.
Thanks and regards,
Praveen

Jesper, February 14th, 2008 at 10:32 am

Hi
Yes, it is possible. You use the same basic technique as described here. Choose a horizontal bar chart and trick Excel by hiding part of the bar with a dummy series to achieve the effect you are looking for.

If you need more hands on help please contact me via the Contact page.

nameRAKELE, April 11th, 2008 at 12:05 pm

great…i make it…
:)))

FRANCIS, May 27th, 2008 at 9:43 am

Hi Jesper, great stuff, however, the error bars are not the max and min values in a box plot, they are 1.5 times the inter-quartile range. With your approach there is no room for outliers.
Have a great day

Art, July 30th, 2008 at 4:56 pm

Cute trick…and very useful. What about the instance where there are negative values? Your method of tricking Excel can be “reversed” where all the values are negative, but it breaks if only some of the values are negative.

It gets tricky because what range you hide (or not) depends on where (or if) any of the ranges straddle the zero point.

A more robust solution would be to include some boolean logic in the formulas. I’ve been playing around with it and am stymied because the changing relative location of the zero point means:

1) the order of the stack changes
2) the series which should be made invisible changes. It will be either 1stQ, 3rdQ or none.
3) if none, then either the 1st and 2nd, or 2nd and 3rd series need to be rendered in the same color.

The cute trick starts to become an inelegant mess. Macros seem much easier at this point.

Art, July 30th, 2008 at 8:07 pm

Ahh, what going to lunch will do. I figured out a better way to handle negative values.

In the “trick” range, add an offset constant equal to or greater than the size of the lowest negative value to all of the numbers. Proceed as in the original instructions.

Finally add another range of data that has two points, the offset constant and the highest value in the original range. Change the axis of that range to “secondary” and make the marker and line style “none” to make it invisible.

Finally, change the tick labels of the primary axis to “none.”

Now you have a good looking chart with an appropriate scale. You may have to tweak the scale range on the secondary axis to make sure it lines up properly.

Jesper, August 20th, 2008 at 7:33 am

Art, that’s a good solution. I will definitely write up a short instruction about it if you don’t object. Thanks.

Praveen, August 20th, 2008 at 7:41 am

Dear Sirs,

Many thanks for your responses, but my question is still unanswered.

Is it possible to plot columns of different thicknesses, with the widths signifying the durations of programs and the heights denoting the program ratings?

E.g., if a program has a duration of 2 hours and 1 TRP, the column is 2 units wide and 1 unit tall; another program has a duration of half an hour and TRPs of 5, so the column is half a unit wide and 5 units tall.

would appreciate any solution on the same,

thanks , Praveen

xiaoyu, September 23rd, 2008 at 3:54 am

hi, i found a small mistake (if it is one) :P
the “min” row in the modified rows for plotting the box plot should be median - min instead of Q1 - min

Please let me know

Jesper, September 23rd, 2008 at 8:27 am

Hi

The ‘min’ in this case is the error bar representing the extreme value of the data set, and to calculate the value of the error bar in this case I think it should be 1st quartile - min().
So I think it is correct in the article.

Thank you for your comment!
/Jesper

Paunkiel, September 30th, 2008 at 5:42 am

Good morning (or good evening)
Is it possible to individually format axis labels under Excel 2007? How to do it? Thank you in advance.

Mikey, October 12th, 2008 at 7:55 pm

“Next we add the whiskers. Select the second segment, click on Chart Tools –> Layout –> Select Error bars –> More error bars options and pick the Display Direction: Minus, indicate the Error Amount: Custom and click the Specify Value button. Leave the Positive Error Value as is and select the range containing the Min values for the Negative Error bar.”

Can somebody post a screenshot link to this part of the process? I’m having trouble finding where you do that step. Thank you so much!
