Lecture Comments (5)

1 answer

Last reply by: Ryan Reddell
Tue Apr 29, 2014 11:36 PM

Post by KyungYeop Kim on July 15, 2013

Does anyone know the website that was once introducted where you can see all the cool statistics expressed in circles??

0 answers

Post by Paulette Jones on May 13, 2013

I think you sound fine - you're very approachable and positive. Only question - what version is this of Excel?

1 answer

Last reply by: Manoj Joseph
Wed May 1, 2013 6:15 AM

Post by Manoj Joseph on May 1, 2013

are you saying meadian is also measure of spread?

Five Number Summary & Boxplots


  • Intro 0:00
  • Roadmap 0:06
    • Roadmap
  • Summarizing Distributions 0:37
    • Shape, Center, and Spread
    • 5 Number Summary
  • Boxplot: Visualizing 5 Number Summary 3:37
    • Boxplot: Visualizing 5 Number Summary
  • Boxplots on Excel 9:01
    • Using 'Stocks' and Using Stacked Columns
    • Boxplots on Excel Example
  • When are Boxplots Useful? 32:14
    • Pros
    • Cons
  • How to Determine Outlier Status 33:24
    • Rule of Thumb: Upper Limit
    • Rule of Thumb: Lower Limit
    • Signal Outliers in an Excel Data File Using Conditional Formatting
  • Modified Boxplot 48:38
    • Modified Boxplot
  • Example 1: Percentage Values & Lower and Upper Whisker 49:10
  • Example 2: Boxplot 50:10
  • Example 3: Estimating IQR From Boxplot 53:46
  • Example 4: Boxplot and Missing Whisker 54:35

Transcription: Five Number Summary & Boxplots

Hi and welcome back to

Today we are going to talk about the five number summary and box plots.0002

Now that you know a little bit about variability now we can talk about the 5 number summary 0005

and the box plot is going to be a way of visualizing that 5 number summary.0013

We are going to talk about how to determine outliers and we are going to be using conditional formatting in Excel.0019

Finally we are going to be talking about modified box plots which are going to be box plots where we exclude the outliers already.0027

There are 2 ways to summarize a distribution.0037

We have been talking about one way when we talk about shape, center, and spread.0042

Usually when we talk about shape, center, and spread we use the mean as the measure of center.0048

The mean is what we sometimes mean by center.0056

It could be the median but a lot of times people use the mean.0059

For spread we often use the standard deviation. The other common pairing is the median with the inter quartile range.0063

This is where the 5 number summary comes in.0072

The 5 number summary is a way of using the median and the inter quartile range more as a summary of how our distribution looks.0077

The 5 numbers are numbers you already know.0089

The minimum value, the q1, the border of the first quartile, the median which is q2 it divides the entire distribution in half.0093

Two quartiles on one side and two quartiles on the other side, 0112

Q3 the third border and the maximum value, the highest value in your distribution.0115

This 5 number summary is often used in skewed distributions.0124

Let me show you visually what this looks like.0128

Let us say we have a whole bunch of different data points and here is our maximum value.0131

Here is our distribution so far.0142

The 5 number summary will basically say the minimum value is important, the maximum value is important.0149

Whatever the value in the middle is.0160

Q1 and q3 which would be dividing this by half and also dividing this by half.0162

Here is min, q1, q2, median, q3, and max.0182

In that way we get a 5 number summary.0206
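As a rough illustration of the five numbers just listed, here is a minimal Python sketch. The data set is hypothetical (it is not the one on the slide), and `statistics.quantiles` with `method="inclusive"` is only one of several quartile conventions, so its q1 and q3 can differ slightly from values found by hand.

```python
import statistics

def five_number_summary(values):
    """Return (min, q1, median, q3, max) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates between data points and treats the
    # smallest and largest values as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical ages, chosen only for illustration.
ages = [18, 20, 24, 28, 32, 36, 37, 55]
print(five_number_summary(ages))
```

The five values come back in order from least to greatest, which is also the order used when drawing the plot.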

We may not know our little dots in our distribution but we have a general idea of it.0210

Now we are going to talk about a way to visualize that 5 number summary just quickly.0216

It is going to be called the box plot.0222

It is also called the box and whiskers plot.0224

It is called box and whiskers because often times there is a box, it could also be on its side like this with whiskers on it.0227

Often times this box is split up somewhere in between not necessarily in the middle but just somewhere like that.0240

This box is often aligned with an axis of values of your variable.0248

Your q1 is on the low end, q3 is on the high end.0256

Those are the boundaries of that box.0270

The line in the middle is the median.0274

The whiskers end at the minimum value and the maximum value.0276

That is how you decipher a box plot.0283

It could also be on its side like this.0288

In this case the values will go like this.0291

Once again you have q1, q3, median, min, and max.0297

It does not matter whether it is on this axis or this axis but what you do have to know is the box should be in alignment with the values.0310

Let us do an example box plot here just by hand.0321

Here is an example set and we can easily see the minimum and the maximum.0328

We already know that.0334

Let us find out what the median is.0336

The median should be right here.0338

That should be 30 because what I am going to do is add 28 and 32 and divide by 2.0341

Just to find the average of those two numbers and that is 30.0351

Visually I just think about what is in the middle of 28 and 32.0355

Let us find q1 and q3.0359

What is in between here and here?0364

That is 20.0366

We count the 18 as the border, 30 as the border.0386

Q1 is right in here so that should be 22 and here 30 is the border, 55 it should be right here.0393

Here is q3 and that should be about 36.5. 0409

There we go we have our 5 numbers and we just need to plot them.0425

I am going to put it on a horizontal axis just because I have no room for it.0431

I know that 18 will be on one side of it and 55 will be on the other side.0439

Here I will put my values.0445

We could pretend this stands for age.0447

Maybe I will start from 0 and divide this up into 10’s.0453

10, 20, 30, 40, 50, 60.0466

Let us start plotting our box and whiskers.0474

What I like to start to do is put dots there and draw in the box and whiskers.0479

Here I put in a dot in one of my whiskers and here is the other end of that.0484

Q1 is about here, 30 right here is the median, and 36.5 is right here.0492

I know where my boundaries are.0508

I will just draw my box and whiskers.0512

To help myself I might just write down where q1 and q3 is because that is going to help me draw my box.0514

here is my median, min, max.0524

This would be a box and whisker plot or a box plot for this data set that we have here.0534
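The hand-drawn plot can be approximated in plain text. The sketch below is only an illustration, not the lecture's method: the function name `ascii_boxplot` and its drawing characters are made up here, and the five numbers (18, 22, 30, 36.5, 55) are the ones read off in the by-hand example.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, axis_min=0, axis_max=60):
    """Draw a one-line horizontal box plot, one character per axis unit."""
    width = axis_max - axis_min + 1

    def column(value):
        # Map a data value to a character column on the axis.
        return round(value - axis_min)

    row = [" "] * width
    for c in range(column(minimum), column(maximum) + 1):
        row[c] = "-"                      # whiskers span min..max
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                      # box spans q1..q3
    row[column(minimum)] = "|"            # whisker ends
    row[column(maximum)] = "|"
    row[column(q1)] = "["                 # box edges
    row[column(q3)] = "]"
    row[column(median)] = "M"             # median line inside the box
    return "".join(row)

plot = ascii_boxplot(18, 22, 30, 36.5, 55)
print(plot)
print("0         10        20        30        40        50        60")
```

As in the drawing, the box lines up with the axis of values, and the median mark does not have to sit in the middle of the box.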

Now let us talk about making a box plot in Excel.0540

Excel is not a great program for making box plots.0544

If you have Stata or other statistics programs those are better options.0550

For some of you, you want to make a box plot in Excel so I am going to show you 2 ways of doing it.0555

One method is by using stocks.0561

This is a stocks chart and it is not originally designed for plotting median, q3, min, and max.0564

It is for stock prices and opening and closing prices.0573

We are going to piggy back on it in order to create our box plots.0578

The trouble with that one is it does not allow you to show the median.0584

This ends up being a 4 number summary.0588

The other option is to use stacked columns.0592

In this option you can show the median but it takes a little more fiddling with it.0603

I am tricking Excel into showing you something that looks like a box plot.0610

Let us get started.0614

Go ahead and download the Excel file and you will see there is a whole bunch of different sheets that are already pre put in for you.0617

Some of them, the ones that starts with a and s are pre answered.0629

They already have the graphs and everything in there but you could also follow along.0634

If you go to height stock, it is going to plot height on a stock chart.0638

The nice thing about box plots is that you could have several box plots on one visualization.0645

That is what is nice about being able to compare a couple of 5 number summaries.0654

Let us go ahead and find the critical numbers we need for male and female height.0661

The thing about stock charts is that they need you to put in the numbers in a particular order.0668

This is the order you need.0677

You need q1, max, min, and q3.0679

This one is hard for me to remember because it is very arbitrary.0683

But that is because we are piggy backing on stocks.0693

It is not originally meant for this use.0697

Let us find the boundary of the q1.0700

In order to do that, Excel has a nice function for us called quartile and here is what you have to put in quartile.0706

The input that it receives is the array or your data set as well as which quartile you would like.0715

In Excel, q1 is 1 so you would put in your data,1.0722

Let us go ahead and do that.0729

Go to data all the way up to height.0731

There is male height and I will color it in blue.0735

Go ahead and select that data then I am going to put in ,1).0739

Let us look at what we have here.0749

We have this data,1.0751

Let us lock that data in place because we do not need that to move in any time soon.0753

I am going to hit enter.0758

That is for q1.0763

In order to find the median it would be q2 or median.0766

In order to find the boundary for q3 it would just be the same thing with the 3.0770

Let us go ahead and copy and paste that function into my q3 slot.0776

All I have to do is change which quartile I want it to give back to me.0783

There I have my first and third quartiles.0789

The nice thing about this quartile function is that it can also find you the min and max values because it considers the min value q0.0794

It considers the max value q4.0800

You could also use the functions min and max.0807

Let us go ahead and use quartile.0810

It makes it easier.0812

I could just copy and paste all of that in there. 0813

I am going to double click on that maximum and put in 4.0816

I am going to double click on the min quartile and put in 0.0821

There we go.0828

We now have q1, max, min, and q3.0830
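For readers following along outside Excel, here is a minimal Python sketch of a QUARTILE(array, quart)-style lookup. It assumes the inclusive linear-interpolation convention, where quart 0 is the min and quart 4 is the max; the function name `quartile` and the height values are hypothetical.

```python
def quartile(values, quart):
    """Mimic a QUARTILE(array, quart)-style lookup for quart in 0..4."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3, or 4")
    data = sorted(values)
    # Fractional position in the sorted data: 0 is the first value,
    # len(data) - 1 is the last, so quart=0 gives min and quart=4 gives max.
    position = quart * (len(data) - 1) / 4
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return data[lower]
    # Linear interpolation between the two neighbouring values.
    return data[lower] + fraction * (data[lower + 1] - data[lower])

# Hypothetical heights in inches, not the values in the lecture's file.
heights = [62, 66, 67, 68, 69, 70, 71, 72, 75]

# Stock-chart order used in the lecture: q1, max, min, q3.
stock_row = [quartile(heights, q) for q in (1, 4, 0, 3)]
print(stock_row)
```

Passing the quartile number as an argument is what makes the copy-and-paste trick work: only that one number changes from cell to cell.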

Let us find the female q1.0835

Let us go ahead and find our data.0840

I am just going to color it in pink.0844

I am going to put in ,1).0847

Let us see what we have here.0857

You double click on it we have our data and q1.0859

I am just going to lock that data in place so I could easily copy and paste that all the way down.0862

Let us change it for our max value.0874

For our max value we need q4.0879

For our min value we need q0.0882

For our q3 we need q3.0886

Notice that for females the max is slightly smaller.0891

The min is slightly smaller.0898

So are q1 and q3, compared with their male counterparts.0899

Let us go ahead and put in our box and whiskers plot.0903

Go ahead and select the male and female part of this data set.0909

When you do that, Excel will interpret it as you want it to be the labels for your box and whisker plots.0915

Click on charts and if you click on stocks you will see something that looks like a box and whisker plot.0923

You could go ahead and click on it but some of you may get an error signal that looks like this.0934

To create a stock chart, arrange the data on your sheet in this order opening price, high price, low price, closing price.0945

Use dates or stock names for labels.0954

We have done this already.0956

It has the opening price or q1, high price is max, low price is min, closing price is q3.0958

Yet Excel does not recognize that we have done the right thing.0967

The reason is because Excel is not seeing that we organized our data into our columns instead of rows.0971

Here is one thing you might want to do.0980

Just go ahead and pick column and you will see that Excel is treating each row as if it is a data set.0983

What we want to tell Excel is stop doing that instead organize it this way so that it is grouping them into columns.0998
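The rows-versus-columns issue is a transposition. The sketch below uses hypothetical numbers to show the same table read both ways; in Python, `zip(*rows)` turns rows into columns.

```python
# Rows as they sit on the sheet: one statistic per row, one group per column.
# All numbers are hypothetical placeholders.
labels = ["Male", "Female"]
rows = [
    [67, 64],   # q1
    [75, 71],   # max
    [62, 59],   # min
    [71, 67],   # q3
]

# zip(*rows) transposes the table, so each group becomes its own series.
series_by_group = dict(zip(labels, (list(col) for col in zip(*rows))))
print(series_by_group)
```

Read by row, there are four series with two values each; read by column, there are two series with four values each, and the second reading is the one a box plot needs.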

I am going to move this formatting palette out of the way and now if we go back to stock and click on stocks it should recognize it.1014

It is not elegant but that is the way Excel is sometimes.1031

You have to force it to do what you want.1036

I am going to delete this series because now it is treating each of these 4 rows as a series but we do not need that information.1038

We do not need to know how they go across.1049

We just need to know how they go up and down.1055

I am just going to change my formatting here so you could see it a little more clearly.1057

White might be a little tough to see.1063

Feel free to pick any color you want.1065

Here is our box plot or classic box plot.1082

You will see that our values are clustered around this end and one thing I want to do is modify my y axis 1087

so that I could stretch out the values that are important to us.1098

The way that I will do that is go to scale and put in my min value as 50 instead of 0.1105

When you do that it will stretch this part out from 50 to 80 instead of from 0 to 80.1115

Now we could see our box plot.1123

Here is our min for males which is below 62.1127

The max is 75.1133

Q1 which starts at 67.1135

Q3 which ends at 71.1139

Here is an important thing about a box plot.1145

Before in frequency distributions, we always have a value on the bottom.1148

Height would be on the bottom.1154

Now height in inches is shown on the y axis.1156

Because that is an important distinction, I am going to label my vertical axis to say height.1162

Now we could remember this is where height is.1181

Notice that frequency is not shown here.1189

We do not know exactly how many people have a height of 70 or 75.1192

We only know that those are the boundaries.1198

There is one thing we do know, we know that these are split up into quartiles.1201

This represents the quartile from 0 to 1.1208

This represents the quartile as well, the border from q3 to the end.1213

About 25% of our participants have heights in this range.1221

Another 25% have height in this range.1226

Using our powers of deduction how many people are in this range?1230


If we had a median line, we would also know where our other two quartiles break up.1237

Unfortunately, this kind of formatting with the stock chart does not show you the median.1242

In order to see the median, because it is an important number sometimes, let us go ahead and use the stacked columns option.1250

Go ahead and click on height stock column.1260

Here I have set it up so that now it is in a format that makes sense for me.1264

Something like min, q1, median, q3, and max.1271

That is the order that makes more sense to me, or you could order it the other way: max, q3, median, and so on.1277

The reason I have done that is because it is going to make it a little bit easier for us to create our box plot if it is in order.1285

Let us go ahead and find our quartiles.1295

Let us find our data and put that in for males ,0 because we are starting at 0) and enter.1301

Notice that this is not new.1319

Instead of putting in 0 here is what I am going to do now.1324

I am going to click on this 0 because this way the more I make things into formulas the more I can just cut and paste.1328

The nice thing about clicking on this 0 is that because I have not put in any $ around it, it is a relative reference.1340

If I cut and paste this down one it will cut and paste this one down as well.1350

It will refer to the next one down.1355

I do not want my data to move down.1357

Let us go ahead and lock down in place and hit enter.1360

Once I have that I could just drag this down, copy and paste as we go.1367

We have all 5 of our values for our 5 number summary.1374

We did not have to do anything.1378

We could see that they are in order from least to greatest.1380

Let us do the same for females.1385

We have quartile, find our data and put in ,0) and hit enter.1387

All I am going to do is lock my data in place.1411

Sometimes I forget copying and pasting this.1416

I could just copy and paste that all the way down.1419

So far that part has been easy.1426

Here is where it gets a little more complicated in order to create stacked-column box plots.1429

If you go to charts and go to columns you are going to be using the stacked column.1438

In order to use the stacked column, one thing you have to do is find out the distance between each of these things.1446

In stacked columns, what we will do is put a little box on top of another little box.1454

Each of those boxes represents the distance from the previous box to the next box.1465

We have to change this into distances.1470

I am just going to put in here distance for males and for females. 1474

The first one is the distance from 0.1491

It is just the min value itself.1497

The second one is the distance from this value to this value.1500

Once I have that I can just copy and paste that all the way down because this will give us the distance between these two.1507

Here we have the distance between these two.1517

In order to do that for females all I have to do is copy and paste that over because Excel will do it for the column on the right.1522

Now we have our distances.1535
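The distance step can be sketched in a few lines of Python. The five-number summary below is hypothetical, and `stacked_segments` is a name made up for this illustration.

```python
from itertools import accumulate

def stacked_segments(summary):
    """Turn (min, q1, median, q3, max) into stacked-column heights."""
    # First segment runs from 0 up to the min; each later segment is the
    # gap between one summary value and the one before it.
    return [summary[0]] + [b - a for a, b in zip(summary, summary[1:])]

# Hypothetical five-number summary for heights in inches.
summary = (62, 67, 69, 71, 75)
segments = stacked_segments(summary)
print(segments)                      # heights to plot as stacked columns

# Stacking the segments back up recovers the original five numbers.
print(list(accumulate(segments)))
```

Because the segments add back up to the original values, the top of each stacked box lands exactly on min, q1, median, q3, and max.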

Because of that we could just take that and put on column.1538

Notice that right now it is putting each row as a column, it is not what we want.1551

We want each column to be a column.1559

I am going to tell Excel organize it this way.1562

I will leave this one aside.1571

Feel free to use whatever coloring scheme you would like just keep it consistent I am going with orange.1576

Here is what we need to do now.1598

This does not look like a box and whisker plot yet.1602

In fact there are no whiskers.1606

That is why we need to fudge a little bit.1609

One thing we have to do is start by getting rid of this block.1612

We do not want to hit delete because that will just delete and ignore that data.1617

That is not what we wanted to do.1624

I want to look at these things as floating up here.1625

Instead what I am going to do is tell Excel color these clear.1629

I am just double clicking on this and fill and no fill.1634

It is invisible now.1640

In terms of line, I think there might be no line.1643

Even though it is there, you could find it.1649

It is just invisible because we do not need that part.1658

It is just the distance from 0 all the way to our min value.1663

Here we want this to be our low whisker. 1667

Here is what we do.1672

We ask Excel to make this one invisible then we use error bars.1673

Error bars usually start at the top of that box.1689

Because we are on the top of the box we would like the bar to go down through the box entirely.1694

We put a –errorbar and we ask for it to be 100% of our box.1702

When I do so, you see this little line here that is 100% of my box.1712

This is an Excel cheat.1719

We could hit okay and we have our low end whisker.1722

We do not need to do anything with that.1729

We could leave that because that is our q1 to median box.1733

If you hit the arrow key it will go to the next box up.1739

This is the box that goes from median to q3 and this top one as well this is the box that goes from q3 to max.1746

We need to click on that and make it invisible and get that an error bar as well.1761

That is 100%.1770

In some of your Excel options you might be able to get rid of these little feet so that it looks like little bars.1773

In newer versions of Excel that is the case.1790

That is what I usually do because I think it looks ugly to have those little feet there.1793

It is not a big deal to have the feet.1798

Let us change our y axis so that we could see our box plot a little better and just hit scale.1803

I just double clicked on it and I am going to make my min value 50 and hit okay.1816

I do not want it to say distance of males instead I want it to say just males.1825

I am going to change this into males.1834

Excel will do this automatically for me.1838

Finally what I am going to do is add a little label to my vertical axis so that I remember that height in inches is being shown here.1847

Here we could see for males that the quartiles split up quite nicely.1864

But the quartiles here seem to be that there is a greater range of low end than the higher end.1871

The spread is bigger between q1 and median than the median and q3.1887

Our 2nd quartile is slightly fatter than our 3rd quartile.1896

But our 1st and last quartiles are even bigger than these two.1901

Although that is a bit of a fudge, you could see we could use stacked columns in order to create box plots as well.1908

That is what I mean by slightly idiosyncratic.1921

It is not the best way but it can be done.1930

Let us talk about when box plots are useful.1932

The pros of box plots are that you can plot a single quantitative variable and compare 2 or more distributions easily.1938

In fact box plots are useful for comparing lots of distributions very quickly because it is a good visualization.1947

It is very compact.1954

You put it all in one graph and it is easy to compare the boxes.1956

When a distribution has many values, too many to show individually, you would not want to use a stem plot or a dot plot.1962

You would want to use something like a box plot.1972

Also it is nice when you do not want to see anything more than a 5 number summary.1974

One of the cons is that you cannot see frequency.1979

You cannot see how many people have a height of 72, but you can see the frequency within a range.1983

You know that 50% of the people lie in between q1 and q3.1993

You can see that easily on a box plot.1999
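To check the 25% / 50% / 25% split numerically, here is a small sketch on a hypothetical sample (the whole numbers 1 through 100). With ties or very small samples the shares are only approximately 25% and 50%.

```python
import statistics

def quarter_shares(values):
    """Fraction of values below q1, from q1 to q3, and above q3."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    n = len(data)
    below = sum(v < q1 for v in data) / n
    inside = sum(q1 <= v <= q3 for v in data) / n
    above = sum(v > q3 for v in data) / n
    return below, inside, above

# Hypothetical sample: the whole numbers 1 through 100.
print(quarter_shares(range(1, 101)))
```

The middle share is the box and the outer two shares are the whiskers, which is all a box plot can say about frequency.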

How do we determine whether we have outliers in our data set?2002

Outliers are extreme values but that is a subjective way of defining it.2012

How do we know how extreme?2020

Who defines extreme?2021

We have a rule of thumb for that in statistics.2024

Typically we say that a value is an outlier if it is more than 1.5 times the inter quartile range away from the nearest quartile.2027

What we mean is that for the upper limit the nearest quartile is q3, and you add 1.5 times the inter quartile range, whatever that is.2038

That is the upper limit boundary.2053

The lower limit boundary would be from the q1 boundary and because we are going lower 2056

we are subtracting 1.5 times the inter quartile range.2064

Sometimes in skewed distributions where the 5 number summary is useful you might not need one of these.2070

Because one side might not have outliers only the other side does.2078

If it is right skewed we need the upper limit boundary.2085

If it is left skewed we need the lower limit boundary.2088
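The rule of thumb can be written as a short function. This is a sketch with a made-up name, `outlier_limits`; the inputs 67 and 71 are the approximate male-height quartiles read off the chart earlier, so the result is only as exact as those readings.

```python
def outlier_limits(q1, q3, multiplier=1.5):
    """Return (lower_limit, upper_limit) from the 1.5 x IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

# q1 = 67 and q3 = 71 are the approximate male-height quartiles read off
# the chart earlier in the lecture.
lower, upper = outlier_limits(67, 71)
print(lower, upper)
```

Any value below the lower limit or above the upper limit would count as an outlier under this rule.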

We are going to learn how to signal outliers in Excel data file by using conditional formatting.2092

Go ahead and click on your example spread sheet again Excel file.2100

Let us look at the photos stock sheet.2110

Photos if you remember from our previous frequency distribution visualizations that tends to be a skewed distribution.2121

It tends to be that photos are skewed to the right where there are some people that have an extremely high number of photos on one end.2133

Let us determine what the outlier boundaries are for tagged photos.2148

In order to do that, we might want to start off by finding q1, max, min, and q3.2154

Let us go ahead and put in quartile and put in our data for tagged photos.2161

I am just going to put in ,1 and hit enter.2181

Our first quartile is going to be locked in place.2188

Our 1st quartile is 53.75 that is the boundary.2198

I am just going to copy and paste that all the way down.2207

I will just change the quartile that we are referencing.2210

We need the max here so that is 4.2213

We need the min here which is 0.2216

We need q3.2221

Notice that q3 is around 250 but the max value is 4,686.2224

I am guessing we have a few outliers here but to find outliers we need to find the inter quartile range.2237

The inter quartile range is easy once you have q3 and q1 because it is simply q3 – q1.2246

Instead of having max and min let us modify this.2259

Let us modify it to be the upper and lower limit.2282

Instead of a classic box plot, we are going to be looking at a modified box plot.2287

Instead of this max and min we are going to be putting our formulas for the upper and lower limit.2299

The min value we probably would not need it because 13 is going to be within the boundary 2311

like if we made q1 – 1.5 × 200 that would be way lower than our actual min value.2323

We do not need to modify that one.2331

Let us modify this one.2333

Here what we do is take our q3 boundary + 1.5 × IQR.2337

Excel knows order of operations so it will do the multiplication before it does the addition.2349

Let us hit enter.2356

Notice that our outer boundary is a lot smaller than our max value.2358

It is 549.375.2367

It would be nice if we could go back to our data and see quickly who is in that range and who is not.2372

Who is out of that range or my outliers?2380

If we go out to the data, what I am going to show you is something called conditional formatting.2384

Go ahead and select all the data that we want Excel to look at.2390

What we are going to let Excel do is color these data points if they are outliers.2397

In order to do that, after you select, go to your menu, which we cannot see here, and hit format.2405

There should be an option that is conditional formatting.2415

Go ahead and click on that.2419

You should get a box that looks like this.2421

Here we want to say if the cell value is greater than something then format it differently.2423

Let us see what our outer boundary is again.2436

We go to photos and unfortunately it will not let you put in a formula unless it is in the same work sheet.2440

We just have to put it in manually.2450

If it is greater than 549.375 then format it differently.2452

Hit format.2461

One thing I might want to do is just color it in a different color.2463

I think I will put in something like red.2469

Color it red if it is an outlier and hit okay.2472

If you look through this data file you will see we already have 2, 3, 4 outliers.2477

We have 7 outliers in our sample of 100.2490
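The conditional-formatting rule ("cell value is greater than 549.375") amounts to a simple comparison on each value. The sketch below marks values in text instead of coloring cells; the photo counts are hypothetical apart from the min (13), the max (4,686), and the upper limit, which are quoted in the lecture.

```python
UPPER_LIMIT = 549.375   # upper limit quoted in the lecture

def flag_outliers(values, upper_limit=UPPER_LIMIT):
    """Pair each value with True if it is above the upper limit."""
    return [(v, v > upper_limit) for v in values]

# Hypothetical photo counts; only 13 (min) and 4686 (max) come from the lecture.
photo_counts = [13, 48, 120, 252, 549, 550, 900, 4686]
for count, is_outlier in flag_outliers(photo_counts):
    marker = "  <-- outlier" if is_outlier else ""
    print(f"{count}{marker}")
```

Only an upper check is needed here because, as in the lecture, the lower limit falls well below the smallest value.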

We know that those 7 or so people are not going to be included.2497

In order to do a modified box plot, all we would do is plot these values instead of the max and min.2504

Here I am going to select these and use stock option.2514

Let us see if it will do for me on the first try.2524

Unfortunately no.2528

It gives me this error.2530

Instead I am just going to put something else just so that I could tell Excel please organize it differently.2533

I am going to go back to my stock.2542

I am going to color it something different.2548

I am going to delete this series because it is redundant.2557

Here are our tagged photos and what we see is this range is a lot bigger even though there is 2570

the same 25% of people in there as well as in here.2579

It is typically what it looks like for skewed distribution.2583

A skewed distribution will have a small whisker on one end and a large whisker on the other end.2587

Because even though it has the same number of people as there are in this little whisker down here, 2592

this one shows you there is a wider range that captures that same 25% of people.2602

This is a right skewed distribution because the top whisker is bigger than the bottom whisker.2609

One thing we might want to do is label our vertical axis to show number of photos.2617

Just so that we remember we are not talking about frequency of people here.2629

Let us use that same data in order to create stacked columns.2636

Go ahead and put in stock columns and instead of a classic box plot we are going to be using a modified box plot.2650

Let us put in our quartiles and data right here.2660

Go back and put in this one.2676

Let us see what we have.2681

We know we want to lock in this data in place and hit enter.2685

I could just drag that down.2693

We want to change this max value to be modified.2695

In order to do that we have to find IQR which is q3 – q1 and delete that and put in =.2704

My q3 + 1.5 × IQR and enter.2725

Let us find the distances between these.2735

The first one I do not have to do anything with that because it is the distance from 0.2741

The next one I just subtract c3 from c4 and copy and paste that all the way down.2745

Let us plot these distances.2757

I am going to select all of that and hit charts, go to columns and choose stacked columns.2764

Unfortunately it is not stacking them.2772

Here on my toolbar I am going to tell it organize differently and delete my series here because this is not what I need.2776

Now we have to do that fancy formatting.2792

The first one we know we just need to make it clear or transparent.2797

The second one we know we need to make it transparent as well as putting in a –errorbar 100% of that box.2805

Up here we also need that –errorbar 100% of my box with no fill.2819

Hit okay.2830

Here is what we have.2833

We should probably change that label because it does not have to say distance.2836

We should put something like tagged photos and on the side we will change our vertical axis so that it reads number of photos.2842

We know that is our values.2856

There we go we have a modified box plot.2859

It is no longer from 0 all the way to 5,000.2863

It is from 0 to about 600.2868

It is still a skewed distribution. Even when we lop off those crazy outliers, the whisker on top is still a lot longer 2872

than the whisker on the bottom even though they both indicate the same 25% of our sample.2889

I am going to minimize all this and get rid of that.2898

Now we know how to create conditional formatting so that we could signal these people are outliers.2906

We are not including them in our visualization.2916

We just went over modified box plots.2919

The classic box plot relies on the max and min but those are very susceptible to outliers.2924

If you have a couple of crazy outliers that would drastically change our visualization.2930

The modified box plot uses those modified boundaries, the quartiles plus or minus 1.5 times the IQR.2936

This is helpful to highly skewed distributions as we saw with tagged photos.2943
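Here is a minimal sketch of the modified whisker ends as used in this lecture, where the max (or min) is swapped for the outlier limit whenever it falls outside that limit. The function name is made up, and q3 = 252 is an inference: the lecture only says "around 250", and 252 is the value that reproduces the quoted limit of 549.375.

```python
def modified_whisker_ends(minimum, q1, q3, maximum, multiplier=1.5):
    """Whisker ends when min and max are capped at the outlier limits."""
    iqr = q3 - q1
    lower_limit = q1 - multiplier * iqr
    upper_limit = q3 + multiplier * iqr
    # Keep the real min or max when it already lies inside the limits.
    return max(minimum, lower_limit), min(maximum, upper_limit)

# min, q1, and max are quoted in the lecture; q3 = 252 is an inference
# (the lecture says "around 250") that reproduces the quoted 549.375.
print(modified_whisker_ends(13, 53.75, 252, 4686))
```

Many statistics packages use a slightly different convention, ending each whisker at the most extreme data value that lies inside the limits and drawing the outliers as separate points.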

Let us go into some examples.2949

Approximately what percentage of values in the data set lie within the box?2953

That box with the whiskers.2960

What percentage of the data lies in there?2965

We know that this is 25% of our sample as well as this also 25%.2967

Each of these quartiles are also 25% so we know that what is in here is 50% of our sample.2978

The lower whisker 25% and the upper whisker 25% and that is one of the reasons why box plots are useful.2991

It breaks up that data easily into chunks that we could use.3004

Example 2, what would a box plot look like for a data set that is skewed right?3010

In a skewed right distribution we have a few of these outliers that are very positive.3018

What would our box plot look like?3034

For data set that is skewed right, which I write in red, if we have a box plot that looks like this 3037

we know that the right side would be much longer than the left side.3053

Probably the right box would be longer than the left box.3060

When it is drawn up and down, remember that the top part is the greater side, so it would just be this same plot turned around.3065

This upper side is longer than the lower side.3088

What about those populations that are skewed left?3092

In that case there are more outliers at the smaller end of the continuum.3097

Skewed left.3106

In a side to side box it would be easy to see because that left side would be longer than the right side.3110

Probably even within the box you would have that left box being bigger than the right box.3119

Remember when we draw up and down the bottom end would be longer than the top end.3127

Here that is the positive end.3140

This side would be longer as well.3144

Skewed left distributions we know is either like this or like this.3149

Skewed right distributions we know might be one of this.3154

What about approximately normal distributions?3157

When you draw on each side you know that it has to be roughly symmetrical because it is approximately normal.3166

That part is easy.3175

Another thing about normal distributions is that most of the values, about 68%, are clustered within that small space.3177

That is within 1 standard deviation on either side.3190

We could guess that these little tails may be longer than the actual box.3194

That is how it looks, but it is roughly symmetrical.3205

It is easy to draw that on this side as well because it is roughly symmetrical.3210
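The box-versus-tails guess for a normal distribution can be checked with Python's standard library. This sketch works on the theoretical standard normal curve, not on any data from the lecture, and measures everything in standard deviations.

```python
from statistics import NormalDist

z = NormalDist()                 # standard normal: mean 0, standard deviation 1

q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
box_width = q3 - q1              # the IQR, in standard deviations
upper_limit = q3 + 1.5 * box_width
within_one_sd = z.cdf(1) - z.cdf(-1)

print(f"box runs from {q1:.3f} to {q3:.3f} (width {box_width:.3f})")
print(f"upper outlier limit at {upper_limit:.3f}")
print(f"share within 1 standard deviation: {within_one_sd:.3f}")
```

The box covers roughly 0.67 standard deviations on either side of the center, so it sits inside the 1 standard deviation band, and the distance from q3 out to the outlier limit is 1.5 times the width of the whole box.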

Example 3, how can you estimate the IQR from the box plot?3225

The IQR is easy to see on a box plot because whatever the box is the bottom end of that is q1 and the top is q3.3233

That is your IQR.3246

Can you estimate the range?3249

If so, how?3255

The range is the distance between your min and max values.3256

It is the length of that entire box and whiskers and that would be your range.3260
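Both quantities are simple subtractions. As a quick sketch, using the numbers from the by-hand example earlier in the lecture (min 18, q1 22, q3 36.5, max 55):

```python
def iqr_and_range(minimum, q1, q3, maximum):
    """Read the IQR (box length) and range (whisker tip to tip)."""
    return q3 - q1, maximum - minimum

# Numbers from the by-hand example earlier in the lecture.
iqr, value_range = iqr_and_range(18, 22, 36.5, 55)
print(iqr, value_range)
```

On a modified box plot the whisker tips may be outlier limits and not the true min and max, so the range read this way would only be an estimate.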

IQR is easy to see on a box plot.3268

Example 4, is it possible for a box plot to be missing a whisker?3275

If so, give an example.3282

If not, explain why not.3284

Let us think about this.3286

We know that when it is skewed, let us say skewed right, we know that this side of whisker can be very small and this side can be very long.3287

What would it mean for it not to have a whisker at all?3302

One thing might be that you might have so many values that are exactly the same so you cannot split it up in a whisker and box.3306

For instance, let me give you an example of data set that might do so.3318

Let us say it is out of 8 values.3323

I just picked 8 because it is easy to split into quartiles.3330

What if all of the values in both the first and second quartiles are exactly the same?3334

Then you could not have whiskers because it would be arbitrary like these 0 get whiskers and this is supposed to be in the box.3343

You probably put them all on the same box.3359

This would be a very thin box.3362

That is an example.3365

Obviously you could do it on the other side as well.3369

This might be an example of a left skewed distribution that does not have a whisker.3373

It is something like 4, 4, 3.3386

In these kinds of plots, it is hard to say what the IQR would be.3401

It would be difficult to arbitrarily say 4 is the boundary for the whisker as well as the box.3409

It cannot be both.3420

These are the kinds of distributions that would be missing a whisker.3421
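A missing whisker shows up numerically as a whisker of length 0. The data set below is hypothetical (8 values whose lower half is all the same number), and the quartiles use the inclusive interpolation convention, which is an assumption here.

```python
import statistics

# Hypothetical data set of 8 values whose lower half is all the same value.
data = [0, 0, 0, 0, 0, 5, 6, 9]
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

lower_whisker = q1 - min(data)     # length of the lower whisker
upper_whisker = max(data) - q3     # length of the upper whisker
print(min(data), q1, median, q3, max(data))
print("lower whisker:", lower_whisker, "upper whisker:", upper_whisker)
```

Because the min, q1, and the median all land on the same value, there is nothing for the lower whisker to span, while the upper whisker still has length.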

That is the end of box and whisker plots.3429

Thanks for using