For more information, please see full course syllabus of Statistics

### Five Number Summary & Boxplots

- Intro 0:00
- Roadmap 0:06
  - Roadmap
- Summarizing Distributions 0:37
  - Shape, Center, and Spread
  - 5 Number Summary
- Boxplot: Visualizing 5 Number Summary 3:37
  - Boxplot: Visualizing 5 Number Summary
- Boxplots on Excel 9:01
  - Using 'Stocks' and Using Stacked Columns
  - Boxplots on Excel Example
- When are Boxplots Useful? 32:14
  - Pros
  - Cons
- How to Determine Outlier Status 33:24
  - Rule of Thumb: Upper Limit
  - Rule of Thumb: Lower Limit
  - Signal Outliers in an Excel Data File Using Conditional Formatting
- Modified Boxplot 48:38
  - Modified Boxplot
- Example 1: Percentage Values & Lower and Upper Whisker 49:10
- Example 2: Boxplot 50:10
- Example 3: Estimating IQR From Boxplot 53:46
- Example 4: Boxplot and Missing Whisker 54:35

### Transcription: Five Number Summary & Boxplots

*Hi and welcome back to www.educator.com.*0000

*Today we are going to talk about the five number summary and box plots.*0002

*Now that you know a little bit about variability now we can talk about the 5 number summary*0005

*and the box plot is going to be a way of visualizing that 5 number summary.*0013

*We are going to talk about how to determine outliers and we are going to be using conditional formatting in Excel.*0019

*Finally we are going to be talking about modified box plots which are going to be box plots where we exclude the outliers already.*0027

*There are 2 ways to summarize a distribution.*0037

*We have been talking about one way when we talk about shape, center, and spread.*0042

*Usually when we talk about shape, center, and spread we use the mean as the measure of center.*0048

*The mean is what we sometimes mean by center.*0056

*It could be median but a lot of times people use mean.*0059

*When we talk about spread we often mean standard deviation, rather than another pairing such as the median and inter quartile range.*0063

*That is where the 5 number summary comes in.*0072

*The 5 number summary is a way of using the median and the inter quartile range more as a summary of how our distribution looks.*0077

*The 5 numbers are numbers you already know.*0089

*The minimum value; q1, the border of the first quartile; the median, which is q2 and divides the entire distribution in half.*0093

*Two quartiles on one side and two quartiles on the other side,*0112

*Q3 the third border and the maximum value, the highest value in your distribution.*0115

*This 5 number summary is often used in skewed distributions.*0124

*Let me show you visually what this looks like.*0128

*Let us say we have a whole bunch of different data points and here is our maximum value.*0131

*Here is our distribution so far.*0142

*The 5 number summary will basically say the minimum value is important, the maximum value is important.*0149

*Whatever the value in the middle is.*0160

*Q1 and q3 which would be dividing this by half and also dividing this by half.*0162

*Here is min, q1, q2, median, q3, and max.*0182

*In that way we get a 5 number summary.*0206

*We may not know our little dots in our distribution but we have a general idea of it.*0210

*Now we are going to talk about a way to visualize that 5 number summary just quickly.*0216

*It is going to be called the box plot.*0222

*It is also called the box and whiskers plot.*0224

*It is called box and whiskers because often times there is a box, it could also be on its side like this with whiskers on it.*0227

*Often times this box is split up somewhere in between not necessarily in the middle but just somewhere like that.*0240

*This box is often aligned with an axis of values of your variable.*0248

*Your q1 is on the low end, q3 is on the high end.*0256

*Those are the boundaries of that box.*0270

*The line in the middle is the median.*0274

*The whiskers end at the minimum value and the maximum value.*0276

*That is how you decipher a box plot.*0283

*It could also be on its side like this.*0288

*In this case the values will go like this.*0291

*Once again you have q1, q3, median, min, and max.*0297

*It does not matter whether it is on this axis or this axis but what you do have to know is the box should be in alignment with the values.*0310

*Let us do an example box plot here just by hand.*0321

*Here is an example set and we can easily see the minimum and the maximum.*0328

*We already know that.*0334

*Let us find out what the median is.*0336

*The median should be right here.*0338

*That should be 30 because what I am going to do is add 28 and 32 and divide by 2.*0341

*Just to find the average of those two numbers and that is 30.*0351

*Visually I just think about what is in the middle of 28 and 32.*0355

*Let us find q1 and q3.*0359

*What is in between here and here?*0364

*That is 20.*0366

*We count the 18 as the border, 30 as the border.*0386

*Q1 is right in here so that should be 22 and here 30 is the border, 55 it should be right here.*0393

*Here is q3 and that should be about 36.5.*0409

*There we go we have our 5 numbers and we just need to plot them.*0425
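The hand calculation above can be sketched in Python. This is a minimal sketch, assuming the "median of each half" way of finding q1 and q3 that the lecture uses by hand. The function name `five_number_summary` and the sample list are hypothetical: the full data set is not in the transcript, so the values were only chosen to agree with the numbers that are quoted (18, 22, 30, 36.5, 55).

```python
from statistics import median


def five_number_summary(values):
    """Return (min, q1, median, q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above it (middle value skipped when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical ages, chosen to be consistent with the numbers quoted in the lecture.
ages = [18, 20, 24, 28, 32, 35, 38, 55]
summary = five_number_summary(ages)
print(summary)  # (18, 22.0, 30.0, 36.5, 55)
```

Spreadsheet functions usually interpolate instead of taking the median of each half, so their q1 and q3 can differ slightly from these values.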

*I am going to put it on a horizontal axis just because I have no room for it.*0431

*I know that 18 will be on one side of it and 55 will be on the other side.*0439

*Here I will put my values.*0445

*We could pretend this stands for age.*0447

*Maybe I will start from 0 and divide this up into 10’s.*0453

*10, 20, 30, 40, 50, 60.*0466

*Let us start plotting our box and whiskers.*0474

*What I like to start to do is put dots there and draw in the box and whiskers.*0479

*Here I put in a dot in one of my whiskers and here is the other end of that.*0484

*Q1 is right here, 30 right here is the median, and 36.5 is right here.*0492

*I know where my boundaries are.*0508

*I will just draw my box and whiskers.*0512

*To help myself I might just write down where q1 and q3 are because that is going to help me draw my box.*0514

*Here is my median, min, max.*0524

*This would be a box and whisker plot or a box plot for this data set that we have here.*0534

*Now let us talk about making a box plot in Excel.*0540

*Excel is not a great program for making box plots.*0544

*If you have Stata or other statistics programs those are better options.*0550

*For some of you, you want to make a box plot in Excel so I am going to show you 2 ways of doing it.*0555

*One method is by using stocks.*0561

*This is a stocks chart and it is not originally designed for plotting median, q3, min, and max.*0564

*It is for stock prices and opening and closing prices.*0573

*We are going to piggy back on it in order to create our box plots.*0578

*The trouble with that one is that it does not let you show the median.*0584

*This ends up being a 4 number summary.*0588

*The other option is by using stacked columns.*0592

*In this option you can show the median but it takes a little more fiddling with it.*0603

*I am tricking Excel into showing you something that looks like a box plot.*0610

*Let us get started.*0614

*Go ahead and download the Excel file and you will see there is a whole bunch of different sheets that are already pre put in for you.*0617

*Some of them, the ones that start with a and s, are pre answered.*0629

*They already have the graphs and everything in there but you could also follow along.*0634

*If you go to height stock, it is going to plot height on a stock chart.*0638

*The nice thing about box plots is that you could have several box plots on one visualization.*0645

*That is what is nice about being able to compare a couple of 5 number summaries.*0654

*Let us go ahead and find the critical numbers we need for male and female height.*0661

*The thing about stock charts is that they need you to put in the numbers in a particular order.*0668

*This is the order you need.*0677

*You need q1, max, min, and q3.*0679

*This one is hard for me to remember because it is very arbitrary.*0683

*But that is because we are piggy backing off stocks.*0693

*It was not originally meant for this use.*0697

*Let us find the boundary of the q1.*0700

*In order to do that, Excel has a nice function for us called quartile and here is what you have to put in quartile.*0706

*The input that it receives is the array or your data set as well as which quartile you would like.*0715

*In Excel, q1 is 1 so you would put in your data,1.*0722

*Let us go ahead and do that.*0729

*Go to data all the way up to height.*0731

*There is male height and I will color it in blue.*0735

*Go ahead and select that data then I am going to put in ,1).*0739

*Let us look at what we have here.*0749

*We have this data,1.*0751

*Let us lock that data in place because we do not need that to move in any time soon.*0753

*I am going to hit enter.*0758

*That is for q1.*0763

*In order to find the median it would be q2 or median.*0766

*In order to find the boundary for q3 it would just be the same thing with the 3.*0770

*Let us go ahead and copy and paste that function into my q3 slot.*0776

*All I have to do is change which quartile I want it to give back to me.*0783

*There I have my first and third quartiles.*0789

*The nice thing about this quartile function is that it can also find you the min and max values because it considers the min value q0.*0794

*It considers the max value q4.*0800

*You could also use the functions min and max.*0807

*Let us go ahead and use quartile.*0810

*It makes it easier.*0812

*I could just copy and paste all of that in there.*0813

*I am going to double click on that maximum and put in 4.*0816

*I am going to double click on the min quartile and put in 0.*0821

*There we go.*0828

*We now have q1, max, min, and q3.*0830
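The quartile lookup used above, where 0 returns the min and 4 returns the max, can be imitated outside of Excel. The sketch below is a hypothetical `quartile` helper that interpolates linearly between neighbouring sorted values; that this matches Excel's own calculation is an assumption, not something stated in the lecture, and the heights are made up.

```python
def quartile(values, quart):
    """Mimic a spreadsheet-style QUARTILE(array, quart) with quart in 0..4."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3, or 4")
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    position = (len(data) - 1) * quart / 4   # fractional index into the sorted data
    low = int(position)                      # index of the value at or below the position
    fraction = position - low
    if fraction == 0:
        return data[low]
    # Interpolate linearly between the two neighbouring values.
    return data[low] + fraction * (data[low + 1] - data[low])


heights = [62, 65, 67, 68, 70, 71, 72, 75]  # hypothetical heights in inches
# Same order the stock chart asks for: q1, max, min, q3.
stock_order = [quartile(heights, q) for q in (1, 4, 0, 3)]
print(stock_order)  # [66.5, 75, 62, 71.25]
```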

*Let us find the female q1.*0835

*Let us go ahead and find our data.*0840

*I am just going to color it in pink.*0844

*I am going to put in ,1).*0847

*Let us see what we have here.*0857

*You double click on it we have our data and q1.*0859

*I am just going to lock that data in place so I could easily copy and paste that all the way down.*0862

*Let us change it for our max value.*0874

*For our max value we need q4.*0879

*For our min value we need q0.*0882

*For our q3 we need q3.*0886

*Notice that for females the max is slightly smaller.*0891

*The min is slightly smaller.*0898

*So are q1 and q3, compared with their male counterparts.*0899

*Let us go ahead and put in our box and whiskers plot.*0903

*Go ahead and select the male and female part of this data set.*0909

*When you do that, Excel will interpret it as you want it to be the labels for your box and whisker plots.*0915

*Click on charts and if you click on stocks you will see something that looks like a box and whisker plot.*0923

*You could go ahead and click on it but some of you may get an error signal that looks like this.*0934

*To create a stock chart, arrange the data on your sheet in this order opening price, high price, low price, closing price.*0945

*Use dates or stock names as labels.*0954

*We have done this already.*0956

*It has the opening price or q1, high price is max, low price is min, closing price is q3.*0958

*Yet Excel does not recognize that we have done the right thing.*0967

*The reason is that Excel does not see that we organized our data into columns instead of rows.*0971

*Here is one thing you might want to do.*0980

*Just go ahead and pick column and you will see that Excel is treating each row as if it is a data set.*0983

*What we want to tell Excel is stop doing that instead organize it this way so that it is grouping them into columns.*0998

*I am going to move this formatting palette out of the way and now if we go back to stock and click on stocks it should recognize it.*1014

*It is not elegant but that is the way Excel is sometimes.*1031

*You have to force it to do what you want.*1036

*I am going to delete this series because now it is treating each of these 4 rows as a series but we do not need that information.*1038

*We do not need to know how they go across.*1049

*We just need to know how they go up and down.*1055

*I am just going to change my formatting here so you could see it a little more clearly.*1057

*White might be a little tough to see.*1063

*Feel free to pick any color you want.*1065

*Here is our box plot or classic box plot.*1082

*You will see that our values are clustered around this end and one thing I want to do is modify my y axis*1087

*so that I could stretch out the values that are important to us.*1098

*The way that I will do that is go to scale and put in my min value as 50 instead of 0.*1105

*When you do that it will stretch this part out from 50 to 80 instead of from 0 to 80.*1115

*Now we could see our box plot.*1123

*Here is our min for males which is below 62.*1127

*The max is 75.*1133

*Q1 which starts at 67.*1135

*Q3 which ends at 71.*1139

*Here is an important thing about a box plot.*1145

*Before in frequency distributions, we always have a value on the bottom.*1148

*Height would be on the bottom.*1154

*Now height in inches is shown on the y axis.*1156

*Because that is an important distinction, I am going to label my vertical axis to say height.*1162

*Now we could remember this is where height is.*1181

*Notice that frequency is not shown here.*1189

*We do not know exactly how many people have a height of 70 or 75.*1192

*We only know that those are the boundaries.*1198

*There is one thing we do know, we know that these are split up into quartiles.*1201

*This represents the quartile from 0 to 1.*1208

*This represents the quartile as well, the border from q3 to the end.*1213

*About 25% of our participants have heights in this range.*1221

*Another 25% have height in this range.*1226

*Using our powers of deduction how many people are in this range?*1230

*50%.*1236

*If we had a median line, we would also know where our other two quartiles break up.*1237

*Unfortunately, in this kind of formatting with the stock plots we do not show you the median.*1242

*In order to see the median because it is an important number sometimes, let us go ahead and use the stacked columns option.*1250

*Go ahead and click on height stock column.*1260

*Here I have set it up so that now it is in a format that makes sense for me.*1264

*Something like min, q1, median, q3, and max.*1271

*That is the order that makes more sense to me, or you could order it the other way: max, q3, median, and so on.*1277

*The reason I have done that is because it is going to make it a little bit easier for us to create our box plot if it is in order.*1285

*Let us go ahead and find our quartiles.*1295

*Let us find our data and put that in for males ,0 because we are starting at 0) and enter.*1301

*Notice that this is not new.*1319

*Instead of putting in 0 here is what I am going to do now.*1324

*I am going to click on this 0 because this way the more I make things into formulas the more I can just cut and paste.*1328

*The nice thing about clicking on this 0 is that because I have not put in any $ around it, it is a relative reference.*1340

*If I cut and paste this down one it will cut and paste this one down as well.*1350

*It will refer to the next one down.*1355

*I do not want my data to move down.*1357

*Let us go ahead and lock down in place and hit enter.*1360

*Once I have that I could just drag this down, copy and paste as we go.*1367

*We have all 5 of our values for our 5 numbers summary.*1374

*We did not have to do anything.*1378

*We could see that they are in order from least to greatest.*1380

*Let us do the same for females.*1385

*We have quartile, find our data and put in ,0) and hit enter.*1387

*All I am going to do is lock my data in place.*1411

*Sometimes I forget copying and pasting this.*1416

*I could just copy and paste that all the way down.*1419

*So far that part has been easy.*1426

*Here is where we get a little more complicated in order to create stacked column box plots.*1429

*If you go to charts and go to columns you are going to be using the stacked column.*1438

*In order to use stacked column, one thing you have to do is find out the distance between each of these things.*1446

*In stacked columns, what we will do is to put in a little box on top of another little box.*1454

*Each of those boxes represents the distance from the previous box to the next box.*1465

*We have to change this into distances.*1470

*I am just going to put in here distance for males and for females.*1474

*The first one is the distance from 0.*1491

*It is just the min value itself.*1497

*The second one is the distance from this value to this value.*1500

*Once I have that I can just copy and paste that all the way down because this will give us the distance between these two.*1507

*Here we have the distance between these two.*1517

*In order to do that for females all I have to do is copy and paste that over because Excel will do it for the column on the right.*1522

*Now we have our distances.*1535
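The distance step can be sketched as follows. This is only an illustration of the idea, assuming a five number summary listed from least to greatest; the helper name `stacked_segments` is hypothetical, and the summary values are made up apart from the q1, q3, and max quoted for males.

```python
from itertools import accumulate


def stacked_segments(summary):
    """Turn (min, q1, median, q3, max) into the heights of five stacked boxes."""
    heights = [summary[0]]                       # first box: distance from 0 up to the min
    for lower, upper in zip(summary, summary[1:]):
        heights.append(upper - lower)            # each later box: gap to the next number
    return heights


male_summary = (62, 67, 69, 71, 75)   # hypothetical five number summary, in inches
segments = stacked_segments(male_summary)
print(segments)                    # [62, 5, 2, 2, 4]
print(list(accumulate(segments)))  # [62, 67, 69, 71, 75]
```

Stacking the segments back up recovers the original five numbers, which is why the stacked column chart can stand in for a box plot.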

*Because of that we could just take that and put on column.*1538

*Notice that right now it is putting each row as a column, it is not what we want.*1551

*We want each column to be a column.*1559

*I am going to tell Excel organize it this way.*1562

*I will leave this one aside.*1571

*Feel free to use whatever coloring scheme you would like just keep it consistent I am going with orange.*1576

*Here is what we need to do now.*1598

*This does not look like a box and whisker plot yet.*1602

*In fact there are no whiskers.*1606

*That is why we need to fudge a little bit.*1609

*One thing we have to do is start by getting rid of this block.*1612

*We do not want to hit delete because that will just delete and ignore that data.*1617

*That is not what we wanted to do.*1624

*I want to look at these things as floating up here.*1625

*Instead what I am going to do is tell Excel color these clear.*1629

*I am just double clicking on this and fill and no fill.*1634

*It is invisible now.*1640

*In terms of line, I think there might be no line.*1643

*Even though it is there, you could find it.*1649

*It is just invisible because we do not need that part.*1658

*It is just the distance from 0 all the way to our min value.*1663

*Here we want this to be our low whisker.*1667

*Here is what we do.*1672

*We ask Excel to make this one invisible then we use error bars.*1673

*Error bars usually start at the top of that box.*1689

*Because we are on the top of the box we would like a bar that goes down through the box entirely.*1694

*We put in a minus error bar and we ask for it to be 100% of our box.*1702

*When I do so, you see this little line here that is 100% of my box.*1712

*This is an Excel cheat.*1719

*We could hit okay and we have our low end whisker.*1722

*We do not need to do anything with that.*1729

*We could leave that because that is our q1 to median box.*1733

*If you hit the arrow button it will go to the next box up.*1739

*This is the box that goes from median to q3, and this top one is the box that goes from q3 to max.*1746

*We need to click on that and make it invisible and give that an error bar as well.*1761

*That is 100%.*1770

*In some of your Excel options you might be able to get rid of these little feet so that they look like little bars.*1773

*In newer versions of Excel that is the case.*1790

*That is what I usually do because I think it looks ugly to have those little feet there.*1793

*It is not a big deal to have the feet.*1798

*Let us change our y axis so that we could see our box plot a little better and just hit scale.*1803

*I just double clicked on it and I am going to make my min value 50 and hit okay.*1816

*I do not want it to say distance of males instead I want it to say just males.*1825

*I am going to change this into males.*1834

*Excel will do this automatically for me.*1838

*Finally what I am going to do is add a little label to my vertical axis so that I remember that height in inches is being shown here.*1847

*Here we could see for males that the quartiles split up quite nicely.*1864

*But the quartiles here seem to show a greater range at the low end than at the higher end.*1871

*The spread is bigger between q1 and median than the median and q3.*1887

*Our 2nd quartile is slightly fatter than our 3rd quartile.*1896

*But our 1st and last quartiles are even bigger than these two.*1901

*Although that is a bit of a fudge, you could see we could use stacked columns in order to create box plots as well.*1908

*That is what I mean by slightly idiosyncratic.*1921

*It is not the best way but it can be done.*1930

*Let us talk about when box plots are useful.*1932

*The pros of box plots are that you can plot a single quantitative variable and compare 2 or more distributions easily.*1938

*In fact box plots are useful for comparing lots of distributions very quickly because it is a good visualization.*1947

*It is very compact.*1954

*You put it all in one graph and it is easy to compare the boxes.*1956

*When a distribution has many values, too many to show individually, you would not want to use a stem plot or a dot plot.*1962

*You would want to use something like a box plot.*1972

*Also it is nice when you do not want to see anything more than a 5 number summary.*1974

*One of the cons is that you cannot see frequency.*1979

*You cannot see how many people have a height of 72, but you can see frequency within a range.*1983

*You know that 50% of the people lie in between q1 and q3.*1993

*You can see that easily on a box plot.*1999

*How do we determine whether we have outliers in our data set?*2002

*Outliers are extreme values but that is a subjective way of defining it.*2012

*How do we know how extreme?*2020

*Who defines extreme?*2021

*We have a rule of thumb for that in statistics.*2024

*Typically we say that an outlier is a value more than 1.5 times the inter quartile range away from the nearest quartile.*2027

*What we mean is that for the upper limit the nearest quartile is q3, and you add 1.5 times the inter quartile range, whatever that is.*2038

*That is the upper limit boundary.*2053

*The lower limit boundary would be from the q1 boundary and because we are going lower*2056

*we are subtracting 1.5 times the inter quartile range.*2064
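The two limits can be written as a small function. This is a minimal sketch of the rule of thumb just described; the name `outlier_limits` and the `multiplier` parameter are illustrative choices, and the quartiles plugged in are the ones from the hand-drawn example earlier in the lecture.

```python
def outlier_limits(q1, q3, multiplier=1.5):
    """Return (lower_limit, upper_limit) from the 1.5 x IQR rule of thumb."""
    iqr = q3 - q1
    return (q1 - multiplier * iqr, q3 + multiplier * iqr)


lower, upper = outlier_limits(22, 36.5)   # quartiles from the hand-drawn example
print(lower, upper)  # 0.25 58.25
```

In that example the min of 18 and the max of 55 both sit inside these limits, so neither would be flagged as an outlier.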

*Sometimes in skewed distributions where the 5 number summary is useful you might not need one of these,*2070

*because one side might not have outliers; only the other side does.*2078

*If it is right skewed we have the upper limit boundary.*2085

*If it is left skewed we have a lower limit boundary.*2088

*We are going to learn how to signal outliers in Excel data file by using conditional formatting.*2092

*Go ahead and click on your example spread sheet again Excel file.*2100

*Let us look at the photos stock sheet.*2110

*Photos if you remember from our previous frequency distribution visualizations that tends to be a skewed distribution.*2121

*It tends to be that photos are skewed to the right where there are some people that have an extremely high number of photos on one end.*2133

*Let us determine what the outlier boundaries are for tagged photos.*2148

*In order to do that, we might want to start off by finding q1, max, min, and q3.*2154

*Let us go ahead and put in quartile and put in our data for tagged photos.*2161

*I am just going to put in ,1 and hit enter.*2181

*Our first quartile is going to be locked in place.*2188

*Our 1st quartile is 53.75 that is the boundary.*2198

*I am just going to copy and paste that all the way down.*2207

*I will just change the quartile that we are referencing.*2210

*We need the max here so that is 4.*2213

*We need the min here which is 0.*2216

*We need q3.*2221

*Notice that q3 is around 250 but the max value is 4,686.*2224

*I am guessing we have a few outliers here but to find outliers we need to find the inter quartile range.*2237

*The inter quartile range is easy once you have q3 and q1 because it is simply q3 – q1.*2246

*Instead of having max and min let us modify this.*2259

*Let us modify it to be the upper and lower limit.*2282

*Instead of a classic box plot, we are going to be looking at a modified box plot.*2287

*Instead of this max and min we are going to be putting our formulas for the upper and lower limit.*2299

*The min value we probably would not need to change because 13 is going to be within the boundary,*2311

*because if we took q1 – 1.5 × 200 that would be way lower than our actual min value.*2323

*We do not need to modify that one.*2331

*Let us modify this one.*2333

*Here what we do is take our q3 boundary + 1.5 × IQR.*2337

*Excel knows order of operations so it will do the multiplication before it does the addition.*2349

*Let us hit enter.*2356

*Notice that our outer boundary is a lot smaller than our max value.*2358

*It is 549.375.*2367
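Written out, the arithmetic behind that number looks like this. The lecture only says q3 is "around 250"; the value $q_3 = 252$ used here is an inference, since it is the value consistent with the quoted $q_1 = 53.75$ and the quoted upper limit of 549.375.

$$
\begin{aligned}
\text{IQR} &= q_3 - q_1 = 252 - 53.75 = 198.25 \\
\text{upper limit} &= q_3 + 1.5 \times \text{IQR} = 252 + 297.375 = 549.375
\end{aligned}
$$

The lower limit would be $53.75 - 297.375 = -243.625$, which is well below the actual min of 13, so only the upper limit matters for this data.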

*It would be nice if we could go back to our data and see quickly who is in that range and who is not.*2372

*Who is out of that range, who are my outliers?*2380

*If we go out to the data, what I am going to show you is something called conditional formatting.*2384

*Go ahead and select all the data that we want Excel to look at.*2390

*What we are going to let Excel do is color these data points if they are outliers.*2397

*In order to do that, after you select it, go to your menu which we cannot see here and hit format.*2405

*There should be an option that is conditional formatting.*2415

*Go ahead and click on that.*2419

*You should get a box that looks like this.*2421

*Here we want to say if the cell value is greater than something then format it differently.*2423

*Let us see what our outer boundary is again.*2436

*We go to photos and unfortunately it will not let you put in a formula unless it is in the same work sheet.*2440

*We just have to put it in manually.*2450

*If it is greater than 549.375 then format it differently.*2452

*Hit format.*2461

*One thing I might want to do is just color it in a different color.*2463

*I think I will put in something like red.*2469

*Color it red if it is an outlier and hit okay.*2472

*If you look through this data file you will see we already have 2, 3, 4 outliers.*2477

*We have 7 outliers in our sample of 100.*2490
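The same "greater than the limit" test that the conditional formatting applies can be sketched in Python. This is only an illustration: `flag_outliers` is a hypothetical helper, and apart from the min of 13, the max of 4,686, and the 549.375 limit, the photo counts below are made up.

```python
def flag_outliers(values, upper_limit):
    """Pair each value with True when it is above the upper limit."""
    return [(value, value > upper_limit) for value in values]


# 13 and 4686 are the min and max from the lecture; the other counts are made up.
tagged_photos = [13, 48, 130, 252, 549, 600, 4686]
flags = flag_outliers(tagged_photos, upper_limit=549.375)
outliers = [value for value, is_outlier in flags if is_outlier]
print(outliers)  # [600, 4686]
```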

*We know that those 7 or so people are not going to be included.*2497

*In order to do a modified box plot, all we would do is plot these values instead of the max and min.*2504

*Here I am going to select these and use stock option.*2514

*Let us see if it will do for me on the first try.*2524

*Unfortunately no.*2528

*It gives me this error.*2530

*Instead I am just going to put something else just so that I could tell Excel please organize it differently.*2533

*I am going to go back to my stock.*2542

*I am going to color it something different.*2548

*I am going to delete this series because it is redundant.*2557

*Here are our tagged photos and what we see is this range is a lot bigger even though there is*2570

*the same 25% of people in there as well as in here.*2579

*It is typically what it looks like for skewed distribution.*2583

*A skewed distribution will have a small whisker on one end and a large whisker on the other end.*2587

*Because even though it has the same number of people as there are in this little whisker down here,*2592

*this one shows you there is a wider range that captures that same 25% of people.*2602

*This is a right skewed distribution because the top whisker is bigger than the bottom whisker.*2609

*One thing we might want to do is label our vertical axis to show number of photos.*2617

*Just so that we remember we are not talking about frequency of people here.*2629

*Let us use that same data in order to create stacked columns.*2636

*Go ahead and put in stacked columns and instead of a classic box plot we are going to be using a modified box plot.*2650

*Let us put in our quartiles and data right here.*2660

*Go back and put in this one.*2676

*Let us see what we have.*2681

*We know we want to lock in this data in place and hit enter.*2685

*I could just drag that down.*2693

*We want to change this max value to be modified.*2695

*In order to do that we have to find IQR which is q3 – q1 and delete that and put in =.*2704

*My q3 + 1.5 × IQR and enter.*2725

*Let us find the distances between these.*2735

*The first one I do not have to do anything with that because it is the distance from 0.*2741

*The next one I just subtract c3 from c4 and copy and paste that all the way down.*2745

*Let us plot these distances.*2757

*I am going to select all of that and hit charts, go to columns and choose stacked columns.*2764

*Unfortunately it is not stacking them.*2772

*Here on my toolbar I am going to tell it organize differently and delete my series here because this is not what I need.*2776

*Now we have to do that fancy formatting.*2792

*The first one we know we just need to make it clear or transparent.*2797

*The second one we know we need to make transparent as well as putting in a minus error bar that is 100% of that box.*2805

*Up here we also need that minus error bar that is 100% of my box, with no fill.*2819

*Hit okay.*2830

*Here is what we have.*2833

*We should probably change that distance because it does not have to have distance.*2836

*We should put something like tagged photos and on the side we will change our vertical axis so that it reads number of photos.*2842

*We know that is our values.*2856

*There we go we have a modified box plot.*2859

*It is no longer from 0 all the way to 5,000.*2863

*It is from 0 to about 600.*2868

*It is still a skewed distribution; even when we lop off those crazy outliers we still have a whisker on top that is a lot longer*2872

*than our whisker on the bottom, even though they both indicate the same 25% of our sample of 100.*2889

*I am going to minimize all this and get rid of that.*2898

*Now we know how to create conditional formatting so that we could signal these people are outliers.*2906

*We are not including them in our visualization.*2916

*We just went over modified box plots.*2919

*The classic box plots rely on the max and min but those are very susceptible to outliers.*2924

*If you have a couple of crazy outliers that would drastically change our visualization.*2930

*The modified box plots use those modified boundaries that are 1.5 times the IQR beyond q1 and q3.*2936

*This is helpful to highly skewed distributions as we saw with tagged photos.*2943
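The modified whisker ends can be sketched like this, following the lecture's approach of plotting the limit itself in place of an extreme max or min. The helper name `modified_whisker_ends` is hypothetical, and q3 = 252 is inferred from the quoted 549.375 limit rather than stated in the lecture.

```python
def modified_whisker_ends(minimum, q1, q3, maximum, multiplier=1.5):
    """Return (low_end, high_end), pulling each whisker in to its outlier limit."""
    iqr = q3 - q1
    lower_limit = q1 - multiplier * iqr
    upper_limit = q3 + multiplier * iqr
    low_end = max(minimum, lower_limit)    # keep the real min unless it is an outlier
    high_end = min(maximum, upper_limit)   # cap the max at the upper limit
    return (low_end, high_end)


# min, q1, and max are quoted in the lecture; q3 = 252 is inferred from the 549.375 limit.
print(modified_whisker_ends(13, 53.75, 252, 4686))  # (13, 549.375)
```

Many statistics packages use a slightly different convention: they end each whisker at the most extreme data value that is still inside the limits and draw the outliers as separate points.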

*Let us go into some examples.*2949

*Approximately what percentage of values in the data set lie within the box?*2953

*That box with the whiskers.*2960

*What percentage of the data lies in there?*2965

*We know that this is 25% of our sample as well as this also 25%.*2967

*Each of these quartiles are also 25% so we know that what is in here is 50% of our sample.*2978

*The lower whisker 25% and the upper whisker 25% and that is one of the reasons why box plots are useful.*2991

*It breaks up that data easily into chunks that we could use.*3004
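A quick check of those percentages, as a sketch on made-up data (the whole numbers 1 to 100). It uses `statistics.quantiles` from the Python standard library with the inclusive method; which quartile method Excel uses is not stated in the lecture, so treat the exact cut points as an assumption.

```python
from statistics import quantiles

values = list(range(1, 101))                      # hypothetical data: 1, 2, ..., 100
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

in_lower_whisker = sum(1 for v in values if v < q1)
in_box = sum(1 for v in values if q1 <= v <= q3)
in_upper_whisker = sum(1 for v in values if v > q3)

total = len(values)
shares = [100 * count / total for count in (in_lower_whisker, in_box, in_upper_whisker)]
print(shares)  # [25.0, 50.0, 25.0]
```

With small data sets or many tied values the split is only approximately 25%, 50%, 25%.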

*Example 2, I want to know what a box plot looks like for a data set that is skewed right.*3010

*In a skewed right distribution we have a few of these outliers that are very positive.*3018

*What would our box plot look like?*3034

*For data set that is skewed right, which I write in red, if we have a box plot that looks like this*3037

*we know that the right side would be much longer than the left side.*3053

*Probably the right box would be longer than the left box.*3060

*When it is drawn up and down, remember that the top part is the greater side, so it is the top that would be spread out.*3065

*This upper side is longer than the lower side.*3088

*What about those populations that are skewed left?*3092

*In that case there are more outliers at the smaller end of the continuum.*3097

*Skewed left.*3106

*In a side-to-side box plot it would be easy to see, because the left side would be longer than the right side.*3110

*Probably even within the box you would have the left part being bigger than the right part.*3119

*Remember, when we draw it up and down, the bottom end would be longer than the top end.*3127

*Here that is the positive end.*3140

*This side would be longer as well.*3144

*Skewed left distributions, we know, look either like this or like this.*3149

*Skewed right distributions, we know, might look like one of these.*3154
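
The whisker comparison described above can be written as a rough check on a five number summary. This is a crude heuristic sketch, not a formal skewness test; the function name `skew_hint` and the example summaries are hypothetical.

```python
def skew_hint(minimum, q1, median, q3, maximum):
    """Guess the skew direction by comparing whisker lengths.

    Hypothetical helper: a longer upper whisker suggests a right skew,
    a longer lower whisker suggests a left skew.
    """
    upper_whisker = maximum - q3
    lower_whisker = q1 - minimum
    if upper_whisker > lower_whisker:
        return "skewed right"
    if lower_whisker > upper_whisker:
        return "skewed left"
    return "roughly symmetric"

# Made-up five number summaries.
print(skew_hint(0, 10, 20, 40, 100))   # skewed right
print(skew_hint(0, 60, 80, 90, 100))   # skewed left
print(skew_hint(0, 25, 50, 75, 100))   # roughly symmetric
```

The `median` argument is unused in this simple version; comparing the two halves of the box around the median would be a natural extension.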

*What about approximately normal distributions?*3157

*When you draw on each side you know that it has to be roughly symmetrical because it is approximately normal.*3166

*That part is easy.*3175

*Another thing about normal distributions is that most of the values, about 68%, are clustered within that small space,*3177

*within 1 standard deviation on either side.*3190

*We could guess that these little tails may be longer than the actual box.*3194

*That is how it looks, but it is roughly symmetrical.*3205

*It is easy to draw that on this side as well because it is roughly symmetrical.*3210
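
For a normal distribution, the share of values within 1 standard deviation and the width of the box can be checked with Python's standard library. This sketch assumes an idealized standard normal distribution (mean 0, standard deviation 1), not a real sample.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of values within 1 standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# Box edges: the 25th and 75th percentiles, in standard deviation units.
q1 = standard_normal.inv_cdf(0.25)
q3 = standard_normal.inv_cdf(0.75)
box_width = q3 - q1

print(round(within_one_sd, 3))  # 0.683
print(round(box_width, 3))      # 1.349
```

The box holds the middle 50% of values in roughly 1.35 standard deviations, while each whisker has to cover a whole tail with only 25% of the values, which is why the whiskers tend to be longer than the box.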

*Example 3, how can you estimate the IQR from the box plot?*3225

*The IQR is easy to see on a box plot because the bottom end of the box is Q1 and the top end is Q3.*3233

*The distance between them is your IQR.*3246

*Can you estimate the range?*3249

*If so, how?*3255

*The range is the distance between your min and max values.*3256

*It is the length of that entire box and whiskers and that would be your range.*3260

*IQR is easy to see on a box plot.*3268
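
Both quantities come straight from the five number summary. The sketch below assumes made-up data and the `inclusive` quartile method; the function name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

data = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # made-up values
minimum, q1, median, q3, maximum = five_number_summary(data)

iqr = q3 - q1                   # height of the box
data_range = maximum - minimum  # whisker tip to whisker tip (classic box plot)

print(iqr, data_range)  # 6.0 13
```

On a modified box plot the whisker tips stop short of any outliers, so the range would have to include the outlier points as well.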

*Example 4, is it possible for a box plot to be missing a whisker?*3275

*If so, give an example.*3282

*If not, explain why not.*3284

*Let us think about this.*3286

*We know that when it is skewed, let us say skewed right, we know that this side of whisker can be very small and this side can be very long.*3287

*What would it mean for it not to have a whisker at all?*3302

*One thing might be that you might have so many values that are exactly the same so you cannot split it up in a whisker and box.*3306

*For instance, let me give you an example of data set that might do so.*3318

*Let us say it is out of 8 values.*3323

*I just picked 8 because it is easy to split into quartiles.*3330

*What if all of the values in both q1 and q2 are exactly the same?*3334

*Then you could not have a whisker, because it would be arbitrary to say these 0s go in the whisker and those are supposed to be in the box.*3343

*You would probably put them all in the same box.*3359

*This would be a very thin box.*3362

*That is an example.*3365

*Obviously you could do it on the other side as well.*3369

*This might be an example of a left skewed distribution that does not have a whisker.*3373

*It is something like 4, 4, 3.*3386

*In these kinds of plots, it is hard to say what the IQR would be.*3401

*It would be difficult to arbitrarily say 4 is the boundary for the whisker as well as the box.*3409

*It cannot be both.*3420

*These are the kinds of distributions that would be missing a whisker.*3421
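
A missing whisker shows up as a whisker of length zero. The sketch below uses eight made-up values whose lower half is all 0, and assumes the `inclusive` quartile method; it is an illustration, not the exact data set drawn in the lecture.

```python
import statistics

# Made-up data: the four smallest of the eight values are identical.
data = [0, 0, 0, 0, 3, 5, 8, 12]
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

lower_whisker = q1 - min(data)   # distance from the box down to the minimum
upper_whisker = max(data) - q3   # distance from the box up to the maximum

print(q1, median, q3)                # 0.0 1.5 5.75
print(lower_whisker, upper_whisker)  # 0.0 6.25
```

Because the minimum equals Q1, there is nothing to draw below the box, so only the upper whisker appears.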

*That is the end of box and whisker plots.*3429

*Thanks for using www.educator.com.*3433

1 answer

Last reply by: Ryan Reddell

Tue Apr 29, 2014 11:36 PM

Post by KyungYeop Kim on July 15, 2013

Does anyone know the website that was once introduced where you can see all the cool statistics expressed in circles?

0 answers

Post by Paulette Jones on May 13, 2013

I think you sound fine - you're very approachable and positive. Only question - what version is this of Excel?

1 answer

Last reply by: Manoj Joseph

Wed May 1, 2013 6:15 AM

Post by Manoj Joseph on May 1, 2013

Are you saying the median is also a measure of spread?