### Five Number Summary & Boxplots

- Intro 0:00
- Roadmap 0:06
- Roadmap
- Summarizing Distributions 0:37
- Shape, Center, and Spread
- 5 Number Summary
- Boxplot: Visualizing 5 Number Summary 3:37
- Boxplot: Visualizing 5 Number Summary
- Boxplots on Excel 9:01
- Using 'Stocks' and Using Stacked Columns
- Boxplots on Excel Example
- When are Boxplots Useful? 32:14
- Pros
- Cons
- How to Determine Outlier Status 33:24
- Rule of Thumb: Upper Limit
- Rule of Thumb: Lower Limit
- Signal Outliers in an Excel Data File Using Conditional Formatting
- Modified Boxplot 48:38
- Modified Boxplot
- Example 1: Percentage Values & Lower and Upper Whisker 49:10
- Example 2: Boxplot 50:10
- Example 3: Estimating IQR From Boxplot 53:46
- Example 4: Boxplot and Missing Whisker 54:35

*Hi and welcome back to www.educator.com.*0000

*Today we are going to talk about the five number summary and box plots.*0002

*Now that you know a little bit about variability, we can talk about the 5 number summary,*0005

*and the box plot is going to be a way of visualizing that 5 number summary.*0013

*We are going to talk about how to determine outliers and we are going to be using conditional formatting in Excel.*0019

*Finally we are going to be talking about modified box plots which are going to be box plots where we exclude the outliers already.*0027

*There are 2 ways to summarize a distribution.*0037

*We have been talking about one way when we talk about shape, center, and spread.*0042

*Usually when we talk about shape, center, and spread we use the mean as the measure of center.*0048

*The mean is what we sometimes mean by center.*0056

*It could be median but a lot of times people use mean.*0059

*When we talk about spread we often mean standard deviation, rather than the other pairing of median and inter quartile range.*0063

*That is where the 5 number summary comes in.*0072

*The 5 number summary is a way of using the median and the inter quartile range more as a summary of how our distribution looks.*0077

*The 5 numbers are numbers you already know.*0089

*The minimum value; q1, the border of the first quartile; the median, which is q2 and divides the entire distribution in half,*0093

*with two quartiles on one side and two quartiles on the other side;*0112

*q3, the third border; and the maximum value, the highest value in your distribution.*0115

*This 5 number summary is often used in skewed distributions.*0124

*Let me show you visually what this looks like.*0128

*Let us say we have a whole bunch of different data points and here is our maximum value.*0131

*Here is our distribution so far.*0142

*The 5 number summary will basically say the minimum value is important, the maximum value is important.*0149

*Whatever the value in the middle is.*0160

*Q1 and q3 which would be dividing this by half and also dividing this by half.*0162

*Here is min, q1, q2, median, q3, and max.*0182

*In that way we get a 5 number summary.*0206
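As a rough illustration of the 5 number summary described above, here is a minimal Python sketch. The data values and the function name `five_number_summary` are made up for this example, and the quartiles use the standard library's "inclusive" interpolation method, which is only one of several quartile conventions.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, q1, median, q3, max) for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points: q1, median, q3.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, med, q3, data[-1])


# Hypothetical heights in inches (not the lesson's data file).
heights = [61, 64, 66, 67, 68, 69, 70, 71, 75]
summary = five_number_summary(heights)
print(summary)
```

The five numbers come back in order from least to greatest, which is the order a box plot is drawn in.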

*We may not know our little dots in our distribution but we have a general idea of it.*0210

*Now we are going to talk about a way to visualize that 5 number summary just quickly.*0216

*It is going to be called the box plot.*0222

*It is also called the box and whiskers plot.*0224

*It is called box and whiskers because often times there is a box, it could also be on its side like this with whiskers on it.*0227

*Often times this box is split up somewhere in between not necessarily in the middle but just somewhere like that.*0240

*This box is often aligned with an axis of values of your variable.*0248

*Your q1 is on the low end, q3 is on the high end.*0256

*Those are the boundaries of that box.*0270

*The line in the middle is the median.*0274

*The whiskers end at the minimum value and the maximum value.*0276

*That is how you decipher a box plot.*0283

*It could also be on its side like this.*0288

*In this case the values will go like this.*0291

*Once again you have q1, q3, median, min, and max.*0297

*It does not matter whether it is on this axis or this axis but what you do have to know is the box should be in alignment with the values.*0310

*Let us do an example box plot here just by hand.*0321

*Here is an example set and we can easily see the minimum and the maximum.*0328

*We already know that.*0334

*Let us find out what the median is.*0336

*The median should be right here.*0338

*That should be 30 because what I am going to do is add 28 and 32 and divide by 2.*0341

*Just to find the average of those two numbers and that is 30.*0351

*Visually I just think about what is in the middle of 28 and 32.*0355

*Let us find q1 and q3.*0359

*What is in between here and here?*0364

*That is 20.*0366

*We count the 18 as the border, 30 as the border.*0386

*Q1 is right in here so that should be 22 and here 30 is the border, 55 it should be right here.*0393

*Here is q3 and that should be about 36.5. *0409

*There we go we have our 5 numbers and we just need to plot them.*0425
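The by-hand steps above can be sketched in Python. The eight values below are a hypothetical data set chosen so that the results match the numbers quoted in the lesson (18, 22, 30, 36.5, 55); the actual values on the slide are not in the transcript. The "median of each half" rule used here is one common textbook convention and is an assumption about the method being used.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (q1, median, q3) by taking the median of each half."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd count, leave the middle value out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


# Hypothetical ages consistent with the numbers quoted in the lesson.
ages = [18, 20, 24, 28, 32, 35, 38, 55]
q1, med, q3 = quartiles_by_halves(ages)
print(min(ages), q1, med, q3, max(ages))
```

Interpolating methods, such as the one spreadsheet quartile functions use, can give slightly different quartiles for the same data.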

*I am going to put it on a horizontal axis just because I have no room for it.*0431

*I know that 18 will be on one side of it and 55 will be on the other side.*0439

*Here I will put my values.*0445

*We could pretend this stands for age.*0447

*Maybe I will start from 0 and divide this up into 10’s.*0453

*10, 20, 30, 40, 50, 60.*0466

*Let us start plotting our box and whiskers.*0474

*What I like to start to do is put dots there and draw in the box and whiskers.*0479

*Here I put in a dot in one of my whiskers and here is the other end of that.*0484

*Q1 is here, 30 is right here for the median, and 36.5 is right here for q3.*0492

*I know where my boundaries are.*0508

*I will just draw my box and whiskers.*0512

*To help myself I might just write down where q1 and q3 are because that is going to help me draw my box.*0514

*Here is my median, min, max.*0524

*This would be a box and whisker plot or a box plot for this data set that we have here.*0534

*Now let us talk about making a box plot in Excel.*0540

*Excel is not a great program for making box plots.*0544

*If you have Stata or other statistics programs those are better options.*0550

*For some of you, you want to make a box plot in Excel so I am going to show you 2 ways of doing it.*0555

*One method is by using stocks.*0561

*This is a stocks chart and it is not originally designed for plotting median, q3, min, and max.*0564

*It is for stock prices and opening and closing prices.*0573

*We are going to piggy back on it in order to create our box plots.*0578

*The trouble with that one is it does not allow you to show the median.*0584

*This ends up being a 4 number summary.*0588

*The other option is by using stacked columns.*0592

*In this option you can show the median but it takes a little more fiddling with it.*0603

*I am tricking Excel into showing you something that looks like a box plot.*0610

*Let us get started.*0614

*Go ahead and download the Excel file and you will see there is a whole bunch of different sheets that are already pre put in for you.*0617

*Some of them, the ones that start with a and s, are pre answered.*0629

*They already have the graphs and everything in there but you could also follow along.*0634

*If you go to height stock, it is going to plot height on a stock chart.*0638

*The nice thing about box plots is that you could have several box plots on one visualization.*0645

*That is what is nice about being able to compare a couple of 5 number summaries.*0654

*Let us go ahead and find the critical numbers we need for male and female height.*0661

*The thing about stock charts is that they need you to put in the numbers in a particular order.*0668

*This is the order you need.*0677

*You need q1, max, min, and q3.*0679

*This one is hard for me to remember because it is very arbitrary.*0683

*But that is because we are piggybacking off of stocks.*0693

*It is not originally meant for this use.*0697

*Let us find the boundary of the q1.*0700

*In order to do that, Excel has a nice function for us called quartile and here is what you have to put in quartile.*0706

*The input that it receives is the array or your data set as well as which quartile you would like.*0715

*In Excel, q1 is 1 so you would put in your data,1.*0722

*Let us go ahead and do that.*0729

*Go to data all the way up to height.*0731

*There is male height and I will color it in blue.*0735

*Go ahead and select that data then I am going to put in ,1).*0739

*Let us look at what we have here.*0749

*We have this data,1.*0751

*Let us lock that data in place because we do not need that to move any time soon.*0753

*I am going to hit enter.*0758

*That is for q1.*0763

*In order to find the median it would be q2 or median.*0766

*In order to find the boundary for q3 it would just be the same thing with the 3.*0770

*Let us go ahead and copy and paste that function into my q3 slot.*0776

*All I have to do is change which quartile I want it to give back to me.*0783

*There I have my first and third quartiles.*0789

*The nice thing about this quartile function is that it can also find you the min and max values because it considers the min value q0.*0794

*It considers the max value q4.*0800
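To make the quartile numbering concrete, here is a small Python sketch of a function that takes a data set and a quartile number from 0 to 4, in the spirit of the spreadsheet function described above. The name `quartile`, the linear interpolation between neighboring ranks, and the sample heights are assumptions for illustration; this is not Excel's own code.

```python
def quartile(values, quart):
    """Return quartile 0 (min), 1, 2 (median), 3, or 4 (max)."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3, or 4")
    data = sorted(values)
    # Fractional position of the requested quartile in the sorted data.
    pos = (len(data) - 1) * quart / 4
    lo = int(pos)
    frac = pos - lo
    if lo + 1 < len(data):
        # Interpolate between the two neighboring values.
        return data[lo] + frac * (data[lo + 1] - data[lo])
    return float(data[lo])


# Hypothetical male heights in inches.
male_heights = [62, 67, 69, 71, 75]

# Same order the stock chart expects: q1, max, min, q3.
stock_order = [quartile(male_heights, k) for k in (1, 4, 0, 3)]
print(stock_order)
```

Passing 0 and 4 gives the min and max, so one function covers all four cells.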

*You could also use the functions min and max.*0807

*Let us go ahead and use quartile.*0810

*It makes it easier.*0812

*I could just copy and paste all of that in there. *0813

*I am going to double click on that maximum and put in 4.*0816

*I am going to double click on the min quartile and put in 0.*0821

*There we go.*0828

*We now have q1, max, min, and q3.*0830

*Let us find the female q1.*0835

*Let us go ahead and find our data.*0840

*I am just going to color it in pink.*0844

*I am going to put in ,1).*0847

*Let us see what we have here.*0857

*You double click on it we have our data and q1.*0859

*I am just going to lock that data in place so I could easily copy and paste that all the way down.*0862

*Let us change it for our max value.*0874

*For our max value we need q4.*0879

*For our min value we need q0.*0882

*For our q3 we need q3.*0886

*Notice that for females the max is slightly smaller.*0891

*The min is slightly smaller.*0898

*So are q1 and q3, compared to their male counterparts.*0899

*Let us go ahead and put in our box and whiskers plot.*0903

*Go ahead and select the male and female part of this data set.*0909

*When you do that, Excel will interpret it as you want it to be the labels for your box and whisker plots.*0915

*Click on charts and if you click on stocks you will see something that looks like a box and whisker plot.*0923

*You could go ahead and click on it but some of you may get an error signal that looks like this.*0934

*To create a stock chart, arrange the data on your sheet in this order opening price, high price, low price, closing price.*0945

*Use dates or stock names for labels.*0954

*We have done this already.*0956

*It has the opening price or q1, high price is max, low price is min, closing price is q3.*0958
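The pairing between stock-chart slots and box plot numbers can be written down as a tiny Python sketch. The slot names, the function `to_stock_row`, and the value 61.5 (standing in for "below 62") are assumptions for illustration; the other numbers are the approximate male height values read off the chart later in the lesson.

```python
# Slot order quoted in the message: opening, high, low, closing.
STOCK_SLOTS = ("open", "high", "low", "close")


def to_stock_row(q1, maximum, minimum, q3):
    """Place box plot numbers into stock-chart order."""
    row = dict(zip(STOCK_SLOTS, (q1, maximum, minimum, q3)))
    # The numbers only make sense if low <= open <= close <= high.
    if not row["low"] <= row["open"] <= row["close"] <= row["high"]:
        raise ValueError("expected min <= q1 <= q3 <= max")
    return row


males = to_stock_row(q1=67, maximum=75, minimum=61.5, q3=71)
print(males)
```

The median has no slot, which is why this chart ends up as a 4 number summary.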

*Yet Excel does not recognize that we have done the right thing.*0967

*The reason is because Excel is not seeing that we organized our data into our columns instead of rows.*0971

*Here is one thing you might want to do.*0980

*Just go ahead and pick column and you will see that Excel is treating each row as if it is a data set.*0983

*What we want to tell Excel is stop doing that instead organize it this way so that it is grouping them into columns.*0998

*I am going to move this formatting palette out of the way and now if we go back to stock and click on stocks it should recognize it.*1014

*It is not elegant but that is the way Excel is sometimes.*1031

*You have to force it to do what you want.*1036

*I am going to delete this series because now it is treating each of these 4 rows as a series but we do not need that information.*1038

*We do not need to know how they go across.*1049

*We just need to know how they go up and down.*1055

*I am just going to change my formatting here so you could see it a little more clearly.*1057

*White might be a little tough to see.*1063

*Feel free to pick any color you want.*1065

*Here is our box plot or classic box plot.*1082

*You will see that our values are clustered around this end and one thing I want to do is modify my y axis *1087

*so that I could stretch out the values that are important to us.*1098

*The way that I will do that is go to scale and put in my min value as 50 instead of 0.*1105

*When you do that it will stretch this part out from 50 to 80 instead of from 0 to 80.*1115

*Now we could see our box plot.*1123

*Here is our min for males which is below 62.*1127

*The max is 75.*1133

*Q1 which starts at 67.*1135

*Q3 which ends at 71.*1139

*Here is an important thing about a box plot.*1145

*Before in frequency distributions, we always have a value on the bottom.*1148

*Height would be on the bottom.*1154

*Now height in inches is shown on the y axis.*1156

*Because that is an important distinction, I am going to label my vertical axis to say height.*1162

*Now we could remember this is where height is.*1181

*Notice that frequency is not shown here.*1189

*We do not know exactly how many people have a height of 70 or 75.*1192

*We only know that those are the boundaries.*1198

*There is one thing we do know, we know that these are split up into quartiles.*1201

*This represents the quartile from 0 to 1.*1208

*This represents the quartile as well, the border from q3 to the end.*1213

*About 25% of our participants have heights in this range.*1221

*Another 25% have height in this range.*1226

*Using our powers of deduction how many people are in this range?*1230

*50%.*1236

*If we had a median line, we would also know where our other two quartiles break up.*1237

*Unfortunately, this kind of formatting with the stock plots does not show you the median.*1242

*In order to see the median because it is an important number sometimes, let us go ahead and use the stacked columns option.*1250

*Go ahead and click on height stacked column.*1260

*Here I have set it up so that now it is in a format that makes sense for me.*1264

*Something like min, q1, median, q3, and max.*1271

*That is the order that makes more sense to me, or you could order it the other way: max, q3, median, and so on.*1277

*The reason I have done that is because it is going to make it a little bit easier for us to create our box plot if it is in order.*1285

*Let us go ahead and find our quartiles.*1295

*Let us find our data and put that in for males ,0 because we are starting at 0) and enter.*1301

*Notice that this is not new.*1319

*Instead of putting in 0 here is what I am going to do now.*1324

*I am going to click on this 0 because this way the more I make things into formulas the more I can just cut and paste.*1328

*The nice thing about clicking on this 0 is that because I have not put in any $ around it, it is a relative reference.*1340

*If I cut and paste this down one it will cut and paste this one down as well.*1350

*It will refer to the next one down.*1355

*I do not want my data to move down.*1357

*Let us go ahead and lock that data in place and hit enter.*1360

*Once I have that I could just drag this down, copy and paste as we go.*1367

*We have all 5 of our values for our 5 numbers summary.*1374

*We did not have to do anything.*1378

*We could see that they are in order from least to greatest.*1380

*Let us do the same for females.*1385

*We have quartile, find our data and put in ,0) and hit enter.*1387

*All I am going to do is lock my data in place.*1411

*Sometimes I forget copying and pasting this.*1416

*I could just copy and paste that all the way down.*1419

*So far that part has been easy.*1426

*Here is where it gets a little more complicated in order to create stacked column box plots.*1429

*If you go to charts and go to columns you are going to be using the stacked column.*1438

*In order to use stacked column, one thing you have to do is find out the distance between each of these things.*1446

*In stacked columns, what we will do is to put a little box on top of another little box.*1454

*Each of those boxes represents the distance from the previous box to the next box.*1465

*We have to change this into distances.*1470

*I am just going to put in here distance for males and for females. *1474

*The first one is the distance from 0.*1491

*It is just that value itself.*1497

*The second one is the distance from this value to this value.*1500

*Once I have that I can just copy and paste that all the way down because this will give us the distance between these two.*1507

*Here we have the distance between these two.*1517

*In order to do that for females all I have to do is copy and paste that over because Excel will do it for the column on the right.*1522

*Now we have our distances.*1535
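The distance step can be sketched in Python: each stacked segment is the difference between one summary value and the one before it, and the first segment is just the distance from 0 up to the min. The summary values and the function name `stacked_segments` are hypothetical.

```python
from itertools import accumulate


def stacked_segments(summary):
    """Turn (min, q1, median, q3, max) into stacked-column heights."""
    segments = [summary[0]]  # distance from 0 up to the min
    for lower, upper in zip(summary, summary[1:]):
        segments.append(upper - lower)
    return segments


# Hypothetical 5 number summary for heights in inches.
summary = (62, 67, 69, 71, 75)
segments = stacked_segments(summary)
print(segments)

# Stacking the segments back up recovers the original five numbers.
print(list(accumulate(segments)))
```

Because the segments stack, the top of each box lands exactly on the next summary value.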

*Because of that we could just take that and put on column.*1538

*Notice that right now it is putting each row as a column, it is not what we want.*1551

*We want each column to be a column.*1559

*I am going to tell Excel organize it this way.*1562

*I will leave this one aside.*1571

*Feel free to use whatever coloring scheme you would like just keep it consistent I am going with orange.*1576

*Here is what we need to do now.*1598

*This does not look like a box and whisker plot yet.*1602

*In fact there are no whiskers.*1606

*That is why we need to fudge a little bit.*1609

*One thing we have to do is start by getting rid of this block.*1612

*We do not want to hit delete because that will just delete and ignore that data.*1617

*That is not what we wanted to do.*1624

*I want to look at these things as floating up here.*1625

*Instead what I am going to do is tell Excel color these clear.*1629

*I am just double clicking on this and fill and no fill.*1634

*It is invisible now.*1640

*In terms of line, I think there might be no line.*1643

*Even though it is there, you could find it.*1649

*It is just invisible because we do not need that part.*1658

*It is just the distance from 0 all the way to our min value.*1663

*Here we want this to be our low whisker. *1667

*Here is what we do.*1672

*We ask Excel to make this one invisible then we use error bars.*1673

*Error bars usually start at the top of that box.*1689

*Because we are on the top of the box we would like the bar just to go through the box entirely.*1694

*We put a –errorbar and we ask for it to be 100% of our box.*1702

*When I do so, you see this little line here that is 100% of my box.*1712

*This is an Excel cheat.*1719

*We could hit okay and we have our low end whisker.*1722

*We do not need to do anything with that.*1729

*We could leave that because that is our q1 to median box.*1733

*If you hit the arrow button it will go to the next box up.*1739

*This is the box that goes from median to q3 and this top one as well this is the box that goes from q3 to max.*1746

*We need to click on that and make it invisible and give that an error bar as well.*1761

*That is 100%.*1770

*In some of your Excel options you might be able to get rid of these little feet so that it looks like little bars.*1773

*In newer versions of Excel that is the case.*1790

*That is what I usually do because I think it looks ugly to have those little feet there.*1793

*It is not a big deal to have the feet.*1798

*Let us change our y axis so that we could see our box plot a little better and just hit scale.*1803

*I just double clicked on it and I am going to make my min value 50 and hit okay.*1816

*I do not want it to say distance of males instead I want it to say just males.*1825

*I am going to change this into males.*1834

*Excel will do this automatically for me.*1838

*Finally what I am going to do is add a little label to my vertical axis so that I remember the height and inches is being shown here.*1847

*Here we can see for males that the quartiles split up quite nicely.*1864

*But for the quartiles here there seems to be a greater range at the low end than at the high end.*1871

*The spread is bigger between q1 and median than the median and q3.*1887

*Our 2nd quartile is slightly fatter than our 3rd quartile.*1896

*But our 1st and last quartiles are even bigger than these two.*1901

*Although that is a bit of a fudge, you could see we could use stacked columns in order to create box plots as well.*1908

*That is what I mean by slightly idiosyncratic.*1921

*It is not the best way but it can be done.*1930

*Let us talk about when box plots are useful.*1932

*The pros of box plots are that you can plot a single quantitative variable and compare 2 or more distributions easily.*1938

*In fact box plots are useful for comparing lots of distributions very quickly because it is a good visualization.*1947

*It is very compact.*1954

*You put it all in one graph and it is easy to compare the boxes.*1956

*When a distribution has many values, too many to show individually, you would not want to use a stem plot or a dot plot.*1962

*You would want to use something like a box plot.*1972

*Also it is nice when you do not want to see anything more than a 5 number summary.*1974

*One of the cons is that you cannot see frequency.*1979

*You cannot see how many people have a height of 72, but you can see the frequency within a range.*1983

*You know that 50% of the people lie in between q1 and q3.*1993

*You can see that easily on a box plot.*1999

*How do we determine whether we have outliers in our data set?*2002

*Outliers are extreme values but that is a subjective way of defining it.*2012

*How do we know how extreme?*2020

*Who defines extreme?*2021

*We have a rule of thumb for that in statistics.*2024

*Typically we say that an outlier is a value more than 1.5 times the inter quartile range away from the nearest quartile.*2027

*What we mean is the upper limit is the nearest quartile is q3 and you add 1.5 times the inter quartile range whatever that is.*2038

*That is the upper limit boundary.*2053

*The lower limit boundary would be from the q1 boundary and because we are going lower *2056

*we are subtracting 1.5 times the inter quartile range.*2064
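The two rules of thumb can be sketched as a short Python function. The name `outlier_limits` and the sample quartiles are made up for illustration; the multiplier 1.5 is the one stated above.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return lower_limit, upper_limit


# Hypothetical quartiles: q1 = 10, q3 = 20, so the IQR is 10.
lower, upper = outlier_limits(10, 20)
print(lower, upper)
```

Any value below the lower limit or above the upper limit would be flagged as an outlier under this rule.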

*Sometimes in skewed distributions where the 5 number summary is useful you might not need one of these.*2070

*Because one side might not have outliers only the other side does.*2078

*If it is right skewed we need the upper limit boundary.*2085

*If it is left skewed we need the lower limit boundary.*2088

*We are going to learn how to signal outliers in Excel data file by using conditional formatting.*2092

*Go ahead and click on your example spreadsheet again, the Excel file.*2100

*Let us look at the photos stock sheet.*2110

*Photos, if you remember from our previous frequency distribution visualizations, tend to have a skewed distribution.*2121

*It tends to be that photos are skewed to the right where there are some people that have an extremely high number of photos on one end.*2133

*Let us determine what the outlier boundaries are for tagged photos.*2148

*In order to do that, we might want to start off by finding q1, max, min, and q3.*2154

*Let us go ahead and put in quartile and put in our data for tagged photos.*2161

*I am just going to put in ,1 and hit enter.*2181

*Our first quartile is going to be locked in place.*2188

*Our 1st quartile is 53.75 that is the boundary.*2198

*I am just going to copy and paste that all the way down.*2207

*I will just change the quartile that we are referencing.*2210

*We need the max here so that is 4.*2213

*We need the min here which is 0.*2216

*We need q3.*2221

*Notice that q3 is around 250 but the max value is 4,686.*2224

*I am guessing we have a few outliers here but to find outliers we need to find the inter quartile range.*2237

*The inter quartile range is easy once you have q3 and q1 because it is simply q3 – q1.*2246

*Instead of having max and min let us modify this.*2259

*Let us modify it to be the upper and lower limit.*2282

*Instead of a classic box plot, we are going to be looking at a modified box plot.*2287

*Instead of this max and min we are going to be putting our formulas for the upper and lower limit.*2299

*The min value we probably would not need it because 13 is going to be within the boundary *2311

*like if we made q1 – 1.5 × 200 that would be way lower than our actual min value.*2323

*We do not need to modify that one.*2331

*Let us modify this one.*2333

*Here what we do is take our q3 boundary + 1.5 × IQR.*2337

*Excel knows order of operations so it will do the multiplication before it does the addition.*2349

*Let us hit enter.*2356

*Notice that our upper boundary is a lot smaller than our max value.*2358

*It is 549.375.*2367
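The arithmetic behind that number can be checked with a few lines of Python. Q1 = 53.75, the min of 13, and the result 549.375 are from the lesson; Q3 = 252 is an assumption, back-solved from those numbers because the lesson only says q3 is "around 250".

```python
q1 = 53.75   # first quartile quoted in the lesson
q3 = 252.0   # assumption: back-solved so the upper limit matches 549.375

iqr = q3 - q1                    # 198.25
upper_limit = q3 + 1.5 * iqr     # multiplication happens before addition
lower_limit = q1 - 1.5 * iqr

print(iqr, upper_limit, lower_limit)
```

The lower limit comes out negative, far below the min of 13, which matches the remark that the min does not need to be modified.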

*It would be nice if we could go back to our data and see quickly who is in that range and who is not.*2372

*Who is out of that range or my outliers?*2380

*If we go out to the data, what I am going to show you is something called conditional formatting.*2384

*Go ahead and select all the data that we want Excel to look at.*2390

*What we are going to let Excel do is color these data points if they are outliers.*2397

*In order to do that, after you select it, go to your menu, which we cannot see here, and hit format.*2405

*There should be an option that is conditional formatting.*2415

*Go ahead and click on that.*2419

*You should get a box that looks like this.*2421

*Here we want to say if the cell value is greater than something then format it differently.*2423

*Let us see what our upper boundary is again.*2436

*We go to photos and, unfortunately, it will not let you put in a formula unless it is in the same worksheet.*2440

*We just have to put it in manually.*2450

*If it is greater than 549.375 then format it differently.*2452

*Hit format.*2461

*One thing I might want to do is just color it in a different color.*2463

*I think I will put in something like red.*2469

*Color it red if it is an outlier and hit okay.*2472

*If you look through this data file you will see we already have 2, 3, 4 outliers.*2477

*We have 7 outliers in our sample of 100.*2490
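The conditional formatting rule ("if the cell value is greater than 549.375, format it differently") amounts to a simple comparison per cell. Here is a minimal Python sketch of the same idea; the photo counts are hypothetical apart from 13 and 4,686, which are the min and max quoted in the lesson.

```python
UPPER_LIMIT = 549.375  # upper boundary computed in the lesson

# Hypothetical tagged-photo counts; only 13 and 4686 come from the lesson.
tagged_photos = [13, 48, 120, 252, 300, 610, 980, 4686]

# Pair each value with a flag, like coloring a cell red.
flags = [(count, count > UPPER_LIMIT) for count in tagged_photos]
outliers = [count for count, is_outlier in flags if is_outlier]

for count, is_outlier in flags:
    print(count, "OUTLIER" if is_outlier else "")
```

Only the upper limit is checked here because, as noted above, the lower limit falls below the smallest value in this data.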

*We know that those 7 or so people are not going to be included in our modified box plot.*2497

*In order to do a modified box plot, all we would do is plot these values instead of the max and min.*2504

*Here I am going to select these and use stock option.*2514

*Let us see if it will do for me on the first try.*2524

*Unfortunately no.*2528

*It gives me this error.*2530

*Instead I am just going to put something else just so that I could tell Excel please organize it differently.*2533

*I am going to go back to my stock.*2542

*I am going to color it something different.*2548

*I am going to delete this series because it is redundant.*2557

*Here are our tagged photos and what we see is this range is a lot bigger even though there is *2570

*the same 25% of people in there as well as in here.*2579

*This is typically what it looks like for a skewed distribution.*2583

*A skewed distribution will have a small whisker on one end and a large whisker on the other end.*2587

*Because even though it has the same number of people as there are in this little whisker down here, *2592

*this one shows you there is a wider range that captures that same 25% of people.*2602

*This is a right skewed distribution because the top whisker is bigger than the bottom whisker.*2609

*One thing we might want to do is label our vertical axis to show number of photos.*2617

*Just so that we remember we are not talking about frequency of people here.*2629

*Let us use that same data in order to create stacked columns.*2636

*Go ahead and put in stacked columns and instead of a classic box plot we are going to be using a modified box plot.*2650

*Let us put in our quartiles and data right here.*2660

*Go back and put in this one.*2676

*Let us see what we have.*2681

*We know we want to lock in this data in place and hit enter.*2685

*I could just drag that down.*2693

*We want to change this max value to be modified.*2695

*In order to do that we have to find IQR which is q3 – q1 and delete that and put in =.*2704

*My q3 + 1.5 × IQR and enter.*2725

*Let us find the distances between these.*2735

*The first one I do not have to do anything with that because it is the distance from 0.*2741

*The next one I just subtract c3 from c4 and copy and paste that all the way down.*2745

*Let us plot these distances.*2757

*I am going to select all of that and hit charts, go to columns and choose stacked columns.*2764

*Unfortunately it is not stacking them.*2772

*Here on my toolbar I am going to tell it organize differently and delete my series here because this is not what I need.*2776

*Now we have to do that fancy formatting.*2792

*The first one we know we just need to make it clear or transparent.*2797

*The second one we know we need to make it transparent as well as putting in a –errorbar 100% of that box.*2805

*Up here we also need that –errorbar 100% of my box with no fill.*2819

*Hit okay.*2830

*Here is what we have.*2833

*We should probably change that distance because it does not have to have distance.*2836

*We should put something like tagged photos and on the side we will change our vertical axis so that it reads number of photos.*2842

*We know that is our values.*2856

*There we go we have a modified box plot.*2859

*It is no longer from 0 all the way to 5,000.*2863

*It is from 0 to about 600.*2868

*It is still a skewed distribution: even when we lop off those crazy outliers we still have a whisker on top that is a lot longer *2872

*than our whisker on the bottom, even though they both indicate the same 25% of our sample.*2889

*I am going to minimize all this and get rid of that.*2898

*Now we know how to create conditional formatting so that we could signal these people are outliers.*2906

*We are not including them in our visualization.*2916

*We just went over modified box plots.*2919

*The classic box plot relies on the max and min but those are very susceptible to outliers.*2924

*If you have a couple of crazy outliers that would drastically change our visualization.*2930

*The modified box plot uses those modified boundaries that are 1.5 times the IQR beyond q1 and q3.*2936

*This is helpful to highly skewed distributions as we saw with tagged photos.*2943
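Here is a minimal Python sketch of how the whisker ends change between a classic and a modified box plot, following the approach used in this lesson of plotting the limit itself in place of an extreme max or min. The function name is made up, and Q3 = 252 is an assumption back-solved from the lesson's numbers (Q1 = 53.75, upper limit 549.375).

```python
def modified_whisker_ends(minimum, q1, q3, maximum, k=1.5):
    """Return (lower_end, upper_end) for a modified box plot."""
    iqr = q3 - q1
    # Keep the real min/max unless it lies beyond the 1.5 x IQR limit.
    lower_end = max(minimum, q1 - k * iqr)
    upper_end = min(maximum, q3 + k * iqr)
    return lower_end, upper_end


# Tagged photos: min 13 and max 4686 are from the lesson; q3 is assumed.
ends = modified_whisker_ends(13, 53.75, 252.0, 4686)
print(ends)
```

Many textbooks instead end the whisker at the most extreme data value that is still inside the limit; either way the outliers are kept out of the whisker.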

*Let us go into some examples.*2949

*Approximately what percentage of values in the data set lie within the box?*2953

*That box with the whiskers.*2960

*What percentage of the data lies in there?*2965

*We know that this is 25% of our sample as well as this also 25%.*2967

*Each of these quartiles are also 25% so we know that what is in here is 50% of our sample.*2978

*The lower whisker 25% and the upper whisker 25% and that is one of the reasons why box plots are useful.*2991
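The 25% / 50% / 25% split can be checked numerically. This Python sketch uses a made-up data set of the whole numbers 1 to 100 and the standard library's "inclusive" quartile method; with ties or small samples the shares are only approximate.

```python
from statistics import quantiles

# Hypothetical data: the whole numbers 1 through 100.
data = list(range(1, 101))
q1, med, q3 = quantiles(data, n=4, method="inclusive")

n = len(data)
lower_whisker = sum(x < q1 for x in data) / n   # share below the box
box = sum(q1 <= x <= q3 for x in data) / n      # share inside the box
upper_whisker = sum(x > q3 for x in data) / n   # share above the box

print(lower_whisker, box, upper_whisker)
```

The three shares always add up to 1, since every value is either below, inside, or above the box.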

*It breaks up that data easily into chunks that we could use.*3004

*Example 2: what does a box plot look like for a data set that is skewed right?*3010

*In a skewed right distribution we have a few of these outliers that are very positive.*3018

*What would our box plot look like?*3034

*For data set that is skewed right, which I write in red, if we have a box plot that looks like this *3037

*we know that the right side would be much longer than the left side.*3053

*Probably the right box would be longer than the left box.*3060

*When it is up and down, remember the top part is the greater side, so it would look like this, just turned around.*3065

*This upper side is longer than the lower side.*3088

*What about those populations that are skewed left?*3092

*In that case there are more outliers at the smaller end of the continuum.*3097

*Skewed left.*3106

*In a side to side box it would be easy to see because that left side would be longer than the right side.*3110

*Probably even within the box you would have that left box being bigger than the right box.*3119

*Remember when we draw up and down the bottom end would be longer than the top end.*3127

*Here that is the positive end.*3140

*This side would be longer as well.*3144

*Skewed left distributions we know is either like this or like this.*3149

*Skewed right distributions we know might be one of these.*3154

*What about approximately normal distributions?*3157

*When you draw on each side you know that it has to be roughly symmetrical because it is approximately normal.*3166

*That part is easy.*3175

*Another thing about normal distributions is that most of the values, about 68%, are clustered within that small space,*3177

*within 1 standard deviation on either side.*3190

*We could guess that these little tails may be longer than the actual box.*3194

*That is how it looks, and it is roughly symmetrical.*3205

*It is easy to draw that on this side as well because it is roughly symmetrical.*3210

*Example 3, how can you estimate the IQR from the box plot?*3225

*The IQR is easy to see on a box plot because whatever the box is the bottom end of that is q1 and the top is q3.*3233

*That is your IQR.*3246

*Can you estimate the range?*3249

*If so, how?*3255

*The range is the distance between your min and max values.*3256

*It is the length of that entire box and whiskers and that would be your range.*3260

*IQR is easy to see on a box plot.*3268
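Reading the IQR and the range off a box plot is just two subtractions. This Python sketch uses the 5 number summary from the hand-drawn example earlier in the lesson (18, 22, 30, 36.5, 55); the function name is made up.

```python
def iqr_and_range(summary):
    """Return (IQR, range) from (min, q1, median, q3, max)."""
    minimum, q1, _median, q3, maximum = summary
    iqr = q3 - q1                 # length of the box
    full_range = maximum - minimum  # whisker tip to whisker tip
    return iqr, full_range


iqr, full_range = iqr_and_range((18, 22, 30, 36.5, 55))
print(iqr, full_range)
```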

*Example 4, is it possible for a box plot to be missing a whisker?*3275

*If so, give an example.*3282

*If not, explain why not.*3284

*Let us think about this.*3286

*We know that when it is skewed, let us say skewed right, we know that this side of whisker can be very small and this side can be very long.*3287

*What would it mean for it not to have a whisker at all?*3302

*One thing might be that you might have so many values that are exactly the same so you cannot split it up in a whisker and box.*3306

*For instance, let me give you an example of data set that might do so.*3318

*Let us say it is out of 8 values.*3323

*I just picked 8 because it is easy to split into quartiles.*3330

*What if all of the values in both the first and second quartiles are exactly the same?*3334

*Then you could not have a whisker because it would be arbitrary to say these 0s go in the whisker and these are supposed to be in the box.*3343

*You probably put them all in the same box.*3359

*This would be a very thin box.*3362

*That is an example.*3365

*Obviously you could do it on the other side as well.*3369

*This might be an example of a left skewed distribution that does not have a whisker.*3373

*It is something like 4, 4, 3.*3386

*In these kinds of plots, it is hard to say what the IQR would be.*3401

*It would be difficult to arbitrarily say 4 is the boundary for the whisker as well as the box.*3409

*It cannot be both.*3420

*These are the kinds of distributions that we would be missing a whisker.*3421
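A missing whisker shows up numerically as a whisker of length zero, meaning the min equals q1 or the max equals q3. The Python sketch below uses a hypothetical set of 8 values with repeated 0s at the low end; the function name and the "inclusive" quartile method are assumptions for illustration.

```python
from statistics import quantiles


def whisker_lengths(values):
    """Return (lower_whisker_length, upper_whisker_length)."""
    data = sorted(values)
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q1 - data[0], data[-1] - q3


# Hypothetical data: the lowest values are all the same.
lower, upper = whisker_lengths([0, 0, 0, 0, 3, 4, 4, 7])
print(lower, upper)
```

Here the lower whisker has length 0, so only the upper whisker would be drawn.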

*That is the end of box and whisker plots.*3429

*Thanks for using www.educator.com.*3433

