Dr. Ji Son

Five Number Summary & Boxplots


Section 1: Introduction
Descriptive Statistics vs. Inferential Statistics

25m 31s

Intro
0:00
Statistics
0:35
Statistics
0:36
Let's Think About High School Science
1:12
Measurement and Find Patterns (Mathematical Formula)
1:13
Statistics = Math of Distributions
4:58
Distributions
4:59
Problematic… but also GREAT
5:58
Statistics
7:33
How is It Different from Other Specializations in Mathematics?
7:34
Statistics is Fundamental in Natural and Social Sciences
7:53
Two Skills of Statistics
8:20
Description (Exploration)
8:21
Inference
9:13
Descriptive Statistics vs. Inferential Statistics: Apply to Distributions
9:58
Descriptive Statistics
9:59
Inferential Statistics
11:05
Populations vs. Samples
12:19
Populations vs. Samples: Is it the Truth?
12:20
Populations vs. Samples: Pros & Cons
13:36
Populations vs. Samples: Descriptive Values
16:12
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:10
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:11
Example 1: Descriptive Statistics vs. Inferential Statistics
19:09
Example 2: Descriptive Statistics vs. Inferential Statistics
20:47
Example 3: Sample, Parameter, Population, and Statistic
21:40
Example 4: Sample, Parameter, Population, and Statistic
23:28
Section 2: About Samples: Cases, Variables, Measurements

32m 14s

Intro
0:00
Data
0:09
Data, Cases, Variables, and Values
0:10
Rows, Columns, and Cells
2:03
Example: Aircraft
3:52
How Do We Get Data?
5:38
Research: Question and Hypothesis
5:39
Research Design
7:11
Measurement
7:29
Research Analysis
8:33
Research Conclusion
9:30
Types of Variables
10:03
Discrete Variables
10:04
Continuous Variables
12:07
Types of Measurements
14:17
Types of Measurements
14:18
Types of Measurements (Scales)
17:22
Nominal
17:23
Ordinal
19:11
Interval
21:33
Ratio
24:24
Example 1: Cases, Variables, Measurements
25:20
Example 2: Which Scale of Measurement is Used?
26:55
Example 3: What Kind of a Scale of Measurement is This?
27:26
Example 4: Discrete vs. Continuous Variables.
30:31
Section 3: Visualizing Distributions
Introduction to Excel

8m 9s

Intro
0:00
Before Visualizing Distribution
0:10
Excel
0:11
Excel: Organization
0:45
Workbook
0:46
Columns x Rows
1:50
Tools: Menu Bar, Standard Toolbar, and Formula Bar
3:00
Excel + Data
6:07
Excel and Data
6:08
Frequency Distributions in Excel

39m 10s

Intro
0:00
Data in Excel and Frequency Distributions
0:09
Raw Data to Frequency Tables
0:42
Raw Data to Frequency Tables
0:43
Frequency Tables: Using Formulas and Pivot Tables
1:28
Example 1: Number of Births
7:17
Example 2: Age Distribution
20:41
Example 3: Height Distribution
27:45
Example 4: Height Distribution of Males
32:19
Frequency Distributions and Features

25m 29s

Intro
0:00
Data in Excel, Frequency Distributions, and Features of Frequency Distributions
0:11
Example #1
1:35
Uniform
1:36
Example #2
2:58
Unimodal, Skewed Right, and Asymmetric
2:59
Example #3
6:29
Bimodal
6:30
Example #4a
8:29
Symmetric, Unimodal, and Normal
8:30
Point of Inflection and Standard Deviation
11:13
Example #4b
12:43
Normal Distribution
12:44
Summary
13:56
Uniform, Skewed, Bimodal, and Normal
13:57
Sketch Problem 2: Life Expectancy
20:01
Sketch Problem 3: Telephone Numbers
22:01
Sketch Problem 4: Length of Time Used to Complete a Final Exam
23:43
Dotplots and Histograms in Excel

42m 42s

Intro
0:00
Previously
1:02
Data, Frequency Table, and Visualization
1:03
Dotplots
1:22
Dotplots Excel Example
1:23
Dotplots: Pros and Cons
7:22
Pros and Cons of Dotplots
7:23
Dotplots Excel Example Cont.
9:07
Histograms
12:47
Histograms Overview
12:48
Example of Histograms
15:29
Histograms: Pros and Cons
31:39
Pros
31:40
Cons
32:31
Frequency vs. Relative Frequency
32:53
Frequency
32:54
Relative Frequency
33:36
Example 1: Dotplots vs. Histograms
34:36
Example 2: Age of Pennies Dotplot
36:21
Example 3: Histogram of Mammal Speeds
38:27
Example 4: Histogram of Life Expectancy
40:30
Stemplots

12m 23s

Intro
0:00
What Sets Stemplots Apart?
0:46
Data Sets, Dotplots, Histograms, and Stemplots
0:47
Example 1: What Do Stemplots Look Like?
1:58
Example 2: Back-to-Back Stemplots
5:00
Example 4: Quiz Grade & Afterschool Tutoring Stemplot
9:56
Bar Graphs

22m 49s

Intro
0:00
Review of Frequency Distributions
0:44
Y-axis and X-axis
0:45
Types of Frequency Visualizations Covered so Far
2:16
Introduction to Bar Graphs
4:07
Example 1: Bar Graph
5:32
Example 1: Bar Graph
5:33
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:07
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:08
Example 2: Create a Frequency Visualization for Gender
14:02
Example 3: Cases, Variables, and Frequency Visualization
16:34
Example 4: What Kind of Graphs are Shown Below?
19:29
Section 4: Summarizing Distributions
Central Tendency: Mean, Median, Mode

38m 50s

Intro
0:00
Central Tendency 1
0:56
Way to Summarize a Distribution of Scores
0:57
Mode
1:32
Median
2:02
Mean
2:36
Central Tendency 2
3:47
Mode
3:48
Median
4:20
Mean
5:25
Summation Symbol
6:11
Summation Symbol
6:12
Population vs. Sample
10:46
Population vs. Sample
10:47
Excel Examples
15:08
Finding Mode, Median, and Mean in Excel
15:09
Median vs. Mean
21:45
Effect of Outliers
21:46
Relationship Between Parameter and Statistic
22:44
Type of Measurements
24:00
Which Distributions to Use With
24:55
Example 1: Mean
25:30
Example 2: Using Summation Symbol
29:50
Example 3: Average Calorie Count
32:50
Example 4: Creating an Example Set
35:46
Variability

42m 40s

Intro
0:00
Range, Quartiles and Interquartile Range
6:37
Range
6:38
Interquartile Range
8:42
Interquartile Range Example
10:58
Interquartile Range Example
10:59
Variance and Standard Deviation
12:27
Deviations
12:28
Sum of Squares
14:35
Variance
16:55
Standard Deviation
17:44
Sum of Squares (SS)
18:34
Sum of Squares (SS)
18:35
Population vs. Sample SD
22:00
Population vs. Sample SD
22:01
Population vs. Sample
23:20
Mean
23:21
SD
23:51
Example 1: Find the Mean and Standard Deviation of the Variable Friends in the Excel File
27:21
Example 2: Find the Mean and Standard Deviation of the Tagged Photos in the Excel File
35:25
Example 3: Sum of Squares
38:58
Example 4: Standard Deviation
41:48
Five Number Summary & Boxplots

57m 15s

Intro
0:00
Summarizing Distributions
0:37
5 Number Summary
1:14
Boxplot: Visualizing 5 Number Summary
3:37
Boxplot: Visualizing 5 Number Summary
3:38
Boxplots on Excel
9:01
Using 'Stocks' and Using Stacked Columns
9:02
Boxplots on Excel Example
10:14
When are Boxplots Useful?
32:14
Pros
32:15
Cons
32:59
How to Determine Outlier Status
33:24
Rule of Thumb: Upper Limit
33:25
Rule of Thumb: Lower Limit
34:16
Signal Outliers in an Excel Data File Using Conditional Formatting
34:52
Modified Boxplot
48:38
Modified Boxplot
48:39
Example 1: Percentage Values & Lower and Upper Whisker
49:10
Example 2: Boxplot
50:10
Example 3: Estimating IQR From Boxplot
53:46
Example 4: Boxplot and Missing Whisker
54:35
Shape: Calculating Skewness & Kurtosis

41m 51s

Intro
0:00
Skewness Concept
1:09
Skewness Concept
1:10
Calculating Skewness
3:26
Calculating Skewness
3:27
Interpreting Skewness
7:36
Interpreting Skewness
7:37
Excel Example
8:49
Kurtosis Concept
20:29
Kurtosis Concept
20:30
Calculating Kurtosis
24:17
Calculating Kurtosis
24:18
Interpreting Kurtosis
29:01
Leptokurtic
29:35
Mesokurtic
30:10
Platykurtic
31:06
Excel Example
32:04
Example 1: Shape of Distribution
38:28
Example 2: Shape of Distribution
39:29
Example 3: Shape of Distribution
40:14
Example 4: Kurtosis
41:10
Normal Distribution

34m 33s

Intro
0:00
What is a Normal Distribution
0:44
The Normal Distribution As a Theoretical Model
0:45
Possible Range of Probabilities
3:05
Possible Range of Probabilities
3:06
What is a Normal Distribution
5:07
Can Be Described By
5:08
Properties
5:49
'Same' Shape: Illusion of Different Shape!
7:35
'Same' Shape: Illusion of Different Shape!
7:36
Types of Problems
13:45
Example: Distribution of SAT Scores
13:46
Shape Analogy
19:48
Shape Analogy
19:49
Example 1: The Standard Normal Distribution and Z-Scores
22:34
Example 2: The Standard Normal Distribution and Z-Scores
25:54
Example 3: Sketching and Normal Distribution
28:55
Example 4: Sketching and Normal Distribution
32:32
Standard Normal Distributions & Z-Scores

41m 44s

Intro
0:00
A Family of Distributions
0:28
Infinite Set of Distributions
0:29
Transforming Normal Distributions to 'Standard' Normal Distribution
1:04
Normal Distribution vs. Standard Normal Distribution
2:58
Normal Distribution vs. Standard Normal Distribution
2:59
Z-Score, Raw Score, Mean, & SD
4:08
Z-Score, Raw Score, Mean, & SD
4:09
Weird Z-Scores
9:40
Weird Z-Scores
9:41
Excel
16:45
For Normal Distributions
16:46
For Standard Normal Distributions
19:11
Excel Example
20:24
Types of Problems
25:18
Percentage Problem: P(x)
25:19
Raw Score and Z-Score Problems
26:28
Standard Deviation Problems
27:01
Shape Analogy
27:44
Shape Analogy
27:45
Example 1: Deaths Due to Heart Disease vs. Deaths Due to Cancer
28:24
Example 2: Heights of Male College Students
33:15
Example 3: Mean and Standard Deviation
37:14
Example 4: Finding Percentage of Values in a Standard Normal Distribution
37:49
Normal Distribution: PDF vs. CDF

55m 44s

Intro
0:00
Frequency vs. Cumulative Frequency
0:56
Frequency vs. Cumulative Frequency
0:57
Frequency vs. Cumulative Frequency
4:32
Frequency vs. Cumulative Frequency Cont.
4:33
Calculus in Brief
6:21
Derivative-Integral Continuum
6:22
PDF
10:08
PDF for Standard Normal Distribution
10:09
PDF for Normal Distribution
14:32
Integral of PDF = CDF
21:27
Integral of PDF = CDF
21:28
Example 1: Cumulative Frequency Graph
23:31
Example 2: Mean, Standard Deviation, and Probability
24:43
Example 3: Mean and Standard Deviation
35:50
Example 4: Age of Cars
49:32
Section 5: Linear Regression
Scatterplots

47m 19s

Intro
0:00
Previous Visualizations
0:30
Frequency Distributions
0:31
Compare & Contrast
2:26
Frequency Distributions Vs. Scatterplots
2:27
Summary Values
4:53
Shape
4:54
Center & Trend
6:41
Univariate & Bivariate
10:25
Example Scatterplot
10:48
Shape, Trend, and Strength
10:49
Positive and Negative Association
14:05
Positive and Negative Association
14:06
Linearity, Strength, and Consistency
18:30
Linearity
18:31
Strength
19:14
Consistency
20:40
Summarizing a Scatterplot
22:58
Summarizing a Scatterplot
22:59
Example 1: Gapminder.org, Income x Life Expectancy
26:32
Example 2: Gapminder.org, Income x Infant Mortality
36:12
Example 3: Trend and Strength of Variables
40:14
Example 4: Trend, Strength and Shape for Scatterplots
43:27
Regression

32m 2s

Intro
0:00
Linear Equations
0:34
Linear Equations: y = mx + b
0:35
Rough Line
5:16
Rough Line
5:17
Regression - A 'Center' Line
7:41
Reasons for Summarizing with a Regression Line
7:42
Predictor and Response Variable
10:04
Goal of Regression
12:29
Goal of Regression
12:30
Prediction
14:50
Example: Servings of Milk Per Year Shown By Age
14:51
Interpolation
17:06
Extrapolation
17:58
Error in Prediction
20:34
Prediction Error
20:35
Residual
21:40
Example 1: Residual
23:34
Example 2: Large and Negative Residual
26:30
Example 3: Positive Residual
28:13
Example 4: Interpret Regression Line & Extrapolate
29:40
Least Squares Regression

56m 36s

Intro
0:00
Best Fit
0:47
Best Fit
0:48
Sum of Squared Errors (SSE)
1:50
Sum of Squared Errors (SSE)
1:51
Why Squared?
3:38
Why Squared?
3:39
Quantitative Properties of Regression Line
4:51
Quantitative Properties of Regression Line
4:52
So How do we Find Such a Line?
6:49
SSEs of Different Line Equations & Lowest SSE
6:50
Carl Gauss' Method
8:01
How Do We Find Slope (b1)
11:00
How Do We Find Slope (b1)
11:01
How Do We Find Intercept
15:11
How Do We Find Intercept
15:12
Example 1: Which of These Equations Fit the Above Data Best?
17:18
Example 2: Find the Regression Line for These Data Points and Interpret It
26:31
Example 3: Summarize the Scatterplot and Find the Regression Line.
34:31
Example 4: Examine the Mean of Residuals
43:52
Correlation

43m 58s

Intro
0:00
Summarizing a Scatterplot Quantitatively
0:47
Shape
0:48
Trend
1:11
Strength: Correlation (r)
1:45
Correlation Coefficient (r)
2:30
Correlation Coefficient (r)
2:31
Trees vs. Forest
11:59
Trees vs. Forest
12:00
Calculating r
15:07
Average Product of z-scores for x and y
15:08
Relationship between Correlation and Slope
21:10
Relationship between Correlation and Slope
21:11
Example 1: Find the Correlation between Grams of Fat and Cost
24:11
Example 2: Relationship between r and b1
30:24
Example 3: Find the Regression Line
33:35
Example 4: Find the Correlation Coefficient for this Set of Data
37:37
Correlation: r vs. r-squared

52m 52s

Intro
0:00
R-squared
0:44
What is the Meaning of It? Why Squared?
0:45
Parsing Sum of Squares (Parsing Variability)
2:25
SST = SSR + SSE
2:26
What is SST and SSE?
7:46
What is SST and SSE?
7:47
r-squared
18:33
Coefficient of Determination
18:34
If the Correlation is Strong…
20:25
If the Correlation is Strong…
20:26
If the Correlation is Weak…
22:36
If the Correlation is Weak…
22:37
Example 1: Find r-squared for this Set of Data
23:56
Example 2: What Does it Mean that the Simple Linear Regression is a 'Model' of Variance?
33:54
Example 3: Why Does r-squared Only Range from 0 to 1
37:29
Example 4: Find the r-squared for This Set of Data
39:55
Transformations of Data

27m 8s

Intro
0:00
Why Transform?
0:26
Why Transform?
0:27
Shape-preserving vs. Shape-changing Transformations
5:14
Shape-preserving = Linear Transformations
5:15
Shape-changing Transformations = Non-linear Transformations
6:20
Common Shape-Preserving Transformations
7:08
Common Shape-Preserving Transformations
7:09
Common Shape-Changing Transformations
8:59
Powers
9:00
Logarithms
9:39
Change Just One Variable? Both?
10:38
Log-log Transformations
10:39
Log Transformations
14:38
Example 1: Create, Graph, and Transform the Data Set
15:19
Example 2: Create, Graph, and Transform the Data Set
20:08
Example 3: What Kind of Model would You Choose for this Data?
22:44
Example 4: Transformation of Data
25:46
Section 6: Collecting Data in an Experiment
Sampling & Bias

54m 44s

Intro
0:00
Descriptive vs. Inferential Statistics
1:04
Descriptive Statistics: Data Exploration
1:05
Example
2:03
To tackle Generalization…
4:31
Generalization
4:32
Sampling
6:06
'Good' Sample
6:40
Defining Samples and Populations
8:55
Population
8:56
Sample
11:16
Why Use Sampling?
13:09
Why Use Sampling?
13:10
Goal of Sampling: Avoiding Bias
15:04
What is Bias?
15:05
Where does Bias Come from: Sampling Bias
17:53
Where does Bias Come from: Response Bias
18:27
Sampling Bias: Bias from 'Bad' Sampling Methods
19:34
Size Bias
19:35
Voluntary Response Bias
21:13
Convenience Sample
22:22
Judgment Sample
23:58
Response Bias: Bias from 'Bad' Data Collection Methods
28:00
Nonresponse Bias
29:31
Questionnaire Bias
31:10
Incorrect Response or Measurement Bias
37:32
Example 1: What Kind of Biases?
40:29
Example 2: What Biases Might Arise?
44:46
Example 3: What Kind of Biases?
48:34
Example 4: What Kind of Biases?
51:43
Sampling Methods

14m 25s

Intro
0:00
Biased vs. Unbiased Sampling Methods
0:32
Biased Sampling
0:33
Unbiased Sampling
1:13
Probability Sampling Methods
2:31
Simple Random
2:54
Stratified Random Sampling
4:06
Cluster Sampling
5:24
Two-staged Sampling
6:22
Systematic Sampling
7:25
Example 2: Describe How to Take a Two-Stage Sample from this Book
10:16
Example 3: Sampling Methods
11:58
Example 4: Cluster Sample Plan
12:48
Research Design

53m 54s

Intro
0:00
Descriptive vs. Inferential Statistics
0:51
Descriptive Statistics: Data Exploration
0:52
Inferential Statistics
1:02
Variables and Relationships
1:44
Variables
1:45
Relationships
2:49
Not Every Type of Study is an Experiment…
4:16
Category I - Descriptive Study
4:54
Category II - Correlational Study
5:50
Category III - Experimental, Quasi-experimental, Non-experimental
6:33
Category III
7:42
Experimental, Quasi-experimental, and Non-experimental
7:43
Why CAN'T the Other Strategies Determine Causation?
10:18
Third-variable Problem
10:19
Directionality Problem
15:49
What Makes Experiments Special?
17:54
Manipulation
17:55
Control (and Comparison)
21:58
Methods of Control
26:38
Holding Constant
26:39
Matching
29:11
Random Assignment
31:48
Experiment Terminology
34:09
'True' Experiment vs. Study
34:10
Independent Variable (IV)
35:16
Dependent Variable (DV)
35:45
Factors
36:07
Treatment Conditions
36:23
Levels
37:43
Confounds or Extraneous Variables
38:04
Blind
38:38
Blind Experiments
38:39
Double-blind Experiments
39:29
How Categories Relate to Statistics
41:35
Category I - Descriptive Study
41:36
Category II - Correlational Study
42:05
Category III - Experimental, Quasi-experimental, Non-experimental
42:43
Example 1: Research Design
43:50
Example 2: Research Design
47:37
Example 3: Research Design
50:12
Example 4: Research Design
52:00
Between and Within Treatment Variability

41m 31s

Intro
0:00
Experimental Designs
0:51
Experimental Designs: Manipulation & Control
0:52
Two Types of Variability
2:09
Between Treatment Variability
2:10
Within Treatment Variability
3:31
Updated Goal of Experimental Design
5:47
Updated Goal of Experimental Design
5:48
Example: Drugs and Driving
6:56
Example: Drugs and Driving
6:57
Different Types of Random Assignment
11:27
All Experiments
11:28
Completely Random Design
12:02
Randomized Block Design
13:19
Randomized Block Design
15:48
Matched Pairs Design
15:49
Repeated Measures Design
19:47
Between-subject Variable vs. Within-subject Variable
22:43
Completely Randomized Design
22:44
Repeated Measures Design
25:03
Example 1: Design a Completely Random, Matched Pair, and Repeated Measures Experiment
26:16
Example 2: Block Design
31:41
Example 3: Completely Randomized Designs
35:11
Example 4: Completely Random, Matched Pairs, or Repeated Measures Experiments?
39:01
Section 7: Review of Probability Axioms
Sample Spaces

37m 52s

Intro
0:00
Why is Probability Involved in Statistics
0:48
Probability
0:49
Can People Tell the Difference between Cheap and Gourmet Coffee?
2:08
Taste Test with Coffee Drinkers
3:37
If No One can Actually Taste the Difference
3:38
If Everyone can Actually Taste the Difference
5:36
Creating a Probability Model
7:09
Creating a Probability Model
7:10
D'Alembert vs. Necker
9:41
D'Alembert vs. Necker
9:42
Problem with D'Alembert's Model
13:29
Problem with D'Alembert's Model
13:30
Covering Entire Sample Space
15:08
Fundamental Principle of Counting
15:09
Where Do Probabilities Come From?
22:54
Observed Data, Symmetry, and Subjective Estimates
22:55
Checking whether Model Matches Real World
24:27
Law of Large Numbers
24:28
Example 1: Law of Large Numbers
27:46
Example 2: Possible Outcomes
30:43
Example 3: Brands of Coffee and Taste
33:25
Example 4: How Many Different Treatments are there?
35:33

20m 29s

Intro
0:00
Disjoint Events
0:41
Disjoint Events
0:42
Meaning of 'or'
2:39
In Regular Life
2:40
In Math/Statistics/Computer Science
3:10
If A and B are Disjoint: P (A and B)
3:56
If A and B are Disjoint: P (A or B)
5:15
If A and B are not Disjoint: P (A or B)
8:32
Example 1: Which of These are Mutually Exclusive?
10:50
Example 2: What is the Probability that You will Have a Combination of One Heads and Two Tails?
12:57
Example 3: Engagement Party
15:17
Example 4: Homeowner's Insurance
18:30
Conditional Probability

57m 19s

Intro
0:00
'or' vs. 'and' vs. Conditional Probability
1:07
'or' vs. 'and' vs. Conditional Probability
1:08
'and' vs. Conditional Probability
5:57
P (M or L)
5:58
P (M and L)
8:41
P (M|L)
11:04
P (L|M)
12:24
Tree Diagram
15:02
Tree Diagram
15:03
Defining Conditional Probability
22:42
Defining Conditional Probability
22:43
Common Contexts for Conditional Probability
30:56
Medical Testing: Positive Predictive Value
30:57
Medical Testing: Sensitivity
33:03
Statistical Tests
34:27
Example 1: Drug and Disease
36:41
Example 2: Marbles and Conditional Probability
40:04
Example 3: Cards and Conditional Probability
45:59
Example 4: Votes and Conditional Probability
50:21
Independent Events

24m 27s

Intro
0:00
Independent Events & Conditional Probability
0:26
Non-independent Events
0:27
Independent Events
2:00
Non-independent and Independent Events
3:08
Non-independent and Independent Events
3:09
Defining Independent Events
5:52
Defining Independent Events
5:53
Multiplication Rule
7:29
Previously…
7:30
But with Independent Events
8:53
Example 1: Which of These Pairs of Events are Independent?
11:12
Example 2: Health Insurance and Probability
15:12
Example 3: Independent Events
17:42
Example 4: Independent Events
20:03
Section 8: Probability Distributions
Introduction to Probability Distributions

56m 45s

Intro
0:00
Sampling vs. Probability
0:57
Sampling
0:58
Missing
1:30
What is Missing?
3:06
Insight: Probability Distributions
5:26
Insight: Probability Distributions
5:27
What is a Probability Distribution?
7:29
From Sample Spaces to Probability Distributions
8:44
Sample Space
8:45
Probability Distribution of the Sum of Two Dice
11:16
The Random Variable
17:43
The Random Variable
17:44
Expected Value
21:52
Expected Value
21:53
Example 1: Probability Distributions
28:45
Example 2: Probability Distributions
35:30
Example 3: Probability Distributions
43:37
Example 4: Probability Distributions
47:20
Expected Value & Variance of Probability Distributions

53m 41s

Intro
0:00
Discrete vs. Continuous Random Variables
1:04
Discrete vs. Continuous Random Variables
1:05
Mean and Variance Review
4:44
Mean: Sample, Population, and Probability Distribution
4:45
Variance: Sample, Population, and Probability Distribution
9:12
Example Situation
14:10
Example Situation
14:11
Some Special Cases…
16:13
Some Special Cases…
16:14
Linear Transformations
19:22
Linear Transformations
19:23
What Happens to Mean and Variance of the Probability Distribution?
20:12
n Independent Values of X
25:38
n Independent Values of X
25:39
Compare These Two Situations
30:56
Compare These Two Situations
30:57
Two Random Variables, X and Y
32:02
Two Random Variables, X and Y
32:03
Example 1: Expected Value & Variance of Probability Distributions
35:35
Example 2: Expected Values & Standard Deviation
44:17
Example 3: Expected Winnings and Standard Deviation
48:18
Binomial Distribution

55m 15s

Intro
0:00
Discrete Probability Distributions
1:42
Discrete Probability Distributions
1:43
Binomial Distribution
2:36
Binomial Distribution
2:37
Multiplicative Rule Review
6:54
Multiplicative Rule Review
6:55
How Many Outcomes with k 'Successes'
10:23
Adults and Bachelor's Degree: Manual List of Outcomes
10:24
P (X=k)
19:37
Putting Together # of Outcomes with the Multiplicative Rule
19:38
Expected Value and Standard Deviation in a Binomial Distribution
25:22
Expected Value and Standard Deviation in a Binomial Distribution
25:23
Example 1: Coin Toss
33:42
Example 3: Types of Blood and Probability
45:39
Example 4: Expected Number and Standard Deviation
51:11
Section 9: Sampling Distributions of Statistics
Introduction to Sampling Distributions

48m 17s

Intro
0:00
Probability Distributions vs. Sampling Distributions
0:55
Probability Distributions vs. Sampling Distributions
0:56
Same Logic
3:55
Logic of Probability Distribution
3:56
Example: Rolling Two Dice
6:56
Simulating Samples
9:53
To Come Up with Probability Distributions
9:54
In Sampling Distributions
11:12
Connecting Sampling and Research Methods with Sampling Distributions
12:11
Connecting Sampling and Research Methods with Sampling Distributions
12:12
Simulating a Sampling Distribution
14:14
Experimental Design: Regular Sleep vs. Less Sleep
14:15
Logic of Sampling Distributions
23:08
Logic of Sampling Distributions
23:09
General Method of Simulating Sampling Distributions
25:38
General Method of Simulating Sampling Distributions
25:39
Questions that Remain
28:45
Questions that Remain
28:46
Example 1: Mean and Standard Error of Sampling Distribution
30:57
Example 2: What is the Best Way to Describe Sampling Distributions?
37:12
Example 3: Matching Sampling Distributions
38:21
Example 4: Mean and Standard Error of Sampling Distribution
41:51
Sampling Distribution of the Mean

1h 8m 48s

Intro
0:00
Special Case of General Method for Simulating a Sampling Distribution
1:53
Special Case of General Method for Simulating a Sampling Distribution
1:54
Computer Simulation
3:43
Using Simulations to See Principles behind Shape of SDoM
15:50
Using Simulations to See Principles behind Shape of SDoM
15:51
Conditions
17:38
Using Simulations to See Principles behind Center (Mean) of SDoM
20:15
Using Simulations to See Principles behind Center (Mean) of SDoM
20:16
Conditions: Does n Matter?
21:31
Conditions: Does the Number of Simulations Matter?
24:37
Using Simulations to See Principles behind Standard Deviation of SDoM
27:13
Using Simulations to See Principles behind Standard Deviation of SDoM
27:14
Conditions: Does n Matter?
34:45
Conditions: Does the Number of Simulations Matter?
36:24
Central Limit Theorem
37:13
SHAPE
38:08
CENTER
39:34
Comparing Population, Sample, and SDoM
43:10
Comparing Population, Sample, and SDoM
43:11
What Happens When We Don't Know What the Population Looks Like?
48:25
Can We Have Sampling Distributions for Summary Statistics Other than the Mean?
49:42
How Do We Know whether a Sample is Sufficiently Unlikely?
53:36
Do We Always Have to Simulate a Large Number of Samples in Order to get a Sampling Distribution?
54:40
Example 1: Mean Batting Average
55:25
Example 2: Mean Sampling Distribution and Standard Error
59:07
Example 3: Sampling Distribution of the Mean
1:01:04
Sampling Distribution of Sample Proportions

54m 37s

Intro
0:00
Intro to Sampling Distribution of Sample Proportions (SDoSP)
0:51
Categorical Data (Examples)
0:52
Wish to Estimate Proportion of Population from Sample…
2:00
Notation
3:34
Population Proportion and Sample Proportion Notations
3:35
What's the Difference?
9:19
SDoM vs. SDoSP: Type of Data
9:20
SDoM vs. SDoSP: Shape
11:24
SDoM vs. SDoSP: Center
12:30
Binomial Distribution vs. Sampling Distribution of Sample Proportions
19:14
Binomial Distribution vs. SDoSP: Type of Data
19:17
Binomial Distribution vs. SDoSP: Shape
21:07
Binomial Distribution vs. SDoSP: Center
21:43
Example 1: Sampling Distribution of Sample Proportions
26:07
Example 2: Sampling Distribution of Sample Proportions
37:58
Example 3: Sampling Distribution of Sample Proportions
44:42
Example 4: Sampling Distribution of Sample Proportions
45:57
Section 10: Inferential Statistics
Introduction to Confidence Intervals

42m 53s

Intro
0:00
Inferential Statistics
0:50
Inferential Statistics
0:51
Two Problems with This Picture…
3:20
Two Problems with This Picture…
3:21
Solution: Confidence Intervals (CI)
4:59
Solution: Hypothesis Testing (HT)
5:49
Which Parameters are Known?
6:45
Which Parameters are Known?
6:46
Confidence Interval - Goal
7:56
When We Don't Know μ but Know σ
7:57
When We Don't Know
18:27
When We Don't Know μ nor σ
18:28
Example 1: Confidence Intervals
26:18
Example 2: Confidence Intervals
29:46
Example 3: Confidence Intervals
32:18
Example 4: Confidence Intervals
38:31
t Distributions

1h 2m 6s

Intro
0:00
When to Use z vs. t?
1:07
When to Use z vs. t?
1:08
What is z and t?
3:02
z-score and t-score: Commonality
3:03
z-score and t-score: Formulas
3:34
z-score and t-score: Difference
5:22
Why not z? (Why t?)
7:24
Why not z? (Why t?)
7:25
But Don't Worry!
15:13
Gosset and t-distributions
15:14
Rules of t Distributions
17:05
t-distributions are More Normal as n Gets Bigger
17:06
t-distributions are a Family of Distributions
18:55
Degrees of Freedom (df)
20:02
Degrees of Freedom (df)
20:03
t Family of Distributions
24:07
t Family of Distributions: df = 2, 4, and 60
24:08
df = 60
29:16
df = 2
29:59
How to Find It?
31:01
'Student's t-distribution' or 't-distribution'
31:02
Excel Example
33:06
Example 1: Which Distribution Do You Use? Z or t?
45:26
Example 3: t Distributions
52:15
Example 4: t Distributions, Confidence Interval, and Mean
55:59
Introduction to Hypothesis Testing

1h 6m 33s

Intro
0:00
Issues to Overcome in Inferential Statistics
1:35
Issues to Overcome in Inferential Statistics
1:36
What Happens When We Don't Know What the Population Looks Like?
2:57
How Do We Know whether a Sample is Sufficiently Unlikely?
3:43
Hypothesizing a Population
6:44
Hypothesizing a Population
6:45
Null Hypothesis
8:07
Alternative Hypothesis
8:56
Hypotheses
11:58
Hypotheses
11:59
Errors in Hypothesis Testing
14:22
Errors in Hypothesis Testing
14:23
Steps of Hypothesis Testing
21:15
Steps of Hypothesis Testing
21:16
Single Sample HT (When Sigma Available)
26:08
Step 1
27:08
Step 2
27:58
Step 3
28:17
Step 4
32:18
Single Sample HT (When Sigma Not Available)
36:33
Step 1: Hypothesis Testing
36:58
Step 2: Significance Level
37:25
Step 3: Decision Stage
37:40
Step 4: Sample
41:36
Sigma and p-value
45:04
Sigma and p-value
45:05
One-Tailed vs. Two-Tailed Hypotheses
45:51
Example 1: Hypothesis Testing
48:37
Example 2: Heights of Women in the US
57:43
Example 3: Select the Best Way to Complete This Sentence
1:03:23
Confidence Intervals for the Difference of Two Independent Means

55m 14s

Intro
0:00
One Mean vs. Two Means
1:17
One Mean vs. Two Means
1:18
Notation
2:41
A Sample! A Set!
2:42
Mean of X, Mean of Y, and Difference of Two Means
3:56
SE of X
4:34
SE of Y
6:28
Sampling Distribution of the Difference between Two Means (SDoD)
7:48
Sampling Distribution of the Difference between Two Means (SDoD)
7:49
Rules of the SDoD (similar to CLT!)
15:00
Mean for the SDoD Null Hypothesis
15:01
Standard Error
17:39
When can We Construct a CI for the Difference between Two Means?
21:28
Three Conditions
21:29
Finding CI
23:56
One Mean CI
23:57
Two Means CI
25:45
Finding t
29:16
Finding t
29:17
Interpreting CI
30:25
Interpreting CI
30:26
Better Estimate of s (s pool)
34:15
Better Estimate of s (s pool)
34:16
Example 1: Confidence Intervals
42:32
Example 2: SE of the Difference
52:36
Hypothesis Testing for the Difference of Two Independent Means

50m

Intro
0:00
The Goal of Hypothesis Testing
0:56
One Sample and Two Samples
0:57
Sampling Distribution of the Difference between Two Means (SDoD)
3:42
Sampling Distribution of the Difference between Two Means (SDoD)
3:43
Rules of the SDoD (Similar to CLT!)
6:46
Shape
6:47
Mean for the Null Hypothesis
7:26
Standard Error for Independent Samples (When Variance is Homogeneous)
8:18
Standard Error for Independent Samples (When Variance is not Homogeneous)
9:25
Same Conditions for HT as for CI
10:08
Three Conditions
10:09
Steps of Hypothesis Testing
11:04
Steps of Hypothesis Testing
11:05
Formulas that Go with Steps of Hypothesis Testing
13:21
Step 1
13:25
Step 2
14:18
Step 3
15:00
Step 4
16:57
Example 1: Hypothesis Testing for the Difference of Two Independent Means
18:47
Example 2: Hypothesis Testing for the Difference of Two Independent Means
33:55
Example 3: Hypothesis Testing for the Difference of Two Independent Means
44:22
Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

1h 14m 11s

Intro
0:00
The Goal of Hypothesis Testing
1:27
One Sample and Two Samples
1:28
Independent Samples vs. Paired Samples
3:16
Independent Samples vs. Paired Samples
3:17
Which is Which?
5:20
Independent SAMPLES vs. Independent VARIABLES
7:43
Independent SAMPLES vs. Independent VARIABLES
7:44
T-tests Always…
10:48
T-tests Always…
10:49
Notation for Paired Samples
12:59
Notation for Paired Samples
13:00
Steps of Hypothesis Testing for Paired Samples
16:13
Steps of Hypothesis Testing for Paired Samples
16:14
Rules of the SDoD (Adding on Paired Samples)
18:03
Shape
18:04
Mean for the Null Hypothesis
18:31
Standard Error for Independent Samples (When Variance is Homogeneous)
19:25
Standard Error for Paired Samples
20:39
Formulas that go with Steps of Hypothesis Testing
22:59
Formulas that go with Steps of Hypothesis Testing
23:00
Confidence Intervals for Paired Samples
30:32
Confidence Intervals for Paired Samples
30:33
Example 1: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
32:28
Example 2: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
44:02
Example 3: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
52:23
Type I and Type II Errors

31m 27s

Intro
0:00
0:18
0:19
Errors and Relationship to HT and the Sample Statistic?
1:11
Errors and Relationship to HT and the Sample Statistic?
1:12
7:00
One Sample t-test: Friends on Facebook
7:01
Two Sample t-test: Friends on Facebook
13:46
Usually, Lots of Overlap between Null and Alternative Distributions
16:59
Overlap between Null and Alternative Distributions
17:00
How Distributions and 'Box' Fit Together
22:45
How Distributions and 'Box' Fit Together
22:46
Example 1: Types of Errors
25:54
Example 2: Types of Errors
27:30
Example 3: What is the Danger of the Type I Error?
29:38
Effect Size & Power

44m 41s

Intro
0:00
0:05
0:06
Distance between Distributions: Sample t
0:49
Distance between Distributions: Sample t
0:50
Problem with Distance in Terms of Standard Error
2:56
Problem with Distance in Terms of Standard Error
2:57
Test Statistic (t) vs. Effect Size (d or g)
4:38
Test Statistic (t) vs. Effect Size (d or g)
4:39
Rules of Effect Size
6:09
Rules of Effect Size
6:10
Why Do We Need Effect Size?
8:21
Tells You the Practical Significance
8:22
HT can be Deceiving…
10:25
Important Note
10:42
What is Power?
11:20
What is Power?
11:21
Why Do We Need Power?
14:19
Conditional Probability and Power
14:20
Power is:
16:27
Can We Calculate Power?
19:00
Can We Calculate Power?
19:01
How Does Alpha Affect Power?
20:36
How Does Alpha Affect Power?
20:37
How Does Effect Size Affect Power?
25:38
How Does Effect Size Affect Power?
25:39
How Does Variability and Sample Size Affect Power?
27:56
How Does Variability and Sample Size Affect Power?
27:57
How Do We Increase Power?
32:47
Increasing Power
32:48
Example 1: Effect Size & Power
35:40
Example 2: Effect Size & Power
37:38
Example 3: Effect Size & Power
40:55
Section 11: Analysis of Variance
F-distributions

24m 46s

Intro
0:00
0:04
0:05
Z- & T-statistic and Their Distribution
0:34
Z- & T-statistic and Their Distribution
0:35
F-statistic
4:55
The F Ratio (the Variance Ratio)
4:56
F-distribution
12:29
F-distribution
12:30
s and p-value
15:00
s and p-value
15:01
Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity?
18:33
Example 2: F-distributions
19:29
Example 3: F-distributions and Heights
21:29
ANOVA with Independent Samples

1h 9m 25s

Intro
0:00
0:05
0:06
The Limitations of t-tests
1:12
The Limitations of t-tests
1:13
Two Major Limitations of Many t-tests
3:26
Two Major Limitations of Many t-tests
3:27
Ronald Fisher's Solution… F-test! New Null Hypothesis
4:43
Ronald Fisher's Solution… F-test! New Null Hypothesis (Omnibus Test - One Test to Rule Them All!)
4:44
Analysis of Variance (ANoVA) Notation
7:47
Analysis of Variance (ANoVA) Notation
7:48
Partitioning (Analyzing) Variance
9:58
Total Variance
9:59
Within-group Variation
14:00
Between-group Variation
16:22
Time out: Review Variance & SS
17:05
Time out: Review Variance & SS
17:06
F-statistic
19:22
The F Ratio (the Variance Ratio)
19:23
S²bet = SSbet / dfbet
22:13
What is This?
22:14
How Many Means?
23:20
So What is the dfbet?
23:38
So What is SSbet?
24:15
S²w = SSw / dfw
26:05
What is This?
26:06
How Many Means?
27:20
So What is the dfw?
27:36
So What is SSw?
28:18
Chart of Independent Samples ANOVA
29:25
Chart of Independent Samples ANOVA
29:26
Example 1: Who Uploads More Photos: Unknown Ethnicity, Latino, Asian, Black, or White Facebook Users?
35:52
Hypotheses
35:53
Significance Level
39:40
Decision Stage
40:05
Calculate Samples' Statistic and p-Value
44:10
Reject or Fail to Reject H0
55:54
Example 2: ANOVA with Independent Samples
58:21
Repeated Measures ANOVA

1h 15m 13s

Intro
0:00
0:05
0:06
The Limitations of t-tests
0:36
Who Uploads more Pictures and Which Photo-Type is Most Frequently Used on Facebook?
0:37
ANOVA (F-test) to the Rescue!
5:49
Omnibus Hypothesis
5:50
Analyze Variance
7:27
Independent Samples vs. Repeated Measures
9:12
Same Start
9:13
Independent Samples ANOVA
10:43
Repeated Measures ANOVA
12:00
Independent Samples ANOVA
16:00
Same Start: All the Variance Around Grand Mean
16:01
Independent Samples
16:23
Repeated Measures ANOVA
18:18
Same Start: All the Variance Around Grand Mean
18:19
Repeated Measures
18:33
Repeated Measures F-statistic
21:22
The F Ratio (The Variance Ratio)
21:23
S²bet = SSbet / dfbet
23:07
What is This?
23:08
How Many Means?
23:39
So What is the dfbet?
23:54
So What is SSbet?
24:32
S² resid = SS resid / df resid
25:46
What is This?
25:47
So What is SS resid?
26:44
So What is the df resid?
27:36
SS subj and df subj
28:11
What is This?
28:12
How Many Subject Means?
29:43
So What is df subj?
30:01
So What is SS subj?
30:09
SS total and df total
31:42
What is This?
31:43
What is the Total Number of Data Points?
32:02
So What is df total?
32:34
So What is SS total?
32:47
Chart of Repeated Measures ANOVA
33:19
Chart of Repeated Measures ANOVA: F and Between-samples Variability
33:20
Chart of Repeated Measures ANOVA: Total Variability, Within-subject (case) Variability, Residual Variability
35:50
Example 1: Which is More Prevalent on Facebook: Tagged, Uploaded, Mobile, or Profile Photos?
40:25
Hypotheses
40:26
Significance Level
41:46
Decision Stage
42:09
Calculate Samples' Statistic and p-Value
46:18
Reject or Fail to Reject H0
57:55
Example 2: Repeated Measures ANOVA
58:57
Example 3: What's the Problem with a Bunch of Tiny t-tests?
1:13:59
Section 12: Chi-square Test
Chi-Square Goodness-of-Fit Test

58m 23s

Intro
0:00
0:05
0:06
Where Does the Chi-Square Test Belong?
0:50
Where Does the Chi-Square Test Belong?
0:51
A New Twist on HT: Goodness-of-Fit
7:23
HT in General
7:24
Goodness-of-Fit HT
8:26
12:17
Null Hypothesis
12:18
Alternative Hypothesis
13:23
Example
14:38
Chi-Square Statistic
17:52
Chi-Square Statistic
17:53
Chi-Square Distributions
24:31
Chi-Square Distributions
24:32
Conditions for Chi-Square
28:58
Condition 1
28:59
Condition 2
30:20
Condition 3
30:32
Condition 4
31:47
Example 1: Chi-Square Goodness-of-Fit Test
32:23
Example 2: Chi-Square Goodness-of-Fit Test
44:34
Example 3: Which of These Statements Describe Properties of the Chi-Square Goodness-of-Fit Test?
56:06
Chi-Square Test of Homogeneity

51m 36s

Intro
0:00
0:09
0:10
Goodness-of-Fit vs. Homogeneity
1:13
Goodness-of-Fit HT
1:14
Homogeneity
2:00
Analogy
2:38
5:00
Null Hypothesis
5:01
Alternative Hypothesis
6:11
Example
6:33
Chi-Square Statistic
10:12
Same as Goodness-of-Fit Test
10:13
Set Up Data
12:28
Setting Up Data Example
12:29
Expected Frequency
16:53
Expected Frequency
16:54
Chi-Square Distributions & df
19:26
Chi-Square Distributions & df
19:27
Conditions for Test of Homogeneity
20:54
Condition 1
20:55
Condition 2
21:39
Condition 3
22:05
Condition 4
22:23
Example 1: Chi-Square Test of Homogeneity
22:52
Example 2: Chi-Square Test of Homogeneity
32:10
Section 13: Overview of Statistics
Overview of Statistics

18m 11s

Intro
0:00
0:07
0:08
The Statistical Tests (HT) We've Covered
0:28
The Statistical Tests (HT) We've Covered
0:29
Organizing the Tests We've Covered…
1:08
One Sample: Continuous DV and Categorical DV
1:09
Two Samples: Continuous DV and Categorical DV
5:41
More Than Two Samples: Continuous DV and Categorical DV
8:21
The Following Data: OK Cupid
10:10
The Following Data: OK Cupid
10:11
Example 1: Weird-MySpace-Angle Profile Photo
10:38
Example 2: Geniuses
12:30
Example 3: Promiscuous iPhone Users
13:37
Example 4: Women, Aging, and Messaging
16:07


### Five Number Summary & Boxplots


• Intro 0:00
• Summarizing Distributions 0:37
• 5 Number Summary
• Boxplot: Visualizing 5 Number Summary 3:37
• Boxplot: Visualizing 5 Number Summary
• Boxplots on Excel 9:01
• Using 'Stocks' and Using Stacked Columns
• Boxplots on Excel Example
• When are Boxplots Useful? 32:14
• Pros
• Cons
• How to Determine Outlier Status 33:24
• Rule of Thumb: Upper Limit
• Rule of Thumb: Lower Limit
• Signal Outliers in an Excel Data File Using Conditional Formatting
• Modified Boxplot 48:38
• Modified Boxplot
• Example 1: Percentage Values & Lower and Upper Whisker 49:10
• Example 2: Boxplot 50:10
• Example 3: Estimating IQR From Boxplot 53:46
• Example 4: Boxplot and Missing Whisker 54:35

### Transcription: Five Number Summary & Boxplots

Hi and welcome back to www.educator.com.

Today we are going to talk about the five number summary and box plots.

Now that you know a little bit about variability, we can talk about the 5 number summary, and the box plot is going to be a way of visualizing that 5 number summary.

We are going to talk about how to determine outliers, and we are going to use conditional formatting in Excel.

Finally, we are going to talk about modified box plots, which are box plots that already exclude the outliers.

There are 2 ways to summarize a distribution.

Usually, when we talk about shape, center, and spread, the mean is what we mean by center.

It could be the median, but a lot of times people use the mean.

When we talk about spread, we often mean the standard deviation rather than the other pair, the median and the interquartile range.

That is where the 5 number summary comes in.

The 5 number summary is a way of using the median and the interquartile range as a summary of how our distribution looks.

The 5 numbers are numbers you already know.

The minimum value; Q1, the border of the first quartile; and the median, which is Q2 and divides the entire distribution in half,

with two quartiles on one side and two quartiles on the other side;

then Q3, the third border, and the maximum value, the highest value in your distribution.

This 5 number summary is often used for skewed distributions.
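To make the definition concrete, here is a minimal Python sketch that computes the five numbers. It assumes one common hand-calculation convention: Q1 and Q3 are the medians of the lower and upper halves, and the overall median is left out when the count is odd. Textbooks and software differ on this, so other tools may report slightly different quartiles. The function name and the sample data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Q1 is the median of the lower half and Q3 is the median of the upper
    half; when the count is odd, the overall median is left out of both
    halves. Other conventions (Excel's QUARTILE, for one) can give
    slightly different quartiles.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so it is not counted in either half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical data, made up for illustration.
print(five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))
# -> (3, 6.0, 12, 16.0, 21)
```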

Let me show you visually what this looks like.

Let us say we have a whole bunch of different data points, and here is our maximum value.

Here is our distribution so far.

The 5 number summary basically says that the minimum value is important and the maximum value is important.

So is whatever value is in the middle.

Then Q1 and Q3 divide each of those halves in half again.

Here are the min, Q1, Q2 (the median), Q3, and max.

In that way we get a 5 number summary.

We may not know where every little dot in our distribution is, but we have a general idea of it.

Now we are going to talk quickly about a way to visualize that 5 number summary.

It is called the box plot.

It is also called the box and whiskers plot.

It is called box and whiskers because there is a box with whiskers on it; it could also be on its side like this.

Oftentimes this box is split somewhere in between, not necessarily in the middle, but somewhere like that.

This box is aligned with an axis showing the values of your variable.

Your Q1 is on the low end and Q3 is on the high end.

Those are the boundaries of the box.

The line in the middle is the median.

The whiskers end at the minimum value and the maximum value.

That is how you decipher a box plot.

It could also be on its side like this.

In this case the values go like this.

Once again you have Q1, Q3, the median, the min, and the max.

It does not matter whether the values are on this axis or that axis, but the box must be aligned with the values.

Let us do an example box plot here by hand.

Here is an example data set, and we can easily see the minimum and the maximum.

Let us find out what the median is.

The median should be right here.

That should be 30, because I am going to add 28 and 32 and divide by 2.

That just finds the average of those two numbers, which is 30.

Visually, I just think about what is in the middle of 28 and 32.

Let us find Q1 and Q3.

What is in between here and here?

That is 20.

We count the 18 as one border and 30 as the other border.

Q1 is right in here, so that should be 22; on the other side, 30 is one border and 55 is the other, so Q3 should be right here.

Here is Q3, and that should be about 36.5.

There we go: we have our 5 numbers, and we just need to plot them.

I am going to put it on a horizontal axis just because I have no room to draw it vertically.

I know that 18 will be on one side and 55 will be on the other side.

Here I will put my values.

We could pretend this stands for age.

Maybe I will start from 0 and divide this up into 10s:

10, 20, 30, 40, 50, 60.

Let us start plotting our box and whiskers.

What I like to do is put dots in first and then draw in the box and whiskers.

Here I put a dot at the end of one of my whiskers, and here is the other end.

Q1 is here, the median of 30 is right here, and 36.5 is right here.

I know where my boundaries are.

I will just draw my box and whiskers.

To help myself, I might write down where Q1 and Q3 are, because that is going to help me draw the box.

Here are my median, min, and max.

This is a box and whisker plot, or box plot, for the data set we have here.
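The sketch below is a rough text-only stand-in for the hand-drawn plot. It places the five numbers from the example (18, 22, 30, 36.5, 55) on a 0 to 60 axis, using one character per unit. The function `ascii_boxplot` and its symbols are illustrative assumptions, not a standard tool.

```python
def ascii_boxplot(mn, q1, med, q3, mx, lo=0, hi=60, width=61):
    """Draw a horizontal box plot as a single line of text.

    Values from [lo, hi] are mapped onto `width` character columns.
    '|' marks the whisker ends (min and max), '-' the whiskers,
    '[' and ']' the box edges (Q1 and Q3), '=' the box, and 'M' the median.
    """
    def col(x):
        # Nearest column; multiplying before dividing avoids an extra rounding step.
        return round((x - lo) * (width - 1) / (hi - lo))

    line = [" "] * width
    for c in range(col(mn), col(mx) + 1):
        line[c] = "-"              # whiskers span min..max
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="              # the box spans Q1..Q3
    line[col(mn)] = line[col(mx)] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(med)] = "M"
    return "".join(line).rstrip()


# The five numbers from the hand-drawn example, on a 0 to 60 axis.
print(ascii_boxplot(18, 22, 30, 36.5, 55))
```

Because each character is one unit, 36.5 has to land on a whole column. Python's `round` breaks that tie toward the even number, 36.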

Now let us talk about making a box plot in Excel.

Excel is not a great program for making box plots.

If you have Stata or another statistics program, those are better options.

Some of you will still want to make a box plot in Excel, so I am going to show you 2 ways of doing it.

One method uses a stock chart.

A stock chart was not originally designed for plotting quartiles, minimums, and maximums.

It is meant for stock prices, such as opening and closing prices.

We are going to piggyback on it in order to create our box plots.

The trouble with that one is that it does not let you show the median.

It ends up being a 4 number summary.

The other option uses stacked columns.

With this option you can show the median, but it takes a little more fiddling.

I am tricking Excel into showing you something that looks like a box plot.

Let us get started.

Go ahead and download the Excel file, and you will see a whole bunch of different sheets that are already set up for you.

Some of them, the ones whose names start with a and s, are pre-answered.

They already have the graphs and everything in them, but you can also follow along.

If you go to the height stock sheet, we are going to plot height on a stock chart.

The nice thing about box plots is that you can have several box plots in one visualization.

That makes it easy to compare a couple of 5 number summaries.

Let us go ahead and find the critical numbers we need for male and female height.

The thing about stock charts is that they need you to put the numbers in a particular order.

This is the order you need:

Q1, max, min, and Q3.

This one is hard for me to remember because it is very arbitrary.

But that is because we are piggybacking off stocks; the chart was not originally meant for this use.

Let us find the boundary of Q1.

To do that, Excel has a nice function called QUARTILE, and here is what you have to put into it.

The inputs it receives are the array, or your data set, and which quartile you would like.

In Excel, Q1 is 1, so you would put in QUARTILE(your data, 1).

Let us go ahead and do that.

Go to the data, all the way up to height.

There is male height, and I will color it in blue.

Go ahead and select that data, and then I am going to put in ,1).

Let us look at what we have here.

We have this data, 1.

Let us lock that data in place, because we do not need it to move anytime soon.

I am going to hit Enter.

That is for Q1.

To find the median, it would be Q2, or you could use MEDIAN.

To find the boundary for Q3, it would be the same thing with a 3.

Let us go ahead and copy and paste that function into my Q3 slot.

All I have to do is change which quartile I want it to give back to me.

There I have my first and third quartiles.

The nice thing about the QUARTILE function is that it can also find the min and max values, because it considers the min value Q0 and the max value Q4.

You could also use the functions MIN and MAX.
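Excel's QUARTILE(array, quart) accepts quart values 0 through 4, where 0 returns the minimum and 4 the maximum. To our understanding, it uses the inclusive method, the same as QUARTILE.INC: the k-th quartile sits at rank (n - 1)·k/4 in the sorted data, with linear interpolation between neighboring values. The Python sketch below mimics that behavior under that assumption. The function name and the heights are hypothetical, and the results can differ from the median-of-halves method used by hand.

```python
import math


def excel_quartile(values, quart):
    """Mimic Excel's QUARTILE(array, quart), assuming the inclusive method.

    quart = 0 returns the min, 1 returns Q1, 2 the median, 3 returns Q3,
    and 4 the max. The quartile sits at 0-based rank (n - 1) * quart / 4
    in the sorted data; fractional ranks are linearly interpolated.
    """
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3, or 4")
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    pos = (len(data) - 1) * quart / 4
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])


# Hypothetical heights in inches, unsorted on purpose.
heights = [70, 62, 75, 67, 64, 71, 68]
print([excel_quartile(heights, k) for k in range(5)])
# -> [62.0, 65.5, 68.0, 70.5, 75.0]
```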

Let us go ahead and use QUARTILE.

It makes it easier.

I can just copy and paste all of that in there.

I am going to double-click on the maximum and put in 4.

I am going to double-click on the min quartile and put in 0.

There we go.

We now have Q1, max, min, and Q3.

Let us find the female Q1.

Let us go ahead and find our data.

I am going to color it in pink.

I am going to put in ,1).

Let us see what we have here.

If you double-click on it, we have our data and 1 for Q1.

I am going to lock that data in place so I can easily copy and paste it all the way down.

Let us change it for our max value.

For our max value we need Q4.

For our min value we need Q0.

For Q3 we need 3.

Notice that for females the max is slightly smaller.

The min is slightly smaller.

So are Q1 and Q3, compared with their male counterparts.

Let us go ahead and put in our box and whiskers plot.

Go ahead and select the male and female part of this data set.

When you do that, Excel will interpret those headings as the labels for your box and whisker plots.

Click on Charts, and if you click on Stock you will see something that looks like a box and whisker plot.

You can go ahead and click on it, but some of you may get an error message that looks like this:

"To create a stock chart, arrange the data on your sheet in this order: opening price, high price, low price, closing price. Use dates or stock names as labels."

The opening price is Q1, the high price is the max, the low price is the min, and the closing price is Q3.
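The row order Excel asks for maps the summary onto stock prices: open is Q1, high is the max, low is the min, and close is Q3. The small Python sketch below lays out one column per group in that order. The group labels and numbers are made up for illustration, and `stock_chart_table` is a hypothetical helper, not an Excel feature.

```python
# Row order required by Excel's stock (open-high-low-close) chart,
# paired with the summary statistic each row stands in for.
STOCK_ORDER = [
    ("Q1 (open)", "q1"),
    ("max (high)", "max"),
    ("min (low)", "min"),
    ("Q3 (close)", "q3"),
]


def stock_chart_table(groups):
    """Lay out one column per group in open, high, low, close row order.

    `groups` maps a group label (e.g. "Male") to a dict with the keys
    "min", "q1", "median", "q3", and "max". The median is never used,
    because the stock chart has no slot for it.
    """
    labels = list(groups)
    table = [[""] + labels]
    for row_name, key in STOCK_ORDER:
        table.append([row_name] + [groups[label][key] for label in labels])
    return table


# Hypothetical summaries, for illustration only.
summaries = {
    "Male": {"min": 61, "q1": 67, "median": 69, "q3": 71, "max": 75},
    "Female": {"min": 58, "q1": 62, "median": 64, "q3": 66, "max": 72},
}
for row in stock_chart_table(summaries):
    print(row)
```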

Yet Excel does not recognize that we have done the right thing.

The reason is that Excel does not see that we organized our data into columns instead of rows.

Here is one thing you might want to do.

Just go ahead and pick a column chart, and you will see that Excel is treating each row as if it were a data set.

What we want to tell Excel is to stop doing that and instead group the data by columns.

I am going to move this formatting palette out of the way, and now if we go back and click on Stock, it should recognize the data.

It is not elegant, but that is the way Excel is sometimes.

You have to force it to do what you want.

I am going to delete this series, because it is treating each of these 4 rows as a series, and we do not need that information.

We do not need to know how they go across.

We just need to know how they go up and down.

I am going to change my formatting here so you can see it a little more clearly.

White might be a little tough to see.

Feel free to pick any color you want.

Here is our box plot, or classic box plot.

You will see that our values are clustered at this end, so one thing I want to do is modify my y-axis

so that I can stretch out the values that are important to us.

The way I will do that is go to Scale and put in 50 as my min value instead of 0.

When you do that, it will stretch this part out from 50 to 80 instead of from 0 to 80.

Now we can see our box plot.

Here is our min for males, which is below 62.

The max is 75.

Q1 starts at 67.

Q3 ends at 71.

Here is an important thing about a box plot.

Before, in frequency distributions, we always had the values on the bottom.

Height would be on the bottom.

Now height in inches is shown on the y-axis.

Because that is an important distinction, I am going to label my vertical axis "Height."

Now we can remember where height is shown.

Notice that frequency is not shown here.

We do not know exactly how many people have a height of 70 or 75.

We only know that those are the boundaries.

There is one thing we do know: these are split up into quartiles.

This represents the quartile from Q0 to Q1.

This represents a quartile as well, from the Q3 border to the end.

About 25% of our participants have heights in this range.

Another 25% have heights in this range.

Using our powers of deduction, how many people are in this range?

50%.
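The 50% deduction can be checked numerically. This sketch uses Python's `statistics.quantiles` with `method="inclusive"` to get Q1 and Q3, then counts the share of values that fall between them. The data (the whole numbers 1 to 100) are hypothetical. With real data that contain ties or only a few values, expect roughly 50% rather than exactly 50%.

```python
from statistics import quantiles


def share_between_quartiles(values):
    """Fraction of values that fall between Q1 and Q3 (inclusive)."""
    # "inclusive" interpolates linearly at rank (n - 1) * p in the sorted data.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    inside = sum(1 for v in values if q1 <= v <= q3)
    return inside / len(values)


data = list(range(1, 101))  # hypothetical: the whole numbers 1 to 100
print(share_between_quartiles(data))  # -> 0.5
```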

If we had a median line, we would also know where our other two quartiles break up.

Unfortunately, this kind of formatting with the stock plots does not show the median.

Because the median is sometimes an important number, let us go ahead and use the stacked columns option so we can see it.

Go ahead and click on the height stock column sheet.

Here I have set things up in a format that makes sense to me.

Something like min, Q1, median, Q3, and max.

That is the order that makes more sense to me, or you could order it the other way: max, Q3, median, and so on.

I have done that because it makes it a little easier to create our box plot if the numbers are in order.

Let us go ahead and find our quartiles.

Let us find our data for males, put in ,0) because we are starting at 0, and hit Enter.

Notice that this is not new.

But instead of typing in 0, here is what I am going to do now.

I am going to click on the cell with this 0, because the more I make things into formulas, the more I can just copy and paste.

The nice thing about clicking on this 0 is that because I have not put any $ signs around it, it is a relative reference.

If I copy and paste this down one row, the reference moves down one row as well.

It will refer to the next cell down.

I do not want my data to move down, though.

Let us go ahead and lock the data in place and hit Enter.

Once I have that, I can just drag this down, copying and pasting as I go.

We have all 5 values for our 5 number summary.

We did not have to do anything else.

We can see that they are in order from least to greatest.

Let us do the same for females.

We have QUARTILE; find our data, put in ,0), and hit Enter.

All I am going to do is lock my data in place.

Sometimes I forget to do that before copying and pasting.

Then I can just copy and paste that all the way down.

So far that part has been easy.

Here is where it gets a little more complicated, in order to create stacked column box plots.

If you go to Charts and then Column, you are going to be using the stacked column chart.

To use a stacked column chart, one thing you have to do is find the distance between each of these numbers.

In a stacked column, we put a little box on top of another little box.

Each of those boxes represents the distance from the previous value to the next one.

We have to change these numbers into distances.

I am going to put in distances here for males and for females.

The first one is the distance from 0, so it is just the min value itself.

The second one is the distance from this value to this value.

Once I have that, I can copy and paste it all the way down, because this will give us the distance between each pair.

Here we have the distance between these two.

To do that for females, all I have to do is copy and paste it over, because Excel will do the same for the column on the right.

Now we have our distances.
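The distance calculation for the stacked columns can be sketched in Python. The first segment is the distance from 0 to the min, and each later segment is the gap from the previous number. The example uses the five numbers from the hand-drawn plot (18, 22, 30, 36.5, 55). The function name is hypothetical.

```python
def stacked_segments(summary):
    """Convert (min, Q1, median, Q3, max) into stacked-column heights.

    Segment 1 runs from 0 up to the min; every later segment is the gap
    between a number and the one before it, so the stack tops out at max.
    """
    heights = []
    previous = 0
    for value in summary:
        heights.append(value - previous)
        previous = value
    return heights


# The five numbers from the hand-drawn example.
print(stacked_segments((18, 22, 30, 36.5, 55)))
# -> [18, 4, 8, 6.5, 18.5]
```

In the chart, segment 1 is simply made invisible. Segments 2 and 5 (min to Q1 and Q3 to max) are also made invisible and replaced by minus 100% error bars to form the whiskers. Segments 3 and 4 form the two halves of the box.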

Because of that, we can just select them and make a column chart.

Notice that right now it is treating each row as a column, which is not what we want.

We want each column to be a column.

I am going to tell Excel to organize it this way.

I will leave this one aside.

Feel free to use whatever coloring scheme you like; just keep it consistent. I am going with orange.

Here is what we need to do now.

This does not look like a box and whisker plot yet.

In fact, there are no whiskers.

That is why we need to fudge it a little bit.

The first thing we have to do is get rid of this bottom block.

We do not want to hit Delete, because that would just delete that data and ignore it.

That is not what we want to do.

I want these other blocks to float up here.

Instead, what I am going to do is tell Excel to color this block clear.

I am double-clicking on it and choosing Fill, then No Fill.

It is invisible now.

For the line, I will pick no line as well.

Even though the block is still there and you could find it, it is invisible, because we do not need that part.

It is just the distance from 0 all the way to our min value.

Here, we want this next block to be our low whisker.

Here is what we do.

We ask Excel to make this one invisible too, and then we use error bars.

Error bars usually start at the top of a box.

Because they start at the top of the box, we would like the bar to go through the box entirely.

We put in a minus error bar and ask for it to be 100% of our box.

When I do so, you see this little line here that is 100% of my box.

This is an Excel cheat.

We can hit OK, and we have our low-end whisker.

We do not need to do anything with the next box.

We can leave it, because that is our Q1-to-median box.

Moving up, we get to the next box.

This is the box that goes from the median to Q3, and this top one is the box that goes from Q3 to the max.

We need to click on the top one, make it invisible, and give it an error bar as well.

That is 100%.

In some versions of Excel, you might be able to get rid of these little feet so that the whisker looks like a plain line.

In newer versions of Excel that is the case.

That is what I usually do, because I think those little feet look ugly.

But it is not a big deal to have the feet.

Let us change our y-axis so that we can see our box plot a little better; just go to Scale.

I double-clicked on the axis, and I am going to make my min value 50 and hit OK.

I do not want the label to say distance for males; instead I want it to say just males.

I am going to change this to males.

Excel will do this automatically for me.

Finally, I am going to add a label to my vertical axis so that I remember height in inches is being shown here.

Here we can see that for males, the quartiles split up quite nicely.

But here, the quartiles show a greater range on the low end than on the high end.

The spread is bigger between Q1 and the median than between the median and Q3.

Our 2nd quartile is slightly fatter than our 3rd quartile.

But our 1st and last quartiles are even bigger than these two.

Although it is a bit of a fudge, you can see that we can use stacked columns to create box plots as well.

That is what I mean by slightly idiosyncratic.

It is not the best way, but it can be done.

Let us talk about when box plots are useful.

The pros of box plots are that you can plot a single quantitative variable and easily compare 2 or more distributions.

In fact, box plots are useful for comparing lots of distributions very quickly, because they are a good visualization.

They are very compact.

You put it all in one graph, and it is easy to compare the boxes.

When a distribution has many values, too many to show individually, you would not want to use a stem plot or a dot plot.

You would want to use something like a box plot.

They are also nice when you do not want to see anything more than a 5 number summary.

One of the cons is that you cannot see frequency.

You cannot see how many people have a height of 72, but you can see the frequency within a range.

You know that 50% of the people lie between Q1 and Q3.

You can see that easily on a box plot.

How do we determine whether we have outliers in our data set?

Outliers are extreme values, but that is a subjective way of defining them.

How do we know how extreme?

Who defines extreme?

We have a rule of thumb for that in statistics.

Typically, we say that an outlier is a value more than 1.5 times the interquartile range beyond the nearest quartile.

What we mean is that, for the upper limit, the nearest quartile is Q3, and you add 1.5 times the interquartile range, whatever that is.

That is the upper limit boundary.

The lower limit boundary starts from the Q1 boundary, and because we are going lower,

we subtract 1.5 times the interquartile range.
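The 1.5 × IQR rule of thumb translates directly into two limits. Below is a minimal Python sketch, checked here on the hand-drawn example's Q1 = 22 and Q3 = 36.5. The function name and the `k` parameter (the 1.5 multiplier) are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) for the k x IQR rule of thumb.

    lower_limit = Q1 - k * IQR and upper_limit = Q3 + k * IQR,
    where IQR = Q3 - Q1; k = 1.5 is the usual choice.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Q1 and Q3 from the hand-drawn example.
print(outlier_fences(22, 36.5))  # -> (0.25, 58.25)
```

For that example, neither the min (18) nor the max (55) falls outside the limits, so the hand-drawn data set has no outliers by this rule.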

Sometimes, in skewed distributions where the 5 number summary is useful, you might not need one of these,

because one side might not have outliers; only the other side does.

If it is right skewed, we need the upper limit boundary.

If it is left skewed, we need the lower limit boundary.

We are going to learn how to signal outliers in an Excel data file by using conditional formatting.

Let us look at the photos stock sheet.

If you remember from our previous frequency distribution visualizations, photos tend to have a skewed distribution.

Photos tend to be skewed to the right, with some people on one end who have an extremely high number of photos.

Let us determine the outlier boundaries for tagged photos.

To do that, we might want to start off by finding Q1, max, min, and Q3.

Let us go ahead and put in QUARTILE and select our data for tagged photos.

I am going to put in ,1 and hit Enter.

The data for our first quartile is going to be locked in place.

Our 1st quartile is 53.75; that is the boundary.

I am going to copy and paste that all the way down.

I will just change the quartile that we are referencing.

We need the max here, so that is 4.

We need the min here, which is 0.

And we need Q3, which is 3.

Notice that Q3 is around 250, but the max value is 4,686.

I am guessing we have a few outliers here, but to find outliers we need to find the interquartile range.

The interquartile range is easy once you have Q3 and Q1, because it is simply Q3 – Q1.

Instead of having max and min, let us modify this.

Let us change them to the upper and lower limits.

Instead of a classic box plot, we are going to be looking at a modified box plot.

Instead of the max and min, we are going to put in our formulas for the upper and lower limits.

We probably would not need to change the min value, because 13 is going to be within the boundary:

Q1 – 1.5 × 200 would be way lower than our actual min value.

We do not need to modify that one.

Let us modify this one.

Here, what we do is take Q3 + 1.5 × IQR.

Excel knows the order of operations, so it will do the multiplication before the addition.

Let us hit Enter.

Notice that our outer boundary is a lot smaller than our max value.

It is 549.375.

It would be nice if we could go back to our data and quickly see who falls within that range and who does not.

Who is outside that range? Those are my outliers.

Back in the data, I am going to show you something called conditional formatting.

Go ahead and select all the data that we want Excel to look at.

We are going to let Excel color these data points if they are outliers.

To do that, after you select the data, go to your menu (which we cannot see here) and choose Format.

There should be an option called Conditional Formatting.

Go ahead and click on that.

You should get a dialog box that looks like this.

Here we want to say: if the cell value is greater than some number, then format it differently.

Let us check what our upper boundary is again.

We go to the photos sheet; unfortunately, Excel will not let you enter a formula here unless it refers to the same worksheet.

So we just have to type the value in manually.

If the value is greater than 549.375, format it differently.

Hit Format.

One thing I might do is simply color it differently.

I think I will choose red.

Color it red if it is an outlier, and hit OK.

If you look through this data file, you will already see 2, 3, 4 outliers.

In total we have 7 outliers in our sample of 100.
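The conditional-formatting rule above amounts to a simple filter. Here is a minimal sketch of that same "cell value greater than 549.375" test in Python. The photo counts are hypothetical, not the lecture's data file, and the function name is made up for illustration.

```python
UPPER_LIMIT = 549.375  # upper boundary for tagged photos found in the lecture


def flag_outliers(values, upper=UPPER_LIMIT):
    """Return the values that a 'cell value greater than upper' rule would color red."""
    return [v for v in values if v > upper]


photos = [13, 40, 75, 130, 260, 549, 610, 4686]  # hypothetical counts
print(flag_outliers(photos))  # [610, 4686]
```

Only the upper boundary is checked, matching the lecture's right-skewed case where the lower boundary falls below the min.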

We know that those 7 or so people are not going to be included in the modified box plot.

To make a modified box plot, all we do is plot these boundary values instead of the max and min.
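The lesson keeps the actual min when it lies inside the lower boundary and replaces the max with the upper boundary when data extend past it. The sketch below is a minimal version of that choice. The function name and numbers are hypothetical. Some textbooks instead end each whisker at the most extreme data value that lies inside the boundary, so treat this as one convention rather than the only one.

```python
def modified_whisker_ends(minimum, q1, q3, maximum, k=1.5):
    """Whisker endpoints as built in this lesson: keep the min/max when it
    lies inside the k * IQR boundary, otherwise stop at the boundary."""
    iqr = q3 - q1
    low = max(minimum, q1 - k * iqr)   # lower boundary only matters if the data reach it
    high = min(maximum, q3 + k * iqr)  # upper boundary caps the whisker
    return low, high


# Hypothetical summary: the lower fence (-20.0) is below the min, the upper fence (60.0) is below the max.
print(modified_whisker_ends(0, 10, 30, 200))  # (0, 60.0)
```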

Here I am going to select these values and use the stock chart option.

Let us see if it will work on the first try.

Unfortunately, no.

It gives me this error.

Instead, I am going to tell Excel to organize the data differently.

I am going to go back to my stock chart.

I am going to color it differently.

I am going to delete this series because it is redundant.

Here are our tagged photos, and what we see is that this range is a lot bigger, even though it contains the same 25% of people as this one.

This is typically what a skewed distribution looks like.

A skewed distribution will have a short whisker on one end and a long whisker on the other.

Even though the long whisker contains the same number of people as the little whisker down here, it shows that a much wider range of values is needed to capture that same 25% of people.

This is a right-skewed distribution because the top whisker is longer than the bottom whisker.

One thing we might want to do is label the vertical axis "Number of photos."

That way we remember that we are not talking about the frequency of people here.

Let us use that same data to create stacked columns.

Go ahead and insert stacked columns; instead of a classic box plot, we are going to make a modified box plot.

Let us put in our quartiles and data right here.

Go back and put in this one.

Let us see what we have.

We want to lock this data range in place, and then hit Enter.

I can just drag that down.

We want to change this max value to the modified boundary.

To do that, we need the IQR, which is Q3 − Q1; then we delete the max and type =.

Then enter my Q3 + 1.5 × IQR and hit Enter.

Let us find the distances between these values.

I do not have to do anything with the first one, because it is already the distance from 0.

For the next one, I just subtract C3 from C4, and then copy and paste that all the way down.
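The stacked-column trick needs segment heights rather than positions: the first value measured from 0, then each successive difference (C4 − C3, and so on). Here is a minimal sketch of that step. The function name and the five values are hypothetical.

```python
def stacked_segments(values):
    """Heights for a stacked column chart: the first value from 0, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


# Hypothetical: lower whisker end, Q1, median, Q3, upper whisker end.
summary = [0, 10, 18, 30, 60]
# In the chart, segments 1 and 2 are made transparent, segments 2 and 5 get
# minus error bars of 100%, and segment 5 has no fill.
print(stacked_segments(summary))  # [0, 10, 8, 12, 30]
```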

Let us plot these distances.

I am going to select all of them, click Charts, go to Columns, and choose Stacked Column.

Unfortunately, it is not stacking them the way we want.

Here on my toolbar, I am going to tell it to organize the data differently, and delete this series because it is not what I need.

Now we have to do the fancy formatting.

The first segment we just need to make clear, or transparent.

The second segment we also need to make transparent, and add a minus error bar of 100% of that segment.

Up here, the top segment also needs a minus error bar of 100% of its value, with no fill.

Hit OK.

Here is what we have.

We should probably change the title "distance", because that is not what the chart shows.

We should call it something like "Tagged photos", and on the side we will change the vertical axis so that it reads "Number of photos".

We know those are our values.

There we go: we have a modified box plot.

It no longer runs from 0 all the way to 5,000.

It runs from 0 to about 600.

It is still a skewed distribution: even when we cut off those extreme outliers, our top whisker is still a lot longer than our bottom whisker, even though the bottom whisker covers a full 25% of the sample and the top whisker covers the top 25% minus the 7 outliers.

I am going to minimize all this and get rid of that.

Now we know how to use conditional formatting to flag which people are outliers.

We are not including them in our visualization.

We just went over modified box plots.

The classic box plot relies on the max and min, but those are very susceptible to outliers.

A couple of extreme outliers can drastically change the visualization.

The modified box plot uses modified boundaries that lie 1.5 times the IQR beyond the quartiles.

This is helpful for highly skewed distributions, as we saw with tagged photos.

Let us go into some examples.

Example 1: approximately what percentage of the values in the data set lie within the box?

Just the box, not the whiskers.

What percentage of the data lies in there?

We know that this whisker holds 25% of our sample, and this one also holds 25%.

Each of the quarters inside the box is also 25%, so we know that the box holds 50% of our sample.

The lower whisker holds 25% and the upper whisker holds 25%; that is one of the reasons box plots are useful.

They break the data up easily into chunks that we can use.
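To check the "box holds about 50%" answer numerically, the sketch below counts how many values fall between Q1 and Q3. It assumes Python 3.8+ and the same inclusive quartile method as the earlier Excel comparison. The function name is hypothetical, and with small or tied data sets the fraction is only approximately 50%.

```python
import statistics


def fraction_in_box(data):
    """Share of values lying between Q1 and Q3 (inclusive), i.e. inside the box."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return sum(q1 <= x <= q3 for x in data) / len(data)


print(fraction_in_box(list(range(1, 101))))  # 0.5
```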

Example 2: sketch a box plot for a data set that is skewed right.

In a right-skewed distribution, we have a few outliers that are very large, at the positive end.

What would our box plot look like?

For a right-skewed data set (which I write in red), if we draw the box plot horizontally, we know that the right side would be much longer than the left side.

Probably the right half of the box would also be longer than the left half.

When it is drawn vertically, remember that the top is the greater side, so the top whisker is the one that would be spread out.

The upper side is longer than the lower side.

What about populations that are skewed left?

In that case, the outliers are at the smaller end of the continuum.

Skewed left.

In a horizontal box plot this is easy to see, because the left side would be longer than the right side.

Probably even within the box, the left half would be bigger than the right half.

Remember, when we draw it vertically, the bottom end would be longer than the top end.

Here the top is the positive end, so the lower half of the box would be longer as well.

So a skewed-left distribution looks like either this (horizontal) or this (vertical).

A skewed-right distribution likewise looks like one of these.
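The reasoning in Example 2 can be written as a rough rule: compare the length of the upper whisker with the length of the lower whisker. This is only a heuristic sketch, with a hypothetical function name and hypothetical numbers. Real data rarely give exactly equal whiskers, so the "roughly symmetric" branch is mostly for illustration.

```python
def skew_from_whiskers(minimum, q1, q3, maximum):
    """Rough skew direction from a box plot: the longer whisker points toward the skew."""
    lower = q1 - minimum   # length of the lower (left/bottom) whisker
    upper = maximum - q3   # length of the upper (right/top) whisker
    if upper > lower:
        return "skewed right"
    if lower > upper:
        return "skewed left"
    return "roughly symmetric"


print(skew_from_whiskers(0, 2, 5, 20))   # skewed right
print(skew_from_whiskers(0, 15, 18, 20)) # skewed left
```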

For an approximately normal data set, when you draw each side you know that it has to be roughly symmetrical.

That part is easy.

Another thing about normal distributions is that most of the values, about 68%, are clustered within a small space.

That space is within 1 standard deviation of the mean on either side.
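The normal-distribution facts used here can be checked with `statistics.NormalDist` (Python 3.8+). The sketch confirms that about 68% of a normal distribution lies within 1 standard deviation of the mean, and that Q3 sits about 0.67 standard deviations above the mean. That means the box (the middle 50%) is narrower than the ±1 SD band.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

within_1_sd = z.cdf(1) - z.cdf(-1)  # share of values within 1 SD of the mean
q3_z = z.inv_cdf(0.75)              # Q3 in standard-deviation units (Q1 is its negative)

print(round(within_1_sd, 4))  # 0.6827
print(round(q3_z, 4))         # 0.6745
```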

We could guess, then, that the whiskers may be longer than the box itself.

That is how it might look, but it is roughly symmetrical.

It is just as easy to draw it vertically, because it is roughly symmetrical.

Example 3: how can you estimate the IQR from a box plot?

The IQR is easy to see on a box plot: the bottom end of the box is Q1 and the top is Q3, so the IQR is the length of the box.

Can you estimate the range?

If so, how?

The range is the distance between the min and max values.

It is the length of the entire box and whiskers together.

So both the IQR and the range are easy to see on a box plot.
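Reading the two spreads off a five-number summary is a pair of subtractions. Here is a minimal sketch with a hypothetical function name and hypothetical values. One caveat: on a modified box plot the whiskers stop at the outlier boundaries, so the full length of box plus whiskers would understate the true range there.

```python
def iqr_and_range(minimum, q1, q3, maximum):
    """IQR is the length of the box; range is the length of box plus whiskers (classic plot)."""
    return q3 - q1, maximum - minimum


print(iqr_and_range(0, 10, 30, 200))  # (20, 200)
```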

Example 4: is it possible for a box plot to be missing a whisker?

If so, give an example.

If not, explain why not.

We know that when a distribution is skewed, say skewed right, one whisker can be very short and the other very long.

What would it mean not to have a whisker at all?

One possibility is that many values are exactly the same, so you cannot split them up into a whisker and a box.

For instance, let me give you an example of a data set that might do so.

Let us say it has 8 values.

I just picked 8 because it is easy to split into quartiles.

What if all of the values in both the first and second quarters are exactly the same?

Then you could not have a whisker, because it would be arbitrary to say that these 0s get the whisker while those 0s belong in the box.

You would probably put them all in the box.

This would make a very thin box.

That is an example.

Obviously, it could happen on the other side as well.

This might be an example of a left-skewed distribution that does not have a whisker.

It might look something like 4, 4, 3, where the largest values are all 4.

In these kinds of plots, it is hard to say where the box ends and the whisker begins.

It would be arbitrary to say that 4 is the boundary for the whisker as well as for the box.

It cannot be both.

These are the kinds of distributions that would be missing a whisker.
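A concrete version of the 8-value example: if the lowest values are all tied, Q1 equals the min, so the lower whisker has zero length. The data below are hypothetical, and the quartiles assume the same inclusive method as before (Python 3.8+). The function name is also made up for illustration.

```python
import statistics


def whisker_lengths(data):
    """Return (lower whisker length, upper whisker length) for a classic box plot."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1 - min(data), max(data) - q3


data = [0, 0, 0, 0, 0, 3, 5, 9]  # hypothetical: first two quarters all 0
print(whisker_lengths(data))  # (0.0, 5.5)
```

Here the min, Q1, and median are all 0, so there is nothing left to draw as a lower whisker.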

That is the end of box-and-whisker plots.

Thanks for using www.educator.com.
