Dr. Ji Son

Chi-Square Goodness-of-Fit Test

Table of Contents

Section 1: Introduction
Descriptive Statistics vs. Inferential Statistics

25m 31s

Intro
0:00
Roadmap
0:10
Roadmap
0:11
Statistics
0:35
Statistics
0:36
Let's Think About High School Science
1:12
Measurement and Find Patterns (Mathematical Formula)
1:13
Statistics = Math of Distributions
4:58
Distributions
4:59
Problematic… but also GREAT
5:58
Statistics
7:33
How is It Different from Other Specializations in Mathematics?
7:34
Statistics is Fundamental in Natural and Social Sciences
7:53
Two Skills of Statistics
8:20
Description (Exploration)
8:21
Inference
9:13
Descriptive Statistics vs. Inferential Statistics: Apply to Distributions
9:58
Descriptive Statistics
9:59
Inferential Statistics
11:05
Populations vs. Samples
12:19
Populations vs. Samples: Is it the Truth?
12:20
Populations vs. Samples: Pros & Cons
13:36
Populations vs. Samples: Descriptive Values
16:12
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:10
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:11
Example 1: Descriptive Statistics vs. Inferential Statistics
19:09
Example 2: Descriptive Statistics vs. Inferential Statistics
20:47
Example 3: Sample, Parameter, Population, and Statistic
21:40
Example 4: Sample, Parameter, Population, and Statistic
23:28
Section 2: About Samples: Cases, Variables, Measurements
About Samples: Cases, Variables, Measurements

32m 14s

Intro
0:00
Data
0:09
Data, Cases, Variables, and Values
0:10
Rows, Columns, and Cells
2:03
Example: Aircraft
3:52
How Do We Get Data?
5:38
Research: Question and Hypothesis
5:39
Research Design
7:11
Measurement
7:29
Research Analysis
8:33
Research Conclusion
9:30
Types of Variables
10:03
Discrete Variables
10:04
Continuous Variables
12:07
Types of Measurements
14:17
Types of Measurements
14:18
Types of Measurements (Scales)
17:22
Nominal
17:23
Ordinal
19:11
Interval
21:33
Ratio
24:24
Example 1: Cases, Variables, Measurements
25:20
Example 2: Which Scale of Measurement is Used?
26:55
Example 3: What Kind of a Scale of Measurement is This?
27:26
Example 4: Discrete vs. Continuous Variables
30:31
Section 3: Visualizing Distributions
Introduction to Excel

8m 9s

Intro
0:00
Before Visualizing Distribution
0:10
Excel
0:11
Excel: Organization
0:45
Workbook
0:46
Column x Rows
1:50
Tools: Menu Bar, Standard Toolbar, and Formula Bar
3:00
Excel + Data
6:07
Excel and Data
6:08
Frequency Distributions in Excel

39m 10s

Intro
0:00
Roadmap
0:08
Data in Excel and Frequency Distributions
0:09
Raw Data to Frequency Tables
0:42
Raw Data to Frequency Tables
0:43
Frequency Tables: Using Formulas and Pivot Tables
1:28
Example 1: Number of Births
7:17
Example 2: Age Distribution
20:41
Example 3: Height Distribution
27:45
Example 4: Height Distribution of Males
32:19
Frequency Distributions and Features

25m 29s

Intro
0:00
Roadmap
0:10
Data in Excel, Frequency Distributions, and Features of Frequency Distributions
0:11
Example #1
1:35
Uniform
1:36
Example #2
2:58
Unimodal, Skewed Right, and Asymmetric
2:59
Example #3
6:29
Bimodal
6:30
Example #4a
8:29
Symmetric, Unimodal, and Normal
8:30
Point of Inflection and Standard Deviation
11:13
Example #4b
12:43
Normal Distribution
12:44
Summary
13:56
Uniform, Skewed, Bimodal, and Normal
13:57
Sketch Problem 1: Driver's License
17:34
Sketch Problem 2: Life Expectancy
20:01
Sketch Problem 3: Telephone Numbers
22:01
Sketch Problem 4: Length of Time Used to Complete a Final Exam
23:43
Dotplots and Histograms in Excel

42m 42s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Previously
1:02
Data, Frequency Table, and Visualization
1:03
Dotplots
1:22
Dotplots Excel Example
1:23
Dotplots: Pros and Cons
7:22
Pros and Cons of Dotplots
7:23
Dotplots Excel Example Cont.
9:07
Histograms
12:47
Histograms Overview
12:48
Example of Histograms
15:29
Histograms: Pros and Cons
31:39
Pros
31:40
Cons
32:31
Frequency vs. Relative Frequency
32:53
Frequency
32:54
Relative Frequency
33:36
Example 1: Dotplots vs. Histograms
34:36
Example 2: Age of Pennies Dotplot
36:21
Example 3: Histogram of Mammal Speeds
38:27
Example 4: Histogram of Life Expectancy
40:30
Stemplots

12m 23s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
What Sets Stemplots Apart?
0:46
Data Sets, Dotplots, Histograms, and Stemplots
0:47
Example 1: What Do Stemplots Look Like?
1:58
Example 2: Back-to-Back Stemplots
5:00
Example 3: Quiz Grade Stemplot
7:46
Example 4: Quiz Grade & Afterschool Tutoring Stemplot
9:56
Bar Graphs

22m 49s

Intro
0:00
Roadmap
0:05
Roadmap
0:08
Review of Frequency Distributions
0:44
Y-axis and X-axis
0:45
Types of Frequency Visualizations Covered so Far
2:16
Introduction to Bar Graphs
4:07
Example 1: Bar Graph
5:32
Example 1: Bar Graph
5:33
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:07
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:08
Example 2: Create a Frequency Visualization for Gender
14:02
Example 3: Cases, Variables, and Frequency Visualization
16:34
Example 4: What Kind of Graphs are Shown Below?
19:29
Section 4: Summarizing Distributions
Central Tendency: Mean, Median, Mode

38m 50s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
Central Tendency 1
0:56
Way to Summarize a Distribution of Scores
0:57
Mode
1:32
Median
2:02
Mean
2:36
Central Tendency 2
3:47
Mode
3:48
Median
4:20
Mean
5:25
Summation Symbol
6:11
Summation Symbol
6:12
Population vs. Sample
10:46
Population vs. Sample
10:47
Excel Examples
15:08
Finding Mode, Median, and Mean in Excel
15:09
Median vs. Mean
21:45
Effect of Outliers
21:46
Relationship Between Parameter and Statistic
22:44
Type of Measurements
24:00
Which Distributions to Use With
24:55
Example 1: Mean
25:30
Example 2: Using Summation Symbol
29:50
Example 3: Average Calorie Count
32:50
Example 4: Creating an Example Set
35:46
Variability

42m 40s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Variability (or Spread)
0:45
Variability (or Spread)
0:46
Things to Think About
5:45
Things to Think About
5:46
Range, Quartiles and Interquartile Range
6:37
Range
6:38
Interquartile Range
8:42
Interquartile Range Example
10:58
Interquartile Range Example
10:59
Variance and Standard Deviation
12:27
Deviations
12:28
Sum of Squares
14:35
Variance
16:55
Standard Deviation
17:44
Sum of Squares (SS)
18:34
Sum of Squares (SS)
18:35
Population vs. Sample SD
22:00
Population vs. Sample SD
22:01
Population vs. Sample
23:20
Mean
23:21
SD
23:51
Example 1: Find the Mean and Standard Deviation of the Variable Friends in the Excel File
27:21
Example 2: Find the Mean and Standard Deviation of the Tagged Photos in the Excel File
35:25
Example 3: Sum of Squares
38:58
Example 4: Standard Deviation
41:48
Five Number Summary & Boxplots

57m 15s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Summarizing Distributions
0:37
Shape, Center, and Spread
0:38
5 Number Summary
1:14
Boxplot: Visualizing 5 Number Summary
3:37
Boxplot: Visualizing 5 Number Summary
3:38
Boxplots on Excel
9:01
Using 'Stocks' and Using Stacked Columns
9:02
Boxplots on Excel Example
10:14
When are Boxplots Useful?
32:14
Pros
32:15
Cons
32:59
How to Determine Outlier Status
33:24
Rule of Thumb: Upper Limit
33:25
Rule of Thumb: Lower Limit
34:16
Signal Outliers in an Excel Data File Using Conditional Formatting
34:52
Modified Boxplot
48:38
Modified Boxplot
48:39
Example 1: Percentage Values & Lower and Upper Whisker
49:10
Example 2: Boxplot
50:10
Example 3: Estimating IQR From Boxplot
53:46
Example 4: Boxplot and Missing Whisker
54:35
Shape: Calculating Skewness & Kurtosis

41m 51s

Intro
0:00
Roadmap
0:16
Roadmap
0:17
Skewness Concept
1:09
Skewness Concept
1:10
Calculating Skewness
3:26
Calculating Skewness
3:27
Interpreting Skewness
7:36
Interpreting Skewness
7:37
Excel Example
8:49
Kurtosis Concept
20:29
Kurtosis Concept
20:30
Calculating Kurtosis
24:17
Calculating Kurtosis
24:18
Interpreting Kurtosis
29:01
Leptokurtic
29:35
Mesokurtic
30:10
Platykurtic
31:06
Excel Example
32:04
Example 1: Shape of Distribution
38:28
Example 2: Shape of Distribution
39:29
Example 3: Shape of Distribution
40:14
Example 4: Kurtosis
41:10
Normal Distribution

34m 33s

Intro
0:00
Roadmap
0:13
Roadmap
0:14
What is a Normal Distribution
0:44
The Normal Distribution As a Theoretical Model
0:45
Possible Range of Probabilities
3:05
Possible Range of Probabilities
3:06
What is a Normal Distribution
5:07
Can Be Described By
5:08
Properties
5:49
'Same' Shape: Illusion of Different Shape!
7:35
'Same' Shape: Illusion of Different Shape!
7:36
Types of Problems
13:45
Example: Distribution of SAT Scores
13:46
Shape Analogy
19:48
Shape Analogy
19:49
Example 1: The Standard Normal Distribution and Z-Scores
22:34
Example 2: The Standard Normal Distribution and Z-Scores
25:54
Example 3: Sketching a Normal Distribution
28:55
Example 4: Sketching a Normal Distribution
32:32
Standard Normal Distributions & Z-Scores

41m 44s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
A Family of Distributions
0:28
Infinite Set of Distributions
0:29
Transforming Normal Distributions to 'Standard' Normal Distribution
1:04
Normal Distribution vs. Standard Normal Distribution
2:58
Normal Distribution vs. Standard Normal Distribution
2:59
Z-Score, Raw Score, Mean, & SD
4:08
Z-Score, Raw Score, Mean, & SD
4:09
Weird Z-Scores
9:40
Weird Z-Scores
9:41
Excel
16:45
For Normal Distributions
16:46
For Standard Normal Distributions
19:11
Excel Example
20:24
Types of Problems
25:18
Percentage Problem: P(x)
25:19
Raw Score and Z-Score Problems
26:28
Standard Deviation Problems
27:01
Shape Analogy
27:44
Shape Analogy
27:45
Example 1: Deaths Due to Heart Disease vs. Deaths Due to Cancer
28:24
Example 2: Heights of Male College Students
33:15
Example 3: Mean and Standard Deviation
37:14
Example 4: Finding Percentage of Values in a Standard Normal Distribution
37:49
Normal Distribution: PDF vs. CDF

55m 44s

Intro
0:00
Roadmap
0:15
Roadmap
0:16
Frequency vs. Cumulative Frequency
0:56
Frequency vs. Cumulative Frequency
0:57
Frequency vs. Cumulative Frequency
4:32
Frequency vs. Cumulative Frequency Cont.
4:33
Calculus in Brief
6:21
Derivative-Integral Continuum
6:22
PDF
10:08
PDF for Standard Normal Distribution
10:09
PDF for Normal Distribution
14:32
Integral of PDF = CDF
21:27
Integral of PDF = CDF
21:28
Example 1: Cumulative Frequency Graph
23:31
Example 2: Mean, Standard Deviation, and Probability
24:43
Example 3: Mean and Standard Deviation
35:50
Example 4: Age of Cars
49:32
Section 5: Linear Regression
Scatterplots

47m 19s

Intro
0:00
Roadmap
0:04
Roadmap
0:05
Previous Visualizations
0:30
Frequency Distributions
0:31
Compare & Contrast
2:26
Frequency Distributions Vs. Scatterplots
2:27
Summary Values
4:53
Shape
4:54
Center & Trend
6:41
Spread & Strength
8:22
Univariate & Bivariate
10:25
Example Scatterplot
10:48
Shape, Trend, and Strength
10:49
Positive and Negative Association
14:05
Positive and Negative Association
14:06
Linearity, Strength, and Consistency
18:30
Linearity
18:31
Strength
19:14
Consistency
20:40
Summarizing a Scatterplot
22:58
Summarizing a Scatterplot
22:59
Example 1: Gapminder.org, Income x Life Expectancy
26:32
Example 2: Gapminder.org, Income x Infant Mortality
36:12
Example 3: Trend and Strength of Variables
40:14
Example 4: Trend, Strength and Shape for Scatterplots
43:27
Regression

32m 2s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Linear Equations
0:34
Linear Equations: y = mx + b
0:35
Rough Line
5:16
Rough Line
5:17
Regression - A 'Center' Line
7:41
Reasons for Summarizing with a Regression Line
7:42
Predictor and Response Variable
10:04
Goal of Regression
12:29
Goal of Regression
12:30
Prediction
14:50
Example: Servings of Milk Per Year Shown By Age
14:51
Interpolation
17:06
Extrapolation
17:58
Error in Prediction
20:34
Prediction Error
20:35
Residual
21:40
Example 1: Residual
23:34
Example 2: Large and Negative Residual
26:30
Example 3: Positive Residual
28:13
Example 4: Interpret Regression Line & Extrapolate
29:40
Least Squares Regression

56m 36s

Intro
0:00
Roadmap
0:13
Roadmap
0:14
Best Fit
0:47
Best Fit
0:48
Sum of Squared Errors (SSE)
1:50
Sum of Squared Errors (SSE)
1:51
Why Squared?
3:38
Why Squared?
3:39
Quantitative Properties of Regression Line
4:51
Quantitative Properties of Regression Line
4:52
So How do we Find Such a Line?
6:49
SSEs of Different Line Equations & Lowest SSE
6:50
Carl Gauss' Method
8:01
How Do We Find Slope (b1)
11:00
How Do We Find Slope (b1)
11:01
How Do We Find Intercept
15:11
How Do We Find Intercept
15:12
Example 1: Which of These Equations Fit the Above Data Best?
17:18
Example 2: Find the Regression Line for These Data Points and Interpret It
26:31
Example 3: Summarize the Scatterplot and Find the Regression Line.
34:31
Example 4: Examine the Mean of Residuals
43:52
Correlation

43m 58s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Summarizing a Scatterplot Quantitatively
0:47
Shape
0:48
Trend
1:11
Strength: Correlation (r)
1:45
Correlation Coefficient ( r )
2:30
Correlation Coefficient ( r )
2:31
Trees vs. Forest
11:59
Trees vs. Forest
12:00
Calculating r
15:07
Average Product of z-scores for x and y
15:08
Relationship between Correlation and Slope
21:10
Relationship between Correlation and Slope
21:11
Example 1: Find the Correlation between Grams of Fat and Cost
24:11
Example 2: Relationship between r and b1
30:24
Example 3: Find the Regression Line
33:35
Example 4: Find the Correlation Coefficient for this Set of Data
37:37
Correlation: r vs. r-squared

52m 52s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
R-squared
0:44
What is the Meaning of It? Why Squared?
0:45
Parsing Sums of Squares (Parsing Variability)
2:25
SST = SSR + SSE
2:26
What is SST and SSE?
7:46
What is SST and SSE?
7:47
r-squared
18:33
Coefficient of Determination
18:34
If the Correlation is Strong…
20:25
If the Correlation is Strong…
20:26
If the Correlation is Weak…
22:36
If the Correlation is Weak…
22:37
Example 1: Find r-squared for this Set of Data
23:56
Example 2: What Does it Mean that the Simple Linear Regression is a 'Model' of Variance?
33:54
Example 3: Why Does r-squared Only Range from 0 to 1
37:29
Example 4: Find the r-squared for This Set of Data
39:55
Transformations of Data

27m 8s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Why Transform?
0:26
Why Transform?
0:27
Shape-preserving vs. Shape-changing Transformations
5:14
Shape-preserving = Linear Transformations
5:15
Shape-changing Transformations = Non-linear Transformations
6:20
Common Shape-Preserving Transformations
7:08
Common Shape-Preserving Transformations
7:09
Common Shape-Changing Transformations
8:59
Powers
9:00
Logarithms
9:39
Change Just One Variable? Both?
10:38
Log-log Transformations
10:39
Log Transformations
14:38
Example 1: Create, Graph, and Transform the Data Set
15:19
Example 2: Create, Graph, and Transform the Data Set
20:08
Example 3: What Kind of Model would You Choose for this Data?
22:44
Example 4: Transformation of Data
25:46
Section 6: Collecting Data in an Experiment
Sampling & Bias

54m 44s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Descriptive vs. Inferential Statistics
1:04
Descriptive Statistics: Data Exploration
1:05
Example
2:03
To tackle Generalization…
4:31
Generalization
4:32
Sampling
6:06
'Good' Sample
6:40
Defining Samples and Populations
8:55
Population
8:56
Sample
11:16
Why Use Sampling?
13:09
Why Use Sampling?
13:10
Goal of Sampling: Avoiding Bias
15:04
What is Bias?
15:05
Where does Bias Come from: Sampling Bias
17:53
Where does Bias Come from: Response Bias
18:27
Sampling Bias: Bias from 'Bad' Sampling Methods
19:34
Size Bias
19:35
Voluntary Response Bias
21:13
Convenience Sample
22:22
Judgment Sample
23:58
Inadequate Sample Frame
25:40
Response Bias: Bias from 'Bad' Data Collection Methods
28:00
Nonresponse Bias
29:31
Questionnaire Bias
31:10
Incorrect Response or Measurement Bias
37:32
Example 1: What Kind of Biases?
40:29
Example 2: What Biases Might Arise?
44:46
Example 3: What Kind of Biases?
48:34
Example 4: What Kind of Biases?
51:43
Sampling Methods

14m 25s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Biased vs. Unbiased Sampling Methods
0:32
Biased Sampling
0:33
Unbiased Sampling
1:13
Probability Sampling Methods
2:31
Simple Random
2:54
Stratified Random Sampling
4:06
Cluster Sampling
5:24
Two-staged Sampling
6:22
Systematic Sampling
7:25
Example 1: Which Type(s) of Sampling was this?
8:33
Example 2: Describe How to Take a Two-Stage Sample from this Book
10:16
Example 3: Sampling Methods
11:58
Example 4: Cluster Sample Plan
12:48
Research Design

53m 54s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Descriptive vs. Inferential Statistics
0:51
Descriptive Statistics: Data Exploration
0:52
Inferential Statistics
1:02
Variables and Relationships
1:44
Variables
1:45
Relationships
2:49
Not Every Type of Study is an Experiment…
4:16
Category I - Descriptive Study
4:54
Category II - Correlational Study
5:50
Category III - Experimental, Quasi-experimental, Non-experimental
6:33
Category III
7:42
Experimental, Quasi-experimental, and Non-experimental
7:43
Why CAN'T the Other Strategies Determine Causation?
10:18
Third-variable Problem
10:19
Directionality Problem
15:49
What Makes Experiments Special?
17:54
Manipulation
17:55
Control (and Comparison)
21:58
Methods of Control
26:38
Holding Constant
26:39
Matching
29:11
Random Assignment
31:48
Experiment Terminology
34:09
'true' Experiment vs. Study
34:10
Independent Variable (IV)
35:16
Dependent Variable (DV)
35:45
Factors
36:07
Treatment Conditions
36:23
Levels
37:43
Confounds or Extraneous Variables
38:04
Blind
38:38
Blind Experiments
38:39
Double-blind Experiments
39:29
How Categories Relate to Statistics
41:35
Category I - Descriptive Study
41:36
Category II - Correlational Study
42:05
Category III - Experimental, Quasi-experimental, Non-experimental
42:43
Example 1: Research Design
43:50
Example 2: Research Design
47:37
Example 3: Research Design
50:12
Example 4: Research Design
52:00
Between and Within Treatment Variability

41m 31s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Experimental Designs
0:51
Experimental Designs: Manipulation & Control
0:52
Two Types of Variability
2:09
Between Treatment Variability
2:10
Within Treatment Variability
3:31
Updated Goal of Experimental Design
5:47
Updated Goal of Experimental Design
5:48
Example: Drugs and Driving
6:56
Example: Drugs and Driving
6:57
Different Types of Random Assignment
11:27
All Experiments
11:28
Completely Random Design
12:02
Randomized Block Design
13:19
Randomized Block Design
15:48
Matched Pairs Design
15:49
Repeated Measures Design
19:47
Between-subject Variable vs. Within-subject Variable
22:43
Completely Randomized Design
22:44
Repeated Measures Design
25:03
Example 1: Design a Completely Random, Matched Pair, and Repeated Measures Experiment
26:16
Example 2: Block Design
31:41
Example 3: Completely Randomized Designs
35:11
Example 4: Completely Random, Matched Pairs, or Repeated Measures Experiments?
39:01
Section 7: Review of Probability Axioms
Sample Spaces

37m 52s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
Why is Probability Involved in Statistics
0:48
Probability
0:49
Can People Tell the Difference between Cheap and Gourmet Coffee?
2:08
Taste Test with Coffee Drinkers
3:37
If No One can Actually Taste the Difference
3:38
If Everyone can Actually Taste the Difference
5:36
Creating a Probability Model
7:09
Creating a Probability Model
7:10
D'Alembert vs. Necker
9:41
D'Alembert vs. Necker
9:42
Problem with D'Alembert's Model
13:29
Problem with D'Alembert's Model
13:30
Covering Entire Sample Space
15:08
Fundamental Principle of Counting
15:09
Where Do Probabilities Come From?
22:54
Observed Data, Symmetry, and Subjective Estimates
22:55
Checking whether Model Matches Real World
24:27
Law of Large Numbers
24:28
Example 1: Law of Large Numbers
27:46
Example 2: Possible Outcomes
30:43
Example 3: Brands of Coffee and Taste
33:25
Example 4: How Many Different Treatments are there?
35:33
Addition Rule for Disjoint Events

20m 29s

Intro
0:00
Roadmap
0:08
Roadmap
0:09
Disjoint Events
0:41
Disjoint Events
0:42
Meaning of 'or'
2:39
In Regular Life
2:40
In Math/Statistics/Computer Science
3:10
Addition Rule for Disjoint Events
3:55
If A and B are Disjoint: P (A and B)
3:56
If A and B are Disjoint: P (A or B)
5:15
General Addition Rule
5:41
General Addition Rule
5:42
Generalized Addition Rule
8:31
If A and B are not Disjoint: P (A or B)
8:32
Example 1: Which of These are Mutually Exclusive?
10:50
Example 2: What is the Probability that You will Have a Combination of One Heads and Two Tails?
12:57
Example 3: Engagement Party
15:17
Example 4: Home Owner's Insurance
18:30
Conditional Probability

57m 19s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
'or' vs. 'and' vs. Conditional Probability
1:07
'or' vs. 'and' vs. Conditional Probability
1:08
'and' vs. Conditional Probability
5:57
P (M or L)
5:58
P (M and L)
8:41
P (M|L)
11:04
P (L|M)
12:24
Tree Diagram
15:02
Tree Diagram
15:03
Defining Conditional Probability
22:42
Defining Conditional Probability
22:43
Common Contexts for Conditional Probability
30:56
Medical Testing: Positive Predictive Value
30:57
Medical Testing: Sensitivity
33:03
Statistical Tests
34:27
Example 1: Drug and Disease
36:41
Example 2: Marbles and Conditional Probability
40:04
Example 3: Cards and Conditional Probability
45:59
Example 4: Votes and Conditional Probability
50:21
Independent Events

24m 27s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Independent Events & Conditional Probability
0:26
Non-independent Events
0:27
Independent Events
2:00
Non-independent and Independent Events
3:08
Non-independent and Independent Events
3:09
Defining Independent Events
5:52
Defining Independent Events
5:53
Multiplication Rule
7:29
Previously…
7:30
But with Independent Events
8:53
Example 1: Which of These Pairs of Events are Independent?
11:12
Example 2: Health Insurance and Probability
15:12
Example 3: Independent Events
17:42
Example 4: Independent Events
20:03
Section 8: Probability Distributions
Introduction to Probability Distributions

56m 45s

Intro
0:00
Roadmap
0:08
Roadmap
0:09
Sampling vs. Probability
0:57
Sampling
0:58
Missing
1:30
What is Missing?
3:06
Insight: Probability Distributions
5:26
Insight: Probability Distributions
5:27
What is a Probability Distribution?
7:29
From Sample Spaces to Probability Distributions
8:44
Sample Space
8:45
Probability Distribution of the Sum of Two Dice
11:16
The Random Variable
17:43
The Random Variable
17:44
Expected Value
21:52
Expected Value
21:53
Example 1: Probability Distributions
28:45
Example 2: Probability Distributions
35:30
Example 3: Probability Distributions
43:37
Example 4: Probability Distributions
47:20
Expected Value & Variance of Probability Distributions

53m 41s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Discrete vs. Continuous Random Variables
1:04
Discrete vs. Continuous Random Variables
1:05
Mean and Variance Review
4:44
Mean: Sample, Population, and Probability Distribution
4:45
Variance: Sample, Population, and Probability Distribution
9:12
Example Situation
14:10
Example Situation
14:11
Some Special Cases…
16:13
Some Special Cases…
16:14
Linear Transformations
19:22
Linear Transformations
19:23
What Happens to Mean and Variance of the Probability Distribution?
20:12
n Independent Values of X
25:38
n Independent Values of X
25:39
Compare These Two Situations
30:56
Compare These Two Situations
30:57
Two Random Variables, X and Y
32:02
Two Random Variables, X and Y
32:03
Example 1: Expected Value & Variance of Probability Distributions
35:35
Example 2: Expected Values & Standard Deviation
44:17
Example 3: Expected Winnings and Standard Deviation
48:18
Binomial Distribution

55m 15s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Discrete Probability Distributions
1:42
Discrete Probability Distributions
1:43
Binomial Distribution
2:36
Binomial Distribution
2:37
Multiplicative Rule Review
6:54
Multiplicative Rule Review
6:55
How Many Outcomes with k 'Successes'
10:23
Adults and Bachelor's Degree: Manual List of Outcomes
10:24
P (X=k)
19:37
Putting Together # of Outcomes with the Multiplicative Rule
19:38
Expected Value and Standard Deviation in a Binomial Distribution
25:22
Expected Value and Standard Deviation in a Binomial Distribution
25:23
Example 1: Coin Toss
33:42
Example 2: College Graduates
38:03
Example 3: Types of Blood and Probability
45:39
Example 4: Expected Number and Standard Deviation
51:11
Section 9: Sampling Distributions of Statistics
Introduction to Sampling Distributions

48m 17s

Intro
0:00
Roadmap
0:08
Roadmap
0:09
Probability Distributions vs. Sampling Distributions
0:55
Probability Distributions vs. Sampling Distributions
0:56
Same Logic
3:55
Logic of Probability Distribution
3:56
Example: Rolling Two Dice
6:56
Simulating Samples
9:53
To Come Up with Probability Distributions
9:54
In Sampling Distributions
11:12
Connecting Sampling and Research Methods with Sampling Distributions
12:11
Connecting Sampling and Research Methods with Sampling Distributions
12:12
Simulating a Sampling Distribution
14:14
Experimental Design: Regular Sleep vs. Less Sleep
14:15
Logic of Sampling Distributions
23:08
Logic of Sampling Distributions
23:09
General Method of Simulating Sampling Distributions
25:38
General Method of Simulating Sampling Distributions
25:39
Questions that Remain
28:45
Questions that Remain
28:46
Example 1: Mean and Standard Error of Sampling Distribution
30:57
Example 2: What is the Best Way to Describe Sampling Distributions?
37:12
Example 3: Matching Sampling Distributions
38:21
Example 4: Mean and Standard Error of Sampling Distribution
41:51
Sampling Distribution of the Mean

1h 8m 48s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Special Case of General Method for Simulating a Sampling Distribution
1:53
Special Case of General Method for Simulating a Sampling Distribution
1:54
Computer Simulation
3:43
Using Simulations to See Principles behind Shape of SDoM
15:50
Using Simulations to See Principles behind Shape of SDoM
15:51
Conditions
17:38
Using Simulations to See Principles behind Center (Mean) of SDoM
20:15
Using Simulations to See Principles behind Center (Mean) of SDoM
20:16
Conditions: Does n Matter?
21:31
Conditions: Does Number of Simulation Matter?
24:37
Using Simulations to See Principles behind Standard Deviation of SDoM
27:13
Using Simulations to See Principles behind Standard Deviation of SDoM
27:14
Conditions: Does n Matter?
34:45
Conditions: Does Number of Simulation Matter?
36:24
Central Limit Theorem
37:13
SHAPE
38:08
CENTER
39:34
SPREAD
39:52
Comparing Population, Sample, and SDoM
43:10
Comparing Population, Sample, and SDoM
43:11
Answering the 'Questions that Remain'
48:24
What Happens When We Don't Know What the Population Looks Like?
48:25
Can We Have Sampling Distributions for Summary Statistics Other than the Mean?
49:42
How Do We Know whether a Sample is Sufficiently Unlikely?
53:36
Do We Always Have to Simulate a Large Number of Samples in Order to get a Sampling Distribution?
54:40
Example 1: Mean Batting Average
55:25
Example 2: Mean Sampling Distribution and Standard Error
59:07
Example 3: Sampling Distribution of the Mean
1:01:04
Sampling Distribution of Sample Proportions

54m 37s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Intro to Sampling Distribution of Sample Proportions (SDoSP)
0:51
Categorical Data (Examples)
0:52
Wish to Estimate Proportion of Population from Sample…
2:00
Notation
3:34
Population Proportion and Sample Proportion Notations
3:35
What's the Difference?
9:19
SDoM vs. SDoSP: Type of Data
9:20
SDoM vs. SDoSP: Shape
11:24
SDoM vs. SDoSP: Center
12:30
SDoM vs. SDoSP: Spread
15:34
Binomial Distribution vs. Sampling Distribution of Sample Proportions
19:14
Binomial Distribution vs. SDoSP: Type of Data
19:17
Binomial Distribution vs. SDoSP: Shape
21:07
Binomial Distribution vs. SDoSP: Center
21:43
Binomial Distribution vs. SDoSP: Spread
24:08
Example 1: Sampling Distribution of Sample Proportions
26:07
Example 2: Sampling Distribution of Sample Proportions
37:58
Example 3: Sampling Distribution of Sample Proportions
44:42
Example 4: Sampling Distribution of Sample Proportions
45:57
Section 10: Inferential Statistics
Introduction to Confidence Intervals

42m 53s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Inferential Statistics
0:50
Inferential Statistics
0:51
Two Problems with This Picture…
3:20
Two Problems with This Picture…
3:21
Solution: Confidence Intervals (CI)
4:59
Solution: Hypothesis Testing (HT)
5:49
Which Parameters are Known?
6:45
Which Parameters are Known?
6:46
Confidence Interval - Goal
7:56
When We Don't Know μ but Know σ
7:57
When We Don't Know μ nor σ
18:27
When We Don't Know μ nor σ
18:28
Example 1: Confidence Intervals
26:18
Example 2: Confidence Intervals
29:46
Example 3: Confidence Intervals
32:18
Example 4: Confidence Intervals
38:31
t Distributions

1h 2m 6s

Intro
0:00
Roadmap
0:04
Roadmap
0:05
When to Use z vs. t?
1:07
When to Use z vs. t?
1:08
What is z and t?
3:02
z-score and t-score: Commonality
3:03
z-score and t-score: Formulas
3:34
z-score and t-score: Difference
5:22
Why not z? (Why t?)
7:24
Why not z? (Why t?)
7:25
But Don't Worry!
15:13
Gossett and t-distributions
15:14
Rules of t Distributions
17:05
t-distributions are More Normal as n Gets Bigger
17:06
t-distributions are a Family of Distributions
18:55
Degrees of Freedom (df)
20:02
Degrees of Freedom (df)
20:03
t Family of Distributions
24:07
t Family of Distributions : df = 2 , 4, and 60
24:08
df = 60
29:16
df = 2
29:59
How to Find It?
31:01
'Student's t-distribution' or 't-distribution'
31:02
Excel Example
33:06
Example 1: Which Distribution Do You Use? Z or t?
45:26
Example 2: Friends on Facebook
47:41
Example 3: t Distributions
52:15
Example 4: t Distributions, Confidence Interval, and Mean
55:59
Introduction to Hypothesis Testing

1h 6m 33s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Issues to Overcome in Inferential Statistics
1:35
Issues to Overcome in Inferential Statistics
1:36
What Happens When We Don't Know What the Population Looks Like?
2:57
How Do We Know Whether a Sample is Sufficiently Unlikely
3:43
Hypothesizing a Population
6:44
Hypothesizing a Population
6:45
Null Hypothesis
8:07
Alternative Hypothesis
8:56
Hypotheses
11:58
Hypotheses
11:59
Errors in Hypothesis Testing
14:22
Errors in Hypothesis Testing
14:23
Steps of Hypothesis Testing
21:15
Steps of Hypothesis Testing
21:16
Single Sample HT ( When Sigma Available)
26:08
Example: Average Facebook Friends
26:09
Step 1
27:08
Step 2
27:58
Step 3
28:17
Step 4
32:18
Single Sample HT (When Sigma Not Available)
36:33
Example: Average Facebook Friends
36:34
Step 1: Hypothesis Testing
36:58
Step 2: Significance Level
37:25
Step 3: Decision Stage
37:40
Step 4: Sample
41:36
Sigma and p-value
45:04
Sigma and p-value
45:05
One-tailed vs. Two-tailed Hypotheses
45:51
Example 1: Hypothesis Testing
48:37
Example 2: Heights of Women in the US
57:43
Example 3: Select the Best Way to Complete This Sentence
1:03:23
Confidence Intervals for the Difference of Two Independent Means

55m 14s

Intro
0:00
Roadmap
0:14
Roadmap
0:15
One Mean vs. Two Means
1:17
One Mean vs. Two Means
1:18
Notation
2:41
A Sample! A Set!
2:42
Mean of X, Mean of Y, and Difference of Two Means
3:56
SE of X
4:34
SE of Y
6:28
Sampling Distribution of the Difference between Two Means (SDoD)
7:48
Sampling Distribution of the Difference between Two Means (SDoD)
7:49
Rules of the SDoD (similar to CLT!)
15:00
Mean for the SDoD Null Hypothesis
15:01
Standard Error
17:39
When can We Construct a CI for the Difference between Two Means?
21:28
Three Conditions
21:29
Finding CI
23:56
One Mean CI
23:57
Two Means CI
25:45
Finding t
29:16
Finding t
29:17
Interpreting CI
30:25
Interpreting CI
30:26
Better Estimate of σ (s pool)
34:15
Better Estimate of σ (s pool)
34:16
Example 1: Confidence Intervals
42:32
Example 2: SE of the Difference
52:36
Hypothesis Testing for the Difference of Two Independent Means

50m

Intro
0:00
Roadmap
0:06
Roadmap
0:07
The Goal of Hypothesis Testing
0:56
One Sample and Two Samples
0:57
Sampling Distribution of the Difference between Two Means (SDoD)
3:42
Sampling Distribution of the Difference between Two Means (SDoD)
3:43
Rules of the SDoD (Similar to CLT!)
6:46
Shape
6:47
Mean for the Null Hypothesis
7:26
Standard Error for Independent Samples (When Variance is Homogenous)
8:18
Standard Error for Independent Samples (When Variance is not Homogenous)
9:25
Same Conditions for HT as for CI
10:08
Three Conditions
10:09
Steps of Hypothesis Testing
11:04
Steps of Hypothesis Testing
11:05
Formulas that Go with Steps of Hypothesis Testing
13:21
Step 1
13:25
Step 2
14:18
Step 3
15:00
Step 4
16:57
Example 1: Hypothesis Testing for the Difference of Two Independent Means
18:47
Example 2: Hypothesis Testing for the Difference of Two Independent Means
33:55
Example 3: Hypothesis Testing for the Difference of Two Independent Means
44:22
Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

1h 14m 11s

Intro
0:00
Roadmap
0:09
Roadmap
0:10
The Goal of Hypothesis Testing
1:27
One Sample and Two Samples
1:28
Independent Samples vs. Paired Samples
3:16
Independent Samples vs. Paired Samples
3:17
Which is Which?
5:20
Independent SAMPLES vs. Independent VARIABLES
7:43
Independent SAMPLES vs. Independent VARIABLES
7:44
T-tests Always…
10:48
T-tests Always…
10:49
Notation for Paired Samples
12:59
Notation for Paired Samples
13:00
Steps of Hypothesis Testing for Paired Samples
16:13
Steps of Hypothesis Testing for Paired Samples
16:14
Rules of the SDoD (Adding on Paired Samples)
18:03
Shape
18:04
Mean for the Null Hypothesis
18:31
Standard Error for Independent Samples (When Variance is Homogenous)
19:25
Standard Error for Paired Samples
20:39
Formulas that go with Steps of Hypothesis Testing
22:59
Formulas that go with Steps of Hypothesis Testing
23:00
Confidence Intervals for Paired Samples
30:32
Confidence Intervals for Paired Samples
30:33
Example 1: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
32:28
Example 2: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
44:02
Example 3: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
52:23
Type I and Type II Errors

31m 27s

Intro
0:00
Roadmap
0:18
Roadmap
0:19
Errors and Relationship to HT and the Sample Statistic?
1:11
Errors and Relationship to HT and the Sample Statistic?
1:12
Instead of a Box…Distributions!
7:00
One Sample t-test: Friends on Facebook
7:01
Two Sample t-test: Friends on Facebook
13:46
Usually, Lots of Overlap between Null and Alternative Distributions
16:59
Overlap between Null and Alternative Distributions
17:00
How Distributions and 'Box' Fit Together
22:45
How Distributions and 'Box' Fit Together
22:46
Example 1: Types of Errors
25:54
Example 2: Types of Errors
27:30
Example 3: What is the Danger of the Type I Error?
29:38
Effect Size & Power

44m 41s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Distance between Distributions: Sample t
0:49
Distance between Distributions: Sample t
0:50
Problem with Distance in Terms of Standard Error
2:56
Problem with Distance in Terms of Standard Error
2:57
Test Statistic (t) vs. Effect Size (d or g)
4:38
Test Statistic (t) vs. Effect Size (d or g)
4:39
Rules of Effect Size
6:09
Rules of Effect Size
6:10
Why Do We Need Effect Size?
8:21
Tells You the Practical Significance
8:22
HT can be Deceiving…
10:25
Important Note
10:42
What is Power?
11:20
What is Power?
11:21
Why Do We Need Power?
14:19
Conditional Probability and Power
14:20
Power is:
16:27
Can We Calculate Power?
19:00
Can We Calculate Power?
19:01
How Does Alpha Affect Power?
20:36
How Does Alpha Affect Power?
20:37
How Does Effect Size Affect Power?
25:38
How Does Effect Size Affect Power?
25:39
How Does Variability and Sample Size Affect Power?
27:56
How Does Variability and Sample Size Affect Power?
27:57
How Do We Increase Power?
32:47
Increasing Power
32:48
Example 1: Effect Size & Power
35:40
Example 2: Effect Size & Power
37:38
Example 3: Effect Size & Power
40:55
Section 11: Analysis of Variance
F-distributions

24m 46s

Intro
0:00
Roadmap
0:04
Roadmap
0:05
Z- & T-statistic and Their Distribution
0:34
Z- & T-statistic and Their Distribution
0:35
F-statistic
4:55
The F Ratio (the Variance Ratio)
4:56
F-distribution
12:29
F-distribution
12:30
σ and p-value
15:01
σ and p-value
15:01
Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity?
18:33
Example 2: F-distributions
19:29
Example 3: F-distributions and Heights
21:29
ANOVA with Independent Samples

1h 9m 25s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
The Limitations of t-tests
1:12
The Limitations of t-tests
1:13
Two Major Limitations of Many t-tests
3:26
Two Major Limitations of Many t-tests
3:27
Ronald Fisher's Solution… F-test! New Null Hypothesis
4:43
Ronald Fisher's Solution… F-test! New Null Hypothesis (Omnibus Test - One Test to Rule Them All!)
4:44
Analysis of Variance (ANOVA) Notation
7:47
Analysis of Variance (ANOVA) Notation
7:48
Partitioning (Analyzing) Variance
9:58
Total Variance
9:59
Within-group Variation
14:00
Between-group Variation
16:22
Time out: Review Variance & SS
17:05
Time out: Review Variance & SS
17:06
F-statistic
19:22
The F Ratio (the Variance Ratio)
19:23
S²bet = SSbet / dfbet
22:13
What is This?
22:14
How Many Means?
23:20
So What is the dfbet?
23:38
So What is SSbet?
24:15
S²w = SSw / dfw
26:05
What is This?
26:06
How Many Means?
27:20
So What is the dfw?
27:36
So What is SSw?
28:18
Chart of Independent Samples ANOVA
29:25
Chart of Independent Samples ANOVA
29:26
Example 1: Who Uploads More Photos: Unknown Ethnicity, Latino, Asian, Black, or White Facebook Users?
35:52
Hypotheses
35:53
Significance Level
39:40
Decision Stage
40:05
Calculate Samples' Statistic and p-Value
44:10
Reject or Fail to Reject H0
55:54
Example 2: ANOVA with Independent Samples
58:21
Repeated Measures ANOVA

1h 15m 13s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
The Limitations of t-tests
0:36
Who Uploads more Pictures and Which Photo-Type is Most Frequently Used on Facebook?
0:37
ANOVA (F-test) to the Rescue!
5:49
Omnibus Hypothesis
5:50
Analyze Variance
7:27
Independent Samples vs. Repeated Measures
9:12
Same Start
9:13
Independent Samples ANOVA
10:43
Repeated Measures ANOVA
12:00
Independent Samples ANOVA
16:00
Same Start: All the Variance Around Grand Mean
16:01
Independent Samples
16:23
Repeated Measures ANOVA
18:18
Same Start: All the Variance Around Grand Mean
18:19
Repeated Measures
18:33
Repeated Measures F-statistic
21:22
The F Ratio (The Variance Ratio)
21:23
S²bet = SSbet / dfbet
23:07
What is This?
23:08
How Many Means?
23:39
So What is the dfbet?
23:54
So What is SSbet?
24:32
S² resid = SS resid / df resid
25:46
What is This?
25:47
So What is SS resid?
26:44
So What is the df resid?
27:36
SS subj and df subj
28:11
What is This?
28:12
How Many Subject Means?
29:43
So What is df subj?
30:01
So What is SS subj?
30:09
SS total and df total
31:42
What is This?
31:43
What is the Total Number of Data Points?
32:02
So What is df total?
32:34
So What is SS total?
32:47
Chart of Repeated Measures ANOVA
33:19
Chart of Repeated Measures ANOVA: F and Between-samples Variability
33:20
Chart of Repeated Measures ANOVA: Total Variability, Within-subject (case) Variability, Residual Variability
35:50
Example 1: Which is More Prevalent on Facebook: Tagged, Uploaded, Mobile, or Profile Photos?
40:25
Hypotheses
40:26
Significance Level
41:46
Decision Stage
42:09
Calculate Samples' Statistic and p-Value
46:18
Reject or Fail to Reject H0
57:55
Example 2: Repeated Measures ANOVA
58:57
Example 3: What's the Problem with a Bunch of Tiny t-tests?
1:13:59
Section 12: Chi-square Test
Chi-Square Goodness-of-Fit Test

58m 23s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Where Does the Chi-Square Test Belong?
0:50
Where Does the Chi-Square Test Belong?
0:51
A New Twist on HT: Goodness-of-Fit
7:23
HT in General
7:24
Goodness-of-Fit HT
8:26
Hypotheses about Proportions
12:17
Null Hypothesis
12:18
Alternative Hypothesis
13:23
Example
14:38
Chi-Square Statistic
17:52
Chi-Square Statistic
17:53
Chi-Square Distributions
24:31
Chi-Square Distributions
24:32
Conditions for Chi-Square
28:58
Condition 1
28:59
Condition 2
30:20
Condition 3
30:32
Condition 4
31:47
Example 1: Chi-Square Goodness-of-Fit Test
32:23
Example 2: Chi-Square Goodness-of-Fit Test
44:34
Example 3: Which of These Statements Describe Properties of the Chi-Square Goodness-of-Fit Test?
56:06
Chi-Square Test of Homogeneity

51m 36s

Intro
0:00
Roadmap
0:09
Roadmap
0:10
Goodness-of-Fit vs. Homogeneity
1:13
Goodness-of-Fit HT
1:14
Homogeneity
2:00
Analogy
2:38
Hypotheses About Proportions
5:00
Null Hypothesis
5:01
Alternative Hypothesis
6:11
Example
6:33
Chi-Square Statistic
10:12
Same as Goodness-of-Fit Test
10:13
Set Up Data
12:28
Setting Up Data Example
12:29
Expected Frequency
16:53
Expected Frequency
16:54
Chi-Square Distributions & df
19:26
Chi-Square Distributions & df
19:27
Conditions for Test of Homogeneity
20:54
Condition 1
20:55
Condition 2
21:39
Condition 3
22:05
Condition 4
22:23
Example 1: Chi-Square Test of Homogeneity
22:52
Example 2: Chi-Square Test of Homogeneity
32:10
Section 13: Overview of Statistics
Overview of Statistics

18m 11s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
The Statistical Tests (HT) We've Covered
0:28
The Statistical Tests (HT) We've Covered
0:29
Organizing the Tests We've Covered…
1:08
One Sample: Continuous DV and Categorical DV
1:09
Two Samples: Continuous DV and Categorical DV
5:41
More Than Two Samples: Continuous DV and Categorical DV
8:21
The Following Data: OK Cupid
10:10
The Following Data: OK Cupid
10:11
Example 1: Weird-MySpace-Angle Profile Photo
10:38
Example 2: Geniuses
12:30
Example 3: Promiscuous iPhone Users
13:37
Example 4: Women, Aging, and Messaging
16:07

Transcription: Chi-Square Goodness-of-Fit Test

Hi, welcome to educator.com. We are going to talk about the chi-square goodness-of-fit test.

First, we are going to start with a big-picture review of where the chi-square test actually fits in amongst all the different inferential statistics we have been learning so far. Then we are going to talk about a new kind of hypothesis testing, the goodness-of-fit hypothesis test. It is going to be similar to the hypothesis testing we have been doing so far, but there is a slightly different logic behind it, and because of that slightly different logic there is a new null hypothesis as well as a new alternative hypothesis. Then we are going to introduce the chi-square distribution and the chi-square statistic, and finally we are going to talk about the conditions for the chi-square test: when do we actually use it?

So where does the chi-square test belong? It has been a while since we have looked at this big picture if you are going in order with the videos, but I think it is good to stop right now and think: where have we come from, and where are we now?

The first thing we want to think about is the different independent variables we have been able to look at. We have been able to look at independent variables, the predictor variables, that are either categorical or continuous. When the IV is categorical you have groups, or different samples. When the IV is continuous you do not have different groups; you have different levels that predict something. To give you an idea, a categorical IV would be something like an experimental group versus a control group: maybe a group that gets a drug versus a group that gets the placebo. An example of a continuous IV might be how much you study predicting your score on a test; how much you study would be a continuous IV. So that is one of the dimensions we need to know: is your IV categorical or continuous?

You also need to know whether the DV is categorical or continuous. The DV is the thing we are interested in measuring at the end of the day, the thing we want to predict. Here is how it has gone so far. At the very beginning we looked at continuous-IV, continuous-DV types of tests and measures, and those were regression (linear regression) as well as correlation. Remember, regression was that stuff about y = b₀ + b₁x; that was regression and correlation, way back in the day. More recently we have been covering the quadrant with t-tests and ANOVA. One important thing to note is that t-tests and ANOVAs are both hypothesis tests; so far we have not learned hypothesis testing with regression and correlation. A lot of inferential statistics courses in college do not cover hypothesis testing for regression until more advanced levels of statistics.

So what do ANOVAs and t-tests have in common? They both have a categorical IV and a continuous DV: the IV is categorical, and you only have one IV, and the DV is continuous. What is different about them? The IV in a t-test has two levels, and only two levels, so there are only two groups or two samples. With ANOVA we can test more than two samples; we can do it for 3, 4, or 5 samples, so that IV has more than two levels. That is where we have been spending a lot of our time.

For the most part, continuous DVs are really important because they tell us a lot; they tell us the fine ways in which the data could actually differ. It is more rare that you will use a categorical dependent variable; it is not going to be as informative to us, but it is still possible, and that is where the chi-square comes in. The chi-square lands right in the quadrant where we have a categorical IV and also a categorical DV. For instance, we might want to ask: if you are given a particular drug or the placebo, do you feel like you are getting better, yes or no? That is a categorical DV; it is not a score for which we can find a mean, and this is where the chi-square tests come in. There are going to be two chi-square tests that we look at. The first one we are going to cover today, and it is called goodness-of-fit. The next one is in the next lesson, and it is called the test of homogeneity. They are both chi-square tests. You will also see it written as 'chi-squared'. And do not be thrown off by the symbol: it looks like an uppercase X, but a little more curvy, to let you know it is χ, the Greek letter chi, not a regular letter X. Finally, there is a test that is rarely covered in introductory inferential statistics but does get covered at more advanced levels, called the logistic test, and the logistic test takes you from a continuous IV to a categorical DV. But that is a rarer design in conducting science; it is not as informative as continuous-to-continuous or categorical-to-continuous. All right, so we are going to spend our time right here.
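
Putting the quadrants just described side by side (a summary of the lecture's grid, not an exhaustive taxonomy):

                      Continuous DV                 Categorical DV
    Categorical IV    t-test (IV with 2 levels),    chi-square tests
                      ANOVA (more than 2 levels)
    Continuous IV     regression, correlation       logistic test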

So there is a new twist on hypothesis testing. It is not totally different, it is still very similar, but there is a subtle difference. Today we are going to start off with the chi-square goodness-of-fit test.

Let us think about hypothesis testing in general. In general, you want to determine whether a sample is very different from expected results; that is the big idea of hypothesis testing, and the expected results come from your hypothesized population. Whether the sample is very different is usually determined with some sort of test statistic: we look at how far it falls on that test statistic's distribution, we check whether it is past the alpha cutoff (the critical test statistic), and then we say this sample is so different from what would be expected, given that the null hypothesis is true, that we are going to reject the null hypothesis. That is hypothesis testing as usual.

The goodness-of-fit test still takes that idea, looking at whether a sample is very different from expected results, but the question is how we are going to compare those two things. We are not going to compare means anymore; we are not going to look at the distance between means, nor are we going to look at a ratio of variances. Instead, we are going to determine whether the sample proportions for some categories are very different from the hypothesized population proportions, and the question will be how we determine "very different."

Here is what I mean. Let me describe schematically what hypothesized population proportions might look like. You might think of the population as having a proportion of one third blue, one third red, and one third yellow. Already you can see why t-tests and ANOVAs will not work here: we cannot get the average of blue, red, and yellow (what would the average of that even be, and how would you find its variability?), and if you cannot find a mean or a variance, you cannot use those tests. So that is what our hypothesized population looks like, and when we draw a little sample from that population, we want to know whether our sample proportions are very different from the hypothesized proportions or not. Say in one sample we get mostly blue, a little bit of red, a little bit of yellow: 60% blue, 20% red, 20% yellow. Are those proportions different enough from our hypothesized proportions? Another sample might be half blue and half red with no yellow; is that really different from our hypothesized proportions? Yet another sample might be only 10% blue, then 40% red, with the other half yellow. For samples like these we want to say whether they are really different from the hypothesized population proportions, and that is our new goal. How different are these proportions from those proportions? Then the question becomes: how do we determine whether something is very different? Is this very different, or just different? That is going to be the key question here, and that is why we are going to need the chi-square statistic and the chi-square distribution.
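
For reference, here are the hypothesized proportions alongside the three example samples just described:

    Category    Hypothesized    Sample 1    Sample 2    Sample 3
    Blue        1/3             60%         50%         10%
    Red         1/3             20%         50%         40%
    Yellow      1/3             20%         0%          50%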

So we are changing our hypotheses a little bit: the null hypothesis is now really about proportions. The null hypothesis is about the proportions of the real population, the one we do not know: will this population be like the predicted, theorized proportions? We are asking whether this unknown population is like our known, hypothesized population, and that should sound familiar, because it is the fundamental move of inferential statistics. So that is our new null hypothesis: the proportions in the population will be like the predicted population proportions; they will be the same. Remember, sameness is always the hallmark of the null hypothesis. The alternative hypothesis says that at least one of the proportions in the population will be different than predicted. Going back to our example: if our hypothesized population is something like one third, one third, one third, maybe what we will find in our sample is one third blue, but then some smaller proportion, like 15%, red, and the rest yellow. The one third matches up, but what about the other two? So the alternative hypothesis is that at least one proportion in the population will be different from the predicted proportion; there just has to be one that is different.
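
In symbols, for k categories (the subscript notation here is mine, but standard):

    H₀: pᵢ = pᵢ₀ for every category i = 1, …, k
    Hₐ: pᵢ ≠ pᵢ₀ for at least one category i

where pᵢ is the true population proportion for category i and pᵢ₀ is the predicted (hypothesized) proportion.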

Now let me give you an example; let us turn this problem into a null hypothesis and an alternative hypothesis. According to early polls, candidate A was supposed to win 63% of the votes and candidate B was supposed to win 37%. When the votes were counted, candidate A won 340 votes while B won 166 votes. To paint that picture again: the null-hypothesis population is that candidate A (colored in blue in the lecture's sketch) should win 63% of the vote and candidate B (colored in red) should win 37%. So what would our null hypothesis be? Our null hypothesis would be that our unknown population is like this prediction: the proportions of the unknown population have the same proportions as our predicted population. A's actual proportion of votes should be like the predicted population, and B's proportion of votes should be like the predicted population. Another way to say it: the real proportion of votes should be like the predicted proportion of votes, for every single category, both A and B. What would the alternative version be? The alternative says that at least one of the proportions, for either category A or category B, will be different from the hypothesized proportion. In fact, in this example, if one of them is different the other will be different too, because with only two categories, if you make one really different, the other one automatically changes. Later on we will see examples with 3, 4, or 5 categories, and in those cases the "at least one" phrasing will make more sense.
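
For the voting example, that gives:

    H₀: p_A = 0.63 and p_B = 0.37
    Hₐ: p_A ≠ 0.63 (equivalently, p_B ≠ 0.37, since the two proportions must sum to 1)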

Okay, so now let us talk about how to actually find out whether our proportions are really off or not.

Are our proportions statistical outliers, are they deviant, are they significant, do they stand out? That is what we want to know.

In order to do that we have to use a measure called the chi-square statistic. Instead of the t statistic, which looks at a distance away in terms of standard error, and instead of the F statistic, which looks at the variance we are interested in over the variance we cannot explain, the chi-square does something different.

It looks at expected values: what would we expect, and what do we actually observe? The chi-square symbol looks like an uppercase X, but it is a little different from a regular letter X; it is usually a little more curvy, to let you know it is a chi-square.

So the chi-square is really interested in the difference between what we observe, the actual observed frequency, and the expected frequency.

We are looking at observed versus expected: the observed is what we see in our sample, and the expected is what we would predict given our hypothesized population, so that is the predicted population part.

So we are interested in the difference between those two frequencies.

Now, although you could use proportions as well, you can only do that if you have a constant number of items, so you are probably safer going with frequencies, because those are essentially the weighted proportions.

So we are interested in this difference, but remember, sometimes the difference can be positive and sometimes it can be negative, so what we do here, as usual in statistics, is square the whole thing. We also want to know about this difference as a proportion of what was expected, and we want to do this for every category.

So we sum over the categories: i goes from 1 to the number of categories, and there is an i subscript on everything.
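Putting that verbal description into standard notation, the statistic being described is:

\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
\]

where \(O_i\) is the observed frequency in category \(i\), \(E_i\) is the expected frequency in category \(i\), and \(k\) is the number of categories.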

So what this is saying is that for each category, each proportion you are looking at, so in our little toy example with red, blue, and yellow, we would do this for blue, we would do this for red, and we would do this for yellow. The categories really speak to what the proportions are made of.

Here we have three categories, so we would do this three times and add those terms up, and to do that we eventually need to be able to find the observed frequency and the expected frequency for each category.

Now, in the example we saw with the voting for candidates A and B, one of the things I hope you noticed was that the observed frequencies were given as just numbers of votes, how many people voted, but the expected, hypothesized population was given as percentages.

You cannot subtract votes from percentages; you have to translate them both into the same units, and here it is helpful to change the expected percentages into expected frequencies. There is going to be another reason for changing to expected frequencies instead of changing the observed frequencies into observed proportions, and I will get to that a little bit later.

So here is how I want you to think of this: it is really the squared difference between observed and expected frequencies, as a proportion of the expected frequency, summed over all the categories.

Once you have that, you get your chi-square value; now let us think about this chi-square value.

If this difference is very large, so the observed frequencies are very different from the expected ones, what happens?

You are going to have a very large chi-square. Also, if this difference is very small, if they are really close to each other, then your chi-square will be very small.

So chi-square gives us a measure of how far apart the observed and expected frequencies are. Also, I want you to see that the chi-square cannot be negative.

First of all, because we are squaring the difference, the numerator cannot be negative. Not only that, the expected frequencies also cannot be negative, because we are counting up how many things we expect to observe, so the denominator cannot be negative either, and the whole thing cannot be negative.

So already we can see in our mind that the chi-square distribution will be positive, and positively skewed, because it stops at zero; there is a wall at zero.

Okay, so now let us actually talk about and draw the chi-square distribution. Imagine having some sort of data set and taking samples from it over and over again: you have this big data set, you take a sample, you calculate the chi-square statistic, and you plot it.

Then you put that sample back in, take another sample, compute its chi-square, plot it again, and do that over and over and over again.

You will never get a value below zero, and you will sometimes get values that are way higher than zero, but for the most part they will be clustered down near zero, so you will get a skewed distribution, and indeed the chi-square distribution is a skewed distribution.
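Here is a minimal Python sketch of that resampling picture (my illustration, not part of the lecture), assuming a true null with three equally likely categories. Every simulated chi-square is nonnegative, and a histogram of the results comes out right-skewed with a wall at zero:

```python
import numpy as np

rng = np.random.default_rng(42)

k = 3                          # categories (e.g., blue, red, yellow)
n = 90                         # sample size for each draw
expected = np.full(k, n / k)   # equal proportions under the null

chi_squares = []
for _ in range(10_000):
    # Draw one sample assuming the null hypothesis (equal proportions) is true.
    observed = rng.multinomial(n, [1 / k] * k)
    # Chi-square statistic: sum over categories of (O - E)^2 / E.
    chi_squares.append(np.sum((observed - expected) ** 2 / expected))

chi_squares = np.array(chi_squares)
print(chi_squares.min() >= 0)        # True: never below zero
print(np.mean(chi_squares > 5.99))   # ~0.05 (5.99 = critical value for df = 2)
```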

Now, when we look at this you might think, hey, that looks sort of like the F distribution, and you are right: in overall shape it looks just like the F distribution, and in a lot of ways we can apply the reasoning from the F distribution directly to the chi-square distribution.

For instance, in the chi-square distribution our alpha is automatically one-tailed; it is only on one side. So when we say something like alpha equals .05, this is what we mean: we will reject the null when we get a sample chi-square value somewhere out in that tail, but we will fail to reject if we get a sample chi-square value below that boundary.

Now, the chi-square distribution, like the F and t distributions, is a family of distributions, not just one distribution; the only one of these that is just a single distribution is the normal distribution.

The chi-square distribution again depends on degrees of freedom, and the degrees of freedom that the chi-square depends on is the number of categories minus 1.

So if you have a lot of categories the chi-square distribution will look one way, and if you have a small number of categories, like 2, it will look different.

So let us talk about what alpha means here.

The alpha here is the significance level we set; we are going to use it as the boundary, so that if the chi-square from our sample is bigger than this boundary, we will reject the null.

Now, what is the difference with the p-value?

The p-value is a probability, and it might land somewhere far out in the tail or somewhere closer in. What the p-value means here is very similar to what it means in other hypothesis tests: it is basically the probability of getting a chi-square value larger, more extreme, than the one from our sample, and in this case there is only one kind of extreme, positive, but under a condition.

Remember, in this world, which hypothesis is true?

The null hypothesis is true.

So, assuming the null hypothesis were true, the p-value is the probability of getting such an extreme chi-square value, one that is that large or larger; that is all we need.

So, in that way, the p-value comes from our data, while the alpha is not from our data; it is just something we set as the cutoff.

So there are some conditions that we need to know before we use the chi-square.

We cannot just always use the chi-square; there are conditions that have to be met, and one of the conditions is this.

Each outcome in the population falls into exactly one of a fixed number of categories. So every time you have some sort of case from the population, let us say we are drawing out votes, each vote has to fall into one of a fixed number of categories: if it is two candidates, it is always two candidates for every single voter, so we cannot compare voters who had two candidates versus voters who had three candidates.

Also, these have to be mutually exclusive categories; one vote cannot go to two candidates at once, so you have to vote for A or vote for B.

And you cannot opt out either; every outcome has to fall into one of the fixed number of categories, set ahead of time.

The numbering is slightly off here, but the second condition that must be met is that you must have a random sample from your population; that is just like all other kinds of hypothesis testing.

Number 3: once you compute all the expected frequencies in order to compute your chi-square, each cell needs to have an expected frequency of five or greater. Here is why.

You need a big enough sample; if your sample is too small, you will get expected frequencies of less than five. You also need big enough proportions: let us say one candidate is predicted to win 99.999% of the votes and the other candidate is only supposed to win .001% of the vote, and you only have five people in your sample; that second expected frequency will be far below five.

So you need big enough proportions too, and these two requirements balance each other out.

If you have a large enough sample, then your proportions can be smaller; also, if you have large enough proportions, your sample can be smaller.

And the final condition is not really a condition; it is just something I wanted you to know about the rule.

It concerns the chi-square goodness-of-fit test, which is what we have been talking about so far.

This test actually applies to more than two categories.

You do not have to have just 2 categories; you can have 3 or 4 or 5 or 6, but they do need to be mutually exclusive, and each outcome in the population must be able to fall into exactly one of those.

So those are the conditions.
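As a small illustration (my sketch, not from the lecture), here is how the expected-frequency condition could be checked in Python, using the voting numbers from the earlier example; the helper names are mine:

```python
def expected_frequencies(n_total, proportions):
    """Turn hypothesized proportions into expected frequencies."""
    return [n_total * p for p in proportions]

def meets_minimum_expected(expected, minimum=5):
    """Condition 3: every expected frequency should be at least 5."""
    return all(e >= minimum for e in expected)

# 506 voters, predicted to split 63% / 37%.
expected = expected_frequencies(506, [0.63, 0.37])
print(expected)                          # [318.78, 187.22]
print(meets_minimum_expected(expected))  # True
```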

So now let us move on to some examples.

The first example is the problem we have already looked at: according to early polls, candidate A was supposed to win 63% of the vote and B was supposed to win 37%.

When the votes were counted, A won 340 votes while B won 166 votes.

One of the things I like to do, just to help myself, is to write the null hypothesis out as a sentence: the proportions of votes, that is my population, should be like the predicted proportions. And the alternative: at least one of the proportions of votes will not be like the predicted proportion.

What I also like to do is draw this out for myself; I like to draw the predicted population, so I will color candidate A in blue, and that will be about 63%, and candidate B in red, 37%.

And eventually I want to know whether this is reflected in my actual votes.

The significance level we can set at .05, just out of convention, and we know that it has to be one-tailed, because this is definitely going to be a chi-square test, and we know it is chi-square because it is about expected proportions.

So now let us set our decision stage.

For our decision stage it is helpful to draw that chi-square distribution and label it: the alpha here, .05, is our rejection region. Now it would be nice to know our critical chi-square, and in order to find that we need degrees of freedom, and degrees of freedom is the number of categories, in this case 2, minus 1, which gives 1 degree of freedom.

That is because if you know, let us say, that candidate B is supposed to win 37% of the votes, you can figure out candidate A's proportion yourself; you do not need me to tell you. Candidate A's proportion cannot vary freely once you know B's, and that is why it is the number of categories minus 1.

Now that we have that, it may be useful to look in the back of your book, or to use an Excel function, in order to find our critical chi-square.

For the chi-square there are two functions you need to know, just like TDIST and TINV for t, and FDIST and FINV for F: now there are CHIDIST and CHIINV.

We need CHIINV right now, because here we have the probability, .05, and the degrees of freedom, 1, and that gives us our critical chi-square, which is 3.84.

So this is the boundary we are looking for, 3.84: for anything more extreme, more positive, than 3.84 we are going to reject our null hypothesis.
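If you prefer working outside of Excel, the same two look-ups can be done in Python with scipy (my sketch, not part of the lecture); chi2.ppf plays the role of CHIINV and chi2.sf plays the role of CHIDIST:

```python
from scipy.stats import chi2

df = 1  # number of categories minus 1

# Critical chi-square for alpha = .05 (Excel: =CHIINV(0.05, 1)).
# CHIINV takes the right-tail probability, so ppf needs 1 - alpha.
critical = chi2.ppf(1 - 0.05, df)
print(round(critical, 2))  # 3.84

# p-value for a sample chi-square (Excel: =CHIDIST(3.817, 1)).
p_value = chi2.sf(3.817, df)
print(round(p_value, 3))   # 0.051
```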

Now that our decision stage is set, it is helpful to actually work with our data; I should have left myself some room. When we look at our actual sample, here is what we ended up having.

We already have observed frequencies, so I am going to write a column for observed: for candidate A we observed 340 votes, that is our observed frequency, and for candidate B we see 166 votes.

Now, one thing that helps is that we know the total number of votes: 340 + 166, and that is 506.

So 506 people actually voted, and down here I am going to write the total, 506.

Now the question is, what should our expected frequencies have been?

So here I am going to write expected, and I know that my expected proportion for A should be 63%. 63% of what? Of the total number of people who voted.

So here is our little sample of 506 people.

This is our 100%, and since we have 506 people in our sample, we should expect 63% of 506 to have voted for A. How do we find that?

Well, we multiply 63% by 506 to find out how many votes that little blue bit is, so that is .63 × 506, that total amount.

If we multiplied 506 by 1, we would get 506, right?

So if we multiply by a somewhat smaller proportion, we get just that chunk: 318.78. Actually, let me draw this little table right in here, because it can help us find our chi-square much more quickly.

So we have the observed frequencies, 340 and 166, and next to them a column for the expected frequencies.

What is the expected frequency for B? In order to find that bit we multiply .37 × 506, and that is 187.22.

And if you add up this entire column, you should get roughly the same total.

When you do these by hand you might not get exactly the same number; it might be off by just a little bit because of rounding error, if you round to the nearest tenth or the nearest integer. But you should not be off by much, so this is one way to check that what you did was right.

Once we have this, let me just copy these down right here, 318.78 and 187.22, and the total for each column is 506. One thing we see is that the expected value for A is a little bit lower than what we observed, and the expected value for B is a little bit higher. But is this difference in proportions significant, is it standing out enough? In order to find that we need the chi-square, the sample chi-square.

Now, we have completely run out of room here.

But I will just write the chi-square formula up here.

The chi-square is the sum, over all the categories, of the observed frequency minus the expected frequency, squared, as a proportion of the expected frequency.

So what I am going to do is calculate this for each category, A and B, and then add them up.

Right here I am going to make a column, (O minus E) squared, all over E.

I am going to compute that for A and for B and then sum them.

So, observed minus expected, squared, all divided by expected; I get this value for A, copy and paste it down for B, and then sum them up, and I get 3.817.

We are really close, but no cigar; we are right underneath. Our sample chi-square is just a smidge smaller than our critical chi-square, so here we are not rejecting the null; we are going to fail to reject the null. Let us also find the p-value: to find the p-value you could use CHIDIST, or alternatively look it up in the back of your book, in the chi-square distribution table.

It should be behind your normal, t, and F tables; chi-square usually comes right behind those, in that order, or maybe a slightly different order.

Our degrees of freedom remains the same, 1, and our p-value comes out just over .05: if we round, .051, right?

So because of that we are not going to reject the null, and we are going to say the proportions of votes are roughly similar to the predicted proportions.

Well, they are not significantly different, at least; they are not necessarily super similar, and we cannot make a decision about that, but we can say they are not extremely different.
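To put the whole worked example in one place, here is a short Python check (my sketch, not part of the lecture) using scipy.stats.chisquare, which takes the observed and expected counts and returns the same statistic and p-value computed by hand above:

```python
from scipy.stats import chisquare

observed = [340, 166]                    # votes for A and B
total = sum(observed)                    # 506
expected = [0.63 * total, 0.37 * total]  # 318.78 and 187.22

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat)  # ~3.818 (the lecture's hand-rounded value is 3.817)
print(p)     # ~0.051 -> just over alpha = .05, so fail to reject
```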

Okay, example 2. A study asked whether college students could tell dog food apart from expensive liver pâté, liverwurst, and Spam.

All were blended to the same consistency, chilled, and garnished with herbs and a lemon wedge, just to make it pretty.

Students were asked to identify which was dog food.

Researchers wanted to test the probability model where the students are randomly guessing.

How would they test their hypothesized model?

Okay, so see the download, which shows how many students picked each item to be dog food. The college students have a bunch of different choices, dog food, liver pâté, liverwurst, and Spam, and they need to identify which was dog food: out of those, which one is the dog food?

So it is sort of like a multiple-choice question.

If you open example 2 in the download listed below, you will see the number of students who selected each particular item as dog food.

Now be careful, because on a real test you might get this problem without being told it is a chi-square problem.

Sometimes people immediately think, I will find the mean, and they just go ahead and find the mean; but if you do find the mean, ask yourself, what does this mean represent?

What is the idea or the concept?

If we averaged this, we would find the average number of students that selected any one of these items as dog food, and that is a mean that does not make any sense, right?

So before you go ahead and find the mean, ask yourself whether the mean is actually meaningful.

Here we know it is chi-square, because the students are choosing something, and it is a categorical choice.

They are not giving you an answer like 20 inches, or 50°, or I got 10 questions correct, right?

They are just saying, that one is dog food; they have five different choices and they have chosen one of them as the dog food. So out of five choices, a probability model where they are just guessing would mean that 20% of the time they should pick pâté to be the dog food, 20% of the time Spam to be the dog food, 20% of the time the actual dog food to be the dog food, and so on and so forth.

So let us draw that probability model; by model we really mean the null hypothesis.

Model, or hypothesized population, so step one.

The null hypothesis is the idea that the data will fit into this picture: this is the population, out of 100%, divided into their five choices. My drawing is slightly uneven, but it helps to draw this as well as you can, because it will help you reason, too.

They will have an equal chance of guessing any one of these, and there are two liver pâtés, which is why there are 5 choices.

So liver pâté 1, then Spam, then the actual dog food, then pâté 2, then liverwurst, just as in the data set.

These are the five choices, and we are saying, look, if the students are just guessing, they should have a 20% probability for each.

Is this the right proportion model for this sample? Is the sample going to roughly match it, or be very different from it?

The alternative is that at least one of the real proportions is different from the predicted one.

Once we have that, we can set our alpha to be .05. For our decision stage we can draw the chi-square distribution and find our degrees of freedom: we now have five categories, so our degrees of freedom is 5 − 1, which equals 4, because once we know four of these proportions we can figure out the fifth just from knowing the other four.

So that last one is no longer free to vary; it does not have freedom anymore.

So what is our critical chi-square?

Well, if you want to, pull up your Excel data; here I am going to start off with step three. In step three we need the critical chi-square, and to find it we can use CHIINV, putting in the probability we are interested in and our degrees of freedom, which is 4.

And so our critical chi-square is 9.49.

Notice that as degrees of freedom goes up, the chi-square distribution is getting fatter, more variable, and because of that we need a more extreme critical chi-square value.

That is sort of different from the t and F distributions.

Those distributions got sharper as we increased degrees of freedom; chi-square distributions go the opposite way.

Chi-square distributions get more variable as degrees of freedom goes up.

Once we have this, we can start working on our actual data, our actual sample.

Step four is to find the sample chi-square, and in order to do that it helps to draw out that table, so the table might look something like this.

I will just copy this down here: this is the type of food, so that is the category, and here we have our observed frequencies.

That is the actual number of students that picked each item to be dog food.

So here we see, for example, one student picked pâté 1 to be the dog food, and 15 students picked liverwurst to be the dog food.

What are the expected frequencies?

Well, in order to find the expected frequencies, we know that the expected proportions are going to be .2 all the way down.

20%, 20%, 20%, 20%, 20%, and here I am just going to total this up.

And I see that 34 students were asked this question.

Our expected frequencies should add up to about 34.

Our expected proportions add up to 1.

And that is why we cannot just directly compare these two things; they are not in the same sort of currency yet. You have to change this currency into frequency.

So how do we do that?

Well, imagine all 34 students here; if we take 20% of them, how many students will that be?

That is 0.2 × 34.

And in Excel I am going to lock down that 34, because the total sum will not change from row to row.

So this is what we should expect if they were indeed guessing; these are the expected frequencies we should see, and if I move that over here, we see that this column also adds up to 34.

Now, once we have that, we can compute our actual chi-square: remember, it is observed frequency minus expected frequency, squared, divided by expected, as a proportion of expected.

So, that is the observed frequency minus the expected frequency, squared, divided by the expected frequency.

I can take that down for each row and then add those up, and here I get the chi-square statistic for my sample. My sample chi-square is 16.29, and that is a larger, more extreme chi-square than my critical chi-square. Let us also find the p-value here.

To find the p-value I can use CHIDIST; I put in my chi-square and my degrees of freedom, which is 4.

That gives .003, which is certainly smaller than .05, and so in step five, we reject the null.

Now I just want to make a comment here.

Notice that after we do the chi-square, although we reject the null, just like in the ANOVA we do not actually know which of the categories is the one that is really off.

We can sort of see which one probably seems to be the most off, but we are just eyeballing it; we are not using actual statistical principles.

Once you reject the null there are post hoc tests that you could do, but we are not going to cover those here.

So it seems that students are not randomly guessing; they actually have a preference for something as being the dog food.

My guess is liverwurst.
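For completeness, here is how this five-category test would look in Python (my sketch; the observed counts below are placeholders, since the actual counts live in the lecture's download file and only the total of 34 and a couple of cells are mentioned in the transcript):

```python
from scipy.stats import chisquare

# Placeholder observed counts for: pate 1, Spam, dog food, pate 2, liverwurst.
# Only the total (34), the pate-1 count (1), and the liverwurst count (15)
# are stated in the lecture; the middle three are made up for illustration.
observed = [1, 6, 5, 7, 15]

n = sum(observed)            # 34 students
expected = [n * 0.20] * 5    # guessing model: 20% each -> 6.8 per category

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)  # compare stat to the critical value 9.49 (df = 4)
```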

So, example 3: which of these statements describe properties of the chi-square goodness-of-fit test?

First: if you switch the order of the categories, the value of the test statistic does not change. That is actually true; it does not matter whether candidate A gets added in before candidate B, because addition is totally order-insensitive. You could add A then B, or B then A; pâté then liverwurst then dog food, or dog food then liverwurst then pâté; it does not really matter, so this is a true property.

Second: observed frequencies are always whole numbers. That is also true, because an observed frequency comes from counting how many members of a category you have, and counts are made up of whole numbers.

Third: expected frequencies are always whole numbers. That is actually not true; expected frequencies are predicted frequencies.

It is not that at any one time you will literally have a fractional student saying that liverwurst is dog food; it is that on average this is what you would predict given a certain proportion. So this is not true: expected frequencies do not have to be whole numbers, because they are theoretical; they are not things we actually counted up in real life.

Fourth: a high value of chi-square indicates a high level of agreement between the observed frequencies and the expected frequencies. If you think about the chi-square statistic, this is actually the opposite of the real case.

If we had a high level of agreement, the difference between observed and expected would be very small, and because that numerator is small, the chi-square would also be small. A high value of chi-square actually means the squared differences are quite large relative to the expected frequencies, so this statement is also wrong; it is the opposite.

So that is it for the chi-square goodness-of-fit test; join us next time on educator.com for the chi-square test of homogeneity.
