Dr. Ji Son

Chi-Square Goodness-of-Fit Test

Slide Duration:

Table of Contents

Section 1: Introduction

Descriptive Statistics vs. Inferential Statistics

25m 31s

Intro

0:00

Roadmap

0:10

Roadmap

0:11

Statistics

0:35

Statistics

0:36

Let's Think About High School Science

1:12

Measurement and Find Patterns (Mathematical Formula)

1:13

Statistics = Math of Distributions

4:58

Distributions

4:59

Problematic… but also GREAT

5:58

Statistics

7:33

How is It Different from Other Specializations in Mathematics?

7:34

Statistics is Fundamental in Natural and Social Sciences

7:53

Two Skills of Statistics

8:20

Description (Exploration)

8:21

Inference

9:13

Descriptive Statistics vs. Inferential Statistics: Apply to Distributions

9:58

Descriptive Statistics

9:59

Inferential Statistics

11:05

Populations vs. Samples

12:19

Populations vs. Samples: Is it the Truth?

12:20

Populations vs. Samples: Pros & Cons

13:36

Populations vs. Samples: Descriptive Values

16:12

Putting Together Descriptive/Inferential Stats & Populations/Samples

17:10

Putting Together Descriptive/Inferential Stats & Populations/Samples

17:11

Example 1: Descriptive Statistics vs. Inferential Statistics

19:09

Example 2: Descriptive Statistics vs. Inferential Statistics

20:47

Example 3: Sample, Parameter, Population, and Statistic

21:40

Example 4: Sample, Parameter, Population, and Statistic

23:28

Section 2: About Samples: Cases, Variables, Measurements

About Samples: Cases, Variables, Measurements

32m 14s

Intro

0:00

Data

0:09

Data, Cases, Variables, and Values

0:10

Rows, Columns, and Cells

2:03

Example: Aircrafts

3:52

How Do We Get Data?

5:38

Research: Question and Hypothesis

5:39

Research Design

7:11

Measurement

7:29

Research Analysis

8:33

Research Conclusion

9:30

Types of Variables

10:03

Discrete Variables

10:04

Continuous Variables

12:07

Types of Measurements

14:17

Types of Measurements

14:18

Types of Measurements (Scales)

17:22

Nominal

17:23

Ordinal

19:11

Interval

21:33

Ratio

24:24

Example 1: Cases, Variables, Measurements

25:20

Example 2: Which Scale of Measurement is Used?

26:55

Example 3: What Kind of a Scale of Measurement is This?

27:26

Example 4: Discrete vs. Continuous Variables.

30:31

Section 3: Visualizing Distributions

Introduction to Excel

8m 9s

Intro

0:00

Before Visualizing Distribution

0:10

Excel

0:11

Excel: Organization

0:45

Workbook

0:46

Column x Rows

1:50

Tools: Menu Bar, Standard Toolbar, and Formula Bar

3:00

Excel + Data

6:07

Excel and Data

6:08

Frequency Distributions in Excel

39m 10s

Intro

0:00

Roadmap

0:08

Data in Excel and Frequency Distributions

0:09

Raw Data to Frequency Tables

0:42

Raw Data to Frequency Tables

0:43

Frequency Tables: Using Formulas and Pivot Tables

1:28

Example 1: Number of Births

7:17

Example 2: Age Distribution

20:41

Example 3: Height Distribution

27:45

Example 4: Height Distribution of Males

32:19

Frequency Distributions and Features

25m 29s

Intro

0:00

Roadmap

0:10

Data in Excel, Frequency Distributions, and Features of Frequency Distributions

0:11

Example #1

1:35

Uniform

1:36

Example #2

2:58

Unimodal, Skewed Right, and Asymmetric

2:59

Example #3

6:29

Bimodal

6:30

Example #4a

8:29

Symmetric, Unimodal, and Normal

8:30

Point of Inflection and Standard Deviation

11:13

Example #4b

12:43

Normal Distribution

12:44

Summary

13:56

Uniform, Skewed, Bimodal, and Normal

13:57

Sketch Problem 1: Driver's License

17:34

Sketch Problem 2: Life Expectancy

20:01

Sketch Problem 3: Telephone Numbers

22:01

Sketch Problem 4: Length of Time Used to Complete a Final Exam

23:43

Dotplots and Histograms in Excel

42m 42s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Previously

1:02

Data, Frequency Table, and Visualization

1:03

Dotplots

1:22

Dotplots Excel Example

1:23

Dotplots: Pros and Cons

7:22

Pros and Cons of Dotplots

7:23

Dotplots Excel Example Cont.

9:07

Histograms

12:47

Histograms Overview

12:48

Example of Histograms

15:29

Histograms: Pros and Cons

31:39

Pros

31:40

Cons

32:31

Frequency vs. Relative Frequency

32:53

Frequency

32:54

Relative Frequency

33:36

Example 1: Dotplots vs. Histograms

34:36

Example 2: Age of Pennies Dotplot

36:21

Example 3: Histogram of Mammal Speeds

38:27

Example 4: Histogram of Life Expectancy

40:30

Stemplots

12m 23s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

What Sets Stemplots Apart?

0:46

Data Sets, Dotplots, Histograms, and Stemplots

0:47

Example 1: What Do Stemplots Look Like?

1:58

Example 2: Back-to-Back Stemplots

5:00

Example 3: Quiz Grade Stemplot

7:46

Example 4: Quiz Grade & Afterschool Tutoring Stemplot

9:56

Bar Graphs

22m 49s

Intro

0:00

Roadmap

0:05

Roadmap

0:08

Review of Frequency Distributions

0:44

Y-axis and X-axis

0:45

Types of Frequency Visualizations Covered so Far

2:16

Introduction to Bar Graphs

4:07

Example 1: Bar Graph

5:32

Example 1: Bar Graph

5:33

Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?

11:07

Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?

11:08

Example 2: Create a Frequency Visualization for Gender

14:02

Example 3: Cases, Variables, and Frequency Visualization

16:34

Example 4: What Kind of Graphs are Shown Below?

19:29

Section 4: Summarizing Distributions

Central Tendency: Mean, Median, Mode

38m 50s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

Central Tendency 1

0:56

Way to Summarize a Distribution of Scores

0:57

Mode

1:32

Median

2:02

Mean

2:36

Central Tendency 2

3:47

Mode

3:48

Median

4:20

Mean

5:25

Summation Symbol

6:11

Summation Symbol

6:12

Population vs. Sample

10:46

Population vs. Sample

10:47

Excel Examples

15:08

Finding Mode, Median, and Mean in Excel

15:09

Median vs. Mean

21:45

Effect of Outliers

21:46

Relationship Between Parameter and Statistic

22:44

Type of Measurements

24:00

Which Distributions to Use With

24:55

Example 1: Mean

25:30

Example 2: Using Summation Symbol

29:50

Example 3: Average Calorie Count

32:50

Example 4: Creating an Example Set

35:46

Variability

42m 40s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Variability (or Spread)

0:45

Variability (or Spread)

0:46

Things to Think About

5:45

Things to Think About

5:46

Range, Quartiles and Interquartile Range

6:37

Range

6:38

Interquartile Range

8:42

Interquartile Range Example

10:58

Interquartile Range Example

10:59

Variance and Standard Deviation

12:27

Deviations

12:28

Sum of Squares

14:35

Variance

16:55

Standard Deviation

17:44

Sum of Squares (SS)

18:34

Sum of Squares (SS)

18:35

Population vs. Sample SD

22:00

Population vs. Sample SD

22:01

Population vs. Sample

23:20

Mean

23:21

23:51

Example 1: Find the Mean and Standard Deviation of the Variable Friends in the Excel File

27:21

Example 2: Find the Mean and Standard Deviation of the Tagged Photos in the Excel File

35:25

Example 3: Sum of Squares

38:58

Example 4: Standard Deviation

41:48

Five Number Summary & Boxplots

57m 15s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Summarizing Distributions

0:37

Shape, Center, and Spread

0:38

5 Number Summary

1:14

Boxplot: Visualizing 5 Number Summary

3:37

Boxplot: Visualizing 5 Number Summary

3:38

Boxplots on Excel

9:01

Using 'Stocks' and Using Stacked Columns

9:02

Boxplots on Excel Example

10:14

When are Boxplots Useful?

32:14

Pros

32:15

Cons

32:59

How to Determine Outlier Status

33:24

Rule of Thumb: Upper Limit

33:25

Rule of Thumb: Lower Limit

34:16

Signal Outliers in an Excel Data File Using Conditional Formatting

34:52

Modified Boxplot

48:38

Modified Boxplot

48:39

Example 1: Percentage Values & Lower and Upper Whisker

49:10

Example 2: Boxplot

50:10

Example 3: Estimating IQR From Boxplot

53:46

Example 4: Boxplot and Missing Whisker

54:35

Shape: Calculating Skewness & Kurtosis

41m 51s

Intro

0:00

Roadmap

0:16

Roadmap

0:17

Skewness Concept

1:09

Skewness Concept

1:10

Calculating Skewness

3:26

Calculating Skewness

3:27

Interpreting Skewness

7:36

Interpreting Skewness

7:37

Excel Example

8:49

Kurtosis Concept

20:29

Kurtosis Concept

20:30

Calculating Kurtosis

24:17

Calculating Kurtosis

24:18

Interpreting Kurtosis

29:01

Leptokurtic

29:35

Mesokurtic

30:10

Platykurtic

31:06

Excel Example

32:04

Example 1: Shape of Distribution

38:28

Example 2: Shape of Distribution

39:29

Example 3: Shape of Distribution

40:14

Example 4: Kurtosis

41:10

Normal Distribution

34m 33s

Intro

0:00

Roadmap

0:13

Roadmap

0:14

What is a Normal Distribution

0:44

The Normal Distribution As a Theoretical Model

0:45

Possible Range of Probabilities

3:05

Possible Range of Probabilities

3:06

What is a Normal Distribution

5:07

Can Be Described By

5:08

Properties

5:49

'Same' Shape: Illusion of Different Shape!

7:35

'Same' Shape: Illusion of Different Shape!

7:36

Types of Problems

13:45

Example: Distribution of SAT Scores

13:46

Shape Analogy

19:48

Shape Analogy

19:49

Example 1: The Standard Normal Distribution and Z-Scores

22:34

Example 2: The Standard Normal Distribution and Z-Scores

25:54

Example 3: Sketching and Normal Distribution

28:55

Example 4: Sketching and Normal Distribution

32:32

Standard Normal Distributions & Z-Scores

41m 44s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

A Family of Distributions

0:28

Infinite Set of Distributions

0:29

Transforming Normal Distributions to 'Standard' Normal Distribution

1:04

Normal Distribution vs. Standard Normal Distribution

2:58

Normal Distribution vs. Standard Normal Distribution

2:59

Z-Score, Raw Score, Mean, & SD

4:08

Z-Score, Raw Score, Mean, & SD

4:09

Weird Z-Scores

9:40

Weird Z-Scores

9:41

Excel

16:45

For Normal Distributions

16:46

For Standard Normal Distributions

19:11

Excel Example

20:24

Types of Problems

25:18

Percentage Problem: P(x)

25:19

Raw Score and Z-Score Problems

26:28

Standard Deviation Problems

27:01

Shape Analogy

27:44

Shape Analogy

27:45

Example 1: Deaths Due to Heart Disease vs. Deaths Due to Cancer

28:24

Example 2: Heights of Male College Students

33:15

Example 3: Mean and Standard Deviation

37:14

Example 4: Finding Percentage of Values in a Standard Normal Distribution

37:49

Normal Distribution: PDF vs. CDF

55m 44s

Intro

0:00

Roadmap

0:15

Roadmap

0:16

Frequency vs. Cumulative Frequency

0:56

Frequency vs. Cumulative Frequency

0:57

Frequency vs. Cumulative Frequency

4:32

Frequency vs. Cumulative Frequency Cont.

4:33

Calculus in Brief

6:21

Derivative-Integral Continuum

6:22

PDF

10:08

PDF for Standard Normal Distribution

10:09

PDF for Normal Distribution

14:32

Integral of PDF = CDF

21:27

Integral of PDF = CDF

21:28

Example 1: Cumulative Frequency Graph

23:31

Example 2: Mean, Standard Deviation, and Probability

24:43

Example 3: Mean and Standard Deviation

35:50

Example 4: Age of Cars

49:32

Section 5: Linear Regression

Scatterplots

47m 19s

Intro

0:00

Roadmap

0:04

Roadmap

0:05

Previous Visualizations

0:30

Frequency Distributions

0:31

Compare & Contrast

2:26

Frequency Distributions Vs. Scatterplots

2:27

Summary Values

4:53

Shape

4:54

Center & Trend

6:41

Spread & Strength

8:22

Univariate & Bivariate

10:25

Example Scatterplot

10:48

Shape, Trend, and Strength

10:49

Positive and Negative Association

14:05

Positive and Negative Association

14:06

Linearity, Strength, and Consistency

18:30

Linearity

18:31

Strength

19:14

Consistency

20:40

Summarizing a Scatterplot

22:58

Summarizing a Scatterplot

22:59

Example 1: Gapminder.org, Income x Life Expectancy

26:32

Example 2: Gapminder.org, Income x Infant Mortality

36:12

Example 3: Trend and Strength of Variables

40:14

Example 4: Trend, Strength and Shape for Scatterplots

43:27

Regression

32m 2s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Linear Equations

0:34

Linear Equations: y = mx + b

0:35

Rough Line

5:16

Rough Line

5:17

Regression - A 'Center' Line

7:41

Reasons for Summarizing with a Regression Line

7:42

Predictor and Response Variable

10:04

Goal of Regression

12:29

Goal of Regression

12:30

Prediction

14:50

Example: Servings of Milk Per Year Shown By Age

14:51

Interpolation

17:06

Extrapolation

17:58

Error in Prediction

20:34

Prediction Error

20:35

Residual

21:40

Example 1: Residual

23:34

Example 2: Large and Negative Residual

26:30

Example 3: Positive Residual

28:13

Example 4: Interpret Regression Line & Extrapolate

29:40

Least Squares Regression

56m 36s

Intro

0:00

Roadmap

0:13

Roadmap

0:14

Best Fit

0:47

Best Fit

0:48

Sum of Squared Errors (SSE)

1:50

Sum of Squared Errors (SSE)

1:51

Why Squared?

3:38

Why Squared?

3:39

Quantitative Properties of Regression Line

4:51

Quantitative Properties of Regression Line

4:52

So How do we Find Such a Line?

6:49

SSEs of Different Line Equations & Lowest SSE

6:50

Carl Gauss' Method

8:01

How Do We Find Slope (b1)

11:00

How Do We Find Slope (b1)

11:01

How Do We Find Intercept

15:11

How Do We Find Intercept

15:12

Example 1: Which of These Equations Fit the Above Data Best?

17:18

Example 2: Find the Regression Line for These Data Points and Interpret It

26:31

Example 3: Summarize the Scatterplot and Find the Regression Line.

34:31

Example 4: Examine the Mean of Residuals

43:52

Correlation

43m 58s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Summarizing a Scatterplot Quantitatively

0:47

Shape

0:48

Trend

1:11

Strength: Correlation ( r )

1:45

Correlation Coefficient ( r )

2:30

Correlation Coefficient ( r )

2:31

Trees vs. Forest

11:59

Trees vs. Forest

12:00

Calculating r

15:07

Average Product of z-scores for x and y

15:08

Relationship between Correlation and Slope

21:10

Relationship between Correlation and Slope

21:11

Example 1: Find the Correlation between Grams of Fat and Cost

24:11

Example 2: Relationship between r and b1

30:24

Example 3: Find the Regression Line

33:35

Example 4: Find the Correlation Coefficient for this Set of Data

37:37

Correlation: r vs. r-squared

52m 52s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

R-squared

0:44

What is the Meaning of It? Why Squared?

0:45

Parsing Sum of Squares (Parsing Variability)

2:25

SST = SSR + SSE

2:26

What is SST and SSE?

7:46

What is SST and SSE?

7:47

r-squared

18:33

Coefficient of Determination

18:34

If the Correlation is Strong…

20:25

If the Correlation is Strong…

20:26

If the Correlation is Weak…

22:36

If the Correlation is Weak…

22:37

Example 1: Find r-squared for this Set of Data

23:56

Example 2: What Does it Mean that the Simple Linear Regression is a 'Model' of Variance?

33:54

Example 3: Why Does r-squared Only Range from 0 to 1

37:29

Example 4: Find the r-squared for This Set of Data

39:55

Transformations of Data

27m 8s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Why Transform?

0:26

Why Transform?

0:27

Shape-preserving vs. Shape-changing Transformations

5:14

Shape-preserving = Linear Transformations

5:15

Shape-changing Transformations = Non-linear Transformations

6:20

Common Shape-Preserving Transformations

7:08

Common Shape-Preserving Transformations

7:09

Common Shape-Changing Transformations

8:59

Powers

9:00

Logarithms

9:39

Change Just One Variable? Both?

10:38

Log-log Transformations

10:39

Log Transformations

14:38

Example 1: Create, Graph, and Transform the Data Set

15:19

Example 2: Create, Graph, and Transform the Data Set

20:08

Example 3: What Kind of Model would You Choose for this Data?

22:44

Example 4: Transformation of Data

25:46

Section 6: Collecting Data in an Experiment

Sampling & Bias

54m 44s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Descriptive vs. Inferential Statistics

1:04

Descriptive Statistics: Data Exploration

1:05

Example

2:03

To tackle Generalization…

4:31

Generalization

4:32

Sampling

6:06

'Good' Sample

6:40

Defining Samples and Populations

8:55

Population

8:56

Sample

11:16

Why Use Sampling?

13:09

Why Use Sampling?

13:10

Goal of Sampling: Avoiding Bias

15:04

What is Bias?

15:05

Where does Bias Come from: Sampling Bias

17:53

Where does Bias Come from: Response Bias

18:27

Sampling Bias: Bias from 'Bad' Sampling Methods

19:34

Size Bias

19:35

Voluntary Response Bias

21:13

Convenience Sample

22:22

Judgment Sample

23:58

Inadequate Sample Frame

25:40

Response Bias: Bias from 'Bad' Data Collection Methods

28:00

Nonresponse Bias

29:31

Questionnaire Bias

31:10

Incorrect Response or Measurement Bias

37:32

Example 1: What Kind of Biases?

40:29

Example 2: What Biases Might Arise?

44:46

Example 3: What Kind of Biases?

48:34

Example 4: What Kind of Biases?

51:43

Sampling Methods

14m 25s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Biased vs. Unbiased Sampling Methods

0:32

Biased Sampling

0:33

Unbiased Sampling

1:13

Probability Sampling Methods

2:31

Simple Random

2:54

Stratified Random Sampling

4:06

Cluster Sampling

5:24

Two-stage Sampling

6:22

Systematic Sampling

7:25

Example 1: Which Type(s) of Sampling was this?

8:33

Example 2: Describe How to Take a Two-Stage Sample from this Book

10:16

Example 3: Sampling Methods

11:58

Example 4: Cluster Sample Plan

12:48

Research Design

53m 54s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Descriptive vs. Inferential Statistics

0:51

Descriptive Statistics: Data Exploration

0:52

Inferential Statistics

1:02

Variables and Relationships

1:44

Variables

1:45

Relationships

2:49

Not Every Type of Study is an Experiment…

4:16

Category I - Descriptive Study

4:54

Category II - Correlational Study

5:50

Category III - Experimental, Quasi-experimental, Non-experimental

6:33

Category III

7:42

Experimental, Quasi-experimental, and Non-experimental

7:43

Why CAN'T the Other Strategies Determine Causation?

10:18

Third-variable Problem

10:19

Directionality Problem

15:49

What Makes Experiments Special?

17:54

Manipulation

17:55

Control (and Comparison)

21:58

Methods of Control

26:38

Holding Constant

26:39

Matching

29:11

Random Assignment

31:48

Experiment Terminology

34:09

'True' Experiment vs. Study

34:10

Independent Variable (IV)

35:16

Dependent Variable (DV)

35:45

Factors

36:07

Treatment Conditions

36:23

Levels

37:43

Confounds or Extraneous Variables

38:04

Blind

38:38

Blind Experiments

38:39

Double-blind Experiments

39:29

How Categories Relate to Statistics

41:35

Category I - Descriptive Study

41:36

Category II - Correlational Study

42:05

Category III - Experimental, Quasi-experimental, Non-experimental

42:43

Example 1: Research Design

43:50

Example 2: Research Design

47:37

Example 3: Research Design

50:12

Example 4: Research Design

52:00

Between and Within Treatment Variability

41m 31s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Experimental Designs

0:51

Experimental Designs: Manipulation & Control

0:52

Two Types of Variability

2:09

Between Treatment Variability

2:10

Within Treatment Variability

3:31

Updated Goal of Experimental Design

5:47

Updated Goal of Experimental Design

5:48

Example: Drugs and Driving

6:56

Example: Drugs and Driving

6:57

Different Types of Random Assignment

11:27

All Experiments

11:28

Completely Random Design

12:02

Randomized Block Design

13:19

Randomized Block Design

15:48

Matched Pairs Design

15:49

Repeated Measures Design

19:47

Between-subject Variable vs. Within-subject Variable

22:43

Completely Randomized Design

22:44

Repeated Measures Design

25:03

Example 1: Design a Completely Random, Matched Pair, and Repeated Measures Experiment

26:16

Example 2: Block Design

31:41

Example 3: Completely Randomized Designs

35:11

Example 4: Completely Random, Matched Pairs, or Repeated Measures Experiments?

39:01

Section 7: Review of Probability Axioms

Sample Spaces

37m 52s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

Why is Probability Involved in Statistics

0:48

Probability

0:49

Can People Tell the Difference between Cheap and Gourmet Coffee?

2:08

Taste Test with Coffee Drinkers

3:37

If No One can Actually Taste the Difference

3:38

If Everyone can Actually Taste the Difference

5:36

Creating a Probability Model

7:09

Creating a Probability Model

7:10

D'Alembert vs. Necker

9:41

D'Alembert vs. Necker

9:42

Problem with D'Alembert's Model

13:29

Problem with D'Alembert's Model

13:30

Covering Entire Sample Space

15:08

Fundamental Principle of Counting

15:09

Where Do Probabilities Come From?

22:54

Observed Data, Symmetry, and Subjective Estimates

22:55

Checking whether Model Matches Real World

24:27

Law of Large Numbers

24:28

Example 1: Law of Large Numbers

27:46

Example 2: Possible Outcomes

30:43

Example 3: Brands of Coffee and Taste

33:25

Example 4: How Many Different Treatments are there?

35:33

Addition Rule for Disjoint Events

20m 29s

Intro

0:00

Roadmap

0:08

Roadmap

0:09

Disjoint Events

0:41

Disjoint Events

0:42

Meaning of 'or'

2:39

In Regular Life

2:40

In Math/Statistics/Computer Science

3:10

Addition Rule for Disjoint Events

3:55

If A and B are Disjoint: P (A and B)

3:56

If A and B are Disjoint: P (A or B)

5:15

General Addition Rule

5:41

General Addition Rule

5:42

Generalized Addition Rule

8:31

If A and B are not Disjoint: P (A or B)

8:32

Example 1: Which of These are Mutually Exclusive?

10:50

Example 2: What is the Probability that You will Have a Combination of One Heads and Two Tails?

12:57

Example 3: Engagement Party

15:17

Example 4: Home Owner's Insurance

18:30

Conditional Probability

57m 19s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

'or' vs. 'and' vs. Conditional Probability

1:07

'or' vs. 'and' vs. Conditional Probability

1:08

'and' vs. Conditional Probability

5:57

P (M or L)

5:58

P (M and L)

8:41

P (M|L)

11:04

P (L|M)

12:24

Tree Diagram

15:02

Tree Diagram

15:03

Defining Conditional Probability

22:42

Defining Conditional Probability

22:43

Common Contexts for Conditional Probability

30:56

Medical Testing: Positive Predictive Value

30:57

Medical Testing: Sensitivity

33:03

Statistical Tests

34:27

Example 1: Drug and Disease

36:41

Example 2: Marbles and Conditional Probability

40:04

Example 3: Cards and Conditional Probability

45:59

Example 4: Votes and Conditional Probability

50:21

Independent Events

24m 27s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Independent Events & Conditional Probability

0:26

Non-independent Events

0:27

Independent Events

2:00

Non-independent and Independent Events

3:08

Non-independent and Independent Events

3:09

Defining Independent Events

5:52

Defining Independent Events

5:53

Multiplication Rule

7:29

Previously…

7:30

But with Independent Events

8:53

Example 1: Which of These Pairs of Events are Independent?

11:12

Example 2: Health Insurance and Probability

15:12

Example 3: Independent Events

17:42

Example 4: Independent Events

20:03

Section 8: Probability Distributions

Introduction to Probability Distributions

56m 45s

Intro

0:00

Roadmap

0:08

Roadmap

0:09

Sampling vs. Probability

0:57

Sampling

0:58

Missing

1:30

What is Missing?

3:06

Insight: Probability Distributions

5:26

Insight: Probability Distributions

5:27

What is a Probability Distribution?

7:29

From Sample Spaces to Probability Distributions

8:44

Sample Space

8:45

Probability Distribution of the Sum of Two Dice

11:16

The Random Variable

17:43

The Random Variable

17:44

Expected Value

21:52

Expected Value

21:53

Example 1: Probability Distributions

28:45

Example 2: Probability Distributions

35:30

Example 3: Probability Distributions

43:37

Example 4: Probability Distributions

47:20

Expected Value & Variance of Probability Distributions

53m 41s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Discrete vs. Continuous Random Variables

1:04

Discrete vs. Continuous Random Variables

1:05

Mean and Variance Review

4:44

Mean: Sample, Population, and Probability Distribution

4:45

Variance: Sample, Population, and Probability Distribution

9:12

Example Situation

14:10

Example Situation

14:11

Some Special Cases…

16:13

Some Special Cases…

16:14

Linear Transformations

19:22

Linear Transformations

19:23

What Happens to Mean and Variance of the Probability Distribution?

20:12

n Independent Values of X

25:38

n Independent Values of X

25:39

Compare These Two Situations

30:56

Compare These Two Situations

30:57

Two Random Variables, X and Y

32:02

Two Random Variables, X and Y

32:03

Example 1: Expected Value & Variance of Probability Distributions

35:35

Example 2: Expected Values & Standard Deviation

44:17

Example 3: Expected Winnings and Standard Deviation

48:18

Binomial Distribution

55m 15s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Discrete Probability Distributions

1:42

Discrete Probability Distributions

1:43

Binomial Distribution

2:36

Binomial Distribution

2:37

Multiplicative Rule Review

6:54

Multiplicative Rule Review

6:55

How Many Outcomes with k 'Successes'

10:23

Adults and Bachelor's Degree: Manual List of Outcomes

10:24

P (X=k)

19:37

Putting Together # of Outcomes with the Multiplicative Rule

19:38

Expected Value and Standard Deviation in a Binomial Distribution

25:22

Expected Value and Standard Deviation in a Binomial Distribution

25:23

Example 1: Coin Toss

33:42

Example 2: College Graduates

38:03

Example 3: Types of Blood and Probability

45:39

Example 4: Expected Number and Standard Deviation

51:11

Section 9: Sampling Distributions of Statistics

Introduction to Sampling Distributions

48m 17s

Intro

0:00

Roadmap

0:08

Roadmap

0:09

Probability Distributions vs. Sampling Distributions

0:55

Probability Distributions vs. Sampling Distributions

0:56

Same Logic

3:55

Logic of Probability Distribution

3:56

Example: Rolling Two Dice

6:56

Simulating Samples

9:53

To Come Up with Probability Distributions

9:54

In Sampling Distributions

11:12

Connecting Sampling and Research Methods with Sampling Distributions

12:11

Connecting Sampling and Research Methods with Sampling Distributions

12:12

Simulating a Sampling Distribution

14:14

Experimental Design: Regular Sleep vs. Less Sleep

14:15

Logic of Sampling Distributions

23:08

Logic of Sampling Distributions

23:09

General Method of Simulating Sampling Distributions

25:38

General Method of Simulating Sampling Distributions

25:39

Questions that Remain

28:45

Questions that Remain

28:46

Example 1: Mean and Standard Error of Sampling Distribution

30:57

Example 2: What is the Best Way to Describe Sampling Distributions?

37:12

Example 3: Matching Sampling Distributions

38:21

Example 4: Mean and Standard Error of Sampling Distribution

41:51

Sampling Distribution of the Mean

1h 8m 48s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Special Case of General Method for Simulating a Sampling Distribution

1:53

Special Case of General Method for Simulating a Sampling Distribution

1:54

Computer Simulation

3:43

Using Simulations to See Principles behind Shape of SDoM

15:50

Using Simulations to See Principles behind Shape of SDoM

15:51

Conditions

17:38

Using Simulations to See Principles behind Center (Mean) of SDoM

20:15

Using Simulations to See Principles behind Center (Mean) of SDoM

20:16

Conditions: Does n Matter?

21:31

Conditions: Does Number of Simulation Matter?

24:37

Using Simulations to See Principles behind Standard Deviation of SDoM

27:13

Using Simulations to See Principles behind Standard Deviation of SDoM

27:14

Conditions: Does n Matter?

34:45

Conditions: Does Number of Simulation Matter?

36:24

Central Limit Theorem

37:13

SHAPE

38:08

CENTER

39:34

SPREAD

39:52

Comparing Population, Sample, and SDoM

43:10

Comparing Population, Sample, and SDoM

43:11

Answering the 'Questions that Remain'

48:24

What Happens When We Don't Know What the Population Looks Like?

48:25

Can We Have Sampling Distributions for Summary Statistics Other than the Mean?

49:42

How Do We Know whether a Sample is Sufficiently Unlikely?

53:36

Do We Always Have to Simulate a Large Number of Samples in Order to get a Sampling Distribution?

54:40

Example 1: Mean Batting Average

55:25

Example 2: Mean Sampling Distribution and Standard Error

59:07

Example 3: Sampling Distribution of the Mean

1:01:04

Sampling Distribution of Sample Proportions

54m 37s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Intro to Sampling Distribution of Sample Proportions (SDoSP)

0:51

Categorical Data (Examples)

0:52

Wish to Estimate Proportion of Population from Sample…

2:00

Notation

3:34

Population Proportion and Sample Proportion Notations

3:35

What's the Difference?

9:19

SDoM vs. SDoSP: Type of Data

9:20

SDoM vs. SDoSP: Shape

11:24

SDoM vs. SDoSP: Center

12:30

SDoM vs. SDoSP: Spread

15:34

Binomial Distribution vs. Sampling Distribution of Sample Proportions

19:14

Binomial Distribution vs. SDoSP: Type of Data

19:17

Binomial Distribution vs. SDoSP: Shape

21:07

Binomial Distribution vs. SDoSP: Center

21:43

Binomial Distribution vs. SDoSP: Spread

24:08

Example 1: Sampling Distribution of Sample Proportions

26:07

Example 2: Sampling Distribution of Sample Proportions

37:58

Example 3: Sampling Distribution of Sample Proportions

44:42

Example 4: Sampling Distribution of Sample Proportions

45:57

Section 10: Inferential Statistics

Introduction to Confidence Intervals

42m 53s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Inferential Statistics

0:50

Inferential Statistics

0:51

Two Problems with This Picture…

3:20

Two Problems with This Picture…

3:21

Solution: Confidence Intervals (CI)

4:59

Solution: Hypothesis Testing (HT)

5:49

Which Parameters are Known?

6:45

Which Parameters are Known?

6:46

Confidence Interval - Goal

7:56

When We Don't Know μ but Know σ

7:57

When We Don't Know

18:27

When We Don't Know μ nor σ

18:28

Example 1: Confidence Intervals

26:18

Example 2: Confidence Intervals

29:46

Example 3: Confidence Intervals

32:18

Example 4: Confidence Intervals

38:31

t Distributions

1h 2m 6s

Intro

0:00

Roadmap

0:04

Roadmap

0:05

When to Use z vs. t?

1:07

When to Use z vs. t?

1:08

What is z and t?

3:02

z-score and t-score: Commonality

3:03

z-score and t-score: Formulas

3:34

z-score and t-score: Difference

5:22

Why not z? (Why t?)

7:24

Why not z? (Why t?)

7:25

But Don't Worry!

15:13

Gosset and t-distributions

15:14

Rules of t Distributions

17:05

t-distributions are More Normal as n Gets Bigger

17:06

t-distributions are a Family of Distributions

18:55

Degrees of Freedom (df)

20:02

Degrees of Freedom (df)

20:03

t Family of Distributions

24:07

t Family of Distributions: df = 2, 4, and 60

24:08

df = 60

29:16

df = 2

29:59

How to Find It?

31:01

'Student's t-distribution' or 't-distribution'

31:02

Excel Example

33:06

Example 1: Which Distribution Do You Use? Z or t?

45:26

Example 2: Friends on Facebook

47:41

Example 3: t Distributions

52:15

Example 4: t Distributions, Confidence Interval, and Mean

55:59

Introduction to Hypothesis Testing

1h 6m 33s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Issues to Overcome in Inferential Statistics

1:35

Issues to Overcome in Inferential Statistics

1:36

What Happens When We Don't Know What the Population Looks Like?

2:57

How Do We Know whether a Sample is Sufficiently Unlikely?

3:43

Hypothesizing a Population

6:44

Hypothesizing a Population

6:45

Null Hypothesis

8:07

Alternative Hypothesis

8:56

Hypotheses

11:58

Hypotheses

11:59

Errors in Hypothesis Testing

14:22

Errors in Hypothesis Testing

14:23

Steps of Hypothesis Testing

21:15

Steps of Hypothesis Testing

21:16

Single Sample HT (When Sigma Available)

26:08

Example: Average Facebook Friends

26:09

Step 1

27:08

Step 2

27:58

Step 3

28:17

Step 4

32:18

Single Sample HT (When Sigma Not Available)

36:33

Example: Average Facebook Friends

36:34

Step 1: Hypothesis Testing

36:58

Step 2: Significance Level

37:25

Step 3: Decision Stage

37:40

Step 4: Sample

41:36

Sigma and p-value

45:04

Sigma and p-value

45:05

One Tailed vs. Two Tailed Hypotheses

45:51

Example 1: Hypothesis Testing

48:37

Example 2: Heights of Women in the US

57:43

Example 3: Select the Best Way to Complete This Sentence

1:03:23

Confidence Intervals for the Difference of Two Independent Means

55m 14s

Intro

0:00

Roadmap

0:14

Roadmap

0:15

One Mean vs. Two Means

1:17

One Mean vs. Two Means

1:18

Notation

2:41

A Sample! A Set!

2:42

Mean of X, Mean of Y, and Difference of Two Means

3:56

SE of X

4:34

SE of Y

6:28

Sampling Distribution of the Difference between Two Means (SDoD)

7:48

Sampling Distribution of the Difference between Two Means (SDoD)

7:49

Rules of the SDoD (similar to CLT!)

15:00

Mean for the SDoD Null Hypothesis

15:01

Standard Error

17:39

When can We Construct a CI for the Difference between Two Means?

21:28

Three Conditions

21:29

Finding CI

23:56

One Mean CI

23:57

Two Means CI

25:45

Finding t

29:16

Finding t

29:17

Interpreting CI

30:25

Interpreting CI

30:26

Better Estimate of s (s pool)

34:15

Better Estimate of s (s pool)

34:16

Example 1: Confidence Intervals

42:32

Example 2: SE of the Difference

52:36

Hypothesis Testing for the Difference of Two Independent Means

50m

Intro

0:00

Roadmap

0:06

Roadmap

0:07

The Goal of Hypothesis Testing

0:56

One Sample and Two Samples

0:57

Sampling Distribution of the Difference between Two Means (SDoD)

3:42

Sampling Distribution of the Difference between Two Means (SDoD)

3:43

Rules of the SDoD (Similar to CLT!)

6:46

Shape

6:47

Mean for the Null Hypothesis

7:26

Standard Error for Independent Samples (When Variance is Homogenous)

8:18

Standard Error for Independent Samples (When Variance is not Homogenous)

9:25

Same Conditions for HT as for CI

10:08

Three Conditions

10:09

Steps of Hypothesis Testing

11:04

Steps of Hypothesis Testing

11:05

Formulas that Go with Steps of Hypothesis Testing

13:21

Step 1

13:25

Step 2

14:18

Step 3

15:00

Step 4

16:57

Example 1: Hypothesis Testing for the Difference of Two Independent Means

18:47

Example 2: Hypothesis Testing for the Difference of Two Independent Means

33:55

Example 3: Hypothesis Testing for the Difference of Two Independent Means

44:22

Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

1h 14m 11s

Intro

0:00

Roadmap

0:09

Roadmap

0:10

The Goal of Hypothesis Testing

1:27

One Sample and Two Samples

1:28

Independent Samples vs. Paired Samples

3:16

Independent Samples vs. Paired Samples

3:17

Which is Which?

5:20

Independent SAMPLES vs. Independent VARIABLES

7:43

Independent SAMPLES vs. Independent VARIABLES

7:44

T-tests Always…

10:48

T-tests Always…

10:49

Notation for Paired Samples

12:59

Notation for Paired Samples

13:00

Steps of Hypothesis Testing for Paired Samples

16:13

Steps of Hypothesis Testing for Paired Samples

16:14

Rules of the SDoD (Adding on Paired Samples)

18:03

Shape

18:04

Mean for the Null Hypothesis

18:31

Standard Error for Independent Samples (When Variance is Homogenous)

19:25

Standard Error for Paired Samples

20:39

Formulas that go with Steps of Hypothesis Testing

22:59

Formulas that go with Steps of Hypothesis Testing

23:00

Confidence Intervals for Paired Samples

30:32

Confidence Intervals for Paired Samples

30:33

Example 1: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

32:28

Example 2: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

44:02

Example 3: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

52:23

Type I and Type II Errors

31m 27s

Intro

0:00

Roadmap

0:18

Roadmap

0:19

Errors and Relationship to HT and the Sample Statistic?

1:11

Errors and Relationship to HT and the Sample Statistic?

1:12

Instead of a Box…Distributions!

7:00

One Sample t-test: Friends on Facebook

7:01

Two Sample t-test: Friends on Facebook

13:46

Usually, Lots of Overlap between Null and Alternative Distributions

16:59

Overlap between Null and Alternative Distributions

17:00

How Distributions and 'Box' Fit Together

22:45

How Distributions and 'Box' Fit Together

22:46

Example 1: Types of Errors

25:54

Example 2: Types of Errors

27:30

Example 3: What is the Danger of the Type I Error?

29:38

Effect Size & Power

44m 41s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Distance between Distributions: Sample t

0:49

Distance between Distributions: Sample t

0:50

Problem with Distance in Terms of Standard Error

2:56

Problem with Distance in Terms of Standard Error

2:57

Test Statistic (t) vs. Effect Size (d or g)

4:38

Test Statistic (t) vs. Effect Size (d or g)

4:39

Rules of Effect Size

6:09

Rules of Effect Size

6:10

Why Do We Need Effect Size?

8:21

Tells You the Practical Significance

8:22

HT can be Deceiving…

10:25

Important Note

10:42

What is Power?

11:20

What is Power?

11:21

Why Do We Need Power?

14:19

Conditional Probability and Power

14:20

Power is:

16:27

Can We Calculate Power?

19:00

Can We Calculate Power?

19:01

How Does Alpha Affect Power?

20:36

How Does Alpha Affect Power?

20:37

How Does Effect Size Affect Power?

25:38

How Does Effect Size Affect Power?

25:39

How Does Variability and Sample Size Affect Power?

27:56

How Does Variability and Sample Size Affect Power?

27:57

How Do We Increase Power?

32:47

Increasing Power

32:48

Example 1: Effect Size & Power

35:40

Example 2: Effect Size & Power

37:38

Example 3: Effect Size & Power

40:55

Section 11: Analysis of Variance

F-distributions

24m 46s

Intro

0:00

Roadmap

0:04

Roadmap

0:05

Z- & T-statistic and Their Distribution

0:34

Z- & T-statistic and Their Distribution

0:35

F-statistic

4:55

The F Ratio (the Variance Ratio)

4:56

F-distribution

12:29

F-distribution

12:30

s and p-value

15:00

s and p-value

15:01

Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity?

18:33

Example 2: F-distributions

19:29

Example 3: F-distributions and Heights

21:29

ANOVA with Independent Samples

1h 9m 25s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

The Limitations of t-tests

1:12

The Limitations of t-tests

1:13

Two Major Limitations of Many t-tests

3:26

Two Major Limitations of Many t-tests

3:27

Ronald Fisher's Solution… F-test! New Null Hypothesis

4:43

Ronald Fisher's Solution… F-test! New Null Hypothesis (Omnibus Test - One Test to Rule Them All!)

4:44

Analysis of Variance (ANoVA) Notation

7:47

Analysis of Variance (ANoVA) Notation

7:48

Partitioning (Analyzing) Variance

9:58

Total Variance

9:59

Within-group Variation

14:00

Between-group Variation

16:22

Time out: Review Variance & SS

17:05

Time out: Review Variance & SS

17:06

F-statistic

19:22

The F Ratio (the Variance Ratio)

19:23

S²bet = SSbet / dfbet

22:13

What is This?

22:14

How Many Means?

23:20

So What is the dfbet?

23:38

So What is SSbet?

24:15

S²w = SSw / dfw

26:05

What is This?

26:06

How Many Means?

27:20

So What is the dfw?

27:36

So What is SSw?

28:18

Chart of Independent Samples ANOVA

29:25

Chart of Independent Samples ANOVA

29:26

Example 1: Who Uploads More Photos: Unknown Ethnicity, Latino, Asian, Black, or White Facebook Users?

35:52

Hypotheses

35:53

Significance Level

39:40

Decision Stage

40:05

Calculate Samples' Statistic and p-Value

44:10

Reject or Fail to Reject H0

55:54

Example 2: ANOVA with Independent Samples

58:21

Repeated Measures ANOVA

1h 15m 13s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

The Limitations of t-tests

0:36

Who Uploads more Pictures and Which Photo-Type is Most Frequently Used on Facebook?

0:37

ANOVA (F-test) to the Rescue!

5:49

Omnibus Hypothesis

5:50

Analyze Variance

7:27

Independent Samples vs. Repeated Measures

9:12

Same Start

9:13

Independent Samples ANOVA

10:43

Repeated Measures ANOVA

12:00

Independent Samples ANOVA

16:00

Same Start: All the Variance Around Grand Mean

16:01

Independent Samples

16:23

Repeated Measures ANOVA

18:18

Same Start: All the Variance Around Grand Mean

18:19

Repeated Measures

18:33

Repeated Measures F-statistic

21:22

The F Ratio (The Variance Ratio)

21:23

S²bet = SSbet / dfbet

23:07

What is This?

23:08

How Many Means?

23:39

So What is the dfbet?

23:54

So What is SSbet?

24:32

S² resid = SS resid / df resid

25:46

What is This?

25:47

So What is SS resid?

26:44

So What is the df resid?

27:36

SS subj and df subj

28:11

What is This?

28:12

How Many Subject Means?

29:43

So What is df subj?

30:01

So What is SS subj?

30:09

SS total and df total

31:42

What is This?

31:43

What is the Total Number of Data Points?

32:02

So What is df total?

32:34

So What is SS total?

32:47

Chart of Repeated Measures ANOVA

33:19

Chart of Repeated Measures ANOVA: F and Between-samples Variability

33:20

Chart of Repeated Measures ANOVA: Total Variability, Within-subject (case) Variability, Residual Variability

35:50

Example 1: Which is More Prevalent on Facebook: Tagged, Uploaded, Mobile, or Profile Photos?

40:25

Hypotheses

40:26

Significance Level

41:46

Decision Stage

42:09

Calculate Samples' Statistic and p-Value

46:18

Reject or Fail to Reject H0

57:55

Example 2: Repeated Measures ANOVA

58:57

Example 3: What's the Problem with a Bunch of Tiny t-tests?

1:13:59

Section 12: Chi-square Test

Chi-Square Goodness-of-Fit Test

58m 23s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Where Does the Chi-Square Test Belong?

0:50

Where Does the Chi-Square Test Belong?

0:51

A New Twist on HT: Goodness-of-Fit

7:23

HT in General

7:24

Goodness-of-Fit HT

8:26

Hypotheses about Proportions

12:17

Null Hypothesis

12:18

Alternative Hypothesis

13:23

Example

14:38

Chi-Square Statistic

17:52

Chi-Square Statistic

17:53

Chi-Square Distributions

24:31

Chi-Square Distributions

24:32

Conditions for Chi-Square

28:58

Condition 1

28:59

Condition 2

30:20

Condition 3

30:32

Condition 4

31:47

Example 1: Chi-Square Goodness-of-Fit Test

32:23

Example 2: Chi-Square Goodness-of-Fit Test

44:34

Example 3: Which of These Statements Describe Properties of the Chi-Square Goodness-of-Fit Test?

56:06

Chi-Square Test of Homogeneity

51m 36s

Intro

0:00

Roadmap

0:09

Roadmap

0:10

Goodness-of-Fit vs. Homogeneity

1:13

Goodness-of-Fit HT

1:14

Homogeneity

2:00

Analogy

2:38

Hypotheses About Proportions

5:00

Null Hypothesis

5:01

Alternative Hypothesis

6:11

Example

6:33

Chi-Square Statistic

10:12

Same as Goodness-of-Fit Test

10:13

Set Up Data

12:28

Setting Up Data Example

12:29

Expected Frequency

16:53

Expected Frequency

16:54

Chi-Square Distributions & df

19:26

Chi-Square Distributions & df

19:27

Conditions for Test of Homogeneity

20:54

Condition 1

20:55

Condition 2

21:39

Condition 3

22:05

Condition 4

22:23

Example 1: Chi-Square Test of Homogeneity

22:52

Example 2: Chi-Square Test of Homogeneity

32:10

Section 13: Overview of Statistics

Overview of Statistics

18m 11s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

The Statistical Tests (HT) We've Covered

0:28

The Statistical Tests (HT) We've Covered

0:29

Organizing the Tests We've Covered…

1:08

One Sample: Continuous DV and Categorical DV

1:09

Two Samples: Continuous DV and Categorical DV

5:41

More Than Two Samples: Continuous DV and Categorical DV

8:21

The Following Data: OK Cupid

10:10

The Following Data: OK Cupid

10:11

Example 1: Weird-MySpace-Angle Profile Photo

10:38

Example 2: Geniuses

12:30

Example 3: Promiscuous iPhone Users

13:37

Example 4: Women, Aging, and Messaging

16:07


Section 12: Chi-square Test: Lecture 1 | 58:23 min


Lecture Comments (2)

0 answers

Post by Lois Han on April 30, 2012

You are a breath of fresh air in my statistics life. Thank you so much!

0 answers

Post by Matt Lin on March 18, 2012

Why we are not reject the Null if sample chi-square is larger than critical chi-square?




Transcription: Chi-Square Goodness-of-Fit Test

Hi, welcome to educator.com.

We are going to talk about the chi-square goodness-of-fit test.

First, we are going to start with a bigger review of where the chi-square test actually fits in among all the different inferential statistics we have been learning so far, and then we are going to talk about a new kind of hypothesis testing, the goodness-of-fit hypothesis test.

It is going to be similar to the hypothesis testing we have been doing so far, but there is a slightly different logic behind it.

Because of that slightly different logic, there is a new null hypothesis as well as a new alternative hypothesis.

Then we are going to introduce the chi-square distribution and the chi-square statistic.

And then we are going to talk about the conditions for the chi-square test: when do we actually do it?

So where does the chi-square test belong?

It has been a while since we looked at this, if you are going in order with the videos, but I think it is good to stop right now and think: where have we come from?

Where are we now?

The first thing we want to think about is the different independent variables that we have been able to look at.

We have been able to look at independent variables, the predictor variables, that are either categorical or continuous.

When the IV is categorical, you have groups, right?

Or different samples, right?

When the IV is continuous, you do not have different groups; you have different levels that predict something.

Just to give you an idea, a categorical IV would be something like the experimental group versus the control group: a group that gets the drug versus a group that gets the placebo.

An example of a continuous IV might be looking at how much you study to predict your score on a test, so how much you study would be a continuous IV.

So that is one of the dimensions that we need to know: is your IV categorical or continuous?

You also need to know whether the DV is categorical or continuous. The DV is the thing that we are interested in measuring at the end of the day, the thing that we want to see change, the thing we want to predict, and here is how far we have come.

At the very beginning, we looked at the case where both the IV and the DV are continuous, and those measures were regression (linear regression) as well as correlation.

Remember r? And regression was that stuff about Y = b₀ + b₁X, so that was regression and correlation way back in the day.

We have been covering a lot of this quadrant, looking at t-tests and ANOVAs, right?

One important thing to know is that t-tests and ANOVAs are both hypothesis tests; so far we have not learned hypothesis testing with regression and correlation.

A lot of inferential statistics courses in college do not cover hypothesis testing for regression until you get to more advanced levels of statistics.

So what do ANOVAs and t-tests have in common?

Well, what they have in common is that they both have a categorical IV and a continuous DV.

The IV is categorical, and you only have one IV.

And your DV is continuous.

So that is what they have in common. What is different about them?

Well, the difference is that the IV in t-tests has two levels, and only two levels, so there are only two groups or two samples.

In ANOVAs we can test more than two samples; we can do that for 3, 4, or 5 samples.

So that IV has more than two levels, and that is where we have been spending a lot of our time.

For the most part, continuous DVs are really important because they tell us a lot: they tell us the fine ways in which the data could actually be different.

It is rarer that you will use a categorical dependent variable, which is not going to be as informative to us, but it is still possible, and that is where the chi-square is going to come in.

The chi-square is going to come in right in this quadrant, where we have a categorical IV and also a categorical DV. For instance, we might want to see something like: if you are given a particular drug or the placebo, do you feel like you are getting better, yes or no?

So that is a categorical DV; it is not like a score for which we can find a mean, and so this is where the chi-square tests come in.

There are going to be two chi-square tests that we are going to look at.

The first one we are going to cover today, and it is called goodness of fit.

The next one is in the next lesson, and it is called the test of homogeneity.

They are both chi-square tests.

The other way you will see it written is χ², so when you see that, do not think, "Oh, what is this X doing here?"

When it has this little curvy part, it means chi-square; χ is the Greek letter chi.

Finally, there is a test that is rarely covered in introductory inferential statistics but is covered at more advanced levels of statistics. It is called the logistic test (logistic regression), and it takes you from a continuous IV to a categorical DV.

But that is a rarer design in conducting science; it is not as informative as continuous to continuous or categorical to continuous.

Alright, so we are going to spend our time right in here.
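The four quadrants described in this review can be summarized as a small lookup. This is only a sketch of the lecture's classification: the function name `choose_test`, its string labels, and the `iv_levels` parameter (the number of groups of a categorical IV) are assumptions made for illustration.

```python
def choose_test(iv_type, dv_type, iv_levels=None):
    """Map IV/DV types to the family of analysis named in the lecture.

    iv_type, dv_type: "categorical" or "continuous".
    iv_levels: number of groups for a categorical IV (hypothetical parameter).
    """
    if iv_type == "continuous" and dv_type == "continuous":
        return "regression / correlation"
    if iv_type == "categorical" and dv_type == "continuous":
        if iv_levels == 2:
            return "t-test"
        if iv_levels is not None and iv_levels > 2:
            return "ANOVA"
        return "t-test (2 levels) or ANOVA (more than 2 levels)"
    if iv_type == "categorical" and dv_type == "categorical":
        return "chi-square test"
    if iv_type == "continuous" and dv_type == "categorical":
        return "logistic test"
    raise ValueError("types must be 'categorical' or 'continuous'")


# Drug vs. placebo (categorical IV), "feeling better: yes or no" (categorical DV)
print(choose_test("categorical", "categorical"))
```

The drug-versus-placebo question with a yes-or-no outcome lands in the categorical-to-categorical quadrant, which is where the chi-square tests live.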

So there is a new twist on hypothesis testing. It is not totally different, it is still very similar, but there is a subtle difference.

Today we are going to start off with the chi-square goodness-of-fit test.

Basically, let us think about hypothesis testing in general.

In general, you want to determine whether a sample is very different from expected results. That is the big idea of hypothesis testing, and the expected results come from your hypothesized population.

We usually determine whether the sample is very different with some sort of test statistic, looking at how far out it is on the test statistic's distribution. We look at whether it is past the alpha cutoff, the critical test statistic, and then we say, "This sample is so different from what would be expected if the null hypothesis were true that we are going to reject the null hypothesis."

That is hypothesis testing as usual. The goodness-of-fit test still takes the idea of looking at whether a sample is very different from expected results, but the question is: how are we going to compare these two things?

We are not going to compare means anymore. We are not going to look at the distance between means, nor are we going to look at a ratio of variances.

Instead, we are going to determine whether the sample proportions for some categories are very different from the hypothesized population proportions.

And the question will be how we determine "very different." Here is what I mean by determining whether the sample proportions are different from the hypothesized population proportions.

So here I am just going to draw for you, sort of schematically, what the hypothesized population proportions might look like.

This is just to give you the idea: you might think of the population as being like this, and in the population you might see a proportion of one third being blue, one third being red, and one third being yellow.

Now already you can sort of see that we cannot get the average of blue, red, and yellow, right? What would be the average of that, and how would you find the variability of that?

So already we are starting to see why you cannot use t-tests or ANOVAs here: if you cannot find the mean or the variance, you cannot use those tests.

So this is what our hypothesized population looks like, and when we get a little sample from that population, we want to know whether our sample proportions are very different from the hypothesized proportions or not.

Let us say in our sample we get mostly blue, a little bit of red, and a little bit of yellow: say 60% blue, 20% red, 20% yellow.

Are those proportions different enough from our hypothesized proportions?

Another sample we might get is half blue, half red, and no yellow. Is that really different from our hypothesized proportions?

Another sample might be only about one tenth blue, then 40% red, and the other half yellow.

For a sample like any of these, we want to say whether it is really different from the hypothesized population proportions, and so that is our new goal.

How different are these sample proportions from those hypothesized proportions? And then the question becomes: how do we determine whether something is very different?

Is this very different, or just different?

How do we determine "very different"? That is going to be the key question here.

And that is why we are going to need the chi-square statistic and the chi-square distribution.

So we are changing our hypotheses a little bit. Now the null hypothesis is really about proportions, and here is what we are talking about.

The null hypothesis now is that the proportions in the population, the real population that we do not know, will be like the predicted or theorized proportions.

So here we are asking: is this unknown population like our known population? That should sound familiar, as it is sort of the fundamental basis of inferential statistics.

So that is our new null hypothesis: the proportions in the population will be like the predicted population proportions; they will be the same.

Remember, sameness is always the hallmark of the null hypothesis. The alternative says that at least one of the proportions in the population will be different than predicted.

Going back to our example, if our hypothesized population is something like one third, one third, one third, maybe what we will find in our sample is one third blue, but then some smaller proportion like 15% red, and the rest being yellow.

Now the one third should match up.

The one third matches up, but what about these other two?

And so the alternative hypothesis is that at least one proportion in the population will be different from the predicted proportion; there just has to be one guy that is different.

Just to give you an example, let us turn this problem into a null hypothesis and an alternative hypothesis.

Here it says that, according to early polls, candidate A was supposed to win 63% of the votes and candidate B was supposed to win 37%.

When the votes were counted, candidate A won 340 votes while B won 166 votes.

Just to give you that picture again, the null hypothesis population was that candidate A (I will color A in blue) should have won 63% of the vote, and candidate B (I will color B in red) should have won 37% of the vote. So what would be our null hypothesis?

Our null hypothesis would be that the proportions in our unknown population will be the same as the proportions in our predicted population.

So here, A's proportion of the actual, real votes should be like A's proportion in the predicted population, and B's proportion of the real votes should be like B's proportion in the predicted population.

So A's real proportion of votes should be like this, and so should B's.

The other way we could say that is that the real proportion of votes should be like the predicted proportion of votes for every single category, for both A and B.

So what would be the alternative version of this?

The alternative would say that at least one of the proportions, for one of the categories, either A or B, will be different from the hypothesized proportion.

And in fact, in this example, if one of them is different, the other will be different too, because we only have two categories: if we make one really different, then the other one will automatically change.

But later on we might see examples with 3, 4, or 5 categories, and in those cases this will make more sense.
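Written out in symbols, the hypotheses for the election example might look like the following. The symbols p_A and p_B, standing for the true population proportions of votes for candidates A and B, are notation assumed here for illustration; the lecture states the hypotheses in words.

```latex
\begin{align*}
H_0 &: \; p_A = 0.63 \ \text{and} \ p_B = 0.37 \\
H_1 &: \; p_A \neq 0.63 \ \text{or} \ p_B \neq 0.37
      \quad \text{(at least one proportion differs from its predicted value)}
\end{align*}
```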

Okay, so now let us talk about how to actually find out if our proportions are really off or not.

Are our proportions statistical outliers? Are they deviant, are they significant, do they stand out? That is what we want to know.

In order to do that, we have to use a measure called the chi-square statistic. Instead of the t statistic, which looks at distance in terms of standard error, or the F statistic, which looks at the variance we are interested in over the variance we cannot explain, the chi-square does something different.

It looks at expected values: what would we expect, and what do we actually observe? The chi-square symbol is going to look like this, so be careful: it looks like an uppercase X, but it is a little bit different from a regular letter X; it is usually a little more curvy to let you know it is chi-square.

So the chi-square is really interested in the difference between what we observe, the actual observed frequency (or percentage), and the expected frequency: observed minus expected.

So we are looking at observed versus expected: observed is what we see in our sample, and expected is what we would predict given our hypothesized population, so this is that predicted-population part.

So we are interested in the difference between those two frequencies.

Now, although you could use proportions as well, you can only do that if you have a constant number of items, so you are probably safer going with frequencies, because those are sort of weighted proportions.

So we are interested in this difference, but remember, when we look at this difference, sometimes it can be positive and sometimes it can be negative, and so what we do here, as is usual in statistics, is square it. We also want to know about this squared difference as a proportion of what was expected, so we divide by the expected frequency, and we want to do this for every category.

So we sum over the categories: i goes from 1 to the number of categories, and there is actually a subscript i on every observed and expected term.
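The statistic described in words here is the standard chi-square formula. The symbols O_i, E_i, and k are notation assumed for this sketch: O_i is the observed frequency in category i, E_i is the expected frequency in category i, and k is the number of categories.

```latex
\[
\chi^2 \;=\; \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
\]
```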

So what this is saying is that you do this for each category, each proportion that you are looking at. In our toy example with red, blue, and yellow, we would do this for blue, we would do this for red, and we would do this for yellow. The categories really speak to what the proportions are made of.

So here we have three categories, so we would do this three times and add those three terms up. To do that, we eventually need to be able to find the observed frequency and the expected frequency.

Now, in the example that we saw with the voting for candidates A and B, one of the things I hope you noticed was that the observed frequencies were given just as numbers of votes, how many people voted, but the expected values from the hypothesized population were given as percentages.

You cannot subtract a percentage from a number of votes; you have to translate them both into the same thing, and so it is helpful to change the expected percentages into expected frequencies.

There is going to be another reason for changing them into expected frequencies, instead of changing the observed frequencies into observed proportions, and I am going to get to that a little bit later.

So here is how I want you to think of this: it is really the squared difference between observed and expected1377

frequencies as a proportion of expected frequency, and you want to sum that over all the categories.1384
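Written out, the statistic described here can be sketched as follows. The symbols are notation assumed for this note rather than taken from the slide: O_i is the observed frequency in category i, E_i is the expected frequency in that category, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```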

Once you have that, then you get your chi-square value. Now let us think about this chi-square value.1394

If this difference is very large, so the observed frequencies are just very different from the expected ones,1400

you are going to have a very large chi-square. Also, if this difference is very small, so they are really close to each other, then your chi-square is going to be very small.1413

So chi-square is giving us a measure of how far apart the observed and expected frequencies are. Also, I1422

want you to see that the chi-square cannot be negative.1434

First of all, because we're squaring this difference, the numerator cannot be negative. Not only that,1439

the expected frequencies also cannot be negative, because we are counting up how many things we have,1445

how many things we observed, and so this also cannot be negative. So this whole thing cannot be negative.1451

So already we see in our mind that the chi-square distribution will probably be positive and positively skewed,1457

because it stops at zero; there is a wall at zero.1465

Okay, so now let us actually talk about and draw the chi-square distribution. Imagine having some sort of big data1470

set and taking samples from it over and over again. So you have this big data set, you1479

take a sample, you calculate the chi-square statistic, and you plot that.1487

And then you put that back in, you take another sample, you calculate the chi-square and plot it again, and you do1493

that over and over and over and over again.1502

You will never get a value that is below zero, and you will get values that might be way higher than zero1505

sometimes, but for the most part they will be clustered over here. So you will get a skewed distribution, and1514

indeed the chi-square distribution is a skewed distribution.1520
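The resampling idea can be tried out with a short simulation. This is only a sketch: the proportions for the red, blue, and yellow toy example, the sample size, and the number of repetitions are all made-up values, and the function names are chosen here for illustration.

```python
import random
import statistics

def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_chi_squares(proportions, sample_size, n_samples, seed=0):
    """Draw repeated samples from the hypothesized population and
    return the chi-square statistic computed from each sample."""
    rng = random.Random(seed)
    categories = list(range(len(proportions)))
    expected = [p * sample_size for p in proportions]
    results = []
    for _ in range(n_samples):
        draws = rng.choices(categories, weights=proportions, k=sample_size)
        observed = [draws.count(c) for c in categories]
        results.append(chi_square_stat(observed, expected))
    return results

# Hypothetical proportions for the three-color toy example.
simulated = simulate_chi_squares([0.5, 0.3, 0.2], sample_size=100, n_samples=2000)
lowest = min(simulated)
mean_value = statistics.mean(simulated)
median_value = statistics.median(simulated)
print(lowest, mean_value, median_value)
```

No simulated value falls below zero, and the mean sits above the median, which is what a long right tail looks like in numbers.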

Now here when we look at this you might think, hey, that looks sort of like the F distribution, and you are1527

right: overall in shape it looks just like the F distribution, and in a lot of ways we could apply the reasoning1536

from the F distribution directly to the chi-square distribution.1544

For instance, in the chi-square distribution our alpha is automatically one-tailed; it is only on one side. So1548

when we say something like alpha equals .05, this is what we mean: we mean that we will reject the null1556

when we have a chi-square value that is somewhere out here or here or here, but we will fail to reject if we1565

get a chi-square value in here from our sample.1573

Now this chi-square distribution, like the F and t-distributions, is a family of distributions, not just one1576

distribution; the only one that is just one distribution is the normal distribution.1586

The chi-square distribution again depends on degrees of freedom, and the degrees of freedom that the chi-square1591

depends on is going to be the number of categories minus 1.1598

So if you have a lot of categories, the chi-square distribution will look different than if you have a small1608

number of categories, like 2.1615

So let us talk about what alpha means here.1619

The alpha here is the set significance level: we are going to use this as the boundary, so1623

that if we have a chi-square from our sample that is bigger than this boundary, then we will reject the null.1630

What is the difference now with the P value?1643

Now the P value is a probability, so we might have a P value somewhere out here or we might1647

have a P value somewhere here. The P value is going to mean something very similar to what the P1656

value means in other hypothesis tests: basically, it is going to be the probability of getting a chi-square value1669

larger, more extreme (and in this case there is only one kind of extreme, positive) than the one from our sample, but under a condition.1681

Remember, in this world, which one is true?1700

The null hypothesis is true.1703

So if the null hypothesis were true, this would be the probability of getting such an extreme chi-square1712

value, one that is that large or larger. That is all we need.1720

So in that way the P value is from our data, while the alpha is not from our data; it is just something we set as the cutoff.1727

So there are some conditions that we need to know before we use the chi-square.1737

We cannot just always use the chi-square; there are conditions that have to be met, and one of the conditions of the chi-square is this.1745

Each outcome in the population falls into exactly one of a fixed number of categories. So every time you1756

have some sort of case from the population, let us say we are drawing out votes,1765

each vote has to fall into one of a fixed number of categories. So if it is two candidates, it is always two1773

candidates for every single voter; we cannot compare voters that had two candidates versus voters who had three candidates.1785

Also these have to be mutually exclusive categories: one vote cannot go to two candidates at once, so they1792

have to be mutually exclusive. You either vote for A or vote for B.1802

And you cannot opt out either; everybody has to be in one of the fixed number of categories set ahead of time.1807

So the numbering is slightly off here, but the second condition that must be met is that you must have a1816

random sample from your population; that is just like all kinds of hypothesis testing, though.1826

Number 3, the expected frequency in each category: once you compute all the expected1832

frequencies in order to compute your chi-square, each cell, each square, needs to have an1840

expected frequency of five or greater. Here is why.1850

You need a big enough sample; if you have too small of a sample, you again get expected frequencies less than five.1854

Also, you need big enough proportions. So let us say you want to compare proportions where1862

one candidate is predicted to win 99.999% of the votes and the other candidate is only1871

supposed to win .001% of the vote, and you only have five people in your sample.1883

And so you also need to have big enough proportions, and these balance each other out.1890

If you have a large enough sample, then your proportions can be smaller; also, if you have large enough1897

proportions, then your sample could be smaller.1903
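The expected-frequency condition can be checked before running the test. The sketch below uses the lopsided 99.999% versus .001% case with five people, plus the 63% versus 37% poll with 506 votes from the voting example; the helper names are hypothetical and chosen only for this illustration.

```python
import math

def expected_counts(proportions, sample_size):
    """Expected frequency for each category under the hypothesized model."""
    return [p * sample_size for p in proportions]

def meets_expected_count_rule(proportions, sample_size, minimum=5):
    """True when every expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected_counts(proportions, sample_size))

def smallest_sample_size(proportions, minimum=5):
    """Smallest n for which the rarest category still expects `minimum` cases."""
    return math.ceil(minimum / min(proportions))

# 99.999% vs .001% with only five people: the rare category expects far fewer than 5.
lopsided_ok = meets_expected_count_rule([0.99999, 0.00001], 5)

# 63% vs 37% with 506 votes: both expected frequencies are well above 5.
votes_ok = meets_expected_count_rule([0.63, 0.37], 506)

print(lopsided_ok, votes_ok, smallest_sample_size([0.63, 0.37]))
```

The last helper shows the balance between the two requirements: the smaller the rarest proportion, the larger the sample has to be.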

And the final condition is not really a condition; it is just sort of something I wanted you to know as a rule.1905

The chi-square goodness of fit test, that is what we have been talking about so far.1913

This test actually applies to more than two categories.1920

You do not just have 2 categories, you have 3 or 4 or 5 or 6 but they do need to be mutually exclusive and1927

each outcome in the population must be able to fall into any one of those.1935

So those are the conditions.1940

So now let us move on to some examples.1943

So the first example is the problem that we already looked at so far. According to early polls, candidate A1947

was supposed to win 63% of the vote and B was supposed to win 37%.1953

When the votes are counted, A won 340 votes while B won 166 votes.1958

One of the things that I like to do just to help myself is, when I think of1967

the null hypothesis, I sort of write it in a sentence: the proportions of votes, that is my population,1975

should be like the predicted proportions. And the alternative is that at least one of the proportions of votes will not be like the predicted proportions.1990

What I also like to do is I like to draw this out for myself, I like to draw out the predicted population so I will2032

color candidate A in blue so that will be about 63%, candidate B will be in red, 37%.2040

And so eventually I want to know whether this is reflected in my actual votes.2053

The significance level, we can set it at .05 just out of convention, and we know that it has to be one-tailed2059

because this is definitely going to be a chi-square, and we know it is a chi-square because it is about expected proportions.2068

So now let us set our decision stage.2075

For our decision stage, it is helpful to draw that chi-square distribution and to sort of label it. For alpha,2081

here is our rejection region, .05. Now it would be nice to know what our critical chi-square is, and in2100

order to find that we need degrees of freedom. Degrees of freedom is the number of categories minus 1, in this2111

case 2 - 1, and that is 1 degree of freedom. It is because if you know, let us say, that candidate B is2119

supposed to win 37% of the votes, you could actually figure out candidate A; you do not need me to tell2131

you what that is to figure it out. Candidate A cannot vary, the proportion cannot vary freely, once you2138

know this one, and that is why it is number of categories minus 1.2143

So now that we have that, it might be useful to look either in the back of your book or use an Excel2148

spreadsheet function in order to find our critical chi-square.2156

So in order to find chi-square there are two functions that you need to know: just like TDIST and TINV, FDIST and FINV, now there are CHIDIST and CHIINV.2161

Actually we need to use CHIINV right now, because here we have the probability, .05, and the degrees of2182

freedom, 1, and that will give us our critical chi-square, and that is 3.84.2190

So this is the boundary we're looking for, 3.84, so for anything more extreme, more positive than2198

3.84, we're going to reject our null hypothesis.2208
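For readers without Excel, the same critical value can be recovered with the Python standard library. This is a sketch that works only for 1 degree of freedom, where the upper-tail area of a chi-square equals a two-tailed normal probability; the function names are made up for this note and are not Excel's.

```python
import math

def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

def critical_value_df1(alpha, lo=0.0, hi=100.0):
    """Bisect for the chi-square value whose upper-tail area equals alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if p_value_df1(mid) > alpha:
            lo = mid   # tail area still too big, so move the boundary right
        else:
            hi = mid
    return (lo + hi) / 2

critical = critical_value_df1(0.05)
print(round(critical, 2))  # 3.84
```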

So now that our decision stage is set, it is helpful to actually work with our sample. Remember,2214

when we talk about our actual sample (I should have left myself some room), here is what we ended up having.2221

We have observed frequencies already, so I am going to write a column for observed, for candidate A and2236

candidate B. For candidate A, we observed 340 votes, so that is our observed frequency; for candidate B, we see 166 votes.2243

Now one thing that helps is that we know what the total number of votes was: the total number of votes is going to be 340 + 166, and that is 506.2261

So 506 people actually voted in this, so down here I am going to write total 506.2274

Now the question is, what should our expected frequencies have been?2283

So here I am going to write expected, and I know that my expected proportion for A should be 63%.2291

What that means is 63% of the total number of people who voted.2298

So here is our little sample of 506 people.2302

This is our 100%, and here we have 506 people in our sample, so we should expect 63% of 506 to have voted2308

for A. And so how do we find that?2323

Well, we are going to multiply 63% by 506 to find out how many votes that little blue bit is, and so that is2328

going to be .63 × 506 of that total amount.2341

If we multiply 506 × 1 we would get 506, right?2350

So if we multiply by a slightly smaller proportion, we get just that chunk: 318.78. Actually I am2355

going to put this here; let me actually draw this little table right in here, because that can help us find our chi-square much more quickly.2367

And so for the observed frequencies we have 340 and 166, okay.2383

So what is the other expected frequency, for B? In order to find this little bit we are going to multiply2394

.37 × 506, and that is 187.22.2401

And usually if you add this entire column you should get roughly the same total.2414

When you do these by hand, sometimes you might not get exactly the same number; it2422

might be off by just a little bit because of rounding error, if you round to the nearest tenth or round to the nearest integer.2429

You may be off a little bit here, but you should not be off by much, so that is one way you could check to see that what you did was right.2438
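The expected column and the totals check can be reproduced in a few lines. This sketch uses the vote counts and poll percentages from the example; the variable names are chosen here for illustration.

```python
observed = {"A": 340, "B": 166}       # votes actually counted
predicted = {"A": 0.63, "B": 0.37}    # proportions from the early polls

total = sum(observed.values())        # 340 + 166
expected = {name: predicted[name] * total for name in observed}

for name in observed:
    print(name, observed[name], round(expected[name], 2))

# The expected column should add back up to (roughly) the same total.
print(total, round(sum(expected.values()), 2))
```

Rounded to two decimals, the expected frequencies are 318.78 for A and 187.22 for B, and both columns total 506.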

And so once we have this, let me just copy these down right here: 318.78 and 187.22, and for each of these columns2445

the total is 506. So here, one of the things we see is that the expected value for A is a little bit lower and the2463

expected value for B is a little bit higher than what we observed. But is this difference in proportions significant, is it2476

standing out enough? In order to find that out we need to find the chi-square, the sample chi-square.2485

Now, we have completely run out of room here.2493

But I will just write the chi-square formula up here.2497

So the chi-square is going to be the sum over all the categories of the observed frequency minus the2500

expected frequency, squared, as a proportion of the expected frequency.2510

And so what I am going to do is calculate this for each category, A and B, and then add them up.2517

So right here I am going to call this column O minus E, squared, all over E.2525

So I am going to do that for A and B and then sum them up.2540

So, my observed minus expected, squared, all divided by expected, and so here I get this proportion, and I am2547

just going to copy and paste that down here, and then here I am just going to sum them up, and I get 3.817.2565
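The spreadsheet column can be mirrored in Python as a check on the arithmetic. This is a minimal sketch using the numbers from the example; the variable names are assumptions made for this note.

```python
observed = [340, 166]          # votes for A and B
proportions = [0.63, 0.37]     # predicted by the early polls

total = sum(observed)
expected = [p * total for p in proportions]

# One (O - E)^2 / E term per candidate, then the sum over both categories.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

print([round(t, 3) for t in terms], round(chi_square, 3))
```

The sum comes out at about 3.82, which matches the 3.817 quoted in the lecture up to rounding of the last digit, and it sits just under the critical value of 3.84.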

We are really close, but no cigar; we're right underneath. Our sample chi-square is just a smidge2577

smaller than our critical chi-square, so here we're not rejecting the null; we are going to fail to reject the2589

null. So let us find the P value. In order to find the P value you could use CHIDIST, or alternatively look it up2597

in the back of your book; look for the chi-square distribution.2609

It should be behind your normal, your t, your F, and then chi-square should come right behind it; it usually goes in that order, maybe a slightly different order.2614

Our degrees of freedom remain the same, 1, and so our P value is just over .05; if we round, .051, right?2627

So because of that we are not going to reject the null, so we are going to say the proportions of votes are roughly similar to the predicted proportions.2640
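The P value can also be checked without a table. The sketch below relies on the fact that, for 1 degree of freedom only, the upper-tail area of a chi-square equals a two-tailed normal probability, which the standard library exposes through `math.erfc`; the function name is made up for this note.

```python
import math

def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

# Recompute the sample chi-square from the vote counts in the example.
observed = [340, 166]
expected = [0.63 * 506, 0.37 * 506]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

p_value = p_value_df1(chi_square)
alpha = 0.05
reject_null = p_value < alpha

print(round(p_value, 3), reject_null)
```

The P value lands just above .05, so the decision is to fail to reject the null, in agreement with the critical-value comparison.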

Well, they are not significantly different at least. We cannot make a decision about whether they are super similar,2657

but we can say that they are not extremely different at least.2663

Okay, example 2. A study asked whether college students could tell dog food apart from expensive liver pâté, liverwurst, and spam,2669

all blended to the same consistency, chilled, and garnished with herbs and a lemon wedge, just to make it pretty.2684

Students were asked to identify which was dog food.2695

Researchers wanted to test the probability model where the students are randomly guessing.2698

How would they cast their hypothesized model?2703

Okay, so see the download that shows how many students picked each item to be dog food. It seems that2707

college students had a bunch of different choices, dog food, liver pâté, liverwurst, and spam, and then2714

they needed to identify which was dog food. So out of those, which is dog food?2723

So it is sort of like a multiple-choice question.2728

So if you open example 2 in the download listed below, you will see the number of students who selected each particular item as dog food.2732

Now be careful here, because, remember, you might get this problem on a test and you would not be told that it is a chi-square problem.2741

Sometimes people might immediately just think, I will find the mean, and so they just go ahead and find the2751

mean. But if you do find the mean, ask yourself, what does this mean?2758

What is the idea or the concept?2763

If we averaged this, we would find the average number of students that selected any of these items as dog2768

food, and that is sort of a mean that does not make any sense, right?2775

And so before you go ahead and find the mean, ask yourself whether the mean is actually meaningful.2779

So here we know that it is a chi-square because the students are choosing something and it is a categorical choice.2788

They are not giving you an answer like 20 inches or 50° or "I got 10 questions correct," right?2798

They are actually just saying, that one is dog food. They have five different choices and they have2804

chosen one of them as dog food, so out of five choices, a probability model where they are just guessing would2813

mean that 20% of the time they should pick pâté 1 to be dog food, 20% of the time they should pick spam to be2821

dog food, 20% of the time they should pick dog food to be dog food, and so on and so forth.2828

So let us draw that probability model, and by model we also mean the null hypothesis.2835

Model or hypothesized population, so step one.2844

So the null hypothesis is the idea that they will fit into this picture. This is the population, and it is out of2848

100%, and they have five choices. My picture is slightly uneven; it helps to really draw this as well as you can, because it will help you reason too.2858

The idea is that they will have an equal chance of guessing any one of these, and there are two liver pâtés; that is why there are 5 choices.2878

So liver pâté 1, spam was next, then actual dog food, just as in the data set, pâté 2, and liverwurst.2885

So these are the five choices, and we're saying, look, if the students are just guessing, they should have a 20% probability of picking each.2909

Is this the right proportion for this sample? Is the sample going to sort of match this or be very different from it?2923

The alternative is that at least one of the real proportions is different from what was predicted.2938

So once we have that, we can set our alpha to be .05. For our decision stage, we could draw the chi-square distribution, and for2954

our degrees of freedom, we now have five categories, and so our degrees of freedom is 5 - 1, which equals 4.2970

It is because once we know four of these, we could actually figure out the proportion for the fifth one just from knowing those four.2978

So that one is no longer free to vary; it does not have freedom anymore.2987

So what is our critical chi-square?2991

Well, if you want to pull up your Excel data, here I am just going to start off with step three. In step three we want the2998

critical chi-square; in order to find that we can use CHIINV, putting in the probability that we're interested in and our degrees of freedom, which is 4.3011

And so our critical chi-square is 9.49.3026

Notice that as degrees of freedom goes up, what is happening to the chi-square distribution is that it is getting3035

fatter, it is getting more variable, and because of that we need a more extreme chi-square value to reject the null.3053

So that is sort of different from t distributions or F distributions.3059

Those distributions got sharper when we increased our degrees of freedom; chi-square distributions go the opposite way.3066

Chi-square distributions get more variable as degrees of freedom goes up.3075
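The way the cutoff moves outward as degrees of freedom increase can be seen by computing a few critical values. This sketch uses only the standard library and the closed-form upper-tail formulas that hold for whole-number degrees of freedom; the function names are hypothetical and are not Excel's.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail term plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)   # (x/2)^(1/2) / Gamma(3/2)
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)               # next term in the sum
    return total

def critical_value(alpha, df, lo=0.0, hi=200.0):
    """Bisect for the chi-square value whose upper-tail area equals alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi_square_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

criticals = [round(critical_value(0.05, df), 2) for df in (1, 2, 3, 4)]
print(criticals)  # [3.84, 5.99, 7.81, 9.49]
```

At the same alpha of .05, the boundary grows from 3.84 with 1 degree of freedom to 9.49 with 4.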

So once we have this now we could start working on our actual data, our actual samples.3080

So step four is we need to find a sample chi-square and in order to do that it helps to draw out that table so3089

the table might look something like this.3102

I will just copy this down here and this is the type of food, so that is the category and here we have our observed frequencies.3106

The actual number of students that picked that thing to be dog food.3125

So here we see that one student picked pâté 1 to be dog food, and 15 students picked liverwurst to be the dog food.3130

What are the expected frequencies?3138

Well in order to find expected frequencies we know that the expected proportions are going to be .2 all the way down.3142

20% 20% 20% 20% and here I am just going to total this up.3153

And I see that 34 students were asked this question.3161

Our expected frequencies should add up to about 34.3170

Our expected proportions add up to one.3175

And that is why we cannot just directly compare these two things; they are not in the same sort of currency3179

yet. You sort of have to change this currency into frequency.3184

So how do we do that?3189

Well, we imagine, here are all 34 students; take 20% of them, how many students will that be?3192

So that is 0.2 × 34.3199

And I am just going to lock down that 34, because that total sum would not change.3207

So this is what we should expect: if they were indeed guessing, these are the expected frequencies that3214

we should see, and if I just move that over here, we will see that that column also adds up to 34.3226

Now once we have that we can compute our actual chi-square. Remember, that is the observed frequency3233

minus expected, squared, as a proportion of expected.3240

So that is the observed frequency minus expected frequency, squared, divided by the expected frequency.3247

And I could copy that down for each row and then add those up, and here I get my chi-square statistic for3257

my sample. My sample chi-square is going to be 16.29, and that is a larger, more extreme chi-square3268

than my critical chi-square. Let's also find the P value here.3281

In order to find the P value I could use CHIDIST; here I put in my chi-square and my degrees of freedom, which is 4.3286

And so that is .003, and that is certainly smaller than .05, and so in step five, we reject the null.3297
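The .003 can be double-checked by hand. This sketch uses the closed form of the upper-tail area for exactly 4 degrees of freedom, exp(-x/2) × (1 + x/2), and takes the sample chi-square of 16.29 from the lecture as given rather than recomputing it from the download; the function name is made up for this note.

```python
import math

def p_value_df4(chi_square):
    """Upper-tail probability of a chi-square value with 4 degrees of freedom."""
    half = chi_square / 2
    return math.exp(-half) * (1 + half)

sample_chi_square = 16.29   # value reported in the lecture
p_value = p_value_df4(sample_chi_square)
reject_null = p_value < 0.05

print(round(p_value, 3), reject_null)
```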

Now I just want to make a comment here.3315

Notice that here, after we do the chi-square although we reject the null just like in the ANOVA we do not3318

actually know which of the categories is the one that is really off.3325

This one here, we can sort of see, this one probably seems to be the most off, but we are just eyeballing it;3330

we're not using actual statistical principles.3340

So once you reject the null there are post hoc tests that you could do, but we are not going to cover those here.3343

So it seems that students are not randomly guessing they actually have a preference for something as being dog food.3349

My guess is liverwurst.3362

So example 3: which of these statements describe properties of the chi-square goodness of fit test?3365

If you switch the order of categories, the value of the test statistic does not change. That is actually true; it3376

does not matter whether candidate A got added before candidate B. Addition is totally order insensitive: you3383

could add A to B or B to A, you can add pâté and liverwurst and dog food, or dog food and liverwurst and3391

pâté; it does not really matter, so this is a true property.3398
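The order-insensitivity claim is easy to confirm numerically. This sketch reuses the observed and expected frequencies from the voting example and recomputes the statistic for every ordering of the categories; the helper name is an assumption made for this note.

```python
import math
from itertools import permutations

def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# (observed, expected) pairs for candidates A and B from example 1.
pairs = [(340, 318.78), (166, 187.22)]

values = []
for ordering in permutations(pairs):
    observed = [o for o, _ in ordering]
    expected = [e for _, e in ordering]
    values.append(chi_square_stat(observed, expected))

same = all(math.isclose(v, values[0]) for v in values)
print(values, same)
```

The comparison uses `math.isclose` because adding floating-point terms in a different order can change the last few digits even though the statistic is mathematically identical.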

Observed frequencies are always whole numbers. That is also actually true, because when you observe3403

the frequency, you are actually counting how many category members you have, so the count is going to be made up of whole numbers.3410

Expected frequencies are always whole numbers. That is actually not true; expected frequencies are predicted frequencies.3418

It is not that at any one time you will have exactly that many students saying that liverwurst is dog food, but that on3427

average that is what you would predict given a certain proportion. And so this is actually not true: expected3435

frequencies do not have to be whole numbers because they are theoretical; they are not actually things that we counted up in real life.3445

A high value of chi-square indicates a high level of agreement between the observed frequencies and the expected frequencies.3452

Actually, if you think about the chi-square statistic, this is the opposite of the real case.3462

If we had a high level of agreement, this difference would be very small, and because this numerator is small,3472

the chi-square would also be small. A high value of chi-square would actually mean that this is quite large3479

compared to this, and so this is also wrong; it is the opposite.3486

So that is it for chi-square goodness of fit test, join us next time on educator.com for chi-square test of homogeneity.3494

