Dr. Ji Son

Least Squares Regression

Table of Contents

Section 1: Introduction

Descriptive Statistics vs. Inferential Statistics

25m 31s

Intro

0:00

Roadmap

0:10

Roadmap

0:11

Statistics

0:35

Statistics

0:36

Let's Think About High School Science

1:12

Measurement and Find Patterns (Mathematical Formula)

1:13

Statistics = Math of Distributions

4:58

Distributions

4:59

Problematic… but also GREAT

5:58

Statistics

7:33

How is It Different from Other Specializations in Mathematics?

7:34

Statistics is Fundamental in Natural and Social Sciences

7:53

Two Skills of Statistics

8:20

Description (Exploration)

8:21

Inference

9:13

Descriptive Statistics vs. Inferential Statistics: Apply to Distributions

9:58

Descriptive Statistics

9:59

Inferential Statistics

11:05

Populations vs. Samples

12:19

Populations vs. Samples: Is it the Truth?

12:20

Populations vs. Samples: Pros & Cons

13:36

Populations vs. Samples: Descriptive Values

16:12

Putting Together Descriptive/Inferential Stats & Populations/Samples

17:10

Putting Together Descriptive/Inferential Stats & Populations/Samples

17:11

Example 1: Descriptive Statistics vs. Inferential Statistics

19:09

Example 2: Descriptive Statistics vs. Inferential Statistics

20:47

Example 3: Sample, Parameter, Population, and Statistic

21:40

Example 4: Sample, Parameter, Population, and Statistic

23:28

Section 2: About Samples: Cases, Variables, Measurements

About Samples: Cases, Variables, Measurements

32m 14s

Intro

0:00

Data

0:09

Data, Cases, Variables, and Values

0:10

Rows, Columns, and Cells

2:03

Example: Aircraft

3:52

How Do We Get Data?

5:38

Research: Question and Hypothesis

5:39

Research Design

7:11

Measurement

7:29

Research Analysis

8:33

Research Conclusion

9:30

Types of Variables

10:03

Discrete Variables

10:04

Continuous Variables

12:07

Types of Measurements

14:17

Types of Measurements

14:18

Types of Measurements (Scales)

17:22

Nominal

17:23

Ordinal

19:11

Interval

21:33

Ratio

24:24

Example 1: Cases, Variables, Measurements

25:20

Example 2: Which Scale of Measurement is Used?

26:55

Example 3: What Kind of a Scale of Measurement is This?

27:26

Example 4: Discrete vs. Continuous Variables

30:31

Section 3: Visualizing Distributions

Introduction to Excel

8m 9s

Intro

0:00

Before Visualizing Distribution

0:10

Excel

0:11

Excel: Organization

0:45

Workbook

0:46

Column x Rows

1:50

Tools: Menu Bar, Standard Toolbar, and Formula Bar

3:00

Excel + Data

6:07

Excel and Data

6:08

Frequency Distributions in Excel

39m 10s

Intro

0:00

Roadmap

0:08

Data in Excel and Frequency Distributions

0:09

Raw Data to Frequency Tables

0:42

Raw Data to Frequency Tables

0:43

Frequency Tables: Using Formulas and Pivot Tables

1:28

Example 1: Number of Births

7:17

Example 2: Age Distribution

20:41

Example 3: Height Distribution

27:45

Example 4: Height Distribution of Males

32:19

Frequency Distributions and Features

25m 29s

Intro

0:00

Roadmap

0:10

Data in Excel, Frequency Distributions, and Features of Frequency Distributions

0:11

Example #1

1:35

Uniform

1:36

Example #2

2:58

Unimodal, Skewed Right, and Asymmetric

2:59

Example #3

6:29

Bimodal

6:30

Example #4a

8:29

Symmetric, Unimodal, and Normal

8:30

Point of Inflection and Standard Deviation

11:13

Example #4b

12:43

Normal Distribution

12:44

Summary

13:56

Uniform, Skewed, Bimodal, and Normal

13:57

Sketch Problem 1: Driver's License

17:34

Sketch Problem 2: Life Expectancy

20:01

Sketch Problem 3: Telephone Numbers

22:01

Sketch Problem 4: Length of Time Used to Complete a Final Exam

23:43

Dotplots and Histograms in Excel

42m 42s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Previously

1:02

Data, Frequency Table, and Visualization

1:03

Dotplots

1:22

Dotplots Excel Example

1:23

Dotplots: Pros and Cons

7:22

Pros and Cons of Dotplots

7:23

Dotplots Excel Example Cont.

9:07

Histograms

12:47

Histograms Overview

12:48

Example of Histograms

15:29

Histograms: Pros and Cons

31:39

Pros

31:40

Cons

32:31

Frequency vs. Relative Frequency

32:53

Frequency

32:54

Relative Frequency

33:36

Example 1: Dotplots vs. Histograms

34:36

Example 2: Age of Pennies Dotplot

36:21

Example 3: Histogram of Mammal Speeds

38:27

Example 4: Histogram of Life Expectancy

40:30

Stemplots

12m 23s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

What Sets Stemplots Apart?

0:46

Data Sets, Dotplots, Histograms, and Stemplots

0:47

Example 1: What Do Stemplots Look Like?

1:58

Example 2: Back-to-Back Stemplots

5:00

Example 3: Quiz Grade Stemplot

7:46

Example 4: Quiz Grade & Afterschool Tutoring Stemplot

9:56

Bar Graphs

22m 49s

Intro

0:00

Roadmap

0:05

Roadmap

0:08

Review of Frequency Distributions

0:44

Y-axis and X-axis

0:45

Types of Frequency Visualizations Covered so Far

2:16

Introduction to Bar Graphs

4:07

Example 1: Bar Graph

5:32

Example 1: Bar Graph

5:33

Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?

11:07

Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?

11:08

Example 2: Create a Frequency Visualization for Gender

14:02

Example 3: Cases, Variables, and Frequency Visualization

16:34

Example 4: What Kind of Graphs are Shown Below?

19:29

Section 4: Summarizing Distributions

Central Tendency: Mean, Median, Mode

38m 50s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

Central Tendency 1

0:56

Way to Summarize a Distribution of Scores

0:57

Mode

1:32

Median

2:02

Mean

2:36

Central Tendency 2

3:47

Mode

3:48

Median

4:20

Mean

5:25

Summation Symbol

6:11

Summation Symbol

6:12

Population vs. Sample

10:46

Population vs. Sample

10:47

Excel Examples

15:08

Finding Mode, Median, and Mean in Excel

15:09

Median vs. Mean

21:45

Effect of Outliers

21:46

Relationship Between Parameter and Statistic

22:44

Type of Measurements

24:00

Which Distributions to Use With

24:55

Example 1: Mean

25:30

Example 2: Using Summation Symbol

29:50

Example 3: Average Calorie Count

32:50

Example 4: Creating an Example Set

35:46

Variability

42m 40s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Variability (or Spread)

0:45

Variability (or Spread)

0:46

Things to Think About

5:45

Things to Think About

5:46

Range, Quartiles and Interquartile Range

6:37

Range

6:38

Interquartile Range

8:42

Interquartile Range Example

10:58

Interquartile Range Example

10:59

Variance and Standard Deviation

12:27

Deviations

12:28

Sum of Squares

14:35

Variance

16:55

Standard Deviation

17:44

Sum of Squares (SS)

18:34

Sum of Squares (SS)

18:35

Population vs. Sample SD

22:00

Population vs. Sample SD

22:01

Population vs. Sample

23:20

Mean

23:21

23:51

Example 1: Find the Mean and Standard Deviation of the Variable Friends in the Excel File

27:21

Example 2: Find the Mean and Standard Deviation of the Tagged Photos in the Excel File

35:25

Example 3: Sum of Squares

38:58

Example 4: Standard Deviation

41:48

Five Number Summary & Boxplots

57m 15s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Summarizing Distributions

0:37

Shape, Center, and Spread

0:38

5 Number Summary

1:14

Boxplot: Visualizing 5 Number Summary

3:37

Boxplot: Visualizing 5 Number Summary

3:38

Boxplots on Excel

9:01

Using 'Stocks' and Using Stacked Columns

9:02

Boxplots on Excel Example

10:14

When are Boxplots Useful?

32:14

Pros

32:15

Cons

32:59

How to Determine Outlier Status

33:24

Rule of Thumb: Upper Limit

33:25

Rule of Thumb: Lower Limit

34:16

Signal Outliers in an Excel Data File Using Conditional Formatting

34:52

Modified Boxplot

48:38

Modified Boxplot

48:39

Example 1: Percentage Values & Lower and Upper Whisker

49:10

Example 2: Boxplot

50:10

Example 3: Estimating IQR From Boxplot

53:46

Example 4: Boxplot and Missing Whisker

54:35

Shape: Calculating Skewness & Kurtosis

41m 51s

Intro

0:00

Roadmap

0:16

Roadmap

0:17

Skewness Concept

1:09

Skewness Concept

1:10

Calculating Skewness

3:26

Calculating Skewness

3:27

Interpreting Skewness

7:36

Interpreting Skewness

7:37

Excel Example

8:49

Kurtosis Concept

20:29

Kurtosis Concept

20:30

Calculating Kurtosis

24:17

Calculating Kurtosis

24:18

Interpreting Kurtosis

29:01

Leptokurtic

29:35

Mesokurtic

30:10

Platykurtic

31:06

Excel Example

32:04

Example 1: Shape of Distribution

38:28

Example 2: Shape of Distribution

39:29

Example 3: Shape of Distribution

40:14

Example 4: Kurtosis

41:10

Normal Distribution

34m 33s

Intro

0:00

Roadmap

0:13

Roadmap

0:14

What is a Normal Distribution

0:44

The Normal Distribution As a Theoretical Model

0:45

Possible Range of Probabilities

3:05

Possible Range of Probabilities

3:06

What is a Normal Distribution

5:07

Can Be Described By

5:08

Properties

5:49

'Same' Shape: Illusion of Different Shape!

7:35

'Same' Shape: Illusion of Different Shape!

7:36

Types of Problems

13:45

Example: Distribution of SAT Scores

13:46

Shape Analogy

19:48

Shape Analogy

19:49

Example 1: The Standard Normal Distribution and Z-Scores

22:34

Example 2: The Standard Normal Distribution and Z-Scores

25:54

Example 3: Sketching and Normal Distribution

28:55

Example 4: Sketching and Normal Distribution

32:32

Standard Normal Distributions & Z-Scores

41m 44s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

A Family of Distributions

0:28

Infinite Set of Distributions

0:29

Transforming Normal Distributions to 'Standard' Normal Distribution

1:04

Normal Distribution vs. Standard Normal Distribution

2:58

Normal Distribution vs. Standard Normal Distribution

2:59

Z-Score, Raw Score, Mean, & SD

4:08

Z-Score, Raw Score, Mean, & SD

4:09

Weird Z-Scores

9:40

Weird Z-Scores

9:41

Excel

16:45

For Normal Distributions

16:46

For Standard Normal Distributions

19:11

Excel Example

20:24

Types of Problems

25:18

Percentage Problem: P(x)

25:19

Raw Score and Z-Score Problems

26:28

Standard Deviation Problems

27:01

Shape Analogy

27:44

Shape Analogy

27:45

Example 1: Deaths Due to Heart Disease vs. Deaths Due to Cancer

28:24

Example 2: Heights of Male College Students

33:15

Example 3: Mean and Standard Deviation

37:14

Example 4: Finding Percentage of Values in a Standard Normal Distribution

37:49

Normal Distribution: PDF vs. CDF

55m 44s

Intro

0:00

Roadmap

0:15

Roadmap

0:16

Frequency vs. Cumulative Frequency

0:56

Frequency vs. Cumulative Frequency

0:57

Frequency vs. Cumulative Frequency

4:32

Frequency vs. Cumulative Frequency Cont.

4:33

Calculus in Brief

6:21

Derivative-Integral Continuum

6:22

PDF

10:08

PDF for Standard Normal Distribution

10:09

PDF for Normal Distribution

14:32

Integral of PDF = CDF

21:27

Integral of PDF = CDF

21:28

Example 1: Cumulative Frequency Graph

23:31

Example 2: Mean, Standard Deviation, and Probability

24:43

Example 3: Mean and Standard Deviation

35:50

Example 4: Age of Cars

49:32

Section 5: Linear Regression

Scatterplots

47m 19s

Intro

0:00

Roadmap

0:04

Roadmap

0:05

Previous Visualizations

0:30

Frequency Distributions

0:31

Compare & Contrast

2:26

Frequency Distributions Vs. Scatterplots

2:27

Summary Values

4:53

Shape

4:54

Center & Trend

6:41

Spread & Strength

8:22

Univariate & Bivariate

10:25

Example Scatterplot

10:48

Shape, Trend, and Strength

10:49

Positive and Negative Association

14:05

Positive and Negative Association

14:06

Linearity, Strength, and Consistency

18:30

Linearity

18:31

Strength

19:14

Consistency

20:40

Summarizing a Scatterplot

22:58

Summarizing a Scatterplot

22:59

Example 1: Gapminder.org, Income x Life Expectancy

26:32

Example 2: Gapminder.org, Income x Infant Mortality

36:12

Example 3: Trend and Strength of Variables

40:14

Example 4: Trend, Strength and Shape for Scatterplots

43:27

Regression

32m 2s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Linear Equations

0:34

Linear Equations: y = mx + b

0:35

Rough Line

5:16

Rough Line

5:17

Regression - A 'Center' Line

7:41

Reasons for Summarizing with a Regression Line

7:42

Predictor and Response Variable

10:04

Goal of Regression

12:29

Goal of Regression

12:30

Prediction

14:50

Example: Servings of Milk Per Year Shown By Age

14:51

Interpolation

17:06

Extrapolation

17:58

Error in Prediction

20:34

Prediction Error

20:35

Residual

21:40

Example 1: Residual

23:34

Example 2: Large and Negative Residual

26:30

Example 3: Positive Residual

28:13

Example 4: Interpret Regression Line & Extrapolate

29:40

Least Squares Regression

56m 36s

Intro

0:00

Roadmap

0:13

Roadmap

0:14

Best Fit

0:47

Best Fit

0:48

Sum of Squared Errors (SSE)

1:50

Sum of Squared Errors (SSE)

1:51

Why Squared?

3:38

Why Squared?

3:39

Quantitative Properties of Regression Line

4:51

Quantitative Properties of Regression Line

4:52

So How do we Find Such a Line?

6:49

SSEs of Different Line Equations & Lowest SSE

6:50

Carl Gauss' Method

8:01

How Do We Find Slope (b1)

11:00

How Do We Find Slope (b1)

11:01

How Do We Find Intercept

15:11

How Do We Find Intercept

15:12

Example 1: Which of These Equations Fit the Above Data Best?

17:18

Example 2: Find the Regression Line for These Data Points and Interpret It

26:31

Example 3: Summarize the Scatterplot and Find the Regression Line

34:31

Example 4: Examine the Mean of Residuals

43:52

Correlation

43m 58s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Summarizing a Scatterplot Quantitatively

0:47

Shape

0:48

Trend

1:11

Strength: Correlation (r)

1:45

Correlation Coefficient (r)

2:30

Correlation Coefficient (r)

2:31

Trees vs. Forest

11:59

Trees vs. Forest

12:00

Calculating r

15:07

Average Product of z-scores for x and y

15:08

Relationship between Correlation and Slope

21:10

Relationship between Correlation and Slope

21:11

Example 1: Find the Correlation between Grams of Fat and Cost

24:11

Example 2: Relationship between r and b1

30:24

Example 3: Find the Regression Line

33:35

Example 4: Find the Correlation Coefficient for this Set of Data

37:37

Correlation: r vs. r-squared

52m 52s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

R-squared

0:44

What is the Meaning of It? Why Squared?

0:45

Parsing Sum of Squares (Parsing Variability)

2:25

SST = SSR + SSE

2:26

What is SST and SSE?

7:46

What is SST and SSE?

7:47

r-squared

18:33

Coefficient of Determination

18:34

If the Correlation is Strong…

20:25

If the Correlation is Strong…

20:26

If the Correlation is Weak…

22:36

If the Correlation is Weak…

22:37

Example 1: Find r-squared for this Set of Data

23:56

Example 2: What Does it Mean that the Simple Linear Regression is a 'Model' of Variance?

33:54

Example 3: Why Does r-squared Only Range from 0 to 1

37:29

Example 4: Find the r-squared for This Set of Data

39:55

Transformations of Data

27m 8s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Why Transform?

0:26

Why Transform?

0:27

Shape-preserving vs. Shape-changing Transformations

5:14

Shape-preserving = Linear Transformations

5:15

Shape-changing Transformations = Non-linear Transformations

6:20

Common Shape-Preserving Transformations

7:08

Common Shape-Preserving Transformations

7:09

Common Shape-Changing Transformations

8:59

Powers

9:00

Logarithms

9:39

Change Just One Variable? Both?

10:38

Log-log Transformations

10:39

Log Transformations

14:38

Example 1: Create, Graph, and Transform the Data Set

15:19

Example 2: Create, Graph, and Transform the Data Set

20:08

Example 3: What Kind of Model would You Choose for this Data?

22:44

Example 4: Transformation of Data

25:46

Section 6: Collecting Data in an Experiment

Sampling & Bias

54m 44s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Descriptive vs. Inferential Statistics

1:04

Descriptive Statistics: Data Exploration

1:05

Example

2:03

To tackle Generalization…

4:31

Generalization

4:32

Sampling

6:06

'Good' Sample

6:40

Defining Samples and Populations

8:55

Population

8:56

Sample

11:16

Why Use Sampling?

13:09

Why Use Sampling?

13:10

Goal of Sampling: Avoiding Bias

15:04

What is Bias?

15:05

Where does Bias Come from: Sampling Bias

17:53

Where does Bias Come from: Response Bias

18:27

Sampling Bias: Bias from 'Bad' Sampling Methods

19:34

Size Bias

19:35

Voluntary Response Bias

21:13

Convenience Sample

22:22

Judgment Sample

23:58

Inadequate Sample Frame

25:40

Response Bias: Bias from 'Bad' Data Collection Methods

28:00

Nonresponse Bias

29:31

Questionnaire Bias

31:10

Incorrect Response or Measurement Bias

37:32

Example 1: What Kind of Biases?

40:29

Example 2: What Biases Might Arise?

44:46

Example 3: What Kind of Biases?

48:34

Example 4: What Kind of Biases?

51:43

Sampling Methods

14m 25s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Biased vs. Unbiased Sampling Methods

0:32

Biased Sampling

0:33

Unbiased Sampling

1:13

Probability Sampling Methods

2:31

Simple Random

2:54

Stratified Random Sampling

4:06

Cluster Sampling

5:24

Two-stage Sampling

6:22

Systematic Sampling

7:25

Example 1: Which Type(s) of Sampling was this?

8:33

Example 2: Describe How to Take a Two-Stage Sample from this Book

10:16

Example 3: Sampling Methods

11:58

Example 4: Cluster Sample Plan

12:48

Research Design

53m 54s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Descriptive vs. Inferential Statistics

0:51

Descriptive Statistics: Data Exploration

0:52

Inferential Statistics

1:02

Variables and Relationships

1:44

Variables

1:45

Relationships

2:49

Not Every Type of Study is an Experiment…

4:16

Category I - Descriptive Study

4:54

Category II - Correlational Study

5:50

Category III - Experimental, Quasi-experimental, Non-experimental

6:33

Category III

7:42

Experimental, Quasi-experimental, and Non-experimental

7:43

Why CAN'T the Other Strategies Determine Causation?

10:18

Third-variable Problem

10:19

Directionality Problem

15:49

What Makes Experiments Special?

17:54

Manipulation

17:55

Control (and Comparison)

21:58

Methods of Control

26:38

Holding Constant

26:39

Matching

29:11

Random Assignment

31:48

Experiment Terminology

34:09

'True' Experiment vs. Study

34:10

Independent Variable (IV)

35:16

Dependent Variable (DV)

35:45

Factors

36:07

Treatment Conditions

36:23

Levels

37:43

Confounds or Extraneous Variables

38:04

Blind

38:38

Blind Experiments

38:39

Double-blind Experiments

39:29

How Categories Relate to Statistics

41:35

Category I - Descriptive Study

41:36

Category II - Correlational Study

42:05

Category III - Experimental, Quasi-experimental, Non-experimental

42:43

Example 1: Research Design

43:50

Example 2: Research Design

47:37

Example 3: Research Design

50:12

Example 4: Research Design

52:00

Between and Within Treatment Variability

41m 31s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Experimental Designs

0:51

Experimental Designs: Manipulation & Control

0:52

Two Types of Variability

2:09

Between Treatment Variability

2:10

Within Treatment Variability

3:31

Updated Goal of Experimental Design

5:47

Updated Goal of Experimental Design

5:48

Example: Drugs and Driving

6:56

Example: Drugs and Driving

6:57

Different Types of Random Assignment

11:27

All Experiments

11:28

Completely Random Design

12:02

Randomized Block Design

13:19

Randomized Block Design

15:48

Matched Pairs Design

15:49

Repeated Measures Design

19:47

Between-subject Variable vs. Within-subject Variable

22:43

Completely Randomized Design

22:44

Repeated Measures Design

25:03

Example 1: Design a Completely Random, Matched Pair, and Repeated Measures Experiment

26:16

Example 2: Block Design

31:41

Example 3: Completely Randomized Designs

35:11

Example 4: Completely Random, Matched Pairs, or Repeated Measures Experiments?

39:01

Section 7: Review of Probability Axioms

Sample Spaces

37m 52s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

Why is Probability Involved in Statistics

0:48

Probability

0:49

Can People Tell the Difference between Cheap and Gourmet Coffee?

2:08

Taste Test with Coffee Drinkers

3:37

If No One can Actually Taste the Difference

3:38

If Everyone can Actually Taste the Difference

5:36

Creating a Probability Model

7:09

Creating a Probability Model

7:10

D'Alembert vs. Necker

9:41

D'Alembert vs. Necker

9:42

Problem with D'Alembert's Model

13:29

Problem with D'Alembert's Model

13:30

Covering Entire Sample Space

15:08

Fundamental Principle of Counting

15:09

Where Do Probabilities Come From?

22:54

Observed Data, Symmetry, and Subjective Estimates

22:55

Checking whether Model Matches Real World

24:27

Law of Large Numbers

24:28

Example 1: Law of Large Numbers

27:46

Example 2: Possible Outcomes

30:43

Example 3: Brands of Coffee and Taste

33:25

Example 4: How Many Different Treatments are there?

35:33

Addition Rule for Disjoint Events

20m 29s

Intro

0:00

Roadmap

0:08

Roadmap

0:09

Disjoint Events

0:41

Disjoint Events

0:42

Meaning of 'or'

2:39

In Regular Life

2:40

In Math/Statistics/Computer Science

3:10

Addition Rule for Disjoint Events

3:55

If A and B are Disjoint: P (A and B)

3:56

If A and B are Disjoint: P (A or B)

5:15

General Addition Rule

5:41

General Addition Rule

5:42

Generalized Addition Rule

8:31

If A and B are not Disjoint: P (A or B)

8:32

Example 1: Which of These are Mutually Exclusive?

10:50

Example 2: What is the Probability that You will Have a Combination of One Heads and Two Tails?

12:57

Example 3: Engagement Party

15:17

Example 4: Home Owner's Insurance

18:30

Conditional Probability

57m 19s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

'or' vs. 'and' vs. Conditional Probability

1:07

'or' vs. 'and' vs. Conditional Probability

1:08

'and' vs. Conditional Probability

5:57

P (M or L)

5:58

P (M and L)

8:41

P (M|L)

11:04

P (L|M)

12:24

Tree Diagram

15:02

Tree Diagram

15:03

Defining Conditional Probability

22:42

Defining Conditional Probability

22:43

Common Contexts for Conditional Probability

30:56

Medical Testing: Positive Predictive Value

30:57

Medical Testing: Sensitivity

33:03

Statistical Tests

34:27

Example 1: Drug and Disease

36:41

Example 2: Marbles and Conditional Probability

40:04

Example 3: Cards and Conditional Probability

45:59

Example 4: Votes and Conditional Probability

50:21

Independent Events

24m 27s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Independent Events & Conditional Probability

0:26

Non-independent Events

0:27

Independent Events

2:00

Non-independent and Independent Events

3:08

Non-independent and Independent Events

3:09

Defining Independent Events

5:52

Defining Independent Events

5:53

Multiplication Rule

7:29

Previously…

7:30

But with Independent Events

8:53

Example 1: Which of These Pairs of Events are Independent?

11:12

Example 2: Health Insurance and Probability

15:12

Example 3: Independent Events

17:42

Example 4: Independent Events

20:03

Section 8: Probability Distributions

Introduction to Probability Distributions

56m 45s

Intro

0:00

Roadmap

0:08

Roadmap

0:09

Sampling vs. Probability

0:57

Sampling

0:58

Missing

1:30

What is Missing?

3:06

Insight: Probability Distributions

5:26

Insight: Probability Distributions

5:27

What is a Probability Distribution?

7:29

From Sample Spaces to Probability Distributions

8:44

Sample Space

8:45

Probability Distribution of the Sum of Two Dice

11:16

The Random Variable

17:43

The Random Variable

17:44

Expected Value

21:52

Expected Value

21:53

Example 1: Probability Distributions

28:45

Example 2: Probability Distributions

35:30

Example 3: Probability Distributions

43:37

Example 4: Probability Distributions

47:20

Expected Value & Variance of Probability Distributions

53m 41s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Discrete vs. Continuous Random Variables

1:04

Discrete vs. Continuous Random Variables

1:05

Mean and Variance Review

4:44

Mean: Sample, Population, and Probability Distribution

4:45

Variance: Sample, Population, and Probability Distribution

9:12

Example Situation

14:10

Example Situation

14:11

Some Special Cases…

16:13

Some Special Cases…

16:14

Linear Transformations

19:22

Linear Transformations

19:23

What Happens to Mean and Variance of the Probability Distribution?

20:12

n Independent Values of X

25:38

n Independent Values of X

25:39

Compare These Two Situations

30:56

Compare These Two Situations

30:57

Two Random Variables, X and Y

32:02

Two Random Variables, X and Y

32:03

Example 1: Expected Value & Variance of Probability Distributions

35:35

Example 2: Expected Values & Standard Deviation

44:17

Example 3: Expected Winnings and Standard Deviation

48:18

Binomial Distribution

55m 15s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Discrete Probability Distributions

1:42

Discrete Probability Distributions

1:43

Binomial Distribution

2:36

Binomial Distribution

2:37

Multiplicative Rule Review

6:54

Multiplicative Rule Review

6:55

How Many Outcomes with k 'Successes'

10:23

Adults and Bachelor's Degree: Manual List of Outcomes

10:24

P (X=k)

19:37

Putting Together # of Outcomes with the Multiplicative Rule

19:38

Expected Value and Standard Deviation in a Binomial Distribution

25:22

Expected Value and Standard Deviation in a Binomial Distribution

25:23

Example 1: Coin Toss

33:42

Example 2: College Graduates

38:03

Example 3: Types of Blood and Probability

45:39

Example 4: Expected Number and Standard Deviation

51:11

Section 9: Sampling Distributions of Statistics

Introduction to Sampling Distributions

48m 17s

Intro

0:00

Roadmap

0:08

Roadmap

0:09

Probability Distributions vs. Sampling Distributions

0:55

Probability Distributions vs. Sampling Distributions

0:56

Same Logic

3:55

Logic of Probability Distribution

3:56

Example: Rolling Two Dice

6:56

Simulating Samples

9:53

To Come Up with Probability Distributions

9:54

In Sampling Distributions

11:12

Connecting Sampling and Research Methods with Sampling Distributions

12:11

Connecting Sampling and Research Methods with Sampling Distributions

12:12

Simulating a Sampling Distribution

14:14

Experimental Design: Regular Sleep vs. Less Sleep

14:15

Logic of Sampling Distributions

23:08

Logic of Sampling Distributions

23:09

General Method of Simulating Sampling Distributions

25:38

General Method of Simulating Sampling Distributions

25:39

Questions that Remain

28:45

Questions that Remain

28:46

Example 1: Mean and Standard Error of Sampling Distribution

30:57

Example 2: What is the Best Way to Describe Sampling Distributions?

37:12

Example 3: Matching Sampling Distributions

38:21

Example 4: Mean and Standard Error of Sampling Distribution

41:51

Sampling Distribution of the Mean

1h 8m 48s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Special Case of General Method for Simulating a Sampling Distribution

1:53

Special Case of General Method for Simulating a Sampling Distribution

1:54

Computer Simulation

3:43

Using Simulations to See Principles behind Shape of SDoM

15:50

Using Simulations to See Principles behind Shape of SDoM

15:51

Conditions

17:38

Using Simulations to See Principles behind Center (Mean) of SDoM

20:15

Using Simulations to See Principles behind Center (Mean) of SDoM

20:16

Conditions: Does n Matter?

21:31

Conditions: Does Number of Simulation Matter?

24:37

Using Simulations to See Principles behind Standard Deviation of SDoM

27:13

Using Simulations to See Principles behind Standard Deviation of SDoM

27:14

Conditions: Does n Matter?

34:45

Conditions: Does Number of Simulation Matter?

36:24

Central Limit Theorem

37:13

SHAPE

38:08

CENTER

39:34

SPREAD

39:52

Comparing Population, Sample, and SDoM

43:10

Comparing Population, Sample, and SDoM

43:11

Answering the 'Questions that Remain'

48:24

What Happens When We Don't Know What the Population Looks Like?

48:25

Can We Have Sampling Distributions for Summary Statistics Other than the Mean?

49:42

How Do We Know whether a Sample is Sufficiently Unlikely?

53:36

Do We Always Have to Simulate a Large Number of Samples in Order to get a Sampling Distribution?

54:40

Example 1: Mean Batting Average

55:25

Example 2: Mean Sampling Distribution and Standard Error

59:07

Example 3: Sampling Distribution of the Mean

1:01:04

Sampling Distribution of Sample Proportions

54m 37s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Intro to Sampling Distribution of Sample Proportions (SDoSP)

0:51

Categorical Data (Examples)

0:52

Wish to Estimate Proportion of Population from Sample…

2:00

Notation

3:34

Population Proportion and Sample Proportion Notations

3:35

What's the Difference?

9:19

SDoM vs. SDoSP: Type of Data

9:20

SDoM vs. SDoSP: Shape

11:24

SDoM vs. SDoSP: Center

12:30

SDoM vs. SDoSP: Spread

15:34

Binomial Distribution vs. Sampling Distribution of Sample Proportions

19:14

Binomial Distribution vs. SDoSP: Type of Data

19:17

Binomial Distribution vs. SDoSP: Shape

21:07

Binomial Distribution vs. SDoSP: Center

21:43

Binomial Distribution vs. SDoSP: Spread

24:08

Example 1: Sampling Distribution of Sample Proportions

26:07

Example 2: Sampling Distribution of Sample Proportions

37:58

Example 3: Sampling Distribution of Sample Proportions

44:42

Example 4: Sampling Distribution of Sample Proportions

45:57

Section 10: Inferential Statistics

Introduction to Confidence Intervals

42m 53s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Inferential Statistics

0:50

Inferential Statistics

0:51

Two Problems with This Picture…

3:20

Two Problems with This Picture…

3:21

Solution: Confidence Intervals (CI)

4:59

Solution: Hypothesis Testing (HT)

5:49

Which Parameters are Known?

6:45

Which Parameters are Known?

6:46

Confidence Interval - Goal

7:56

When We Don't Know μ but Know σ

7:57

When We Don't Know

18:27

When We Don't Know μ nor σ

18:28

Example 1: Confidence Intervals

26:18

Example 2: Confidence Intervals

29:46

Example 3: Confidence Intervals

32:18

Example 4: Confidence Intervals

38:31

t Distributions

1h 2m 6s

Intro

0:00

Roadmap

0:04

Roadmap

0:05

When to Use z vs. t?

1:07

When to Use z vs. t?

1:08

What is z and t?

3:02

z-score and t-score: Commonality

3:03

z-score and t-score: Formulas

3:34

z-score and t-score: Difference

5:22

Why not z? (Why t?)

7:24

Why not z? (Why t?)

7:25

But Don't Worry!

15:13

Gosset and t-distributions

15:14

Rules of t Distributions

17:05

t-distributions are More Normal as n Gets Bigger

17:06

t-distributions are a Family of Distributions

18:55

Degrees of Freedom (df)

20:02

Degrees of Freedom (df)

20:03

t Family of Distributions

24:07

t Family of Distributions: df = 2, 4, and 60

24:08

df = 60

29:16

df = 2

29:59

How to Find It?

31:01

'Student's t-distribution' or 't-distribution'

31:02

Excel Example

33:06

Example 1: Which Distribution Do You Use? Z or t?

45:26

Example 2: Friends on Facebook

47:41

Example 3: t Distributions

52:15

Example 4: t Distributions, Confidence Interval, and Mean

55:59

Introduction to Hypothesis Testing

1h 6m 33s

Intro

0:00

Roadmap

0:06

Roadmap

0:07

Issues to Overcome in Inferential Statistics

1:35

Issues to Overcome in Inferential Statistics

1:36

What Happens When We Don't Know What the Population Looks Like?

2:57

How Do We Know whether a Sample is Sufficiently Unlikely?

3:43

Hypothesizing a Population

6:44

Hypothesizing a Population

6:45

Null Hypothesis

8:07

Alternative Hypothesis

8:56

Hypotheses

11:58

Hypotheses

11:59

Errors in Hypothesis Testing

14:22

Errors in Hypothesis Testing

14:23

Steps of Hypothesis Testing

21:15

Steps of Hypothesis Testing

21:16

Single Sample HT (When Sigma Available)

26:08

Example: Average Facebook Friends

26:09

Step 1

27:08

Step 2

27:58

Step 3

28:17

Step 4

32:18

Single Sample HT (When Sigma Not Available)

36:33

Example: Average Facebook Friends

36:34

Step 1: Hypothesis Testing

36:58

Step 2: Significance Level

37:25

Step 3: Decision Stage

37:40

Step 4: Sample

41:36

Sigma and p-value

45:04

Sigma and p-value

45:05

One-Tailed vs. Two-Tailed Hypotheses

45:51

Example 1: Hypothesis Testing

48:37

Example 2: Heights of Women in the US

57:43

Example 3: Select the Best Way to Complete This Sentence

1:03:23

Confidence Intervals for the Difference of Two Independent Means

55m 14s

Intro

0:00

Roadmap

0:14

Roadmap

0:15

One Mean vs. Two Means

1:17

One Mean vs. Two Means

1:18

Notation

2:41

A Sample! A Set!

2:42

Mean of X, Mean of Y, and Difference of Two Means

3:56

SE of X

4:34

SE of Y

6:28

Sampling Distribution of the Difference between Two Means (SDoD)

7:48

Sampling Distribution of the Difference between Two Means (SDoD)

7:49

Rules of the SDoD (similar to CLT!)

15:00

Mean for the SDoD Null Hypothesis

15:01

Standard Error

17:39

When can We Construct a CI for the Difference between Two Means?

21:28

Three Conditions

21:29

Finding CI

23:56

One Mean CI

23:57

Two Means CI

25:45

Finding t

29:16

Finding t

29:17

Interpreting CI

30:25

Interpreting CI

30:26

Better Estimate of s (s pool)

34:15

Better Estimate of s (s pool)

34:16

Example 1: Confidence Intervals

42:32

Example 2: SE of the Difference

52:36

Hypothesis Testing for the Difference of Two Independent Means

50m

Intro

0:00

Roadmap

0:06

Roadmap

0:07

The Goal of Hypothesis Testing

0:56

One Sample and Two Samples

0:57

Sampling Distribution of the Difference between Two Means (SDoD)

3:42

Sampling Distribution of the Difference between Two Means (SDoD)

3:43

Rules of the SDoD (Similar to CLT!)

6:46

Shape

6:47

Mean for the Null Hypothesis

7:26

Standard Error for Independent Samples (When Variance is Homogenous)

8:18

Standard Error for Independent Samples (When Variance is not Homogenous)

9:25

Same Conditions for HT as for CI

10:08

Three Conditions

10:09

Steps of Hypothesis Testing

11:04

Steps of Hypothesis Testing

11:05

Formulas that Go with Steps of Hypothesis Testing

13:21

Step 1

13:25

Step 2

14:18

Step 3

15:00

Step 4

16:57

Example 1: Hypothesis Testing for the Difference of Two Independent Means

18:47

Example 2: Hypothesis Testing for the Difference of Two Independent Means

33:55

Example 3: Hypothesis Testing for the Difference of Two Independent Means

44:22

Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

1h 14m 11s

Intro

0:00

Roadmap

0:09

Roadmap

0:10

The Goal of Hypothesis Testing

1:27

One Sample and Two Samples

1:28

Independent Samples vs. Paired Samples

3:16

Independent Samples vs. Paired Samples

3:17

Which is Which?

5:20

Independent SAMPLES vs. Independent VARIABLES

7:43

Independent SAMPLES vs. Independent VARIABLES

7:44

T-tests Always…

10:48

T-tests Always…

10:49

Notation for Paired Samples

12:59

Notation for Paired Samples

13:00

Steps of Hypothesis Testing for Paired Samples

16:13

Steps of Hypothesis Testing for Paired Samples

16:14

Rules of the SDoD (Adding on Paired Samples)

18:03

Shape

18:04

Mean for the Null Hypothesis

18:31

Standard Error for Independent Samples (When Variance is Homogenous)

19:25

Standard Error for Paired Samples

20:39

Formulas that go with Steps of Hypothesis Testing

22:59

Formulas that go with Steps of Hypothesis Testing

23:00

Confidence Intervals for Paired Samples

30:32

Confidence Intervals for Paired Samples

30:33

Example 1: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

32:28

Example 2: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

44:02

Example 3: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

52:23

Type I and Type II Errors

31m 27s

Intro

0:00

Roadmap

0:18

Roadmap

0:19

Errors and Relationship to HT and the Sample Statistic?

1:11

Errors and Relationship to HT and the Sample Statistic?

1:12

Instead of a Box…Distributions!

7:00

One Sample t-test: Friends on Facebook

7:01

Two Sample t-test: Friends on Facebook

13:46

Usually, Lots of Overlap between Null and Alternative Distributions

16:59

Overlap between Null and Alternative Distributions

17:00

How Distributions and 'Box' Fit Together

22:45

How Distributions and 'Box' Fit Together

22:46

Example 1: Types of Errors

25:54

Example 2: Types of Errors

27:30

Example 3: What is the Danger of the Type I Error?

29:38

Effect Size & Power

44m 41s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Distance between Distributions: Sample t

0:49

Distance between Distributions: Sample t

0:50

Problem with Distance in Terms of Standard Error

2:56

Problem with Distance in Terms of Standard Error

2:57

Test Statistic (t) vs. Effect Size (d or g)

4:38

Test Statistic (t) vs. Effect Size (d or g)

4:39

Rules of Effect Size

6:09

Rules of Effect Size

6:10

Why Do We Need Effect Size?

8:21

Tells You the Practical Significance

8:22

HT can be Deceiving…

10:25

Important Note

10:42

What is Power?

11:20

What is Power?

11:21

Why Do We Need Power?

14:19

Conditional Probability and Power

14:20

Power is:

16:27

Can We Calculate Power?

19:00

Can We Calculate Power?

19:01

How Does Alpha Affect Power?

20:36

How Does Alpha Affect Power?

20:37

How Does Effect Size Affect Power?

25:38

How Does Effect Size Affect Power?

25:39

How Does Variability and Sample Size Affect Power?

27:56

How Does Variability and Sample Size Affect Power?

27:57

How Do We Increase Power?

32:47

Increasing Power

32:48

Example 1: Effect Size & Power

35:40

Example 2: Effect Size & Power

37:38

Example 3: Effect Size & Power

40:55

Section 11: Analysis of Variance

F-distributions

24m 46s

Intro

0:00

Roadmap

0:04

Roadmap

0:05

Z- & T-statistic and Their Distribution

0:34

Z- & T-statistic and Their Distribution

0:35

F-statistic

4:55

The F Ratio (the Variance Ratio)

4:56

F-distribution

12:29

F-distribution

12:30

s and p-value

15:00

s and p-value

15:01

Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity?

18:33

Example 2: F-distributions

19:29

Example 3: F-distributions and Heights

21:29

ANOVA with Independent Samples

1h 9m 25s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

The Limitations of t-tests

1:12

The Limitations of t-tests

1:13

Two Major Limitations of Many t-tests

3:26

Two Major Limitations of Many t-tests

3:27

Ronald Fisher's Solution… F-test! New Null Hypothesis

4:43

Ronald Fisher's Solution… F-test! New Null Hypothesis (Omnibus Test - One Test to Rule Them All!)

4:44

Analysis of Variance (ANoVA) Notation

7:47

Analysis of Variance (ANoVA) Notation

7:48

Partitioning (Analyzing) Variance

9:58

Total Variance

9:59

Within-group Variation

14:00

Between-group Variation

16:22

Time out: Review Variance & SS

17:05

Time out: Review Variance & SS

17:06

F-statistic

19:22

The F Ratio (the Variance Ratio)

19:23

S²bet = SSbet / dfbet

22:13

What is This?

22:14

How Many Means?

23:20

So What is the dfbet?

23:38

So What is SSbet?

24:15

S²w = SSw / dfw

26:05

What is This?

26:06

How Many Means?

27:20

So What is the dfw?

27:36

So What is SSw?

28:18

Chart of Independent Samples ANOVA

29:25

Chart of Independent Samples ANOVA

29:26

Example 1: Who Uploads More Photos: Unknown Ethnicity, Latino, Asian, Black, or White Facebook Users?

35:52

Hypotheses

35:53

Significance Level

39:40

Decision Stage

40:05

Calculate Samples' Statistic and p-Value

44:10

Reject or Fail to Reject H0

55:54

Example 2: ANOVA with Independent Samples

58:21

Repeated Measures ANOVA

1h 15m 13s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

The Limitations of t-tests

0:36

Who Uploads more Pictures and Which Photo-Type is Most Frequently Used on Facebook?

0:37

ANOVA (F-test) to the Rescue!

5:49

Omnibus Hypothesis

5:50

Analyze Variance

7:27

Independent Samples vs. Repeated Measures

9:12

Same Start

9:13

Independent Samples ANOVA

10:43

Repeated Measures ANOVA

12:00

Independent Samples ANOVA

16:00

Same Start: All the Variance Around Grand Mean

16:01

Independent Samples

16:23

Repeated Measures ANOVA

18:18

Same Start: All the Variance Around Grand Mean

18:19

Repeated Measures

18:33

Repeated Measures F-statistic

21:22

The F Ratio (The Variance Ratio)

21:23

S²bet = SSbet / dfbet

23:07

What is This?

23:08

How Many Means?

23:39

So What is the dfbet?

23:54

So What is SSbet?

24:32

S² resid = SS resid / df resid

25:46

What is This?

25:47

So What is SS resid?

26:44

So What is the df resid?

27:36

SS subj and df subj

28:11

What is This?

28:12

How Many Subject Means?

29:43

So What is df subj?

30:01

So What is SS subj?

30:09

SS total and df total

31:42

What is This?

31:43

What is the Total Number of Data Points?

32:02

So What is df total?

32:34

So What is SS total?

32:47

Chart of Repeated Measures ANOVA

33:19

Chart of Repeated Measures ANOVA: F and Between-samples Variability

33:20

Chart of Repeated Measures ANOVA: Total Variability, Within-subject (case) Variability, Residual Variability

35:50

Example 1: Which is More Prevalent on Facebook: Tagged, Uploaded, Mobile, or Profile Photos?

40:25

Hypotheses

40:26

Significance Level

41:46

Decision Stage

42:09

Calculate Samples' Statistic and p-Value

46:18

Reject or Fail to Reject H0

57:55

Example 2: Repeated Measures ANOVA

58:57

Example 3: What's the Problem with a Bunch of Tiny t-tests?

1:13:59

Section 12: Chi-square Test

Chi-Square Goodness-of-Fit Test

58m 23s

Intro

0:00

Roadmap

0:05

Roadmap

0:06

Where Does the Chi-Square Test Belong?

0:50

Where Does the Chi-Square Test Belong?

0:51

A New Twist on HT: Goodness-of-Fit

7:23

HT in General

7:24

Goodness-of-Fit HT

8:26

Hypotheses about Proportions

12:17

Null Hypothesis

12:18

Alternative Hypothesis

13:23

Example

14:38

Chi-Square Statistic

17:52

Chi-Square Statistic

17:53

Chi-Square Distributions

24:31

Chi-Square Distributions

24:32

Conditions for Chi-Square

28:58

Condition 1

28:59

Condition 2

30:20

Condition 3

30:32

Condition 4

31:47

Example 1: Chi-Square Goodness-of-Fit Test

32:23

Example 2: Chi-Square Goodness-of-Fit Test

44:34

Example 3: Which of These Statements Describe Properties of the Chi-Square Goodness-of-Fit Test?

56:06

Chi-Square Test of Homogeneity

51m 36s

Intro

0:00

Roadmap

0:09

Roadmap

0:10

Goodness-of-Fit vs. Homogeneity

1:13

Goodness-of-Fit HT

1:14

Homogeneity

2:00

Analogy

2:38

Hypotheses About Proportions

5:00

Null Hypothesis

5:01

Alternative Hypothesis

6:11

Example

6:33

Chi-Square Statistic

10:12

Same as Goodness-of-Fit Test

10:13

Set Up Data

12:28

Setting Up Data Example

12:29

Expected Frequency

16:53

Expected Frequency

16:54

Chi-Square Distributions & df

19:26

Chi-Square Distributions & df

19:27

Conditions for Test of Homogeneity

20:54

Condition 1

20:55

Condition 2

21:39

Condition 3

22:05

Condition 4

22:23

Example 1: Chi-Square Test of Homogeneity

22:52

Example 2: Chi-Square Test of Homogeneity

32:10

Section 13: Overview of Statistics

Overview of Statistics

18m 11s

Intro

0:00

Roadmap

0:07

Roadmap

0:08

The Statistical Tests (HT) We've Covered

0:28

The Statistical Tests (HT) We've Covered

0:29

Organizing the Tests We've Covered…

1:08

One Sample: Continuous DV and Categorical DV

1:09

Two Samples: Continuous DV and Categorical DV

5:41

More Than Two Samples: Continuous DV and Categorical DV

8:21

The Following Data: OK Cupid

10:10

The Following Data: OK Cupid

10:11

Example 1: Weird-MySpace-Angle Profile Photo

10:38

Example 2: Geniuses

12:30

Example 3: Promiscuous iPhone Users

13:37

Example 4: Women, Aging, and Messaging

16:07


Statistics Least Squares Regression


Section 5: Linear Regression: Lecture 3 | 56:36 min


Lecture Comments (5)

1 answer

Last reply by: Professor Son
Wed Nov 5, 2014 12:47 PM

Post by IBRAHIM FORNA on November 5, 2014

HOW DO YOU HAVE THE 14.75

0 answers

Post by Professor Son on October 10, 2014

Sorry everyone but the table in example #2 should read 17.50, 18, and 20 dollars. It's correct in the excel file but not on the slide!

1 answer

Last reply by: Professor Son
Fri Oct 10, 2014 1:39 PM

Post by Kambiz Khosrowshahi on April 1, 2013

To find the intercept, how did you come up with y=0?


Least Squares Regression


Intro 0:00
Roadmap 0:13

Roadmap

Best Fit 0:47

Best Fit

Sum of Squared Errors (SSE) 1:50

Sum of Squared Errors (SSE)

Why Squared? 3:38

Why Squared?

Quantitative Properties of Regression Line 4:51

Quantitative Properties of Regression Line

So How do we Find Such a Line? 6:49

SSEs of Different Line Equations & Lowest SSE
Carl Gauss' Method

How Do We Find Slope (b1) 11:00

How Do We Find Slope (b1)

How Do We Find Intercept 15:11

How Do We Find Intercept

Example 1: Which of These Equations Fit the Above Data Best? 17:18
Example 2: Find the Regression Line for These Data Points and Interpret It 26:31
Example 3: Summarize the Scatterplot and Find the Regression Line. 34:31
Example 4: Examine the Mean of Residuals 43:52

General Statistics Online Course

Section 1: Introduction
	Descriptive Statistics vs. Inferential Statistics	25:31
Section 2: About Samples: Cases, Variables, Measurements
	About Samples: Cases, Variables, Measurements	32:14
Section 3: Visualizing Distributions
	Introduction to Excel	8:09
	Frequency Distributions in Excel	39:10
	Frequency Distributions and Features	25:29
	Dotplots and Histograms in Excel	42:42
	Stemplots	12:23
	Bar Graphs	22:49
Section 4: Summarizing Distributions
	Central Tendency: Mean, Median, Mode	38:50
	Variability	42:40
	Five Number Summary & Boxplots	57:15
	Shape: Calculating Skewness & Kurtosis	41:51
	Normal Distribution	34:33
	Standard Normal Distributions & Z-Scores	41:44
	Normal Distribution: PDF vs. CDF	55:44
Section 5: Linear Regression
	Scatterplots	47:19
	Regression	32:02
	Least Squares Regression	56:36
	Correlation	43:58
	Correlation: r vs. r-squared	52:52
	Transformations of Data	27:08
Section 6: Collecting Data in an Experiment
	Sampling & Bias	54:44
	Sampling Methods	14:25
	Research Design	53:54
	Between and Within Treatment Variability	41:31
Section 7: Review of Probability Axioms
	Sample Spaces	37:52
	Addition Rule for Disjoint Events	20:29
	Conditional Probability	57:19
	Independent Events	24:27
Section 8: Probability Distributions
	Introduction to Probability Distributions	56:45
	Expected Value & Variance of Probability Distributions	53:41
	Binomial Distribution	55:15
Section 9: Sampling Distributions of Statistics
	Introduction to Sampling Distributions	48:17
	Sampling Distribution of the Mean	1:08:48
	Sampling Distribution of Sample Proportions	54:37
Section 10: Inferential Statistics
	Introduction to Confidence Intervals	42:53
	t Distributions	1:02:06
	Introduction to Hypothesis Testing	1:06:33
	Confidence Intervals for the Difference of Two Independent Means	55:14
	Hypothesis Testing for the Difference of Two Independent Means	50:00
	Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means	1:14:11
	Type I and Type II Errors	31:27
	Effect Size & Power	44:41
Section 11: Analysis of Variance
	F-distributions	24:46
	ANOVA with Independent Samples	1:09:25
	Repeated Measures ANOVA	1:15:13
Section 12: Chi-square Test
	Chi-Square Goodness-of-Fit Test	58:23
	Chi-Square Test of Homogeneity	51:36
Section 13: Overview of Statistics
	Overview of Statistics	18:11

Transcription: Least Squares Regression

Hi and welcome to www.educator.com.

In the previous lesson we learned, conceptually, the idea of regression.

In this lesson on least squares regression, we are going to talk about how to actually calculate a regression line.

Here is the roadmap: we are going to talk about what it means for a line to best fit the data.

We are going to talk about the sum of squared errors and why that concept is important for regression.

We are going to talk about some quantitative properties of the regression line.

We know conceptually what it means, but once we do have a regression line there are some rules that the regression line conforms to.

Finally, we are going to talk about how to actually find the slope and the intercept of the regression line.

What does it mean to best fit the data?

Well, you can think about it like this: there are any number of lines that you could draw through a set of data.

We could draw that one, we could draw this one, we could draw this one, we could draw that one.

There are an infinite number of possible lines, but our goal is a regression line that is in the middle of all of these data points.

When it is in the middle, that is what we mean by the best-fitting line.

You can think of "fit" as roughly equivalent to the concept of "in the middle," and the difference between all of these lines

and the true regression line is that the best-fitting line is roughly in the middle.

How do we find the best-fitting line?

Quantitatively, to best fit the data means that this line has the lowest sum of squared errors.

Because of that, the regression line is also called the least squares line.

That is why it is called the least squares method.

Even though "in the middle" and "best fit" are good conceptual ideas, they are not quantitative ideas.

This is the quantitative definition of what the best-fitting line is.

Let us talk about what error is.

We have a particular word for error, and that word is residual.

The residual is the difference between the actual y and the predicted y from our best-fitting line.

Having the lowest sum of squared errors, the lowest SSE, really means having the lowest sum of all the squared residuals.

That is the sum of the residuals squared, and another way to write that is the sum of (y – y hat)².

This is our quantitative measure of how good our line is.
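The SSE definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sum_squared_errors`, the data points, and the two candidate lines are made up for the example and do not come from the lecture.

```python
def sum_squared_errors(xs, ys, intercept, slope):
    """Return the SSE of the candidate line y_hat = intercept + slope * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = intercept + slope * x   # predicted y from the candidate line
        residual = y - y_hat            # error for this one data point
        total += residual ** 2          # squared, so signs cannot cancel
    return total


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

sse_a = sum_squared_errors(xs, ys, intercept=1.5, slope=0.8)
sse_b = sum_squared_errors(xs, ys, intercept=0.0, slope=1.0)

# The line with the smaller SSE fits these points better.
print("SSE of line A:", sse_a)
print("SSE of line B:", sse_b)
```

Comparing the two printed values is the same comparison the lecture makes by hand in Example 1.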

Now there is a bit of a Catch-22 here.

We have to have the line before we can figure out whether it has the lowest SSE, but the question is how do we find that line?

First, before we go on, let us talk about why we need to square these residuals.

Remember when we talked about what it means to be in the middle?

It means that the distances on the positive side, the points above the line, and the distances on the negative side, the points below the line,

should all balance out.

If you have a bunch of positives and a bunch of negatives and you add them together, you should get 0.

Here is the tricky part: the sum of the residuals, y – y hat.

The sum of the residuals, period, should be roughly equal to 0 for the best-fitting line.

Because of that we want to square these distances.

I will write the squared in red because that means that this value, the sum of squared errors, should come out greater than 0 instead of canceling out.

We definitely want to square it.

There are also other mathematical properties of squaring that we will be able to take advantage of in the future.

We know what it means, quantitatively, to be the regression line.

It means having the lowest sum of squared errors, but there are other quantitative properties that come along with it.

One important property to note is that the regression line also contains the point of averages.

The average of all your x and the average of all your y.

The average of variable 1 and the average of variable 2.

This point is often also called the center of mass.

It is really easy to find this point: you just take the average of your x and take the average of your y.

(x bar, y bar) is your point of averages.

You can also think of it as the center of mass because if we think of all the points in the scatterplot as one object, this is the center of that mass.

We already know that this line has the lowest SSE of any line, and it also contains the point of averages.

The sum of the residuals, when you do not square them, should be approximately 0.

And because the sum of the residuals is 0, the mean of the residuals is also 0, because the mean is the sum divided by the number of points.

If the sum is 0 the mean will be 0, and the variation of the residuals is as small as possible.

It is smaller than for other lines.

One way to quantify variation is something like standard deviation.

The residuals have a smaller standard deviation around this line than around any other line.

Those are very important quantitative properties that we need to know.
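These properties can be checked numerically. The sketch below uses hypothetical data (not from the lecture) and computes the slope and intercept with the formulas the lecture derives in the sections that follow; the comparison line y = x is an arbitrary choice used only for contrast.

```python
import statistics

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept (formulas derived later in the lecture).
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Property 1: the line passes through the point of averages.
passes_through_mean = abs((intercept + slope * x_bar) - y_bar) < 1e-9

# Property 2: the residuals sum (and therefore average) to about 0.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)
residual_mean = residual_sum / n

# Property 3: the residuals vary less than around another line (here y = x).
other_residuals = [y - x for x, y in zip(xs, ys)]
ls_sd = statistics.pstdev(residuals)
other_sd = statistics.pstdev(other_residuals)

print(passes_through_mean, residual_sum, residual_mean, ls_sd, other_sd)
```

Floating-point arithmetic can leave the residual sum a hair away from 0, which is why the checks use a small tolerance instead of exact equality.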

This sounds like a wonderful, magical line.

How do we find such a line?

You might be thinking that this is sounding pretty hard, and maybe we have to find the SSE

for a whole bunch of different line equations and find the one with the lowest SSE.

That is actually problematic.

It is a good idea.

It is a good conceptual idea but it is problematic, and here is why.

There are an infinite number of lines.

You can just change the y-intercept by .0001 and get a totally different line.

You can change the slope by a tiny, tiny amount and get a totally different line.

There is an infinite number of lines that we would have to test.

An infinite number of potential lines.

We cannot find the SSE of an infinite number of lines.

That is just not an option for us.

Thanks go to our hero Carl Gauss; he was a German mathematician, and he helped us out a lot in statistics.

Carl Gauss invented this method called the method of least squares, and through his method we can easily find the slope.

Here is how we do it.

The slope is going to be a ratio.

Slopes are always ratios.

Rise over run, or the ratio of the change in y over the change in x.

This method has a similar idea to it.

Here is how Carl Gauss finds the slope.

Remember, the slope is not b sub 0; that is the intercept.

It is b sub 1.

In Gauss's method you take the sum over all your x deviations, the deviation between each x and the mean of x,

(x – x bar), each multiplied by the matching deviation of y, (y – y bar).

Notice that we are not using predicted values such as y hat, because we do not have the line yet, but we do have the center of mass.

We are using that.

We are finding the deviations from the center of mass.

And that sum goes over the sum of (x – x bar)².

You can sort of think about this as the x-and-y variation over the x variation squared.

When you think of rise over run you think of y over x, and you will see that here.

There is the change in y and there are the changes in x.

There are two changes in x and here is the change in y.

This method will give you the slope of your regression line.

And just as a review, remember that when we have this x and y here, we really mean x sub i and y sub i.

i goes from 1 all the way up to n,

however many data points we have in our sample.

It often goes without saying, but we want to do this for every single data point that you have.

That is Carl Gauss's method.

In order to find the slope we need to use that formula.

It is the deviations of x multiplied by the deviations of y, all added up, over the deviations of x².

That is, over the sum of the deviations of x².
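Written out in symbols, the slope formula described above is the following (the index i runs over the n data points, as the lecture notes):

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The numerator pairs each x deviation with its own y deviation, and the denominator squares the x deviations, which is the "rise over run" reading the lecture gives.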

Let us actually do a little example here.

Suppose we had a whole bunch of x and a whole bunch of y; I just put a few here.

x = 1, y = 1.

The first point is (1, 1), the second point is (0, 0), and the third point is (-1, -1).

A very easy line.

We already know that the line equation should be something like y = x.

Let us see if we can use Carl Gauss's method to find the slope, and let us lay out where we are going.

We need the deviations of x and the deviations of y, then the sum of the deviations of x times the deviations of y, and the ratio of that sum to the sum of the deviations of x².

For this we have to find x bar and y bar, and we can easily tell here that the x values add up to 0, so their average is 0, and the y values add up to 0 as well.

We already know x bar and y bar.

In order to do this I'm going to have to find x – x bar.

I'm going to draw this in a different color to make it easier.

x – x bar and y – y bar.

Not only that, I'm going to need more.

I need to know (x – x bar) × (y – y bar), and I'm going to need to know (x – x bar)².

Let me draw some lines here.

Let us get started; because my x bar and y bar are 0 and 0, this is easy for me.

Let us find the x deviation × the y deviation.

I will just multiply across: that is 1, then 0, and then 1.

Next is (x – x bar)².

This is the x deviation squared.

That is 1, 0, and 1.

I need to find the sum here and take that and put it over this one.

This sum is 2 and this sum is 2, and I'm going to put that in here: b1 = 2/2, which is 1.

We found our slope.

Our slope is just 1.

Since I already knew that the regression line here should be y = x, this checks out: y = 1 × x, which is y = x.
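The table the lecture fills in by hand can be reproduced with a short script. This is a sketch of the same arithmetic on the three points from the example; the variable names are chosen here for illustration.

```python
# The three points from the worked example.
points = [(1, 1), (0, 0), (-1, -1)]
n = len(points)

x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

rows = []
for x, y in points:
    dx = x - x_bar                            # deviation of x
    dy = y - y_bar                            # deviation of y
    rows.append((dx, dy, dx * dy, dx ** 2))   # one row of the table

sum_products = sum(row[2] for row in rows)    # sum of (x - x bar)(y - y bar)
sum_sq_dx = sum(row[3] for row in rows)       # sum of (x - x bar)^2
b1 = sum_products / sum_sq_dx                 # slope

for row in rows:
    print(row)
print("slope b1 =", b1)
```

Both sums come out to 2, so the slope is 2/2 = 1, matching the result in the lecture.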

Now we know how to find the slope, but how do we find the intercept once we have our slope?

Let us use our previous example, where b1 = 1.

We also already know one point that falls on our regression line.

(x bar, y bar), which is (0, 0).

If we know all of those things we can find our intercept just by plugging them in.

Our equation for a line in statistics is y = b naught + b sub 1 × x.

All we have to do is plug in our numbers and substitute in order to find b naught.

That is what we are looking for.

Here is our example y, which is y bar, 0.

So 0 = b naught, or b sub 0, + 1 × 0.

Here I will get b sub 0 = 0.

This is definitely easy.

This is just finding our missing value by having an example x and y and having the slope.

We could just derive this in general so that in the future we will know exactly what to plug in.

Instead of trying to solve for y we could just flip these things around in order to solve for b sub 0.

All we have to do is move this over to that side, so b sub 0 = y – b sub 1 × x, where x and y come from a point we know is on the line, the point of averages.

That is how to find b sub 0, the y-intercept.
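As a minimal sketch, the rearranged equation can be wrapped in a helper that plugs in the point of averages. The function name `intercept_from_slope` is hypothetical and used only for illustration.

```python
def intercept_from_slope(xs, ys, slope):
    """Solve y_bar = b0 + slope * x_bar for b0, using the point of averages."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return y_bar - slope * x_bar


# The worked example from the lecture: points (1, 1), (0, 0), (-1, -1), slope 1.
b0 = intercept_from_slope([1, 0, -1], [1, 0, -1], slope=1)
print("intercept b0 =", b0)
```

Using (x bar, y bar) works because the regression line always contains the point of averages, which is where the y = 0 in the worked example comes from.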

Let us do some more examples.

Here is example 1.

Pretend that these are 3 different kinds of pizzas.

Let us say this is the medium size pizza.

Let us say that this is the giant size pizza.

The first has 100 grams of fat per pizza and the cost is $17.50.

The second, let us say, has 110 grams of fat per pizza and the cost is $18.00.

The third pizza has 120 grams of fat and the cost is $20.00.

Maybe we have a feeling that more fat goes along with better taste, or with a higher cost.

The question is which of the following equations fits this data the best.

In order to solve this problem we have to find the sum of squared errors for each of these equations.

We are not sure if any of these equations is the regression line.

We are just trying to find the best equation out of the 3 that we have.

Which of these fits the above data best? Which equation has the lowest error, or sum of squared errors?

You can pull up the examples provided for you in the Excel file and click on Example 1; the data are already right here.

Here are our 3 pizzas with their fat as well as their cost.

It seems, just from eyeballing it, that there is a positive trend.

As fat goes up the cost goes up.

Let us go ahead and try the first equation that we are given.

The equation is y = 4.75 + .1x; 4.75 is the intercept and .1 is the slope.

I separated that out into the intercept as well as the slope because we are going to need those numbers.

Here is the fat, here is the cost.

Let us find the predicted cost, or y hat.

In order to find y hat all we have to do is plug our x into our y equation.

That would be this value, 4.75, and that will not change.

I'm going to lock it in place and add that to b1 × x.

b1 is not going to change either, so I'm going to lock that in place.

We do want the x cell (B12) to keep changing.

I'm going to take that predicted cost and I'm just copying and pasting.

These predicted costs are all a little bit less than the actual costs.

Here are what our residuals are going to be.

The residuals are the actual cost – the predicted cost.

All of our residuals are going to be positive.

That is the case where all of our actual data are above our prediction line, and because of that we know that this is not a great regression line.

Still, maybe it has the smallest SSE of the three.

We have our residuals, and what I'm going to do is take each residual and square it.

I find all my squared residuals, and then to get the sum of squared residuals I just add them all up, and I get 23.1875 as my sum of squared errors.

Who knows, maybe that is the lowest one; we will see.

Here I put in the data for the next equation.

It is y = 8 + .025x.

I separated it out into the intercept versus the slope; let us find the sum of squared errors.

To find the predicted cost I take my intercept, lock it in place, and add that to my slope × x.

I'm going to lock my slope in place as well.

And so right now we are a little bit low, still low, still really low.

Because these predicted costs are further off than the previous predicted costs, I'm going to guess the sum of squared errors is going to be considerably larger.

Let us find the residual.

The residual is the data minus the predicted.

The data minus the predicted, and then all I do is square each residual and sum them all up;

as expected it is larger, because all of these predicted costs were further off than the previous predicted costs.

The first equation is much better than this second equation.
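The spreadsheet steps for the first two equations can be mirrored in Python. This sketch uses the pizza data from the example and the two equations stated so far, y = 4.75 + .1x and y = 8 + .025x; the helper name `sse_for_line` is made up for illustration.

```python
# Pizza data from Example 1: grams of fat and cost in dollars.
fat = [100, 110, 120]
cost = [17.50, 18.00, 20.00]


def sse_for_line(xs, ys, intercept, slope):
    """Sum of squared residuals (actual minus predicted) for one candidate line."""
    predicted = [intercept + slope * x for x in xs]       # the y hat column
    residuals = [y - p for y, p in zip(ys, predicted)]    # actual - predicted
    return sum(r ** 2 for r in residuals)


sse_eq1 = sse_for_line(fat, cost, intercept=4.75, slope=0.1)
sse_eq2 = sse_for_line(fat, cost, intercept=8.0, slope=0.025)

print("Equation 1 SSE:", round(sse_eq1, 4))
print("Equation 2 SSE:", round(sse_eq2, 4))
```

The first value agrees with the 23.1875 computed in the lecture, and the second is the roughly 182 that the lecture compares it against.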

Now let us test out the third one.

Hopefully you did not peek at the answers; let us see what the predicted costs look like.

We want to add our intercept, locked in place, to our slope, locked in place, × the x.

Excel will automatically do the order of operations, so I do not have to put parentheses around the multiplication to make it happen first.

Let us see: this is actually close to the costs.

This one is off by only a little, just below.

Let us see the next one.

This one is off in the opposite direction.

It is off in the negative direction.

This one is off in the negative direction.

This seems like a pretty good prediction, where we are getting pretty close to the cost.

Let us find out what the residual is.

Here we should have a mix of residuals.

Some are positive and some are negative.

So cost – the predicted.

We have 2 positive ones and one negative one, and they balance each other out quite nicely because the positive ones are smaller,

but the negative one is a little bit bigger.

Let us square this.

Here if we sum that up we get .375, and that is a considerably smaller error than 23 and 182.

I can say that the third equation is the best-fitting line.

This one is the best one.

Here is example 2.1593

Now it give us the same data and x find the regression line for these data point and them interpret it.1595

If we go back to our Excel file and click on example 2 then you will see the data here for you.1605

First thing we probably want to do is figure out all the different things we would like to get1614

and I’m just going to use a little bit of a shorthand instead of writing x – x bar.1623

I’m going to write deviations of x.1633

The deviations of x and I'm also going to need deviations of y and then I'm going to need to multiply the deviations of x × the deviations of y.1635

I’m also going to find deviations of x².1649

These are the four things I need.1655

In order to get these, I need X bar.1657

Here I’m going to put averages and you need to find X bar and y bar.1666

And that is right here.1673

Here I’m going to put average and find my X bar which is that and also just copy and paste that over to find y bar, the average cost.1675

My point of averages is 110 and 18.5.1688

Let us find all the deviations of x in order to find slope.1695

The deviation of x is x – my x bar.1701

Here I’m going to lock my X bar in place and then I can just copy and paste all the way down.1708

Let us also find the deviations of y which is costs minus the average cost.1731

And then I could just copy and paste that all the way down as well.1745

Notice that my deviations of x and deviations of y are helping us toward that idea of lowering1748

the residuals, because if you look at all the deviations of x, they are very balanced.1757

The ones on one side of the average balance out the ones on the other side.1766

By definition that is what average means, and the same goes for my deviations of y: some of them are on the negative side1772

and some of them are on the positive side, and they balance one another out.1779

Now let us multiply the deviations of x by the deviations of y, and notice that I am doing this for every data point.1784

Here I know I need to find sum.1793

I will sum them here.1799

That is my sum.1809

Let me actually color these different colors so that we do not get confused.1811

Let us also find our deviations of x² and let us find the sum of those.1817

Here are the two sums that we need in order to find b sub 1.1828

Finding b sub 1 we need to find the ratio between this and that.1839

Our b sub 1 equals .125.1850

Now that we know b sub 1 we can easily find the b sub 0.1855

Let me color these a different color, and remember the formula for b sub 0 is just y bar – b sub 1 × x bar.1862

I already have an x bar and y bar, my point of averages.1875

I forgot to put the equal sign.1882

y bar – b sub 1 × x bar, and I get 4.75.1884

In order to find my equation for the line all we do is take the two values and put them into my actual line equation.1901

In order to find my predicted y I would take 4.75 and add that to .125 × x.1911

That is my regression line for this set of data.1924
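The spreadsheet steps above (deviations, their products, the two sums, then b sub 1 and b sub 0) can be sketched in Python. The data points are an assumption reconstructed from the results quoted in the lecture, since the actual cells are not shown in the transcript; the variable names are hypothetical.

```python
# Assumed data (grams of fat, cost in dollars), reconstructed from the
# lecture's quoted point of averages, slope, and intercept.
fat = [100, 110, 120]
cost = [17.5, 18.0, 20.0]

# Point of averages
x_bar = sum(fat) / len(fat)
y_bar = sum(cost) / len(cost)

# Deviations of x and deviations of y
x_dev = [x - x_bar for x in fat]
y_dev = [y - y_bar for y in cost]

# The two sums needed for the slope
sum_xy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
sum_xx = sum(dx ** 2 for dx in x_dev)

b1 = sum_xy / sum_xx       # slope
b0 = y_bar - b1 * x_bar    # intercept, using the point of averages

print(x_bar, y_bar, b1, b0)
```

With these assumed points the sketch reproduces the lecture's values: point of averages 110 and 18.5, slope .125, intercept 4.75.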

In the previous example this was actually choice c.1931

It actually happened to be the regression line as well.1934

Here is the kicker though we need to interpret this.1940

It is not good enough for us to just have this, we need to know what this means.1945

Since y here is cost, we interpret everything in terms of cost.1952

You can think of the Y intercept as a base cost.1960

4.75 seems to be the base cost for these pizzas, and then for every gram of fat you add 12 ½ cents.1965

If you have 1 g of fat presumably, then you would just add 12 ½ cents to this pizza and perhaps that pizza would taste very good.1977

It would be probably a lot healthier for you.1987

If you add 100 grams of fat, and each of those grams of fat is worth .125, then you have to multiply those together and add that to your base cost.1991

In some ways the base cost and the slope are giving you an idea of how much every gram of fat costs.2011
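The interpretation of the intercept as a base cost and the slope as a per-gram cost can be written as a small function. This is only a sketch; the names are hypothetical, and the two coefficients are the ones found in the lecture.

```python
BASE_COST = 4.75        # intercept b0 from the lecture, in dollars
COST_PER_GRAM = 0.125   # slope b1 from the lecture, dollars per gram of fat

def predicted_cost(grams_of_fat):
    """Predicted pizza cost from the regression line y = 4.75 + .125 * x."""
    return BASE_COST + COST_PER_GRAM * grams_of_fat

print(predicted_cost(1))    # 4.875
print(predicted_cost(100))  # 17.25
```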

Because notice that as grams of fat goes up, the cost goes up.2035

This data is actually wrong.2040

This would be very cheap pizza.2047

This equation is actually helping us to get an idea of how much each gram of fat is costing and exactly what the relationship is between grams of fat and the cost.2058

That is the goal of the regression line.2070

For these 40 data points summarize the scatter plot then find the regression line.2072

Presumably these data points are in the Excel file, and remember how to summarize the scatter plot, because we are going to be doing that.2078

We have to bring up that Excel file and click on example 3 at the bottom.2087

This data looks sort of familiar to us, but now they are giving us different labels for these variables.2096

Here it says student faculty ratio on the x-axis and cost per unit on the y axis.2107

I'm presuming that each of these cases is something like a school, maybe a university.2115

When the student faculty ratio is very high, then it is cheap to enroll at the schools.2125

It is cheap to take units there.2131

But when the student faculty ratio is very low then it is more expensive.2133

This is sort of what it looks like.2137

Number 1.2141

What are our cases?2142

Our cases are probably something like schools or universities.2143

Our variables are the student faculty ratio and cost per unit.2147

Number two, in summarizing the scatter plot, it seems the general shape is roughly linear so we can just stick with that.2152

Number 3, the trend seems to be a negative trend where as the ratio goes up the cost goes down.2168

As the ratio goes down, the cost goes up.2178

Number 4, what does the strength look like?2184

It is sort of, maybe, small to medium.2193

That is harder to judge, and number 5, potential explanations.2198

Well, it might be that in order to provide more faculty per student, or a better student faculty ratio, you need more faculty or you need fewer students.2205

More faculty costs more money, and with fewer students it costs more for each student.2218

That makes sense, but it could also be that when you have a high cost you want to keep the student faculty ratio low.2223

Or maybe there is some third variable like prestige that keeps this relationship going.2232

We summarized the scatter plot, so now we have to find the regression line.2241

In order to find the regression line we do not really need this chart very much.2246

I’m just going to make it small and put it over here.2254

It is useful to look at later just to eyeball whether our regression line makes sense.2258

But let us go ahead and take our steps to use Carl Gauss’s method of finding b sub 1.2266

I'm going to write here X deviations, Y deviations, X deviations × Y deviations and then X deviations².2276

And this is when Excel comes in really handy because it would be sort of crazy to do all of these by hand.2300

Just to make life easier, let us go ahead and find x bar and y bar.2309

It does not matter where you put this.2321

Just put it somewhere easy for you to keep track of.2324

I’m going to find the average of my x and just use my student faculty ratio as my x.2327

The average student faculty ratio is about 20 students per faculty, and if I just copy that over, our average cost is about $366 per unit.2336

Let us find the x deviations, so that would be my x – x bar, and I want x bar locked in place, and then I’m also going to find my y deviations.2353

y – y bar, lock that in place, and multiply my x deviations and y deviations.2382

I’m also going to find x deviations².2402

Once I have this I can actually just copy and paste all four of these values all the way down for all 40 data points.2406

If you take a look half of the X deviations should be negative and approximately half are positive.2418

And same with the Y deviations some are positive and then some are negative to balance that out.2426

We know we need to find the sum.2437

We need to find the sum of our x deviations × y deviations and just to help us out I’m going to pull down this little bar here.2442

You see in this corner there is a little sandwich looking thing I pulled it down in order to lock that row in place and so that row does not move.2453

Move that down and I know what column I am in.2464

I want to sum all of these together, and then I'm also going to sum all of these together, and I'm just going to color all of this in a different color so we know.2468

Let us find b sub 1.2497

B sub 1 is the ratio of this sum over this sum.2501

Our slope is a negative slope, and that makes sense because we had a negative trend, and it is -21.51.2510

Given that let us find b sub 0.2522

We know in order to find b sub 0 we need to use x bar and y bar as our example point.2527

I’m going to take my y bar – b sub 1 × x bar.2534

Our y-intercept is 795.21.2555

I’m just going to pull this over to this side, and here I can now talk about the regression line.2566

The regression line would be y equals, and we put the intercept first, 795.21.2578

Instead of plus, we could just put a minus because our slope is negative, so it is 795.21 – 21.51 × x.2589

This is our regression line, and if you want to interpret it, the idea is that the base cost is around $800,2601

and for each increment in the student faculty ratio you get a deduction of about $20 to $21.50.2613

As the ratio goes up and up and up you get a little deduction every time.2625
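As a quick sanity check on this line, here is a sketch in Python using only the coefficients quoted above; the 40 data points themselves are not in the transcript, and the names are hypothetical. The regression line should pass through the point of averages, so plugging in the rounded average ratio of about 20 should give something near the average cost of about $366.

```python
B0 = 795.21   # intercept quoted in the lecture (base cost per unit)
B1 = -21.51   # slope quoted in the lecture (change in cost per unit of ratio)

def predicted_cost_per_unit(student_faculty_ratio):
    """Predicted cost per unit from the line y = 795.21 - 21.51 * x."""
    return B0 + B1 * student_faculty_ratio

# Prediction at the rounded average ratio, and the change for one more
# student per faculty member.
at_average = predicted_cost_per_unit(20)
change_per_step = predicted_cost_per_unit(21) - predicted_cost_per_unit(20)

print(round(at_average, 2), round(change_per_step, 2))
```

The prediction at 20 comes out to roughly $365, close to the quoted average; the small gap is expected because 20 and $366 are rounded values.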

Here is example 4.2634

Remember that the regression line must pass through the point of averages.2636

That is one of the quantitative features of regression lines, and the mean of the residuals should be approximately equal to 0.2642

One of these actually causes the other.2653

It is either that passing through the point of averages automatically makes the mean of the residuals 02658

or that the mean of the residuals being 0 causes the line to pass through the point of averages.2664

This problem is basically supposed to explore which one causes the other.2672

Examine the mean of residuals for the regression line, which definitely passes through the point of averages.2678

Then an example line that does not pass through the point of averages, and we should try to see whether in that case the mean of residuals is still 0.2685

Then an example line that does pass through the point of averages, but has the wrong slope.2697

This can be any line that passes through the point of averages but is not the regression line.2704

And then finally we want to discuss the question: is it good enough to define the regression line as the line that makes the sum or mean of the residuals 0?2711

Let us see.2721

If you click on example 4, I put back the pizza example that we covered at the very beginning.2723

Here I put in our regression line, which has $4.75 as the base rate and a 12 ½ cent increase for every gram of fat.2733

I already calculated for you the predicted costs, the residuals, and the squared residuals because we actually already did this in the first problem.2748

The only thing that I have changed is that I also provided for you the sum of the residuals.2761

Here we find that the sum of the residuals is 0.2767

Let us think about this regression line.2771

It definitely passes through the point of averages and the sum of residuals is 0.2773

This regression line definitely fits our quantitative definition for regression line and it has a very low sum of squared residual.2778

Now given this point let us think about a line that does not pass through the point of averages.2792

Now, if we take our line and shift it slightly up or down, it won't pass through the point of averages because parallel lines never intersect.2800

We can keep the same slope .125, but we just change our b sub 0 very slightly.2817

We could just change the intercept very slightly.2829

Maybe we move the line up just a little bit so that we get 4.8 instead of 4.75, and here our line is y = 4.8 + .125 × x.2831

Let us find the squared residuals and all that stuff.2853

The predicted costs would be b sub 0 + b sub 1, locked in place, × x.2856

Notice that our predicted costs are very, very close because our line is not that far off.2880

Let us calculate the residual.2889

The actual cost minus the predicted cost, and let us also calculate the squared residuals.2891

I am just squaring each of my residuals, and they are being added up down here.2904

Notice that although these sums of squared errors are very close, this one is slightly bigger than this one.2909

It is a slightly worse fit than this one.2917

This one is a better fit, but let us check and see whether our residuals add up to 0.2920

They do not.2927

The sum is close to 0, but it does not quite add up to 0.2928

For these lines that do not quite pass through the point of averages, even though they are only a little bit off, the sum of the residuals does not add up to 0.2934

Now that we have all this we can actually just change it.2948

Let us move the regression line down just a little bit.2952

Let us just move it down slightly and make this 4.5 instead of 4.75.2956

What if we do that?2963

Well again it is not that far off.2965

It is still a pretty low sum of squared errors, but the regression line is still the lowest, and the residuals still do not add up to 0.2968

If it does not pass through the point of averages then it is off by a little bit.2982

The other thing we could do is we could keep the intercept the same and instead we could change the slope by a little bit.2988

If we do that, then we know it does not pass through the point of averages.2999

When we do that what we find once again is that the sum of squared residuals is further off than for our regression line.3003

Our sum of residuals still does not add up to 0.3015

We tried a couple of lines, and whenever the line does not pass through the point of averages, we see that the residuals do not add up to 0.3018
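The experiment with lines that miss the point of averages can be sketched in Python. The data points are an assumption reconstructed from the results quoted in the lecture. The intercepts 4.8 and 4.5 are the ones tried above, while the changed slope of 0.13 is a hypothetical value, since the transcript does not say which slope was typed in.

```python
# Assumed data (grams of fat, cost in dollars), reconstructed from the
# lecture's quoted results; the real spreadsheet is not in the transcript.
fat = [100, 110, 120]
cost = [17.5, 18.0, 20.0]

def residual_summary(b0, b1, xs, ys):
    """Return (sum of residuals, sum of squared residuals) for y = b0 + b1 * x."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return sum(residuals), sum(r ** 2 for r in residuals)

candidates = {
    "regression line": (4.75, 0.125),
    "intercept 4.8": (4.8, 0.125),
    "intercept 4.5": (4.5, 0.125),
    "slope 0.13 (hypothetical)": (4.75, 0.13),
}

for label, (b0, b1) in candidates.items():
    total, sse = residual_summary(b0, b1, fat, cost)
    print(f"{label}: sum of residuals = {total:.4f}, SSE = {sse:.4f}")
```

Only the regression line gives a residual sum of 0 here, and it also has the smallest sum of squared residuals of the four.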

Now let us talk about the flip side.3029

A line that does pass through the point of averages, but it is still not the regression line.3032

Well, we need to find one that passes through the point of averages, but has the wrong slope.3038

It is easiest to start from our actual point of averages and build a line that passes through there but just has a different slope.3047

You can pick any slope you want.3058

I will pick the slope of 5.3059

Y is 5/1.3061

Let us find b sub 0.3066

We could just use that same formula we have use and plug in our values for the point of averages.3069

That would be y bar – x bar × b1.3080

Our B1 is right next to it.3090

This is the line that definitely passes through the point of averages, but obviously has the wrong slope.3093

Let us find the predicted costs.3101

Remember, this is a line that is totally made up.3104

The predicted costs might be very off.3106

Predicted costs would be the intercept and lock that in place + b1 × x and then lock b1 in place.3109

We see that the predicted costs are fairly off.3136

The first is about -$31; this one is pretty close at 18, but this one is pretty far off at 68.5.3141

Now let us find the residual.3148

The actual cost minus the predicted cost.3150

and finally, let us find the squared residuals.3156

Notice that the sum of squared residuals is very, very off: 4,753.3163

It is pretty off.3172

We know that this is not a great line.3173

It is not a well fitting line.3175

These other lines actually fit better, but let us check that sum of the residuals.3176

What does that add up to be?3181

The sum of the residuals is 0.3185

That is just because this line passes through the point of averages.3195

Remember, in order to calculate the intercept for this line we are always using x bar and y bar.3201

It actually makes sense that as long as it passes through that point of averages the sum of residual is going to be 0.3209

Now that we have all of this setup with all our nice formulas we can actually put in any slope.3220

Let us put in -.1.3228

It will find the b sub 0, and this line perfectly passes through the point of averages.3231

Even though our sum of squared residuals has improved, our residuals still add up to 0.3238

They add up to 0 even though it is not the regression line, so let us try another one: -.00035.3246

Excel displays the sum this way just because there are too many decimal places for it to show you, but still, you get the idea.3259

Although it looks like a sort of crazy number, this notation means that you need to move the decimal point to the left 18 times.3266

That is very, very close to 0.3275

Let us try another one 500.3279

Once again we see that the sum of the residual is 0.3284

These are obviously not very good regression lines because the squared residuals are terribly, terribly off.3290

The sum of the residual is 0 as long as the line passes through the point of averages.3301
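The flip side can be sketched the same way: pick any slope, compute the intercept from the point of averages, and check the residual sum. The data points are an assumption reconstructed from the lecture's quoted results, the slopes are the ones tried above, and the function name is hypothetical.

```python
# Assumed data (grams of fat, cost in dollars), reconstructed from the
# lecture's quoted results; the real spreadsheet is not in the transcript.
fat = [100, 110, 120]
cost = [17.5, 18.0, 20.0]

def through_point_of_averages(b1, xs, ys):
    """Return (b0, sum of residuals, SSE) for the line with slope b1
    that is forced through the point of averages (x bar, y bar)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b0 = y_bar - b1 * x_bar  # same intercept formula as for the regression line
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return b0, sum(residuals), sum(r ** 2 for r in residuals)

for slope in (5, -0.1, -0.00035, 500, 0.125):
    b0, total, sse = through_point_of_averages(slope, fat, cost)
    print(f"slope {slope}: b0 = {b0:.4f}, sum of residuals = {total:.2e}, SSE = {sse:.4f}")
```

Every slope gives a residual sum of 0, up to tiny floating-point leftovers like the one Excel displayed in scientific notation, but only the slope .125 gives the smallest sum of squared residuals.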

Let us go back to example 4.3310

Here we have seen the mean of the residuals, or the sum of the residuals, which is a similar idea, for the regression line, and the mean of residuals equals 0.3314

For an example line that does not pass through the point of averages, the mean of residuals does not equal 0.3328

Then an example line that does pass through the point of averages but has the wrong slope.3338

Here we find the mean of residuals once again equals 0.3345

Is it good enough to define the regression line as the line that makes the sum or mean of the residual 0?3353

No, that is not good enough because any line that passes through the point of averages will have the sum or mean of the residual as 0.3359

This one really causes that one.3370

We also need to have all those other rules.3374

For instance, the other rule, that the sum of squared errors is the lowest for the regression line, definitely has to be there.3379
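A short derivation, offered as a sketch, shows why the sum of residuals behaves this way for any line with intercept b sub 0 and slope b sub 1. The symbols n, x_i, and y_i for the number of data points and the individual values are notation assumed here, not taken from the slides.

```latex
\sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_i) \bigr)
  = n\bar{y} - n b_0 - b_1 n \bar{x}
  = n \bigl( \bar{y} - (b_0 + b_1 \bar{x}) \bigr)
```

The sum is 0 exactly when y bar = b sub 0 + b sub 1 × x bar, that is, when the line passes through the point of averages, whatever the slope. So the zero-sum condition alone cannot single out the regression line; the least squares condition is still needed.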

That is it for calculating regressions using the least squares method.3389

See you next time on www.educator.com.3393

