  Dr. Ji Son

Scatterplots

Slide Duration:

Section 1: Introduction
Descriptive Statistics vs. Inferential Statistics

25m 31s

Intro
0:00
0:10
0:11
Statistics
0:35
Statistics
0:36
Let's Think About High School Science
1:12
Measurement and Find Patterns (Mathematical Formula)
1:13
Statistics = Math of Distributions
4:58
Distributions
4:59
Problematic… but also GREAT
5:58
Statistics
7:33
How is It Different from Other Specializations in Mathematics?
7:34
Statistics is Fundamental in Natural and Social Sciences
7:53
Two Skills of Statistics
8:20
Description (Exploration)
8:21
Inference
9:13
Descriptive Statistics vs. Inferential Statistics: Apply to Distributions
9:58
Descriptive Statistics
9:59
Inferential Statistics
11:05
Populations vs. Samples
12:19
Populations vs. Samples: Is it the Truth?
12:20
Populations vs. Samples: Pros & Cons
13:36
Populations vs. Samples: Descriptive Values
16:12
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:10
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:11
Example 1: Descriptive Statistics vs. Inferential Statistics
19:09
Example 2: Descriptive Statistics vs. Inferential Statistics
20:47
Example 3: Sample, Parameter, Population, and Statistic
21:40
Example 4: Sample, Parameter, Population, and Statistic
23:28
Section 2: About Samples: Cases, Variables, Measurements

32m 14s

Intro
0:00
Data
0:09
Data, Cases, Variables, and Values
0:10
Rows, Columns, and Cells
2:03
Example: Aircrafts
3:52
How Do We Get Data?
5:38
Research: Question and Hypothesis
5:39
Research Design
7:11
Measurement
7:29
Research Analysis
8:33
Research Conclusion
9:30
Types of Variables
10:03
Discrete Variables
10:04
Continuous Variables
12:07
Types of Measurements
14:17
Types of Measurements
14:18
Types of Measurements (Scales)
17:22
Nominal
17:23
Ordinal
19:11
Interval
21:33
Ratio
24:24
Example 1: Cases, Variables, Measurements
25:20
Example 2: Which Scale of Measurement is Used?
26:55
Example 3: What Kind of a Scale of Measurement is This?
27:26
Example 4: Discrete vs. Continuous Variables.
30:31
Section 3: Visualizing Distributions
Introduction to Excel

8m 9s

Intro
0:00
Before Visualizing Distribution
0:10
Excel
0:11
Excel: Organization
0:45
Workbook
0:46
Column x Rows
1:50
Tools: Menu Bar, Standard Toolbar, and Formula Bar
3:00
Excel + Data
6:07
Excel and Data
6:08
Frequency Distributions in Excel

39m 10s

Intro
0:00
0:08
Data in Excel and Frequency Distributions
0:09
Raw Data to Frequency Tables
0:42
Raw Data to Frequency Tables
0:43
Frequency Tables: Using Formulas and Pivot Tables
1:28
Example 1: Number of Births
7:17
Example 2: Age Distribution
20:41
Example 3: Height Distribution
27:45
Example 4: Height Distribution of Males
32:19
Frequency Distributions and Features

25m 29s

Intro
0:00
0:10
Data in Excel, Frequency Distributions, and Features of Frequency Distributions
0:11
Example #1
1:35
Uniform
1:36
Example #2
2:58
Unimodal, Skewed Right, and Asymmetric
2:59
Example #3
6:29
Bimodal
6:30
Example #4a
8:29
Symmetric, Unimodal, and Normal
8:30
Point of Inflection and Standard Deviation
11:13
Example #4b
12:43
Normal Distribution
12:44
Summary
13:56
Uniform, Skewed, Bimodal, and Normal
13:57
17:34
Sketch Problem 2: Life Expectancy
20:01
Sketch Problem 3: Telephone Numbers
22:01
Sketch Problem 4: Length of Time Used to Complete a Final Exam
23:43
Dotplots and Histograms in Excel

42m 42s

Intro
0:00
0:06
0:07
Previously
1:02
Data, Frequency Table, and Visualization
1:03
Dotplots
1:22
Dotplots Excel Example
1:23
Dotplots: Pros and Cons
7:22
Pros and Cons of Dotplots
7:23
Dotplots Excel Example Cont.
9:07
Histograms
12:47
Histograms Overview
12:48
Example of Histograms
15:29
Histograms: Pros and Cons
31:39
Pros
31:40
Cons
32:31
Frequency vs. Relative Frequency
32:53
Frequency
32:54
Relative Frequency
33:36
Example 1: Dotplots vs. Histograms
34:36
Example 2: Age of Pennies Dotplot
36:21
Example 3: Histogram of Mammal Speeds
38:27
Example 4: Histogram of Life Expectancy
40:30
Stemplots

12m 23s

Intro
0:00
0:05
0:06
What Sets Stemplots Apart?
0:46
Data Sets, Dotplots, Histograms, and Stemplots
0:47
Example 1: What Do Stemplots Look Like?
1:58
Example 2: Back-to-Back Stemplots
5:00
7:46
Example 4: Quiz Grade & Afterschool Tutoring Stemplot
9:56
Bar Graphs

22m 49s

Intro
0:00
0:05
0:08
Review of Frequency Distributions
0:44
Y-axis and X-axis
0:45
Types of Frequency Visualizations Covered so Far
2:16
Introduction to Bar Graphs
4:07
Example 1: Bar Graph
5:32
Example 1: Bar Graph
5:33
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:07
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:08
Example 2: Create a Frequency Visualization for Gender
14:02
Example 3: Cases, Variables, and Frequency Visualization
16:34
Example 4: What Kind of Graphs are Shown Below?
19:29
Section 4: Summarizing Distributions
Central Tendency: Mean, Median, Mode

38m 50s

Intro
0:00
0:07
0:08
Central Tendency 1
0:56
Way to Summarize a Distribution of Scores
0:57
Mode
1:32
Median
2:02
Mean
2:36
Central Tendency 2
3:47
Mode
3:48
Median
4:20
Mean
5:25
Summation Symbol
6:11
Summation Symbol
6:12
Population vs. Sample
10:46
Population vs. Sample
10:47
Excel Examples
15:08
Finding Mode, Median, and Mean in Excel
15:09
Median vs. Mean
21:45
Effect of Outliers
21:46
Relationship Between Parameter and Statistic
22:44
Type of Measurements
24:00
Which Distributions to Use With
24:55
Example 1: Mean
25:30
Example 2: Using Summation Symbol
29:50
Example 3: Average Calorie Count
32:50
Example 4: Creating an Example Set
35:46
Variability

42m 40s

Intro
0:00
0:05
0:06
0:45
0:46
5:45
5:46
Range, Quartiles and Interquartile Range
6:37
Range
6:38
Interquartile Range
8:42
Interquartile Range Example
10:58
Interquartile Range Example
10:59
Variance and Standard Deviation
12:27
Deviations
12:28
Sum of Squares
14:35
Variance
16:55
Standard Deviation
17:44
Sum of Squares (SS)
18:34
Sum of Squares (SS)
18:35
Population vs. Sample SD
22:00
Population vs. Sample SD
22:01
Population vs. Sample
23:20
Mean
23:21
SD
23:51
Example 1: Find the Mean and Standard Deviation of the Variable Friends in the Excel File
27:21
Example 2: Find the Mean and Standard Deviation of the Tagged Photos in the Excel File
35:25
Example 3: Sum of Squares
38:58
Example 4: Standard Deviation
41:48
Five Number Summary & Boxplots

57m 15s

Intro
0:00
0:06
0:07
Summarizing Distributions
0:37
0:38
5 Number Summary
1:14
Boxplot: Visualizing 5 Number Summary
3:37
Boxplot: Visualizing 5 Number Summary
3:38
Boxplots on Excel
9:01
Using 'Stocks' and Using Stacked Columns
9:02
Boxplots on Excel Example
10:14
When are Boxplots Useful?
32:14
Pros
32:15
Cons
32:59
How to Determine Outlier Status
33:24
Rule of Thumb: Upper Limit
33:25
Rule of Thumb: Lower Limit
34:16
Signal Outliers in an Excel Data File Using Conditional Formatting
34:52
Modified Boxplot
48:38
Modified Boxplot
48:39
Example 1: Percentage Values & Lower and Upper Whisker
49:10
Example 2: Boxplot
50:10
Example 3: Estimating IQR From Boxplot
53:46
Example 4: Boxplot and Missing Whisker
54:35
Shape: Calculating Skewness & Kurtosis

41m 51s

Intro
0:00
0:16
0:17
Skewness Concept
1:09
Skewness Concept
1:10
Calculating Skewness
3:26
Calculating Skewness
3:27
Interpreting Skewness
7:36
Interpreting Skewness
7:37
Excel Example
8:49
Kurtosis Concept
20:29
Kurtosis Concept
20:30
Calculating Kurtosis
24:17
Calculating Kurtosis
24:18
Interpreting Kurtosis
29:01
Leptokurtic
29:35
Mesokurtic
30:10
Platykurtic
31:06
Excel Example
32:04
Example 1: Shape of Distribution
38:28
Example 2: Shape of Distribution
39:29
Example 3: Shape of Distribution
40:14
Example 4: Kurtosis
41:10
Normal Distribution

34m 33s

Intro
0:00
0:13
0:14
What is a Normal Distribution
0:44
The Normal Distribution As a Theoretical Model
0:45
Possible Range of Probabilities
3:05
Possible Range of Probabilities
3:06
What is a Normal Distribution
5:07
Can Be Described By
5:08
Properties
5:49
'Same' Shape: Illusion of Different Shape!
7:35
'Same' Shape: Illusion of Different Shape!
7:36
Types of Problems
13:45
Example: Distribution of SAT Scores
13:46
Shape Analogy
19:48
Shape Analogy
19:49
Example 1: The Standard Normal Distribution and Z-Scores
22:34
Example 2: The Standard Normal Distribution and Z-Scores
25:54
Example 3: Sketching and Normal Distribution
28:55
Example 4: Sketching and Normal Distribution
32:32
Standard Normal Distributions & Z-Scores

41m 44s

Intro
0:00
0:06
0:07
A Family of Distributions
0:28
Infinite Set of Distributions
0:29
Transforming Normal Distributions to 'Standard' Normal Distribution
1:04
Normal Distribution vs. Standard Normal Distribution
2:58
Normal Distribution vs. Standard Normal Distribution
2:59
Z-Score, Raw Score, Mean, & SD
4:08
Z-Score, Raw Score, Mean, & SD
4:09
Weird Z-Scores
9:40
Weird Z-Scores
9:41
Excel
16:45
For Normal Distributions
16:46
For Standard Normal Distributions
19:11
Excel Example
20:24
Types of Problems
25:18
Percentage Problem: P(x)
25:19
Raw Score and Z-Score Problems
26:28
Standard Deviation Problems
27:01
Shape Analogy
27:44
Shape Analogy
27:45
Example 1: Deaths Due to Heart Disease vs. Deaths Due to Cancer
28:24
Example 2: Heights of Male College Students
33:15
Example 3: Mean and Standard Deviation
37:14
Example 4: Finding Percentage of Values in a Standard Normal Distribution
37:49
Normal Distribution: PDF vs. CDF

55m 44s

Intro
0:00
0:15
0:16
Frequency vs. Cumulative Frequency
0:56
Frequency vs. Cumulative Frequency
0:57
Frequency vs. Cumulative Frequency
4:32
Frequency vs. Cumulative Frequency Cont.
4:33
Calculus in Brief
6:21
Derivative-Integral Continuum
6:22
PDF
10:08
PDF for Standard Normal Distribution
10:09
PDF for Normal Distribution
14:32
Integral of PDF = CDF
21:27
Integral of PDF = CDF
21:28
Example 1: Cumulative Frequency Graph
23:31
Example 2: Mean, Standard Deviation, and Probability
24:43
Example 3: Mean and Standard Deviation
35:50
Example 4: Age of Cars
49:32
Section 5: Linear Regression
Scatterplots

47m 19s

Intro
0:00
0:04
0:05
Previous Visualizations
0:30
Frequency Distributions
0:31
Compare & Contrast
2:26
Frequency Distributions Vs. Scatterplots
2:27
Summary Values
4:53
Shape
4:54
Center & Trend
6:41
8:22
Univariate & Bivariate
10:25
Example Scatterplot
10:48
Shape, Trend, and Strength
10:49
Positive and Negative Association
14:05
Positive and Negative Association
14:06
Linearity, Strength, and Consistency
18:30
Linearity
18:31
Strength
19:14
Consistency
20:40
Summarizing a Scatterplot
22:58
Summarizing a Scatterplot
22:59
Example 1: Gapminder.org, Income x Life Expectancy
26:32
Example 2: Gapminder.org, Income x Infant Mortality
36:12
Example 3: Trend and Strength of Variables
40:14
Example 4: Trend, Strength and Shape for Scatterplots
43:27
Regression

32m 2s

Intro
0:00
0:05
0:06
Linear Equations
0:34
Linear Equations: y = mx + b
0:35
Rough Line
5:16
Rough Line
5:17
Regression - A 'Center' Line
7:41
Reasons for Summarizing with a Regression Line
7:42
Predictor and Response Variable
10:04
Goal of Regression
12:29
Goal of Regression
12:30
Prediction
14:50
Example: Servings of Milk Per Year Shown By Age
14:51
Interpolation
17:06
Extrapolation
17:58
Error in Prediction
20:34
Prediction Error
20:35
Residual
21:40
Example 1: Residual
23:34
Example 2: Large and Negative Residual
26:30
Example 3: Positive Residual
28:13
Example 4: Interpret Regression Line & Extrapolate
29:40
Least Squares Regression

56m 36s

Intro
0:00
0:13
0:14
Best Fit
0:47
Best Fit
0:48
Sum of Squared Errors (SSE)
1:50
Sum of Squared Errors (SSE)
1:51
Why Squared?
3:38
Why Squared?
3:39
Quantitative Properties of Regression Line
4:51
Quantitative Properties of Regression Line
4:52
So How do we Find Such a Line?
6:49
SSEs of Different Line Equations & Lowest SSE
6:50
Carl Gauss' Method
8:01
How Do We Find Slope (b1)
11:00
How Do We Find Slope (b1)
11:01
How Do We Find Intercept
15:11
How Do We Find Intercept
15:12
Example 1: Which of These Equations Fit the Above Data Best?
17:18
Example 2: Find the Regression Line for These Data Points and Interpret It
26:31
Example 3: Summarize the Scatterplot and Find the Regression Line.
34:31
Example 4: Examine the Mean of Residuals
43:52
Correlation

43m 58s

Intro
0:00
0:05
0:06
Summarizing a Scatterplot Quantitatively
0:47
Shape
0:48
Trend
1:11
Strength: Correlation (r)
1:45
Correlation Coefficient (r)
2:30
Correlation Coefficient (r)
2:31
Trees vs. Forest
11:59
Trees vs. Forest
12:00
Calculating r
15:07
Average Product of z-scores for x and y
15:08
Relationship between Correlation and Slope
21:10
Relationship between Correlation and Slope
21:11
Example 1: Find the Correlation between Grams of Fat and Cost
24:11
Example 2: Relationship between r and b1
30:24
Example 3: Find the Regression Line
33:35
Example 4: Find the Correlation Coefficient for this Set of Data
37:37
Correlation: r vs. r-squared

52m 52s

Intro
0:00
0:07
0:08
R-squared
0:44
What is the Meaning of It? Why Squared?
0:45
Parsing Sum of Squares (Parsing Variability)
2:25
SST = SSR + SSE
2:26
What is SST and SSE?
7:46
What is SST and SSE?
7:47
r-squared
18:33
Coefficient of Determination
18:34
If the Correlation is Strong…
20:25
If the Correlation is Strong…
20:26
If the Correlation is Weak…
22:36
If the Correlation is Weak…
22:37
Example 1: Find r-squared for this Set of Data
23:56
Example 2: What Does it Mean that the Simple Linear Regression is a 'Model' of Variance?
33:54
Example 3: Why Does r-squared Only Range from 0 to 1
37:29
Example 4: Find the r-squared for This Set of Data
39:55
Transformations of Data

27m 8s

Intro
0:00
0:05
0:06
Why Transform?
0:26
Why Transform?
0:27
Shape-preserving vs. Shape-changing Transformations
5:14
Shape-preserving = Linear Transformations
5:15
Shape-changing Transformations = Non-linear Transformations
6:20
Common Shape-Preserving Transformations
7:08
Common Shape-Preserving Transformations
7:09
Common Shape-Changing Transformations
8:59
Powers
9:00
Logarithms
9:39
Change Just One Variable? Both?
10:38
Log-log Transformations
10:39
Log Transformations
14:38
Example 1: Create, Graph, and Transform the Data Set
15:19
Example 2: Create, Graph, and Transform the Data Set
20:08
Example 3: What Kind of Model would You Choose for this Data?
22:44
Example 4: Transformation of Data
25:46
Section 6: Collecting Data in an Experiment
Sampling & Bias

54m 44s

Intro
0:00
0:05
0:06
Descriptive vs. Inferential Statistics
1:04
Descriptive Statistics: Data Exploration
1:05
Example
2:03
To Tackle Generalization…
4:31
Generalization
4:32
Sampling
6:06
'Good' Sample
6:40
Defining Samples and Populations
8:55
Population
8:56
Sample
11:16
Why Use Sampling?
13:09
Why Use Sampling?
13:10
Goal of Sampling: Avoiding Bias
15:04
What is Bias?
15:05
Where does Bias Come from: Sampling Bias
17:53
Where does Bias Come from: Response Bias
18:27
Sampling Bias: Bias from 'Bad' Sampling Methods
19:34
Size Bias
19:35
Voluntary Response Bias
21:13
Convenience Sample
22:22
Judgment Sample
23:58
25:40
Response Bias: Bias from 'Bad' Data Collection Methods
28:00
Nonresponse Bias
29:31
Questionnaire Bias
31:10
Incorrect Response or Measurement Bias
37:32
Example 1: What Kind of Biases?
40:29
Example 2: What Biases Might Arise?
44:46
Example 3: What Kind of Biases?
48:34
Example 4: What Kind of Biases?
51:43
Sampling Methods

14m 25s

Intro
0:00
0:05
0:06
Biased vs. Unbiased Sampling Methods
0:32
Biased Sampling
0:33
Unbiased Sampling
1:13
Probability Sampling Methods
2:31
Simple Random
2:54
Stratified Random Sampling
4:06
Cluster Sampling
5:24
Two-stage Sampling
6:22
Systematic Sampling
7:25
8:33
Example 2: Describe How to Take a Two-Stage Sample from this Book
10:16
Example 3: Sampling Methods
11:58
Example 4: Cluster Sample Plan
12:48
Research Design

53m 54s

Intro
0:00
0:06
0:07
Descriptive vs. Inferential Statistics
0:51
Descriptive Statistics: Data Exploration
0:52
Inferential Statistics
1:02
Variables and Relationships
1:44
Variables
1:45
Relationships
2:49
Not Every Type of Study is an Experiment…
4:16
Category I - Descriptive Study
4:54
Category II - Correlational Study
5:50
Category III - Experimental, Quasi-experimental, Non-experimental
6:33
Category III
7:42
Experimental, Quasi-experimental, and Non-experimental
7:43
Why CAN'T the Other Strategies Determine Causation?
10:18
Third-variable Problem
10:19
Directionality Problem
15:49
What Makes Experiments Special?
17:54
Manipulation
17:55
Control (and Comparison)
21:58
Methods of Control
26:38
Holding Constant
26:39
Matching
29:11
Random Assignment
31:48
Experiment Terminology
34:09
'True' Experiment vs. Study
34:10
Independent Variable (IV)
35:16
Dependent Variable (DV)
35:45
Factors
36:07
Treatment Conditions
36:23
Levels
37:43
Confounds or Extraneous Variables
38:04
Blind
38:38
Blind Experiments
38:39
Double-blind Experiments
39:29
How Categories Relate to Statistics
41:35
Category I - Descriptive Study
41:36
Category II - Correlational Study
42:05
Category III - Experimental, Quasi-experimental, Non-experimental
42:43
Example 1: Research Design
43:50
Example 2: Research Design
47:37
Example 3: Research Design
50:12
Example 4: Research Design
52:00
Between and Within Treatment Variability

41m 31s

Intro
0:00
0:06
0:07
Experimental Designs
0:51
Experimental Designs: Manipulation & Control
0:52
Two Types of Variability
2:09
Between Treatment Variability
2:10
Within Treatment Variability
3:31
Updated Goal of Experimental Design
5:47
Updated Goal of Experimental Design
5:48
Example: Drugs and Driving
6:56
Example: Drugs and Driving
6:57
Different Types of Random Assignment
11:27
All Experiments
11:28
Completely Random Design
12:02
Randomized Block Design
13:19
Randomized Block Design
15:48
Matched Pairs Design
15:49
Repeated Measures Design
19:47
Between-subject Variable vs. Within-subject Variable
22:43
Completely Randomized Design
22:44
Repeated Measures Design
25:03
Example 1: Design a Completely Random, Matched Pair, and Repeated Measures Experiment
26:16
Example 2: Block Design
31:41
Example 3: Completely Randomized Designs
35:11
Example 4: Completely Random, Matched Pairs, or Repeated Measures Experiments?
39:01
Section 7: Review of Probability Axioms
Sample Spaces

37m 52s

Intro
0:00
0:07
0:08
Why is Probability Involved in Statistics
0:48
Probability
0:49
Can People Tell the Difference between Cheap and Gourmet Coffee?
2:08
Taste Test with Coffee Drinkers
3:37
If No One can Actually Taste the Difference
3:38
If Everyone can Actually Taste the Difference
5:36
Creating a Probability Model
7:09
Creating a Probability Model
7:10
D'Alembert vs. Necker
9:41
D'Alembert vs. Necker
9:42
Problem with D'Alembert's Model
13:29
Problem with D'Alembert's Model
13:30
Covering Entire Sample Space
15:08
Fundamental Principle of Counting
15:09
Where Do Probabilities Come From?
22:54
Observed Data, Symmetry, and Subjective Estimates
22:55
Checking whether Model Matches Real World
24:27
Law of Large Numbers
24:28
Example 1: Law of Large Numbers
27:46
Example 2: Possible Outcomes
30:43
Example 3: Brands of Coffee and Taste
33:25
Example 4: How Many Different Treatments are there?
35:33

20m 29s

Intro
0:00
0:08
0:09
Disjoint Events
0:41
Disjoint Events
0:42
Meaning of 'or'
2:39
In Regular Life
2:40
In Math/Statistics/Computer Science
3:10
3:55
If A and B are Disjoint: P (A and B)
3:56
If A and B are Disjoint: P (A or B)
5:15
5:41
5:42
8:31
If A and B are not Disjoint: P (A or B)
8:32
Example 1: Which of These are Mutually Exclusive?
10:50
Example 2: What is the Probability that You will Have a Combination of One Heads and Two Tails?
12:57
Example 3: Engagement Party
15:17
Example 4: Home Owner's Insurance
18:30
Conditional Probability

57m 19s

Intro
0:00
0:05
0:06
'or' vs. 'and' vs. Conditional Probability
1:07
'or' vs. 'and' vs. Conditional Probability
1:08
'and' vs. Conditional Probability
5:57
P (M or L)
5:58
P (M and L)
8:41
P (M|L)
11:04
P (L|M)
12:24
Tree Diagram
15:02
Tree Diagram
15:03
Defining Conditional Probability
22:42
Defining Conditional Probability
22:43
Common Contexts for Conditional Probability
30:56
Medical Testing: Positive Predictive Value
30:57
Medical Testing: Sensitivity
33:03
Statistical Tests
34:27
Example 1: Drug and Disease
36:41
Example 2: Marbles and Conditional Probability
40:04
Example 3: Cards and Conditional Probability
45:59
Example 4: Votes and Conditional Probability
50:21
Independent Events

24m 27s

Intro
0:00
0:05
0:06
Independent Events & Conditional Probability
0:26
Non-independent Events
0:27
Independent Events
2:00
Non-independent and Independent Events
3:08
Non-independent and Independent Events
3:09
Defining Independent Events
5:52
Defining Independent Events
5:53
Multiplication Rule
7:29
Previously…
7:30
But with Independent Events
8:53
Example 1: Which of These Pairs of Events are Independent?
11:12
Example 2: Health Insurance and Probability
15:12
Example 3: Independent Events
17:42
Example 4: Independent Events
20:03
Section 8: Probability Distributions
Introduction to Probability Distributions

56m 45s

Intro
0:00
0:08
0:09
Sampling vs. Probability
0:57
Sampling
0:58
Missing
1:30
What is Missing?
3:06
Insight: Probability Distributions
5:26
Insight: Probability Distributions
5:27
What is a Probability Distribution?
7:29
From Sample Spaces to Probability Distributions
8:44
Sample Space
8:45
Probability Distribution of the Sum of Two Dice
11:16
The Random Variable
17:43
The Random Variable
17:44
Expected Value
21:52
Expected Value
21:53
Example 1: Probability Distributions
28:45
Example 2: Probability Distributions
35:30
Example 3: Probability Distributions
43:37
Example 4: Probability Distributions
47:20
Expected Value & Variance of Probability Distributions

53m 41s

Intro
0:00
0:06
0:07
Discrete vs. Continuous Random Variables
1:04
Discrete vs. Continuous Random Variables
1:05
Mean and Variance Review
4:44
Mean: Sample, Population, and Probability Distribution
4:45
Variance: Sample, Population, and Probability Distribution
9:12
Example Situation
14:10
Example Situation
14:11
Some Special Cases…
16:13
Some Special Cases…
16:14
Linear Transformations
19:22
Linear Transformations
19:23
What Happens to Mean and Variance of the Probability Distribution?
20:12
n Independent Values of X
25:38
n Independent Values of X
25:39
Compare These Two Situations
30:56
Compare These Two Situations
30:57
Two Random Variables, X and Y
32:02
Two Random Variables, X and Y
32:03
Example 1: Expected Value & Variance of Probability Distributions
35:35
Example 2: Expected Values & Standard Deviation
44:17
Example 3: Expected Winnings and Standard Deviation
48:18
Binomial Distribution

55m 15s

Intro
0:00
0:05
0:06
Discrete Probability Distributions
1:42
Discrete Probability Distributions
1:43
Binomial Distribution
2:36
Binomial Distribution
2:37
Multiplicative Rule Review
6:54
Multiplicative Rule Review
6:55
How Many Outcomes with k 'Successes'
10:23
Adults and Bachelor's Degree: Manual List of Outcomes
10:24
P (X=k)
19:37
Putting Together # of Outcomes with the Multiplicative Rule
19:38
Expected Value and Standard Deviation in a Binomial Distribution
25:22
Expected Value and Standard Deviation in a Binomial Distribution
25:23
Example 1: Coin Toss
33:42
38:03
Example 3: Types of Blood and Probability
45:39
Example 4: Expected Number and Standard Deviation
51:11
Section 9: Sampling Distributions of Statistics
Introduction to Sampling Distributions

48m 17s

Intro
0:00
0:08
0:09
Probability Distributions vs. Sampling Distributions
0:55
Probability Distributions vs. Sampling Distributions
0:56
Same Logic
3:55
Logic of Probability Distribution
3:56
Example: Rolling Two Dice
6:56
Simulating Samples
9:53
To Come Up with Probability Distributions
9:54
In Sampling Distributions
11:12
Connecting Sampling and Research Methods with Sampling Distributions
12:11
Connecting Sampling and Research Methods with Sampling Distributions
12:12
Simulating a Sampling Distribution
14:14
Experimental Design: Regular Sleep vs. Less Sleep
14:15
Logic of Sampling Distributions
23:08
Logic of Sampling Distributions
23:09
General Method of Simulating Sampling Distributions
25:38
General Method of Simulating Sampling Distributions
25:39
Questions that Remain
28:45
Questions that Remain
28:46
Example 1: Mean and Standard Error of Sampling Distribution
30:57
Example 2: What is the Best Way to Describe Sampling Distributions?
37:12
Example 3: Matching Sampling Distributions
38:21
Example 4: Mean and Standard Error of Sampling Distribution
41:51
Sampling Distribution of the Mean

1h 8m 48s

Intro
0:00
0:05
0:06
Special Case of General Method for Simulating a Sampling Distribution
1:53
Special Case of General Method for Simulating a Sampling Distribution
1:54
Computer Simulation
3:43
Using Simulations to See Principles behind Shape of SDoM
15:50
Using Simulations to See Principles behind Shape of SDoM
15:51
Conditions
17:38
Using Simulations to See Principles behind Center (Mean) of SDoM
20:15
Using Simulations to See Principles behind Center (Mean) of SDoM
20:16
Conditions: Does n Matter?
21:31
Conditions: Does Number of Simulations Matter?
24:37
Using Simulations to See Principles behind Standard Deviation of SDoM
27:13
Using Simulations to See Principles behind Standard Deviation of SDoM
27:14
Conditions: Does n Matter?
34:45
Conditions: Does Number of Simulations Matter?
36:24
Central Limit Theorem
37:13
SHAPE
38:08
CENTER
39:34
39:52
Comparing Population, Sample, and SDoM
43:10
Comparing Population, Sample, and SDoM
43:11
48:24
What Happens When We Don't Know What the Population Looks Like?
48:25
Can We Have Sampling Distributions for Summary Statistics Other than the Mean?
49:42
How Do We Know whether a Sample is Sufficiently Unlikely?
53:36
Do We Always Have to Simulate a Large Number of Samples in Order to get a Sampling Distribution?
54:40
Example 1: Mean Batting Average
55:25
Example 2: Mean Sampling Distribution and Standard Error
59:07
Example 3: Sampling Distribution of the Mean
1:01:04
Sampling Distribution of Sample Proportions

54m 37s

Intro
0:00
0:06
0:07
Intro to Sampling Distribution of Sample Proportions (SDoSP)
0:51
Categorical Data (Examples)
0:52
Wish to Estimate Proportion of Population from Sample…
2:00
Notation
3:34
Population Proportion and Sample Proportion Notations
3:35
What's the Difference?
9:19
SDoM vs. SDoSP: Type of Data
9:20
SDoM vs. SDoSP: Shape
11:24
SDoM vs. SDoSP: Center
12:30
15:34
Binomial Distribution vs. Sampling Distribution of Sample Proportions
19:14
Binomial Distribution vs. SDoSP: Type of Data
19:17
Binomial Distribution vs. SDoSP: Shape
21:07
Binomial Distribution vs. SDoSP: Center
21:43
24:08
Example 1: Sampling Distribution of Sample Proportions
26:07
Example 2: Sampling Distribution of Sample Proportions
37:58
Example 3: Sampling Distribution of Sample Proportions
44:42
Example 4: Sampling Distribution of Sample Proportions
45:57
Section 10: Inferential Statistics
Introduction to Confidence Intervals

42m 53s

Intro
0:00
0:06
0:07
Inferential Statistics
0:50
Inferential Statistics
0:51
Two Problems with This Picture…
3:20
Two Problems with This Picture…
3:21
Solution: Confidence Intervals (CI)
4:59
Solution: Hypothesis Testing (HT)
5:49
Which Parameters are Known?
6:45
Which Parameters are Known?
6:46
Confidence Interval - Goal
7:56
When We Don't Know μ but Know σ
7:57
When We Don't Know
18:27
When We Don't Know μ nor σ
18:28
Example 1: Confidence Intervals
26:18
Example 2: Confidence Intervals
29:46
Example 3: Confidence Intervals
32:18
Example 4: Confidence Intervals
38:31
t Distributions

1h 2m 6s

Intro
0:00
0:04
0:05
When to Use z vs. t?
1:07
When to Use z vs. t?
1:08
What is z and t?
3:02
z-score and t-score: Commonality
3:03
z-score and t-score: Formulas
3:34
z-score and t-score: Difference
5:22
Why not z? (Why t?)
7:24
Why not z? (Why t?)
7:25
But Don't Worry!
15:13
Gosset and t-distributions
15:14
Rules of t Distributions
17:05
t-distributions are More Normal as n Gets Bigger
17:06
t-distributions are a Family of Distributions
18:55
Degrees of Freedom (df)
20:02
Degrees of Freedom (df)
20:03
t Family of Distributions
24:07
t Family of Distributions: df = 2, 4, and 60
24:08
df = 60
29:16
df = 2
29:59
How to Find It?
31:01
'Student's t-distribution' or 't-distribution'
31:02
Excel Example
33:06
Example 1: Which Distribution Do You Use? Z or t?
45:26
47:41
Example 3: t Distributions
52:15
Example 4: t Distributions, Confidence Interval, and Mean
55:59
Introduction to Hypothesis Testing

1h 6m 33s

Intro
0:00
0:06
0:07
Issues to Overcome in Inferential Statistics
1:35
Issues to Overcome in Inferential Statistics
1:36
What Happens When We Don't Know What the Population Looks Like?
2:57
How Do We Know whether a Sample is Sufficiently Unlikely?
3:43
Hypothesizing a Population
6:44
Hypothesizing a Population
6:45
Null Hypothesis
8:07
Alternative Hypothesis
8:56
Hypotheses
11:58
Hypotheses
11:59
Errors in Hypothesis Testing
14:22
Errors in Hypothesis Testing
14:23
Steps of Hypothesis Testing
21:15
Steps of Hypothesis Testing
21:16
Single Sample HT (When Sigma Available)
26:08
26:09
Step 1
27:08
Step 2
27:58
Step 3
28:17
Step 4
32:18
Single Sample HT (When Sigma Not Available)
36:33
36:34
Step 1: Hypothesis Testing
36:58
Step 2: Significance Level
37:25
Step 3: Decision Stage
37:40
Step 4: Sample
41:36
Sigma and p-value
45:04
Sigma and p-value
45:05
One-Tailed vs. Two-Tailed Hypotheses
45:51
Example 1: Hypothesis Testing
48:37
Example 2: Heights of Women in the US
57:43
Example 3: Select the Best Way to Complete This Sentence
1:03:23
Confidence Intervals for the Difference of Two Independent Means

55m 14s

Intro
0:00
0:14
0:15
One Mean vs. Two Means
1:17
One Mean vs. Two Means
1:18
Notation
2:41
A Sample! A Set!
2:42
Mean of X, Mean of Y, and Difference of Two Means
3:56
SE of X
4:34
SE of Y
6:28
Sampling Distribution of the Difference between Two Means (SDoD)
7:48
Sampling Distribution of the Difference between Two Means (SDoD)
7:49
Rules of the SDoD (similar to CLT!)
15:00
Mean for the SDoD Null Hypothesis
15:01
Standard Error
17:39
When can We Construct a CI for the Difference between Two Means?
21:28
Three Conditions
21:29
Finding CI
23:56
One Mean CI
23:57
Two Means CI
25:45
Finding t
29:16
Finding t
29:17
Interpreting CI
30:25
Interpreting CI
30:26
Better Estimate of s (s pool)
34:15
Better Estimate of s (s pool)
34:16
Example 1: Confidence Intervals
42:32
Example 2: SE of the Difference
52:36
Hypothesis Testing for the Difference of Two Independent Means

50m

Intro
0:00
0:06
0:07
The Goal of Hypothesis Testing
0:56
One Sample and Two Samples
0:57
Sampling Distribution of the Difference between Two Means (SDoD)
3:42
Sampling Distribution of the Difference between Two Means (SDoD)
3:43
Rules of the SDoD (Similar to CLT!)
6:46
Shape
6:47
Mean for the Null Hypothesis
7:26
Standard Error for Independent Samples (When Variance is Homogeneous)
8:18
Standard Error for Independent Samples (When Variance is not Homogeneous)
9:25
Same Conditions for HT as for CI
10:08
Three Conditions
10:09
Steps of Hypothesis Testing
11:04
Steps of Hypothesis Testing
11:05
Formulas that Go with Steps of Hypothesis Testing
13:21
Step 1
13:25
Step 2
14:18
Step 3
15:00
Step 4
16:57
Example 1: Hypothesis Testing for the Difference of Two Independent Means
18:47
Example 2: Hypothesis Testing for the Difference of Two Independent Means
33:55
Example 3: Hypothesis Testing for the Difference of Two Independent Means
44:22
Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

1h 14m 11s

Intro
0:00
0:09
0:10
The Goal of Hypothesis Testing
1:27
One Sample and Two Samples
1:28
Independent Samples vs. Paired Samples
3:16
Independent Samples vs. Paired Samples
3:17
Which is Which?
5:20
Independent SAMPLES vs. Independent VARIABLES
7:43
Independent SAMPLES vs. Independent VARIABLES
7:44
T-tests Always…
10:48
T-tests Always…
10:49
Notation for Paired Samples
12:59
Notation for Paired Samples
13:00
Steps of Hypothesis Testing for Paired Samples
16:13
Steps of Hypothesis Testing for Paired Samples
16:14
Rules of the SDoD (Adding on Paired Samples)
18:03
Shape
18:04
Mean for the Null Hypothesis
18:31
Standard Error for Independent Samples (When Variance is Homogeneous)
19:25
Standard Error for Paired Samples
20:39
Formulas that go with Steps of Hypothesis Testing
22:59
Formulas that go with Steps of Hypothesis Testing
23:00
Confidence Intervals for Paired Samples
30:32
Confidence Intervals for Paired Samples
30:33
Example 1: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
32:28
Example 2: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
44:02
Example 3: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
52:23
Type I and Type II Errors

31m 27s

Intro
0:00
0:18
0:19
Errors and Relationship to HT and the Sample Statistic?
1:11
Errors and Relationship to HT and the Sample Statistic?
1:12
7:00
One Sample t-test: Friends on Facebook
7:01
Two Sample t-test: Friends on Facebook
13:46
Usually, Lots of Overlap between Null and Alternative Distributions
16:59
Overlap between Null and Alternative Distributions
17:00
How Distributions and 'Box' Fit Together
22:45
How Distributions and 'Box' Fit Together
22:46
Example 1: Types of Errors
25:54
Example 2: Types of Errors
27:30
Example 3: What is the Danger of the Type I Error?
29:38
Effect Size & Power

44m 41s

Intro
0:00
0:05
0:06
Distance between Distributions: Sample t
0:49
Distance between Distributions: Sample t
0:50
Problem with Distance in Terms of Standard Error
2:56
Problem with Distance in Terms of Standard Error
2:57
Test Statistic (t) vs. Effect Size (d or g)
4:38
Test Statistic (t) vs. Effect Size (d or g)
4:39
Rules of Effect Size
6:09
Rules of Effect Size
6:10
Why Do We Need Effect Size?
8:21
Tells You the Practical Significance
8:22
HT can be Deceiving…
10:25
Important Note
10:42
What is Power?
11:20
What is Power?
11:21
Why Do We Need Power?
14:19
Conditional Probability and Power
14:20
Power is:
16:27
Can We Calculate Power?
19:00
Can We Calculate Power?
19:01
How Does Alpha Affect Power?
20:36
How Does Alpha Affect Power?
20:37
How Does Effect Size Affect Power?
25:38
How Does Effect Size Affect Power?
25:39
How Does Variability and Sample Size Affect Power?
27:56
How Does Variability and Sample Size Affect Power?
27:57
How Do We Increase Power?
32:47
Increasing Power
32:48
Example 1: Effect Size & Power
35:40
Example 2: Effect Size & Power
37:38
Example 3: Effect Size & Power
40:55
Section 11: Analysis of Variance
F-distributions

24m 46s

Intro
0:00
0:04
0:05
Z- & T-statistic and Their Distribution
0:34
Z- & T-statistic and Their Distribution
0:35
F-statistic
4:55
The F Ratio (the Variance Ratio)
4:56
F-distribution
12:29
F-distribution
12:30
s and p-value
15:00
s and p-value
15:01
Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity?
18:33
Example 2: F-distributions
19:29
Example 3: F-distributions and Heights
21:29
ANOVA with Independent Samples

1h 9m 25s

Intro
0:00
0:05
0:06
The Limitations of t-tests
1:12
The Limitations of t-tests
1:13
Two Major Limitations of Many t-tests
3:26
Two Major Limitations of Many t-tests
3:27
Ronald Fisher's Solution… F-test! New Null Hypothesis
4:43
Ronald Fisher's Solution… F-test! New Null Hypothesis (Omnibus Test - One Test to Rule Them All!)
4:44
Analysis of Variance (ANoVA) Notation
7:47
Analysis of Variance (ANoVA) Notation
7:48
Partitioning (Analyzing) Variance
9:58
Total Variance
9:59
Within-group Variation
14:00
Between-group Variation
16:22
Time out: Review Variance & SS
17:05
Time out: Review Variance & SS
17:06
F-statistic
19:22
The F Ratio (the Variance Ratio)
19:23
S²bet = SSbet / dfbet
22:13
What is This?
22:14
How Many Means?
23:20
So What is the dfbet?
23:38
So What is SSbet?
24:15
S²w = SSw / dfw
26:05
What is This?
26:06
How Many Means?
27:20
So What is the dfw?
27:36
So What is SSw?
28:18
Chart of Independent Samples ANOVA
29:25
Chart of Independent Samples ANOVA
29:26
Example 1: Who Uploads More Photos: Unknown Ethnicity, Latino, Asian, Black, or White Facebook Users?
35:52
Hypotheses
35:53
Significance Level
39:40
Decision Stage
40:05
Calculate Samples' Statistic and p-Value
44:10
Reject or Fail to Reject H0
55:54
Example 2: ANOVA with Independent Samples
58:21
Repeated Measures ANOVA

1h 15m 13s

Intro
0:00
0:05
0:06
The Limitations of t-tests
0:36
Who Uploads more Pictures and Which Photo-Type is Most Frequently Used on Facebook?
0:37
ANOVA (F-test) to the Rescue!
5:49
Omnibus Hypothesis
5:50
Analyze Variance
7:27
Independent Samples vs. Repeated Measures
9:12
Same Start
9:13
Independent Samples ANOVA
10:43
Repeated Measures ANOVA
12:00
Independent Samples ANOVA
16:00
Same Start: All the Variance Around Grand Mean
16:01
Independent Samples
16:23
Repeated Measures ANOVA
18:18
Same Start: All the Variance Around Grand Mean
18:19
Repeated Measures
18:33
Repeated Measures F-statistic
21:22
The F Ratio (The Variance Ratio)
21:23
S²bet = SSbet / dfbet
23:07
What is This?
23:08
How Many Means?
23:39
So What is the dfbet?
23:54
So What is SSbet?
24:32
S² resid = SS resid / df resid
25:46
What is This?
25:47
So What is SS resid?
26:44
So What is the df resid?
27:36
SS subj and df subj
28:11
What is This?
28:12
How Many Subject Means?
29:43
So What is df subj?
30:01
So What is SS subj?
30:09
SS total and df total
31:42
What is This?
31:43
What is the Total Number of Data Points?
32:02
So What is df total?
32:34
So What is SS total?
32:47
Chart of Repeated Measures ANOVA
33:19
Chart of Repeated Measures ANOVA: F and Between-samples Variability
33:20
Chart of Repeated Measures ANOVA: Total Variability, Within-subject (case) Variability, Residual Variability
35:50
Example 1: Which is More Prevalent on Facebook: Tagged, Uploaded, Mobile, or Profile Photos?
40:25
Hypotheses
40:26
Significance Level
41:46
Decision Stage
42:09
Calculate Samples' Statistic and p-Value
46:18
Reject or Fail to Reject H0
57:55
Example 2: Repeated Measures ANOVA
58:57
Example 3: What's the Problem with a Bunch of Tiny t-tests?
1:13:59
Section 12: Chi-square Test
Chi-Square Goodness-of-Fit Test

58m 23s

Intro
0:00
0:05
0:06
Where Does the Chi-Square Test Belong?
0:50
Where Does the Chi-Square Test Belong?
0:51
A New Twist on HT: Goodness-of-Fit
7:23
HT in General
7:24
Goodness-of-Fit HT
8:26
12:17
Null Hypothesis
12:18
Alternative Hypothesis
13:23
Example
14:38
Chi-Square Statistic
17:52
Chi-Square Statistic
17:53
Chi-Square Distributions
24:31
Chi-Square Distributions
24:32
Conditions for Chi-Square
28:58
Condition 1
28:59
Condition 2
30:20
Condition 3
30:32
Condition 4
31:47
Example 1: Chi-Square Goodness-of-Fit Test
32:23
Example 2: Chi-Square Goodness-of-Fit Test
44:34
Example 3: Which of These Statements Describe Properties of the Chi-Square Goodness-of-Fit Test?
56:06
Chi-Square Test of Homogeneity

51m 36s

Intro
0:00
0:09
0:10
Goodness-of-Fit vs. Homogeneity
1:13
Goodness-of-Fit HT
1:14
Homogeneity
2:00
Analogy
2:38
5:00
Null Hypothesis
5:01
Alternative Hypothesis
6:11
Example
6:33
Chi-Square Statistic
10:12
Same as Goodness-of-Fit Test
10:13
Set Up Data
12:28
Setting Up Data Example
12:29
Expected Frequency
16:53
Expected Frequency
16:54
Chi-Square Distributions & df
19:26
Chi-Square Distributions & df
19:27
Conditions for Test of Homogeneity
20:54
Condition 1
20:55
Condition 2
21:39
Condition 3
22:05
Condition 4
22:23
Example 1: Chi-Square Test of Homogeneity
22:52
Example 2: Chi-Square Test of Homogeneity
32:10
Section 13: Overview of Statistics
Overview of Statistics

18m 11s

Intro
0:00
0:07
0:08
The Statistical Tests (HT) We've Covered
0:28
The Statistical Tests (HT) We've Covered
0:29
Organizing the Tests We've Covered…
1:08
One Sample: Continuous DV and Categorical DV
1:09
Two Samples: Continuous DV and Categorical DV
5:41
More Than Two Samples: Continuous DV and Categorical DV
8:21
The Following Data: OK Cupid
10:10
The Following Data: OK Cupid
10:11
Example 1: Weird-MySpace-Angle Profile Photo
10:38
Example 2: Geniuses
12:30
Example 3: Promiscuous iPhone Users
13:37
Example 4: Women, Aging, and Messaging
16:07

0 answers

Post by Priscila Silva on February 16, 2013

I'm attending college, and I'm studying statistics this semester. I was desperate because so far, my grade is too low, and so are the grades of all the other students by what I've heard. I never had any problem with math or any other subject that I couldn't manage to put a little more effort in order to grasp the content. I knew the problem had to be the professor; the way she delivers the information is almost impossible for us to understand because it's all new to us. I never had statistics in my life. So far I thought it was the most terrible subject on Earth. I'm on my 6th week at college this semester and what I couldn't understand within 15 hours of lecture, I understood in 10 minutes here. I knew this website was fantastic!!! Thanks a lot! I will tell everybody in my class.

### Scatterplots

Lecture Slides are screen-captured images of important points in the lecture. Students can download and print out these lecture slide images to do practice problems as well as take notes while watching the lecture.

• Intro 0:00
• Previous Visualizations 0:30
• Frequency Distributions
• Compare & Contrast 2:26
• Frequency Distributions Vs. Scatterplots
• Summary Values 4:53
• Shape
• Center & Trend
• Univariate & Bivariate
• Example Scatterplot 10:48
• Shape, Trend, and Strength
• Positive and Negative Association 14:05
• Positive and Negative Association
• Linearity, Strength, and Consistency 18:30
• Linearity
• Strength
• Consistency
• Summarizing a Scatterplot 22:58
• Summarizing a Scatterplot
• Example 1: Gapminder.org, Income x Life Expectancy 26:32
• Example 2: Gapminder.org, Income x Infant Mortality 36:12
• Example 3: Trend and Strength of Variables 40:14
• Example 4: Trend, Strength and Shape for Scatterplots 43:27

### Transcription: Scatterplots

Hi and welcome to www.educator.com0000

Today we are going to be talking about scatterplots.0002

First we are going to talk about how scatterplots are different from previous visualizations.0007

Because of that I will go over a little bit about what the previous visualizations all have in common.0013

Then I will compare and contrast this with scatterplots.0018

Finally we are going to go on to describing the different aspects of scatterplots versus the other distributions we have been talking about before.0023

The previous distributions we have been talking about have largely been frequency distributions.0030

There we were talking about one continuous variable like height or age or number of friends on www.facebook.com.0039

We are asking how frequent is this value?0049

How frequent is it to have 200 friends on www.facebook.com?0052

How frequent is it to be 6 feet tall?0057

Now the frequency distribution looks like there are two variables because of the x-axis and y-axis.0061

But it is really one variable, height, and the frequency, which is just a count of how many cases have each value.0069
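
As a minimal sketch of that idea, the snippet below builds a frequency distribution for a single variable. The heights are made-up values for illustration, not data from the lecture, and `Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical heights in inches for eight people (made-up values).
heights = [66, 70, 72, 66, 68, 72, 72, 70]

# A frequency distribution counts how often each value of ONE variable occurs.
frequency = Counter(heights)

# x-axis: the value of the variable; y-axis: how many cases have that value.
for value in sorted(frequency):
    print(value, frequency[value])
```

Only `heights` is a real variable here; the second axis is just the count.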

We have looked at some cases where we compare two different variables, but usually that meant comparing two groups on some continuous variable.0080

We might compare male and female heights, but we are only looking at one continuous variable height.0091

The other variable is a categorical variable, the two groups.0099

That is how we get the two groups, like gender.0103

Even so, the fundamental basis is that we have only been looking at one continuous variable at a time.0107

Now these frequency distributions we have drawn, like histograms with different bins, often look like this, and they are summarized by shape, center, and spread.0116

Usually by center we mean something like mean and by spread we mean something like standard deviation.0129

Now that is going to be sort of the past.0140

Now we are moving on to scatterplots.0144

Here is how scatterplots are different.0148

Instead of having one continuous variable we have two continuous variables.0150

That is the big key difference.0155

And because of that one axis is going to have variable 1 and instead of putting frequency here we are going to put variable 2 here.0157

In frequency distributions we had variable 1 here and the frequency of variable 1 here.0172

Notice here there is no explicit representation of frequency.0179

There is no number that we are plotting to represent frequency; each axis is going to represent a variable.0182

We are summarizing these distributions by shape, trend, and strength.0193

Here it can be called a scatterplot because each case is now going to appear as a dot.0201

Each case, for instance each person on www.facebook.com, might have variable 1, number of friends, as well as variable 2:0210

how many photos they have uploaded.0220

Each dot represents one person, but 2 values for that person.0223
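
A small sketch of what "one dot, two values" means in data terms. The three people and their numbers are hypothetical, chosen only to show the structure.

```python
# Hypothetical cases: each person has TWO values (made-up numbers).
people = [
    {"name": "A", "friends": 200, "photos": 35},
    {"name": "B", "friends": 850, "photos": 410},
    {"name": "C", "friends": 120, "photos": 12},
]

# One dot per case: x comes from variable 1, y comes from variable 2.
dots = [(person["friends"], person["photos"]) for person in people]

print(dots)
```

There is no count anywhere in `dots`: each pair is one case, and both axes carry a variable.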

And because of that, it is called a scatterplot: it looks like somebody just scattered a bunch of dots on this graph.0232

It makes sense that it is called a scatterplot, and notice that here it does not quite seem like we are interested in the center of one dimension.0242

We are interested in the center of two dimensions.0254

Because of that, trend is going to be sort of like center and strength is going to be a lot like spread.0258

These are concepts you have thought about before, but we are translating them from one dimension to 2 dimensions.0273

Spread used to be something like this but now we are talking about spread in 2 dimensions.0284

That is going to look a little bit different.0291

Let us talk about these particular summary values of shape, center, and spread.0296

Remember frequency distributions are always what we call univariate in terms of continuous variables.0301

It might have 2 variables but one will probably be categorical.0309

When we talked about shape we talked about it being unimodal, symmetric, or asymmetric.0315

Those are common features that we are looking for.0329

Is it normal, uniform?0332

Those are the buzzwords we used when we talked about shape in frequency distributions.0336

In scatterplots we are largely interested in spotting different shapes.0342

One shape might be that the dots sort of fall in a line.0347

Is it a linear scatterplot?0353

Another potential distribution is that the dots might fall in a curve.0358

Is it curvilinear?0366

A third shape that we are interested in is that there is just sort of no shape, and it is sort of blob-like or cloudlike.0370

A blob or a cloud may be one way of thinking about it.0383

Those are different shapes that we are interested in.0391

How linear is it? Is it curvilinear, is it cloud?0397

The way we talked about center before was that we are interested in things like the mean, median, and mode.0404

A lot of times we used mean.0413

It is signified by either mu (μ) or x-bar (x̄), depending on whether you are in the population or the sample.0415

Trend has sort of the same idea as center.0426

You could think about this as a version of center, except the center of two variables not just one.0432

Here it is not useful to have a center that is just a dot, because that is what we had before.0444

It was a particular point, but now let us say we have a whole bunch of dots scattered here.0457

What we might be more interested in is a line of some sort that describes the relationship between all these points on these two variables.0466

Here we are not just interested in a point center, we are interested in a line center.0478

I’m going to adjust this to be a line center rather than a point center.0486

The official term for that line is regression.0493

That is going to be the regression line.0500
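
The lecture does not say here how the regression line is computed. A common choice is the least-squares line, so the sketch below assumes that method. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Assumption: "regression line" is taken to mean the ordinary
    least-squares line; the lecture defines it precisely later.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f} * x")
```

The line always passes through the point (mean of x, mean of y), which is one way to see it as a two-dimensional version of the center.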

The final idea that we talk about here is strength and I want to tie that to the idea of spread that we talked about.0504

One important idea of spread that we frequently talked about was standard deviation, expressed as sigma (σ) or as s.0517

Those are two ways we have talked about spread before and that gives you a one-dimensional spread,0536

but what might be more useful here is something like two-dimensional spread.0542

Here we have our dots, we have our line, but now we are interested in how spread out these dots are.0549

You could think about it as all these little distances away from the line.0560

What is that spread away from the line like?0568

You want to think of this as a multidimensional spread.0573

It is not just the one-dimensional spread, it is a two-dimensional spread.0576

Before this was spread around the points, but now it is spread around the whole line.0591

We are going to call that correlation.0598

If it is very strong, it means the dots hug that line really closely.0602

That is a strong correlation where it hugs it closely.0607

A weak correlation means it is hazy, far out and spread out from that line, and a moderate correlation means there is a little bit of spread, but not too much.0611
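
One standard way to put a number on how closely the dots hug a line is Pearson's correlation coefficient r. The lecture has not given a formula at this point, so treat this as a sketch under that assumption, with made-up data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
tight = [2, 4, 6, 8, 10]  # every dot sits exactly on a line
loose = [2, 4, 5, 4, 5]   # same upward trend, more spread around it

r_tight = pearson_r(xs, tight)
r_loose = pearson_r(xs, loose)
print(r_tight, r_loose)
```

The closer r is to 1 or -1, the tighter the dots sit around the line; values near 0 correspond to the cloudlike case.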

And all of that has changed because now we are talking about bivariate distributions.0626

And when we are talking about bivariate data we are no longer just interested in points and spread around the point, we are interested in things like lines.0632

Here is an example of a scatter plot.0651

Remember the data that we have looked at in the past with 100 friends on www.facebook.com.0653

Here we wanted to look at whether the number of friends people have corresponds to their year of birth.0661

Now this is not saying that there are lots of people necessarily born in 1997.0669

This is not what this means, it means that this dot is actually a particular person.0677

It is a case of one person and this means that this one person was born in 1993, but they also have an inordinate number of friends.0685

They have like 1900 friends.0712

This scatter plot means that you can no longer interpret this as being a very popular year to be born.0717

Now you have to say this particular person was born in this year and has this number of friends.0724

If you look at another point, like this one here, they have very few friends but they were born in 1978 or so.0732

One thing you might notice about this is that there is sort of a shape here.0744

It seems to rise.0750

If we drew some sort of line that would cut through this, maybe it would be a line like this, where as the year of birth increases, these people are younger.0756

They were born more recently in history.0777

They seem to have more friends.0781

If you drew a curve, that might be better, where it seems like the people born in 1985 have fewer friends than the people who were born afterwards.0784

That seems to shoot out more.0797

This is an example of a scatter plot, and these lines are examples of rough lines that might be regression lines.0802

Lines that fall in the middle of all these points.0811

It is where these points are roughly above it.0815

These points are roughly below it.0819

And if you add up all the distances, they average out around that line.0821

Let us think about that as the trend, and then there is the strength.0828

It seems like a moderate strength.0836

It is not hugging the line quite closely, but it is not just a blob either.0838

Usually we do not plot things by year of birth, though sometimes we do, but you could easily change year of birth to age by just using 2011 minus whatever the year of birth was.0849

Here I have age plotted on the x axis and here I have year of birth plotted on the x axis.0863

On the y axis on both of these plots is the number of friends on www.facebook.com.0871

These are scatter plots and you could know that just by looking at these variables.0875

If one of them says frequency, then you know it is not a scatter plot.0881

If they are both variables then you know it is a scatter plot.0885

Here we see this positive association.0889

The higher the year of birth, the higher the number of friends.0894

As one variable gets greater the other variable increases, and vice versa.0900

As the year of birth gets less, the number of friends seems to be quite low.0908

On the other hand, if you look at this graph we see an exact opposite trend, where as age goes up the number of friends comes down.0917

They are moving in opposite directions: as one goes up the other one goes down, and vice versa; as you go this way on the x axis you will see friends going up.0933

This is what we call a positive association, where the variables are coupled to each other in the same direction.0952

When we plot age instead of year of birth we see a negative association.0972

In a negative association the variables go up or down in ways that are opposite to each other; negative means opposite.0980

As one goes up, the other one goes down.0992

It is important to know that these are just associations.0997

It is not that the year of birth is causing them to have more friends.1000

Maybe there are some other variables that matter, like when you were introduced to www.facebook.com, something like that.1005

Or how comfortable you are using the computer.1016

Just having a positive association does not mean that it is a causal association, and that is where you get that saying: correlation does not equal causation.1019

Because another word for association is correlation.1029

Just because you have this nice association, either positive or negative, it does not mean that one causes the other.1035

Let us think about why year of birth behaves the opposite way from age.1046

It has the opposite association with friends on www.facebook.com.1052

If you think about year of birth, it means that when you are increasing the year of birth you are decreasing age.1058

These 2 variables actually have a perfect negative association.1072

As one goes up, the other one goes down.1083

If year of birth goes up, 1991, 1994, 2000, the age is going down and down.1085

That is what we call a negative association.1093

It is not really that one is causing the other, but it is the same idea.1097

They are perfectly negatively associated.1102

Again, correlation or association just does not equal causation.1105
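
The relationship between year of birth and age can be checked with a short sketch. It uses the lecture's conversion (age is 2011 minus year of birth) and assumes Pearson's r as the measure of association. The five people and their friend counts are made-up values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical people (made-up values, not the lecture's data set).
birth_years = [1978, 1985, 1990, 1993, 1997]
friends = [150, 300, 420, 1900, 800]

# The lecture's conversion: age = 2011 - year of birth.
ages = [2011 - year for year in birth_years]

r_birth_age = pearson_r(birth_years, ages)
r_birth_friends = pearson_r(birth_years, friends)
r_age_friends = pearson_r(ages, friends)

print(r_birth_age)                     # perfect negative association
print(r_birth_friends, r_age_friends)  # same size, opposite sign
```

Because age is just year of birth flipped around, swapping one for the other reverses the sign of the association with friends and leaves its strength unchanged.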

Here are some examples of scatter plots that you might see.1113

There are a few different concepts that I will go over; the first of these is the idea of linearity.1119

It is going to be very important to us, and linearity is just about how close to a line the shape is.1126

I want you to know this distinction between linear and curvilinear.1137

When you think of strength, I want you to think of it as if we are talking about spread.1156

It should just come to mind as spread: we have these dots and a little bit of spread around the line.1167

You could think of it as a bunch of distances away from that line.1183

Versus if you had something like this, that would be much more widely spread around this line.1191

There is a lot more spread going on around here, and if I added a few more dots around here, I would have even more spread.1208

I want you to think of it as strong, moderate, weak.1229

You can think of strength in those terms.1238

Finally, I want to introduce the concept of consistency, which we have not been talking about much until now.1241

Consistency just means how consistent is that strength.1247

Is it strong all the way through?1254

Is it weak all the way through?1256

Or is it inconsistent?1257

An example graph looks something like this.1259

This starts off looking very linear, but then down here there might be more variability.1263

Here you could see that if we drew a regression line, here we have very little spread but here we have a lot of spread.1276

This would be inconsistent.1289

It might be constant spread versus inconsistent spread.1295

An example of constant would be something like this.1305

It is pretty constant; this one is less constant because there is this peak right here, but here there is less variability.1309

This is an example of constant and this is an example of inconsistent.1319

You want to think of this consistency as an aspect of strength.1324

Is it consistent all the way through or is it different all the time?1328
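
One rough way to check consistency is to compare how far the dots sit from the line on the left side of the plot versus the right side. The sketch below uses made-up points and an assumed trend line (y = 10 - x, not fitted), and mean absolute distance is just one possible measure of spread.

```python
def mean_abs_residual(points, slope, intercept):
    """Average vertical distance of the points from the line."""
    distances = [abs(y - (slope * x + intercept)) for x, y in points]
    return sum(distances) / len(distances)


# Made-up points: tight around the line at low x, scattered at high x.
points = [
    (1, 9.1), (2, 7.9), (3, 7.1), (4, 5.9),
    (5, 7.0), (6, 2.0), (7, 5.5), (8, 0.5),
]

# Assumed trend line for illustration (y = 10 - x), not a fitted one.
slope, intercept = -1.0, 10.0

left = [p for p in points if p[0] <= 4]
right = [p for p in points if p[0] > 4]

left_spread = mean_abs_residual(left, slope, intercept)
right_spread = mean_abs_residual(right, slope, intercept)

print(left_spread, right_spread)
```

Similar values on both sides suggest constant spread; a large gap, as in this made-up example, is what the lecture calls inconsistent.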

Just to point out something: in all these graphs that are drawn here, just by coincidence, I have drawn a negative association,1333

because as we look at greater values of one variable, the values of the other variable are lower.1342

The values seem to be down low right here, and these are all examples of negative association.1352

An easy way you could visually see this is that they all have this negative trend, where the trend is pointing this way instead of that way.1362

Let us think about how to summarize a scatter plot.1385

A scatter plot has a lot of different features, but here I have broken down how to summarize one.1388

It is divided into 5 steps so that it will be easier for us to work through all of them.1396

The first thing you want to do is identify the cases and the variables.1404

Oftentimes people look at a scatter plot and they see the shape, is it a line?1408

Then they forget what the dots are.1412

It is like seeing the forest but forgetting what the trees are.1413

The first thing you want to do is identify what the cases are and then identify what the variables are, so that you know what you are talking about.1419

Then you want to describe the overall shape and talk about the linearity; if there are any clusters you want to identify those.1428

If there are any outliers you want to be able to identify those as well.1438

Then you want to describe the trend; you could think of this as the positive or negative association.1443

Is the trend in the positive direction or in the negative direction?1451

You could think of this as going that way and that way.1456

Step 4 is to describe the strength.1464

You could just think of it as very strong, medium or moderate, or weak, but we will talk about exactly how to do that later.1466

Step 5 is to give any potential explanations for this relationship.1477

This is just an extra step.1481

Sometimes you might not need this, but it is often helpful to do, and it is critical that you remember the relationship is not always causal.1485

There might be a causal relationship, but not always.1498

Try to give potential explanations that are not causal.1501

Why might they have this positive association?1507

Why might these 2 variables have this negative association?1510

That is going to be important for us to figure out.1514

But it is good for us to think about maybe it is causal but maybe it is not.1519

Those are harder to think of sometimes because you jump to the causal explanation.1526

One thing that might be the case is that some third variable explains the relationship, and it might not be these 2 variables that are important.1532

One final thing: for now we are just going to describe the trend in a sort of overall, rough way.1545

Later we are going to learn how to describe it in a precise, quantitative manner.1556

When we do that, that is going to be called finding the regression line.1561

The precise description is going to be the equation of that line.1570

We are also going to describe the strength roughly but later we are going to find precise quantitative values for strength and that is going to be called correlation.1576
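
To connect the rough verbal labels with a quantitative value, here is a small sketch that turns a correlation value into a trend and a strength label. The cutoffs 0.3 and 0.7 are arbitrary placeholders chosen for illustration; they are not from the lecture, and the function name `describe` is hypothetical.

```python
def describe(r, weak_cutoff=0.3, strong_cutoff=0.7):
    """Return (trend, strength) labels for a correlation value r.

    The cutoffs are illustrative assumptions, not standard definitions.
    """
    if r > 0:
        trend = "positive"
    elif r < 0:
        trend = "negative"
    else:
        trend = "none"

    size = abs(r)
    if size >= strong_cutoff:
        strength = "strong"
    elif size >= weak_cutoff:
        strength = "moderate"
    else:
        strength = "weak"
    return trend, strength


print(describe(0.85))
print(describe(-0.5))
print(describe(0.1))
```

The sign of r carries the trend (step 3) and its size carries the strength (step 4); shape and explanations still have to come from looking at the plot and thinking about the variables.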

Let us move on to some more examples.1594

First, here is a graph, and what I want you to do is just go through those 5 steps of summarizing a scatter plot.1597

Remember, the first step is to describe the cases and the variables.1608

I’m going to introduce you to gapminder.org.1625

It is a beautiful website that puts together different databases of interesting information from all over the world1628

and puts it in beautiful graphs so that we can look at the data in new and very interesting ways.1638

Here you could go to gapminder.org if you want; I have already pulled it up on my browser.1646

You want to click on Gapminder World.1658

If you want to follow along you could do that too.1671

I want to show you this graph; what we have on the bottom is income per person.1673

GDP per capita.1681

This is the entire amount of money that the economy of that nation makes divided by how many people are part of that nation.1683

That is income per person on the x axis.1697
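
The definition just given is a single division. A tiny sketch, with a made-up country and a hypothetical function name:

```python
def income_per_person(total_gdp, population):
    """GDP per capita: the nation's total output divided by its population."""
    return total_gdp / population


# Made-up numbers for illustration: a 2 billion economy with 400,000 people.
example = income_per_person(2_000_000_000, 400_000)
print(example)
```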

Notice that it is on a log scale, which means the lower numbers are spread out and the higher numbers are squished together, because they have taken the log of the income per person.1701
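
A short sketch of what taking the log does to an axis. The income values are made up, and base 10 is an assumption chosen for readability; the lecture only says the axis is a log of income per person.

```python
import math

# Made-up incomes, each ten times the previous one.
incomes = [100, 1_000, 10_000, 100_000]

# Position of each income on a log (base 10) axis.
positions = [math.log10(value) for value in incomes]

# On a log axis, each tenfold jump takes up the same distance.
gaps = [right - left for left, right in zip(positions, positions[1:])]
print(gaps)

# A 500-unit gap at the low end takes the same room as a 50,000-unit gap
# at the high end, which is why high incomes look squished together.
low_gap = math.log10(1_000) - math.log10(500)
high_gap = math.log10(100_000) - math.log10(50_000)
print(low_gap, high_gap)
```

Equal distances on this axis mean equal ratios of income, not equal differences.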

Also here we have life expectancy so that is the average number of years that people live in this country.1716

Step 1, what are the cases?1726

These cases actually represent countries and if you put your cursor over these dots it will tell you what country it is.1728

In Gapminder, one nice thing is that you know the population of that country just by the size of the dot.1738

All the dots are different sizes.1746

Here it tells you the geographic regions.1751

Yellow is the Americas.1754

Red is East Asia.1757

Violet is Africa.1761

Green is Middle East and Northern Africa.1765

Orange is Europe and Central Asia.1769

You could probably guess what these big ones are: China and also India.1773

These are the big circles.1780

If you want to find the United States, it is a yellow country and quite rich.1781

We live quite a long time.1787

That is the United States.1790

If you look behind it there is Singapore, which is a very small country, but they are very rich with a high GDP per capita.1792

High income per person but also high in life expectancy.1801

India is in the middle of the plot; it is medium in terms of income but also medium in terms of life expectancy.1807

One thing you might notice is that Africa is clustered down here, where these countries have relatively lower income per person.1818

Also relatively low life expectancy compared to these other countries.1830

You could also see that Europe is clustered down here.1834

America is up in there.1840

Asia is also up in the higher end of this.1843

Immediately you see this positive association.1850

You see this positive association and it seems roughly linear, or maybe a little bit curved, but roughly linear.1855

Another wonderful thing about gapminder.org, which we will not be talking about much today, is that it has this data from 1800 all the way to 2000.1857

If you hit this play button, it will play for you how this scatter plot came about over time.1862

You see a lot of countries started off with very low population numbers.1884

All the dots are relatively small.1891

But as the dots grow, GDP grows, and at the same time life expectancy grows.1893

You will see that European countries are ahead of the pack.1900

Africa is down here.1906

China is growing faster and faster in terms of life expectancy and the GDP is catching up with it.1908

Finally we end up with 2009, which is apparently as far as this data goes.1916

Another thing that you could do with this visualization is that you can pick a particular country that you might be interested in.1922

Let us say we are very interested in Azerbaijan and we can look at just how Azerbaijan changes over time and1934

it will keep a running track of how Azerbaijan is growing in terms of their GDP as well as their life expectancy.1942

This is just a really wonderful graph and you could do a lot of different kinds of variables for these nations.1953

But let us answer our five questions and get back to more statistical things.1962

The first thing is that these are nations that are represented and it is their income per person, by life expectancy.1967

That is the first thing.1987

We have figured out what the cases are and what the variables are.1991

The second thing is the shape.1994

These are roughly linear and maybe a little bit curved.1997

We have seen some geographic clusters, but not really clusters in terms of the actual shape.2002

If you thought of these as just blacked out, they would create roughly this line.2008

Let us talk about the trend.2016

The trend definitely seems to be a positive association, so as GDP is greater, life expectancy is also greater.2021

As GDP is lower, life expectancy is also lower.2033

They are related in a positive way, meaning not opposite.2038

Let us talk about spread and we can imagine a line and see some of the spread and maybe this is a moderate spread2042

and it is not that you see a perfect line, but it is not like so spread out you cannot see the line either.2054

Maybe moderate might be a good answer for that.2059
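
If you wanted to put a number on trend and spread instead of eyeballing them, one common choice is the correlation coefficient r. Here is a minimal sketch in Python. The income and life expectancy values are made up for illustration, not the real data from the plot, and the cutoffs for weak, moderate, and strong are just assumed rules of thumb.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Turn r into a (trend, strength) pair of words."""
    trend = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    # These cutoffs are illustrative assumptions, not a standard.
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return trend, strength


# Made-up values for six hypothetical nations.
income = [1, 2, 4, 8, 16, 32]              # thousands of dollars per person
life_expectancy = [50, 58, 57, 68, 66, 79]  # years

r = pearson_r(income, life_expectancy)
print(round(r, 2), describe(r))
```

The sign of r gives the trend, and how close it is to 1 or -1 gives the strength. Keep in mind that r only summarizes the plot well when the shape is roughly linear.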

And number five let us think of why that must be.2063

We want to consider a couple of things when we think about the relationship between these two variables.2067

We want to think about how variable 1 might impact variable 2.2077

We also want to think about how variable 2 might impact variable 1.2082

Finally, we might want to think about how some third variable, variable 3, the mystery variable might impact both 1 and 2.2090

We might think if you have a higher income per person, if you are a richer country you have better health care, better facility, better sanitation.2101

You might have greater life expectancy.2113

Also if you have greater life expectancy, you could invest in more education and more long-term things, and because of that might increase income per person.2116

You might be able to share more cultural capital.2124

Who knows right?2133

There might be some third variable.2133

Maybe there is a government that governs so well that you have this great income and also good health care or other things that lead to great life expectancy.2135

It might be different things, or maybe some countries have more war.2146

If you have a lot of war, your economy suffers, but also life expectancy suffers.2150

That is like a third variable.2157

It might be a whole bunch of different things.2160

It is such a long answer that we will not write it down.2164

You want to think about all the different ways that they might impact each other.2166
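
To see how a third variable could do this, here is a small simulation sketch. Everything in it is hypothetical: a made-up variable z (call it quality of government) pushes both x and y up, while x and y never affect each other directly.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the sketch is repeatable
n = 2000

# z is the hypothetical third variable; x and y each depend on z plus their own noise.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

# x and y are associated even though neither one causes the other.
r_xy = pearson_r(x, y)

# Take z's contribution back out and the association should mostly vanish.
x_left = [xi - zi for xi, zi in zip(x, z)]
y_left = [yi - zi for yi, zi in zip(y, z)]
r_left = pearson_r(x_left, y_left)

print(round(r_xy, 2), round(r_left, 2))
```

In this setup the correlation between x and y should come out near 0.5, and the leftover correlation should be near 0. A scatter plot of x against y alone could not tell you that z was behind it.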

Here we have almost the same idea, but now we plotted income per person by infant mortality and infant mortality is the rate of infant deaths.2175

Infants are counted as children under the age of one.2188

It is how many infant deaths you have per 1,000 births.2193

Here we see immediately that it seems pretty linear.2200

It seems to hug the line pretty closely.2206

This seems to be a negative association.2217

Remember negative in this case means opposite.2223

As one goes up, the other one goes in the opposite direction.2227

As income goes up, infant mortality goes down.2231

Countries that are very wealthy have very low rates of infant mortality.2234

They are not losing a lot of infants.2239

Countries that are more poor where their income is very low they have a higher rate of infant mortality.2244

So that is what we think of a negative association.2253

Sorry, I skipped ahead to step three.2256

Step 2 is shape and I am going to write just linear because this seems to be even more linear than before.2259

Let us say this is moderate to strong because you could clearly see that line and let us think about why there might be the negative association.2267

You could think of infant mortality as the opposite of life expectancy.2283

With life expectancy, the greater the number, the better the health.2289

Infant mortality, the lower the number the better the health.2294

Those two ideas are negatively associated with each other.2299

It makes sense that they would have the exact opposite relationship to income per person.2304

Once again we might want to think about how variable 1 impacts variable 2.2312

How variable 2 impacts variable 1.2320

And also how a third variable might impact both one and two.2323

Income might lead to better healthcare and better prenatal care, and that might lead to lower infant mortality.2328

Also having less infant mortality might somehow help the economy.2339

For instance, growth in the population often helps economies grow: the workforce grows, there are more jobs, and2350

there are more things to serve more people. But there might also be a third variable again, like war2357

or disease, and that might increase infant mortality and reduce income per person.2366

Those kinds of things are what you want to think about for answer number five.2373

I think this dotminder.org data set is just really interesting.2380

You might want to play with the different kinds of axes that you can create, and you could create these wonderful scatter plots from real data from the world.2386

You could get women's education and you can look at other aspects of the economy or health or public policy.2400

You can look at war; you can get a whole bunch of different things.2410

Example 3, we are back to more mundane kinds of statistics problems: would you expect these variables to have a positive or negative relationship or trend?2416

Would you expect weak strength or strong strength?2430

Well, these are sort of online and let us think about the case of chicken eggs.2433

Each chicken egg has a length and a width.2440

By length, let us mean the axis of elongation.2443

That would be like this length and then for the chicken egg this is the width.2451

Will these have a positive association or a negative association?2459

Well, I imagine that with chicken eggs you might have bigger chicken eggs or smaller chicken eggs.2464

There might be chicken eggs that are skinny or fat, or larger versions.2471

I imagine that length and width are positively associated.2480

I would probably expect maybe a strong strength because chicken eggs are all sort of one shape.2486

I would imagine that these track each other closely.2496

Let us talk about US cars.2502

If our cases were US cars, what would weight and gas mileage look like?2504

Among US cars there is the Hummer, which is very, very high on weight.2520

If we put weight here that might be way up here on the weight part.2527

In terms of gas mileage, the Hummer is not so great.2532

It has relatively low gas mileage.2537

Whereas really tiny little cars weigh less, but maybe they have greater gas mileage.2540

Maybe we would see something like this.2550

If we plotted all the US cars and their gas mileage and here we see something like the negative relationship.2558

They do the opposite of one another as weight goes up gas mileage goes down.2569

As weight goes down gas mileage goes up.2574

I’m not sure how strong that correlation may be.2578

Maybe moderate, or moderate to strong.2582

Maybe strong, just because I can imagine if you are putting more weight because your car is heavy then that might bring down your gas mileage.2588

That might be a strong connection there, but I’m not really sure.2599

I'm going with moderate or moderate to strong.2602

Example 4, describe the trend, the strength, and the shape for the following scatter plots.2610

Let us also throw in consistency just for ourselves.2617

This one seems pretty linear and pretty positive.2627

And strong, pretty strong.2636

It seems very constant that the spread is constant down here and here.2640

It is pretty constant throughout.2647

This one looks like a pretty weak strength.2650

I do not know if it is linear.2658

It looks like a cloud to me and you can draw line but it is really weak association.2660

The trend, it seems more positive than negative because at least there is more up here than down here.2670

If it were negative, we would see more here and here. As for consistency, it is pretty spread out here, but maybe a little bit less spread out here.2682

Maybe sort of consistent, but a little inconsistent.2696

Another one here: this one definitely starts to curve to me.2707

This seems curvilinear.2712

It also seems like a positive association because as x goes up y goes up.2715

It seems pretty strong but maybe a little bit inconsistent because here it seems stronger and it gets a little bit less so down here.2722

It is sort of consistent to me.2733

I can see a line, but I can also see the line curving.2745

Here I see a curve again, but now this is definitely a negative relationship because the low x's have high y's.2750

And the high x's have low y's.2764

It seems moderate or strong.2771

Moderate, or moderate to strong, and it is pretty consistent.2778

I will go with constant.2789

Here we have a quite linear negative and it seems pretty strong too.2793

Let us go with strong, and notice that these are very tight.2812

I’m sort of eyeballing it and it seems consistent.2817

This is just a way to help us eyeball these a little bit better.2824

Get used to seeing them.2829

Get used to seeing some of these features very quickly.2831
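
Consistency can also be checked roughly with numbers. This is a sketch under assumptions: it fits a least-squares line, then compares how spread out the points are around that line in the upper half of x versus the lower half. The function names and the two small data sets are made up for illustration.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def spread_ratio(xs, ys):
    """Spread around the line for high x divided by spread for low x."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))  # order the points by x
    residuals = [y - (slope * x + intercept) for x, y in pairs]
    half = len(residuals) // 2
    low_spread = statistics.pstdev(residuals[:half])
    high_spread = statistics.pstdev(residuals[half:])
    return high_spread / low_spread


xs = list(range(1, 21))

# Hypothetical plot 1: the scatter around the line is the same size everywhere.
constant = [2 * x + (-1) ** x * 1.0 for x in xs]

# Hypothetical plot 2: the scatter fans out as x gets larger.
fan = [2 * x + (-1) ** x * 0.2 * x for x in xs]

print(round(spread_ratio(xs, constant), 2))
print(round(spread_ratio(xs, fan), 2))
```

A ratio near 1 suggests a constant spread, and a ratio well above or below 1 suggests the spread changes across the plot. This only makes sense for a roughly linear shape; for a curved plot the line itself would be a poor reference.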

That is scatter plots.2834

Thanks for using www.educator.com2838
