
### Scatterplots

• Intro
• Previous Visualizations
  • Frequency Distributions
• Compare & Contrast
  • Frequency Distributions Vs. Scatterplots
• Summary Values
  • Shape
  • Center & Trend
  • Univariate & Bivariate
• Example Scatterplot
  • Shape, Trend, and Strength
• Positive and Negative Association
• Linearity, Strength, and Consistency
  • Linearity
  • Strength
  • Consistency
• Summarizing a Scatterplot
• Example 1: Gapminder.org, Income x Life Expectancy
• Example 2: Gapminder.org, Income x Infant Mortality
• Example 3: Trend and Strength of Variables
• Example 4: Trend, Strength and Shape for Scatterplots

### Transcription: Scatterplots

Hi and welcome to www.educator.com.

Today we are going to be talking about scatterplots.

First we are going to talk about how scatterplots are different from previous visualizations.

To do that, I will go over a little bit of what the previous visualizations all have in common.

Then I will compare and contrast them with scatterplots.

Finally we are going to describe the different aspects of scatterplots, versus the other distributions we have been talking about before.

The previous distributions we have been talking about have largely been frequency distributions,

and there we are talking about one continuous variable, like height or age or number of friends on www.facebook.com.

We are asking: how frequent is this value?

How frequent is it to have 200 friends on www.facebook.com?

How frequent is it to be 6 feet tall?

Now the frequency distribution looks like it has two variables, because it has an x-axis and a y-axis.

But it is really one variable, height, plus the frequency, which is just a count of how many cases have each value.

We have looked at some cases where we compare two different variables, but usually that meant comparing two groups on some continuous variable.

We might compare male and female heights, but we are only looking at one continuous variable, height.

The other variable is a categorical variable that defines the two groups.

That is how we get the two groups, like gender.

Although we have done that, the fundamental basis is still that we have only been looking at one continuous variable at a time.

Now these frequency distributions we have drawn, like histograms with different bins, often look like this, and they are summarized by shape, center, and spread.

Usually by center we mean something like the mean, and by spread we mean something like the standard deviation.

Now that is going to be sort of the past.

Now we are moving on to scatterplots.

Here is how scatterplots are different.

Instead of having one continuous variable, we have two continuous variables.

That is the big key difference.

And because of that, one axis is going to have variable 1, and instead of putting frequency on the other axis, we are going to put variable 2 there.

In frequency distributions we had variable 1 on one axis and the frequency of variable 1 on the other.

Notice that here there is no explicit representation of frequency.

There is no number that we plot to represent frequency; each axis represents a variable.
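Here is a minimal Python sketch of that contrast. It assumes a small hypothetical data set of people, each with two variables: number of friends and number of photos uploaded. A frequency distribution collapses one variable into counts per value. Scatterplot data keeps one (x, y) pair per case and never counts anything.

```python
from collections import Counter

# Hypothetical cases: each tuple is one person with two continuous
# variables, (number of friends, number of photos uploaded).
people = [(120, 35), (200, 80), (200, 60), (45, 10), (310, 150)]

# Frequency distribution: one variable, summarized as counts per value.
friend_counts = Counter(friends for friends, _photos in people)

# Scatterplot data: one (x, y) dot per case; frequency appears nowhere.
points = [(friends, photos) for friends, photos in people]

print(friend_counts[200])  # 2: how many people have exactly 200 friends
print(len(points))         # 5: one dot per person
```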

We are summarizing these distributions by shape, trend, and strength.

It is called a scatterplot because each case is now going to appear as a dot.

Each case, for instance each person on www.facebook.com, might have variable 1, number of friends, as well as variable 2, how many photos they have uploaded.

Each dot represents one person, but two values within that person.

It looks like somebody just scattered a bunch of dots on this graph, which is why it is called a scatterplot.

It makes sense that it is called a scatterplot. Notice that here we are not really interested in the center of one dimension.

We are interested in the center of two dimensions.

Because of that, trend is going to be sort of like center, and strength is going to be a lot like spread.

These are concepts you have learned about before, but we are translating them from one dimension to two dimensions.

Spread used to be something like this, but now we are talking about spread in two dimensions.

That is going to look a little bit different.

Let us talk about how these particular summary values of shape, center, and spread carry over.

Remember, frequency distributions are always what we call univariate in terms of continuous variables.

They might have 2 variables, but one will probably be categorical.

When we talked about shape, we talked about features like unimodal, symmetric, or asymmetric.

Those are common features that we were looking for.

Is it normal? Is it uniform?

Those are the buzzwords we used when we talked about shape in frequency distributions.

In scatterplots we are largely interested in different kinds of shapes.

One shape might be that the dots sort of fall in a line.

Is it a linear scatterplot?

Another possibility is that the dots might fall in a curve.

Is it curvilinear?

Another shape we are interested in is when there is just sort of no shape, and it is blob-like or cloud-like.

"Is it a cloud?" may be one way of thinking about it.

Those are the different shapes that we are interested in.

Is it linear? Is it curvilinear? Is it a cloud?

The way we talked about center before was in terms of things like the mean, median, and mode.

A lot of times we used the mean,

signified by either mu (μ) or x-bar (x̄), depending on whether you are working with a population or a sample.

Trend has sort of the same idea as center.

You could think about it as a version of center, except the center of two variables, not just one.

Here it is not useful to have a center that is just a dot, because that is what we had before.

That was a particular point. But now, let us say we have a whole bunch of dots scattered here.

What we might be more interested in is a line of some sort that describes the relationship between all these points on these two variables.

Here we are not just interested in a point center; we are interested in a line center.

I am going to adjust this to be a line center rather than a point center.

The official term for that line is the regression line.
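As a preview of that "line center," here is a minimal sketch of an ordinary least-squares line in plain Python. The lecture has not yet shown how the regression line is computed, so treat this as one standard way to fit it rather than the course's method. The function name `least_squares_line` and the data are hypothetical. The fitted line always passes through the point (x̄, ȳ), which is one way to see it as a center in two dimensions.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Choosing the intercept this way makes the line pass through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```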

The final idea that we talk about here is strength, and I want to tie that to the idea of spread that we talked about before.

One important measure of spread that we frequently talked about is the standard deviation, expressed as sigma (σ) or as s.

Those are two ways we have talked about spread before, and they give you a one-dimensional spread,

but what might be more useful here is something like a two-dimensional spread.
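For contrast with two-dimensional spread, here is a sketch of the one-dimensional summaries from earlier lessons. It uses Python's standard `statistics` module on a hypothetical list of heights. `stdev` gives the sample standard deviation s, and `pstdev` gives the population standard deviation σ. Both measure spread around a single point, the mean.

```python
import statistics

# Hypothetical heights in inches: one continuous variable.
heights = [64, 66, 67, 70, 73]

center = statistics.mean(heights)    # x-bar (or mu, if these were the whole population)
s = statistics.stdev(heights)        # sample standard deviation: divides by n - 1
sigma = statistics.pstdev(heights)   # population standard deviation: divides by n

print(center, round(s, 3), round(sigma, 3))
```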

Here we have our dots and our line, but now we are interested in how spread out these dots are.

You could think about it as all these little distances away from the line.

What is the spread away from the line like?

You want to think of this as a multidimensional spread.

It is not just a one-dimensional spread; it is a two-dimensional spread.

Before, this was spread around a point, but now it is spread around the whole line.

We are going to call that correlation.

If the correlation is very strong, it means the dots hug that line really closely.

That is a strong correlation, where the dots hug the line closely.

A weak correlation means it is hazy: the dots are far out and spread out from that line. A moderate correlation means there is a little bit of spread, but not too much.
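One way to picture "spread around the whole line" is to fit a line and then measure the vertical distance (residual) from each dot to it. The sketch below uses plain Python and hypothetical data. It fits a least-squares line and reports the root-mean-square residual. This measure was chosen for illustration only; it is not the correlation coefficient the lecture refers to. It also divides by n, whereas many textbooks divide by n - 2.

```python
import math

def spread_around_line(xs, ys):
    """Root-mean-square vertical distance from each point to the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Residual: how far each dot sits above (+) or below (-) the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / n)


xs = [1, 2, 3, 4, 5]
hugs_the_line = [2.1, 3.9, 6.0, 8.1, 9.9]  # hypothetical: strong
hazy = [1, 6, 4, 10, 8]                    # hypothetical: weaker
print(spread_around_line(xs, hugs_the_line) < spread_around_line(xs, hazy))  # True
```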

All of that has changed because now we are talking about bivariate distributions.

When we are talking about bivariate data, we are no longer just interested in points and spread around a point; we are interested in things like lines.

Here is an example of a scatterplot.

Remember the data that we have looked at in the past with 100 friends on www.facebook.com,

where we wanted to look at whether the number of friends people have corresponds to their year of birth.

Now this does not mean that there are necessarily lots of people born in 1997.

That is not what this means; it means that this dot is actually a particular person.

It is the case of one person, and it means that this one person was born in 1993 but also has an inordinate number of friends.

They have about 1900 friends.

With a scatterplot, you cannot interpret this as a very popular year to be born anymore.

Now you have to say that this particular person was born in this year and has this number of friends.

If you look at another point, like this one here, they have very few friends, but they were born in 1978 or so.

One thing you might notice about this is that there is sort of a shape here.

It seems to rise.

If we drew some sort of line that would cut through this, maybe it would be a line like this, where as the year of birth increases (these people are younger;

they were born more recently in history),

they seem to have more friends.

A curve might fit even better, where it seems like the people born around 1985 have fewer friends than the people who were born afterwards,

whose numbers of friends seem to shoot up more.

This is an example of a scatterplot, and these lines are examples of rough lines that might be regression lines:

lines that fall in the middle of all these points.

Some of these points are roughly above it.

These points are roughly below it.

If you added up all the distances above and below, they would roughly balance out around that line.

Let us think about that as the trend. And what is the strength?

It seems like a moderate strength.

It is not hugging the line quite closely, but it is not just a blob either.

Usually we do not plot things by year of birth (sometimes we do), but you could easily change year of birth into age by just using 2011 minus the year of birth.

Here I have age plotted on the x axis, and here I have year of birth plotted on the x axis.

On the y axis of both of these plots is the number of friends on www.facebook.com.

These are scatterplots, and you could know that just by looking at the axis labels.

If one of them says frequency, then you know it is not a scatterplot.

If they are both variables, then you know it is a scatterplot.

Here we see a positive association.

The higher the year of birth, the higher the number of friends.

As one variable gets greater, the other variable increases, and vice versa.

As the year of birth gets lower, the number of friends seems to be quite low.

On the other hand, if you look at this graph, we see the exact opposite trend: as age goes up, the number of friends goes down.

They are moving in opposite directions. As one goes up, the other goes down, and vice versa; as you go this way on the x axis, you will see friends going up.

The year-of-birth plot shows what we call a positive association, where the variables are coupled to each other in the same direction.

When we plot age instead of year of birth, we see a negative association.

In a negative association, the variables go up and down in opposite directions; that is what "negative" means here.

As one goes up, the other one goes down.
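You can read the direction of an association from the sign of the covariance. When above-average values of one variable tend to pair with above-average values of the other, the products of deviations are mostly positive. Here is a minimal sketch with hypothetical data. The name `association_direction` was made up for this example.

```python
def association_direction(xs, ys):
    """Classify paired data as a positive, negative, or no linear association."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Sample covariance: average product of paired deviations from the means.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


years = [1978, 1985, 1990, 1993, 1997]   # hypothetical years of birth
friends = [40, 120, 400, 1900, 700]      # hypothetical friend counts
ages = [2011 - y for y in years]         # age in 2011
print(association_direction(years, friends))  # positive
print(association_direction(ages, friends))   # negative
```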

It is important to know that these are just associations.

It is not that the year of birth is causing people to have more friends.

Maybe there are some other variables that matter, like when you were introduced to www.facebook.com, or something like that,

or how comfortable you are using the computer.

Just having a positive association does not mean that it is a causal association, and that is where you get the saying "correlation does not equal causation."

An association is also a correlation.

Just because you have this nice association, either positive or negative, it does not mean that one variable causes the other.

Let us think about why year of birth has the opposite association from age.

It has the opposite association with friends on www.facebook.com.

If you think about year of birth, it means that when you are increasing the year of birth, you are decreasing age.

These 2 variables actually have a perfect negative association.

As one goes up, the other one goes down.

As the year of birth goes up, 1991, 1994, 2000, the age goes down and down.

That is what we call a negative association.

It is not really that one is causing the other, but it is the same idea.

They are perfectly negatively associated.

All of this is correlation, or association, and that just does not equal causation.
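Here is a quick numerical check of this idea, using hypothetical data. Because age is computed as 2011 minus year of birth, the two have a correlation of exactly -1. Switching the x variable from year of birth to age flips the sign of its correlation with friends but leaves its size unchanged. `pearson_r` is a small helper written for this sketch; it uses the usual Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


years = [1978, 1983, 1985, 1990, 1993, 1997]  # hypothetical years of birth
friends = [40, 150, 120, 400, 1900, 700]      # hypothetical friend counts
ages = [2011 - y for y in years]              # age in 2011

print(round(pearson_r(years, ages), 6))  # -1.0
r_year = pearson_r(years, friends)
r_age = pearson_r(ages, friends)
print(abs(r_year + r_age) < 1e-9)        # True: same size, opposite sign
```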

Here are some examples of scatterplots that you might see.

There are a few different concepts I will go over, starting with the idea of linearity.

It is going to be very important to us, and linearity is just about how closely the dots follow a straight line.

I want you to know the distinction between linear and curvilinear.
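One way to check linearity is to fit a straight line anyway and look at the signs of the residuals. This sketch uses hypothetical curved data (y = x²). For a linear pattern, residuals scatter randomly above and below the line. For a curvilinear pattern, they form runs: above, then several below, then above again. The fit uses ordinary least squares, written inline so the block stands alone.

```python
def residual_signs(xs, ys):
    """Fit a least-squares line and return '+' or '-' for each point's residual."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # '+' means the dot lies above the fitted line, '-' means below.
    return "".join("+" if y - (intercept + slope * x) > 0 else "-"
                   for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5, 6]
curved = [x ** 2 for x in xs]      # a curvilinear pattern
print(residual_signs(xs, curved))  # +----+
```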

When you think of strength, I want you to think of it as if we are talking about spread.

It should just come to your mind as spread: we have these dots and a little bit of spread around the line.

You could think of it as a collection of distances away from that line.

If instead you have something like this, the dots are much more widely spread around the line.

There is a lot more spread going on here, and if I added a few more dots out here, I would have even more spread.

I want you to think of strength as strong, moderate, or weak.

You can think of strength in those terms.
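The lecture puts off the precise measure of strength until later. As a preview, this sketch computes the Pearson correlation coefficient r and maps its size to the words strong, moderate, or weak. The cutoffs 0.7 and 0.4 are assumptions made for this example only; different textbooks use different rules of thumb.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strength_label(r, strong_cutoff=0.7, moderate_cutoff=0.4):
    """Describe |r| in words. The default cutoffs are assumptions, not a standard."""
    size = abs(r)  # strength ignores direction (positive vs. negative)
    if size >= strong_cutoff:
        return "strong"
    if size >= moderate_cutoff:
        return "moderate"
    return "weak"


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]  # hypothetical: dots hug the line
r = pearson_r(xs, ys)
print(strength_label(r))        # strong
print(strength_label(-0.5))     # moderate: a negative r can still be moderate
```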

Finally, I want to introduce the concept of consistency, which we have not been talking about much until now.

Consistency just means how consistent that strength is.

Is it strong all the way through?

Is it weak all the way through?

Or is it inconsistent?

An example graph looks something like this.

This starts off looking very linear, but then down here there might be more variability.

Here you could see that if we drew a regression line, here we have very little spread, but here we have a lot of spread.

This would be inconsistent.

It is consistent spread versus inconsistent spread.

An example of consistent spread would be something like this.

It is pretty consistent. This one is less consistent because there is this peak right here, but here there is less variability.

This is an example of consistent and this is an example of inconsistent.

You want to think of consistency as an aspect of strength.

Is it consistent all the way through, or does it vary?
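Here is a rough way to check consistency. It assumes that comparing the lower and upper halves of the x range is good enough. Fit one least-squares line, then compare the typical residual size in each half. A large ratio suggests the spread is inconsistent. The function name and the half-split rule were chosen for this example; this is not a standard test.

```python
import math

def spread_by_half(xs, ys):
    """Return RMS residuals for the lower and upper halves of the data, sorted by x."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x, _ in pairs))
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in pairs]

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    half = n // 2
    return rms(residuals[:half]), rms(residuals[half:])


xs = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.1, 0.1, -0.1, 2, -2, 2, -2]  # hypothetical: spread grows with x
ys = [2 * x + e for x, e in zip(xs, noise)]
low, high = spread_by_half(xs, ys)
print(high / low > 2)  # True: much more spread on the right, so inconsistent
```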

Just to point out something: in all these graphs, by coincidence, I have drawn a negative association,

because as we look at greater values of one variable,

the values of the other variable seem to be down low, so these are all examples of negative association.

A quick, easy way you could visually see this is that they all have this negative slope, where the trend is pointing this way instead of that way.

Let us think about how to summarize a scatterplot.

A scatterplot has several different features, but here is a rubric for summarizing one.

It is divided into 5 steps so that it will be easier for us to walk through all of them.

The first thing you want to do is identify the cases and the variables.

Oftentimes people look at a scatterplot and they see the shape: is it a line?

Then they forget what the dots are.

It is like seeing the forest but forgetting what the trees are.

So the first thing you want to do is identify what the cases are and then identify what the variables are, so that you know what you are talking about.

Then you want to describe the overall shape and talk about the linearity; if there are any clusters, you want to identify those.

If there are any outliers, you want to be able to identify those as well.

Then you want to describe the trend, or you could think of this as the positive or negative association.

The trend is either positive or negative.

You could think of this as going this way or that way.

Step 4 is to describe the strength.

For now you can just describe it as strong, moderate, or weak, but we will talk about exactly how to measure that later.

Step 5 is to offer any potential explanations for this relationship.

This is an extra step.

Sometimes you might not need it, but it is often helpful to do, and it is critical that you remember the explanation is not always causal.

There might be a causal relationship, but not always.

It helps to consider potential explanations that are not causal.

Why might these 2 variables have this positive association?

Why might these 2 variables have this negative association?

That is going to be important for us to figure out.

But it is good for us to think: maybe it is causal, but maybe it is not.

The non-causal explanations are harder to think of sometimes, because you jump to the causal explanation.

One thing that might be the case is that some third variable explains this relationship, and it might not be these 2 variables themselves that are important.
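A small simulation can show how a third variable produces an association even when neither variable causes the other. In this sketch, a hypothetical hidden variable `z` drives both `x` and `y`. `x` never appears in the formula for `y`, yet the two end up clearly positively correlated. The noise levels and the seed are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)                     # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(2000)]  # hidden third variable
x = [zi + rng.gauss(0, 0.5) for zi in z]    # x depends only on z (plus noise)
y = [zi + rng.gauss(0, 0.5) for zi in z]    # y depends only on z (plus noise)

print(round(pearson_r(x, y), 2))  # roughly 0.8, though x does not cause y
```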

One final note: right now we describe the trend only in a rough, overall way.

Later we are going to learn how to describe it in a more precise, quantitative manner.

When we do that, it is going to be called finding the regression line,

and that way we will have the equation of that line.

We are also describing the strength roughly for now, but later we are going to find precise quantitative values for strength, and that is going to be called correlation.
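The five-step rubric can be sketched as a checklist in code. Steps 1, 2, and 5 (cases and variables, shape, explanations) need human judgment, so this hypothetical `summarize_scatterplot` helper only fills in two steps. It takes the trend (step 3) from the sign of the Pearson correlation and a rough strength word (step 4) from its size. The 0.7 and 0.4 cutoffs are assumptions for this example, not part of the lecture.

```python
import math
from dataclasses import dataclass


@dataclass
class ScatterplotSummary:
    cases: str          # step 1: what each dot is
    variables: tuple    # step 1: (x variable, y variable)
    shape: str          # step 2: linear, curvilinear, or cloud (judged by eye)
    trend: str          # step 3: positive or negative
    strength: str       # step 4: strong, moderate, or weak
    explanations: list  # step 5: possible causal and non-causal explanations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize_scatterplot(cases, variables, shape, xs, ys, explanations):
    r = pearson_r(xs, ys)
    trend = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)  # assumed cutoffs: 0.7 strong, 0.4 moderate
    strength = "strong" if size >= 0.7 else "moderate" if size >= 0.4 else "weak"
    return ScatterplotSummary(cases, variables, shape, trend, strength, explanations)


summary = summarize_scatterplot(
    cases="people",
    variables=("year of birth", "number of friends"),
    shape="roughly linear",
    xs=[1978, 1985, 1990, 1993, 1997],  # hypothetical data
    ys=[40, 120, 400, 1900, 700],
    explanations=["when each person joined the site (a possible third variable)"],
)
print(summary.trend, summary.strength)  # positive moderate
```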

Let us move on to some examples.

First, here is a graph, and what I want you to do is just go through those 5 steps of summarizing a scatterplot.

Remember, the first step is to describe the cases and the variables.

I am going to introduce you to the website gapminder.org.

It is a beautiful website that puts together different databases of interesting data from all over the world

and puts them in beautiful graphs so that we can look at the data in new and very interesting ways.

You could go to gapminder.org if you want; I have already pulled it up in my browser.

You want to click on Gapminder World.

If you want to follow along, you could do that too.

I want to show you this graph, and what we have on the bottom is income per person,

GDP per capita.

This is the entire amount of money that the economy of a nation makes, divided by how many people are part of that nation.

That is income per person, on the x axis.

Notice that it is on a log scale, which means that the lower numbers are spread out and the higher numbers are squished together, because they have taken the log of the income per person.
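Here is a quick arithmetic sketch of both ideas, using made-up numbers. Income per person is total GDP divided by population. On a log (base 10) axis, every tenfold increase in income moves the same distance. That is why low incomes look spread out and high incomes look squished together.

```python
import math

# Hypothetical country, for illustration only.
gdp = 500e9          # total economic output in dollars
population = 10e6
income_per_person = gdp / population
print(income_per_person)  # 50000.0

# Positions of three hypothetical incomes on a log10 axis.
incomes = [500, 5_000, 50_000]
positions = [math.log10(v) for v in incomes]
gaps = [b - a for a, b in zip(positions, positions[1:])]
# Both gaps are 1.0 on the log axis, even though the second
# dollar gap (45,000) is ten times the first (4,500).
print([round(g, 6) for g in gaps])  # [1.0, 1.0]
```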

We also have life expectancy, which is the average number of years that people live in each country.

Step 1: what are the cases?

These cases actually represent countries, and if you put your cursor over these dots, it will tell you which country it is.

In Gapminder, one nice thing is that you can tell the population of a country just by the size of its dot.

All the dots are different sizes.

Here it tells you the geographic regions.

Yellow is the Americas.

Red is East Asia.

Violet is Africa.

Green is the Middle East and Northern Africa.

Orange is Europe and Central Asia.

You could probably guess what these big ones are: China and also India.

These are the big circles.

If you want to find the United States, it is a yellow country and quite high up.

We live quite a long time.

That is the United States.

If you look behind it, there is Singapore, which is a very small country, but it is very rich, with a high GDP per capita:

high income per person, but also high life expectancy.

India is also in the middle of the plot; it is in the middle in terms of income but also in the middle in terms of life expectancy.

One thing you might notice is that Africa is clustered down here, so these countries have relatively lower income per person

and also relatively low life expectancy compared to the other countries.

You could also see that Europe is clustered together here.

The Americas are up in there.

Asia is also up at the higher end of this.

Immediately you see this positive association.

It seems roughly linear, or maybe a little bit curved, but roughly linear.

Another wonderful thing about gapminder.org, which we will not be talking about much today, is that it has this data from 1800 all the way to the 2000s.

If you hit this play button, it will play for you how this scatterplot came about over time.

You see that a lot of countries started off with very low population numbers.

All the dots are relatively small.

But over time the dots grow, GDP grows, and at the same time life expectancy grows.

You will see that European countries are ahead of the pack.

Africa is down here.

China is growing faster and faster in terms of life expectancy, and its GDP is catching up.

Finally we end up with 2009, which is apparently as far as this data goes.

Another thing that you could do with this visualization is pick a particular country that you might be interested in.

Let us say we are very interested in Azerbaijan; we can look at just how Azerbaijan changes over time, and

it will keep a running track of how Azerbaijan is growing in terms of its GDP as well as its life expectancy.

This is just a really wonderful graph, and you could plot a lot of different kinds of variables for these nations.

But let us answer our five questions and get back to more statistical things.

The first thing is that the cases are nations, and the variables are income per person and life expectancy.

That is the first thing.

We have figured out what the cases are and what the variables are.

The second thing is the shape.

It is roughly linear, and maybe a little bit curved.

We have seen some geographic clusters, but not really clusters in terms of the shape itself.

If you imagined all of these dots blacked out in one color, they would create roughly this line.

Let us talk about the trend.

The trend definitely seems to be a positive association, so as GDP is greater, life expectancy is also greater.

As GDP is lower, life expectancy is also lower.

They relate in a positive way, not in opposite directions.

Let us talk about strength: we can imagine a line and see some of the spread, and maybe this is a moderate spread.

It is not that you see a perfect line, but it is not so spread out that you cannot see the line either.

Maybe moderate might be a good answer for that.

And for number five, let us think about why that might be.

We want to consider a couple of things when we think about the relationship between these two variables.

We want to think about how variable 1 might impact variable 2.

We also want to think about how variable 2 might impact variable 1.

Finally, we might want to think about how some third variable, variable 3, the mystery variable, might impact both 1 and 2.

We might think that if you have a higher income per person, if you are a richer country, you have better health care, better facilities, better sanitation.

You might have greater life expectancy.

Also, if you have greater life expectancy, you could invest in more education and more long-term things, and because of that, income per person might increase.

You might be able to share more cultural capital.

Who knows, right?

There might be some third variable.

Maybe a government governs so well that you have great income and also good health care or other things that give you great life expectancy.

It might be different things; maybe some countries have more war.

If you have a lot of war, your economy suffers, but life expectancy also suffers.

That is like a third variable.

It might be a whole bunch of different things.

That is a long answer, so we will not write it all down.

You want to think about all the different ways that they might impact each other.

Here we have almost the same idea, but now we have plotted income per person by infant mortality, and infant mortality is the rate of infant deaths.

Infants are counted as children under the age of one.

How many infant deaths are there per 1000 births?
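As arithmetic, the rate described here is infant deaths divided by births, scaled to 1000 births. Here is a minimal sketch with hypothetical counts. The function name was made up for this example.

```python
def infant_mortality_rate(infant_deaths, births):
    """Deaths of children under age one per 1000 births."""
    return infant_deaths / births * 1000


# Hypothetical country: 450 infant deaths out of 90,000 births.
rate = infant_mortality_rate(450, 90_000)
print(round(rate, 2))  # 5.0
```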

Here we see immediately that it seems pretty linear.

It seems to hug the line pretty closely.

This seems to be a negative association.

Remember, negative in this case means opposite.

As one goes up, the other one goes in the opposite direction.

As income goes up, infant mortality goes down.

Countries that are very wealthy have very low rates of infant mortality.

They are not losing a lot of infants.

Countries that are poorer, where income is very low, have a higher rate of infant mortality.

So that is what we think of as a negative association.

Sorry, I skipped ahead to step 3.

Step 2 is shape, and I am going to write just "linear," because this seems to be even more linear than before.

For strength, let us say this is moderate to strong, because you could clearly see that line. Now let us think about why there might be a negative association.

You could think of infant mortality as the opposite of life expectancy.

With life expectancy, the greater the number, the better the health, sort of.

With infant mortality, the lower the number, the better the health.

Those two ideas are negatively associated with each other.

It makes sense that they would have the exact opposite relationship to income per person.

Once again, we might want to think about how variable 1 impacts variable 2,

how variable 2 impacts variable 1,

and also how a third variable might impact both one and two.

Income might lead to better health care and better prenatal care, and that might lead to lower infant mortality.

Also, having less infant mortality might somehow help the economy.

For instance, growth in the population often helps economies grow: a larger workforce can fill more jobs, and

there are more people to serve. But there might also be a third variable again, like war

or disease, and that might raise infant mortality and reduce income per person.

Those are the kinds of things you want to think about for answer number five.

I think this Gapminder data set is just really interesting.

You might want to play with the different kinds of axes that you can create; you can make these wonderful scatterplots from real data from the world.

You could look at women's education, and you can look at other aspects of the economy or health or public policy.

You can look at war; you can get a whole bunch of different things.

Example 3: we are back to more mundane kinds of statistics problems. Would we expect these variables to have a positive or negative relationship, or trend?

Would you expect weak strength or strong strength?

Well, these are sort of open-ended, so let us think about the case of chicken eggs.

Chicken eggs each have a length and a width.

By length, let us mean the long axis of the egg.

That would be this length, and then for the chicken egg, this is the width.

Would these have a positive association or a negative association?

Well, I imagine that you might have bigger chicken eggs or smaller chicken eggs.

There might be chicken eggs that are skinny or fat, or larger versions.

I imagine that length and width are sort of positively associated.

I would probably expect a strong strength, because the nice thing about chicken eggs is that they are all sort of one shape.

I would imagine that the points hug the line closely.

Let us talk about US cars.

If our cases were US cars, what would their weights and gas mileage look like?

Take the Hummer, which is very, very heavy.

If we put weight here, that might be way up here on the weight axis.

In terms of gas mileage, the Hummer is not so great.

It has relatively low gas mileage.

Whereas really tiny little cars weigh less, but maybe they have greater gas mileage.

Maybe we would see something like this,

if we plotted all the US cars and their gas mileage, and here we see something like a negative relationship.

They do the opposite of one another: as weight goes up, gas mileage goes down.

As weight goes down, gas mileage goes up.

I am not sure how strong the correlation might be.

Maybe moderate, or moderate to strong.

Maybe strong, just because I can imagine that if your car is heavy, that might bring down your gas mileage.

There might be a strong connection there, but I am not really sure.

I am going with moderate or moderate to strong.

Example 4: determine the trend, the strength, and the shape for the following scatterplots.

Let us also throw in consistency just for ourselves.

The first one seems pretty linear and pretty positive.

It is pretty strong.

It seems very consistent: the spread is constant down here and here.

It is pretty consistent throughout.

This one looks like a pretty weak strength.

I do not know if it is linear.

It looks like a cloud to me; you can draw a line, but it is a really weak association.

The trend seems more positive than negative, because at least there is more up here than down here.

As for consistency, it is pretty spread out here, but maybe a little bit less spread out here.

Maybe sort of consistent, but a little inconsistent.

This next one definitely looks like it starts to curve to me.

This seems curvilinear.

It also seems like a positive association, because as x goes up, y goes up.

It seems pretty strong, but maybe a little bit inconsistent, because here it seems stronger and it gets a little bit weaker down here.

It is sort of consistent to me.

In the next one I can also see a curve.

I see a curve, but this is definitely a negative relationship, because low x values have high y values

and high x values have low y values.

It seems moderate or strong.

Moderate to strong, and it is pretty consistent.

I will go with consistent.

Here we have a quite linear, negative trend, and it seems pretty strong too.

Let us go with strong, and notice that these points are very tight.

I am sort of eyeballing it, and it seems consistent.

This is just a way to help us eyeball these a little bit better,

to get used to seeing them,

and to get used to spotting some of these features very quickly.

That is scatterplots.

Thanks for using www.educator.com.