For more information, please see full course syllabus of Statistics
Transformations of Data

  • Intro 0:00
  • Roadmap 0:05
    • Roadmap
  • Why Transform? 0:26
    • Why Transform?
  • Shape-preserving vs. Shape-changing Transformations 5:14
    • Shape-preserving = Linear Transformations
    • Shape-changing Transformations = Non-linear Transformations
  • Common Shape-Preserving Transformations 7:08
    • Common Shape-Preserving Transformations
  • Common Shape-Changing Transformations 8:59
    • Powers
    • Logarithms
  • Change Just One Variable? Both? 10:38
    • Log-log Transformations
    • Log Transformations
  • Example 1: Create, Graph, and Transform the Data Set 15:19
  • Example 2: Create, Graph, and Transform the Data Set 20:08
  • Example 3: What Kind of Model would You Choose for this Data? 22:44
  • Example 4: Transformation of Data 25:46

Transcription: Transformations of Data


We are going to be talking about transformations of data today.0002

First we are going to talk about why we even transform data, then we are going to talk about0007

two different broad types of transformations, shape preserving and shape changing transformations.0012

Then we will talk about some common shape changing transformations that you might need to know.0017

Some of them you already know.0024

First, why transform?0028

One of the big reasons to transform data, especially in the shape changing way, is that all the stuff with regression and correlation,0030

and all the stuff we have been learning, works for linear patterns.0040

If the pattern is not linear, even if you can still fit a regression line to it and still find the correlation, it probably is not the best way to go.0044

Because, for instance, in this graph you could see this has a distinct sort of curvy shape.0054

A simple linear regression would not account for this variation as well as a curved line would.0060

Sometimes the transformation might make a nonlinear pattern more linear, thus making regression and correlation more useful.0074

All of a sudden you can use regression and correlation and it will account for a lot of the data.0086

That might be one reason to do that.0089

Let us look at this data for example.0090

This is data that we have looked at before from, where it shows the income per person, the GDP per capita.0093

It takes all the stuff that your country buys and sells and divides it by how many people you have.0102

It shows life expectancy here and notice that it says lin, this is a linear graph.0110

It is just showing you even intervals, the distance between 10,000 and 20,000 is the same as the distance between 60,000 and 70,000.0120

Same here, the distance between 55 and 60 years old is the same as the distance between 80 and 85.0133

But one of the issues with this is that it has a distinctly curved shape.0139

And primarily, it is that a lot of countries are very poor in terms of GDP per capita.0144

They are very poor and they are all put together over on this side.0152

Most countries make less than about 15,000 per person.0158

They are also squished over on this side.0165

It would be nice if we could somehow stretch out this part and squish those down0172

because these countries are probably very similar because they are rich countries.0179

A lot of them are in Europe, and this part has the United States; because of that, this might be nice.0186

One way we could do that is we could do a log transform.0194

Instead of giving us the income per person we can look at it on a logarithmic scale.0198

If you remember logs, log base 10 of x is the power that 10 has to be raised to in order to give you that number x.0205

We have y = log10(x).0228

10^y will give us x.0231

Instead of plotting the actual x it is asking maybe we could transform this so that it is giving us just the exponents.0237

The way you could do this is to show this in a logarithmic scale and now the first parts of these are stretched out.0255

The distance between 400 and 1,000 is big, and that is bigger than the distance between 20,000 and 40,000.0268

That is our logarithmic scale or exponential scale.0277

Here we see the same data except now we are looking at plotting by the log of these incomes.0281

Here what we see is a more linear pattern.0294

Before we saw a curved pattern but now we see roughly more of a linear pattern.0299

That is one reason why transformations are very useful.0308
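
The stretching and squishing described above can be checked with a few lines of Python. This is only a sketch: the helper name `log_position` is made up for illustration, and the income values are the ones mentioned in the lecture.

```python
import math

# Position of a value on a base-10 logarithmic axis: the exponent y with 10**y == x.
def log_position(x):
    return math.log10(x)

# On a linear axis, 400 -> 1,000 is a much shorter step than 20,000 -> 40,000.
linear_gap_low = 1_000 - 400
linear_gap_high = 40_000 - 20_000

# On a log axis the ordering reverses, because only the ratio of the values matters.
log_gap_low = log_position(1_000) - log_position(400)
log_gap_high = log_position(40_000) - log_position(20_000)

print("linear gaps:", linear_gap_low, linear_gap_high)
print("log gaps:   ", round(log_gap_low, 3), round(log_gap_high, 3))
```

The low incomes get more room on the log axis and the high incomes get less, which is what spreads out the crowded poor countries.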

There are two kinds of broad transformations that you should know.0312

One is shape preserving transformations and the other is shape changing transformations.0320

Shape preserving transformations do not actually change the shape of the distribution.0325

The shape looks the same.0332

When we look at a scatter plot, the scatter plot will look exactly the same.0334

If it is linear it will stay linear.0339

Shape changing means that if it is linear we will make it curvilinear.0342

If it is curvilinear we will make it more linear.0347

Those are shape changing transformations.0351

The shape preserving transformations are the linear transformations.0353

Remember, the equation for a line is y = mx + b.0360

This is the classic formula for a line.0369

Anytime you add a constant or multiply a value by a constant, that is called a linear transformation.0372

Shape changing transformations are anything that is non linear.0379

Now you need to do something more than adding by a constant or multiplying by a constant.0387

That might be changing x into x^2 or taking the square root of x or adding in other variables.0396

Adding in another variable here.0408

These are non linear transformations.0412

This is anything you do beyond just adding or subtracting or multiplying and dividing by a constant.0418

Here are some common shape preserving transformations.0427

These are the ones that do not change the shape at all.0433

When you add and subtract the constant that is fine.0437

If you multiply or divide by a constant that is fine.0440

Converting units is a common shape preserving transformation.0443

For instance, we collected our data in feet but we want to see it in inches, or we measured in minutes but we really want hours.0450

There we are just multiplying by a constant; here you are multiplying by 12.0461

Here we are dividing by 60.0468

Those are shape preserving, you will have the same shape.0470

Another shape preserving transformation is standardization, or finding the z scores.0475

When we find the z scores, the z scores will have the same shape as your raw scores because we are subtracting a constant and dividing by a constant.0481

We subtract x bar, the mean, and divide by the standard deviation.0489

You could do combinations of these two things and still have a shape preserving transformation.0495

Another common shape preserving transformation that you might want to know is the transformation from frequency to relative frequency.0500

That is also shape preserving: you might have the raw number of people, but you might also want the proportion of the total.0515

That is another way, because remember finding relative frequency is often just dividing by a constant.0528

Those are shape preserving transformations that you have already seen.0535
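
One way to see the difference between shape preserving and shape changing transformations is to correlate the raw values with their transformed versions. A correlation of exactly 1 means the transformed values are a straight-line function of the originals. The sketch below uses made-up measurements, and the helper `pearson_r` is written out here for illustration rather than taken from the lecture.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

feet = [1, 2, 3, 4, 5, 6]  # made-up measurements, in feet

# Linear (shape preserving) transformations.
inches = [12 * f for f in feet]                      # unit conversion
mean_f, sd_f = statistics.mean(feet), statistics.stdev(feet)
z_scores = [(f - mean_f) / sd_f for f in feet]       # standardization

# Non-linear (shape changing) transformation.
squared = [f ** 2 for f in feet]

r_inches = pearson_r(feet, inches)
r_z = pearson_r(feet, z_scores)
r_squared = pearson_r(feet, squared)

print("feet vs inches:  ", r_inches)
print("feet vs z scores:", r_z)
print("feet vs squared: ", r_squared)
```

The first two correlations come out as 1 up to rounding, while the squared values no longer line up perfectly with the originals.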

The shape changing transformations, the most common ones used are power transformation and log transformations.0539

Power transformations are anything where you raise your y or x to some power.0549

For instance, you change y into y^2, or y into the square root of y, or y into 1/y.0557

That last one is raising it to a negative power.0571

Any of these and any combination of these is a power transform.0574

Log transforms are finding the exponent.0579

Instead of raising it you have to find the exponent.0583

You could find the log of y and this will give you smaller numbers or you can find the natural log of numbers as well.0587

Any of these are possibilities.0596

You could also look at things like e^y, so that is just the inverse of this.0600

And also some other constant raised to the y, so we could use the exponential constant e or some other base.0612

Although I have written y here, and oftentimes you will see y being transformed, it is also quite common to transform x as well.0620

Sometimes you might transform both.0630

You may transform both y and x and we will talk about those situations as well.0633

Great, the question is how we know when to do this and should we just change one variable or both?0638

Log-log transformations are when you do log transformations on both.0647

Log x and log y.0655

Log-log transformations are often useful for data modeled by this basic formula.0657

So y = ax^b.0668

When x is raised to some constant power you often want to do a log log transformation.0671

It is just a nice rule: taking logs of both sides gives log y = log a + b log x, which is a line in log x and log y.0678

When you do a log log transformation you are basically shrinking and expanding variables on both axes.0680

You are not just stretching out or shrinking one variable, you are doing that to both.0688

Just to give you some ideas for how to do that, I'm always going to put x here and y here.0696

This is a case where all of these y are squished together.0705

Here they are not rising very quickly.0720

Here the y are not rising very quickly and then the y rise very quickly.0722

The y are all shooting up for each x.0725

Here we would want to shrink y and expand the x.0734

And so when you see curves that are approximately this shape you want to think shrink y and expand x.0740

Here we have a slightly different situation where now we still want to shrink y, because y is descending too quickly but we also want to shrink x here.0747

When the curve goes like this.0767

To keep track of them, we can think of a circle that goes around like this.0770

That is the order that I have written them in.0779

Here is 1, 2, 3.0781

In 1 you want to shrink y and expand x.0783

In 2 you want to shrink y but you also want to shrink x.0788

X is also expanding too quickly.0792

Here we want to expand y but shrink x.0797

Here y is not changing very fast up here.0802

It is changing very little and we want to expand that, but we want to shrink x because x is going up too quickly in relation to y.0808

Here for the last one, number 4, we want to expand y, but we also want to expand x, because y is changing in a way0818

where it would be helpful to see it expanded outwards because here it is going down very fast.0835

Also with x it would be helpful to expand x out because all the x are squished up here but sort of spread out there.0842

Those are just nice rules of thumb; obviously, you do not have to memorize these.0849

Sometimes what I do is play around with it a little bit.0859

I try shrinking one and expanding the other, as long as I can identify that these are all of the form y = ax^b.0863

If x is your exponent, it is different: before it was y = ax^b, but now this is y = ab^x.0873

Here, you probably just want to transform one variable and leave the other one alone.0889

If you are not able to eyeball what you should do, try things out; there is no harm in playing around with it.0901

But eventually when you do decide on a transform you want to have a reason for it, instead of it just being the thing to do.0910
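
The "play around with it" advice can be sketched in Python as a small comparison of candidate transformations. Everything here is an illustrative assumption: the helper names `pearson_r` and `compare_transforms`, the constants 3 and 1.5, and the generated data are not from the lecture. The idea is simply to see which version of the data is closest to a straight line, as measured by the absolute correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def compare_transforms(xs, ys):
    """Absolute correlation for each combination of raw and log axes.

    Assumes every x and y is positive so the logs are defined.
    """
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    return {
        "raw x, raw y": abs(pearson_r(xs, ys)),
        "log x, raw y": abs(pearson_r(log_x, ys)),
        "raw x, log y": abs(pearson_r(xs, log_y)),
        "log x, log y": abs(pearson_r(log_x, log_y)),
    }

xs = list(range(1, 21))
power_ys = [3 * x ** 1.5 for x in xs]        # data of the form y = ax^b
exponential_ys = [3 * 1.5 ** x for x in xs]  # data of the form y = ab^x

for label, ys in [("y = ax^b", power_ys), ("y = ab^x", exponential_ys)]:
    scores = compare_transforms(xs, ys)
    best = max(scores, key=scores.get)
    print(label, "is straightest with:", best)
```

A comparison like this only suggests a candidate. The choice should still be backed by a reason, such as recognizing which of the two formulas the data follows.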

Let us go to example 1, create a set of data with this function, graph this data set and what kind of transformation should be done to make this data more linear.0918

Already we could see from this that this is an example of y = ax^b.0931

In that case we are going to need to do a log log transformation.0938

We are going to need to do a transformation to both x and y, but let us look at the shape of it to see what this data looks like.0942

If you download the examples, for example 1 I have already put in the function y = ax^b; let us put in a, which is 10, and b, which is -0.4.0957

I have already seeded this with a whole bunch of positive integers for x that steadily go all the way up to 33.0978

Let us find the corresponding Y values.0991

In order to find y we just have to follow this formula here.0994

y = a × x^b.1000

Remember Excel knows order of operations, so it should do the power before the multiplication.1011

Once we have that I’m just going to drag this all the way down and get a whole set of data.1017

I think I forgot to lock down this.1032

I forgot to lock down a and b; that is what is giving me all this craziness.1037

Let us lock down a and b; once we have that, then I can drag it down.1041

We see we have this nice curve; if you remember, that is the second type of curve.1055

We know we need to do both kinds of transformations already.1069

It would be helpful for us if we can actually shrink y, but also maybe shrink x, and log is a way of shrinking both of them.1075

Let us try log.1095

Let us do log transforms.1097

To get log of x and get log of y.1100

Feel free to also use natural log.1104

I’m going to use log base 10; Excel thankfully has LOG, and I'm going to use LOG10 and put in my x.1107

It is going to change x from this into the exponent.1120

10^0 will give us 1, and I will do the same thing to y.1128

10^1 will give us 10.1136

I’m going to take that, copy and paste it all the way down, and get a nice log transform.1147

Here I have already made this graph and set this up so that it'll actually get this data.1157

If you click on those it will show you which data it is using.1165

I already labeled them as log y instead of just y and log x instead of just x.1168

And what you notice is that this data that had once been curved is now straightened out.1174

This is one way transformation can be useful, because now we could use log x and log y instead of x and y and put log x and log y into our regression and correlation calculations.1181

And we should be able to get more traction out of using those tools.1193
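
For readers without the spreadsheet, Example 1 can be reproduced in Python. The values a = 10, b = -0.4, and x running from 1 to 33 come from the lecture. The least-squares helper `fit_line` is an addition for illustration, used to check that the log-log data really is a straight line.

```python
import math

a, b = 10, -0.4
xs = list(range(1, 34))           # positive integers 1..33
ys = [a * x ** b for x in xs]     # y = a * x^b, power before multiplication

# Log-log transform: replace each value by its base-10 exponent.
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

def fit_line(us, vs):
    """Least-squares slope and intercept of vs against us."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    slope = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / sum(
        (u - mu) ** 2 for u in us
    )
    return slope, mv - slope * mu

slope, intercept = fit_line(log_x, log_y)
print("slope:", round(slope, 6), "intercept:", round(intercept, 6))
```

Because log y = log a + b log x, the fitted slope should match b and the intercept should match log10(a), up to floating-point rounding.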

Let us move on to example 2.1201

Create a set of data with this function Y = ab^x and graph this data set.1208

What kind of transformation should be done to make this data more linear?1217

We could just put in whatever numbers we want for a and b; now we probably want to do just a one variable transform.1221

Like a log transform or power transform on one side.1233

If we go to example 2 it already has a and b and a × b^x, and so we could just put in some numbers like 5.2, anything you want.1237

Let us put in our formula.1255

Let us not make the same mistake again; let us lock in a.1260

Here is a × b^x.1263

Here I’m going to lock in b and once we have that I could just drag this all the way down.1275

This is very curved, very steeply curved.1292

How can we transform this?1300

One option might be to transform y.1305

Let me put in log y and maybe I will put in just the same thing log base 10 y and then just drag that all the way down.1309

Let us do it again.1326

Here we now get this nice linear looking distribution instead of this very, very curvy like right angle.1330

Because we do not change x, x stays nice and linear, but here we have a logarithmic axis.1343

logarithmic function here.1354
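
Example 2 can be sketched in Python as well. The constants a = 5 and b = 2 are assumed here purely for illustration, since any positive numbers will do. The check is that equal steps in x produce equal steps in log y, which is what a straight line means.

```python
import math

a, b = 5, 2                        # assumed values; pick anything positive
xs = list(range(0, 11))
ys = [a * b ** x for x in xs]      # y = a * b^x

# Transform only y; x stays linear.
log_y = [math.log10(y) for y in ys]

# Step sizes between neighboring points, before and after the transform.
raw_steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
log_steps = [log_y[i + 1] - log_y[i] for i in range(len(log_y) - 1)]

print("raw steps:", raw_steps[:4], "...")
print("log steps:", [round(s, 4) for s in log_steps[:4]], "...")
print("log10(b): ", round(math.log10(b), 4))
```

The raw steps keep growing, which is the steep curve, while the log steps are all equal to log10(b), since log y = log a + x log b.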

Finally let us move on to example 3.1359

Example 3 says considering this data set, the goal of a statistical model would be to allow accurate prediction of birthrate1367

from a country's GNP, that is gross national product.1378

And what kind of model would you choose for this data?1382

It is often helpful to just sort of look at the data and draw for yourself what you think1385

might be a helpful model, an ideal theory for where the underlying data come from.1393

This looks very curvy to me, and to me that looks sort of like what we saw before where we did the log log transformation.1406

What kind of model would you choose?1421

I assume you would not choose a linear one; I'm not going to use y = mx + b for it.1426

It is not quite like a parabola shape, but I have seen something like this before.1432

I have actually seen it in an exponential function.1440

I have seen something that looks like that, right, and that is y = e^x.1446

I have seen something like that before, and this is like that but flipped around.1458

I do not flip the y, that would be like me folding it down, but it is sort of folded along the x.1466

For every positive x maybe I want it to be negative.1482

Maybe I want a model that is something like, it does not have to be e, but some constant to the –x.1486

One way that I would advise you to do this, if you have access to a graphing calculator or www.wolframalpha,1495

is that you can put these equations in so that you can eyeball it and see if you get roughly this shape and play around with it.1507

Feel free to put in different exponents and constants and try to get something that at least has a shape that looks like this.1513

Exact numbers are not important.1522

What we are really looking for is that shape.1523

I’m going to guess that I need a shape that looks something like this, and if you do not want to put in e you could just use a: y = a^-x.1525

That would be the model that I would choose for this data.1538
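
If no graphing tool is at hand, the shape of y = a^-x can also be eyeballed from a short table. The base a = 2 and the x values below are arbitrary choices for illustration and are not fitted to the birth rate data.

```python
import math

a = 2                              # arbitrary base greater than 1
xs = list(range(0, 8))
ys = [a ** -x for x in xs]         # y = a^-x

# How much y falls from one x to the next.
drops = [ys[i] - ys[i + 1] for i in range(len(ys) - 1)]

for x, y in zip(xs, ys):
    print(x, round(y, 4))

# a^-x is the same curve as (1/a)^x, so it has the y = ab^x form with b below 1.
same_as_ab_x = all(math.isclose(a ** -x, (1 / a) ** x) for x in xs)
print("matches (1/a)^x:", same_as_ab_x)
```

The values fall quickly at first and then level off, which is the flipped exponential shape described above.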

Example 4, same data set, but it asks what kind of transformation might be done on this data before fitting it to a regression line and finding the correlation.1545

Well, it looks as though this data corresponds to something like y = a^x, and we said we need a –x to get a curve that looks like that,1555

and when we see something like that perhaps one thing we might want to do is just change one variable.1579

That might be one of the strategies that we use since this corresponds to the basic equation y = ab^x.1589

Whenever you see equations of that kind you probably just want to change one of your variables.1608

You probably want to play around and either change GNP or change the birth rate to try and straighten out this data.1613

That is it for transformation thanks for using