WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:02.200
Hi and welcome to www.educator.com.
00:00:02.200 --> 00:00:08.500
We are going to be looking at shapes again, but now we are going to be calculating skewness and kurtosis.
00:00:08.500 --> 00:00:19.100
We could not do this before when we covered shapes because we needed to know something about variability first.
00:00:19.100 --> 00:00:25.400
This is the road map: basically, we are going to start by covering the concept of skewness.
00:00:25.400 --> 00:00:30.500
I’m going to try and connect that to calculating skew.
00:00:30.500 --> 00:00:34.000
Then we are going to be talking about how to interpret the number that you have calculated
00:00:34.000 --> 00:00:39.200
and also what that number will tell you about the relationship among the measures of central tendency.
00:00:39.200 --> 00:00:44.500
Obviously because the number directly relates to the actual concepts of skewedness
00:00:44.500 --> 00:00:53.800
these relationships among the measures of central tendency will also hold for the concept of skewness.
00:00:53.800 --> 00:00:55.500
Then we are going to be talking about kurtosis.
00:00:55.500 --> 00:01:03.400
Kurtosis is something that people do not focus on a lot because it is a difficult shape to understand.
00:01:03.400 --> 00:01:11.600
We are going to be talking about calculating kurtosis and how to interpret it.
00:01:11.600 --> 00:01:14.700
First let us start with the concept of skewedness.
00:01:14.700 --> 00:01:17.000
We have been over this again and again.
00:01:17.000 --> 00:01:27.300
We know that there are skewed right distributions, skewed left distributions, and ones that are not skewed at all.
00:01:27.300 --> 00:01:30.700
There are lots of these distributions.
00:01:30.700 --> 00:01:33.300
This one is not skewed.
00:01:33.300 --> 00:01:40.700
Anything that has a tail is skewed.
00:01:40.700 --> 00:01:55.100
These are also called J shapes, or formally they are called asymptotic shapes
00:01:55.100 --> 00:02:05.200
because there is this asymptote right here; the curve asymptotes against the x axis.
00:02:05.200 --> 00:02:11.200
Skewedness means there is a tail somewhere.
00:02:11.200 --> 00:02:14.900
One side is longer than the other.
00:02:14.900 --> 00:02:23.400
One side has values that go on and on, and those values are not clustered around the other values.
00:02:23.400 --> 00:02:27.000
That is basically the concept of skewness.
00:02:27.000 --> 00:02:34.900
It would be nice if we could get a number that would tell us this is exactly how skewed your distribution is.
00:02:34.900 --> 00:02:45.800
A distribution like this might be somewhat skewed, but a distribution like this might be really skewed.
00:02:45.800 --> 00:02:51.600
It would be nice if we could say how much more skewed one is than the other.
00:02:51.600 --> 00:02:57.400
The skewness that we are going to calculate is going to tell us exactly how skewed something is.
00:02:57.400 --> 00:03:07.500
If the number is positive, it means that the tail is on the right.
00:03:07.500 --> 00:03:12.200
If the number is negative it will tell us that the tail is on the left.
00:03:12.200 --> 00:03:17.700
The greater the positive number, the more skewed it is to the right.
00:03:17.700 --> 00:03:21.000
Normal distributions often have 0 skew.
00:03:21.000 --> 00:03:28.700
Things that are symmetrical often have 0 skew.
00:03:28.700 --> 00:03:30.400
Let us talk about calculating skewedness.
00:03:30.400 --> 00:03:38.600
Some people get a little bit freaked out at this, but I want to boil everything in here down to what is important.
00:03:38.600 --> 00:03:47.300
Here is the main heart of the skewedness function.
00:03:47.300 --> 00:03:59.000
Basically it is going to be the sum of cubed distances from the mean.
00:03:59.000 --> 00:04:02.700
We will talk in a little bit about why it is cubed.
00:04:02.700 --> 00:04:11.900
The sum of all the cubed distances of each point from the mean of the sample.
00:04:11.900 --> 00:04:23.900
Instead of the sum of squares, it will be the sum of cubes divided by the standard deviation cubed (s³),
00:04:23.900 --> 00:04:32.200
where we use the sample to calculate s as an estimate of the standard deviation of the population.
00:04:32.200 --> 00:04:35.300
That is basically the crux of the idea.
00:04:35.300 --> 00:04:40.500
There is going to be some more frills on this but this is basically the heart of the idea.
00:04:40.500 --> 00:04:42.300
Let us think about why it is cubed.
00:04:42.300 --> 00:04:47.200
It is cubed because it is going to matter whether it is going to be positive or negative.
00:04:47.200 --> 00:04:53.000
Before, when we squared things, we did not care which direction away from the mean it was.
00:04:53.000 --> 00:04:54.700
But now we care.
00:04:54.700 --> 00:05:04.000
How far are you away from the mean, but also what direction are you away from the mean?
00:05:04.000 --> 00:05:06.200
We are having things that are stacking up.
00:05:06.200 --> 00:05:09.900
It is like each point is having a vote.
00:05:09.900 --> 00:05:13.800
They are saying I’m on the left side, I’m on the right side.
00:05:13.800 --> 00:05:19.400
By adding them up we see who wins, the left side guys or the right side guys.
00:05:19.400 --> 00:05:25.400
When there are more guys on the negative end we get a smaller number.
00:05:25.400 --> 00:05:32.000
When there are more guys on the positive end we get a large number.
00:05:32.000 --> 00:05:40.500
By cubing it, what we do is we make those that are on the ends matter more than those who are close to the center.
00:05:40.500 --> 00:05:46.500
Cubing is going to make it bigger than squaring it.
00:05:46.500 --> 00:05:49.600
That is the main heart of the idea.
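Here is a minimal Python sketch of that voting idea. The data values are made up for illustration (they are not from the lesson's Excel file); the point is only that squaring throws away the sign of each distance while cubing keeps it.

```python
# Made-up data with a long tail on the right (not from the lesson's file).
data = [1, 2, 2, 3, 12]
mean = sum(data) / len(data)

# Squared distances from the mean: every value is non-negative.
squared = [(x - mean) ** 2 for x in data]

# Cubed distances from the mean: points left of the mean stay negative,
# and the far-away point on the right dominates the total.
cubed = [(x - mean) ** 3 for x in data]

print("squared:", squared)
print("cubed:  ", cubed)
print("sum of cubes:", sum(cubed))
```

The sum of cubes comes out positive here because the one far-away point on the right outvotes the four points on the left.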
00:05:49.600 --> 00:05:55.900
Just to review, if you double click on this s, what would be inside?
00:05:55.900 --> 00:06:00.900
Think about clicking on that s, what would be s equal to?
00:06:00.900 --> 00:06:12.400
s is the square root of the sum of squares ÷ (n – 1); because it is the standard deviation, we take the square root of the variance.
00:06:12.400 --> 00:06:18.400
Standard deviation is always going to be positive.
00:06:18.400 --> 00:06:24.000
This is not going to change its sign.
00:06:24.000 --> 00:06:28.100
There are some frills on top of the heart of this function.
00:06:28.100 --> 00:06:34.700
Sometimes people divide by n as well or n – 1.
00:06:34.700 --> 00:06:43.300
In Excel it is going to multiply by n ÷ ((n – 1) × (n – 2)).
00:06:43.300 --> 00:06:52.700
But that junk does not matter much because it will not have as great an impact on skewness as the heart of this function.
00:06:52.700 --> 00:06:57.700
That is what I want you to know, this is the heart.
00:06:57.700 --> 00:07:05.600
That is the heart of the function and the other stuff is just frills.
00:07:05.600 --> 00:07:10.000
This is one type of skewedness but actually there are tons of different types of skewedness.
00:07:10.000 --> 00:07:17.400
There is Pearson skewness, and there are lots of types of skewness whose names I cannot remember.
00:07:17.400 --> 00:07:21.900
There is minimal or distance skewedness.
00:07:21.900 --> 00:07:24.400
There is momentum skewness; there are a ton of them.
00:07:24.400 --> 00:07:32.100
If you want to know more about skewedness, you can check out this link www.wolframalpha.com.
00:07:32.100 --> 00:07:39.000
I frequently use it and refer my students to it.
00:07:39.000 --> 00:07:43.900
Let us talk about interpreting skewedness.
00:07:43.900 --> 00:07:57.800
Imagine these are all of the skewness values that are possible. Suppose we find that something has negative skewness, like a skewness of -1.
00:07:57.800 --> 00:08:02.300
We have found out that our distribution has it, but I have not shown you what our distribution is.
00:08:02.300 --> 00:08:04.400
Can you guess what kind of distribution it is?
00:08:04.400 --> 00:08:05.300
Yes you can.
00:08:05.300 --> 00:08:10.800
You know that it has a tail that goes to the left.
00:08:10.800 --> 00:08:17.100
On the other hand if it is a positive one, you know that it has a tail that goes to the right.
00:08:17.100 --> 00:08:28.400
If the skewness is 0, we are not sure exactly what it is but we basically know that it looks symmetrical.
00:08:28.400 --> 00:08:34.700
It could look uniform, or it could look approximately like a normal distribution.
00:08:34.700 --> 00:08:43.800
It could be some crazy thing that is symmetrical.
00:08:43.800 --> 00:08:51.200
It could be any number of distributions as long as they are approximately symmetrical.
00:08:51.200 --> 00:09:00.200
Let us talk a little bit about how to look at skewness in Excel.
00:09:00.200 --> 00:09:10.500
Go ahead and download the example Excel file and if you look on the first sheet, you should see skew data.
00:09:10.500 --> 00:09:15.800
I have shown you 3 different little data sets.
00:09:15.800 --> 00:09:21.300
Data set A is skewed to the right, notice that it has a tail to the right.
00:09:21.300 --> 00:09:26.000
Data set B is skewed to the left, notice that it has a tail to the left.
00:09:26.000 --> 00:09:31.900
Data set C is more normal, it is symmetrical.
00:09:31.900 --> 00:09:39.300
It has a little peak here but it is roughly symmetrical on both sides.
00:09:39.300 --> 00:09:46.700
The nice thing about Excel is that it does have a skew function so you could automatically generate skewness.
00:09:46.700 --> 00:09:55.900
The skew function is just SKEW, and you put in the data that you would like it to calculate the skew for.
00:09:55.900 --> 00:10:09.300
This is showing you a skew of 2.08 or 2.09; it is saying that it is very positively skewed.
00:10:09.300 --> 00:10:16.400
I could just drag this across and it will calculate the skews for the data sets above.
00:10:16.400 --> 00:10:21.900
This one is for this data set and we find that it is a negative skew.
00:10:21.900 --> 00:10:27.000
It is almost as negative as this one is positive, right?
00:10:27.000 --> 00:10:37.300
These are about 2 and -2, and that is because these two are largely mirror opposites of each other, so they have skews of similar size.
00:10:37.300 --> 00:10:40.900
Here we see that the skew is closer to 0.
00:10:40.900 --> 00:10:50.200
It is still on the positive side but it is closer to 0 than it is to 2 or 3.
00:10:50.200 --> 00:10:57.100
In Excel, if you are interested in the particular formula that they use to calculate skew,
00:10:57.100 --> 00:11:07.400
you could double click and when this little help window comes up you will notice that skew is a hyperlink.
00:11:07.400 --> 00:11:17.400
If you click on that then you should get this little window that comes up, that is an Excel help window.
00:11:17.400 --> 00:11:20.800
Let me make this bigger.
00:11:20.800 --> 00:11:24.700
Here they show you the exact equation that they use for skew.
00:11:24.700 --> 00:11:29.100
Notice that this part is exactly what we talked about.
00:11:29.100 --> 00:11:38.900
It is that distance away from the mean, cubed, divided by stdev³.
00:11:38.900 --> 00:11:51.600
That is the heart of the function, but here we have this extra stuff, n / ((n – 1) × (n – 2)).
00:11:51.600 --> 00:11:54.400
You could use that or you could use something else.
00:11:54.400 --> 00:11:57.700
You could use 1 / (n – 1), that is also very common.
00:11:57.700 --> 00:12:03.200
I will be using 1 / (n – 1).
00:12:03.200 --> 00:12:10.600
Go ahead and hit enter. Before we calculate skew ourselves in our cells,
00:12:10.600 --> 00:12:19.100
let us look a little bit at what being skewed right or left would mean for our measures of central tendency.
00:12:19.100 --> 00:12:21.500
Let us calculate the mean.
00:12:21.500 --> 00:12:29.900
Mean is AVERAGE; sometimes I type in mean in Excel.
00:12:29.900 --> 00:12:32.000
I’m going to close my parentheses.
00:12:32.000 --> 00:12:47.400
Let us also get the median while we are at it and let us get the mode.
00:12:47.400 --> 00:12:57.100
Once we have those 3 values, we could just copy and paste straight across and
00:12:57.100 --> 00:13:03.900
it will find the respective mean, median and mode for each of these other distributions.
00:13:03.900 --> 00:13:09.300
When it is skewed to the right and there is a very positive skew,
00:13:09.300 --> 00:13:15.900
what we find is that the mean is greater than the median, which is greater than the mode.
00:13:15.900 --> 00:13:17.800
That makes sense.
00:13:17.800 --> 00:13:22.600
The mode is probably going to be on the very left side of the distribution.
00:13:22.600 --> 00:13:25.700
The mean is highly affected by the outliers.
00:13:25.700 --> 00:13:30.400
The outliers are on the right so the mean is going to be greater.
00:13:30.400 --> 00:13:34.100
For negatively skewed distributions, we have the exact opposite pattern.
00:13:34.100 --> 00:13:38.700
This time the mode is the greatest, followed by the median, then the mean.
00:13:38.700 --> 00:13:45.000
Remember the mean is highly affected by outliers, and this time the outliers are small.
00:13:45.000 --> 00:13:48.800
The mean is being pulled to the smaller side.
00:13:48.800 --> 00:13:55.400
The most frequent numbers are up on the higher end and that is why median and mode tend to be bigger.
00:13:55.400 --> 00:14:03.900
When you know the skewness, you will usually know the relationship between mean, median, and mode.
00:14:03.900 --> 00:14:11.300
On the other hand, when distributions are largely symmetrical, when there is very little skew on either side
00:14:11.300 --> 00:14:17.900
then what we see is that the mean, median, and mode are often very close to each other.
00:14:17.900 --> 00:14:23.100
Sometimes they will be exactly the same, this time they are just very close to each other.
00:14:23.100 --> 00:14:34.500
That makes sense because with only a little bit of skew they sit almost on top of each other.
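The ordering of mean, median, and mode can be checked with a small Python sketch. The two data sets below are made up for illustration (they are not data sets A and B from the lesson's file); the second is just the mirror image of the first.

```python
import statistics

# Made-up data: a tail to the right, and its mirror image with a tail to the left.
right_skewed = [1, 1, 1, 2, 2, 3, 4, 9]
left_skewed = [10 - x for x in right_skewed]

for name, data in [("right skewed", right_skewed), ("left skewed", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{name}: mean={mean}, median={median}, mode={mode}")
```

For the right skewed set the mean is the largest of the three and the mode is the smallest; for the left skewed set the order flips. This is a tendency that shows up in data like these, not a guarantee for every data set.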
00:14:34.500 --> 00:14:39.300
Let us talk about how to calculate skewness.
00:14:39.300 --> 00:14:43.600
Here I have put the formula for skewness.
00:14:43.600 --> 00:14:59.000
Sometimes people are not comfortable with moving around that sigma sign, it is like where does it go?
00:14:59.000 --> 00:15:02.300
That sigma will affect some things and not other things.
00:15:02.300 --> 00:15:06.300
That sigma will affect anything that has an i next to it.
00:15:06.300 --> 00:15:16.800
Because of that you could put the divided by stdev³ inside the ∑ and it would not change the result.
00:15:16.800 --> 00:15:19.600
You could either do this part after you add them all up or you could do it before you add them all up.
00:15:19.600 --> 00:15:30.400
It actually does not matter because of the distributive property.
00:15:30.400 --> 00:15:35.100
You could either put this inside the sum or you could take it out.
00:15:35.100 --> 00:15:36.500
It does not matter.
00:15:36.500 --> 00:15:41.100
Let me show you a way that you could write this.
00:15:41.100 --> 00:15:50.100
Actually it might not be helpful to write it in Excel; I will write it when we go back to our PowerPoint slides.
00:15:50.100 --> 00:15:51.800
Let us start with this.
00:15:51.800 --> 00:15:58.100
We know that we need to get all of these distances away from the mean.
00:15:58.100 --> 00:16:02.600
Let us start with that and cube it.
00:16:02.600 --> 00:16:15.400
I need to start with the equal sign (=) and I’m going to put in a parenthesis to say take my value, subtract the mean which would be the average.
00:16:15.400 --> 00:16:19.200
I’m going to put in all of these values.
00:16:19.200 --> 00:16:25.600
I do not want that average to jiggle around so I’m going to lock that data in place.
00:16:25.600 --> 00:16:37.500
Because I want to copy and paste it later on, I’m going to leave the A to vary but I’m going to say lock the 4 down.
00:16:37.500 --> 00:16:44.600
When I copy down this column it would not change the A column, but when I copy across it will change it.
00:16:44.600 --> 00:16:49.200
I only want to lock down the number row.
00:16:49.200 --> 00:16:55.800
Then I’m going to close that parenthesis and cube that value.
00:16:55.800 --> 00:17:05.600
Here I could also divide by this as well but I’m just going to do this for now.
00:17:05.600 --> 00:17:16.800
This is just the part where I’m doing (x – x bar)³, that is all I’m doing in this column.
00:17:16.800 --> 00:17:25.200
I’m going to go ahead and copy and paste that all the way down.
00:17:25.200 --> 00:17:41.000
Here what we have is just this part but now I need to sum them all up and divide by s³ as well as n – 1.
00:17:41.000 --> 00:17:49.700
I’m going to modify this to make my formula a little bit simpler.
00:17:49.700 --> 00:17:58.200
I will have n – 1 down here.
00:17:58.200 --> 00:18:10.800
Now I need to add these up and divide by n – 1 × stdev³.
00:18:10.800 --> 00:18:12.800
Here let us do that.
00:18:12.800 --> 00:18:47.200
Let us sum these guys all up and divide by the count of my data – 1, multiplied by the stdev of my data cubed.
00:18:47.200 --> 00:18:50.700
It is a lot of parentheses.
00:18:50.700 --> 00:18:59.400
Because it is color coded I know that I need another blue one to signal my denominator and hit enter.
00:18:59.400 --> 00:19:07.900
There we get a skewness of 1.84.
00:19:07.900 --> 00:19:11.300
That is pretty close to the 2.
00:19:11.300 --> 00:19:13.500
That 2 is what we got using the Excel function.
00:19:13.500 --> 00:19:18.500
The Excel function will just multiply by a slightly different constant so that is the only difference.
00:19:18.500 --> 00:19:27.800
But once we have this, we can actually copy this over here.
00:19:27.800 --> 00:19:34.900
Here if I double click on this it shows that it is using the C column because I let C vary
00:19:34.900 --> 00:19:42.300
and it is being relative about the columns but it is locking the rows in place.
00:19:42.300 --> 00:19:52.100
I could also check down here, it is using the C column to count those data and get the standard deviation.
00:19:52.100 --> 00:20:01.300
What we find is -1.7 which is pretty close to the -1.9 that we found with Excel’s function.
00:20:01.300 --> 00:20:09.300
I’m just going to copy and paste all of that again to our approximately symmetrical one.
00:20:09.300 --> 00:20:16.400
We got .39, which is close to the .44 that Excel gave us.
00:20:16.400 --> 00:20:22.200
That is how we are going to calculate shape here.
00:20:22.200 --> 00:20:26.000
One of the features is skewness.
00:20:26.000 --> 00:20:32.800
That is skewness.
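The same column-by-column calculation can be sketched in Python. This is a minimal sketch, not Excel's own code: `skew_simple` uses the 1 / (n – 1) constant from the spreadsheet walk-through, `skew_excel_style` uses the n / ((n – 1) × (n – 2)) constant shown on the help page, and both function names and the data are made up for illustration.

```python
import statistics

def skew_simple(data):
    """Sum of cubed deviations divided by (n - 1) * s**3."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    sum_of_cubes = sum((x - mean) ** 3 for x in data)
    return sum_of_cubes / ((n - 1) * s ** 3)

def skew_excel_style(data):
    """Same heart of the function, with the n / ((n - 1)(n - 2)) constant."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# Made-up data sets: a right tail and its mirror image.
right_tail = [1, 1, 1, 2, 2, 3, 4, 9]
left_tail = [10 - x for x in right_tail]

print(skew_simple(right_tail), skew_excel_style(right_tail))
print(skew_simple(left_tail), skew_excel_style(left_tail))
```

The two versions differ only by the factor n / (n – 2), which is why the hand calculation lands close to, but not exactly on, the value from the built-in function.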
00:20:32.800 --> 00:20:36.400
Now let us talk about interpreting kurtosis.
00:20:36.400 --> 00:20:42.400
I actually said that I would talk a little bit about where the skewness stuff goes.
00:20:42.400 --> 00:20:44.100
Let me just draw it over in this corner.
00:20:44.100 --> 00:20:47.500
I’m not going to need all this room for kurtosis.
00:20:47.500 --> 00:20:54.300
Let me just talk a little bit more about the skewness function here.
00:20:54.300 --> 00:21:15.200
The formula for skewness as I said before is going to be the sum of all the cubed distances over some constant and s³.
00:21:15.200 --> 00:21:16.900
I have said that is the idea.
00:21:16.900 --> 00:21:34.600
Sometimes you might see this formula written more like this: ∑ (x sub i – mean)³ / s³.
00:21:34.600 --> 00:21:40.900
This and this mean the same thing.
00:21:40.900 --> 00:21:42.900
It is just different ways of writing it.
00:21:42.900 --> 00:21:52.600
The ∑ only affects things that are indexed by i, and because this is not affected by the i at all,
00:21:52.600 --> 00:21:57.800
s³ will be the same whether it is the first value or the last value.
00:21:57.800 --> 00:22:03.600
It does not matter whether you write it inside that ∑ or not.
00:22:03.600 --> 00:22:18.000
Another way you could write it is also 1/s³ × ∑ (x sub i – x bar)³.
00:22:18.000 --> 00:22:24.400
This is another way you could write it and so all of these 3 ways of writing it are equivalent.
00:22:24.400 --> 00:22:28.200
It does not change anything.
00:22:28.200 --> 00:22:32.400
Do not get thrown off by one of these other options.
00:22:32.400 --> 00:22:36.300
They are not trying to be tricky.
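Written out, the three equivalent ways of writing the heart of the skewness function look like this. The constant in front (such as 1 / (n – 1)) is left off, and the symbols follow the usual convention of x̄ for the sample mean and s for the sample standard deviation.

```latex
\[
\sum_{i=1}^{n} \frac{(x_i - \bar{x})^3}{s^3}
\;=\;
\frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}
\;=\;
\frac{1}{s^3} \sum_{i=1}^{n} (x_i - \bar{x})^3
\]
```

They are equal because s³ has no i in it, so it is the same constant for every term and can be moved in or out of the sum.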
00:22:36.300 --> 00:22:38.000
Let us talk about kurtosis.
00:22:38.000 --> 00:22:48.100
Kurtosis is a concept that is weird for people because this is not something that we are used to dealing with in regular shapes that we know.
00:22:48.100 --> 00:22:50.900
Roundness and squareness.
00:22:50.900 --> 00:22:53.300
Kurtosis is something a little bit different.
00:22:53.300 --> 00:22:57.900
Kurtosis is about two things that are bundled into one.
00:22:57.900 --> 00:23:02.000
One is pointiness or peakness.
00:23:02.000 --> 00:23:08.300
Very kurtotic shapes might look something like that.
00:23:08.300 --> 00:23:12.200
Super non kurtotic shapes look something like that.
00:23:12.200 --> 00:23:23.300
One important aspect of this is the pointiness.
00:23:23.300 --> 00:23:33.000
At the same time that peakness is going up, what you are seeing is the tails becoming thinner.
00:23:33.000 --> 00:23:41.200
Peakness and thin tails usually go together and kurtosis is about both of these things
00:23:41.200 --> 00:23:46.700
because here we not only have no peakness but we also have fat tails.
00:23:46.700 --> 00:23:51.600
The tails are just as fat as that lack of peak in the middle.
00:23:51.600 --> 00:23:56.600
Something in the middle might look something more like that.
00:23:56.600 --> 00:24:06.700
This is a kurtotic dimension where you are getting increasing kurtosis.
00:24:06.700 --> 00:24:11.100
When you have increasing kurtosis, it means two things simultaneously.
00:24:11.100 --> 00:24:18.200
Having more of a peak but also having thinner tails.
00:24:18.200 --> 00:24:23.900
Let us talk about calculating kurtosis.
00:24:23.900 --> 00:24:32.100
For kurtosis, let us talk about the main idea before we get into things.
00:24:32.100 --> 00:24:35.400
Like skew, you can multiply it by constants.
00:24:35.400 --> 00:24:36.900
It does not matter.
00:24:36.900 --> 00:24:40.000
This is the heart of kurtosis.
00:24:40.000 --> 00:24:48.100
Here is my sigma, it is going to be that same distance for each point away from my mean.
00:24:48.100 --> 00:24:58.200
Instead of cubing it, we are going to raise it to the 4th power.
00:24:58.200 --> 00:25:07.000
When we raise it to the 4th power, we know that we do not care about whether it is on the left side or the right side.
00:25:07.000 --> 00:25:10.300
That is one thing to know already about kurtosis.
00:25:10.300 --> 00:25:16.300
It is not counting how many are on one side versus the other side of the mean.
00:25:16.300 --> 00:25:25.400
Already we know that kurtosis will probably be positive because it is going to raise everything to the 4th power
00:25:25.400 --> 00:25:30.300
and when you raise something to an even number power it is going to be positive.
00:25:30.300 --> 00:25:36.300
We are going to divide that by stdev⁴.
00:25:36.300 --> 00:25:42.700
Once again, this will always be positive, stdev is already positive.
00:25:42.700 --> 00:25:44.700
We are going to raise it to the 4th power.
00:25:44.700 --> 00:25:53.200
Kurtosis is largely going to be a positive number.
00:25:53.200 --> 00:26:04.300
The only difference between different values of kurtosis might be whatever constant they decide to multiply by.
00:26:04.300 --> 00:26:12.000
Frequently, 1/(n – 1) is one of the constants.
00:26:12.000 --> 00:26:14.900
I forget what Excel does, Excel does something crazy.
00:26:14.900 --> 00:26:17.100
We will figure it out when we get there.
00:26:17.100 --> 00:26:21.800
Once again, even with kurtosis you could write it in very different ways.
00:26:21.800 --> 00:26:40.400
You could write it as 1/((n – 1) × s⁴) and then put your sum of 4th powers here.
00:26:40.400 --> 00:26:45.900
Instead of a sum of squares, we are raising it up to the 4th power.
00:26:45.900 --> 00:26:47.900
That is one way of writing it.
00:26:47.900 --> 00:27:03.900
Another way of writing it is ∑ (x sub i – mean)⁴ / ((n – 1) × s⁴).
00:27:03.900 --> 00:27:07.300
All of these things are the same thing.
00:27:07.300 --> 00:27:13.900
Once again, this is the heart of the idea of kurtosis.
00:27:13.900 --> 00:27:18.300
As for the reason that it is raised to the 4th power, let us think about it.
00:27:18.300 --> 00:27:23.500
Remember it is very concerned about whether you are on the outside or the inside of the distribution.
00:27:23.500 --> 00:27:25.500
Are you on the tails or at that peak?
00:27:25.500 --> 00:27:32.200
By raising it to the 4th power it makes everybody matter a lot especially if you are on the outside.
00:27:32.200 --> 00:27:39.300
You matter way more than if you are on the inside.
00:27:39.300 --> 00:27:43.500
One more thing to the idea of kurtosis.
00:27:43.500 --> 00:27:52.800
Typically kurtosis for a normally distributed function, let me write it here.
00:27:52.800 --> 00:28:06.000
For an approximately normal-looking distribution, the kurtosis, if you calculate it with some function like this, is going to be 3.
00:28:06.000 --> 00:28:11.000
That is so arbitrary.
00:28:11.000 --> 00:28:24.700
What they have done is they made the kurtosis function so that you subtract 3 from it so that the normal distribution has a kurtosis of 0.
00:28:24.700 --> 00:28:26.500
Like 3 – 3 =0.
00:28:26.500 --> 00:28:29.800
That is actually how you will get negative kurtosis.
00:28:29.800 --> 00:28:34.200
It is not because of this function, but it is because you subtract 3.
00:28:34.200 --> 00:28:39.700
The lowest kurtosis you can get is -3.
00:28:39.700 --> 00:28:42.400
It is an odd, bizarre thing.
00:28:42.400 --> 00:28:57.700
I’m not sure when it was decided to normalize it to 0, but my theory is that it would be hard for people to remember that 3 is the value for normal.
00:28:57.700 --> 00:29:04.400
They just make you do it in the formula.
00:29:04.400 --> 00:29:12.300
Now that we have this weird correction of subtracting 3, we could talk about interpreting kurtosis.
00:29:12.300 --> 00:29:26.000
You already know that a kurtosis of about 0 would mean that you have an approximately normal distribution.
00:29:26.000 --> 00:29:29.600
That is what a kurtosis of 0 would be.
00:29:29.600 --> 00:29:44.600
A kurtosis greater than 0 is going to be more peaked, more pointy than normal.
00:29:44.600 --> 00:29:48.500
Something like this.
00:29:48.500 --> 00:29:59.600
That is a kurtosis that is greater than 0 and we call that leptokurtic.
00:29:59.600 --> 00:30:12.400
It means more peaked than normal.
00:30:12.400 --> 00:30:18.800
We also need a name for things that have a kurtosis that is similar to the normal distribution.
00:30:18.800 --> 00:30:24.800
We could just say similarly kurtotic to the normal distribution but that would be long.
00:30:24.800 --> 00:30:37.600
We say they are mesokurtic because you do not have to be a normal distribution to have a kurtosis of 0.
00:30:37.600 --> 00:30:53.600
Mesokurtic just means it is about the same peakness as normal.
00:30:53.600 --> 00:30:58.500
That is mesokurtic.
00:30:58.500 --> 00:31:08.300
We need to have another name for something that is less peaked or flatter than normal.
00:31:08.300 --> 00:31:17.100
Mesokurtic and leptokurtic sound crazy, but for meso I just remember that it is like in-the-middle kurtosis.
00:31:17.100 --> 00:31:21.500
Lepto is hard for me to remember which it is.
00:31:21.500 --> 00:31:27.100
That is why this last one helps me because this one I could always remember, it is called platykurtic.
00:31:27.100 --> 00:31:31.900
I think of a – and how it has a flat peak.
00:31:31.900 --> 00:31:52.700
Platykurtic means that this is flatter than normal, smaller peakness than normal.
00:31:52.700 --> 00:31:57.900
That would be something that looks more like this.
00:31:57.900 --> 00:32:01.800
That is platykurtic there.
00:32:01.800 --> 00:32:05.100
Those are our 3 interpretations of kurtosis.
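Those three interpretations can be sketched as a small Python helper. The function name `describe_kurtosis` and the cutoff `tol` are hypothetical choices made for this illustration; the lesson does not give a specific cutoff for what counts as close to 0.

```python
def describe_kurtosis(excess_kurtosis, tol=0.5):
    """Label a kurtosis value that already has the 3 subtracted.

    tol is a made-up cutoff for "close to 0"; pick whatever suits your data.
    """
    if excess_kurtosis > tol:
        return "leptokurtic"   # more peaked than normal
    if excess_kurtosis < -tol:
        return "platykurtic"   # flatter than normal
    return "mesokurtic"        # about the same peakness as normal

for value in (1.5, 0.1, -1.2):
    print(value, describe_kurtosis(value))
```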
00:32:05.100 --> 00:32:11.500
Let us go to our Excel examples and look at kurtosis there.
00:32:11.500 --> 00:32:20.200
Here let us click on the kurtosis data, that is the 3rd sheet, and we could look at 3 distributions
00:32:20.200 --> 00:32:25.700
that are already put in there that might be good for us to look at regarding this idea of kurtosis.
00:32:25.700 --> 00:32:32.300
For the uniform distribution, obviously the tails are just as fat as the peak; it is not really peaked.
00:32:32.300 --> 00:32:34.000
The tails are super fat.
00:32:34.000 --> 00:32:40.000
Here we have a normal peak, and the tails do not look as fat.
00:32:40.000 --> 00:32:47.400
Here we have the thinnest tails and the peaks are higher than the tails are.
00:32:47.400 --> 00:32:51.800
There is a bigger difference between the peak and tails.
00:32:51.800 --> 00:33:05.400
Handily Excel has a kurtosis function so we could put in kurt and then put in our data.
00:33:05.400 --> 00:33:08.900
Hit enter.
00:33:08.900 --> 00:33:17.300
What we have here is negative kurtosis where it is flatter than the normal distribution.
00:33:17.300 --> 00:33:21.500
I’m just going to drag all of this over here.
00:33:21.500 --> 00:33:32.100
Here we still have a negative kurtosis because it is not as peaked or pointy as the normal distribution.
00:33:32.100 --> 00:33:41.700
Here it is not normal either, but it is more pointy, more peaked than the normal distribution would be.
00:33:41.700 --> 00:33:52.400
If you want to know the precise formula that Excel uses in order to calculate kurtosis, go ahead and click on kurt.
00:33:52.400 --> 00:33:56.800
Here it shows you the formula for kurtosis.
00:33:56.800 --> 00:34:06.100
What they do is they multiply by this crazy looking stuff but the crux of the formula is still there.
00:34:06.100 --> 00:34:20.700
It is the distances, the deviations, to the 4th power, divided by stdev⁴, and then – 3 × crazy stuff.
00:34:20.700 --> 00:34:25.200
That is the heart of that function.
00:34:25.200 --> 00:34:29.400
You can see that we use that.
00:34:29.400 --> 00:34:34.100
If we click on the kurtosis, we will calculate it on our own.
00:34:34.100 --> 00:34:43.400
I use this n – 1 constant; other people use other things.
00:34:43.400 --> 00:34:56.700
What I’m going to do is I’m going to put in n – 1 down here instead.
00:34:56.700 --> 00:35:07.100
All of this, this whole thing, and you subtract 3.
00:35:07.100 --> 00:35:10.000
Bizarre but true.
00:35:10.000 --> 00:35:12.600
Let us start with this part right here.
00:35:12.600 --> 00:35:17.400
The deviation to the 4th power.
00:35:17.400 --> 00:35:31.900
I just put in this value – average of my data and all of that raised to the 4th.
00:35:31.900 --> 00:35:38.500
I do not want my mean to jiggle around so I’m going to lock my rows.
00:35:38.500 --> 00:35:40.700
We are not going to lock the columns.
00:35:40.700 --> 00:35:47.500
As long as I copy and paste it down here, we can just use column A.
00:35:47.500 --> 00:35:51.900
I’m just going to copy and paste all of that down here.
00:35:51.900 --> 00:35:57.200
Notice that all of these values are positive.
00:35:57.200 --> 00:36:04.000
Down here, what I’m going to do is sum them all up.
00:36:04.000 --> 00:36:07.900
That is one thing I know I need to do.
00:36:07.900 --> 00:36:24.200
I know I need to divide by n – 1, count all of these guys and subtract 1.
00:36:24.200 --> 00:36:33.500
That is within my green parentheses there, multiply by stdev⁴.
00:36:33.500 --> 00:36:42.500
That is stdev of my data raised to the 4th power.
00:36:42.500 --> 00:36:51.700
Because Excel knows order of operations it is going to do that power before it does the multiplying.
00:36:51.700 --> 00:37:00.200
Then I’m going to close that and that is my blue parentheses closing there.
00:37:00.200 --> 00:37:15.400
Here we have the sum ÷ n – 1 × stdev⁴.
00:37:15.400 --> 00:37:19.300
I need to take all of that and subtract 3.
00:37:19.300 --> 00:37:27.800
I’m going to put on another set of parentheses around this whole thing and subtract 3.
00:37:27.800 --> 00:37:29.500
Hit enter.
00:37:29.500 --> 00:37:37.900
Here I get negative kurtosis.
00:37:37.900 --> 00:37:39.800
That means it is flatter than normal.
00:37:39.800 --> 00:37:47.200
Notice that it is not more negative than -3, that is the lower limit.
00:37:47.200 --> 00:37:54.300
I’m going to take this whole thing right here and paste it right here.
00:37:54.300 --> 00:38:12.700
What we see is that this one is less flat than the one we just saw; it is not quite normal, but it is closer to normal.
00:38:12.700 --> 00:38:21.800
If we copy and paste all of that over here, we find here this is more sharply peaked than normal.
00:38:21.800 --> 00:38:30.000
That is our kurtosis in Excel.
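The same steps can be sketched in Python: raise each deviation to the 4th power, sum them, divide by (n – 1) × stdev⁴, then subtract 3. This is a minimal sketch with the 1 / (n – 1) constant used in the walk-through; the function name and both data sets are made up for illustration, and the built-in Excel function uses different constants, so its values will not match exactly.

```python
import statistics

def kurtosis_simple(data):
    """Sum of 4th-power deviations / ((n - 1) * s**4), minus 3."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    sum_of_fourths = sum((x - mean) ** 4 for x in data)
    return sum_of_fourths / ((n - 1) * s ** 4) - 3

# Made-up data: one flat, uniform-looking set and one sharply peaked set.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 5, 9]

print("flat:  ", kurtosis_simple(flat))
print("peaked:", kurtosis_simple(peaked))
```

The flat set comes out negative and the peaked set comes out positive. Because the part before the subtraction can never be negative, the result can never go below -3.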
00:38:30.000 --> 00:38:31.100
Let us move on to some examples.
00:38:31.100 --> 00:38:33.800
Here is example 1.
00:38:33.800 --> 00:38:40.600
In a particular sample, the mean is less than the median, which is less than the mode.
00:38:40.600 --> 00:38:44.100
What is the likely shape of this distribution?
00:38:44.100 --> 00:38:55.500
This means that the mode is at the peak and the mean is somewhere on this side of it.
00:38:55.500 --> 00:39:01.000
Here is the mode and here is the mean; the median is somewhere in between.
00:39:01.000 --> 00:39:11.900
That would mean that since this guy, the mean, is highly affected by outliers, there must be outliers on this side.
00:39:11.900 --> 00:39:26.000
I’m going to guess that this is a negative skew, left skewed.
00:39:26.000 --> 00:39:32.900
That means the skewness number should be negative.
00:39:32.900 --> 00:39:38.700
What about in a sample where the mean is greater than the median, which is greater than the mode?
00:39:38.700 --> 00:39:40.400
What is the likely shape?
00:39:40.400 --> 00:39:42.800
We just have to reason backwards.
00:39:42.800 --> 00:39:54.900
The mode is the smallest, the mean is the largest, and the median is somewhere in between.
00:39:54.900 --> 00:39:59.800
The mean is pulled by the outliers, my outliers must be here.
00:39:59.800 --> 00:40:06.400
I’m going to say this is a right skewed distribution.
00:40:06.400 --> 00:40:12.900
That would mean that the skewness is greater than 0.
00:40:12.900 --> 00:40:16.800
It is a positive number.
00:40:16.800 --> 00:40:18.300
Example 3.
00:40:18.300 --> 00:40:28.600
If a distribution has a kurtosis close to 0 and skewness close to 0, what is the likely shape of the distribution?
00:40:28.600 --> 00:40:37.700
We know that skewness close to 0 means that it is basically symmetric, but it could be symmetric in lots of ways.
00:40:37.700 --> 00:40:40.300
It does not have to be normally distributed.
00:40:40.300 --> 00:40:51.100
With a kurtosis that is also 0, we know that must mean that the tails are not too fat, not too skinny.
00:40:51.100 --> 00:40:55.200
The peak is not too pointy and not too dull either.
00:40:55.200 --> 00:41:08.100
If both skewness and kurtosis are 0, we could very likely think of this as approximately normal.
00:41:08.100 --> 00:41:12.900
That is probably a good way to guess.
00:41:12.900 --> 00:41:14.600
Finally example 4.
00:41:14.600 --> 00:41:25.000
Sketch a potential distribution that can have a kurtosis of 1, then sketch another distribution that can have a kurtosis of -1.
00:41:25.000 --> 00:41:36.200
I find the positive 1 to be easier because I always think that when it is positive it means it is pointy.
00:41:36.200 --> 00:41:43.800
Then the other sketch is of a distribution that is less pointy.
00:41:43.800 --> 00:41:47.200
It is always something like that.
00:41:47.200 --> 00:41:49.500
That is it for skewness and kurtosis.
00:41:49.500 --> 00:41:51.000
Thanks for using www.educator.com.