WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:01.500
Hi and welcome to www.educator.com.
00:00:01.500 --> 00:00:05.500
We are going to be talking about hypothesis testing for the difference between two independent means.
00:00:05.500 --> 00:00:11.900
We are going to go over the goal of hypothesis testing in general.
00:00:11.900 --> 00:00:15.000
We have only looked at it for one mean so far, but we are going to look at
00:00:15.000 --> 00:00:19.200
how it changes just very subtly when we talk about two means.
00:00:19.200 --> 00:00:25.400
We are going to revisit the sampling distribution of the difference between two means.
00:00:25.400 --> 00:00:32.200
If you have just watched the confidence interval for two means, then you do not need to watch this one.
00:00:32.200 --> 00:00:34.600
You do not need to watch that section.
00:00:34.600 --> 00:00:42.800
We are going to talk about the same conditions for doing hypothesis testing as for the confidence interval.
00:00:42.800 --> 00:00:47.600
You need to meet three conditions before you can do either of these two.
00:00:47.600 --> 00:00:54.700
Then we will talk about the modified steps of hypothesis testing for two means and the formulas that go with those steps.
00:00:54.700 --> 00:01:00.000
Let us talk about the goal of hypothesis testing.
00:01:00.000 --> 00:01:04.800
In one sample what we wanted to do was reject the null if
00:01:04.800 --> 00:01:12.800
we got a sample that was significantly different from the hypothesized μ.
00:01:12.800 --> 00:01:16.500
For instance, significantly lower or significantly higher.
00:01:16.500 --> 00:01:23.600
Significant does not mean important like it does in our modern use of the word.
00:01:23.600 --> 00:01:26.500
It actually means: does it stand out?
00:01:26.500 --> 00:01:28.100
Is it weird enough?
00:01:28.100 --> 00:01:31.600
Does it stand out from the hypothesized μ?
00:01:31.600 --> 00:01:35.200
In those cases we reject the null.
00:01:35.200 --> 00:01:37.200
Our goal is to reject the null.
00:01:37.200 --> 00:01:46.100
We can only say whether something is sufficiently weird; we cannot say whether it is sufficiently similar.
00:01:46.100 --> 00:01:49.900
An experiment is considered a success if we reject the null.
00:01:49.900 --> 00:01:58.100
If we do not reject the null, it is considered a null experiment, or what we think of as uninformative, which is not actually true.
00:01:58.100 --> 00:02:03.000
That is how it is thought of traditionally.
00:02:03.000 --> 00:02:10.500
This is the case where we only have one sample and we have a hypothesized population.
00:02:10.500 --> 00:02:24.400
Here we have two samples and in order to reject the null we need to get samples that are significantly different from each other.
00:02:24.400 --> 00:02:31.300
They stand out from each other so x is different from y, y is different from x.
00:02:31.300 --> 00:02:33.800
That is what we are really looking for.
00:02:33.800 --> 00:02:39.400
Once again, just like the one sample, we cannot say whether they are sufficiently similar,
00:02:39.400 --> 00:02:42.700
but we can say whether they are sufficiently different.
00:02:42.700 --> 00:02:49.900
It is okay if x is significantly lower than y or significantly higher.
00:02:49.900 --> 00:02:50.700
We do not really care.
00:02:50.700 --> 00:02:53.100
We just care about significantly different.
00:02:53.100 --> 00:02:59.100
If you do not care about the direction, these are called two-tailed hypotheses.
00:02:59.100 --> 00:03:09.200
Let us think: if x and y are different from each other, then x - y should not be 0.
00:03:09.200 --> 00:03:16.400
But if x and y are exactly the same, x = y, then x – y = 0.
00:03:16.400 --> 00:03:21.300
Because you can think about x – y as x – x, which is 0.
00:03:21.300 --> 00:03:30.700
If you want to think about it algebraically, if you add y to each side of x – y = 0, you get exactly x = y.
00:03:30.700 --> 00:03:38.500
If x and y were the same, we should expect their difference to be 0.
00:03:38.500 --> 00:03:48.400
Let us just review very briefly the sampling distribution of the difference between two means.
00:03:48.400 --> 00:03:53.400
This is the case where we do not know what the population is like,
00:03:53.400 --> 00:04:02.000
but because of the CLT we actually end up knowing quite a bit about the SDOM.
00:04:02.000 --> 00:04:07.100
This is the population of x and this is the population of y.
00:04:07.100 --> 00:04:17.700
This is the SDOM of x bar, so the whole bunch of x bars and this is the SDOM for y which is a whole bunch of y bars.
00:04:17.700 --> 00:04:39.800
We know some things about these guys and we also know we can figure out the standard error from the sample.
00:04:39.800 --> 00:04:43.700
What is nice about this is that we do not need to know anything about the population.
00:04:43.700 --> 00:04:48.200
All we have to do is know the standard deviation of the sample which we could easily calculate
00:04:48.200 --> 00:04:54.600
in order to estimate the standard error of these two sampling distributions.
00:04:54.600 --> 00:05:06.300
Once we have that now we can start talking about the SDOD (the sampling distribution of the difference between means).
00:05:06.300 --> 00:05:21.700
What we want to do is instead of finding μ sub x or μ sub y, we want to know μ sub x bar – y bar.
00:05:21.700 --> 00:05:32.300
Here you have to think of pulling out one sample from here and one sample from here getting the difference and plotting it.
00:05:32.300 --> 00:05:36.300
If these guys are normal, we can assume this one to be normal.
00:05:36.300 --> 00:05:42.000
Not only that but we can figure out the standard error of this guy as well just
00:05:42.000 --> 00:05:57.000
from knowing these, because the standard error is going to be the square root of s sub x² / n sub x + s sub y² / n sub y.
00:05:57.000 --> 00:06:06.500
That is, the variance of x / n sub x + the variance of y / n sub y, all under the square root.
00:06:06.500 --> 00:06:08.200
These are all things that we have.
00:06:08.200 --> 00:06:10.600
We do not need anything special.
00:06:10.600 --> 00:06:12.000
We do not need sigma or anything like that.
00:06:12.000 --> 00:06:14.500
We just need samples in order to calculate this.
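For readers following along in software rather than by hand, that standard error formula can be sketched in a few lines of Python (a minimal illustration; the function name is my own, and this mirrors the formula in the lecture, not any particular tool's built-in):

```python
import math

def standard_error_of_difference(s2_x, n_x, s2_y, n_y):
    """Estimated standard error of (x bar - y bar):
    sqrt(s_x^2 / n_x + s_y^2 / n_y), using sample variances only."""
    return math.sqrt(s2_x / n_x + s2_y / n_y)
```

As the lecture says, only the sample variances and sample sizes are needed; no population sigma appears anywhere.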
00:06:14.500 --> 00:06:24.400
If these two distributions, the population distributions,
00:06:24.400 --> 00:06:29.200
if we have a reason to suspect that these have homogeneous variance.
00:06:29.200 --> 00:06:35.000
If their variances are the same, then instead of s sub x² and s sub y²,
00:06:35.000 --> 00:06:45.300
we can actually use s pooled², but we will not be doing that in this lesson; you can, though.
00:06:45.300 --> 00:06:55.500
Remember the rules of the SDOD are very similar to the CLT and if the SDOM for x is normal
00:06:55.500 --> 00:06:59.200
and SDOM for y is normal then SDOD is normal too.
00:06:59.200 --> 00:07:01.400
There are two ways that this could be true.
00:07:01.400 --> 00:07:08.300
The first way is if populations are normal.
00:07:08.300 --> 00:07:15.300
If population of x and y are normal then we could assume SDOM for x and y are normal.
00:07:15.300 --> 00:07:23.200
The other possibility is if n is large enough.
00:07:23.200 --> 00:07:30.500
We want to talk about the mean for the null hypothesis.
00:07:30.500 --> 00:07:36.900
The null hypothesis is saying that for the population of x and the population of y,
00:07:36.900 --> 00:07:41.000
the difference between them is going to be 0 because they are similar.
00:07:41.000 --> 00:07:47.100
The null hypothesis is saying both are similar, which means that the means of
00:07:47.100 --> 00:07:54.600
the sampling distributions of the means, the SDOM means, are going to be similar.
00:07:54.600 --> 00:07:58.100
Which means that subtracting one from the other will give us 0.
00:07:58.100 --> 00:08:06.300
The null hypothesis says the mean of these differences of means is going to be 0.
00:08:06.300 --> 00:08:17.500
That is the null hypothesis, and that is really saying that the SDOM for x and the SDOM for y are very similar.
00:08:17.500 --> 00:08:22.400
Let us talk about standard error for independent samples.
00:08:22.400 --> 00:08:25.900
Remember, we are still talking just about independent samples.
00:08:25.900 --> 00:08:31.200
When variance is homogeneous, that is when we use the s pooled idea.
00:08:31.200 --> 00:08:40.800
That means that s sub x bar - y bar is going to be equal to, and pretend you are
00:08:40.800 --> 00:08:49.500
writing just the regular idea where you are dividing by n sub x and n sub y.
00:08:49.500 --> 00:08:56.500
Instead of using the variance from x and the variance from y, we are going to use that pooled variance idea.
00:08:56.500 --> 00:09:07.300
That is going to be s pooled².
00:09:07.300 --> 00:09:13.700
Some people think: why do we not just put that on top once and put n sub x and n sub y together at the bottom?
00:09:13.700 --> 00:09:19.600
That would be algebraically wrong because, remember, these are the denominators; we would have
00:09:19.600 --> 00:09:25.500
to have common denominators in order for us to put these together and we do not have common denominators yet.
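The pooled version the lecture just described can also be sketched as a small Python helper (my own naming; a sketch of the textbook formula, assuming the pooled variance is the df-weighted average of the two sample variances):

```python
import math

def pooled_standard_error(s2_x, n_x, s2_y, n_y):
    # pooled variance: weight each sample variance by its degrees of freedom
    sp2 = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)
    # the pooled variance replaces both separate variances,
    # but each keeps its own denominator (no common denominator yet!)
    return math.sqrt(sp2 / n_x + sp2 / n_y)
```

Note the body keeps sp2 / n_x and sp2 / n_y as separate fractions, exactly the point the lecture makes about denominators.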
00:09:25.500 --> 00:09:36.100
What about the case where variance is not homogeneous? This is the vast majority of the time, and when in doubt,
00:09:36.100 --> 00:09:42.000
when you do not know anything about the variance of the population go with this one.
00:09:42.000 --> 00:09:44.300
It is just a safer option.
00:09:44.300 --> 00:10:02.400
This is going to mean that this standard error is represented by the variance of x / n sub x + the variance of y / n sub y.
00:10:02.400 --> 00:10:05.200
Add these together and take the square root of the whole thing.
00:10:05.200 --> 00:10:16.000
Just to recap, same conditions must be met in order to do hypothesis testing
00:10:16.000 --> 00:10:22.400
for two means as the conditions for doing a confidence interval for two means.
00:10:22.400 --> 00:10:32.100
It is that the two samples were randomly and independently selected from two different populations,
00:10:32.100 --> 00:10:36.100
it is reasonable to assume that both populations that the sample come from are
00:10:36.100 --> 00:10:40.700
normally distributed or the sample sizes are sufficiently large.
00:10:40.700 --> 00:10:43.400
This was to ensure the normality of the SDOM.
00:10:43.400 --> 00:10:51.500
Also in the case of the sample surveys, the population size should be at least 10 times larger than the sample size for each sample.
00:10:51.500 --> 00:11:03.200
That is just so that we can assume sampling with replacement, because the probabilities change when you cannot assume replacement.
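The three conditions just recapped can be summarized as a checklist. Here is a hypothetical helper (the function name and the large-sample cutoff of 30 are my own assumptions, not from the lecture; the first condition, random and independent sampling, is a design question that data alone cannot verify):

```python
def conditions_ok(n_x, n_y, pop_x, pop_y, populations_normal, large_n=30):
    # Condition 1 (random, independent samples) is assumed by study design.
    # Condition 2: normal populations, or both samples sufficiently large,
    # so the SDOMs (and hence the SDOD) are normal.
    normality = populations_normal or (n_x >= large_n and n_y >= large_n)
    # Condition 3 (sample surveys): each population at least 10x its sample,
    # so we can treat the sampling as if it were with replacement.
    ten_times = pop_x >= 10 * n_x and pop_y >= 10 * n_y
    return normality and ten_times
```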
00:11:03.200 --> 00:11:09.000
Let us go in the steps of the hypothesis testing.
00:11:09.000 --> 00:11:16.800
These are the same steps as you did when you have one mean, except now that we are subtly changing a few things.
00:11:16.800 --> 00:11:19.600
I'm going to highlight those changes as we go through this.
00:11:19.600 --> 00:11:26.300
First we need to state our hypotheses and remember now instead of having just the hypotheses that
00:11:26.300 --> 00:11:36.000
the mean of the population equals this, what we are saying is that the mean of x,
00:11:36.000 --> 00:11:41.300
population of x and the mean of the population of y those are the same.
00:11:41.300 --> 00:11:47.200
μ sub x – μ sub y will be 0.
00:11:47.200 --> 00:11:51.900
You can also write it as μ sub x = μ sub y.
00:11:51.900 --> 00:11:58.500
The alternative is that they are different from each other in some way.
00:11:58.500 --> 00:12:01.100
Then we pick a significance level.
00:12:01.100 --> 00:12:06.600
How different do these two populations have to be for us to say they are different?
00:12:06.600 --> 00:12:13.700
We set a decision stage, but instead of drawing the SDOM now we draw the SDOD.
00:12:13.700 --> 00:12:19.000
Because now we are looking at the differences between these two means.
00:12:19.000 --> 00:12:23.000
We identify critical limits and rejection regions.
00:12:23.000 --> 00:12:27.400
We also find the critical test statistic, the boundaries.
00:12:27.400 --> 00:12:33.000
In order to do this we have to find the degrees of freedom for the difference.
00:12:33.000 --> 00:12:38.900
We cannot just use the degrees of freedom for one or for the other; we actually add them together.
00:12:38.900 --> 00:12:44.400
And then use the samples and the SDOD to compute the mean difference.
00:12:44.400 --> 00:12:53.000
We are not just computing mean, but we are computing mean difference test statistics, as well as the p value.
00:12:53.000 --> 00:12:59.200
And then we compare the sample to the hypothesized population.
00:12:59.200 --> 00:13:01.600
We either reject the null or not.
00:13:01.600 --> 00:13:12.400
We reject the null if our test statistic and p value lie in those zones of rejection.
00:13:12.400 --> 00:13:14.500
It is like these are the weirdo zones.
00:13:14.500 --> 00:13:18.900
This is how we know that our sample is really different from this population.
00:13:18.900 --> 00:13:25.700
Let us talk about the different formulas that go along with these steps.
00:13:25.700 --> 00:13:37.400
Remember, the first step is going to be: what is the null hypothesis, as well as the alternative?
00:13:37.400 --> 00:13:56.000
This is not really a formula, but it is helpful to remember that this is what we really mean, versus the alternative, that x bar – y bar does not equal 0.
00:13:56.000 --> 00:14:05.900
This is often what is going to be the case and you can rewrite this as μ sub x bar – μ sub y bar sometimes,
00:14:05.900 --> 00:14:16.700
but there are some mathematical ideas that you have to learn before you can write that.
00:14:16.700 --> 00:14:19.600
I will leave that aside for now.
00:14:19.600 --> 00:14:22.600
Second thing is significance level.
00:14:22.600 --> 00:14:33.400
Here there are no formulas but you should know that when we say α= .05 we are talking about that false alarm rate.
00:14:33.400 --> 00:14:37.600
This is the rate of rejecting the null when the null is actually true.
00:14:37.600 --> 00:14:40.900
This is a very low rate of false alarms.
00:14:40.900 --> 00:14:47.600
When we say α = .05 it is not that we calculated it but it is just that
00:14:47.600 --> 00:14:55.000
by convention science tends to say this is the reasonable level of significance.
00:14:55.000 --> 00:14:59.900
Sometimes people are more conservative and use .01 or .001.
00:14:59.900 --> 00:15:05.400
Number 3, we need to set that decision stage.
00:15:05.400 --> 00:15:24.600
It is helpful to draw the SDOD and it is helpful to have our hypothesized population here.
00:15:24.600 --> 00:15:30.300
μ sub x bar – y bar = 0.
00:15:30.300 --> 00:15:32.500
We assume that this point is 0.
00:15:32.500 --> 00:15:40.900
One thing you probably also want to know about the SDOD is the formula for standard error.
00:15:40.900 --> 00:15:51.400
The formula for the standard error of the SDOD, we have written this a lot of times,
00:15:51.400 --> 00:15:58.700
is the variance of x / n sub x + the variance of y / n sub y.
00:15:58.700 --> 00:16:04.900
Another thing you probably want to know is that we need to find the critical t.
00:16:04.900 --> 00:16:13.500
We need to find the t values here and in order to find that you will need to know
00:16:13.500 --> 00:16:19.500
the degrees of freedom for the difference and it is pretty easy.
00:16:19.500 --> 00:16:23.600
It is the degrees of freedom for x + the degrees of freedom for y.
00:16:23.600 --> 00:16:27.800
To find this, it is n sub x -1.
00:16:27.800 --> 00:16:30.500
To find that it is n sub y -1.
00:16:30.500 --> 00:16:41.900
We could write this as n sub x -1 + n sub y -1.
00:16:41.900 --> 00:16:51.600
You could write it like that and then I think that is all you need to know for the decision stage.
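That degrees-of-freedom arithmetic is simple enough to express directly (a trivial sketch with my own function name):

```python
def df_difference(n_x, n_y):
    # df for the difference: df for x plus df for y,
    # i.e. (n_x - 1) + (n_y - 1)
    return (n_x - 1) + (n_y - 1)
```

With two samples of 8, as in the examples later in this lesson, this gives 7 + 7 = 14; you then look up the critical t for that df in a t table or with software such as Excel's TINV.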
00:16:51.600 --> 00:17:19.600
Step 4: you have to compute the sample's mean difference, and you need to calculate its test statistic as well as its p value.
00:17:19.600 --> 00:17:26.200
Remember we are going to be using t from here on out because obviously we are using s instead of sigma.
00:17:26.200 --> 00:17:30.400
Let us talk about how to come to the sample t.
00:17:30.400 --> 00:17:38.400
Let me write this as sample t.
00:17:38.400 --> 00:17:49.600
The sample t is really the distance between where our sample difference is versus the hypothesized difference.
00:17:49.600 --> 00:17:55.100
We do not want it just in terms of that raw distance, we want it in terms of the standard error.
00:17:55.100 --> 00:18:04.800
It is going to be whatever our x bar - y bar is, the actual sample difference, minus 0.
00:18:04.800 --> 00:18:17.400
That is our hypothesized population divided by the standard error s sub x bar – y bar.
00:18:17.400 --> 00:18:24.600
That will give you how many standard errors away our actual mean difference is from 0.
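That ratio is just one line of code; here is a hedged sketch (names are my own):

```python
def sample_t(mean_x, mean_y, se_diff, hypothesized_diff=0.0):
    # distance of the observed mean difference from the hypothesized one,
    # measured in standard errors of the difference
    return ((mean_x - mean_y) - hypothesized_diff) / se_diff
```

For a two-tailed test at the null, hypothesized_diff stays at 0, matching the "- 0" the lecture writes out.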
00:18:24.600 --> 00:18:33.100
Once you have this t value and you have the degrees of freedom,
00:18:33.100 --> 00:18:40.800
then you can find the p value, and then you can reject or not reject the null hypothesis.
00:18:40.800 --> 00:18:45.900
Reject or do not reject, that is really the technical idea there.
00:18:45.900 --> 00:18:50.700
Let us go onto some examples.
00:18:50.700 --> 00:19:00.600
The Cheesy Cheesy cookies company wanted to know whether they should have a coarse or fine texture in their cheesy cookies.
00:19:00.600 --> 00:19:03.300
They assembled a series of taste testing panels that tasted either the coarse
00:19:03.300 --> 00:19:13.100
or fine textured cookies and gave it a palatability score.
00:19:13.100 --> 00:19:14.400
The higher the score the better.
00:19:14.400 --> 00:19:21.900
Is there a statistical difference in the mean palatability score between the two texture levels?
00:19:21.900 --> 00:19:33.800
If you download the examples below and you look under the example 1, you should see a data set that looks like this.
00:19:33.800 --> 00:19:37.600
This is the palatability score and this is the texture.
00:19:37.600 --> 00:20:00.100
I believe that 0 = coarse and 1= fine, just so that we can make some sort of recommendation at the end.
00:20:00.100 --> 00:20:08.700
Here we go, we have these different sets of scores, so this is the score that
00:20:08.700 --> 00:20:14.200
one panel came up with and that panel tasted coarse textured cheesy cookies.
00:20:14.200 --> 00:20:20.700
This panel also tasted coarse and that is the score it gave it.
00:20:20.700 --> 00:20:22.700
Let us go up to fine.
00:20:22.700 --> 00:20:27.300
They tasted fine texture and they gave it that score.
00:20:27.300 --> 00:20:31.100
They also tasted fine and they gave it that score.
00:20:31.100 --> 00:20:39.700
You could go and see what the different scores are and what texture they had.
00:20:39.700 --> 00:20:45.200
First, let us think about: what are our x and y?
00:20:45.200 --> 00:20:47.200
What are our two independent samples?
00:20:47.200 --> 00:20:51.600
The two independent samples here seem to come from the two different textures.
00:20:51.600 --> 00:21:00.000
One group of scores they all tasted coarse texture cheesy cookies.
00:21:00.000 --> 00:21:03.900
The other group of scores tasted fine textured cheesy cookies.
00:21:03.900 --> 00:21:10.300
It might be helpful to us to sort this data by texture.
00:21:10.300 --> 00:21:21.200
I am going to take this and I am going to ask.
00:21:21.200 --> 00:21:30.700
It would work if I move score over.
00:21:30.700 --> 00:21:35.700
What I am going to do is just hit sort.
00:21:35.700 --> 00:21:51.400
Here these are all our coarse cheesy cookie, the palatability scores and here are my fine cheesy cookie palatability scores.
00:21:51.400 --> 00:21:55.400
Let us think about how we want to approach this problem.
00:21:55.400 --> 00:22:02.200
The first thing we want to do is create some sort of hypothesized population.
00:22:02.200 --> 00:22:06.900
Our hypothesized population is really going to say that for the coarse and
00:22:06.900 --> 00:22:10.600
fine textured cheesy cookies there is really no difference between them.
00:22:10.600 --> 00:22:11.700
They are the same.
00:22:11.700 --> 00:22:17.200
The μ sub x bar - y bar should equal 0.
00:22:17.200 --> 00:22:26.200
The alternative is that they are different from each other in some way.
00:22:26.200 --> 00:22:31.700
We do not know which one taste better.
00:22:31.700 --> 00:22:38.600
Let us just be neutral and say we do not know whether the coarse cheesy cookies
00:22:38.600 --> 00:22:44.100
are better than the fine or the fine cheesy cookies are better than the coarse.
00:22:44.100 --> 00:22:50.500
We want to know whether these palatability scores are different or the same.
00:22:50.500 --> 00:22:57.100
Let us set a significance level for how different they have to be.
00:22:57.100 --> 00:23:06.100
Our significance level could be α= .05.
00:23:06.100 --> 00:23:09.700
Finally let us set a decision stage.
00:23:09.700 --> 00:23:18.000
Here I am going to draw the SDOD. Can we assume normality?
00:23:18.000 --> 00:23:25.500
Well, they are different and let us look here.
00:23:25.500 --> 00:23:36.100
We have 8 scores and 8 scores; the n is low.
00:23:36.100 --> 00:23:44.500
Technically, we might not be able to do hypothesis testing.
00:23:44.500 --> 00:23:50.000
Let us say for some reason that your teacher wants you to do it anyway.
00:23:50.000 --> 00:23:56.300
But one of the things that should come up when you see low n like this is that you should question
00:23:56.300 --> 00:24:05.800
whether hypothesis testing is the right way to go, because it may not reflect the conditions
00:24:05.800 --> 00:24:09.500
that we need to have met before we can assume all this stuff.
00:24:09.500 --> 00:24:14.400
Just for the problem solving and practice here, let us go with that.
00:24:14.400 --> 00:24:25.800
But if you want to be stricter, you can tell your instructor that the conditions are not met for hypothesis testing.
00:24:25.800 --> 00:24:37.900
Here we set our rejection regions, and why do we not just go ahead and put in our μ here?
00:24:37.900 --> 00:24:43.400
It is going to be 0, and it will be helpful to find the t values out here.
00:24:43.400 --> 00:24:46.300
Let us go ahead and do that.
00:24:46.300 --> 00:24:51.000
What are our critical t values?
00:24:51.000 --> 00:24:54.000
The critical t values are the boundaries.
00:24:54.000 --> 00:25:03.100
In order to find the critical t, we are going to have to find the degrees of freedom, the df of the difference.
00:25:03.100 --> 00:25:12.100
For n sub x, we will call x the coarse group.
00:25:12.100 --> 00:25:20.900
X will be coarse cheesy cookies and y will be fine.
00:25:20.900 --> 00:25:24.000
You can use c and f if you want to.
00:25:24.000 --> 00:25:28.400
This is going to be 8 and this is also 8.
00:25:28.400 --> 00:25:34.100
The degrees of freedom for each of these is 7 so this is going to be 14.
00:25:34.100 --> 00:25:36.800
That is a pretty low degrees of freedom.
00:25:36.800 --> 00:25:39.900
That is, as long as we can assume normality here.
00:25:39.900 --> 00:25:45.600
Let us find the critical t.
00:25:45.600 --> 00:26:02.000
In order to find that we would use TINV, because we have the two-tailed probability .05 and we have the degrees of freedom.
00:26:02.000 --> 00:26:04.900
This gives us a positive version.
00:26:04.900 --> 00:26:13.000
The negative version would just be the negative of that number because they are perfectly symmetrical.
00:26:13.000 --> 00:26:21.200
It is 2.14, so the critical t is + or - 2.14.
00:26:21.200 --> 00:26:28.900
Now that we have that, then we could go ahead and look at the actual samples themselves.
00:26:28.900 --> 00:26:38.200
Step 4 is we need to find the sample's mean difference.
00:26:38.200 --> 00:26:46.300
We need to find x bar – y bar, but we also need to find this mean differences t.
00:26:46.300 --> 00:26:50.100
The t sub x bar - y bar.
00:26:50.100 --> 00:26:53.200
We need to find that as well as the p value.
00:26:53.200 --> 00:26:57.800
Let us go ahead and do that.
00:26:57.800 --> 00:27:35.700
We just started from step 3 and step 4 is really the mean difference and that is just the average of these guys - the average of these guys.
00:27:35.700 --> 00:27:41.900
That is their average difference.
00:27:41.900 --> 00:27:48.000
This is saying that the coarse scores tend to be on average lower than
00:27:48.000 --> 00:27:51.300
the fine scores, because we do coarse score – fine score.
00:27:51.300 --> 00:27:52.600
We get a negative number.
00:27:52.600 --> 00:27:57.000
The coarse score numbers must have been smaller.
00:27:57.000 --> 00:28:10.200
Actually before we go on, it might be helpful to find the standard error of this situation.
00:28:10.200 --> 00:28:19.100
In order to find the standard error of the difference we need to find
00:28:19.100 --> 00:28:36.700
the square roots of the variance of x ÷ n sub x + the variance of y ÷ n sub y.
00:28:36.700 --> 00:28:43.800
This is going to be our standard error that we need.
00:28:43.800 --> 00:28:50.800
In order to find that it would be helpful to find each of these pieces by themselves.
00:28:50.800 --> 00:29:10.200
I guess we could find the whole thing, the variance of x ÷ n sub x and the variance of y ÷ n sub y.
00:29:10.200 --> 00:29:14.400
I will put each of these on different lines, then we can put it all together.
00:29:14.400 --> 00:29:16.800
We could just add them all up here.
00:29:16.800 --> 00:29:23.000
Let us find that.
00:29:23.000 --> 00:29:30.700
The variance, thankfully Excel has all these functions.
00:29:30.700 --> 00:29:38.000
Let us check and make sure that this variance function divides by n - 1.
00:29:38.000 --> 00:29:58.700
The variance of x ÷ 8 and the variance of all my fine cheesy cookie values ÷ 8.
00:29:58.700 --> 00:30:11.400
We have these two variances and when we divide by n sub x we are getting the variance of the SDOM.
00:30:11.400 --> 00:30:20.000
If we add those together then get the square root, then we get the standard error of the difference.
00:30:20.000 --> 00:30:30.200
The square root of these two guys added together and that is 11.16.
00:30:30.200 --> 00:30:50.700
Here I will just add this information so the standard error of the difference =11.16.
00:30:50.700 --> 00:31:06.300
In order to find this t, we need to take this difference between the means, minus 0, divided by the standard error of the difference.
00:31:06.300 --> 00:31:11.000
We can easily do that now.
00:31:11.000 --> 00:31:31.100
Here, in order to find the sample t, we put the mean difference minus 0.
00:31:31.100 --> 00:31:41.500
Technically you do not need that minus 0; then we divide by the standard error of the difference.
00:31:41.500 --> 00:31:53.800
Our sample t says the difference is not at 0; it is actually way down here.
00:31:53.800 --> 00:31:57.300
Is it significantly different?
00:31:57.300 --> 00:32:02.700
Well, one thing we could do is just work right here and compare this number to this number.
00:32:02.700 --> 00:32:07.700
This lower boundary here is -2.14.
00:32:07.700 --> 00:32:15.300
-4.73 is way out here, so we definitely know it is way significant.
00:32:15.300 --> 00:32:23.700
It is standing way out from the expected mean, but we can also find the p value.
00:32:23.700 --> 00:32:30.800
Now remember, in Excel one of the things is that it needs a positive t value.
00:32:30.800 --> 00:32:39.300
If you have a negative t value you have to turn it into a positive one, but it is okay because it is perfectly symmetrical.
00:32:39.300 --> 00:32:43.100
The degrees of freedom that we are talking about are going to be this
00:32:43.100 --> 00:32:48.900
new combined degrees of freedom, because we are talking about the SDOD now.
00:32:48.900 --> 00:32:56.000
This is the degrees of freedom for this SDOD and that is 14 and it is a two-tailed hypothesis.
00:32:56.000 --> 00:33:01.500
Our p value is .0003.
00:33:01.500 --> 00:33:11.600
I will not write the last step here, but we can just talk about it.
00:33:11.600 --> 00:33:16.700
The last step would be we reject or do not reject the null.
00:33:16.700 --> 00:33:23.400
Well, we reject the null here because our p value is much lower than our significance level.
00:33:23.400 --> 00:33:30.300
Our t value, our sample t is more extreme than our critical t.
00:33:30.300 --> 00:33:38.200
Here what we would say is that there is a statistical difference between the two texture levels.
00:33:38.200 --> 00:33:46.500
One that is very unlikely to be attributed to chance, because that is what this p value tells us.
00:33:46.500 --> 00:33:52.900
If it were by chance, it would have a .03% probability.
00:33:52.900 --> 00:33:54.800
It is pretty low.
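Example 1's Excel steps can be replayed end to end in Python. The scores below are hypothetical stand-ins (the real values are in the downloadable example file); the structure of the calculation, mean difference, unpooled standard error, t, and df, follows the lecture exactly:

```python
import statistics as st

# Hypothetical palatability scores, made up for illustration only;
# the lecture's actual data set is in the downloadable example file.
coarse = [60, 64, 62, 66, 58, 62, 60, 64]   # x: coarse-texture panels
fine   = [70, 74, 72, 76, 68, 72, 70, 74]   # y: fine-texture panels

mean_diff = st.mean(coarse) - st.mean(fine)      # x bar - y bar
se = (st.variance(coarse) / len(coarse)          # st.variance divides by n - 1
      + st.variance(fine) / len(fine)) ** 0.5    # standard error of the difference
t = (mean_diff - 0) / se                         # hypothesized difference is 0
df = (len(coarse) - 1) + (len(fine) - 1)         # 7 + 7 = 14
```

You would then compare t to the critical t for df = 14 (± 2.14 at α = .05, two-tailed) and get the p value from a t table or software such as Excel's TDIST.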
00:33:54.800 --> 00:34:02.300
Example 2, scientists have found certain tree resins that are deadly to termites.
00:34:02.300 --> 00:34:09.900
To test the protective power of resin protecting the tree, a lab prepared 16 dishes with 25 termites in each.
00:34:09.900 --> 00:34:15.600
Each dish was randomly assigned to be treated with 5 mg or 10 mg of resin.
00:34:15.600 --> 00:34:20.500
At the end of 15 days, the number of surviving termites was counted.
00:34:20.500 --> 00:34:25.800
Assume that termite survival tends to be normally distributed within both dosage levels.
00:34:25.800 --> 00:34:32.400
Is there a statistically significant difference in the mean number of survivors for those two doses?
00:34:32.400 --> 00:34:37.500
Now here I think it is worth just discussing what will be our x and y.
00:34:37.500 --> 00:34:46.700
Our x might be the 5 mg population and our y might be the 10 mg population.
00:34:46.700 --> 00:34:59.600
For n sub x, some people might think there are 25 termites, but actually there are 25 termites in each of the Petri dishes.
00:34:59.600 --> 00:35:08.900
There are 8 Petri dishes that have been randomly treated with 5 mg and 8 that have been treated with 10 mg.
00:35:08.900 --> 00:35:13.100
This is 8 and 8.
00:35:13.100 --> 00:35:24.000
When I say 8, we mean the dishes of treatment; the termites are not the subjects, the dishes are the cases that we are interested in.
00:35:24.000 --> 00:35:27.800
The termites give us the measurement.
00:35:27.800 --> 00:35:33.700
You can get 25 termites surviving or you could get 0 surviving.
00:35:33.700 --> 00:35:35.500
How many termites survived?
00:35:35.500 --> 00:35:37.200
That is our dependent variable.
00:35:37.200 --> 00:35:42.000
Okay, let us see.
00:35:42.000 --> 00:35:46.400
Well one thing we could do is start off with our hypotheses.
00:35:46.400 --> 00:35:53.400
Our null hypothesis is that these two dosage levels are roughly the same.
00:35:53.400 --> 00:36:00.900
We might say something like: μ sub x bar - y bar = 0; they are the same.
00:36:00.900 --> 00:36:05.800
The alternative is that they are not the same.
00:36:05.800 --> 00:36:09.200
Maybe that one is more powerful than the other.
00:36:09.200 --> 00:36:12.900
We do not know which one.
00:36:12.900 --> 00:36:18.700
We could easily set our significance level to be .05.
00:36:18.700 --> 00:36:24.400
Let us talk about the actual set up, the decision stage.
00:36:24.400 --> 00:36:34.900
In the decision stage, let us see what we have here.
00:36:34.900 --> 00:36:53.000
We have set up this .05 level of rejection and we could just go ahead; this is the x bar - y bar, but what would be the t?
00:36:53.000 --> 00:37:06.500
The nice thing about this being 0 is that the t distribution, as well as the x bar – y bar distribution, start off the same.
00:37:06.500 --> 00:37:09.500
They are not going to have the same numbers out here.
00:37:09.500 --> 00:37:13.100
Okay, so that is why we do have to put them on different lines.
00:37:13.100 --> 00:37:14.800
They are still talking about different things.
00:37:14.800 --> 00:37:19.700
Let us talk about the t values.
00:37:19.700 --> 00:37:27.200
Before we do, it might be helpful to figure out the new degrees of freedom.
00:37:27.200 --> 00:37:35.100
The degrees of freedom of differences will be 7 + 7 =14.
00:37:35.100 --> 00:37:40.700
Here we can do hypothesis testing and jump in right away, because we are given
00:37:40.700 --> 00:37:47.300
that termite survival tends to be normally distributed within these two dosage levels.
00:37:47.300 --> 00:38:04.600
If you go to example 2, you will actually see the data here.
00:38:04.600 --> 00:38:12.700
Here we see dosage and here is the 5 mg, as well as the 10 mg.
00:38:12.700 --> 00:38:14.600
Here are the survival counts.
00:38:14.600 --> 00:38:16.200
How many termites survived?
00:38:16.200 --> 00:38:19.500
Notice that there is no survival count over 25.
00:38:19.500 --> 00:38:24.100
25 is the maximum you can have, but even the highest gives me 16.
00:38:24.100 --> 00:38:31.500
Also, the survival count cannot go below 0, because we cannot have a negative number of termites surviving.
00:38:31.500 --> 00:38:36.800
Here we have the survival count.
00:38:36.800 --> 00:38:43.600
Let us see what we have here.
00:38:43.600 --> 00:38:49.600
Can we figure out what the critical t is?
00:38:49.600 --> 00:38:54.900
Can we figure out what the critical t is?
00:38:54.900 --> 00:38:56.200
I think we can.
00:38:56.200 --> 00:38:58.400
Let us see.
00:38:58.400 --> 00:39:04.500
You can use the book but I am going to use Excel to find the critical t.
00:39:04.500 --> 00:39:07.100
I am going to write for myself step 4.
00:39:07.100 --> 00:39:22.600
I know the two-tailed probability that I need, .05, and I know my degrees of freedom is 14.
00:39:22.600 --> 00:39:27.100
I see that the critical t is the same as before, because we use
00:39:27.100 --> 00:39:32.600
the same two-tailed probability and the same degrees of freedom of differences.
00:39:32.600 --> 00:39:44.300
Here we know that it is -2.14, as well as positive 2.14.
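The lecture reads the critical t of ±2.14 out of Excel. As a cross-check, here is a stdlib-only Python sketch that approximates the same cutoff numerically; the integration-plus-bisection scheme is my own illustration, not the lecture's method.

```python
import math

def t_pdf(x, df):
    # Student's t probability density function
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    # P(T > t): Simpson's rule over [t, t + 60]; the remaining tail is negligible
    a, b = t, t + 60.0
    h = (b - a) / steps
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return s * h / 3.0

def critical_t(alpha, df):
    # bisect for the t whose two-tailed area equals alpha
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 2 * upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = (8 - 1) + (8 - 1)                 # degrees of freedom of differences: 7 + 7 = 14
print(round(critical_t(0.05, df), 2))  # → 2.14, matching the lecture's Excel lookup
```

This matches the ±2.14 read off the table: a two-tailed .05 with 14 degrees of freedom.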
00:39:44.300 --> 00:39:54.400
From here, what we can do now is go on to look at our actual sample.
00:39:54.400 --> 00:40:06.000
This is actually step 3; it is part of our decision stage.
00:40:06.000 --> 00:40:10.200
Step 4 is now actually about the sample.
00:40:10.200 --> 00:40:30.900
It will help to find the sample mean difference, so that is going to be the average of the x's minus the average of the y's.
00:40:30.900 --> 00:40:36.600
We want to know: is this difference going to be significantly different from 0?
00:40:36.600 --> 00:40:43.000
We cannot just look at the raw scores because we need to figure out how many standard errors away we are.
00:40:43.000 --> 00:40:47.700
How shall we find the standard error for the difference?
00:40:47.700 --> 00:40:58.300
That is equal to the square root of the variance of x/ n sub x + variance of y/ n sub y.
00:40:58.300 --> 00:41:07.900
Let us find the variance of x over n sub x and the variance of y over n sub y.
00:41:07.900 --> 00:41:26.100
That is, the variance of x over 8 and the variance of y over 8.
00:41:26.100 --> 00:41:33.100
We see that the variance for y is a lot different than the variance for x.
00:41:33.100 --> 00:41:40.000
That is helpful for us to just look at briefly right now just because this will probably give us an idea
00:41:40.000 --> 00:41:47.200
that since the variances of the samples are so different, we probably do not have a good reason to pool these two together.
00:41:47.200 --> 00:41:51.400
We do not have a good reason to assume that the populations are similar.
00:41:51.400 --> 00:41:58.400
When in doubt, go with non-homogeneous variances.
00:41:58.400 --> 00:42:00.400
Just assume that they are different.
00:42:00.400 --> 00:42:14.700
Once we have those, we can take the square root of the sum of those two terms, and we get 2.5.
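The standard error formula just described can be written as a small function. A caveat: the variances below are hypothetical placeholders, chosen only so the result matches the 2.5 the lecture arrives at; the real sample variances come from the data sheet in the example file.

```python
import math

def se_difference(var_x, n_x, var_y, n_y):
    # standard error of the difference between two independent means,
    # without pooling (non-homogeneous variances assumed)
    return math.sqrt(var_x / n_x + var_y / n_y)

# hypothetical variances (25.0 each) chosen only to reproduce the lecture's 2.5
print(se_difference(25.0, 8, 25.0, 8))  # → 2.5
```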
00:42:14.700 --> 00:42:28.600
Once we have all of that, we can find the t for the sample mean difference.
00:42:28.600 --> 00:42:52.600
That would be the sample mean difference minus 0, divided by the standard error of the SDOD.
00:42:52.600 --> 00:42:55.100
What would that be?
00:42:55.100 --> 00:43:06.500
That would be this value, and I am going to leave in that subtract 0 part, divided by the standard error, and we get 2.15.
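That division can be sketched the same way. The sample means below are hypothetical, chosen only so the ratio reproduces the lecture's t of 2.15; the actual means come from the data.

```python
def t_statistic(mean_x, mean_y, se, hypothesized_diff=0.0):
    # t = (sample mean difference - hypothesized difference) / standard error,
    # keeping the "minus 0" term explicit, as in the lecture
    return (mean_x - mean_y - hypothesized_diff) / se

# hypothetical means chosen so (13.0 - 7.625) / 2.5 reproduces the lecture's 2.15
print(t_statistic(13.0, 7.625, 2.5))  # → 2.15
```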
00:43:06.500 --> 00:43:14.900
It is close, but it is still more extreme than 2.14.
00:43:14.900 --> 00:43:23.300
It could be extreme in either the negative end or the positive end.
00:43:23.300 --> 00:43:26.800
This one is extreme in the positive end.
00:43:26.800 --> 00:43:28.800
It is just right outside our borders.
00:43:28.800 --> 00:43:31.200
Let us find the p value.
00:43:31.200 --> 00:43:40.000
In order to find that p value, we use the t distribution, because we have the t value,
00:43:40.000 --> 00:43:45.100
we know the degrees of freedom, and we want a two-tailed p value.
00:43:45.100 --> 00:43:55.000
It is going to add this little chunk and this little chunk together, and that comes to .049.
00:43:55.000 --> 00:44:12.800
We will just skip to step 5: our p value = .049, which is just a hair underneath our α of .05.
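The lecture gets this p value from Excel's t distribution function. As a hedged stdlib-only cross-check, the two-tailed p for the observed t can be approximated by integrating the t density numerically; the Simpson's-rule scheme is my own illustration, not the lecture's tool.

```python
import math

def t_pdf(x, df):
    # Student's t probability density function
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    # two-tailed p: twice the upper-tail area beyond |t|, via Simpson's rule
    # over [|t|, |t| + 60]; the remaining tail is negligible
    a, b = abs(t), abs(t) + 60.0
    h = (b - a) / steps
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * s * h / 3.0

p = two_tailed_p(2.15, 14)  # the lecture's observed t and df of differences
print(p < 0.05)             # True: just a hair under alpha, so we reject the null
```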
00:44:12.800 --> 00:44:17.000
We would probably reject the null.
00:44:17.000 --> 00:44:32.500
Example 3: two months before a smoking ban in bars, a random sample of bar employees was assessed on respiratory health.
00:44:32.500 --> 00:44:38.200
Two months after the ban, another random sample of employees was assessed.
00:44:38.200 --> 00:44:44.100
Researchers saw a statistically significant increase in the mean scores of health.
00:44:44.100 --> 00:44:48.800
p = .049, two-tailed; we had an example of that.
00:44:48.800 --> 00:44:53.400
Which of the following is the best interpretation for this result?
00:44:53.400 --> 00:45:05.700
The probability is only .049 that the mean score for all of our employees increased from before to after the ban.
00:45:05.700 --> 00:45:07.900
Is that what this means?
00:45:07.900 --> 00:45:15.400
For me it helps to draw the SDOD; the null hypothesis would be
00:45:15.400 --> 00:45:20.300
that before and after are the same.
00:45:20.300 --> 00:45:26.700
What they actually found is that there is some extreme value.
00:45:26.700 --> 00:45:35.500
There is the increase in mean scores.
00:45:35.500 --> 00:45:42.100
There is a positive difference from after – before.
00:45:42.100 --> 00:45:44.900
There is the increase.
00:45:44.900 --> 00:45:49.300
It is somewhere up here, that increase tells us that.
00:45:49.300 --> 00:45:53.100
p = .049.
00:45:53.100 --> 00:45:59.900
If we draw this carefully, it is just right above that cutoff.
00:45:59.900 --> 00:46:15.400
There is only a .049 probability that the mean score for all bar employees increased.
00:46:15.400 --> 00:46:18.200
That is not what this means.
00:46:18.200 --> 00:46:22.900
It is not saying that there is only a small chance that it increased.
00:46:22.900 --> 00:46:27.500
It is actually saying there is a pretty good chance that it is not the same.
00:46:27.500 --> 00:46:32.300
There is a pretty small chance that it is the same.
00:46:32.300 --> 00:46:36.500
This one we can just rule out.
00:46:36.500 --> 00:46:45.400
Another possibility is that the mean score for all bar employees increased by more than 4.9%.
00:46:45.400 --> 00:46:54.400
Does this p value actually talk about the raw score on respiratory health?
00:46:54.400 --> 00:47:01.200
It does not talk about that score at all, it is the probability of finding such a difference.
00:47:01.200 --> 00:47:05.200
It does not have anything to do with actual scores.
00:47:05.200 --> 00:47:07.900
What about this one?
00:47:07.900 --> 00:47:15.000
An observed difference in the sample means as large or larger than the sample is unlikely to occur
00:47:15.000 --> 00:47:19.000
if the mean score for all bar employees before and after the ban were the same.
00:47:19.000 --> 00:47:21.700
This actually has something we can use.
00:47:21.700 --> 00:47:30.900
This is about considering that the mean scores for before and after are the same.
00:47:30.900 --> 00:47:34.300
That is important, because that is what the SDOD actually represents.
00:47:34.300 --> 00:47:44.700
That is what this p value is actually about: the idea that when we get the sample,
00:47:44.700 --> 00:47:46.900
we consider that they were just the same.
00:47:46.900 --> 00:47:56.600
This is saying an observed difference in sample means as large or larger than a sample is very unlikely to occur.
00:47:56.600 --> 00:48:08.900
It occurs with probability .049 if the true mean score for all bar employees is actually the same.
00:48:08.900 --> 00:48:19.900
This is a pretty good contender, because on the SDOD a probability of .049 means very unlikely.
00:48:19.900 --> 00:48:22.600
This I would leave as a definite contender.
00:48:22.600 --> 00:48:24.900
Maybe there is a better answer.
00:48:24.900 --> 00:48:35.300
There is a 4.9% chance that the mean score of all bar employees after the ban is actually lower than before the ban.
00:48:35.300 --> 00:48:44.800
That would be a small chance of the opposite direction; that is probably not the case.
00:48:44.800 --> 00:48:54.200
It depends on what the null hypothesis was.
00:48:54.200 --> 00:49:12.800
The null hypothesis in a two-mean hypothesis test is usually that the means are the same, not that one is less than the other.
00:49:12.800 --> 00:49:14.600
We do not usually do that.
00:49:14.600 --> 00:49:18.200
Maybe there is a way and that could be true.
00:49:18.200 --> 00:49:21.000
It is probably not true, given how we set up the hypothesis test.
00:49:21.000 --> 00:49:31.200
Only 4.9% of the bar employees had their score drop but the other 95% had their scores increase.
00:49:31.200 --> 00:49:36.700
This would be a correct interpretation if we are not talking about the SDOD.
00:49:36.700 --> 00:49:41.800
If this was not a reflection of the population then maybe that would be true.
00:49:41.800 --> 00:49:47.500
This is not talking about the population; it is talking about the SDOD.
00:49:47.500 --> 00:49:49.900
This is a wrong interpretation.
00:49:49.900 --> 00:49:52.600
The correct answer is c.
00:49:52.600 --> 00:49:58.000
That is our last example for hypothesis testing with two independent means.
00:49:58.000 --> 00:50:00.000
Thank you for joining us on www.educator.com.