WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:01.900
Hi and welcome to www.educator.com.
00:00:01.900 --> 00:00:04.800
Today we are going to be introduced to confidence intervals.
00:00:04.800 --> 00:00:13.400
Here is the roadmap for today. First we are going to do a brief overview of inferential statistics.
00:00:13.400 --> 00:00:22.500
We have been trying to do some inferential statistics but there have been a couple of problems we keep running into.
00:00:22.500 --> 00:00:24.400
So far I have fudged it.
00:00:24.400 --> 00:00:30.700
We will address some of those problems head on and come up with 2 solutions.
00:00:30.700 --> 00:00:38.900
One of those solutions is the confidence interval and we are going to talk about confidence intervals
00:00:38.900 --> 00:00:46.400
when sigma, the population standard deviation, is known and when sigma is unknown.
00:00:46.400 --> 00:00:49.500
Those are the two situations we are going to be focused on.
00:00:49.500 --> 00:00:54.800
Let us go over inferential statistics.
00:00:54.800 --> 00:01:05.400
We know the big picture idea there is some population represented by X and we wish we could know the population but we do not.
00:01:05.400 --> 00:01:10.900
But instead what we can know is little samples.
00:01:10.900 --> 00:01:14.400
We could know that but the problem is samples are biased.
00:01:14.400 --> 00:01:23.900
Whenever we have samples and we summarize them using these mathematical summaries we call them statistics.
00:01:23.900 --> 00:01:33.200
Just to give you an example of some statistics, there are things like x bar or s; those are all statistics.
00:01:33.200 --> 00:01:40.200
What we would like to do is use these samples to understand something about the population.
00:01:40.200 --> 00:01:48.200
Statistics, the field, is about using these statistics to estimate parameters, and
00:01:48.200 --> 00:01:52.600
to give you an idea of parameters, they are things like μ or sigma.
00:01:52.600 --> 00:01:54.600
That is our whole goal.
00:01:54.600 --> 00:02:11.700
Here we realize in order to jump from things like x bar and s to μ and sigma we are going to need more than just wishful thinking.
00:02:11.700 --> 00:02:15.000
And that is where the sampling distributions come in.
00:02:15.000 --> 00:02:22.500
When we talk about a sampling distribution, we are talking about the distribution of some sort of statistic.
00:02:22.500 --> 00:02:28.200
When we talk about the sampling distribution of the mean we are talking about a whole bunch of x bars.
00:02:28.200 --> 00:02:29.900
Here we have a whole bunch of x.
00:02:29.900 --> 00:02:34.800
Here we have a whole bunch of x bar and that is the distribution.
00:02:34.800 --> 00:02:44.100
When we summarize these statistics in the sampling distribution we call them expected values.
00:02:44.100 --> 00:02:48.200
So it is not just μ, it is μ sub x bar.
00:02:48.200 --> 00:02:52.200
It is not just sigma it is sigma sub x bar.
00:02:52.200 --> 00:02:57.400
What we want to do is go from this to understand this but what we have learned
00:02:57.400 --> 00:03:06.000
so far is how to see the relationship between parameters and expected values.
00:03:06.000 --> 00:03:09.300
We know that these things have a relationship to each other.
00:03:09.300 --> 00:03:14.900
And from doing that we could then make this jump.
00:03:14.900 --> 00:03:18.900
It is like we use this to say something like this.
00:03:18.900 --> 00:03:30.000
There are two problems with this picture. Although it seems rosy, there are still two nagging questions.
00:03:30.000 --> 00:03:36.800
We looked at them a little bit before but we need to solve them more rigorously than we had before.
00:03:36.800 --> 00:03:41.900
One question is this, what happens when we do not know what the population looks like?
00:03:41.900 --> 00:03:49.000
Of course we could use the central limit theorem when we know μ and sigma from the population.
00:03:49.000 --> 00:03:51.000
What if we do not know μ?
00:03:51.000 --> 00:03:52.700
What if we do not know sigma?
00:03:52.700 --> 00:03:54.500
Then what happens?
00:03:54.500 --> 00:03:59.700
Also, how do we know whether a sample is sufficiently unlikely because remember the whole point
00:03:59.700 --> 00:04:13.900
of the sampling distribution is for us to take a sampling distribution from a known population and compare it to a sample from an unknown population.
00:04:13.900 --> 00:04:21.100
Suppose this sample does not match the sampling distribution, so that it is very unlikely to have come from it.
00:04:21.100 --> 00:04:26.200
We could say this is probably not the population that the sample came from.
00:04:26.200 --> 00:04:29.000
How do we know when it is sufficiently weird?
00:04:29.000 --> 00:04:35.000
To answer these two questions there are going to be two solutions.
00:04:35.000 --> 00:04:40.800
You can think of it as this one.
00:04:40.800 --> 00:04:47.000
Roughly, both questions are actually answered by each of these solutions, but this first question goes along better with that one.
00:04:47.000 --> 00:04:50.800
This one goes along better with that one.
00:04:50.800 --> 00:05:01.700
The two solutions are these. One is the confidence interval.
00:05:01.700 --> 00:05:06.000
When we talk about a confidence interval, here is what we are doing: we are going to figure out where μ might be from the sample.
00:05:06.000 --> 00:05:36.500
We are going to try to figure out the population μ from the sample and
00:05:36.500 --> 00:05:42.100
that is what we do when we do not know what the population looks like.
00:05:42.100 --> 00:05:44.600
We try to figure it out from the sample.
00:05:44.600 --> 00:05:49.200
Hypothesis testing actually takes another view.
00:05:49.200 --> 00:05:55.500
In hypothesis testing, we come up with a hypothesis for what the population is like.
00:05:55.500 --> 00:06:03.200
Hypothesize a population μ first.
00:06:03.200 --> 00:06:16.000
In this case we are saying we are going to pull from something and figure out and pick a potential population μ.
00:06:16.000 --> 00:06:27.000
And then we are going to test how weird the sample is.
00:06:27.000 --> 00:06:33.600
We are going to come up with a number to tell us this is how weird the sample is.
00:06:33.600 --> 00:06:38.200
We are going to decide is that weirdness weird enough?
00:06:38.200 --> 00:06:41.500
That is going to be hypothesis testing.
00:06:41.500 --> 00:06:44.100
But we are going to focus here on confidence intervals.
00:06:44.100 --> 00:06:53.600
Okay, when we talk about confidence intervals we need to take an inventory of what we know so far.
00:06:53.600 --> 00:06:58.600
Basically that is asking the question, which parameters are known or given to us?
00:06:58.600 --> 00:07:02.400
What happens when we do not know what the population looks like?
00:07:02.400 --> 00:07:04.100
Well we may not know what
00:07:04.100 --> 00:07:07.900
the population looks like because we do not know anything about the population, or we know
00:07:07.900 --> 00:07:11.000
only a little bit about the population.
00:07:11.000 --> 00:07:13.800
This is the case where we know a little.
00:07:13.800 --> 00:07:24.400
Here we do not know μ but we do know sigma.
00:07:24.400 --> 00:07:30.400
For some reason we have some partial information and that helps us out.
00:07:30.400 --> 00:07:33.900
Here we know nothing.
00:07:33.900 --> 00:07:44.300
Here nothing is helping us: we do not know μ and we are trying to figure it out, but we do not know sigma either.
00:07:44.300 --> 00:07:46.500
It is like nothing is helping us out here.
00:07:46.500 --> 00:07:51.100
We just have to pull ourselves up by our own bootstraps.
00:07:51.100 --> 00:07:55.200
These are the two situations that we are going to talk about.
00:07:55.200 --> 00:08:00.000
Here is the goal of a confidence interval.
00:08:00.000 --> 00:08:04.100
The basic idea of the confidence interval is going to be this.
00:08:04.100 --> 00:08:17.400
We are going to try to figure out where μ might be but we do know x bar.
00:08:17.400 --> 00:08:23.300
We know everything about the sample but we do not know anything about the population.
00:08:23.300 --> 00:08:28.400
But in this case I am going to show you what happens when we already know sigma.
00:08:28.400 --> 00:08:29.500
So we have a leg up.
00:08:29.500 --> 00:08:33.400
We know sigma, so life is a little easier for us today.
00:08:33.400 --> 00:08:43.700
Here is the thing: we do not know what the population looks like, so we cannot draw it as normal or skewed or anything.
00:08:43.700 --> 00:08:50.800
We have no idea what the population looks like and we have no idea what the population μ is.
00:08:50.800 --> 00:08:52.800
But for some reason we know what sigma is.
00:08:52.800 --> 00:08:54.300
Sigma is given to us.
00:08:54.300 --> 00:08:59.700
From there can we construct an SDOM, a sampling distribution of the mean?
00:08:59.700 --> 00:09:08.300
Given that n is sufficiently large we can assume that it is normal.
00:09:08.300 --> 00:09:13.300
We have no idea what μ is and so we do not know what μ sub x bar is.
00:09:13.300 --> 00:09:19.300
We do not know it at all but we can figure out sigma sub x bar.
00:09:19.300 --> 00:09:26.500
We could figure out the standard error because we have sigma and we could divide that by √n.
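To make that division concrete, here is a minimal Python sketch. The function name standard_error and the numbers are made up for illustration; they are not from the lesson.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


# Hypothetical values: sigma = 10 and a sample of n = 25.
print(standard_error(sigma=10.0, n=25))  # 10 / 5 = 2.0
```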
00:09:26.500 --> 00:09:30.200
We have a little bit of information about the SDOM.
00:09:30.200 --> 00:09:34.200
Here is what we do in confidence intervals.
00:09:34.200 --> 00:09:46.100
First, assume that x bar is μ sub x bar.
00:09:46.100 --> 00:09:51.600
Whatever your sample x bar is we are going to put back here.
00:09:51.600 --> 00:09:54.900
We are going to assume it.
00:09:54.900 --> 00:10:00.900
Here is why, because we always assume one thing to figure out the other,
00:10:00.900 --> 00:10:05.300
here we are going to assume things about the x bar to figure out μ.
00:10:05.300 --> 00:10:09.200
In hypothesis testing, we assume something about the population to figure out how
00:10:09.200 --> 00:10:11.700
weird x bar is.
00:10:11.700 --> 00:10:20.700
Here because we know that the SDOM tends to be normal given a sufficiently
00:10:20.700 --> 00:10:32.000
large n, what we know is that we can find out with reasonable confidence what some
00:10:32.000 --> 00:10:34.000
significant borders are.
00:10:34.000 --> 00:10:42.200
For instance, let us say we are one standard deviation away.
00:10:42.200 --> 00:10:49.700
This is raw score and this is z score so we know at one standard deviation away
00:10:49.700 --> 00:11:00.500
this space right here, we know that is 68% of the SDOM.
00:11:00.500 --> 00:11:03.100
Let us think about what this might mean.
00:11:03.100 --> 00:11:18.800
When we get these borders what we might end up saying is that these are the borders in which 68% of our values will fall in the SDOM.
00:11:18.800 --> 00:11:25.900
And here is what else we could say: there is a 68% chance that our
00:11:25.900 --> 00:11:31.600
population μ will fall in that zone.
00:11:31.600 --> 00:11:37.000
That is a 68% confidence interval.
00:11:37.000 --> 00:11:42.300
Now, 68% is higher than half, but it is not that high.
00:11:42.300 --> 00:11:46.700
But here is the thing: we can have a higher level of confidence.
00:11:46.700 --> 00:11:53.300
We can have a 95% confidence interval or we can have a 99% confidence interval.
00:11:53.300 --> 00:11:56.000
That is what we can do.
00:11:56.000 --> 00:12:10.300
Here is my x bar, here is 0 on the z scale, and what we can do is figure out
00:12:10.300 --> 00:12:24.000
these borders such that there is a 95% chance of having our
00:12:24.000 --> 00:12:28.500
population mean fall in this interval.
00:12:28.500 --> 00:12:30.100
We can know that.
00:12:30.100 --> 00:12:32.700
That is called the confidence interval.
00:12:32.700 --> 00:12:36.500
That is pretty high, and you can even go to 99%.
00:12:36.500 --> 00:12:39.600
And we could easily figure out these borders.
00:12:39.600 --> 00:12:41.300
Here is how.
00:12:41.300 --> 00:12:51.700
We can easily figure out the borders because we can figure out what the z scores are.
00:12:51.700 --> 00:13:04.900
This is what we call a two-tailed confidence interval because even though the middle part is 95%, that does not mean each tail is 5%.
00:13:04.900 --> 00:13:15.900
You have to halve the 5%, so that means that part is .025, or 2.5%, and this part is .025.
00:13:15.900 --> 00:13:18.600
And those are the only parts where we are not sure.
00:13:18.600 --> 00:13:29.200
There is a small chance that the population mean will fall somewhere out here but it is a very small chance.
00:13:29.200 --> 00:13:32.500
We are trying to reduce it as much as possible.
00:13:32.500 --> 00:13:36.900
Let us think about how we could find the z score out here.
00:13:36.900 --> 00:13:50.000
We could use our tables in the back of the book, our z tables and we can look up and usually z tables will give you like one side.
00:13:50.000 --> 00:13:57.500
We can look up .025 and look at the z score or we could do it in Excel.
00:13:57.500 --> 00:14:08.100
Instead of using NORMSDIST, which will give you the proportion of the distribution,
00:14:08.100 --> 00:14:14.700
we are going to put in NORMSINV, the inverse, and here we want to put in the probability.
00:14:14.700 --> 00:14:30.200
Now this is going to be my probability.
00:14:30.200 --> 00:14:44.100
I am going to put in this probability, .025, and we get -1.96.
00:14:44.100 --> 00:14:52.200
This value here is -1.96 and because the normal distribution is symmetric we know that this part is also 1.96
00:14:52.200 --> 00:14:56.000
but now a positive instead of negative.
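If you do not have Excel or a z table handy, the same lookup can be sketched in Python. This is only an illustration: the function name z_borders is made up here, and it leans on NormalDist from Python's standard statistics module as a stand-in for the inverse normal lookup.

```python
from statistics import NormalDist


def z_borders(capture_rate: float) -> tuple[float, float]:
    """Return the lower and upper z scores that bracket the middle
    `capture_rate` of a standard normal distribution."""
    # Split what is left over evenly between the two tails.
    tail = (1.0 - capture_rate) / 2.0
    # inv_cdf plays the role of the inverse normal lookup.
    z_low = NormalDist().inv_cdf(tail)
    # The normal distribution is symmetric, so the upper border mirrors the lower.
    return z_low, -z_low


low, high = z_borders(0.95)
print(round(low, 2), round(high, 2))  # -1.96 1.96
```

Putting in 0.99 instead of 0.95 gives borders of roughly plus or minus 2.58.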
00:14:56.000 --> 00:15:08.100
We know our z values on the end and if we know the z values what is our raw score here?
00:15:08.100 --> 00:15:15.400
Tell me what this value is and also tell me what that value is.
00:15:15.400 --> 00:15:21.500
Well the z score tells you how many standard errors away you are.
00:15:21.500 --> 00:15:25.900
How many jumps away and each jump is worth that much.
00:15:25.900 --> 00:15:30.800
We are away 1.96 of these jumps.
00:15:30.800 --> 00:15:34.000
We are going to multiply this by this and then
00:15:34.000 --> 00:15:38.200
either subtract it from x bar or add it to x bar.
00:15:38.200 --> 00:15:52.800
Step two in finding a confidence interval: let us say you want a 95% confidence interval, so you find the z scores.
00:15:52.800 --> 00:15:57.600
This is all in the case where you know sigma.
00:15:57.600 --> 00:16:24.100
Step 3 is this: now you want to find the actual raw scores, and that is going to be x bar + or - the z score × standard error.
00:16:24.100 --> 00:16:26.300
That is what you are going to do.
00:16:26.300 --> 00:16:28.700
And we know what the standard error is.
00:16:28.700 --> 00:16:42.800
I am going to rewrite this to be x bar + or - z score × sigma / √n.
00:16:42.800 --> 00:16:49.200
When we do that we can find these confidence intervals.
00:16:49.200 --> 00:16:59.300
Once you have these confidence intervals then you can say with 95% confidence that
00:16:59.300 --> 00:17:07.800
your population mean will fall in this interval between these two numbers.
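The three steps for the known-sigma case can be put together in a short Python sketch. The function name z_confidence_interval and the input numbers are assumptions for illustration only, and the z score comes from Python's statistics.NormalDist rather than a printed table.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, capture_rate=0.95):
    """Confidence interval for the mean when sigma is known.

    Step 1: center the interval on the sample mean x_bar.
    Step 2: find the z score for the chosen capture rate.
    Step 3: go z standard errors below and above x_bar.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - capture_rate) / 2.0)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample: x bar = 50, sigma = 10, n = 25.
low, high = z_confidence_interval(x_bar=50.0, sigma=10.0, n=25)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```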
00:17:07.800 --> 00:17:22.300
Now the 95% is actually called the capture rate; it could be 95%, 99%, whatever.
00:17:22.300 --> 00:17:27.200
What would the competence interval be for 100%?
00:17:27.200 --> 00:17:33.500
It would go from –infinity to infinity because that is how far the normal distribution goes.
00:17:33.500 --> 00:18:00.300
The capture rate is this: the proportion of random samples for which the interval captures μ.
00:18:00.300 --> 00:18:10.800
Let us imagine taking a whole bunch of random samples. It is going to be that 95% of the
00:18:10.800 --> 00:18:16.700
time the intervals from those random samples contain μ.
00:18:16.700 --> 00:18:19.200
They somehow overlap with μ.
00:18:19.200 --> 00:18:23.400
That is what we mean by 95% capture rate.
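One way to see the capture rate is to simulate it. The sketch below is only an illustration under stated assumptions: a normal population with made-up values μ = 100 and σ = 15, samples of n = 25, and a hypothetical function name capture_rate.

```python
import math
import random
from statistics import NormalDist


def capture_rate(mu, sigma, n, trials, level=0.95, seed=0):
    """Draw many random samples from a normal population and return the
    fraction whose known-sigma interval contains the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Count the sample as a hit when its interval covers mu.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


rate = capture_rate(mu=100.0, sigma=15.0, n=25, trials=5000)
print(rate)  # should land close to 0.95
```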
00:18:23.400 --> 00:18:33.200
That is when you know sigma, but what about when we do not know sigma?
00:18:33.200 --> 00:18:34.900
We are in trouble: we do not know μ.
00:18:34.900 --> 00:18:36.500
We do not know sigma either.
00:18:36.500 --> 00:18:48.600
Still our goal remains the same, we try to figure out μ from x bar.
00:18:48.600 --> 00:18:51.900
But now we are a little hobbled.
00:18:51.900 --> 00:18:55.400
I do not have a tool that I used to have.
00:18:55.400 --> 00:18:59.000
The beginning part of the story stays the same.
00:18:59.000 --> 00:19:06.500
We have no idea about the population, and from there we want to find the SDOM because
00:19:06.500 --> 00:19:11.100
we are going to figure out how good our sample is.
00:19:11.100 --> 00:19:17.500
We know the shape of our SDOM as long as our n is sufficiently big.
00:19:17.500 --> 00:19:21.000
Can we figure out sigma sub x bar anymore?
00:19:21.000 --> 00:19:29.900
No, we cannot, because we do not have sigma, so how can we figure out sigma sub x bar?
00:19:29.900 --> 00:19:31.500
We cannot figure out that standard error.
00:19:31.500 --> 00:19:35.200
Here is where another idea comes in.
00:19:35.200 --> 00:19:45.900
There is another way: we can estimate the standard error of the sampling distribution, and that is going to be s sub x bar.
00:19:45.900 --> 00:20:00.000
Because we are going to use the sample standard deviation s instead of sigma.
00:20:00.000 --> 00:20:14.100
Remember s is more variable, not quite right, and because of that we already corrected it a little bit by using n - 1 instead of n.
00:20:14.100 --> 00:20:18.100
Here we are going to divide that by √n.
00:20:18.100 --> 00:20:30.700
If you double click on s you would see √(sum of squares ÷ (n - 1)).
00:20:30.700 --> 00:20:34.000
You would see this inside of that.
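Here is a small Python sketch of that estimated standard error. The sample values and the function name estimated_standard_error are made up for illustration; statistics.stdev is used because it already divides by n - 1.

```python
import math
from statistics import stdev


def estimated_standard_error(sample):
    """Estimate the standard error of the mean from the sample alone:
    s (computed with n - 1) divided by the square root of n."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to compute s")
    return stdev(sample) / math.sqrt(n)


# Hypothetical sample of eight observations.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(round(estimated_standard_error(sample), 3))  # 0.756
```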
00:20:34.000 --> 00:20:42.600
We already tried to correct it a little bit, but s is still variable.
00:20:42.600 --> 00:20:44.800
It is not quite as good as having sigma.
00:20:44.800 --> 00:20:49.000
And there can be other problems that we run into.
00:20:49.000 --> 00:20:54.400
This is pretty good though and it is a pretty good estimate but you always have
00:20:54.400 --> 00:21:02.400
to keep in mind that our standard error is not as good as it used to be.
00:21:02.400 --> 00:21:05.100
We have to account for that.
00:21:05.100 --> 00:21:07.100
But the steps remain the same.
00:21:07.100 --> 00:21:15.200
First assume x bar for μ sub x bar.
00:21:15.200 --> 00:21:26.900
Two, find z for your capture rate.
00:21:26.900 --> 00:21:37.200
If your capture rate is, for example, 95%, then you would find the z scores.
00:21:37.200 --> 00:21:45.100
It is helpful to memorize that for this capture rate the z scores are going to be + or -1.96.
00:21:45.100 --> 00:21:46.400
It is going to come up a lot.
00:21:46.400 --> 00:21:50.100
Find the z scores for your capture rate.
00:21:50.100 --> 00:21:54.500
Here we run into a problem.
00:21:54.500 --> 00:22:10.600
I wish we could use z scores but here is an issue: we actually cannot, because s is too variable for us to assume perfect normality.
00:22:10.600 --> 00:22:28.000
And because of that we cannot use the z and instead we have to use the t which is very similar to z.
00:22:28.000 --> 00:22:32.500
Find the t score for your capture rate.
00:22:32.500 --> 00:22:46.200
Instead of having raw score and z score we are going to find t score.
00:22:46.200 --> 00:22:52.400
For now you just need to know that you can find your t score in the back of the book but in
00:22:52.400 --> 00:22:57.500
the next lesson we are going to go over why you use t and why you cannot use z.
00:22:57.500 --> 00:23:00.200
That is a big story.
00:23:00.200 --> 00:23:02.800
You are going to find t.
00:23:02.800 --> 00:23:09.900
Once you find the t for your capture rate and that will also be + or -, t is going to be very similar to z score.
00:23:09.900 --> 00:23:16.300
We are going to use this formula.
00:23:16.300 --> 00:23:26.800
You are going to use a very similar idea to the z score confidence interval where you want to know x bar + or -.
00:23:26.800 --> 00:23:31.200
A t score is also going to tell you how many standard errors away.
00:23:31.200 --> 00:23:36.700
t × standard error.
00:23:36.700 --> 00:23:45.800
But remember, you use t when you estimate the standard error from the sample.
00:23:45.800 --> 00:24:03.200
If we unpack this, this is what it can look like: x bar + or - t × the estimated standard error, s/√n.
00:24:03.200 --> 00:24:05.900
It is still the same idea.
00:24:05.900 --> 00:24:11.400
It is how many jumps away, figuring that out and then multiplying that by the length of the jump
00:24:11.400 --> 00:24:18.600
and adding that to x bar for the high-value and then subtracting that from the x bar for the low value.
00:24:18.600 --> 00:24:25.600
In order to find t here is what you need to know for now.
00:24:25.600 --> 00:24:30.200
You need to know whether it is a 1 or 2 tailed distribution.
00:24:30.200 --> 00:24:38.600
If your confidence interval is two-tailed then remember these are .025
00:24:38.600 --> 00:24:42.500
because you would split the remaining 5% between both sides.
00:24:42.500 --> 00:24:47.100
But sometimes tables of t values only give you one side.
00:24:47.100 --> 00:24:53.200
They might give you a one sided 5% or a one sided 2.5%.
00:24:53.200 --> 00:25:01.800
You have to just keep in mind whether it is one tailed or two tailed and also the t distributions are a whole bunch of different distributions.
00:25:01.800 --> 00:25:07.700
They are a whole bunch of different tables basically.
00:25:07.700 --> 00:25:13.900
You also have to know the degrees of freedom.
00:25:13.900 --> 00:25:21.200
For now you could remember degrees of freedom as n -1.
00:25:21.200 --> 00:25:27.700
There are reasons for all of these things why we use t, why we use degrees of freedom all that stuff.
00:25:27.700 --> 00:25:29.500
That will be covered in the next lesson.
00:25:29.500 --> 00:25:32.200
For now, here is what you need to know.
00:25:32.200 --> 00:25:34.400
You need to know whether it is one tailed or two tailed.
00:25:34.400 --> 00:25:35.900
You also need to know degrees of freedom.
00:25:35.900 --> 00:25:42.000
Once you have that you can look it up in a t table, usually found in the back of your book.
00:25:42.000 --> 00:25:53.000
It might also be called the Student's t distribution because William Sealy Gosset invented it, but he was contracted to work for Guinness.
00:25:53.000 --> 00:25:56.100
That is why he could not publish it under his actual name.
00:25:56.100 --> 00:26:00.500
He published it under the pseudonym Student, and that is why it is called Student's t.
00:26:00.500 --> 00:26:12.900
You can look up your degrees of freedom and then look for the area that you need and then go down and find the t score.
00:26:12.900 --> 00:26:14.500
Very similar to z score.
00:26:14.500 --> 00:26:21.700
Let us go on to some examples.
00:26:21.700 --> 00:26:30.900
Example 1, consider two extreme situations n=10 and n=1,000.
00:26:30.900 --> 00:26:49.300
If you use s in the formula for CI given sigma, here is the actual formula for when you have sigma.
00:26:49.300 --> 00:26:54.600
We use 1.96 because we use the z score.
00:26:54.600 --> 00:27:01.600
Which of these situations would you expect to give a capture rate closer to 95%?
00:27:01.600 --> 00:27:04.300
Here is what this question is really asking.
00:27:04.300 --> 00:27:29.600
When you know sigma, the 95% confidence interval is x bar + or - 1.96, that is my z, × sigma / √n.
00:27:29.600 --> 00:27:36.400
What it is asking you is what if you substituted in s?
00:27:36.400 --> 00:27:56.000
Here we do not know sigma but we are going to just take this formula and use the z value × s/√n.
00:27:56.000 --> 00:28:07.300
In order to answer this question you really only need to keep in mind one thing: when is s more like sigma?
00:28:07.300 --> 00:28:28.500
S is more like sigma when n is very large.
00:28:28.500 --> 00:28:41.300
The n = 1,000 situation would give you a capture rate very close to 95%.
00:28:41.300 --> 00:28:43.900
This would be very, very similar.
00:28:43.900 --> 00:28:53.400
However, when n is 10 you have more uncertainty and because of that the t distribution is not as tight.
00:28:53.400 --> 00:29:08.100
It is actually more spread out and because of that, when n=10 you do not capture 95% just by going about 2 standard errors out each way.
00:29:08.100 --> 00:29:13.600
That would not capture 95% of those samples.
00:29:13.600 --> 00:29:18.300
In fact you have to go out further to capture 95%.
00:29:18.300 --> 00:29:23.600
n = 1,000 is going to be much closer to a 95% capture rate.
00:29:23.600 --> 00:29:26.000
n = 10 is going to give you a smaller capture rate.
00:29:26.000 --> 00:29:37.700
That is because your s is going to be more variable and because of that your t distribution
00:29:37.700 --> 00:29:45.400
is going to be more dispersed, because more variable means sort of wider.
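You can check the answer to Example 1 with a simulation. This is only a sketch under assumptions that are not part of the problem: the population is taken to be standard normal, the trial counts are arbitrary, and the function name capture_rate_z_with_s is made up.

```python
import math
import random


def capture_rate_z_with_s(n, trials, mu=0.0, sigma=1.0, z=1.96, seed=0):
    """Fraction of samples whose interval x_bar +/- z * s / sqrt(n)
    contains mu when s is plugged in where sigma belongs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Sample standard deviation with the n - 1 correction.
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        margin = z * s / math.sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


small = capture_rate_z_with_s(n=10, trials=20000)
large = capture_rate_z_with_s(n=1000, trials=2000)
print(small, large)  # n = 10 should fall short of 0.95; n = 1000 should be close to it
```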
00:29:45.400 --> 00:29:58.000
Example 2: a 95% CI for a population mean is calculated for a random sample of weights and the resulting CI is from 42 to 48 pounds.
00:29:58.000 --> 00:30:06.800
For each statement indicate whether it is a true or false interpretation of the CI.
00:30:06.800 --> 00:30:11.200
This question is asking you: do you understand what the confidence interval means?
00:30:11.200 --> 00:30:13.400
Do you understand what it is for?
00:30:13.400 --> 00:30:21.200
Let us see, 95% of the weights in the population are between 42 and 48.
00:30:21.200 --> 00:30:30.000
Does the confidence interval tell us about the actual population numbers?
00:30:30.000 --> 00:30:33.100
No, it only tells us about the population mean.
00:30:33.100 --> 00:30:36.400
This is actually not true.
00:30:36.400 --> 00:30:40.600
We do not know anything about the actual numbers of the population.
00:30:40.600 --> 00:30:46.800
We do not know whether it is skewed, whether it is uniform distribution.
00:30:46.800 --> 00:30:48.400
We do not know any of those things.
00:30:48.400 --> 00:31:02.000
The 95% thing would only be reasonable if the population was normal and its μ was exactly equal to x bar.
00:31:02.000 --> 00:31:04.400
That would be the case.
00:31:04.400 --> 00:31:06.200
That is not true.
00:31:06.200 --> 00:31:08.000
What about number 2?
00:31:08.000 --> 00:31:18.000
95% of weights in the sample are between 42 and 48, does the CI tell us anything about this sample?
00:31:18.000 --> 00:31:22.600
No, we are using the sample to estimate the population mean.
00:31:22.600 --> 00:31:24.600
We are using the SDOM.
00:31:24.600 --> 00:31:28.600
We do not know anything about the sample itself.
00:31:28.600 --> 00:31:30.400
That is also not true.
00:31:30.400 --> 00:31:32.800
What about number 3?
00:31:32.800 --> 00:31:39.500
The probability that the interval includes the population mean is 95%.
00:31:39.500 --> 00:31:41.700
This is actually true.
00:31:41.700 --> 00:31:56.200
There is only a 5% chance that this interval does not contain the population mean.
00:31:56.200 --> 00:31:59.300
What about number 4?
00:31:59.300 --> 00:32:03.900
The sample mean might not be in the confidence interval.
00:32:03.900 --> 00:32:12.100
That does not make sense if you look at the picture because we use the sample mean in order to construct the confidence interval.
00:32:12.100 --> 00:32:16.300
Of course the sample mean is in the confidence interval, so this statement is just ridiculous.
00:32:16.300 --> 00:32:29.900
Example 3, a random sample of 22 men had a mean body temperature of 98.1° and a standard deviation of .73.
00:32:29.900 --> 00:32:36.400
Construct a 95% confidence interval for the mean of the population that the sample was drawn from.
00:32:36.400 --> 00:32:42.800
Interpret the CI. Is 98.6° included in it?
00:32:42.800 --> 00:32:45.400
This is the average human body temperature.
00:32:45.400 --> 00:32:55.600
We have body temperatures in the world and we do not know what that population looks like.
00:32:55.600 --> 00:33:09.600
We are asking: can we construct a 95% confidence interval such that whatever
00:33:09.600 --> 00:33:14.200
the population mean is there is a 95% chance that we have covered it.
00:33:14.200 --> 00:33:23.700
We start by assuming that the mean of the sample x bar is the mean of the sampling distribution of the mean.
00:33:23.700 --> 00:33:27.900
We have done step one.
00:33:27.900 --> 00:33:43.600
Step two is we have to construct the CI, and so here they give us x bar, but do we have sigma?
00:33:43.600 --> 00:33:45.000
No.
00:33:45.000 --> 00:33:48.900
We know that we cannot use the z score.
00:33:48.900 --> 00:33:51.400
We have to use the t score.
00:33:51.400 --> 00:33:53.600
Let us find the t for this.
00:33:53.600 --> 00:34:03.300
There is a .025 chance that we would miss it on this side, and a .025 chance that we would miss it on that side.
00:34:03.300 --> 00:34:05.900
What are the t scores?
00:34:05.900 --> 00:34:08.900
This is the raw score or the temperature.
00:34:08.900 --> 00:34:25.300
What is the t score for .025 when the degrees of freedom are n - 1? There are 22 men, so 22 - 1 = 21 degrees of freedom.
00:34:25.300 --> 00:34:34.100
If you look in your book, at your Student's t distribution table, I am going to go down to where df=21.
00:34:34.100 --> 00:34:40.400
I am going to go across to where it says .025.
00:34:40.400 --> 00:34:46.400
My table actually gives me this area so I am going to look at .025 on the side.
00:34:46.400 --> 00:34:53.000
And it says 2.08 is my t score.
00:34:53.000 --> 00:34:55.300
That makes sense.
00:34:55.300 --> 00:34:58.100
That is around 1.96.
00:34:58.100 --> 00:35:08.000
You will see that as the degrees of freedom get greater and greater this value gets closer and closer to 1.96.
00:35:08.000 --> 00:35:14.600
On this side we know that it is symmetrical so I know it is -2.08.
00:35:14.600 --> 00:35:18.200
From here I can construct my CI.
00:35:18.200 --> 00:35:29.000
The CI is going to be the x bar + or – the t value × my standard error.
00:35:29.000 --> 00:35:37.300
My estimated standard error here is s sub x bar because we do not have sigma.
00:35:37.300 --> 00:35:41.500
That is going to be s ÷ √n.
00:35:41.500 --> 00:36:07.500
Let us put in numbers here, so that is 98.1 that is our sample mean ± t value 2.08 × s .73 ÷ √22.
00:36:07.500 --> 00:36:21.900
I am just going to calculate this on a calculator, so that is going to be 98.1, and I will do the + side first: + 2.08.
00:36:21.900 --> 00:36:24.700
Excel follows the order of operations.
00:36:24.700 --> 00:36:34.800
It does the multiplication before the addition, and it is × .73 ÷ √22.
00:36:34.800 --> 00:36:56.900
The high end of my confidence interval is 98.4 and the low end is going to be 97.8.
00:36:56.900 --> 00:37:09.000
97.8 to 98.4 is my CI.
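The arithmetic for Example 3 can be double checked with a few lines of Python. This is a sketch, not the method used in the lesson: the function name t_confidence_interval is made up, and the t score of 2.08 is typed in from the table because Python's standard library has no t distribution lookup.

```python
import math


def t_confidence_interval(x_bar, s, n, t_critical):
    """Confidence interval for the mean when sigma is unknown.

    t_critical is the table value for the chosen capture rate
    and n - 1 degrees of freedom.
    """
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Numbers from Example 3: x bar = 98.1, s = 0.73, n = 22, t = 2.08 (df = 21).
low, high = t_confidence_interval(x_bar=98.1, s=0.73, n=22, t_critical=2.08)
print(round(low, 1), round(high, 1))  # 97.8 98.4
print(low <= 98.6 <= high)  # False
```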
00:37:09.000 --> 00:37:19.600
When we interpret the confidence interval we want to say something like
00:37:19.600 --> 00:37:29.800
there is a 95% chance that the mean of the population lies between these two values.
00:37:29.800 --> 00:37:43.700
Or another way we could say it is that if we draw samples at random, 95% of the intervals built from those samples will include the population mean.
00:37:43.700 --> 00:37:51.600
95% of the intervals constructed this way will include the population mean.
00:37:51.600 --> 00:38:00.000
Let us think about this confidence interval, is it reasonable?
00:38:00.000 --> 00:38:05.900
Is 98.6° included? That is supposed to be the mean for everybody.
00:38:05.900 --> 00:38:08.600
We see that it is not actually.
00:38:08.600 --> 00:38:17.500
Maybe this sample is odd because our confidence interval does not actually include the mean
00:38:17.500 --> 00:38:26.900
that we secretly know for the body temperature of people.
00:38:26.900 --> 00:38:29.700
That is when confidence intervals are helpful.
00:38:29.700 --> 00:38:44.300
Here is example 4, in a random sample of 1000 community college students, their mean score on a quantitative literacy test was 310.
00:38:44.300 --> 00:38:50.900
The standard deviation on this test for all the community college students who have taken it is 360.
00:38:50.900 --> 00:38:58.200
Construct a 95% confidence interval for the mean of all community college students who have ever taken this test.
00:38:58.200 --> 00:39:11.100
Here is our random sample and their mean or x bar is 310 but the standard deviation
00:39:11.100 --> 00:39:18.600
of all the students who have taken this test that is the sigma is 360.
00:39:18.600 --> 00:39:21.300
Construct a 95% confidence interval.
00:39:21.300 --> 00:39:34.000
Well, first, we do not know the population, but we are given the population standard deviation.
00:39:34.000 --> 00:39:36.900
And from that, let us construct the SDOM.
00:39:36.900 --> 00:39:42.600
Well given that this n is quite large let us assume normality.
00:39:42.600 --> 00:39:54.800
Here we could find out the standard error by putting 360 ÷ √ 1000.
00:39:54.800 --> 00:40:11.900
Now going to the steps for our confidence interval, first we assume that x bar is the mean of our sampling distribution of the mean.
00:40:11.900 --> 00:40:25.100
Here we could use the z instead of t because we have sigma and because of that we know that this is normal.
00:40:25.100 --> 00:40:35.700
That is going to be +1.96 and -1.96 in order to construct a 95% confidence interval.
00:40:35.700 --> 00:40:46.100
Our CI is going to look something like this: x bar + or - z × standard error.
00:40:46.100 --> 00:41:04.200
If you sort of double click on standard error what you will find is sigma / √n.
00:41:04.200 --> 00:41:07.100
Let us put in numbers here.
00:41:07.100 --> 00:41:11.300
310 is our x bar.
00:41:11.300 --> 00:41:15.000
Our z score is 1.96.
00:41:15.000 --> 00:41:18.800
Our sigma is 360.
00:41:18.800 --> 00:41:22.800
Our n is 1,000.
00:41:22.800 --> 00:41:27.200
Let us put these in our calculators.
00:41:27.200 --> 00:41:48.300
I will do the high end first 310 + 1.96 × 360 ÷√1,000.
00:41:48.300 --> 00:41:56.300
Order of operations says it does not matter what order you do the multiplying and dividing in.
00:41:56.300 --> 00:42:04.600
That gives 332.3 as the high end.
00:42:04.600 --> 00:42:16.800
The low scoring end, the lower bound of my 95% CI is 287.7.
00:42:16.800 --> 00:42:27.300
So the interval goes from 287.7 to 332.3.
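Here is the same Example 4 calculation written out in Python as a quick check. The variable names are just for illustration; the numbers are the ones given in the problem, with z = 1.96 for a 95% interval.

```python
import math

# Values from Example 4: x bar = 310, sigma = 360, n = 1000, z = 1.96.
x_bar, z, sigma, n = 310.0, 1.96, 360.0, 1000

standard_error = sigma / math.sqrt(n)
low = x_bar - z * standard_error
high = x_bar + z * standard_error

print(round(standard_error, 2))  # 11.38
print(round(low, 1), round(high, 1))  # 287.7 332.3
```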
00:42:27.300 --> 00:42:38.000
With 95% confidence, the mean of the population should fall within this interval.
00:42:38.000 --> 00:42:41.400
That is the end for our confidence intervals.
00:42:41.400 --> 00:42:46.000
That is part one of confidence intervals.
00:42:46.000 --> 00:42:51.100
I hope you join me for t distributions to find out why we sometimes use t instead of z.
00:42:51.100 --> 00:42:53.000
Thank you for using www.educator.com.