Hello, everyone. In this video we're going to learn about statistical hypothesis testing and how to carry out basic statistical tests in Python, namely the t-test. Python's SciPy stats library contains an array of functions that make it easy to carry out hypothesis tests, so we're going to start by talking a little bit about what statistical hypothesis testing is. Basically, in a hypothesis test you have some sample data, and you think it might come from some underlying population or distribution, but you're not entirely sure: maybe the data you have in hand actually differs in some way from the population or distribution you think may have generated it. In hypothesis testing, you start off with what's known as a null hypothesis, which is the assumption that there's nothing interesting going on: the sample data you have really is from the underlying population or distribution you're comparing it to, there is no difference between the two, and you simply drew samples from that population. The alternative hypothesis is that there is something interesting going on: the sample you have differs in some significant way from the population you're comparing it to, and the two were probably drawn from distinct distributions. The purpose of a hypothesis test is to check whether you should stick with the null hypothesis and say there's no difference between your data and the underlying population, or whether you should accept the alternative hypothesis that your data was drawn from a distribution different from the population you're comparing it to. So for a statistical test, we need to start with two things: a population distribution to compare against, and some sample data to compare to that population.
Once you have those two things in hand, you want to choose a significance level, which is often denoted by the Greek letter alpha. The significance level is a probability threshold that determines when you reject the null hypothesis and say, hey, there is something interesting going on here. To make this more concrete, we'll go through an example of a hypothesis test using the t-test. The t-test is perhaps the most basic statistical test; it lets you determine whether a numeric data sample differs significantly from a population, or whether two different samples differ from one another. We'll start with the one-sample t-test, which checks whether the sample average of some data differs from the population average. I'm going to create some dummy data for this example: voters in the state of Minnesota. To start, we'll run some code to load the packages we need for this lesson, and then the next code block generates some data. You don't need to worry too much about what it's doing, but what we've done is generate two different distributions. We've generated a population and checked its mean, which is about 43 years old, and then we generated a supposed sample of ages for Minnesota and took its average, which we can see is a bit lower than the population's. Our statistical test will tell us whether this number is so much lower than the population average that we can say the average age of Minnesota voters is significantly different from the population as a whole. One thing you might notice in the code used to generate these two distributions is that we actually did use different distributions for the population and the sample.
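The generation code itself isn't reproduced in this transcript, but a minimal sketch of the kind of data being described (a Poisson-based population with mu set to 35 and a 50-observation Minnesota sample with mu set to 30; the exact sizes, the seed, and the younger second mixture component are assumptions) might look like this:

```python
import numpy as np
from scipy import stats

np.random.seed(6)  # seed is an assumption, chosen for reproducibility

# Population: a mixture of an older group (mu=35) and a younger group (mu=10),
# shifted by loc=18 so ages start at adulthood; the overall mean lands near 43
population_ages = np.concatenate((
    stats.poisson.rvs(loc=18, mu=35, size=150000),
    stats.poisson.rvs(loc=18, mu=10, size=100000),
))

# Minnesota sample: 50 observations drawn with a slightly lower mu=30,
# so its true mean sits a bit below the population mean
minnesota_ages = np.concatenate((
    stats.poisson.rvs(loc=18, mu=30, size=30),
    stats.poisson.rvs(loc=18, mu=10, size=20),
))

print(population_ages.mean())  # roughly 43
print(minnesota_ages.mean())   # typically a few years lower
```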
We had mu set to 35 for the population, which made the numbers a bit larger, and for Minnesota it was set to 30. So, having generated this ourselves, we know these two distributions really are different, but we want to see whether a statistical hypothesis test can detect the difference and find it significant enough that we'd accept the alternative hypothesis that the two distributions differ. To conduct a t-test at a 95 percent confidence level, we can use the SciPy stats function ttest_1samp. We call stats.ttest_1samp, which became available earlier when we imported scipy.stats as stats. The first argument, a, is our sample data, and the second argument, popmean, is the population average we're comparing it to, so here we pass in the mean of the population ages. When we run it, the result gives us a statistic, known as the t statistic, but more important for assessing the result is the p-value. The p-value tells you the probability that you'd see a result as extreme as the one observed due to chance. In other words, we'd only expect to see a difference between these two distributions as large as the one observed here due to chance about 1.3 percent of the time. This is where our significance level comes into play: if we want 95 percent confidence that we won't accept the alternative hypothesis when it isn't actually true, we want to see a p-value of five percent or lower before we're willing to say the evidence is strong enough to accept the alternative hypothesis and conclude there is something interesting going on. The t statistic we see here is about -2.5.
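Put together, the call being described might look like the following sketch (the regenerated stand-in sample and the variable names are assumptions):

```python
import numpy as np
from scipy import stats

np.random.seed(6)
# Stand-in Minnesota sample (the lesson uses its earlier generated age data)
minnesota_ages = np.concatenate((stats.poisson.rvs(loc=18, mu=30, size=30),
                                 stats.poisson.rvs(loc=18, mu=10, size=20)))
pop_mean = 43.0  # the population average we're comparing against

result = stats.ttest_1samp(a=minnesota_ages,  # sample data
                           popmean=pop_mean)  # known population mean
print(result.statistic, result.pvalue)
```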
This test statistic tells us how much our sample mean deviates from the population, or null hypothesis, mean. If the t statistic lies outside the quantiles of the t distribution corresponding to our confidence level and degrees of freedom, we should reject the null hypothesis. Basically, this is another way of saying that if we see a result that's extreme enough, we'll conclude there is something interesting going on. We can check the quantiles of the t distribution using stats.t.ppf, so that's what we're going to do here. At a 95 percent confidence level, 2.5 percent of the probability sits in the top tail and 2.5 percent sits in the bottom tail, so if we're far enough off either end, we'll call the result significant. We pass that tail probability in as q, the quantile. The degrees of freedom is the size of your sample minus one; our sample happened to contain 50 observations, so we pass in 49. When we run that, we get a lower quantile of about -2.01, and if we run it for the opposite side, we get the upper quantile, which is the same number but positive. Basically, this tells us that if we see a t statistic outside this range, we've observed a result far enough from the population mean that we can consider it different, and we'd accept the alternative hypothesis. Since the t statistic we saw here, -2.574, is below the bottom end of the range, we know we got a result outside it and would accept the alternative hypothesis. That's another way of saying the p-value will end up smaller than 0.05 in this case. The output of our test already showed us the p-value, but we can also calculate it from the t statistic if we want to, so I'll show how to do that below.
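The quantile check described above can be sketched as follows (the 50-observation sample size, and hence 49 degrees of freedom, comes from the lesson):

```python
from scipy import stats

df = 49  # degrees of freedom: sample size 50 minus 1

# Cutoffs for the bottom and top 2.5% tails at a 95% confidence level
lower_quantile = stats.t.ppf(q=0.025, df=df)
upper_quantile = stats.t.ppf(q=0.975, df=df)

print(lower_quantile, upper_quantile)  # roughly -2.01 and +2.01
```

A t statistic outside this range is extreme enough to reject the null hypothesis at the five percent significance level.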
We can use the stats.t.cdf function, passing in the t statistic value as the x argument along with the degrees of freedom, and we multiply by two because we're doing a two-tailed test, so we're interested in both sides. The result gives us the p-value, and it should align pretty closely with the one spit out by the test above. This p-value means we'd expect to see data as extreme as our sample due to chance about 1.3 percent of the time, assuming the null hypothesis is actually true. In this case, the p-value is lower than the significance level alpha of 5 percent, so we reject the null hypothesis and accept that the distribution for the Minnesota voters really is different from the population as a whole. This also means that if we construct a 95 percent confidence interval for our sample data, it will not capture the population mean of 43. We'll calculate that here. We estimate the standard error as the sample standard deviation divided by the square root of the sample size, and we use stats.t.interval to create a confidence interval at a 95 percent confidence level. We pass in our degrees of freedom, the sample mean as loc, and the standard error we calculated, sigma, as scale. When we run this, it creates a confidence interval around our sample mean, and we can see that the top end, about 42.1, is less than the population average of around 43.
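Both calculations can be sketched in one self-contained snippet (the stand-in sample, the seed, and the use of the ddof=1 sample standard deviation are assumptions; the t statistic is the one reported above):

```python
import numpy as np
from scipy import stats

np.random.seed(6)
# Stand-in sample (the lesson uses its generated Minnesota ages)
sample = np.concatenate((stats.poisson.rvs(loc=18, mu=30, size=30),
                         stats.poisson.rvs(loc=18, mu=10, size=20)))

# Two-tailed p-value recovered from the t statistic reported by the test
t_stat = -2.574
p_value = stats.t.cdf(x=t_stat, df=49) * 2  # times 2 for a two-tailed test
print(p_value)  # roughly 0.013, i.e. about 1.3 percent

# 95% confidence interval around the sample mean
sigma = sample.std(ddof=1) / np.sqrt(len(sample))  # estimated standard error
ci = stats.t.interval(0.95,                # confidence level
                      df=len(sample) - 1,  # degrees of freedom
                      loc=sample.mean(),   # center on the sample mean
                      scale=sigma)         # standard error as the scale
print(ci)
```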
So even a 95 percent confidence interval around the sample doesn't capture the true population mean. On the other hand, since there was a 1.3 percent chance of seeing a result as extreme as the one we observed due to chance (that's what the p-value told us), if we construct a new confidence interval at a 99 percent confidence level, the top end of the interval should actually capture the population mean. Let's do that and confirm it's the case, and that under a significance level of one percent, which corresponds to a 99 percent confidence level, we wouldn't be able to reject the null hypothesis. To do that, we run the same code, except this time we change the confidence level to 99 percent. When we run this, we can see that the top end, 43.11, now does just barely capture the population mean, so this test doesn't provide enough evidence to reject the null hypothesis if we're interested in a 99 percent confidence level. That was an example of comparing a sample to a population mean. Another type of t-test you can do is the two-sample t-test, where we compare two different data samples to one another. The null hypothesis is that both groups are the same, and the alternative is that they come from different distributions. To give an example of a two-sample t-test, we're going to generate another sample of data, in this case for the state of Wisconsin, and compare it to the Minnesota data we made earlier. Its distribution is slightly different from the one we made for Minnesota, so we'll see whether the hypothesis test finds the difference big enough to be significant. We can see that the mean for Wisconsin, about 42.8, is a bit higher than the one for Minnesota. Now let's do a two-sample t-test in Python.
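A self-contained sketch of that two-sample call, which the next part of the lesson walks through argument by argument (the stand-in samples, their generating parameters, and the seed are assumptions):

```python
import numpy as np
from scipy import stats

np.random.seed(12)  # arbitrary seed for reproducibility
# Stand-in samples for the two states
minnesota_ages = np.concatenate((stats.poisson.rvs(loc=18, mu=30, size=30),
                                 stats.poisson.rvs(loc=18, mu=10, size=20)))
wisconsin_ages = np.concatenate((stats.poisson.rvs(loc=18, mu=33, size=30),
                                 stats.poisson.rvs(loc=18, mu=10, size=20)))

# equal_var=False runs Welch's t-test, which does not assume
# the two samples share the same variance
result = stats.ttest_ind(a=minnesota_ages,
                         b=wisconsin_ages,
                         equal_var=False)
print(result.statistic, result.pvalue)
```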
We can use stats.ttest_ind, where "ind" stands for independent t-test. For the a argument we pass in our first sample, the Minnesota ages, and for the b argument we pass in our second sample, the Wisconsin ages. In this case we also set equal_var=False; this argument lets you specify whether or not your samples have equal variance. When we run this test, we are again given a t statistic and a corresponding p-value. In this case the p-value of about 0.09 says we'd expect to see a result as extreme as the one observed about nine percent of the time due to chance. So if we were working at a 95 percent confidence level, or the five percent significance level that corresponds to it, we would fail to reject the null hypothesis, because this p-value of around nine percent is higher than the significance level of five percent. Finally, we'll give an example of a paired t-test. The basic two-sample t-test shown above is designed for testing differences between two independent groups, but in some cases you might be interested in testing differences between samples from the same group at different points in time. Those samples are not independent of one another, because they're actually the same group just sampled at different times, so we need to run a paired t-test. As an example, a hospital might want to check whether a new weight loss drug works by checking the weights of a group of patients before and after treatment. A paired t-test lets you check whether samples from such a group differ over time. So let's give an example of using a paired t-test in Python.
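A sketch of the paired-test setup we're about to walk through (the patient weights, the seed, and the normal-distribution parameters are hypothetical stand-ins):

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(11)  # arbitrary seed for reproducibility
# Hypothetical weights for 100 patients: after-treatment weights are the
# before weights plus a small, noisy average loss
before = stats.norm.rvs(loc=250, scale=30, size=100)
after = before + stats.norm.rvs(loc=-1.25, scale=5, size=100)

weight_df = pd.DataFrame({"weight_before": before,
                          "weight_after": after,
                          "weight_change": after - before})

# Paired t-test: the same patients measured at two points in time
result = stats.ttest_rel(a=before, b=after)
print(result.statistic, result.pvalue)
```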
I'm first just going to generate some new data that aligns with that hospital example, so here we have a data frame of some sample weights before treatment, after treatment, and a calculated change in weight. To conduct a paired t-test on this data, we can use the stats.ttest_rel function. Again, for a we pass in the first sample, the data before treatment, and for b we pass in the second sample, the data after treatment. When we run this, we get a t statistic of about 2.57 and a p-value just above one percent. So if we're using a five percent significance level, this test shows a significant difference in weight between the before-treatment and after-treatment readings for this group of people that took the weight loss drug. Now we've seen how to do some statistical tests in Python, but it's worth going into a little more detail about how to interpret the results of a test. Most importantly, it's good to be aware of the different sorts of errors you can make when performing a statistical test, so we'll take a moment to discuss them. In a statistical test, there are basically two main types of error you're dealing with: type 1 error and type 2 error. Type 1 error describes a situation where you reject the null hypothesis when it's actually true. This type of error is known as a false positive or false hit; basically, a type 1 error means you're saying something is there when there actually isn't anything there. The type 1 error rate of a statistical hypothesis test is equal to the significance level alpha that you set, so a higher confidence level, and therefore a lower alpha value, reduces the chances of getting a false positive. Suppose you set a significance level of, say, one percent.
Well, then we would need a pretty extreme p-value before we'd accept the alternative hypothesis, which means our chances of getting these false positives are going to be pretty low. So you might be asking yourself: why don't we just set a very strict significance level, so we can be pretty sure that any time we get a positive result it's actually true? You could do that, but the problem is there's another side of that coin. Type 2 error describes a situation where you fail to reject the null hypothesis when the alternative hypothesis is actually true, or in other words the null hypothesis is actually false. Type 2 error is known as a false negative or miss. Basically, it occurs when there actually is something interesting going on, your two populations or samples really are different, but you say they're not, and that can have bad consequences too. So if we're very strict with our significance level, we might end up missing actual effects. There's always going to be a trade-off between type 1 error and type 2 error, which means there really is no single correct value for your significance level that will work in all instances. That's why a confidence level of 95 percent, and a corresponding significance level of five percent, is used as a rule of thumb that tends to work well in a lot of cases, but in some cases you might want to use something different. Illustrating the type 1 and type 2 errors with a plot might be helpful. You don't need to worry about this code, but what it's going to do is generate a plot that shows what the type 1 and type 2 errors look like.
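Alongside the plot, the claim that the type 1 error rate equals alpha can also be checked numerically. This quick simulation is purely illustrative and not part of the lesson's code: it draws many samples from the null distribution itself and counts how often the test rejects.

```python
import numpy as np
from scipy import stats

np.random.seed(0)  # fixed seed for reproducibility
alpha = 0.05   # significance level
trials = 2000  # number of simulated experiments
rejections = 0

for _ in range(trials):
    # The null hypothesis is true by construction: every sample really
    # does come from a population with mean 43
    sample = stats.norm.rvs(loc=43, scale=10, size=50)
    _, p_value = stats.ttest_1samp(sample, popmean=43)
    if p_value < alpha:  # any rejection here is a false positive
        rejections += 1

false_positive_rate = rejections / trials
print(false_positive_rate)  # hovers near alpha = 0.05
```

Because the samples genuinely come from the null distribution, every rejection is a false positive, and the long-run rejection rate settles near alpha.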
These two distributions are different, but that doesn't mean a statistical test on them will always reveal the difference, because sometimes the sample mean we draw from the alternative distribution comes out a little lower than that distribution's overall average, which can put it close enough to the average of the null hypothesis that we fail to detect that there's actually a difference. That's what the blue type 2 error area in the plot is showing. Say we take a sample from the alternative distribution and its mean happens to land right here: that sample mean is not far enough away from the mean of the null hypothesis for us to detect a difference large enough to reject the null hypothesis. Now, if we assume the population and the sample distributions are actually the same, then the type 1 error is represented by the red tails of the null hypothesis distribution. That basically means that if we took a sample from the null distribution and just happened to get a sample mean quite a bit larger than average, we might land out in the type 1 error tail, and if that was beyond our significance level, the cutoff line here, we'd get a type 1 error. Likewise, if the sample mean came out much lower than average, we'd get a type 1 error on the other side. So that covers the basics of statistical hypothesis testing, the t-test, and how to run some t-tests in Python. The t-test is a powerful tool for investigating differences between sample and population means, but it only operates on numeric variables.
Well, what happens if we have some categorical variables and we want to check whether sample data differs across different categories? To do that, we can use a different type of statistical test called the chi-squared test. In the next lesson, we'll learn how to run chi-squared tests using Python. So thanks for watching, and I'll see you next time.