Transcript:

Okay, so in this video I’m going to take a look at Student’s t-test and the t distribution, which is a continuous probability distribution that arises in estimating the mean of a normally distributed population when the sample size is small and the population standard deviation is unknown.
The origins of this test actually hail from the Guinness Brewery in Dublin. There was a data scientist of the early 20th century, Gosset, and he developed a number of tests to verify the quality of the beer that was being produced. He couldn’t taste all the beer; he was limited to working with samples, and he had to infer from samples the standard of the batch. That meant he couldn’t necessarily work with large samples, and he had to come up with critical values that reflected that, so he couldn’t use standard normal critical values. He had to intuit his way around it. He posited that the tables he created can be used to judge whether a series of experiments, however short, has given a result which conforms to any required standard of accuracy, or whether it’s necessary to continue the investigation.
So this has become important for comparing the means of two populations, for constructing confidence intervals, and also in linear regression. When we look at linear regression output, we very often look at the t-stats of the coefficients and then the p-values to determine if a variable is statistically significant or not. So Guinness has a lot of history in Dublin, but we will move on to the t-test itself and the t distribution. I’m going to come down here. This is our code. I’ll just copy it and we’ll take a look at it and set it up. I’ll just clear out everything.
I’ll clear the console and we’ll just paste in the code. Basically, we’re going to start off by looking at a normal distribution. We’re going to go from negative four standard deviates to positive four standard deviates, in a hundred steps, so we divide up the range, if you like, into 100 data points. Okay, so let’s just run that sequence. Let’s take a quick look at x: we go from negative 4 all the way up to 4 and there are a hundred data points, so 98, 99, 100. Okay, so we’ve gone in a hundred steps.
And then we’ve evaluated the normal density function at each of those points. Again, we could take a look at hx here: type hx and just run, and this is just the density function for each of the corresponding x’s. If we plot those two, this has nothing to do with the t distribution, at least initially. So although it’s labelled as a t distribution here, that’s a bit of a misnomer; this is just a normal distribution. The normal distribution looks like this.
What Gosset tried to do was this: normally, if you’re working with large samples, you can develop critical values and test statistical significance using the critical values from the normal distribution. But when we have small samples, you need somehow to not be so confident, and he tweaked the normal distribution to allow for smaller sample sizes. So let’s consider, for instance, a few different degrees of freedom.
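The on-screen script is not reproduced in this transcript, so the following is only a minimal Python sketch of the step just described, using the standard library alone. The names `linspace`, `normal_pdf`, `x` and `hx` are illustrative assumptions, not the names used in the video’s code.

```python
import math

def linspace(start, stop, num):
    """Return `num` evenly spaced points from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# 100 points from -4 to +4 standard deviates, as described above.
x = linspace(-4.0, 4.0, 100)

# Normal density evaluated at each of those points.
hx = [normal_pdf(z) for z in x]

print(len(x), round(x[0], 3), round(x[-1], 3))
print(round(max(hx), 4))
```

Plotting `hx` against `x` with any plotting tool gives the familiar bell curve; nothing here involves the t distribution yet.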
Say 1, 3, 8 and 30: we load those into a vector, and then we set up colors corresponding to each of those, and then labels as well. And then we plot; if I clear this here and plot what we had before with the normal distribution, okay. Then we create a loop, with i going from 1 to 4, which means we’re going to loop through four times. We take the x-values that we created, the ones going from negative 4 to 4, and we get the density of the t distribution for those x-values with degrees of freedom of 1, then 3, then 8, then 30. Okay, so let’s just run this.
That adds the other distributions to the plot, and I’ll put in the legend so that we can see more clearly where the degrees of freedom are 1, where the degrees of freedom are 3, and where the degrees of freedom are 8. When the degrees of freedom are 30, we’re quite close to the normal distribution. So for varying sample sizes and degrees of freedom, we can adjust the normal distribution, or in a sense tweak the normal distribution, to reflect confidence levels or critical values when we’re working with a small sample size.
That’s the benefit of the t distribution: we can move from large samples to small samples and still perform statistical testing. In particular, the t distribution plays a key role in assessing the statistical significance of the difference between two means, and it also allows us to construct confidence intervals where we have a small sample size. So in the next video clip we’ll take a look at the Gapminder data and we’ll test to see if there is a statistically significant difference between France and South Africa in terms of life expectancy. Okay, so I’ll stop here.
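To make the comparison concrete without the plot, here is a hedged Python sketch (standard library only, not the script from the video) that evaluates the Student’s t density for 1, 3, 8 and 30 degrees of freedom and sets it beside the standard normal density. The function names `t_pdf` and `normal_pdf` are assumptions introduced for illustration; the t density uses the standard closed form with the gamma function.

```python
import math

def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom at t."""
    # Log of the normalising constant: Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)).
    log_norm = (math.lgamma((df + 1) / 2.0)
                - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    # Kernel: (1 + t^2/df) ** (-(df+1)/2), computed on the log scale.
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(t * t / df))

degrees_of_freedom = [1, 3, 8, 30]

# Compare the height at the centre (0) and in the tail (3) with the normal.
print("normal", normal_pdf(0.0), normal_pdf(3.0))
for df in degrees_of_freedom:
    print(df, t_pdf(0.0, df), t_pdf(3.0, df))
```

With fewer degrees of freedom the curve is lower in the middle and heavier in the tails, which is why small-sample critical values are wider than the normal ones; by 30 degrees of freedom the two curves are close.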