Transcript:
What we’re going to go over here is linear regression, and we’re going to test for the significance of the independent variable. What we’re really doing is trying to determine whether there is a relationship between two variables, variable X and variable Y. There’s nothing very intuitive about this, and it’s a difficult concept, so we’ll develop the arithmetic first, and then we’ll take those numbers and apply them to some statistical output, namely some Excel output.
Okay, so let’s look at it in terms of an XY graph with some plotted points. We’re going to have twelve points on this graph. Along the X axis is our X variable, and that’s our independent variable. Along the Y axis is our dependent variable, variable Y. The amount we have for Y is going to depend on whatever X quantity you have.
Let’s look at the numbers first, and then we’ll get into it. We have twelve actual data points, and those are the figures we’re going to fit with this regression line and test. The X variable is our independent variable, and the Y variable is the dependent variable.
Okay, so back to our graph. We’ve plotted these twelve data points, with XY coordinates for each one. Then we’ve used a computer program to establish a regression line.
This regression line represents the data, so we’re going to look at how well it fits. The regression line we’ve developed is: predicted Y, our dependent variable, equals the slope of the line, which we calculated to be 10.31, times X, our independent variable, plus 300. The 300 is the Y intercept, the value of Y when X equals zero.
Okay, so what do we have to test? It’s what they call the standard error, or the standard deviation, of our estimated coefficient, and we’re going to look at it in terms of M, the slope. We want to find out whether the slope of 10.31 shows a real relationship between our X and Y variables, and whether this regression line is a good choice.
To do the testing, let’s go through our numbers. We’ve really got two quantities to look at. Say we have x1, y1, then x2, y2, x3, y3, and so forth for the twelve points. The first quantity works off the average, or mean, value of our Xs. We add up all the X values, take their mean, and draw it as a line on the graph, this blue line. Then we look at the difference between the actual X value of each point and this mean line. We square those differences, which gets rid of any negative numbers so that we’re always dealing with positive amounts, and then we sum them up. That’s going to be called our X sum of squares.
So that’s our XSS, the sum of squares for X. The other quantity is our residual sum of squares, represented by RSS. That’s the difference in Y for each point, but measured from our regression line. We always look at the regression line and take the difference between the Y value the regression line predicts and the actual Y value. We square that difference and add them all up to get a sum of squares. Those are the two quantities we’re dealing with when we do this relationship testing.
Okay, so just looking at these terms again. XSS, the sum of squares for X, is the difference between each X point and the mean value, squared, summed from 1 to n. RSS, the residual sum of squares, takes each Y value, y1 to yn, and subtracts the predicted amount from it. The predicted amount comes from our line equation: M, the slope, times the X value that goes with that Y value, plus B, the Y intercept. So it’s the difference between our actual Y and our predicted Y, squared, summed from 1 to n.
Okay, so that’s the arithmetic we have to deal with. What we ultimately want to get to is the standard error of the estimated coefficient, the slope of the line, M. We’d also have a standard error for our y-intercept.
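Written out as formulas, the two sums of squares described above look like this. The symbols are a notational assumption, not something shown in the talk: $\bar{x}$ is the mean of the X values, $\hat{y}_i$ is the Y value predicted by the regression line, and $n = 12$ is the number of points.

```latex
\begin{aligned}
\mathrm{XSS} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\mathrm{RSS} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = m x_i + b
\end{aligned}
```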
In this case the intercept was 300, but in our analysis we’re going to concentrate on the standard error, or standard deviation, of the estimated coefficient, as they call it, for the slope of the line. What we really want to know is: based on the slope of the line, is there a real linear relationship between X, our independent variable, and Y, our dependent variable?
Okay, so first off, let’s develop our numbers. The first thing we have to calculate is the standard error of the regression, which is the standard deviation of the residuals. That works from our RSS, the differences between the actual and predicted Y values for the data points. Not to get into all the complexities, but we have to come up with that standard error of the regression from the RSS, our residuals.
Think of it as taking your predicted Y and putting a standard deviation on both sides of it. We have our line, 10.31 times X plus 300, and we want to know how much Y varies around it based on our residuals. We’re going to calculate that to be 170.5. So S, the standard error of the residuals, is really a standard deviation: you take the square root of the sum of the squared residuals divided by the degrees of freedom. So we take our RSS and divide it by the degrees of freedom. We’re dealing with 12 data points, and we’ve got those two variables, X and Y, to deal with (the line we fit has two estimated values, the slope and the intercept), so we have to subtract two.
That makes our degrees of freedom 12 minus 2, which is 10. Running through the numbers for the standard error of our residuals: the sum of the squared residuals comes to 290,824. Divide that by the 10 degrees of freedom, take the square root, and you get the standard error, or standard deviation, of our residuals: 170.5. Okay, so that takes care of our standard error of the residuals.
All right, the next thing we have to determine is the standard error of the slope, M, that 10.31, the slope of the line from our regression analysis. We take the standard error of the residuals that we calculated above, 170.5, and divide it by the square root of XSS, the sum of squares of the Xs. Remember, we talked about that: it’s the difference between the mean value and the actual X value for each point, squared, and summed from 1 to n. Then we take the square root of that sum.
Our XSS came to 2,979, and the square root of that is 54.59. We divide that into our standard error of the residuals, 170.5, and that gives us what we’re going to be working with: the standard error of the coefficient M, our slope.
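The arithmetic just described can be sketched in a few lines of Python. This is only an illustration using the summary figures quoted in the talk. The variable names are made up for the example, and the value of XSS is an assumption: it is read here as 2,979 because that matches the quoted square root of 54.59.

```python
import math

# Summary figures quoted in the talk.
n = 12            # number of data points
rss = 290_824     # residual sum of squares
# Assumption: XSS is read as 2,979, which matches the quoted square root of 54.59.
xss = 2_979

df = n - 2                             # degrees of freedom
s_resid = math.sqrt(rss / df)          # standard error of the residuals
se_slope = s_resid / math.sqrt(xss)    # standard error of the slope

print(f"degrees of freedom          = {df}")
print(f"standard error of residuals = {s_resid:.1f}")
print(f"standard error of slope     = {se_slope:.3f}")
```

Under that assumption the results land on the figures used in the talk, roughly 170.5 for the residuals and roughly 3.12 for the slope.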
So this is what we wanted to boil it all down to: going through all those numbers to get the standard error, or standard deviation, of the slope. That division gives us 3.124, and that’s one standard error. Just so you understand it: it’s the standard error of the residuals that you calculated, divided by the square root of the sum of squares of the Xs. Okay, so that takes care of our numbers. The standard error we’re dealing with is 3.124.
Now there are really four steps, and we’ll go through how you approach this for regression analysis. Step one: the question is whether there is a significant relationship between X, our independent variable, and Y, our dependent variable. If there is, the slope will not equal zero. This is really hypothesis testing. Until we know otherwise, we assume there is no relationship, and then the slope of the line would be zero. If we can show a relationship between X and Y, the slope is not going to be equal to zero.
Step two: test the slope M in our equation Y equals MX plus B. For our linear regression we’re going to use a t-test with a 5% significance level, that’s 0.05. We’ll look at that in terms of our distribution, the t distribution, and our confidence level is going to be 95%, or 0.95.
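In symbols, the test set up in steps one and two can be written as follows. The labels $H_0$, $H_1$ and $\alpha$ are standard notation that the talk doesn’t use by name, so treat them as an added convention.

```latex
H_0 : m = 0 \quad \text{(no linear relationship)} \qquad
H_1 : m \neq 0 \quad \text{(linear relationship)} \qquad
\alpha = 0.05
```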
Okay, so step three is to analyze the data. We used Excel for this: under Data we selected Data Analysis, and there’s a regression routine where we put in our numbers and got some output. Let’s look at this output first, just to understand what it is.
We really have two predictor values, and the output gives a coefficient for each. The y-intercept is the single value where the line crosses the y-axis, when X is equal to zero. What we’re really going to be looking at is what Excel calls the X variable, and that’s the slope of the line, that 10.31. The coefficient that Excel, our statistical program, calculated was 10.3124.
Then we have the standard error. We’re following through with our slope, our coefficient M, the X variable, and that’s the standard error we calculated to be 3.124, shown here. Based on that we have a t Stat, that’s our t statistic, and a p-value, a probability that we can read as a percentage. We also have a confidence interval based on our 95% confidence level, with lower and upper values.
Okay, so now we’ll relate all this information to the statistical output we ran in Excel for our numbers.
Okay, now let’s interpret our data in terms of the Excel output. We’ll start with our predictor, the X variable, which is M, the slope of the line, 10.31. The statistical output shows it as 10.3124. Next we have the standard error of the coefficient, of our slope M. We calculated that to be 3.124, and the Excel output shows 3.124146.
Now the things we have to deal with are the t Stat and the p-value. Let’s look at the t Stat and how we got it. You take the coefficient, in this case 10.312, and divide it by the standard error of the coefficient, which was 3.124. That division gives us the t Stat, 3.300. So the t Stat is just the calculated coefficient divided by its standard error.
Now that we know our t Stat, we can determine our p-value. Remember, a p-value is the probability of observing a sample statistic as extreme as our t statistic if there really were no relationship. With a t Stat of 3.300868, we get a p-value of 0.008002. You compare this p-value to the cutoff we have, 5%. Our p-value is much less than 5%; it’s only 0.8%. Then, based on the same numbers, we come up with a confidence interval, and we’ll look at the lower 95% and upper 95% limits.
Okay, so let’s understand this t Stat and this p-value and how we make our decision. We calculated our t Stat to be 3.30. Lay it out on our t distribution, and we’re looking at two tails, because Excel calculates the p-value based on two tails. First we have to determine our critical t value. In Excel you type =T.INV.2T, where the 2T is for two tails, and put in our significance level of 5% and our degrees of freedom. We have 12 samples less two for our variables, so that gave us 10 degrees of freedom. Hit your Enter key, and Excel gives you a t value of 2.228139. That’s what we’re going to use as our critical t value.
On our t distribution it’s plus or minus that amount, plus 2.228 and minus 2.228. This is where we make our comparison. Our t Stat of 3.30 lies out in the tail, beyond the critical value. That tells us right now that we’ll be able to accept that we do have a relationship between X and Y, but let’s also look at it in terms of probability.
Based on our t Stat and our critical t value, let’s look at the probabilities. In Excel we use =T.DIST.2T, again two tails. Put in the critical value, the 2.228 that we found for the 5% level, and the degrees of freedom, 10, and that gives you the cutoff percentage, a p-value of 0.050012.
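For anyone who wants to check these figures outside Excel, here is a minimal sketch in plain Python. It is not how Excel computes T.DIST.2T or T.INV.2T: it stands in for them by numerically integrating the t density (Simpson’s rule) and then searching for the critical value by bisection. The function names `t_pdf`, `two_tailed_p` and `critical_t` are hypothetical helpers written for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=2000):
    """Area in both tails beyond |t_stat|, via Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3          # area under the curve from 0 to |t_stat|
    return 1 - 2 * area_0_to_t           # the curve is symmetric around zero

def critical_t(alpha, df):
    """Value of t whose two-tailed p-value equals alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid                     # tails still too heavy: move outward
        else:
            hi = mid
    return (lo + hi) / 2

coef, se = 10.3124, 3.124146             # slope and its standard error from the output
df = 10                                  # 12 points minus 2
t_stat = coef / se
p_value = two_tailed_p(t_stat, df)
t_crit = critical_t(0.05, df)

print(f"t stat     = {t_stat:.3f}")
print(f"p-value    = {p_value:.4f}")
print(f"critical t = {t_crit:.3f}")
print("reject 'no relationship':", abs(t_stat) > t_crit and p_value < 0.05)
```

Both comparisons say the same thing, which is the point made in the talk: a t Stat beyond the critical value and a p-value below the significance level are two views of one test.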
So all we have to do is compare that to the actual p-value based on our t Stat of 3.30. Put in 3.30 for our actual t Stat with ten degrees of freedom, and that gives us a p-value of 0.008, or eight tenths of a percent. Our cutoff, the test value, was 5%. On the probability curve it would be the same on both sides, but I’m just showing it on this side because I have room here. Our 0.8% lies in a region well below our 5% cutoff, more than four percentage points less, so it’s sitting out in the tails.
What that’s telling us, based on our actual t Stat and the p-value of 0.8% that we calculated from it, is that a sample like ours would be unlikely if there were no linear relationship between the X and Y variables. That’s because our p-value is less than our significance level: the significance level was 5%, and the probability based on our actual t Stat was only 0.8%.
If we didn’t figure out the probability, we could just look at our t Stat, because it ends up in a tail: it’s greater than our cutoff value, 3.30 is greater than 2.228. So the p-value is less than our significance level, and the t Stat is greater than our critical value. Everything is based on our t Stat. Once we knew our t Stat and picked the significance level we’re testing at, we had our critical p-value, our cutoff, to compare against.
Our p cutoff, the percentage cutoff we have, is 5%, and we can compare the actual amount to it, because we put our t Stat in: 3.30 with 10 degrees of freedom gave us that eight tenths of a percent, or 0.008. So our p-value of 0.008 is less than our test value of 0.05; eight tenths of a percent is less than five percent. So yes, X and Y are related.
Okay, so the next thing we have to deal with is the range, the lower and upper limits for our confidence level. That’s just taking our variable M, our slope, 10.31, plus or minus the standard error of our coefficient, 3.12, times our critical t value of 2.228. Adding and subtracting that gives you a range of 3.35 to 17.27 for your coefficient M, your slope. That’s going to be our range, lower and upper.
Before we go up to our chart, let’s just note that Excel has these functions for the t distribution. T.DIST.2T, two tails in this case, gives you the percentage you’re looking at, and T.INV.2T, the inverse, gives you your critical t values. Okay, so back to our confidence interval: lower 95%, upper 95%. That’s what we’re showing, 3.35 to 17.27.
Okay, so secondly, suppose we didn’t have a relationship between X and Y. Then the coefficient for the slope of the line would have been zero, so zero would have fallen inside this range.
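The interval arithmetic described here is short enough to sketch directly. This is a minimal illustration using the slope, standard error and critical t value quoted from the Excel output; the variable names are chosen for the example.

```python
coef = 10.3124        # slope from the Excel output
se = 3.124146         # standard error of the slope
t_crit = 2.228139     # critical t for 5%, two tails, 10 degrees of freedom

margin = t_crit * se                     # half-width of the interval
lower, upper = coef - margin, coef + margin
contains_zero = lower <= 0 <= upper      # True would mean no evidence of a relationship

print(f"95% interval for the slope: {lower:.2f} to {upper:.2f}")
print(f"interval contains zero: {contains_zero}")
```

The limits agree with the lower 95% and upper 95% values in the output, and zero falls outside them.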
Zero would have been somewhere in this confidence interval, but it isn’t: the interval is 3.35 to 17.27. With no relationship, zero would have had to be somewhere in the range, and because zero does not appear in our confidence interval, you can also say in those terms that X and Y have a linear relationship.
So those are really the two things the confidence interval gives you. It tells you the range on the slope you’re looking at, and it’s a pretty big range here. It also speaks to the relationship: if zero were to fall inside the range, you could not say that X and Y have a linear relationship.
Now, if we look at our y-intercept, we had 300.97. We did come up with a t Stat for that, 1.309, and based on that t Stat you get a p-value of 0.21. That’s 21%, which is way over our 5% test amount. You can also see that our lower value would have been minus 210 and our upper value 812, so this is a case where zero is in the range. So for our y-intercept you would conclude that it isn’t significant: we can’t say it differs from zero. But in our case we’re interested in the slope of the line, because M is the key driver, and we were able to determine that, yes, there is a relationship between our X and Y variables, based on our actual t Stat and the p-value we could derive from it.
Okay, so that summarizes our discussion of this output, where we interpreted our data and determined whether there is a relationship between our X and Y variables.