Transcript:

Welcome to a short tutorial on how to calculate and interpret a simple linear regression in R. For this I have one dependent and one explanatory variable. In my example the dependent variable is the Abitur grade average. School grades are, strictly speaking, ordinally scaled. However, an average of school grades can, so to speak, be treated as metric, and the dependent variable should typically be metric. As the independent variable, that is, the influencing variable, I have the intelligence quotient. So the idea is to explain the Abitur grade by the intelligence quotient of the test subjects.

Calculating a simple linear regression in R is very easy, but a few things still need to be noted. First of all, the data have to be read in. No problem if you haven't managed this yet: there is a link in the info cards at the top right. In a simple linear regression you put two variables into a directed relationship, and in principle you can illustrate this relationship in a scatter plot, because there you can see how the values of one variable relate to the values of the other. I would recommend this to everyone. In the case of a simple linear regression it also serves to check linearity, because we are, after all, talking about a linear regression.

How do we do it? We do it with the plot function. I have already read in my data: data_xls is my data frame, and I want to have the IQ and the Abitur grade variable, whose name is abbreviated in my data, together in one scatter plot. For this, plot is completely sufficient. We need the x and the y variable for the plot. In a regression, the x variable is typically the independent, explanatory variable, and the y variable is the dependent variable to be explained. In plot, you always write the x variable first.
I go into my data frame data_xls with the dollar sign, and my x variable, as I said, is the IQ. Then I just add a comma, say data_xls again, go back into the data frame, and now I want the Abitur grade from it. That is something you can plot very easily. And there we have it: the IQ is plotted on the x-axis and the Abitur grade on the y-axis. Here we see that with increasing IQ the Abitur grade decreases, so something roughly linear, and the question now is how well our data can be explained by a model. We will calculate that now, and for this we simply define a model. The model is assigned with the small arrow, and the function is lm, which stands for linear model. Here we only have to relate our two variables to one another. Double-clicking on lm and calling up the help gives you the quick help.

In this video I only want to show roughly how a linear regression is calculated and interpreted. Checking the prerequisites is a completely different topic for linear regression, and to keep this video as short and crisp as possible, I refer you to other videos for testing the prerequisites.

As I said, I now only have to put the two variables in relation. The Abitur grade is the variable I want to explain, so I write its name exactly as it appears in the data. I do not go into the data frame for this; why, I will show in a moment. With the tilde I then say that the Abitur grade results from the IQ, and I also provide a data source: data equals data_xls. This saves me the somewhat awkward notation of data frame, dollar sign, variable. In the lm function I can use the variable names directly and then refer to the data source, that is, the data frame. Now I can run the calculation.
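
The two steps described so far, the scatter plot and the model definition, could look roughly like this in R. This is only a sketch: the speaker's data are not available here, so data_xls is filled with simulated values, and the column name Abischnitt and the object name model are assumptions made for illustration.

```r
# Simulated stand-in for the speaker's data (NOT the original values).
# Column name "Abischnitt" and object name "model" are assumptions.
set.seed(1)
data_xls <- data.frame(IQ = round(rnorm(51, mean = 100, sd = 12)))
data_xls$Abischnitt <- 8.8 - 0.058 * data_xls$IQ + rnorm(51, sd = 0.3)

# Scatter plot: the x variable (IQ) comes first, then the y variable.
plot(data_xls$IQ, data_xls$Abischnitt)

# Simple linear regression: dependent ~ independent. Because the data
# source is passed via `data`, the formula needs no data_xls$... notation.
model <- lm(Abischnitt ~ IQ, data = data_xls)
```

The formula reads "Abischnitt is explained by IQ"; the variable to the left of the tilde is always the dependent one.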
The whole thing has now been calculated; we see the new model down here. We see that many values were created, which we of course still have to display in a targeted way. We could interpret them right away, but first I would like to add a line to the scatter plot that I drew very briefly just now. Put in layman's terms, I simply fit a line through these points in the scatter plot, namely the line that I have calculated with the model. It is the line that has the smallest distance to all points. So I have a line drawn from the model results, in red, and in the plot I see that I already have a pretty good line here. Some points lie on it, some do not. That is in the nature of regression. For example, this point is relatively far away, whereas this one fits very well. So I explain some values very well with the model and others less well. As I said, it is in the nature of things that the points do not lie exactly on the line; that would be wishful thinking.

Now we have at least briefly seen that this is roughly linear. That means we can expect a pretty good fit for the model, the goodness of fit being how well the data fit the model. And now I finally display the results with the summary function and the model. The model is just the one defined up here; it could also be called treehouse. Then I would simply have to call treehouse here and use treehouse here as well. I now look at the results. At the top we see the call with lm and the formula, so the model we have calculated; that is just what is written up here.
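
Drawing the fitted line into the scatter plot and printing the results could be sketched as follows. The transcript does not clearly name the function used for the line; abline() is the usual base R choice and is assumed here. The data are again simulated, and the names Abischnitt and model are assumptions.

```r
# Simulated stand-in data (NOT the speaker's values); names are assumptions.
set.seed(1)
data_xls <- data.frame(IQ = round(rnorm(51, mean = 100, sd = 12)))
data_xls$Abischnitt <- 8.8 - 0.058 * data_xls$IQ + rnorm(51, sd = 0.3)
model <- lm(Abischnitt ~ IQ, data = data_xls)

# Scatter plot with the least-squares line from the model drawn in red.
plot(data_xls$IQ, data_xls$Abischnitt)
abline(model, col = "red")

# Full results: call, residuals, coefficients, R-squared, F-statistic.
result <- summary(model)
print(result)
```

The object name is arbitrary: if the model were stored as treehouse, both abline() and summary() would be called with treehouse instead.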
Below that come the residuals. The residuals are the distances between the line and the actual observations. The line gives the estimated values, the points are the observed values, and the distance between the observed values and the estimated values is the residual. You usually don't look at the residual statistics here, but you can see the largest residual and the smallest one. More important is what comes further down.

The first thing I recommend looking at is the F-statistic, which is often neglected. The F-statistic is the so-called gatekeeper for the model. It gives me the F value with the degrees of freedom 1 and 49. That is not so important; of course you always report it, but the p-value is actually the interesting thing here, namely a value under 0.05. 2.2 times 10 to the power of minus 16 is an extremely small value. I can type that in once: 2.2e-16. Caution, here it is written with a point, whereas I otherwise work with the comma as the decimal separator. And when I run this, you can see that there are a lot of zeros after the decimal point. In other words, this is an extremely small value, consequently a very small p-value, and consequently the null hypothesis is rejected.

What is the null hypothesis of the F-statistic? The null hypothesis is that the model does not make any contribution to the explanation. With this value we reject it in favor of the alternative hypothesis, which says that the model does make an explanatory contribution. That is the gatekeeper: the F-statistic tells us that yes, our model contributes to the explanation, and that is why I can continue. There are now two approaches.
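
The gatekeeper check can also be done by hand. The following sketch pulls the F-statistic out of the summary and recomputes the p-value with pf(). The data are simulated with 51 rows so that the degrees of freedom come out as 1 and 49, as in the video; the values themselves and the names Abischnitt and model are assumptions.

```r
# Simulated stand-in data (NOT the speaker's values); names are assumptions.
set.seed(1)
data_xls <- data.frame(IQ = round(rnorm(51, mean = 100, sd = 12)))
data_xls$Abischnitt <- 8.8 - 0.058 * data_xls$IQ + rnorm(51, sd = 0.3)
model <- lm(Abischnitt ~ IQ, data = data_xls)

# Named vector with the F value and both degrees of freedom.
f <- summary(model)$fstatistic
print(f)

# p-value of the F-test: upper tail of the F distribution.
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(p_value)
print(p_value < 0.05)   # TRUE means: reject the null hypothesis

# 2.2e-16 is scientific notation; written out it shows the many zeros.
print(format(2.2e-16, scientific = FALSE))
```

When the summary prints "< 2.2e-16", the p-value is below the smallest value R displays by default, which corresponds to the machine precision .Machine$double.eps.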
I either look at the quality measures for the model down here, that would be the Multiple R-squared and the Adjusted R-squared, or I look at the coefficients first. Personally, I always go straight to the R-squared first; I work my way up from the bottom. My R-squared is 0.8272. The R-squared is a measure of the explained variance. Explained variance means: what percentage of the variance of the dependent variable, so my Abitur grade, is explained by my model, that is, by the IQ. 0.8272 is 82.7 percent rounded, which is a very good value. That means I can explain a lot.

So you look first at the explained variance and then, in addition, at the Adjusted R-squared, which is 0.8237. That is not very different. The Adjusted R-squared matters especially if you have more than one independent variable, for example if the Abitur grade were explained not only by the IQ but also by motivation or other variables. Then you would look above all at this value. And if you compare with other models, calculated by other people, that use other variables to explain the Abitur grade, you would always look at the Adjusted R-squared rather than the Multiple R-squared. Here, with a simple regression, the Multiple R-squared is sufficient. The residual standard error can be ignored at this point; it is not important now.

Much more important is what is going on with the coefficients. The coefficients are actually a relatively simple thing. The intercept is the constant. You have a value of 8.8 here, that is, with an IQ of 0 you would have an Abitur grade of 8.8. In terms of content that is of course complete nonsense.
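
How the two quality measures relate can be checked with the numbers quoted in the video. The sketch below uses the standard formula for the adjusted R-squared. The sample size n = 51 is not stated in the transcript; it is an assumption derived from the 49 residual degrees of freedom plus the two estimated coefficients.

```r
r2 <- 0.8272   # Multiple R-squared quoted in the video
n  <- 51       # assumption: 49 residual degrees of freedom + 2 coefficients
p  <- 1        # number of predictors (only the IQ)

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(100 * r2, 1))   # explained variance in percent
print(round(adj_r2, 4))     # compare with the Adjusted R-squared in the output
```

With only one predictor the penalty is small, which is why the two values are so close here. With a fitted model object, both values are available as summary(model)$r.squared and summary(model)$adj.r.squared.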
If you look at this diagram: the x-axis starts here at 90, so you would have to extend the whole thing back to zero, and then this line would cut the y-axis somewhere at 8.8. That means, if I theoretically had an IQ of zero, I would start with an Abitur grade of 8.8. That is nonsense, but the model works this way: it is a linear function, and a straight line always has an intercept, which is the constant, and a slope.

We see the slope here, and it is negative. We can recognize that the IQ has a negative influence: -0.058385, a very long number. I'll abbreviate it from now on and say -0.06. This value is of interest to us because, as I said, it is the slope, that is, how steeply the line falls. The more negative it is, so the smaller the value, the more steeply the line falls.

It is important to first look at the p-value, which is given at the back in the Pr column together with these asterisks. Here we see 2 times 10 to the power of minus 16 again. This is the same value as the one below; a decimal place is missing here, but that is only because it is rounded. For a simple regression, this value is always equal to the p-value of the F-statistic below, and only for a simple regression, because I have only one predictor, one independent variable. I also see three asterisks. These three asterisks mean that I have a very small p-value of less than 0.001, so highly significant. If this value is below 0.05, the predictor is significant, which means it has an influence on the dependent variable, namely the Abitur grade. The simple interpretation at this point would be the following.
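
That the slope test and the F-test coincide in a simple regression can be illustrated with the coefficient table: the squared t value of the slope equals the F value. The sketch uses simulated data, and the names Abischnitt and model are assumptions.

```r
# Simulated stand-in data (NOT the speaker's values); names are assumptions.
set.seed(1)
data_xls <- data.frame(IQ = round(rnorm(51, mean = 100, sd = 12)))
data_xls$Abischnitt <- 8.8 - 0.058 * data_xls$IQ + rnorm(51, sd = 0.3)
model <- lm(Abischnitt ~ IQ, data = data_xls)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(model))
print(coefs)

# With a single predictor, t^2 of the slope equals the F value,
# so both tests lead to the same p-value.
slope_t <- coefs["IQ", "t value"]
f_value <- summary(model)$fstatistic[["value"]]
print(all.equal(slope_t^2, f_value))

# Three asterisks in the printed summary correspond to p < 0.001.
print(coefs["IQ", "Pr(>|t|)"] < 0.001)
```

With more than one predictor this equality no longer holds, because the F-test then assesses all predictors jointly.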
If the IQ increases by one unit, the Abitur grade changes by 0.058 units downward, because of the minus sign. This table is always read like that: if I increase the variable listed here, in this case the IQ, by one unit, what happens to the dependent variable? It changes by -0.058 units.

I want to illustrate the whole thing once more briefly. We have already talked about the constant, the intercept; that is this 8.8. And then we have beta 1 here. This is the coefficient that we just saw; it belongs to the independent variable, and its estimated influence on the dependent variable looks like this. We have the normal regression equation, and y is my Abitur grade average. The Abitur grade results from exactly this equation, with the IQ on the right-hand side. That means I can now insert the betas into my equation. I put this beta in here, and then, for example for an IQ of 100, I take the equation from up here and insert minus 0.058385 and for x the 100. This x1 is the IQ that I can use in the calculation to estimate the Abitur grade. So if, on the basis of the model as I have calculated it, someone has an IQ of 100, the model calculates an Abitur grade of 2.997, so about 3.

You can go one step further and assume 120 instead of 100; everything else stays the same. Then we would have an Abitur grade of 1.829. And that is also plausible if you look at the diagram again: with an IQ of 120 we go up until we hit the regression line, and if we project that to the left, which is a bit crooked now, we get around 1.9 to 2.0. What we have calculated is 1.829.
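
The two predictions can be retraced with the regression equation, grade = intercept + slope * IQ. The transcript gives the intercept only rounded (8.8), so the sketch below back-calculates an unrounded value from the quoted prediction at IQ 100. That back-calculation is an assumption for illustration, not a number shown in the video.

```r
slope    <- -0.058385   # IQ coefficient quoted in the video
pred_100 <- 2.997       # predicted grade at IQ 100, quoted in the video

# Assumption: recover the unrounded intercept from the quoted prediction.
intercept <- pred_100 - slope * 100
print(intercept)        # should round to the 8.8 mentioned in the video

# Regression equation as a small helper (hypothetical name).
predict_grade <- function(iq) intercept + slope * iq

print(predict_grade(100))
print(round(predict_grade(120), 3))   # compare with the 1.829 from the video
```

With a fitted model object, the same estimates can be obtained with predict(model, newdata = data.frame(IQ = c(100, 120))), provided the predictor column is named IQ.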
Because I could not go exactly straight to the left here, my reading from the graph is slightly off. But you would still proceed exactly like this, and that is exactly what we have calculated and what I am now showing graphically. At 100 you can follow it up just as easily and then go to the left, and there you can see that we come out at the mentioned 2.997, so almost at 3. In this way the model estimates a grade.

You can of course also include this diagram in your evaluation. It should then be adjusted a little, especially the labeling of the axes and perhaps also a title. I have linked more on this in the info cards in the upper right corner. As you can see, calculating a linear regression is quite simple. Important once again: you look at the F-statistic, you look at the R-squared, and you look at your coefficients.
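
For a report, the axis labels and the title can be set directly in the plot call. A minimal sketch with simulated data follows; the label texts and the names Abischnitt and model are assumptions.

```r
# Simulated stand-in data (NOT the speaker's values); names are assumptions.
set.seed(1)
data_xls <- data.frame(IQ = round(rnorm(51, mean = 100, sd = 12)))
data_xls$Abischnitt <- 8.8 - 0.058 * data_xls$IQ + rnorm(51, sd = 0.3)
model <- lm(Abischnitt ~ IQ, data = data_xls)

# xlab/ylab label the axes, main sets the title.
plot(data_xls$IQ, data_xls$Abischnitt,
     xlab = "IQ",
     ylab = "Abitur grade average",
     main = "Abitur grade by IQ")
abline(model, col = "red")
```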