Transcript:
Okay, in this video I'm going to provide a brief illustration of how you can run a least squares regression using robust standard errors. The reason you might choose this strategy is that you have run your analysis using least squares regression, evaluated the assumptions concerning the residuals from that model, the main assumption in this case being the assumption of constant variances, and found evidence of a violation. When you violate that assumption, what that often translates into is that the standard errors for your regression coefficients are biased downwards, which increases the likelihood of making a Type 1 error when making inferences about the population regression coefficients. So this is one strategy you can use in those cases where there's evidence of a violation of the constant variance assumption.
In Stata, you can run your analyses in a couple of ways: through the menus or through syntax. I'll illustrate running this analysis both ways. I'm going to start by running just a least squares regression. In this data set, I've got the variables gender, subject matter interest, mastery goals, anxiety, and academic achievement, so I have data on 50 fictional students with these measures, and I'm going to regress achievement onto the remaining predictors.
Starting off with the menu options, we'll begin with just a basic least squares regression model where we make the assumption of constant variances. So in this case, I'm going to move achievement to the dependent box, and in the independent box I'm going to add gender, subject matter interest, mastery goals, and anxiety. Gender is a dummy-coded variable, coded 0 for male and 1 for female.
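For the residual check mentioned above, here is a minimal sketch of one common way to look for non-constant variance in Stata after fitting the model. The video does not say which diagnostics were used, so the choice of commands here is an assumption. The variable names (achieve, gender, interest, mastery, anxiety) are placeholders, since the actual names in the data set are not shown in this transcript.

```stata
* Fit the ordinary least squares model (variable names are hypothetical)
regress achieve gender interest mastery anxiety

* Plot residuals against fitted values; a fan or funnel shape suggests
* that the residual variance is not constant
rvfplot, yline(0)

* Breusch-Pagan / Cook-Weisberg test; a small p value is evidence
* against the constant variance assumption
estat hettest
```

Both diagnostics use the most recently fitted model, so they need to be run right after the regress command.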
If I run the analysis, you can see here are my regression results. You can see that there's a significant relationship between gender and achievement: with this coding, females, who are coded 1, scored about 3.94 points higher on average than males, who are coded 0. You can also see a positive relationship between subject matter interest and achievement, a positive relationship between mastery goals and achievement, and for anxiety, a negative relationship. Of the predictors, it looks like gender, subject matter interest, and anxiety were all significant, but mastery goals was not.
Now let's run the same analysis, but using robust standard errors. I'm going to go to Linear models and related, then Linear regression, and click on standard errors, then Robust. Basically, it's like this: if I had run this analysis, performed a residual analysis, and looked to see whether there was any violation of the constant variance assumption, or homogeneity of variances, then this might be a strategy for dealing with that problem, because the standard errors that you see right here might be deflated, and that increases the likelihood of incorrectly rejecting the null hypothesis. So assuming that I've run this analysis, performed my residual analysis, and concluded that we have a violation of the assumption of constant variances, I would click on Robust. There are actually three different approaches here; we're just going to stick with the default, and I'll click OK.
And so now you can see that the regression with robust standard errors has been run. You'll notice that the coefficients are all exactly the same whether we ran it using the standard least squares regression or using the robust standard errors. The differences are going to be in the standard errors that you see right here.
Now, in this particular data set, there was no evidence of a violation of the assumption concerning constant variances, or homogeneity of variances, so really there wasn't any need to run the least squares regression with the robust standard errors. Nevertheless, you can visually see that, in many cases, the standard errors are higher after running a regression with the robust standard errors as opposed to the default standard errors. That will generally translate into t values that are a little bit smaller, and that can also translate into less powerful significance tests. But that's the basic idea: you're trying to offset any deflation in the standard errors resulting from the violation by increasing the standard errors, thereby making the test a little bit less powerful and making it less likely that you will commit a Type 1 error.
If I want to run the regression analyses using syntax, this is basically the syntax for the least squares regression. To rerun that, I can type in regress, and I have to type in the names of the variables exactly as they appear, including the capitalization. So I'll type in achieve, that's my dependent variable, and then I'll type in gender, subject matter interest, mastery goals, and anxiety. And there you go: that's the initial least squares regression that we ran.
If I want to run the least squares regression with the robust standard errors, this is the syntax here, but I could also have just typed it in. I can also drag the variable names over, so I don't have to type in literally everything. So I could say regress, then achievement, my dependent variable, then gender, subject matter interest, mastery goals, and anxiety, and then a comma, and I can type in robust. That gets me the same information as running it through the menu option, so either way would work.
Like I said, the reason you would do this is in those cases where you've run the standard least squares regression and found evidence that you've violated the assumption of constant variances. This particular analysis does not impact the actual regression coefficients, and it doesn't affect the R-squared value or anything like that. But it does affect the standard errors, the t values, and the p values, and essentially it's designed to make the tests a little bit more conservative, so that you're less likely to commit a Type 1 error when making inferences about your regression coefficients.
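As a recap of the syntax route, the two commands can be sketched as below. The variable names (achieve, gender, interest, mastery, anxiety) are placeholders, since the exact names in the data set are not shown in this transcript; substitute the names as they appear in your own variables list, matching capitalization.

```stata
* Ordinary least squares with the default standard errors
regress achieve gender interest mastery anxiety

* Same model with robust standard errors, using the option typed in the video
regress achieve gender interest mastery anxiety, robust

* Equivalent spelling of the same option
regress achieve gender interest mastery anxiety, vce(robust)
```

The coefficients and R-squared should match across all three commands; only the standard errors and the statistics computed from them (t values, p values, confidence intervals) change. Stata's regress also accepts vce(hc2) and vce(hc3), which may correspond to the other approaches offered in the dialog, though the video does not say so.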