Transcript:

Hello and welcome. This will be an example of probit and logit models using a data set. Before you watch this, please make sure that you watch the theory overview, also posted on YouTube. So this is the example that we will consider here. We will be using a data set from the Health and Retirement Study, wave 5, from 2002, collected by the National Institute on Aging, and we will study the factors influencing the purchase of health insurance. Our dependent variable is whether or not a person has health insurance: 0 if they don’t and 1 if they do. The independent variables that we will use are whether or not the person is retired, their age, whether or not they have good health status, household income, education years, whether or not the person is married, and whether or not the person is Hispanic. We will be estimating three models: first a linear regression model, then a logit model, and then a probit model. I will show you the code for how to do that with different software. Once we estimate these models, I will show you the results and how to interpret them.
So here’s the table for health insurance, yes or no. This is our dependent variable, and the Y codes are 1 for yes and 0 for no. The percent frequency coming from the data is that 39 percent of the people have health insurance and 61 percent don’t. If you estimate the three models, these are the results that you will get. Here are the results for the regression model, which is a simple OLS model. These are the results for the logit model, and these are the probit model results. I have reported the coefficients here, and the star indicates significance at the 5% level. I don’t have the standard errors or t statistics here, but you should have them when you write a paper. I was just trying to save some space so my table doesn’t get too long, okay.
So how would you interpret these coefficients? Remember from the lecture that you can only say more likely or less likely, but you cannot interpret the magnitude of those coefficients. So here’s what you would say for the coefficient interpretation. Retired individuals (this is our independent variable), in comparison to those who are not retired, are more likely to have health insurance, because these coefficients are positive and significant. Individuals with good health status are also more likely to have health insurance. Those that have higher household income are more likely to have insurance, and so are those that have higher education and those that are married. If you look at Hispanic, those that are Hispanic are less likely to have health insurance; we see a negative and significant coefficient here. Notice one very interesting thing about these coefficients: they differ by a scale factor. So if you look at this 0.04, 0.19, and 0.11, you can’t say that someone being retired has the highest effect or impact in the logit model, because the coefficients simply differ by a scale factor. The logit coefficients are also higher than the rest, and that is simply based on the functional form. So make sure, again, that you do not interpret the magnitude here. You just say more likely or less likely to have insurance, okay.
Next, we will talk about the marginal effects. Here I have put the marginal effects for the regression model. If you look at the previous slide, these are exactly the same as the coefficients, and they have to be because of the linear functional form that we have. Then we have the marginal effects at the mean for the logit model and the average marginal effects for the logit model, and then again the marginal effects at the mean and the average marginal effects for the probit model.
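To make the scale-factor point concrete, here is a minimal sketch of why coefficients of different size can lead to similar marginal effects. It uses the textbook approximation that a marginal effect is the coefficient times the density of the model evaluated at the index; for a dummy variable such as retired this is only an approximation to the discrete change. The three coefficient values are illustrative assumptions read off the slide figures, and the evaluation point of 0.39 is simply the sample share with insurance, not a value the software would necessarily use.

```python
from statistics import NormalDist


def logit_density(p):
    """Logistic density at the index where the predicted probability equals p."""
    return p * (1.0 - p)


def probit_density(p):
    """Standard normal density at the index where the predicted probability equals p."""
    z = NormalDist().inv_cdf(p)
    return NormalDist().pdf(z)


# Illustrative coefficients for the "retired" variable (assumed values).
coef = {"ols": 0.04, "logit": 0.19, "probit": 0.11}

# Rough evaluation point: the share of people with insurance (an assumption).
p = 0.39

marginal = {
    "ols": coef["ols"],  # linear model: the density factor is 1
    "logit": coef["logit"] * logit_density(p),
    "probit": coef["probit"] * probit_density(p),
}

for name, value in marginal.items():
    print(f"{name:6s} coefficient={coef[name]:.2f}  approx. marginal effect={value:.3f}")
```

Under these assumptions all three approximate marginal effects land between 0.04 and 0.05, even though the raw coefficients differ by a factor of almost five.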
So here, when you see that these are marginal effects, you can go ahead and interpret their magnitude. For example, for this coefficient, you would say that retired individuals are 4 percent more likely to have insurance in comparison to those that are not. Notice again that across all models we now have the same marginal effect. So regardless of the fact that we had different coefficients, the marginal effects are pretty much the same. The way to interpret the marginal effect of a continuous variable, say this one, is that for each additional year of education, individuals are 2 percent more likely to have insurance. You can also say that Hispanics are 12 percent less likely to have insurance than non-Hispanics. Now, one very important point here is to use “more likely” and “less likely”, not just the word “likely” or “unlikely”, because they still may not be likely to have insurance. Remember, only 39 percent of the people have insurance, so on the whole they’re not likely to have insurance. But if they’re retired, they’re 4 percent more likely to have insurance than the base of 39 percent. And this is what this whole thing means.
The next point that I have here is that even though the coefficients were different, look at these marginal effects. They do differ across models, but they’re very, very similar. Also, if you look at the marginal effects at the mean and the average marginal effects, as we discussed in the lecture, they’re also very, very similar, so in fact it doesn’t matter which approach you’re using. Another point to note here is that the signs of the coefficients and the marginal effects are the same for the logit and probit models, and that simply comes from the formula for the marginal effects.
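The difference between the marginal effect at the mean and the average marginal effect can be sketched in a few lines for a logit model with one continuous regressor. Everything in this example is hypothetical: the intercept, the education coefficient, and the eight education values are made up for illustration and are not taken from the Health and Retirement Study data.

```python
import math


def logistic(z):
    """Logistic CDF: predicted probability for index value z."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_marginal_effect(beta_k, xb):
    """Marginal effect of a continuous regressor in a logit model: beta_k * p * (1 - p)."""
    p = logistic(xb)
    return beta_k * p * (1.0 - p)


# Hypothetical logit coefficients (assumed, not estimated from real data).
intercept = -1.7
beta_educ = 0.10

# Hypothetical sample of education years.
educ = [8, 10, 12, 12, 14, 16, 16, 18]

# Marginal effect at the mean: evaluate the formula once, at the average education.
mean_educ = sum(educ) / len(educ)
mem = logit_marginal_effect(beta_educ, intercept + beta_educ * mean_educ)

# Average marginal effect: evaluate the formula for every person, then average.
effects = [logit_marginal_effect(beta_educ, intercept + beta_educ * e) for e in educ]
ame = sum(effects) / len(effects)

print(f"marginal effect at the mean: {mem:.4f}")
print(f"average marginal effect:     {ame:.4f}")
```

Because p * (1 - p) is always positive, both quantities carry the sign of the coefficient, which is the sense in which the signs agree by construction.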
A couple more points that you need to make here. The average of the predicted probabilities is about 38 percent, which is very similar to the actual frequency of having insurance, and we’re going to see that from the programs. Another thing that you need to comment on when you write a paper is that the probit and logit models correctly predict 62 percent of the values, and the rest are misclassified. So we basically have an okay fit, but not a great fit. Okay, so let’s see now how these models are estimated with different software.
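As a rough illustration of those two fit checks, the sketch below computes the average predicted probability and the share of correctly predicted outcomes. The outcomes and predicted probabilities are made-up numbers, and the 0.5 cutoff for classifying a prediction as "has insurance" is an assumption; the software shown later may use a different rule.

```python
def mean_predicted_probability(p_hat):
    """Average of the fitted probabilities, to compare with the observed share of ones."""
    return sum(p_hat) / len(p_hat)


def percent_correct(y, p_hat, threshold=0.5):
    """Share of observations whose predicted class (p_hat >= threshold) matches y."""
    hits = sum(1 for yi, pi in zip(y, p_hat) if (pi >= threshold) == (yi == 1))
    return hits / len(y)


# Hypothetical outcomes (1 = has insurance) and fitted probabilities.
y = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
p_hat = [0.62, 0.30, 0.45, 0.41, 0.25, 0.55, 0.58, 0.33, 0.20, 0.36]

print(f"observed share with insurance: {sum(y) / len(y):.2f}")
print(f"average predicted probability: {mean_predicted_probability(p_hat):.3f}")
print(f"share correctly predicted:     {percent_correct(y, p_hat):.2f}")
```

The observations that are not counted as hits are the misclassified ones, so the misclassification rate is one minus the share correctly predicted.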