Transcript:
All right, so in the previous video, we talked about how to convert the wide-format data set, where each column represents the outcome measured at a different time point, into a long-format data set, where all of the outcomes are in one column and you have multiple rows per patient. This conversion from wide format to long format is usually possible using some of the basic software packages that we use in R, and I also showed that each person may have multiple rows; in this particular case, we have four different rows for subject 99.

All right, so in terms of data analysis, when we have a data set like this, we generally use this long format of the data, where we have just one column for the outcome variable. In that scenario, before talking about an appropriate analysis strategy, let us see how we can work with conventional regression methods. In a conventional regression method, you have your y and you have your X matrix, which includes your exposure variable as well as your adjustment covariates; associated with your X variables you have the slopes, and you also have the intercept term and, at the end, the error term. In our scenario, the BDI variable was a continuous variable, the BDI.pre variable was the BDI measured before the treatment assignment, and then we have all of these other variables such as time, treatment, drug, and length, plus the error term. Remember, treatment was our exposure variable in this particular exercise. So, since we are dealing with a continuous outcome, we can try the lm function to see how the results look, and using this lm function you can get estimates for the different covariates as well as the AIC and BIC values.

But one problem is that, in the data set we have used, the outcome measurements were taken repeatedly: for the same subject, say subject 2, we have multiple measurements of BDI, which means each subject contributes at most four different rows. And as you can imagine, the observations from the same subject are correlated rather than independent, so the standard errors that we get out of this type of linear regression model will be problematic.

So what do we do in that scenario? One of the things we can do is to introduce a random intercept. That means we are now talking about not only the overall intercept but also an individualized intercept: for subject one there will be one intercept, for subject two another intercept, for subject three yet another intercept, but the X variables, or the slopes, will be unified, or common. Each subject is a source of repeated measurement, because from each subject we are measuring the outcome repeatedly, and that is why we are giving an individualized intercept to every single subject. In this regression we are not considering individual slopes, so the slopes will be common as usual; the only new term is that additional intercept term for every single person.
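To make the naive approach concrete before we specify the random-intercept model, here is a rough sketch of the lm fit described above. The transcript does not show the exact call, so the data-frame name dat and the exact column spellings (bdi, bdi.pre, time, treatment, drug, length, subject) are assumed for illustration:

    # Naive linear regression on the long-format data: one row per
    # measurement, ignoring that rows from the same subject are correlated
    fit_lm <- lm(bdi ~ bdi.pre + time + treatment + drug + length, data = dat)
    summary(fit_lm)   # coefficient estimates, but the standard errors are suspect
    AIC(fit_lm)
    BIC(fit_lm)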
So in that scenario, you will see that the structure of the model specification, when you are using the lmer function within R, is somewhat different: we add a parenthesis, and inside the parenthesis, after the conditioning sign, you put the subjects, which means that for each subject we will have a new intercept. Before the conditioning sign we put a 1, which means that the slope is not going to be random; the only random part is going to be the intercept associated with the subjects.

All right, so if you run this type of mixed model, where you have some fixed effects and some random effects, the fixed effects will look like this, and the interpretation of the estimates you get for the fixed effects is very similar to the interpretation we do for a linear regression. The p-values are estimated using an approximation that relies on specific degrees of freedom. You can, as usual, get the AIC and BIC, and for the fixed effects you can also get a sort of pseudo R-squared; all of these can be used as goodness-of-fit statistics.

You will also get a new table called the random effects, and these random effects are interesting in the sense that they are not really standard errors; they are the standard deviations for each subject's mean outcome, or mean BDI. Say, for example, remember that subject 2 had four BDI measurements and subject 99 had four BDI measurements. You first take the mean of each of these subjects' BDIs, and then you simply take the standard deviation of those means, and that gives you the standard deviation for the intercept. Note that there is no coefficient associated with the random effects.

All right, so remember, we talked about having a random intercept but the same slope. If you plot the predicted values against time, you will see that all of the predicted lines are parallel; that means the slopes are exactly the same, and the only thing that is different is the random intercept, so the intercept differs for each of these persons.

All right, so previously we talked about a random intercept but a fixed slope. Now let us make the model a bit more complex, where we have a random slope as well as a random intercept. The random intercept part stays as before, but we are now adding the additional flexibility that the slopes can be individualized as well. Remember, there were measurement times at months 2, 3, 5, and 8, and for each of these times we have a new beta in addition to the slopes associated with the X variables, and there is still the additional intercept associated with each subject. So in the context of the random intercept, the random intercepts come from the subjects, and in the context of the random slope, the random slopes are attached to the time points. If we set up the model like this, we put subject after the conditioning sign, meaning there is an intercept for each subject, and we put time before the conditioning sign, meaning there is a slope associated with time that is random as well. And again, if we fit this model, where we model BDI with respect to all of the other covariates, we can see that the estimates are very similar to those from the previous fixed-effects regression we have seen. Again, we can get the AIC and BIC.
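A sketch of how these two mixed models could be specified with lme4's lmer, continuing the illustrative data-frame and column names from the lm sketch above:

    library(lme4)

    # Random intercept per subject, common slopes: (1 | subject)
    fit_ri <- lmer(bdi ~ bdi.pre + time + treatment + drug + length +
                     (1 | subject), data = dat)

    # Random intercept plus a random slope for time: (time | subject)
    fit_rs <- lmer(bdi ~ bdi.pre + time + treatment + drug + length +
                     (time | subject), data = dat)

    summary(fit_ri)   # fixed-effects table plus random-effect standard deviations
    AIC(fit_ri); BIC(fit_ri)
    AIC(fit_rs); BIC(fit_rs)

Note that plain lme4 does not print p-values in summary(); loading a package such as lmerTest before fitting is one common way to get p-values with approximate degrees of freedom, though the transcript does not name the exact tool that was used.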
The BIC here is, say, for example, 1927, whereas the previous BIC was 1916, so the previous BIC is smaller, and the previous AIC is smaller as well: remember, it was 1887 there, and in here it is 1891. So the previous model's AIC was better than this one. Also, when we get the random effects, we can get the standard deviation for the intercept, and we can also get the standard deviation for time. Here, the standard deviation of time means that we take the mean of all BDIs for each time point: we take the mean of all the BDIs for time equal to 2, the mean of all BDIs for time equal to 3, the mean of all BDIs for time equal to 5, and we do the same for time equal to 8, and then we take the standard deviation of those means, and that gives us this standard deviation. Again, there is no coefficient associated with the random effects.

One additional term we get is the ICC. The ICC is basically a measurement of how strongly the units in the same group resemble each other. And remember, there were 100 subjects in total, but there were some missing observations in the outcome, so excluding those subjects with missing observations, we only end up with 97 groups.

All right, so in this case, since we have a random intercept as well as a random slope, when we plot the predicted values versus time you can see that the intercepts are different, and the lines are not parallel anymore, so the slopes are different as well.

All right, so we talked about two different models here: the first model had only the random intercept, and the second model had a random intercept as well as a random slope. We have already compared the AIC and BIC, and we know that the first model, the one with the random intercept only, was associated with a smaller AIC and is therefore the better model in terms of the AIC criterion. We can also do a likelihood ratio test comparing these two models, because these two models can be considered nested models. Why? Because all of the covariates are the same and the random intercept is the same; the only difference is that in one model we have not used any random slope, while in the other model we have. When we do the likelihood ratio test, we get a non-significant p-value, which means that these models are not statistically significantly different. And when you have a complex model and a simpler model that are nested like this, we generally prefer the more parsimonious model; that means the model with only the random intercept but a fixed slope is the model we would prefer in this particular situation.
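In R, this likelihood ratio test can be done by passing the two nested fits to anova(), continuing the illustrative object names from the sketch above:

    # Likelihood ratio test for the nested models; anova() refits the
    # REML fits with maximum likelihood before comparing them
    anova(fit_ri, fit_rs)

A non-significant p-value in this comparison supports keeping the simpler random-intercept model.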
So one last point that I want to talk about is the assumptions. How do we check the assumptions? There are two specific assumptions associated with the final model that we have chosen. Remember, what was the final model? The final model was the one where each subject had a random intercept. Associated with this model there will be some residuals, and our assumption was that the residuals, or the error terms, follow a normal distribution, and also that the intercepts of the subjects follow a normal distribution.

We can easily check whether either of these assumptions deviates severely from normality. For that, we basically use the estimated residuals as well as the estimated random intercepts from the fitted model, and we check them against the theoretical quantiles based on normality to figure out whether they deviate severely from normality or not. So here are the estimated residuals: there is some deviation here and some deviation here, and for the random intercepts we are seeing somewhat similar results. Based on this, we can try to judge whether they are deviating severely from the normality assumption or not.
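A sketch of these checks in base R, assuming the chosen random-intercept fit is the fit_ri object from before and the grouping factor is named subject:

    # Q-Q plot of the residuals against the theoretical normal quantiles
    qqnorm(resid(fit_ri))
    qqline(resid(fit_ri))

    # Q-Q plot of the estimated random intercepts, one per subject
    re_int <- ranef(fit_ri)$subject[["(Intercept)"]]
    qqnorm(re_int)
    qqline(re_int)

Severe departures from the reference line in either plot would suggest that the corresponding normality assumption is violated.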