[MUSIC] Today we're going to explore the beauty of cross-validation. We're going to use the same data set as before, the one we used with k-NN, and we're going to use caret's functionality for this. But before that, if you go to my GitHub page — remember, github.com/mariocast73 — and open the scripts folder, I have included an interesting caret snippet that you can use in RStudio. If you don't know anything about snippets, watch my video about snippets. You can copy this and plug it into RStudio. Let's do this: go back to RStudio, Tools, Global Options, Code, Edit Snippets. If you go to the end — remember this one from the other video — I'm going to add this caret snippet, and it's going to be beautiful. Save it, okay, and here we go. First of all, let's load the caret library, and then the data set I was mentioning. Let's explore this data set a little: we have two numerical predictors, x1 and x2, and a categorical variable y with two levels, "no" and "yes". Okay, so let's use this caret snippet. This is something written by me, not something standard, but I love this sort of thing. If you type "caret" and then wait a little bit, you see the snippet here, and you get this template. Remember that any time you press the Tab key, you jump to the next argument, so we start with the first argument, which is the name of the data set, then the name of the response variable, and so on and so forth. Let me show you: if instead of typing "data" I type, I don't know, "Titanic", this is beautiful because you can see that "Titanic" is automatically filled in other parts of the snippet.
The name of our data set, remember, is data, so I'm going to type "data" and then press the Tab key. Can you hear that? Then again: the response could be called "survived" or whatever, and as you can see, these changes are propagated to different parts of the code. Here I'm going to use just y. Then the percentage used for training versus testing — let's change this to 75 — and press Tab again. Now the method. This is the beauty of caret: with almost the same syntax you can run a k-NN, you could use a GLM, you could use a partition tree, which we'll discuss in another video. Here I'm going to play with k-NN, then Tab again. Then the tuning parameters: remember that for k-NN the only hyperparameter is k (for other methods it could be cp or whatever), so let's plug in k, press Tab again, and then the values we want to explore. If I leave this like that, I'm going to explore just one value, and that's what I'll do in the first demo. So here we go, and as you can see, we have all the steps of a machine-learning project. First we define the training and testing data sets, here with 75 percent for training. Let's run this; now we have the training set. As an exercise you can try str(data_trn), or use nrow(data_trn) and compare it to the whole data set, and the same with the testing set, so you can check that the training set really holds 75 percent of the rows. Okay, the next step is trainControl: this is where cross-validation is configured. In this case I'm not going to do any cross-validation, because I'm just playing with a single hyperparameter value, k = 10, so this line is basically not going to do anything — but let's leave it like that. And now we fit a k-NN model in which y is predicted from all the other variables in the data set.
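The steps just described can be sketched as plain caret code. This is a minimal sketch, not the video's actual snippet: the data frame, its column names (x1, x2, y), and the toy data below are assumptions standing in for the video's data set.

```r
# Minimal sketch of the workflow the snippet generates (assumed names:
# a data frame `data` with numeric x1, x2 and a factor response y).
library(caret)

set.seed(1)
# Toy data standing in for the video's data set.
data <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
data$y <- factor(ifelse(data$x1 + data$x2 + rnorm(200, sd = 0.5) > 0,
                        "yes", "no"))

# 1. Stratified 75/25 train/test split.
idx      <- createDataPartition(data$y, p = 0.75, list = FALSE)
data_trn <- data[idx, ]
data_tst <- data[-idx, ]

# 2. Resampling scheme (does nothing useful while only one k is tried).
ctrl <- trainControl(method = "cv", number = 10)

# 3. Fit k-NN with a single hyperparameter value, k = 10,
#    centering and scaling the predictors first.
fit <- train(y ~ ., data = data_trn,
             method     = "knn",
             preProcess = c("center", "scale"),
             trControl  = ctrl,
             tuneGrid   = data.frame(k = 10))

pred <- predict(fit, newdata = data_tst)  # factor of "no"/"yes" labels
```

The same four calls — createDataPartition, trainControl, train, predict — stay fixed whatever method you choose; only `method` and `tuneGrid` change.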
In this case the features are x1 and x2. I'm going to use only the training part of the data set, and I'm going to pre-process the data. Remember that k-NN is very sensitive to the scales of the variables, so with this pre-processing I'm going to subtract the mean of each feature. Let me do some calculations here: if I calculate the mean of data_trn$x1, then I subtract this, and I divide by the standard deviation — again, you can compute it with sd(). Basically what I'm saying with this line is: take each feature, x1 and x2, subtract the mean (so, center the data) and divide by the standard deviation (so, scale the data). Remember that from another video. And here I say which parameter value I'm using, in this case k = 10, and here we go: this is the k-NN training. So let's do some predictions. If I run this, simply by calling the predict function — here you can see I'm using the fitted model and the testing data set — and I look inside this prediction variable, I'm going to see categorical values. I can change the syntax: if I run this instead with type = "prob", and call the result pred_prob, then if you look inside this new variable you're going to see probabilities, the probability of "yes" and the probability of "no". As we only have two categories, one is one minus the other: 0.30 for "yes" means 0.70 for "no", and so on and so forth. Okay, let's forget about this for now, but keep it in mind for another video, and now we're going to compute the confusion matrix.
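The center-and-scale step can be written out by hand in base R; this little sketch (with made-up numbers) shows that it matches what R's built-in scale() function does:

```r
# What preProcess = c("center", "scale") does, written out in base R
# for a single feature (toy values standing in for x1).
x1 <- c(2, 4, 6, 8, 10)

x1_scaled <- (x1 - mean(x1)) / sd(x1)   # center, then scale

# scale() performs the same centering and scaling in one call:
all.equal(as.vector(scale(x1)), x1_scaled)  # TRUE
```

After this transformation every feature has mean 0 and standard deviation 1, so no single feature dominates the distance computation in k-NN.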
Remember, we're going to create a table in which we take the testing categories — these come from the data set — against the predictions. Let's go, and as you can see we have a lot of true negatives and true positives and only a few false negatives and false positives, so accuracy is pretty high: 88 percent, which is huge for a classification algorithm. That means k-NN is doing pretty well here. Okay, now if I print the fit — sorry, print, not plot — I'm going to see that accuracy again, plus Kappa. But if I plot it, we get an error. Why am I getting this error? Because I'm using just one value of k, and when you plot a fit by itself, what you are trying to compare is the different outcomes of the cross-validation. So here's what I'm going to do: repeat the whole process, but instead of using just one value of k, I'm going to tune it a little. Let's use the caret snippet again, this time with cross-validation. So again: caret, click here, and I'm going to fill in data, y, 80, knn, k. And here I'm going to change this: instead of a single k, I'm going to use different values, a sequence from k = 5 to k = 100 in steps of 5. Basically this means I'm going to run the algorithm over and over again with k = 5, 10, 15, and so on and so forth. And what's the point? I'm going to use cross-validation, specifically 10-fold cross-validation. Okay, let's run this again, just in case, and let's see the predictions. Now I have higher accuracy. Why is that? Because I'm training my k-NN with different values of k, and cross-validation is giving me the best one. So let's print this one, and as you can see, in the case before we had — let me show you — just one value of the accuracy, and likewise for the Kappa.
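The relationship between the confusion matrix and that 88 percent figure is just arithmetic; here is a base-R sketch with toy labels standing in for data_tst$y and the k-NN predictions:

```r
# How accuracy falls out of the confusion matrix (toy labels).
truth <- factor(c("no", "no", "no", "yes", "yes", "yes", "yes", "no"))
pred  <- factor(c("no", "no", "yes", "yes", "yes", "no", "yes", "no"))

cm <- table(truth, pred)           # rows = actual, columns = predicted
accuracy <- sum(diag(cm)) / sum(cm)  # correct / total = 6/8 = 0.75
```

The diagonal of the table holds the true negatives and true positives; everything off the diagonal is a mistake, which is why accuracy is the sum of the diagonal over the total count.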
Remember that higher Kappa is better, because Kappa = 1 means we are nailing the predictions with our model. Okay, so in this case we have different values of k, as you can see here — these values of k come from that sequence, so you can change it if you wish — and for each value we have a value of accuracy and a value of Kappa. Okay, so let's now plot this, and as you can see we have this beautiful plot in which the x-axis is the number of neighbors, k, and the y-axis is accuracy, and the highest accuracy comes from k = 40. You can see this by simple inspection: for k = 40 the accuracy is 0.9375, and this is what you see in the summary. This is beautiful, because instead of plugging in different values of k by trial and error until we find the best one, cross-validation is giving us the best outcome directly. This is why cross-validation is the queen in town. Okay, let's do this another way. Imagine that you have no feeling for the best range of values. There is a rule of thumb: if you take nrow(data) — we have a thousand observations — and take the square root of that, you should play with values of k around that number. This is what I was doing here by ranging from 5 to 100, so 31 is more or less in there. But there is another way to call cross-validation: you can comment this grid out and use tuneLength instead, and then cross-validation is going to try 50 values of k, basically guessing which values to use, starting from the default. So let's run this again, look at the predictions, and take a plot, and here you can see that we have 50 values — you don't need to count them. And now the maximum is at k = 35, so for 35 we have, here we go, this accuracy, and basically this corresponds to this one.
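The rule of thumb and the two ways of telling train() what to try can be sketched like this (the variable names are just for illustration):

```r
# Rule of thumb: search values of k around sqrt(n).
n <- 1000
sqrt(n)                                     # about 31.6

# Option 1: an explicit grid, passed as tuneGrid = grid inside train().
grid <- data.frame(k = seq(5, 100, by = 5))  # 5, 10, ..., 100 (20 values)

# Option 2: tuneLength = 50 inside train() instead of tuneGrid,
# letting caret pick 50 candidate values of k itself.
```

With tuneGrid you control exactly which candidates are cross-validated; with tuneLength you only control how many, which is handy when you have no feeling for the right range.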
So basically this is all you need to know about cross-validation, but I'm going to give you a bonus track. Here we're maximizing accuracy, but maybe we want to maximize Kappa, or — if we're working with, I don't know, vaccines or some COVID testing or whatever — maybe we want to improve sensitivity. So what we can do here is change this line: let's copy it, and we're going to use a different summary function, multiClassSummary. Let me try this first and run again — oh, sorry, we need a bracket, okay — let's run with this new parameter. Let's plug in the same line, train again, make some predictions, then plot the confusion matrix and print the outcome. And the outcome, as you can see, now has tons of information: not only accuracy and Kappa but also things like sensitivity, positive predictive value, and so on and so forth. So imagine that you want to maximize, let's say, Kappa. Here we're going to add a new argument, metric = "Kappa". Let's plot this again so you can compare: this plot is maximizing accuracy, and now let's train again using the Kappa metric — blah, blah, blah, confusion matrix, print, plot — and as you can see, now we are maximizing Kappa. Remember, the maximum is obtained at k = 35, and this 35 is the value for which not the accuracy but the Kappa is the highest. So let's go to that line in the summary, and as you can see, accuracy can be higher in other lines; in this case we also happen to be doing well on accuracy, but what we're maximizing here is Kappa, which is the second column. So again, you could change this metric: we could use the sensitivity metric, and then if you plot, okay, we're maximizing sensitivity, and here the maximum is at k = 49. Of course, if you run this over and over again, maybe you get different values, but you get the idea.
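It helps to see what the Kappa column actually measures: observed accuracy corrected for the agreement you would get by chance. Here is a base-R sketch of Cohen's Kappa computed from a toy 2x2 confusion matrix (the counts are made up):

```r
# Cohen's Kappa from a confusion matrix (toy counts).
cm <- matrix(c(40,  5,
                7, 48), nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("no", "yes"),
                             predicted = c("no", "yes")))

n        <- sum(cm)
p_obs    <- sum(diag(cm)) / n                     # observed accuracy: 0.88
p_chance <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
kappa    <- (p_obs - p_chance) / (1 - p_chance)   # 1 = perfect, 0 = pure chance
```

This is why Kappa can be a better tuning target than raw accuracy on imbalanced data: a model that always predicts the majority class can score high accuracy but will score a Kappa near zero.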
So, the main messages of this video: use cross-validation, and you can play with the metric. And you can do a couple of things: tune the parameters yourself, if you know enough about the algorithm — and these parameters change for different methods — or simply let caret's train function play with different values. Okay.