Transcript:
In this video, we are going to talk about how to choose the best model for a given machine learning problem and how to do hyperparameter tuning. Here is a list of topics that we are going to cover in this video. Let's say you are trying to classify scikit-learn's iris flower dataset, where based on the petal and sepal width and length you are trying to predict what type of flower it is. Now the first question that arises is: which model should I use? There are so many to choose from. And let's say you figured out that SVM is the model you want to use. The problem doesn't end there, because now you have hyperparameters: what kind of kernel, and what values of C and gamma, should I be using? There are just so many values to choose from. The process of choosing the optimal parameters is called hyperparameter tuning.

In my Jupyter notebook I have loaded the iris flower dataset, and it is being shown in a table format. The traditional approach we can take to solve this problem is to use the train_test_split method to split our dataset into a training and a test dataset; here I am using a 70/30 partition. Then let's say we first try the SVM model. Okay, so first I am going to show you how to do hyperparameter tuning, and then we'll look into how to choose the model. So just assume that you are going to use the SVM model; with it you can train the model and calculate the score. Here I randomly initialized these parameters. I don't know what the best parameters are, so I'm just going with some values. The issue here is that the score might vary based on your train and test split. Right now my score is 95%, but if I execute this again, the training and test samples are going to change, so the score jumps from 95% to 1.0. I cannot rely on this method because the score keeps changing based on my samples.

For that reason we use k-fold cross-validation. I have a video on k-fold cross-validation, so if you want to pause here and take a detailed look at it, you can go there, but I will just give you an overview. As shown in the diagram, in k-fold cross-validation we divide our data samples into n folds; here I'm showing five folds, and we run five iterations. In each iteration, one fold is the test set and the remaining folds are the training set. We find the score for that iteration, and then we take these individual scores from each iteration and compute an average. This approach works very well because you are going across all the samples.

Okay, and we have a method called cross_val_score, which can tell you the score of each iteration. What I have done here is run cross_val_score with five folds, so cv=5 means five folds, and tried this method with different values of kernel and C. So here the kernel is linear, here it is rbf with C=10, and here C is 20. For each of these combinations I found the scores, and you can see there are five values here; these are the scores from the five iterations. You can take the average of these to find your average score, and based on that you can determine the optimal values for these parameters. But you can see that this method is very manual and repetitive, because there are so many values you can supply as a combination of kernel and C, right? C could be 1, 2, 3, up to 100. So how many times are you going to write this line? The other approach you can take is to just run a for loop.
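Before moving on to the for-loop version, here is a rough sketch of what the notebook cells described so far might look like. It assumes scikit-learn's built-in iris loader, and the kernel and C values shown are just placeholder choices like the ones I mentioned, not tuned values.

```python
# A minimal sketch of the train/test split approach and cross_val_score,
# assuming scikit-learn's built-in iris dataset and an SVC classifier.
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split, cross_val_score

iris = datasets.load_iris()

# Train/test split: the score depends on which samples land in the test set,
# so it can jump around between runs.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3)
model = svm.SVC(kernel='rbf', C=30, gamma='auto')  # arbitrary starting values
model.fit(X_train, y_train)
print(model.score(X_test, y_test))

# k-fold cross-validation: cv=5 returns five scores, one per fold;
# averaging them is much more stable than a single split.
print(cross_val_score(svm.SVC(kernel='linear', C=10, gamma='auto'),
                      iris.data, iris.target, cv=5))
print(cross_val_score(svm.SVC(kernel='rbf', C=10, gamma='auto'),
                      iris.data, iris.target, cv=5))
print(cross_val_score(svm.SVC(kernel='rbf', C=20, gamma='auto'),
                      iris.data, iris.target, cv=5))
```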
Okay, so I'm doing the exact same thing, but using a for loop. I have my possible values of kernel and of C, I run a for loop over both of these, and I'm supplying those kernel and C values here, and then I find the average scores. When I execute this, I get these scores: with rbf and C=1 the score is this, with rbf and C=10 the score is this, and so on. Just by looking at the values I can say that rbf with C being either 1 or 10, or the linear kernel with C being 1, will give me the best score; you can see that the other scores are lower. So this way I can find the optimal score using hyperparameter tuning. But you can see that this approach also has some issues: if I have four parameters, for example, then I have to run four nested loops, and it will be too many iterations and it's just not convenient.

Luckily, sklearn provides an API called GridSearchCV, which will do the exact same thing. Okay, so GridSearchCV is going to do the exact same thing as shown in this code here in line number 14. All right, so I'm going to do the same thing, but you will note that we will be able to do it in a single line of code. The first thing you do is import GridSearchCV from sklearn.model_selection, and then we define our classifier. The classifier is going to be GridSearchCV, where the first argument is your model; my model is svm.SVC. I am supplying the gamma value to be auto. If you want gamma to be in your parameters you can do that, but for this example I'm just keeping it static. Now the second argument is very important: the second argument is your parameter grid. In the parameter grid you say: I want the value of C to be 1, 10 and 20; these are the different values that you want to try. The second entry is kernel, and you want the values of your kernel to be rbf and linear, so those are two values. There are other parameters in GridSearchCV, for example how many cross-validation folds you want to run; GridSearchCV is still using cross-validation, it's just that we are making this particular code block convenient and writing the same thing in one line of code. So cv is this. There is another argument called return_train_score; this is something the method returns which we don't need, that's why we are setting it to False. Once this is done, you do the model training by calling fit with iris.data and iris.target, and once that is done we print the cross-validation results.

When you execute this, you get these results. If you look at them, you will notice that you got this mean test score. cv_results_ is not easy to view, but luckily sklearn provides a way to load these results into a dataframe. Here I have the sklearn documentation, and it says that this can be imported into a pandas dataframe. So that's the next thing I'm going to do, and I think you are all experts in pandas by now, so you just create a pandas dataframe and supply cv_results_ as the input. When I run this, I get this nice tabular view. Here you can see the C parameter values and kernel values, and these are the scores from each individual split. We did five-fold cross-validation, that's why you get split0 to split4, and then you have the mean test score as well. Some of the columns in this grid might not be useful, so I'm going to trim it down and just look at the parameter values and the mean score.
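Here is a rough sketch of the manual for-loop version just described; the variable names are my own, and the kernel and C values are the same placeholder lists mentioned above.

```python
# Manual hyperparameter search: loop over kernel and C values and
# average the cross-validation scores for each combination.
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

kernels = ['rbf', 'linear']
C_values = [1, 10, 20]
avg_scores = {}
for kval in kernels:
    for cval in C_values:
        cv_scores = cross_val_score(
            svm.SVC(kernel=kval, C=cval, gamma='auto'),
            iris.data, iris.target, cv=5)
        avg_scores[kval + '_' + str(cval)] = np.average(cv_scores)
print(avg_scores)
```

And here is a sketch of the same search done with GridSearchCV, with the results loaded into a pandas dataframe and trimmed down to the columns discussed above.

```python
# GridSearchCV tries every combination in the parameter grid
# using k-fold cross-validation.
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

clf = GridSearchCV(
    svm.SVC(gamma='auto'),            # the model; gamma is kept static here
    {
        'C': [1, 10, 20],             # values of C to try
        'kernel': ['rbf', 'linear']   # kernel values to try
    },
    cv=5,                             # five-fold cross-validation
    return_train_score=False)         # we don't need the train scores
clf.fit(iris.data, iris.target)

df = pd.DataFrame(clf.cv_results_)
# Trim the dataframe down to just the parameter values and the mean score.
print(df[['param_C', 'param_kernel', 'mean_test_score']])
```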
So you can see that these are the possible values of the parameters C and kernel, and these are the scores I got. Based on this, I can see which values to supply for these parameters to get the best performance, so we already did hyperparameter tuning of these parameters. You see how this works, right? And now you can have many, many parameters; all you have to do is supply them in the parameter grid, and GridSearchCV will try the permutations and combinations of each of these parameters using k-fold cross-validation and show you all the results in this nice pandas dataframe. I can do dir() on my classifier to see what other properties this object has, and I see properties such as best_estimator_, best_params_ and best_score_. So let me try best_score_ first: clf.best_score_, and it is saying the best score is 0.98. I can also do clf.best_params_ and it will tell me the best parameters; in our case there are multiple parameter combinations which give optimal performance, but you can see the point. You just run GridSearchCV and then call best_params_ to find the best parameters, and these are the parameters you are going to use for your model.

One issue that can happen with GridSearchCV is the computation cost. Our dataset right now is very limited, but just imagine you have millions of data points in your dataset and many values for each parameter. Right now I took just a few C values, but what if I want to try a range, let's say 1 to 50? Then my computation cost will go very high, because this will literally try every permutation and combination of every value of each of these parameters. To tackle this computation problem, the sklearn library comes with another class called RandomizedSearchCV. RandomizedSearchCV will not try every single permutation and combination of parameters, but it will try random combinations of these parameter values, and you can choose how many iterations it should run.

So let me just show you how that works. Here I imported the RandomizedSearchCV class from sklearn.model_selection, and the API looks much the same as GridSearchCV. I supplied my parameter grid and my cross-validation value, which is again five-fold cross-validation, and the most interesting parameter here is n_iter: I want to try only two combinations. Before we tried six in total, you see zero to five, so here it will try only two combinations. Then we call the fit method and load the results into a dataframe. When I run this, you can see that it randomly tried the C value to be 1 and 10 and the kernel value to be linear and rbf. When I run it again, it changed the values of C to 20 and 10. This way it just randomly tries values of C and kernel and gives you the best score. This works well in practical life, because if you don't have too much computation power, you just want to try random values of the parameters and go with whatever comes out to be the best.
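Here is a sketch of the RandomizedSearchCV cell described above, again with the same placeholder parameter grid. The best_score_ and best_params_ attributes shown at the end are the same ones I read off the GridSearchCV object earlier; both search classes expose them.

```python
# RandomizedSearchCV with n_iter=2: only two random parameter
# combinations are tried instead of the full grid.
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import RandomizedSearchCV

iris = datasets.load_iris()

rs = RandomizedSearchCV(
    svm.SVC(gamma='auto'),
    {
        'C': [1, 10, 20],
        'kernel': ['rbf', 'linear']
    },
    cv=5,                      # still five-fold cross-validation
    return_train_score=False,
    n_iter=2)                  # try only two random combinations
rs.fit(iris.data, iris.target)

print(pd.DataFrame(rs.cv_results_)[['param_C', 'param_kernel', 'mean_test_score']])
# The fitted search object also exposes the best result directly.
print(rs.best_score_)
print(rs.best_params_)
```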
All right, we looked into hyperparameter tuning; now I want to show you how to choose the best model for a given problem. For our iris dataset I'm going to try these classifiers: SVM, random forest and logistic regression, and I want to figure out which one gives me the best performance.

You have to define your parameter grid, and I am defining it as a simple Python dictionary, where I am saying I want to try the SVM model with these parameters, and random forest with these other parameters: I want the number of trees in the random forest to be 1, 5 and 10, and this n_estimators is an argument of RandomForestClassifier. Similarly, the value C is an argument, or a parameter, of the logistic regression classifier. Once I have initialized this dictionary, I can write a simple for loop, so I'm just going to show you that for loop here. This for loop is doing nothing but going through the dictionary values, and for each of the values it uses GridSearchCV. You can see that in GridSearchCV the first argument is the classifier, which is your model, so I am just trying each of these classifiers one by one with the corresponding parameter grid that I have specified in this dictionary. You can see that the parameter grid is the second argument, and then cross-validation is five folds. I then run my training and append the scores to this scores list. When I run it, my scores list has all those values, and all I'm going to do now is convert those results into a pandas dataframe. When I do that, I see a nice table view, which is telling me that for the SVM model I am going to get a 98% score, random forest is giving me 96%, and logistic regression is getting a little more than 96%. So here is my conclusion: the best model for my iris dataset problem is SVM, and it will give me a 98% score with these parameters. So not only did we do hyperparameter tuning, but we also selected the best model. Here I have used only three models for the demonstration; you can use 100 models, for example. So this is more of a trial and error approach, but in practical life it works really well, and this is what people use to figure out the best model and the best parameters.

Now comes the most interesting part of my tutorial, which is the exercise. You have to do this exercise, guys; just by watching the video you are not going to learn anything, so please move your butt and work on this exercise. Here we are going to take scikit-learn's handwritten digits dataset and classify those digits using the listed classifiers, and you are also going to find the best parameters for them. Post your answer as a video comment below, and if you want, you can tally your answer against the solution I have provided. My solution is not the best one because I tried only a few parameters, so you should try more parameters, and I hope you can find a better score than me. All right, don't click on the solution link until you have tried it yourself. Thank you very much for watching this video. If you like the content, please give it a thumbs up, subscribe to my channel and share it with your friends. Thank you very much, and I will see you in the next tutorial.
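For reference, here is a rough sketch of the model-selection loop described in this section. The dictionary keys, the logistic regression C values and the liblinear solver are my own illustrative choices; the transcript only specifies the three models, the random forest n_estimators values of 1, 5 and 10, and that C is tuned for logistic regression.

```python
# Model selection: a dictionary of models with their parameter grids,
# each run through GridSearchCV, with the best score and best parameters
# for every model collected into one dataframe.
import pandas as pd
from sklearn import datasets, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

model_params = {
    'svm': {
        'model': svm.SVC(gamma='auto'),
        'params': {'C': [1, 10, 20], 'kernel': ['rbf', 'linear']}
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params': {'n_estimators': [1, 5, 10]}
    },
    'logistic_regression': {
        'model': LogisticRegression(solver='liblinear'),
        'params': {'C': [1, 5, 10]}   # illustrative values
    }
}

scores = []
for model_name, mp in model_params.items():
    clf = GridSearchCV(mp['model'], mp['params'],
                       cv=5, return_train_score=False)
    clf.fit(iris.data, iris.target)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })

print(pd.DataFrame(scores, columns=['model', 'best_score', 'best_params']))
```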