Transcript:
Hello. In this episode, we are going to show how to build machine learning models with the LightGBM framework. As a data set, we're going to use this golf2.txt file. This is a simple data set. It includes golf-playing decisions, yes or no, based on some features. Some features are nominal, such as outlook or wind, and some features are continuous, such as temperature and humidity. I'm going to import pandas to read the data set: pandas.read_csv, and I'm going to pass the name of the data set. It's going to be golf2.txt, and this is going to be my data frame. Let's see the content of the data frame. The head function returns the first five rows of the data set. LightGBM expects you to transform nominal features into numeric ones. Here we can find the column names with data frame dot columns, and let's build a for loop over the data frame columns. Let's print the column name first. We can access a data frame column just like that: the data frame and, in the brackets, we pass the column name, and we print the data type of this column. I see that some columns are object type and some features are integer type. We have to transform the object types into numerical ones, so here we need to check the type of the column, and if it's equal to object, then we have to do some transformation. I'm going to import scikit-learn's preprocessing library, import preprocessing, and here I'm going to create a label encoder, and it's equal to preprocessing.LabelEncoder. Then we are going to call the encoder's fit, and here I'm going to pass the object column of the data frame. After that, I'm going to call the encoder's transform, and again I'm going to pass the object column here. This is going to be the encoded feature, and I'm going to store the encoded feature back in the data frame, and this is going to be my new column. I comment out the print call and run this block. Now I'm going to check the data frame content here.
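The encoding loop described above can be sketched roughly as follows. This is a minimal sketch, not the code shown in the video: the four rows and the column names are made up for illustration (the tutorial reads its frame from the file instead), and the check for non-numeric columns is a stand-in for the "is it object type" check mentioned in the talk.

```python
import pandas as pd
from sklearn import preprocessing

# Illustrative rows only; the tutorial loads its data frame with pandas.read_csv.
# Column names here are assumptions for the sketch.
df = pd.DataFrame({
    "Outlook": ["Sunny", "Overcast", "Rain", "Sunny"],
    "Temperature": [85, 83, 70, 72],
    "Humidity": [85, 78, 96, 95],
    "Wind": ["Weak", "Weak", "Strong", "Weak"],
    "Decision": ["No", "Yes", "Yes", "No"],
})

for column in df.columns:
    # The talk checks for the object dtype; testing for "not numeric"
    # covers the same columns and also works when pandas stores text
    # in a dedicated string dtype.
    if not pd.api.types.is_numeric_dtype(df[column]):
        encoder = preprocessing.LabelEncoder()
        # fit_transform combines the separate fit and transform calls
        df[column] = encoder.fit_transform(df[column])

print(df.head())
print(df.dtypes)
```

LabelEncoder assigns integers in sorted order of the class names, so in this sketch "No" becomes 0 and "Yes" becomes 1.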
I've seen that nominal features such as outlook and wind were transformed into numerical ones. Now I need to split the data frame into features and target values. The target value is stored in the decision column. That's why my target is equal to the data frame's decision column, and the features are stored in the other columns. I can drop the decision column just like that: columns is equal to decision. In this way, x would be the outlook-to-wind features, and here I'm going to call the values command to transform the pandas data frame into a NumPy array. You can see the content of x just like that. These are my features, and these are my target values. They are all numerical now. It's time for modeling. Here I'm going to change the type of this cell from code to markdown. Here I'm going to import lightgbm first, as lgb. Both x and y, in other words features and target values, are NumPy arrays. We can check it just like that: type of x, and y is also a NumPy array. We will convert these NumPy arrays into LightGBM format: lgb.Dataset, and here I'm going to pass features and target values as the first and second arguments. This is going to be lgb_train. Let's see the arguments of the LightGBM Dataset. Here we can pass categorical feature as an input argument, and remember that both outlook and wind are categorical features in my data set. But here x and y are NumPy arrays, and they don't store the column names. That's why we can pass feature names as input as well: feature name. This is going to be the columns of the data frame, but we should not include the target value. That's why I'm going to call data frame dot drop with columns equal to decision, the target value. It's going to be my LightGBM train data set. I'm going to create my configuration parameters, and it's going to be a JSON-like dictionary. Boosting type will be gradient boosting decision trees.
Secondly, my objective function will be multiclass because this is a classification problem. Remember the base data set: its target values are yes and no classes. On the other hand, if your data set has numerical target values, then you should define the objective function as regression, just like that. This should be regression in that case. The number of classes in my data set is actually equal to the number of unique values in the data frame's decision column. That's equal to two. Alternatively, we can get the unique values and take the length of that array. It's going to be two in my case. I should also define the metric, and it's going to be multi_logloss because it's a multiclass classification problem. On the other hand, if your problem is a regression problem, then you should define the metric differently. In that case, for example, root mean squared logarithmic error could be a metric for your regression problem. Normally you shouldn't pass the min_data parameter here, but in this case the data set is very small. That's why I'm going to pass min_data equal to 1. These are going to be my configuration parameters. Now it's time to train. You can build your LightGBM model with lgb.train, and here you should pass your parameters, then your train set. My train set was lgb_train. Calling the train function returns an error. I actually know why this error occurred. Remember the column names in my base data set: the temperature column is specified with a dot character. That's why the LightGBM training function returns an error. We can't pass it like this, so I remove the dot character and call the train function again, and it completes successfully this time. Our LightGBM model is built. Now we can make predictions. Here we can pass our input features. That's going to be our prediction set. Let's see the predictions. It's a multiclass classification problem. If the index-0 value is greater than the index-1 value, then it's going to be no.
If the index-1 value, for example in the third line, is greater than the index-0 value, then this is going to be a yes decision. Let's build a for loop here: for prediction in predictions, print prediction. Each prediction is actually an array; it's a NumPy array. Here I need to import numpy, and I'm going to check numpy.argmax of the prediction. Let's print it first. For example, the first line returns 0 and the third line returns 1, and I'm going to say: if this is equal to 0, then the decision will be no; otherwise it's going to be yes. Actually, this is going to be my prediction, and I define it as p. Let's print the prediction here. We actually have the actual values in the y variable. These are zeros and ones as well. Let's create an index variable here, initialize it as 0, and increase its value in every iteration. Here I need the actual value: the actual value a is equal to the y array's value at that index. And let's print the actual value versus the prediction. By the way, a is a numeric value here. That's why I'm going to check the content of a: if it is equal to zero, then a would be no; otherwise it would be yes. Now we print predictions and actual values for all instances in my data set. It seems that all instances are classified correctly: prediction is no and actual is no, for example; prediction is yes and actual value is yes. LightGBM also offers interpretable or explainable machine learning models. Let's see the interpretability section here: the LightGBM framework's plot_importance, and here I'm going to pass my LightGBM model. Let's also see the input parameters: max number of features. Actually, we have four features in total. That's why we are going to print all features in the feature importance graph. Store this in the ax variable. Let's see the content of ax. We need to call plt.show, but before that, we need from matplotlib import pyplot as plt. Actually, this should be imported.
Then we can call the plt.show function here, and these are the importances of our features. Outlook is the most dominant feature in my data set, and temperature comes after outlook. Wind is the third most important feature in my data set, and humidity is the last. We can plot the built decision tree in a similar way: lgb.plot_tree, and here I'm going to pass the built LightGBM model. Let's store it in the ax variable and call plt.show again. It returns an error, and it says to make sure the Graphviz executables are on your system's PATH. The path of Graphviz is located here on my computer. That's why I add this to the PATH variable, and let's run this block again. That's the built decision tree, but it's very small. You can handle this issue just like that: set the figure size. Here is the built decision tree. I think we can see the content of it now. This is the built LightGBM model in my case. So, we have shown how to build machine learning models with the LightGBM framework. Thank you all for watching, and see you next time.