Transcript:

In this video, we will learn how to do one-hot encoding on a PyTorch tensor. I will explain with a picture what one-hot encoding is and how it works. In this picture, you can see that we have the colors red, red, yellow, green and yellow. There are five rows in total, but out of those five rows there are only three unique colors: red, yellow and green. This type of data is known as a categorical variable. When we are developing a neural network, we cannot pass a categorical variable to the model directly. We have to convert it into a numerical variable, and this is done using one-hot encoding.

Here we can see that we have got three new columns. The first is red, the second is yellow and the third is green, and we have two kinds of numbers here, one and zero. In the red column, one indicates that the color red is present in that row, while zero indicates that it is not. Similarly for yellow: wherever the value is 0, yellow is absent, and wherever the value is 1, yellow is present and all the other columns will be 0. In short, if the value of red is 1, the values of yellow and green will be 0 because only red is present. If the value of yellow is 1, the values of red and green will be 0 because only yellow is present. If the value of green is 1, the values of yellow and red will be 0 because only green is present.

This numerical transformation of a categorical variable is done using one-hot encoding, and in this video we are going to learn how to do it using PyTorch. First of all, we are going to import one CSV file using NumPy, then we will convert that NumPy array into a PyTorch tensor, and then we will apply the one-hot encoding. Let's import all the libraries that we need: import numpy as np, import torch, import csv.
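The color picture described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the PyTorch method used later in the video; the helper name one_hot and the column order (red, yellow, green) are assumptions taken from the description of the picture.

```python
# Colors from the picture: five rows, three unique categories.
colors = ["red", "red", "yellow", "green", "yellow"]

# Column order as described in the video: red, yellow, green.
categories = ["red", "yellow", "green"]


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere.

    Hypothetical helper written for this illustration only.
    """
    return [1 if value == category else 0 for category in categories]


# One encoded row per input row; each row contains exactly one 1.
encoded = [one_hot(color, categories) for color in colors]

for color, row in zip(colors, encoded):
    print(f"{color:<6} -> {row}")
```

Each row has a single 1 in the column of its own color and 0 in the other two columns, which is the property the picture shows.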
We will copy the path of our CSV file. My CSV file is saved in this location. Click on the CSV file, go to Properties, and copy the address shown under Location. Let's go to our Jupyter notebook. I'll make one variable, wine_path. This is the wine data, and here I will save the path in double quotes. One thing we have to make sure of is that every single backslash in the path is changed into a double backslash. Let's make all the single backslashes double. This is the path of my CSV file, and now I have to add the CSV file name at the end. My CSV file is the wine data file, with the .csv extension. Let's run this line. We have saved our path.

Now let's read the CSV dataset using NumPy and store it in a new variable. I'll make a new variable, wine_numpy, and here I will use the loadtxt function from NumPy. Inside it, first we have to pass the path. We have stored the path in the variable wine_path, so we will type wine_path. Then we have to define the data type: dtype is going to be np.float32. After this, we will define the delimiter. It is a comma-separated file, so the delimiter is a comma. There is one more parameter, skiprows. We are saying skiprows equals 1 because we don't want to import the column names, so we skip the first row, which contains them. Let's print the data. This is our wine data, which we have successfully loaded using NumPy.

We are going to convert this data into a PyTorch tensor. Let's check the data type with type(wine_numpy). It is a NumPy ndarray. Let's make one variable, wine_tensor. In this, we convert the NumPy array into a PyTorch tensor using the function torch.from_numpy, and we pass our NumPy variable name. Let's print our new dataset: print(wine_tensor). This is the new dataset in tensor format.
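The loading steps just described might look roughly like the sketch below. The real wine file and its path are not shown in the transcript, so this example writes a tiny stand-in CSV to a temporary folder. The file name wine_sample.csv, the column names and the values are all made up for illustration.

```python
import os
import tempfile

import numpy as np
import torch

# Hypothetical stand-in for the wine CSV: one header row plus three data rows.
csv_text = (
    "acidity,sugar,quality\n"
    "7.0,20.7,6\n"
    "6.3,1.6,6\n"
    "8.1,6.9,5\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # In the video, wine_path is a hand-typed Windows path instead.
    wine_path = os.path.join(tmp_dir, "wine_sample.csv")
    with open(wine_path, "w") as f:
        f.write(csv_text)

    # skiprows=1 drops the header row that holds the column names.
    wine_numpy = np.loadtxt(
        wine_path, dtype=np.float32, delimiter=",", skiprows=1
    )

# from_numpy keeps the float32 dtype of the array.
wine_tensor = torch.from_numpy(wine_numpy)

print(type(wine_numpy))
print(type(wine_tensor))
print(wine_tensor.shape)
```

When typing a Windows path by hand, a raw string such as r"C:\some\folder" is an alternative to doubling every backslash.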
We can also check the type of our new data with type(wine_tensor). This dataset is a torch.Tensor. We have successfully converted a NumPy dataset into a torch tensor, and now we are going to start one-hot encoding.

First of all, we are going to split our data into independent and dependent variables. Our independent variables will be in a variable data, while our dependent variable will be in a variable target. Let's do that. The first variable is data, and here we are going to save all the independent variables. We take wine_tensor and say we want all rows and all columns except the last column. Our last column is the target variable, which is why we are not including it in our independent variables. Let's print data. These are all the independent variables. Now we are going to make one more variable, which is target. We take wine_tensor again, and this time we want all rows and only the last column. That's why we are passing the index -1. This indicates that out of the 12 columns, we are taking only one, the last column. Let's print target. These are the values present in the target variable.

Let's check the shape of our target variable: target.shape. It has 4898 rows. If I say target.shape[0], it gives me the same result, 4898 rows. Let's also check the unique values that are present: target.unique(). These are the unique numbers present in our variable target: three, four, five, six, seven, eight, nine.

We are going to start converting our target variable into a one-hot encoding. Let's make one variable, target_one_hot. Here, from the torch module, we call the function zeros, and for the rows we pass target.shape[0]. We saw above what target.shape[0] gives us.
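The split into data and target could be sketched as follows. The tensor here is a small made-up stand-in (4 rows, 3 columns) for the real 4898-row wine tensor, so the printed shapes and values differ from those in the video.

```python
import torch

# Hypothetical mini dataset: two feature columns, and the last column is the label.
wine_tensor = torch.tensor([
    [7.0, 0.27, 6.0],
    [6.3, 0.30, 5.0],
    [8.1, 0.28, 6.0],
    [7.2, 0.23, 3.0],
])

# All rows, every column except the last one: the independent variables.
data = wine_tensor[:, :-1]

# All rows, only the last column (index -1): the dependent variable.
target = wine_tensor[:, -1]

print(data.shape)       # shape of the feature block
print(target.shape[0])  # number of rows
print(target.unique())  # distinct label values
```

Indexing with -1 means the code keeps working if feature columns are added or removed, as long as the label stays in the last column.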
It returns the number of rows, and for the number of columns we are saying 10, since the labels go up to 9 and will be used as column indices. Let's run this code and print the variable: print(target_one_hot). In this new variable target_one_hot, all the values are zero because we called the zeros function from the torch module.

Now we have to assign 1 to our target label using the column index number, and for that we will use the method scatter_. The zeros at those positions will be set to one, which gives us the one-hot encoding, and it will be stored in a new variable. Let's make a new variable, result. We take our variable target_one_hot and call the method scatter_ on it. First we have to give the dimension. We pass 1 as the dimension, because we want to one-hot encode along dimension 1, the columns. Then we pass target.unsqueeze(1), and finally the value 1.0. In short, we are saying: for each row, take the target label (the labels of our target are 3, 4, 5, 6, 7, 8, 9) and use it as the column index at which to set the value 1.0.

Let's print the result variable and check our one-hot encoding. We got a runtime error: scatter_fill_cpu expected dtype int64 for index. This error occurs because the target values we use as indices have to be converted to the long type. We will add .long() to target and run all the code again. Now this line has run successfully. Let's print our variable result. We can see that we have successfully converted our target variable into a one-hot encoded variable. In the output, we can see 1.0 here. This is how we can convert a variable into one-hot encoding in PyTorch. I'll see you in the next video. If you like my video, please subscribe to my channel. Thank you for watching.
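The zeros and scatter_ steps from the video can be put together as in the sketch below. The four label values are made up, chosen from the 3 to 9 range mentioned in the video. The column count of 10 is an assumption that follows from using labels up to 9 as column indices.

```python
import torch

# Hypothetical labels; loaded as float32 they would look like this.
target = torch.tensor([6.0, 5.0, 3.0, 9.0])

# scatter_ requires int64 indices, so cast the float labels to long first.
# Skipping this cast is what triggers the runtime error shown in the video.
target = target.long()

# One row per sample, 10 columns so that labels 0..9 are valid column indices.
target_one_hot = torch.zeros(target.shape[0], 10)

# unsqueeze(1) turns shape (4,) into (4, 1) to match the 2-D tensor.
# For each row, write 1.0 into the column given by that row's label.
result = target_one_hot.scatter_(1, target.unsqueeze(1), 1.0)

print(result)
```

The trailing underscore in scatter_ means the method works in place, so result and target_one_hot refer to the same tensor. As an alternative to this approach, PyTorch also provides torch.nn.functional.one_hot, which takes an integer tensor and a num_classes argument.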