Transcript:
Hi everybody, welcome to a new PyTorch tutorial. Today we will implement our first multi-layer neural network that can do digit classification based on the famous MNIST dataset. In this tutorial we put all the things from the last tutorials together: we use the DataLoader to load our dataset, we apply a transform to the dataset, then we implement our neural net with an input layer, a hidden layer, and an output layer, and we also apply activation functions. Then we set up the loss and the optimizer and implement a training loop that uses batch training, and finally we evaluate our model and calculate the accuracy. Additionally, we will make sure that our whole code can also run on the GPU if we have GPU support. So let's start.

First of all we import the things we need: we import torch, then torch.nn as nn, then torchvision for the datasets, then torchvision.transforms as transforms, and we also import matplotlib.pyplot as plt to show you some data later. Then first of all we do the device configuration: we create a device by saying device = torch.device, and the name is 'cuda' if we have GPU support, so if torch.cuda.is_available(), and otherwise we simply call our device 'cpu'. Later we have to push our tensors to this device, and this will guarantee that the code runs on the GPU if it is supported.

Now let's define some hyperparameters. The input size is 784, because later we will see that our images have the size 28 by 28, and we will flatten each image into a 1D tensor, and 28 times 28 is 784, so that's why our input size has to be 784. Then let's define a hidden size; here I will say this is 100, but you can also try out different sizes. The number of classes has to be 10 because we have 10 different classes, the digits from 0 to 9. Then let's define the number of epochs; here I will simply say 2 so that the training doesn't take too long, but you can set this to a higher value. We define the batch size as 100, and let's also define the learning rate as 0.001.

Now let's import the famous MNIST data. You can get it from the PyTorch library by saying train_dataset = torchvision.datasets.MNIST. This has to get the root where the data will be stored, so root='./data', which creates a folder called data in the same directory. Then we say train=True, so this is our training dataset, and we apply a transform right away by saying transform=transforms.ToTensor(), so each sample is converted to a tensor, and we also say download=True, so the data is downloaded if it is not available already. Then let's copy this and do the same thing for our test dataset; here we have to say train=False, and we don't have to download it again. Now let's continue and create the data loaders by saying train_loader = torch.utils.data.DataLoader, and this has to get the dataset, so dataset=train_dataset.
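For reference, here is a minimal sketch of the setup described so far; all names follow the spoken description:

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# device configuration: use the GPU if available, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# hyperparameters
input_size = 784       # 28 x 28 images, flattened to a 1D tensor
hidden_size = 100
num_classes = 10       # digits 0 to 9
num_epochs = 2
batch_size = 100
learning_rate = 0.001

# MNIST datasets, converted to tensors on load
train_dataset = torchvision.datasets.MNIST(root='./data', train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False,
                                          transform=transforms.ToTensor())
```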
Then we have to specify the batch size, so batch_size=batch_size, and we can also say shuffle=True, which is good for training. Then we copy this again and do the same thing for our test loader, so test_loader = torch.utils.data.DataLoader with the test dataset, and here we can say shuffle=False, because the order doesn't matter for the evaluation.

Now let's have a look at one batch of this data by saying examples = iter(train_loader), so we convert the loader to an iterator object, and then we can call the next method and unpack the result into samples and labels, so samples, labels = examples.next(). Now let's print the shapes of these, so print(samples.shape) and print(labels.shape), and let's save this and run python feedforward.py to see if this is working so far. And yes, here we have the size of the samples: 100 by 1 by 28 by 28. This is because our batch size is 100, so we have 100 samples in our batch; the 1 is because we only have one channel (there are no color channels here), and 28 by 28 is the actual image array, as I said in the beginning. Our labels tensor has only size 100, so for each sample we have one class label. This is our example data, and now let's also plot it to see how it looks. So for i in range(6), I call plt.subplot with two rows, three columns, and the index i+1, and then plt.imshow, where I want to show the actual data, so samples[i][0], because we want to access the first channel, and I also give this a color map, cmap='gray'. Then I say plt.show(). Let's save this and run it again, and here we have a look at the data: these are some example handwritten digits.

Now we want to classify these digits, so for this we set up a fully connected neural network with one hidden layer. Let's comment the plotting out again, and let's create a class NeuralNet, which has to be derived from nn.Module. Now we have to define the __init__ and the forward method. The __init__ method gets self, then the input size, then the hidden size, and then the output size, where the output size is the number of classes. Here we first want to call the super __init__, so super(NeuralNet, self).__init__(), and then we create our layers. First we want to have a linear layer, so self.l1 = nn.Linear, which has the input size as input and the hidden size as output size. After the first layer we want to apply an activation function, and here I simply use the famous ReLU activation, so self.relu = nn.ReLU(). Then at the end we have another linear layer, self.l2 = nn.Linear, and now we have to be careful: the input size here is the hidden size, and the output size is the number of classes. Now let's define the forward method, which gets self and one sample x, and now we apply all these layers: we say out = self.l1(x) for the first layer, then out = self.relu(out) to apply the activation function to the previous output, and the last one is out = self.l2(out), which applies the second linear layer.
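Here is a sketch of the data loaders, the batch inspection, and the model class from this part. Note one assumption: recent PyTorch versions removed the examples.next() form used in the recording, so the sketch uses the built-in next() instead:

```python
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, shuffle=False)

# look at one batch: shapes should be [100, 1, 28, 28] and [100]
examples = iter(train_loader)
samples, labels = next(examples)  # newer PyTorch: next(examples) instead of examples.next()
print(samples.shape, labels.shape)

# plot the first six digits of the batch
for i in range(6):
    plt.subplot(2, 3, i + 1)
    plt.imshow(samples[i][0], cmap='gray')
plt.show()

# fully connected network with one hidden layer
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.l2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        return out  # no softmax here, see below
```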
And now we have to be careful again, because here at the end we don't want an activation function. We don't apply the softmax here, as is usual in multi-class classification problems, because in a second we will see that we use the cross-entropy loss, and this applies the softmax for us. So no softmax here; we simply say return out. This is our whole model, and then we can create it by saying model = NeuralNet, which gets the input size, the hidden size, and the number of classes.

So yeah, now we have the model, and now let's create the loss and the optimizer. Here we say criterion = nn.CrossEntropyLoss(), and this applies the softmax for us; that's why we don't want it in the model, so be very careful about this. Now let's create our optimizer as well by saying optimizer = torch.optim.Adam, so let's use the Adam optimizer here. It has to get the parameters, and here we can use model.parameters(), and it also has to get the learning rate, lr=learning_rate. Now we have the loss and the optimizer.

And now we can do our training loop. For this, let's first define the total number of steps, so n_total_steps equals the length of the train loader. Now we can do the typical loop: we say for epoch in range(num_epochs), so this loops over the epochs, and then we loop over all the batches, so for i, (images, labels) in enumerate(train_loader). The enumerate function gives us the actual index and then the data, and the data here is the tuple of the images and the labels. Now we have to reshape our images first, because if we have a look at the shape, we see that it is 100 by 1 by 28 by 28, as I showed you in the beginning, but we said our input size is 784, so our images tensor needs the size 100 by 784, with the number of samples first. We can do this by saying images = images.reshape(-1, 28*28); with -1 as the first dimension, PyTorch can figure it out automatically for us, and 28 times 28 is the second dimension. Then we also call .to(device), so we push the images to the GPU if it is available, and we also have to push the labels to the device, so labels = labels.to(device). And now let's do the forward pass, so first the forward pass and afterwards the backward pass. For the forward pass we simply say outputs = model(images), and then we calculate the loss by saying loss = criterion(outputs, labels), so the criterion gets the predicted outputs and the actual labels. This is the forward pass, and then comes the backward pass.
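A sketch of the model instantiation, loss, and optimizer. One assumption: the .to(device) call on the model is not spelled out in the narration, but it is needed so that the model's parameters live on the same device as the tensors we push there:

```python
model = NeuralNet(input_size, hidden_size, num_classes).to(device)

criterion = nn.CrossEntropyLoss()  # applies the softmax internally
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```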
The first thing we want to do is call optimizer.zero_grad() to empty the values in the gradient attribute. Then we can do the next step by saying loss.backward(), so this does the backpropagation, and now we can call optimizer.step(), so this does an update step and updates the parameters for us. Now let's also print the loss: let's say if (i+1) % 100 == 0, so every 100th step we want to print some information. Let's print the current epoch, epoch+1, out of the total number of epochs, num_epochs; then the current step, i+1, out of the total number of steps, n_total_steps; and the loss, loss.item(), where we only want to print four decimal places. So yeah, now we are done with the training; this is the whole training loop.

Now let's do the testing and the evaluation. For this we don't want to compute the gradients for all the steps we do, so we wrap everything in a with torch.no_grad() statement. First we say the number of correct predictions is zero and the number of samples is zero in the beginning, and then we loop over all the batches in the test set, so for images, labels in test_loader. Again we have to reshape the images like we did above and push them and the labels to the device. Then let's calculate the predictions by saying outputs = model(images), so this is our trained model now, and it gets the test images. Then let's get the actual predictions by saying _, predictions = torch.max(outputs, 1), so along dimension 1. The torch.max function returns the value and the index, and we are interested in the actual index, because this is the class label; that's why we don't need the first value. These are our predictions. Now we say n_samples += labels.shape[0], which gives us the number of samples in the current batch, so it should be 100, and then the number of correct predictions: here we can say (predictions == labels).sum().item(), so for each correct prediction we add one, and of course we say n_correct += this value. Then, when we are done with the loop, we calculate the total accuracy by saying acc = 100.0 * n_correct / n_samples, so this is the accuracy in percent. Now let's print this, so we print accuracy = acc, and then we are done.

So now let's save this, clear the console, and run it, and hope that everything is working. Now our training starts, and we should see that the loss decreases with every step; sometimes it will increase again, but overall it should get lower and lower. And now we should be done, and testing is very fast. So now we see that the accuracy is 94.9, so it worked. Our first feed-forward model is done, and yeah, I hope you understood everything and you enjoyed this. If you liked it, please subscribe to the channel, and see you next time. Bye.
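Putting the last two parts together, here is a sketch of the complete training loop and evaluation as described in the video:

```python
# training loop
n_total_steps = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # flatten [100, 1, 28, 28] to [100, 784] and push to the device
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f'epoch {epoch+1}/{num_epochs}, step {i+1}/{n_total_steps}, loss = {loss.item():.4f}')

# evaluation: no gradients needed here
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        # torch.max returns (values, indices); the index is the class label
        _, predictions = torch.max(outputs, 1)
        n_samples += labels.shape[0]
        n_correct += (predictions == labels).sum().item()

    acc = 100.0 * n_correct / n_samples
    print(f'accuracy = {acc}')
```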