Transcript:
[MUSIC] In this video, I want to walk through how to create a simple, fully connected neural network in PyTorch. There's a lot of information online, and I know that it can be overwhelming, so let's go through this step by step. The outline for this video: we're going to import all the necessary packages from PyTorch; create our fully connected neural network, a very simple one; do some initialization of our model and set hyperparameters like the learning rate; load the data, which in this example will be a very simple dataset, the MNIST dataset; initialize the network that we created and also set up the loss function and the optimizer that we're going to use; then train our network; and in the end, see how good the network we trained is on the training set and the test set.

To begin with, we want to import all the necessary packages, so I'm going to copy them in, and let's go through what each of them does. torch is just the entire library. From torch.nn, which we import as nn, we get all the neural network modules: if we're going to define our feedforward network, we'll use their linear layer, nn.Linear, which is included in the nn module. If we're going to do something more complex or advanced, like convolutional neural networks, that's also in this module, and loss functions are inside it as well. torch.optim, imported as optim, has all the optimization algorithms, like stochastic gradient descent, Adam, and so on. torch.nn.functional, imported as F, is essentially all the functions that don't have any parameters, so activation functions like ReLU, tanh, and so on are in this one. They are also included in the nn package, which is a little confusing: you could use either, but we're going to use this one in this example. Then we have torch.utils.data, from which we import DataLoader. Essentially, the DataLoader gives us easier dataset management: it helps us create mini-batches to train on, among other things. Then torchvision.datasets, imported as datasets: PyTorch has a lot of standard datasets and makes it very easy for us to import them in a nice way, and we're going to use this one to import our MNIST dataset. Lastly, torchvision.transforms, imported as transforms: we're not going to go into this in depth, but essentially it has transformations that we can perform on our dataset.

All right, now that we have the packages and understand roughly what they do, let's create our fully connected network. We're going to create a class NN for our network and inherit from nn.Module. Then we write our initialization, and the first thing we do there is call super().__init__(). Essentially, super calls the initialization method of the parent class, which in this case is nn.Module, so we run that class's initialization. Then we want to create a very, very small network, just two layers. The class takes some input size and some number of classes; in our case the input size will be 784, since the MNIST dataset has 28 by 28 images, which is 784 nodes. The first layer is an nn.Linear from the input size to a hidden layer of 50 nodes, so very, very small, and then another nn.Linear from the 50 hidden nodes to the number of classes.
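For reference, here is a minimal sketch of those imports and of the two-layer network class as just described (the forward method is covered in the next step):

```python
import torch
import torch.nn as nn                        # network modules: layers, loss functions, ...
import torch.optim as optim                  # optimization algorithms: SGD, Adam, ...
import torch.nn.functional as F              # parameterless functions, e.g. activations
from torch.utils.data import DataLoader      # easier dataset management, mini-batches
import torchvision.datasets as datasets      # standard datasets such as MNIST
import torchvision.transforms as transforms  # transformations to perform on the data

class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(NN, self).__init__()
        self.fc1 = nn.Linear(input_size, 50)   # 784 input nodes -> 50 hidden nodes
        self.fc2 = nn.Linear(50, num_classes)  # 50 hidden nodes -> 10 classes

    def forward(self, x):                      # explained in the next step
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
```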
The next step is to define the forward method. The forward method runs on some input x, and in it we apply the layers that we created in the init method: we call self.fc1 on x, then apply ReLU, a non-linearity, to that, and call the result x just for simplicity. Then we call self.fc2 on x, call that x as well, and return x.

One idea is to check that it runs and gives the correct shapes on some randomly generated data. Let's say we initialize our model to NN(784, 10), with ten for the number of digits, and then initialize some x of shape 64 by 784. Essentially, 64 is the number of examples that we're going to run simultaneously, the number of images, so this is our mini-batch size. Then all we do is run the model on that input and print the shape. What we want this to return is 64 by 10: we want ten values for each image, one for each digit (what is the probability that it's a zero, that it's a one, and so on). We see that it returns 64 by 10, so we have a rough idea that, okay, this does what it's supposed to do.

Now we want to initialize a device, so that we can run the model on either the GPU or the CPU. What we do is set device to torch.device("cuda" if torch.cuda.is_available() else "cpu"): if we have CUDA available, we run on the GPU; if we don't, we just run on the CPU. For the hyperparameters, in this case we have an input size of 784, the number of classes is 10, the learning rate is 0.001, the batch size is 64, and then there's the number of epochs that we want to train, so let's say just 1 in this case.

Now we want to load the data, and as I said in the beginning, we can use datasets from torchvision, so we can use datasets.MNIST. The first thing we want is a root, that is, where it should save the dataset; let's just create a folder called dataset. Then train=True: this is going to be the training set, so train_dataset. The transform is transforms.ToTensor(), and that's all we're going to use the transforms for in this case. I think when it loads the data, it comes as numpy arrays or something like that, and we just want to transform it to a tensor so that it can run in PyTorch. And then download=True: if the dataset is already in the folder dataset, PyTorch is not going to download it, but if it isn't there, we want to download it. Then we want to create a train loader, so we use the DataLoader here, with dataset set to train_dataset, batch_size set to batch_size, which is 64 in this case, and shuffle=True, which just means that when it has gone through every image in the training set and trains for a second epoch, it shuffles the batches, making sure that we don't have the exact same images in a batch every epoch. Then we do the same thing for the test set: test_dataset and test_loader, with train=False.
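Collecting the setup just described into one sketch (torch.randn is assumed for the random test input; the folder name dataset/ follows the video):

```python
# Sanity check on random data: the printed shape should be [64, 10].
model = NN(784, 10)
x = torch.randn(64, 784)  # a mini-batch of 64 flattened 28x28 images
print(model(x).shape)     # torch.Size([64, 10])

# Run on the GPU if CUDA is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hyperparameters
input_size = 784
num_classes = 10
learning_rate = 0.001
batch_size = 64
num_epochs = 1

# Load MNIST; ToTensor converts the loaded data to PyTorch tensors,
# and download=True fetches the data unless dataset/ already has it.
train_dataset = datasets.MNIST(root="dataset/", train=True,
                               transform=transforms.ToTensor(), download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_dataset = datasets.MNIST(root="dataset/", train=False,
                              transform=transforms.ToTensor(), download=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=True)
```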
Now we want to initialize the network. Our model is going to be NN with input_size equal to the input size and num_classes equal to the number of classes, and then we call .to(device) on it. This is what we talked about before: CUDA if it's possible, otherwise the CPU. For our loss function, we're going to use nn.CrossEntropyLoss, and the optimizer we're going to use is Adam, so optim.Adam, where first we pass all the parameters that we have, so model.parameters(), and we set lr to the learning rate that we initialized in the hyperparameters.

So, next step: we've loaded the data, we've initialized the network, and we've created the loss and optimizer. Now we want to train the network. What we do here is first run for epoch in range(num_epochs); one epoch means that the network has seen all the images in the dataset once. Then we want to go through each batch that we have in our train loader, so for data, targets in train_loader. One thing that is common is that we want to see which batch index we have, and the way we do that is to take enumerate(train_loader) and unpack batch_idx and, in a tuple, (data, targets). The data is essentially the images, and the targets are the correct digit for each image. Then we want to move the data, which is currently a tensor, to the device that we're using: data = data.to(device=device) and targets = targets.to(device=device).

Now, one thing: if you run this and check the shape of the data (okay, it's going to download the dataset first), what we see is that the number of examples, the number of images that we have, is 64, which is correct. Then we have a 1 for the number of channels. Essentially, MNIST is black and white, so it only has one input channel; if we had colored images, this would be RGB, so three in that case. And then come the height and the width of each image, and MNIST is 28 by 28. But we want this to be just a single 784, so we want to unroll this matrix into a long vector. One way to do this is data = data.reshape(data.shape[0], -1): we keep the first dimension at 64, and the -1 essentially flattens the rest into a single dimension.

Then we want to do the forward part of the neural network, and this is quite easy: we just call the model on the data, and let's call the result scores. Then we want to call the loss function, which we defined as criterion. Criterion takes the input first, so scores, and then the target, so the targets here, which are the correct labels. Then we want to go backward through the network, and as I said before, we don't actually have to implement backpropagation ourselves, so this is quite easy in PyTorch: we just call loss.backward(). One thing we need to remember, though, is that we also need to call optimizer.zero_grad().
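Here is the initialization and the training loop sketched end to end; the two optimizer calls at the bottom, optimizer.zero_grad() and optimizer.step(), are explained next:

```python
# Initialize the network, loss function, and optimizer.
model = NN(input_size=input_size, num_classes=num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Train the network.
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        # Get data to CUDA if possible.
        data = data.to(device=device)
        targets = targets.to(device=device)

        # Reshape [64, 1, 28, 28] into [64, 784].
        data = data.reshape(data.shape[0], -1)

        # Forward pass and loss.
        scores = model(data)
        loss = criterion(scores, targets)

        # Backward pass; zero the gradients from the previous batch first.
        optimizer.zero_grad()
        loss.backward()

        # Gradient descent / Adam step: update the weights.
        optimizer.step()
```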
Essentially, we want to set all the gradients to zero for each batch, so that it doesn't keep the backprop calculations from previous forward passes. Then, at the end, we want to do a gradient descent step, or an Adam step in our case, and this is quite easy: optimizer.step(). Here we update the weights depending on the gradients computed in loss.backward(). So that's the training.

Now, the last thing that we want to do is check the accuracy; let's see if it performs any good. We define check_accuracy, and let's give it the loader and the model as inputs. First we set the number of correct predictions to 0 and the number of samples to 0, and then we call model.eval(): we want to set the model to evaluation mode, because if we used certain other techniques, we'd want to let the model know that this is evaluation, which might impact how the calculations are done. Then we use with torch.no_grad(): when we're just checking the accuracy, we don't have to compute the gradients, so that would be unnecessary computation, and torch.no_grad() lets PyTorch know that it doesn't have to compute any gradients in these calculations.

Then, for x, y in loader, we first do x = x.to(device=device) and y = y.to(device=device), and then reshape x, which is essentially the same thing we did before: x = x.reshape(x.shape[0], -1). Then we compute scores = model(x). Now we want to take scores.max() over the second dimension, so we can set either 0 or 1 here. If we look at the shape of scores, as we saw previously, it is 64 by 10, and we want to know which of those ten digits is the maximum; if the maximum value were at the first position, the prediction would be the digit zero. So we take _, predictions = scores.max(1). The max would also give us the value, but we're not really interested in the value, we're interested in the index of the maximum value along the second dimension. Then num_correct is the number of predictions that are equal to the correct labels, so we take (predictions == y) and then the sum, and num_samples is predictions.size(0), so essentially 64.

Then we want to compute the accuracy, and I guess we can just do a print with an f-string: the number correct, the number of samples, and the accuracy, so we print it in a nice way. Remember, these are tensors, so we want to convert them into floats. And we don't want to print, for example, ninety-five point followed by a long string of decimals; we'd rather print with just two decimals, and with f-strings that's simply :.2f. That should be all, I think. Then we just set the model back to training mode with model.train(). I guess it's not necessary for this example, because we're done with the training part, but if you check the accuracy during training to see that it actually improves, this is something you'd want to have. Then, in the end, we want to run it.
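A sketch of check_accuracy as described, including the train/test printout that gets added in a moment:

```python
def check_accuracy(loader, model):
    # Report which split we are evaluating (described just below).
    if loader.dataset.train:
        print("Checking accuracy on training data")
    else:
        print("Checking accuracy on test data")

    num_correct = 0
    num_samples = 0
    model.eval()  # evaluation mode

    with torch.no_grad():  # no gradients needed while evaluating
        for x, y in loader:
            x = x.to(device=device)
            y = y.to(device=device)
            x = x.reshape(x.shape[0], -1)

            scores = model(x)               # shape [batch_size, 10]
            _, predictions = scores.max(1)  # index of the max score per image
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)

        print(f"Got {num_correct} / {num_samples} with accuracy "
              f"{float(num_correct) / float(num_samples) * 100:.2f}")

    model.train()  # back to training mode
```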
We call check accuracy on the train loader, so check_accuracy(train_loader, model), and then similarly on the test set. One thing we can also do is check at the beginning of the function whether loader.dataset.train is true; if it is, we know that it's running on the training data, so we can print "Checking accuracy on training data", and if it's not, we print "Checking accuracy on test data".

And I think this should be all, so let's see if it runs. In this case I'm running on CUDA because I have a GPU; if you don't, it's going to run on the CPU, and there isn't really going to be any difference. Let's see, we got an error: we return accuracy, but we never defined accuracy, and I guess we don't actually have to return it. Okay, now we see that it ran: checking accuracy on training data, around ninety-three percent. All right, so training just one epoch on this small network actually gives a quite okay accuracy. Well, it depends; 93 percent is okay, I guess. And in this case it actually performs better on the test data. I think if we ran this for more epochs, it would perform even better, and if we made the network larger, it would definitely improve. But the idea was not to get the best accuracy; it's just to show an example of how to create a simple neural network in PyTorch. Yep, so I hope you enjoyed this video, and I hope to see you in the next one.