Transcript:
Hi, everybody, welcome to a new PyTorch tutorial. Today we are implementing a convolutional neural network and doing image classification based on the CIFAR-10 dataset. CIFAR-10 is a very popular image dataset with 10 different classes, like airplanes, cars, birds, cats, and other classes, and this dataset is available directly in PyTorch. So we will create a convolutional neural net that can classify these images.
So now let’s talk about convolutional neural networks very briefly. I will not go into too much detail now, because this tutorial should be focused on the PyTorch implementation, but I will provide further links in the description if you want to learn more. Convolutional neural nets, or convnets, are similar to ordinary neural networks: they are made up of neurons that have learnable weights and biases. The main difference is that convolutional nets mainly work on image data and apply so-called convolutional filters.
A typical convnet architecture looks like this: we have our image, then we have different convolutional layers with optional activation functions, followed by so-called pooling layers. These layers are used to automatically learn some features from the images, and then at the end we have one or more fully connected layers for the actual classification task. So this is a typical architecture of a CNN. These convolutional filters work by applying a filter kernel to our image, so we put the filter at the first position in our image.
So this is the filter here, and this is the input image. We put it at the first position, the red position, and then we compute the output value by multiplying and summing up all the values, and we write the value into the output image, here at the red position. Then we slide our filter to the next position, the green position, if you can see it here, and do the same filter operation, and we keep sliding our filter over the whole image until we are done. So this is how convolutional filters work. With this transform, our resulting image may have a smaller size, because our filter does not fit in the corners, except if we use a technique called padding, but we will not cover that in this lecture. Getting the correct size is an important step that we will see later in practice.
Now let’s also talk about pooling layers briefly, or more specifically in this case, max pooling. Max pooling is used to downsample an image by applying a maximum filter to subregions. Here we have a filter of size two by two, so we look at the two-by-two subregions in our original image and write the maximum value of each region into the output image. Max pooling is used to reduce the computational cost by reducing the size of the image, which reduces the number of parameters that our model has to learn, and it also helps to avoid overfitting by providing an abstracted form of the input.
So these are all the concepts we must know, and again, please check out the provided links if you want to learn more. Now enough of the theory, let’s get to the code. Here I already wrote most of the things that we need, so we import the things that we need.
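As a short aside on the theory above, the sliding filter and the max pooling operation can be sketched in plain Python. This is only an illustration and not code from the video: the function names `slide_filter` and `max_pool`, the example image, and the kernel values are all made up here, and the filter is applied without padding and with a stride of 1.

```python
def slide_filter(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1).

    At each position, multiply the overlapping values and sum them up.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def max_pool(image, size=2, stride=2):
    """Write the maximum of each size-by-size subregion into the output."""
    output = []
    for i in range(0, len(image) - size + 1, stride):
        row = []
        for j in range(0, len(image[0]) - size + 1, stride):
            row.append(max(image[i + di][j + dj]
                           for di in range(size)
                           for dj in range(size)))
        output.append(row)
    return output


# Hypothetical 5x5 input image and 3x3 filter kernel.
image = [
    [1, 2, 0, 1, 3],
    [4, 1, 1, 0, 2],
    [0, 2, 5, 1, 1],
    [3, 0, 1, 2, 0],
    [1, 1, 0, 4, 2],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

filtered = slide_filter(image, kernel)
print(len(filtered), len(filtered[0]))  # 3 3: the output is smaller than the input

pooled = max_pool([[1, 3, 2, 4],
                   [5, 6, 1, 0],
                   [7, 2, 9, 8],
                   [0, 1, 3, 4]])
print(pooled)  # [[6, 4], [7, 9]]
```

The 5 by 5 input shrinks to 3 by 3 because the 3 by 3 filter only fits at three positions per row and column, and the 4 by 4 input to the pooling function is halved to 2 by 2.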
Then we make sure that we also have GPU support, and then we define the hyperparameters. If you don’t know how I structure my PyTorch files, then please also watch the previous tutorials, because there I already explained all of these steps. First of all, we load the dataset, and as I said, the CIFAR-10 dataset is already available in PyTorch, so we can use it from the PyTorch datasets module. Then we define our PyTorch datasets and the PyTorch data loader, so we can automatically do batch optimization and batch training. Then I defined the classes and hard-coded them here, and then here we have to implement the convolutional net. As always, we then create our model, our loss, and the optimizer. In this case, as this is a multi-class classification problem, we use the cross-entropy loss, and as the optimizer we use stochastic gradient descent, which has to optimize the model parameters and gets the defined learning rate.
Then we have the typical training loop, which does the batch optimization. We loop over the number of epochs, and then we loop over the training loader, so we get all the different batches. Here again we have to push the images and the labels to the device to get GPU support. Then we do our typical forward pass and compute the loss, and then we do the backward pass, where we must not forget to empty the gradients first with zero_grad. Then we call the backward function and the optimizer step, and print some information.
When we are done, we evaluate the model, and as always, we wrap this in a with torch.no_grad() statement, because we don’t need backward propagation and gradient calculations here. Then we calculate the accuracy: the accuracy of the total network, and also the accuracy for each single class. So yeah, this is the script.
You can also find this on my GitHub, so please check that out there. The only thing that is missing now is to implement the convolutional net. For this, we define a class ConvNet, which must inherit from nn.Module, and as always, we have to implement the init function and the forward function for the forward pass.
So now let’s write some code here. For this, we have a look at the architecture again. First, we have a convolutional layer followed by a ReLU activation function, then we apply a max pooling. Then we have a second convolutional layer with a ReLU function and a max pooling, then three different fully connected layers, and at the very end the softmax and the cross entropy. The softmax is already included in the cross-entropy loss here, so we don’t need to care about this.
So let’s create all these layers. Let’s say self.conv1 equals, and here we get the first convolutional layer by saying nn.Conv2d. Now we have to specify the sizes. The input channel size is 3, because our images have three color channels. Then let’s say the output channel size is 6 and the kernel size is 5, so 5 times 5. Now let’s define a pooling layer: self.pool equals nn.MaxPool2d with a kernel size of 2 and a stride of 2. This is as in the image that we have seen: our kernel has size two by two, and after each operation we shift it two pixels to the right, so that’s why the stride is 2. Then let’s define the second convolutional layer, self.conv2. Now the input channel size must be equal to the last output channel size, so here we say 6, and as output let’s say 16, and the kernel size is still 5. Now we have our convolutional layers, so let’s set up the first fully connected layer by saying self.fc1 equals nn.Linear, and now we need an input size here.
First, I will just write this for you: it is 16 times 5 times 5, and as the output size I will say 120. You can try out a different one here, and I will explain in a second why this is 16 times 5 times 5. Then let’s set up the next fully connected layer. This has 120 input features, and let’s say 84 output features. Then let’s use a final fully connected layer, so we have fc1, fc2, and fc3, and this has an input size of 84, and the output size must be 10, because we have 10 different classes. So you can change the 120 here and also the 84, but the 16 times 5 times 5 must be fixed, and the 10 must be fixed too.
Now let’s have a look at why this must be this number. Here I have a little script that does exactly the same thing. So now let me change the number of epochs. Oh yeah, it is four. So here I have the same thing in the beginning: I load the dataset, and let’s also plot some images. Then I have the same layers: the first convolutional layer, the pooling layer, and the second convolutional layer. First of all, let’s run this and plot the images, so let’s say python cnn_test.py. I have already downloaded it, as it prints. Yeah, it’s very blurry, but I think you can see this: this is a horse, and maybe a bird, and another horse, and I don’t recognize this one, actually. Let’s run this again and maybe see some better pictures. Still very blurry, but I think this is a deer, a car, a frog, and a ship.
So let’s see how the sizes look. First, we just print images.shape, so this is 4 by 3 by 32 by 32, and this is because our batch size is 4, then we have three different color channels, and our images have size 32 by 32. Now let’s apply the first convolutional layer, so we say x equals conv1, and this gets the images, and let’s print the size after this operation. Oh, sorry, I don’t want to plot this anymore. So now we have the next size.
So this is 4 by 6 by 28 by 28. We now have 6 output channels, as we specified here, and the image size is 28 by 28, because, as I said, the resulting image may be smaller since our filter doesn’t fit in the corners. The formula to calculate the output size is this: the input width minus the filter size plus 2 times the padding, then divided by the stride, and then plus 1, so (W - F + 2P) / S + 1. In this case, we don’t have padding. In this example, we have an input size of 5 by 5, a filter size of 3 by 3, padding is 0, and stride is 1. So the output size is 5 minus 3 plus 0, so this is 2, then divided by 1, still 2, and then plus 1, so that’s why our output image here is 3 by 3. Now we have to apply the same formula in our case: we have 32 minus the filter size, so minus 5, which is 27, plus 0, still 27, divided by 1, still 27, and then plus 1, so that’s why it’s 28. So here we have 28 by 28.
Then let’s apply the next operation, which is the pooling layer. Let’s save this and run it. Now our size is 4 by 6 by 14 by 14, and this is because, as in the example, our pooling layer with a kernel size of 2 by 2 and a stride of 2 will reduce the image size by a factor of 2. Now let’s apply the second convolutional layer and print the size after this operation, so clear this first and run it. Again, we could apply the formula that I just showed you to work out the reduced size, but here PyTorch can figure this out for us. So the size is 4 by 16 by 10 by 10: the 16 is the next output channel size that we specified, and the resulting image is 10 by 10. Then we apply another pooling operation, which again reduces the size by a factor of two. So this is why the final size after both convolutional layers and the pooling layers is 4 by 16 by 5 by 5. So that is the size we have after these convolutional layers.
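The size calculation above can also be checked with a few lines of plain Python. This is a small sketch rather than code from the video: the helper name `conv_output_size` is made up for illustration, and it assumes integer (floor) division, which makes no difference for the sizes used here because every division is exact.

```python
def conv_output_size(width, filter_size, padding=0, stride=1):
    """Output width of a convolution or pooling step: (W - F + 2P) / S + 1."""
    return (width - filter_size + 2 * padding) // stride + 1


# The small example: 5x5 input, 3x3 filter, no padding, stride 1.
print(conv_output_size(5, 3))  # 3

# The layers from the tutorial, applied to a 32x32 image.
size = 32
size = conv_output_size(size, 5)            # first convolution, kernel 5 -> 28
size = conv_output_size(size, 2, stride=2)  # max pooling 2x2, stride 2 -> 14
size = conv_output_size(size, 5)            # second convolution, kernel 5 -> 10
size = conv_output_size(size, 2, stride=2)  # max pooling 2x2, stride 2 -> 5

# 16 output channels times 5 times 5 gives the flattened feature count.
print(size, 16 * size * size)  # 5 400
```

The final value, 16 times 5 times 5, equals 400, which is the number of input features the first fully connected layer has to accept.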
Now, when we pass the output of the convolutional layers into our classification layers, we want to flatten it, so we want to flatten our 3D tensor to a 1D tensor. This is why the input size of the first linear layer is exactly what we have here, 16 times 5 times 5. It is very important to get the correct size here, but now we know why this must be 16 times 5 times 5, and now we have the correct sizes.
So now we have all the layers defined, and we have to apply them in the forward pass. We say x equals, and let’s apply the first convolutional layer, which gets x. After that, we apply an activation function. We can do this by calling F: I imported torch.nn.functional as F, so I can call F.relu and put this in as the argument. By the way, the activation function does not change the size. After the activation function, we apply the first pooling layer, so self.pool, and wrap this here. This is the first convolutional and pooling layer, and then we do the same thing with the second convolutional layer.
Now we have to pass it to the first fully connected layer, and for this we have to flatten it. We can do this by saying x equals x.view, and for the first size we can simply say minus one, so PyTorch can automatically determine the correct size for us. This is the number of samples we have in our batch, so 4 in this case. Then here we must say 16 times 5 times 5, and now we have our tensor flattened. Now let’s call the first fully connected layer by saying x equals self.fc1, and this gets x, and then we apply an activation function again. We simply use the ReLU. I also have a whole tutorial about activation functions, so please check that out if you haven’t already. After this, we apply the second one in the same way.
So x equals the second fully connected layer, self.fc2, with a ReLU activation function, and at the very end we simply have x equals the last fully connected layer, self.fc3, with x, and no activation function at the end. There is also no softmax activation function here, because this is already included in the loss that we set up here. Then we can simply return x, and this is the whole ConvNet model. Now you should know how we can set this up. Then we create our model here, and continue with the training loop that I already showed you.
So now let’s save this and run it. Clear this, say python cnn.py, and hope that this will start the training. Oh yeah, one thing I forgot, of course, is to call the super init. So never forget to call super, and this has to get ConvNet and self, and then .__init__(). Let’s clear this again and try one more time, and now this should start the training. I don’t have GPU support on my MacBook, so this can take a few minutes, so I think I will skip this and continue when the training is done. See you in a second.
All right, now we are back and our training has finished. If we have a look, we can see that the loss slowly decreased, and then we have the final evaluation. The accuracy of the total network is 46.6%, and the accuracy of each class is listed here. So it’s not very good, and this is because we only specified four epochs here, so you might want to try more epochs. But now you should know how a convolutional neural net can be implemented, and I hope you enjoyed this tutorial. If you enjoyed this, please leave a like and subscribe to the channel, and see you next time. Bye.
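Putting the pieces described in this tutorial together, the model and a single optimization step might look roughly like the sketch below. It follows the layer sizes mentioned in the video, but it is not the script from the video: the learning rate of 0.001 is an assumption, and a random batch of 4 images stands in for a real CIFAR-10 batch so that the example runs without downloading the dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 color channels -> 6 channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 channels -> 16 channels, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # input size fixed by the conv/pool layers
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> N x 6 x 14 x 14
        x = self.pool(F.relu(self.conv2(x)))   # -> N x 16 x 5 x 5
        x = x.view(-1, 16 * 5 * 5)             # flatten -> N x 400
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # no softmax: CrossEntropyLoss applies it
        return x


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ConvNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # assumed learning rate

# Random stand-in for one CIFAR-10 batch (batch size 4, 3 channels, 32x32 pixels).
images = torch.randn(4, 3, 32, 32).to(device)
labels = torch.randint(0, 10, (4,)).to(device)

# One training step: forward pass, loss, then backward pass and update.
outputs = model(images)
loss = criterion(outputs, labels)
optimizer.zero_grad()  # empty the gradients first
loss.backward()
optimizer.step()

# Evaluation does not need gradients.
with torch.no_grad():
    _, predicted = torch.max(model(images), 1)

print(outputs.shape)  # one row per image, one score per class
```

In the real script, this step would sit inside the loop over the epochs and the training loader, with the images and labels coming from the data loader instead of random tensors.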