[MUSIC] In this video we’re going to walk through how to code the VGG architecture in PyTorch, but let’s start by understanding how it works. So here’s the VGG paper. We’re not going to read the entire paper. We’re just going to look at the part where they describe the implementation details, specifically this sentence: the convolution stride is fixed to one pixel, and the spatial padding of the conv layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is one pixel for 3×3 conv layers. So if the kernel is 3×3, the padding is 1 and the stride is 1. Also, max pooling is performed over a 2×2 kernel with a stride of 2.

Then they have several different VGG architectures. The one we’re going to focus on most is VGG16, because that’s the most popular one. I’ll show you at the end of the video how to implement each of them, so you can choose, but this is the one we’ll focus on. It’s called VGG16 because it has 16 weight layers. If we go through it briefly, they write conv3, where the 3 means a 3×3 kernel, and as we saw in the text above, a 3×3 kernel goes with a padding of 1 and a stride of 1. That’s the nice thing about the VGG architecture: with those parameters, the spatial resolution of the feature maps always stays the same, so it’s a "same" convolution. The last number is the number of channels, or filters, which is the number of output channels. We input RGB images, so we have three input channels. After the first layer there are 64 output channels, then 128, 256 and 512: the number of channels doubles from one group of conv layers to the next (first after two conv layers, later after three) until it reaches 512. I guess you could call those groups blocks.
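To see why a padding of 1 preserves the resolution for a 3×3 kernel with stride 1, while a 2×2 max pool with stride 2 halves it, here is a minimal sketch of the standard output-size formula along one spatial dimension (assuming no dilation). The helper name conv_output_size is hypothetical and not from the video.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension for a conv or pooling layer
    (no dilation): floor((n + 2*padding - kernel_size) / stride) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1


# VGG conv layer: 3x3 kernel, stride 1, padding 1 -> resolution preserved
print(conv_output_size(224, kernel_size=3, stride=1, padding=1))  # 224

# VGG max pool: 2x2 kernel, stride 2, no padding -> resolution halved
print(conv_output_size(224, kernel_size=2, stride=2))  # 112
```

With k = 3, p = 1 and s = 1 the formula reduces to n + 2 - 3 + 1 = n, which is exactly the "same" convolution the paper describes.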
In between those blocks we have a max pool with a 2×2 kernel and a stride of 2, which means the spatial size is halved. Let’s say we have a 224×224 input image. After each of these conv layers the size stays exactly the same, so it’s still 224×224, but after the max pool it’s 112×112, after the next one it’s 112 divided by 2, and so on. After all of those conv layers there’s a final max pool, and then we have three fully connected layers. That’s the basics of the architecture. Now let’s go to the code and implement it from scratch.

I’ve summarized the VGG architecture as a list, where the integer values represent the output channels of a conv layer and "M" means a max pool. That list is the conv part of the network; after it we flatten and use three linear layers. What we want to do now is create a class, which we’ll call VGG_net, that inherits from nn.Module. We define an __init__ that takes the number of input channels and the number of classes, and the first thing we do is call super(VGG_net, self).__init__(), which runs the __init__ of the parent class. We could do what we’ve done in previous videos and write self.conv1 = nn.Conv2d(...) and so on for every layer, but we’re going to do something more clever: it generalizes better, it lets us implement all of the VGG architectures, and the code is cleaner as well. So we’ll have a forward method as always, and we’ll also create another function.
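As a sketch of how the list summary encodes the shapes, the snippet below walks the VGG16 list and tracks the channel count and spatial size of a 224×224 input. The VGG16 list follows the summary described above; the function name trace_shapes is hypothetical.

```python
# Integers: output channels of a 3x3 conv (stride 1, padding 1).
# "M": 2x2 max pool with stride 2.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(architecture, in_channels=3, size=224):
    """Return (final channels, final spatial size, sizes after each pool)."""
    channels = in_channels
    sizes_after_pool = []
    for x in architecture:
        if isinstance(x, int):
            channels = x          # conv changes channels, keeps spatial size
        elif x == "M":
            size //= 2            # max pool halves height and width
            sizes_after_pool.append(size)
    return channels, size, sizes_after_pool


print(trace_shapes(VGG16))  # (512, 7, [112, 56, 28, 14, 7])
```

The final 512×7×7 = 25,088 values are what the flatten step hands to the first linear layer.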
We’ll call it create_conv_layers, and we send in the architecture. In __init__ we set self.in_channels = in_channels, with defaults of 3 input channels and 1000 classes. Then we create all the conv layers with this function, something like self.conv_layers = self.create_conv_layers(VGG16), passing in the VGG16 list that describes how to construct the conv layers. All the information we need is stored in that list, since the kernel is always 3×3 with a padding of 1 and a stride of 1, no matter what the output channels are.

Inside the function, we set layers to an empty list and in_channels to self.in_channels, and then go through each element of the architecture list: for x in architecture. If the type of x is int, we know it’s a conv layer, so first we set out_channels = x. For the first layer the in channels are 3 and the out channels are 64. Then we add the layers with layers += [...]; building them all in this for loop makes the code a lot cleaner. For the conv layer we pass in_channels=in_channels, out_channels=out_channels and kernel_size=(3, 3); we don’t have to write it as a tuple, but it’s clearer. Then a stride of 1 and a padding of 1. After that, we’re going to add something that isn’t in the original paper.
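Because every conv is 3×3 with stride 1 and padding 1, the list alone fully determines each conv layer. Here is a minimal pure-Python sketch that walks the list the same way the loop above does and counts each conv's weights and biases. The function name conv_layer_specs is hypothetical, and BatchNorm parameters are deliberately not counted.

```python
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def conv_layer_specs(architecture, in_channels=3):
    """For each 3x3 conv (stride 1, padding 1) in the config, return
    (in_channels, out_channels, number of weights + biases)."""
    specs = []
    for x in architecture:
        if isinstance(x, int):
            out_channels = x
            # 3*3*in*out kernel weights plus one bias per output channel
            num_params = 3 * 3 * in_channels * out_channels + out_channels
            specs.append((in_channels, out_channels, num_params))
            in_channels = out_channels  # next conv consumes this one's output
        # "M" entries (max pools) have no parameters and keep the channel count
    return specs


specs = conv_layer_specs(VGG16)
print(len(specs))                   # 13
print(specs[:3])                    # [(3, 64, 1792), (64, 64, 36928), (64, 128, 73856)]
print(sum(p for _, _, p in specs))  # 14714688
```

Thirteen conv layers plus the three fully connected layers give the 16 weight layers in the name.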
You don’t have to include it, but there’s really no reason not to: the only reason it’s not in the VGG architecture is that it hadn’t been invented at the time. We’re going to include a BatchNorm layer and then a ReLU, which is a pretty standard Conv-BatchNorm-ReLU combination. You can remove the BatchNorm; it’s not in the original VGG paper, and we just include it here because it should improve performance. Let’s see, is there a missing parenthesis here?

Next, x is the number of out channels of the layer we just created, and that has to be the number of in channels for the next conv layer, so we set in_channels = x. If we’re looking at the first element in the list, the in channels for the next layer are now 64. If x isn’t an integer, we know it’s a string, so we write elif x == "M", and all we do then is add a max pool with a kernel size of 2×2 and a stride of 2×2. At the end we return nn.Sequential(*layers), unpacking everything we stored in the list, and nn.Sequential creates one block out of all the Conv-BatchNorm-ReLU layers and max pools we’ve created. We’ve already called create_conv_layers and stored the result in self.conv_layers, so that was the conv part. What’s left is the flatten and the fully connected layers, so we create self.fcs and again use nn.Sequential.
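Here is a minimal standalone sketch of the loop body and the final unpacking, assuming only the first VGG16 block ([64, 64, "M"]) for brevity. It builds Conv-BatchNorm-ReLU triples, updates in_channels after each conv, appends the 2×2 max pool, and wraps everything with nn.Sequential(*layers). The variable names are illustrative.

```python
import torch
import torch.nn as nn

layers = []
in_channels = 3
for x in [64, 64, "M"]:  # first VGG16 block only, for illustration
    if isinstance(x, int):
        layers += [
            nn.Conv2d(in_channels, x, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(x),  # optional: not part of the original VGG
            nn.ReLU(),
        ]
        in_channels = x  # the next conv takes this layer's outputs as inputs
    elif x == "M":
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))

block = nn.Sequential(*layers)  # unpack the list into positional arguments
print(len(block))  # 7: two (conv, batchnorm, relu) triples plus one max pool

out = block(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 64, 112, 112])
```

The convs keep 224×224 and only the max pool halves it, which matches the shape trace of the full network.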
When we’re using a lot of layers like nn.Linear and nn.Conv2d, wrapping them in an nn.Sequential makes the code more compact. All we need to do now is create the linear part. The first layer is an nn.Linear with 512 × 7 × 7 inputs: there are 512 channels, and what’s left of the image is 7×7. We can quickly calculate that: we start at 224 and have five max pools, so 224 divided by 2 to the power of 5 is 7. The output size is 4096, which is just what they chose. Then we add an nn.ReLU() and an nn.Dropout. I don’t think I mentioned dropout when we went through the implementation details in the paper, but they do use dropout in the fully connected layers as well. Then there’s another linear layer from 4096 to 4096, another ReLU, another dropout, and a final linear layer whose output is the number of classes.

That’s it for the layers: we create the conv layers and then the fully connected part, and in forward we call them. So x = self.conv_layers(x), and then we reshape, because we want to flatten the output before the linear part: x = x.reshape(x.shape[0], -1). We send that flattened tensor to the fully connected layers, x = self.fcs(x), and return x. Now let’s check that it actually works. We create model = VGG_net(in_channels=3, num_classes=1000), generate some random data with torch.randn, say a single image with 3 input channels of size 224×224, and then print(model(x).shape).
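Putting the pieces together, here is a sketch of the whole class as described so far, plus the shape check (with the device selection discussed next). The names VGG_net, create_conv_layers, conv_layers and fcs follow the transcript; the dropout probability of 0.5 is an assumption (it is nn.Dropout’s default and the ratio the paper reports), and isinstance stands in for the type check. The model has roughly 138 million parameters, so the check needs a few hundred MB of memory.

```python
import torch
import torch.nn as nn

# VGG16 conv part: integers are output channels of a 3x3 conv
# (stride 1, padding 1); "M" is a 2x2 max pool with stride 2.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


class VGG_net(nn.Module):
    def __init__(self, in_channels=3, num_classes=1000):
        super(VGG_net, self).__init__()
        self.in_channels = in_channels
        self.conv_layers = self.create_conv_layers(VGG16)
        # Classifier head: a 224x224 input leaves 512 channels of size 7x7
        self.fcs = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),  # assumed rate (PyTorch default, as in the paper)
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = x.reshape(x.shape[0], -1)  # flatten all but the batch dimension
        return self.fcs(x)

    def create_conv_layers(self, architecture):
        layers = []
        in_channels = self.in_channels
        for x in architecture:
            if isinstance(x, int):
                out_channels = x
                layers += [
                    nn.Conv2d(in_channels, out_channels,
                              kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                    nn.BatchNorm2d(out_channels),  # not in the original paper
                    nn.ReLU(),
                ]
                in_channels = x  # next conv takes this conv's outputs
            elif x == "M":
                layers.append(nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)))
        return nn.Sequential(*layers)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = VGG_net(in_channels=3, num_classes=1000).to(device)
x = torch.randn(1, 3, 224, 224).to(device)
with torch.no_grad():  # no gradients needed for a shape check
    out = model(x)
print(out.shape)  # torch.Size([1, 1000])
```

Swapping VGG16 for another configuration list only changes the conv part; the head stays the same as long as the input is 224×224.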
What we’re doing is generating random data shaped like an image. In this case we send in a single image and print the shape of the output, which should be 1 by 1000. Let’s run this. It might be slow... okay, it was pretty fast, and we get 1 by 1000, which is what we expected. This architecture is fairly large. It’s not large by today’s standards, I guess, but if you don’t have a great CPU it might take a while. What we can do is set device to "cuda" if torch.cuda.is_available() else "cpu", and then call .to(device) on the model and on the input, which should make it faster. Now it runs on the GPU.

One more thing: we’ve only implemented VGG16, but the way we implemented it is very general. Let me paste in this piece of code; all we need to change is one thing, and it becomes a general implementation. I replace the single list with a dictionary that includes VGG11, VGG13, VGG16 and VGG19. The fully connected part is the same for all of those architectures; the only difference between, say, VGG16 and VGG19 is that VGG19 has more conv layers, and that’s represented in the list. So the only change is that we call create_conv_layers with VGG_types["VGG16"], and it works exactly the same. You can also change it to VGG11, VGG13 or VGG19, depending on which one you want to use. Hopefully this was clear. If you have any questions, please leave them in the comments. Thank you so much for watching the video, and I hope to see you in the next one.
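Here is a sketch of what that dictionary could look like, assuming it follows the standard configurations A, B, D and E from the paper; the exact spelling VGG_types is reconstructed from the transcript. Counting the integer entries and adding the three fully connected layers recovers the number in each name.

```python
VGG_types = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # the classifier head is identical across all variants

for name, arch in VGG_types.items():
    num_conv = sum(isinstance(x, int) for x in arch)  # each int is one conv
    print(f"{name}: {num_conv} conv + {NUM_FC_LAYERS} FC = "
          f"{num_conv + NUM_FC_LAYERS} weight layers")
# VGG11: 8 conv + 3 FC = 11 weight layers
# VGG13: 10 conv + 3 FC = 13 weight layers
# VGG16: 13 conv + 3 FC = 16 weight layers
# VGG19: 16 conv + 3 FC = 19 weight layers
```

Every variant has exactly five max pools, so a 224×224 input always ends at 512×7×7 and the same fully connected head fits all four.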