[music] Hello all, my name is Krish Naik and welcome to my YouTube channel. So guys, we'll continue the discussion on advanced convolutional neural network architectures, and in this particular video we are going to discuss the VGG16 architecture. I'll also briefly discuss VGG16 and VGG19, which are the two variations of VGGNet. In my previous video, if you remember, we had already discussed the AlexNet architecture. Now, this particular CNN model is very, very efficient, and according to my experience, even compared with more advanced architectures like ResNet, VGG16 has performed better in some scenarios, so it is very important that you understand the whole architecture and how it works. We'll also compare the basic differences between VGG16 and AlexNet, and then we'll see why VGG16 performs better and which disadvantages of AlexNet we are trying to overcome with the help of VGG16. All these things will be covered in this particular video. If you're new to this channel, guys, please do subscribe and press the bell notification icon, because I'm uploading two to three videos every day nowadays, so I hope you'll like it. Okay, now let us go ahead and discuss the VGG16 architecture. To begin with, guys, suppose this is your whole architecture. The source of this particular image is researchgate.net.
Okay, so the whole credit goes to researchgate.net. Over here you can see that the image is passed through convolution layers, and the count I'd like to keep for this first block is two, because there are two convolution layers. One thing you need to note over here is this 224 x 224 x 64 — that is the output of the first block, where 224 x 224 is the height and width. The input image itself is 224 x 224 x 3, where 3 is the RGB channels. When it passes through these convolution layers, we get this output — don't worry, I'll discuss the kernel size, the filter size, and how many filters are used in my next diagram. But in short: first of all we have two convolution layers, then one max pooling layer, then again two convolution layers, then again one max pooling layer. Then we have three convolution layers and one max pooling layer, then again three convolution layers and one max pooling layer, then again three convolution layers and one max pooling layer. Then I have two fully connected layers, and my output is 1000. Why 1000? Because this whole model was trained on the ImageNet dataset, where you have 1000 categories and millions of images, and this architecture did very well in the ImageNet competition — the paper came out around 2014. We'll also see the research paper, don't worry about it. Now, with respect to this, I can basically write:
Initially we use two convolution layers, then one max pooling, then two convolution layers, then one max pooling, then three convolution layers and one max pooling, then three convolution layers and one max pooling, then three convolution layers and one max pooling. I guess I have written 2-1, 2-1, 3-1, 3-1, 3-1 — perfect. And then I have one fully connected layer, another fully connected layer, and finally the fully connected output layer of 1000 categories. So this is the overall architecture. One or two important things you have to see in this architecture: you are providing your image as 224 x 224 x 3. This is based on the original architecture, but you can always change the image size whenever you implement this — you can give 50 x 50 x 3; it depends on the image quality that you actually have. Now, this was the basic architecture; I still have not discussed how we got these values. That I'm going to cover in my next diagram, where you'll also see what filter sizes we will be using. I really want to mention the filter sizes so that you will be able to do the calculations. And remember, guys, this is a very, very simple model, very easy to understand — you can remember it if you just know this particular pattern. Now, what is the main problem in AlexNet? If I take that example, you see that it uses a 227 x 227 x 3 input, sometimes a convolution filter size of 11 x 11 with a stride of 4,
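The 2-1, 2-1, 3-1, 3-1, 3-1 pattern above can be sketched as a quick sanity check in plain Python (a minimal sketch — the block layout and the halving at each pooling layer follow the architecture just described):

```python
# VGG16 block pattern: (number of 3x3 conv layers, number of filters)
# per block, each block followed by one 2x2 max pooling layer, stride 2.
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size = 224  # input spatial size (224 x 224 x 3)
trace = [size]
for convs, filters in blocks:
    # 3x3 convs with stride 1 and 'same' padding keep the spatial size;
    # the max pooling layer at the end of each block halves it.
    size //= 2
    trace.append(size)

conv_layers = sum(c for c, _ in blocks)  # 13 convolution layers
total_weight_layers = conv_layers + 3    # + 3 fully connected layers
print(trace)                # [224, 112, 56, 28, 14, 7]
print(total_weight_layers)  # 16 -> the "16" in VGG16
```

So the 16 in VGG16 simply counts the 13 convolution layers plus the 3 fully connected layers.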
then you suddenly have 96 kernels, then suddenly in max pooling you have 3 x 3, then again a convolution layer with 5 x 5 — so there you see a lot of variation, right, and it is very difficult for people to remember all these things, and it also becomes difficult to understand when there are so many changes. So, in order to overcome this, they came up with VGG16, and VGG16 is very, very simple. In VGG16, all the convolution layers — remember this — have a filter size of 3 x 3, a stride of 1, and 'same' padding. What does this basically mean? It means that when I pass this 224 x 224 x 3 image through a convolution layer, the spatial size of the output stays the same as the input; only this changes — the number of kernels. So here, in short, you are applying a 3 x 3 filter, and the number of filters you are using is 64 — that you need to remember. So when you apply this, you get this output. But there are only two kinds of layers being used, right: convolution layers with ReLU and max pooling layers. In max pooling you have a filter size of 2 x 2 and a stride of 2. So this is the property of the convolution layers and this is the property of the max pooling layers. Every time we use a stride of 2, here you can see that this 224, when passed through the max pooling layer, gives an output that is half of the input. So how do I calculate that?
You just have to use this formula: output = (n + 2p − f) / s + 1. I have already explained this formula in my AlexNet architecture video. In this particular case, what is my n? n is nothing but 224. There is no padding here. Then minus f — what is my filter size? My filter size is 2. Divided by the stride — the stride is again 2. So it is nothing but 222 divided by 2, which is 111, and the plus 1 is still there, so I add 1 and get 112. That is the reason why I'm getting 112 over here. That is how each and every computation is done. Now you see that in this next convolution layer again 3 x 3 is there, the stride is 1, and the padding is 'same', so even though we pass a 112 x 112 image, we'll be getting a 112 x 112 output itself, and in this particular case 128 is the number of kernels being applied over here — again the filter size is 3 x 3, only the number of filters changes. After this, if we again pass through the max pooling layer and apply this particular formula to 112, we will be getting the output as 56 x 56. Again over here you can see the number of filters changing: initially we had 64 filters, here we had 128, here we had 256. Similarly, in the next max pooling this 56 will get reduced to 28, and here the number of kernels is 512. Again, whenever we apply max pooling with filter size 2 x 2 and stride equal to 2, you will see that your image size gets reduced by half — and the number of kernels does not change when you apply max pooling. So you can see this getting divided by 2.
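The formula above can be written as a small helper function (a minimal sketch of the (n + 2p − f)/s + 1 calculation, reproducing the 224 → 112 → 56 steps):

```python
def output_size(n: int, f: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer:
    (n + 2p - f) // s + 1, where n = input size, f = filter size,
    s = stride, p = padding."""
    return (n + 2 * p - f) // s + 1

# 224 x 224 input through a 2x2 max pooling layer with stride 2:
print(output_size(224, f=2, s=2))       # 112
# 112 through the next 2x2, stride-2 pooling layer:
print(output_size(112, f=2, s=2))       # 56
# A 3x3 convolution with stride 1 and padding 1 ('same') keeps the size:
print(output_size(224, f=3, s=1, p=1))  # 224
```

The same helper also confirms why 'same' padding with a 3 x 3 filter leaves the spatial size unchanged.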
I'll not just say "divided by 2"; instead I'll apply this particular formula, and then from 14 I'll be getting 7 x 7, with 512 as my number of kernels. And after this it is pretty simple: a fully connected layer. If you multiply all these — 7 x 7 x 512 — and flatten it, we connect it to 1 x 1 x 4096. Finally, you will see that the next layer also has 4096, and the number of outputs is based on ImageNet — so this is the 1000 outputs that you are getting. But two important things. If anybody asks you what the advantages are compared to AlexNet, and the second question — what is the main thing — you'll say that in each and every convolution layer the filter size is 3 x 3, the stride is 1 and the padding is 'same', but in the case of the max pooling layers you have a filter size of 2 x 2 and a stride of 2, so every time you pass through a max pooling layer the image size gets reduced by half. That is how the whole architecture actually works, and this is pretty important. Now I'm going to explain how it is better than the previous AlexNet architecture. First of all, guys, there are more layers: in this particular network we have 16 layers; in VGG19 we will be having 19 layers, so again there will be a slight difference in how the architecture behaves. I'll just show you in the coding — you'll be able to see everything in the coding, like how the architecture looks when we see the model summary. Let me erase this quickly, because I really have some more things to explain, and you need to understand this. After AlexNet, trust me, guys,
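The flattening step above is just arithmetic, but it is worth checking, because this first fully connected layer holds a huge share of the network's parameters (a minimal sketch of the calculation):

```python
# The last pooling output is 7 x 7 with 512 kernels; flattening it gives
# the input vector for the first fully connected layer of 4096 units.
flattened = 7 * 7 * 512
print(flattened)   # 25088

# Weights + biases of that first 4096-unit fully connected layer:
fc1_params = flattened * 4096 + 4096
print(fc1_params)  # 102764544 -> over 100 million parameters in one layer
```

This is one reason include_top=False (used later in the coding part) saves so much: dropping the fully connected head removes most of the model's weights.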
the AlexNet architecture was the architecture from which this advanced deep learning in CNNs came — people started experimenting with different convolution layers, max pooling layers and all. Okay, now let me just go to the next slide. In this slide you will be able to see that we are passing through a 3 x 3 convolution layer with 64 filters, then 3 x 3 again — the same thing, whatever I have explained — and finally you will be getting the 1000 outputs. This is the architecture, and remember, in the VGGNet architecture you have two variations: one is VGG16 and the other one is VGG19 — again, a small architecture change — so VGG19 may perform better than VGG16, but only by a small accuracy gap, not much. Now let's compare AlexNet and VGGNet. We saw that in AlexNet, sometimes you use 11 x 11, sometimes you use 5 x 5, suddenly you apply padding equal to some value, suddenly you say the stride is equal to 4 — this is really, really difficult to remember, because they experimented with all these things; they put a lot of effort into experimenting, and that is what they ended up using. But with respect to VGGNet, you have understood: yes, my convolution layers will have a 3 x 3 filter size, padding equal to 'same', and stride equal to 1. And similarly with the max pooling layers, we did the same thing throughout: you had a 2 x 2 filter and a stride equal to 2. This is the thing. And one more thing you can see here: in AlexNet the layers are only eight. And if you know, guys,
if you're using ReLU, then even if you create a deep, deep convolutional neural network, the ReLU will take care of the vanishing gradient problem — that I have already explained in my complete deep learning playlist. So this is also a deep convolutional neural network, and you will be able to extract more features from the images. And remember, the last layer uses softmax, because that softmax will actually be giving you the 1000 category probabilities. If you have two binary categories, you basically use sigmoid, but in this particular case it is softmax. So if an interviewer asks what the difference is between VGG16/VGGNet and AlexNet, you should say that, okay, AlexNet has fewer layers when compared to VGGNet. And you know that both of these networks use ReLU, so even though we create a deep convolutional neural network, there is definitely much less chance of a vanishing gradient problem, because we are using ReLU — we sometimes use ReLU precisely to overcome that. Then you can actually explain all of this, and you can also make one more point: in AlexNet they — not randomly, but after experimenting — selected various filter sizes, whereas here you have a uniform architecture, and it is also very, very easy to remember all these things. So these are some of the things you can compare between AlexNet and VGGNet. Now, once you have understood this architecture, the next thing you definitely need to understand is the coding, right? Unless and until I show you the coding, everybody will shout at me, so it is better that I show you the coding part also. And the coding part is very, very simple, guys — you just have to believe in Keras, that's it. You just have to believe in Keras. So I'll just try to show you; I'll minimize this. Now, this is the research paper, guys:
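The softmax-versus-sigmoid point above can be illustrated with plain Python (a minimal sketch, not the Keras implementation — the logits here are made-up example values):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalise so the
    # outputs form a probability distribution over all categories.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # A single probability, as used for a binary (cats-vs-dogs) classifier.
    return 1.0 / (1.0 + math.exp(-x))

probs = softmax([2.0, 1.0, 0.5, -1.0])  # e.g. four disease categories
print(round(sum(probs), 6))  # 1.0 -> softmax outputs sum to one
print(sigmoid(0.0))          # 0.5 -> sigmoid gives one probability
```

So a 1000-way ImageNet head uses softmax over 1000 nodes, while a two-class problem can use a single sigmoid node.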
"Very Deep Convolutional Networks for Large-Scale Image Recognition". Now remember, the coding is pretty simple. One option is that you can write it on your own, okay — you can write the whole thing just by seeing the architecture; I also showed you in the AlexNet video how you can write your own, and similarly you can write this one. But all these models are already present in Keras, and you can reuse the weights — you can see you have Xception, InceptionV3, VGG16, VGG19, ResNet and all. So through Keras, if I go to VGG16, this is what you just have to use in order to download the weights of VGG16. So let me go over here. Now, in this example, what I'm actually going to do, guys, is solve a problem statement which is called cotton disease classification. Okay, so I have these four categories. I think I've also covered this in a live class, but I really want to show you this particular example, because it will be very important for you. So these are my images — you can see over here all my images with respect to the different categories — and if you don't have this problem statement, take the simple cats-and-dogs dataset from Kaggle and you can implement it on your own. So this is the image classification — disease leaf classification — that I'm actually going to do over here; remember, the output has four categories. Now, how do you download the weights? I'm going to show you for VGG16 and VGG19 using Keras. Again, this material will be present in the GitHub. Now, don't worry about this particular code, guys — this code basically says what memory fraction of the GPU you have to use, because, if I show you, I have a GPU which is called NVIDIA TITAN RTX. So let me just show you: nvidia-smi. Okay, so if you see over here, you'll see that I have a TITAN RTX and all the other things — what my CUDA version is, everything, the 24 GB of VRAM
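The GPU memory code mentioned above limits how much VRAM TensorFlow grabs up front. The exact snippet in the video may differ (it talks about a memory fraction); a hedged sketch of the common TF 2.x way, using the tf.config API, would be:

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of grabbing
# all of the card's VRAM (e.g. the 24 GB of a TITAN RTX) at startup.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```

On a machine without a GPU the loop simply does nothing, so the rest of the code still runs on CPU.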
I actually have, how much is being used — all these things are available over here. Now, first I'll just execute this; this is just to allocate how much GPU memory I want to be used. And remember, guys, the TensorFlow version: if I print tf.__version__ and execute it, you'll be able to see that I'm having 2.2.0. So, for that reason, if you have a version less than 2.0, just remove the "tensorflow." prefix from the start and begin the imports from keras.layers and so on. So here I'm importing Input, Lambda, Dense, Flatten, then I have Model, and then you can see again — if you go and see over here — from tensorflow.keras.applications.vgg16 I've written the import of VGG16. So I've done this. You can also use the ImageDataGenerator for data augmentation, and finally you also have the Sequential model. Now, once I execute this — "No module named keras"? Where is Keras? Which line is this? Okay, so I made a blunder over here; as I showed you, that particular import only works if your TensorFlow is less than 2.0. So I've fixed and executed this. Then I have given the image size as [224, 224]. Why did I give that? Because VGG16 expects 224 x 224, so as a good practice I'm giving that same size — again, you can give your own size. Okay, then I'm giving my training path and my test path. So this is my train and test. Now, this is the most important statement. I'm just going to use the VGG16 library, as shown below. Since I have imported VGG16, here I'll be using it: the input image shape is nothing but IMAGE_SIZE + [3], so this basically says 224 x 224 x 3, where 3 is my RGB channels. Always remember: if you really want to reuse the weights, the weights parameter should be given as 'imagenet'.
Please do remember this; this is very, very important to understand. And then we have include_top=False. This basically says: remove the top of the network. We definitely know that the pretrained network's first layer expects 224 x 224, but sometimes people want to put their own input image size, and apart from that, the last layer will definitely have 1000 categories — whereas my problem statement in this case has four categories. Suppose you are really developing your own problem statement — you want to develop a classifier for cats and dogs — you may require two categories, so at that time you have to put a two-node dense layer at the output. So I have removed the top, and here you can see that it is quickly downloading the VGG16 model — it has downloaded it from this URL itself. So that is where your base model is basically done. Now, this code is very, very important. In this code I have basically said: for layer in vgg16.layers: layer.trainable = False. Remember, we are using the existing ImageNet weights, so we need not retrain those weights — that is pretty important; the only training that should happen is in the last layer, not in these middle layers. So for this you can write it like this, and you can basically set the layers' trainable attribute to False. Now, how many output classes do I have? I have written generic code, guys, so that you get the number of output classes — we have used the glob function. So here you see that once I execute this, I will be getting the folders, and if I execute it over here — what is in my folders? folders will give me the path of every class directory. Now, if I go and calculate the length of this folders list, it will basically give me four. Then, obviously, you know from the architecture that we have to flatten this layer
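The glob trick for counting output classes can be sketched like this (a minimal sketch; 'Datasets/train' is a hypothetical path — substitute your own training folder, which contains one sub-folder per category):

```python
from glob import glob

def count_classes(train_path: str) -> int:
    """Each class lives in its own sub-folder of the training directory,
    so the number of matching paths is the number of output categories."""
    folders = glob(train_path + '/*')
    return len(folders)

# e.g. for the cotton disease dataset with four category folders:
# num_classes = count_classes('Datasets/train')   # would give 4
```

This keeps the code generic: pointing it at a cats-and-dogs folder would give 2 instead of 4, with no other changes needed.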
after this, right? So flattening the layer is required, and for flattening the layer, what I'll do is apply Flatten to vgg16.output — sorry, I had written this differently; you just have to write vgg16.output. Once you execute this, you can check the length of folders; here also I check the length of folders. Then you can see over here I've used the length of folders for my output layer, because the length of folders is nothing but four — that basically means four output categories are put inside the dense layer, and the activation is the softmax function. If you don't remember, guys, here is my softmax activation function that I have applied, and this gets applied in the last layer, which is pretty important — I think I've covered all these things in my live sessions also. And finally, you take your input and create your model, where inputs is nothing but vgg16.input and the output is nothing but whatever dense layer you created — this prediction is basically the output. So execute this, and once you see your model summary, this is your whole model summary. After executing this code you can actually see your model.summary() — remember to check that the last layer has four nodes. After this you can do the compile, and these are all simple operations: you can read the data — you can basically read your training data — and this is basically the data augmentation technique that you want to apply. Remember, data augmentation should not be applied to the test data. This code you can see in the GitHub; I've already given it in the description. So this is basically your training data being read and your test data being read, and then you can actually start running it here.
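Putting the steps above together, a minimal sketch of the transfer-learning model (here weights=None is used so the sketch runs without downloading anything; pass weights='imagenet' to reuse the pretrained weights as in the video, and num_classes=4 stands in for len(folders)):

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

IMAGE_SIZE = [224, 224]
num_classes = 4  # e.g. the four cotton disease categories

# Load the convolutional base; include_top=False drops the 4096/4096/1000
# fully connected head so we can attach our own output layer.
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights=None, include_top=False)

# Freeze the base so only the new head gets trained.
for layer in vgg.layers:
    layer.trainable = False

x = Flatten()(vgg.output)                        # 7 x 7 x 512 -> 25088
prediction = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=vgg.input, outputs=prediction)

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.summary()  # the last layer should show num_classes nodes
```

Swapping the base for VGG19 (discussed below) only means changing the import and the constructor call; the head-building code stays identical.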
I've given the epochs as 20. If you see, after the end of training I'm able to get somewhere around 69%; I just did 20 epochs — if you increase it to 30 or 40, you'll be able to get more. This is how your graph looks, right — it is in increasing order — and then you can also save this model. So instead of writing "resnet" — because I don't know why I have written "resnet" here — this should be the VGG16 model; you can save it in the .h5 file and it will get saved in the same location. Now you can take your own image as an example, guys; just follow the steps and I think you will be able to understand each and everything. Okay, now one more thing about this: I told you about VGG16, and there is one more architecture, VGG19. In order to use VGG19, instead of writing VGG16 just make it VGG19 in the import and in the model call. Once you execute it, you'll see it will work absolutely fine over here: instead of VGG16 just write VGG19, change the variable name over here, make your layers' trainable attribute False, and put in all the information that you want based on the changes you have made — that's it; that is how VGG19 will actually work, and this is a very, very common architecture itself. So I hope you like this particular video. Please do subscribe to the channel if you have not already subscribed, and guys, these are important to understand — these are some advanced CNN techniques. In the upcoming videos you'll be seeing that I'll be discussing ResNet, Inception, MobileNet — there are many things; you can see over here different versions of ResNet are there, like we discussed VGG16 and VGG19. I missed Xception — Xception also I'll try to include. We have MobileNet, MobileNetV2 — a lot of these transfer learning techniques are there; we'll discuss those also. So I hope you like this particular video. Please do subscribe to the channel. I'll see you all in the next video. Have a great day.
Thank you all, bye.