Batch Normalization Pytorch | Batch Norm In Pytorch – Add Normalization To Conv Net Layers







Batch Norm In Pytorch - Add Normalization To Conv Net Layers


Welcome to deeplizard. My name is Chris. In this episode, we're going to see how we can implement batch norm in our convolutional neural network using PyTorch.

All right, so I want to implement batch norm in the neural network that we've been using throughout this course. If you're following the course and you saw the last couple of videos, they're kind of prerequisites to this. Two videos ago we did data normalization, and we saw that in order to normalize our data before passing it to the network, we first calculate the mean, then we calculate the standard deviation, and this allows us to normalize our dataset using those two values in a process known as standardization. This is where we calculate a z-score by taking each pixel, subtracting the mean from it, and then dividing that result by the standard deviation.

At the end of the video on normalization, I mentioned that the standardization process we went through to normalize our input data is also the same process that's used in batch norm, where we normalize not the input data but the activations after a particular layer. The idea is that we want to normalize the data coming out of one layer so that it's normalized when it goes into the next layer, so those processes are very similar.

So I'm here in the batch norm paper, where batch norm was introduced as a method, and I just want to show you the calculation that they cite, which is the calculation that we implemented before when we normalized our data using standardization. They have a little sequence of formulas here. The first one is calculating the mean, and what we can see here is what we did.
The sum of every pixel divided by the total number of pixels gives you the mean, and they're calling this mu sub B (it looks like a B or a beta, but mu sub B). We take that and use it in the calculation of what they're calling the variance. I call this the mean squared error: we sum the square of the difference between each pixel value and the mean that was calculated before, and then we divide all of that by the total number of pixels. Then finally, what we end up with is what they're calling normalize: this is the value x for each pixel, and we subtract the mean and then divide by the standard deviation.

The difference between the normalization that we saw before and batch norm is that there are also these scale and shift parameters, and these are learnable parameters that exist inside a batch norm layer. If you want to learn more about these particular parameters and the batch norm process, be sure to see the batch norm video in the Deep Learning Fundamentals course, where we've created a dedicated video that goes into that discussion. In this video, we're going to focus on getting batch norm added to the convolutional neural network that we've been working with throughout this course, so let's do that now.

As we've been doing each time we create new variations in our process, we test the previous version of our network, or whatever parameters we had set, against the new version. So in this case, I want to create two networks: one the old way, and a second one with batch norm added. Then we'll run both of these through the training process and see what the difference is. All right, so here's what we're going to be doing.
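For reference, the sequence of formulas from the paper walked through above can be written out as follows. Here m is the number of values in the mini-batch. The small constant epsilon (added for numerical stability) and the symbols gamma (scale) and beta (shift) come from the paper's notation rather than from the narration.

```latex
\begin{align*}
\mu_B &= \frac{1}{m}\sum_{i=1}^{m} x_i
  && \text{mini-batch mean} \\
\sigma_B^2 &= \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2
  && \text{mini-batch variance} \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
  && \text{normalize} \\
y_i &= \gamma\,\hat{x}_i + \beta
  && \text{scale and shift (learnable)}
\end{align*}
```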
If you haven't seen the previous video, we are no longer using the class definition of our network; we're going to be using the sequential way of defining our network. We discussed everything about how to use the Sequential class to build models rapidly and on the fly in the last video of the course, so now we're going to use the Sequential module to do just that, so we can test two variations of our network, one with batch norm and one without.

All right, so let's just take a quick look at our network. This network is going to be defined sequentially. The first layer is a conv layer, then we have a ReLU, then we have a max pool. Then we have our second conv layer, another ReLU, and a max pool. Then we're going to flatten the output coming from the last conv block with its max pool. We start the flatten operation at the first dimension, because we don't want to flatten our batch; we want to flatten each image within the batch. So if our batch axis is here, then we're going to flatten every image in the batch, but we won't change our batch size, and that's what this start_dim parameter indicates. Then we have a linear layer followed by a ReLU, followed by another linear layer, followed by a ReLU, and then finally our output layer, which is going to output ten predictions because we're using the Fashion-MNIST dataset, which has ten prediction classes.

So let's run this and initialize this network. Actually, something that I noticed is that we should set the seed, so I'm going to do torch.manual_seed, and we'll go with 50; that's just arbitrary. What this is going to do is make sure that the weights that are created randomly for both of these networks are the same. I think it'll still work even though this one's going to have batch norm.
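A minimal sketch of the first network as described above, assuming single-channel 28x28 Fashion-MNIST images. The kernel sizes, pooling strides, and the 12 output channels of the second conv layer are assumptions chosen to be consistent with the 12 by 4 by 4 flattened size and the 120, 60, and 10 feature counts mentioned in the episode; the name network1 is illustrative.

```python
import torch
import torch.nn as nn

# Seed value 50 is the arbitrary value mentioned in the episode.
torch.manual_seed(50)

# Kernel sizes and strides below are assumptions, not taken from the episode.
network1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    # start_dim=1 keeps the batch axis and flattens each image to 12*4*4 values.
    nn.Flatten(start_dim=1),
    nn.Linear(in_features=12 * 4 * 4, out_features=120),
    nn.ReLU(),
    nn.Linear(in_features=120, out_features=60),
    nn.ReLU(),
    nn.Linear(in_features=60, out_features=10),
)

# A random stand-in batch of 8 images to check the shapes.
batch = torch.randn(8, 1, 28, 28)
preds = network1(batch)
print(preds.shape)  # torch.Size([8, 10])
```

The batch size of 8 survives the flatten step unchanged, which is the point of starting the flatten at dimension 1.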
I think the weights should still be the same, but either way this won't hurt. What this is going to do is make sure that the random numbers that are generated for the weights are going to be the same.

Then down here is our second network, and I've already run the cell. The difference is that we're adding batch norm: we're adding batch norm here, and then we're adding batch norm down here. We're going to add one batch norm right after the first conv block. When our data comes in, it's normalized, so it's going to hit the first conv layer as normalized data, and then it's going to be transformed through this conv, ReLU, max pool. So then we're going to normalize it again using batch norm.

This is how we use batch norm: we access BatchNorm2d (notice the 2d there, because we're dealing with images), and the only thing we need to say here is how many input features are coming in. The number of input features to a 2d batch norm is going to be the number of out channels, which in this case is 6, coming from the conv layer. So we have 6 features coming in to this batch norm layer, and what that's going to do is normalize the data. Then those 6 normalized channels are going to come out and go into this conv layer as normalized data. The scale and shift parameters are also going to be inside the batch norm layer, and they're going to be updated throughout the learning process.

So we do a batch norm, then we do our conv, ReLU, max pool, then we're ready to flatten. Then we can do a linear layer and then a ReLU, and then we can do our other batch norm.
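A sketch of what the second network might look like, under the same assumptions as before (kernel size 5, 2x2 pooling, 12 channels out of the second conv layer; the name network2 is illustrative). It places nn.BatchNorm2d(6) after the first conv, ReLU, max pool block and nn.BatchNorm1d(120) after the first linear layer and its ReLU, the two places batch norm is added in this episode.

```python
import torch
import torch.nn as nn

torch.manual_seed(50)

network2 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.BatchNorm2d(6),    # 6 = out_channels of the conv layer above
    nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(start_dim=1),
    nn.Linear(in_features=12 * 4 * 4, out_features=120),
    nn.ReLU(),
    nn.BatchNorm1d(120),  # 120 = out_features of the linear layer above
    nn.Linear(in_features=120, out_features=60),
    nn.ReLU(),
    nn.Linear(in_features=60, out_features=10),
)

preds = network2(torch.randn(8, 1, 28, 28))
print(preds.shape)  # torch.Size([8, 10])

# The scale and shift parameters live inside the layer as weight and bias,
# one pair per channel, and they are updated by the optimizer during training.
bn = network2[3]
print(bn.weight.shape, bn.bias.shape)  # torch.Size([6]) torch.Size([6])
print(bn.weight.requires_grad)         # True
```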
You can do batch norm after every single layer, or you can sprinkle it throughout, and in this case we're going to sprinkle it throughout. If you want to try adding more batch norm into the network, then give it a shot, but for now we're just going to add it in two places. Here, this one is a 1d because we're dealing with one-dimensional tensors at this point, since we've flattened out our channels, and we're going to pass in the 120 features that are coming from the previous layer. So this 120 gets passed into here, which then feeds through to the next linear layer: we come in with 120, we come out with 60, and then we go into the output layer, passing those 60 features in, and we finish up with our 10 predictions. I've already run this cell, so we're ready to jump down and start working with both of these models.

All right, so to get set up for training, we're going to create a train set here. This is a non-normalized train set. The only reason we're doing this is so that we can recalculate our mean and standard deviation values, so that's what we'll do here. If you want to know more about this process, be sure to see the previous episode in this course. Next we create the normal train set, which is going to be a normalized train set, because we're going to pass the mean and standard deviation into the Normalize transformation, which we've seen in a past video. When we debugged it, we saw that this Normalize transformation all boils down to taking every pixel in the dataset, subtracting the mean, and dividing by the standard deviation.

All right, so now, as we saw before, to hook into the testing framework that we've been developing throughout the course, we need dictionaries.
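As a rough illustration of what that normalization boils down to, here is the arithmetic applied directly to a tensor. The random tensor is a stand-in for the real training images (an assumption, to keep the sketch self-contained), and the variable names are illustrative.

```python
import torch

torch.manual_seed(0)

# Stand-in for a dataset of 100 single-channel 28x28 images with pixels in [0, 1).
images = torch.rand(100, 1, 28, 28)

# Step 1: compute the mean and standard deviation over every pixel.
mean = images.mean()
std = images.std()

# Step 2: standardize each pixel: subtract the mean, divide by the std.
normalized = (images - mean) / std

# After standardization the mean is close to 0 and the std is close to 1.
print(normalized.mean().item(), normalized.std().item())
```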
I've got two dictionaries. One is a dictionary of train sets, and we've already seen this particular dictionary in action. Then we have a new one, so this is cool. You might have noticed this in the last episode when we set up the train sets dictionary; let me know if you caught on to this in that episode. Essentially, this same process is going to allow us to work with networks. We created a train set dictionary, and we can do the same thing with networks, so we can run multiple networks through our testing framework. So here we have network1: this is just the name that we're going to use to access it, and then here's the instance. Again, the name and the instance.

All right, so let me show you how we actually get this to work. Up here, we set up our run configurations. This allows us to test all different configurations. If there's just one value for a parameter, then that's the value that will be used for every run. But if there's more than one value, say we wanted to do two learning rates, then we could add a second value to the list, and this would run every possible combination, first with the first learning rate and then with the second learning rate. One of the things batch norm allows us to do is train with larger learning rates, so we could test with a larger learning rate to see the effect, but here we're just going to keep it at 0.1.

Down here is something else that's cool: I'm going to get the keys inside this networks dictionary, and I want to try all of those values. What that means is that we can keep putting networks up here all day long, and they will be injected into this testing framework. So we have two keys, network1 and network2, and now they're going to be made available inside of our runs.
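A small sketch of the idea, not the course's actual run-builder code: the helper build_runs, the placeholder dictionary values, and the two learning rates are all hypothetical, chosen only to show how dictionary keys can feed a run configuration and how every combination of values becomes one run.

```python
from collections import OrderedDict, namedtuple
from itertools import product


def build_runs(params):
    """Return one named tuple per combination of parameter values."""
    Run = namedtuple("Run", params.keys())
    return [Run(*values) for values in product(*params.values())]


# Placeholder strings stand in for the real network instances.
networks = {
    "network1": "<instance without batch norm>",
    "network2": "<instance with batch norm>",
}

params = OrderedDict(
    lr=[0.01, 0.001],                # illustrative values only
    network=list(networks.keys()),   # every key becomes a value to try
)

runs = build_runs(params)
for run in runs:
    # Each run carries a network name; the instance is looked up by that name.
    network = networks[run.network]
    print(run, "->", network)
```

Two learning rates and two network names give four runs, and adding another entry to the networks dictionary adds runs without touching the loop.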
The only thing we have to do to make this process work is come down here and redefine our network. Before, we were creating the same network every run using a network class. Well, now we have various networks that we're going to try, coming from this dictionary. So what happens is we get a run, and that run is going to be the active or current run, and that run is going to have a network associated with it. That network is a name, one of the names coming from this list, and so depending on what run we're on, we will be using network1 or network2. Then we put that network on the target device, which is going to be CUDA for all runs, and that's it. Now we have injected the ability to test multiple networks into this framework. Cool.

All right, actually, let's just change that. Let's call this one no batch norm, and then this one we'll call batch norm. That'll make it clear for us in the output. So let me reset that and then run this code.

All right, so I've got an error here, and the error is a size mismatch. I'm going to go look at my network definitions, and something that jumps out at me right away is that I see 12 by 4 by 4 here, and then I see 12 by 20 by 20 here, so this needs to be 4. This was left over from an experiment that I was doing before recording. So we just need to reinitialize the two networks, rerun the training set cells, and rerun this code.

All right, so we just finished, and here are all the results. In order to see these results, I want to go grab this line from before so I don't have to retype it; basically, this is going to sort the results by accuracy. So we get the run data from the run manager, and then we sort the values by accuracy, descending. Okay, so basically what you can see
here is that the batch norm network got as high as 93.7% by the 20th epoch and pretty much smoked the no batch norm network. The no batch norm network got up to 91%, and that was by epoch 20, and we can see the batch norm network was already higher than that at epoch ten. We typically refer to this as much faster convergence: the network converged on its minimum much faster when it had batch norm than when it did not. So this is something that batch norm can give us, and it's why it's powerful. Plus, it got us all the way up to 93%, pretty much 94%, after 20 epochs.

Now, there was one thing that I had to do. I got two errors when I first tried to run this. The second one I showed you; the first one I had to go debug. It was because I changed these names, and these names here are longer than the previous names that I had, and that caused problems in our run manager class. Essentially, it was with the TensorBoard portion of this particular class. When the TensorBoard instance is created, we pass a comment to its constructor, and that comment is used to name the TensorBoard file. The issue is that it uses all of these names to construct the file name, and these particular names made the file name too long. So essentially what I did was remove that comment, because I'm not using TensorBoard anyway.

When we did the TensorBoard lessons, I mentioned that TensorBoard is kind of weak in regards to querying the information, and that it would break down at some point as a viable solution for really querying all the information that's possible. At this point, we have quite a bit of info here, we're changing networks, and we can really build on top of this test framework quite a bit more. And so, really,
I'm not going to try to find a solution to the TensorBoard issue. If you want to, and you find a solution to it, then put your solution in the comments, but I'm not going to do it because I'm not using TensorBoard. I would just query this table like we have here to get a bird's-eye view. We could do probably thousands, really any number, of tests and then just query through the information; TensorBoard's just not going to do much for us there. So anyway, if you run this code and you get an error, then go up into the run manager and remove the comment from the TensorBoard constructor. What that'll do is, instead of manually setting the file name, TensorBoard will just name the file based on the time and date, I believe. You can still go into TensorBoard and look at the run; you're just not going to have all of the run information in the file name like we had before, so keep that in mind.

Now, if you didn't know, we're actually filming this video from Vietnam, and we have another channel called deeplizard vlog, where we connect with you guys in a new way and document all of our travels, so go over to deeplizard vlog on YouTube and check it out. Right now, we're in Vietnam. The videos that are coming out on that channel right now are from when we were previously in Thailand, before coming to Vietnam, so the Vietnam videos will come out sometime in the future. Also, if you haven't already, be sure to check out the deeplizard hivemind, where you can get exclusive perks and rewards. Thanks for contributing to collective intelligence. I'll see you in the next one. [Music]
