Transcript:
Hey, what’s going on, everyone? In this video, we’re going to discuss what max pooling is in a convolutional neural network, so let’s get to it. We’re going to start out by explaining what max pooling is, and we’ll show how it’s calculated by looking at some examples. We’ll then discuss the motivation for why max pooling is used, and we’ll see how we can add max pooling to a convolutional neural network in code using Keras. We’re going to be building on some of the ideas that we discussed in our video on convolutional neural networks, so if you haven’t seen that yet, go ahead and check it out, and then come back to watch this video once you’ve finished up there. Max pooling is a type of operation that’s typically added to CNNs following individual convolutional layers. When added to a model, max pooling reduces the dimensionality of images by reducing the number of pixels in the output from the previous convolutional layer. Let’s go ahead and check out a couple of examples to see what exactly max pooling is doing operation-wise, and then we’ll come back to discuss why we may want to use max pooling. We’ve seen in our video on CNNs that each convolutional layer has some number of filters that we define with a specified dimension, and that these filters convolve our image input. When a filter convolves a given input, it gives us a resulting output. This output is a matrix of pixels with the values that were computed during the convolutions that occurred on our image. Here, we’re going to be using the same image of a 7 that we used in our previous video on CNNs. Recall that on the left, we have a matrix of the pixel values from an image of a 7 from the MNIST data set. Then we have this 3×3 filter that’s been initialized with random numbers, and on the right, we have the output resulting from this filter convolving our input. As mentioned earlier, max pooling is added after a convolutional layer.
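To make the convolutional output concrete, here is a minimal pure-Python sketch of what one filter computes. It assumes a stride of 1 and no padding, which is what takes a 28×28 MNIST image to the 26×26 output described here. The function name `conv2d_valid` and the kernel values are hypothetical stand-ins (the video's filter is randomly initialized), and like CNN layers in practice, it computes a cross-correlation (no kernel flip).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding ("valid"),
    summing elementwise products at each position (cross-correlation,
    which is what CNN convolutional layers compute in practice)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out

# A 28 x 28 image (the MNIST size) convolved with a 3 x 3 filter
# yields a 26 x 26 output, since 28 - 3 + 1 = 26.
image = [[0] * 28 for _ in range(28)]                 # placeholder pixel values
kernel = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.4]]  # arbitrary weights
output = conv2d_valid(image, kernel)
print(len(output), len(output[0]))  # 26 26
```

Each output pixel depends only on the 3×3 patch beneath the filter, which is why the output loses one pixel on each border.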
Since we have this output from our convolutional layer here, max pooling would follow. Let’s scroll to the right, and we see that we have some type of transformation of our output here. This transformation was achieved by doing max pooling. Max pooling works like this. We define some n by n region as a corresponding filter for the max pooling operation. We’re going to use 2×2 for our example. Then we define a stride, meaning how many pixels we want our filter to move as it slides across the image. We’re going to use 2 for this as well. Then we come over to our convolutional output, take the first 2×2 region, and calculate the max value of the values in this 2×2 block. We then store that value, which is going to be used to make up the full output from this max pooling operation. In this example, the max of this first 2×2 block is 0, since all the values in the block are 0, and we store it here. We then move over by the number of pixels that we defined our stride to be. We’re using 2 here, so we just slide over by 2. Then we do the same max operation: we calculate the max value in this 2×2 block, store it over here, and then go on our way, sliding over by 2 again. We do that all the way until we reach the edge on the far right. We then move back to the far left and down by 2, because that’s our stride size, and we do the same exact thing of calculating the max value for all the 2×2 blocks in this row. We can think of these 2×2 blocks as pools of numbers, and since we’re taking the max value from each pool, we can see where the name max pooling came from. This process is carried out for the entire image, and when we’re finished, we get this new representation of the image. In this example, our convolutional output was 26 by 26 in size.
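The pool size and stride together determine how many window positions fit along each axis. A small sketch of that count, assuming no padding and that a window which would run past the edge is simply dropped (the helper name `pooled_size` is hypothetical):

```python
def pooled_size(n, pool, stride):
    """Output length along one axis for pooling with no padding
    ("valid"): the window must fit entirely inside the input."""
    return (n - pool) // stride + 1

# 2 x 2 pool, stride 2, on the 26 x 26 convolutional output.
print(pooled_size(26, pool=2, stride=2))  # 13
# The same settings on a 4 x 4 input.
print(pooled_size(4, pool=2, stride=2))   # 2
```

With a 2×2 pool and a stride of 2, the windows tile the input without overlapping, so each dimension is halved.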
Now, after performing max pooling, we can see each dimension of the image was reduced by a factor of two, so it is now 13 by 13. Just to make sure we fully understand this operation, we’re going to quickly look at a scaled-down example that may be simpler to visualize. Here on the left, we have some sample input of size 4×4, and we’re going to use the same 2×2 filter size with a stride of 2 to do max pooling on this input. Our first 2×2 region is here in orange, and we can see the max value of this region is 9, so we store that over here in our output. Then we slide over by two pixels, and we see the max value in the green region is 8, so we store that over here in our output as well. Since we’ve now reached the edge, we move back over to the far left and go down by two pixels. Here, the max value in the blue region is 6, and we store that here in our output. Finally, we move to the right by two and see the max value of the yellow region is 5, so we store that over here in our output as well. We’ve now gone through the complete process of max pooling on this sample 4×4 input, and the resulting output is this 2×2 block here. So our input dimensions were again reduced by a factor of two. All right, so now we know what max pooling is and how it works, so what’s left for us to discuss is the why: why would we want to add this to our network? There are a couple of reasons why adding max pooling to our network may be helpful. For one, since max pooling reduces the resolution of the given output of a convolutional layer, the network will be looking at larger areas of the image at a time going forward, which can reduce the number of parameters in the layers that follow and consequently reduces the computational load. Additionally, max pooling may also help to reduce overfitting.
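The whole procedure walked through above fits in a short loop. This is a minimal pure-Python sketch, not the Keras implementation; the function name `pool2d` is hypothetical, and since the video does not show every value of its 4×4 sample, the input below is a made-up matrix chosen so its four 2×2 blocks have the maxima 9, 8, 6, and 5 described in the walkthrough.

```python
def pool2d(x, size=2, stride=2, reduce=max):
    """Slide a size x size window over the 2-D list `x` with the given
    stride (no padding) and apply `reduce` to each window's values."""
    out = []
    for i in range(0, len(x) - size + 1, stride):        # move down by `stride`
        row = []
        for j in range(0, len(x[0]) - size + 1, stride):  # move right by `stride`
            window = [x[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out

# Hypothetical 4 x 4 input: block maxima are 9 (top-left), 8 (top-right),
# 6 (bottom-left), and 5 (bottom-right).
sample = [
    [1, 9, 2, 8],
    [3, 4, 7, 1],
    [6, 0, 5, 2],
    [2, 1, 3, 4],
]
print(pool2d(sample))  # [[9, 8], [6, 5]]
```

The `reduce` parameter is a design choice for illustration: passing `reduce=lambda v: sum(v) / len(v)` instead of `max` turns the same loop into average pooling.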
The intuition for why max pooling works is that, for a particular image, our network will be looking to extract some particular features. Maybe it’s trying to identify numbers from the MNIST data set, so it’s looking for edges, curves, circles, and such. From the output of the convolutional layer, we can think of the higher-valued pixels as being the ones that are the most activated. With max pooling, as we’re going over each region of the convolutional output, we’re able to pick out the most activated pixels and preserve these high values going forward, while discarding the lower-valued pixels that are not as activated. Just to mention quickly before going forward, there are other types of pooling that follow this exact same process we’ve just gone through, except that they perform some other operation on the regions rather than finding the max value. For example, average pooling is another type of pooling, and that’s where you take the average value from each region rather than the max value. Currently, though, max pooling is used vastly more than average pooling, or any other type of pooling for that matter, but I did just want to mention that point. All right, now let’s jump over to Keras and see how this is done in code. Here in our Jupyter notebook, I have a completely arbitrary CNN that I’ve defined. It has a dense input layer that accepts input of 20 by 20 dimensions, then a convolutional layer followed by a max pooling layer, and then one more convolutional layer that’s finally followed by an output layer. Following the first convolutional layer, this line here is how we specify max pooling. Since the convolutional layers are 2D here, I’m using the MaxPooling2D layer from Keras, but Keras also has 1D and 3D max pooling layers (MaxPooling1D and MaxPooling3D). The first parameter that we’re specifying here is the pool size.
This is the size of the filter. In our example, we used a 2×2 filter, so we can specify that here by providing this tuple that contains 2, 2. The next parameter is strides. Again, in our example earlier, we used 2, so that’s what I’ve specified here. The last parameter that I’ve specified is the padding parameter. If you’re unsure what padding or zero padding is in regard to CNNs, be sure to check out my earlier video that explains this. Recall from that video that valid padding means to use no padding, so that’s what I’ve specified here for my max pooling layer. Actually, I don’t think it’s common practice at all to use padding on max pooling layers, but while we’re on the subject of padding, I wanted to point something else out, which is that for my two convolutional layers, I’ve specified same padding, so that the input is padded such that the output of the convolutional layers will be the same size as the input. The reason I wanted to point that out is that if we go ahead and look at a summary of our model, we can see that the dimensions of the output of our first layer are 20 by 20, which matches the original input size. Then the dimensions of the output from our first convolutional layer maintain the same 20 by 20 values, because we’re using same padding on that layer. Once we go down to the max pooling layer, we see the dimensions have been cut in half to become 10 by 10. This is because, as we saw in our earlier examples, a filter of size 2×2 along with a stride of 2 for a max pooling layer will reduce the dimensions of the input by a factor of two. That’s exactly what we see here. Lastly, this max pooling layer is followed by one last convolutional layer that’s using same padding, so we can see that the output shape for this last layer keeps the 10 by 10 dimensions from the previous max pooling layer, because of the specified padding.
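The shapes in the model summary can be traced by hand. This sketch is plain arithmetic, not Keras code: it tracks only the height/width of the square feature maps (the transcript doesn't give filter counts), assumes stride 1 for the convolutional layers, and applies the usual output-size rules for same padding (`ceil(n / stride)`) and valid padding (`(n - pool) // stride + 1`). The pooling step corresponds to the MaxPooling2D settings described above, `pool_size=(2, 2)`, `strides=2`, `padding='valid'`; the helper names are hypothetical.

```python
import math

def conv_same(n, stride=1):
    # "same" padding: output length is ceil(n / stride), so 20 stays 20 at stride 1.
    return math.ceil(n / stride)

def pool_valid(n, pool=2, stride=2):
    # "valid" (no padding) pooling: floor((n - pool) / stride) + 1.
    return (n - pool) // stride + 1

shapes = [20]                           # spatial size from the first (dense) layer
shapes.append(conv_same(shapes[-1]))    # first conv layer, same padding
shapes.append(pool_valid(shapes[-1]))   # max pooling, 2 x 2 pool, stride 2
shapes.append(conv_same(shapes[-1]))    # last conv layer, same padding
print(shapes)  # [20, 20, 10, 10]
```

Only the pooling layer changes the spatial size here; the same padding on both convolutional layers keeps their outputs matched to their inputs.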
I hope you’ve now gained an understanding of what max pooling is, what it achieves when added to a CNN, and how you can specify max pooling in your own neural network using Keras. Let me know what you think in the comments below. Thanks for watching, and I’ll see you next time. [Music]