Transcript:
Welcome to this series on neural network programming with PyTorch. It's time now to learn about the layers of our CNN by building an understanding of the parameters we used when constructing them. Without further ado, let's get started.

Last time, we started building our CNN by extending PyTorch's neural network Module class and defining some layers as class attributes. We defined two convolutional layers and three linear layers by specifying them inside our constructor. Each of our layers extends PyTorch's neural network Module class, and for each layer, there are two primary items encapsulated inside: a forward function definition and a weight tensor. The weight tensor inside each layer contains the weight values that are updated as our network learns during the training process, and this is the reason we are specifying our layers as class attributes inside our network class.

PyTorch's neural network Module class keeps track of the weight tensors inside each layer, and since we are extending the neural network Module class, we inherit this functionality automatically. Remember, inheritance is one of those object-oriented concepts we talked about last time. All we have to do to take advantage of this functionality is assign our layers as attributes inside our network module, and the Module base class will see this and register our weights as learnable parameters of our network. Learnable parameters will be the topic of the next post, so stay tuned for that one.

Our goal in this post is to better understand the layers we defined. To do this, we're going to learn about the parameters and the values that we pass for these parameters in the layer constructors. First things first, though, let's make sure we can make the distinction between a parameter and an argument. We often hear the word parameter, and we often hear the word argument, but what's really the difference between the two? Well, parameters are used inside function definitions, and for this reason, we can think of parameters as placeholders. Arguments, on the other hand, are the actual values that are passed to the function when the function is called. Parameters of a function are like local variables that live inside the function, and arguments are the values that are assigned to these variables from the outside by the caller of the function. In our network's case, the names are the parameters, and the values that we have specified are the arguments.

To better understand the argument values for these parameters, let's consider two categories or types of parameters that we used when we constructed our layers. The two categories are hyperparameters and data-dependent hyperparameters. A lot of terms in deep learning are used pretty loosely, and the word parameter is definitely one of them, so try not to let it throw you off. The main thing to remember about any type of parameter is that the parameter is a placeholder that will eventually have a value. The goal of these particular categories is to help us remember how each parameter value is decided. When we construct a layer, we pass values for each parameter to the layer's constructor. With our convolutional layers, we have three parameters, and with our linear layers, we have two parameters. For the convolutional layers, we have in channels, out channels, and kernel size, and for the linear layers, we have in features and out features.
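For reference, here is a minimal sketch of a network like the one we defined last time. It uses the values mentioned in this post (one grayscale input channel, six output channels in the first conv layer, twelve times four times four in features after flattening, and ten output classes); the kernel size of 5 and the hidden linear layer sizes of 120 and 60 are illustrative assumptions, not values stated in this section:

    import torch.nn as nn

    class Network(nn.Module):
        def __init__(self):
            super().__init__()
            # Convolutional layers: in_channels, out_channels, kernel_size
            self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
            self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
            # Linear layers: in_features, out_features
            self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)  # hidden sizes are example choices
            self.fc2 = nn.Linear(in_features=120, out_features=60)
            self.out = nn.Linear(in_features=60, out_features=10)           # ten output classes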
Let's see how the values for these parameters are decided. We'll start by looking at hyperparameters, and then we'll see how the data-dependent hyperparameters fall into place. Hyperparameters are parameters whose values are chosen manually and arbitrarily. This means that, as neural network programmers, we choose hyperparameter values mainly based on trial and error, and increasingly by utilizing values that have proven to work well in the past. For building our CNN layers, these are the parameters we chose manually: kernel size, out channels, and out features. When we say manually, we mean that these values are not derived values; they're simply arbitrary, and it's our job as network programmers to choose them. This is pretty common, and we usually test and tune these parameters to find the values that work best.

Let's see what each of these parameters actually does, so we can understand how our decisions can impact the network's architecture. The kernel size sets the size of the filter that will be used inside the layer. In deep learning, the word kernel is another word for filter, so we can say convolutional kernel or we can say convolutional filter. Inside a convolutional layer, the input channels are paired with a convolutional filter to perform the convolution operation. The filter convolves the input channels, and the result of this operation is an output channel, so one filter convolving the input channels gives us one corresponding output channel. This is why, when we set our out channels, we are actually setting the number of filters. If we pass a value of 6 for our out channels parameter, indicating that we want 6 output channels, we are also indicating that we want to have 6 filters inside the layer. In the case of our first convolutional layer, we have one input channel that will be convolved by six different filters, which will create six output channels.

These output channels also go by another name: feature maps. If we are dealing with linear layers, we don't call them feature maps because the outputs are just rank-one tensors; we just refer to them as features, so we have out features instead of out channels or feature maps. We choose values for the out features arbitrarily, based on how many nodes we want in our layer. However, when we talk about data-dependent parameters, we'll see that there is a caveat with the output layer, the last linear layer. One pattern that shows up quite often for setting these parameters is that we increase our out channels as we add additional convolutional layers, and after we switch to linear layers, we shrink our out features as we filter down to the number of output classes we have.

All of these parameters impact our network's architecture. Specifically, these parameters directly impact the weight tensors inside the layers. We'll dive deeper into this in the next post when we talk about learnable parameters and actually inspect the weight tensors, but for now, let's go ahead and cover the data-dependent hyperparameters.
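As a quick preview of how these choices show up in the weight tensors, here is a small sketch, assuming a standard PyTorch nn.Conv2d layer with the values from our first convolutional layer (the kernel size of 5 is an assumed example). The weight tensor holds one filter per output channel:

    import torch.nn as nn

    conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

    # One filter per output channel: [out channels, in channels, kernel height, kernel width]
    print(conv1.weight.shape)  # torch.Size([6, 1, 5, 5])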
Data-dependent hyperparameters are parameters whose values depend on the data, so take a second and think about which of these parameter values depend on the data. Two data-dependent hyperparameters that stick out are at the start of the network and at the end of the network: the in channels of the first convolutional layer and the out features of the last linear layer. You see, the in channels of the first convolutional layer depend on the number of color channels present inside the images that make up the training set. Since we are dealing with grayscale images, we know that this value should be one. The out features of the output layer depend on the number of classes that are present inside our training set. Since we have ten classes of clothing articles inside the Fashion-MNIST dataset, we know that we need ten output features. These ten outputs are the predictions from the network for each category.

In general, the input to one layer is the output from the previous layer, and so all the in channels in the conv layers and the in features in the linear layers depend on the data coming from the previous layer. So all of these are data-dependent hyperparameters. Now, when we switch from a conv layer to a linear layer, we have to flatten our tensor. This is why we have twelve times four times four as the number of in features here. The twelve comes from the number of output channels in the previous layer, but why do we have these two fours? We'll cover how to get these values in a future post, but for now, this is our challenge: what do these two fours represent? Put your answers in the comments. (A sketch of the flattening step itself appears at the end of this transcript.) We'll learn a lot more about the inner workings of our network and how our tensors flow through our network when we implement our forward function, but for now, this gives us a good start. Check out the table on deeplizard.com that describes each of the parameters and their values to make sure you understand how each parameter value is determined.

Don't forget to check out the deeplizard hivemind, where you can get exclusive perks and rewards. Leave a comment and hit the like button to support collective intelligence, and I'll see you in the next one.
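As promised above, here is a minimal sketch of the flattening step itself, assuming the conv output has shape [batch, 12, 4, 4]; where the two fours come from is the challenge, so they are simply taken as given here:

    import torch

    # Pretend output of the last conv layer: [batch, channels, height, width]
    t = torch.randn(1, 12, 4, 4)

    # Flatten everything except the batch dimension before the first linear layer
    t = t.reshape(-1, 12 * 4 * 4)
    print(t.shape)  # torch.Size([1, 192])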