Transcript:

Hi, everybody, welcome to a new PyTorch tutorial. This time, I want to talk about activation functions. Activation functions are an extremely important feature of neural networks, so let's have a look at what activation functions are, why they are used, what different types of functions there are, and how we incorporate them into our PyTorch model.
So activation functions apply a nonlinear transformation to the layer output and basically decide whether a neuron should be activated or not. So why do we use them? Why is only a linear transformation not good enough? Typically, we would have a linear layer in our network that applies a linear transformation, so here it multiplies the input with some weights, maybe adds a bias, and then delivers the output. Let's suppose we don't have activation functions in between. Then we would have only linear transformations after each other, so our whole network from input to output is essentially just a linear regression model, and this linear model is not suited for more complex tasks. So the conclusion is that with nonlinear transformations in between, our network can learn better and perform more complex tasks. So after each layer, we typically want to apply an activation function. So here, first we have our normal linear layer, and then we also apply this activation function, and with this, our network can learn better.
Now let's talk about the most popular activation functions. The ones I want to show you are the binary step function, the sigmoid function, the hyperbolic tangent function, the ReLU, the leaky ReLU, and the softmax.
So let's start with the simple step function. This will just output 1 if our input is greater than a threshold, so here the threshold is 0, and 0 otherwise. This is not used in practice, actually, but it should demonstrate the idea of whether the neuron should be activated or not.
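The argument above, that linear layers stacked without an activation in between are still just one linear transformation, can be checked with a small sketch. It uses plain Python with scalar "layers" to keep the arithmetic visible. The weights, biases, and function names are made-up illustrative values, not taken from the tutorial's code.

```python
import math

def linear(x, w, b):
    """Scalar linear layer: multiply the input by a weight and add a bias."""
    return w * x + b

def step(x, threshold=0.0):
    """Binary step function: 1 if the input is above the threshold, else 0."""
    return 1.0 if x > threshold else 0.0

# Hypothetical parameters for two layers (illustrative values only).
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    # Layer 2 applied directly to the output of layer 1, no activation.
    return linear(linear(x, w1, b1), w2, b2)

def single_linear_layer(x):
    # The same mapping written as ONE layer: w = w2*w1, b = w2*b1 + b2.
    return linear(x, w2 * w1, w2 * b1 + b2)

def with_step_between(x):
    # Putting a nonlinearity between the layers breaks the collapse.
    return linear(step(linear(x, w1, b1)), w2, b2)

inputs = [-2.0, -0.5, 0.0, 1.0, 3.0]
collapsed = all(
    math.isclose(two_linear_layers(x), single_linear_layer(x)) for x in inputs
)
print("two linear layers equal one linear layer:", collapsed)
print("with a step in between:", [with_step_between(x) for x in inputs])
```

With the step function in between, the output jumps between two values instead of following a straight line, so it can no longer be written as a single `w * x + b`.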
A more popular choice is the sigmoid function, and you should already know this if you've watched my tutorial about logistic regression. The formula is 1 / (1 + e^(-x)), and this will output a probability between 0 and 1. This is typically used in the last layer of a binary classification problem.
Then we have the hyperbolic tangent function, or tanh. This is basically a scaled sigmoid function and also a little bit shifted, so this will output a value between -1 and +1, and this is actually a good choice in hidden layers. So you should know about the tanh function.
Then we have the ReLU function, and this is the most popular choice in most networks. The ReLU function will output 0 for negative values, and it will simply output the input as output for positive values. So it is actually a linear function for values greater than 0, and it is just 0 for negative values. It doesn't look that much different from just a linear transformation, but in fact it is nonlinear, and it's typically a very good choice for an activation function. So the rule of thumb is: if you don't know which function you should use, then just use a ReLU for hidden layers. So this is the ReLU, a very popular choice.
Then we also have the leaky ReLU function. This is a slightly modified and slightly improved version of the ReLU. It will still just output the input for x greater than 0, but it will multiply our input with a very small value for negative numbers. So here I've written a times x for negative numbers, and this a is typically very small, so it's, for example, 0.001. This is an improved version of the ReLU that tries to solve the so-called vanishing gradient problem, because with a normal ReLU, our values here are 0, and this means that the gradient later in the backpropagation is also 0. When the gradient is 0, these weights will never be updated, so these neurons won't learn anything, and we also say that these neurons are dead. This is why sometimes you want to use the leaky ReLU function. So whenever you notice that your weights won't update during training, then try to use the leaky ReLU instead of the normal ReLU.
Then, as the last function, I want to show you the softmax function, and you should also already know this, because I have a whole tutorial about the softmax function. This will basically squash the inputs to be outputs between 0 and 1, so that we have a probability as an output, and this is typically a good choice in the last layer of a multi-class classification problem.
So yeah, those are the different activation functions I wanted to show you. Now let's jump to the code and see how we can use them in PyTorch. We have two options, and the first one is to create our functions as nn modules. So in our network, in the init function, first we define all the layers we want to have. So here, for example, first we have a linear layer, and then after that we want to have a ReLU activation function, so we create our nn.ReLU module here, and we can get that from the torch.nn module, which contains all the different functions I just showed you. And then we have the next layer, for example the next linear layer, and then the next activation function, so here we have a sigmoid at the end. And then in the forward pass, we simply call all these functions after each other.
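The first option described above, with the activation functions created as nn modules in the init function, might look roughly like the following sketch. The on-screen code is not part of the transcript, so the class name `NeuralNet`, the parameters `input_size` and `hidden_size`, and the concrete sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    """Option 1: activation functions are created as nn modules in __init__."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)  # first linear layer
        self.relu = nn.ReLU()                              # hidden activation
        self.linear2 = nn.Linear(hidden_size, 1)           # second linear layer
        self.sigmoid = nn.Sigmoid()                        # output activation

    def forward(self, x):
        # Call the layers and activations one after the other.
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        out = self.sigmoid(out)
        return out

# Hypothetical sizes: a batch of 3 samples with 4 features each.
model = NeuralNet(input_size=4, hidden_size=8)
x = torch.randn(3, 4)
y = model(x)
print(y.shape)  # one value per sample
```

Because of the sigmoid at the end, each output value lies between 0 and 1, which fits the binary classification use case mentioned earlier.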
So first we have the first linear layer, which gives us an output, then we take this output and put it into our ReLU, and then again we take this output and put it into the next linear layer, and so on. So this is the first way we can use it.
The second way is to use these functions directly. So in the init function, we only define our linear layers, so linear1 and linear2, and then in the forward pass we apply the linear layer and then also call the torch.relu function here, and then the torch.sigmoid function directly. This is just from the torch API. So this is a different way we can use it. Both ways will achieve the same thing; it's just how you prefer your code.
All the functions that I just showed you, you can get from the nn module. So here we had nn.ReLU, but we can, for example, also have nn.Sigmoid, and we have nn.Softmax, and we have nn.Tanh, and also nn.LeakyReLU. So all these functions are available here, and they are also available in the torch API like this. So here we have torch.relu, and we have torch.sigmoid, and we also have torch.softmax and torch.tanh. But sometimes the functions are not available in the torch API directly, but they are available in torch.nn.functional. So here I imported torch.nn.functional as F, and then I can call, for example, F.relu. This is the same as torch.relu, but here, for example, F.leaky_relu is only available in this API.
So yeah, that's how we can use the activation functions in PyTorch, and it's actually very easy. I hope you understood everything and now feel comfortable with activation functions. If you like this, please subscribe to the channel, and see you next time. Bye.
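The second option, and the functional API mentioned above, could be sketched as follows. As before, the class name `NeuralNetFunctional`, the layer sizes, and the sample tensor are assumptions for illustration. The value 0.001 for `negative_slope` follows the example given in the talk, not the library default.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetFunctional(nn.Module):
    """Option 2: only linear layers in __init__, activations called in forward."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out = torch.relu(self.linear1(x))       # activation applied directly
        out = torch.sigmoid(self.linear2(out))
        return out

model = NeuralNetFunctional(input_size=4, hidden_size=8)  # hypothetical sizes
y = model(torch.randn(3, 4))

# The individual functions applied to a small sample tensor.
x = torch.tensor([-2.0, 0.0, 3.0])
relu_out = F.relu(x)                               # negatives become 0
leaky_out = F.leaky_relu(x, negative_slope=0.001)  # negatives scaled by a
sigmoid_out = torch.sigmoid(x)                     # 1 / (1 + e^(-x))
tanh_out = torch.tanh(x)                           # values in (-1, 1)
softmax_out = torch.softmax(x, dim=0)              # non-negative, sums to 1

print(relu_out, leaky_out, sigmoid_out, tanh_out, softmax_out, sep="\n")
```

Note that `torch.softmax` and `nn.Softmax` need a `dim` argument that says along which dimension the values should sum to 1.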