Transcript:
Hi, everybody! Welcome back to a new PyTorch tutorial. This time we talk about the softmax function and the cross-entropy loss. These are two of the most common functions used in neural networks, so you should know how they work. I will teach you the math behind these functions and how we can use them in NumPy and then in PyTorch, and at the end I will show you what a typical classification neural network that works with these functions looks like. So let's start.

This is the formula of the softmax: it applies the exponential function to each element and normalizes it by dividing by the sum of all these exponentials, so softmax(x_i) = exp(x_i) / sum_j exp(x_j). What it basically does is squash the outputs to be between 0 and 1, so we get probabilities. Let's have a look at an example. Let's say we have a linear layer with three output values, and these values are so-called scores or logits, so they are raw values. Then we apply the softmax and get probabilities: each value is squashed to be between 0 and 1, and the highest value gets the highest probability. If we sum these three probabilities up, we get 1. This is our prediction, and we can choose the class with the highest probability. So that's how the softmax works.

Now let's have a look at the code. Here I already implemented it in NumPy, so we can calculate this in one line: first we take the exponential, and then we divide by the sum over all these exponentials. Now let's run this. This has the same values as in my slide, and here we also see that the highest logit gets the highest probability, close to the value in my slide. It's slightly different, but basically we see that it's correct. Of course we can also calculate it in PyTorch. For this we create a tensor, so let's say x = torch.tensor with the same values as this one, and then we can say outputs = torch.softmax(x), where we also must specify the dimension, so we say dim=0, so it computes it along the first axis. Now let's print these outputs, and here we see that the result is almost the same, so this works.

Now let's continue. A lot of times the softmax function is combined with the so-called cross-entropy loss. This measures the performance of a classification model whose output is a probability between 0 and 1, and it can be used in multi-class problems. The loss increases as the predicted probability diverges from the actual label, so the better our prediction, the lower our loss. Here we have two examples: this one is a good prediction, and then we have a low cross-entropy loss, and this one is a bad prediction, and then we have a high cross-entropy loss. What we also must know is that in this case our Y must be one-hot encoded. Let's say we have three possible classes, class 0, 1, and 2, and in this case the correct label is class 0, so here we must put a 1, and for all the other classes we put a 0. That is how we do one-hot encoding. And for the predicted Y we must have probabilities, so for example we apply the softmax here before.

So now again, let's have a look at the code and how we do this in NumPy. We can calculate this here as the sum over the actual labels times the log of the predicted labels, with a minus one at the beginning, so loss = -sum(Y_actual * log(Y_predicted)). We could also normalize it by dividing by the number of samples, but we don't do that here.
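The code itself is not visible in the transcript, so here is a minimal sketch of the softmax computation described above. The function and variable names are my own, and the example values 2.0, 1.0, 0.1 are taken from the logits mentioned later in the walkthrough.

    import numpy as np
    import torch

    def softmax(x):
        # exponentiate each element and normalize by the sum of the exponentials
        return np.exp(x) / np.sum(np.exp(x), axis=0)

    x = np.array([2.0, 1.0, 0.1])
    outputs = softmax(x)
    print('softmax numpy:', outputs)   # roughly [0.66, 0.24, 0.10]

    # the same computation in PyTorch, along the first axis
    x = torch.tensor([2.0, 1.0, 0.1])
    outputs = torch.softmax(x, dim=0)
    print('softmax torch:', outputs)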
Then we create our Y, which as I said must be one-hot encoded. Here we have another example: if the correct label is class 0, then it must look like [1, 0, 0]. Then down here we put our two predictions, and these are now probabilities. The first one is a good prediction, because here class 0 also gets the highest probability, and the second prediction is a bad prediction, because class 0 gets a very low probability and class 2 gets a high probability. Then I compute the cross-entropy for both of them. Let's run this, and here we see that the first prediction has a low loss and the second prediction has a high loss.

Now again, let's see how we can do this in PyTorch. First we create the loss, so we say loss = nn.CrossEntropyLoss(), with nn coming from the torch.nn module. Now, let's have a look at the slides again, because here we have to be careful: nn.CrossEntropyLoss already applies the log-softmax and then the negative log likelihood loss, so we should not, or rather must not, implement the softmax layer ourselves. This is the first thing we must know. The second thing is that here our Y must not be one-hot encoded; we only put the correct class label. And the Y predictions must be raw scores, so no softmax here either. Be careful about this.

Now let's see this in practice. Let's create our actual labels as a torch.tensor, and here we only put the correct class label, so let's say in this case it's class 0, not one-hot encoded anymore. Then we have a good prediction, Y_pred_good = torch.tensor, and here we must be careful about the size: it has the size number of samples times number of classes. Let's say in our case we have one sample and three possible classes, so this is an array of arrays, and here we put in 2.0, 1.0 and 0.1. Remember, these are the raw values, so we didn't apply the softmax, and here class 0 has the highest value, so this is a good prediction. Now let's make a bad prediction, Y_pred_bad, where the very first value is low and the second value is high, and let's change the last one also a little bit. Now we compute our loss: we call the loss function that we created here and pass in the Y prediction and the actual Y, and we compute a second loss the same way with Y_pred_bad and Y. Now let's print them, so let's print l1.item(), since it only has one value and we can call the item function, and also l2.item(). Let's run this, and here we see that our good prediction has the lower cross-entropy loss, so this works.

Now, to get the actual predictions, we can do it like this: we say underscore for the first return value, because we don't need it, and predictions = torch.max with the prediction, so Y_pred_good, along dimension 1, and the same with the bad one. Let's call these predictions1 and predictions2 and print them, and here we see that we choose the class with the highest value.
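Here is a minimal sketch of the NumPy cross-entropy example described above. The one-hot label for class 0 follows the transcript; the exact probability values of the two predictions are only described verbally, so the numbers below are illustrative.

    import numpy as np

    def cross_entropy(actual, predicted):
        # -sum over classes of: actual one-hot label * log(predicted probability)
        return -np.sum(actual * np.log(predicted))

    # Y is one-hot encoded; the correct class is class 0
    Y = np.array([1, 0, 0])

    # predictions are probabilities (e.g. softmax outputs); values are illustrative
    Y_pred_good = np.array([0.7, 0.2, 0.1])  # class 0 gets the highest probability
    Y_pred_bad  = np.array([0.1, 0.3, 0.6])  # class 0 gets a very low probability

    print(f'Loss1 numpy: {cross_entropy(Y, Y_pred_good):.4f}')  # low loss
    print(f'Loss2 numpy: {cross_entropy(Y, Y_pred_bad):.4f}')   # high loss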
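And a sketch of the PyTorch version with one sample and three classes. The good logits 2.0, 1.0, 0.1 and the class label 0 are from the transcript; the bad logits are illustrative, chosen so that class 1 has the highest score as described.

    import torch
    import torch.nn as nn

    loss = nn.CrossEntropyLoss()  # applies log-softmax + negative log likelihood internally

    # actual label: just the class index, NOT one-hot encoded
    Y = torch.tensor([0])

    # predictions are raw scores (logits) of size n_samples x n_classes, no softmax applied
    Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])  # class 0 has the highest score
    Y_pred_bad  = torch.tensor([[0.5, 2.0, 0.3]])  # class 0 has a low score (values illustrative)

    l1 = loss(Y_pred_good, Y)
    l2 = loss(Y_pred_bad, Y)
    print(l1.item())  # lower loss for the good prediction
    print(l2.item())  # higher loss for the bad prediction

    # the actual class predictions: index of the maximum score along dimension 1
    _, predictions1 = torch.max(Y_pred_good, 1)
    _, predictions2 = torch.max(Y_pred_bad, 1)
    print(predictions1)  # tensor([0])
    print(predictions2)  # tensor([1])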
So in this case we choose the first one, and in the second case we choose this one, class number 1. This is how we get the predictions. What is also very good is that the cross-entropy loss in PyTorch allows for multiple samples, so let's increase our samples here. Let's say we have three samples and three possible classes. Then our actual Y tensor must have three class labels, for example 2, 0 and 1, and our predictions must again be of size number of samples times number of classes, so now this is of size 3 by 3. So let's do this: here we must put in two more lists with three values each. Let's say this one is a good prediction: the first correct label is class number 2, so the third value must be the highest and the others must be low; for the second sample the very first class must have a high value; and for the third sample the value in the middle must have the highest raw value. Then we do the same for our bad prediction, where we make a different value higher and change the rest a little bit. Then we can again compute the cross-entropy loss with multiple samples. Now let's run this, and we see again that our first prediction is good and has a low loss, and the second one is not so good. We also get the correct predictions from the first prediction tensor: here we also get 2, 0, 1, like in the actual Y. So this is how we can use the cross-entropy loss in PyTorch.

Now let's go back to our slides, because I want to show you what a typical neural network looks like. Here is a typical neural net in a multi-class classification problem. We want to find out what animal our image shows, so we have an input layer, then some hidden layers, maybe some activation functions in between, and at the end a linear layer with one output for each class (here we have two outputs), and at the very end we apply our softmax and get the probabilities. Now, as I said, in PyTorch we must be careful: because we use the cross-entropy loss here, we must not use a softmax layer in our neural net, so we must not implement this ourselves. Let's have a look at how this code looks. In a multi-class classification, our net looks, for example, like this: we define our layers, so we have one linear layer which gets an input size and a hidden size, then an activation function in between, and then our last layer gets the hidden size, and its output size is the number of classes, so for each possible class we have one output. In the forward method we only apply our layers, with no softmax at the very end. Then we create our model and use the cross-entropy loss, which then applies the softmax. So be careful here.

This example also works for more classes, so if our image could, for example, also be a bird or a mouse or whatever, then this is still the correct layout. But if we just have a binary classification problem with two possible outputs, then we can change our layers like this. Now we rephrase our question and just ask: is it a dog, yes or no? At the end we have a linear layer with only one output, and then we do not use the softmax function but the sigmoid function, which gives a probability, and if this is higher than 0.5 then we say yes.
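A sketch of the multi-sample case with 3 samples and 3 classes. The class labels 2, 0, 1 come from the transcript; the raw scores are illustrative, arranged so the good prediction puts the highest score on the correct class for each sample.

    import torch
    import torch.nn as nn

    loss = nn.CrossEntropyLoss()

    # 3 samples, actual class labels 2, 0 and 1 (not one-hot encoded)
    Y = torch.tensor([2, 0, 1])

    # raw scores of size n_samples x n_classes = 3 x 3 (values illustrative)
    Y_pred_good = torch.tensor([[0.1, 0.2, 3.9],   # highest score for class 2
                                [1.2, 0.1, 0.3],   # highest score for class 0
                                [0.3, 2.2, 0.2]])  # highest score for class 1
    Y_pred_bad  = torch.tensor([[0.9, 0.2, 0.1],
                                [0.1, 0.3, 1.5],
                                [1.2, 0.2, 0.5]])

    l1 = loss(Y_pred_good, Y)  # low loss
    l2 = loss(Y_pred_bad, Y)   # high loss
    print(l1.item(), l2.item())

    _, predictions1 = torch.max(Y_pred_good, 1)
    print(predictions1)  # tensor([2, 0, 1]) matches the actual labels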
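And here is a minimal sketch of the multi-class network described above, with no softmax in the forward pass because nn.CrossEntropyLoss applies it. The class name and the concrete sizes are placeholders of my own.

    import torch.nn as nn

    class NeuralNet(nn.Module):
        def __init__(self, input_size, hidden_size, num_classes):
            super(NeuralNet, self).__init__()
            self.linear1 = nn.Linear(input_size, hidden_size)
            self.relu = nn.ReLU()
            self.linear2 = nn.Linear(hidden_size, num_classes)  # one output per class

        def forward(self, x):
            out = self.linear1(x)
            out = self.relu(out)
            out = self.linear2(out)
            # no softmax at the end
            return out

    model = NeuralNet(input_size=28*28, hidden_size=5, num_classes=3)
    criterion = nn.CrossEntropyLoss()  # applies the softmax internally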
In PyTorch we then use the BCE loss, the binary cross-entropy loss, and here we must implement the sigmoid function at the end ourselves. So let's have a look at our neural net in the binary classification case. Again, first we set up our layers and our activation functions, and the last layer has output size 1, which is always fixed in this case. Then in the forward pass, after we apply our layers, we also must apply the sigmoid function. And as the criterion we use the binary cross-entropy loss. So be very careful about these two different possible neural nets.

That's basically what I wanted to show you. The last structure is also what I used in the logistic regression tutorial, so you can check that out if you haven't already. For now, that's all I wanted to show you. I hope you enjoyed it and understood everything. If you have any questions, leave them in the comments below, and if you liked this tutorial, then please subscribe to the channel and see you next time. Bye bye!
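For reference, here is a minimal sketch of the binary classification setup described in that last part: the sigmoid is applied at the end of the forward pass and combined with nn.BCELoss. The class name and sizes are again placeholders of my own.

    import torch
    import torch.nn as nn

    class NeuralNet2(nn.Module):
        def __init__(self, input_size, hidden_size):
            super(NeuralNet2, self).__init__()
            self.linear1 = nn.Linear(input_size, hidden_size)
            self.relu = nn.ReLU()
            self.linear2 = nn.Linear(hidden_size, 1)  # output size 1 is fixed here

        def forward(self, x):
            out = self.linear1(x)
            out = self.relu(out)
            out = self.linear2(out)
            # sigmoid at the end gives a probability between 0 and 1
            y_pred = torch.sigmoid(out)
            return y_pred

    model = NeuralNet2(input_size=28*28, hidden_size=5)
    criterion = nn.BCELoss()  # expects probabilities, so the sigmoid above is required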