Transcript:

Hello, everyone. Let's quickly go over what the logit and softmax are. If you are working on a deep learning model for multi-class classification, you might have heard about the softmax and the logit. Here is the example: we have the input, which is this cute dog. The input layer will take the vectorized value of this input. We have the hidden layer and we have the last layer. Let's suppose we don't have the activation yet. Then the output values range from minus infinity to plus infinity. Here we can say the input is the dog, because the dog has the greatest value from the output layer. Well, we can also have a negative value, because the range starts from minus infinity. In this example, we have minus two for the rabbit. This is not the greatest situation for us, because we want to have a probability from the output layer. Then how can you get a probability from this minus infinity to plus infinity range?

Well, if you're familiar with binary classification, you might have heard about the sigmoid function, and we can use the sigmoid function as the activation function of this last layer. Then you can have a probability. Say we have 85% for the dog, which is the greatest probability from this neural network, so we can say the input is the dog here. Well, this may be a good solution when you want to output multiple possible outputs, something like multi-label classification, but it is not a good solution when you want to output only one item for multi-class classification. The solution is to have a probability distribution over the predicted output classes. We call it softmax. If you sum all of the output values in this example, that will be exactly one, so we can say it is the probability distribution over the predicted output classes, and every value here is the probability for each class. So let's talk about softmax.
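To make the sigmoid point concrete, here is a minimal sketch in Python. It assumes the raw output values 4, 2, and -2 for the dog, cat, and rabbit (the values used in the softmax calculation later in the video), so the percentages will not match the 85% quoted above. The names `sigmoid`, `raw_outputs`, and `sigmoid_probs` are only for illustration.

```python
import math


def sigmoid(x):
    """Squash one raw output value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumed raw outputs of the last layer (no activation applied yet).
raw_outputs = {"dog": 4.0, "cat": 2.0, "rabbit": -2.0}

# Sigmoid is applied to each class independently of the others.
sigmoid_probs = {name: sigmoid(value) for name, value in raw_outputs.items()}

for name, p in sigmoid_probs.items():
    print(f"{name}: {p:.3f}")
# dog: 0.982
# cat: 0.881
# rabbit: 0.119

# The per-class values do not form a probability distribution.
print(f"sum: {sum(sigmoid_probs.values()):.3f}")
# sum: 1.982
```

Each value is a valid probability on its own, which suits multi-label classification, but the sum is well above one, so the three numbers cannot be read as a single distribution over the classes.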
We need to understand the logit first, because the logit is the input to the softmax. In statistics, there are the terms logit and probability, and once you have the probability you can convert it to the logit. There's the equation here. When the probability is 0%, the logit is minus infinity, and when the probability is 100%, the logit is plus infinity. You can also check this from the graph: the logit is monotonically increasing with the probability. That means if the deep learning model has a logit, we can use the logit as a score for the probability. So we can think of the last layer from the last example as a logit layer, and you can get the probability using this equation. Let's get the probability: you get 98.2% for the dog, and the probabilities for the cat and the rabbit as well, so you can say the input is a dog in this example. But this situation is very similar to having the sigmoid function at the end. This is not a probability distribution over all predicted classes, because summing all the probabilities does not give one here. So we need softmax. Softmax guarantees that all the values sum up to one.

So what is the softmax? Well, we can use e to the power of the logit as the input to the softmax, and you can see from this chart that e to the power of the logit is also monotonically increasing with the probability. We can use this as a score from the deep learning model as well. If e to the power of the logit is the score, a high score will get much more probability and a low score will get less probability. This is also a benefit of the softmax for the deep learning model when you are training. For multi-class classification, we normally use one-hot encoding for the y value. For example, the y value for the dog in one-hot encoding is something like 1 0 0. You want to compare this with your predicted value. By using the softmax, you exaggerate the probabilities here: the highest probability will be very close to one, and the low probabilities will be close to zero. So it helps to optimize the deep learning model during training as well.

The other question I'm getting is: hey, can we use two or three, or some other number, as the base here? Yes, you can use any number greater than zero as the base here. But we consider the last layer value as a logit here, right? And you can see the equation here as well: we have e as the base of this log. e is nothing but a mathematical constant, and having e as the base makes sense and is easy to calculate, as you can see from this equation as well.

So let's calculate the softmax. Softmax just normalizes the values here, so the denominator is e to the power of four, plus e to the power of two, plus e to the power of minus two in this example, and we can get the normalized probability. The softmax for the dog is 0.879, for the cat it is 0.119, and for the rabbit it is 0.002. Summing up all of these values gives one now, and we can say this is the probability for each class over all the predicted classes. Also, the softmax output represents the deep learning model's confidence in the class.

Okay, so here's the takeaway. Softmax gives the probability distribution over the predicted output classes. The final layer in deep learning has logit values, which are the raw values for prediction by softmax, and the logit is the input to the softmax. That's it. Thanks for watching, and I'll see you in the next video.
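The calculation walked through in the video can be sketched in a few lines of Python. This is a minimal illustration, assuming the logits 4, 2, and -2 for the dog, cat, and rabbit from the example. The `softmax` helper and its `base` parameter are hypothetical names, and subtracting the largest logit before exponentiating is a common numerical-stability step that is not mentioned in the video and does not change the result.

```python
import math


def softmax(logits, base=math.e):
    """Turn a list of logits into probabilities that sum to one."""
    # Shift by the max logit for numerical stability (an addition to the
    # video's formula; the shift cancels out in the division below).
    largest = max(logits)
    powers = [base ** (z - largest) for z in logits]
    total = sum(powers)
    return [p / total for p in powers]


# Logits from the video's example.
logits = {"dog": 4.0, "cat": 2.0, "rabbit": -2.0}

probs = dict(zip(logits, softmax(list(logits.values()))))
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
# dog: 0.879
# cat: 0.119
# rabbit: 0.002

# Same logits with base 2 instead of e: still a distribution, but less peaked.
probs_base2 = dict(zip(logits, softmax(list(logits.values()), base=2.0)))
for name, p in probs_base2.items():
    print(f"{name} (base 2): {p:.3f}")
# dog (base 2): 0.790
# cat (base 2): 0.198
# rabbit (base 2): 0.012
```

Both versions sum to one and keep the dog on top. The sketch assumes a base greater than one: a base between zero and one would still produce a valid distribution, but it would give the largest probability to the smallest logit.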