Transcript:
To get a good encoding for a face image, one way to train the parameters of the neural network is to define a triplet loss function and apply gradient descent to it. Let's take a look at what that means. To apply the triplet loss, you need to compare pairs of images. For example, given two pictures of the same person, you want their encodings to be similar. Given two pictures of different people, you want their encodings to be far apart. The term triplet loss means that you always look at three images at a time: an anchor image; a positive image, which is a picture of the same person as the anchor; and a negative image, which is a picture of a different person. You want the distance between the anchor and the positive image to be small, and, conversely, the distance between the anchor and the negative image to be much larger. That's why it's called a triplet loss: you're always looking at three images at the same time, the anchor, the positive, and the negative, which I'll abbreviate A, P, and N.

More formally, what you want is for the parameters of the neural network, and therefore the encodings, to have the following property: the squared norm of the anchor's encoding minus the positive image's encoding should be small; in particular, it should be less than or equal to the squared norm of the difference between the encodings of the anchor and the negative image:

||f(A) - f(P)||² ≤ ||f(A) - f(N)||²

The left side is d(A, P) and the right side is d(A, N). You can think of d as a distance function, which is why we use the letter d. If you move the right-hand term of this inequality over to the left, you get:

||f(A) - f(P)||² - ||f(A) - f(N)||² ≤ 0

Let's modify this expression a little. One trivial way to satisfy it is to learn a function f that maps everything to zero. If f always outputs the zero vector, then the first term is 0 minus 0, which is 0, and the second term is also 0 minus 0, which is 0, so if f of any image always equals the zero vector, the inequality is almost trivially satisfied. So you need to make sure the neural network doesn't output 0 for every encoding. Another trivial solution the network could learn is to output the same encoding for every image; in that case, every difference is again 0 minus 0. To prevent the neural network from learning these trivial solutions, we modify the objective so that the expression isn't just less than or equal to zero, but significantly less than zero, specifically less than or equal to -α, where α is another hyperparameter. By convention, rather than putting -α on the right-hand side, we add +α on the left:

||f(A) - f(P)||² + α ≤ ||f(A) - f(N)||²

α is also called the margin, a term that will probably be familiar if you've ever seen a paper on support vector machines, but it doesn't matter if you haven't. For example, suppose the margin is 0.2. If d(A, P), the distance between the anchor and the positive image, is 0.5, and d(A, N), the distance between the anchor and the negative image, is only slightly larger, say 0.51, then the constraint is not satisfied, even though 0.51 is greater than 0.5. That's not enough: you want d(A, N) to be much larger than d(A, P).
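To make the constraint concrete, here is a minimal numpy sketch of the margin check. This is my own illustration, not code from the lecture: the helper names d and satisfies_margin are made up, and the toy encodings are just vectors chosen to reproduce the 0.5 versus 0.51 numbers above.

```python
import numpy as np

def d(u, v):
    """Squared L2 distance between two encodings: ||u - v||^2."""
    return np.sum((u - v) ** 2)

def satisfies_margin(f_A, f_P, f_N, alpha=0.2):
    """Check the constraint ||f(A)-f(P)||^2 + alpha <= ||f(A)-f(N)||^2."""
    return d(f_A, f_P) + alpha <= d(f_A, f_N)

# Toy encodings chosen so that d(A, P) = 0.5 and d(A, N) = 0.51:
f_A = np.array([0.0, 0.0])
f_P = np.array([np.sqrt(0.5), 0.0])
f_N = np.array([np.sqrt(0.51), 0.0])
print(satisfies_margin(f_A, f_P, f_N))  # False: 0.51 < 0.5 + 0.2
```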
In particular, d(A, N) here would need to be at least 0.7 in order to satisfy the minimum margin of 0.2 between the two distances. You can achieve that either by pushing this value up or by pulling this one down, so that there is a gap of at least the hyperparameter α = 0.2 between the anchor-positive distance and the anchor-negative distance. That's what the margin does: it pushes the anchor-positive pair and the anchor-negative pair apart from each other.

Let's take the expression below and formalize it on the next slide as the triplet loss function. The triplet loss function is defined on three images A, P, N; in other words, given an anchor image, a positive image, and a negative image, where the positive image shows the same person as the anchor and the negative image shows someone else, we define the loss on this single example as follows. First, let's bring back the expression from the previous slide: ||f(A) - f(P)||² - ||f(A) - f(N)||² plus the margin hyperparameter α. We want this to be less than or equal to 0, so to define the loss function, we take the maximum of this quantity and 0:

L(A, P, N) = max(||f(A) - f(P)||² - ||f(A) - f(N)||² + α, 0)

The effect of taking the max is that whenever this quantity is less than or equal to 0, the maximum of it and 0 is 0, so the loss is zero. In other words, if the network achieves the objective of making the term underlined in green at most 0, the loss on this example is zero. Conversely, if this quantity is greater than 0, the max picks out the green underlined part, and you get a positive loss. So by trying to minimize this, the network tries to push this part down to 0 or below, and once it is less than or equal to 0, the network doesn't care how negative it is.

That's the definition of the loss on a single triplet. The overall cost function for the neural network is the sum of the losses over all the triplets in the training set. So, for example, if you have a training set of 10,000 images of 1,000 different people, you would use those 10,000 pictures to generate triplets like this, and then train the neural network by applying gradient descent to the cost function defined over the triplets of the training set.

Note that to define a triplet dataset, you need pairs of A and P, pairs of pictures of the same person. So to train the system, you need a dataset with multiple images of the same person; that's why, in this example, there are 10,000 images of 1,000 people, so on average 10 pictures of each person in the dataset. If you had only one picture per person, you couldn't train the system. Of course, after training the system this way, you can apply it to the one-shot learning problem of face recognition, where you have only one picture of the person you want to recognize. But for the training set, you need multiple images of at least some people, so that you can form pairs of anchor and positive images.

How do you actually pick the triplets to form your training set? One problem with picking A, P, and N at random from the training set, subject only to A and P being the same person and A and N being different people, is that the constraint is then too easy to satisfy. If you choose the images at random, then with high probability A and N will be much more different than A and P. You'll remember that d(A, P), as we saw on the previous slide, is the squared norm of the distance between the encodings. But if A and N are two randomly chosen pictures of different people, there's a high chance that d(A, N) exceeds the left side, d(A, P), by much more than the margin α.
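Here is how the single-triplet loss and the overall cost J might look in code. This is a minimal numpy sketch: the names triplet_loss and cost_J and the callable f (mapping an image to its encoding vector) are placeholders of mine, and a real system would compute this batched inside a deep learning framework rather than in a Python loop.

```python
import numpy as np

def triplet_loss(f_A, f_P, f_N, alpha=0.2):
    """L(A, P, N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)."""
    d_ap = np.sum((f_A - f_P) ** 2)  # d(A, P)
    d_an = np.sum((f_A - f_N) ** 2)  # d(A, N)
    return max(d_ap - d_an + alpha, 0.0)

def cost_J(triplets, f, alpha=0.2):
    """J = sum of the triplet losses over all (A, P, N) in the training set."""
    return sum(triplet_loss(f(A), f(P), f(N), alpha) for A, P, N in triplets)
```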
When that happens, the neural network won't learn much. So when creating a training set, you have to choose triplets A, P, N that are hard to train on. Specifically, you still want every triplet to satisfy the constraint d(A, P) + α ≤ d(A, N), but a triplet is hard when the values of d(A, P) and d(A, N) are similar. If you choose A, P, and N that way, the learning algorithm has to pay attention: it has to push the right-hand value up and pull the left-hand value down until there is at least a margin α between the left and the right. The effect of choosing these triplets is to increase the computational efficiency of the learning algorithm. If you pick triplets at random, too many of them are too easy, and gradient descent won't do anything, because the neural network already satisfies the constraint in most cases. Only by picking hard triplets do you force gradient descent to actually do the work of pushing these two values apart.

If you're interested, the details are in the paper by Florian Schroff, Dmitry Kalenichenko, and James Philbin. They developed a system called FaceNet, and many of the ideas I'm presenting in this video come from there. By the way, here's a fun fact about how deep learning algorithms are named: if you run deep learning on some domain, call it blank, the system is usually called (blank)Net or Deep(blank). We've been talking about face recognition; this paper is called FaceNet, and in the last video we saw DeepFace. This way of naming algorithms (blank)Net or Deep(blank) is very common in the deep learning world. Anyway, if you're curious about the details of choosing the most useful triplets to speed up the algorithm, take a look at that paper; it's a good one.

To summarize: to train with the triplet loss, you take your training set and map it to a lot of triplets. Here is one triplet: an anchor, a positive image of the same person as the anchor, and a negative image of a different person. Here's another triplet: again, the anchor and positive are the same person, and the anchor and negative are different people. Having formed these triplets of anchor, positive, and negative images from the training set, what you do is use gradient descent to minimize the cost function J defined on the previous slide. That has the effect of backpropagating through all the parameters of the neural network in order to learn an encoding where the d value of two images is small if they show the same person and large if they show different people.

That's the triplet loss and how to train a neural network to learn an encoding for face recognition. One thing to know is that today's large-scale commercial face recognition systems are trained on very large datasets: datasets of over a million images are common, some companies use more than 10 million, and some commercial companies use over 100 million images to train their systems. These are very large datasets, and even by modern standards they are not easy to acquire. Fortunately, some of these companies have trained these large networks and posted their parameters online. So rather than trying to train such a network from scratch yourself, given the scale of data this requires, it can be useful to download someone else's pretrained model.
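To illustrate what "choosing triplets where d(A, P) and d(A, N) are similar" might look like, here is a brute-force sketch of mine that keeps only triplets whose anchor-negative distance falls within the margin of the anchor-positive distance. The function name and the O(n³) loop are purely illustrative; the selection criterion follows the semi-hard idea in the FaceNet paper, but the mining there is done per mini-batch rather than over the whole dataset.

```python
import numpy as np

def pick_hard_triplets(encodings, labels, alpha=0.2):
    """Return index triples (a, p, n) where d(A, P) and d(A, N) are close:
    d(A, P) < d(A, N) < d(A, P) + alpha, so the margin is still violated
    and gradient descent has real work to do on the triplet."""
    n = len(encodings)
    hard = []
    for a in range(n):
        for p in range(n):
            # P must be a different image of the same person as A.
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = np.sum((encodings[a] - encodings[p]) ** 2)
            for neg in range(n):
                # N must be an image of a different person.
                if labels[neg] == labels[a]:
                    continue
                d_an = np.sum((encodings[a] - encodings[neg]) ** 2)
                if d_ap < d_an < d_ap + alpha:
                    hard.append((a, p, neg))
    return hard
```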
But even if you download someone else's pretrained model, it's still useful to know how these algorithms work, since in some applications you may have to implement them from scratch yourself. And that's it for the triplet loss. In the next video, I'll show you some other variants of the Siamese network and how to train these systems. See you in the next video.