Transcript:

When implementing backpropagation, there is a test called gradient checking that helps you verify whether backpropagation is implemented correctly, because even after writing out all the formulas you can't be 100% sure that every detail is right. To implement gradient checking, we first need to numerically approximate the gradient; let's learn how.

Let's cover how to numerically approximate gradients so that we can later check whether an implementation of backpropagation is correct. First, take the function f(θ) = θ³ and plot it as a graph. Consider the case where θ is 1. Instead of nudging θ only to the right to get θ + ε, we nudge it both to the right and to the left, so we get θ − ε as well. θ is 1, the right point is 1.01, the left point is 0.99, and ε is 0.01 as before. Instead of taking the little one-sided triangle and computing its height over its width, you can get a better estimate of the slope by using the point at θ − ε and the point at θ + ε and computing height over width for the larger triangle. I won't go into depth, but the technical reason is that the height-over-width ratio of this larger triangle gives a better approximation of the derivative at θ. There is a small triangle above and a small triangle below; by using the larger green triangle, you take both of them into account. So don't use the difference on just one side; use the difference on both sides.

Now let's work it out mathematically. This point is f(θ + ε) and this point is f(θ − ε), so the height of the big green triangle is f(θ + ε) − f(θ − ε). The width is ε on one side and ε on the other, so the width of the green triangle is 2ε. Therefore height over width is (f(θ + ε) − f(θ − ε)) / (2ε), and this value should be close to g(θ). Let's substitute values using f(θ) = θ³: θ + ε is 1.01, and we take the cube of this.
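The two-sided difference described above can be sketched in a few lines of Python, using the lecture's example f(θ) = θ³ with θ = 1 and ε = 0.01 (variable names here are illustrative, not from the lecture):

```python
# Two-sided difference approximation of the derivative of f(theta) = theta**3
# at theta = 1, with epsilon = 0.01, as in the lecture's worked example.

def f(theta):
    return theta ** 3

theta = 1.0
eps = 0.01

# Height of the big green triangle divided by its width:
# (f(theta + eps) - f(theta - eps)) / (2 * eps)
approx_grad = (f(theta + eps) - f(theta - eps)) / (2 * eps)

print(approx_grad)  # approximately 3.0001, close to g(theta) = 3*theta**2 = 3
```

Running this reproduces the 3.0001 value computed in the next step of the lecture.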
Subtract (0.99)³ from this, and divide the result by 2(0.01). Pause the video for a moment here and do the calculation yourself. You should get 3.0001. On the previous slide we had g(θ) = 3θ², and when θ is 1 this value is 3. These two values are very close: the approximation error is 0.0001. When we used only the one-sided difference with θ + ε on the previous slide, we got 3.0301, so the approximation error was 0.03, which is much greater than 0.0001. So if we use the difference on both sides to approximate the derivative, the value comes out very close to 3, and we can be more confident that g(θ) is a correct implementation of the derivative of f.

If you use this two-sided method for a backpropagation gradient check, it runs twice as slow as using the one-sided difference, but in practice I think it's worth using because it is much more accurate.

For those of you familiar with calculus, let me add a bit of theory; it's okay if you don't follow this part. The formal definition of the derivative is the limit of (f(θ + ε) − f(θ − ε)) / (2ε) as ε goes to zero, for very small values of ε. The definition of a limit is something you can learn in a calculus class; I won't cover it in this lecture. For values of ε that are not zero, the error of this two-sided approximation is O(ε²), and ε is a very small number: when ε is 0.01, ε² is 0.0001. The big-O notation means the error is at most some constant times ε²; in our example that constant is about 1. On the other hand, using the one-sided formula, the error is O(ε), and since ε is less than 1, ε is much larger than ε². In other words, the one-sided expression is a much less accurate approximation than the two-sided one. So when doing a gradient check, computing the two-sided difference (f(θ + ε) − f(θ − ε)) / (2ε) is more accurate than using the one-sided difference. Don't worry if you didn't understand this last part.
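The error comparison above can also be checked directly in code. This small sketch (function names are illustrative) computes both the one-sided and two-sided approximations against the true derivative g(θ) = 3θ² and prints their errors:

```python
# Compare the O(eps) one-sided difference with the O(eps^2) two-sided
# difference for f(theta) = theta**3 at theta = 1, eps = 0.01.

def f(theta):
    return theta ** 3

def true_grad(theta):
    # Analytic derivative: g(theta) = 3 * theta**2
    return 3 * theta ** 2

theta, eps = 1.0, 0.01

one_sided = (f(theta + eps) - f(theta)) / eps              # error O(eps)
two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)  # error O(eps^2)

print(abs(one_sided - true_grad(theta)))  # roughly 0.0301
print(abs(two_sided - true_grad(theta)))  # roughly 0.0001
```

The two-sided error is about 300 times smaller here, matching the 0.03 versus 0.0001 figures from the lecture.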
It's just an additional explanation for those familiar with calculus and numerical approximation. Just remember that the two-sided difference formula is more accurate. In the next video, we will use it to do gradient checking. By taking the difference on both sides, we were able to numerically verify that the function g(θ) is a correct implementation of the derivative of the function f. Next, I'll cover how to use this to check whether backpropagation is implemented correctly, or whether there are bugs to be fixed.