Transcript:

Neural networks learn by calculating the difference between their current output and the true values. This difference is encapsulated in a loss function. The process of training a neural network involves minimizing this loss function so that the output of the network is as close as possible to the true values. This process involves calculating partial derivatives of the loss function with respect to the parameters in the network’s layers. I’m Carlos Lara, and today we will learn how automatic differentiation works in TensorFlow. Automatic differentiation allows us to compute arbitrary derivatives of functions numerically. Now, if you haven’t already, go ahead and subscribe to the channel and turn on notifications for future tutorials.
So the first thing we do here is import tensorflow as tf, and we just check the version to make sure that we’re on at least the TensorFlow version we need. We are. TensorFlow provides the tf.GradientTape API for automatic differentiation. Think of a gradient tape like this: let’s say you have your neural network, say just linear layers, and imagine, literally, a tape underneath it that is recording the operations. We have an input tensor, or tensors, coming in, and they get transformed as they go through the network towards the output. All of those tensor operations are recorded onto this tape, and recording those computations on the tape allows us to compute gradients, to calculate derivatives and partial derivatives from the output all the way to the input and in between. We’ll show you how that works.
For this first example, we’re going to create a variable called x. It’s a tensor, and we create it with tf.ones, passing in one argument, which is the shape. We pick 2 by 2, so that’s a 2 by 2 matrix.
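The setup described so far can be sketched as follows. This is a minimal sketch that assumes a TensorFlow 2.x installation; the printed version string depends on your environment.

```python
import tensorflow as tf

# Check which TensorFlow version is installed.
print(tf.__version__)

# Create a 2 by 2 tensor filled with ones; the single argument is the shape.
x = tf.ones((2, 2))
print(x.numpy())
```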
So this is going to create a tensor of ones with the shape 2 by 2, a 2 by 2 matrix of ones, and we’ll print it out in a moment to see what that looks like, just to double check what we’re doing. Now we’re going to open up a context manager: with tf.GradientTape() as t. So t is the tape object where the operations will be recorded. The call t.watch(x) ensures that our input tensor is being traced by this tape; a plain tensor like this one is not watched automatically. And as we can see, the input tensor is a 2 by 2 matrix of ones, as we expect.
Now we’re going to record an operation, and we’re storing the result in a variable called y: tf.reduce_sum. It takes a tensor as its input, calculates the sum of the elements of that tensor along its dimensions, and reduces it to just a scalar. As you can see, it computes the sum of elements across dimensions of a tensor, and you can optionally specify an axis parameter if you just want to sum along one of the axes, one of the dimensions. But here we’re going to reduce the whole tensor to a single sum, so with a 2 by 2 tensor of ones we know what to expect.
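As a small illustration of tf.reduce_sum with and without the optional axis argument, here is a hedged sketch. The names column_sums and row_sums are made up for this example and do not come from the video.

```python
import tensorflow as tf

x = tf.ones((2, 2))

with tf.GradientTape() as t:
    # Ask the tape to trace x; plain tensors are not watched automatically.
    t.watch(x)
    # With no axis argument, every element is summed into one rank 0 tensor.
    y = tf.reduce_sum(x)

# The optional axis argument sums along one dimension only.
column_sums = tf.reduce_sum(x, axis=0)
row_sums = tf.reduce_sum(x, axis=1)

print(y.numpy())            # 4.0
print(column_sums.numpy())  # [2. 2.]
print(row_sums.numpy())     # [2. 2.]
```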
We expect y to be 4, because we’re going to reduce the dimensions to zero, so just a rank 0 tensor, and sum those elements: 1 + 1 + 1 + 1 = 4. Great, and that’s what we see when we print our output. It’s still a tf.Tensor object, of course, but it’s a tensor of rank 0 with the value 4. Now we want to do another operation and store it in a variable called z. This is a different tensor operation, a multiply: tf.multiply. We pass in y twice, which we saw is 4, so 4 times 4. It essentially squares y, multiplying it by itself, which gives us 16, and that’s exactly what we get when we print it out.
Now that we have those operations recorded onto this gradient tape, we want to calculate the derivative of z. z was the final operation, so you can think of it as the output. We want the derivative of z with respect to the original input tensor x. So we take the tape object t and call its gradient method with two arguments: the first is the function that we want the derivative of, and the second is what we want the derivative with respect to, in this case x. That gives the derivative of z with respect to x, the gradient computed from the operations that were recorded on our tape, and we store it in a variable called dz_dx. It’s the same calculus you may recall: total derivatives are calculated as products of partial derivatives, and that’s what we call the chain rule in calculus.
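Putting the steps of this first example together, a minimal sketch could look like the following. By the chain rule, dz/dy = 2y = 8 and each entry of dy/dx is 1, so every entry of the gradient should be 8.

```python
import tensorflow as tf

x = tf.ones((2, 2))

with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)   # 1 + 1 + 1 + 1 = 4
    z = tf.multiply(y, y)  # 4 * 4 = 16

# Derivative of z with respect to the original input tensor x.
dz_dx = t.gradient(z, x)

print(z.numpy())      # 16.0
print(dz_dx.numpy())  # a 2 by 2 array in which every entry is 8.0
```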
Now, since the input tensor’s shape is 2 by 2, we expect the gradient to have the same shape, because we’re calculating the derivative of z with respect to x. So we expect to get four derivative values here, and we’re just doing a check to make sure that the output values are what we expect. I’ll show you in a moment how we know what values we’re supposed to get. Oftentimes it’s complicated. These derivatives, again, are calculated numerically behind the scenes by TensorFlow, but in a moment we’ll show you an example and you’ll see how that works. Okay, great, so we have that, and then we have our output tensor as well. It matches the initial shape, but it has the appropriate values for the partial derivatives.
So again, just to recap: the gradient tape allows us to record the operations, and this could be a neural network, by the way. This is just a simple example to show you how it works. We have an input tensor. We do operations and transformations on that input tensor as it goes through our layers, in the general case of a neural network, and we record those operations. Then we can take gradients, the partial derivatives of that final operation with respect to operations in between and parameters in the network. That’s really what we’re doing with a neural network: our output will be a loss function, and we’ll want to calculate the partial derivatives of that loss function with respect to the learnable parameters, the weights and biases in the layers of the network.
So let’s go to the next example. You can also get gradients of the output with respect to intermediate values.
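To make the recap concrete, here is a hedged sketch of the neural network case, where the recorded output is a loss and the gradients are taken with respect to weights and biases. The data, the single linear layer, and the names inputs, target, w, and b are hypothetical and chosen only for illustration; they are not from the video.

```python
import tensorflow as tf

# Hypothetical toy data and parameters, chosen only for illustration.
inputs = tf.constant([[1.0, 2.0]])  # one example with two features
target = tf.constant([[1.0]])       # the "true value"
w = tf.Variable([[0.5], [0.5]])     # weights of a single linear layer
b = tf.Variable([0.0])              # bias of that layer

with tf.GradientTape() as tape:
    # Trainable tf.Variable objects are watched by the tape automatically.
    prediction = tf.matmul(inputs, w) + b
    loss = tf.reduce_mean(tf.square(prediction - target))

# Partial derivatives of the loss with respect to the weights and the bias.
dloss_dw, dloss_db = tape.gradient(loss, [w, b])

print(loss.numpy())      # 0.25
print(dloss_dw.numpy())  # values 1.0 and 2.0, shape (2, 1)
print(dloss_db.numpy())  # [1.]
```

A training step would then move w and b a small step against these gradients, which is the gradient descent idea mentioned in the recap.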
Getting gradients with respect to intermediate values is what actually happens in a neural network through the chain rule, so we get all of those in-between values. Before, we calculated the total derivative dz/dx all at once, but here let’s say we just want to calculate one of the partial derivatives, dz/dy, this first part. So we do the same thing. We create this tensor of ones with shape 2 by 2, open up a gradient tape context manager, and again we watch the tensor to make sure we record the operations on it. Then, same as before, we do the reduce_sum and the multiply. But instead of taking the whole gradient dz/dx, we want dz/dy, an intermediate derivative. Let’s see what we get. We get something different: the value of dz/dy is 8, and dy/dx is a tensor of ones of shape 2 by 2, just like the original tensor. That’s why for the total derivative we get a 2 by 2 matrix, a tensor of rank 2, filled with 8s: the 8 is multiplied element-wise by those ones. Again, this is a little bit more complicated, so let’s actually run that so you see what we get. It’s the same thing; the only difference is the output tensor, the intermediate derivative itself. You’ll see in our next example how that works.
Now, by default, the resources held by a gradient tape are released as soon as the gradient method is called. So you can call the gradient method only once. If you try to call it again, you’re going to get an exception. But if you want to calculate multiple gradients over the same computation, you can specify an argument: you open up a gradient tape context manager the same way, but you specify the persistent argument and set it to True.
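The intermediate gradient and the call-once behaviour described above can be sketched like this. The flag second_call_failed is a made-up name for this example; the second call is expected to raise a RuntimeError on a non-persistent tape.

```python
import tensorflow as tf

x = tf.ones((2, 2))

with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

# Gradient of the output z with respect to the intermediate value y.
dz_dy = t.gradient(z, y)
print(dz_dy.numpy())  # 8.0

# A non-persistent tape releases its resources after the first gradient call,
# so a second call on the same tape raises an error.
try:
    t.gradient(z, x)
    second_call_failed = False
except RuntimeError as error:
    second_call_failed = True
    print("second call failed:", type(error).__name__)
```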
It’s a Boolean, so you set it to True, and that allows you to calculate multiple gradients with the same tape. Here we’re initializing a tensor x. It’s just a constant, a rank 0 tensor with the value 3. We’re going to record some operations and see how this gradient tape works in a simpler case. We won’t use matrices and higher-rank tensors; let’s do it with simple scalars to learn a little bit more about what’s going on. We record an operation, which is the square of this tensor, and assign it to y. So x squared, x times x, 3 times 3: that’s 9. Let’s go ahead and run it so we get the output. We have our initial tensor 3, a scalar, and then 3 times 3 is 9. Again, that’s recorded because we’re inside the context of a gradient tape. Then we print it, and we record another operation: we now square this output, 9. So 9 squared, 9 times 9, is going to be 81, and that’s what we have in z.
Now let’s calculate the gradient. When we calculate dz/dx, the derivative of z with respect to x, we get 108. Let’s see why we get 108. Let’s ask ourselves the question: what is this function z, considering the input? Again, TensorFlow calculates the derivative numerically behind the scenes using the gradient tape and automatic differentiation, and because we’re using very simple numbers, we get a clean answer. z is actually x squared, squared, so x to the fourth power, evaluated at x equals 3. Why at x equals 3? Because that’s the input: the input of the network is this tensor x, and its value, we know, is 3. Now think about it. Our function z of x is x to the fourth power.
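A minimal sketch of the persistent tape described above follows. It also requests dy/dx from the same tape to show that more than one gradient call is allowed, and drops the tape reference at the end.

```python
import tensorflow as tf

x = tf.constant(3.0)

with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = x * x  # 3 * 3 = 9
    z = y * y  # 9 * 9 = 81

# A persistent tape can be asked for more than one gradient.
dz_dx = t.gradient(z, x)
dy_dx = t.gradient(y, x)

print(dz_dx.numpy())  # 108.0
print(dy_dx.numpy())  # 6.0

# Drop the reference to the tape once the gradients are no longer needed.
del t
```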
Then what is the derivative of that? The derivative of z with respect to x is 4x to the third power. From calculus, that’s just the simple power rule: we grab the exponent, bring it down as a multiplier in front, and then reduce the power by one. So that becomes 4 times x to the 4 minus 1, so 4x to the third power, which is what we have here, and we evaluate it at x equals 3. So 3 times 3 is 9, times 3 is 27, times 4 is 108. That’s why the gradient, the derivative of z with respect to x, happens to be 108: again, computed numerically and evaluated at the input. Before taking the derivative, the value of z by itself is x to the fourth, and you can also go in reverse: take the integral, the reverse of the derivative so to speak, of 4x to the third, and you get back x to the fourth power, up to a constant. So very, very interesting, and we can see here that the combination of calculus and linear algebra is central and fundamental to deep learning and neural networks with TensorFlow.
So we calculated dz/dx, the derivative of the output with respect to x, but we can also calculate an intermediate derivative: dy/dx, the derivative of y with respect to x. It works the same way, through the tape object t.
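The value 108 can be cross-checked with the chain rule through the intermediate value y. This worked derivation is added for illustration and uses only the quantities named in the walkthrough:

```latex
\begin{aligned}
y &= x^2, \qquad z = y^2 = x^4,\\
\frac{dz}{dx} &= \frac{dz}{dy}\cdot\frac{dy}{dx} = 2y \cdot 2x = 4x^3,\\
\left.\frac{dz}{dx}\right|_{x=3} &= (2\cdot 9)(2\cdot 3) = 18 \cdot 6 = 108 .
\end{aligned}
```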
We call the gradient method on the tape variable t and take the derivative of y with respect to x, so we’re calculating both gradients individually here, and we can do that now because we set persistent to True in our gradient tape. With a persistent tape we have to manually drop the reference to the tape, to make sure we clean up our resources and have predictable behavior, so that’s nice.
Now, you can also record control flow. Control flow means conditional statements and loops in Python, so ifs and whiles and things like that. The specifics here are not that important, but we define a simple function f of two variables, x and y. We initialize an output variable to 1, and we take the value of y and say for i in range(y). Let’s say y is 5, so we’re going to have i in the range of 5: 0, 1, 2, 3, 4. We have a conditional statement: if i is greater than 1 and i is less than 5, we update the output with a tensor multiplication, tf.multiply of the output with the value of x that we passed in. The point here is that we want to be able to record these conditional statements onto a tape, because some operations may only happen if a certain Boolean condition is met. Then we just return that output from the function.
So let’s see how that works. We define a grad function, which stands for gradient, and again it takes in two variables, x and y. With tf.GradientTape() as t, we again open up a context manager for the gradient tape to record those operations, and we make sure we’re watching the input with t.watch. Let’s go ahead and run it. Yep, everything is good. The output is the result of calling this function.
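The control flow example can be sketched as below. The spoken condition is unclear in the transcript, so the check i > 1 and i < 5 is an assumption, as are the specific y values tried at the end. With that condition the function multiplies by x once for each i in 2, 3, 4 that the loop reaches.

```python
import tensorflow as tf

def f(x, y):
    output = 1.0
    for i in range(y):
        # Assumed condition: multiply only when 1 < i < 5.
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output

def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    # Only the multiplications that actually ran are on the tape.
    return t.gradient(out, x)

x = tf.convert_to_tensor(2.0)

print(grad(x, 6).numpy())  # 12.0, since f is x**3 and 3 * 2**2 = 12
print(grad(x, 5).numpy())  # 12.0, the loop still reaches i = 2, 3, 4
print(grad(x, 4).numpy())  # 4.0, since f is x**2 and 2 * 2 = 4
```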
We pass x and y into the f function that we defined, and we’re going to test multiple values. We get different results depending on whether these Boolean conditions are met, because the tf.multiply operation may or may not be executed depending on whether we meet those conditions. So we have an initial tensor, created with tf.convert_to_tensor from TensorFlow. We’re grabbing a scalar, 2.0, and converting it to an actual rank 0 tensor, and then we pass it in, because we need actual tf.Tensors to compute these gradients; that’s how the tensor operations get recorded. We pass different values for y, and we convert the results to NumPy. The NumPy conversion goes from a tf.Tensor to an actual number, as we saw in the previous tutorial, AI programming with tensors. Feel free to check out that tutorial if you haven’t already. And we just do some asserts to make sure that we get what we expect.
We can also do higher-order gradients, which is pretty cool. Operations inside of the gradient tape are recorded, as we know, but we can also have nested gradient tapes. For example, here let’s initialize a TensorFlow variable, tf.Variable, set it to 1, and store it in a variable called x. A Python variable and a TensorFlow variable are different things, but here we just store the tf.Variable in x. By the way, in the next tutorial we’ll show you how you can very efficiently convert and serialize TensorFlow variables, and how you deal with those conversions whenever you want to go into production.
With tf.GradientTape() as t, we open the first tape as before. But now we have a nested gradient tape, with tf.GradientTape() as t2. Make sure it has a different name, since you have two different gradient tapes. This allows us to calculate higher-order gradients. Within the second gradient tape, inside the first, we assign y equals x times x times x, so it basically cubes this initial value. Now we want to compute the derivative dy/dx. We take t2, the second gradient tape object, and call the gradient method on it, which lets us calculate the derivative of y with respect to x within the second gradient tape. Then, once we come back out to the first gradient tape’s context, we can calculate d2y/dx2. In calculus terms, this is the second derivative of y with respect to x, and it uses the first tape object: t.gradient. Instead of passing in just a function, we pass in the derivative of a function, dy/dx, which is itself a function, and we calculate the derivative of dy/dx with respect to x. We get values for that as well, and we can do some assert statements to make sure we have what we expect, and we do. So again, that shows how you can calculate not only first derivatives, but second derivatives, third derivatives, and so on.
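The nested tapes can be sketched as follows. For y = x * x * x at x = 1, calculus gives dy/dx = 3x^2 = 3 and d2y/dx2 = 6x = 6, which is what this sketch should print.

```python
import tensorflow as tf

x = tf.Variable(1.0)  # trainable variables are watched automatically

with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x * x * x
    # Computed inside the outer tape's context so that t records it too.
    dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)

print(dy_dx.numpy())    # 3.0
print(d2y_dx2.numpy())  # 6.0
```

The first gradient has to be computed while the outer tape is still recording; otherwise the outer tape has nothing to differentiate.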
This GradientTape API in TensorFlow gives us incredible flexibility for calculating derivatives and partial derivatives of functions, of any order, based on the TensorFlow operations that we recorded onto a tape: recording all of those operations and their dependencies, and calculating all of those derivatives. In actual practice, when training a neural network, we want to calculate partial derivatives of a loss function, an output, with respect to the parameters in the layers coming before that output layer in the network, all the way back to the input. That’s how a neural network uses the difference between what it got and what it should have gotten, and adjusts itself through the process of gradient descent and backpropagation to converge on the optimal network that best maps inputs to outputs.
So thank you for watching. If you have any questions or thoughts, comment below, and if there’s something specific that you’d like me to cover, feel free to put it in the comments below as well. If you haven’t subscribed to the channel already, go ahead and hit the subscribe button below and turn on notifications, and you’ll get notified of future tutorials. I will see you next time.