Transcript:

Hi, this is Jeff Heaton. When you're using Keras, you always specify these activation functions and loss functions. You can define your own, and there are times when you'll want to do this. For activation functions, this might be if you're implementing an algorithm from a paper and the activation function specified in that paper is very new and not yet supported by TensorFlow. However, the more common case is the loss function, where you want to really fine-tune what the optimizer is steering your weights towards so that it's solving exactly the problem that you are encountering. You might want to weight certain cases higher than others as it's looking through your training set. Kaggle is also known to sometimes throw in very obscure or custom loss functions that are used to evaluate you when they determine your score, and aligning your loss function to that can be very advantageous.
What's just as interesting about creating a custom loss function or activation function in Keras is how it is actually done. Normally, you have to come up with a derivative: the calculus derivative, the gradient, of the loss function or activation function that you're writing. However, TensorFlow has a very innovative way to take care of this automatically for you, so long as you know what's going on. For the latest on my AI course and projects, click subscribe and the bell next to it to be notified of every new video.
Activation functions and loss functions both have to plug into the backpropagation algorithm, which works largely by looking at the gradients, the derivatives of the loss function and everything that feeds into it, and optimizing the weights of the neural network by taking the partial derivative with respect to each weight. So let's look at how we used to do this, and then this will let you see why the way that TensorFlow does this is so very cool. One thing that I used to work on is a project called Encog.
It was basically a Java and C# neural network framework, with some other machine learning as well: a JAR file for Java and a DLL file for Windows. I developed it back before TensorFlow. Really, I started doing this in 2008, when deep learning was pretty new and neural networks were not as widely supported. Let me show you how I set this up, and how most of the frameworks of that time period set it up, and then you can see why the way that TensorFlow does this now is so cool. For activation functions and loss functions, you've got to have a way to take the derivative. If we look at the Java source code for Encog (some of my longer-term subscribers will remember Encog), we'll look at the sigmoid activation function. The way I did this was basically by passing in a vector of values, and you would calculate the activation function on that vector. Any activation function that you would create to extend Encog would have to define one of these classes and implement the activation function interface. You'd provide the activation function method, which is the actual activation function itself, and then the derivative, which was also calculated by passing in the vector.
In TensorFlow, when you define one of these, you don't have to take the derivative yourself. TensorFlow has a really cool way of taking the derivatives automatically, and we'll see how that works. It's important to understand how that works, though, because sometimes it doesn't, and if you know how TensorFlow is taking those automatic derivatives, you're able to understand that and construct your activation and loss functions so that they work well with whatever trainer you want to use. I'll give a link to this notebook in the description below this video.
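The older pattern described above, where every activation function ships with a hand-written derivative, can be sketched roughly as follows. This is a Python illustration only, not Encog's actual Java API: the class and method names are hypothetical, chosen to mirror the description.

```python
import math
from abc import ABC, abstractmethod


class ActivationFunction(ABC):
    """Hypothetical interface: each activation must also supply its derivative."""

    @abstractmethod
    def activation_function(self, values):
        """Apply the activation to a vector (list) of values."""

    @abstractmethod
    def derivative_function(self, values):
        """Return the derivative of the activation at each value."""


class ActivationSigmoid(ActivationFunction):
    def activation_function(self, values):
        return [1.0 / (1.0 + math.exp(-v)) for v in values]

    def derivative_function(self, values):
        # Derived by hand: s'(x) = s(x) * (1 - s(x))
        outputs = self.activation_function(values)
        return [s * (1.0 - s) for s in outputs]


sigmoid = ActivationSigmoid()
print(sigmoid.activation_function([0.0, 5.0]))
print(sigmoid.derivative_function([0.0, 5.0]))
```

The point of the sketch is the second method: anyone adding a new activation function had to work out and code its derivative themselves.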
Let's look at how you calculate things in TensorFlow. This is very much TensorFlow 2.0, because this changed a decent amount up to 2.0, so this is the current stuff. If you try to run this on earlier versions of TensorFlow, you'll run into problems. This is how you do calculation in TensorFlow. TensorFlow is fundamentally a linear algebra package, and it provides some very useful capabilities for that. First, let's do something really simple: we are going to multiply two by five. Notice these are TensorFlow constants. A TensorFlow constant is really just a numeric value that is not going to change; a variable is something like a weight that will be changed as you optimize things. Go ahead and run that. It takes a moment to import TensorFlow, and you'll see that I get a tensor back. It is a single number. You always want to look at the numpy value; that is what it gave you back, so two times five is ten. If you want just the number, then put the numpy function at the very end, and you get just a simple number returned. We can also do multiplies a little bit differently. Here I am multiplying two vectors, and we're passing in NumPy vectors, just showing that you can put NumPy directly in, so a vector of [2, 4] times [2, 4] is the vector [4, 16]. You can also just pass in numbers. You have to be a little careful with this one, though, because notice that this is int32, and that will often get you into trouble. Usually you want everything to be in floating point, or you'll get various other errors that are not always obvious. Divide is similar. Here I'm doing an integer division, and while I pass in integers, a floating point comes back. Not everything supports integers, as we'll see. In TensorFlow you can also do exponents and quite a few other things. There are trig functions, all sorts of things built into it. Now, I'm using tf.math there.
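The calculations walked through above can be sketched as follows, assuming TensorFlow 2.x with eager execution. The variable names are illustrative and not taken from the notebook.

```python
import numpy as np
import tensorflow as tf

# Two scalar constants; the result is a tensor, and .numpy() extracts the value.
product = tf.multiply(tf.constant(2.0), tf.constant(5.0))
print(product.numpy())

# NumPy arrays can be passed in directly; the multiply is element-wise.
squares = tf.multiply(np.array([2.0, 4.0]), np.array([2.0, 4.0]))
print(squares.numpy())

# Plain Python integers are inferred as int32, which can cause trouble later.
ints = tf.multiply(2, 4)
print(ints.dtype)

# Dividing integers returns a floating-point tensor.
quotient = tf.divide(10, 4)
print(quotient.numpy(), quotient.dtype)

# Exponents: tf.pow and tf.math.pow refer to the same operation.
power = tf.pow(2.0, 3.0)
print(power.numpy(), tf.math.pow(2.0, 3.0).numpy())
```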
The tf.math prefix is really just the longer form. Usually you can just write tf.pow; you don't really need tf.math, because many of these functions are also exposed in the main package. You can also do more complicated things, so this is a more complicated expression. This is actually the logistic function, and it calculates out to 0.99. This is written very functionally. Essentially, we are taking the negative of x, then raising e to that power, then adding one to it, and then taking the reciprocal, essentially one divided by that whole thing. So this is how you calculate values in TensorFlow, and this is how you'll write your activation functions. There's a whole host of functions available. You can do if statements, even loops, and a variety of other things. Now that we see how to calculate with this, which is what you will use to actually build your activation and loss functions, let's see how we can do some basic calculus on it.
So how do we take derivatives? There are really three primary ways. There is symbolic differentiation. This is what you probably learned when you took your first calculus class, with all these neat tables of rules: the derivative of a constant is zero, and so on. This is an important one, and one that we will use as an example: the derivative of x to the power of n is n in the front as a coefficient, times x to the power of n minus one, so x to the power of two would become just 2x if you took the derivative of it. This is called the power rule, and we will use it as a simple example of how to take the derivative of something, and then we'll get a lot more complicated. Then there is numeric differentiation. This is often used to double-check things in machine learning; that's what I primarily used it for. It is an algorithm whereby you essentially calculate, well, this is fundamentally it here.
You take the function at a location x and at x plus h, subtract those to see the difference between them, divide it by h, and then you take the limit of that as h approaches zero. Obviously you can't evaluate it at zero, because that would be division by zero. So this is a great way to estimate a derivative, and it is sometimes used just as a double check. If you look at Encog, which I showed you earlier, when I took those derivatives of various things by hand, I would use symbolic differentiation to do that, and usually I would have to write a bit of code. Most of those older neural network frameworks would use the chain rule over and over and over again: you would put the chain rule into a loop and differentiate some of these by hand, and then you would use numeric differentiation just to check that your code was really doing that symbolic differentiation correctly. Now, symbolic differentiation is really a pain in the neck to write computer programs for. There are packages available for Python that will do that, and I'll probably go through one of those in a future video. I'll probably do a video where I talk about all three of these and just how you literally take these derivatives in different ways. The problem, though, with numeric differentiation, the finite difference like we just saw, is that it's inaccurate. You get a good estimate of your derivative, but it's not anything you'd probably want to use for really precise machine learning. That being said, I have used it in that capacity before and it works okay. But for anything real, you'll probably want to do either by-hand symbolic or automatic differentiation, and automatic is what TensorFlow uses. This essentially just keeps a log of what you do. Wikipedia has a pretty good description of it.
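The finite-difference estimate described above can be sketched in plain Python. The function names are illustrative; the example uses x squared at x = 4 so the estimate can be compared against the power-rule answer of 2 times 4.

```python
def f(x):
    return x ** 2


def forward_difference(func, x, h):
    """Estimate func'(x) as (func(x + h) - func(x)) / h for a small, nonzero h."""
    return (func(x + h) - func(x)) / h


exact = 2 * 4.0  # power rule: d/dx x^2 = 2x, evaluated at x = 4

for h in (1e-1, 1e-3, 1e-5):
    estimate = forward_difference(f, 4.0, h)
    print(f"h={h:g}  estimate={estimate:.8f}  error={abs(estimate - exact):.2e}")
```

The estimate approaches 8 as h shrinks but never equals it exactly, which is why this approach works as a double check on a hand-derived gradient rather than as a precise replacement for it.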
Basically, what it's doing is keeping a log of every calculation that you do in your function and then using the chain rule to sort of unwind that, because it realizes that just about any function is going to break down to addition, subtraction, multiplication, and division, and it can simply chain rule those. It even handles other functions like the trig functions. I won't get into exactly how that works, but it does work very, very well. So let's look at this. We're going to take the derivative of x squared. That's a very easy one. The symbolic derivative is the rule that I just showed you. If you want to take this for a specific number like, say, 4, then 4 to the power of 2 is 16, and if we take the derivative of that, it's just going to be 2 times 4, or 8. So that's taking the derivative at a single point. This shows how you would actually do it in code, so we're going to take that same number, 4. This is where we use the gradient tape, and this is how we log all these calculations as we're going through this and allow ourselves to actually take the derivative. You always have to tell it what you're watching. This is what you're taking the derivative with respect to, so this is going to be used to take the derivative with respect to x. You can watch multiple ones; you just call it multiple times. When you're coding your activation function later on, you don't actually put the watch in, or even the tape. This is all done in the background, so I'm just showing you how TensorFlow actually takes a derivative in the background so that you can understand, when it goes awry, how to fix that, because you might accidentally use something that's not differentiable, and then you have a problem. I don't really need this reduce_sum. I put that in there.
I believe the example that I pulled this from had that in there. If I put something that was not a single number in here, then it would basically reduce it to a sum, so if I put a vector or a tensor in there, it would handle that. It would still be a constant, but it would be a constant tensor. Then we just multiply y by y. I'll actually remove that part, since it's not necessary for this example. Then, to take the derivative of z with respect to x, we just call gradient, and we can print that out. Now if we run this, it shows you the output. It's a tensor of eight, which matches what I did by hand. It's always good when you can match what you did by hand. So now let's try a much more complicated function, or not tremendously more complicated, but we'll do the logistic function. This is essentially the sigmoid activation function that you see in neural networks, and this is what it looks like. It's basically just the reciprocal of one plus e to the power of negative x. Here I write it in TensorFlow. Notice the divide. The last thing that you would have to do in this whole expression is the divide, so the last thing you do comes first. Then the one there, that's the one up there in the numerator. Then I add one plus exp, e to the power of negative x. So this is writing it all functionally so that it fits into TensorFlow, and then we do exactly the same as before. We're taking the derivative with respect to x, and I'm passing in five. I run this and I print out two things. I print out y so that you can see what the logistic is actually calculating, and 0.99 is the result from the logistic, and then the derivative is 0.0066 when taken at the five that we pass in. So let's check this. Let's make sure that this is actually right. I am going to run this. This is just the same thing written in Python, and it calculates it.
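The two gradient-tape examples described above can be sketched as follows, assuming TensorFlow 2.x. Variable names are illustrative, and the plain-Python check at the end uses the standard identity that the sigmoid's derivative is s(x) times (1 - s(x)).

```python
import math

import tensorflow as tf

# Derivative of x^2 at x = 4. Constants are not tracked unless watched.
x = tf.constant(4.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    z = tf.multiply(x, x)
dz_dx = tape.gradient(z, x)
print(dz_dx.numpy())  # matches the by-hand value 2 * 4

# Logistic function written with TensorFlow ops, differentiated at x = 5.
x = tf.constant(5.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.divide(1.0, tf.add(1.0, tf.exp(tf.negative(x))))
dy_dx = tape.gradient(y, x)
print(y.numpy(), dy_dx.numpy())

# The same values in plain Python, as a cross-check.
s = 1.0 / (1.0 + math.exp(-5.0))
print(s, s * (1.0 - s))
```

If an operation inside the tape block is not differentiable, or the watched tensor never feeds into the result, `tape.gradient` returns `None`, which is one way the automatic derivative can go awry.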
It gets that same 0.99. Now let's take the derivative of this symbolically. I could step you through how to do that, but I like to use Wolfram. If you've ever looked at Wolfram, it's great; it'll take the derivative of just about anything. You get rid of the math prefix in front of that, because that is Python, and now it's already calculating it. But I'm going to put an x in there instead of the number, and now we take the derivative of that. We scroll down to where the derivative is, which is right here: e to the power of x, over e to the power of x plus one, with that whole denominator squared. We can take that, put it into here, and calculate it as well. When I did this on my own (I was pausing for a second there to see where my negative went), I basically put the negative there. So we write essentially the same thing. The negatives are in slightly different locations than in mine, but it comes out to the same, and we run that, and you see the result there. You could also use the gradient tape to take, say, second derivatives and beyond; you just nest it to take additional higher-order derivatives.
Now let's look at this and see how to create a custom loss function. Here I am essentially giving you the root mean square error loss function, so it's essentially mean square error with a square root over the top of it. TensorFlow does not provide this for you as a loss function, because it's kind of pointless to provide it: it's just mean square error, which it does provide, with the square root taken. Let me just give you the root mean square error formula. It's essentially taking the square root of the sum of these squares divided by T, which is how many numbers there actually are. It's somewhat like an average, somewhat like showing you your average error without the sign, because the squares eliminate the signs, so I'll
go ahead and define it here. I'm not going to walk you through every single one of these lines. The only thing that caused me a little bit of trouble implementing this is that I do have to cast this. This is T, essentially the size of the sample. What comes back is an integer, and that wreaks all kinds of havoc in here, so I convert it to a floating point. Always be aware of that when you're creating these. So I'm going to run it to define my function. Now, I called it mean prediction, so basically copy that name there, and you'll put it in as your loss function. This is an example from my class, basically the auto miles-per-gallon regression, a fairly simple neural network. We'll run it and it trains. What you're seeing for loss now is root mean square error, so it trains to the point that we're correct to within about plus or minus 3.7 miles per gallon. This is showing you just how to create your own loss function, and if you wanted to get a lot more customized on what you're optimizing to, this is a very useful technique. I actually use custom loss functions frequently with neural networks. I don't use this particular one ever, really.
If you did truly want to create your own activation function, you could. I chose an activation function that TensorFlow does not have, the Elliott activation function. It's a pretty old one. I don't think it's got too much modern use in the days of ReLU, but it's just to show you something that was not there. This is meant to be a computationally efficient representation of the sigmoid or hyperbolic tangent activation functions that used to be really popular, meaning it doesn't use e to the power of x. So I implement it here and we run it, and to put it in, I'm basically putting it in here and here, just where I would put
in the rectified linear unit. It does not perform as well as the rectified linear unit. It does decrease, but it seems to get stuck right about in the 60s. Maybe further optimization would allow me to get this activation function to work. But this is really just showing you the motions that you go through if you wanted to create your own activation function, like if you've read about a new one in a paper and it's not supported in TensorFlow and you wanted to add it, or if you're really going at it and want to truly create your own activation function and then write a paper about it. This is the stuff that you would go through. Of these two techniques, I think you'll definitely make use of the loss function customization more. This can be very useful in Kaggle, where you've got a very specific evaluation function that you want to align your objective function to. This can squeeze out a couple of additional points, which can make all the difference in Kaggle. Also, this can really let you fine-tune your objective to the actual business problem that you're trying to solve. You can weight different things differently. You can really fine-tune how this optimization is actually going for the given problem. Thank you for watching this video, and don't forget to subscribe.
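The custom loss and custom activation discussed in the video can be sketched as follows, assuming TensorFlow 2.x with its bundled Keras. This is not the notebook's code: the names `rmse_loss` and `elliott` are hypothetical, the Elliott formula is assumed to be the common symmetric form x / (1 + |x|) since the transcript does not state it, and the data is a small synthetic stand-in rather than the auto miles-per-gallon set.

```python
import numpy as np
import tensorflow as tf


def rmse_loss(y_true, y_pred):
    """Root mean square error built only from differentiable TensorFlow ops."""
    y_true = tf.cast(y_true, y_pred.dtype)
    # tf.size returns an integer tensor, so cast it to float before dividing.
    n = tf.cast(tf.size(y_pred), y_pred.dtype)
    return tf.sqrt(tf.reduce_sum(tf.square(y_pred - y_true)) / n)


def elliott(x):
    """Assumed symmetric Elliott form: x / (1 + |x|), with no exponentials."""
    return x / (1.0 + tf.abs(x))


# Synthetic stand-in data (hypothetical): the target is the sum of the inputs.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 3)).astype("float32")
targets = features.sum(axis=1, keepdims=True).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(8, activation=elliott),  # custom activation
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss=rmse_loss)  # custom loss
history = model.fit(features, targets, epochs=2, verbose=0)
print(history.history["loss"])
```

Neither function defines a derivative. Because both are composed of TensorFlow operations, the gradient is worked out automatically during training, which is the behavior the video sets out to explain.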