Transcript:

Hey, geeks. Welcome to a new video. Today I'm going to talk about Bayesian optimization. Besides working a lot with our own multi-objective optimization, I really enjoyed using BayesOpt over the last weeks to solve single-objective problems. To apply it as well as possible, you really need to understand it properly, and that's what I'm going to do today: explain to you, step by step, how it works so that you can get the most out of it. If you enjoy the video, don't forget to like and subscribe to the channel, and if you have feedback or comments, just drop them below. Let's get started.

Let's start with an example from neural networks. If you have a neural network, you hardly know what is happening inside, but what you want to achieve is to optimize, for example, the precision with which it finds things. So what you normally do is tune different hyperparameters, like the learning rate, the batch size and many others, before you start the training, in order to optimize the precision. If you look at this problem more generally, we can simply say we have a black-box problem: we don't know 100% what is really happening inside, we have a lot of different input variables, and what we try to do is optimize the target value, that is, what comes out of the black box. And this is exactly where Bayesian optimization is really well suited to help you find the best target solution efficiently. How it works is an iterative process: in this case we have five steps that are partially repeated iteratively. I'm going to explain every step afterwards with an example in detail, but first you need an initial sampling set; you need to start with something, so you start with the initial sampling.
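The five steps described here can be sketched as a loop skeleton. Everything below is hypothetical and only for illustration: the hyperparameter names, their ranges, and the score formula are made up, and the black box stands in for something genuinely expensive like a full training run.

```python
import random

def black_box(params):
    # Hypothetical stand-in for one expensive evaluation, e.g. one full
    # training run that returns the model's precision. The formula is
    # invented purely so this sketch runs.
    return 1.0 - (params["learning_rate"] - 0.1) ** 2 - 0.001 * params["batch_size"]

def initial_sampling(n):
    # Step 1: an initial design; here plain random sampling.
    return [{"learning_rate": random.uniform(0.001, 1.0),
             "batch_size": random.choice([16, 32, 64, 128])}
            for _ in range(n)]

samples = initial_sampling(5)                 # step 1: initial sampling
scores = [black_box(p) for p in samples]      # step 2: evaluate each sample
# step 3: fit a Gaussian process regressor to (samples, scores)
# step 4: compute an acquisition function from the fitted regressor
# step 5: evaluate the black box where the acquisition function is best,
#         then repeat steps 3-5 until a stopping criterion is met
```

Steps 3 to 5 are left as comments here; they are what the rest of the video fills in.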
After having this sampling, you evaluate all the samples from the initial sampling with the black-box problem. For the neural network example, you would do different training runs to see how the networks perform with their hyperparameters. Based on these results, you can train a Gaussian process regressor; what this means in detail, we get to later. From the regressor, you calculate an acquisition function, and in the last step you use this acquisition function to identify the next input to evaluate: you minimize the acquisition function to decide which evaluation to do next in the black-box problem. And then you repeat the process over and over again until a certain stopping criterion is met.

Let's start with a simple example: a black-box problem with only one input, which is allowed to be in the range between 0 and 10, and one target that we want to optimize. In this case I take a mathematical function, just so that you can see the relations clearly and so that we can later see how good the optimization was: we take the input and multiply it by the sine of the input, so f(x) = x · sin(x). So, let's take a look. As I said, we start with the initial sampling. Here you have a lot of different possibilities, like Latin hypercube sampling, grid sampling, or just random sampling; there isn't one option that is mandatory. I just took random sampling here and drew five samples which, when I evaluate them one after the other, give different target values, as you can see here. With these target values and the corresponding input values, we can now train our Gaussian process regressor. It looks like this: we now have two different indicators here, and I'm going to explain them to you.
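Before moving on to the regressor: the example function and the random initial design can be written down in a few lines. This is a minimal sketch; the seed value is an arbitrary choice just to make the run reproducible.

```python
import math
import random

# The toy black box from the example: one input on [0, 10],
# one target, f(x) = x * sin(x).
def f(x):
    return x * math.sin(x)

random.seed(7)  # arbitrary fixed seed so the initial design is reproducible
X = [random.uniform(0, 10) for _ in range(5)]   # step 1: random initial sampling
y = [f(x) for x in X]                           # step 2: evaluate the black box
for xi, yi in zip(X, y):
    print(f"x = {xi:5.2f}  ->  f(x) = {yi:6.2f}")
```

These five (x, f(x)) pairs are exactly the data the Gaussian process regressor is trained on next.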
The difference with a Gaussian process regressor is that you don't train just one regression function; rather, you train a whole set of differently tuned regression functions with different kernels and different details. The blue line is the mean of the predictions of all these functions, while the yellow area indicates the uncertainty of the model: it is the standard deviation of all the models and their predictions. So you can see here: since we have no noise, at the points where we have a sample there is no uncertainty, while the further apart the points are from each other, the more the uncertainty rises.

In the next step, we take our Gaussian process regressor and compute our acquisition function. What is an acquisition function? There are a lot of different approaches, but it is essentially a mathematical function describing the gain, or potential optimization value, of each possible input. In this case I took a very common one, called the lower confidence bound; some know it as the upper confidence bound, in the maximization setting. It says that for the acquisition function we take the mean, the blue line, and subtract from it the standard deviation times kappa. Kappa here is a hyperparameter: as you will see later, depending on how I choose this kappa, my optimization is going to be more locally focused or more globally focused. At this point I also want to let you know that we are talking about a minimization problem, I forgot to say this before, so our goal is to get the target as small as possible. I computed the same acquisition function on the right with a kappa of 10, just so you get a first feeling for how it looks. You can see that the bigger my kappa is, the more weight the uncertainty gains, and at this point, for example, we see that both versions would sample roughly between four and six.
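The lower confidence bound is just a(x) = mu(x) − kappa · sigma(x), minimized over candidate inputs. A small sketch with made-up posterior values (not from a real GP fit) shows how kappa shifts the choice between exploiting a low mean and exploring high uncertainty:

```python
# Lower confidence bound on a grid of candidate inputs, given the GP
# posterior mean mu(x) and standard deviation sigma(x):
#   a(x) = mu(x) - kappa * sigma(x)
def lower_confidence_bound(mu, sigma, kappa):
    return [m - kappa * s for m, s in zip(mu, sigma)]

# illustrative posterior values (invented for this sketch, not a real fit)
xs    = [0, 2, 4, 6, 8, 10]
mu    = [0.0, 1.8, -3.0, -1.5, 7.9, 4.0]
sigma = [0.0, 0.0, 0.2, 0.5, 0.0, 2.0]

for kappa in (1, 10):
    acq = lower_confidence_bound(mu, sigma, kappa)
    x_next = xs[acq.index(min(acq))]   # minimise the acquisition function
    print(f"kappa={kappa}: next sample at x={x_next}")
```

With kappa = 1 the low-mean candidate at x = 4 wins (exploitation); with kappa = 10 the high-uncertainty candidate at x = 10 wins (exploration).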
Still, for the kappa of ten the value is more or less between five and six, while for the kappa of one it is nearer to four. Sampling these two values now leads us, in each case, to a new point, and we start our iteration: we now have one more evaluated point, so we retrain our model. What you can now see really beautifully here on the left side, where we have the acquisition function with kappa one, is that the next sample we should take is still very close to the one we just did, while with kappa ten on the other side, you can see that it is far away, at a totally new point, because the uncertainties are prioritized much more. So here we sample at ten, and for the other one we sample at five. What you see here now, in the model for kappa ten, is that we actually got a really, really good point there, but the model didn't expect the point to be so low, so the uncertainties rise. This process is now repeated iteratively, so it's done one more time, and as you see, the best point found by kappa one is more or less between four and six, and its samples are already getting very close to each other, while with a kappa of ten we still explore a wide variety: the next sample point would be between zero and two. We can now iterate this process as long as we want, or we can define a stopping condition, for example only 20 runs because the training is expensive, or convergence within some tolerance; in the end this is up to you, and it is probably a topic for its own video. But what is interesting at the end, and we can more or less see it already: this time, the kappa of ten was better. Taking a final look at the real function, we see that the hyperparameter we choose really has a big impact on whether we find the globally best point or only a locally best one.
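Putting the whole loop together: the sketch below runs the full iteration on f(x) = x · sin(x) for two kappa values. One labeled assumption up front: instead of a real Gaussian process, it uses a crude surrogate (mean = value of the nearest evaluated point, uncertainty = distance to it, capped at 1), purely so the loop structure stays self-contained; a real implementation would fit a proper GP in that spot.

```python
import math

def f(x):
    return x * math.sin(x)            # the black box, minimised on [0, 10]

def surrogate(X, y, x):
    # Crude stand-in for the GP posterior (assumption for this sketch):
    # mean = value of the nearest evaluated point,
    # uncertainty = distance to that point, capped at 1.
    d, m = min((abs(x - xi), yi) for xi, yi in zip(X, y))
    return m, min(d, 1.0)

def optimize(kappa, iters=15):
    X = [1.3, 3.1, 5.8, 7.4, 9.0]     # fixed initial design for reproducibility
    y = [f(x) for x in X]
    grid = [i / 100 for i in range(1001)]   # candidate inputs on [0, 10]
    for _ in range(iters):
        def acq(x):                    # lower confidence bound: mu - kappa * sigma
            m, s = surrogate(X, y, x)
            return m - kappa * s
        x_next = min(grid, key=acq)    # step 5: pick the next input to evaluate
        X.append(x_next)
        y.append(f(x_next))            # evaluate the black box, then repeat
    return X[y.index(min(y))], min(y)

for kappa in (1, 10):
    best_x, best_y = optimize(kappa)
    print(f"kappa={kappa}: best x = {best_x:.2f}, f(x) = {best_y:.3f}")
```

With this toy surrogate you can watch the behavior from the video: the small kappa keeps refining around the first good region it finds, while the large kappa keeps placing samples in regions far from anything evaluated so far.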
I'm going to do a video about hyperparameter tuning soon, for exactly these optimization problems. For now, I just hope that you enjoyed it. That's all you need to know to start with Bayesian optimization; wasn't that hard, was it? If you want to go even deeper into some parts, like acquisition functions, just drop a comment below about what you're missing or where you want to go deeper, and I'll make a video about it. In general, don't forget to subscribe so you always stay up to date with the topics we are providing for you. I wish you a nice day, and keep optimizing.