Transcript:

To develop deep neural networks effectively, you need to configure not only the parameters but also the hyperparameters well. Let's take a look at what hyperparameters are. Your model's parameters are W and b. Hyperparameters are the things you have to tell the learning algorithm, for example the learning rate α: the setting of α determines how the parameters will be updated. The number of iterations of gradient descent to perform can also be a hyperparameter. There are other numbers that need to be set in the learning algorithm as well: the number of hidden layers, which we call capital L, and the number of hidden units in each layer, for example n^[1], n^[2], and so on. There is also the choice of activation function, ReLU, tanh, sigmoid, etc., in particular for the hidden layers. All of these are things you need to tell the learning algorithm, and these settings ultimately control the parameters W and b. Because the learning rate, the number of iterations, the number of hidden layers, and so on are settings that control W and b and determine their final values, they are called hyperparameters. In fact, deep learning has a wide variety of hyperparameters. We will look at others in a later lesson: the momentum term, the mini-batch size, various regularization parameters, and so on. Don't worry if you don't know these terms; they will be covered in the second course. In contrast to the early days of machine learning, deep learning has a lot of hyperparameters. Let's call the learning rate α a hyperparameter, not a parameter. In the early days of machine learning, when there weren't so many hyperparameters, most people called α a parameter, and technically α is a parameter.
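To make the distinction concrete, here is a minimal sketch (not from the lecture) of gradient descent on a toy one-dimensional cost J(w) = (w - 3)^2. The parameter w is learned by the algorithm; the learning rate alpha and the number of iterations are hyperparameters you must set by hand. The cost function and all values are illustrative assumptions.

```python
def train(alpha, num_iterations):
    """Minimal 1-D gradient descent on the toy cost J(w) = (w - 3)^2.

    w is a parameter: the algorithm learns it.
    alpha and num_iterations are hyperparameters: they are settings we
    hand to the algorithm, and they control what w ends up being.
    """
    w = 0.0                       # parameter, learned by the algorithm
    for _ in range(num_iterations):
        grad = 2 * (w - 3.0)      # dJ/dw for J(w) = (w - 3)^2
        w = w - alpha * grad      # update step governed by the learning rate
    return w

# The same algorithm, run with hyperparameters we chose:
w = train(alpha=0.1, num_iterations=100)   # converges near the minimum w = 3
```

Changing alpha or num_iterations changes the final w even though the learning algorithm itself is untouched, which is exactly why these settings are called hyperparameters.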
However, it is a parameter that determines the real parameters, so we will call α, the number of iterations, and so on, hyperparameters. When training a deep network for your application, you will find that there are many possible hyperparameter settings that need to be tried. Applying deep learning today is a very empirical process. For example, suppose you have an idea of the best learning rate: you might try α = 0.01, implement it, see how it works, and based on the results change your mind and set the learning rate to 0.05. If you are not sure what value of the learning rate to use, try one value of α and check whether the cost function J goes down. Then try a larger value of α and find that the cost function blows up and diverges. Then try another value for which J goes down very rapidly but converges to a higher value. Then try yet another value and check the cost function J again. After trying values like this, you may find one α for which learning is pretty fast and J converges to a lower value; then you settle on that value of α. As we saw in the previous slide, there are many hyperparameters, and when starting on a new application it is very difficult to know the best hyperparameter values exactly in advance. So what you often do is go around this cycle: try some values, for example five hidden layers with this number of hidden units, implement it, see whether it works, and repeat the process. The title of this slide is that applying deep learning is a very empirical process, and an empirical process is basically a fancy way of saying you try a lot of things and see what works. One other effect I've seen is that deep learning today is applied to a wide range of problems, from computer vision to speech recognition to natural language processing.
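The try-observe-adjust cycle for the learning rate described above can be sketched as a small experiment. This is an illustrative toy (the cost J(w) = (w - 3)^2 and the candidate α values are assumptions, not the lecture's), but it shows the pattern: run gradient descent with each candidate α, look at the resulting cost J, and keep the α that drives J lowest.

```python
def final_cost(alpha, num_iterations=50):
    """Run gradient descent on the toy cost J(w) = (w - 3)^2
    and return the cost J after training."""
    w = 0.0
    for _ in range(num_iterations):
        w -= alpha * 2 * (w - 3.0)
    return (w - 3.0) ** 2

# Empirically try a few candidate learning rates, as in the lecture:
# too small -> J decreases slowly; too large -> J blows up and diverges.
candidates = [0.001, 0.01, 0.1, 1.1]
costs = {a: final_cost(a) for a in candidates}
best_alpha = min(costs, key=costs.get)   # the alpha with the lowest final J
```

On this toy problem, α = 1.1 diverges while α = 0.1 converges fast; in a real network you would compare learning curves of J over iterations rather than just the final value.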
It is also applied a lot to structured data applications, such as online advertising, web search, or product recommendations. And what I've seen is that researchers moving between areas sometimes find that their intuitions about hyperparameters carry over, and sometimes they don't. So my advice, especially for those starting on a new problem, is to try a range of values and see what works. We will see later how to try a range of values in a systematic way. Even if you've worked on one application for a long time, for example online advertising, as progress on the problem continues, the best values for the learning rate, the number of hidden units, and so on are likely to change as well. Even if you tune your system to the best hyperparameter values today, those values may change a year from now, because the computing infrastructure, such as the type of CPU or GPU, can change. So here is a rule of thumb: if you work on a problem for a long time, say for years, then every few months try a few hyperparameter values and double-check whether there is a better one. That way you will slowly gain the intuition for finding the hyperparameters that best suit your problem. I know that having to try all these hyperparameter values can seem like an unsatisfying part of deep learning, but this is just where we are in this era: deep learning research continues to evolve, and over time it may be able to give you better guidance on finding the best hyperparameters. However, because CPUs, GPUs, networks, and data are all changing, in some cases the guidelines may not apply; in that case you have to try and evaluate various values and choose the value that works for your problem. So that was a brief discussion about hyperparameters. In the second course I will propose a method for systematically exploring the hyperparameter space; for now, though, you have already learned almost all the tools you need to do the programming exercises.
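One simple way to "try a range of values," previewing the systematic search promised above, is to sample random hyperparameter combinations. This sketch is my own illustration, not the lecture's method; the ranges (learning rate between 1e-4 and 1 on a log scale, 10 to 200 hidden units) are assumptions chosen only for demonstration.

```python
import random

random.seed(0)   # fixed seed so the sketch is reproducible

def sample_hyperparameters():
    """Sample one hyperparameter setting to try.

    The learning rate is drawn on a log scale, since values like
    0.001 and 0.1 should be equally likely candidates; the ranges
    themselves are illustrative assumptions.
    """
    alpha = 10 ** random.uniform(-4, 0)     # in [1e-4, 1], log-uniform
    hidden_units = random.randint(10, 200)  # plausible layer width
    return alpha, hidden_units

# Try several settings; in practice you would train a model with each
# and keep the one with the lowest validation cost J.
trials = [sample_hyperparameters() for _ in range(5)]
```

Re-running a search like this every few months, as the rule of thumb above suggests, is how you double-check whether a better setting has appeared as your data and infrastructure change.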
Before that, let me share one more thought on what deep learning has to do with the human brain.