Transcript:

What’s up, everybody? And welcome to part 13 of my basics of deep learning series. In the previous video, we left off at the point where we’ve written down all the equations that we need to implement the backpropagation algorithm in code, and this is what we’re going to do now. So let’s get over to the Jupyter notebook, and here I’ve already made some preparations. I’ve loaded in the Iris flower data set, and I’ve selected the flowers that we’ve been working with the whole time, and then I’ve also created the label matrix and our input matrix. Then I have defined the sigmoid function, and here I’ve set some parameters. So I’ve set the learning rate to 0.1, that’s this learning rate here. And then I’ve defined capital N, which comes from our mean squared error function and which is simply the number of elements that we have in our label matrix, which obviously corresponds to the number of elements that we have in our output matrix, and this is just the number of examples times the number of output nodes. And then here I’ve also defined the number of nodes that we have in each layer: in the input layer we have four nodes, in the hidden layer two, and in the output layer we have three nodes. And then here I’ve initialized some random weights. Weight matrix one is going to be a four by two matrix, because it goes from four nodes to two nodes, and weight matrix two is going to be a two by three matrix, because it goes from those two nodes to those three nodes. And then here I’ve written down the feed-forward algorithm. So if I run this, then I will create the output layer outputs, and with those and the labels we can then calculate the mean squared error, and here the mean squared error is roughly 0.13, like we’ve had in this diagram. So now we can start to implement the backpropagation algorithm, so let’s make a comment here.
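
The setup described so far could look roughly like the sketch below. It is only an illustration under some assumptions: the input and label matrices are small random stand-ins (not the Iris flowers selected in the video), the weights are drawn uniformly from -1 to 1, and the mean squared error is taken to be the sum of squared differences divided by N. The variable names follow the ones spoken in the video where possible, but the exact notebook code is not shown in the transcript.

```python
import numpy as np

def sigmoid(x):
    # Logistic activation function used in both layers
    return 1 / (1 + np.exp(-x))

# Stand-in data (assumption): 6 examples with 4 features each and
# one-hot labels for 3 classes. The video uses a subset of the Iris data.
rng = np.random.default_rng(0)
n_examples = 6
X = rng.uniform(0, 1, size=(n_examples, 4))
y = np.eye(3)[rng.integers(0, 3, size=n_examples)]

# Parameters
learning_rate = 0.1
N = y.size  # number of examples times number of output nodes

# Number of nodes in each layer
n_input, n_hidden, n_output = 4, 2, 3

# Random initial weights (the sampling range is an assumption)
weights_1 = rng.uniform(-1, 1, size=(n_input, n_hidden))   # 4 x 2
weights_2 = rng.uniform(-1, 1, size=(n_hidden, n_output))  # 2 x 3

# Feed-forward
hidden_layer_outputs = sigmoid(np.dot(X, weights_1))
output_layer_outputs = sigmoid(np.dot(hidden_layer_outputs, weights_2))

# Mean squared error (assumed form: sum of squared differences over N)
mse = np.sum((output_layer_outputs - y) ** 2) / N
print(mse)
```

With the stand-in data the printed error will not match the 0.13 from the video, since both the data and the random weights differ.
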
Backpropagation. And to implement that, we actually just need to implement those equations here. So first we’re gonna say output layer error equals the output layer outputs minus the label matrix, so output layer outputs minus y. Then in the next step, output layer delta: this equals the output layer error times the output layer outputs times 1 minus the output layer outputs. And with those we can now calculate the weight matrix two update. So let’s make another comment for that, and we’re gonna write weight update. And then we’re gonna say weights 2 update equals this matrix multiplication here, so we’re gonna say np.dot of the hidden layer outputs transposed and the output layer delta, and then we divide by capital N. And then to actually execute this weight update, we have to do this calculation here, so we’re gonna say weights 2 equals weights 2 minus the learning rate times our weights 2 update. So that’s the first update. Then, continuing with the backpropagation, we’re gonna say hidden layer error equals np.dot of output layer delta and weight matrix two transposed, and then hidden layer delta equals hidden layer error times the hidden layer outputs times 1 minus the hidden layer outputs. And with that we can create our weight matrix one update, so I’m gonna say weights 1 update equals np.dot of our input matrix transposed and hidden layer delta, and then we divide by N. And then here we can say weights 1 equals weights 1 minus the learning rate times weights 1 update. And now, if we run this, then we should update our weight matrices, and then we should be able to make better predictions, so the output layer outputs should be closer to our targets, to our labels.
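
Put together, one backpropagation step as spoken above might be sketched like this. As before, the data is a random stand-in rather than the Iris subset from the video, the weight initialization range is an assumption, and the helper functions `feed_forward` and `mean_squared_error` are hypothetical names introduced only to keep the example short.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feed_forward(X, weights_1, weights_2):
    # Hypothetical helper: returns hidden and output layer outputs
    hidden = sigmoid(np.dot(X, weights_1))
    outputs = sigmoid(np.dot(hidden, weights_2))
    return hidden, outputs

def mean_squared_error(outputs, y):
    # Assumed form of the error: sum of squared differences over N
    return np.sum((outputs - y) ** 2) / y.size

# Stand-in data and random weights (assumptions, not the video's values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(6, 4))
y = np.eye(3)[rng.integers(0, 3, size=6)]
learning_rate = 0.1
N = y.size
weights_1 = rng.uniform(-1, 1, size=(4, 2))
weights_2 = rng.uniform(-1, 1, size=(2, 3))

# Feed-forward
hidden_layer_outputs, output_layer_outputs = feed_forward(X, weights_1, weights_2)
mse_before = mean_squared_error(output_layer_outputs, y)

# Backpropagation
output_layer_error = output_layer_outputs - y
output_layer_delta = output_layer_error * output_layer_outputs * (1 - output_layer_outputs)

# Weight update
weights_2_update = np.dot(hidden_layer_outputs.T, output_layer_delta) / N
weights_2 = weights_2 - learning_rate * weights_2_update

# Backpropagation (uses the already updated weights_2, in the spoken order)
hidden_layer_error = np.dot(output_layer_delta, weights_2.T)
hidden_layer_delta = hidden_layer_error * hidden_layer_outputs * (1 - hidden_layer_outputs)

# Weight update
weights_1_update = np.dot(X.T, hidden_layer_delta) / N
weights_1 = weights_1 - learning_rate * weights_1_update

# Feed-forward again with the updated weights to compare the error
_, output_layer_outputs = feed_forward(X, weights_1, weights_2)
mse_after = mean_squared_error(output_layer_outputs, y)
print(mse_before, mse_after)
```

The sketch keeps the order in which the steps are spoken, so the hidden layer error is computed with the weights 2 that have already been updated. A common variant computes both weight updates first and only then applies them, so that every gradient is based on the same set of weights; with a small learning rate the difference between the two is minor.
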
So if we run the feed-forward algorithm again, this time with the updated weights, then the mean squared error here should be somewhat lower. Let’s see if that’s the case. And as you can see, it is somewhat lower, but the improvement is only very small, which means that we have to run the feed-forward and backpropagation algorithms many more times. So the question then is: how often should we run them? One approach might be to say, okay, let’s run them until the mean squared error is zero. But this doesn’t really work because, as we’ve seen in one of the previous videos, one characteristic of the gradient descent algorithm is that we will only get an approximation of the minimum of the function, so we might never reach a mean squared error of 0. And, furthermore, as we get closer and closer to the minimum, the steps that we take get smaller and smaller. So in this area here, where we are very close to the minimum, we are going to take many steps, which might take a long time to run, but the mean squared error then wouldn’t change that much. And this begs the question: when should we stop training our neural net? This will be the topic of the upcoming video. So thanks for watching, and hopefully I will see you in the next video.