Keras Metrics Accuracy | 135 – A Quick Introduction To Metrics In Deep Learning. (keras & Tensorflow)









Hey guys, you’re watching Python tutorial videos on my YouTube channel, Python for Microscopists. In the last couple of tutorials we talked about loss functions and optimizers, and in this tutorial let’s talk about the metrics that we use in deep learning. Again, I’m not going to cover a lot of metrics, just the most common ones that we use.
When I say metrics, this is exactly what I mean: when you call model.compile, the loss and the optimizer are what we covered in the last two tutorials, and today in this video I’m going to talk about metrics. In this example we’re using accuracy. Now, it’s self-evident, which is why this will be a rather short video, but Keras in particular allows you to monitor certain metrics as you are training the model.
Metrics are recorded after each epoch during the training process, and a loss function can also be used as a metric. For example, for regression problems you can use mean squared error or mean squared logarithmic error, and that can also be a metric. After all, you are trying to minimize the loss function, so by monitoring the loss you can use it as one of your metrics.
Now, if you use a validation data set, for example if you have a training data set and a validation data set, then whatever the metric is, accuracy for example, it can be calculated for both of them. That lets you see whether you are overfitting: if the accuracy is very high for your training data set but low for validation, that means you’re overfitting your model, and you can take appropriate action.
Some of the metrics, again, I’ve listed out here. First the regression metrics, with a name in parentheses next to each one.
This is how you define it as part of your Keras model.compile: mse for mean squared error, then mean absolute error, mean absolute percentage error (most of these are self-evident), cosine proximity, and mean squared logarithmic error. I should mention, though, that for most regression problems it’s MSE that’s commonly used.
Now, classification metrics. I didn’t put anything in the parentheses because that would make the font on my slide very small, but think of binary accuracy: the way you define that in your model.compile is just binary_accuracy, all lowercase. So there is binary accuracy, categorical accuracy, sparse categorical accuracy, and so on. To be frank with you, most of the time I have defined it just by typing accuracy, just like I did right here. This is a built-in function that’s commonly used, in Keras especially.
Now I would like to show you a few lines of code applying both, for example mean squared error (actually, let’s plot a few of these) and also classification metrics. So let’s jump into our Spyder IDE, and here are a few lines of code.
In this part I am trying to demonstrate the regression metrics, and all I’m doing is a very simple straight line: create an array of values 1, 2, 3, 4, 5, all the way to 10. That’s it, this is my input data. If you want, you can test it on any type of values. My model is basic: I’m using the Sequential method to build it, the first layer is Dense, and the second layer is also Dense with a value of 1 because it’s just outputting a single value.
I’m compiling this using mean squared error as the loss; again, it’s a regression type of problem, so mean squared error. My optimizer is Adam. We talked about what that is in the last tutorial. For the metrics to monitor you can give multiple, by the way, so I’m defining them as mse, mae, mape, and so on. Then we are just doing model.fit. In the next tutorial I’m going to talk about what batch size means.
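
The regression metric names mentioned above (mse, mae, mape) stand for simple formulas. The following is a minimal NumPy sketch of what each one measures, not the Keras implementation itself. The function names and the offset predictions are hypothetical, chosen only for illustration, and the MAPE helper assumes the true values contain no zeros.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zeros in y_true)."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Same kind of input as in the video: the values 1 through 10.
x = np.arange(1, 11, dtype=float)

# Hypothetical predictions that are off by 0.5 everywhere.
y_pred = x + 0.5

print("mse :", mse(x, y_pred))
print("mae :", mae(x, y_pred))
print("mape:", mape(x, y_pred))
```

Note how the same constant error gives different numbers under each metric: MSE squares it, MAE reports it directly, and MAPE weights it more heavily for the small target values.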
OK, so let’s save that for now, but you probably know what epochs is. Again, I’m going to talk about it pretty soon in the next tutorial, but this is basically the number of times you go through all of the data, once forward and once backward in your backprop, if it is backpropagation. And we are just fitting x to x itself, the same data, and then I’m plotting mean squared error and everything.
Remember, here I’m giving these as mse, but I’m plotting this as mean_squared_error. I believe if you use a newer version of Python, it may throw an error saying, hey, mean_squared_error, you never told me what that is. Then change this to mse. Just giving you a heads up.
Let’s go ahead and plot all of these. How many epochs did we say? 500 epochs, so it has to go through all of these 500 epochs. It’s just a very simple data set. Here I should have added a label. I thought I had added a label, sorry about that. The green one is your mean squared error. Or let’s plot one at a time instead. So let’s go ahead and plot this one first: you see, this is your mean squared error. Mean absolute error shows up in the same color as this one, so it’s actually along the same range. Mean squared error is up here.
Now, what am I trying to prove by showing you this? Nothing right now. Put in real data. I’ll share this code; go ahead and change your x to whatever your real x values are, then keep track of these metrics and see how they compare to each other. That’s the point over there.
Now this bottom part. Again, I got this code just by browsing and reading some blogs, in terms of generating this synthetic data. Here, for the classification problem, I do not want to use simple x and y values, though I could.
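
On the point about the metric being stored under mse in some versions and mean_squared_error in others: one way to make plotting code tolerant of both is to look the curve up under either name. This is only a sketch. The helper get_metric and the fake_history dictionary are hypothetical stand-ins; in real code the dictionary would be the history attribute of the object returned by model.fit.

```python
def get_metric(history, *names):
    """Return the first metric curve stored under any of the given key names."""
    for name in names:
        if name in history:
            return history[name]
    raise KeyError(f"none of {names} found; available keys: {sorted(history)}")

# Hypothetical stand-in for the per-epoch values recorded during training.
fake_history = {
    "loss": [4.0, 1.0, 0.2],
    "mse": [4.0, 1.0, 0.2],
    "mae": [1.8, 0.9, 0.4],
}

# Works whether the key is the long name or the short one.
mse_curve = get_metric(fake_history, "mean_squared_error", "mse")
print(mse_curve)
```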
But in this case, I’m generating some artificial data using scikit-learn’s make_blobs from its datasets module. With this, you can specify how many samples you want, how many clusters (or centers) you’re generating them around, how many features, and what the cluster standard deviation is. With a large enough standard deviation all of these clusters overlap, making this a challenging problem, which is okay; again, it’s a good way of testing these things. So let me comment this part out. I wrote these extra lines just to show you what the data looks like.
This is the actual part of the code we are trying to use, which is basically the same thing, except I’m one-hot encoding our y using to_categorical. Why? Again, because we need to convert the labels from integers to binary class vectors. That’s what I’m using this for.
The model, again: keep it very simple. I’m using a Dense layer of 50, and the input dimension is 2, right, we have x and y. For my activation function I’m going to use ReLU, and you need to initialize your kernel; I’m going to use he_uniform, which is very common. The output is going to be 3 outputs, because we have three clusters here. And the activation function, again: try, try. This is a great piece of code for trying different activations, for example, and seeing what the output looks like. So please try to educate yourself by writing little snippets of code like this, or by copying little snippets of code like this if you can find them online. I’m going to share this with you so you don’t have to search for it.
Now I’m going to do model.compile. When I found this snippet of code online, it was originally using stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9. Remember the momentum I talked about in my previous tutorial? Here you can actually change the momentum and see how things are optimizing.
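
To get a feel for what the momentum setting does before changing it in the model, here is a minimal pure-Python sketch of the classical momentum update, using the learning rate 0.01 and momentum 0.9 quoted above. The toy loss f(w) = w**2, the starting point, and the function names are assumptions made for illustration; this is not the optimizer code from the video.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One update: the velocity accumulates past gradients, then moves the weight."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def minimize(momentum, steps=200, w0=5.0):
    """Minimize the toy loss f(w) = w**2, whose gradient is 2*w."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        w, velocity = sgd_momentum_step(w, velocity, grad=2.0 * w, momentum=momentum)
    return w

w_plain = minimize(momentum=0.0)     # plain gradient descent
w_momentum = minimize(momentum=0.9)  # same learning rate, with momentum

print("distance from minimum without momentum:", abs(w_plain))
print("distance from minimum with momentum   :", abs(w_momentum))
```

On this toy problem the run with momentum ends up closer to the minimum after the same number of steps, which is the kind of comparison worth repeating on the real model.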
That’s why I’m leaving this commented out. If you want, you can change it. Where would it go? As part of your optimizer. Here I’m using 'adam', but instead of 'adam' just type opt (remove the quotes, obviously), and you’re all set.
Since this is a multi-class problem, I’m going to use categorical cross-entropy, and the metric I’m going to track is accuracy. The whole point of this tutorial is just metrics, so in this case I’m just plotting the accuracy. Again, the accuracy needs to improve as the epochs go by, so let’s actually look at that. I think that should pretty much do.
For model.fit, we are doing it for 100 epochs, and let’s take a batch size of 32. Because we have 1,000 samples, a batch size of 32 means 1,000 divided by 32, that many iterations per epoch. Again, stay tuned for the next tutorial where I talk about this. Then let’s go ahead and plot the accuracy. So let’s run these lines and see what the output looks like.
Okay, there you go, it’s done. As you can see, the accuracy was pretty bad at first, and then it started improving and improving. Then at about 25-ish or 30 epochs it’s flattening out, right? So no matter how much I spend in terms of computing time, I’ll probably asymptotically reach a value of about 83 percent. You see, 82.6 percent or so. You can improve this using more training data or by changing your learning rate, for example, and all that stuff.
This is fun stuff if you are really trying to learn how the learning rate and the momentum affect training. So I’ll leave this to you: use stochastic gradient descent or some other optimizer. Even for the Adam optimizer, try to define your learning rate, try to change the default learning rate, and see how things perform. Okay, so I hope you learned something again.
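
To make the classification pieces above concrete, here is a minimal NumPy sketch of three of them: one-hot encoding integer labels (the job to_categorical does in the video), the accuracy computed from one-hot labels and predicted probabilities, and the number of iterations per epoch for 1,000 samples with a batch size of 32. All function names, labels, and probabilities here are hypothetical, and the step count assumes the last, smaller batch is still processed.

```python
import math
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def categorical_accuracy(y_true_one_hot, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    true_class = np.argmax(y_true_one_hot, axis=1)
    predicted_class = np.argmax(y_prob, axis=1)
    return float(np.mean(true_class == predicted_class))

def steps_per_epoch(num_samples, batch_size):
    """Batches needed to see every sample once, counting a final partial batch."""
    return math.ceil(num_samples / batch_size)

# Hypothetical labels for three clusters and hypothetical model outputs.
y = np.array([0, 1, 2, 1])
y_encoded = one_hot(y, num_classes=3)
y_prob = np.array([
    [0.7, 0.2, 0.1],  # predicts class 0, correct
    [0.1, 0.8, 0.1],  # predicts class 1, correct
    [0.3, 0.4, 0.3],  # predicts class 1, wrong (true class is 2)
    [0.2, 0.5, 0.3],  # predicts class 1, correct
])

print(y_encoded)
print("accuracy:", categorical_accuracy(y_encoded, y_prob))
print("steps per epoch:", steps_per_epoch(1000, 32))
```

With 1,000 samples and a batch size of 32 there are 31 full batches plus one final batch of 8 samples, which is why the count rounds up.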
As usual, please subscribe to this channel. In the next tutorial, let’s talk about one more aspect of this, which is batch size and epochs. Okay, so thank you very much, and let’s meet again in the next tutorial.
