Transcript:
What’s going on, guys? In this video we are going to be experimenting with ImageNet models in PyTorch. In the last video we went over how to load data and a little bit about preprocessing, and in this video what we’re going to do is use these ImageNet models, feed in images they haven’t seen, and see how well these ImageNet models have generalized.

All right, so let’s just go over some terminology. First we’ll talk about what ImageNet models are. ImageNet models are models that have been trained on the ImageNet dataset. The ImageNet dataset is a huge collection of images, but for this particular case I think they only used one million images out of that huge dataset, across a thousand different categories. ImageNet was hosting a competition every year, called the ImageNet challenge, and lots of big companies were participating in it, such as Google, Microsoft, and I think Facebook as well. They participated in this competition, and the winners of the competition usually open-sourced the model architecture as well as the weights, weights meaning the trained model itself. This competition was going on every year, but I think it stopped last year, because some of these models were getting too good and they needed a different dataset. So these ImageNet models are basically models that participated in that competition, or models that have just been trained on that same subset of a million images in a thousand categories. Even after the competition stopped, researchers have continued to train models on this ImageNet dataset, and a lot of these trained models are open-sourced ImageNet models. That’s why PyTorch actually has a collection of these ImageNet models.

So what we’re going to do is access a couple of them, and then we will download a couple of images. Actually, I already downloaded a couple of images, and we’ll see how these models do on images they haven’t seen. This is known as inference. Basically, you have your training portion, where you’re training the model on a set of images, and you have a validation set, where you’re continuously testing how well your model is generalizing. Then, after you have your model, you want to be able to deploy it and use it for real-world problems. In this case we’re going to see how well this model classifies images it hasn’t seen, and this process of using your model to classify images it hasn’t seen is known as inference. With that terminology under our belt, let’s take a look at the code and get started.

The first thing I did was download a JSON file. This JSON file contains all of the labels. Remember, I said there are a thousand classes these ImageNet models can classify, and this JSON file has all of those labels. So what we’re going to do is open the JSON file and save it to the labels variable. Now we’re going to look at the length of labels, and it should be a thousand, because the ImageNet dataset these models were trained on has a thousand labels. A thousand labels, and now we’re going to look at the first five of the keys, which are just zero, one, two, three, four. Now we’re going to look at the values. Each key represents the label number, and each value represents the label name. So we have a tench, a goldfish, great white shark, tiger shark, hammerhead. Those are the first five values; let’s take a look at the last five values.
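A minimal sketch of that labels cell, assuming the downloaded file is saved as imagenet_labels.json (the video doesn’t name the file) and maps the index strings "0" through "999" to class names:

```python
import json

# Load the label mapping; the filename here is an assumption.
with open("imagenet_labels.json") as f:
    labels = json.load(f)

print(len(labels))                 # 1000
print(list(labels.keys())[:5])     # ['0', '1', '2', '3', '4']
print(list(labels.values())[:5])   # tench, goldfish, great white shark, ...
print(list(labels.values())[-5:])  # ..., bolete, ear, toilet tissue
```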
All right, so we have earthstar, hen-of-the-woods, bolete (I’m not sure I’m pronouncing that right), ear, and toilet tissue. I’m assuming that’s tissue paper. Anyway, those are the labels. Now what we’re going to do is import models from torchvision. The models module consists of all of the computer vision models that PyTorch comes with. So we’re going to import models, and we’re going to import torch. Now we can look at most of the default models that come with torchvision. If I run this, you’ll see AlexNet, DenseNet, Inception, and each DenseNet actually has a number accompanying it; those numbers represent the number of layers. As you can see, there are quite a few ImageNet models here, but there are actually a lot more, and more and more models are coming out every year, so this is just a very small subset of all the different models that have come out over the past few years. You’ll notice the first four or five, AlexNet, DenseNet, Inception, are capitalized and look a little different from the rest. These are the most popular implementations. So when you see DenseNet, you’re looking at the most popular implementation of DenseNet; Inception3 is the most popular implementation of Inception; and the same goes for ResNet, SqueezeNet, etc. This is just to make things easier: if you’re very new to deep learning and don’t know which model to choose, having all of these could be confusing, so what Facebook, or the people behind PyTorch, did is pick the most popular implementations and name them DenseNet, AlexNet, and so on.

All right, so those are the models, and we’re going to go with ResNet-18. ResNet-18 has only 18 layers, and the way we initialize it is models.resnet18(), just like this. This gives you randomly initialized weights; basically, it’s not going to use the ImageNet weights the model was trained on, and we want the ImageNet weights. With PyTorch, you can call the model and just get the architecture without the weights, so you’d be starting from scratch; this is what you want if you plan to train the model on a dataset from scratch. The other thing you can do is set pretrained=True, which means it’s going to use the weights it learned from training on the ImageNet dataset. So the first version, resnet18 without pretrained, is just the model architecture with the weights initialized randomly, basically taking the model but acting as if it was never trained on ImageNet. The second one actually takes the model with the weights it had after being trained on ImageNet. We’ll be working with the second one, because we want to see how ImageNet models do on images they haven’t seen, which is inference.

So now, of course, we need to set up our preprocessing. We went over this in the last video. The ResNet models were mostly trained on 224 x 224 images, so that’s the size we’ll resize our images to. Then we have transforms.ToTensor and transforms.Normalize. Now, I think these models were trained with a center crop as well, so the way the images were resized was that they were first center-cropped and then resized. For now, I guess we’ll just skip that; when we do transfer learning, maybe I’ll use it as well. I forgot to include the center-crop portion, but it shouldn’t make much of a difference. So these are the transforms.
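Roughly, those cells look like the sketch below. The pretrained=True argument matches the API used in the video (newer torchvision releases use a weights argument instead), and the normalization statistics are the standard ImageNet channel means and standard deviations:

```python
import torch
from torchvision import models, transforms

# List the model constructors that ship with torchvision.
print(dir(models))

# Architecture only, weights initialized randomly (for training from scratch):
resnet18_scratch = models.resnet18()

# Architecture plus the weights learned on ImageNet (what we want for inference):
resnet18 = models.resnet18(pretrained=True)

# Preprocessing: resize to 224x224, convert to a tensor, and normalize with
# the standard ImageNet means and standard deviations.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```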
We went over this in the last video, so I’ll just run this. Oops, actually, I need to run the cell above first. Okay, now we’ll run this and get the transforms ready. All right, so now we will use three different images: I have a horse, a duck, and a cat. These are three images I randomly downloaded from the net, and we’ll look at each of them. I used PIL to open these images, and we have a horse, a duck, and a cat. All right, so now we’re going to preprocess these images, just using the preprocess function we created earlier, and after running this we’ll have three preprocessed images. If you look at the type now, they are tensors, and the shape is channels first, then height and width.

The next thing we need to do is unsqueeze. Basically, all of these images need to be four-dimensional, because the first dimension will represent the batch; all of these models expect images to be passed in as batches. Currently, if you look at the torch.Size, you’ll see it’s channels, height, and width. By using unsqueeze on the image at the zeroth offset (so here you see dimensions 0, 1, 2, and we want to unsqueeze at 0, meaning add an extra dimension there), if we run this, that’s exactly what it does. As you can see, we’ve now added an extra dimension at the zeroth offset, so we have batch, channels, height, and width. This is the shape your images need to be in.

All right, so the next thing we’ll do, and this is a very important step, is set the model to eval mode. There’s a training mode and an eval mode, and in training mode your model behaves a little differently than in eval mode. So whenever you’re doing inference, which is not training but testing your model on images, always remember to set the model to eval mode. Now, with that said, we’re going to run the model on each of the images, and we get back probabilities. We run that, and that’s all done. Now let’s look at the shape: for the first image you get back a tensor of size one by a thousand, so we have a thousand probabilities all adding up to one. Basically, that’s how these models work: they give you a probability for each of the labels, and the highest probability is usually associated with what the model thought the image was. What we’ll do now is run torch.max on the probabilities, and the 1 here represents the first dimension (not the zeroth dimension, but the first), because we want to take the max over the thousand different labels. We get back a value and an index: the value represents the probability itself, and the index represents the index into the thousand different categories. Now we’re going to print it out. To print the index, first we have to convert it from torch to NumPy; right now it’s a tensor, which only PyTorch understands, but to print out the actual value we need to convert it to NumPy. So we’ll convert it to NumPy and then to an int, and these are the values: 463, 463. Now that’s a little strange; I don’t know why we’re getting the same number. Let’s see: image 1, image 2, image 3. Yeah, we’re getting the same label back for different images. I’m not sure why that’s the case, but let’s see; we’ll run this.
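A sketch of those inference cells, continuing from the cells above. The filenames are assumptions, and since the outputs in the video sum to one, this version applies a softmax to the raw model outputs:

```python
from PIL import Image

# Open the three downloaded images (filenames are assumptions).
images = [Image.open(p).convert("RGB")
          for p in ("horse.jpg", "duck.jpg", "cat.jpg")]

# Preprocess, then add a batch dimension at offset 0:
# (channels, height, width) -> (batch, channels, height, width).
batches = [preprocess(img).unsqueeze(0) for img in images]
print(batches[0].shape)  # torch.Size([1, 3, 224, 224])

# Always switch to eval mode for inference; layers such as dropout and
# batch norm behave differently in training mode.
resnet18.eval()

for batch in batches:
    with torch.no_grad():               # no gradients needed for inference
        out = resnet18(batch)           # raw scores, shape [1, 1000]
    probs = torch.softmax(out, dim=1)   # probabilities summing to one
    value, index = torch.max(probs, 1)  # max over dimension 1, the labels
    idx = int(index.numpy())            # tensor -> NumPy -> plain int
    print(idx, labels[str(idx)])
```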
Okay, so let’s go back up and see. I must have missed something. I am calling the pretrained version... All right, so I forgot to set the model to eval. As you can see, there’s a huge difference if I don’t put the model in eval mode. So here we go: we have 351 and 146, meaning the model thinks the first image is labeled 351 and the second image is labeled 146; earlier these were all bucket. If I run this now, you’ll see it thinks the first image is a hartebeest, the second image is an albatross, and the third image is a Persian cat. These labels are very detailed: it’s not just a cat, it thinks it’s a Persian cat, and it seems to be correct. I don’t know what the first one is; hartebeest, let’s see. It looks like an animal. The second one is an albatross, which is sort of a duck type of thing. Let’s see, yeah, it’s a large seabird related to... I can’t pronounce that, but basically, all right, it got the second one right, and the third one is the Persian cat. So this model got the first one completely wrong, got the second one right, and got the third one correct.

Not only that, we can also look at the top five probabilities. Right now we’re looking at the max probability, which is just the single best choice, but let’s take a look at the top five choices. The first choice was hartebeest for the first image, but let’s look at the top five choices for that image, which is probs. If I run this, topk returns five different indices. As you can see, these are the indices, and we’re going to run these next couple of lines. First, let’s look at the shape: it’s one by five, so we’re going to squeeze it to get rid of the extra 1, which gives us torch.Size([5]). The squeeze is basically turning it into a one-dimensional tensor instead of a two-dimensional one, and then we convert it to a NumPy array. As you can see, it’s now one-dimensional instead of two-dimensional, and then we just iterate through it; these last few lines are converting it into a format that’s easy to iterate through, and we iterate and look at the labels. We have hartebeest, sorrel, Arabian camel, impala... and I’m not sure what sorrel is, so let’s look it up. This seems to be some kind of plant, which would be a huge mistake. Number three is a camel, and I can see how the horse could sort of look like a camel. Impala, I’m not sure what those are. So these are the top five choices, with the first choice being a hartebeest.

All right, so that was a small model; let’s go back up here. That was a model with only 18 layers, so let’s try a larger model. What I’m going to do is keep the variable name the same, because we’re using this resnet18 name throughout these cells, but I’m going to assign it to a different model, ResNet-101. ResNet-101 actually has 101 layers, so let’s see if it does any better; this is actually a more accurate model. Oops, I guess I have to download the model first. Okay, so the model has been downloaded; let me just run that again, just in case. Okay, I guess we don’t really need to rerun all of these cells, but I’ll just run through them. Oops, that’s a strange error, okay.
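A sketch of the top-5 lookup and the model swap, continuing from the probs tensor above:

```python
# The five highest-probability labels instead of just the single best one.
values, indices = torch.topk(probs, 5)  # both have shape [1, 5]

top5 = indices.squeeze().numpy()        # squeeze [1, 5] down to shape (5,)
for idx in top5:
    print(idx, labels[str(idx)])

# Swap in the 101-layer ResNet, keeping the same variable name so the
# earlier cells can be re-run unchanged.
resnet18 = models.resnet101(pretrained=True)
resnet18.eval()
```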
I don’t know why that occurred, but basically the images get converted into tensors; this is just preprocessing. All right, now we’re just going to put it into eval mode. Remember, this is the 101-layer ResNet model; it’s just still named resnet18. So we’ll run this, run this. All right, you can see it took a little longer, because the model is bigger. Surprisingly, it gives back the same values, except for the last one: it’s a tabby. First of all, let’s look at what a tabby is. All right, so this actually looks more realistic than a Persian cat. Yeah, the image does not look like a Persian cat; it actually looks like a tabby, so this has gotten a little more accurate, but the first one is still a hartebeest. Now let’s look at the top five and see if anything has changed. All right, run this. Right, so some of them have changed. It’s still sorrel for number two; Vizsla has moved to number three, which is, I guess, the same color, but it’s a dog. This one is a hound, and I’m not sure what a Rhodesian Ridgeback is. Yeah, so I guess it’s just basing it on the same color. So it’s not perfect, but there are other models that actually have a higher accuracy than the ones we’ve been working with. We’ve only looked at ResNet-18 and ResNet-101, and I could just throw in, let me see, a DenseNet-201. All right, so I’ll also use a DenseNet, let’s see, 201; let’s try a DenseNet-201 (sketched below). The images should all be preprocessed already, so we shouldn’t need to worry about any of this, but I guess I’ll just run it again, just in case, and it will preprocess them. All right, so once again, let me just make sure; yeah, it’s DenseNet. So we will run this now, and we’ll get the probs. You’ll see this is taking longer. Probability shape, all right, so now you’ll see the indices are completely different, and if we run this, you’ll see a sorrel, a goose, and an Egyptian cat. All right. So a sorrel, hartebeest, Arabian camel: very similar for both of these models, and I am not sure why they think it’s a sorrel. I know there’s a horse category, or label; I’m pretty sure there is within ImageNet, or maybe there isn’t; I haven’t looked through all of these thousand labels. But this seems to be a plant: sorrel, or common sorrel as it’s called here, is a perennial herb. All right.
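The DenseNet experiment follows the same pattern; a minimal sketch, again reusing the variable name so the earlier cells run as-is:

```python
# Try a third architecture, DenseNet-201, with its ImageNet weights.
resnet18 = models.densenet201(pretrained=True)
resnet18.eval()
```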
All right, so that was just a quick introduction to inference, basically an exploration of ImageNet models in PyTorch. Now, in the next video, we’re going to be using these models on our personal datasets. They’ve been trained on the thousand categories, and they have weights based on those thousand categories. In the next video, what we’re going to do is something called transfer learning. With transfer learning, instead of training from scratch, we use the weights the model learned from the ImageNet dataset. So if our dataset deals with some sort of animals, the model should be able to learn quicker, because the first X number of layers should already have picked up some of the features needed to classify animals. So with transfer learning, basically, we’re using a trained model and trying to save time by not training from scratch, but using the weights of a trained model instead; in this case, the trained model will have been trained on the thousand categories of ImageNet. That’s what transfer learning does: we take that model and apply it to our own datasets. What we need to do is chop off the last layer, which is for a thousand labels, and apply the model to the dataset we worked with in the first video, which was the ants and the bees. So we’ll create our own last layer that classifies only two labels instead of those thousand labels. All right, so that’s it for this video. I will see you guys in the next video.
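As a rough preview of that last-layer swap (a minimal sketch, not the next video’s exact code):

```python
import torch.nn as nn
from torchvision import models

# Start from the ImageNet-pretrained model, then replace its final
# 1000-way classification layer with a 2-way layer for ants vs. bees.
model = models.resnet18(pretrained=True)
num_features = model.fc.in_features    # input size of the old final layer
model.fc = nn.Linear(num_features, 2)  # new final layer: two classes
```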