PyTorch Source Code | PyTorch DataLoader Source Code – Debugging Session

Deeplizard

Transcript:

Welcome to deeplizard. My name is Chris. In this episode, we’re gonna pick up where we left off last time, looking at data normalization. Only this time, instead of writing the code, we’re gonna be debugging it. Specifically, we’re gonna debug down into the PyTorch source code to see exactly what’s going on when we normalize the dataset.
So I’m here in Visual Studio Code, and before we start debugging, I just want to give you a quick overview of the program I’ve written. It’s gonna allow us to step in and see exactly how the normalization of the dataset is done under the hood in PyTorch. I have my imports here, and these are the most basic imports we need to get this done. The two primary ones are torch and torchvision. Then we’re getting transforms out of torchvision, the neural network package out of torch, and the DataLoader class so we can load our data.
Next, as we discussed in the last video, we have the mean and standard deviation values. Instead of calculating these, I just pulled them and hard-coded them into the program, which is the same thing you would do if you grabbed these values from somewhere else. I don’t want to go through the trouble of recalculating them, so I’m hard-coding them here. We know that we need both of these values to be able to normalize every pixel of our dataset.
Next, we initialize our train set using the FashionMNIST class, and the key point to take note of is the transforms. We have a composition of transforms: the first one transforms our PIL image into a tensor, and the next one is the call to Normalize, which is going to normalize our data for us. Then down here is where we have the two breakpoints that I’ve put in place.
We have the DataLoader constructor, where we’re specifying a batch size of one and passing in our training set. Then we pass the loader into the iter function to get an iterator, which we then pass into the call to next to get the next value from the iterator. That next value is a tuple that contains our image tensor and our label, so we can unpack it this way, initializing the image and the label variables both at the same time.
Alright, so now we’re ready to actually debug. I’m gonna make sure that I have my Python run configuration selected, and then I’m gonna click Start Debugging. Our breakpoint has been hit, and that pauses the execution of the program. Now I’m gonna hit Ctrl+K Z on my keyboard to launch us into a view that basically blocks out everything else in the IDE that could be distracting us, so we can just focus on the source code. I can’t remember what this view is called. Let’s see: Zen Mode, I think. Yeah, Zen Mode. So I often use Zen Mode, and you can even position the code wherever you want. But anyway, let’s keep moving.
I’m gonna be using F10 to step over and F11 to step into on my keyboard. We’ll go ahead and step into this code, and we can see that we’ve stepped down into the DataLoader class. This is dataloader.py, and it lives inside of torch, so we’re in PyTorch source code at this point. Now, there’s not much that I want to show you inside the constructor. We can see that the dataset is being initialized here, num_workers is being initialized, and some other technical aspects are being initialized. Down here is the other thing that I wanted to show you, so I’m gonna hit Continue to drop us down to this particular breakpoint.
So what this is is a sampler, which is the object that actually gets the values for us. You can see here that there are two particular samplers that are relevant. The first one is a RandomSampler, and you can see that the sampler is initialized to be a RandomSampler if the shuffle value is true. Since we didn’t shuffle, we’re not gonna have a RandomSampler. What we’re gonna have instead is a SequentialSampler, which just means that we’re gonna run through the dataset sequentially, retrieving each value. Alright, so we set our sampler, and I think there’s not much else to talk about in here, so we’re gonna step out. That was the initialization of our DataLoader.
Now we’re ready to actually use the DataLoader, and this is where we’re gonna see our normalization process in play. We’re gonna step in here, and the first thing we hit is the implementation of the iter function, because we’ve passed our DataLoader to the built-in iter function, which in turn calls the special double-underscore, or dunder, __iter__ method inside of the DataLoader. Implementing this particular method is what allows you to pass the DataLoader object to the iter function and iterate over its members. The thing I wanted to highlight here is that two things are in play. We can get a single-process DataLoader iter, or we can get a multiprocessing DataLoader iter, and this all depends on whether or not we’ve specified num_workers to be greater than zero.
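The sampler choice described above can be sketched in plain Python. This is a simplified stand-in, not the PyTorch source: the two classes only borrow the names mentioned in the video, and choose_sampler and the seed parameter are hypothetical names introduced for illustration.

```python
import random


class SequentialSampler:
    """Stand-in sampler: yields indices 0, 1, ..., n - 1 in order."""

    def __init__(self, num_samples):
        self.num_samples = num_samples

    def __iter__(self):
        return iter(range(self.num_samples))


class RandomSampler:
    """Stand-in sampler: yields the same indices in a shuffled order."""

    def __init__(self, num_samples, seed=None):
        self.num_samples = num_samples
        self.seed = seed  # hypothetical parameter, only for repeatable demos

    def __iter__(self):
        order = list(range(self.num_samples))
        random.Random(self.seed).shuffle(order)
        return iter(order)


def choose_sampler(num_samples, shuffle):
    """Random sampler only when shuffle is true, otherwise sequential."""
    if shuffle:
        return RandomSampler(num_samples)
    return SequentialSampler(num_samples)


print(list(choose_sampler(4, shuffle=False)))  # [0, 1, 2, 3]
```

With shuffle=True the same four indices come back, just in a different order on each pass.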
So if we want additional workers, we would get this multiprocessing DataLoader iter, and otherwise we’re gonna get the single-process one. To keep things simple, in this example we’re working with a single process. We didn’t set any additional workers, so when we step, we’re gonna return a single-process DataLoader iter. Oh, whoops, I accidentally stepped into this. I did not want to. Okay, so here we are. Our iterator has been returned from calling iter, and now we’re ready to call next, so the next time we step in, we’re gonna be stepping into the call to next.
Now, the call to next lands us back in the DataLoader code, inside the __next__ implementation. This is just another special Python method that you have to implement whenever you want your object to be iterable: you have to implement the __iter__ method and you have to implement the __next__ method. So we’re inside of __next__, and this is where we’re gonna get the next element that is available through the iterator. When we’re speaking about elements coming from the DataLoader, these elements are batches. The DataLoader iterator returns batches of elements from the underlying dataset.
So let’s go ahead and step into this _next_index method, because it’s gonna lead us to the use of something that we saw earlier, and that is the sampler. Alright, so here we’re inside of this _next_index call, and we can see the use of the sampler. The sampler is being passed in to another next function, so we’re kind of dealing with an inception of iterators here. The DataLoader is an iterator, and the sampler is also an iterator. The sampler is going to be handling the indices that the DataLoader will ultimately use to pull the particular samples from the underlying dataset. So let’s step into this call.
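To make the iter and next mechanics concrete, here is a minimal pure-Python sketch of the iterator protocol. TinyLoader and TinyLoaderIter are hypothetical names, and this toy returns single samples rather than batches, so it illustrates the protocol only, not how PyTorch implements it.

```python
class TinyLoaderIter:
    """Iterator object: the thing that next() is called on."""

    def __init__(self, loader):
        self._dataset = loader.dataset
        self._index_iter = iter(range(len(loader.dataset)))

    def __iter__(self):
        return self

    def __next__(self):
        # next() on the inner index iterator raises StopIteration when the
        # indices run out, which also ends iteration over this object.
        index = next(self._index_iter)
        return self._dataset[index]


class TinyLoader:
    """Iterable object: iter(loader) calls __iter__ and gets a fresh iterator."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __iter__(self):
        return TinyLoaderIter(self)


loader = TinyLoader([("img0", 9), ("img1", 0)])
image, label = next(iter(loader))  # unpack the (image, label) tuple
print(image, label)  # img0 9
```

The last two lines mirror the pattern in the program being debugged: iter produces the iterator, next pulls one element, and tuple unpacking assigns the image and label together.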
Then we can see where the indices are being collected inside of this batch list. We can step down and see that this is where our batch size comes into play. Our batch size is one, so we’re iterating through the indices that have been made available by the sampler and collecting them, and as soon as we reach this condition here, where the length of the collected indices is equal to the batch size, we yield this particular batch of indices, which just means return it. You may recall this is a sequential sampler, so it just iterates over the indices sequentially. If we had enabled shuffling on this particular DataLoader, we’d be dealing with a different type of sampler, one that samples randomly.
So we’ve returned, and now we have our index. In this case our batch size is one, and this is the first batch we’re pulling, so we’re just getting the data element in the underlying dataset that lives at index 0. Now, with this index in hand, we’re ready to go fetch the data, so we’re gonna call fetch and pass in the index. This is where the batch itself is constructed. What we get after we construct a batch is this data, which is a list, and then this collate function takes that data in list form and transforms it into a single tensor that is ultimately returned.
So now we’re inside of a file called fetch.py. We can see that this is part of the torch.utils.data package, and essentially what’s happening here is that we’re returning, as a list, the members of our dataset for the indices that were passed in. Since we only passed in one index, we’re only gonna get one value back. If our batch size were greater than one, this is where we would iterate over the dataset, grabbing each additional value.
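The batching and fetching steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: batch_indices, fetch, and collate are hypothetical names, and this collate just groups images and labels into lists, whereas the one in the video stacks the data into a tensor.

```python
def batch_indices(sampler, batch_size):
    """Collect indices from the sampler and yield them one batch at a time."""
    batch = []
    for index in sampler:
        batch.append(index)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Leftover partial batch; keeping it is a choice made for this sketch.
        yield batch


def collate(samples):
    """Turn a list of (image, label) pairs into (images, labels)."""
    images, labels = zip(*samples)
    return list(images), list(labels)


def fetch(dataset, indices, collate_fn):
    """Index the dataset once per index, then collate the resulting list."""
    data = [dataset[i] for i in indices]
    return collate_fn(data)


dataset = [("img0", 9), ("img1", 0), ("img2", 3)]
first_batch = next(batch_indices(range(len(dataset)), batch_size=1))
images, labels = fetch(dataset, first_batch, collate)
print(first_batch, images, labels)  # [0] ['img0'] [9]
```

With batch_size=1 the first batch is just index 0, matching what the debugger shows; a larger batch size makes the list comprehension in fetch index the dataset several times.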
Alright, so I’m gonna step into this call, and the specific call we’re stepping into is this one right here: we’re stepping into this dataset index. Whenever we pass an index like this to a dataset, the call is intercepted and routed to a particular method that must be implemented for all PyTorch datasets, which is the __getitem__ method. Let me show you what I mean. We’re gonna step into this now, and here we are: we’re in the MNIST class. The reason we’re in the MNIST class is that the FashionMNIST dataset extends the MNIST class. In other words, the __getitem__ method is the same regardless of whether you’re using MNIST or FashionMNIST.
Alright, so we’re ready to actually get our data, and here’s where it happens. We go into an attribute of the class called data and pass in our index, and then we do the same for targets. Target is another name for label, and that’s why it’s called targets here. So we’re gonna get our image and our label, a.k.a. target. This data is actually a tensor, so let’s just take a look. I’m gonna pull up the debug console and type in self.data, and I want to look at the shape of this tensor. This tensor is actually 60,000 by 28 by 28, which means that the entire dataset is right here in memory.
Before, when we were looking at iterating through our dataset to calculate the mean and standard deviation manually, I mentioned that the dataset may be too large to load the whole thing in at once, and that’s why we did it the easy way and the hard way. But in this case, we already have the dataset loaded, which I wasn’t aware of until I did this debugging. I’m not sure if they changed it or not.
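As a rough illustration of how indexing is routed to __getitem__ and inherited by a subclass, here is a pure-Python sketch. InMemoryDataset and FashionLikeDataset are hypothetical stand-ins for the MNIST and FashionMNIST classes, and nested lists take the place of the tensor that holds the whole dataset.

```python
class InMemoryDataset:
    """Keeps every sample in memory; dataset[i] is routed to __getitem__."""

    def __init__(self, data, targets):
        self.data = data        # all images, loaded up front
        self.targets = targets  # all labels ("targets" is another name for labels)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        img, target = self.data[index], self.targets[index]
        return img, target


class FashionLikeDataset(InMemoryDataset):
    """Subclass that adds nothing, so it reuses the parent's __getitem__."""


dataset = FashionLikeDataset(data=[[0, 255], [128, 64]], targets=[9, 0])
image, label = dataset[1]  # same as dataset.__getitem__(1)
print(image, label)  # [128, 64] 0
```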
I can’t remember what it used to be, if I ever looked in here. But basically what this means is that you can’t use this class if you don’t have enough memory to hold the whole dataset. The other thing is up here: if we look inside the class constructor, we can see what happens. First the data gets downloaded, and then we can see here that the torch.load call is pointed at the folder where all the data lives, and it basically just loads in a PyTorch pickle file. So PyTorch has pickled out this tensor, and it essentially just loads it all in at once.
Now you may be wondering, well, what’s the alternative? The alternative is that instead of loading all of the data in at once, you load in file paths to the data. Then in __getitem__, instead of grabbing the data straight from a tensor that’s in memory, we would read the file from disk at that time. Instead of having the raw data here, we would have the file path, so we’d be referencing the data, and it would be read in at this point. Obviously that’s going to be much slower, but it is what’s required if the dataset is massive.
So at this point we just have the image, and we’re about to transform it. I’m gonna step down, and here we are just getting the image from the NumPy array. The comment says they’re doing this so that it is consistent with all of the other datasets, which return a PIL image. We already know that this is a tensor, because we saw that, so this call is kind of pointless, but they’re being consistent. I like consistency. Anyway, we have two transforms, or rather one composition of two transforms, so let’s step into this.
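The file-path alternative described above might look something like this minimal sketch. LazyFileDataset and write_demo_files are hypothetical names, and raw bytes stand in for decoded images; the only point is that nothing is read from disk until __getitem__ is called.

```python
import os
import tempfile


class LazyFileDataset:
    """Stores file paths only; each sample is read from disk on access."""

    def __init__(self, paths, targets):
        self.paths = paths
        self.targets = targets

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The read happens here, at access time, not in the constructor.
        with open(self.paths[index], "rb") as f:
            raw = f.read()
        return raw, self.targets[index]


def write_demo_files(directory, payloads):
    """Write one small file per payload and return the list of paths."""
    paths = []
    for i, payload in enumerate(payloads):
        path = os.path.join(directory, f"sample_{i}.bin")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_demo_files(tmp, [b"\x00\x01", b"\x02"])
    dataset = LazyFileDataset(paths, targets=[3, 7])
    sample, target = dataset[1]

print(sample, target)  # b'\x02' 7
```

The trade-off is the one mentioned in the video: memory use stays small because only paths are held, but every access pays for a disk read.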
We’re going to step into self.transform. So now we’re in the transforms.py file, which lives in torchvision, so torchvision.transforms. This is where we are, and we’re in the __call__ method. Basically, when you call a transform, you come into the implementation of what you want to happen when it’s called. We have two transforms, and you can see them here: we have ToTensor and then we have Normalize. What this is saying is: for each transform in the composition’s list of transforms, transform the image.
Alright, so we’ll step down and step into this transform, and we can see that this is calling the functional API; it’s calling to_tensor on the image. Okay, so this is gonna call for a little bit more investigation. I’m not sure how we ended up with the PIL image, because we were just dealing with a tensor, so I must have overlooked something. But let’s just go ahead and assume we’re going from a PIL image to a tensor. Alright, I didn’t mean to step into this, so we’re gonna run over all this code until we’re out.
Our next transform is our Normalize call, and this is where I wanted to show you how the normalization works in PyTorch. We’re gonna step in, and just like with ToTensor, where we ultimately ended up in the functional API, where all of the functions live, with Normalize it’s the same. You can see that we’re passing the tensor, the mean, and the standard deviation.
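The loop inside the composition’s __call__ method can be sketched in a few lines of plain Python. This Compose is a simplified stand-in written for illustration, and scale_to_unit and center are hypothetical transforms operating on a list of pixel values, not torchvision’s ToTensor and Normalize.

```python
class Compose:
    """Applies a list of callables in order, feeding each result to the next."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


def scale_to_unit(pixels):
    """Hypothetical first step: map 0..255 integers to 0..1 floats."""
    return [p / 255 for p in pixels]


def center(pixels):
    """Hypothetical second step: shift values so 0.5 becomes 0."""
    return [p - 0.5 for p in pixels]


pipeline = Compose([scale_to_unit, center])
print(pipeline([0, 255]))  # [-0.5, 0.5]
```

Because the object defines __call__, calling pipeline(...) like a function is what lands the debugger inside that method.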
So let’s step in here. If we step down, there’s some preprocessing that occurs, but ultimately we get down to where we take the tensor, subtract the mean, and divide by the standard deviation. This is exactly the process that we discussed in the video on normalization: we calculate the mean and standard deviation first, then for every pixel we subtract the mean and divide by the standard deviation, and we’re done. At this point, we’re ready to return our image and our label back out to the fetch function, which puts them in the data list, and if we keep stepping we’ll eventually get back out. I think I accidentally just stepped into something here. Yeah, so now we’re back out in our _next_data call, then here’s the __next__ call in the DataLoader, and we’re back out. So here we have our normalized image.
I’m gonna rerun this, because I want to go back and see how we ended up with a PIL image. It used to be a PIL image, and when I saw this the second time, while I was prepping for this video, I thought they had just changed the implementation, but it may be that I overlooked something. So let’s get out of Zen Mode real quick. I’m gonna rerun the program, kick back into Zen Mode, and run through until we get into the __getitem__ call. Here’s __getitem__. Okay, so indeed, the image is coming out of this data attribute, which is a tensor. Let’s step over that. The image at this point is definitely a tensor, so we’ll look at its shape just to see. Yeah, so this is 28 by 28. Then fromarray. Oh, is this it? Ah, okay, I see. So this is where it just got transformed into a PIL image. I thought that was weird before. Okay, so they take it from a tensor to a PIL image.
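The arithmetic itself is small enough to show directly. The sketch below applies the subtract-the-mean, divide-by-the-standard-deviation step to a plain list of floats; the function name and the pixel, mean, and std values are made up for illustration and are not the Fashion-MNIST statistics used in the video.

```python
def normalize(pixels, mean, std):
    """Return (pixel - mean) / std for every pixel."""
    if std == 0:
        raise ValueError("std must be non-zero to avoid division by zero")
    return [(p - mean) / std for p in pixels]


# Illustrative values only.
pixels = [0.0, 0.5, 1.0]
print(normalize(pixels, mean=0.5, std=0.25))  # [-2.0, 0.0, 2.0]
```

A pixel equal to the mean maps to zero, and the other values are expressed as a number of standard deviations above or below it.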
I bet they probably changed it since the last time I looked at it, because I think it used to read the data off of disk, and if you do it that way, then you’re gonna have a PIL image here, so it’s gonna be a requirement to have that ToTensor transform. In this case, well, I guess it’s still a requirement, but if they didn’t have this line, you wouldn’t have to do a transform on your dataset. You could just load your dataset right away, no transforms required. But since they do this, you have to have a ToTensor call inside of the transforms list, and I guess they’re doing that, like they said here, to be consistent with other datasets.
Alright, so I hope that helps you mainly with two things. Number one is to understand the normalization process a little more deeply, and number two is to see the benefits of messing around with debugging code. The best way to learn about code, from my perspective and experience, is to actually spend a lot of time debugging it and trying to understand what’s going on. It’s a lot easier to understand what’s going on with code when you can inspect the data in real time, versus trying to have print statements all through your code. So if you’re not debugging at this point, definitely set up the debugger and start learning how to do it.
And if you didn’t know, we’re actually filming this video from here in Vietnam, and we have another channel where you can learn more about us and our travels, because we document all the places we go and many of the things that we do. So if you’re interested in connecting with us in a new way, go over and check out the deeplizard vlog on YouTube. Also, if you haven’t done so already, be sure to check out the deeplizard hivemind, where you can get exclusive perks and rewards. Thanks for contributing to collective intelligence.
I’ll see you in the next one.
