Welcome to deeplizard. My name is Chris. In this episode, we're going to learn how to use the GPU with PyTorch. We'll start by learning how to use the GPU in general, and then we'll see how we can apply these techniques to train our neural network. If you haven't seen the video where we talked about GPUs in general and why we would even use GPUs in deep learning, be sure to see that video as well, because it will help you get the best understanding of the concepts here. In this video we're going to be more practical, looking at exactly how we use the GPU rather than why we would use it. For now, I want to hit the ground running with some examples.

Now we have all of our code set up and we're ready to run some examples. If you're not familiar with this code setup, be sure to check the previous episodes in this course, where we got all the code up to this point set up. PyTorch allows us to seamlessly move data to and from our GPU as we do tensor computations. To show our first example in action, we're going to create a tensor and a network. Then we'll move our tensor and our network to the GPU. Finally, we'll pass our tensor to our network and get a prediction.

Here we can see that we have a GPU prediction back, and if we print out the device, we can see that this prediction tensor is indeed on a device of type cuda, which is the GPU. Now let's move the tensor and the network back to the CPU and perform the same task there. We can see that the CPU prediction we got back from the network has a device of type cpu. This is how we can work with tensors and networks on the GPU: we move them back and forth in this way, and this gives us a hint at how this would be done in the training loop. To do this in the training loop, we just need to make sure that our data, which is the tensor in this case, and the network are both on the same device.
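The first example described above can be sketched like this. The `Network` class here is a hypothetical stand-in for the course's network (the layer sizes are my own), and the GPU moves are guarded so the sketch also runs on a machine without CUDA:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the course's Network class.
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, t):
        return self.fc(t.flatten(start_dim=1))

network = Network()
t = torch.ones(1, 28 * 28)

if torch.cuda.is_available():
    # Move both the tensor and the network to the GPU.
    t = t.cuda()
    network = network.cuda()
    gpu_pred = network(t)
    print(gpu_pred.device)  # cuda:0

# Moving everything back to the CPU works the same way.
t = t.cpu()
network = network.cpu()
cpu_pred = network(t)
print(cpu_pred.device)  # cpu
```

The key point is that the tensor and the network must end up on the same device before the forward pass.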
This, in a nutshell, is how we can utilize the GPU capabilities of PyTorch. Now, something interesting to note here is that we called the cuda method on both a tensor and on a PyTorch network. If you think about that for just a second, you realize that something's odd, because these two objects are not the same type. So even though the method name is the same, cuda in this case, and cpu when we're moving to the CPU, these are actually different methods, and they work differently under the hood. By the end of this video, we're going to understand exactly what those differences are. But before we get there, let's look at a couple more examples that highlight some of these differences.

I want to start by creating two tensors. Now let's check the device of both. Both of these tensors were created on the CPU; this is the default behavior of PyTorch. Now what I want to do is move one of these tensors, and only one of them, to the GPU. So here we've created one tensor t1 and another tensor t2, and then we've moved the tensor t1 to cuda. After checking the device, we can see that t1's device is indeed cuda. Since we expect that we might get an error, we're going to wrap the next call in a try and catch the exception. We do t1 + t2, and we can see down here that we indeed get an error: expected device cuda:0 but got device cpu. By reversing the order of the operation, we can see that the error also changes. Both of these errors are telling us that the binary plus operation expects the second argument to have the same device as the first argument. Finally, for completion's sake, let's move the second tensor to the cuda device and see that the operation does succeed there.
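A minimal sketch of this two-tensor experiment, with the cuda part guarded so it can also run on a CPU-only machine (the tensor values are my own, just for illustration):

```python
import torch

t1 = torch.tensor([[1., 2.], [3., 4.]])
t2 = torch.tensor([[5., 6.], [7., 8.]])
print(t1.device, t2.device)  # both cpu by default

if torch.cuda.is_available():
    t1 = t1.cuda()  # move only t1 to the GPU
    try:
        t1 + t2  # devices differ, so this raises a RuntimeError
    except RuntimeError as e:
        print(e)
    # Moving the second tensor over lets the operation succeed.
    t2 = t2.cuda()
    print((t1 + t2).device)  # cuda:0
```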
The operation has succeeded, and we see that the result is indeed on the cuda device. Note the use of the to method here: in this case, instead of using the cuda method or the cpu method, we used the to method and passed in a parameter. The parameter is what specifies which device we want to move to. The to method is the preferred method to use when we're moving tensors to and from devices. One last thing to notice here is that in the output of the device, whenever our device is cuda, the GPU, we also see an index. This is because PyTorch supports multiple GPUs. By default, if you only have one GPU on your system, it's going to default to an index of zero. Using multiple GPUs is out of scope for this lesson, so we're not going to touch on it in any more detail than that; just know that the index specifies which GPU you're using.

So we just covered how tensors can be moved to and from devices. Now I want to turn our attention to how we move networks to and from devices. More generally, we want to know: what does it even mean to move a network to or from a device? This is the essential thing we need to grasp here. PyTorch aside, this applies no matter which framework or programming language we're using. We need to know what it means to put a network on a device. To understand this, let's create a PyTorch network and then take a look inside at the network's parameters. So here we'll create a network, and then we'll iterate through the named parameters of the network, printing out the name and the shape of each parameter. I want to do this same iteration again, but this time print out the device of each parameter along with its name. When we print out the device of each one of these parameters, it is indeed the CPU. This shows us that, by default, when we create a PyTorch network, all of its parameters, which are tensors under the hood, are initialized on the CPU.
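The parameter inspection described above can be sketched like this. I'm using a small `nn.Sequential` as a hypothetical stand-in for the course's network, but the iteration pattern is the same for any `nn.Module`:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the course's network.
network = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))

# First pass: name and shape of each parameter.
for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)

# Second pass: device of each parameter.
for name, param in network.named_parameters():
    print(param.device, '\t', name)  # cpu by default
```

This is why there is no `network.device` attribute to print: the devices live on the parameter tensors themselves.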
An important consequence of this is that it's why neural network modules don't actually have a device attribute. It's because the network isn't the thing that's on a device. Technically speaking, it's the network's parameters, or in other words the tensors that live inside the network, that are actually on any given device. Now, let's see what happens when we ask a network to be moved to the GPU. We move this network to cuda, then we check the named parameters, and we see that the network has indeed been moved to cuda; specifically, all of the network's parameters, which are tensors, have been moved to cuda.

So let's create a sample and then pass this sample to the network. We've created a sample tensor, and checking the shape here, it's 1 x 1 x 28 x 28. We'll use a try block to catch the exception that is sure to come, because this sample is going to be initialized on the CPU while our network is now on the GPU. And we get an error: expected object of device type cuda but got device type cpu for argument #1 'self', and then we can see a reference to our first conv layer in the forward method. This error is very similar to what we saw before when we were simply adding tensors. Let's see this computation succeed by moving our sample to cuda. After moving the sample to cuda, we can pass this sample to the network and get a result.

The next thing is to take a look at how we can detect whether CUDA is available on our system. The reason we want to be able to do this has to do with something we call device-agnostic code, and what that means is that we want the code or programs we write in PyTorch to be agnostic to which device the particular program is running on. We don't want to write a program that just calls
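This network-plus-sample experiment might look like the following sketch. The conv architecture is an assumption on my part (a minimal stand-in for the course's network), and the GPU part is guarded so the example still runs without CUDA:

```python
import torch
import torch.nn as nn

# Minimal conv network as a stand-in for the course's Network class.
network = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),  # 1x28x28 -> 6x24x24
    nn.Flatten(),
    nn.Linear(6 * 24 * 24, 10),
)
sample = torch.ones(1, 1, 28, 28)

if torch.cuda.is_available():
    network = network.to('cuda')
    try:
        network(sample)  # sample is still on the CPU, so this raises
    except RuntimeError as e:
        print(e)
    sample = sample.to('cuda')  # now both are on the same device

pred = network(sample)
print(pred.shape)  # torch.Size([1, 10])
```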
cuda everywhere, or to('cuda'), and then give that to someone who's going to run it on a machine that doesn't have CUDA, because our program wouldn't work in that case. One of the ways to alleviate that, to write device-agnostic code, is to use the torch.cuda.is_available() call. This tells us whether or not CUDA is available on our system, and we can set our device for the rest of our program based on the output of this call. On the blog post for this episode, there's a little more detail about writing device-agnostic code. The only reason I'm using that kind of language is because you may see it in the PyTorch documentation or on other blogs. It's basically just writing code that works anywhere; you don't want to write code that only works on one device.

So now we're ready to go ahead and take a look at using the GPU in our training loop, and we want to do a performance test to understand exactly how much the speed-up actually is when we use a GPU versus a CPU. In order to do this, we're going to build on the code we've been developing over the last few episodes in the series, which is where we configure our runs and our training loop. Before I show you what modifications have to happen inside of this code, there's one change we have to make to the RunManager class in order for all of this to work. I misspoke earlier; I said RunBuilder when I meant to say RunManager. So we need to modify the RunManager class. The place where the modification needs to take place is in the begin_run method, just down here, where we're adding a graph to the TensorBoard instance. We need to do some checking to see which device this run is for: is this run using the CPU, or is it running on the GPU?
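The device-agnostic pattern described above can be sketched in a few lines: pick the device once based on torch.cuda.is_available(), then use that device object everywhere (the `nn.Linear` here is just a placeholder network of my own):

```python
import torch
import torch.nn as nn

# Pick the device once, then pass it around the rest of the program.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Everything created or moved with .to(device) lands on the right device,
# whether or not this machine actually has a GPU.
network = nn.Linear(10, 2).to(device)
t = torch.ones(1, 10, device=device)
print(network(t).device)
```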
What this does is make the code backward compatible with what we've already written. We need to check whether our run has a device attribute; if it does, we want to use that particular value, and if it doesn't, we default to the CPU. So make sure to update your code with this, and then you'll be backward compatible and ready to move forward.

Let's jump back to our training loop now and see what modifications we need to make to see this thing in action. The first order of business is to put a device inside our run configurations. In this case, I have device, and I've set the two values we want to try to cuda and cpu. What this does is expose these values inside our runs. There are a few places inside the run and inside the training loop where we need to access these values. The first place is right at the top of the run: we get our device by saying run.device, and we create a PyTorch device. This is one way you can create a PyTorch device: you say torch.device() and pass in 'cpu' or 'cuda', and you get back that particular device. This is one object we can pass around our program, and it will be the device that is used. The first place that will actually use the device is the network: we initialize our network, and right away, before we even set it equal to a variable, we chain the to call onto it and pass in the device. So we'll initialize our network on this particular device, whichever one it may be, since we'll be swapping between cuda and cpu. The next thing that has to be changed is just down here, where we unpack our images and labels tensors. We need to do it separately; before, we just unpacked them both at once from the batch.
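A sketch of the run-configuration side of this, following the RunBuilder pattern from earlier episodes. The exact parameter values here match the ones mentioned later in this episode, but the `RunBuilder` body and the commented begin_run line are my reconstruction, not the course's verbatim code:

```python
import torch
from collections import OrderedDict, namedtuple
from itertools import product

# RunBuilder as used earlier in the course (reconstructed here).
class RunBuilder:
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())
        return [Run(*v) for v in product(*params.values())]

params = OrderedDict(
    lr=[.01],
    batch_size=[1000, 10000, 20000],
    num_workers=[0, 1],
    device=['cuda', 'cpu'],  # the new device axis exposed to every run
)

runs = RunBuilder.get_runs(params)

for run in runs:
    # Backward-compatible check inside RunManager.begin_run might look like:
    #   device = getattr(run, 'device', 'cpu')
    device = torch.device(run.device)
    # network = Network().to(device)  # initialize the network on this device
    print(run, device)
```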
Now we need to do it one at a time, and as we do that, we use indexing, then chain the to call on to send both the images and labels tensors to the device that the network is also on. That's all there is to it to get our training loop running against these two different devices.

So now, with all the work we've done before, we can use this to do a performance test to see how much of a speed-up we get when we run on cuda versus cpu. Let's run this code and see what we get. Up here you can see what we're going to vary: we're going to use the same learning rate every time, because that shouldn't change anything; then we have three different batch sizes to try, 1,000, 10,000, and 20,000; and for num_workers, we'll iterate over two variations. OK, so I'm going to run this code now.

All right, we can see here that we finished all of our runs, and just down here, so we can make better sense of it, we're going to query the information and sort it by the epoch duration. Here we're accessing the run data from within the RunManager by doing m.run_data, then we orient along the columns and sort the values based on epoch duration, using a pandas DataFrame to do this. OK, so we run this, and here we can see that, sorted by run duration, cuda blew away the cpu every time. It looks like it was about twice as fast in this case. So when you get your results, post them in the comments and let us know how you did. What was your speed-up?

So that covers how we can use the GPU with PyTorch. All of this comes baked in; there's no need to do any additional installs or anything like that, so PyTorch makes it really easy to use the GPU. I hope this video helps you understand what it actually means to move a network to a device. I'm actually starting to run out of light here.
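The batch-unpacking change inside the training loop can be sketched like this (the batch here is a fake stand-in for one produced by the course's DataLoader):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A fake batch standing in for one yielded by the DataLoader:
# (images, labels), just like the Fashion-MNIST batches in the course.
batch = (torch.ones(4, 1, 28, 28), torch.tensor([0, 1, 2, 3]))

# Before: images, labels = batch
# Now: unpack by index and chain .to(device) onto each tensor,
# so the data lands on the same device as the network.
images = batch[0].to(device)
labels = batch[1].to(device)
print(images.device, labels.device)
```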
This is one of the first videos that we've ever recorded face to face, so we're still trying to work our way through this new way of doing things. It's actually kind of hard, because we're using the sunlight coming in from outside and it's starting to get dark, so I'm losing all the precious light that I need. Fun fact: we're actually recording this video from Vietnam, and we've been traveling for about the last year and a half. We document all of that on our other channel, the deeplizard vlog, so if you're interested in following us over there, go check it out; we basically document everything we do in terms of travel. And if you haven't already, be sure to check out the deeplizard hivemind, where you can get exclusive perks and rewards. Thanks for contributing to collective intelligence. I'll see you next time!