PyTorch Checkpoint Example | How To Save And Load Models In PyTorch

Aladdin Persson






How To Save And Load Models In PyTorch


[MUSIC] Let's say we're at the point where we've got a model working and it's training, and we want to be able to save the model and continue training at a later point. For this, we need to figure out how to save and load a checkpoint of the model. In this case, I've got a very simple CNN that's training on the MNIST dataset.

What we want to do is create, let's say, checkpoint. We're going to store it as a dictionary, with a key called state_dict set to model.state_dict(). And we're not only going to store the model parameters, we're also going to store the optimizer. You can store more things as well, like the epoch, the best accuracy, and whatever information you want. Let's just keep it simple and only store the model parameters and the optimizer. So for the optimizer we're going to do optimizer.state_dict(). Then, let's say if epoch is equal to two (you can do whatever here, maybe the fifth epoch or something like that), [MUSIC] we want to call some function save_checkpoint with the checkpoint, the dictionary we created.

So let's create that function: def save_checkpoint. It's going to take some state, which is the dictionary we created here, and it's also going to output to some file. Let's call it my_checkpoint and use the standard convention of ending it with .pth.tar. Then let's print "saving checkpoint", and then we're going to use torch.save(state, filename).

Okay, right, so save_checkpoint with checkpoint. Let's try to run this. Undefined... yeah, okay, it's supposed to be save_checkpoint. This might take a while, so I'll just continue when it's done. So now we've trained two epochs and we see "saving checkpoint". If I open up that folder, it shows my_checkpoint, and that's the file. Next, let's say we want to actually load it.
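The saving step described so far can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the `nn.Linear` model and the random batch stand in for the CNN and MNIST, the `"optimizer"` dictionary key is an assumed name, and the file is written to a temporary directory so the example leaves nothing behind in the working folder.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def save_checkpoint(state, filename="my_checkpoint.pth.tar"):
    """Write the checkpoint dictionary to disk."""
    print("=> Saving checkpoint")
    torch.save(state, filename)


# Stand-ins (assumptions): a tiny linear model and random data
# instead of the CNN and MNIST used in the video.
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
inputs, targets = torch.randn(16, 4), torch.randn(16, 2)

checkpoint_path = os.path.join(tempfile.mkdtemp(), "my_checkpoint.pth.tar")

for epoch in range(3):
    if epoch == 2:
        # Model parameters plus optimizer state; "optimizer" is an assumed key.
        checkpoint = {
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        }
        save_checkpoint(checkpoint, checkpoint_path)

    # One stand-in training step per "epoch".
    loss = criterion(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Extra entries such as the epoch or the best accuracy could be added to the same dictionary under keys of your choosing.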
We create another function, def load_checkpoint, which takes a checkpoint, and we're going to print "loading checkpoint". Then we're just going to do model.load_state_dict(checkpoint['state_dict']), and then pretty much the same thing for the optimizer. Again, if you saved more things in the checkpoint, like accuracy or epoch or whatever, you're going to have to take those from the dictionary as well. So, for example, if we had a best accuracy stored in the checkpoint, you would read it out of the dictionary by its key in the same way, but we only have the state dict and the optimizer.

Then let's say we have another hyperparameter, load_model, set to True. After we've initialized the model and the optimizer, we call load_checkpoint on torch.load of my_checkpoint, the file that we created, whatever you called that one, and that's all. So we can do: if load_model, then load_checkpoint, which is what we called that function. So now it should load the model, and if the epoch is two it's also going to save a checkpoint.

But let's say that we want to store it, I don't know, every third epoch. We can do epoch modulo three equals zero, and then it's going to create another checkpoint and save it. You could also, for example, check the accuracy, see if it is better than some best accuracy, and then save the model. There are multiple ways of doing it. Let's say we just want to do it in this simple way. I'm going to let it run for a while, just so we can see all of it.

All right, so it trained for 10 epochs, and we can see that in the beginning it loaded the checkpoint, and then it also saved a checkpoint, because for epoch zero, zero modulo three is zero. Then we trained for three epochs and saved a checkpoint, another three, save, another three, save, so it seems to be working. Let's see, so now it's about 0.4, the mean loss for that epoch.
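Loading and the every-third-epoch save might look something like this. Again a sketch under assumptions: `train` is a hypothetical wrapper added here so the run can be repeated with `load_model` switched on, the model and data are small stand-ins, and the `"optimizer"` key is an assumed name.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def save_checkpoint(state, filename):
    print("=> Saving checkpoint")
    torch.save(state, filename)


def load_checkpoint(checkpoint, model, optimizer):
    """Restore model and optimizer state from a checkpoint dictionary."""
    print("=> Loading checkpoint")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])


# Random stand-in data (assumption); the video trains on MNIST.
inputs, targets = torch.randn(16, 4), torch.randn(16, 2)


def train(path, num_epochs, load_model):
    """Hypothetical wrapper around the training loop described above."""
    model = nn.Linear(4, 2)
    optimizer = optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.MSELoss()

    # Load only after the model and optimizer have been initialized.
    if load_model:
        load_checkpoint(torch.load(path), model, optimizer)

    saved_epochs = []
    for epoch in range(num_epochs):
        if epoch % 3 == 0:  # epochs 0, 3, 6, 9, ...
            save_checkpoint(
                {"state_dict": model.state_dict(),
                 "optimizer": optimizer.state_dict()},
                path,
            )
            saved_epochs.append(epoch)

        loss = criterion(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return saved_epochs, loss.item()


path = os.path.join(tempfile.mkdtemp(), "my_checkpoint.pth.tar")
saved_epochs, first_loss = train(path, 10, load_model=False)
_, resumed_loss = train(path, 10, load_model=True)
print(saved_epochs, first_loss, resumed_loss)
```

The second call starts from the weights stored in the last checkpoint of the first call, so its loss picks up near where the first run left off instead of starting from a fresh initialization.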
Let's say that we now rerun it. We can see that it restarted with about the same loss value as the previous one, right? So this means that it's continuing from that point rather than starting over. If, for example, we set load_model to False, then we see that it restarts: the loss is much higher now. One thing to be careful of is that when we set load_model to False and it saves a checkpoint, it actually overwrites the previous file. So you have to be cautious not to train for a long time and then overwrite your checkpoint file. If you have any questions about this, leave them in the comment section. Thank you for watching the video, and I hope to see you in the next one.
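On the overwriting caveat mentioned above, one possible safeguard (not shown in the video) is to put the epoch number in the file name and refuse to write over an existing file. In this sketch the `directory` and `epoch` parameters, the `checkpoint_epoch_N.pth.tar` naming scheme, and the `"optimizer"` and `"epoch"` keys are all assumptions made for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def save_checkpoint(state, directory, epoch):
    """Save to a per-epoch file and refuse to overwrite an existing one."""
    filename = os.path.join(directory, f"checkpoint_epoch_{epoch}.pth.tar")
    if os.path.exists(filename):
        raise FileExistsError(f"{filename} already exists; not overwriting")
    print(f"=> Saving checkpoint to {filename}")
    torch.save(state, filename)
    return filename


# Stand-in model and optimizer (assumption); no real training happens here.
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
checkpoint_dir = tempfile.mkdtemp()

saved_files = []
for epoch in range(10):
    if epoch % 3 == 0:
        state = {
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": epoch,  # extra information stored alongside the weights
        }
        saved_files.append(save_checkpoint(state, checkpoint_dir, epoch))
    # ... one epoch of training would go here ...
```

The trade-off is disk usage: every saved epoch keeps its own file, so older checkpoints have to be cleaned up by hand.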
