Transcript:
In this video, we’re going to continue with the multiprocessing module in Python, and the focus is something called the Pool class. It lets us create a pool of worker processes that carry out the tasks submitted to it. The comments at the top of the file give a little more information on this class. I lifted them from the documentation for the multiprocessing module, so if you want details that aren’t covered in this video, check out that documentation. I’ll try to give you everything you need to at least get started with this idea and see what you can do with it. The general idea is that we’ll use this class to take a function and distribute the calls of that function across all of the processors on our machine, or a subset of them, so that each processor is responsible for some share of the problems we’re trying to solve. We’ll also write another function that runs all of these calls serially, one after another, and we’ll do some time comparisons to see whether the multiprocessing version, where we actually split up the calls, performs better. We’ll see that, as the problem gets larger, it does indeed scale better. That’s still a little vague, so let me start writing code; hopefully, as I do, the description will start to fill in.
Let me import the two things we’ll be using in this video. The first is the time module, which should come standard with your Python installation; we’ll use it to measure the execution time of both the multiprocessing version and the serial version of our code. The second is Pool, from the multiprocessing module, because that’s the whole point of this video: we’re going to instantiate a pool of workers and use it to distribute our function across the processors on our machine. To give you a flavor of what that’s all about, let me show you a small example first; then we’ll generalize it and do some time comparisons. I’ll start with the main block, so I’ll write if __name__ == "__main__":. Inside it, I’ll say p = Pool(), which creates a pool object from the class we just imported. Then I’ll say results = p.map(...). The map method takes a function and an iterable, applies the function to every element of the iterable, and spreads those calls across the worker processes in the pool. So we give it the name of the function, which here is going to be sum_square, a function we have yet to write, and then an iterable, which in this case is a list of numbers. After that, we call p.close(), which tells the pool that no more tasks are coming, and then p.join(), which waits for the worker processes to finish before anything after it runs. Finally, we just print the results.
We’ll generalize this and write a serial version later, but first I want to show you what this is actually doing. So we go above the main block and create a function called sum_square. It takes a single number, loops over the range of that number, and adds up the square of each value along the way. Let me just write it out, because I think it’s clearer that way. We define a variable s and set it to zero, then say for i in range(number): s += i * i. So if the number is ten, i goes from zero through nine, and at each step we add i times i to s; in other words, we’re summing the squares. Finally, we return s. Now, before running this in the terminal, I also need to define the list of numbers. Just above the pool, I’ll say numbers = [1, 2, 3, 4, 5], the numbers from 1 to 5. Actually, let me use the numbers from 0 to 4 instead: 0, 1, 2, 3, 4. OK, so we have our numbers list.
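Written out on its own, the function looks like the sketch below. Calling it serially on 0 through 4 is a quick way to know what the pool should hand back; the closed-form check at the end is my addition, not something from the video.

```python
def sum_square(number):
    """Add up i * i for every i in range(number)."""
    s = 0
    for i in range(number):  # range(10) yields 0 through 9
        s += i * i           # add the square of the current value
    return s


numbers = [0, 1, 2, 3, 4]

# Calling the function serially, one number at a time, gives the values
# the pool should reproduce when it splits the calls across processes.
print([sum_square(n) for n in numbers])  # [0, 0, 1, 5, 14]

# Sanity check against the closed form for 0^2 + 1^2 + ... + (n-1)^2.
assert all(sum_square(n) == (n - 1) * n * (2 * n - 1) // 6 for n in range(50))
```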
We pass that list in as the argument for the calls to the function. Again, what we’re doing is distributing calls to sum_square across the processors, with the elements of the numbers list handed out to however many processors we happen to have on our machine. So let’s run the script with Python from the terminal. When we do that, we get this list as the result: 0, 0, 1, 5, 14. To clarify what’s going on: we have the list of numbers 0, 1, 2, 3, 4, and we map a call to sum_square onto each number in the list, with the cores each getting a share of those calls. For the first element, 0, we never even enter the for loop, because range(0) is empty, so that call returns 0; that’s where the first value comes from. When the number is 1, the loop runs once with i equal to 0, and s += i * i leaves s at 0. When the number is 2, we range over 0 and 1, so the sum is 0 + 1 = 1, which is where the 1 in the third position comes from. The same goes for 5, which is 0 + 1 + 4, and 14, which is 0 + 1 + 4 + 9. So that’s pretty much all we’re doing here: a simple application of the Pool class. One more thing I want to point out is that the Pool class takes an argument for how many worker processes we want in this particular pool. How many we can usefully give it depends on the number of processors on our machine.
If we don’t specify how many processes this pool should use, it allocates one for each processor on our machine. If you want to find out how many processors you have, open another terminal, start Python, import os, and call os.cpu_count(). This function tells you how many cores you have access to; on the machine I’m running on right now, it reports 16, so we can distribute work across up to 16 separate processes. So if I don’t specify how many CPUs I want to use, the pool defaults to the maximum. If for whatever reason you don’t want that, you can specify a lower number. Now let’s close that, go back to our code, and make this a little more elaborate. Specifically, we’re going to create two separate functions: one that performs this task serially, and another that performs what we have here in parallel using the Pool class. We’ll time each of these functions and see how they behave as the problem scales. The size of the problem is set by the numbers list: a longer list means more calls to hand out to the processors, and larger numbers mean each call has a longer loop to run. OK, so let’s reorganize our code that way. The sum_square function stays the same. Let’s create a function called sum_square_with_mp that takes the numbers list as an argument, and we’ll more or less copy what we wrote in the main block and paste it in there. Now let’s make some changes.
Let’s get rid of the numbers line, since numbers is now passed in explicitly as an argument to this function. Also, before I create the pool, I’m going to say start_time = time.time(). That grabs the current system time before we run anything in this function, so we can time its execution. I’ll also get rid of the print statement, and after the join, I’ll say end_time = time.time() - start_time, which gives us the total time it took to run this bit of code. Below that, we’ll print a message along the lines of "Processing [length of the numbers list] numbers took [end_time] time using multiprocessing." So after this completes, it tells us how many numbers we processed in total, based on the length of the list, along with the end_time value we computed. That’s all we need for that function, so I’ll save it. Next, let’s write a function with exactly the same behavior, but without multiprocessing, doing the work serially. We’ll call it sum_square_no_mp, for "no multiprocessing." It also takes a list of numbers, and it follows the same idea, so it starts with start_time = time.time() to get the current time.
Then we say result = [], an empty list to store the results. Next, for i in numbers, so ranging over all of the numbers passed in, we append the result of applying our function to each one: result.append(sum_square(i)). In other words, we loop over the input numbers and, as we compute each value, append it to the previously empty result list. For the end time, I’ll just copy the last few lines from the other function, since they’re pretty much identical, paste them in, and change the message so it says this many numbers took that much time using serial processing. That’s all we need for our two functions: one uses multiprocessing, and the other uses serial computation, working through the numbers in sequential order. In the main part of the code, we’ll just call both functions and see how much time each one takes. Let’s start with a relatively small numbers list: numbers = range(100). Then we call sum_square_with_mp(numbers), to run it with the multiprocessing module first, and then sum_square_no_mp(numbers), the serial version. I’ll save that, clear the terminal, and run the script again. It looks like numbers is not defined; that’s because I left off an s. Let’s try that again.
This time, we see that processing 100 numbers took this much time using multiprocessing, and processing 100 numbers took even less time using serial processing. So you’re probably thinking: why would you want to use multiprocessing if it takes quite a bit longer than serial processing? The answer is that you won’t see any gains when you’re operating on such a small amount of data, because starting the worker processes and passing data between them has a cost of its own. To really leverage multiprocessing, we need to operate on much more data. So let’s bump the numbers list up from 100 to a thousand, ten thousand, or even a hundred thousand. Let’s go all the way up to a hundred thousand, save, and run it; hopefully, we’ll see the multiprocessing version take substantially less time. Actually, that might be too boring to wait around for, so let’s bring it back down to ten thousand and run that again. Multiprocessing took about half a second. We’re still waiting on the serial code, and that took almost four seconds. You could also run this for a hundred thousand, but I won’t do that in this video, because it would just sit there running in the background.
Waiting on a hundred thousand isn’t very exciting to watch, but if you keep scaling this up, you should continue to see multiprocessing perform better than the serial code, since the overhead of starting the worker processes gets spread over more and more work. So maybe that’s some motivation for why you’d want to use multiprocessing, at least in this instance. This example is somewhat contrived, of course, but you can imagine a similar type of problem where you’re iterating over some iterable, in this case a list of numbers, and you want to distribute each call of a function on the elements of that iterable across separate processors. If you can find a problem with that kind of parallel nature, you can apply this idea to it and hopefully get a speedup. So that’s pretty much all I wanted to cover in this video. If you have any questions or comments on anything I’ve covered, please don’t hesitate to leave them in the comments section of this video. The code, as always, is available on my GitHub page, and I’ll leave a link to that in the description. So thanks again for watching, and have a great day.