Transcript:

Thank you, thank you for the introduction. Hi, I’m Shagun. I am a master’s student at the University of Montreal, in the Mila lab, and I’m going to talk a bit about PyTorch and NumPy. So how many people here have used PyTorch? Awesome. How many people here have used NumPy, or use it on a regular basis? Cool. So I think a lot of people are going to love the talk.

So what are we going to talk about? First, an introduction to NumPy and PyTorch, which is going to be very quick because most of you seem to know about them already. The second part would be: why PyTorch at all? And when I say why PyTorch, I will not be talking about the machine learning part of PyTorch. I’m a graduate student; I use PyTorch for machine learning. But there is this other side of PyTorch, where people who just use NumPy can also use PyTorch without having to worry about the deep learning or machine learning hype. They can just use PyTorch to augment their NumPy use cases, and this is what the talk will focus on. It will not focus on the machine learning aspect. Then some pitfalls: if you are using both PyTorch and NumPy at the same time, certain things can go wrong, and we will see what those pitfalls are. The next part would be: what is missing in PyTorch? Basically, can I use PyTorch as a drop-in replacement for NumPy? If not, what things am I going to miss? And what’s the future roadmap for PyTorch?

Disclaimer: I have made a few contributions to PyTorch, but I’m not at all affiliated with the core development team or anything, so I’m just a neutral user, just like you people.

So what does NumPy give us on top of, let’s say, standard Python data structures like lists? Instead of using a list, you would use a NumPy array, the so-called ndarray or N-dimensional array, because it gives you speed and because it gives you a lot of built-in functions.
These built-in functions are optimized functions which do exactly what you want them to do, so you do not have to write those functions by hand. The standard example when talking about NumPy is one where you iterate over a list, then iterate over a NumPy array, and show that NumPy arrays are considerably faster than a list. That is the basic idea behind using a NumPy array, which I’m sure everyone already knows about. Other benefits of NumPy include things like easily integrating C, C++, or Fortran code. So you can write your loop in Python and then actually run the loop in C or C++, which makes it much, much faster than had it only been in Python. Those are the benefits over the other built-in Python sequences.

And yes, this was the example I was talking about: you have a list, and given this list, you can easily convert it into a NumPy array by just passing it to np.array. The point is, given a Python list, it’s very, very easy to migrate to the NumPy ecosystem. The point I am trying to convey here is that when you had the standard Python list and were thinking about migrating to NumPy, there were a couple of things on your mind: Would this transition be easy? Would I have to rewrite my code? Would the transition actually give me the kind of speedups that I am looking for? When we talk about PyTorch, we will raise the same questions. We will ask whether going from NumPy to PyTorch is easy or not, and whether it is really going to give me the kind of computational benefits which I get when I move from a Python list to a NumPy array.

Right, PyTorch: the open source machine learning framework which is being developed at Facebook AI Research, commonly called FAIR. What is similar between PyTorch and NumPy?
The entire NumPy framework is built around this thing called an ndarray, an N-dimensional array, and PyTorch has something very similar, which is called a tensor, and we’ll talk about it in a bit. The thing which the tensor offers on top of NumPy is basically GPU acceleration. If you have a GPU and you have NumPy, you cannot use NumPy with the GPU. But if you have a GPU and you have PyTorch, you can just take a tensor, which is basically the PyTorch equivalent of an array, and start running it on a GPU. We’ll go through all these things in detail. The second part is that it offers you something called reverse-mode automatic differentiation, which is more important for the machine learning or deep learning part, so we will not be focusing on it, but it’s something extra on top of what NumPy already offers you.

Just a slight bit of discussion about why PyTorch is preferred over other machine learning frameworks, because there are plenty of them available right now. PyTorch is a Python-first framework. It has a hybrid front end, which means you can write your code in Python as well as in C++. At the same time, it’s not just a wrapper over a C++ library. I mean, of course, it is a wrapper, but the kind of errors you get are Pythonic errors. You can look at the error and make sense of it: there’s a shape mismatch or something. Another benefit is that it supports distributed training, both on CPU and on GPU, which would be useful for us even when we are just using it as a replacement for NumPy and not really talking about the machine learning story. And it has an ecosystem of solutions, which we will talk about towards the end.

Oops, sorry. Right, so: PyTorch versus NumPy.
PyTorch has this base computational object called a tensor, which is the equivalent of NumPy’s ndarray, and the tensor has a lot of operations, or APIs, which are very similar to the way NumPy APIs work. For example, look at the creation operations, which are the operations you use to make a NumPy array. np.ones makes you a NumPy array of the given shape, and it is the exact same thing with torch: instead of np or numpy, you write torch.ones, and the API remains exactly the same. If you are doing something like np.eye to make an identity matrix of a certain shape, or taking a matrix of ones and multiplying it by two, you can do the exact same thing again just by replacing np or numpy with torch. So these two pieces of code are exactly the same. I have a Colaboratory notebook alongside in the repo where you can run the code, and you will see that the results are exactly the same, other than the fact that in one place you have a tensor and in the other you have an ndarray. Other than that, the results and the APIs are very, very similar. In fact, when I was creating these examples, I just wrote the NumPy code, and for writing the torch code I picked up the NumPy code and replaced np with torch, and most of the time it worked perfectly well. I did not even have to look up the APIs to see whether they are consistent or not. This is the extent of similarity they have.

And then, just like you can take a list and pass it to an np.array function call to get a NumPy array, you can do the same thing with torch.tensor and you get a torch tensor. Again, the point is that the creation operators, and a lot of pieces of the API, are very similar. So if you are used to the NumPy syntax, the NumPy way of doing things, picking up torch would be very, very easy for you.
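
To make the creation operations described above concrete, here is a minimal sketch that puts the NumPy call and the torch call side by side. The shapes and variable names are illustrative assumptions, not the speaker's slides or notebook.

```python
import numpy as np
import torch

# Arrays of ones: same call shape, only the module name changes.
ones_np = np.ones((2, 3))
ones_t = torch.ones((2, 3))

# Identity matrices.
eye_np = np.eye(3)
eye_t = torch.eye(3)

# A matrix of ones multiplied by two.
twos_np = np.ones((2, 3)) * 2
twos_t = torch.ones((2, 3)) * 2

# Building an array or a tensor from a plain Python list.
data = [[1.0, 2.0], [3.0, 4.0]]
arr = np.array(data)
ten = torch.tensor(data)

print(type(arr).__name__, type(ten).__name__)
```

One difference worth knowing when comparing results: NumPy defaults to 64-bit floats while PyTorch defaults to 32-bit floats, so the values match here but the dtypes do not.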
Again, picking up PyTorch is easy because the APIs are very similar. Some people here would be using this other style of NumPy where, instead of using the assignment operator, you pass the two operands to the function and use the out argument of the function. What I mean by that is this: you have two arrays, x and y, and you can either add the arrays and assign the result to y, or you can call np.add with x and y and write the output into y. The two pieces of code do exactly the same thing. You can use this kind of syntax in torch as well, again without any changes, just that your np, or your numpy, is replaced by torch. Again, the takeaway message is that the APIs are pretty similar; at least for the creation parts, they are exactly the same.

Another very useful feature of NumPy is the advanced indexing which it gives you. You can take an array and start indexing it in weird ways, in the sense that you have an array with four dimensions, and you are slicing across one dimension, making strides, and so on. The exact same advanced indexing is supported in torch. So I have some weird advanced indexing code, and I can run the exact same code just by replacing np with torch. Again, the point being that the APIs are pretty similar. Things like negative integers and so on, you can do those with torch as well.

So far, I am showing you a toolkit which is very similar to NumPy, and this is the only point I’m trying to make so far: this tool has an API which is very similar to NumPy. So now the question comes up: if this toolkit has an API which is very similar to NumPy, and I already love NumPy, why should I consider moving? What’s wrong with NumPy that I should consider something like PyTorch? I will use the same kind of example that we use when we talk about going from lists to NumPy.
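
Going back to the out argument and the indexing described above, the following is a small sketch of both in NumPy and in torch. The array contents and the particular index expression are assumptions chosen for illustration, not the code from the slides.

```python
import numpy as np
import torch

# Writing the result of an addition into an existing array via `out`.
x_np = np.arange(6.0).reshape(2, 3)
y_np = np.ones((2, 3))
np.add(x_np, y_np, out=y_np)      # y_np now holds x_np + (old) y_np

x_t = torch.arange(6.0).reshape(2, 3)
y_t = torch.ones(2, 3)
torch.add(x_t, y_t, out=y_t)      # same call, with torch instead of np

# Indexing a four-dimensional array: a fixed index, a slice,
# a strided slice, and a negative integer index.
a_np = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
a_t = torch.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

sub_np = a_np[0, 1:, ::2, -1]
sub_t = a_t[0, 1:, ::2, -1]

print(sub_np.shape, tuple(sub_t.shape))
```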
I will show an operation being done in NumPy and the same operation being done in torch, and we will see the difference in the amount of time it takes. The basic argument here is that NumPy cannot use GPU acceleration, while PyTorch can, and in certain cases that is going to give you a huge speedup. So here I’ve got an array. This is a huge array, 10,000 by 10,000, and I am doing the matrix multiplication of this array with itself. Doing it in NumPy takes around 36.6 seconds. Doing it in torch on a GPU takes less than a second; it takes 800 milliseconds. This is the key benefit you get when you start using torch instead of NumPy, and the API again remains very similar: np.matmul has been replaced by torch.matmul. So it’s not that you have a whole new learning curve, but you can now start using better hardware without having to give up on the syntax and so on.

So this is good, but then the question is: do I need to rewrite my NumPy code into torch code? Even replacing things like import numpy as np with torch is going to take time. You don’t really have to. You can keep doing all your workflows in NumPy, and whenever you reach a point where there are heavy or large matrices involved and you are going to take things like the dot product or matrix multiplication, that is the time to move to torch. And moving from NumPy to torch is very easy. All it takes is one function call, and the other way round as well. If you have a NumPy array and you want to convert it into a torch tensor, it’s just one function call, and if you have a torch tensor and you want to bring it back into NumPy, it’s another function call. So here I have a NumPy array. All I do is torch.from_numpy and I have a torch array, so that’s it.
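
The timing comparison described above can be sketched roughly as follows. This is not the speaker's benchmark: the matrix size is an assumed, much smaller value so the sketch runs quickly, the code falls back to the CPU when no GPU is available, and the timings it prints will differ from the numbers quoted in the talk.

```python
import time

import numpy as np
import torch

n = 512  # assumed size for a quick run; the talk used 10,000 x 10,000
x_np = np.random.rand(n, n).astype(np.float32)

# Matrix multiplication in NumPy (always on the CPU).
start = time.perf_counter()
prod_np = np.matmul(x_np, x_np)
numpy_seconds = time.perf_counter() - start

# The same operation in torch, on a GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_t = torch.from_numpy(x_np).to(device)

if device.type == "cuda":
    torch.cuda.synchronize()  # GPU work is asynchronous; wait before timing
start = time.perf_counter()
prod_t = torch.matmul(x_t, x_t)
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the kernel to finish
torch_seconds = time.perf_counter() - start

print(f"numpy: {numpy_seconds:.4f}s, torch on {device}: {torch_seconds:.4f}s")
```

The synchronize calls matter only on a GPU: without them the timer can stop before the multiplication has actually finished.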
That’s all it takes. And in the other direction, where I have a torch array and I want to convert it into a NumPy array, I call .numpy() on it. This is just sample code which shows that you can start with a NumPy array, convert it into torch, and then convert the torch array back into a NumPy array, and you will see that the two arrays match. So you started with NumPy, went to torch, and then went back to NumPy, and everything remains the same.

So this is what an ideal workflow would look like for people who do not really want to rewrite everything in torch. You have your NumPy logic as it is. Then there comes an operation which is going to take a lot of time because the matrix is huge. At that point, you move your data to the GPU, you use the GPU for the costly operation, and then you come back into NumPy. So really only that one piece of code, the so-called bottleneck, is what you send over to the better hardware. You let the hardware take care of that part, and then you bring the result back into your NumPy code.

It can go the other way around as well. I haven’t talked about those cases yet, but there will be cases where you cannot use PyTorch, because PyTorch is not feature-compatible with NumPy. PyTorch is not a drop-in replacement for NumPy. There are APIs which are still missing, which you do not have in PyTorch but you have in NumPy. So there will be cases where you are writing your entire code in PyTorch and then there comes an op which is not defined, so you bring your data into NumPy, apply that operation, and take it back into torch.

To be clear, the benefit that we have seen so far is coming from the GPU and not, per se, from PyTorch. The benefit of PyTorch is that it makes it very easy for you to use the GPU. So you should not be confused into thinking that PyTorch itself is much faster.
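
The round trip and the "only move the bottleneck" workflow described above might look like the sketch below. The helper name heavy_matmul is hypothetical and introduced only for illustration; it takes NumPy arrays in, runs the multiplication on a GPU when one is available, and hands a NumPy array back.

```python
import numpy as np
import torch


def heavy_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: NumPy in, NumPy out, matmul on the GPU if present."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a_t = torch.from_numpy(a).to(device)
    b_t = torch.from_numpy(b).to(device)
    result = torch.matmul(a_t, b_t)
    # Bring the result back to the CPU before converting to NumPy.
    return result.cpu().numpy()


# Round trip: NumPy -> torch -> NumPy leaves the values unchanged.
x = np.random.rand(4, 4)
round_trip = torch.from_numpy(x).numpy()
print(np.array_equal(x, round_trip))

# The rest of the code stays in NumPy; only the costly step goes through torch.
product = heavy_matmul(x, x)
```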
No, PyTorch is not faster in itself. It’s probably as fast as NumPy, definitely not faster, but it lets you use the GPU, which NumPy does not, and that’s why you see those boosts in performance. By default, all the tensors are created on and live on the CPU, and they can easily be moved to a GPU. In fact, they can be moved between GPUs, depending on how many GPUs you have. So we have a notion of something called a device. You can make a CPU device by calling torch.device and passing the argument "cpu", or if you have a CUDA device, a GPU device, you just say torch.device with "cuda" and an index which says which GPU you are talking about. The indexing starts at zero, so you can have 0, 1, 2, and so on. Once you have a tensor, sending it to a GPU, or to basically any device, is very simple. You just call tensor.to with whatever device you want to send it to, and your work is done. Again going back to the NumPy workflow: you have your NumPy code, you make a GPU device, you send your data to the GPU, you do whatever heavy computations you want to do there, and you bring back your data. That’s it.

Now, when you are using PyTorch and NumPy at the same time, there are certain pitfalls to look out for. The first one I will show with an example. I make a NumPy array, I convert it to torch, and now I’m changing the torch array. I’m not changing the NumPy array. What happens is that it changes my NumPy array as well, because when you use something like from_numpy, the memory is being shared. This is one reason why the transition from a NumPy array to a torch tensor is very inexpensive: when you call torch.from_numpy and take a NumPy array to torch, you are not creating a copy.
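
As a sketch of the two ideas above, devices and shared memory, the snippet below creates a device, moves a tensor to it, and then shows that a tensor made with torch.from_numpy writes through to the original NumPy array. The fallback to the CPU when no GPU is present is an assumption added so the sketch runs anywhere.

```python
import numpy as np
import torch

# Devices: a CPU device, and GPU number 0 if one is available.
cpu = torch.device("cpu")
device = torch.device("cuda:0") if torch.cuda.is_available() else cpu

t = torch.ones(2, 2)   # tensors are created on the CPU by default
t = t.to(device)       # move to the chosen device
print(t.device)

# Pitfall: from_numpy shares memory with the NumPy array.
arr = np.zeros(3)
shared = torch.from_numpy(arr)
shared[0] = 5.0        # modifying the tensor...
print(arr)             # ...also changes the NumPy array
```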
You are using the exact same data, so this is something to look out for, and it gives you performance benefits in a lot of cases. But if you actually want to make a new copy and not just share the underlying data, then you use the torch.tensor function, not the from_numpy function.

The second thing that can go wrong is that whenever you move data to a GPU, it is going to create a copy. That is something you cannot avoid, because the GPU does not have access to your CPU memory in that sense. So instead of moving your data back and forth between CPU and GPU multiple times, ideally move your data to the GPU once, do whatever computations you want to do there, and bring your data back to the CPU, because there is some cost involved when you take your CPU data to the GPU. Remember, there was no cost when you were taking a NumPy array to a torch tensor, because the underlying storage was exactly the same. But when you are doing this CPU-to-GPU thing, there is a cost.

Oh yes, and the third thing: if you have a tensor on the GPU and you directly try to convert it into a NumPy array, that is going to fail. You have to first explicitly bring it to the CPU by doing something like torch_array.to(cpu_device), and only then can you call .numpy(). The reason for keeping it as an explicit error and not doing it internally is that it can sometimes lead to confusion. As long as the data is on the CPU, it is shared between the torch library and the NumPy library. So if this conversion were done implicitly, you could modify the CPU copy hoping that your GPU copy would also be modified, which would not be the case. So you cannot directly convert a GPU tensor to a NumPy array.

What if I do not have any fancy GPUs?
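
The copy-versus-share distinction and the explicit move back to the CPU described above can be sketched like this. Variable names are illustrative assumptions, and the device falls back to the CPU when no GPU is available, in which case the move back is a no-op.

```python
import numpy as np
import torch

# torch.tensor copies the data, so the NumPy array is left untouched.
arr = np.zeros(3)
copied = torch.tensor(arr)
copied[0] = 5.0
print(arr)  # still all zeros

# A tensor on a GPU must be brought to the CPU before calling .numpy().
cpu_device = torch.device("cpu")
device = torch.device("cuda:0") if torch.cuda.is_available() else cpu_device

torch_array = torch.ones(3).to(device)
back = torch_array.to(cpu_device).numpy()
print(back)
```

Moving the data once, doing all the heavy work on the device, and converting back at the end keeps the number of CPU-to-GPU copies to a minimum.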
Would PyTorch be slow for me? Not really. I don’t have any benchmarks, but I have been using PyTorch for quite some time, mixed with NumPy and sometimes without NumPy, and I have not really seen performance differences. There have been cases where performance differences were reported, and the core team, the FAIR team, has been very responsive on those things. If there are operations which are slow, they very quickly roll out a fix for that.

What are the other benefits of PyTorch, other than GPU acceleration? Well, at the end of the day, NumPy is a tensor manipulation, or ndarray manipulation, library. All it does is give you this array and allow you to manipulate it. PyTorch is something more. It allows you to manipulate the array, and at the same time it lets you build up this entire deep learning framework on top of it. The point is, if you know NumPy, you know PyTorch, and if you know PyTorch, it becomes much easier to transition into other things like machine learning and deep learning. The PyTorch ecosystem also has a set of other libraries which provide you a lot of flexibility when you are working on machine learning use cases, which I do not want to go into in this talk.

The other benefits: as long as you have a NumPy array, you can very easily convert it into torch and vice versa. This means other machine learning libraries in Python which play very well with NumPy arrays will play well with tensors, with PyTorch, as well. For example, scikit-learn and SciPy, which take NumPy arrays as the primitive or main data structure, work well with PyTorch as well. So you don’t lose out on all these libraries just because you are using torch tensors. In some cases, specific wrappers also exist.
So you have something called skorch, which is basically scikit-learn plus PyTorch. And then there’s another side to it: as I said, you can use NumPy to augment PyTorch as well. It is possible to write functions, in fact it’s possible to write optimization functions, in PyTorch using just SciPy, NumPy, and so on. So it’s not really a one-sided thing. Using PyTorch and having GPUs gives you this accelerated runtime, and having NumPy allows you to augment some use cases or some APIs which are missing in PyTorch.

Right, so what is basically missing in PyTorch? One thing is that there is no feature parity. It’s not a drop-in replacement. NumPy uses a parameter called axis for a lot of matrix operations, whereas PyTorch uses dim. So this is one place where your code would break. PyTorch does not really have good support for sparse tensors right now. So if your use cases are ones where you have huge tensors but they are sparse, then you probably will not see the speedups, neither on CPU nor on GPU, but this is something which is on the roadmap.

Then there are other tools, like CuPy and a lot of others, which are basically drop-in replacements for NumPy and let NumPy code run on a GPU. The only benefit which PyTorch gives you on top of these kinds of libraries is the following. When you are using NumPy and you start using CuPy, you are basically using better hardware, and there is no denying that. If your use case is totally around NumPy and you just want to use better hardware, CuPy is probably the better way to go. But if you are looking for better primitives, better data structures, better abstractions to build your models or other things on top of, then PyTorch gives you that flexibility as well.
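
As a small illustration of the axis versus dim naming difference mentioned above, the sketch below sums the same data over its first dimension in both libraries. The example data is an assumption chosen for illustration.

```python
import numpy as np
import torch

x_np = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
x_t = torch.from_numpy(x_np)

# NumPy names the reduction dimension `axis`; PyTorch names it `dim`.
col_sums_np = np.sum(x_np, axis=0)
col_sums_t = torch.sum(x_t, dim=0)

print(col_sums_np, col_sums_t)
```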
So on one hand, you can use CuPy and start using the GPU without changing anything in your code. But if at the same time you want better primitives or better functions which you can build upon, then PyTorch is something to try out.

The instructions for installing both the stable version and the preview version are given very well on the website. It’s very, very easy to install; there are no complicated errors or anything. And if you prefer the cloud, there is built-in cloud support, in some sense, for AWS, Azure, as well as GCP.

What’s coming up? December 2018 is when PyTorch 1.0 comes out, which is supposed to be the production-ready PyTorch. The API today is not exactly in sync with the NumPy API, but it’s going to get closer and closer to the NumPy API. Another benefit, which I did not go into very deeply, is that PyTorch makes it very easy to run distributed code, again even for basic tensor operations. PyTorch has a pretty good community of developers as well as contributors, and they run this Discourse forum where the main developers are very active, and they very actively reply to your queries.

I’m thankful to Adam, who is one of the core developers of PyTorch, for reviewing the slides and giving me some useful feedback. Most of the stuff is taken from numpy.org and pytorch.org. The slides and everything are available on GitHub in case you want to check them out. There is a Colaboratory notebook where all this code is given, and you can actually run it and see that the numbers I showed are real numbers and the API similarities which I showed also exist. So you can run all the code that we went through in the slides. Thank you. [Applause] [Music]