Transcript:
In this video, we will demystify the heavily underused einsum notation. [MUSIC] Right, so what is einsum? Well, it's an extremely general way of performing various tensor or ND-array operations, as you will see soon. But before we try to understand how it works, let's first ask why. Einsum is extremely convenient and very compact, and it can be used as a replacement for so many tensor operations; just a small list would be matrix multiplication, element-wise multiplication, permutation, and so on. What is even more amazing is that it can combine multiple of them in a single einsum call, so we can say goodbye to remembering the syntax for matrix multiplication in NumPy, PyTorch, and TensorFlow. Also, let's say you need to permute the input to match a function's expected ordering. Yes, batch matrix multiplication, I'm talking about you. With einsum, you can say goodbye to that as well. You don't even need to permute the output; that can also be done inside einsum, so I guess we can say goodbye to that too.

So what about the cons of einsum? Well, first of all, it can be a bit confusing, and that's why I'm making this video. Second, in practice we often lose some performance, because it's not as optimized as a specific function call. But this is a bit of a generalization, because einsum is actually faster in some cases too, especially if you're combining multiple calls into a single einsum call.

So how does einsum work? I think that is best explained with an example, so let's take a look at matrix multiplication. The math for matrix multiplication looks like M[i, j] = sum over k of A[i, k] * B[k, j], where we sum over the products of the rows of A with the columns of B. With Einstein summation, we can actually remove the sigma entirely, because we're using k for both A and B: the index k is repeated in the inputs, so we can write this without the sigma because we implicitly know that dimension is going to be multiplied and summed over. Let's compare this to the code for matrix multiplication using nested loops, which would look something like this: two outer loops over i and j, and then an inner loop summing over the element-wise products of A and B. We will come back to this in a second. Using einsum, we can do matrix multiplication with the call einsum("ik,kj->ij", A, B), where "ik" specifies the dimensions of the first input A, "kj" specifies the dimensions of the second input B, then we have the arrow, and then "ij" specifies the dimensions of the output M. As I said before, k here is repeated over the inputs, and this means that dimension will be multiplied and summed.

Two important definitions: the free indices are the indices specified in the output, and the summation indices are all the others, that is, the indices that appear in the inputs but not in the output. Going back to our example, i and j here are the free indices because they are specified in the output, and k is a summation index. The free indices are associated with the outer loops, in this case i and j, and the inner loop is where we sum over the summation index, in this case k. Inside the outer loops, we first initialize a variable total, and then in the inner summation loop over the index k we accumulate the element-wise products of A and B. After obtaining this sum, M[i, j] is set equal to that total. So hopefully this was clear.
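To make that concrete, here is a minimal PyTorch sketch of the same example; the shapes are just illustrative, not taken from the video:

import torch

A = torch.rand(2, 3)
B = torch.rand(3, 4)

# einsum: the repeated index k is multiplied and summed over
M = torch.einsum("ik,kj->ij", A, B)

# equivalent nested loops: free indices i, j are the outer loops,
# the summation index k is the inner loop
M_loops = torch.zeros(2, 4)
for i in range(2):
    for j in range(4):
        total = 0.0
        for k in range(3):
            total += A[i, k] * B[k, j]
        M_loops[i, j] = total

assert torch.allclose(M, M_loops)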
Let's take a look at another example where we have defined two vectors a and b, and we're doing einsum("i,j->ij", a, b). It can feel tricky to understand what is actually going on here, so first of all, we have the free indices i and j, and we have no summation index, because all indices are used in the output. When you feel lost, write out the nested loops: we will have the outer loops over i and j, and after that we initialize our variable total. In this case, we won't have a summation loop, so we just do total += a[i] * b[j], and then we set the output at i, j equal to that total. If you're familiar with this operation, it's called the outer product, but the idea here is really that if you don't understand what's going on, you can convert it to loops, which you can then understand.

Let's write down the general rules for einsum. The first rule is that repeating letters in different inputs means those axes will be multiplied, and those products will make up the output. An example of this is what we saw previously when doing matrix multiplication, where the index k is repeated. You do have to be careful: k for both A and B needs to be of equal length for this to work, otherwise you will get an error. The second rule is that omitting a letter means that axis will be summed. For example, if we define a vector x and we do einsum("i->", x), specifying no output dimension, this will sum the vector x; essentially we're doing sum of x. The third rule is that we can return the unsummed axes in any order we like. For example, if we input a three-dimensional array with shape 5 by 4 by 3, specify its dimensions as i, j, and k, and then do arrow k, j, i, this will reverse the shape to be 3 by 4 by 5 in the output.

All right, so I think you now understand the fundamentals of einsum, but you may or may not agree with the following: einsum to rule them all, einsum to find them, einsum to bring them all, and in the elegance bind them. So let's go to the code to convince you of this fact. We're going to show how to do a bunch of different common operations using just einsum, and you can use NumPy, PyTorch, or TensorFlow. I'm going to use PyTorch, but of course you can change it to the specific library that you want. So I'm just going to import torch; in PyTorch it's torch.einsum, in TensorFlow it's tf.einsum, and in NumPy it's numpy.einsum, so it's pretty trivial to convert between the different libraries.

All right, let's start by initializing a random tensor, and we're going to make a two by three tensor. The first thing we're going to show is how to permute a tensor: we can do torch.einsum("ij->ji", x), and this will return the same tensor, just permuted. This is the same as a transpose, but of course you can use it for multiple dimensions, so it's really the general way of permuting a tensor.
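Here is a small sketch of those three rules in PyTorch; the tensors and names are made up purely for illustration:

import torch

# Rule 1: repeating a letter across inputs multiplies along that axis
u, w = torch.rand(3), torch.rand(3)
prod = torch.einsum("i,i->i", u, w)       # element-wise product, shape (3,)

# Rule 2: omitting a letter from the output sums over that axis
x = torch.rand(4)
total = torch.einsum("i->", x)            # same as x.sum()

# Rule 3: the unsummed axes can come back in any order
t = torch.rand(5, 4, 3)
reversed_t = torch.einsum("ijk->kji", t)  # shape (3, 4, 5)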
Now, if you want to do a summation over all of the elements in the entire 2 by 3 matrix, you would do torch.einsum("ij->", x); that would return the sum of the six elements. If you want to do a column sum, we would do torch.einsum("ij->j", x). This is the second rule, where we don't specify a dimension; in this case we're not specifying i, so it's going to be summed over that dimension. If you want to do a row sum, it's pretty similar: we just specify i instead of j, so torch.einsum("ij->i", x).

Now, let's say we want to do a matrix-vector multiplication, so we do v = torch.rand(1, 3), a one by three vector, and we want to multiply x with this vector. Normally what you would do is take the transpose of v to get a three by one, and then multiply it with x, so x matrix-multiplied by v transpose. But we can just do torch.einsum and specify the dimensions "ij" for x and "kj" for v, because those j dimensions are the same, then arrow, and then "ik". Einsum will know to multiply along the index that is the same, the j, and then we just send in x and then v. Notice here that we don't need to care about reshaping anything; normally, as I said, we would have to do a transpose first, but now we can just specify the dimensions as we would like.

If we want to do matrix multiplication, let's just use x again, so let's say we multiply x together with itself. Normally we would do something like x matrix-multiplied by x transpose, and that's pretty clean too. But with einsum we can do torch.einsum("ij,kj->ik", x, x); remember, if we're going to send in x two times, then the second dimension is the one that's going to match. This would return a two by two, where we multiplied x with x, that is, a 2 by 3 times a 3 by 2.

Then let's do a dot product, and let's say we just take the first row of x. We would do torch.einsum("i,i->", x[0], x[0]): the first "i" specifies that dimension, a three-element vector, then comma, "i" for the same-size vector in x again, though of course these could be two different vectors, and then arrow and nothing, which multiplies them and then sums them together. So we index into x to get that specific row, we do that two times, and that will be the dot product. Now, let's say you want to do the dot product with a matrix, meaning an element-wise multiplication of a matrix with a matrix, and then adding everything together. You could do torch.einsum("ij,ij->", x, x); this multiplies those dimensions element-wise and then does a summation, because we're not specifying any output dimension.
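Collected in one place, a hedged sketch of the calls walked through so far, with illustrative shapes:

import torch

x = torch.rand(2, 3)

# permute / transpose
xt = torch.einsum("ij->ji", x)           # shape (3, 2)

# sums
total   = torch.einsum("ij->", x)        # sum of all six elements
col_sum = torch.einsum("ij->j", x)       # shape (3,)
row_sum = torch.einsum("ij->i", x)       # shape (2,)

# matrix-vector multiplication, no manual transpose of v needed
v = torch.rand(1, 3)
mv = torch.einsum("ij,kj->ik", x, v)     # shape (2, 1)

# matrix-matrix multiplication of x with its own transpose
mm = torch.einsum("ij,kj->ik", x, x)     # (2, 3) times (3, 2) -> (2, 2)

# dot product of the first row with itself
dot = torch.einsum("i,i->", x[0], x[0])

# element-wise multiply two matrices and sum everything
fro = torch.einsum("ij,ij->", x, x)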
If we just want to do the element-wise multiplication, but not the sum, we can do torch.einsum("ij,ij->ij", x, x); of course, we're just using the same tensor twice here for simplicity, you could use two different ones.

All right, for the outer product, let's define two different vectors: a = torch.rand(3), a vector of three elements, and b = torch.rand(5). We would do torch.einsum and specify "i" for the input a and "j" for the input b, and then arrow "ij". We saw this example in the slides too; then we just input a and b.

Now for batch matrix multiplication, we define two different three-dimensional tensors: a = torch.rand(3, 2, 5) and b = torch.rand(3, 5, 3). In this case, we want to multiply the last dimension of a, which is five, with the second dimension of b, which is also five; they need to match, meaning have the same number of elements. So we do torch.einsum and specify the dimensions "ijk" for the input a, then for b we use "i" because those batch dimensions are going to be matched, then "k" for the second dimension and "l" for the last one. Here the i needs to match and the k needs to match, and then we do arrow "ijl": we want to multiply these two dimensions with each other for all of the batches, which in this case is three examples in our batch. Then we just send in a and b, and that's how we do it. Of course, here I've set up the dimensions so that you could just do torch.bmm and it would also work, but if you flip these dimensions and you still want to do the same thing, you can just flip the indices here and reorder them the way you like, so you don't have to permute the input in any way.

Now, let's say we want to obtain the diagonal of a matrix. First we initialize x to be torch.rand(3, 3); it's going to be a 3 by 3, so it's square and has a well-defined diagonal. Then we do torch.einsum("ii->i", x). Again, if you're confused by this, it just obtains the diagonal elements, but you could write out the nested loops and check that this works. Then for the matrix trace, which is the sum of the diagonal, we would do torch.einsum("ii->", x); that sums those values, and then we send in the input x.

All right, so I think you have now seen a bunch of different examples for einsum. Of course, these are only the basics; you can do so many more advanced things, and maybe I'll do another video on more advanced use cases of einsum. But I think this is really enough to build you a solid foundation to explore more with einsum and to see the benefits. And perhaps you now agree: einsum to rule them all.
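And a final sketch of these last examples, assuming the same shapes mentioned above:

import torch

# outer product of two vectors
a, b = torch.rand(3), torch.rand(5)
outer = torch.einsum("i,j->ij", a, b)       # shape (3, 5)

# batch matrix multiplication: i is the batch index, k is contracted
A = torch.rand(3, 2, 5)
B = torch.rand(3, 5, 3)
bmm = torch.einsum("ijk,ikl->ijl", A, B)    # shape (3, 2, 3), same result as torch.bmm(A, B)

# diagonal and trace of a square matrix
x = torch.rand(3, 3)
diag  = torch.einsum("ii->i", x)            # the 3 diagonal elements
trace = torch.einsum("ii->", x)             # sum of the diagonal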