Transcript:
Welcome back to this series on neural network programming. In this video, we will be expanding our knowledge beyond reshaping operations by learning about element-wise operations. Without further ado, let’s get started.

Element-wise operations are extremely common operations with tensors in neural network programming. Let’s lead this discussion off with a definition of an element-wise operation. An element-wise operation is an operation that operates on tensor elements that correspond, or have the same index location, within the tensors. We say that an element-wise operation operates on corresponding elements, and two elements are said to be corresponding if the two elements occupy the same position within the tensor. The position within the tensor is determined by the indices that are used to locate each element.

I’m here in a notebook now, and let’s suppose that we have the following tensors. Both of these tensors are rank-2 tensors with a shape of 2 x 2. This means that we have two axes that each have a length of two elements. The elements of the first axis are arrays, and the elements of the second axis are numbers. This is the kind of thing that we’ve seen quite a few times already in the series, but let’s build on this now.

At this point, we know that two elements are said to be corresponding if the two elements occupy the same position within a tensor, and the position is determined by the indices used to locate each element. Let’s see an example of corresponding elements between these two tensors, t1 and t2. This allows us to see that the 1 in t1 is the corresponding element for the 9 in t2. The correspondence is defined by the indices. This is important because it reveals a general feature of element-wise operations. Using this information, we can deduce that two tensors must have the same number of elements in order to perform an element-wise operation. We’ll go ahead and make this statement even more restrictive: two tensors must have the same shape in order to perform an element-wise operation. Having the same shape means that the tensors have the same number of axes and each of the corresponding axes has the same length. This ensures that it is indeed possible to create the correspondences needed to perform the element-wise operation.

Let’s look at our first element-wise operation: addition. This is going to be a simple one. Don’t worry, it will get more interesting. Each pair of elements in corresponding locations is added together to produce a new tensor with the same shape, so addition is an element-wise operation. In fact, all the other arithmetic operations, like subtract, multiply, and divide, are also element-wise operations.
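As a rough sketch of the kind of notebook code being described here (the tensor values are assumptions, chosen so that the 1 in t1 corresponds to the 9 in t2 as called out above), the element-wise arithmetic might look like this in PyTorch:

```python
import torch

# Two rank-2 tensors with the same 2 x 2 shape (values are illustrative,
# chosen so that t1[0][0] is 1 and t2[0][0] is 9, the corresponding
# pair mentioned above).
t1 = torch.tensor([
    [1, 2],
    [3, 4]
], dtype=torch.float32)

t2 = torch.tensor([
    [9, 8],
    [7, 6]
], dtype=torch.float32)

# Corresponding elements occupy the same index location.
print(t1[0][0])  # tensor(1.)
print(t2[0][0])  # tensor(9.)

# Element-wise addition: corresponding elements are added, and the
# result has the same shape as the inputs.
print(t1 + t2)
# tensor([[10., 10.],
#         [10., 10.]])

# The other arithmetic operations are element-wise as well.
print(t1 - t2)
print(t1 * t2)
print(t1 / t2)
```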
There’s another point that we need to touch on. Operations that we commonly see with tensors are arithmetic operations using scalar values. Hmm, something seems to be wrong here. These examples are breaking a rule we established that said element-wise operations operate on tensors of the same shape. Scalar values, after all, are rank-0 tensors, which means they have no shape, and our tensor t1 is a rank-2 tensor of shape 2 x 2. So how does this fit in? Now, you’ve probably done it a thousand times without even noticing. That’s kind of weird, right, that you’ve got these things of different ranks and different sizes? So what is it actually doing? The first solution that may come to mind is that the operation is simply taking the single scalar value and operating on each element of the tensor. This logic kind of works. However, it’s a bit misleading, and it breaks down in more general situations where we’re not using a scalar.

To think about these operations differently, we need to introduce the concept of tensor broadcasting. Broadcasting defines how tensors of different shapes are treated during element-wise operations. Let’s think about the operation t1 + 2. Here, the scalar value is being broadcast to the shape of the tensor t1, and then the element-wise operation is carried out. We can see what the broadcast scalar value looks like using NumPy’s broadcast_to function. So the scalar value is transformed into a rank-2 tensor just like t1, and just like that, the shapes match and the element-wise rule of having the same shape is back in play. This is all under the hood, of course.

The piece of code here paints the picture, so to speak, but what it’s actually doing is taking that scalar and copying it here, here, and here, and then it’s going element-wise: 10 is greater than 0, 6 is greater than 0, -4 is greater than 0, and it’s giving us back the three answers. That’s called broadcasting. Broadcasting means copying one or more axes of my tensor to allow it to be the same shape as the other tensor.

So let’s look at a trickier example to hit this point home. Suppose we have the following two tensors. What will be the result of this element-wise addition operation? Is it even possible, given the same-shape rule for element-wise operations? Even though these tensors have different shapes, the element-wise operation is possible, and broadcasting is what makes it possible. The lower-rank tensor t2 will be transformed via broadcasting to match the shape of the higher-rank tensor t1, and the element-wise operation will be performed as usual. The concept of broadcasting is key to understanding how this operation will be carried out. As before, we can check the broadcast transformation using NumPy’s broadcast_to function and then run the operation. After broadcasting, the addition operation between these two tensors is a regular element-wise operation between tensors of the same shape.

Broadcasting is a more advanced topic than the basic element-wise operations, so don’t worry if it takes longer to get comfortable with the idea. Understanding, first, element-wise operations and, second, the same-shape requirement for element-wise operations provides a basis for understanding why we need broadcasting. We often use broadcasting in neural network programming when we are pre-processing our data, and especially during normalization routines.

Because I’m using Git for source management, I can see the diff between our original predict.js file and the modified version of the file that uses broadcasting. On the left, we have our original predict.js file. Within the click event, recall, this is where we transformed our image into a tensor. Then the rest of this code was created to do the appropriate pre-processing for VGG16, where we centered and reversed the RGB values. Now, on the right, this is our new and improved predict.js file that makes use of broadcasting in place of all the explicit one-by-one tensor operations on the left. So look, all of this code in red has now been replaced with what’s shown in green. That’s a pretty massive reduction of code. There’s a post in the TensorFlow.js series that covers broadcasting in greater detail than in this video. There’s a practical example there, and the algorithm for determining how a particular tensor is broadcast is also covered. Don’t worry about not knowing TensorFlow.js; it’s not a requirement. So check that out for a deeper discussion on broadcasting. I highly recommend it.
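Here is a minimal sketch of the broadcasting behaviour just described, using NumPy’s broadcast_to to inspect the broadcast operand. The tensor values, and the per-channel centering at the end, are assumptions for illustration rather than the actual notebook or predict.js code:

```python
import numpy as np
import torch

t1 = torch.tensor([
    [1, 1],
    [1, 1]
], dtype=torch.float32)

# The scalar in t1 + 2 is broadcast to t1's shape before the
# element-wise addition happens under the hood.
print(np.broadcast_to(2, t1.shape))
# [[2 2]
#  [2 2]]
print(t1 + 2)

# The trickier example: a rank-1 tensor added to a rank-2 tensor.
# The lower-rank tensor is broadcast to match the higher-rank one
# (values here are assumptions).
t2 = torch.tensor([2, 4], dtype=torch.float32)
print(np.broadcast_to(t2.numpy(), t1.shape))
# [[2. 4.]
#  [2. 4.]]
print(t1 + t2)
# tensor([[3., 5.],
#         [3., 5.]])

# The same idea shows up in pre-processing: centering the channels of a
# single C x H x W image without writing a loop over the channels.
image = torch.rand(3, 224, 224)
channel_means = image.mean(dim=(1, 2), keepdim=True)  # shape (3, 1, 1)
centered = image - channel_means                       # broadcast over H and W
print(centered.shape)  # torch.Size([3, 224, 224])
```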
Having a deep understanding of broadcasting is one of those things that can take you to the next level as a neural network programmer. Very few people in the data science or machine learning communities understand broadcasting, and the vast majority of the time, for example, when I see people doing pre-processing for computer vision, like subtracting the mean, they always write loops over the channels, right? And I kind of think, it’s so handy to not have to do that, and it’s often so much faster to not have to do that. So if you get good at broadcasting, you’ll have this super useful skill that very, very few people have.

We’ve looked at arithmetic operations. Let’s look now at comparison operations, which are also element-wise. For a given comparison operation between two tensors, a new tensor of the same shape is returned, with each element containing either a 0 or a 1. We’ll have a 0 in the corresponding component if the comparison between the corresponding elements is false, and a 1 if it is true. So suppose we have the following tensor, and here we have a few operations. Check it out! I’ll let you pause the video or check this on the blog post to get a good understanding of what’s happening here, but for now, let’s just look at these operations from a broadcasting perspective. We can see that the last one, t.le(7), is really a comparison against the value 7 broadcast to the shape of t. So take some time here also to make sure you have an understanding of how this is working.

With element-wise operations, we can also use functions, and with functions, it’s fine to just assume that the function is applied to each element of the tensor. So here are some examples of that, sketched below.

Something that you may come across is that there are other ways to refer to element-wise operations, so I just wanted to mention that. All of these mean the same thing: we can say element-wise, component-wise, or point-wise. Just keep this in mind if you encounter any of these terms in the wild.
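Here is a rough sketch of the comparison operations and element-wise functions being described. The tensor values are an assumption, and note that recent PyTorch versions return True/False rather than 1/0 from comparisons, which plays the same role:

```python
import numpy as np
import torch

# A rank-2 tensor to run comparison operations against
# (values are illustrative).
t = torch.tensor([
    [0, 5, 0],
    [6, 0, 7],
    [0, 8, 0]
], dtype=torch.float32)

# Comparison operations are element-wise and return a tensor of the
# same shape, marking where the comparison holds.
print(t.eq(0))   # equal to 0
print(t.ge(0))   # greater than or equal to 0
print(t.gt(0))   # greater than 0
print(t.lt(0))   # less than 0

# From a broadcasting perspective, t.le(7) is really comparing t against
# the scalar 7 broadcast to t's shape:
print(np.broadcast_to(7, t.shape))
# [[7 7 7]
#  [7 7 7]
#  [7 7 7]]
print(t.le(7))   # less than or equal to 7

# Element-wise functions: the function is applied to each element.
print(t.abs())
print(t.sqrt())
print(t.neg())
print(t.neg().abs())
```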
Remember, the blog post for this video is on deeplizard.com, and also check out the deeplizard hivemind, where you can get deeplizard perks and rewards. Thank you for contributing to collective intelligence. I’ll see you in the next one.

It’s an ancient skill, you know, because it goes all the way back to the days of APL, which stands for A Programming Language. Kenneth Iverson wrote this paper called Notation as a Tool of Thought, in which he proposed a new math notation, and he proposed that if we use this new math notation, it gives us new tools for thought and allows us to think things we couldn’t before. One of his ideas was broadcasting, not as a computer programming tool, but as a piece of math notation. And so he ended up implementing this notation as a tool for thought as a programming language called APL, and his son has gone on to further develop that into a piece of software called J, which is basically what you get when you put 60 years of very smart people to work on this idea. With this programming language, you can express very complex mathematical ideas often with just a line of code or two. So, I mean, it’s great that we have J, but it’s even greater that these ideas have found their way into the languages we all use, like the NumPy and PyTorch libraries in Python. These are not just little niche ideas; these are fundamental ways to think about math and to do programming.