Transcript:

Welcome back to this series on neural network programming. In this video, we will look at a practical example that demonstrates the use of the tensor concepts rank, axes, and shape. To do it, we’ll consider a tensor input to a convolutional neural network. Without further ado, let’s get started.
In this neural network programming series, we are working our way up to building a convolutional neural network, so let’s look at a tensor input for a CNN. In the last two videos, we introduced tensors and the fundamental tensor attributes rank, axes, and shape. If you haven’t seen those yet, I highly recommend you check them out. What I want to do now is put the concepts of rank, axes, and shape to use with a practical example. To do this, we’ll consider an image input as a tensor to a CNN.
Remember that the shape of a tensor encodes all of the relevant information about a tensor’s axes, rank, and indices, so we’ll consider the shape in our example, and this will enable us to work out the other values. The shape of a CNN input typically has a length of four. This means that we have a rank-four tensor with four axes. Each index in a tensor’s shape represents a specific axis, and the value at each index gives us the length of the corresponding axis.
To break this down, let’s work backwards, considering the axes from right to left. Remember, the last axis, which is where we’ll be starting, is where the actual numbers or data values are located within a tensor. If we are running along the last axis and we stop to inspect an element there, we will be looking at a number. If we are running along any other axis, the elements there will be multi-dimensional arrays.
For images, the raw data comes in the form of pixels that are represented by a number and laid out using two dimensions, height and width. To represent two dimensions, we need two axes, so the image height and width are represented on the last two axes. Possible values here are 28 by 28, as will be the case for our image data in the Fashion-MNIST dataset, or, as another example, 224 by 224, which is the image size that is used by the VGG16 neural network, or really any other image dimensions we can imagine.
The next axis represents the color channels. Typical values here are three for RGB images, or one if we are working with grayscale images. The color channel interpretation only applies to the input tensor, though. As we will reveal in a moment, the interpretation of this axis changes after the tensor passes through a convolutional layer.
Up to this point, using the last three axes, we have represented a complete image as a tensor. We have the color channels and the height and width all laid out in tensor form using three axes. In terms of accessing data at this point, we need three indices. We choose a color channel, then we choose a height, and then we choose a width to arrive at a specific pixel value.
This brings us to the first axis of the four, which represents the batch size. In neural networks, we usually work with batches of samples as opposed to single samples, so the length of this axis tells us how many samples are in our batch. This allows us to see that an entire batch of images is represented using a single rank-four tensor.
Suppose we have the following shape for a given tensor: 3 by 1 by 28 by 28. Using the shape, we can determine that we have a batch of 3 images. Each image has a single color channel, and the image height and width are 28 by 28, respectively. This gives us a rank-four tensor that will ultimately flow through our convolutional neural network. Given a tensor of images like this, we can navigate to a specific pixel in a specific color channel of a specific image in the batch using 4 indices.
Let’s look now at how the interpretation of the color channel axis changes after the tensor is transformed by a convolutional layer. Suppose we have a tensor that contains data from a single 28 by 28 grayscale image.
This gives us the following tensor shape: 1 by 1 by 28 by 28. Now suppose this image is passed to our CNN and passes through the first convolutional layer. When this happens, the shape of our tensor and the underlying data will be changed by the convolution operation. The convolution changes the height and width dimensions, as well as the number of color channels. The number of channels changes based on the number of filters being used in the layer.
Suppose that we have three convolutional filters, and let’s just see what happens to the color channel axis. Since we have three convolutional filters, we will have three channel outputs from the convolutional layer. These channels are the outputs from the convolutional layer, hence the name output channels as opposed to color channels. Each of the three filters convolves the original single input channel, producing three output channels. [MUSIC]
The output channels are still comprised of pixels, but the pixel values have been modified by the convolution operation. Depending on the size of the filter, the height and width dimensions of the output will also change, but we’ll leave those details for a future post.
With the output channels, we no longer have color channels, but we have what we can think of as modified color channels, and we call these channels feature maps. These so-called feature maps are the outputs of the convolutions that take place using the input color channels and the convolutional filters. So we combine an input color channel with a convolutional filter, we do a convolution operation, and we get a resulting output channel that we call a feature map. The word feature is used because the output represents particular features from the image, like edges, for example, and these mappings emerge as the network learns during the training process.
We should now have a good understanding of the overall shape of an input tensor to a CNN and how the concepts of rank, axes, and shape apply to this understanding.
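To make the shapes discussed above concrete, here is a minimal sketch in plain Python. It is not code from the video: nested lists stand in for tensors, and the helper names `make_tensor`, `get_shape`, and `conv2d_naive` are hypothetical, introduced only for illustration. The three 3 by 3 filters are made up (all ones), and the sliding-window operation assumes no padding and a stride of 1, which is the cross-correlation form commonly used in CNN layers. A real project would use a tensor library instead.

```python
def make_tensor(shape, fill=0.0):
    """Build a nested list whose axis lengths match `shape`."""
    if len(shape) == 0:
        return fill  # past the last axis we reach an actual number
    return [make_tensor(shape[1:], fill) for _ in range(shape[0])]


def get_shape(tensor):
    """Read off the length of each axis, from the first to the last."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return tuple(shape)


def conv2d_naive(image, filters):
    """Slide each filter over a single-channel image (no padding, stride 1).

    image:   nested list of shape (1, H, W), one input channel
    filters: nested list of shape (F, k, k), F illustrative filters
    returns: nested list of shape (F, H - k + 1, W - k + 1), F feature maps
    """
    channel = image[0]
    height, width = len(channel), len(channel[0])
    k = len(filters[0])
    feature_maps = []
    for filt in filters:
        fmap = [
            [
                sum(
                    channel[row + i][col + j] * filt[i][j]
                    for i in range(k)
                    for j in range(k)
                )
                for col in range(width - k + 1)
            ]
            for row in range(height - k + 1)
        ]
        feature_maps.append(fmap)
    return feature_maps


# A batch of 3 grayscale 28 x 28 images: (batch, channels, height, width).
batch = make_tensor((3, 1, 28, 28))
batch_shape = get_shape(batch)
batch_rank = len(batch_shape)  # the rank is the length of the shape

# Four indices reach one pixel: image 2, channel 0, row 5, column 7.
batch[2][0][5][7] = 1.0
pixel = batch[2][0][5][7]

# A single 28 x 28 grayscale image passed through three made-up 3 x 3 filters.
single = make_tensor((1, 1, 28, 28))
filters = make_tensor((3, 3, 3), fill=1.0)
output = [conv2d_naive(image, filters) for image in single]
output_shape = get_shape(output)

print(batch_shape, batch_rank, pixel)  # (3, 1, 28, 28) 4 1.0
print(output_shape)                    # (1, 3, 26, 26)
```

Under these assumptions, the channel axis grows from 1 to 3 because there are three filters, and the height and width shrink from 28 to 26 because a 3 by 3 filter with no padding fits in only 26 positions along each dimension.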
We’ll deepen our understanding of these concepts in future videos when we begin building our CNN. Don’t forget to check out the blog post for this video on deeplizard.com, and check out the deeplizard hivemind for exclusive perks and rewards. I’ll see you in the next one.
In 2012, a young researcher named Alex Krizhevsky, working under Geoffrey Hinton at the University of Toronto in the AI laboratory, designed a piece of software that was able to learn visual recognition by itself. It was able to learn visual recognition by learning from a large amount of data, using an algorithm called deep learning. In 2012, he designed AlexNet, and he submitted AlexNet to the global large-scale computer vision recognition competition, and in 2012, when he submitted AlexNet to that competition, he won. AlexNet beat every single computer vision algorithm that was developed by experts, computer vision experts who hand-coded mathematics, hand-coded algorithms specifically for visual recognition. Without specific training in computer vision, Alex Krizhevsky’s AlexNet, trained with data, performed on two GTX 580s from Nvidia, and he won the competition. He beat everyone. He beat every single computer vision algorithm that had been developed up to that point. The results of AlexNet, his achievements, caught the attention of every computer vision scientist, every artificial intelligence scientist around the world. AlexNet, 2012: Alex Krizhevsky started the GPU deep learning revolution. It was the Big Bang of modern AI. [Music]