Transcript:

So welcome to Section 2 of the course, Training Your First Neural Network. The purpose of this section is to get our basics right with the help of a simple neural network. As part of this section, we'll first build an artificial neural network, and in that process we will learn concepts like nodes, weights, biases, and activation functions, and how they come together. Then we'll take a look at loss functions, which tell us whether our model is being tuned in the right direction or not: a higher loss value means that we are moving away from where the correct model weights should be. Then we'll learn about optimizers, which help us tune the model weights properly; they do so with the help of the gradients of the loss with respect to all the weights and biases. Once our model is ready, with the loss function and optimizer in place, we will proceed to train it, and once it is trained, we will try inference on unseen data. We'll also see how to save this model to disk and load it back, and for those with GPUs, we'll tweak the model to run all the computation, that is, the training, on the GPU. So let's get started building a simple neural network.
We have already seen the motivation behind creating a neural network. In this video, we will create a neural network and use it for a useful task. We'll start by understanding the concept of a neuron, and with that we will understand weights, biases, and activation functions; then we'll proceed to create a multi-layer feed-forward neural network. A neural network, also called an artificial neural network or ANN, is actually a graph data structure: it has nodes and edges. The nodes with their incoming edges are called artificial neurons, after the component of the brain from which this idea is partially inspired. So to understand neural networks, it's best to start by understanding a neuron.
A neuron is actually a mathematical equation, but it is conveniently visualized as shown. It is characterized by a node (the circle) with multiple incoming edges and one outgoing edge. Each incoming edge has a weight associated with it, and the node itself has a bias associated with it. Each incoming input gets multiplied by its edge weight, the products are summed, and the bias is added. In other words, we are calculating the weighted sum of the inputs plus the bias, and the result can take any value from negative to positive. That result is then passed through what is called an activation function, which is very important, and its output becomes the output of the whole neuron. So that's the composition of a neuron. When we say that we are training a neural network, all we are doing is tuning the values of these weights and biases, which we know nothing about when we start training, so that we can achieve a desired input-output relationship. The desired input-output relationship is given by our training data set. The weights, as you can imagine, express the relative importance of each of the incoming inputs, and the bias shifts the neuron's output independently of the inputs; for example, it is the value of the weighted sum when all the inputs are 0.
Now let us take a look at activation functions. They are a very important component of neural networks: it is due to them that the network is able to learn complex boundaries in your data set, because they bring non-linearity into the network. They decide when the neuron should fire, that is, have a value other than zero. Without these functions, the model can only dissect the data set with linear boundaries, because all the calculations together collapse into one large linear equation. With activation functions, the network can partition the data with complex boundaries and can do useful tasks. Here are some activation functions and their input-output characteristics.
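To make the arithmetic concrete, here is a minimal plain-Python sketch of a single neuron as described above. It computes the weighted sum of the inputs, adds the bias, and applies an activation. The helper names (`neuron`, `relu`, `identity`, `layer_a`, `layer_b`) and the numeric weights are illustrative choices only, not code from the course notebook. The last few lines show why activations matter: two linear steps applied one after the other are still just one linear step.

```python
def neuron(inputs, weights, bias, activation):
    """A single artificial neuron: weighted sum of inputs plus bias, then activation."""
    # Multiply each input by the weight on its incoming edge, sum, and add the bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The activation function turns the weighted sum into the neuron's output
    return activation(z)

def relu(z):
    return max(0.0, z)

def identity(z):
    # "No activation": passes the weighted sum through unchanged
    return z

weights = [0.5, -1.0, 0.25]  # illustrative edge weights
bias = 0.1                   # illustrative bias

# z = 0.5*1 - 1.0*2 + 0.25*3 + 0.1 = -0.65, which ReLU maps to 0.0
print(neuron([1.0, 2.0, 3.0], weights, bias, relu))

# With all inputs at 0, the weighted sum is just the bias (0.1)
print(neuron([0.0, 0.0, 0.0], weights, bias, identity))

# Without activations, stacked linear steps collapse into one linear step:
# applying y = 2x + 1 and then 3y - 4 gives exactly 6x - 1.
def layer_a(x):
    return 2 * x + 1

def layer_b(y):
    return 3 * y - 4

print([layer_b(layer_a(x)) for x in range(3)])  # same values as 6x - 1
```

Note that with a non-identity activation, a neuron whose inputs are all zero emits the activation applied to the bias, not the bias itself.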
The sigmoid takes its input and squashes it between 0 and 1. When the input is near 0, the output has a nice gradient, whereas as the input becomes strongly negative or positive, the output saturates to 0 or 1. Tanh does the same thing, but in this case the output ranges from -1 to +1. ReLU is, in that sense, very simple: if the input is zero or less, the output is zero, and in every other case the output is the same as the input. ReLU is used in most deep learning models today and has given good results for a variety of tasks.
A neural network is constructed by joining together the neurons that we saw earlier. The neurons are organized into an input layer, an output layer, and one or more hidden layers. So here we have a multi-layer feed-forward neural network. A user of the neural network interacts with the input and output layers: the input can be, for example, the pixels of an image, and the output can be the class of the image. Any given layer has incoming connections and outgoing connections; for example, in this case the hidden layer gets three incoming edges per node and has four outgoing edges. Also, the output of the neural network can be given to a classifier for classification.
Okay, so with that knowledge, let us proceed to build an artificial neural network and inspect its elements, and for that we will jump right into this IPython notebook. In this section, we'll be building a fully connected feed-forward neural network to classify a flower based on its structural attributes. We start by importing all the modules that we need. In order to access the data set, which is in the form of comma-separated values, I've written a small module that creates a PyTorch data set; you can take a look at iris.py if you're interested in how that was done. The data set we are talking about, the Iris data set, is quite popular in machine learning teaching, and it has four attributes of a flower called the iris.
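Going back to the three activation functions described at the start of this part, here is a small plain-Python sketch of sigmoid, tanh, and ReLU. It prints their outputs for a few inputs so you can see the saturation and the ranges mentioned. This is a sketch for illustration only. In PyTorch the same functions are available as `torch.sigmoid`, `torch.tanh` and `torch.relu`, or as the `nn.Sigmoid`, `nn.Tanh` and `nn.ReLU` modules. The naive sigmoid below can overflow for very large negative inputs, which the library versions handle for you.

```python
import math

def sigmoid(z):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Same S-shape as the sigmoid, but the output ranges from -1 to +1
    return math.tanh(z)

def relu(z):
    # Zero for inputs of zero or less, the input itself otherwise
    return z if z > 0 else 0.0

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):+.4f}  relu={relu(z):.1f}")
```

At z = ±10 the sigmoid and tanh outputs are already pinned close to their limits. This is the saturation described above, where the gradient becomes tiny.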
The four attributes are the sepal length, sepal width, petal length, and petal width, along with the class of the flower. The sepal is the part of the flower bending downwards, like this one, whereas the petal is the one standing upright. These attributes map to one of the three kinds of iris flower: Iris setosa, Iris versicolor, and Iris virginica. The challenge is to create a model which, when given the attributes of a flower, classifies it into one of these three classes. Here are some examples. Let us read the first few lines of the data set and see what it looks like. We see four columns of structural attributes and the class of that flower. With that data set and challenge in mind, let's create a fully connected three-layer neural network. The choice of the number of layers and the number of nodes per layer is part of the design process of building a neural network; you might have to try a few variations to see which fits best given the data set you have.
All neural networks in PyTorch are built by creating custom modules, which are classes inheriting from the nn.Module base class. The advantage of this is that modules can be nested within each other in a tree structure, so you can group different sets of layers together and keep them logically separate. The two methods that need to be implemented are the __init__ method and the forward method. The __init__ method gets called when you instantiate the class; it is used to create the various layers, create activation functions, initialize things, and so on. In this case, we are taking in four inputs. The input size is the number of attributes of the flower, and it will be used to create our first fully connected layer. The next two parameters, the hidden-one size and the hidden-two size, set the number of nodes you want in the two hidden layers. The higher the number, the better the model can fit the training data, but too high a number can lead to overfitting, a problem where the model learns the training data so well that it does poorly on unseen data. The num_classes parameter is the number of classes we have in the data set, which happens to be three. This is also the number of neurons we want in the output layer, so these three output neurons will give out three class scores, one for each class. If the model is properly trained, the class with the highest score should match the actual class for that data instance. In the body, we create three linear units; nn.Linear is the class that implements fully connected layers. It takes two arguments, the number of incoming edges and the number of outgoing edges, and it also creates one bias per output node. We also create two ReLU activation functions to go with the first two hidden layers. The last linear unit, fc3, will be our output layer, which gives out the three class scores.
The forward method is where you combine all of these together. It is called with the inputs to this module, in our case the attributes of the flower, and it returns the result of the calculations, which are the class scores. Here, we first pass the input through a fully connected layer, followed by our ReLU activation function; this constitutes our first hidden layer. We do the same for our second hidden layer, and finally the result is passed through the output layer, giving us the class scores. Hence, in go the attributes of the flower and out come the class scores. Also note that we always process data in batches: what we pass in is a number of instances as a batch, and the output is a set of class scores for the entire batch. The number of data instances in a batch is governed by the batch size, which we set on the data loader that iterates through our data set. So let's create this class.
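Putting that description together, here is a sketch of what such a module could look like. The class name IrisNet and the sizes (4 inputs, 100 and 50 hidden nodes, 3 classes) come from the walkthrough. The parameter names `hidden1_size`, `hidden2_size` and `num_classes`, and the attribute names `fc1`, `relu1`, `fc2`, `relu2`, `fc3`, are assumptions, so the notebook's actual code may differ in detail. The random batch is only a stand-in for real flower data, and since the model is untrained, its predicted classes are arbitrary.

```python
import torch
import torch.nn as nn

class IrisNet(nn.Module):
    def __init__(self, input_size, hidden1_size, hidden2_size, num_classes):
        super().__init__()
        # nn.Linear(in, out): fully connected layer with an (out x in) weight matrix and out biases
        self.fc1 = nn.Linear(input_size, hidden1_size)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden1_size, hidden2_size)
        self.relu2 = nn.ReLU()
        # Output layer: one raw score per class
        self.fc3 = nn.Linear(hidden2_size, num_classes)

    def forward(self, x):
        # x: (batch_size, input_size), one row of flower attributes per instance
        out = self.relu1(self.fc1(x))    # first hidden layer
        out = self.relu2(self.fc2(out))  # second hidden layer
        return self.fc3(out)             # class scores: (batch_size, num_classes)

model = IrisNet(input_size=4, hidden1_size=100, hidden2_size=50, num_classes=3)
print(model)  # lists the layers registered in __init__

# Random stand-in for one batch of 60 flowers (real data comes from the data set)
batch = torch.randn(60, 4)
scores = model(batch)
predicted_class = scores.argmax(dim=1)  # index of the highest score per instance
print(scores.shape, predicted_class.shape)
```

Calling `model(batch)` rather than `model.forward(batch)` is the usual PyTorch idiom. `nn.Module.__call__` runs `forward` along with any registered hooks.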
You can actually print all the layers from an object of this class, like so. In this case, we have instantiated IrisNet with 4 as the input size and 3 as the number of classes. We have chosen 100 to be the number of nodes in the first hidden layer and 50 to be the number of nodes in the second hidden layer. The description printed here reflects everything that we discussed. Next, we'll create the data loader. We select a batch size of 60, meaning that in each iteration the data loader gives us 60 instances. Since the training set has 120 instances, we'll need 2 iterations to cover the entire set, which is one epoch. This is the file containing the comma-separated values that we'll be ingesting. The first thing we do is create the data sets by calling the data set function in our iris module; remember, this is the module that I implemented, and you can take a look at it. So let's run this. What we see here is that we have 120 instances in the training set and 30 instances in the testing, or validation, set; I'll be using these two terms interchangeably. Next, we create the data loader, whose responsibility is to let us iterate through this data set, and there are a few things we can configure on it. First we set the data set, then we select the batch size, and then we set shuffle=True. The shuffle=True flag instructs the data loader to reshuffle the data at every epoch, so that we get different batch compositions every epoch. OK, so that was about creating our first neural network and creating the data loader.
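The batch-size arithmetic and the shuffle flag can be sketched with PyTorch's `DataLoader`. The course builds its real data set in iris.py, which is not shown here. As a hypothetical stand-in, this sketch wraps 120 random rows of 4 attributes and random labels in a `TensorDataset`. Only the counts (120 instances, batch size 60) are taken from the walkthrough.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in for the Iris training set: 120 rows of 4 attributes
# and an integer class label (0, 1 or 2) per row.
features = torch.randn(120, 4)
labels = torch.randint(0, 3, (120,))
train_ds = TensorDataset(features, labels)

# batch_size=60: 120 / 60 = 2 iterations per epoch
# shuffle=True: the sample order is reshuffled every time a new epoch starts
train_loader = DataLoader(train_ds, batch_size=60, shuffle=True)
print(len(train_loader))  # prints 2 (batches per epoch)

for epoch in range(2):
    for i, (x, y) in enumerate(train_loader):
        # Each batch holds 60 attribute rows and their 60 labels
        print(f"epoch {epoch}, batch {i}: x {tuple(x.shape)}, y {tuple(y.shape)}")
```

Shuffling only changes which instances land in which batch. Every instance is still visited exactly once per epoch.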