Hi, and welcome to an illustrated guide to recurrent neural networks. I’m Michael, also known as Learned Vector. I’m a machine learning engineer in the natural language processing and voice assistant space. If you’re just getting started in machine learning and want to get some intuition behind recurrent neural networks, this video is for you. If you want to get into machine learning, recurrent neural networks are a powerful technique that’s important to understand. If you use smartphones and frequently surf the internet, odds are you use applications that leverage RNNs. Recurrent neural networks are used in speech recognition, language translation, and stock prediction; they’re even used in image recognition to describe the content in pictures. So I know there are many guides on recurrent neural networks, but I want to share illustrations along with an explanation of how I came to understand them. In this video, I’m going to avoid all the math and focus on the intuition behind RNNs instead. By the end of this video, you should have a good understanding of RNNs and hopefully have that light-bulb moment.

So RNNs are neural networks that are good at modeling sequence data. To understand what that means, let’s do a thought experiment. Say you take a still snapshot of a ball moving in time. Let’s also say you want to predict the direction that the ball is moving. With only the information that you see on the screen, how would you do this? Well, you can go ahead and take a guess, but any answer you come up with would be just that, a random guess. Without knowledge of where the ball has been, you wouldn’t have enough data to predict where it’s going. If you record many snapshots of the ball’s position in succession, you will have enough information to make a better prediction. So this is a sequence, a particular order in which one thing follows another. With this information, you can now see that the ball is moving to the right. Sequence data comes in many forms.
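To make the thought experiment concrete, here’s a tiny sketch of it in code. The positions and the `direction` helper are made up for illustration: one snapshot gives you nothing, but a sequence of snapshots is enough to guess the direction.

```python
# A toy version of the ball thought experiment (made-up numbers):
# a single snapshot can't tell you the direction, but a sequence can.

def direction(positions):
    """Guess the ball's direction from its recorded x-positions."""
    if len(positions) < 2:
        return "unknown"          # one snapshot isn't enough information
    return "right" if positions[-1] > positions[0] else "left"

print(direction([5]))             # one snapshot: no way to know
print(direction([1, 2, 4, 7]))    # a sequence: the ball is moving right
```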
Audio is a natural sequence: you can chop up an audio spectrogram into chunks and feed that into RNNs. Text is another form of sequence: you can break text up into a sequence of characters or a sequence of words.

Okay, so RNNs are good at processing sequence data for predictions, but how do they do that? Well, by having a concept I like to call sequential memory. To get a good intuition behind what sequential memory means, I’d like to invite you to say the alphabet in your head. Go on, give it a try. That was pretty easy, right? If you were taught this specific sequence, it should come easily to you. Now try saying the alphabet backward. I bet that was much harder. Unless you’ve practiced that sequence before, you’ll likely have a hard time. Here’s a fun one: start at the letter F. At first, you’ll struggle with the first few letters, but then, after your brain picks up the pattern, the rest will come naturally. So there’s a very logical reason why this can be difficult: you learned the alphabet as a sequence. Sequential memory is a mechanism that makes it easier for your brain to recognize sequence patterns.

All right, so RNNs have this abstract concept of sequential memory, but how the heck does an RNN replicate that concept? Well, let’s look at a traditional neural network, also known as a feed-forward neural network. It has an input layer, a hidden layer, and an output layer. How do we get a feed-forward neural network to be able to use previous information to affect later ones? What if we had a loop in the neural network that can pass previous information forward? That’s essentially what a recurrent neural network does. An RNN has a looping mechanism that acts as a highway to allow information to flow from one step to the next. This information is the hidden state, which is a representation of previous inputs. Let’s run through an RNN use case to get a better understanding of how this works. Let’s say we want to build a chatbot. They’re pretty popular nowadays.
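Before the chatbot example, here’s a minimal sketch of that looping mechanism, the single recurrent step that mixes the current input with the previous hidden state. The dimensions and random weights are made up for illustration; a real network would learn them during training.

```python
import math
import random

def rnn_step(x, h_prev, W_x, W_h, b):
    """One recurrent step: the new hidden state combines the current
    input x with the previous hidden state h_prev."""
    h_new = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(x[i] * W_x[i][j] for i in range(len(x)))
        total += sum(h_prev[k] * W_h[k][j] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

random.seed(0)
input_size, hidden_size = 4, 3   # toy sizes, made up for illustration
W_x = [[random.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(input_size)]
W_h = [[random.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b = [0.0] * hidden_size

h = [0.0] * hidden_size          # initial hidden state
sequence = [[random.gauss(0, 1) for _ in range(input_size)] for _ in range(5)]
for x in sequence:               # feed the sequence one input at a time
    h = rnn_step(x, h, W_x, W_h, b)   # h carries information forward

print(len(h))  # 3
```

The key point is the loop: the same `rnn_step` is applied at every position, and `h` is the highway that carries information from one step to the next.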
Let’s say the chatbot can classify intentions from the user’s inputted text. To tackle this problem, first we’re going to encode the sequence of text using an RNN; then we’re going to feed the RNN output into a feed-forward neural network, which will classify the intents. Okay, so a user types in “what time is it?” To start, we break up the sentence into individual words. RNNs work sequentially, so we feed it one word at a time. The first step is to feed “what” into the RNN. The RNN encodes “what” and produces an output. For the next step, we feed the word “time” and the hidden state from the previous step. Remember that the hidden state represents information from all previous steps. The RNN now has information about the words “what” and “time”. We repeat this process until the final step. You can see that at the final step, the RNN has encoded information from all the words in the previous steps. Since the final output was created from the rest of the sequence, we should be able to take the final output and pass it to the feed-forward layer to classify an intent.

For those of you who like looking at code, here is some Python showcasing the control flow. First, you initialize your network layers and the initial hidden state. The shape and dimensions of the hidden state will depend on the shape and dimensions of your recurrent neural network. Then you loop through your inputs, passing a word and the hidden state into the RNN. The RNN returns the output and a modified hidden state; this modified hidden state should now contain information from all your previous steps. You continue to loop until you’re out of words. Last, you pass the output to the feed-forward layer, and it returns a prediction. And that’s it: the control flow of doing a forward pass of a recurrent neural network is a for loop.

Okay, now back to our visualization. You may have noticed the odd distribution of colors in the hidden states. This is to illustrate an issue with RNNs known as short-term memory.
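That control flow can be sketched as follows. The `rnn` and `feed_forward` functions here are trivial stand-ins, not real trained layers; they exist only to make the loop runnable and show how the output and hidden state are threaded through each step.

```python
# A sketch of the forward-pass control flow described above.
# rnn and feed_forward are stand-ins, not real trained layers.

def rnn(word, hidden_state):
    """Stand-in RNN cell: returns an output and an updated hidden
    state that accumulates every word seen so far."""
    new_hidden = hidden_state + [word]
    output = new_hidden               # the output summarizes the sequence so far
    return output, new_hidden

def feed_forward(output):
    """Stand-in classifier: maps the final RNN output to an intent."""
    return "get_time" if "time" in output else "unknown"

words = ["what", "time", "is", "it"]  # the user's sentence, split into words
hidden_state = []                     # initialize the hidden state

output = None
for word in words:                    # feed one word at a time
    output, hidden_state = rnn(word, hidden_state)

prediction = feed_forward(output)     # classify intent from the final output
print(prediction)  # get_time
```

Notice that the whole forward pass really is just a for loop that passes the hidden state from one iteration to the next.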
Short-term memory is caused by the infamous vanishing gradient problem, which is also prevalent in other neural network architectures. As the RNN processes more steps, it has trouble retaining information from previous steps. As you can see, the information from the words “what” and “time” is almost non-existent at the final step. Short-term memory and the vanishing gradient are due to the nature of the backpropagation algorithm used to train and optimize neural networks. To understand why this is, let’s take a look at the effects of backpropagation on a deep feed-forward neural network.

Training a neural network has three major steps. First, it does a forward pass and makes a prediction. Second, it compares the prediction to the ground truth using a loss function; the loss function outputs an error value, which is an estimate of how badly the network is performing. Last, it uses the error value to do backpropagation, which calculates the gradients for each node in the network. The gradient is a value used to adjust the network’s internal weights, allowing the network to learn. The bigger the gradient, the bigger the adjustments, and vice versa. Here’s where the problem lies: when doing backpropagation, each node in a layer calculates its gradient with respect to the effects of the gradients in the layer before it. So if the adjustments in the layer before it are small, then the adjustments in the current layer will be even smaller. This causes the gradients to exponentially shrink as they backpropagate down, and the earlier layers fail to do any learning, as the internal weights are barely being adjusted due to extremely small gradients. And that’s the vanishing gradient problem.
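Here’s a tiny numeric illustration of that exponential shrinking. The local gradient value of 0.5 is made up for illustration; the point is that backpropagation multiplies these local gradients together, so whatever reaches the earliest layers shrinks exponentially with depth.

```python
# Vanishing gradient, numerically: if each layer's local gradient is 0.5,
# the gradient reaching earlier layers shrinks exponentially, because
# backpropagation multiplies the local gradients together.

local_gradient = 0.5   # made-up value for illustration
gradient = 1.0
for layer in range(1, 11):
    gradient *= local_gradient
    print(f"gradient after backpropagating through {layer} layers: {gradient:.6f}")
```

After just ten layers, the gradient is 0.5 to the tenth power, less than a thousandth of its starting value, which is why the earliest layers barely adjust their weights.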
Let’s see how this applies to recurrent neural networks. You can think of each time step in a recurrent neural network as a layer. To train a recurrent neural network, you use an application of backpropagation called backpropagation through time. The gradient values will exponentially shrink as they propagate through each time step. Again, the gradient is used to make adjustments in the neural network’s weights, thus allowing it to learn. Small gradients mean small adjustments, and this causes the early layers not to learn. Because of the vanishing gradients, the RNN doesn’t learn the long-range dependencies across time steps. This means that there is a possibility that the words “what” and “time” are not considered when trying to predict a user’s intention. The network then has to make its best guess with “is it?”. That’s pretty ambiguous and would be difficult even for a human. So not being able to learn on earlier time steps causes the network to have short-term memory.

Okay, so RNNs suffer from short-term memory. How do we combat that? To mitigate short-term memory, two specialized recurrent neural networks were created. One is called long short-term memory, or LSTM for short; the other is gated recurrent units, or GRUs. LSTMs and GRUs essentially function just like RNNs, but they’re capable of learning long-term dependencies using a mechanism called gates. These gates are different tensor operations that can learn what information to add to or remove from the hidden state. Because of this ability, short-term memory is less of an issue for them.

To sum this up, RNNs are good for processing sequence data for predictions but suffer from short-term memory. The short-term memory issue for vanilla RNNs doesn’t mean you should skip them completely and only use the more involved versions like LSTMs or GRUs. RNNs have the benefit of training faster and using less computational resources; that’s because there are fewer tensor operations to compute. You would use LSTMs or GRUs when you expect to model longer sequences with long-term dependencies. If you’re interested in digging deeper, I’ve added links in the description to amazing resources explaining RNNs and their variants. I had a lot of fun making this video, so let me know in a comment if this was helpful or what you would like to see in the next one. Thanks for watching.