Transcript:
Hello, people from the future. Welcome to Normalized Nerd. Today I'm gonna talk about Markov chains. This is a concept that is used in many places, from statistics to biology, from economics to physics and, of course, in machine learning. I guess some of you have heard of Markov chains before but don't have a clear idea about them. I hope that by the end of this video you will have a fair understanding of Markov chains and you will know why they are so interesting. If you are new here, then please subscribe to my channel and hit the bell icon. I make videos about machine learning and data science regularly. So let's get started.

As you know, I like to start with an example. Assume there's a restaurant that serves only three types of food: hamburger, pizza and hot dog. But they follow a weird rule: on any given day they serve only one of these three items, and it depends on what they served yesterday. In other words, there is a way to predict what they will serve tomorrow if you know what they are serving today. For example, there is a 60% chance that tomorrow will be a pizza day, given that today is a hamburger day. I'm gonna represent this in the diagram as a weighted arrow: the arrow originates from the current state and points to the future state. I'm using the term state because that's the convention. Let's see another one. There's a 20% chance that tomorrow they will again serve hamburgers, and that is represented as a self-pointing arrow. Each arrow is called a transition from one state to the other. Now I'm gonna show you all the possible transitions. The diagram you are seeing right now is actually a Markov chain. Yes, the concept is that simple.

Now let's discuss some properties of the Markov chain. The most important property of a Markov chain is that the future state depends only on the current state, not the steps before. Mathematically, we can say that the probability of the state at step n+1 depends only on the state at step n, not on the complete sequence of steps that came before n. At first glance this might not seem so impressive, but on a closer look we can see that this property makes our life a lot easier when we tackle some complicated real-world problems. Let's understand it a little better. Suppose the restaurant served pizza on the first day, hamburger on the second and again pizza on the third day. Now, what is the probability that they will serve hot dogs on the fourth day? Well, we just need to look at the third day, and it turns out to be 0.7. This is the heart of Markov chains and it's known as the Markov property. The second important property is that the sum of the weights of the outgoing arrows from any state is equal to 1. This has to be true because they represent probabilities, and for probabilities to make sense they must add up to one. Here I should probably tell you that there are certain Markov chains with special properties, but they deserve individual videos of their own, so here I'm just gonna stick to the basic kind of Markov chain.

Now that you know what a Markov chain actually is, let's explore our Markov chain by doing a random walk along the chain. Initially, we are on a hamburger day. Let's see what they serve for 10 consecutive days. So after 10 steps we have a situation like this. Now we want to find the probability corresponding to each of the food items, a.k.a. the probability distribution of the states.
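To make this concrete, here is a minimal Python sketch of the chain and the random walk described above. Only three transition probabilities are actually stated in the video (hamburger to pizza 0.6, hamburger to hamburger 0.2, pizza to hot dog 0.7); the remaining entries of the matrix below are illustrative placeholders chosen only so that each row sums to 1, not numbers taken from the video.

```python
import numpy as np

# States of the chain, in a fixed order.
states = ["hamburger", "pizza", "hot dog"]

# Transition matrix A: A[i, j] = P(next state = j | current state = i).
# Only three of these probabilities are stated in the video; the rest are
# illustrative placeholders that make each row sum to 1.
A = np.array([
    [0.2, 0.6, 0.2],   # from hamburger
    [0.3, 0.0, 0.7],   # from pizza
    [0.5, 0.0, 0.5],   # from hot dog
])

rng = np.random.default_rng(0)

def random_walk(start, n_steps):
    """Simulate n_steps of the chain and return the visited state indices."""
    current = states.index(start)
    visited = [current]
    for _ in range(n_steps):
        # The next state depends only on the current one (Markov property).
        current = rng.choice(len(states), p=A[current])
        visited.append(current)
    return visited

# Start on a hamburger day and walk for 10 days.
walk = random_walk("hamburger", 10)

# Empirical distribution: occurrences of each state divided by the number of days.
counts = np.bincount(walk, minlength=len(states)) / len(walk)
for name, p in zip(states, counts):
    print(f"{name}: {p:.2f}")
```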
Well, it's pretty simple: just divide the number of occurrences of an item by the total number of days. For hamburger it's 4/10, for pizza it's 2/10, and for hot dog it's 4/10. It's obvious that if we change the number of steps, these probabilities will vary, but we are interested in what happens in the long run. Do these probabilities converge to fixed values, or do they keep changing forever? I wrote a Python script to simulate the random walk for a hundred thousand steps, and I found that the probabilities converge to these values. This probability distribution has a special name: the stationary distribution, or the equilibrium state. It simply means that this probability distribution doesn't change with time for this Markov chain. Okay, so we have managed to find the stationary state, but this method doesn't seem very efficient, does it? Moreover, we don't know whether there exist any other stationary states. Well, a better way to approach this problem is linear algebra. Let me show you.

First of all, let us think about how we can represent this Markov chain more efficiently. Well, this is essentially a directed graph, so we can represent it by an adjacency matrix. If you don't know what that is, please don't worry, it's very simple. The elements of the matrix just denote the weight of the edge connecting the two corresponding vertices; for example, the element in the second row and first column denotes the weight, or the transition probability, from pizza to hamburger. If an element is 0, it just means there's no edge between the two vertices. This matrix is also called the transition matrix. Let's denote it by A.

Remember that our goal is to find the probabilities of each state. Conventionally, we take a row vector called π, whose elements represent the probabilities of the states; basically, π denotes the probability distribution over the states. Suppose in the beginning we are on a pizza day, so the second element, which corresponds to pizza, becomes one and the other two remain zero. Now see what happens when we multiply this row vector with our transition matrix: we get the second row of the matrix. In other words, we get the future probabilities corresponding to the pizza state. Then we take this result, put it in the place of π₀, and do it once again. Now, if there exists a stationary state, then after a certain point the output row vector should be exactly the same as the input vector. Let's denote this special row vector as π. So we should be able to write πA = π. If you have ever taken a course on linear algebra, you should find this equation somewhat similar to a very famous one. Yes, I'm talking about the eigenvector equation, Av = λv. Consider λ equal to 1 and just reverse the order of multiplication, and you have got our equilibrium state equation. How to interpret this, you might ask? Well, we can imagine that π is a left eigenvector of the matrix A with eigenvalue equal to 1. Please pause the video if you need a moment to convince yourself. Now, the eigenvector must satisfy another condition: the elements of π must add up to 1, because it denotes a probability distribution. So after solving these two equations, we get the stationary state. It tells us that the restaurant will serve hamburger on about 35% of the days, pizza on about 21% and hot dog on about 46%. Using this technique, we can actually see whether there is more than one stationary state. Can you guess how? We just need to see
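As a small sketch of this linear-algebra route, the snippet below finds a left eigenvector of A with eigenvalue 1 and normalizes it so its entries sum to 1. It reuses the same illustrative transition matrix as the earlier sketch, so the exact numbers it prints are only as good as those placeholder probabilities.

```python
import numpy as np

# Same illustrative transition matrix as in the earlier sketch
# (only three of its probabilities are stated in the video).
A = np.array([
    [0.2, 0.6, 0.2],   # from hamburger
    [0.3, 0.0, 0.7],   # from pizza
    [0.5, 0.0, 0.5],   # from hot dog
])
states = ["hamburger", "pizza", "hot dog"]

# pi A = pi means pi is a LEFT eigenvector of A with eigenvalue 1,
# i.e. an ordinary (right) eigenvector of A transposed.
eigvals, eigvecs = np.linalg.eig(A.T)

# Pick the eigenvector whose eigenvalue is numerically closest to 1.
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])

# Rescale so the entries add up to 1, as a probability distribution must.
pi = pi / pi.sum()

for name, p in zip(states, pi):
    print(f"{name}: {p:.3f}")
```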
whether there exists more than one eigenvector with eigenvalue equal to 1 (a small sketch of this check follows at the end of this transcript). Isn't it super elegant? Now let's compare this result with the previous one that we got from our simulation. Look how close they are! They kind of validate each other. By now you should have a very good understanding of Markov chains, and you should be able to identify Markov chains successfully. Obviously, there is much more to learn about the different kinds of Markov chains and their properties than can be covered in a single video. If you want me to make more videos on this topic, then please let me know in the comment section. If you like this video, do share it. And please, please, please subscribe to my channel. Stay safe and thanks for watching. [Music]
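For completeness, here is one way the uniqueness check mentioned above could look in the same sketch: count how many eigenvalues of the transposed transition matrix are (numerically) equal to 1. More than one such eigenvalue would indicate more than one stationary distribution. The matrix is the same illustrative placeholder used earlier, not the exact one from the video.

```python
import numpy as np

# Same illustrative transition matrix as before.
A = np.array([
    [0.2, 0.6, 0.2],
    [0.3, 0.0, 0.7],
    [0.5, 0.0, 0.5],
])

eigvals, _ = np.linalg.eig(A.T)

# Each eigenvalue equal to 1 (up to floating-point tolerance) corresponds to a
# stationary distribution; more than one would mean several equilibrium states.
n_stationary = int(np.sum(np.isclose(eigvals, 1.0)))
print("eigenvalues:", np.round(eigvals, 3))
print("number of stationary states:", n_stationary)
```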