What is going on, everybody, and welcome to this awesome series. In this series, we are going to build chatbots from scratch. We’ll be using a sequence-to-sequence model, mostly an encoder-decoder model, and we’ll be using a bidirectional LSTM to design our model. I am really sorry that I was not able to upload videos in the past days, so I have decided to bring something awesome to you, and I’ll be creating this chatbot in this series. As an example, when we say “Hi”, our chatbot responds “Hi”; “How are you?”, “Fine, quite fine”, and so forth.

For the data, we’ll be using the Cornell movie data set. If you want to download it, I will link it down in the description. Go ahead to the website and download the zip file. After you download the zip file, you will get several files, but from those we need just two: movie conversations and movie lines. Let me quickly show you what these files look like.

This is movie_lines.txt, and I suggest you give this data set some time, read it, and work out how you can pre-process it. Let me quickly explain. Over here we have the ID of the line, then the user, then the movie, then the name of the user, and this is the dialogue itself. And this file contains the whole conversations, built from the IDs I have just shown you. So these are the conversations. I think we have a total of 83,097 conversations, full conversations between the users, and you can take more time to understand this data set.

All right, so let’s quickly jump to Spyder. Make sure you have the Python file and the data files in the same directory. If not, you have to give the full path, which I’m going to define in a second. For now, we are not going to import any library; what we are going to do first of all is open the files. I’ll be using Spyder because it is quite easy to work with.
It’s quite easy to access the variables from the Variable Explorer tab. So these are my files in the corpus folder. Let’s first open the movie lines and name the variable lines. We’ll be using open, which is built in; you do not need to install this module separately. So I’ll be saying open, inside the corpus folder, movie_lines.txt, and the encoding for this is utf-8. While opening this, we might get some errors, and what we want is to ignore those errors, so we have to specify that right here. Make sure there is an s at the end of errors, so don’t get confused over there. So I’ll be saying errors equals ignore. After that, we need to read this file.

After reading the file, what we need to do is split it. These are the movie lines, and they are separated by new lines; in programming, backslash n represents a new line. So we say .split at every new line, so that we get a list of every line. Let me quickly run this and show you how it looks. All right, this is our variable, and it will take some time to open because it is a lot of data. Let me zoom in. We have an entry after every new line, this is a list, and these are the dialogues.

Now we have to do the same thing to read the conversations file. Copy this, paste it, name it convers, and the name of the file is movie_conversations. Run it, and we have two variables, convers and lines.

Now we have to pre-process these. Why do we need to pre-process, and what do we actually need from it? Let me show you. What we want is a list of these: L194, L195, L196, L197, and so on. In our list, the first list will be this one, the second list will be this one, so we’ll have nested lists. How can we do that?
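The loading step described above can be sketched in Python as follows. This is a minimal sketch rather than the exact code typed in the video: the helper name `read_rows` and the two-line sample file are made up for illustration, and the sample deliberately contains one byte that is not valid UTF-8 to show what `errors="ignore"` does.

```python
import os
import tempfile


def read_rows(path):
    """Open a text file as UTF-8, skip undecodable bytes, split at newlines."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return handle.read().split("\n")


# Made-up stand-in for movie_lines.txt; the byte \xe9 is invalid UTF-8 here.
sample = (
    b"L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hi.\n"
    b"L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Caf\xe9?"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "movie_lines_sample.txt")
    with open(path, "wb") as handle:
        handle.write(sample)
    lines = read_rows(path)

print(len(lines))  # one list entry per line of the file
print(lines[1])    # the undecodable byte has been silently dropped
```

With the real corpus, the same helper would presumably be called once for the lines file and once for the conversations file. Note that a file ending in a newline produces an empty final entry when split this way.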
We are going to iterate over this and split each row at this particular separator. Then we’ll grab the last value, and from that we take just these values, replace the single quotes and commas, and split at the spaces, so that we have a list of these IDs, which is our task.

First of all, let’s create another variable that will store this list, and we’ll name it exchn. We’ll make it a list. Then we say for conver in convers, and exchn.append. What we append is conver.split, and we have to split at the separator I have already shown you: plus plus plus, dollar, plus plus plus. What we want to grab is the last value from this split list, and from that last value we need only these values inside it. I have already explained this, so please go back if you are getting confused right here. I have also told you that we need to replace the single quote with nothing. Let me zoom in. One more thing we need to replace is this comma, with nothing. Then at last we say .split.

All right, let’s hope we do not get any errors. Let’s run this, and congratulations. It will take some time to open. So this is what we actually needed, and we have the list of these conversations.

Now we have to do the same thing for lines, but this time we’ll do something different. Let me take you through it. These are our movie lines. We need to create a dictionary with this as the key, and the value of that key will be this. Why do we need that? Because later we need to use that list and convert it to a question-and-answer list. So let’s quickly create a dictionary. I’m naming this dictionary dialogue; you can use any naming convention you want. So I’m saying for line in lines.
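The conversation parsing described in this step (split at the separator, keep the last field, drop quotes and commas, split at spaces) can be sketched as below. The function name `extract_line_ids` and the two sample rows are illustrative assumptions. The slice that trims the surrounding square brackets is also an assumption about how “just these values” are taken out of the last field.

```python
SEP = " +++$+++ "  # field separator, including the space on each side


def extract_line_ids(conversation_row):
    """Turn one conversations row into a list of line IDs."""
    last_field = conversation_row.split(SEP)[-1]  # e.g. "['L194', 'L195']"
    inner = last_field[1:-1]                      # assumed: trim [ and ]
    return inner.replace("'", "").replace(",", "").split()


# Sample rows in the format of the conversations file (for illustration).
convers = [
    "u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L194', 'L195', 'L196', 'L197']",
    "u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L198', 'L199']",
]

exchn = []
for conver in convers:
    exchn.append(extract_line_ids(conver))

print(exchn)  # nested list: one inner list of IDs per conversation
```

Splitting with no argument splits on any run of whitespace, so the single space left between IDs after the commas are removed is enough to separate them.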
We’ll be saying dialogue, and for the key, what we want is line.split, so we’ll be splitting at this separator again. We’ll grab the first position, that is, the zeroth index, as the key of the dictionary, and the last index as the value of that key. All right, let’s quickly do that. So we say line.split, and where to split? At the dollar separator. We need the first position as the key. As the value, we pass it directly: line.split at plus plus plus, dollar, plus plus plus, with a space, and what we need is the last value. This is quite easy compared to the previous one. Let’s run this and we’ll have the dialogue dictionary. It will take some time to load. All right, so this is a dictionary, and that is what we want.

As you might know, for training a sequence-to-sequence model we need to convert our data to questions and answers, where the question is the input to our encoder. The encoder will output a context; we’ll talk about it later in detail. How can we convert the data to questions and answers? That is quite easy. First of all, I’ll create a list for questions and a list for answers. For training, we pass in the questions, expect the model to predict the answer, and calculate the loss based on the original answer and the predicted answer. All right, for doing that, we have our exchn. This is it.
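The dictionary step described above, with the first field as the key and the last field as the value, can be sketched like this. The sample rows and their texts are made up for illustration, and the variable name `dialogue` simply follows the name spoken in the video.

```python
SEP = " +++$+++ "  # field separator, including the space on each side

# Made-up rows in the layout: line ID, user, movie, name, text.
lines = [
    "L194 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hi.",
    "L195 +++$+++ u2 +++$+++ m0 +++$+++ BOB +++$+++ Hi, how are you?",
    "L196 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Fine, quite fine.",
]

dialogue = {}
for line in lines:
    fields = line.split(SEP)
    dialogue[fields[0]] = fields[-1]  # line ID -> spoken text

print(dialogue["L195"])
```

Splitting once per row and reusing `fields` is a small design choice in this sketch; calling `line.split` twice, as narrated, gives the same dictionary.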
This is the list of conversations. What we need to do is iterate over this list, and then iterate inside this particular list, then this particular list, then this particular list. When we grab this particular value and this particular value, we’ll use the dictionary to access the text by passing in this key, this key, this key, and so on. So we’ll be saying for, and I’ll name it conver, because we are no longer using that old variable: for conver in exchn. We iterate over the list, and then we iterate inside conver, because conver is also a list and exchn is a nested list. So we’ll be saying for i in range of len of conver, minus 1.

Why minus 1? Give me some time to open this. All right, take this list as an example. Think of these two as a question and answer, or a conversation. What we need is this as the question and this as the answer. And if we take the last one as the question, what is the answer for it? We have no answer for it. So the questions go up to the length minus 1, and the answers go up to the full length. I hope this makes sense to you. Make sure this whole code makes sense.

So we’ll be saying questions.append. What do we need to append? The value from the dialogue dictionary, so we pass conver at index position i as the question. And for the answer, we say answers.append, again from the dialogue dictionary, and we pass conver, but the index will be i plus 1, because the answer is the line next to it.

Let’s quickly run this code and see how it looks. It will take some time to load. All right, these are our answers and these are our questions. Let’s quickly see whether any question makes sense. Let’s take this one, 720: the question is “Why?” and the answer is “Unsolved mystery.”
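The pairing logic described above can be sketched as follows: each line in a conversation is a question whose answer is the line right after it, so the last line never becomes a question. The function name `build_pairs` and the sample data are assumptions made for illustration.

```python
def build_pairs(exchn, dialogue):
    """Pair every line with the line that follows it in the conversation."""
    questions, answers = [], []
    for conver in exchn:
        # Stop one short: the final line has no following line to answer it.
        for i in range(len(conver) - 1):
            questions.append(dialogue[conver[i]])
            answers.append(dialogue[conver[i + 1]])
    return questions, answers


# Made-up sample data in the shapes built in the earlier steps.
dialogue = {
    "L1": "Hi.",
    "L2": "Hi, how are you?",
    "L3": "Fine, quite fine.",
    "L4": "Why?",
    "L5": "Unsolved mystery.",
}
exchn = [["L1", "L2", "L3"], ["L4", "L5"]]

questions, answers = build_pairs(exchn, dialogue)
for question, answer in zip(questions, answers):
    print(question, "->", answer)
```

A conversation of n lines therefore contributes n - 1 pairs, and a middle line appears once as an answer and once as a question.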
You may spend some time looking at them, but they are clearly questions and answers. Now we will delete all the other variables that we are no longer using. Why are we going to do that? Because when we train on this data set, memory will be the greatest issue: it requires a lot of memory when we one-hot encode the output vectors. So we need to delete them all, including lines. Let’s run that, and we’ll have just the questions and answers.

All right, so this is it for this video in this series. With this said, I would like to end this video, and I will see you in the next one. If you haven’t pressed the like button, make sure you hit it, and subscribe to my channel for more.