There are a variety of natural language processing techniques out there, and the one I’ll be talking about in this video is sentiment analysis. Let’s say you are a manager of a company that sells hats and shirts, and you want to know what your customers think about your hats and shirts. Do they have positive or negative feelings about them? You go to your call center and see that a bunch of people have called in about your hats and your shirts. You could go through all of these calls and listen to every single message, but that would take a really long time. Instead, you can use an NLP technique to automatically tag each call as positive or negative, and at the end of the day you can figure out that people tend to think your hats are pretty good and your shirts are not very good. This concept is called sentiment analysis, and in this video I’m going to go through the details of how to do sentiment analysis in Python. The input to this step is a corpus. We’ve talked about this in previous videos, but you can think of a corpus as a collection of text documents, and in this case we want to keep the text of each document in the order it was written. If the word "great" appears, we want it flagged as positive, but if the phrase "not great" appears, we want it flagged as negative, so we keep all of our original text in its original order to capture these sentiments. The tool we’re going to use is called TextBlob. This is a Python library built on top of NLTK; it’s a lot easier to use than NLTK, and it also provides some additional functionality, such as sentiment analysis. The output of this step will be a sentiment score for every comedian.
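To see why the original word order matters, here is a minimal sketch that compares a bag-of-words view (word counts only) with a scorer that checks the word right before each sentiment word. The two-word lexicon and its scores are hypothetical, chosen only for illustration; this is not how TextBlob stores or applies its lexicon.

```python
from collections import Counter
import re

# Hypothetical toy lexicon: word -> polarity (made-up values).
LEXICON = {"great": 0.8, "bad": -0.7}


def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Word counts only: the order of the words is thrown away."""
    return Counter(tokenize(text))


def order_aware_polarity(text):
    """Sum lexicon polarities, flipping a word's sign when 'not' comes right before it."""
    words = tokenize(text)
    total = 0.0
    for i, word in enumerate(words):
        if word in LEXICON:
            score = LEXICON[word]
            if i > 0 and words[i - 1] == "not":
                score = -score
            total += score
    return total


a = "The hats are great, not bad."
b = "The hats are bad, not great."
print(bag_of_words(a) == bag_of_words(b))  # True: identical word counts
print(order_aware_polarity(a) > 0, order_aware_polarity(b) < 0)  # True True
```

Both sentences contain exactly the same words, so only a representation that keeps the order can tell the positive one from the negative one.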
In this case, we have transcripts for a bunch of comedy routines, so for each comedian’s routine we’re going to get a sentiment score, which is how positive or negative that comedian is, and also a subjectivity score, which is how opinionated that comedian is. This is what the code looks like for sentiment analysis using TextBlob: you import TextBlob, wrap your text in a TextBlob object, and then call .sentiment on it. In this case, my phrase is "I love NLP," and this is the output of my code right here. First there’s the polarity: +1 means a very positive sentiment, and -1 means a very negative sentiment. There’s also subjectivity: 1 means the phrase is highly opinionated, and 0 means the phrase is more about facts than opinions. So at this point we know everything about TextBlob, right, and we can start using it, so we’ll jump right into the Jupyter notebook. But actually, you should not jump into the Jupyter notebook just yet, because before you import and use a module, you should always understand what’s going on behind the scenes. This is something I mentioned in the first part of this video series, with the data science Venn diagram and its danger zone: if you know how to code and you know what you want to do, you can just import a library and start using it, but that’s really dangerous if you don’t know what’s going on behind the scenes, and you really should understand the math. So before we dive into the Jupyter notebook, I want to go over what’s actually happening in TextBlob. This is the work of a researcher who took a bunch of words, which you can see on the left, and tagged every word with a certain polarity and a certain subjectivity.

Let’s take a look at the word "great." If you look at all of the words that researcher tagged, you’ll see there are actually multiple rows for "great," because every row corresponds to a different definition of the word, and you can see on the right that each row has its own polarity and subjectivity. I also want to note that there’s a WordNet ID, which connects to Princeton’s WordNet, and there’s also a part-of-speech tag, but the main thing to know is that the polarity and subjectivity differ from row to row. So when TextBlob sees the word "great," it takes the average of all of those polarities and subjectivities you saw on the last slide. If it sees the phrase "not great," it takes the score for "great" and multiplies the polarity by -0.5. If it sees the phrase "very great," it multiplies both the polarity and the subjectivity by 1.3, and the polarity caps at 1. Finally, for "I am great," the words "I" and "am" don’t have a polarity or subjectivity, so you get the same scores as "great" on its own. To summarize, TextBlob finds all the words and phrases in your text that it can assign a polarity and subjectivity to, and then averages them all together. So for one body of text, or in our case one comedy transcript, you get one polarity score and one subjectivity score. This isn’t the most sophisticated technique; in fact, we’re not using any machine learning here. It’s purely rule-based, but it’s a good starting point for sentiment analysis.
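The rules described above can be sketched in plain Python as follows. This is a simplified illustration, not TextBlob’s actual implementation: the lexicon entries are made up, only the word immediately before a lexicon word is checked for "not" or "very", and keeping subjectivity within [0, 1] is an assumption of this sketch. The -0.5 and 1.3 multipliers are the ones described above. In TextBlob itself you would simply call TextBlob(text).sentiment, which returns a polarity and a subjectivity.

```python
import re
from statistics import mean

# Hypothetical lexicon: each word maps to one (polarity, subjectivity) entry
# per sense of the word. The numbers are made up for illustration.
LEXICON = {
    "great": [(1.0, 1.0), (0.6, 0.5)],
    "bad": [(-0.7, 0.67)],
}
NEGATIONS = {"not"}
INTENSIFIERS = {"very": 1.3}  # multiplier applied to both scores
NEGATION_FACTOR = -0.5        # multiplier applied to polarity only


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def word_scores(word):
    """Average polarity and subjectivity over all senses of a word."""
    senses = LEXICON[word]
    return mean(p for p, _ in senses), mean(s for _, s in senses)


def sentiment(text):
    """Return (polarity, subjectivity) averaged over every scored word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    scored = []
    for i, word in enumerate(words):
        if word not in LEXICON:
            continue  # words like "i" and "am" have no entry, so they are skipped
        polarity, subjectivity = word_scores(word)
        prev = words[i - 1] if i > 0 else None
        if prev in INTENSIFIERS:
            polarity *= INTENSIFIERS[prev]
            subjectivity *= INTENSIFIERS[prev]
        elif prev in NEGATIONS:
            polarity *= NEGATION_FACTOR
        # Cap polarity to [-1, 1] and subjectivity to [0, 1].
        scored.append((clamp(polarity, -1.0, 1.0), clamp(subjectivity, 0.0, 1.0)))
    if not scored:
        return 0.0, 0.0
    return mean(p for p, _ in scored), mean(s for _, s in scored)


for phrase in ["great", "not great", "very great", "I am great!"]:
    print(phrase, sentiment(phrase))
```

For a longer text such as "great hats but not great shirts", the two scored phrases ("great" and "not great") are averaged into a single polarity and subjectivity, which mirrors how one transcript ends up with one pair of scores.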
If you do want to use a statistical method, one of the most popular ones for text data is called naive Bayes, and I’ll be going through that in another one of my videos. All right, now that we’ve gone through how TextBlob calculates the sentiment scores, we’re going to go through the Jupyter notebook. Again, all of my code can be found on GitHub under A Dash of Data. Here’s my notebook for sentiment analysis. In the introduction, you see that we’re using the TextBlob module, and the labels, or outputs, are polarity and subjectivity. The first thing I’m going to do is read in a pickle file. In my past video, I talked about how you can pickle a data set: let’s say you’ve cleaned a data set; you can then pickle it, or save it, and now in this new Jupyter notebook I can read in that pickle file, or read in that data set. What I’m reading in here is my corpus of comedy transcripts, so you see I have all the different comedians and then their full transcript for one of their comedy routines. Now I’m going to use TextBlob to find the polarity and the subjectivity of every single transcript. You can see I’m applying those functions here, and at the end, for every transcript, I get a polarity score and a subjectivity score. What I always like to do is plot my results. Here you can see that on the x-axis I have the polarity, so whether it’s positive or negative, and on the y-axis I have the subjectivity, so whether it’s more opinionated or fact-based, and every comedian is rated on these two attributes. Specifically, Ali Wong, who I’m most interested in, is similar to Ricky Gervais and John Mulaney.
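The notebook steps described above (read in the pickled corpus, then get one polarity and one subjectivity score per transcript) can be sketched with the standard library alone. In a notebook you would more likely keep the corpus in a pandas DataFrame and apply the TextBlob functions to the transcript column; here the comedian names and transcripts are placeholders, and toy_sentiment is a hypothetical stand-in for TextBlob(text).sentiment with a made-up lexicon.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for TextBlob(text).sentiment, using a made-up lexicon
# of word -> (polarity, subjectivity).
LEXICON = {"love": (0.5, 0.6), "funny": (0.25, 1.0), "terrible": (-1.0, 1.0)}


def toy_sentiment(text):
    """Average (polarity, subjectivity) over the words found in the lexicon."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    return (sum(p for p, _ in hits) / len(hits),
            sum(s for _, s in hits) / len(hits))


# Placeholder corpus: comedian -> full transcript text.
corpus = {
    "comedian_a": "i love this crowd",
    "comedian_b": "that was terrible but funny",
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "corpus.pkl"
    # Pickle (save) the cleaned corpus, as in the earlier notebook...
    with path.open("wb") as f:
        pickle.dump(corpus, f)
    # ...and read it back in, as in the sentiment analysis notebook.
    with path.open("rb") as f:
        loaded = pickle.load(f)

# One polarity score and one subjectivity score per transcript.
scores = {}
for comedian, transcript in loaded.items():
    polarity, subjectivity = toy_sentiment(transcript)
    scores[comedian] = {"polarity": polarity, "subjectivity": subjectivity}

for comedian, row in scores.items():
    print(comedian, row)
```

To reproduce the plot described above, each comedian becomes one point, with polarity on the x-axis and subjectivity on the y-axis.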
That was pretty interesting, and now I’m going to take it a step further. Instead of just looking at a comedy routine as a whole, I’m going to divide it up over time, because I want to know: does the comedian start out negative and end up positive, or is there a different pattern? What I’m going to do here is take all of the text data in each routine and split it into ten parts. Let’s take a look at our original data again. These are our transcripts with all the text data, and here I’ve split up that text data. If I scroll back up to the top, you’ll see this is the beginning of Ali Wong’s comedy routine, and this is one block of text data; then, after this comma, I’ve split it into the second block of text data, and so on. I’m going to do this for all 12 comedians, so each comedian will have their transcript broken up into ten parts. Then, for every part, I’m going to look at the polarity of that part. So this is one comedian right here, and these are the polarity scores for the ten parts of their routine. Okay, let’s plot this. I’m going to show the plot for one comedian up here, and then down here I’m creating subplots to show the plots for all the comedians. Here’s Ali Wong: this orange line shows zero polarity, meaning neutral, and you can see that she’s generally pretty positive. Then you have Bo Burnham over here, who starts really positive, goes negative, and then ends up really positive. Looking at all this, I see that Ali Wong stays generally positive, and similar comedians are Louis C.K. and Mike Birbiglia.
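Splitting each transcript into ten parts and scoring each part can be sketched as follows. This version splits on word boundaries and gives any leftover words to the first few parts, so no text is dropped; that is a design choice for this sketch, not necessarily how the notebook splits the text. toy_polarity and its two-word lexicon are hypothetical stand-ins for TextBlob’s polarity.

```python
# Hypothetical stand-in for TextBlob(text).sentiment.polarity (made-up lexicon).
LEXICON = {"great": 0.8, "awful": -1.0}


def toy_polarity(text):
    """Mean polarity of the lexicon words in the text, or 0.0 if none appear."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def split_text(text, n=10):
    """Split text into n consecutive parts with (nearly) equal word counts."""
    words = text.split()
    size, extra = divmod(len(words), n)
    parts, start = [], 0
    for i in range(n):
        # The first `extra` parts each take one leftover word.
        end = start + size + (1 if i < extra else 0)
        parts.append(" ".join(words[start:end]))
        start = end
    return parts


def polarity_over_time(text, n=10):
    """Polarity of each of the n parts, in the order they occur in the routine."""
    return [toy_polarity(part) for part in split_text(text, n)]


routine = "great " * 10 + "awful " * 10  # starts positive, ends negative
print(polarity_over_time(routine))
# → [0.8, 0.8, 0.8, 0.8, 0.8, -1.0, -1.0, -1.0, -1.0, -1.0]
```

Plotting each comedian’s list against the part index (0 to 9), with a horizontal line at zero polarity, gives the kind of per-comedian chart described above.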
Louis C.K. and Mike Birbiglia are both generally positive throughout. On the other hand, some comedians have more extreme patterns: I mentioned Bo Burnham, and there’s also Dave Chappelle, who you can see gets pretty negative towards the end of his routine. You can do this additional exercise if you’d like. This is the end of my sentiment analysis tutorial; next, I’ll be talking about topic modeling.