Transcript:

Hi, everyone, my name is Michelle Sharfstein, and today I'm going to talk to you about fastText, which is a library for text classification. In this tech talk I'm going to cover a few different things. First: why is text classification important, and what are some applications of text classification? Then I'm going to look at fastText, the library itself, and talk about how it builds classification models, and then how it uses and evaluates them. And then I'll go through an example or two so that you can see it in practice.

So first, why text classification? There are actually a lot of applications of text classification that are incredibly relevant for us every single day. The first is web application search and content ranking. If I'm a user on a website and I want to find all of the text that is related to sports, it's really important for every piece of text to be classified into its correct category, so that when I'm looking for sports, all of the most relevant content comes to the top. Another application is spam filtering. I don't know about you, but I get a ton of spam emails every day, and it's really useful that my email application can automatically classify text as spam or not and filter out those spam emails for me. And lastly, there are applications around understanding users and their sentiments. We post a lot of content online, and you can use text classification to understand that content, understand the sentiment, categorize it, and then be able to understand users better and target them better going forward. So how do we actually do this? Enter fastText.
fastText is an open-source, free library from Facebook, and it's used for just that: classifying text. One of the cool things about fastText is that the methods it uses train models significantly faster than the methods used in many other libraries, and we'll talk about the reasons for that in just a little bit.

So what does fastText do? First, fastText trains models for you. If you give fastText a variety of pieces of text with associated labels, it'll take those texts and labels and create a model for you. Once you have that model, you can input new pieces of text, run them through the model, and get out the predicted classification for each text. And lastly, it allows you to evaluate the models: given pieces of text that have associated labels but that we didn't use when training the model, you can input those pieces of text, output the predictions, and then compare them to the actual labels to understand how well the model does.

Let's talk about each of these in a little more detail, starting with model training, because this is really the meat of it. First we'll start by talking about training using a linear classifier. In this method, you can consider each piece of text and each label as a vector in space, and the coordinates of those vectors are what we're actually trying to tweak and train, so that the vector for a given piece of text and the vector for its associated label end up really close to each other in space. So for example, here on this graph we have a 2D example: the vector representing the piece of text "Cavs beat Golden State" is close to the vector representing the associated label, which is sports. The way we actually do this is we take the vector representing the piece of text and the vector representing its associated label, and input them into a function that returns a score for us.
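Before training, fastText expects labeled text in a simple line-based format where each label is prefixed with `__label__`. The parser below is a minimal illustrative sketch of that convention (it is not part of the fastText library itself), and the example lines are made up:

```python
# Sketch of fastText's supervised training-data format: each line holds one
# or more labels prefixed with "__label__", followed by the raw text.
# This parser is illustrative only, not fastText's own code.

def parse_line(line):
    """Split a fastText-style training line into (labels, text)."""
    labels, words = [], []
    for token in line.strip().split():
        if token.startswith("__label__"):
            labels.append(token[len("__label__"):])
        else:
            words.append(token)
    return labels, " ".join(words)

examples = [
    "__label__sports Cavs beat Golden State",
    "__label__food __label__travel Best street tacos in Mexico City",
]
for line in examples:
    print(parse_line(line))
# → (['sports'], 'Cavs beat Golden State')
# → (['food', 'travel'], 'Best street tacos in Mexico City')
```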
We take that score and we normalize it across the scores for that same piece of text and every other possible label. So we have a score for "Cavs beat Golden State" and the sports label, and then we have a score for travel and food and every other possible label, and that gives us something like a probability that "Cavs beat Golden State" has the sports label. Then we can use a stochastic gradient descent algorithm to keep tweaking those coordinates until we maximize the probability of the correct label for every piece of text. This is actually pretty computationally expensive, because for every single piece of text, not only do we have to compute the score associated with its correct label, we have to compute the score for every other possible label in the training set. So what fastText actually does is use something called a hierarchical classifier. In this method, it represents the labels in a binary tree, where every node in the tree represents a probability, and a label is represented by the probabilities along the path to that label. As you may remember,
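The score-then-normalize step described above can be sketched as a dot product followed by a softmax. This is a toy, stdlib-only illustration of the idea; the 2D vectors and label set are made up, and real fastText learns much higher-dimensional vectors:

```python
import math

def score(text_vec, label_vec):
    # Score = dot product of the text vector and a label vector.
    return sum(t * l for t, l in zip(text_vec, label_vec))

def label_probabilities(text_vec, label_vecs):
    # Softmax: normalize the scores so they sum to 1 across all labels.
    scores = {label: score(text_vec, vec) for label, vec in label_vecs.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / z for label, s in scores.items()}

# Toy 2D vectors: the "Cavs beat Golden State" vector sits near "sports".
label_vecs = {"sports": [0.9, 0.1], "travel": [0.1, 0.9], "food": [0.5, 0.5]}
text_vec = [0.8, 0.2]

probs = label_probabilities(text_vec, label_vecs)
print(max(probs, key=probs.get))  # → sports
```

Training then nudges the vector coordinates (via stochastic gradient descent) so that the correct label's probability keeps growing.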
we've talked about binary trees for search before and how they speed things up: instead of having to go through all of the elements you're searching over, you just go through the nodes on a path, and the same is true here. So if we have a piece of text, "Cavs beat Golden State" for example, and we have this binary tree representation of all of the labels, then instead of computing a score for every single possible label, all we have to compute is the probability at each node on the path to the one correct label. That vastly decreases the number of computations we have to do for each piece of text, so when we have a lot of labels, this really increases the speed of model training and reduces the time complexity. And not only does this increase the speed compared to the linear classifier we were just talking about, but both of these types of classifiers are a lot faster than some of the deep learning techniques being used, like neural networks, because those are a lot more complicated and have a lot more parameters to train, which means the training takes a lot longer. Luckily, increasing this speed does not sacrifice accuracy: the models created with these methods are similarly accurate to models trained with some of those deep learning techniques. So if you do have an instance where you have a ton of different labels in a big data set, it's really useful to consider an algorithm like this, and fastText in particular, because it's so quick. Great, so once we actually have that model, we can take any new piece of text, input it into the model, calculate the probability for every label, and then output the labels with the highest probability, which can be used to classify that piece of text.
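The path-probability idea above can be sketched with a toy tree. In a real hierarchical classifier the branch probability at each node is computed from the text's vector; here the branch probabilities are fixed numbers purely for illustration, and the tree and labels are made up:

```python
# Toy hierarchical-classifier sketch: each internal node stores the
# probability of branching right, and a label's probability is the product
# of branch probabilities along its root-to-leaf path. With K labels this
# takes O(log K) steps instead of the O(K) scores a flat softmax needs.

tree = {
    "root": {"right_prob": 0.7, "left": "n1", "right": "n2"},
    "n1":   {"right_prob": 0.4, "left": "travel", "right": "food"},
    "n2":   {"right_prob": 0.9, "left": "politics", "right": "sports"},
}

def label_probability(path):
    """path: (node, direction) pairs from the root to the label's leaf."""
    p = 1.0
    for node, direction in path:
        branch = tree[node]["right_prob"]
        p *= branch if direction == "right" else 1.0 - branch
    return p

# "sports" sits at root -> right -> right: only two node probabilities
# are multiplied, no matter how many other labels exist in the tree.
p_sports = label_probability([("root", "right"), ("n2", "right")])
print(round(p_sports, 2))  # → 0.63  (0.7 * 0.9)
```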
In addition, fastText allows us to evaluate the models. Here's an example, again with the Cavaliers: the text is "Who are the players on the Cavaliers?" It has four associated labels, and then there's an example of three different labels that might have been predicted when you input this text into the model. The way we can try to understand how good the model is is by looking at two different measures. The first is recall: of all the labels that actually existed, what percent did we actually recall when we put the text through the model? In this case, we recalled two of the four labels, so the recall would be 50%. We can also look at the precision of the model: we look at all of the predicted labels and calculate what percent of them were actually correct labels in the first place. In this case, two of the three predicted labels were correct, so the precision would be 67%. Great, so all of that can be done in fastText, and Facebook actually provides a tutorial to walk you through it. The tutorial looks at cooking questions on Stack Exchange: they take all the cooking questions on Stack Exchange along with the associated labels for each question, and then they hold some of them out to be used for testing the model. The rest they put into the model, and they show you how to train it. Then you can take some of the testing examples and put them into the model to see how well it does. In this particular case, we put in one of the testing examples: the precision was about 60%, so for that example, out of all the labels predicted, 60% were correct, and it recalled about a quarter of the original labels.
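The two measures above come down to comparing the predicted label set with the true label set. This is a small stdlib-only sketch of that comparison; the specific label names are hypothetical (the talk doesn't list them), chosen so the counts match the example of four true labels, three predictions, and two correct:

```python
def precision_recall(actual, predicted):
    """Precision and recall of a predicted label set against the true set."""
    hits = len(set(actual) & set(predicted))
    precision = hits / len(predicted)  # share of predictions that were right
    recall = hits / len(actual)        # share of true labels we found
    return precision, recall

actual = {"sports", "basketball", "cavaliers", "nba"}  # 4 true labels
predicted = {"sports", "cavaliers", "weather"}         # 3 predicted labels

p, r = precision_recall(actual, predicted)
print(round(p, 2), r)  # → 0.67 0.5  (2 of 3 correct; 2 of 4 recalled)
```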
For that piece of text, this can obviously be optimized further, but it's actually pretty impressive that you can do this in about five minutes. A lot of these questions have a ton of different labels associated with them, so being able to accurately predict 60% of them, or some percent of them, is a pretty good start for classifying your text.

Lastly, I want to walk you through a very small example that I made myself: spam identification. There's actually a JavaScript interface you can use fastText with, and that's what I did here so I could integrate it into a small application that I built. What I did is: I have a variety of different email subjects, each classified as spam, not spam, or a social email. I can build a model with fastText from this input and then apply that model to an example inbox so that it can classify whether my emails are spam or not. I'll just show you this visually: here's an example inbox with some messages that are spam and some that are not. If I click "check for spam," it creates the model, looks at all of my different emails, labels the ones it thinks are spam, and also outputs the confidence it has in each categorization. So this is a small example of how you can actually use this in an application. Great, so obviously I explained this at a pretty high level, but there's a ton of resources out there about fastText in particular and text classification in general, so I encourage you all to take a look, and I'd love to talk to any of you afterwards if you are interested. Thank you. [Applause]