Transcript:

I’m at home during lockdown, working on my StatQuest. Yeah, I’m at home during lockdown, working on my StatQuest. Yeah, StatQuest!
Hello, I’m Josh Starmer. Welcome to StatQuest. Today we’re gonna talk about Naive Bayes, and it’s gonna be clearly explained.
This StatQuest is sponsored by JADBio. Just add data, and their automatic machine learning algorithms will do the rest of the work for you. For more details, follow the link in the pinned comment below.
Note: when most people want to learn about Naive Bayes, they want to learn about the Multinomial Naive Bayes classifier, and that’s what we talk about in this video. However, just know that there is another commonly used version of Naive Bayes, called Gaussian Naive Bayes classification, and I cover that in a follow-up StatQuest. So check that one out when you’re done with this Quest. BAM!
Now imagine we received normal messages from friends and family, and we also received spam, unwanted messages that are usually scams or unsolicited advertisements, and we wanted to filter out the spam messages. So the first thing we do is make a histogram of all the words that occur in the normal messages from friends and family. We can use the histogram to calculate the probabilities of seeing each word, given that it was in a normal message.
For example, the probability we see the word “dear”, given that we saw it in a normal message, is 8, the total number of times “dear” occurred in normal messages, divided by 17, the total number of words in all of the normal messages. And that gives us 0.47. So let’s put that over the word “dear”, so we don’t forget it.
Likewise, the probability that we see the word “friend”, given that we saw it in a normal message, is 5, the total number of times “friend” occurred in normal messages, divided by 17, the total number of words in all of the normal messages. And that gives us 0.29. So let’s put that over the word “friend”, so we don’t forget it.
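
As a minimal sketch of the calculation just described, the likelihoods can be computed from word counts in Python. The counts for “dear” (8) and “friend” (5) and the total of 17 words come from the video. The counts for “lunch” (3) and “money” (1) are assumptions, inferred from that total and from the probabilities 0.18 and 0.06 quoted right after this point. The names `normal_counts` and `likelihoods` are made up for illustration.

```python
# Word counts in the normal messages. "dear" and "friend" are stated in the
# video; "lunch" and "money" are assumptions inferred from the total of 17.
normal_counts = {"dear": 8, "friend": 5, "lunch": 3, "money": 1}


def likelihoods(counts):
    """Return p(word | class) as the word's count over all words in the class."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


p_normal = likelihoods(normal_counts)
for word, p in p_normal.items():
    print(f"p({word} | normal) = {p:.2f}")
```

Rounded to two decimals, this should reproduce the 0.47 and 0.29 worked out above.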
Likewise, the probability that we see the word “lunch”, given that it is in a normal message, is 0.18, and the probability that we see the word “money”, given that it is in a normal message, is 0.06.
Now we make a histogram of all the words that occur in the spam and calculate the probability of seeing the word “dear”, given that we saw it in the spam. That is 2, the number of times we saw “dear” in the spam, divided by 7, the total number of words in the spam, and that gives us 0.29. Likewise, we calculate the probability of seeing the remaining words, given that they were in the spam. BAM!
Now, because these histograms are taking up a lot of space, let’s get rid of them, but keep the probabilities.
Oh no, it’s the dreaded terminology alert! Because we have calculated the probabilities of discrete, individual words, and not the probability of something continuous, like weight or height, these probabilities are also called likelihoods. I mention this because some tutorials say these are probabilities, and others say they are likelihoods. In this case, the terms are interchangeable, so don’t sweat it. We’ll talk more about probabilities versus likelihoods when we talk about Gaussian Naive Bayes in the follow-up Quest.
Now imagine we got a new message that said “Dear Friend”, and we want to decide if it is a normal message or spam. We start with an initial guess about the probability that any message, regardless of what it says, is a normal message. This guess can be any probability that we want, but a common guess is estimated from the training data. For example, since 8 of the 12 messages are normal messages, our initial guess will be 0.67. So let’s put that under the normal messages, so we don’t forget it.
Oh no, it’s another dreaded terminology alert! The initial guess that we observe a normal message is called a prior probability.
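
Here is a small sketch of how the prior probabilities might be estimated from the training data, using the message counts given in the video (8 normal and 4 spam out of 12). The names `message_counts` and `estimate_priors` are hypothetical, and this is only one way to choose a prior, since the video notes that any probability could be used as the initial guess.

```python
# Message counts from the video: 8 normal and 4 spam out of 12 messages.
message_counts = {"normal": 8, "spam": 4}


def estimate_priors(counts):
    """Return p(class) as the fraction of training messages in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


prior = estimate_priors(message_counts)
print(f"p(normal) = {prior['normal']:.2f}")  # 0.67
print(f"p(spam)   = {prior['spam']:.2f}")    # 0.33

# The spam likelihood from the video: "dear" appears 2 times in 7 spam words.
p_dear_given_spam = 2 / 7
print(f"p(dear | spam) = {p_dear_given_spam:.2f}")  # 0.29
```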
Now we multiply the initial guess by the probability that the word “dear” occurs in a normal message and the probability that the word “friend” occurs in a normal message. Now we just plug in the values that we worked out earlier and do the math. Beep boop beep boop, and we get 0.09. We can think of 0.09 as the score that “Dear Friend” gets if it is a normal message. However, technically, it is proportional to the probability that the message is normal, given that it says “Dear Friend”. So let’s put that on top of the normal messages, so we don’t forget.
Now, just like we did before, we start with an initial guess about the probability that any message, regardless of what it says, is spam. And just like before, the guess can be any probability we want, but a common guess is estimated from the training data. And since 4 of the 12 messages are spam, our initial guess will be 0.33. So let’s put that under the spam, so we don’t forget it.
Now we multiply that initial guess by the probability that the word “dear” occurs in spam and the probability that the word “friend” occurs in spam. Now we just plug in the values that we worked out earlier and do the math. Beep boop beep boop, and we get 0.01. Like before, we can think of 0.01 as the score that “Dear Friend” gets if it is spam. However, technically, it is proportional to the probability that the message is spam, given that it says “Dear Friend”.
And because the score we got for a normal message, 0.09, is greater than the score we got for spam, 0.01, we will decide that “Dear Friend” is a normal message. Double BAM!
Now, before we move on to a slightly more complex situation, let’s review what we’ve done so far. We started with histograms of all the words in the normal messages and all of the words in the spam. Then we calculated the probabilities of seeing each word, given that we saw the word in either a normal message or spam. Then we made an initial guess about the probability of seeing a normal message.
This guess can be anything between zero and one, but we based ours on the classifications in the training data set. Then we made the same sort of guess about the probability of seeing spam. Then we multiplied our initial guess that the message was normal by the probabilities of seeing the words “dear” and “friend”, given that the message was normal. Then we multiplied our initial guess that the message was spam by the probabilities of seeing the words “dear” and “friend”, given that the message was spam. Then we did the math and decided that “Dear Friend” was a normal message, because 0.09 is greater than 0.01.
Now that we understand the basics of how Naive Bayes classification works, let’s look at a slightly more complicated example. This time, let’s try to classify this message: “Lunch Money Money Money Money”. Note: this message contains the word “money” four times, and since the probability of seeing the word “money” is much higher in spam than in normal messages, it seems reasonable to predict that this message will end up being spam. So let’s do the math.
Calculating the score for a normal message works just like before. We start with the initial guess, then we multiply it by the probability we see “lunch”, given that it is in a normal message, and the probability we see “money” four times, given that it is in a normal message. When we do the math, we get this tiny number.
However, when we do the same calculation for spam, we get zero. This is because the probability we see “lunch” in spam is zero, since it was not in the training data. And when we plug in zero for the probability we see “lunch”, given that it was in spam, then it doesn’t matter what value we picked for the initial guess that the message was spam, and it doesn’t matter what the probability is that we see “money”, given that the message was spam, because anything times zero is zero.
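
The scoring steps reviewed above, and the zero problem that shows up with “lunch”, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the video’s own code: the spam counts for “friend” (1) and “money” (4) are not stated in the video and are inferred here from the total of 7 spam words and the 0.01 score, and the names `score` and `classify` are hypothetical.

```python
from math import prod

# Normal counts: "dear" and "friend" are from the video, the rest are inferred.
normal_counts = {"dear": 8, "friend": 5, "lunch": 3, "money": 1}
# Spam counts: "dear" (2) and "lunch" (0) are from the video; "friend" and
# "money" are assumptions chosen to match the total of 7 spam words.
spam_counts = {"dear": 2, "friend": 1, "lunch": 0, "money": 4}
priors = {"normal": 8 / 12, "spam": 4 / 12}


def score(words, counts, prior):
    """Prior times the product of p(word | class) for every word.

    Assumes every word in the message appears in the vocabulary.
    """
    total = sum(counts.values())
    return prior * prod(counts[w] / total for w in words)


def classify(words):
    """Return the label with the higher score, plus both scores."""
    scores = {
        "normal": score(words, normal_counts, priors["normal"]),
        "spam": score(words, spam_counts, priors["spam"]),
    }
    return max(scores, key=scores.get), scores


label_dear, scores_dear = classify(["dear", "friend"])
print(label_dear, {k: round(v, 2) for k, v in scores_dear.items()})

label_lunch, scores_lunch = classify(["lunch"] + ["money"] * 4)
print(label_lunch, scores_lunch)  # the spam score is exactly 0.0
```

With these counts, “Dear Friend” scores about 0.09 for normal and 0.01 for spam, while the spam score for “Lunch Money Money Money Money” collapses to zero because “lunch” never occurs in the spam.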
If a message contains the word “lunch”, it will not be classified as spam, and that means we will always classify the messages with “lunch” in them as normal, no matter how many times we see the word “money”. And that’s a problem.
To work around this problem, people usually add one count, represented by a black box, to each word in the histograms. Note: the number of counts we add to each word is typically referred to with the Greek letter alpha. In this case, alpha equals 1, but we could have set it to anything.
Anyway, now when we calculate the probabilities of observing each word, we never get 0. For example, the probability of seeing “lunch”, given that it is in spam, is 1 divided by 7, the total number of words in spam, plus 4, the extra counts that we added, and that gives us 0.09.
Note: adding counts to each word does not change our initial guess that a message is normal or the initial guess that the message is spam, because adding a count to each word did not change the number of messages in the training data set that are normal or the number of messages that are spam.
Now, when we calculate the scores for this message, we still get a small number for the normal message. But now, when we calculate the value for spam, we get a value greater than zero. And since the value for spam is greater than the one for a normal message, we classify the message as spam. Spam!
Now let’s talk about why Naive Bayes is naive. The thing that makes Naive Bayes so naive is that it treats all word orders the same. For example, the normal message score for the phrase “Dear Friend” is exactly the same as the score for “Friend Dear”. In other words, regardless of how the words are ordered, we get 0.08.
Treating all word orders equally is very different from how you and I communicate. Every language has grammar rules and common phrases, but Naive Bayes ignores all of that stuff. Instead, Naive Bayes treats language like it is just a bag full of words, and each message is a random handful of them.
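
A minimal sketch of the alpha trick and of the word-order point, again under assumptions: the counts for “lunch” and “money” in normal messages and for “friend” and “money” in spam are inferred rather than stated in the video, and `smoothed_likelihoods` and `score` are hypothetical names. It also shows where the 0.08 comes from: with one extra count per word, the normal score for “Dear Friend” drops from about 0.09 to about 0.08.

```python
from math import prod

# Counts as in the video where stated; the rest are inferred assumptions.
normal_counts = {"dear": 8, "friend": 5, "lunch": 3, "money": 1}
spam_counts = {"dear": 2, "friend": 1, "lunch": 0, "money": 4}
priors = {"normal": 8 / 12, "spam": 4 / 12}


def smoothed_likelihoods(counts, alpha=1):
    """Add alpha counts to every word before dividing by the new total."""
    total = sum(counts.values()) + alpha * len(counts)
    return {w: (c + alpha) / total for w, c in counts.items()}


def score(words, likelihood, prior):
    """Prior times the product of the per-word likelihoods."""
    return prior * prod(likelihood[w] for w in words)


p_normal = smoothed_likelihoods(normal_counts)
p_spam = smoothed_likelihoods(spam_counts)
print(f"p(lunch | spam) = {p_spam['lunch']:.2f}")  # 1 / (7 + 4) = 0.09

message = ["lunch"] + ["money"] * 4
normal_score = score(message, p_normal, priors["normal"])
spam_score = score(message, p_spam, priors["spam"])
print("spam" if spam_score > normal_score else "normal")  # spam

# Word order does not matter: both orderings give the same score.
forward = score(["dear", "friend"], p_normal, priors["normal"])
backward = score(["friend", "dear"], p_normal, priors["normal"])
print(f"{forward:.2f} {backward:.2f}")  # 0.08 0.08
```

Because the score is just a product, reordering the words only reorders the factors, which is why the classifier cannot tell “Dear Friend” from “Friend Dear”.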
Naive Bayes ignores all the rules because keeping track of every single reasonable phrase in a language would be impossible. That said, even though Naive Bayes is naive, it tends to perform surprisingly well when separating normal messages from spam. In machine learning lingo, we’d say that by ignoring relationships among words, Naive Bayes has high bias, but because it works well in practice, Naive Bayes has low variance.
Shameless self-promotion: if you are not already familiar with the terms bias and variance, check out the Quest. The link is in the description below. Triple spam!
Oh no, it’s one last shameless self-promotion! One awesome way to support StatQuest is to purchase the Naive Bayes StatQuest Study Guide. It has everything you need to study for an exam or job interview. It’s eight pages of total awesomeness. And while you’re there, check out the other StatQuest study guides. There’s something for everyone.
Hooray! We’ve made it to the end of another exciting StatQuest. If you liked this StatQuest and want to see more, please subscribe. And if you want to support StatQuest, consider contributing to my Patreon campaign, becoming a channel member, buying one or two of my original songs or a t-shirt or a hoodie, or just donate. The links are in the description below. Alright, until next time, Quest on!