Transcript:

Hey guys, and welcome to another fun and easy machine learning tutorial on naive Bayes. Ah, it's a good day to play golf. Or is it? Let's take a look at some of the factors that determine whether or not we play today. These features are the weather conditions: the outlook, whether it's sunny, overcast or rainy, and there's also temperature, humidity and wind. Please smash the subscribe button and click the bell icon to join our notification squad.
If you think about it intuitively, we are more likely to play when it's not too hot or cold, which means overcast and mild temperatures. If it is less humid, with minimal wind speeds, we are more likely to play. If it's too hot or sunny, we may become exhausted very quickly.
Let's take a look at our data set. We recorded the circumstances of 14 days: we have our outlook, temperature, humidity, wind speed and our dependent variable, which is whether or not we play golf. The objective here is to estimate the likelihood of playing golf, yes or no, given the weather condition information. So based on this data, let's see how we can approach this problem using the naive Bayes algorithm.
First we determine how many yeses and how many nos we get from our data set. So the probability P of C, or the probability of our classes, which are yes and no, we can calculate as follows. The probability of yes is 9 out of 14: we count 1, 2, 3, all the way to 9 total yeses from the 14 possible days. Similarly we can count 5 nos, which gives us a probability of 5 out of 14.
Now we also need to calculate the individual probabilities with respect to each of the features, or weather conditions, in our data set. So for sunny, the probability that it is sunny given that it is a yes is 2 out of 9, so P of sunny given a yes. Why? Because from nine yeses, it is only sunny twice, as we can see over here. The probability that it's sunny given a no is 3 out of 5, so from the five no-play days, only three days were sunny.
For the overcast outlook, we can do the same for both the yes and no classes. We count four days that we got a yes and zero days that we got a no, so the probability that it was overcast given a yes is 4 out of 9, and for the no-play class we got 0 out of 5. Now we can compute these probabilities for all the other features. You can see how easy this is.
So say we need to classify the following new instance, which we'll call X: outlook is sunny, temperature is cool, humidity is high and it's a tad bit windy. Should we head off and play some golf, or rather stay indoors and watch a movie?
Firstly, we look at the probabilities that we can play the game, so we use our lookup tables. The probability that outlook equals sunny given play equals yes is 2 out of 9. The probability that temperature equals cool given play equals yes is 3 out of 9, and similarly for humidity and wind we have 3 out of 9 for both of them. And then we have the probability that play is a yes, which is 9 out of 14, as we discussed earlier.
Next, we consider the case that we cannot play the game. For outlook equals sunny we get 3 out of 5, for temperature equals cool 1 out of 5, for humidity equals high 4 out of 5, for wind equals strong given a no 3 out of 5, and then play equals no is 5 out of 14.
Then, using those results, you have to multiply the whole lot together. So you multiply all the probabilities for play equals yes, that is, the probability of X given play equals yes times the probability that play equals yes. That is 2 over 9 times 3 over 9 times 3 over 9 times 3 over 9 times 9 over 14, and this gives us 0.0053. This value represents the probability of X given a class times the probability of that class, or in this case the probability of X given play equals yes times the probability that play is a yes. We also have to do the exact same thing for play equals no: the probability of X given that play equals no, times the probability that play is no.
That equals 3 over 5 times 1 over 5 times 4 over 5 times 3 over 5 times 5 over 14, and this gives us 0.0206.
And finally, we have to divide both results by the evidence, or probability of X, to normalize. The evidence for both equations is the same, and we can find the values we need within the total columns of the lookup tables. Therefore, the probability of X equals the probability that outlook is sunny times the probability that temperature equals cool times the probability that humidity is high times the probability that wind is strong, as we mentioned earlier, and this gives us our probability of X, which is 0.02186. Then, dividing the results by this value, we get the probability that we play golf given X, which is 0.2424, as well as the probability that we don't play given X, where we get 0.9421.
So given the probabilities, can we play the game or not? To do this, we look at both probabilities and see which one has the highest value, and that is our answer. Therefore, since 0.9421 is greater than 0.2424, the answer is no, we cannot play golf today.
You probably guessed it, right? It looks like Bayes' theorem, or Bayes' rule. Now, naive Bayes is based on Bayes' theorem, also known as the conditional theorem, which you can think of as an evidence theorem or trust theorem. So basically, how much can you trust the evidence that is coming in? It's a formula that describes how much you should believe the evidence that you are presented with. An example would be a dog barking in the middle of the night. If the dog barks for no good reason, you will become desensitized to it and not go and check if anything is wrong. This is known as a false positive. However, if the dog barks only when someone enters your premises, you'll be more likely to act on the alert and trust or rely on the evidence from the dog.
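For reference, Bayes' theorem is usually written as follows. This is the standard textbook form; the symbols $H$ (the hypothesis, for instance "we play golf") and $E$ (the evidence, for instance the observed weather) are labels chosen here for illustration and are not taken from the video.

```latex
\[
\underbrace{P(H \mid E)}_{\text{posterior}}
  = \frac{\overbrace{P(E \mid H)}^{\text{likelihood}} \;\cdot\; \overbrace{P(H)}^{\text{prior}}}
         {\underbrace{P(E)}_{\text{normalizing constant}}}
\]
```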
So Bayes' theorem is a mathematical formula for how much you should trust the evidence. Let's take a deeper look at the formula. We can start off with the prior probability, which describes the degree to which we believe the model accurately describes reality based on all of our prior information. So how probable was our hypothesis before observing the evidence? Here we have the likelihood, which describes how well the model produces the data. This term over here is the normalizing constant, the constant that makes the posterior density integrate to one, like we see over here. And finally, the output that you want is the posterior probability, which represents the degree to which we believe a given model accurately describes the situation, given the available data and all of our prior information. So how probable is our hypothesis, given the observed evidence?
So with our example above, we can view the probability that we play golf given it is sunny as the probability that it is sunny given a yes, times the probability of a yes, divided by the probability of it being sunny.
So why naive? Probability theory says that if several factors don't depend on each other in any way, the probability of seeing them together is just the product of their probabilities. So in our example earlier, we have the probability that outlook is sunny given a yes, times the probability that temperature equals cool given a yes, times the probability that humidity is high given a yes, times the probability that wind equals strong given a yes. Looking at another example, we can assume that sneezing has no impact on whether you are a builder, so the probability of sneezing and being a builder given you've got the flu equals the probability of sneezing given you've got the flu, times the probability that you're a builder given the flu. So the probability of a sneezing builder having flu must depend on the chances of this combination of attributes indicating flu.
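The golf calculation can be reproduced in a few lines of Python. This is a minimal sketch that uses only the fractions quoted in the walkthrough; the variable names are made up for illustration, and the last normalization step (dividing by the sum of the class scores) is an addition that is not part of the video.

```python
from fractions import Fraction as F
from math import prod

# Class priors from the walkthrough: 9 "yes" days and 5 "no" days out of 14.
priors = {"yes": F(9, 14), "no": F(5, 14)}

# P(feature value | class) as quoted in the walkthrough for the new
# instance X = (sunny, cool, high humidity, strong wind).
likelihoods = {
    "yes": {"sunny": F(2, 9), "cool": F(3, 9), "high": F(3, 9), "strong": F(3, 9)},
    "no": {"sunny": F(3, 5), "cool": F(1, 5), "high": F(4, 5), "strong": F(3, 5)},
}

# Naive step: multiply the per-feature probabilities, then the class prior.
scores = {c: prod(likelihoods[c].values()) * priors[c] for c in priors}

# Evidence as computed in the video: product of the overall feature
# frequencies. Each numerator is the yes count plus the no count
# (sunny 2+3, cool 3+1, high 3+4, strong 3+3) out of 14 days.
evidence = F(5, 14) * F(4, 14) * F(7, 14) * F(6, 14)
video_style = {c: float(scores[c] / evidence) for c in scores}

# Alternative normalization (an addition, not from the video): divide by
# the sum of the class scores so the two results add up to exactly one.
total = sum(scores.values())
normalized = {c: float(scores[c] / total) for c in scores}

# The predicted class is the one with the largest score.
prediction = max(scores, key=scores.get)

if __name__ == "__main__":
    print({c: round(float(s), 4) for c, s in scores.items()})
    print({c: round(p, 4) for c, p in video_style.items()})
    print({c: round(p, 4) for c, p in normalized.items()})
    print("play golf?", prediction)
```

With exact fractions the video-style results come out at about 0.242 and 0.941, in line with the 0.2424 and 0.9421 quoted in the walkthrough, which divide rounded intermediate values. Those two numbers add up to more than one because the evidence is itself estimated under an independence assumption; dividing by the sum of the two class scores instead gives values that add up to one, and the predicted class is "no" either way.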
As for the pros of naive Bayes, it is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction. When the assumption of independence holds, a naive Bayes classifier performs better compared to other models like logistic regression, and you need less training data. It performs well in the case of categorical input variables compared to numerical variables. For numerical variables, a normal distribution, or bell curve, is assumed, which is a strong assumption.
Looking at the disadvantages, if a categorical variable has a category in the test data set which was not observed in the training data set, then the model will assign a zero probability and will be unable to make a prediction. This is often known as zero frequency. To solve this, we can use smoothing techniques, and one of the simplest smoothing techniques is called Laplace estimation. In some cases, like our earlier example, you can just add one to each count so that no probability comes out as zero. On the other side, naive Bayes is also known as a bad estimator, so its predicted probability outputs are not to be taken too seriously. Another limitation of naive Bayes is the assumption of independent predictors. In real life, it is almost impossible that we get a set of predictors which are completely independent.
Naive Bayes can be used for the following applications: credit scoring, e-learning platforms and medical data classification. The naive Bayes approach can also be used for real-time prediction: naive Bayes is an eager learning classifier, and it is really fast, so it can be used for making predictions in real time. This algorithm is also well known for its multi-class prediction feature, where we can predict the probability of multiple classes of the target variable. It can also be used for text classification, spam filtering and sentiment analysis.
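Coming back to the zero-frequency problem described above, here is a minimal sketch of add-one (Laplace) smoothing, applied to the overcast counts from the golf example (4 of the 9 yes days, 0 of the 5 no days). The function name `laplace_estimate` and its `alpha` parameter are hypothetical names chosen for illustration; the number of categories is 3 because the outlook can be sunny, overcast or rainy.

```python
def laplace_estimate(count, class_total, n_categories, alpha=1):
    """Smoothed estimate of P(feature value | class).

    Adds `alpha` to the observed count and `alpha * n_categories` to the
    class total, so an unseen value gets a small non-zero probability.
    """
    return (count + alpha) / (class_total + alpha * n_categories)


# Without smoothing, overcast never occurs on a "no" day, so the estimate
# is zero and it wipes out the whole product for the "no" class.
unsmoothed_no = 0 / 5

# With add-one smoothing over the 3 outlook values.
smoothed_no = laplace_estimate(0, 5, 3)   # (0 + 1) / (5 + 3)
smoothed_yes = laplace_estimate(4, 9, 3)  # (4 + 1) / (9 + 3)

if __name__ == "__main__":
    print(unsmoothed_no, smoothed_no, round(smoothed_yes, 4))
```

The smoothed estimates for one class still add up to one across the three outlook values, which is why the denominator grows by the number of categories and not just by one.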
The naive Bayes classifier is mostly used in text classification, due to better results in multi-class problems and the independence rule, and it has a higher success rate compared to other algorithms. As a result, it is widely used in spam filtering, to identify spam email, and in sentiment analysis, in social media analysis, to identify positive and negative customer sentiments. It is also used for recommendation systems: the naive Bayes classifier and collaborative filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.
Okay, so that is it from me. Please don't forget to like, subscribe and share, and click the bell icon if you'd like to see some more machine learning tutorials. Please support us on Patreon. If you'd like to download the script to this video, please click the link down below for a free download, and stay tuned for the next lectures, where we will see how we can implement the naive Bayes algorithm in Python. Thank you for watching and see you in the next lecture. [music]