Linear Regression Vs Random Forest | Random Forest In 7 Minutes

Augmented Startups








Hey guys, and welcome to yet another fun and easy machine learning algorithm, on random forests. The random forest algorithm is one of the most popular and most powerful supervised machine learning algorithms, capable of performing both regression and classification tasks. As the name suggests, this algorithm creates a forest from a number of decision trees; in general, the more trees in the forest, the more robust the prediction, and thus the higher the accuracy. To build the multiple decision trees that make up the forest, you use the same methods of constructing a decision tree, such as the information gain or Gini index approach, among other algorithms. If you’re not aware of the concepts of the decision tree classifier, please check out my other lecture on decision trees for machine learning; you’ll need to know how the decision tree classifier works before you can learn the working nature of the random forest algorithm. So how does it work? In random forests we grow multiple trees, as opposed to the single tree in a CART model. To classify a new object based on its attributes, each tree gives a classification, and we say the tree votes for that class. The forest chooses the classification having the most votes over all the trees in the forest, and in the case of regression it takes the average of the outputs of the different trees. So let’s look at the advantages of random forests. The same random forest algorithm, or random forest classifier, can be used for both classification and regression tasks. A random forest classifier will handle missing values and maintain accuracy even when a large proportion of the data is missing. When we have more trees in the forest, a random forest classifier is less likely to overfit the model, and it has the power to handle large datasets with high dimensionality.
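The aggregation rule described above — majority vote for classification, averaging for regression — can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture; the per-tree predictions are hypothetical placeholders standing in for the outputs of real trained trees.

```python
from collections import Counter

def forest_classify(tree_predictions):
    """Classification: the class voted for by the most trees wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_regress(tree_outputs):
    """Regression: take the average of the outputs of the different trees."""
    return sum(tree_outputs) / len(tree_outputs)

# Hypothetical predictions from five trees for one test sample.
print(forest_classify(["cat", "dog", "cat", "cat", "dog"]))  # -> cat
print(forest_regress([2.0, 2.5, 3.0, 2.5, 2.0]))             # -> 2.4
```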
If we look at the disadvantages of random forests: while the algorithm surely does a good job at classification, it is not as good for regression problems, as it does not give precise continuous predictions; in the case of regression it cannot predict beyond the range of the training data, and it may overfit datasets that are particularly noisy. Random forests can also feel like a black-box approach to statistical modelers: you have very little control over what the model does, and can at best try different parameters and random seeds.

So what are the applications of random forests? Let’s check a few of them. We can use them in the banking sector, for finding loyal customers and detecting fraudulent customers. They can be used in medicine, where we identify the correct combination of components to validate a medicine; random forest algorithms also help identify diseases by analysing a patient’s medical records. In the stock market, the random forest algorithm is used to identify stock behaviour, as well as the expected loss or profit from purchasing a particular stock. In e-commerce, random forests are used in a small segment of the recommendation engine, for identifying the likelihood of a customer liking the recommended products, based on similar kinds of customers. In computer vision, random forests are used for image classification; Microsoft has used random forests for body part classification for the Xbox Kinect, and other applications involve lip reading as well as voice classification.

Let’s take a look at the random forest pseudocode and how it works. Each tree is planted and grown as follows. Assume the number of cases in the training set is N; then a sample of these N cases is taken at random, but with replacement. This sample will be the training set for growing the tree. If there are M input variables or features, a number m smaller than M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m input variables is used to split the node. The value of m is held constant while we grow the forest. Each tree is grown to the largest extent possible, with no pruning. We then predict new data by aggregating the predictions of the trees, which means majority vote for classification and averaging for regression.

So to perform predictions using the trained random forest, we need to pass the test features through the rules of each randomly created tree. Suppose, let’s say, we formed a thousand random decision trees to form the random forest, and say we are detecting whether an image contains a hand. Each random decision tree will predict a different outcome or class for the same test feature, and a small subset of the forest looks at a random set of features, for example a finger. Suppose a hundred random decision trees predict some three unique targets, such as a finger, a thumb, or maybe a nail. Then the votes for finger are tallied out of the hundred random decisions, and likewise for the other two targets. If finger gets the highest number of votes, the random forest returns finger as its predicted target. This concept of voting is known as majority voting, just like in elections. The same applies to the rest of the fingers of the hand: if the algorithm predicts the rest of the fingers to be fingers too, then the high-level decision can vote that the image is a hand. This is why random forests are known as an ensemble machine learning algorithm. Ensembles are a divide-and-conquer approach used to improve performance. The main principle behind ensemble methods is that a group of weak learners can come together to form a strong learner: each classifier is individually a weak learner, while the classifiers taken together are a strong learner, and thus ensemble methods reduce variance and improve performance.
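The three steps of the pseudocode — bootstrap sampling with replacement, choosing m random features at each split, and aggregating by majority vote — can be sketched with a toy implementation. This is a sketch under simplifying assumptions, not the lecture’s code: each "tree" is just a one-split stump, and the data, class labels, and parameter values are all hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(X, y):
    """Step 1: draw N cases at random *with replacement* from the training set."""
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def majority(labels, fallback):
    return Counter(labels).most_common(1)[0][0] if labels else fallback

def train_stump(X, y, m):
    """A toy one-split 'tree': at its single node, consider only m features
    chosen at random out of the M available (step 2), keeping the best split."""
    features = random.sample(range(len(X[0])), m)
    default = majority(y, None)
    best, best_score = None, -1
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            l_lab, r_lab = majority(left, default), majority(right, default)
            score = sum(l == l_lab for l in left) + sum(r == r_lab for r in right)
            if score > best_score:
                best, best_score = (f, t, l_lab, r_lab), score
    f, t, l_lab, r_lab = best
    return lambda row, f=f, t=t, a=l_lab, b=r_lab: a if row[f] <= t else b

def train_forest(X, y, n_trees=25, m=1):
    return [train_stump(*bootstrap_sample(X, y), m) for _ in range(n_trees)]

def predict(forest, row):
    """Step 3: aggregate by majority vote across all trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

# Hypothetical toy data: class "a" sits low on feature 0, class "b" high.
random.seed(0)
X = [[0, 5], [1, 4], [2, 6], [7, 1], [8, 2], [9, 0]]
y = ["a", "a", "a", "b", "b", "b"]
forest = train_forest(X, y)
print(predict(forest, [1, 5]), predict(forest, [8, 1]))
```

Even though each stump sees a different bootstrap sample and a random feature subset, the majority vote across the 25 stumps gives a stable prediction, which is the point of the ensemble.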
Before we end the lecture, let’s take a look at some terms and definitions that you might come across, such as bagging and boosting. Bootstrap aggregating, also known as bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression; it also reduces variance and helps to avoid overfitting. Boosting is a machine learning ensemble meta-algorithm primarily for reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms which convert weak learners into strong ones; algorithms that achieve hypothesis boosting quickly became known simply as boosting.

Okay, so that is it. Thank you for watching. Please don’t forget to like, subscribe and share, and click that bell icon to get notified of more fun and easy machine learning algorithms. Also, animating these videos takes a lot of time and effort, so please don’t forget to support us on Patreon. In the following lecture, we’ll learn how to implement a simple random forest in Python. See you in the next lecture. Thank you for watching.
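The claim that aggregating many weak learners reduces variance can be illustrated with a toy experiment (a sketch, not from the lecture): each "weak learner" here is a hypothetical unbiased but noisy estimator, and averaging 50 of them shrinks the spread of the estimate roughly by the square root of the ensemble size, under the idealized assumption that the learners are independent.

```python
import random
import statistics

random.seed(42)

def weak_estimate(true_value=10.0, noise=4.0):
    """A 'weak learner': an unbiased but high-variance guess at the target."""
    return true_value + random.uniform(-noise, noise)

# Compare single weak learners against ensembles that average 50 of them.
singles = [weak_estimate() for _ in range(1000)]
ensembles = [statistics.mean(weak_estimate() for _ in range(50))
             for _ in range(1000)]

print(round(statistics.pstdev(singles), 2))    # wide spread
print(round(statistics.pstdev(ensembles), 2))  # much narrower spread
```

Real bagged trees are correlated because they share training data, so the reduction is smaller in practice than in this idealized independent case, which is exactly why random forests also randomize the feature subset at each split: it decorrelates the trees.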
