Our tutorial: random forest regression. Supervised machine learning consists of finding which class an output target feature belongs to, or predicting its value, by mapping its optimal relationship with input predictor features; the main supervised learning tasks are classification and regression. This topic is part of our Regression Machine Learning with R course; feel free to take a look at the course curriculum by clicking the link in the description box below. This tutorial has an educational and informational purpose and doesn't constitute any type of forecasting, business, trading or investment advice. Please read the full tutorial disclaimer at the end of this video.

Random forest regression consists of a supervised learning meta-algorithm for predicting an output target feature average by bootstrap aggregation, or bagging, of independently built decision trees. Bootstrap aggregation, or bagging, is used for lowering the variance error source of those independently built decision trees. For a full reference, I recommend you read Breiman, "Random Forests", published in Machine Learning in 2001.

The classification and regression trees (CART) algorithm consists of a greedy top-down approach for finding optimal recursive binary node splits by locally minimizing variance at terminal nodes, measured through a sum of squared errors function at each stage. As a formula, we have: the minimization of the sum of squared errors equals the sum, from the first to the last observation, of the difference between the output target feature data and the terminal node output target feature mean, with that result raised to the power of two. The terminal node output target feature mean, in turn, is equal to 1 divided by M, where M is the number of observations in the terminal node, multiplied by the sum, from the first to the last, of the output target feature data.
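The spoken formulas above can be written out in standard notation. This is a sketch where the symbols are my own labels: y_i for the output target feature data, y-bar for the terminal node output target feature mean, and M for the number of observations in the terminal node:

```latex
\min \; \mathrm{SSE} \;=\; \sum_{i=1}^{M} \left( y_i - \bar{y} \right)^{2},
\qquad
\bar{y} \;=\; \frac{1}{M} \sum_{i=1}^{M} y_i
```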
The tree bagging algorithm consists of predicting the output target feature of independently built decision trees by calculating their arithmetic mean. For random forests, a combination of random feature selection and bootstrap aggregation, or bagging, algorithms is used. Bootstrap consists of random sampling with replacement. As a formula, the independently built decision trees' mean output target feature prediction is equal to 1 divided by K, where K is the number of independently built decision trees, multiplied by the sum, from the first to the last, of the independent decision trees' output target feature predictions.

Great, so let's go into RStudio so that we can study random forest regression in greater detail. Excellent, so here we are within RStudio. The first step within the tutorial is to load its packages. This is done with the library function and, within it, the package name. For this tutorial we'll be using quantmod and randomForest, so we select these two code lines, then we click Run, or Ctrl+Enter on the keyboard, which is equivalent.

The next step is to create the data for the random forest regression. This is done through data reading, so we create this variable, data, which is equal to read.csv and, within it, the name of the data file: the random forest regression data, a plain text file with .csv, or comma-separated values, stored within the working directory, comma, header equals TRUE. So we select the code line, click Run or Ctrl+Enter on the keyboard, and notice that within the Global Environment it created the data object as a data frame. We click on the spreadsheet kind of icon, and that opens the data object, which has two columns: dates, with a daily frequency, and SPY.Adjusted. SPY corresponds to the ETF investment vehicle which intends to replicate the Standard & Poor's 500 index, and Adjusted because this includes adjusted close prices, which were adjusted for dividends and splits.
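The loading and reading steps above can be sketched as follows. This is a minimal base-R illustration: the file name, dates and prices are placeholders I create in a temporary directory so the sketch is self-contained (the tutorial's actual file sits in the working directory), and the library calls assume quantmod and randomForest are installed:

```r
## Package loading as described in the tutorial (uncomment if installed):
## library(quantmod)
## library(randomForest)

## Create a tiny placeholder CSV so this sketch is self-contained;
## the values are illustrative, not real SPY prices.
csv_path <- file.path(tempdir(), "random-forest-regression-data.txt")
writeLines(c("Date,SPY.Adjusted",
             "2007-01-03,98.85",
             "2007-01-04,99.06"), csv_path)

## Data reading: read.csv with header = TRUE, as in the tutorial
data <- read.csv(csv_path, header = TRUE)
str(data)   # a data.frame with a date column and adjusted close prices
```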
As mentioned, this data has a daily frequency, and it runs from the beginning of 2007 all the way to the end of 2015, therefore nine years of data. The following step is to create an xts, or extensible time series. We name it spy, and it's equal to xts and, from data, we select the second column with those SPY adjusted close prices, which were adjusted for dividends and splits, and we set order.by equal to as.Date of data's first column. So we select that code line, click Run or Ctrl+Enter on the keyboard, which is equivalent, and notice that this created a second object named spy, now as an xts or extensible time series. Again, we click on the spreadsheet kind of icon, and it opens spy for us; as we can see, it's the same data, but now the dates became the index.

After doing the data reading, we now create the target and predictor features for the random forest regression. First we have the target feature, which we name rspy because it will correspond to SPY's arithmetic return, so we have rspy equal to dailyReturn of spy. This calculates the arithmetic rate of return of those SPY adjusted close prices. Then we have, as predictor feature, rspy1, and we'll be using as the unique predictor feature the previous day's returns, so we lag the previous calculation, and the number of positions we're lagging is one. We bring both of these together into one data frame, which we name rspyall, and we do so with cbind, or column bind, of rspy, which is the target feature, and rspy1, which is the predictor feature. We rename their corresponding columns with colnames and the variable names within c, that is, the column names. The last step: as this lagging of the daily returns leaves a not-available value at the first row, we remove that first row for both of these columns, and we do so with na.exclude for rspyall,
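The feature-creation steps above can be sketched in base R alone, using a plain numeric vector of adjusted close prices instead of an xts object; the variable names mirror the tutorial's, and the prices are placeholders:

```r
## Placeholder adjusted close prices (illustrative values, not real SPY data)
prices <- c(100, 101, 99, 102, 103)

## Target feature: arithmetic (simple) daily return, r_t = p_t / p_{t-1} - 1
rspy <- c(NA, diff(prices) / head(prices, -1))

## Predictor feature: previous day's return (lag by one position)
rspy1 <- c(NA, head(rspy, -1))

## Bind target and predictor, name the columns, then drop the rows
## made not-available by the return calculation and the lagging
rspyall <- cbind(rspy = rspy, rspy1 = rspy1)
rspyall <- na.exclude(rspyall)
```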
and we overwrite it. We select all of these code lines, then we click Run, or Ctrl+Enter on the keyboard, which is equivalent. Now that we have our data ready, together with the target and predictor features, we continue to delimit the training and testing ranges. The training range is commonly used for algorithm training, and the testing range for algorithm testing, for evaluating its forecasting accuracy. We create first of all the training range, which is rspyt, and then the testing range, which is rspyf, the f to distinguish it from the t. We'll be using the window function for rspyall, and the training range is going to go from the beginning of the time series, which is the beginning of 2007, and end at the beginning of 2014; therefore we'll be using the first seven years of data as our training range. Then for the testing range, also from that rspyall object: in this case it's going to start at the beginning of 2014 and go all the way to the end of 2015, therefore the last two years of data as the testing range. Notice that this training and testing range delimiting was all included for educational purposes, as an example; therefore it is not fixed, and it can be modified according to your needs. So we select these two code lines and click Run or Ctrl+Enter on the keyboard. Within this tutorial we'll only be working within the training range.

Now that we have our training and testing range delimiting ready, we can continue to do the random forest regression. We create this variable, which is going to be named rft, with t for the training range because we're doing the calculation within it, and we'll be using the randomForest function, with a capital F. Here we have the formula for the regression, in which we have the target feature rspy explained by the predictor feature rspy1, or current day's returns explained by previous day's returns, with data equal to
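The delimiting above uses xts's window() function on rspyall; a base-R sketch of the same seven-year / two-year split, using Date subsetting on a placeholder daily series (the names mirror the tutorial's, and the zero returns are placeholders):

```r
## Base-R stand-in for the tutorial's window() delimiting: a daily series
## from 2007 through 2015 split into a training range (2007-2013) and a
## testing range (2014-2015). The return columns are placeholder zeros.
dates   <- seq(as.Date("2007-01-01"), as.Date("2015-12-31"), by = "day")
rspyall <- data.frame(date = dates, rspy = 0, rspy1 = 0)

rspyt <- rspyall[rspyall$date <  as.Date("2014-01-01"), ]  # training range
rspyf <- rspyall[rspyall$date >= as.Date("2014-01-01"), ]  # testing range
```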
rspyt, the one within the training range, and then the specifications for the random forest: ntree equals 2, the number of trees; nodesize equals 2, which sets the minimum number of observations in the terminal nodes of each of these two trees; mtry equals 1, the number of predictor features randomly sampled at each split for the random forest (in this case we only have one, so we'll be using that predictor feature; notice that this was all included to show the parameters of this random forest); and replace equals TRUE, for the corresponding bootstrap to be done with replacement.

Two very important observations regarding these parameters. First of all, they are not fixed; they were all included as an educational example, therefore they can be modified according to your needs. Also, very importantly, you might obtain different results when doing your own random forest regression, because of the bootstrap and the random seed used within its calculation.

The following step, after doing the calculation of that random forest regression, is to visualize it within a chart, with plot of the previously created rft. So we select these two code lines and click Run or Ctrl+Enter on the keyboard, which is equivalent. Excellent, so right here we have the chart, so we're going to zoom into it, and we have rft, the random forest regression within the training range. Right here we have, on the vertical axis, the corresponding mean squared error, and on the horizontal one, the corresponding number of trees; here we have one tree up to two trees. As we can see in this example, and as mentioned, you might obtain different results when doing your own random forest regression because of the bootstrap random seed.
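The tutorial's randomForest() call depends on the randomForest package. As a hedged, base-R-only sketch of what the bagging part does under the hood (bootstrap with replacement, then averaging the trees' predictions), here each "tree" is reduced to a stump that predicts its bootstrap sample's mean; this is an illustration of the bagging formula, not the package's actual tree-growing:

```r
## Illustrative base-R sketch of bootstrap aggregation (bagging):
## K "trees" are each fit on a bootstrap sample (random sampling with
## replacement) and the forest prediction is the arithmetic mean of
## their K predictions. The real randomForest() grows full CART trees
## with random feature selection; the stumps below are a simplification.
set.seed(1)                       # results vary with the random seed
y <- c(0.2, -0.1, 0.4, 0.0, 0.3)  # placeholder target feature data
K <- 2                            # ntree = 2, as in the tutorial

tree_predictions <- sapply(seq_len(K), function(k) {
  boot <- sample(y, size = length(y), replace = TRUE)  # bootstrap sample
  mean(boot)                                           # stump's prediction
})

## Bagged prediction: (1 / K) * sum of the K trees' predictions
forest_prediction <- mean(tree_predictions)
```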
In this case, we have the relationship that as the number of trees increased, the corresponding mean squared error also increased. So we're going to close that chart there, and now that we've finished studying random forest regression, let's go back to the slides. As mentioned previously, this tutorial has an educational and informational purpose and doesn't constitute any type of forecasting, business, trading or investment advice. Please pause the video now so you can read the full tutorial disclaimer. Okay, so with this we finish this tutorial. Thank you for watching.