Transcript:

Hi, welcome to the Mastering R Programming course by Packt Publishing, authored by Selva Prabhakaran. I’m Greg Arnold, the voice-over artist for this course. Selva is a data scientist by profession, and a lot of his work involves solving problems related to machine learning and advanced analytics. He uses R extensively for such projects. In addition to this, he writes blogs at r-statistics.co and rstatistics.net.

Mastering R Programming is an advanced R course where we follow a step-by-step approach to go from good to great with R and machine learning. This course picks up where we left off in Selva’s previous Introduction to R Programming course with Packt Publishing. Therefore, it is assumed that you have some working knowledge of R. In this course, we will learn the core machine learning algorithms and advanced in-depth concepts, and finally implement them in R. All videos have hands-on working examples with detailed explanations. Most videos in this course close with a coding challenge and its solution.

We begin the first section with the pre-model-building activities. The first two topics are aimed at describing the data and understanding the relationships between variables, whereas the next two are for detecting outliers and treating missing values using the mice package.

In Section 2, we see regression modeling in depth. Basic and advanced concepts related to regression models are discussed with hands-on examples. We go through the complete procedure of building linear regression models and interpreting the model results. Then we go into the details of residual analysis and extracting influential observations using Cook’s distance. Best subsets, stepwise regression, and ANOVA help to choose better models, while k-fold cross-validation helps to assess how the models perform on new data. Finally, we see how to build nonlinear models with splines and GAMs.

In Section 3, we discuss classification models.
This section introduces multiple approaches to modeling categorical variables. We begin with logistic regression, which is used to model binary response variables. We will tackle the problem of class bias and understand the special evaluation metrics computed from the confusion matrix and the ROC curve. We’ll understand the math behind classification algorithms such as the naive Bayes classifier, k-nearest neighbors, and tree-based models. The last two videos introduce the powerful caret package.

Section 4 deals with modern machine learning algorithms, such as SVM, XGBoost, and other ensemble methods. We will understand the core concepts behind these algorithms and do detailed hands-on sessions, followed by coding challenges.

Section 5 is dedicated to unsupervised learning algorithms. We discuss dimensionality reduction with principal components and clustering algorithms. We also build recommendation engines with a real-world example to recommend movies to existing users.

In Section 6, we shift focus to time series analysis and forecasting models. We understand the basics of time series modeling and start using the robust xts package. We’ll break down a time series into its components and understand key concepts such as stationarity, detrending, deseasonalizing, ACF, PACF, and so on. With the fundamentals in place, we venture into forecasting models such as exponential smoothing, Holt-Winters, and ARIMA models.

In Section 7, we dive into text analytics. We begin with scraping text from Wikipedia and processing it into a consumable format. The tm, NLP, and openNLP packages provide extensive facilities for text mining. We’ll see how to create a term-document matrix, normalize it with tf-idf, and draw a word cloud. Cosine similarity can be used to score similar documents, and latent semantic indexing (LSI) can be used as a vector space model to group similar documents. We then turn to latent Dirichlet allocation (LDA).
We will extract the underlying topics discussed, and their related keywords, from a set of related documents. We will then score sentiments from user reviews using the tidytext and syuzhet packages. Finally, we classify text with machine learning algorithms using facilities from the RTextTools package.

In Section 8, we focus on constructing nice-looking charts using the ggplot2 package. We will understand the principles that go into creating any ggplot and customize it: modify the theme elements and change aesthetics, layouts, and faceting. Different types of charts, such as bar charts, box plots, and so on, are discussed in detail.

In Section 9, we discuss multiple strategies to speed up R code. Beginning with the best practices, we see how to implement parallel computing to run parallel loops using the doParallel and foreach packages. We then go over the powerful dplyr and data.table packages and familiarize ourselves with the pipe operator in the process. Finally, we will learn to write and interface C++ code in R using the powerful Rcpp package. Rcpp is discussed in detail with many hands-on examples.

Finally, in Section 10, we build an R package using facilities from the roxygen2 and devtools packages. We will write the help documentation and host the package publicly on GitHub so that anyone can install and use it. We will also build and check the package, perform the mandatory checks before submitting to CRAN, and actually submit a package to CRAN. We do this so you have a full picture of what it takes to build a package acceptable to CRAN.

So, by the end of the course, you will gain a wholesome knowledge of machine learning, traditional predictive modeling, and the R language itself. You’ll see me running hands-on coding sessions all along and explaining the concepts in detail. We will be learning a lot of interesting concepts and solving numerous coding challenges throughout the course. Are you ready? Let’s dive straight in to master R.