Transcript:

Hi, everyone! David Mele here. Today I’m going to show you something really cool. I’ve had a lot of people ask me, “Well, what if I don’t want to do a full-length custom ARIMA model and I just want to do an auto ARIMA, quickly? I want a quick method for forecasting some time series data.” So that’s what we’re going to do today. We’re going to show you how to use the auto.arima function in R, and you’re going to end up with a nice graph like this. We’re just going to forecast your data and show you exactly how to do it.

So this is 2016 and 2017 data from an auto body shop. Here’s the data right here. You have it by week: week number, year, month, transactions, items, sales, and the average transaction. For an auto body shop, the average may be pretty expensive, but, you know, fixing the paint, fixing the bodywork, or fixing the frame might be more expensive. It depends. Then you have some months where it’s lower. Maybe they have fewer accidents then, or maybe they had more door dings and smaller repairs. So that’s the data we’re going to be using today. We’ve got two years of data, 2016 and 2017.

Let me open this up so you can actually see the full screen here. We’re going to start with this R script. We’ve got two packages we’re going to use today. Not a whole lot; this is very simple stuff. Your packages are readxl, right here, and forecast, which is right here. If you don’t have these, just use install.packages, like this, and then the package name. The data set is obtained through this line right here. All we do is call read_xlsx, which is part of the readxl package, and then give the location on your computer, desktop, laptop, whatever it is, of where your file is. That’s the name of my file. It puts it into this data frame called data. Next, you’re going to use ts, the time series function.
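
A minimal sketch of the setup step described above. The file path is a placeholder (the transcript does not capture the actual file name), so the read is guarded and only runs if a file exists at that location.

```r
# Run once if the packages are not installed yet:
# install.packages(c("readxl", "forecast"))

library(readxl)    # provides read_xlsx()
library(forecast)  # provides auto.arima() and forecast()

# Hypothetical location: replace with the path to your own workbook.
data_path <- "path/to/your_file.xlsx"

if (file.exists(data_path)) {
  data <- read_xlsx(data_path)  # returns a tibble (a kind of data frame)
  str(data)                     # check the column names and types
}
```
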
The ts function places this in time series format. Even though you saw that the data has month and date and all that stuff, we need to have it in a time series format. So what we do is use this, and here you’re going to see we have ts, the name of the data frame, which is data, and then whichever field or column we’re going to forecast off of. In this case, I’m using sales. I could have used transactions or any of the other fields, but I’m going to use sales because it’s probably the most likely one that we would forecast.

Then we need the frequency. If it’s quarterly, it’s 4. If it’s monthly, it’s 12. If it’s weekly, it’s 52. If it’s daily, it will be 365, and so on. Now, there are some times where this gets a little different. For instance, if you have a time frame in your data that’s not quite weekly, so you have something that is like a week plus four or something like that, that’s going to change this to be something like 30 or 40. Also, some people like to use custom modeling and boosting, trying and tweaking to overfit, which you can do, and in that case you would see this number change. But this is very straightforward, easy math that we’re going to do here, so we’re just using 52 because the data is weekly.

Then you have to put in your start year and starting period. If you don’t, it won’t show the years down here. It would only show one, two, three, four, five, something like that, which is not meaningful. So that’s what this does right here, and so you run this.

Then we’re going to plot the time-series-formatted data. If you plot the ts data, this is your original data, and you end up with this, right? See that? Now, if I didn’t use this c(2016, 1) right here, I would just have numbers down here.
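
The ts step above can be sketched as follows. The sales figures are synthetic stand-ins for the video’s data, and the column name `Sales` is an assumption; only `frequency = 52` and `start = c(2016, 1)` come from the walkthrough.

```r
# Synthetic weekly sales for two years (104 weeks); NOT the video's data.
set.seed(1)
data <- data.frame(Sales = round(35000 + rnorm(104, sd = 4000)))

# frequency = 52 marks the series as weekly; start = c(2016, 1) labels the
# first observation as period 1 of 2016 so the axis shows years.
tsdata <- ts(data$Sales, frequency = 52, start = c(2016, 1))

plot(tsdata)  # x-axis runs from 2016 to just under 2018
```

Dropping the `start` argument leaves the time axis as plain index values instead of years, which is the difference the speaker points out.
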
Plain numbers would be less meaningful. In this case, I have 2016 and 2017, and obviously you have the .5 marks in the middle of each year. Here’s your time series data; it shows you the breakout of the data over time. So that’s what this plot function does right here.

Next, we’re going to use the auto.arima function, and we’re going to get the optimal auto ARIMA model. This is where it’s going to be a (0,0,0) most likely, but it may not be. So what we do is run this right here. Let me just show you what auto.arima of the ts data, the time series data, does. We just run that, and it gives you this little blurb down here. Let’s pull this up so you can see it a little bit better. It starts right here, and in this case it’s (1,1,1) for this data, and that’s fine. (1,1,1) is the normal default for most BI applications. It’ll give you some other added numbers here, but what I want to show you is that once we have this, I want to actually forecast based on it. And we’re going to test the accuracy of this and see how good this auto ARIMA really is later on.

Keep in mind, every data set is different. Just because one person gets 98% accuracy and another person gets 50%, that doesn’t mean much if it’s a different data set. The data could be more variable. It could have more missing numbers, it could have trending, it could have campaigns hidden underneath it that can throw the numbers off. And obviously, when you look at our numbers here, you see the first year is a lot more standardized. The second year has a lot of variance thrown in. That’s going to throw our accuracy off in this second year, but that’s the way the numbers were. So with this, we’re going to take the model and put it in the auto ARIMA one object.
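
A hedged sketch of the model-selection step, run on synthetic data rather than the video’s file. The object name `autoarima1` is a guess at how “auto ARIMA one” is spelled in the script, and the order selected here will generally differ from the (1,1,1) reported in the video.

```r
library(forecast)

# Synthetic weekly series (two years); NOT the video's data.
set.seed(42)
sales <- 35000 + as.numeric(arima.sim(list(ar = 0.5), n = 104, sd = 3000))
tsdata <- ts(sales, frequency = 52, start = c(2016, 1))

# Let auto.arima search for an order and store the fitted model.
autoarima1 <- auto.arima(tsdata)

autoarima1                    # prints the chosen ARIMA order and coefficients
ord <- arimaorder(autoarima1) # named vector containing p, d and q
ord
```
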
It goes in auto ARIMA one, just like this. Next, we’re going to run the forecast right here, and we’re going to put that in forecast one. The h = 17 is how many periods I want to forecast out. We’re doing weekly, so 17 would give me basically four months or so of data, and that’s what I chose. I could do it for less. I could do it for 15, I could do it for 12, I could do it for 10. Ten would be a little bit over two months, if you look at 4.23 weeks per month. So that’s what the h = 17 is. It’s not the same thing as the frequency up here, okay? h is the number of periods, in this case weekly periods, that I’m forecasting out.

Okay, so now to show you this, if I take forecast one, which is where I put this, let’s just run it. Enter, and, oh, you know what? I misspelled it. This is where you’ve got to be very careful about how you spell stuff in R. So let’s do it again. There we go. I left a letter out somehow when I was typing that, but that’s fine. See, that’s where it gave the error saying the object was not found, because the name was misspelled.

But regardless, the main thing here is that you have both 80 percent and 95 percent confidence levels, with upper and lower bounds. That’s what these four columns right here are, and this first one right here is your actual forecasted value. What you’ll notice is that as you go down through this, the values get very close to each other, and then the forecast stays the same as it gets out here. There’s a reason for that. This is an auto ARIMA model, and it’s going to do that towards the end, depending on the length of the h period, the forecast period, that you pick.

So we did that. Now let’s plot our forecast. Forecast one is the actual forecast. And let’s plot that.
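
The forecasting step above might look like the sketch below. It uses synthetic data, and the names `autoarima1` and `forecast1` are assumed spellings of the objects mentioned in the video; `h = 17` is the horizon the speaker uses.

```r
library(forecast)

# Synthetic weekly series (two years); NOT the video's data.
set.seed(42)
sales <- 35000 + as.numeric(arima.sim(list(ar = 0.5), n = 104, sd = 3000))
tsdata <- ts(sales, frequency = 52, start = c(2016, 1))

autoarima1 <- auto.arima(tsdata)

# h is the number of future periods (weeks here), not the frequency.
forecast1 <- forecast(autoarima1, h = 17)

forecast1        # columns: Point Forecast, Lo 80, Hi 80, Lo 95, Hi 95
plot(forecast1)  # history plus forecast with 80% and 95% bands
```

For many ARIMA fits without a drift or seasonal term, the point forecasts settle toward a constant as the horizon grows, which is likely the flattening the speaker points to.
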
There we go. That’s what you saw earlier, when we first started this, and it looks very plausible. If you go straight across here, that’s about right. Then it shows you the 80% and 95% confidence intervals: this is the 80% and this is the 95%. The wider you go, the higher the percentage chance that the actual value will fall in that range.

Now let’s go to the next one. We already plotted the forecast, so now let’s plot the residuals, because the residuals will tell you if we have congruence or if we have variance, and how much of each. So let’s plot that out. If you look here, you start at zero. The first year is pretty good, and then, as I told you, the second year has a lot of variance. Maybe there was a huge campaign or spike here. Maybe there was a lot of advertising done at different times that brought in sales, or maybe there was just bad weather that caused a lot of accidents and this huge spike here, followed by a huge drop because the people who would have come in later had already come in. I don’t know.

You’re going to see that, because of the second year here, the accuracy is going to be a little bit off, but it’s still going to be fairly good for this case. When you have variance like this in the second year, you may want to go with a custom ARIMA model, and we have that in our videos already for you. This is just a quick method to get you forecasting, okay? If you want that, go back through my videos; look under ARIMA models, look under dealing with seasonality, things like that.

Next, we’re going to plot the normal Q-Q plot of these residuals. Here you have the sample quantiles against the theoretical quantiles, and basically what you want to see is a line that goes straight like this, but we have one point over here that’s off the line.
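
A small sketch of the two residual checks described above, using base R only. To keep it quick, it fits ARIMA(1,1,1) directly with `stats::arima` (the order reported in the video) instead of re-running auto.arima, and it runs on synthetic data, so the plots will not match the video’s.

```r
# Synthetic weekly series generated from an ARIMA(1,1,1) process; NOT the video's data.
set.seed(7)
x <- 35000 + as.numeric(
  arima.sim(list(order = c(1, 1, 1), ar = 0.6, ma = 0.3), n = 104, sd = 200)
)[1:104]
tsdata <- ts(x, frequency = 52, start = c(2016, 1))

fit <- arima(tsdata, order = c(1, 1, 1), method = "ML")
res <- residuals(fit)

# Residuals over time: look for stretches where the spread widens.
plot(res, main = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the reference line suggest roughly normal residuals.
qqnorm(res)
qqline(res)
```
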
That point is a little bit out, and then we have these starting to get a little bit spaced out. What’s happening here is, as I told you, in the second year we have some higher variance. That’s going to affect the values you’ll see, but it’s still going to be okay.

Next, we look at the ACF and the PACF, the autocorrelation and partial autocorrelation. When we look at this, it’s pretty good, except for the start right here. This one right here is a little bit off, but the rest of it pretty much all falls in line, and that’s what you want to see. With this one, we have one lag over here that could be dealt with, to get higher accuracy, again with a custom model. So instead of the auto ARIMA model, ARIMA (1,1,1), which is what we have here, you might put in a different number, (0,0,1) or something, whatever it is, depending on the lag. So we’ve got the ACF and PACF, and they’re pretty good.

Now let’s look at the summary and the accuracy. We have two ways to measure the accuracy of our forecast. Let’s take a look again at our forecast here. Where is it? Right here. Let’s plot this, and that looks pretty much in line with the data, right? If we were to follow this and stay within this range, that would be good. But we want to know more about it, so we can use summary, right here. Let’s bring this up a little bit. Summary gives you this output right here. It gives you the ARIMA model and the numbers behind it, the p, d, q values, as they’re called, and we’re not going to delve too much into those. What we want to see is this guy right here. This is the MAPE, the mean absolute percentage error, and it’s one of the leading ways to judge accuracy.
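
The ACF and PACF check mentioned above can be sketched with base R. As before, the data is synthetic and the model is a directly fitted ARIMA(1,1,1), which is an assumption made to keep the example short.

```r
# Synthetic weekly series generated from an ARIMA(1,1,1) process; NOT the video's data.
set.seed(7)
x <- 35000 + as.numeric(
  arima.sim(list(order = c(1, 1, 1), ar = 0.6, ma = 0.3), n = 104, sd = 200)
)[1:104]
tsdata <- ts(x, frequency = 52, start = c(2016, 1))

fit <- arima(tsdata, order = c(1, 1, 1), method = "ML")

# as.numeric() drops the ts attributes so lags are counted in weeks.
res <- as.numeric(residuals(fit))

acf_vals  <- acf(res, main = "ACF of residuals")    # first bar is lag 0
pacf_vals <- pacf(res, main = "PACF of residuals")  # starts at lag 1
```

In the `acf()` plot the lag-0 bar is always exactly 1, so a tall first bar is expected; the bars to check against the dashed bounds are the ones from lag 1 onward.
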
So what you do is subtract that from 100, so our accuracy would be about 60%. That’s not the most accurate, but, as I told you, we have a lot of variance in this second year. If we were to run this just for the first year, our accuracy would be a lot higher. Still, it’s 60% accurate, which is better than flipping a coin, which would be 50%. That’s one way to look at it, and then you’ve got all these other indicators in here too.

Let’s go and look at accuracy, which works the same way. The accuracy function gives you the exact same measures, without the model details in front, so you can use either one. It gives you the MAPE the same way: 40.4937. So the accuracy here is about 59 to 60 percent with this data set. Again, it’s a very variable data set, with higher variance in the second portion.

Then you can plot it again, like we did here. If you want to see the forecasted values, just make sure you spell the name correctly, and here they are. You have your forecast for 17 periods, and you can see it right here. This is the column, which starts at 38,858, goes on down, and then stays at about 34,438 through the rest of it.

So that’s how you do auto ARIMA. That’s how we forecast in R very quickly and easily. It’s just a few lines of code. Let me open this up so you can see it a little bit better, and I’ll give the whole name of the file that I’m using. I’m going to post this file on Kaggle, and I’ll give you a link down below for how to get to it, so you can just download it and use it. This covers everything you need right here. Let’s bring this down so you can see the whole code from the top. There we go. Here it is: you’ve got install.packages, if you need to install the packages.
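
A hedged sketch of the accuracy step. It runs `accuracy()` from the forecast package on a model fitted to synthetic data, recomputes MAPE by hand from the fitted values, and then repeats the speaker’s “100 minus MAPE” arithmetic with the 40.4937 quoted in the video. Note that these are training-set measures: they describe how well the model fits the data it was trained on, not held-out weeks.

```r
library(forecast)

# Synthetic weekly series generated from an ARIMA(1,1,1) process; NOT the video's data.
set.seed(7)
x <- 35000 + as.numeric(
  arima.sim(list(order = c(1, 1, 1), ar = 0.6, ma = 0.3), n = 104, sd = 200)
)[1:104]
tsdata <- ts(x, frequency = 52, start = c(2016, 1))

# Fixed order for speed (an assumption); the video fits with auto.arima instead.
fit <- Arima(tsdata, order = c(1, 1, 1), method = "ML")

acc <- accuracy(fit)  # one "Training set" row: ME, RMSE, MAE, MPE, MAPE, ...
acc[, "MAPE"]

# MAPE by hand: mean absolute error as a percentage of the actual value.
manual_mape <- mean(abs((tsdata - fitted(fit)) / tsdata)) * 100
manual_mape

# The speaker's rule of thumb, applied to the MAPE quoted in the video.
video_mape <- 40.4937
rough_accuracy <- 100 - video_mape  # 59.5063, i.e. "about 60%"
rough_accuracy
```
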
Then you have the two libraries, readxl and forecast, the data set that we use, and all the different functions that we use, from the ts data and ts plotting to auto.arima, on down. So this is exactly how you do it. You end up with this, and you can take the forecast and make it for a different number of weeks, different periods. It’s a quick and easy way to forecast with an auto ARIMA model, and it will give you your values. In this case, it was (1,1,1) based on this data. So that’s quickly how to do it. Thanks again for watching. I hope you found this helpful, and I hope this answered the question for the people who want a quick, easy method for forecasting time series data. Please take a moment to subscribe, like, and share, and have a great day. Thank you.