Transcript:

Hi. In this video, we are going to see how to check whether a time series is stationary or not. We’ll be looking at some statistical tests, namely the KPSS test and the ADF test (ADF stands for augmented Dickey-Fuller), to test whether a time series is stationary. Before going there, let’s understand what stationary and non-stationary time series are. A stationary time series has the property that its mean, variance, and autocorrelation structure do not change over time.

Now let us look at this particular graph. The first one is a stationary time series. If you see, this is a pretty flat-looking series. The entire series is centered around a zero mean, and the variance also seems to be constant: it has been high and low at times, but that’s fine, because it is not changing systematically over time. There is also no periodic seasonality here. We can see some spikes, but the spikes do not repeat across the entire series, so there is not much seasonality. There can be a cyclic nature in the data, but there’s no seasonality. I will explain shortly why it is important to understand the structure of a time series: it is critical for selecting the right model to fit your data.

So that is the stationary time series. If you look below, you can see the time series is not constant or centered around a particular value. There is a decreasing trend, so there is a trend component to this particular time series, and that’s why we call it a non-stationary time series. Now, before we get started with the statistical tests, what is the purpose of understanding whether a time series is stationary or not?
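
To make the difference between the two graphs concrete, here is a minimal sketch that compares the mean of the first half of a series with the mean of the second half. The two series are synthetic (random noise, and the same noise with a decreasing trend added); the slope, the seed, and the helper name `half_means` are illustrative assumptions, not values from the video.

```python
import random
import statistics


def half_means(series):
    """Return the mean of the first half and the mean of the second half."""
    mid = len(series) // 2
    return statistics.mean(series[:mid]), statistics.mean(series[mid:])


random.seed(0)  # fixed seed so the sketch is reproducible
n = 200

# Flat series: noise centered around a zero mean.
flat = [random.gauss(0, 1) for _ in range(n)]

# Trending series: the same noise plus a decreasing linear trend
# (the slope of -0.05 is an arbitrary illustrative choice).
trending = [-0.05 * t + e for t, e in enumerate(flat)]

for name, series in [("flat", flat), ("trending", trending)]:
    first, second = half_means(series)
    print(f"{name:9s} first-half mean={first:7.2f}  second-half mean={second:7.2f}")
```

For the flat series the two halves have roughly the same mean, while for the trending series the mean clearly shifts. This is only an eyeball check, which is why the formal tests come next.
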
Let’s talk about different models. The very first model is the ARMA model, the autoregressive moving average model. This model expects your time series to be stationary. If your time series is not stationary and has some trend component to it, then you need to go for an ARIMA model. The ARIMA model has one additional component compared with the ARMA model, and that is the I (integrated) component. What this I component does is make your non-stationary time series stationary, and this procedure is called differencing. Basically, you take your time series and subtract from each value the value at a previous lag, and that’s where the ARIMA model comes into play. So if your data is completely stationary, you can go for an ARMA model. If it’s non-stationary, you have to think of an ARIMA model. The differencing can be applied once or multiple times, and that is the parameter called d that you set in your ARIMA model. It can be one or two. Typically two is the maximum, but sometimes it can go above that as well.

Now, if your data has seasonality, or trend plus seasonality, then you need to look at a SARIMA, or seasonal ARIMA, model, because you need to difference at the seasonal lag as well. That’s why it’s very important to understand whether your data is stationary or not, because the model that you choose will vary depending on that.

For this purpose, I have already run a few initial steps. I have imported different packages here: Matplotlib and Plotly Express to visualize the data, and pandas to load the data. I am using the Amazon revenue dataset, which is in my GitHub repo over here. I will also mention the link in my YouTube video description below, so you can check it out. This dataset has quarterly Amazon revenue data.
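
As a rough illustration of what the I component does, here is a small sketch of differencing in plain Python. The helper name `difference` and the toy numbers are made up for this example; they are not the Amazon data.

```python
def difference(series, lag=1):
    """Subtract from each value the value `lag` steps earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Toy values with a curved upward trend (hypothetical, for illustration only).
values = [1, 3, 6, 10, 15, 21]

d1 = difference(values)  # differencing once, like d = 1
d2 = difference(d1)      # differencing twice, like d = 2
print(d1)  # [2, 3, 4, 5, 6]  -> still trending upward
print(d2)  # [1, 1, 1, 1]     -> flat after the second difference

# Seasonal differencing on toy quarterly data: compare each quarter with the
# same quarter one year earlier (lag = 4).
quarterly = [10, 20, 30, 40, 12, 22, 32, 42]
print(difference(quarterly, lag=4))  # [2, 2, 2, 2]
```

Here one round of differencing is not enough to flatten the series but two rounds are, which is the kind of situation where d = 2 would be considered. The lag of 4 in the last call corresponds to the seasonal differencing a seasonal ARIMA model applies to quarterly data.
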
It has the actual revenue from around 2005 to 2020, and the net income. Once I plot it, this is how the data looks. I’m using Plotly Express: I’m feeding quarter as my x-axis, which is the time, and revenue as my y-axis. I’m taking only revenue, not net income. If you look at this particular time series, the data is increasing from 2005 until 2020, so there is a trend component to it.

So here we know the data is non-stationary, but how do we test it? One way is visual: we can often tell by eye whether a series is stationary or non-stationary, but we can be wrong as well, so it is better to run some statistical tests. I’m going to run two statistical tests to check whether the time series is stationary or not: the KPSS test and the ADF test.

The first one I’m going to run is the KPSS test. In the KPSS test, the null hypothesis is that the time series is stationary, and the alternate hypothesis is that the time series is not stationary. Just remember this particular null and alternate hypothesis, because when we go to the ADF test it will be completely the other way around. To run the KPSS test, I’m importing the kpss function from the statsmodels package. That’s what I’m doing over here. Then I’m calling the kpss function and passing the revenue column from the data frame that I created from the Amazon dataset, and I am passing "ct". With "ct", I’m telling it that this particular data has a trend component, because as we saw at the top, the data was increasing over time. The series was not constant around the mean; the mean was shifting upward.
That’s why we call it a trend, and this is where we give "ct". If the data does not have a trend and stays around the mean, then you have to give "c". If you click on this kpss function, you can see the different options: "c" means the data is stationary around a constant, and "ct" means the data is stationary around a trend. So these are the two values.

The output I get is the stat, which is nothing but the KPSS test statistic; the p-value, which is going to be the basis for our hypothesis testing; the lags; and the critical values. The lags value is how many lags the test used. You can feed your own lags; otherwise, kpss is going to use the default. If you go to the documentation, it mentions how it calculates the lags and the critical values for the test. So these are the four values that we are going to get.

Let me run this and then print the test statistic, p-value, and critical values. What I am doing is: if the p-value is less than 0.05, then the series is not stationary. Basically, we are rejecting the null hypothesis in that case. But if it is greater than 0.05, we fail to reject the null hypothesis, and we treat the series as stationary. That’s what I have entered over here, so let me quickly run this.

On the visual we also saw that the series is not stationary, and here the p-value is less than 0.05 and the test statistic is 0.17. If you look at the critical values, at the five percent level (0.05) the critical value is 0.14. Since the test statistic of 0.17 is greater than that, we are rejecting the null hypothesis and accepting the alternate hypothesis, and this shows that the time series is not stationary. Next, we are also going to try the ADF test.
Why do we try two tests? Sometimes it’s good to make sure both tests agree that the series is not stationary. You may have an iota of doubt, because sometimes the result may be in the border range, so it’s better to run two tests. But even a single test will work as well. Both tests are available, and you can choose which one you want to use.

In the case of the ADF test, the null hypothesis is that the series possesses a unit root and hence is not stationary. If you look at the top, for KPSS the null hypothesis was that the series is stationary; here the null hypothesis is that the series is not stationary. Forget the unit root for now. I will cover it in a separate video, because depending on the unit root you may have to apply differencing as well. For now, think of it as: the null hypothesis is that the series is not stationary, and the alternate hypothesis is that the series is stationary. It is completely the opposite of what we saw in the KPSS test.

In this case, again from statsmodels, I’m importing the augmented Dickey-Fuller test function, that is adfuller. Then I’m passing the revenue column again and getting the result. From the result, it gives me the test statistic, p-value, and critical values for this test, which are at indexes 0, 1, and 4 of the result object, and I’m printing those. Here, what I’m saying is: if my p-value is greater than 0.05 (earlier it was less than), then I’m failing to reject the null hypothesis, so my series is not stationary; else, my series is stationary. That’s what I’m printing in the output.
Here you can see our p-value is 0.12, which is greater than 0.05, so we are failing to reject the null hypothesis, and hence our series is not stationary in this case. Even if you look at the test statistic, it’s -2.44, and the critical value at the five percent significance level is -2.92. The statistic is not more negative than the critical value, so we cannot reject the null hypothesis at that level. With both of these tests, what we can conclude is that this particular series is not stationary, and since it’s not stationary, we cannot use the ARMA model. We have to use the ARIMA model with the I component, that is, the integrated component, where we feed the d value. ARIMA has three parameters, p, d, and q: p is for your autoregressive part, d is for your integrated part, and q is for your moving average part. We have to feed the d value, which is the differencing parameter, and I will talk about the differencing parameter separately. But this test is critical for understanding which model we can use.

All right, that’s about it for this video. One other point I want to add: I said ARIMA, but this particular series has seasonality as well. You can see regular peaks during the December quarter, so we actually have to use seasonal ARIMA, with a seasonality component as well. But in this case, I just wanted to give an intuition for how to test for non-stationarity. Thank you very much.
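
Because the two tests have opposite null hypotheses, it can help to write the combined decision down explicitly. The following is only a sketch of that logic: the function names and the "inconclusive" label are assumptions for this example. The ADF p-value of 0.12 is the one quoted in the video; the video only says the KPSS p-value was below 0.05, so the 0.01 used here is a placeholder.

```python
def combine_verdicts(kpss_p, adf_p, alpha=0.05):
    """Combine KPSS and ADF p-values into a single verdict."""
    kpss_stationary = kpss_p >= alpha  # KPSS null hypothesis: stationary
    adf_stationary = adf_p <= alpha    # ADF null hypothesis: not stationary
    if kpss_stationary and adf_stationary:
        return "stationary"
    if not kpss_stationary and not adf_stationary:
        return "not stationary"
    return "inconclusive"  # the tests disagree; look closer before choosing a model


def suggest_model(verdict, seasonal=False):
    """Map the verdict to the model family discussed in the video."""
    if seasonal:
        return "seasonal ARIMA"
    if verdict == "stationary":
        return "ARMA"
    if verdict == "not stationary":
        return "ARIMA"
    return "undecided"


# ADF p-value from the video; KPSS p-value is a placeholder below 0.05.
verdict = combine_verdicts(kpss_p=0.01, adf_p=0.12)
print(verdict)                               # not stationary
print(suggest_model(verdict))                # ARIMA
print(suggest_model(verdict, seasonal=True)) # seasonal ARIMA
```
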