Transcript:
Hi. When we talk about model monitoring, there are two important concepts one needs to remember: the first is concept drift and the second is data drift.

Concept drift is when the statistical properties of your model's output, more specifically the value you are trying to predict, change over time. It can change because of a business scenario change: the business itself is completely changing, you are acquiring a company, it can be anything. Or it can change because the training data you used to train the model does not represent the population that is actually coming in during prediction.

Now let me give you an example of concept drift from the early telecom industry. Back then, the mobile number you were using was bound to your service provider: it could be AT&T, it could be T-Mobile, or, if you are in another country, Jio or Airtel. Your number was pretty much bound to your service provider. But a few years later, they introduced something called number portability, which means you can keep the same number but move to a different service provider. Now think of the time when there was no number portability and you had developed a customer churn prediction model. Most customers did not want to change their number, since all their contacts knew it, so even if they were not happy with the service, they would stick with that particular provider, and your model kept predicting customer churn pretty well. Then, all of a sudden, number portability was introduced and a lot of customers took the opportunity and moved from one carrier to another. What you are seeing here is that your predictions are no longer valid, because the underlying assumption has changed: a new business scenario has come into existence, and it causes a much higher rate of churn. The model has to be recalibrated, and since you will not have many data points for some time, you have to take that into account, or perhaps substitute a more deterministic approach where customers are incentivized not to quit. That is one scenario.

The second scenario is from the credit card industry. In the early stages, credit cards worked with a magnetic stripe on the back, which you swipe. The problem with the magnetic stripe is that anybody can skim your card: if you go to a merchant, an employee can use a skimmer to copy it. That is exactly why the chip (EMV) was introduced in cards, which cannot be skimmed; these days you insert the chip, and that has let banks reduce fraud substantially. Now think: if you developed a fraud model in the magnetic stripe era and then moved into the EMV world, your model no longer has the same predictive power. And one thing to notice is what happened after EMV was introduced: a lot of the fraud that used to happen at the point of sale shifted online, because online you don't use your chip anyway. People started stealing credit card numbers and selling them on the gray market so others could buy them and use them for fraud.
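Before moving on to data drift, here is a minimal sketch of what monitoring for this kind of concept drift could look like: track the churn rate the model was trained against, and alert when the live rate moves too many standard errors away from it. The function name, the baseline figures, and the three-sigma threshold are hypothetical assumptions for illustration, not something from this talk.

```python
import math

# Minimal sketch (hypothetical names and numbers): flag a shift in the
# churn rate -- the value the model predicts -- relative to the rate
# observed at training time, using a two-proportion z-test style check.

def churn_rate_shifted(baseline_rate, baseline_n, recent_churns, recent_n,
                       z_threshold=3.0):
    """Return (alert, z): has the live churn rate moved more than
    z_threshold standard errors away from the training baseline?"""
    recent_rate = recent_churns / recent_n
    # Pooled proportion and standard error of the difference.
    pooled = (baseline_rate * baseline_n + recent_churns) / (baseline_n + recent_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_n + 1 / recent_n))
    z = (recent_rate - baseline_rate) / se
    return abs(z) > z_threshold, z

# Example: churn was 2% across 50,000 training customers; after number
# portability launches, 400 of the last 10,000 customers churn (4%).
alert, z = churn_rate_shifted(0.02, 50_000, 400, 10_000)
print(f"alert={alert}, z={z:.1f}")  # the jump to 4% fires the alert
```

In the number portability scenario, a check like this would fire as soon as churn jumps, flagging the need to recalibrate well before you have accumulated enough new data points to retrain.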
So these are some of the scenarios where your underlying model assumptions can change and you may have to recalibrate the model, and this is why continuous monitoring of your model is required. In some cases, even continuous monitoring may not raise an alert, so there are additional techniques, which we will talk about at a later stage. That is concept drift.

The second concept is data drift. With data drift, either the model has drifted because of a particular feature used in the model, or the model itself is fine but the data has drifted by itself; the model is still using that data for prediction, and those predictions may not be accurate at all. There are multiple scenarios here. Data can drift because of a data quality issue: the upstream system that sends the data might have changed something, and that can cause wrong data to arrive in downstream systems, including your model. Or you depend on data coming from a third-party vendor, and the vendor might have changed some logic, so when you receive the data, you are seeing completely different data points, and the model drifts as it predicts on them. The other scenario is that the business has changed again and a new data element or an additional category has been introduced. Say one of your categorical features is product, you have some line items in that product category, and a new product category is introduced in your enterprise; the model has never seen it. In this scenario too, you may have to recalibrate and retrain the model so that it predicts correctly on the new instances. A short code sketch of both of these data drift checks follows at the end of the transcript.

So, in the next video, we will look at techniques to handle concept drift and data drift: how they can be detected, and what the process is for recalibrating the model and deploying it back into the system. Thank you.
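As promised above, here is a minimal sketch of the two data drift checks described in the video: a two-sample Kolmogorov-Smirnov test for a numeric feature, and an unseen-category check for a categorical feature. The feature names, thresholds, and synthetic data are assumptions for illustration; in production, the serving-side values would come from recent prediction traffic.

```python
import numpy as np
from scipy.stats import ks_2samp

# Minimal sketch (synthetic data, hypothetical feature names): two data
# drift checks on serving traffic versus the training set.

rng = np.random.default_rng(42)

# 1) Numeric feature: a two-sample Kolmogorov-Smirnov test. A tiny
#    p-value means the serving distribution no longer matches training,
#    e.g. after an upstream system or vendor changed its logic.
train_amount = rng.normal(loc=50, scale=10, size=5_000)
serving_amount = rng.normal(loc=65, scale=10, size=1_000)  # shifted mean
stat, p_value = ks_2samp(train_amount, serving_amount)
if p_value < 0.01:
    print(f"numeric drift on 'amount': KS={stat:.2f}, p={p_value:.1e}")

# 2) Categorical feature: flag category values never seen in training,
#    e.g. a newly launched product line.
train_products = {"mobile", "broadband", "landline"}
serving_products = ["mobile", "broadband", "streaming_tv", "mobile"]
unseen = set(serving_products) - train_products
if unseen:
    print(f"unseen categories in 'product': {unseen}")  # time to retrain
```

The p-value cutoff is a design choice: a stricter cutoff raises fewer false alarms but reacts more slowly to genuine drift.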