Turbocharge SageMaker ML Operations with Monitoring and Explainability from Fiddler

Watch the SageMaker + Fiddler demo on YouTube: a deep-dive product demo showing how SageMaker customers can use Fiddler to continuously explain and monitor AI models.

MLOps (ML Operations) is a fast-growing discipline that addresses the challenges of developing, deploying and monitoring machine learning models in the enterprise. The performance of production ML models frequently fluctuates and decays over time as the data changes. Continuously monitoring for model-centric operational issues lets ML teams limit their impact on business metrics. Beyond these operational challenges, companies deploying AI face ethical and compliance risks, as frequent news stories about algorithmic bias indicate. Left unaddressed, these issues can lead to a lack of trust, negative publicity, and regulatory action.

SageMaker provides optimized model training and scalable distributed deployment in the cloud. Fiddler complements these capabilities with a pluggable Explainable Monitoring solution that enables teams to continuously explain, validate and monitor AI models for complete visibility and trust.

In this post, we describe how customers can integrate Fiddler with SageMaker to accelerate their ML deployments with increased confidence and trust. We use a regulated use case, credit lending, because explaining and monitoring black-box AI is essential for adhering not just to anti-discrimination laws but also to Federal Reserve guidelines like SR 11-7.

About Fiddler

Fiddler offers the first solution for ML that combines production monitoring with explainability to help model developers and ops personnel inspect and address a comprehensive range of operational ML issues. In addition to production monitoring and explainability for model developers and ops, Fiddler provides dedicated interfaces for business and risk/compliance stakeholders, as well as APIs for end users, to ensure all stakeholders have visibility into and an understanding of model behavior.

Use case: Credit Lending

In this post, we demonstrate how a credit lending model can be quickly trained and deployed on SageMaker, and then explained, validated and monitored in Fiddler. We use the popular lending dataset from LendingClub, a tabular dataset that consists of 59 columns. We train an XGBoost model with this dataset using SageMaker. Our model is a classification model that uses a loan applicant’s information to predict the probability of the loan being “charged off” (defaulted-on: a sign of a risky loan). 

Steps to deploy a model on SageMaker

1. Train the model in a SageMaker notebook (refer to the notebook here). This is what our final trained model looks like.

2. Next, this model can be deployed either from the notebook itself or by creating an endpoint using the “Create Endpoint” link at the top of the model page as we show below.

Key challenge: Explain and monitor the model with rapid troubleshooting

Getting complete operational visibility into model performance is a critical challenge today: unlike other software, models can decay in performance and behavior over time, and they are increasingly black boxes. Without this visibility, teams are unable to scale, maintain or assess the ROI of their ML deployments.

Fiddler helps resolve this by integrating with the deployed SageMaker model. For monitoring, Fiddler needs the inference traffic from the deployed model. For explanations, Fiddler needs the ability to invoke the model, along with an optional data sample representative of the model's data, which is used to compute the explanations.

Fiddler explains model inferences and behavior using industry-leading Explainable AI. Explainable AI refers to the process by which the outputs (decisions) of an AI system are explained in terms of its inputs (data). Explainable AI adds a feedback loop to the predictions being made, and empowers teams to address the visibility and transparency challenges of AI systems. Explainable AI is integrated into monitoring so that teams can quickly find the root cause of a raised operational issue, saving considerable time.

Steps to set up SageMaker Data Capture

Let’s walk through how to plug model traffic into Fiddler. This is done with a built-in SageMaker capability called Data Capture, which logs all prediction calls made to the model.

3. Enable Data Capture for the deployed endpoint from a SageMaker notebook.
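The notebook code for this step is not reproduced here. As a minimal sketch, assuming you work with the boto3 SageMaker client rather than the SageMaker Python SDK, the Data Capture settings are passed as a `DataCaptureConfig` section when creating a new endpoint configuration, and the endpoint is then updated to use it. The function names, the variant name, the instance type and the CSV content type are assumptions made for illustration.

```python
def build_data_capture_config(s3_uri, sampling_percentage=100):
    """Return the DataCaptureConfig section of a CreateEndpointConfig request."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("destination must be an s3:// URI")
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": s3_uri,
        # Log both the request payload and the model's response.
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        # Assumption: the endpoint exchanges CSV, so store payloads as plain text.
        "CaptureContentTypeHeader": {"CsvContentTypes": ["text/csv"]},
    }


def enable_data_capture(sm_client, endpoint_name, model_name,
                        new_config_name, s3_uri,
                        instance_type="ml.m5.large"):
    """Create an endpoint config with capture enabled and switch the endpoint to it.

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    sm_client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # hypothetical variant name
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
        DataCaptureConfig=build_data_capture_config(s3_uri),
    )
    sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )
```

In a notebook this would be called with `boto3.client("sagemaker")` as `sm_client` and your own endpoint, model and bucket names.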

4. The endpoint detail page now displays the details of the data capture configuration.

5. Once your model receives traffic, Data Capture saves these predictions in a structured directory hierarchy in S3.
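The directory listing from the original post is not reproduced here. As a sketch, assuming the layout described in the SageMaker documentation (destination prefix, then endpoint name, variant name, and year/month/day/hour folders holding `.jsonl` files), a captured object key can be broken down as follows. The `CaptureKey` type, the function name and the example key are illustrative.

```python
from typing import NamedTuple


class CaptureKey(NamedTuple):
    endpoint: str
    variant: str
    year: str
    month: str
    day: str
    hour: str
    filename: str


def parse_capture_key(key, prefix=""):
    """Split a Data Capture object key into its path components.

    Assumes keys of the form
    <prefix>/<endpoint>/<variant>/yyyy/mm/dd/hh/<file>.jsonl
    """
    rest = key[len(prefix):] if prefix and key.startswith(prefix) else key
    parts = rest.lstrip("/").split("/")
    if len(parts) != 7 or not parts[-1].endswith(".jsonl"):
        raise ValueError(f"unexpected capture key layout: {key!r}")
    return CaptureKey(*parts)
```

For example, `parse_capture_key("capture/lending-endpoint/AllTraffic/2020/07/15/18/file.jsonl", prefix="capture")` yields a `CaptureKey` whose `endpoint` is `"lending-endpoint"` and whose `hour` is `"18"`.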

Here’s what each logged model inference call looks like:
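The logged record from the original post is not reproduced here. As a sketch, assuming the JSON Lines record shape that Data Capture documents (a `captureData` object holding `endpointInput` and `endpointOutput`, plus `eventMetadata`), each line can be unpacked like this. The sample values below are illustrative placeholders, not real LendingClub rows or real model outputs.

```python
import base64
import json

# Illustrative record; field names follow the documented capture format,
# the values are made up.
SAMPLE_LINE = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "10000,36,10.5",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv; charset=utf-8",
            "mode": "OUTPUT",
            "data": "0.12",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "example-event-id",
        "inferenceTime": "2020-07-15T18:26:12Z",
    },
    "eventVersion": "0",
})


def parse_capture_line(line):
    """Extract the request, response and metadata from one captured line."""
    record = json.loads(line)
    capture = record["captureData"]

    def payload(section):
        part = capture.get(section)
        if part is None:  # only one capture mode may have been enabled
            return None
        data = part["data"]
        if part.get("encoding") == "BASE64":
            data = base64.b64decode(data).decode("utf-8")
        return data

    meta = record["eventMetadata"]
    return {
        "event_id": meta["eventId"],
        "inference_time": meta["inferenceTime"],
        "input": payload("endpointInput"),
        "output": payload("endpointOutput"),
    }
```

Calling `parse_capture_line(SAMPLE_LINE)` returns a flat dictionary with the input row, the model output and the event metadata, which is a convenient shape to forward to a monitoring system.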

Steps to integrate a deployed SageMaker model, data and output with Fiddler

Now let's connect this monitoring traffic, along with the model and sample data, to Fiddler. Data science teams can install Fiddler’s Python client package to do this from Jupyter notebooks or an IDE of their choice.

6. Install the Fiddler client in your SageMaker notebook and instantiate a client. You can get the auth token from ‘Settings’ in your Fiddler public or private cloud account.

7. Now upload a sample dataset. We use this to run our explainability algorithms and as a baseline for monitoring metrics. It is also used to generate the schema for this model in Fiddler.

8. Next, we generate and save the model schema, so we can call the model on SageMaker.

9. Fiddler has a default lightweight hook to invoke the model. This is captured in our package.py file. In this case we pass in the model endpoint so Fiddler can invoke the hosted SageMaker model via API.
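The contents of package.py are not reproduced here. As a sketch of the idea only, the core of such a hook is a function that serializes input rows, calls the hosted endpoint through the SageMaker runtime API, and parses the scores that come back. The function names are hypothetical, the CSV request and response formats are an assumption about how the XGBoost endpoint is configured, and the way Fiddler wires this into package.py is not shown.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize feature rows (lists of values) to a headerless CSV string."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def predict_via_endpoint(runtime_client, endpoint_name, rows):
    """Invoke a hosted SageMaker endpoint and return one score per row.

    runtime_client is expected to behave like
    boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    text = response["Body"].read().decode("utf-8")
    # Assumption: scores come back separated by commas or newlines.
    return [float(v) for v in text.replace("\n", ",").split(",") if v.strip()]
```

Passing the client in as an argument keeps the hook easy to exercise offline with a stub before pointing it at the real endpoint.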

10. We then push this proxy for the SageMaker model into Fiddler.

11. Finally, once the model and dataset have been connected, we link the Data Capture traffic to Fiddler so that it is monitored continuously. To do this, we write a Lambda function that notifies Fiddler when a new log file is created on S3. We then add a notification to the Data Capture S3 bucket that triggers this function and pushes the traffic to Fiddler.
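The Lambda function from the original post is not reproduced here. As a sketch under stated assumptions, the handler below reads the bucket and key from a standard S3 notification event, loads each new `.jsonl` capture file, and forwards every logged record. Both `read_object` and `publish_event` are hypothetical callables injected by the caller: the first would wrap an S3 `get_object` call, and the second would wrap whatever event-publishing call your version of the Fiddler client provides.

```python
import json
from urllib.parse import unquote_plus


def new_capture_files(event):
    """Yield (bucket, key) for each object named in an S3 notification event."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def make_handler(read_object, publish_event):
    """Build a Lambda handler that forwards captured records.

    read_object(bucket, key) -> str      hypothetical: returns file contents
    publish_event(record: dict) -> None  hypothetical: sends one record to Fiddler
    """
    def handler(event, context):
        published = 0
        for bucket, key in new_capture_files(event):
            if not key.endswith(".jsonl"):
                continue  # ignore anything that is not a capture file
            for line in read_object(bucket, key).splitlines():
                if line.strip():
                    publish_event(json.loads(line))
                    published += 1
        return {"published": published}

    return handler
```

In a deployed function, `read_object` could be implemented with `boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")`, and the module-level `handler = make_handler(...)` would be registered as the Lambda entry point.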

12. With data, model and traffic connected, you can now monitor and explain drift, outliers, data issues and performance blips, and share dashboards with others.

Conclusion

With Fiddler’s Explainable Monitoring, SageMaker customers can seamlessly explain, validate and monitor their ML deployments for trust, transparency and complete operational visibility to scale their ML practice responsibly and ensure ROI for their AI. 
