Accelerating the Production of AI Solutions with Fiddler and Databricks Integration

We’re excited to announce the Fiddler and Databricks integration! Together, we’re helping companies accelerate the production of AI solutions and streamline their end-to-end MLOps experience.

Simplify the MLOps workflow with Fiddler and Databricks

When Fiddler and Databricks are used together, ML teams simplify their MLOps workflow and create a continuous feedback loop between pre-production and production to ensure ML models are fully optimized and high performing. Data scientists can explore data and train models in the Databricks Lakehouse Platform and validate them in the Fiddler AI Observability platform before launching them into production. Fiddler monitors production models for data drift, performance, data integrity, and traffic behind the scenes, and alerts ML teams as soon as a high-priority model performance dip occurs.

ML teams have a seamless experience in their end-to-end MLOps workflow using Fiddler and Databricks

Fiddler goes beyond measuring model metrics by arming ML teams with a 360° view of their models using rich model diagnostics and explainable AI. Contextual model insights connect model performance metrics to model issues and anomalies, creating a feedback loop in the MLOps workflow between production and pre-production. ML teams can confidently pinpoint areas of model improvement, and go back to earlier stages of the MLOps workflow as early as data exploration and preparation, in Databricks, to explore and gather new data for model retraining.

Get started with Fiddler in minutes

Install and initialize the Fiddler client to validate and monitor models built on Databricks in minutes by following the steps below or as described in our documentation:
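If you haven’t already connected the client, setup might look like the following sketch. The URL, org ID, and token below are placeholders, not real values; check the Fiddler documentation for the connection details that match your client version:

```python
# Install the client first: pip install fiddler-client
import fiddler as fdl

# Placeholder credentials -- replace with your own deployment's values
fiddler_client = fdl.FiddlerApi(
    url="https://your-org.fiddler.ai",  # your Fiddler instance URL (placeholder)
    org_id="your_org",                  # your organization ID (placeholder)
    auth_token="YOUR_AUTH_TOKEN")       # token from your Fiddler settings page
```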

1. Upload a baseline dataset

Retrieve your pre-processed training data from a Delta table. Then load it into a data frame and pass it to Fiddler:

baseline_dataset = spark.read.table("YOUR_DATASET").select("*").toPandas()

dataset_info = fdl.DatasetInfo.from_dataframe( 
    baseline_dataset, 
    max_inferred_cardinality=100)

fiddler_client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={'baseline': baseline_dataset},
    info=dataset_info)
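To build intuition for what a <code>max_inferred_cardinality</code>-style cutoff does, here is a rough standalone illustration (a hypothetical helper, not Fiddler’s actual implementation): a column whose distinct-value count is at or below the cutoff is treated as categorical rather than continuous.

```python
def infer_categorical_columns(columns, max_inferred_cardinality=100):
    """columns: dict mapping column name -> list of values.

    Returns the names of columns whose cardinality (number of distinct
    values) is at or below the cutoff, i.e. likely categoricals.
    """
    return [
        name for name, values in columns.items()
        if len(set(values)) <= max_inferred_cardinality
    ]

baseline = {
    "region": ["us", "eu", "us", "apac"],           # 3 distinct values
    "loan_amount": [1200.5, 870.0, 15000.0, 99.9],  # 4 distinct values
}
print(infer_categorical_columns(baseline, max_inferred_cardinality=3))
# -> ['region']
```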

2. Upload model metadata

Use the MLflow API to query the model registry and retrieve the model signature, which describes the model’s inputs and outputs as a dictionary:

import mlflow
from mlflow.tracking import MlflowClient

# Retrieve the registered model's artifact location
client = MlflowClient()
model_uri = client.get_model_version_download_uri(model_name, model_version)

# Load the model metadata and convert its signature to a dictionary
mlflow_model_signature = mlflow.models.Model.load(model_uri).signature.to_dict()
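As a point of reference, the signature dictionary has roughly the shape below (a hedged sketch; exact fields and the example column names are illustrative and depend on your MLflow version). The input and output schemas arrive as JSON-encoded strings:

```python
import json

# Approximate shape of an MLflow model signature dictionary; the column
# names here are placeholders, not a fixed schema.
mlflow_model_signature = {
    "inputs": json.dumps([
        {"name": "loan_amount", "type": "double"},
        {"name": "region", "type": "string"},
    ]),
    "outputs": json.dumps([{"type": "double"}]),
}

# Decode the input schema to inspect the declared feature columns
input_schema = json.loads(mlflow_model_signature["inputs"])
print([col["name"] for col in input_schema])
# -> ['loan_amount', 'region']
```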

Now you can share the model signature with Fiddler as part of the Fiddler <code>ModelInfo</code> object:

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=fiddler_client.get_dataset_info(PROJECT_ID, DATASET_ID),
    target="TARGET_COLUMN",
    # optional arguments
    mlflow_params=fdl.MLFlowParams(mlflow_model_signature))

fiddler_client.add_model(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    model_id=MODEL_ID,
    model_info=model_info)

3. Publish events

Live models: Format each inference as an event and send it to Fiddler using <code>client.publish_event()</code>:

fiddler_client.publish_event(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    event=my_event,
    event_timestamp=xxxxxxxxxxx)
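An event is a flat dictionary of feature values plus the model output, paired with an epoch-millisecond timestamp. The sketch below shows one way to assemble it; the field names are hypothetical placeholders, not a required schema:

```python
import time

# Hypothetical event payload: input features plus the model's prediction.
# Field names are placeholders -- use your own model's columns.
my_event = {
    "loan_amount": 1200.5,
    "region": "us",
    "predicted_default_prob": 0.07,
}

# Timestamp of the inference, in epoch milliseconds
event_timestamp = int(time.time() * 1000)
print(sorted(my_event), event_timestamp > 0)
```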

Batch models: Use the Delta change data feed to read the new inferences into a data frame:

changes_df = spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", last_version) \
    .option("endingVersion", new_version) \
    .table("inferences").toPandas()

fiddler_client.publish_events_batch(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    batch_source=changes_df,
    timestamp_field='timestamp')
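The snippet above assumes you track <code>last_version</code> and <code>new_version</code> between runs. One simple way to do that (a sketch under our own assumptions, not part of the Fiddler or Databricks APIs) is a small state file that remembers the last Delta table version already published:

```python
import json
import os
import tempfile

# Sketch: persist the last published Delta table version between batch
# runs, so each run reads only the new change-feed range.
def load_last_version(state_path):
    if os.path.exists(state_path):
        with open(state_path) as f:
            return json.load(f)["last_version"]
    return 0  # first run: start from the beginning of the change feed

def save_last_version(state_path, version):
    with open(state_path, "w") as f:
        json.dump({"last_version": version}, f)

state_path = os.path.join(tempfile.mkdtemp(), "fiddler_feed_state.json")
last_version = load_last_version(state_path)   # 0 on the first run
save_last_version(state_path, last_version + 5)  # e.g. published up to v5
print(load_last_version(state_path))
# -> 5
```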

Contact our AI experts to learn how enterprises are accelerating AI solutions with streamlined end-to-end MLOps using Databricks and Fiddler together.