Model download for MLflow

MLflow Models is a cross-platform standard format for packaging machine learning models so that they can be used in a variety of downstream tools.

MLflow integrates easily with frameworks like:

This page will show an end-to-end example of how you can package a TensorFlow SavedModel, as downloaded from the Peltarion Platform, and deploy it with the MLflow Server which comes bundled with MLflow. This bundled server is quick to get started with, but consider other options for a production system.

The example uses an MNIST classifier as the model.


  • Set up a Python environment and install MLflow with pip install mlflow

  • Download a model in the SavedModel format from the Peltarion Platform

Converting a SavedModel to an MLflow Model

The first step in using a SavedModel with MLflow is to package it as an MLflow Model. There are two ways of doing this for TensorFlow models:

  1. Using the specialized TensorFlow flavor. See mlflow.tensorflow for the complete API.

  2. Using the general PyFunc flavor which is a Python interface that can wrap arbitrary models, including TensorFlow. See mlflow.pyfunc for the complete API.

The second option allows more flexibility in encoding and decoding the model’s input and output data, so we’ll use it in this guide. The API for creating an MLflow Model package with the PyFunc flavor and a custom model wrapper is:

mlflow.pyfunc.save_model(path, loader_module=None, data_path=None, ...)

where the loader_module is the name of a Python module that contains a function called _load_pyfunc(), which loads the model from data_path and returns a PyFunc-compatible model wrapper. The easiest way to create such a module is to write the code in a separate file and import it, for example into a notebook, so that the module can be referenced by name. When running the code as a Python script, it may be possible to keep everything in the same file and reference __name__ instead.
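
The loader-module contract can be illustrated without MLflow or TensorFlow. The sketch below is a stand-in under stated assumptions, not MLflow's actual loading code: it registers a hypothetical module named demo_loader, resolves it by name, and calls its _load_pyfunc() function, which is roughly what any name-based loader has to do. The names demo_loader, _EchoWrapper, and load_by_module_name are illustrative only.

```python
import importlib
import sys
import types


class _EchoWrapper:
    """Hypothetical stand-in for a PyFunc-compatible model wrapper."""

    def __init__(self, data_path):
        self.data_path = data_path

    def predict(self, input_data):
        # A real wrapper would run inference; this one only echoes its input.
        return {"echo": list(input_data)}


def _load_pyfunc(data_path):
    """Entry point that a loader module is expected to expose."""
    return _EchoWrapper(data_path)


# Register a throwaway module so that it can be found by name, the same way
# a separate file becomes findable once it is importable.
demo_loader = types.ModuleType("demo_loader")
demo_loader._load_pyfunc = _load_pyfunc
sys.modules["demo_loader"] = demo_loader


def load_by_module_name(module_name, data_path):
    """Resolve a module by name and delegate to its _load_pyfunc()."""
    module = importlib.import_module(module_name)
    return module._load_pyfunc(data_path)


# "path/to/saved_model" is a placeholder, not a real location.
model = load_by_module_name("demo_loader", "path/to/saved_model")
result = model.predict(["a", "b"])
```

Because only the module name is passed around, the module has to be importable in whichever process later loads the model.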

Peltarion models are defined with an input signature that expects a tf.train.Example, which is a tensor with binary data. Binary data cannot be placed in a JSON request body as-is, so it has to be base64 encoded first. MLflow does not provide flexible support for this, but it can be added to the client code and the model wrapper. The predict() method assumes that the input arrives as a (-1, 1)-shaped numpy array of base64-encoded strings, so it flattens and decodes the data into a 1D tensor before passing it to the inference function.
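
The encode, flatten, and decode round trip described above can be sketched with the standard library alone. This is an illustration under assumptions, not the wrapper's code: the payload bytes stand in for a serialized tf.train.Example, nested lists stand in for the (-1, 1)-shaped array, and base64.urlsafe_b64decode plays the role that a TensorFlow decoding operation would play inside predict().

```python
import base64
import json

# Stand-in for a serialized tf.train.Example: arbitrary binary data that is
# not valid UTF-8 and therefore cannot be placed in a JSON document directly.
raw_example = bytes([0x0A, 0xFF, 0x00, 0xFE, 0x10])

# Client side: encode the bytes as URL-safe base64 text.
encoded = base64.urlsafe_b64encode(raw_example).decode("utf-8")

# One row with one column, mirroring the (-1, 1) input shape.
batch = [[encoded]]
wire_payload = json.dumps(batch)

# Server side: parse the JSON, flatten to 1D, and decode each item.
received = json.loads(wire_payload)
flat = [item for row in received for item in row]
decoded = [base64.urlsafe_b64decode(item) for item in flat]
```

The URL-safe alphabet is used on both sides here; mixing the standard and URL-safe alphabets between client and server would corrupt any payload that contains the differing characters.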

The predictions are returned as a dictionary, with keys corresponding to the output block names and the values in tensors, so the values are transformed to numpy arrays that can be JSON-serialized before being returned over HTTP.

Write a file with the following content:

import tensorflow as tf

class _TFModelWrapper:
    def __init__(self, model_uri):
        self.model = tf.saved_model.load(model_uri)
        self.infer = self.model.signatures["serving_default"]

    def predict(self, input_data):
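        print("Running prediction...")
        # Flatten the (-1, 1)-shaped array of base64 strings to 1D
        input_data = input_data.ravel()
        input_tensor = tf.convert_to_tensor(input_data, dtype=tf.string)
        # Decode the web-safe base64 strings back into binary data
        input_tensor_decoded = tf.io.decode_base64(input_tensor)
        preds = self.infer(input_tensor_decoded)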
        preds_numpy = {k: v.numpy() for k, v in preds.items()}
        return preds_numpy

def _load_pyfunc(data_path):
    """Load PyFunc implementation. Called by ``pyfunc.load_pyfunc``."""
    return _TFModelWrapper(data_path)

With this module imported into the current process, we can call the save_model() API as follows. It is important to give the input schema definition as shown here, as it determines the pre-processing and validation that MLflow does before forwarding the data to the model wrapper.

import os
import shutil

import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec
import numpy as np

# Note that this is the file we created just above. Modify the import path as needed.
import tf_loader

output_path = "mnist_mlflow_pyfunc"

# Delete any previous model at the path to avoid overwrite errors
if os.path.exists(output_path):
    try:
        shutil.rmtree(output_path)
    except OSError as e:
        print("Error: %s - %s." % (e.filename, e.strerror))

# Define a Tensor-based data schema for the MNIST model
input_schema = Schema([TensorSpec(np.dtype("bytes"), (-1, 1)),])
output_schema = Schema([TensorSpec(np.dtype(np.float32), (-1, 10))])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

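# Save the model in the MLflow Model format
# Placeholder: set this to the directory of the SavedModel downloaded from the Peltarion Platform
saved_model_dir = "path/to/saved_model"
mlflow.pyfunc.save_model(
    path=output_path,
    loader_module=tf_loader.__name__,
    data_path=saved_model_dir,
    signature=signature,
)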

Deploy an MLflow Server

Deploying the MLflow Server is straightforward from a terminal. Assuming that we are already in a Python environment with all dependencies available, run the following command, where the -m argument references the directory where the MLflow Model package is located.

mlflow models serve --no-conda -m mnist_mlflow_pyfunc

Get a prediction from the server

This section demonstrates how a Python client can call the MLflow server we just deployed. As mentioned, we have deployed an MNIST classification model, which has one input feature named Image.

Serialize the input data

Using the utility functions and the feature_mapping.json file, we can easily serialize the features into a tf.train.Example as follows:

# The input path may vary depending on how and where you have placed the utility function
from utils.parse_input import TrainExampleSerializer

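# Assumption: the serializer is constructed from the feature_mapping.json file;
# the original arguments are missing here, so adjust them to match the utility.
feats = TrainExampleSerializer("feature_mapping.json")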

# Read a file from the MNIST dataset with the correct resolution
with open("mnist_pic_1.png", "rb") as f:
    mnist_bytes = f.read()

# Use the utility function to serialize the image into a tf.train.Example
example_tensor = feats.serialize_data({"Image": mnist_bytes})

Call the MLflow Model Server REST API

MLflow can deploy models locally as REST API endpoints or to directly score files. In addition, MLflow can package models as self-contained Docker images with the REST API endpoint. The image can be used to safely deploy the model to various environments such as Kubernetes. See Deploy MLflow Models for the full details. The endpoint is

POST http://host:port/invocations

and it supports data in these structures:

  • JSON-serialized pandas DataFrames in the records orientation

  • JSON-serialized pandas DataFrames in the split orientation (not tested)

  • CSV-serialized pandas DataFrames (not tested)
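
To make the difference between the two JSON orientations concrete, the sketch below builds both by hand for a one-column table. It assumes a column named Image, as in this example, and uses placeholder strings instead of real base64 data. The layouts follow the pandas records and split orientations; pandas also emits an index field in its split output, which is omitted here.

```python
import json

column = "Image"
rows = ["<base64 string 1>", "<base64 string 2>"]  # placeholder values

# records orientation: one JSON object per row, keyed by column name.
records_payload = [{column: value} for value in rows]

# split orientation: column names listed once, rows as lists of values.
split_payload = {
    "columns": [column],
    "data": [[value] for value in rows],
}

records_json = json.dumps(records_payload)
split_json = json.dumps(split_payload)
```

The split layout avoids repeating the column name for every row, which matters more as the number of rows and columns grows.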

Using the pandas-records format, we can call the REST API like this:

import base64
import json
import requests

# URL for the REST API invocations endpoint
# Replace host and port with the address of your MLflow server
URL = "http://host:port/invocations"

# base64 encode the tf.train.Example
b64_example = base64.urlsafe_b64encode(example_tensor.numpy()).decode("utf-8")

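# Assumed reconstruction: the original column names and values are missing here.
# The model has a single input feature, Image, holding the base64 string.
data = {
    "columns": ["Image"],
    "data": [[b64_example]],
}

headers = {
    "Content-type": "application/json",
    "format": "pandas-records",
}
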
# Send a POST request and save the response as a response object
r = requests.post(URL, headers=headers, data=json.dumps(data))