Model download for TF Serving

TF Serving is part of the TensorFlow ecosystem and provides a high-performance serving system for machine learning models in production environments.

Prerequisites

  • TF Serving is easiest to use with Docker. Install Docker and pull the image tensorflow/serving to your system.

  • Download a model in the SavedModel format from the Peltarion platform.

Deploy a TF Serving server

TF Serving requires a specific file tree structure to find the model artifacts. The path to a model should follow the pattern <model_base_path>/<model_name>/<version>/ and the SavedModel artifacts should be placed underneath that directory. For example, with the base path models, a model named mnist, and version 1, the file tree looks as follows:

models/
└── mnist
    └── 1
        ├── feature_mapping.json
        ├── assets
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index

The file bundle that you downloaded from Peltarion has a different file structure, so restructure and rename the files to match the format above. Note that feature_mapping.json is not required by TF Serving, but it is created by Peltarion and needed to create input data in the format the model expects.
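
As a sketch of how the restructuring could be scripted, the snippet below copies the artifacts into the layout that TF Serving expects. The source directory peltarion_export/ is an assumption; adjust it to the actual layout of your downloaded bundle.

import shutil
from pathlib import Path

# Hypothetical location of the extracted Peltarion bundle; adjust to match
# the actual layout of your download.
src = Path("peltarion_export")

# Target layout expected by TF Serving: <model_base_path>/<model_name>/<version>/
dst = Path("models") / "mnist" / "1"
dst.mkdir(parents=True, exist_ok=True)

# Copy the SavedModel artifacts and the Peltarion feature mapping.
shutil.copy(src / "saved_model.pb", dst / "saved_model.pb")
shutil.copy(src / "feature_mapping.json", dst / "feature_mapping.json")
for folder in ("assets", "variables"):
    shutil.copytree(src / folder, dst / folder, dirs_exist_ok=True)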

Start a TF Serving Docker container with the following command. It mounts all the models saved in ./models/ into the container. The service loads the latest available version of the <model_name> model and exposes two APIs: the gRPC API on port 8500 and the REST API on port 8501.

docker run \
    -p 8500:8500 \
    -p 8501:8501 \
    --mount type=bind,source=${PWD}/models,target=/models \
    -e MODEL_NAME=<model_name> \
    -t tensorflow/serving

There are many options available to control the lifecycle of models in TF Serving; see TF Serving Config for a complete walk-through of the options.
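
To verify that the server has loaded the model, you can query the model status endpoint of the REST API. A minimal sketch, assuming the container from the previous step is running locally with MODEL_NAME set to mnist:

import requests

# The model status endpoint reports which versions of the model are loaded
# and their current state.
r = requests.get("http://127.0.0.1:8501/v1/models/mnist")
print(r.json())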

Get predictions from the server

This section demonstrates how a Python client can call the TF Serving server we just deployed. TF Serving exposes three REST API endpoints, as described in TF Serving REST API, which define corresponding interfaces for the model:

  • Predict

  • Regress

  • Classify

The predict endpoint is a low-level interface available for all models, whereas regress is only available for regression models and classify only for classification models. The latter two are high-level APIs that perform additional pre-processing on the server side to simplify the client code.

For demonstration purposes, assume that we have deployed an MNIST classification model with one input feature named Image. Both the predict and classify endpoints are demonstrated below.

Predict endpoint

The predict endpoint performs minimal pre-processing before feeding the request to the model. All Peltarion models expect the input features to be serialized into tf.train.Example objects, so we have to perform that serialization on the client side before sending a request.

Serialize the input data

Using the utility functions and the feature_mapping.json file, we can easily serialize the features into a tf.train.Example as follows:

# The input path may vary depending on how and where you have placed the utility function
from utils.parse_input import TrainExampleSerializer

feats = TrainExampleSerializer(
    feature_mapping_file_path="<relative path to feature_mapping.json>"
)

# Read a file from the MNIST dataset with the correct resolution
with open("mnist_pic_1.png", "rb") as f:
    mnist_bytes = f.read()

# Use the utility function to serialize the image into a tf.train.Example
example_tensor = feats.serialize_data(
    {
        "Image": mnist_bytes
    }
)
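
For reference, the serialization performed by the utility corresponds roughly to building a tf.train.Example by hand. The sketch below uses plain TensorFlow and ignores any type handling that the utility derives from feature_mapping.json:

import tensorflow as tf

# Wrap the raw image bytes in a bytes_list feature named "Image" and
# serialize the resulting tf.train.Example to a byte string. This is only
# for illustration; the rest of the guide uses example_tensor from the
# utility above.
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "Image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[mnist_bytes])
            )
        }
    )
)
serialized_example = example.SerializeToString()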

Call the TF Serving predict endpoint

The URL for the predict endpoint is:

POST http://host:port/v1/models/${MODEL_NAME}[/versions/${VERSION}|/labels/${LABEL}]:predict

The request body for the predict API must be a JSON object formatted as follows:

{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input Tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them but NOT both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}

Using the row format and the serving_all signature, we can call the predict endpoint as shown below. See TFX Encoding Binary Values for more information about how TF Serving handles encoding of binary data.

import base64
import json
import requests

# URL for the REST API predict endpoint for the MNIST model
URL = "http://127.0.0.1:8501/v1/models/mnist:predict"

# base64 encode the tf.train.Example
b64_example = base64.encodebytes(example_tensor.numpy()).decode("utf-8")

# Build the data to send in the request. For binary data, use the nested
# object {"b64": <binary_data>} to instruct TF Serving to decode it on the
# server side.
data = {
    "signature_name": "serving_all",
    "instances": [
        {
            "examples": {"b64": b64_example}
        }
    ]
}

headers = {
    "Content-type": "application/json"
}

# Send POST request and save the response
r = requests.post(url=URL, headers=headers, data=json.dumps(data))

print(r.json())
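
The predict response is a JSON object with a predictions field that contains one entry per instance in the request; the exact layout of each entry depends on the outputs of the serving signature. A sketch of extracting the first one:

# One prediction per instance sent in the request.
predictions = r.json()["predictions"]
print(predictions[0])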

Classify endpoint

The classify and regress endpoints perform more pre-processing before feeding the data to the model. They expect the request to contain the list of features, which the server serializes into a tf.train.Example before feeding it to the model. This simplifies the code on the client side: we only need to map the feature names and base64 encode any binary features.

Serialize the input data

Using the utility functions and the feature_mapping.json file, we can prepare the features as follows:

# The input path may vary depending on how and where you have placed the utility function
from utils.parse_input import RegressClassifySerializer

feats = RegressClassifySerializer(
    feature_mapping_file_path="<relative path to feature_mapping.json>"
)

# Read a file from the MNIST dataset with the correct resolution
with open("mnist_pic_1.png", "rb") as f:
    mnist_bytes = f.read()

processed_features = feats.serialize_data(
    {
        "Image": mnist_bytes,
    }
)

Call the TF Serving classify endpoint

The URL for the classify endpoint is:

POST http://host:port/v1/models/${MODEL_NAME}[/versions/${VERSION}|/labels/${LABEL}]:classify

The request body for the classify and regress APIs must be a JSON object formatted as follows:

{
  // Optional: serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Optional: Common context shared by all examples.
  // Features that appear here MUST NOT appear in examples (below).
  "context": {
    "<feature_name2>": <value>|<list>,
    ...
  },

  // List of Example objects
  "examples": [
    {
      // Example 1
      "<feature_name1>": <value>|<list>,
      ...
    },
    {
      // Example 2
      "<feature_name1>": <value>|<list>,
      ...
    }
    ...
  ]
}

Using the classify signature, we can call the classify endpoint as shown below. See TFX Encoding Binary Values for more information about how TF Serving handles encoding of binary data.

import base64
import json
import requests

# URL for the REST API classify endpoint for the MNIST model
URL = "http://127.0.0.1:8501/v1/models/mnist:classify"

data = {
    "signature_name": "classify",
    "examples": [processed_features]
}

headers = {
    "Content-type": "application/json"
}

# Send POST request and save the response
r = requests.post(url=URL, headers=headers, data=json.dumps(data))

print(r.json())
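
As described in the TF Serving REST API, the classify response contains a result field with one list of (label, score) pairs per example in the request. A sketch of picking the highest-scoring class for the single example we sent:

# One list of (label, score) pairs per example in the request.
result = r.json()["result"]
best_label, best_score = max(result[0], key=lambda pair: pair[1])
print(f"Predicted class: {best_label} (score: {best_score:.4f})")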