Deploying models converted to ONNX format

Last updated: Dec 19, 2024

You can deploy and run inference on machine learning models that were created in frameworks such as PyTorch or TensorFlow and then converted to the Open Neural Network Exchange (ONNX) format. ONNX is an open-source format for representing deep learning models. Developers can train a model in one framework, such as PyTorch or TensorFlow, and then export it to ONNX to run in another environment with different performance characteristics. After a machine learning model is converted to ONNX, you can perform inferencing by using ONNX Runtime.

Benefits of converting models to ONNX format

Converting a model to ONNX format and running it with ONNX Runtime offers several benefits for machine learning and deep learning applications:

Cross-platform compatibility: ONNX provides a standard format for representing machine learning models, which makes it easier to deploy models that were built in different frameworks, such as PyTorch or TensorFlow. You can train a model in one framework and deploy it in any environment that supports ONNX Runtime.
Improved performance: ONNX Runtime optimizes models for inferencing by applying hardware-specific and software-specific optimizations, such as graph optimizations. It also supports execution on diverse hardware, such as CPUs and GPUs, for efficient use of resources.
Interoperability: You can train a model in a framework such as PyTorch, TensorFlow, or scikit-learn and then export it to run in another environment, which streamlines workflows. ONNX breaks down the barriers between deep learning frameworks, so developers can use the strengths of different libraries without being locked into a single ecosystem.

Supported frameworks for conversion

You can convert machine learning models that use the following frameworks to ONNX format:

PyTorch
TensorFlow

Converting PyTorch models to ONNX format

Follow this process to convert your trained model in PyTorch to the ONNX format:

Import libraries: Start by importing the essential libraries, such as onnxruntime for running the model, torch for PyTorch functionalities, and other libraries required for your application.
Create or download PyTorch model: You can create a PyTorch model by using your own data set or use models provided by external open source model repositories like Hugging Face.
Convert PyTorch model to ONNX format: To convert the PyTorch model to ONNX format:

a. Prepare the model: Ensure that your PyTorch model is in evaluation mode by calling the model.eval() method. You typically also need a dummy input tensor that matches the input shape that the model expects.

b. Export the model: Use the torch.onnx.export function to convert the model to ONNX format.
Verify the conversion: After converting the model, verify that the model is functioning as expected by using the onnx library.

Converting TensorFlow models to ONNX format

Follow this process to convert your TensorFlow model to the ONNX format:

Import libraries: Start by importing the essential libraries, such as tf2onnx to facilitate conversion of TensorFlow models to ONNX, and other libraries required for your application.
Download TensorFlow model: You must download the externally created TensorFlow model and the data that is used for training the model.
Convert TensorFlow model to ONNX format: Use the tf2onnx.convert command to convert a TensorFlow model that is saved in the SavedModel format to ONNX format. If you want to convert a TensorFlow Lite model, use the --tflite flag instead of the --saved-model flag.
Verify the conversion: After converting the model, verify that the model is functioning as expected by using the onnx library.
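
The conversion step above can be sketched in Python as follows. The helper names build_tf2onnx_command and convert, the paths my_saved_model and model.onnx, and the opset value are hypothetical and chosen only for illustration. The sketch assembles the tf2onnx.convert command line. It runs the converter only if you call convert, which assumes that the tf2onnx and tensorflow packages are installed.

```python
import subprocess
import sys


def build_tf2onnx_command(model_path, output_path, opset=None, tflite=False):
    """Return the argument list for the tf2onnx command-line converter."""
    # A TensorFlow Lite model uses --tflite in place of --saved-model.
    source_flag = "--tflite" if tflite else "--saved-model"
    command = [
        sys.executable, "-m", "tf2onnx.convert",
        source_flag, model_path,
        "--output", output_path,
    ]
    if opset is not None:
        command += ["--opset", str(opset)]
    return command


def convert(model_path, output_path, **options):
    """Run the conversion; fails if tf2onnx or tensorflow is not installed."""
    subprocess.run(build_tf2onnx_command(model_path, output_path, **options), check=True)


command = build_tf2onnx_command("my_saved_model", "model.onnx", opset=17)
print(" ".join(command[1:]))
# -m tf2onnx.convert --saved-model my_saved_model --output model.onnx --opset 17
```

After the conversion, you can load the output file with onnx.load and pass it to onnx.checker.check_model to verify that the result is a well-formed ONNX model.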

Additional considerations

Here are some additional considerations for converting your models to ONNX format:

Dynamic axes: A model can use dynamic axes to handle variable input shapes, such as dynamic batch sizes or sequence lengths. Use dynamic axes if your model is deployed in applications where the input dimensions can vary.

Dynamic axes can be applied to multiple inputs and outputs, so the model adapts to different shapes without being re-exported, which also reduces memory overhead. You can specify the dynamic axes during model export in PyTorch or TensorFlow.
Opset version: The opset version in ONNX determines the set of operations and their specifications that are supported by the model. It is a critical factor during model conversion and deployment.

Different ONNX runtimes and frameworks support specific opset versions. Older opset versions may lack features or optimizations present in newer versions. Incompatibility between a model's opset version and the ONNX runtime can cause errors during inferencing. You must ensure that the ONNX opset version that you choose is supported by your target runtime.

Deploying models converted to ONNX format

Use the onnxruntime_opset_19 software specification to deploy your machine learning model converted to ONNX format. For more information, see Supported software specifications.

To deploy models converted to ONNX format from the user interface, follow these steps:

In your deployment space, go to the Assets tab.
Find your model in the asset list, click the Menu icon, and select Deploy.
Select the deployment type for your model. Choose between online and batch deployment options.
Enter a name for your deployment and optionally enter a serving name, description, and tags.
Note:
- Use the Serving name field to specify a name for your deployment instead of deployment ID.
- The serving name must be unique within the namespace.
- The serving name must contain only these characters: [a-z,0-9,_] and must be at most 36 characters long.
- In workflows where your model is used periodically, consider assigning your model the same serving name each time you deploy it. This way, after you delete and then redeploy the model, you can keep using the same endpoint in your code.
Select a hardware specification for your model.
Select a configuration and a software specification for your model.
Click Create.
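
If you generate serving names in code, you can check them against the rules in the note before you create the deployment. The following sketch is an illustration only. The is_valid_serving_name helper is hypothetical, and it assumes that the commas in [a-z,0-9,_] are separators rather than allowed characters. It does not check uniqueness within the namespace, which only the service can verify.

```python
import re

# Lowercase letters, digits, and underscores; 1 to 36 characters.
SERVING_NAME_PATTERN = re.compile(r"[a-z0-9_]{1,36}")


def is_valid_serving_name(name):
    """Return True if the name satisfies the character and length rules."""
    return SERVING_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_serving_name("onnx_digits_v1"))  # True
print(is_valid_serving_name("ONNX-Digits"))     # False
```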

Testing the model

Follow these steps to test your deployed models converted to ONNX format:

In your deployment space, open the Deployments tab and click the deployment name.
Click the Test tab to input prompt text and get a response from the deployed asset.
Enter test data in one of the following formats, depending on the type of asset that you deployed:
- Text: Enter text input data to generate a block of text as output.
- Stream: Enter text input data to generate a stream of text as output.
- JSON: Enter JSON input data to generate output in JSON format.
Click Generate to get results that are based on your prompt.
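
For the JSON option, the following sketch shows one way to assemble test input. The input_data structure with values and optional fields is an assumption based on the payload format that online scoring deployments commonly accept. The build_scoring_payload helper and the sample row are hypothetical. Check the input schema of your own model before you use it.

```python
import json


def build_scoring_payload(rows, fields=None):
    """Wrap rows of input values in the assumed input_data structure."""
    entry = {"values": rows}
    if fields is not None:
        entry["fields"] = fields  # optional column names
    return {"input_data": [entry]}


payload = build_scoring_payload([[0.1, 0.2, 0.3, 0.4]])
print(json.dumps(payload))
# {"input_data": [{"values": [[0.1, 0.2, 0.3, 0.4]]}]}
```

Each inner list is one input record, so its length and nesting must match the input shape that the converted model expects.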

Sample notebooks

The following sample notebooks demonstrate how to deploy machine learning models converted from PyTorch or TensorFlow to the ONNX format by using the Python client library:

Notebook	Framework	Description
Convert ONNX neural network from fixed axes to dynamic axes and use it with ibm-watsonx-ai	ONNX	Set up the environment. Create and export basic ONNX model. Convert model from fixed axes to dynamic axes. Persist converted ONNX model. Deploy and score ONNX model. Clean up. Summary and next steps.
Use ONNX model converted from PyTorch with ibm-watsonx-ai	ONNX	Create PyTorch model with dataset. Convert PyTorch model to ONNX format. Persist converted model in Watson Machine Learning repository. Deploy model for online scoring using client library. Score sample records using client library.
Use ONNX model converted from TensorFlow to recognize hand-written digits with ibm-watsonx-ai	ONNX	Download an externally trained TensorFlow model with dataset. Convert TensorFlow model to ONNX format. Persist converted model in Watson Machine Learning repository. Deploy model for online scoring using client library. Score sample records using client library.
