
PyTorch and ML.NET Inference Performance Comparison

Let’s say you have a working, developer-friendly .NET ecosystem. You run a lot of services, and your team doesn’t cherish the idea of building one outside .NET. On top of that, there is a pending request to develop software that serves some machine learning models.

Most machine learning models are created with Python and libraries like scikit-learn, PyTorch, or TensorFlow. This raises a question: should the team simply implement a Python service with one of these libraries, or would using ML.NET, for the sake of ecosystem consistency, be a better solution? To find some rationale behind each option, I ran inference on the ResNet18 deep learning model with both PyTorch and ML.NET and compared their performance.

PyTorch Performance

PyTorch is a widely known open-source library for deep learning. It’s no wonder that most researchers use it to create state-of-the-art models, and it’s a popular choice for Python developers who need to evaluate trained models.

To measure the model’s inference times I used the following hardware and environment:

  • GeForce GTX 1660.

  • AMD Ryzen 5 3600.

  • Ubuntu 18.04.

  • CUDA Toolkit 10.1.

  • cuDNN 7.0.

  • torch 1.7.1.

  • torchvision 0.8.2.

The ResNet18 model is obtained from the PyTorch Hub. To get the inference times, I made 10,000 inferences on both GPU and CPU. At this point I wasn’t interested in the model’s output, so a random tensor sufficed as input; that is also why I didn’t scale the input tensor. I used the following code to measure the inference time:

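A minimal sketch of that measurement, assuming a plain `time.time()` loop with an explicit CUDA synchronization after each call (the Hub tag and loop details are my assumptions):

```python
import time
import torch

# Load ResNet18 from the PyTorch Hub with pretrained weights.
model = torch.hub.load("pytorch/vision:v0.8.2", "resnet18", pretrained=True)
model.eval()


def measure(device, n=10000):
    """Run n inferences on the given device and return per-call times."""
    dev = torch.device(device)
    net = model.to(dev)
    x = torch.rand(1, 3, 224, 224, device=dev)  # random, unscaled input
    times = []
    with torch.no_grad():
        for _ in range(n):
            start = time.time()
            net(x)
            if dev.type == "cuda":
                torch.cuda.synchronize()  # wait for the GPU kernels to finish
            times.append(time.time() - start)
    return times


cpu_times = measure("cpu")
gpu_times = measure("cuda")
```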

Running the code produced the following distributions:

[Figure: PyTorch CPU and GPU inference time]

The mean inference time was `0.026` seconds on CPU and `0.001` seconds on GPU, with standard deviations of `0.003` and `0.0001` respectively. GPU execution was more than an order of magnitude faster, which is what was expected.

ML.NET Introduction

ML.NET is a machine learning framework built for .NET developers. It has built-in AutoML, which lets you avoid choosing the best type of model manually. However, it lacks the ability to define and train your own neural networks, so you need to export your model from PyTorch in the ONNX format and import it, as in the code below:

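A minimal sketch of the export, assuming the tensor names `input` and `output` and the file name `resnet18.onnx`:

```python
import torch

model = torch.hub.load("pytorch/vision:v0.8.2", "resnet18", pretrained=True)
model.eval()

# A dummy input fixes the graph's input shape during tracing.
dummy_input = torch.rand(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",          # output file name, my assumption
    input_names=["input"],    # tensor names referenced later from ML.NET
    output_names=["output"],
)
```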

ML.NET relies heavily on the ONNX Runtime accelerator to run deep learning models: all the inferences are made through ONNX Runtime.

To conduct the ML.NET inferences I used the same hardware but a different environment, because of ML.NET’s requirements:

  • Ubuntu 18.04.

  • CUDA Toolkit 10.2.

  • cuDNN 8.

I will describe the ML.NET code more thoroughly because the framework is much less popular than PyTorch. As the first step, we need to install the NuGet packages with ML.NET and ONNX Runtime:

  • Microsoft.ML 1.5.4.

  • Microsoft.ML.OnnxRuntime.Gpu 1.6.0.

  • Microsoft.ML.OnnxTransformer 1.5.4.

Before trying to make any inferences, we need to create two classes representing the input and output of the model:

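A minimal sketch of the two classes, assuming the ONNX tensors were exported as `input` and `output`:

```csharp
using Microsoft.ML.Data;

// Input: a flattened 1x3x224x224 image tensor. The column name must
// match the input tensor name baked into the ONNX model.
public class ModelInput
{
    [VectorType(1, 3, 224, 224)]
    [ColumnName("input")]
    public float[] Input { get; set; }
}

// Output: the raw scores for the 1000 ImageNet classes.
public class ModelOutput
{
    [VectorType(1, 1000)]
    [ColumnName("output")]
    public float[] Output { get; set; }
}
```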

The most important thing is to create an `MLContext` object. This object is central to the ML.NET framework: every inference and preprocessing operation is defined through it. A nice feature is that you can set a random seed to make operations deterministic.

As the next step, you can define a model estimator, fit it so that it learns the input data schema, and finally create a prediction engine. This can be achieved with the following code:

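A minimal sketch of that step, using the `ModelInput`/`ModelOutput` classes above and assuming the `resnet18.onnx` file from the export (pass `gpuDeviceId: null` instead of `0` for the CPU run):

```csharp
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext(seed: 1);  // fixed seed keeps operations deterministic

// Define the ONNX scoring estimator; gpuDeviceId: 0 selects the first CUDA device.
var pipeline = mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames: new[] { "output" },
    inputColumnNames: new[] { "input" },
    modelFile: "resnet18.onnx",
    gpuDeviceId: 0,
    fallbackToCpu: false);

// Fitting on an empty data view is enough: the estimator only needs the schema.
var emptyData = mlContext.Data.LoadFromEnumerable(new List<ModelInput>());
var model = pipeline.Fit(emptyData);

// A prediction engine performs single-example inference.
var engine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model);
```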

Finally, you can create some input data, make the inferences, and collect the timings:

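A minimal sketch of the loop, assuming a `Stopwatch`-based timer and a random, unscaled input like in the PyTorch experiment:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Random, unscaled input, matching the PyTorch setup.
var rng = new Random(1);
var input = new ModelInput
{
    Input = Enumerable.Range(0, 3 * 224 * 224)
                      .Select(_ => (float)rng.NextDouble())
                      .ToArray()
};

// Time 10,000 single-image inferences.
var times = new List<double>();
var watch = new Stopwatch();
for (var i = 0; i < 10_000; i++)
{
    watch.Restart();
    engine.Predict(input);
    watch.Stop();
    times.Add(watch.Elapsed.TotalSeconds);
}

var mean = times.Average();
var std = Math.Sqrt(times.Average(t => (t - mean) * (t - mean)));
Console.WriteLine($"mean: {mean:F4}s, std: {std:F4}s");
```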

This resulted in the following distributions:

[Figure: ML.NET CPU and GPU inference time]

The mean inference time was `0.016` seconds on CPU and `0.005` seconds on GPU, with standard deviations of `0.0029` and `0.0007` respectively.

Conclusion

Having measured the performance of each framework, we can summarize the results with the following graph:

[Figure: Framework performance]

ML.NET can evaluate deep learning models at a decent speed and is even faster than PyTorch on the CPU. That alone can be a deciding factor for production use. With ML.NET you also get all the advantages of the .NET ecosystem: fast web servers like Kestrel and easily maintainable object-oriented code.

Yet there are some drawbacks to take note of. PyTorch GPU inference is faster, so using ML.NET for tasks that rely heavily on GPU computation may be slower. Also, before evaluating a model you often need to preprocess your data, and even though ML.NET has a rich library of transformers for data preprocessing, reproducing the exact same preprocessing in your .NET software may be a complex task.

Deciding which framework to use depends on your current situation. How much time do you have to develop the service? How important is performance for you? Does your data preprocessing depend on some Python library? You need answers to all of these questions, keeping the ML.NET and PyTorch performance figures in mind.
