Deploy MNIST Trained Model as a Web Service
I cover the training, service and client implementations. The service receives an image of a hand-written digit between 0 and 9 (in tensor format) and guesses which number the image represents.

In one of my articles on deep learning, I teach how to implement a simple image recognition system. That program does everything in one file: dataset loading, model definition, training and evaluation.
In this post, I’ll walk you through how to save the model and load it from a service implemented using Flask (a Python web framework).
I’ll also show how to build a simple client to invoke the service.
The code is available on GitHub. The repo contains the following files:
- client.py
- service.py
- train.py
- neural_network.py
Training and Saving The Model
I covered the training phase in my previous post. I used PyTorch.
It’s a simple neural network with one input layer, two hidden layers and one output layer.
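As a rough sketch, a network of that shape could look like the code below. The hidden-layer sizes (128 and 64) and the ReLU activations are assumptions for illustration, and the definition in the repo's neural_network.py may differ. Only the 784 inputs and the 10 outputs come from this post.

```python
import torch
from torch import nn

# Hypothetical hidden-layer sizes; the post does not state them.
HIDDEN_1, HIDDEN_2 = 128, 64


class NeuralNetwork(nn.Module):
    """Input layer (784 values), two hidden layers, output layer (10 digits)."""

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(28 * 28, HIDDEN_1),  # input -> first hidden layer
            nn.ReLU(),
            nn.Linear(HIDDEN_1, HIDDEN_2),  # first -> second hidden layer
            nn.ReLU(),
            nn.Linear(HIDDEN_2, 10),  # second hidden layer -> one score per digit
        )

    def forward(self, x):
        return self.layers(x)


model = NeuralNetwork()
batch = torch.rand(4, 28 * 28)  # four fake flattened images
output = model(batch)
print(output.shape)  # torch.Size([4, 10])
```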
The example I use is the “Hello World” of image recognition. It’s great for beginners starting with deep learning.
After training, I save the model’s learned weights (parameters). Saving only the learned parameters is also referred to as “saving for inference”, and it’s the recommended approach.
Although it’s easier (less code) to save the entire model, it’s not advised because the serialized data is bound to the specific classes and directory structure in use when the model was saved.
Below is how I save the model:
torch.save(model.state_dict(), "model.pth")
Service
For developing the service, I use a Python web framework called Flask. The implementation contains only one endpoint (/guess).
As a reference, I used this article by Tanuj Jain. Tanuj also teaches how to deploy in a cloud virtual machine (VM).
When service.py is executed, the program loads the model (previously saved). The model is loaded only once.
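Here is a minimal sketch of that load-once step, assuming the weights were saved with state_dict() as described earlier. The build_model helper and its layer sizes are hypothetical stand-ins for the real network class, and the file is written to a temporary directory only so the example runs on its own.

```python
import os
import tempfile

import torch
from torch import nn


def build_model():
    # Hypothetical stand-in for the real network class (assumed layer sizes).
    return nn.Sequential(
        nn.Linear(784, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )


# Pretend this is the result of train.py: save only the learned parameters.
trained = build_model()
trained.eval()
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(trained.state_dict(), path)

# What the service would do once at start-up: rebuild the same architecture,
# load the saved parameters into it, and switch to inference mode.
model = build_model()
model.load_state_dict(torch.load(path, map_location="cpu"))
model.eval()

# The reloaded model should produce exactly the same output as the saved one.
sample = torch.rand(1, 784)
with torch.no_grad():
    same = torch.equal(trained(sample), model(sample))
print(same)
```

Because the model object lives at module level, every request reuses it instead of reading the file again.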
Request and response
The service receives a JSON payload containing the hand-written digit (as a tensor representation). It converts the JSON string to an array and then converts the array to a tensor.
Next, it flattens the tensor to size 784 (28 times 28) and passes it to the model. The model output (10 nodes) contains a score for each digit from 0 to 9.
torch.argmax returns the index of the output node with the highest score, which is the guessed digit. Lastly, the service returns that digit as a string.
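The steps above can be sketched as a single Flask endpoint. This is not the code from service.py: the POST method, the "data" key in the payload and the untrained one-layer model are all assumptions made so the example is self-contained. Flask's built-in test client plays the role of client.py here, so no server needs to be running.

```python
import torch
from torch import nn
from flask import Flask, request

app = Flask(__name__)

# Untrained stand-in for the real model, created once at start-up.
model = nn.Sequential(nn.Linear(784, 10))
model.eval()


@app.route("/guess", methods=["POST"])  # HTTP method is an assumption
def guess():
    # Assumed payload shape: {"data": [[...28 values...], ... 28 rows ...]}
    payload = request.get_json(force=True)
    image = torch.tensor(payload["data"], dtype=torch.float32)
    flat = image.reshape(1, 28 * 28)  # flatten to a batch of one 784-vector
    with torch.no_grad():
        scores = model(flat)  # shape (1, 10): one score per digit
    digit = torch.argmax(scores, dim=1).item()  # index of the highest score
    return str(digit)  # the response body is a plain string


# Exercise the endpoint with a random 28x28 "image".
client = app.test_client()
response = client.post("/guess", json={"data": torch.rand(28, 28).tolist()})
answer = response.get_data(as_text=True)
print(response.status_code, answer)
```

Since the stand-in model is untrained, the returned digit is arbitrary. With the real weights loaded, the same flow returns the actual guess.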








