Summary

This article discusses deploying a trained MNIST model as a web service using Flask, a Python web framework, and demonstrates how to build a simple client to invoke the service.

Abstract

The article begins by mentioning a previous post where the author covered the training phase of a simple neural network for image recognition using PyTorch. The network consists of one input layer, two hidden layers, and one output layer. After training, the model's learned weights/parameters are saved. The author then explains the implementation of a service using Flask, which contains only one endpoint (/guess). The service receives a JSON containing the hand-written digit (tensor representation), converts it to an array, and then to a tensor. The tensor is flattened to size 784 (28 times 28) and passed to the model, which outputs a guess for each digit from 0 to 9. The service runs on port 8888. The article also includes a simple client program that loads the MNIST data set and passes some hand-written digit (tensor representation) to the service. The client shows the digit as an image to compare it with the service's response.

Bullet points

The article covers deploying a trained MNIST model as a web service using Flask.
The service contains only one endpoint (/guess) and receives a JSON containing the hand-written digit (tensor representation).
The service converts the JSON to an array and then to a tensor, which is flattened to size 784 (28 times 28) and passed to the model.
The model outputs a guess for each digit from 0 to 9.
The service runs on port 8888.
The article includes a simple client program that loads the MNIST data set and passes some hand-written digit (tensor representation) to the service.
The client shows the digit as an image to compare it with the service's response.

Deploy MNIST Trained Model as a Web Service

I cover the training, service and client implementation. The service receives an image of a hand-written digit between 0 and 9 (in tensor format) and guesses which number the image represents.

In one of my articles on deep learning, I show how to implement a simple image recognition system. That program does everything in a single file: dataset loading, model definition, training and evaluation.

In this post, I’ll walk you through how to save the model and load it from a service implemented using Flask (a Python web framework).

I’ll also show how to build a simple client to invoke the service.

The code is available on GitHub. The repo contains the following files:

client.py
service.py
train.py
neural_network.py

Training and Saving The Model

I covered the training phase in my previous post. I used PyTorch.

It’s a simple neural network with one input layer, two hidden layers and one output layer.
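
As a rough illustration of that layout, here is a minimal sketch of such a network in PyTorch. The class name, the hidden layer size of 512 and the ReLU activations are my assumptions for the example; the actual definition in neural_network.py may differ.

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    """Fully connected network: 784 inputs -> two hidden layers -> 10 outputs."""

    def __init__(self, hidden_size: int = 512):  # hidden_size is an assumed value
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),      # input layer -> first hidden layer
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),  # first hidden -> second hidden layer
            nn.ReLU(),
            nn.Linear(hidden_size, 10),           # second hidden -> one score per digit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


model = NeuralNetwork()
scores = model(torch.rand(1, 28 * 28))  # one flattened 28x28 image
print(scores.shape)  # torch.Size([1, 10])
```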

The example I use is the “Hello World” of image recognition. It’s great for beginners starting with deep learning.

After training, I save the model’s learned weights (parameters) rather than the whole model. This is also referred to as “saving for inference”, and it’s the recommended approach.

Although saving the entire model is easier (less code), it’s not advised: the serialized data is bound to the specific classes and the exact directory structure in use when the model was saved, so loading can break once the code is moved or refactored.

Below is how I save the model:

torch.save(model.state_dict(), "model.pth")

Service

For developing the service, I use a Python web framework called Flask. The implementation contains only one endpoint (/guess).

As a reference, I used this article by Tanuj Jain. Tanuj also teaches how to deploy in a cloud virtual machine (VM).

When service.py is executed, the program loads the model (previously saved). The model is loaded only once.
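
The load step can be sketched roughly as follows. Because only the parameters were saved, the service has to rebuild the same architecture first and then load the parameters into it. The build_model helper, the layer sizes and the temporary file path are placeholders I made up so the example runs on its own; the real service would use the class from neural_network.py and the saved model.pth.

```python
import os
import tempfile

import torch
from torch import nn


def build_model() -> nn.Module:
    # Stand-in architecture (assumed sizes); the real one lives in neural_network.py.
    return nn.Sequential(
        nn.Linear(784, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )


# Save the learned parameters, as done after training.
path = os.path.join(tempfile.mkdtemp(), "model.pth")
trained = build_model()
torch.save(trained.state_dict(), path)

# At service start-up: rebuild the same architecture, then load the parameters.
model = build_model()
model.load_state_dict(torch.load(path, map_location="cpu"))
model.eval()  # put the model in inference mode
```

Doing this at module level, outside the request handler, is what keeps the load to a single time per process.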

Request and response

The service receives a JSON string containing the hand-written digit (tensor representation). It parses the JSON string into an array and then converts the array to a tensor.

Next, it flattens the tensor to size 784 (28 times 28) and passes it to the model. The model output (10 nodes) contains one score for each digit from 0 to 9.

torch.argmax returns the index of the output node with the highest value, which is the digit the model guesses. Lastly, the service returns that digit as a string.
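
Put together, the request flow could look like the sketch below. This is not the code from service.py: the request body format (a JSON-encoded 28x28 nested list), the one-layer stand-in model and the main helper are assumptions for illustration. Only the /guess endpoint, the flattening to 784 values, the use of torch.argmax, the string response and port 8888 come from the description above.

```python
import json

import torch
from flask import Flask, request
from torch import nn

app = Flask(__name__)

# Stand-in for the trained network; the real service would rebuild its
# architecture and load "model.pth" once at start-up.
model = nn.Sequential(nn.Linear(28 * 28, 10))
model.eval()


@app.route("/guess", methods=["POST"])
def guess() -> str:
    pixels = json.loads(request.get_data())            # JSON string -> nested list (assumed 28x28)
    image = torch.tensor(pixels, dtype=torch.float32)  # list -> tensor
    flat = image.reshape(1, 28 * 28)                   # flatten to a batch of one 784-vector
    with torch.no_grad():                              # no gradients needed for inference
        scores = model(flat)                           # shape (1, 10): one score per digit
    digit = torch.argmax(scores, dim=1).item()         # index of the highest score
    return str(digit)                                  # the response is a plain string


def main() -> None:
    # Call main() to start the development server on port 8888.
    app.run(port=8888)
```

Note that torch.argmax gives the position of the largest score, not the score itself, which is why its result can be returned directly as the guessed digit.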

Figure 1. Service running. Image by the author.

The service runs on port 8888, but any free port would do. If you choose port 80 or another port below 1024, you may face a “Permission” error, because on Unix-like systems binding to those ports usually requires root privileges.

Stopping the service

If you need to stop the service, the method I use is simply to look for the process by keyword and kill it by its process ID.

ps -ef | grep service.py
kill -9 <process>

Client

The client is simple. It’s a program that loads the MNIST data set and sends one hand-written digit (tensor representation) to the service. You choose which digit to send.

Before invoking the service, the code converts the tensor to a list and wraps it in JSON. It also shows the digit as an image first, so you can compare it with the service’s response.
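
The conversion and the call can be sketched as follows. The function names, the localhost URL and the use of urllib from the standard library are my choices for the example, and a random tensor stands in for an MNIST sample so the snippet does not need to download the data set; client.py may do this differently.

```python
import json
import urllib.request

import torch

SERVICE_URL = "http://localhost:8888/guess"  # assumed host; port and path as described above


def build_payload(image: torch.Tensor) -> bytes:
    """Convert a 28x28 image tensor to a JSON-encoded nested list."""
    return json.dumps(image.tolist()).encode("utf-8")


def ask_service(image: torch.Tensor, url: str = SERVICE_URL) -> str:
    """POST the digit to the service and return its guess as text."""
    req = urllib.request.Request(
        url,
        data=build_payload(image),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Stand-in for one MNIST sample; the real client would pick it from the data set
# and display it before calling ask_service(image).
image = torch.rand(28, 28)
payload = build_payload(image)
```

With the service running, ask_service(image) would return the guessed digit as a string.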

Figure 2. Response from service. Image by the author.

Final Thoughts

I hope you enjoyed this tutorial. The example shown here is a starting point that you can evolve into something more interesting. For instance, imagine writing a number on paper, pointing your camera at it, and having the service invoked.

The most critical step is learning how to save the model, load it, and expose it as a service. Explaining how to deploy to different cloud vendors comes next; I plan to write a tutorial on it.

That’s it for now. Thanks for reading.