Mohit Verma

Summary

The article outlines the deployment of a machine learning model using Kubeflow Pipelines V2, MLflow, and Seldon Core, focusing on the final part of a four-part MLOps series.

Abstract

The article is the concluding part of a comprehensive MLOps tutorial series, detailing the process of deploying a trained machine learning model onto a local kind Kubernetes cluster using Seldon Core. It provides a step-by-step guide on building a non-reusable Seldon server, including the creation of a Dockerfile, writing the Python Inference code, and setting up the necessary environment variables. The author explains how to containerize the model, push the Docker image to a registry, and deploy it using a SeldonDeployment YAML configuration. The deployment is then tested using a custom Python script that sends image data to the model server and receives predictions. The tutorial emphasizes the integration of MLOps tools to manage the machine learning lifecycle, from experimentation to production.

Opinions

  • The author advocates for the use of Seldon Core for deploying machine learning models, highlighting its ability to create both reusable and non-reusable servers.
  • The tutorial suggests that using a local kind Kubernetes cluster is a practical approach for learning and experimenting with MLOps workflows.
  • The author's approach to containerization and deployment indicates a preference for leveraging existing Python Seldon Core modules and Docker best practices.
  • By using environment variables and a SeldonDeployment YAML file, the author demonstrates a methodical way to manage model deployment configurations.
  • The use of a custom Python script for testing the deployed model reflects the author's emphasis on practical, hands-on validation of MLOps pipelines.
  • The author expresses satisfaction with the completed MLOps steps, encouraging readers to continue learning, which implies a positive outlook on the effectiveness of the tools and methods discussed.

MLOps with Kubeflow Pipelines V2, MLflow, Seldon Core: Part 4

This is the last part of the four-part MLOps series.

Part 1: Introduction to the basic concepts and installation on a local system.

Part 2: Understanding the Kubeflow pipeline and its components.

Part 3: Understanding the MLflow server UI for logging parameters, code versions, metrics, and output files.

Part 4: Deploying the model with a Seldon Core server on Kubernetes.

In this section we will see how to deploy our trained model on a local kind Kubernetes cluster with the help of Seldon Core. With Seldon Core you can build two types of servers: reusable and non-reusable. For this example I will be using the non-reusable one.

Seldon server containerization

First we need to build a Docker image using the Python Seldon Core module provided by Seldon. Your source code should contain a Python file that defines a class with the same name as the file. In our case the Python file is named Inference.py, hence we set the MODEL_NAME variable to Inference. For more information on this concept, please check the Seldon Core documentation.
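
The skeleton below is a minimal sketch of this naming convention (MyModel.py is a hypothetical file, not part of this project): the file MyModel.py must define a class MyModel that exposes a predict method.

# MyModel.py — the class name must match the file name
class MyModel:
    def __init__(self):
        # load model weights here
        pass

    def predict(self, X, feature_names=None):
        # X is the request payload; return the model output here
        return X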

➜  ~ cat Dockerfile

FROM python:3.10-slim
WORKDIR /app
RUN apt-get update \
    && rm -rf /var/lib/apt/lists/*

# Install python packages
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu

# Copy source code
COPY . .

# Port for GRPC
EXPOSE 5000
# Port for REST
EXPOSE 9000

# Define environment variables
ENV MODEL_NAME Inference
ENV SERVICE_TYPE MODEL

# Environment variable to be overridden with the model_uri at runtime in k8s.
ENV MODEL_URI "s3://modeloutput/experiments/1/b3b05490b57b4a5ea7373a77e8b09dcd/artifacts/CNN-TinyVG"
# Change ownership of /app to the default (non-root) user
RUN chown -R 8888 /app

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
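
The requirements.txt referenced by the Dockerfile is not shown in this post. A minimal version for this server might look like the following (the exact package set and versions are assumptions, not the author's file):

seldon-core
mlflow
torch
numpy
Pillow
boto3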
➜  ~ cat Inference.py

import os
import base64
from io import BytesIO

import numpy as np
import torch
import mlflow
import PIL.Image as Image

import model_builder  # model class definition; needed so mlflow can unpickle the model

class Inference(object):
    def __init__(self):
        self.device = 'cpu'
        os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://minio-service.kubeflow:9000"
        os.environ["AWS_ACCESS_KEY_ID"] = "minio"
        os.environ["AWS_SECRET_ACCESS_KEY"] = "minio123"
        self.model_uri = os.environ['MODEL_URI']
        self.model = mlflow.pytorch.load_model(self.model_uri)
        self.class_names = ["pizza", "steak", "sushi"]


    def predict(self, X, feature_names):
        # The request carries a single base64-encoded image string.
        X = X[0].encode()
        im_bytes = base64.b64decode(X)
        im_file = BytesIO(im_bytes)
        img = Image.open(im_file)
        # Resize to the model's 64x64 input and scale pixels to [0, 1].
        img = np.asarray(img.resize((64, 64))).astype(np.float32) / 255.0
        # Grayscale images are stacked into three identical channels.
        if img.ndim == 2:
            img = np.stack([img, img, img], axis=-1)
        # HWC -> CHW, then add the batch dimension.
        img = np.transpose(img, (-1, 0, 1))
        img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img).to(self.device)
        with torch.inference_mode():
            pred_logits = self.model(img)
            pred_prob = torch.softmax(pred_logits, dim=1).numpy()
        return [pred_prob.tolist()]

➜  ~ docker image build -t mohitverma1688/seldon_serve:v0.1 .

[+] Building 1.8s (10/10) FINISHED
 => [internal] load build definition from Dockerfile                                                                                         0.0s
 => => transferring dockerfile: 304B                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                            0.0s
 => => transferring context: 2B                                                                                                              0.0s
 => [internal] load metadata for docker.io/library/python:3.10-slim                                                                          1.7s
 => [auth] library/python:pull token for registry-1.docker.io                                                                                0.0s
 => [internal] load build context                                                                                                            0.0s
 => => transferring context: 151B                                                                                                            0.0s
 => [1/4] FROM docker.io/library/python:3.10-slim@sha256:7de57d5840f51e10462ba0eed4c4e8aa26f28b44cedbf0fc8dd7fd77ea59c7d9                    0.0s
 => => resolve docker.io/library/python:3.10-slim@sha256:7de57d5840f51e10462ba0eed4c4e8aa26f28b44cedbf0fc8dd7fd77ea59c7d9                    0.0s
 => CACHED [2/4] RUN apt-get update     && rm -rf /var/lib/apt/lists/*                                                                       0.0s
 => CACHED [3/4] COPY requirements.txt requirements.txt                                                                                      0.0s
 => CACHED [4/4] RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu                   0.0s
 => exporting to image                                                                                                                       0.0s
 => => exporting layers                                                                                                                      0.0s
 => => writing image sha256:563442d7737f6eb249787c08865279e2a26a9299a21ad53c6f9f9476e90157bf                                                 0.0s
 => => naming to docker.io/mohitverma1688/seldon_serve:v0.1
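
Before the cluster can pull this image, it must be available to the kind nodes. You can either push it to Docker Hub or load it straight into the kind cluster (the cluster name kind is the default and an assumption here):

➜  ~ docker push mohitverma1688/seldon_serve:v0.1
# or, to skip the registry round-trip:
➜  ~ kind load docker-image mohitverma1688/seldon_serve:v0.1 --name kind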

If you remember from Part 2, the last component, model_eval, produced an output parameter named model_uri. We need to copy that string and substitute it in the Seldon custom CRD YAML file.

The model_uri location stored in the MinIO bucket

Now we will use the image tag built in the previous step and the copied model_uri environment variable to deploy the SeldonDeployment CRD.

➜  ~ cat seldon_deployment.yaml
---
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: cnn-tinyvg-seldon
  namespace: kubeflow
spec:
  annotations:
    project_name: CNN Pipeline
    deployment_version: v1
  name: "cnn-tinvg-v1"
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: mohitverma1688/seldon_serve:v0.1
          imagePullPolicy: IfNotPresent
          name: inference
          resources:
            requests:
              memory: 1Mi
          env:
            - name: MODEL_URI
              value: "s3://modeloutput/experiments/1/b3b05490b57b4a5ea7373a77e8b09dcd/artifacts/CNN-TinyVG"
    graph:
      children: []
      name: inference
      endpoint:
        type: REST
      type: MODEL
    name: cnn-tinyvg-v1
    replicas: 1
    annotations:
      predictor_version: "1.0"
      seldon.io/svc-name: tinyvg-svc

Now deploy the CRD and wait for the deployment pods to be up and running. The deployment serves inference requests on its REST endpoint (default port 9000): http://localhost:9000/api/v1.0/predictions

➜  ~ kubectl create -f seldon_deployment.yaml

➜  ~ kubectl get seldondeployment -n kubeflow

NAME                AGE
cnn-tinyvg-seldon   5m42s

➜  ~ kubectl get pods -n kubeflow | grep seldon
cnn-tinyvg-seldon-cnn-tinyvg-v1-0-inference-6f748bcc7d-fbmcz   2/2     Running     0             3m48s

We have successfully deployed our model; now we will test it with a custom Python script. I have created a script, testing.py, which takes two arguments, the endpoint URL and an image path, and sends a request to our server to get the prediction response.
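
The script itself is not listed in this post; a minimal sketch of what testing.py might look like (the real script may differ) is:

# testing.py — send a base64-encoded image to the Seldon REST endpoint
import base64
import sys

import requests

url, image_path = sys.argv[1], sys.argv[2]

# Encode the image exactly as Inference.predict() expects it
with open(image_path, "rb") as f:
    payload = base64.b64encode(f.read()).decode()

print(f"sending image {image_path} to {url}")
response = requests.post(url, json={"data": {"ndarray": [payload]}})
print("caught response", response)
print(response.json())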

➜  ~ kubectl port-forward svc/cnn-tinyvg-seldon-cnn-tinyvg-v1-inference 8004:9000 -n kubeflow

➜  ~ python3 testing.py http://localhost:8004/api/v1.0/predictions ./1063878.jpg
sending image ./1063878.jpg to http://localhost:8004/api/v1.0/predictions
caught response <Response [200]>
{'data': {'names': ['pizza', 'steak', 'sushi'], 'ndarray': [[[0.370800644159317, 0.33560964465141296, 0.29358971118927]]]}, 'meta': {'requestPath': {'inference': 'mohitverma1688/seldon_serve:v0.1'}}}
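
If you prefer to test without the helper script, the same request can be issued with curl against the port-forwarded endpoint. This sketch assumes GNU coreutils base64 (-w0 disables line wrapping; on macOS use base64 -i instead):

➜  ~ curl -s -X POST http://localhost:8004/api/v1.0/predictions \
       -H 'Content-Type: application/json' \
       -d "{\"data\": {\"ndarray\": [\"$(base64 -w0 ./1063878.jpg)\"]}}"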

Finally, we have completed our MLOps steps using Kubeflow, MLflow, and Seldon Core. Happy learning :)

MLOps
Seldon Core