MLOps with Kubeflow Pipelines V2, MLflow, and Seldon Core: Part 1
This is the first part of a four-part MLOps series.
Part 1: Introduction to the basic concepts and installation on a local system.
Part 2: Understanding the Kubeflow pipeline and its components.
Part 3: Understanding the MLflow server UI for logging parameters, code versions, metrics, and output files.
Part 4: Deploying the model with a Seldon Core server on Kubernetes.
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ok7cy-Fl7DLt1yAOe9iFWg.png)
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*6eG4tEWbQtRZ-D1sm6LzPw.png)
MLOps is a trending topic among machine learning engineers and data scientists. At its core, it means setting up a workflow for training, testing, and making a model available in production.
I am not an MLOps engineer but a telecom cloud engineer, and learning new tech is my hobby. This tutorial only explains the basics of MLOps and may not reflect full production-grade deployment practices. In this example I have created and trained a model using the PyTorch framework, with a CNN architecture copied from the TinyVGG model in CNN Explainer. The results of this training and inference are not great, as the dataset used for training is really small.
PyTorch: PyTorch is an optimized Deep Learning tensor library based on Python and Torch and is mainly used for applications using GPUs and CPUs. PyTorch is favored over other Deep Learning frameworks like TensorFlow and Keras since it uses dynamic computation graphs and is completely Pythonic.
To enjoy the 25-hour PyTorch tutorial by freeCodeCamp on YouTube, please click here. I highly recommend this course to beginners like me. Some basic knowledge of machine learning is needed for this tutorial.
Kubeflow: Kubeflow is a community and ecosystem of open-source projects that address each stage in the machine learning (ML) lifecycle. It makes ML on Kubernetes simple, portable, and scalable. The goal of Kubeflow is to facilitate the orchestration of Kubernetes ML workloads and to empower users to deploy best-in-class open-source tools on any cloud infrastructure. In this tutorial, Kubeflow Pipelines v2.2, a part of Kubeflow, will be used.
MLflow: Whether you’re an individual researcher, a member of a large team, or somewhere in between, MLflow provides a unified platform to navigate the intricate maze of model development, deployment, and management. MLflow aims to enable innovation in ML solution development by streamlining otherwise cumbersome logging, organization, and lineage concerns that are unique to model development. This focus allows you to ensure that your ML projects are robust, transparent, and ready for real-world challenges.
Seldon: Seldon Core converts your ML models (TensorFlow, PyTorch, H2O, etc.) or language wrappers (Python, Java, etc.) into production REST/gRPC microservices.
Setting up the local environment.
1. Create a Kubernetes cluster with kind.
```
# Create the kind cluster
➜ ~ kind version
kind v0.21.0 go1.21.6 darwin/arm64
➜ ~ kind create cluster --name kubeflow
➜ ~ kubectl version
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1
```
2. Install the Kubeflow components and access the Kubeflow UI and MinIO UI.
```
➜ ~ kubectl apply -k 'github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic?timeout=120&ref=2.2.0'

➜ ~ kubectl get pods -n kubeflow
NAME                                               READY   STATUS    RESTARTS      AGE
cache-deployer-deployment-cf9646b9c-jt5n5          1/1     Running   0             44d
cache-server-56d4959c9-m9h8p                       1/1     Running   0             44d
metadata-envoy-deployment-9c7db86d8-9kkc8          1/1     Running   0             44d
metadata-grpc-deployment-d94cc8676-n96ds           1/1     Running   1 (44d ago)   44d
metadata-writer-cd5dd8f7-qnm2q                     1/1     Running   3 (22h ago)   44d
minio-5dc6ff5b96-2c8g8                             1/1     Running   0             44d
ml-pipeline-64d6db5897-jr5jw                       1/1     Running   0             44d
ml-pipeline-persistenceagent-77947c888d-7drjn      1/1     Running   0             44d
ml-pipeline-scheduledworkflow-676478b778-6t96t     1/1     Running   0             44d
ml-pipeline-ui-87b9d4fb6-6hts2                     1/1     Running   2 (22h ago)   44d
ml-pipeline-viewer-crd-8574556b89-82t9d            1/1     Running   0             44d
ml-pipeline-visualizationserver-5d7c54f495-zzqhr   1/1     Running   0             44d
my-mlflow-6689ff755d-5b7fp                         1/1     Running   0             5h49m
mysql-5b446b5744-gtpt6                             1/1     Running   0             44d
workflow-controller-66d557786-sxkll                1/1     Running   1 (22h ago)   35

➜ ~ kubectl port-forward svc/ml-pipeline-ui 8002:80 -n kubeflow
```
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*kmtxWuqBD1JZpgYhOfWVLg.png)
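With the port-forward in place, the pipelines backend can also be checked programmatically. Below is a small stdlib-only sketch that builds the KFP health-check URL; the `/apis/v2beta1/healthz` path and the `localhost:8002` host are assumptions based on the port-forward command above, so adjust them for your setup:

```python
import urllib.request  # only needed for the optional live check below

# Host where `kubectl port-forward svc/ml-pipeline-ui 8002:80 -n kubeflow`
# exposes the pipelines UI/API (assumption based on the command above).
KFP_HOST = "http://localhost:8002"

def healthz_url(host: str) -> str:
    """Build the KFP v2beta1 health-check URL for a given host."""
    return host.rstrip("/") + "/apis/v2beta1/healthz"

# With the port-forward running, this should answer with HTTP 200:
#   urllib.request.urlopen(healthz_url(KFP_HOST)).status
print(healthz_url(KFP_HOST))
```

The same host string can later be passed to the KFP SDK client (`kfp.Client(host=...)`) from the conda environment created in step 5.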
=> Access the MinIO S3 storage UI.
```
➜ ~ kubectl port-forward svc/minio-service 9000:8081 -n kubeflow
```
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*pRpRzAPecWGZpNO-j2LkZg.png)
3. Install the MLflow server with Helm.
```
➜ ~ helm repo add community-charts https://community-charts.github.io/helm-charts

# edit the S3 minio url and the credentials in the values.yaml
extraEnvVars:
  AWS_ACCESS_KEY_ID:
  AWS_SECRET_ACCESS_KEY:

artifactRoot:
  s3:
    # -- Specifies if you want to use AWS S3 Mlflow Artifact Root
    enabled: true
    # -- S3 bucket name
    bucket: "modeloutput" # required

➜ ~ helm install my-mlflow community-charts/mlflow --version 0.7.19 -f mlflow-values.yaml -n kubeflow
➜ ~ kubectl get pods -n kubeflow | grep mlflow
my-mlflow-6689ff755d-5b7fp   1/1   Running   0   5h57m
➜ ~ kubectl port-forward svc/my-mlflow 8004:9000
```
=> Access the MLflow server UI.
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*aQsnkmCzv8n48qs6Y1etNQ.png)
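Once the tracking server is reachable, training code can point at it through MLflow's standard `MLFLOW_TRACKING_URI` environment variable. A minimal sketch follows; the `localhost:8004` address matches the port-forward above, and the `mlflow` calls (including the experiment name) are illustrative and commented out since they need the `mlflow` package and a running server:

```python
import os

# Point MLflow clients at the port-forwarded tracking server
# (address assumed from `kubectl port-forward svc/my-mlflow 8004:9000` above).
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:8004"

# With the `mlflow` package installed, a training run would then log like:
#   import mlflow
#   mlflow.set_experiment("cnn-tinyvgg")   # hypothetical experiment name
#   with mlflow.start_run():
#       mlflow.log_param("learning_rate", 0.001)
#       mlflow.log_metric("train_loss", 1.23)

print(os.environ["MLFLOW_TRACKING_URI"])
```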
4. Install Seldon Core on the k8s cluster using Helm.
```
➜ ~ kubectl create namespace seldon-system
➜ ~ helm install seldon-core seldon-core-operator \
      --repo https://storage.googleapis.com/seldon-charts \
      --set usageMetrics.enabled=true \
      --set istio.enabled=false \
      --namespace seldon-system

➜ ~ kubectl get pods -n seldon-system
NAME                                         READY   STATUS    RESTARTS      AGE
seldon-controller-manager-65f8dbf9bc-h9fdb   1/1     Running   1 (23h ago)   3d8h
```
5. Create the conda virtual environment and install the dependencies.
```
python               3.10.13
kfp                  2.7.0
kfp-kubernetes       1.2.0
kfp-pipeline-spec    0.3.0
kfp-server-api       2.0.5
typing-extensions    4.9.0
typing_extensions    4.9.0
jupyter              1.0.0
```
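A quick way to confirm the activated environment has the right packages is to query installed versions with the standard library; this is just a small helper for sanity-checking, not part of the pipeline code:

```python
from importlib import metadata

def pkg_version(name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

# In the activated conda env, expect the versions listed above (e.g. kfp 2.7.0).
for pkg in ("kfp", "kfp-kubernetes", "kfp-server-api", "jupyter"):
    print(pkg, pkg_version(pkg))
```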
6. Create an image to use as the base image for component creation in the KFP pipeline. KFP components are explained later.
The reason for pre-building this base image is to keep the component images small, for quicker download and installation during pipeline runtime.
```
➜ ~ cat requirements.txt
torch
kfp-kubernetes
pathlib
boto3
mlflow
requests
pillow
numpy
typing

➜ ~ cat Dockerfile
FROM python:3.10-slim
RUN apt-get update \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu

➜ ~ docker image build -t mohitverma1688/model_train_component .
➜ ~ docker image ls
mohitverma1688/model_train_component   v0.1   81bbc346d378   4 weeks ago   782MB
```
Creating the KFP component directory structure and building the component Docker image.
Components are the building blocks of KFP pipelines. A component is a remote function definition; it specifies inputs, contains user-defined logic in its body, and can create outputs. When a component template is instantiated with input parameters, it is called a task. For this pipeline I have used “Containerized Python Components”.
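To make the component/task distinction concrete, here is a toy analogy in plain Python — not the KFP API — where a "component" is a reusable function definition and binding concrete inputs to it yields a "task":

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Task:
    """A component template bound to concrete input parameters."""
    component: Callable[..., Any]
    inputs: Dict[str, Any]

    def execute(self) -> Any:
        return self.component(**self.inputs)

def model_train(num_epochs: int, batch_size: int) -> str:  # the "component"
    return f"trained for {num_epochs} epochs with batch size {batch_size}"

# Instantiating the component with inputs gives a "task".
task = Task(model_train, {"num_epochs": 3, "batch_size": 32})
print(task.execute())  # → trained for 3 epochs with batch size 32
```

In real KFP, the `@dsl.component` decorator plays the role of the template, and calling the decorated function inside a pipeline creates the task.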
```
src
└── components
    ├── data_download
    │   └── data_download_component.py
    ├── model_eval
    │   ├── model_eval_component.py
    │   └── utils.py
    ├── model_inference
    │   ├── data_setup.py
    │   ├── engine.py
    │   ├── model_builder.py
    │   ├── model_inference.py
    │   ├── model_inference_component.py
    │   └── utils.py
    ├── model_train_cnn
    │   ├── data_setup.py
    │   ├── engine.py
    │   ├── model_builder.py
    │   ├── model_train.py
    │   ├── model_train_component.py
    │   └── utils.py
    └── register_model
        ├── model_builder.py
        └── register_model_component.py
```
In a Containerized Python Component, `base_image` specifies the base image that KFP will use when building your new container image. Specifically, KFP uses the `base_image` argument for the `FROM` instruction in the Dockerfile used to build your image.
Now that the code is in a standalone directory as above, we can conveniently build an image for each component in the components directory using the `kfp component build` CLI command. See the model_train example below.
```python
%%writefile src/components/model_train_cnn/model_train_component.py

from kfp import dsl
from kfp import compiler
from typing import Dict
from kfp.dsl import Dataset, Output, Artifact, OutputPath, InputPath, Model, HTML

@dsl.component(base_image='mohitverma1688/model_train_component:v0.1',
               target_image='mohitverma1688/model_train_component:v0.24',
               packages_to_install=['pandas', 'matplotlib'])
def model_train(num_epochs: int,
                batch_size: int,
                hidden_units: int,
                learning_rate: float,
                train_dir: str,
                test_dir: str,
                model_name: str,
                ....
```
Note: For simplicity, only a snippet of the code is shown.
As you can see, I have used the previously created Docker image as the `base_image`. Running the `kfp component build` command produces additional artifacts, mainly the Dockerfile and the runtime-requirements.txt. You can set the `target_image` argument to push the built image directly to your registry.
```
!kfp component build src/components/model_train_cnn --component-filepattern model_train_component.py
```
```
src/components/model_train_cnn
├── Dockerfile
├── component_metadata
│   └── model_train.yaml
├── data_setup.py
├── engine.py
├── kfp_config.ini
├── model_builder.py
├── model_train.py
├── model_train_component.py
├── runtime-requirements.txt
└── utils.py
```
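Among the generated files, kfp_config.ini records which component functions live in which module. Assuming it follows standard INI syntax, it can be inspected with the standard library; note the sample text below is a hypothetical illustration, not copied from a real build, and the exact layout may differ by KFP version:

```python
import configparser

# Hypothetical contents modeled on a `kfp component build` output;
# check your own kfp_config.ini, as the layout can differ by KFP version.
SAMPLE = """\
[Components]
model_train = model_train_component.py
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)
for name, module in config["Components"].items():
    print(name, "->", module)
```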
```
➜ ~ cat Dockerfile
# Generated by KFP.
FROM mohitverma1688/model_train_component:v0.2
WORKDIR /usr/local/src/kfp/components
COPY runtime-requirements.txt runtime-requirements.txt
RUN pip install --no-cache-dir -r runtime-requirements.txt
RUN pip install --no-cache-dir kfp==2.7.0
COPY . .
```
```
➜ ~ cat runtime-requirements.txt
# Generated by KFP.
matplotlib
pandas
```
In the next part, I will explain each component of the pipeline in detail :)