Anupam Chugh

Summary

The article demonstrates how to integrate PencilKit and Core ML to create an iOS application that recognizes hand-drawn digits using the MNIST dataset.

Abstract

The article discusses the integration of Apple's PencilKit framework with Core ML to develop an iOS application capable of recognizing hand-drawn digits. It introduces the PencilKit framework, which simplifies the implementation of drawing capabilities in iOS and iPadOS 13 applications, and outlines the three main components required for setup: PKCanvasView, PKDrawing, and PKToolPicker. The MNIST dataset, whose training set consists of 60,000 images of handwritten digits, is used for digit recognition. The article provides detailed code snippets for setting up the PencilKit canvas, tool picker, and navigation bar buttons, as well as preprocessing the drawing input to prepare it for Core ML. The final sections cover image preprocessing, including resizing and converting to grayscale, and using the Core ML model for prediction. The article concludes with a link to the full source code on GitHub and expresses enthusiasm for the potential of on-device machine learning.

Opinions

  • The author believes that the introduction of the PencilKit framework during WWDC 19 was a significant benefit for developers.
  • The article suggests that the combination of PencilKit and Core ML opens up exciting possibilities for on-device machine learning applications beyond just digit recognition.
  • The author assumes that the reader has been gifted a ready-made Core ML MNIST Model, indicating a focus on practical implementation rather than model training.
  • The author emphasizes the importance of preprocessing the drawing input correctly to match the Core ML model's expected input format.
  • The GIF demonstrating the final outcome is presented as a compelling illustration of the application's capabilities, suggesting a strong visual component to the article's appeal.
  • The author provides a subjective opinion by stating, "That’s it for this one. I hope you enjoyed reading," indicating a desire for reader engagement and satisfaction.

PencilKit Meets Core ML

Recognizing digits from drawing using MNIST

A sketch from our iOS Application

This story was originally published on my Substack, iOSDevie.

The introduction of the PencilKit framework at WWDC 2019 was a boon for developers looking to add drawing to their iOS and iPadOS 13 applications. Three classes play a major role in setting up PencilKit in an application:

  • PKCanvasView
  • PKDrawing
  • PKToolPicker

In the following sections, we’ll use the PencilKit and Core ML frameworks together to recognize hand-drawn digits.

Our Goal

  • Setting up a PencilKit-based iOS application.
  • Using the famous MNIST dataset to recognize digits drawn on the PencilKit canvas.
  • Leveraging the Core ML framework to predict and display the drawn digits.

MNIST: A Quick Word

The MNIST dataset is an image dataset of handwritten digits: 60,000 training images and 10,000 test images, each a 28 x 28 grayscale image.

Each digit is size-normalized to fit in a 20 x 20 box and then centered in the 28 x 28 image. Recognition accuracy is best when the digit is centered in the input image in the same way.
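To make the centering idea concrete, here is a minimal sketch in plain Swift that pads a small digit with black pixels so it sits in the middle of a 28 x 28 frame. The function name `centerInFrame` is hypothetical and not part of the article's project. Note that MNIST itself centers digits by their center of mass; this sketch uses simpler geometric centering.

```swift
/// Pads a small grayscale digit (rows of 0...255 values) with black pixels so
/// that it sits in the middle of a square `frameSize` x `frameSize` image.
/// Returns nil if the digit is empty, ragged, or larger than the frame.
func centerInFrame(_ digit: [[UInt8]], frameSize: Int = 28) -> [[UInt8]]? {
    let height = digit.count
    guard let width = digit.first?.count, width > 0,
          height <= frameSize, width <= frameSize,
          digit.allSatisfy({ $0.count == width }) else { return nil }

    // Integer division puts any odd leftover pixel on the bottom/right edge.
    let top = (frameSize - height) / 2
    let left = (frameSize - width) / 2

    var frame = [[UInt8]](repeating: [UInt8](repeating: 0, count: frameSize), count: frameSize)
    for (y, row) in digit.enumerated() {
        for (x, value) in row.enumerated() {
            frame[top + y][left + x] = value
        }
    }
    return frame
}

// Usage: a solid 20 x 20 block ends up with a 4-pixel black border on every side.
let block = [[UInt8]](repeating: [UInt8](repeating: 255, count: 20), count: 20)
let framed = centerInFrame(block)
```

The same idea applies to the canvas later on: the drawn strokes need padding around them before they resemble the images the model was trained on.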

We won’t be digging deep into the model layers or training on the dataset in this article. Let’s assume we were gifted a ready-made Core ML MNIST model.

Our Final Destination

An image is worth a thousand words. A GIF is composed of thousands of images. Here’s the final outcome you’ll get by the end of this piece.

Final outcome

Setting Up

Before Core ML asks out the PencilKit framework on a date, let’s get our PencilKit framework dressed.

Setting up the canvas

It’s really easy to set up the PKCanvasView in our application, as the following code shows:

let canvasView = PKCanvasView(frame: .zero)
canvasView.backgroundColor = .black
canvasView.translatesAutoresizingMaskIntoConstraints = false
view.addSubview(canvasView)
NSLayoutConstraint.activate([
   canvasView.topAnchor.constraint(equalTo: navigationBar.bottomAnchor),
   canvasView.bottomAnchor.constraint(equalTo: view.bottomAnchor),
   canvasView.leadingAnchor.constraint(equalTo: view.leadingAnchor),
   canvasView.trailingAnchor.constraint(equalTo: view.trailingAnchor),
])

Setting our tool picker

The PKToolPicker is responsible for displaying the various brushes in our application. It provides ink, pencil, selection, and eraser tools, along with undo and redo buttons (these are available on iPadOS only, owing to the size of the screen).

The following code shows how to set up the ToolPicker UI in our application:

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    guard
        let window = view.window,
        let toolPicker = PKToolPicker.shared(for: window) else { return }
    toolPicker.setVisible(true, forFirstResponder: canvasView)
    toolPicker.addObserver(canvasView)
    canvasView.becomeFirstResponder()
}

Setting our navigation bar buttons

The navigation bar was already added to the storyboard. In the following code, we’ve added a few action buttons to it.

func setNavigationBar() {
    if let navItem = navigationBar.topItem {
        let detectItem = UIBarButtonItem(title: "Detect", style: .done, target: self, action: #selector(detectImage))
        let clearItem = UIBarButtonItem(title: "Clear", style: .plain, target: self, action: #selector(clear))
        navItem.rightBarButtonItems = [clearItem, detectItem]
        navItem.leftBarButtonItem = UIBarButtonItem(title: "", style: .plain, target: self, action: nil)
    }
}

The left bar button is where the final predicted output is displayed.

Preprocessing the Drawing Input

In order to feed the PencilKit drawings to the Core ML framework, we first need to extract the image from the canvas. Let’s see how that’s done.

  • Converting the PKDrawing instance into a UIImage is straightforward. The real challenge is in preprocessing it for the Core ML Model.
  • The UIImage we get from the PKDrawing contains just the drawn image with no padding.
  • We need to create an image with the size of the view and overlay the UIImage from the PKDrawing in the center of it. Basically a UIImage within a UIImage.

The following code does that for you:

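func preprocessImage() -> UIImage {
    // Render only the strokes; the resulting image has no padding around them.
    var image = canvasView.drawing.image(from: canvasView.drawing.bounds, scale: 10.0)
    // Create a black background the size of the view.
    if let newImage = UIImage(color: .black, size: CGSize(width: view.frame.width, height: view.frame.height)) {
        // Center the strokes on the background, keeping their original size.
        let origin = CGPoint(x: (newImage.size.width - image.size.width) / 2,
                             y: (newImage.size.height - image.size.height) / 2)
        if let overlayedImage = newImage.image(byDrawingImage: image, inRect: CGRect(origin: origin, size: image.size)) {
            image = overlayedImage
        }
    }
    return image
}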
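The centering arithmetic is easy to get wrong, so it can help to isolate it in a pure function that can be checked without a canvas. The sketch below is one possible way to do that. The name `centeredRect(for:in:margin:)` and its `margin` parameter are hypothetical and not part of the article's project. Besides centering, it scales the content down (never up) so the aspect ratio of the drawing is preserved.

```swift
import Foundation

/// Returns the rect that centers `content` inside `container`, scaled down
/// (never up) so that it fits within `container` inset by `margin` on each side.
/// Returns `.zero` when either size is empty or the margin leaves no room.
func centeredRect(for content: CGSize, in container: CGSize, margin: CGFloat = 0) -> CGRect {
    let available = CGSize(width: container.width - 2 * margin,
                           height: container.height - 2 * margin)
    guard content.width > 0, content.height > 0,
          available.width > 0, available.height > 0 else { return .zero }

    // One scale factor for both axes keeps the aspect ratio of the drawing.
    let scale = min(1, available.width / content.width, available.height / content.height)
    let size = CGSize(width: content.width * scale, height: content.height * scale)
    return CGRect(x: (container.width - size.width) / 2,
                  y: (container.height - size.height) / 2,
                  width: size.width,
                  height: size.height)
}

// Usage: a 400 x 200 drawing placed in a 280 x 280 square with a 40-point margin.
let target = centeredRect(for: CGSize(width: 400, height: 200),
                          in: CGSize(width: 280, height: 280),
                          margin: 40)
```

A rect computed this way could be passed as the `inRect` argument of `image(byDrawingImage:inRect:)`.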

The following helper extension functions were used in the above code:

extension UIImage {

    // Creates a solid-color image of the given size.
    public convenience init?(color: UIColor, size: CGSize = CGSize(width: 1, height: 1)) {
        let rect = CGRect(origin: .zero, size: size)
        UIGraphicsBeginImageContextWithOptions(rect.size, false, 0.0)
        color.setFill()
        UIRectFill(rect)
        let image = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        guard let cgImage = image?.cgImage else { return nil }
        self.init(cgImage: cgImage)
    }

    // Returns a copy of this image with `image` drawn on top of it in `rect`.
    func image(byDrawingImage image: UIImage, inRect rect: CGRect) -> UIImage? {
        UIGraphicsBeginImageContext(size)
        draw(in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
        image.draw(in: rect)
        let result = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        return result
    }
}

extension CGRect {
    var center: CGPoint { return CGPoint(x: midX, y: midY) }
}

Prediction Using Core ML

Now that the image is input-ready, we need to do the following three things:

  1. Resize it to the input size 28 x 28.
  2. Convert it into a CVPixelBuffer in the grayscale color space.
  3. Feed it to the Core ML Model.

private let trainedImageSize = CGSize(width: 28, height: 28)

func predictImage(image: UIImage) {
    guard let resizedImage = image.resize(newSize: trainedImageSize),
          let pixelBuffer = resizedImage.toCVPixelBuffer(),
          let result = try? MNIST().prediction(image: pixelBuffer) else {
        return
    }
    navigationBar.topItem?.leftBarButtonItem?.title = "Predicted: \(result.classLabel)"
    print("result is \(result.classLabel)")
}
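Classifier models converted to Core ML typically expose a table of per-class probabilities alongside `classLabel`, although the property name and key type depend on how the model was converted. Assuming such a table is available as a `[Int64: Double]` dictionary (an assumption, not something the article shows), a small helper like the hypothetical `topPrediction` below could be used to display a digit only when the model is reasonably confident.

```swift
/// Picks the most probable label from a classifier's probability table.
/// Returns nil when the table is empty or the best score is below `minimumConfidence`.
func topPrediction(from probabilities: [Int64: Double],
                   minimumConfidence: Double = 0.5) -> (label: Int64, confidence: Double)? {
    guard let best = probabilities.max(by: { $0.value < $1.value }),
          best.value >= minimumConfidence else { return nil }
    return (label: best.key, confidence: best.value)
}

// Usage with made-up probabilities (not real model output).
let confident = topPrediction(from: [3: 0.10, 7: 0.85, 1: 0.05])
let uncertain = topPrediction(from: [3: 0.40, 7: 0.35, 1: 0.25])
```

When the helper returns nil, the navigation bar title could show a hint such as "Try again" instead of a low-confidence guess.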

The following extension functions were used in the above code:

extension UIImage {

    func resize(newSize: CGSize) -> UIImage? {
        UIGraphicsBeginImageContextWithOptions(newSize, false, 0.0)
        self.draw(in: CGRect(x: 0, y: 0, width: newSize.width, height: newSize.height))
        let newImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        return newImage
    }

    func toCVPixelBuffer() -> CVPixelBuffer? {
        guard let cg = self.cgImage else { return nil }
        let attr = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                    kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
        let width = Int(self.size.width)
        let height = Int(self.size.height)

        // Create an 8-bit, single-channel (grayscale) pixel buffer and bail out on failure.
        var pixelBuffer: CVPixelBuffer? = nil
        let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_OneComponent8, attr, &pixelBuffer)
        guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

        // Lock the buffer while Core Graphics writes into it, and unlock it on exit.
        CVPixelBufferLockBaseAddress(buffer, [])
        defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

        let colorspace = CGColorSpaceCreateDeviceGray()
        guard let bitmapContext = CGContext(data: CVPixelBufferGetBaseAddress(buffer), width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(buffer), space: colorspace, bitmapInfo: 0) else { return nil }
        bitmapContext.draw(cg, in: CGRect(x: 0, y: 0, width: width, height: height))
        return buffer
    }
}
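Conceptually, drawing into a single-channel 8-bit buffer with a gray color space collapses every color pixel into one brightness byte. The sketch below shows that idea on raw RGBA bytes in plain Swift. It assumes the common Rec. 601 luma weights, and Core Graphics may use a different conversion internally. `grayscale(fromRGBA:)` is a hypothetical helper for illustration only.

```swift
/// Converts tightly packed RGBA bytes (4 bytes per pixel) into one gray byte per pixel.
/// Returns nil if the input length is not a multiple of 4.
func grayscale(fromRGBA pixels: [UInt8]) -> [UInt8]? {
    guard pixels.count % 4 == 0 else { return nil }
    var gray: [UInt8] = []
    gray.reserveCapacity(pixels.count / 4)
    for i in stride(from: 0, to: pixels.count, by: 4) {
        let r = Int(pixels[i]), g = Int(pixels[i + 1]), b = Int(pixels[i + 2])
        // Integer Rec. 601 weights (0.299, 0.587, 0.114), rounded to nearest; alpha is ignored.
        gray.append(UInt8((299 * r + 587 * g + 114 * b + 500) / 1000))
    }
    return gray
}

// Usage: one white, one black, and one pure red pixel.
let sample = grayscale(fromRGBA: [255, 255, 255, 255,  0, 0, 0, 255,  255, 0, 0, 255])
```

White strokes on a black canvas map to bright pixels on a dark background, which matches how MNIST digits are stored.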

Conclusion

So we managed to use the Core ML and PencilKit frameworks together to recognize digits sketched on a canvas, using a model trained on the MNIST dataset. On-device machine learning has plenty of use cases, and inferring drawings is just one of them. You can find the full source code in the GitHub repository.

That’s it for this one. I hope you enjoyed reading.
