YOLOv3 is a deep learning model that detects the position and type of objects in an input image. It is trained on the MS-COCO dataset and can identify 80 categories.
Abstract
YOLOv3 is a machine learning model that can be used with ailia SDK to create AI applications. It can classify objects in one of the 80 categories available and compute bounding boxes for those objects from a single input image. The model has been trained on the MS-COCO dataset and can identify objects such as cars, trucks, people, handbags, and more. YOLOv3 can be used for various applications such as counting people entering a store, monitoring occupancy ratio in a restaurant or road traffic, detecting abandoned bicycles, and detecting access to dangerous areas.
Opinions
YOLOv3 is a powerful model for object detection and recognition.
The use of ailia SDK makes it easy to create AI applications using YOLOv3.
The model can be used for various applications, making it versatile and practical.
The high recognition accuracy of the standard model YOLOv3-416 makes it a reliable choice for object detection.
The tiny model, while having a slightly lower recognition accuracy, runs faster and can be used for applications that require real-time processing.
YOLOv3: A Machine Learning Model to Detect the Position and Type of an Object
This is an introduction to "YOLOv3", a machine learning model that can be used with ailia SDK. With ailia SDK, you can easily use this model, as well as many other ready-to-use ailia MODELS, to create AI applications.
Overview
YOLOv3 is a deep learning model for detecting the position and the type of objects in an input image. It can classify objects into one of the 80 available categories (e.g. car, person, motorbike) and compute bounding boxes for those objects from a single input image.
Below is a sample video of YOLOv3 recognition. It is able to detect cars, trucks, people, handbags, and more.
Example applications of this model include counting people entering a store, monitoring occupancy ratio in a restaurant or road traffic, detecting abandoned bicycles, and detecting access to dangerous areas.
There are two types of YOLOv3 models: the standard model, which has a high recognition accuracy, and the tiny model, which has a slightly lower recognition accuracy, but runs faster.
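The mAP (mean average precision, a measure of accuracy) of the standard model YOLOv3-416 is 55.3 and the mAP of the tiny model is 33.1. The computational costs are 65.86 billion and 5.56 billion FLOPs (floating-point operations) respectively, so the tiny model needs roughly one twelfth of the computation.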
Using YOLOv3 with ailia SDK
The ailia SDK supports both models with the Detector API since version 1.2.1. ailia SDK makes it possible to use YOLOv3 from Python and Unity on Windows, Mac, iOS, Android and Linux.
This is a sample of running YOLOv3 using ailia SDK and Python.
To use YOLOv3 with the ailia SDK in Python, first load the model using the ailia.Detector API and feed in an image using the compute API. Then get the number of detected objects using the get_object_count API, and the bounding box and category of each object using the get_object API.
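The call sequence described above can be sketched as follows. This is a minimal sketch rather than the SDK's reference usage. The method names compute, get_object_count and get_object come from the description above. The threshold and iou arguments, the field names of the returned object (category, prob, x, y, w, h) and the idea that box coordinates are normalized to the 0-1 range are assumptions that should be checked against the ailia SDK documentation. A hypothetical stand-in detector is used so that the sketch runs without the SDK installed.

```python
from collections import namedtuple

# Stand-in for the object returned by get_object.
# The field names are assumptions, not confirmed SDK names.
DetectedObject = namedtuple("DetectedObject", "category prob x y w h")


class FakeDetector:
    """Hypothetical replacement for an ailia.Detector instance, for illustration."""

    def __init__(self, objects):
        self._objects = objects
        self._results = []

    def compute(self, image, threshold, iou):
        # A real detector would run inference on `image` here.
        self._results = [o for o in self._objects if o.prob >= threshold]

    def get_object_count(self):
        return len(self._results)

    def get_object(self, index):
        return self._results[index]


def collect_detections(detector, image, threshold=0.4, iou=0.45):
    """Run detection, then read back every detected object by index."""
    detector.compute(image, threshold, iou)
    return [detector.get_object(i) for i in range(detector.get_object_count())]


def to_pixel_box(obj, image_width, image_height):
    """Convert a normalized (x, y, w, h) box to pixel corners (x1, y1, x2, y2)."""
    x1 = int(obj.x * image_width)
    y1 = int(obj.y * image_height)
    x2 = int((obj.x + obj.w) * image_width)
    y2 = int((obj.y + obj.h) * image_height)
    return x1, y1, x2, y2


detector = FakeDetector([
    DetectedObject(category=0, prob=0.9, x=0.25, y=0.5, w=0.5, h=0.25),
    DetectedObject(category=2, prob=0.1, x=0.0, y=0.0, w=0.1, h=0.1),
])
detections = collect_detections(detector, image=None)
boxes = [to_pixel_box(obj, 640, 480) for obj in detections]
```

With the real SDK, the detector passed to collect_detections would be the object created through ailia.Detector, and the image would be the loaded input frame.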
When using YOLOv3 from Unity, use the AiliaDetectorModel class to load the model, then use the ComputeFromImage API to feed it an image, and finally get a list of detected objects, bounding boxes and categories.
The Unity Package of the ailia SDK includes scenes that use YOLOv3 and can be used out-of-the-box on Windows, Mac, Linux, iOS and Android.
Training the model on your own dataset
Now we will discuss how to run YOLOv3 trained on our own dataset using the ailia SDK.
A Keras implementation is available in the following repositories and can be used to convert Darknet models into a form that can be used in the ailia SDK or re-trained on your own dataset.
Models trained with Darknet can be converted to hdf5 files using keras-yolo3 with the following command. The conversion from hdf5 to ONNX is covered in the next section.
To retrain the model on your own dataset, use the script train.py from the keras-yolo3 repository. The annotation file needed to train keras-yolo3 has one line per image. Each line contains the path to the image file, followed by the coordinates of the bounding box (x1,y1)-(x2,y2) and the category index. If an image has multiple bounding boxes, separate them with spaces. The coordinates are in pixels.
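As an illustration of this annotation format, the sketch below builds one line from an image path and a list of boxes. It assumes that the values of each box are written as x1,y1,x2,y2,class_index, joined by commas without spaces, which is how the keras-yolo3 README describes a box as far as we know. The function name format_annotation_line and the image path are hypothetical.

```python
def format_annotation_line(image_path, boxes):
    """Build one annotation line: image path, then one entry per bounding box.

    Each box is (x1, y1, x2, y2, class_index) in pixel coordinates.
    """
    entries = [",".join(str(int(v)) for v in box) for box in boxes]
    return " ".join([image_path] + entries)


line = format_annotation_line(
    "images/0001.jpg",  # hypothetical path
    [(48, 240, 195, 371, 0), (8, 12, 352, 498, 0)],
)
print(line)
```

Because boxes are separated by spaces, an image path that itself contains spaces would not be readable in this format.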
Next, create a text file containing the categories. Categories are listed one per line, in order, starting at index 0. For example, if the only category is human faces, the file contains a single line:
face
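The relation between this file and the category index used in the annotations can be sketched as follows. This is a minimal sketch assuming one category name per line. The helper parse_class_names is hypothetical and not part of keras-yolo3.

```python
def parse_class_names(text):
    """Return the category names, one per non-empty line, in file order."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Content of the categories file from the example above.
class_names = parse_class_names("face\n")

# The index used in the annotation file is the position in this list.
class_index = {name: i for i, name in enumerate(class_names)}
```

Here the category "face" maps to index 0, which is the value to write as the category index of each face bounding box.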
With those files ready you can now start the training.
python train.py
The paths of those input files are hardcoded in the script train.py, so make sure to update them before running the script.
annotation_path = 'train.txt'  # update here
log_dir = 'logs/000/'
classes_path = 'model_data/voc_classes.txt'  # update here
anchors_path = 'model_data/yolo_anchors.txt'
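Before launching a training run, it can help to check that the annotation file and the category list agree. The sketch below is one possible check, assuming each box is written as x1,y1,x2,y2,class_index with integer values. The helper check_annotation_line is hypothetical and not part of keras-yolo3.

```python
def check_annotation_line(line, num_classes):
    """Return a list of problems found in one annotation line (empty if none)."""
    parts = line.split()
    if not parts:
        return ["empty line"]
    problems = []
    # parts[0] is the image path; every following entry is one bounding box.
    for entry in parts[1:]:
        fields = entry.split(",")
        if len(fields) != 5:
            problems.append(f"expected 5 values, got {len(fields)}: {entry}")
            continue
        try:
            x1, y1, x2, y2, class_index = (int(f) for f in fields)
        except ValueError:
            problems.append(f"non-integer value: {entry}")
            continue
        if x2 <= x1 or y2 <= y1:
            problems.append(f"empty or inverted box: {entry}")
        if not 0 <= class_index < num_classes:
            problems.append(f"class index out of range: {entry}")
    return problems


good = check_annotation_line("images/0001.jpg 48,240,195,371,0", num_classes=1)
bad = check_annotation_line("images/0002.jpg 10,10,5,50,3", num_classes=1)
```

In this example, good is an empty list, while bad reports both an inverted box and a category index that does not exist in a one-category list.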
Next you can create an hdf5 file.
Conversion from hdf5 to ONNX
To be used with ailia SDK, hdf5 files need to be converted to ONNX using the following code from keras2onnx.
Using the repository above, you can convert the model to ONNX with the following command. Please use Keras 2.2.4, TensorFlow 1.13.2, and keras2onnx 1.5.1 for the conversion.
ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.
ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.