Javier Martínez Ojeda

Object Detection: COCO and YOLO Formats, and Conversion Between Them

Learn the structure of COCO and YOLOv5 formats, and how to convert from one to another.

Photo by Matt Briney on Unsplash

Introduction

Image annotations used to train object detection models can come in different formats, even when they contain the same information. Among the many formats that exist, two very commonly used ones are the COCO JSON format and the YOLOv5 PyTorch TXT format. The former owes its fame to the MS COCO dataset [1], released by Microsoft in 2015, which is one of the most widely used datasets for object detection, segmentation and captioning tasks. The latter owes its popularity to the fact that the YOLOv8 architecture (a state-of-the-art model for object detection) developed by ultralytics [2] takes it as input.

This article will first introduce the basis of both formats' popularity: the MS COCO dataset and ultralytics' YOLOv8 architecture.

The article will then introduce the structure and components of the COCO JSON and YOLOv5 PyTorch TXT formats. Next, it will show the structure of the MS COCO dataset and of the dataset expected by ultralytics' YOLOv8 API, and finally it will explain how to easily convert a dataset from COCO JSON format to YOLOv5 PyTorch TXT format. This last step saves considerable work and time in data preprocessing, and thus streamlines the training process of the YOLOv8 architecture.

YOLOv8 architecture and COCO dataset

In the field of object detection, ultralytics' YOLOv8 architecture (from the YOLO [3] family) is the most widely used state-of-the-art architecture today. It includes improvements over previous versions, such as low inference time (enabling real-time detection) and good accuracy in detecting small objects.

On the other hand, the MS COCO dataset is one of the most widely used datasets for computer vision tasks such as object detection or segmentation. Microsoft released this dataset back in 2015; it includes more than 328K images containing objects belonging to 80 object categories and 91 stuff categories, together with annotations for detection, segmentation and captioning. Because of its large number of quality images and distinct classes, and because pre-labeled data saves a considerable amount of work, it is commonly used as a benchmark when evaluating new object detection architectures.

COCO dataset and COCO JSON format

Structure of the dataset

dataset_directory
├── annotations
│   ├── instances_train.json
│   ├── instances_val.json
│   └── instances_test.json
│   
└── images
    ├── train
    │   ├── image_00001.jpg
    │   ├── image_00002.jpg
    │   ├── ...
    │   └── image_00100.jpg
    ├── val
    │   ├── image_00101.jpg
    │   ├── image_00102.jpg
    │   ├── ...
    │   └── image_00200.jpg
    └── test
        ├── image_00201.jpg
        ├── image_00202.jpg
        ├── ...
        └── image_00300.jpg

Structure of the COCO JSON annotation files

In the COCO JSON format, there is a JSON annotation file for each directory (image set) inside the /images directory.

These COCO JSON annotation files contain different headers/sections with information about the dataset, the license, the different classes/categories present in the annotations, as well as metadata for each of the images composing the dataset and all the annotations. The following example shows the structure of the COCO JSON annotation files:

{
    "info": {
        "description": "COCO 2017 Dataset","url": "http://cocodataset.org","version": "1.0","year": 2017,"contributor": "COCO Consortium","date_created": "2017/09/01"
    },
    "licenses": [
        {"url": "http://creativecommons.org/licenses/by/2.0/","id": 4,"name": "Attribution License"}
    ],
    "categories": [
        {"supercategory": "animal","id": 0,"name": "cat"},
        {"supercategory": "animal","id": 1,"name": "dog"}
    ],
    "images": [
        {"id": 123456, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm3.staticflickr.com/x/xxxxxxxxxxxx.jpg", "width": img_width, "height": img_height, "file_name": "xxxxxxxxx.jpg", "date_captured": "date"},
        {"id": 123457, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm1.staticflickr.com/x/xxxxxxxxxxxx.jpg", "width": img_width, "height": img_height, "file_name": "xxxxxxxxx.jpg", "date_captured": "date"}
    ],
    "annotations": [
        {"id": 654321, "category_id": 1, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123456, "area": float, "bbox": [float, float, float, float]},
        {"id": 654322, "category_id": 0, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123457, "area": float, "bbox": [float, float, float, float]},
        {"id": 654323, "category_id": 1, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123457, "area": float, "bbox": [float, float, float, float]},
        {"id": 654324, "category_id": 1, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123457, "area": float, "bbox": [float, float, float, float]}
    ]
}

For each element (each annotation) inside the “annotations” list, category_id corresponds to the class ID of the object, image_id corresponds to the ID of the image containing the object, and bbox corresponds to an array with 4 values: x, y, width and height, respectively.

x and y are the coordinates of the top-left corner of the bounding box, and width and height are the width and height of the bounding box, all in pixels.

The other keys of the annotation dictionaries are id (a unique ID for that specific annotation), iscrowd (a flag indicating whether the annotation labels a large group of objects as a single instance), area (the segmentation area) and segmentation (the instance segmentation mask of the object).

Note that, with this format, the annotations and the image metadata are stored in separate lists of the same annotation file, and the only way to link an annotation to its corresponding image is the image_id value of the annotation dictionary.
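As an illustration of this lookup, the following minimal Python sketch indexes the annotations of a COCO JSON file by image_id (the file path follows the dataset structure shown above; everything else is standard use of the json module):

import json
from collections import defaultdict

# Load one of the COCO JSON annotation files
with open('annotations/instances_val.json') as f:
    coco = json.load(f)

# Index every annotation by the ID of the image it belongs to
annotations_by_image = defaultdict(list)
for annotation in coco['annotations']:
    annotations_by_image[annotation['image_id']].append(annotation)

# Retrieve the bounding boxes of each image through its ID
for image in coco['images']:
    bboxes = [ann['bbox'] for ann in annotations_by_image[image['id']]]
    print(image['file_name'], bboxes)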

YOLOv8 dataset and YOLOv5 PyTorch TXT format

Structure of the dataset

dataset_directory
├── test
│   ├── images
│   │   ├── image_00001.png
│   │   ├── image_00002.png
│   │   ├── ...
│   └── labels
│       ├── image_00001.txt
│       ├── image_00002.txt
│       ├── ...
├── train
│   ├── images
│   │   ├── image_00003.png
│   │   ├── image_00004.png
│   │   ├── ...
│   └── labels
│       ├── image_00003.txt
│       ├── image_00004.txt
│       ├── ...
├── valid
│   ├── images
│   │   ├── image_00005.png
│   │   ├── image_00006.png
│   │   ├── ...
│   └── labels
│       ├── image_00005.txt
│       ├── image_00006.txt
│       ├── ...
├── data.yaml
├── test.txt
├── train.txt
└── valid.txt

Structure of the YOLOv5 PyTorch TXT annotation files

The YOLOv5 PyTorch TXT format, which is a modified version of the Darknet annotation format [4], stores all the annotations of one image in a single file with the same filename as the image but a .txt extension. This means that for each image with filename xxxx.png, an annotation/label file with the same name but a .txt extension must exist: xxxx.txt.
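For example, the label path can be derived from the image path by swapping the images directory for labels and the image extension for .txt. A small sketch using Python's pathlib (the concrete filename is hypothetical):

from pathlib import Path

image_path = Path('train/images/image_00003.png')
# Swap the 'images' directory for 'labels' and the '.png' extension for '.txt'
label_path = image_path.parent.parent / 'labels' / image_path.with_suffix('.txt').name
print(label_path)  # train/labels/image_00003.txt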

Each of these label files contains one line of text per annotated object. If, for example, there are 3 annotated objects in an image, the label file will look like this:

0 0.5591796875 0.6618075117370892 0.533578125 0.631596244131455 # First object
0 0.2299453125 0.3176760563380282 0.250109374 0.383849765258216 # Second object
1 0.6085703125 0.2213615023474178 0.237859374 0.167323943661971 # Third object

The first integer represents the class ID of the object. The second and third numbers are Cx and Cy, the x and y coordinates of the bounding box center. Finally, the last two numbers are w and h, the width and height of the bounding box. The Cx, Cy, w and h values are normalized by the image width and height, so they always lie between 0 and 1.
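The mapping from the COCO bbox values seen earlier to these normalized values is straightforward. A minimal sketch of the conversion (the function name is illustrative, not the repository's actual code):

def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert a COCO [x, y, width, height] bbox (pixels, top-left corner)
    into YOLO (Cx, Cy, w, h) values normalized by the image size."""
    x, y, w, h = bbox
    cx = (x + w / 2) / img_width   # Normalized x coordinate of the box center
    cy = (y + h / 2) / img_height  # Normalized y coordinate of the box center
    return cx, cy, w / img_width, h / img_height

# A 200x150 pixel box with top-left corner at (50, 100) in a 640x480 image
print(coco_bbox_to_yolo([50, 100, 200, 150], 640, 480))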

This format makes it much easier to access the annotations of a given image, as they are all located in a single file. With the COCO JSON format, by contrast, it is necessary to iterate over all the annotations, selecting the ones whose image_id matches the ID of the image.

How to perform the conversion

For the conversion, the COCO_YOLO_dataset_generator repository will be used. This repository contains a script that makes it very easy to convert datasets from COCO JSON to YOLOv5 PyTorch TXT format.

1. Clone the repository

This step will download the repository from GitHub and place it in a directory with the repository's name: /COCO_YOLO_dataset_generator.

git clone https://github.com/JavierMtz5/COCO_YOLO_dataset_generator.git

2. Install the dependencies

This step will install all the Python libraries required to run the script. As of today, the only dependency of the script is the tqdm library, used to show progress bars during the conversion.

cd COCO_YOLO_dataset_generator
pip install -r requirements.txt

3. Convert the dataset

This last step will execute the script with the parameters required to convert the dataset from COCO JSON format to YOLOv5 PyTorch TXT.

The first parameter is the path to the original dataset, relative to the COCO_YOLO_dataset_generator directory. The convert_to_yolo parameter is set to true, as the goal is to convert the dataset format and structure from COCO to YOLO. Finally, the output_dir parameter should be set to the name of the directory that will hold the new, converted dataset.

python3 coco_to_yolo_extractor.py <path_to_the_original_dataset> --convert_to_yolo true --output_dir <path_to_new_dataset>
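For example, assuming the original COCO dataset lives in dataset_directory (the name used in the tree above) and yolo_dataset is the chosen name for the converted dataset (both names are illustrative):

python3 coco_to_yolo_extractor.py dataset_directory --convert_to_yolo true --output_dir yolo_dataset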

4. Train a YOLOv8 model with the new dataset

After the execution of the script, a new directory will be created with the name given by the user for the output_dir parameter. This directory will contain the new dataset, with annotations in YOLOv5 PyTorch TXT format, and the structure expected by the YOLOv8 architecture.
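The data.yaml file listed in the dataset structure above tells the ultralytics API where each split lives and which classes exist. A minimal YOLOv8-style data.yaml looks like the following sketch (the cat/dog class names from the earlier COCO example are illustrative):

train: train/images
val: valid/images
test: test/images

nc: 2
names: ['cat', 'dog']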

It will then be possible to train a YOLOv8 model with the new dataset by running:

Note: the ultralytics library is not included in the repository's requirements.txt file, so it may be necessary to install it before running the training:
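pip install ultralytics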

from ultralytics import YOLO

model = YOLO('yolov8n.yaml') # Load the YOLOv8 nano architecture

training_result = model.train(data='<dataset_directory>/data.yaml', epochs=100) # Train for 100 epochs on the converted dataset
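Once training finishes, the same model object can be evaluated or used for inference. A short sketch using the ultralytics API (the image path is illustrative):

metrics = model.val() # Evaluate the trained weights on the validation split defined in data.yaml

results = model.predict('path/to/image.jpg') # Run inference on a new image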

Conclusion

COCO JSON and, above all, YOLOv5 PyTorch TXT are commonly used formats in the field of object detection, so it is very useful to know how they represent annotations. Moreover, the repository used here, COCO_YOLO_dataset_generator, makes it easy for any user to convert a dataset from COCO JSON format to YOLOv5 PyTorch TXT, which can then be used to train any YOLO model from YOLOv5 to YOLOv8. As YOLOv8 is a state-of-the-art architecture, the repository is a useful preprocessing tool for training such models with data originally in COCO JSON format. If you find the repository useful, star it!

Repository

https://github.com/JavierMtz5/COCO_YOLO_dataset_generator

Dataset License

The MS COCO images dataset is licensed under a Creative Commons Attribution 4.0 License.

References

[1] COCO dataset: https://cocodataset.org/#home
[2] Ultralytics YOLOv8 Docs: https://docs.ultralytics.com
[3] Your Comprehensive Guide to the YOLO Family of Models: https://blog.roboflow.com/guide-to-yolo-models/
[4] YOLO Darknet TXT: https://roboflow.com/formats/yolo-darknet-txt
