Object Detection: COCO and YOLO Formats, and Conversion Between Them
Learn the structure of COCO and YOLOv5 formats, and how to convert from one to another.
Introduction
Image annotations used to train object detection models can come in different formats, even when they contain the same information. Among the many formats that exist, two of the most commonly used are the COCO JSON format and the YOLOv5 PyTorch TXT format. The former owes its fame to the MS COCO dataset [1], released by Microsoft in 2015, which is one of the most widely used datasets for object detection, segmentation and captioning tasks. The popularity of the YOLOv5 PyTorch TXT format, on the other hand, comes from the fact that the YOLOv8 architecture (a state-of-the-art model for object detection) developed by Ultralytics [2] uses it as input.
This article will first introduce the sources of both formats' popularity, which, as explained above, are the MS COCO dataset and Ultralytics' YOLOv8 architecture.
The article will then introduce the structure and components of the COCO JSON and YOLOv5 PyTorch TXT formats. Next, it will show the structure of the MS COCO dataset and of the dataset expected by Ultralytics' YOLOv8 API, and finally it will explain how to easily convert a dataset from COCO JSON format to YOLOv5 PyTorch TXT format. This last step saves a great deal of work and time in data pre-processing, and thus streamlines the training process of the YOLOv8 architecture.
YOLOv8 architecture and COCO dataset
In the field of object detection, Ultralytics' YOLOv8 architecture (from the YOLO [3] family) is currently the most widely used state-of-the-art architecture. It improves on previous versions with low inference time (real-time detection) and good accuracy when detecting small objects.
On the other hand, the MS COCO dataset is one of the most widely used datasets for computer vision tasks such as object detection or segmentation. Microsoft released this dataset back in 2015, and it includes more than 328K images containing objects belonging to 80 object categories and 91 "stuff" categories, as well as annotations for detection, segmentation and captioning. Because of its large number of quality images and distinct classes, and because having pre-labeled data saves a considerable amount of work, it is commonly used as a benchmark when evaluating new object detection architectures.
COCO dataset and COCO JSON format
Structure of the dataset
dataset_directory
├── annotations
│ ├── instances_train.json
│ ├── instances_val.json
│ └── instances_test.json
│
└── images
├── train
│ ├── image_00001.jpg
│ ├── image_00002.jpg
│ ├── ...
│ └── image_00100.jpg
├── val
│ ├── image_00101.jpg
│ ├── image_00102.jpg
│ ├── ...
│ └── image_00200.jpg
└── test
├── image_00201.jpg
├── image_00202.jpg
├── ...
└── image_00300.jpg
Structure of the COCO JSON annotation files
In the COCO JSON format, there is a JSON annotation file for each directory (image set) inside the /images directory.
These COCO JSON annotation files contain different headers/sections with information about the dataset, the license and the classes/categories present in the annotations, as well as metadata for each of the images composing the dataset and all the annotations themselves. The following example shows the structure of a COCO JSON annotation file:
{
"info": {
"description": "COCO 2017 Dataset","url": "http://cocodataset.org","version": "1.0","year": 2017,"contributor": "COCO Consortium","date_created": "2017/09/01"
},
"licenses": [
{"url": "http://creativecommons.org/licenses/by/2.0/","id": 4,"name": "Attribution License"}
],
"categories": [
{"supercategory": "animal","id": 0,"name": "cat"},
{"supercategory": "animal","id": 1,"name": "dog"}
],
"images": [
{"id": 123456, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm3.staticflickr.com/x/xxxxxxxxxxxx.jpg", "width": img_width, "height": img_height, "file_name": "xxxxxxxxx.jpg", "date_captured": "date"},
{"id": 123457, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm1.staticflickr.com/x/xxxxxxxxxxxx.jpg", "width": img_width, "height": img_height, "file_name": "xxxxxxxxx.jpg", "date_captured": "date"}
],
"annotations": [
{"id": 654321, "category_id": 1, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123456, "area": float, "bbox": [float, float, float, float]},
{"id": 654322, "category_id": 0, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123457, "area": float, "bbox": [float, float, float, float]},
{"id": 654323, "category_id": 1, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123457, "area": float, "bbox": [float, float, float, float]},
{"id": 654324, "category_id": 1, "iscrowd": 0, "segmentation": [[float, float,..........float, float]], "image_id": 123457, "area": float, "bbox": [float, float, float, float]}
]
}
For each element (each annotation) inside the "annotations" list, category_id corresponds to the class ID of the object, image_id corresponds to the ID of the image containing the object, and bbox corresponds to an array with 4 values: x, y, width and height, respectively.
x and y are the coordinates of the top-left corner of the bounding box, and width and height are the width and height of the bounding box, all in pixels.
The other keys of the annotation dictionaries are id (a unique ID for that specific annotation), iscrowd (a flag indicating whether the annotation represents a single object (0) or a large group/crowd of objects (1)), area (the area of the segmentation) and segmentation (the object's instance segmentation mask).
Note that, with this format, annotations and image information are located in separate lists within the annotation file, and the only way to link an annotation to its corresponding image is the image_id value of the annotation dictionary.
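As a quick sketch of this linking step, the snippet below groups annotations by image_id using a small in-memory dictionary with the same sections as the annotation file (the IDs and file names here are illustrative; in practice the dictionary would come from json.load on an annotation file):

```python
from collections import defaultdict

# Tiny in-memory example of the relevant COCO JSON sections
coco = {
    "categories": [{"id": 0, "name": "cat"}, {"id": 1, "name": "dog"}],
    "images": [{"id": 123456, "file_name": "img1.jpg"},
               {"id": 123457, "file_name": "img2.jpg"}],
    "annotations": [
        {"id": 654321, "category_id": 1, "image_id": 123456, "bbox": [10.0, 20.0, 30.0, 40.0]},
        {"id": 654322, "category_id": 0, "image_id": 123457, "bbox": [5.0, 5.0, 15.0, 25.0]},
    ],
}

# Group annotations by image_id so each image's boxes can be looked up directly
annotations_by_image = defaultdict(list)
for ann in coco["annotations"]:
    annotations_by_image[ann["image_id"]].append(ann)

# Map category_id -> class name for readable output
category_names = {cat["id"]: cat["name"] for cat in coco["categories"]}

for img in coco["images"]:
    for ann in annotations_by_image[img["id"]]:
        x, y, w, h = ann["bbox"]
        print(f'{img["file_name"]}: {category_names[ann["category_id"]]} at ({x}, {y}), size {w}x{h}')
```

Building this index once avoids re-scanning the whole "annotations" list for every image.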
YOLOv8 dataset and YOLOv5 PyTorch TXT format
Structure of the dataset
dataset_directory
├── test
│ ├── images
│ │ ├── image_00001.png
│ │ ├── image_00002.png
│ │ ├── ...
│ └── labels
│ ├── image_00001.txt
│ ├── image_00002.txt
│ ├── ...
├── train
│ ├── images
│ │ ├── image_00003.png
│ │ ├── image_00004.png
│ │ ├── ...
│ └── labels
│ ├── image_00003.txt
│ ├── image_00004.txt
│ ├── ...
├── valid
│ ├── images
│ │ ├── image_00005.png
│ │ ├── image_00006.png
│ │ ├── ...
│ └── labels
│ ├── image_00005.txt
│ ├── image_00006.txt
│ ├── ...
├── data.yaml
├── test.txt
├── train.txt
└── valid.txt
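The data.yaml file at the root of the dataset tells the training API where each image set lives and which classes exist. A minimal sketch (the paths and class names are illustrative; check the exact keys against the ultralytics documentation):

```yaml
train: train/images   # relative paths to the image sets
val: valid/images
test: test/images

nc: 2                 # number of classes
names: ['cat', 'dog'] # class names, indexed by class ID
```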
Structure of the YOLOv5 PyTorch TXT annotation files
YOLOv5 PyTorch TXT format, which is a modified version of the Darknet annotation format [4], stores all the annotations of one image in a single file, with the same filename as the image but with .txt extension. This means that for each image with filename xxxx.png, an annotation/label file with the same name but .txt extension must exist: xxxx.txt.
Each of these label files contains one line of text per annotated object. If, for example, there are 3 annotated objects in an image, the label file will look like this:
0 0.5591796875 0.6618075117370892 0.533578125 0.631596244131455 # First object
0 0.2299453125 0.3176760563380282 0.250109374 0.383849765258216 # Second object
1 0.6085703125 0.2213615023474178 0.237859374 0.167323943661971 # Third object
The first integer represents the class id of the object. The second and third numbers are Cx and Cy, which are the x and y coordinates of the bounding box center. Finally, the last two numbers are w and h, the width and height of the bounding box. Cx, Cy, w and h values are normalized by image size.
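The normalization described above also defines how a COCO bounding box maps to a YOLO label line. A minimal sketch of that conversion (the function name is ours, not part of any library):

```python
def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert a COCO bbox [x, y, width, height] (top-left corner, in pixels)
    to a YOLO (Cx, Cy, w, h) tuple, normalized by image size."""
    x, y, w, h = bbox
    cx = (x + w / 2) / img_width   # normalized x of the box center
    cy = (y + h / 2) / img_height  # normalized y of the box center
    return cx, cy, w / img_width, h / img_height

# Example: a 100x50 box with top-left corner at (50, 100) in a 640x480 image
cx, cy, nw, nh = coco_bbox_to_yolo([50, 100, 100, 50], 640, 480)
print(f'0 {cx} {cy} {nw} {nh}')  # one YOLO label line for class ID 0
```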
This format makes it much easier to access the annotations of a given image, as they will be located in a single file. On the other hand, for the COCO JSON format it would be necessary to iterate over all the annotations, selecting the ones whose image_id is the same as the ID of the image.
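For example, recovering pixel-space boxes from a label file is just a matter of parsing each line and denormalizing. A sketch (the helper name is ours):

```python
def parse_yolo_label_line(line, img_width, img_height):
    """Parse one YOLO label line into (class_id, (x_min, y_min, width, height)),
    with the box expressed in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    box_w = w * img_width
    box_h = h * img_height
    # Shift from center coordinates back to the top-left corner
    x_min = cx * img_width - box_w / 2
    y_min = cy * img_height - box_h / 2
    return class_id, (x_min, y_min, box_w, box_h)

# Each line of a .txt label file can be parsed independently
class_id, box = parse_yolo_label_line('1 0.5 0.5 0.25 0.5', 640, 480)
print(class_id, box)  # 1 (240.0, 120.0, 160.0, 240.0)
```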
How to perform the conversion
For the conversion, the COCO_YOLO_dataset_generator repository will be used. It contains a script that makes it very easy to convert datasets from COCO JSON to YOLOv5 PyTorch TXT format.
1. Clone the repository
This step will download the repository from GitHub, and copy it inside a directory with the repository’s name: /COCO_YOLO_dataset_generator.
git clone https://github.com/JavierMtz5/COCO_YOLO_dataset_generator.git
2. Install the dependencies
This step will install all the Python libraries required to run the script. As of today, the only dependency of the script is the tqdm library, used to show progress bars during the conversion.
cd COCO_YOLO_dataset_generator
pip install -r requirements.txt
3. Convert the dataset
This last step will execute the script with the parameters required to convert the dataset from COCO JSON format to YOLOv5 PyTorch TXT.
The first parameter is the path to the original dataset, relative to the COCO_YOLO_dataset_generator directory. The convert_to_yolo parameter is set to true, as the goal is to convert the dataset format and structure from COCO to YOLO. Finally, the output_dir parameter should be set to the path of the new converted dataset.
python3 coco_to_yolo_extractor.py <path_to_the_original_dataset> --convert_to_yolo true --output_dir <path_to_new_dataset>
4. Train a YOLOv8 model with the new dataset
After the execution of the script, a new directory will be created with the name given by the user for the output_dir parameter. This directory will contain the new dataset, with annotations in YOLOv5 PyTorch TXT format, and the structure expected by the YOLOv8 architecture.
It will be possible to train a YOLOv8 model using the new dataset, by running:
Note that the ultralytics library is not included in the requirements.txt file of the repository, so it might be necessary to install it (pip install ultralytics) before running the training:
from ultralytics import YOLO
model = YOLO('yolov8n.yaml') # Load the YOLOv8 nano architecture
training_result = model.train(data='<dataset_directory>/data.yaml', epochs=100)
Conclusion
COCO JSON and, above all, YOLOv5 PyTorch TXT are commonly used formats in the field of object detection, so it is very useful to know how they represent annotations. Moreover, the COCO_YOLO_dataset_generator repository used here makes it easy for any user to convert a dataset from COCO JSON format to YOLOv5 PyTorch TXT, which can later be used to train any YOLO model from YOLOv5 through YOLOv8. As YOLOv8 is a state-of-the-art architecture, the repository is a useful preprocessing tool for training such models with data originally in COCO JSON format. If you find the repository useful, star it!
Dataset License
MS COCO images dataset is licensed under a Creative Commons Attribution 4.0 License.
References
[1] COCO dataset: https://cocodataset.org/#home
[2] Ultralytics YOLOv8 Docs: https://docs.ultralytics.com
[3] Your Comprehensive Guide to the YOLO Family of Models: https://blog.roboflow.com/guide-to-yolo-models/
[4] YOLO Darknet TXT: https://roboflow.com/formats/yolo-darknet-txt