Things they don’t tell you about installing Mask R-CNN for custom datasets!

Summary

This article covers the challenges and practical steps involved in installing and customizing Mask R-CNN for instance segmentation, emphasizing compatible dependencies, proper environment setup, and the use of pre-trained models.

Abstract

Mask R-CNN is a sophisticated deep learning model for instance segmentation, which involves detecting objects and segmenting them at the pixel level. The article discusses the difficulties of setting up Mask R-CNN for custom datasets, particularly getting the dependencies to work together. It offers a step-by-step guide for Windows 11 users with NVIDIA GPUs, recommending specific versions of CUDA, cuDNN, and Python, along with a curated list of dependencies for a smooth installation. The author also highlights the need for a well-annotated dataset, suggests a 9:1 split between training and validation data, and points to annotation tools. Data augmentation to expand the dataset and transfer learning from pre-trained COCO weights are recommended to improve model performance. The article concludes with advice on evaluating the model with metrics such as mean average precision (mAP) and fine-tuning hyperparameters for better results.

Opinions

The author found creating a virtual environment with compatible dependencies to be the most challenging aspect of installing Mask R-CNN.
Specific software versions, such as CUDA 10.0, cuDNN 7.4, and Python 3.7.11, are recommended for a successful setup.
The use of an annotation tool like VIA or MakeSense.ai is crucial for preparing the dataset for training and validation.
Data augmentation is a valuable technique to artificially expand the dataset and improve model robustness.
Starting with a pre-trained model on the COCO dataset can expedite the training process and improve performance.
The author emphasizes the importance of using appropriate metrics to evaluate the model's performance and suggests that fine-tuning may be necessary to achieve satisfactory results.
The article suggests that Mask R-CNN is a powerful tool for instance segmentation with a wide range of applications, despite the complexity involved in its implementation.

Mask R-CNN is a powerful deep learning model widely used for instance segmentation. It excels at detecting objects and precisely segmenting them at the pixel level. The model builds on the Faster R-CNN architecture, adding a branch that predicts a segmentation mask for each detected object in parallel with the existing classification and bounding-box branches.

There are many tutorials and repositories about Mask R-CNN on the Internet, including a widely used open-source implementation from Matterport built on Keras and TensorFlow. The repository contains the code and pre-trained weights for Mask R-CNN, and I used it to build my custom model.

When I was building my own Mask R-CNN model for a custom dataset, the hardest part was creating a virtual environment in which all the dependencies were compatible with one another. It took me many hours to figure out what was not working; all I knew was that the dependencies conflicted with each other. After trying several combinations, I finally found a set that worked together.

For reference, my setup was Windows 11 with NVIDIA GPU driver 528.02 and 20 GB of GPU memory. Before proceeding, make sure CUDA and cuDNN are installed on your computer. I used CUDA 10.0 and cuDNN 7.4.
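Once TensorFlow is installed in the environment, it is worth confirming that it can actually see the GPU. TensorFlow's tested build configurations pair 2.2.0 with CUDA 10.1 and cuDNN 7.6, and when the CUDA libraries it expects are missing it can fall back to the CPU with only a warning in the log. The sketch below assumes TensorFlow 2.1 or later (where `tf.config.list_physical_devices` exists); the helper name `visible_gpus` is my own.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.config.list_physical_devices is available from TensorFlow 2.1 onward.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU; check the CUDA/cuDNN versions.")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```

An empty list is the case to watch for: the import succeeded, but training would run on the CPU.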

I created a virtual environment with Python 3.7.11. I tried several Python versions, and this is the one that worked for me. Then I installed the required dependencies. Matterport's original requirements.txt does not pin exact versions, so pip can install releases that conflict with each other. With Python 3.7.11, the following versions worked together:

- tensorflow==2.2.0
- keras==2.3.1
- numpy==1.20.3
- scipy==1.4.1
- pillow==8.4.0
- cython==0.29.24
- scikit-image==0.16.2
- matplotlib
- opencv-python==4.5.4.60
- h5py==2.10.0
- imgaug==0.4.0
- IPython[all]

Even after these dependencies were installed, I got an error related to protobuf; installing protobuf 3.8 fixed it. After that, the environment worked with Mask R-CNN.
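To check that an environment really matches the versions listed above, a small script can compare what is installed against the pins. This is a sketch under my own assumptions: the names `PINS` and `check_pins` are hypothetical, matplotlib and IPython are treated as "any version", and protobuf is pinned as 3.8.0 because the article only says 3.8. Python 3.8+ ships `importlib.metadata`; on Python 3.7 the script needs the `importlib-metadata` backport from PyPI.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions from the list above; None means "must be installed, any version".
PINS = {
    "tensorflow": "2.2.0",
    "keras": "2.3.1",
    "numpy": "1.20.3",
    "scipy": "1.4.1",
    "pillow": "8.4.0",
    "cython": "0.29.24",
    "scikit-image": "0.16.2",
    "matplotlib": None,
    "opencv-python": "4.5.4.60",
    "h5py": "2.10.0",
    "imgaug": "0.4.0",
    "ipython": None,
    "protobuf": "3.8.0",  # assumption: the article says "protobuf 3.8"
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, found)} for every missing or mismatched package."""
    problems = {}
    for package, expected in pins.items():
        try:
            found = get_version(package)
        except PackageNotFoundError:
            found = None
        if found is None or (expected is not None and found != expected):
            problems[package] = (expected, found)
    return problems


if __name__ == "__main__":
    for package, (expected, found) in sorted(check_pins(PINS).items()):
        print(f"{package}: expected {expected or 'any'}, found {found or 'not installed'}")
```

An empty result means every listed package is present at the pinned version.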

The most important ingredient for training a Mask R-CNN model is a dataset. Split your images into training, validation, and testing sets; a common choice is a 9:1 ratio between training and validation. Annotate both the training and validation sets with an annotation tool such as VIA (VGG Image Annotator) or makesense.ai. I have a guided article on creating annotations with makesense.ai here. I used JSON annotation files in my code and configured the model for a single class (one object class plus background).

If you don't have enough images, you can augment them in Python by applying transformations such as shear, rotation, and scaling. I had 100 images and augmented them to 400 for my experiment. You can find my code here.
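The dataset preparation above can be sketched with the standard library: read polygon annotations from a VIA JSON export and make a reproducible 9:1 train/validation split. This assumes the "Export Annotations (as json)" layout, a top-level object whose entries each have a `filename` and a list of `regions` with `shape_attributes` holding `all_points_x` and `all_points_y` (older VIA 1.x files store `regions` as a dict, which the code also accepts); whether your makesense.ai export matches this layout is an assumption to verify. The function names are hypothetical. Splitting before augmenting keeps augmented copies of a validation image out of the training set.

```python
import json
import random


def load_via_polygons(json_path):
    """Map each image filename to a list of (xs, ys) polygons from a VIA JSON export."""
    with open(json_path) as f:
        project = json.load(f)
    polygons = {}
    for entry in project.values():
        regions = entry.get("regions") or []
        if isinstance(regions, dict):  # VIA 1.x stored regions as a dict
            regions = list(regions.values())
        shapes = [region["shape_attributes"] for region in regions]
        polygons[entry["filename"]] = [
            (shape["all_points_x"], shape["all_points_y"])
            for shape in shapes
            if shape.get("name") == "polygon"
        ]
    return polygons


def split_train_val(filenames, val_fraction=0.1, seed=42):
    """Shuffle deterministically and return (train, val); 0.1 gives the 9:1 ratio."""
    names = sorted(filenames)
    random.Random(seed).shuffle(names)
    n_val = max(1, round(len(names) * val_fraction)) if names else 0
    return names[n_val:], names[:n_val]
```

With 100 annotated images this yields 90 training and 10 validation images; augmentation would then be applied to the 90 training images only.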

To speed up training and improve performance, you can start from a Mask R-CNN model pre-trained on a large-scale dataset such as COCO (Common Objects in Context). Load the pre-trained weights into your custom model, excluding the layers whose shapes depend on the number of classes (the classification, bounding-box, and mask output layers). You can download the pre-trained COCO weights from here. Once you have trained your model, you can use those weights as the starting point for the next training run instead of starting from COCO again.
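A minimal sketch of this transfer-learning step with the Matterport package (`mrcnn`) follows. It mirrors the approach of the repository's balloon example: load `mask_rcnn_coco.h5` by layer name while excluding the class-specific layers, then train the heads first. The config values, the helper names `load_starting_weights` and `build_and_train`, and the `dataset_train`/`dataset_val` objects (assumed to be prepared `mrcnn.utils.Dataset` subclasses) are my assumptions, not the author's code.

```python
# Layers whose shapes depend on the number of classes; they are skipped when
# starting from the 81-class COCO weights and re-initialised for the new class.
COCO_CLASS_SPECIFIC_LAYERS = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]


def load_starting_weights(model, coco_weights_path, resume=False):
    """Load COCO weights (minus class-specific layers) or resume from the last checkpoint."""
    if resume:
        # find_last() returns the newest checkpoint under the model's log directory.
        model.load_weights(model.find_last(), by_name=True)
    else:
        model.load_weights(coco_weights_path, by_name=True, exclude=COCO_CLASS_SPECIFIC_LAYERS)


def build_and_train(dataset_train, dataset_val, coco_weights_path, log_dir, epochs=30, resume=False):
    """Sketch of a one-class training run; requires the Matterport mrcnn package."""
    from mrcnn.config import Config
    import mrcnn.model as modellib

    class CustomConfig(Config):
        NAME = "custom"              # hypothetical name for the custom dataset
        IMAGES_PER_GPU = 1
        NUM_CLASSES = 1 + 1          # background + one object class
        STEPS_PER_EPOCH = 100        # assumed value; tune for your dataset
        DETECTION_MIN_CONFIDENCE = 0.9

    config = CustomConfig()
    model = modellib.MaskRCNN(mode="training", config=config, model_dir=log_dir)
    load_starting_weights(model, coco_weights_path, resume=resume)
    # Train only the newly initialised heads; layers="all" would fine-tune every layer.
    model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
                epochs=epochs, layers="heads")
    return model
```

Passing `resume=True` on a later run picks up the most recent checkpoint, which matches the advice to reuse previously trained weights.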

You can evaluate the performance of your trained model using appropriate metrics such as mean average precision (mAP). Use a separate validation set or perform cross-validation to assess the model’s accuracy and fine-tune hyperparameters if necessary.
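To make the mAP metric concrete, here is a pure-Python sketch of average precision for a single class at one mask-IoU threshold (0.5), followed by the mean over validation images. Matterport's repository provides `utils.compute_ap` for NumPy arrays; this stand-alone version instead represents each mask as a set of `(row, col)` pixels so it runs without the repo, and the function names are my own. Predictions are matched greedily in descending score order, and precision is interpolated over all recall points. COCO-style mAP additionally averages over IoU thresholds from 0.5 to 0.95, which this sketch does not do.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(gt_masks, pred_masks, pred_scores, iou_threshold=0.5):
    """AP for one image and one class, with greedy matching by descending score."""
    if not gt_masks:
        return 0.0
    order = sorted(range(len(pred_masks)), key=lambda i: pred_scores[i], reverse=True)
    matched, hits = set(), []
    for i in order:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_masks):
            if j in matched:
                continue  # each ground-truth mask can be matched only once
            iou = mask_iou(pred_masks[i], gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            hits.append(1)
        else:
            hits.append(0)  # false positive
    precisions, recalls, true_pos = [], [], 0
    for k, hit in enumerate(hits, start=1):
        true_pos += hit
        precisions.append(true_pos / k)
        recalls.append(true_pos / len(gt_masks))
    # Make precision non-increasing, then sum precision over each recall step.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_image):
    """Mean AP over images; per_image is a list of (gt_masks, pred_masks, pred_scores)."""
    aps = [average_precision(*item) for item in per_image]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, with two ground-truth objects and three predictions (a correct one, a false positive, then another correct one, in score order), the AP is 5/6, because the false positive lowers precision before the second object is found.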

Once you are satisfied with the model’s performance, you can use it for inference on new, unseen images, such as those in your testing set. For each detected object, the model returns a bounding box, a class ID, a confidence score, and a segmentation mask.

If your model doesn’t achieve satisfactory results, you can fine-tune it by adjusting the hyperparameters or collecting more training data to improve its performance.

My model had just one class. It could generate masks, report a confidence score for each prediction, display all detected objects in the same color, save the results to a new folder, and process multiple images in a single run.
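The batch part of that workflow could look like the following sketch. It assumes a Matterport model created in inference mode with `GPU_COUNT = 1` and `IMAGES_PER_GPU = 1`, so that `model.detect([image], verbose=0)` returns one dict per image with `"rois"`, `"class_ids"`, `"scores"`, and `"masks"`. The function name `detect_folder`, the per-image JSON output, and the `min_score` threshold are my own choices, and `load_image` could be, for example, `skimage.io.imread`. Masks are not written here; Matterport's `visualize.display_instances` can draw them, and giving every instance the same color through its `colors` argument would produce the single-color look described above.

```python
import json
import os


def detect_folder(model, input_dir, output_dir, load_image, min_score=0.5):
    """Run model.detect on every image in input_dir and save one JSON summary per image."""
    os.makedirs(output_dir, exist_ok=True)
    summaries = {}
    for name in sorted(os.listdir(input_dir)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue  # skip non-image files
        image = load_image(os.path.join(input_dir, name))
        result = model.detect([image], verbose=0)[0]
        detections = [
            {"box": [int(v) for v in result["rois"][i]], "score": float(result["scores"][i])}
            for i in range(len(result["scores"]))
            if result["scores"][i] >= min_score
        ]
        summaries[name] = detections
        out_path = os.path.join(output_dir, os.path.splitext(name)[0] + ".json")
        with open(out_path, "w") as f:
            json.dump(detections, f, indent=2)
    return summaries
```

Each box follows Matterport's `(y1, x1, y2, x2)` order, and the threshold drops low-confidence detections before anything is saved.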

It’s worth noting that implementing Mask R-CNN from scratch can be a complex task, especially if you are new to deep learning. Utilizing existing implementations and libraries can significantly simplify the process and save time.

Overall, Mask R-CNN generates a binary mask for each detected object, which lets it understand and segment objects at the pixel level. Because it handles multiple instances of the same class and can be trained to detect a wide variety of object categories, it is a remarkably versatile tool for instance segmentation. Its applications include autonomous driving, medical imaging, and interactive image editing, and its pre-trained models and open-source implementations make it one of the most powerful and accessible models in computer vision.