How I Helped The WHO Deal With One Of Africa's Deadliest Medical Crises Using Python And Deep Learning

The complete guide on how to combine Python and DL to detect whether a person suffers from malaria with 98% accuracy.

Photo by Adrianna Van Groningen on Unsplash

“I was truly amazed by how easy it was to read the article despite the fact I know nothing about artificial intelligence, computer science, or medicine!”

According to the World Health Organization (WHO), an estimated 228 million cases of malaria occurred worldwide in 2018 alone, and roughly 405 000 people lost their lives. Although these numbers are alarming by themselves, what is even more alarming is that the African region accounted for 93% of all cases (213 million) and 94% of all deaths, and that children under five made up 67% of the deaths worldwide (272 000).

In fact, malaria is believed to have been eliminated in most parts of the world. Most (if not all) of the places where malaria is still a cause for concern can be seen in figure 1.0.

Figure 1.0: An approximation of the parts of the world where malaria transmission occurs.

In other words, if you live in Europe or North America, there is virtually no chance you have ever been exposed to malaria (at home). At best, you may have heard about it in the news.

Despite the deadliness and severity of malaria, the spotlight has shifted, as it does with most diseases. COVID-19 has received the entirety of the world’s attention, while diseases such as malaria have been forgotten, even though they will not cease to exist simply because a novel deadly virus has emerged.

Simply want to see what this is about and the final results?

(6 minutes read-time)

Introduction
What is malaria?
The problem
Project Manifesto (Probably why you are here)
Results/Conclusion

— — — — — — —

Genuinely interested in the code behind this endeavor and want to inspect it in detail?

(12 minutes read-time)

Introduction
What is malaria?
The problem
Project Manifesto (My Solution)
Key Terms
Creating the solution
- Methods presented
- Preparing the dataset and work environment
- Libraries
- Dataset
- Coding
Results/Conclusion

What is malaria?

As mentioned before, a westerner’s exposure to the term malaria is limited, if not non-existent. It is, thus, important to develop a basic understanding of what malaria is and why it is so deadly.

Malaria, also called “jungle fever”, is a mosquito-borne infectious disease that can affect both humans and other animals. The disease is caused by microorganisms that are part of the Plasmodium group. Malaria is most commonly spread by an infected female Anopheles mosquito. The way the infection takes place is the following:

The mosquito bites a potential victim.
The parasites are introduced into the person’s blood through the mosquito’s saliva.
The parasites travel to the liver where they mature and then reproduce.

For more information concerning malaria, this video provides some additional, interesting information:

What is the problem?

Although there are many treatment centers specializing in combating malaria, people in rural areas far from big cities have no effective access to mass testing and treatment.

According to the CDC, malaria diagnosis takes place in one of the following forms: Clinical diagnosis, Microscopic diagnosis, Antigen Detection, Molecular Diagnosis, and Serology.

What is the real problem though? Why did I get involved with detecting malaria now?

As stated before, the novel coronavirus has shifted attention away from this deadly disease. As if that were not enough, African nations have started reallocating resources in order to fight COVID-19. Unfortunately, the resources allocated to this new and deadly fight were previously used in the race against malaria, and even then they were not enough. Medical personnel were, and continue to be, insufficiently specialized and lack the knowledge required to detect and treat malaria. This is why a computer model is suitable: it can perform rapid, mass detection of infected patients without the need for specialized equipment or medical personnel.

Malaria detection is already limited. Limiting it even more will have lethal results. This reasoning led the WHO, on April 23, 2020, to urge countries to move quickly and save the lives of malaria victims in sub-Saharan Africa.

WHO urges countries to move quickly to save lives from malaria in sub-Saharan Africa

Severe disruptions to insecticide-treated net campaigns and in access to antimalarial medicines could lead to a…

www.who.int

Sadly, it is believed that the fate of malaria victims will only become worse, as COVID-19 continues to evolve and mutate.

Project Manifesto (My Solution)

Watching these events unfold, I realized that things would not get better. If governments are not willing or able to protect their citizens and provide them with effective and cheap ways to find out whether they have malaria, someone else has to.

Remember! Scientists agree that early detection is the most important factor in surviving malaria.

With this in mind, and in response to the WHO’s call for help, in this article I will showcase my progress in building and comparing different models that detect whether a person suffers from malaria.

The model requires only a simple image of a person’s blood, on which it bases its prediction. This form of testing is preferred because it is easy to implement. There are numerous portable amateur microscopes that are cheap and can be found in markets all around Africa. In addition, there are phone-camera extensions that can be used to photograph the same blood samples.

“Truly astonishing! Just by thinking that the author is only 16 years old makes me shiver and contemplate my life choices.”

Key Terms

Before proceeding, it is crucial to become acquainted with certain key terms that will be used throughout this article.

Convolutional Neural Network (CNN)

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.[1][2] They have applications in image and video recognition, recommender systems,[3] image classification, medical image analysis, natural language processing,[4] and financial time series.

A convolutional neural network architecture

CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. The “fully-connectedness” of these networks makes them prone to overfitting data.

(For more information on CNNs, this article is an excellent resource)

Transfer Learning

The definition used for transfer learning has been taken from this article.

Transfer learning is an approach in deep learning (and machine learning) where knowledge is transferred from one model to another.

Def: Model A is successfully trained to solve source task T.a using a large dataset D.a. However, the dataset D.b for a target task T.b is too small, preventing Model B from training efficiently. Thus, we use part of model A to predict results for task T.b.

A common misconception is that training and testing data must come from the same source or share the same distribution.

Using transfer learning, we are able to solve a particular task using full or part of an already pre-trained model on a different task.

A magnificent explanation of transfer learning can be found in the YouTube video by Andrew Ng below:

VGG-19 Model

VGG is a convolutional neural network architecture proposed in 2014 by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group (VGG) at the University of Oxford. It was submitted to the ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC2014), and the model achieves 92.7% top-5 test accuracy on ImageNet. ImageNet is one of the largest datasets available, with 14 million images hand-annotated with what they depict. VGG-19 is the 19-layer variant of this architecture.

Image Augmentation

Image augmentation artificially creates new training images by applying one or more processing steps, such as random rotations, shifts, shears, and flips, to the existing ones.
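To make the idea concrete, here is a minimal NumPy-only sketch of a few such transformations. The function name `augment`, the 5-pixel shift limit, and the wrap-around shift are assumptions made for illustration; this is not the augmentation pipeline used later in the article.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped, rotated, and shifted copy of an H x W x C image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)  # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)  # vertical flip
    # Rotate by 0, 90, 180, or 270 degrees in the image plane.
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    # Shift horizontally by up to 5 pixels; pixels pushed out wrap around.
    shift = int(rng.integers(-5, 6))
    out = np.roll(out, shift, axis=1)
    return out.copy()
```

Calling `augment(image, np.random.default_rng(0))` several times on the same square image yields several different training variants of it.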

Overfitting

In statistics, overfitting is “the production of an analysis that corresponds too closely or exactly to a particular set of data, and may, therefore, fail to fit additional data or predict future observations reliably”.[1] An overfitted model is a statistical model that contains more parameters than can be justified by the data.[2] The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e. the noise) as if that variation represented the underlying model structure.

Epochs

An epoch is a term used in machine learning for one complete pass through the entire training dataset. The number of epochs is the number of such passes the machine learning algorithm completes when training.

Batch Size

Batch size is a term used in machine learning and refers to the number of training examples utilized in one iteration.
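The two terms are linked by simple arithmetic, sketched below. The function names and the numbers in the usage note are hypothetical and are not the settings used in this article.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of iterations (weight updates) needed for one pass over the data."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_samples / batch_size)


def total_updates(num_samples, batch_size, epochs):
    """Total number of iterations over the whole training run."""
    return steps_per_epoch(num_samples, batch_size) * epochs
```

For example, 1,000 training images with a batch size of 64 need 16 iterations per epoch, so 25 epochs amount to 400 iterations.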

Creating the Solution

Methods Presented

In this article, I will be testing three different deep learning techniques, in order to detect malaria.

Method 1: Convolutional Neural Network (CNN)

Method 2: Transfer learning with frozen pre-trained CNN

Method 3: Fine-tuned pre-trained CNN with image augmentation

Preparing the dataset and work environment

First, a supported version of Python needs to be installed. To do so, navigate to this link and follow the instructions for the operating system of your choice.

I will be using Python 3.6.9 with Ubuntu 18.04.4 LTS as my operating system. Nevertheless, all supported Python versions are welcome.

Before proceeding with installing the required libraries, pip must also be installed. (pip ships with Python 2.7.9 and later and with Python 3.4 and later, but if pip is not already installed, follow this guide.)

Libraries

The following libraries should be installed with pip:

pip3 install numpy
pip3 install pandas
pip3 install matplotlib
pip3 install scikit-learn
pip3 install opencv-python
pip3 install tensorflow
pip3 install scipy

Dataset

Having the right dataset is undoubtedly one of the most important aspects of any data science project. In this case, the dataset we need is an archive of human blood samples, classified as either infected or healthy.

How you acquire such a dataset depends entirely on your preferences. Here, I will be using a Kaggle dataset containing a total of 27,558 classified images, found here.

Coding

Now that both the libraries and dataset are set up, it is time to begin the actual coding of the model (I will be using a Jupyter notebook).

I will begin by importing all necessary libraries and dependencies:

Now that everything has been imported, I will load the dataset.

It appears that both folders (“Parasitized” and “Uninfected”), have 13,779 images each.
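A minimal sketch of how the two folders could be listed and counted is shown below. The helper name `list_cell_images`, the `cell_images` base directory in the usage note, and the `.png` extension are assumptions about how the dataset was unpacked.

```python
from pathlib import Path


def list_cell_images(base_dir):
    """Return sorted file paths for the infected and the healthy folder."""
    base = Path(base_dir)
    # Assumption: images are stored as PNG files; other files are ignored.
    infected = sorted(str(p) for p in (base / "Parasitized").glob("*.png"))
    healthy = sorted(str(p) for p in (base / "Uninfected").glob("*.png"))
    return infected, healthy
```

With the archive unpacked into a folder named `cell_images`, `infected, healthy = list_cell_images("cell_images")` followed by `len(infected), len(healthy)` gives the per-folder counts.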

I will be working with pandas data frames, so a data frame named “data” will be created with two columns, “filename” and “label”.
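Such a data frame could be built as in the sketch below. The label strings "malaria" and "healthy", the helper name `build_dataframe`, and the shuffling step are assumptions made for illustration.

```python
import pandas as pd


def build_dataframe(infected_files, healthy_files, seed=42):
    """Combine both file lists into one shuffled data frame."""
    data = pd.DataFrame({
        "filename": list(infected_files) + list(healthy_files),
        # Assumed label names; any two distinct strings would work.
        "label": ["malaria"] * len(infected_files) + ["healthy"] * len(healthy_files),
    })
    # Shuffle the rows so the two classes are mixed, then renumber the index.
    return data.sample(frac=1, random_state=seed).reset_index(drop=True)
```

Shuffling up front means any later slice of the frame contains a mix of both classes.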

All deep learning models require a training set and a testing set. In this case, we are also going to add a validation set. Hence, 70% of the data will be used for training, 20% for testing, and 10% for validation.
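One way to obtain a 70/10/20 split is to call scikit-learn's `train_test_split` twice, as sketched below. The helper name, the fixed seed, and the stratification by label are assumptions.

```python
from sklearn.model_selection import train_test_split


def split_70_10_20(files, labels, seed=42):
    """Split into 70% training, 10% validation, and 20% testing data."""
    # First hold out 20% of everything for testing.
    rest_x, test_x, rest_y, test_y = train_test_split(
        files, labels, test_size=0.20, stratify=labels, random_state=seed)
    # 12.5% of the remaining 80% equals 10% of the full dataset.
    train_x, val_x, train_y, val_y = train_test_split(
        rest_x, rest_y, test_size=0.125, stratify=rest_y, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```

Stratifying keeps the ratio of infected to healthy cells the same in all three subsets.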

The nature of the images supplied to the model raises a crucial problem. Users will submit pictures they have taken themselves, and these pictures will vary in size, orientation, etc. This can be resolved with a handy library such as “cv2” (OpenCV).

A desirable image size would be 125x125 pixels. I will be using parallel processing to speed up loading and resizing each image.

Every single image should now have been resized to 125x125. Plotting the data is always a good idea, so let’s visualize a sample of the dataset.

Certain patterns can easily be observed that distinguish healthy from infected blood cells. The models to be constructed should be able to properly identify the core differences between an infected and a healthy cell, and classify them.

Before doing so, some basic settings should be set up (“BATCH_SIZE” and “EPOCHS” can be changed in order to reach higher accuracy).

By completing this step, we have successfully adjusted the image’s dimensions, epochs, batch size, and have encoded the categorical class labels.
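These settings and the label encoding could look like the sketch below. The values of `BATCH_SIZE` and `EPOCHS`, the scaling of pixel values to the range 0 to 1, and the helper names are assumptions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

BATCH_SIZE = 64               # hypothetical value, tune as needed
EPOCHS = 25                   # hypothetical value, tune as needed
INPUT_SHAPE = (125, 125, 3)   # height, width, color channels


def scale_images(images):
    """Scale 0-255 pixel values to the range 0-1."""
    return np.asarray(images, dtype=np.float32) / 255.0


def encode_labels(train_labels, val_labels):
    """Map the string class labels to the integers 0 and 1."""
    encoder = LabelEncoder()
    encoder.fit(train_labels)  # classes are sorted alphabetically
    return encoder.transform(train_labels), encoder.transform(val_labels), encoder
```

With the labels "healthy" and "malaria", the encoder assigns 0 to "healthy" and 1 to "malaria".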

Model 1: Convolutional Neural Network (CNN)

To begin creating the first model, I will be defining the model’s architecture.

The model consists of three convolution and pooling layers, two dense layers, and dropout layers used for regularization.
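An architecture matching this description can be sketched in Keras as follows. The filter counts, kernel sizes, dense-layer widths, dropout rate, and optimizer are assumptions, and the final single-unit sigmoid layer is counted separately from the two dense layers.

```python
import tensorflow as tf


def build_basic_cnn(input_shape=(125, 125, 3)):
    """Three convolution/pooling stages followed by two dense layers with dropout."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.3),  # regularization
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.3),  # regularization
        layers.Dense(1, activation="sigmoid"),  # probability of infection
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_basic_cnn().summary()` prints the layers and parameter counts of this sketch.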

It is now time to train the model:
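Training comes down to one call to `fit`, sketched here for any compiled Keras model. The helper name `train_model` and its default values are assumptions.

```python
import tensorflow as tf


def train_model(model, train_x, train_y, val_x, val_y,
                batch_size=64, epochs=25, verbose=1):
    """Train a compiled Keras model and return its training history."""
    history = model.fit(
        x=train_x,
        y=train_y,
        validation_data=(val_x, val_y),  # evaluated at the end of every epoch
        batch_size=batch_size,
        epochs=epochs,
        verbose=verbose,
    )
    return history
```

The returned object's `history` attribute is a dictionary holding the loss and accuracy recorded after every epoch.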

In order to have a clear perspective of the model’s progress and accuracy, I will be plotting its accuracy and loss curves.
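A sketch of such a plot with matplotlib is given below. It assumes the history dictionary uses the keys `accuracy`, `val_accuracy`, `loss`, and `val_loss`, which is what recent TensorFlow 2 versions produce when the model is compiled with `metrics=["accuracy"]`.

```python
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot training and validation accuracy and loss side by side."""
    epochs = range(1, len(history["loss"]) + 1)
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))

    ax_acc.plot(epochs, history["accuracy"], label="Train accuracy")
    ax_acc.plot(epochs, history["val_accuracy"], label="Validation accuracy")
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()

    ax_loss.plot(epochs, history["loss"], label="Train loss")
    ax_loss.plot(epochs, history["val_loss"], label="Validation loss")
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")
    ax_loss.legend()
    return fig
```

A training curve that keeps improving while the validation curve flattens or worsens is the usual sign of overfitting.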

It appears that although the accuracy on the training data is pretty high, there is some overfitting as well. Nevertheless, I will be saving the model in order to use it later.

Model 2: Transfer learning with frozen pre-trained CNN

I will be using TensorFlow to import VGG-19 and freeze its convolution blocks so that they act as a feature extractor. Dense layers will be added at the end to perform the classification.

At the moment, there are 28 layers in total, out of which 6 are trainable. I will use the same settings as for the first model and train it to view the results.

It appears that the second model does not overfit as much as the first. At the same time, its accuracy is slightly lower.
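A sketch of a frozen VGG-19 feature extractor with a small dense classifier on top follows. The head layout (two 512-unit dense layers with dropout), the optimizer, and the learning rate are assumptions. With this layout, the only trainable parts are the kernels and biases of the three dense layers, six weight tensors in total, which may be what the count of 6 above refers to.

```python
import tensorflow as tf


def build_frozen_vgg19(input_shape=(125, 125, 3), weights="imagenet"):
    """VGG-19 convolution blocks as a frozen feature extractor plus a dense head."""
    vgg = tf.keras.applications.VGG19(
        include_top=False, weights=weights, input_shape=input_shape)
    vgg.trainable = False  # freeze every convolution block

    layers = tf.keras.layers
    x = layers.Flatten()(vgg.output)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=vgg.input, outputs=out)
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Passing `weights="imagenet"` downloads the pre-trained weights on first use, while `weights=None` builds the same architecture with random weights.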

Model 3: Fine-tuned pre-trained CNN with image augmentation

In this model, I will be fine-tuning the last two blocks of the VGG-19 model. Some image augmentation will also take place in order to create altered versions of the original images and reach better results (the validation dataset will not be augmented, as it is used to evaluate the model’s performance after each epoch).

Let's view some of the augmented images:

The differences from the original pictures are obvious. To continue, I will build the model while making sure that the last two blocks are trainable.

I will now make some final alterations (reducing the learning rate, etc.) and train the final model.
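The fine-tuning setup can be sketched as follows: only the VGG-19 layers whose names start with `block4` or `block5` are left trainable, and the model is compiled with a smaller learning rate. The head layout, the optimizer, and the learning rate of 1e-5 are assumptions.

```python
import tensorflow as tf


def build_finetuned_vgg19(input_shape=(125, 125, 3), weights="imagenet",
                          learning_rate=1e-5):
    """VGG-19 with its last two convolution blocks unfrozen, plus a dense head."""
    vgg = tf.keras.applications.VGG19(
        include_top=False, weights=weights, input_shape=input_shape)
    vgg.trainable = True
    for layer in vgg.layers:
        # Keep blocks 1-3 frozen; allow blocks 4 and 5 to be updated.
        layer.trainable = layer.name.startswith(("block4", "block5"))

    layers = tf.keras.layers
    x = layers.Flatten()(vgg.output)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=vgg.input, outputs=out)
    # A small learning rate avoids destroying the pre-trained features.
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Compared with the frozen variant, this leaves the eight convolution layers of blocks 4 and 5 free to adapt to blood-cell images.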

The training of all three models has completed successfully. To evaluate their accuracy and F1 scores, I will be using a snippet of open-source third-party code from GitHub. The author of the code is “DIP”, and it can be found here.
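The same two metrics can also be computed directly with scikit-learn, as in the sketch below. This is not the third-party snippet mentioned above; the helper name, the 0.5 decision threshold, and the weighted F1 average are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def evaluate_predictions(y_true, y_prob, threshold=0.5):
    """Turn predicted probabilities into class labels and score them."""
    # Probabilities above the threshold count as class 1.
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="weighted"),
    }
```

Here `y_prob` would be the output of `model.predict` on the scaled test images and `y_true` the encoded test labels.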

Results / Conclusion

After creating and testing all three models, I have come to certain conclusions.

The first model (a basic CNN model), presented an accuracy and F1 score of 95.95%, BUT significant overfitting was observed.
The second model (a VGG-19, frozen pre-trained model) presented an accuracy and F1 score of 94.87%.
Finally, the third model (a VGG-19 fine-tuned model) presented an accuracy and F1 score of 98.0%.

In simple terms, the best of the three managed to classify a blood sample as infected or not with 98% accuracy. This is more than desirable, as it means that it can significantly outperform the classification accuracy of doctors. It is also worth mentioning that the blood-sample images were of different sizes, quality, etc. By using data augmentation, all images were improved significantly, and their classification by the model became possible.

To conclude, it becomes apparent that the lives that can be saved by the mass adoption of such models number in the thousands, if not millions. It is, thus, crucial that researchers keep outperforming previous models and increasing their accuracy.

Do you want to learn more?

If you want to advance your knowledge and are interested in making money with machine learning I highly encourage you to read the articles listed below:

Did I Just Succeed In Detecting Breast Cancer From A Single Image With Python And Machine Learning?

The complete guide on how to combine Python and ML to detect whether a person suffers from breast cancer with 98.24%…

medium.com

Millennials! This Is The Unconventional Money-Making Technique You Were Looking For

The complete blueprint on how to make thousands with 0$ starting capital using python and ML.

medium.com

Predicting Oil Prices With Machine Learning And Python

The complete guide on predicting the price of “black gold” with less than 0.3% error using Python and Machine Learning

medium.com
