Handling imbalanced dataset in image classification

I have been working on a test task: detecting volcanoes in radar images. The images are 100x100 pixels with a single channel. The training dataset is highly imbalanced (the number of images without volcanoes is 5x larger than the number with volcanoes).

There are plenty of ways to tackle this problem, such as class weights, oversampling the training dataset, focal loss, etc.

In this article I will present manual oversampling of the training dataset to tackle the class imbalance problem.
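
For comparison with oversampling, class weights can be derived directly from the class counts. The sketch below uses the common "balanced" heuristic, n_samples / (n_classes * class_count). The counts are the approximate figures quoted in this article, and the function name is hypothetical. The resulting dictionary has the shape that the `class_weight` argument of Keras `fit` expects.

```python
def balanced_class_weights(class_counts):
    """Return one weight per class: n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}

# Approximate counts from this article (assumption): 0 = no volcano, 1 = volcano.
approx_counts = {0: 6000, 1: 1000}
weights = balanced_class_weights(approx_counts)
print(weights)
```

With these counts the rare class gets a weight several times larger than the common one, so each volcano sample contributes more to the loss.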

Let’s first look at the distribution of classes in the data. Here is the number of samples per class before oversampling:

We can see 1K samples with volcanoes and about 6K samples without volcanoes.

Oversampling means increasing the number of samples in the minority classes so that the class sizes become equal, or close to equal, which makes the dataset more balanced.
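
To make "equal or close to it" concrete, here is a small calculation of how many augmented copies per minority sample would bring the two classes closest to balance. It is only a sketch using the approximate counts quoted above, and the helper name is hypothetical.

```python
def copies_needed(minority, majority):
    """Extra copies per minority sample that bring the minority class
    closest in size to the majority class."""
    # Each original sample plus k copies yields minority * (1 + k) samples.
    return max(0, round(majority / minority) - 1)

minority, majority = 1000, 6000   # approximate counts from the article
k = copies_needed(minority, majority)

# Compare the ideal number of copies with four copies per sample.
for copies in (k, 4):
    print(copies, "copies ->", minority * (1 + copies), "vs", majority)
```

Four copies per volcano image give roughly 5K volcano samples against about 6K without, which is close to balanced without being exact.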

Let’s apply manual oversampling when preparing our training samples.

I applied oversampling in the prepareImages function:

def prepareImages(train, shape, data_path, mode):

    for index, row in train.iterrows():

        has_volcano = row['Volcano?']
        ...
        if has_volcano and mode == 'train':
            # Add a horizontally flipped copy of the volcano image.
            x_train[count] = img_to_array(cv2.flip(img, 1))
            y_train[count] = int(has_volcano)
            count += 1

            # Repeat the same step three more times, applying a different
            # transformation each time and incrementing count.

Here we read the ‘Volcano?’ attribute of the sample. If the image contains a volcano, we apply a transformation to the original image and add the modified image to the dataset together with the corresponding label. In my case I applied three flips (with flip codes 0, 1 and -1) and one rotation (cv2.ROTATE_90_CLOCKWISE).

Let’s display the class distribution after oversampling:
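
The four transformations can also be written as a loop. The following is a minimal sketch, not the original code: it uses NumPy equivalents of the OpenCV calls so that it runs without OpenCV installed, and the `augment_volcano` name and the toy 2x2 image are assumptions for illustration.

```python
import numpy as np

# NumPy equivalents of the OpenCV calls named in the text.
TRANSFORMS = [
    np.flipud,                         # like cv2.flip(img, 0): flip around the x-axis
    np.fliplr,                         # like cv2.flip(img, 1): flip around the y-axis
    lambda img: np.flip(img, (0, 1)),  # like cv2.flip(img, -1): flip around both axes
    lambda img: np.rot90(img, k=-1),   # like cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
]

def augment_volcano(img):
    """Return the four transformed copies of a single image array."""
    return [transform(img) for transform in TRANSFORMS]

toy = np.array([[1, 2],
                [3, 4]])
copies = augment_volcano(toy)
for copy in copies:
    print(copy.tolist())
```

Because the images here are square (100x100), the 90-degree rotation keeps the shape unchanged, so the rotated copy fits into the same training array as the flipped ones.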

Experimental evaluation of oversampling

Results without oversampling

Without oversampling, training ends with the following metrics:

loss: 0.2359 - acc: 0.9202 - val_loss: 0.4253 - val_acc: 0.8626

AUC = 0.500

and the loss / accuracy plot:

When we run prediction on the test dataset, we get the following results:

number of images with volcanoes: 0

number of images without volcanoes: 2734

We can see that all the test samples were classified as having no volcanoes.
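
A validation accuracy near 0.86 together with an AUC of 0.500 is consistent with a model that always predicts the majority class. As a rough check, the sketch below scores such a constant predictor on stand-in labels with the approximate 1K / 6K split quoted earlier (an assumption, since the validation labels are not shown here). The helper names are hypothetical, and the AUC is computed by plain pairwise comparison rather than a library call.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def pairwise_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Stand-in labels with the article's approximate ratio, scaled down 10x.
y_true = [1] * 100 + [0] * 600
constant_scores = [0.0] * len(y_true)   # the model gives every image the same score
y_pred = [0] * len(y_true)              # so every image is labelled "no volcano"

print(round(accuracy(y_true, y_pred), 3))      # 0.857
print(pairwise_auc(y_true, constant_scores))   # 0.5
```

Accuracy alone looks acceptable here only because of the imbalance, which is why the AUC and the prediction counts are the more informative signals.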

Results with oversampling

With oversampling applied, training ends with the following metrics:

loss: 0.6885 - acc: 0.5264 - val_loss: 0.6856 - val_acc: 0.5718

AUC = 0.504

and our learning curves:

When we run prediction on the test dataset, we get the following results:

number of images with volcanoes: 27

number of images without volcanoes: 2707

That’s it.