Local Outlier Factor | Simple Example By Hand

The Local Outlier Factor (LOF) is a commonly used anomaly detection tool. It takes a local approach, scoring each point relative to its neighbors, whereas a global strategy may not detect outliers well in datasets that fluctuate in density.

Before we get started, I am going to assume you know a bit about DBSCAN and K Nearest Neighbor algorithms. However, I am confident you can make it through without needing an incredibly deep understanding of the algorithms.

This is an implementation by hand on a minimal, straightforward dataset. I feel that seeing it worked by hand gives the best initial understanding.

Here is my implementation.

Local Outlier Factor Calculation in 6 Steps

  • Distance Calculation
  • Kth-Nearest Neighbor Distance Calculation
  • K-Nearest Neighbor Calculation
  • Local Reachability Density (LRD) Calculation
  • Local Outlier Factor Calculation
  • Analysis

The Problem

Calculate the Local Outlier Factor (LOF) for each point and find the top 2 outliers. Use a “K” value of 2 and Manhattan Distance as the distance function.

Point (X,Y)
a     (0,0)
b     (0,1)
c     (1,1)
d     (3,0)
(Working on a markdown generator for these basic charts.)
2-|
  |
1-|*b *c
  |
0-|*a_________*d
  |   |   |   |
  0   1   2   3

Distance Calculation

There are multiple distance functions out there, but as I specified in the problem above, we are going to use Manhattan distance to allow easy calculations by hand.

manhattanDistance(a,b) = 1
manhattanDistance(a,c) = 2
manhattanDistance(a,d) = 3
manhattanDistance(b,c) = 1
manhattanDistance(b,d) = 4
manhattanDistance(c,d) = 3
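
The pairwise distances above can also be checked with a few lines of Python. This is only a minimal sketch: the names `points`, `manhattan_distance`, and `pair_distances` are my own choices for illustration, not part of the original write-up.

```python
from itertools import combinations

# The four points from the problem statement.
points = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (3, 0)}


def manhattan_distance(p, q):
    """Sum of the absolute coordinate differences between p and q."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# One entry per unordered pair of points, keyed by the two point names.
pair_distances = {
    (m, n): manhattan_distance(points[m], points[n])
    for m, n in combinations(points, 2)
}

for (m, n), dist in pair_distances.items():
    print(f"manhattanDistance({m},{n}) = {dist}")
```

Manhattan distance is symmetric, so only the six unordered pairs are needed.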

Kth-Nearest Neighbor Distance Calculation

As prescribed in the problem, we are going to use a K value of 2. This means we need to find the Kth (here, 2nd) nearest neighbor of each point, that is, the 2nd closest point to each.

manhattanDistance(a,b) = 1
manhattanDistance(a,c) = 2
manhattanDistance(a,d) = 3
the second nearest neighbor of a is c.
Therefore,
kthNearestNeighbor(a) = c
kthNearestNeighbor(b) = c (a and c are same distance, choose either)
kthNearestNeighbor(c) = a
kthNearestNeighbor(d) = c (c and a are same distance, choose either)

Now calculate the distance of each point to its Kth, in our case 2nd, nearest neighbor.

distanceToKthNearestNeighbor(a) = manhattanDistance(a,c) = 2
distanceToKthNearestNeighbor(b) = manhattanDistance(b,c) = 1
distanceToKthNearestNeighbor(c) = manhattanDistance(c,a) = 2
distanceToKthNearestNeighbor(d) = manhattanDistance(d,c) = 3
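
As a rough sketch of this step in Python: sort the distances from a point to every other point and take the Kth entry. The function name `distance_to_kth_nearest_neighbor` mirrors the notation above, while `points`, `K`, and `k_distance` are names I am assuming for illustration.

```python
points = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (3, 0)}
K = 2  # the "K" value from the problem statement


def manhattan_distance(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def distance_to_kth_nearest_neighbor(name, k=K):
    """Distance from the point `name` to its k-th closest other point."""
    distances = sorted(
        manhattan_distance(points[name], points[other])
        for other in points
        if other != name
    )
    return distances[k - 1]  # k is 1-based, list indices are 0-based


k_distance = {name: distance_to_kth_nearest_neighbor(name) for name in points}
print(k_distance)  # {'a': 2, 'b': 1, 'c': 2, 'd': 3}
```

Because only the distance is returned, ties such as the one between a and c for point b do not affect the result.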

K-Nearest Neighbor Calculation

Find the K, in our case 2, nearest neighbors of each point. You have already found the 2nd above, so we just need the first in the set.

kNearestSet(a) = { b, c }
kNearestSet(b) = { a, c }
kNearestSet(c) = { b, a }
kNearestSet(d) = { a, c }
How many items are in each set?
kNearestSetCount(a) = 2
kNearestSetCount(b) = 2
kNearestSetCount(c) = 2
kNearestSetCount(d) = 2
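
Here is one way to sketch the neighbor sets in Python. Breaking ties by point name is my own assumption, made only so the output is deterministic; the walkthrough says either tied point may be chosen. `k_nearest_set` mirrors the notation above, and the other names are illustrative.

```python
points = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (3, 0)}
K = 2


def manhattan_distance(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def k_nearest_set(name, k=K):
    """The k closest other points to `name`, nearest first."""
    ranked = sorted(
        (other for other in points if other != name),
        # Assumption: equal distances are ordered alphabetically by name.
        key=lambda other: (manhattan_distance(points[name], points[other]), other),
    )
    return ranked[:k]


k_nearest = {name: k_nearest_set(name) for name in points}
k_nearest_count = {name: len(neighbors) for name, neighbors in k_nearest.items()}

for name in points:
    print(f"kNearestSet({name}) = {k_nearest[name]}, count = {k_nearest_count[name]}")
```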

You can do some of these steps in whatever order you like; however, it will begin to make sense why I spell out the process this way.

Local Reachability Density (LRD) Calculation

LRD measures how dense the neighborhood around a point is. It describes how easily the point is reached from its neighbors (NOT THE OPPOSITE, read that 2x): the smaller the reach distances, the larger the LRD. The LRD is the count of the items in the K nearest neighbor set, calculated above as kNearestSetCount(point), over the sum of the reachDistance from the point to each of the values in its set, which is kNearestSet above.

                     kNearestSetCount(a)
LRD(a) = ---------------------------------------------
         reachDistance(b <- a) + reachDistance(c <- a)

What is reachDistance? For a point and one of its neighbors, it is the max of two things we already calculated: the distance from the neighbor to its own Kth (2nd in our case) nearest neighbor, and the Manhattan distance between the point and the neighbor.

Here is an example of b <- a.

reachDistance(b <- a) = 
       max{distanceToKthNearestNeighbor(b), manhattanDistance(a,b)}
reachDistance(b <- a) = max{1,1}
reachDistance(b <- a) = 1

Here is c <- a.

reachDistance(c <- a) = 
       max{distanceToKthNearestNeighbor(c), manhattanDistance(a,c)}
reachDistance(c <- a) = max{2,2}
reachDistance(c <- a) = 2
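
The two examples above can be reproduced with a small Python sketch. The argument order `reach_distance(neighbor, point)` is my own convention, chosen to read like `neighbor <- point`; all names are illustrative.

```python
points = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (3, 0)}
K = 2


def manhattan_distance(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def k_distance(name):
    """Distance from `name` to its K-th nearest neighbor."""
    distances = sorted(
        manhattan_distance(points[name], points[other])
        for other in points
        if other != name
    )
    return distances[K - 1]


def reach_distance(neighbor, point):
    """reachDistance(neighbor <- point): uses the NEIGHBOR's K-th distance."""
    return max(
        k_distance(neighbor),
        manhattan_distance(points[point], points[neighbor]),
    )


print(reach_distance("b", "a"))  # 1
print(reach_distance("c", "a"))  # 2
print(reach_distance("a", "b"))  # 2, so the direction matters
```

Note that `reach_distance("b", "a")` and `reach_distance("a", "b")` differ, which is why the direction of the arrow is worth reading twice.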

Now that we know reachDistance, we can complete the LRD calculation.

                      kNearestSetCount(a)
LRD(a) = ---------------------------------------------
         reachDistance(b <- a) + reachDistance(c <- a)
LRD(a) =  2/(1 + 2) 
LRD(a) = .667

Calculate the LRD for the remaining points.

                kNearestSetCount(b)
LRD(b) = ---------------------------------------------  = .5
         reachDistance(a <- b) + reachDistance(c <- b)
                kNearestSetCount(c)
LRD(c) = ---------------------------------------------  = .667
         reachDistance(a <- c) + reachDistance(b <- c)
                kNearestSetCount(d)
LRD(d) = ---------------------------------------------  = .33
         reachDistance(a <- d) + reachDistance(c <- d)
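
A minimal Python sketch of the LRD step, assuming ties between equally distant neighbors are broken alphabetically (the walkthrough allows either choice). The helper names are mine, not from the original.

```python
points = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (3, 0)}
K = 2


def manhattan_distance(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def ranked_neighbors(name):
    """Other points ordered by distance from `name` (ties broken by name)."""
    return sorted(
        (other for other in points if other != name),
        key=lambda other: (manhattan_distance(points[name], points[other]), other),
    )


def k_nearest_set(name):
    return ranked_neighbors(name)[:K]


def k_distance(name):
    kth = ranked_neighbors(name)[K - 1]
    return manhattan_distance(points[name], points[kth])


def reach_distance(neighbor, point):
    """reachDistance(neighbor <- point)."""
    return max(k_distance(neighbor), manhattan_distance(points[point], points[neighbor]))


def lrd(point):
    """kNearestSetCount(point) over the summed reach distances to its neighbors."""
    neighbors = k_nearest_set(point)
    total_reach = sum(reach_distance(n, point) for n in neighbors)
    return len(neighbors) / total_reach


for name in points:
    print(f"LRD({name}) = {lrd(name):.3f}")
```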

Remember reachDistance is different from LRD, and you need one to calculate the other!

Local Outlier Factor Calculation

The final LOF value of each point can now be calculated. The LOF of a point p is the sum of the LRD of all the points in the set kNearestSet(p), multiplied by the sum of the reachDistance from p to each point of the same set, all divided by the number of items in the set, kNearestSetCount(p), squared.
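
For readers who know LOF as "the average ratio of the neighbors' LRD to the point's own LRD", the short derivation below shows that the product form used here is the same quantity. The symbols N_k(p) for kNearestSet(p) and |N_k(p)| for kNearestSetCount(p) are my shorthand, not notation from this walkthrough.

```latex
\mathrm{LRD}(p) = \frac{|N_k(p)|}{\sum_{o \in N_k(p)} \mathrm{reachDistance}(o \leftarrow p)}

\mathrm{LOF}(p)
  = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{LRD}(o)}{\mathrm{LRD}(p)}
  = \frac{\Bigl[\sum_{o \in N_k(p)} \mathrm{LRD}(o)\Bigr]
          \Bigl[\sum_{o \in N_k(p)} \mathrm{reachDistance}(o \leftarrow p)\Bigr]}
         {|N_k(p)|^{2}}
```

The second equality follows by substituting 1 / LRD(p) with the summed reach distances divided by |N_k(p)|.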

Reminder calculated above: 
kNearestSet(a) = { b, c }
kNearestSetCount(a) = 2
         [LRD(b) + LRD(c)] * [reachDist(b <- a) + reachDist(c <- a)]
LOF(a) = ----------------------------------------------------------
               kNearestSetCount(a)*kNearestSetCount(a)
LOF(a) = [.5 + .667] * [1 + 2] / (2 * 2)
LOF(a) = 3.501 / 4
LOF(a) = .87

Now that we have seen one fully worked out, I will quickly show the LOF of the next 3 points.

         [LRD(a) + LRD(c)] * [reachDist(a <- b) + reachDist(c <- b)]
LOF(b) = ----------------------------------------------------------
               kNearestSetCount(b)*kNearestSetCount(b) 
LOF(b) = 1.33
         [LRD(b) + LRD(a)] * [reachDist(a <- c) + reachDist(b <- c)]
LOF(c) = ----------------------------------------------------------
               kNearestSetCount(c)*kNearestSetCount(c)
LOF(c) = .87
         [LRD(a) + LRD(c)] * [reachDist(a <- d) + reachDist(c <- d)]
LOF(d) = ----------------------------------------------------------
               kNearestSetCount(d)*kNearestSetCount(d)
LOF(d) = 2
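
Putting every step together, the whole hand calculation can be sketched in Python as follows. This is an illustrative sketch under the same assumptions as the walkthrough (K = 2, Manhattan distance), plus one of my own: ties are broken alphabetically. None of the function names come from a library.

```python
points = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (3, 0)}
K = 2


def manhattan_distance(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def ranked_neighbors(name):
    """Other points ordered by distance from `name` (ties broken by name)."""
    return sorted(
        (other for other in points if other != name),
        key=lambda other: (manhattan_distance(points[name], points[other]), other),
    )


def k_nearest_set(name):
    return ranked_neighbors(name)[:K]


def k_distance(name):
    kth = ranked_neighbors(name)[K - 1]
    return manhattan_distance(points[name], points[kth])


def reach_distance(neighbor, point):
    """reachDistance(neighbor <- point)."""
    return max(k_distance(neighbor), manhattan_distance(points[point], points[neighbor]))


def lrd(point):
    neighbors = k_nearest_set(point)
    return len(neighbors) / sum(reach_distance(n, point) for n in neighbors)


def lof(point):
    """[sum of neighbor LRDs] * [sum of reach distances] / count squared."""
    neighbors = k_nearest_set(point)
    lrd_sum = sum(lrd(n) for n in neighbors)
    reach_sum = sum(reach_distance(n, point) for n in neighbors)
    return lrd_sum * reach_sum / len(neighbors) ** 2


for name in points:
    print(f"LOF({name}) = {lof(name):.3f}")
```

Without intermediate rounding, a and c come out at 0.875, which agrees with the .87 obtained by hand from the rounded LRD values.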

Analysis

So what does this mean?

LOF(a) = .87
LOF(b) = 1.33
LOF(c) = .87
LOF(d) = 2

Well, it depends on the data.

While a LOF value of 1 or less is a good indicator of an inlier, we are here to calculate and probably remove outliers or anomalies.

Do you have a tight, clean, and uniform dataset? Then a LOF value of 1.05 could be an outlier.

Do you have a sparse dataset, varying in density, with many local fluctuations specific to that local cluster? Then a LOF value of 2 could still be an inlier.

So, it depends. There are many different variations of and additions to this base algorithm. However, we set out to find and prune the top 2 outliers from our set, which turned out to be b and d.
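
The two ways of reading the scores, "take the top N" versus "flag everything above a cutoff", can be sketched like this. The LOF values are the ones worked out by hand above; the cutoff of 1.5 is a purely hypothetical number for illustration, and the function names are mine.

```python
# LOF values as worked out by hand in this walkthrough.
lof = {"a": 0.87, "b": 1.33, "c": 0.87, "d": 2.0}


def top_outliers(scores, n):
    """Names of the n points with the largest LOF, largest first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def above_threshold(scores, threshold):
    """Names of all points whose LOF exceeds a dataset-specific cutoff."""
    return sorted(name for name, score in scores.items() if score > threshold)


print(top_outliers(lof, 2))       # ['d', 'b']
print(above_threshold(lof, 1.5))  # ['d'] (1.5 is a hypothetical cutoff)
```

Taking the top N always returns N points, while a threshold may return none or many, which is why the right choice depends on the data.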

Any questions or mistakes, please comment.

More to come with anomaly detection with python and data science!
