avatarEivind Kjosbakken

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

2439

Abstract

rr2 = ... arr1, arr2 = arr1[:d], arr2[:d] <span class="hljs-comment">#only choose d first elements</span>

score = rbo.RankingSimilarity(arr1, arr2).rbo(p = p) </pre></div><p id="bf7c">This will now return a score that measures the similarity between 2 ranked lists.</p><h1 id="c07d">Tuning parameters:</h1><p id="f54c">The parameters p and d can be tuned to whatever value fits your need best, and there is no correct value for the parameters that fit in all scenarios. What could be interesting to see, however, is the weight from different parameters. For a given d elements, with a given p, you can get the weights using the following script:</p><div id="3251"><pre><span class="hljs-function"><span class="hljs-keyword">import</span> numpy as np

def <span class="hljs-title">getWeight</span><span class="hljs-params">(p, d)</span>: series =</span> <span class="hljs-number">0</span> <span class="hljs-function"><span class="hljs-keyword">for</span> i in <span class="hljs-title">range</span><span class="hljs-params">(<span class="hljs-number">1</span>, d)</span>: series +=</span> (pi)/i <span class="hljs-keyword">return</span> <span class="hljs-number">1</span> - p(d<span class="hljs-number">-1</span>) + (((<span class="hljs-number">1</span>-p)/p) * d *(np.<span class="hljs-built_in">log</span>(<span class="hljs-number">1</span>/(<span class="hljs-number">1</span>-p)) - series))</pre></div><p id="5381">The weight, in this case, refers to how much the d first elements are weighted for the similarity score. For example, for p = 0.5, the 3 first elements (d = 3) are weighted around 0.954 (so the 3 first elements make up 95.4 percent of the final similarity score that is calculated).</p><p id="785a">Note the script above is made from equation 21 in the <a href="https://dl.acm.org/doi/10.1145/1852102.1852106">RBO paper</a>.</p><p id="e81d">You can also plot the weights you get for different parameters, under I am showing for d = 3 and d = 100, and p ranging from: [0.025, 0.050, …, 0.975]:</p><figure id="d53a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*eb2_7-FpKZjNXbNGD0r0NQ.jpeg"><figcaption>The weighting for d = 3, and different p values, you can see the 3 first elements are weighted lower the higher the p-value is, starting with weight approximately = 1 for p = 0.1, ending approximately at weight = 0.22</figcaption></figure><figure id="3054"><img src="https://cdn

Options

-images-1.readmedium.com/v2/resize:fit:800/1*S3mWr4DSsXIam6E5fV7hUA.jpeg"><figcaption>For d = 100, the curve looks vastly different since the 100 first elements will consist of most of the weight for any p (since it is a ranked list, and higher ranks are in general more important)</figcaption></figure><p id="1deb">The weighting for d = 3, and different p values, you can see the 3 first elements are weighted lower the higher the p-value is, starting with weight approximately = 1 for p = 0.1, ending approximately at weight = 0.22</p><p id="8e49">For d = 100, the curve looks vastly different since the 100 first elements will consist of most of the weight for any p (since it is a ranked list, and higher ranks are in general more important)</p><p id="672e">Nonetheless, you can try different values, but if you are working with a search engine, I would consider making d ≤ 10, since few people look at search engine results that are ranked above 10 (think of it as going on the second page of Google, which is probably are a rare occurrence for most). Then you can test for different p values, choosing the one where the weighting fits best for you.</p><p id="1426">Using this metric might be useful if you are creating search engines, which you can start doing by implementing TF-IDF, a common retrieval technique. To learn how to, check out <a href="https://readmedium.com/tf-idf-in-python-index-your-data-and-do-inference-264bc78bfd47">this article</a> on TF-IDF.</p><p id="bde4">You can check out the academic paper for the RBO measure <a href="https://dl.acm.org/doi/10.1145/1852102.1852106">here</a> (note that you might need to be on an academic network to view it).</p><p id="7a84">If you want to check some other related articles I have written, please check out:</p><ul><li><a href="https://readmedium.com/how-to-fine-tune-easyocr-to-achieve-better-ocr-performance-1540f5076428">✅ Fine-tuning EasyOCR</a></li><li><a href="https://readmedium.com/empower-your-donut-model-for-receipts-with-self-annotated-data-51fc882b7229">✅ Empower your Donut (document understanding transformer) model</a></li><li><a href="https://readmedium.com/downloading-and-running-llama2-for-windows-bf99ef45c855">✅ Running Llama2 locally on Windows</a></li><li><a href="https://readmedium.com/analyzing-graph-networks-part-2-utilizing-advanced-methods-604ade49f9b8">Analyzing graph networks: Utilizing advanced methods</a></li></ul></article></body>

Find similarity of ranked lists with RBO

Story overview

  • Introduction
  • Implementing RBO
  • Tuning RBO hyperparameters
What to do when you have to compare the ranking 2 lists, but don’t have a ground truth order for what is the optimal list?

Introduction

Comparing ranked lists could be useful in several scenarios, most notably for search engines, which return a list of ranked elements. If you for example want to compare two search engines to each other, using a ranked list similarity score could be useful, to see how similar the search engines perform. The best part about RBO is that it does not require a ground truth for what would be the optimal list order, which could be helpful for many smaller-scale developers. Larger corporations will have the resources to hire people to rank the lists so that you can have a ground truth for what is the optimal list, but this is not available for everyone.

RBO (rank biased overlap) is one way to compare ranked lists. It takes into ranked lists, for example from two different search engines, and then returns a similarity score between these two lists. RBO uses 2 tunable parameters:

  • p -> a value between 0 and 1, the lower the p, the more weighted the first elements of the list are (so the 3 first elements matter more if p is lower, compared to if p is higher for example)
  • d -> number of elements to compare, so you only compare the d first elements of the ranked lists

Implementing RBO in Python

Implementing RBO in Python is easy to do using the RBO Python package. Just install it using:

pip install rbo

in the terminal, and it is ready to go.

You now need two ranked lists to compare, which I am assuming you already have, call them arr1 and arr2. You then have to choose the d first elements of both lists (also assuming the length of arr1 and arr2 is > d). Also, remember to import the RBO package. Here I have chosen a p of 0.5, and the 5 first elements (d = 5):

import rbo
p = 0.5
d = 5
arr1 = ...
arr2 = ...
arr1, arr2 = arr1[:d], arr2[:d] #only choose d first elements

score = rbo.RankingSimilarity(arr1, arr2).rbo(p = p)

This will now return a score that measures the similarity between 2 ranked lists.

Tuning parameters:

The parameters p and d can be tuned to whatever value fits your need best, and there is no correct value for the parameters that fit in all scenarios. What could be interesting to see, however, is the weight from different parameters. For a given d elements, with a given p, you can get the weights using the following script:

import numpy as np

def getWeight(p, d):
    series = 0
    for i in range(1, d):
        series += (p**i)/i
    return 1 - p**(d-1) + (((1-p)/p) * d *(np.log(1/(1-p)) - series))

The weight, in this case, refers to how much the d first elements are weighted for the similarity score. For example, for p = 0.5, the 3 first elements (d = 3) are weighted around 0.954 (so the 3 first elements make up 95.4 percent of the final similarity score that is calculated).

Note the script above is made from equation 21 in the RBO paper.

You can also plot the weights you get for different parameters, under I am showing for d = 3 and d = 100, and p ranging from: [0.025, 0.050, …, 0.975]:

The weighting for d = 3, and different p values, you can see the 3 first elements are weighted lower the higher the p-value is, starting with weight approximately = 1 for p = 0.1, ending approximately at weight = 0.22
For d = 100, the curve looks vastly different since the 100 first elements will consist of most of the weight for any p (since it is a ranked list, and higher ranks are in general more important)

The weighting for d = 3, and different p values, you can see the 3 first elements are weighted lower the higher the p-value is, starting with weight approximately = 1 for p = 0.1, ending approximately at weight = 0.22

For d = 100, the curve looks vastly different since the 100 first elements will consist of most of the weight for any p (since it is a ranked list, and higher ranks are in general more important)

Nonetheless, you can try different values, but if you are working with a search engine, I would consider making d ≤ 10, since few people look at search engine results that are ranked above 10 (think of it as going on the second page of Google, which is probably are a rare occurrence for most). Then you can test for different p values, choosing the one where the weighting fits best for you.

Using this metric might be useful if you are creating search engines, which you can start doing by implementing TF-IDF, a common retrieval technique. To learn how to, check out this article on TF-IDF.

You can check out the academic paper for the RBO measure here (note that you might need to be on an academic network to view it).

If you want to check some other related articles I have written, please check out:

Search Engine Ranking
Rbo
Python
Analyzing Data
Recommended from ReadMedium