Benford’s Law Can Be a Game Changer!
Today in this blog I want to share some information about Benford’s law: what it is, and how it could be a game changer.
All the code written for this blog is on my GitHub: https://github.com/HarshMishra2002/benfords_law
Link to the dataset used:
What is Benford’s Law?
First, let’s see how Wikipedia defines it:

The law itself is simple. Take a set of numerical data and look at the leading digit (the leftmost nonzero digit) of each value: 1 is the most common leading digit, appearing almost 30% of the time, while 9 is the least common, appearing only about 5% of the time.
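Those percentages come from Benford’s formula: the probability that the leading digit is d is P(d) = log10(1 + 1/d). Here is a minimal Python sketch that tabulates it for d = 1 to 9 (the function name is my own, chosen for illustration):

```python
import math


def benford_first_digit_probs():
    """Expected share of each leading digit d = 1..9 under Benford's law."""
    # P(d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


for digit, p in benford_first_digit_probs().items():
    print(f"{digit}: {p:.1%}")
```

It prints 30.1% for digit 1, falling steadily to 4.6% for digit 9. The nine values add up to exactly 1 because the logarithms telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10).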

The law is named after Frank Benford, who presented it in his 1938 paper “The Law of Anomalous Numbers” (the astronomer Simon Newcomb had already described the same pattern in 1881).
The paper is available here: https://mdporter.github.io/SYS6018/other/(Benford)%20The%20Law%20of%20Anomalous%20Numbers.pdf
The law became even more interesting to me when I read this paragraph from the paper:
“The study of the items shows a distinct tendency for those of a random nature to agree better with the logarithmic law than those of a formal or mathematical nature. The best agreement was found in the arabic numbers (not spelled out) of consecutive front page news items of a newspaper. Dates were barred as not being variable, and the omission of spelled-out numbers restricted the counted digits to numbers 10 and over. The first 342 street addresses given in the current American Men of Science (Item R, Table IV) gave excellent agreement, and a complete count (except for dates and page numbers) of an issue of the Readers’ Digest was also in agreement. On the other hand, the greatest variations from the logarithmic relation were found in the first digits of mathematical tables from engineering handbooks, and in tabulations of such closely knit data as Molecular Weights, Specific Heats, Physical Constants and Atomic Weights.”
Put simply, the law is most applicable and reliable when the data is naturally occurring, and much less so when the numbers come from a mathematical formula or involve any kind of intentional human interference.

What makes this law a game changer is its application. When I work on a data science project, I usually just go to Kaggle, download the CSV file, and start working on it.
I NEVER VALIDATE DATA.
There are two reasons for that. First, I trust the source of the data so much that checking whether it is appropriate, or whether some percentage of it is fraudulent, never even comes up. Second, there are not many tools available for this kind of check, and that is where Benford’s law comes in.
What I love most about this law is that it is very easy to implement.
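To show how little code this takes, here is one possible sketch in Python: pull out the leading digit of each number, then count how often each digit appears. It is not necessarily what the repository linked above does; the helper names are hypothetical, and reading the digit from the number’s string form is just one simple approach.

```python
from collections import Counter


def leading_digit(x):
    """First nonzero digit of an int or float, or None if there is none (0, inf, nan)."""
    # str() of a float is its shortest round-trip form, e.g. "0.00523" or "1e-05",
    # so the first character in 1-9 is the leading significant digit.
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_distribution(values):
    """Observed share of each leading digit 1-9, ignoring values without one."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


sample = [123, 0.045, 19.99, 2500, 1_000_000, 7.2, 180, 0]
# Leading digits are 1, 4, 1, 2, 1, 7, 1; the 0 is skipped.
print(first_digit_distribution(sample))
```

Comparing this observed distribution against the expected Benford shares (log10(1 + 1/d) for each digit d) is the whole check.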
When I learned about it, I thought I should try applying the law to a real dataset and see what the result would be.
So I took a dataset of cars and applied the law to the price column, which holds the price of each car.


The result was almost an exact match for Benford’s law.
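A comparison like the car-price one can be sketched with Python’s standard `csv` module, printing the observed share of each leading digit next to the Benford expectation. This is an illustrative sketch under assumptions, not the code behind the result above: the file name `cars.csv` and the column name `price` are hypothetical, a tiny inline sample stands in for the real file, and values that do not parse as plain numbers (such as "12,500" or "n/a") are simply skipped.

```python
import csv
import io
import math
from collections import Counter


def leading_digit(x):
    """First nonzero digit of a number, or None if there is none."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_table(rows, column):
    """(digit, observed share, Benford share) for the leading digits of one column."""
    digits = []
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # skip missing or non-numeric entries such as "n/a"
        d = leading_digit(value)
        if d is not None:
            digits.append(d)
    counts = Counter(digits)
    n = len(digits)
    return [(d, counts[d] / n if n else 0.0, math.log10(1 + 1 / d)) for d in range(1, 10)]


# Tiny inline sample so the sketch runs as-is. For a real file, use something like
# open("cars.csv", newline="") instead (file and column names are hypothetical).
sample_csv = io.StringIO("model,price\nA,18500\nB,1250\nC,32000\nD,145000\nE,9900\n")
for digit, observed, expected in benford_table(csv.DictReader(sample_csv), "price"):
    print(f"{digit}  observed {observed:6.1%}  expected {expected:6.1%}")
```

Five rows are far too few to say anything; with a full dataset of a few thousand prices the two columns become meaningful to compare.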
I tried the same thing on another dataset, this time the runs scored by batsmen in Test cricket.


And once again I got the same result.
In the 2009 Iranian presidential election, Benford’s law was presented as evidence of fraud. According to Mebane’s analysis, the second digits of the vote counts for President Mahmoud Ahmadinejad, the winner, differed significantly from what Benford’s law predicts, and ballot boxes with very few invalid ballots had a greater influence on the results, suggesting widespread ballot stuffing.
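Mebane’s test looked at second digits rather than first digits. Under Benford’s law the second digit (which can be 0) is much closer to uniform, with probability P(D2 = d) = sum over k = 1..9 of log10(1 + 1/(10k + d)), where k runs over the possible first digits. A minimal sketch of that expected distribution plus a hypothetical helper for reading the second digit of a vote count; it only reproduces the textbook formula, not Mebane’s full method.

```python
import math


def benford_second_digit_probs():
    """Expected share of each second digit d = 0..9 under Benford's law."""
    # Sum over every possible first digit k: P(D2 = d) = sum_k log10(1 + 1/(10k + d))
    return {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)}


def second_digit(count):
    """Second digit of a whole-number count (e.g. a vote tally), or None if it has one digit."""
    s = str(abs(int(count)))
    return int(s[1]) if len(s) >= 2 else None


for digit, p in benford_second_digit_probs().items():
    print(f"{digit}: {p:.1%}")
```

The expected shares only drift from about 12.0% for 0 down to about 8.5% for 9, so a second-digit test needs a fairly large number of counts before a deviation stands out.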
Benford’s law has also been used inappropriately to claim election fraud. In the 2020 United States presidential election, the distribution of first digits in Joe Biden’s vote counts for Chicago, Milwaukee, and other cities did not match Benford’s law. The mistake was applying the law to data that is tightly bound in range, which violates the law’s assumption that the data spans a large range (several orders of magnitude).
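A small simulation makes the range problem concrete. It is my own illustration, not an analysis of any real election data: one set of numbers is spread log-uniformly across six orders of magnitude, the other is a hypothetical set of "precinct totals" that all fall between 300 and 900. The seed, ranges, and sample sizes are arbitrary assumptions.

```python
import random
from collections import Counter


def leading_digit_share(values, digit):
    """Fraction of positive values whose first significant digit equals `digit`."""
    firsts = [next(ch for ch in str(v) if ch in "123456789") for v in values]
    return Counter(firsts)[str(digit)] / len(firsts)


rng = random.Random(42)  # fixed seed so the run is repeatable
# Log-uniform from 1 to 1,000,000 (six full decades): follows Benford's law.
wide = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
# Hypothetical "precinct totals" that all fall between 300 and 900: tightly bound.
narrow = [rng.randint(300, 900) for _ in range(100_000)]

print(f"wide range:   {leading_digit_share(wide, 1):.1%} start with 1 (Benford: 30.1%)")
# No value between 300 and 900 can start with 1, so this prints 0.0%.
print(f"narrow range: {leading_digit_share(narrow, 1):.1%} start with 1")
```

The narrow-range data "fails" the test purely because of its range, which is exactly why a first-digit mismatch in similarly sized precincts says nothing about fraud.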
So I feel that whenever you need to validate numerical data or detect fraud in numbers, this law should be the first tool you use, as long as the data meets its assumptions: naturally occurring and spread over a large range.
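To turn "the digits look off" into a number, one common option is Pearson’s chi-square goodness-of-fit test against the Benford proportions. Below is a minimal sketch, assuming you already have a list of leading digits; the function name and the two example digit lists are hypothetical. A large statistic only says the data deviates from Benford’s law. It is a reason to look closer, not proof of fraud, and with very large samples even harmless deviations become statistically significant.

```python
import math
from collections import Counter

# 95th percentile of the chi-square distribution with 8 degrees of freedom
# (9 digit categories minus 1), taken from standard statistical tables.
CRITICAL_5PCT_DF8 = 15.507


def benford_chi_square(digits):
    """Pearson chi-square statistic of observed leading digits (1-9) against Benford's law."""
    counts = Counter(d for d in digits if 1 <= d <= 9)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one leading digit between 1 and 9")
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


# Hypothetical digit lists: one proportional to Benford's law, one perfectly uniform.
benford_counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
benford_like = [d for d, c in zip(range(1, 10), benford_counts) for _ in range(c)]
uniform = list(range(1, 10)) * 100
for name, digits in [("benford_like", benford_like), ("uniform", uniform)]:
    stat = benford_chi_square(digits)
    verdict = "deviates, worth a closer look" if stat > CRITICAL_5PCT_DF8 else "consistent with Benford"
    print(f"{name}: chi-square = {stat:.2f} -> {verdict}")
```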
I hope you learned something new and enjoyed this blog. If you liked it, share it with your friends. Take care, and keep learning.
You can also reach me on LinkedIn: https://www.linkedin.com/in/harsh-mishra-4b79031b3/





