Monday 27 April 2015

Benford's Law

Benford's law, also called the First-Digit Law, refers to the frequency distribution of digits in many (but not all) real-life sources of data. In this distribution, 1 occurs as the leading digit about 30% of the time, while larger digits occur in that position less frequently: 9 as the first digit less than 5% of the time. Benford's law also concerns the expected distribution for digits beyond the first, which approach a uniform distribution.

It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude.
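
As a rough illustration of the point about orders of magnitude, the sketch below tallies the leading digits of the first 1000 powers of two, which span about 300 orders of magnitude. The choice of powers of two is an assumption made for this example and is not one of the data sets listed above; the expected share uses the formula log10(1 + 1/d).

```python
import math
from collections import Counter

# Leading digits of 2**0 .. 2**999, read exactly from Python's big integers.
first_digits = Counter(int(str(2 ** n)[0]) for n in range(1000))

for d in range(1, 10):
    observed = first_digits[d] / 1000
    expected = math.log10(1 + 1 / d)  # Benford's expected share for digit d
    print(f"{d}  observed {observed:.3f}  expected {expected:.3f}")
```

The observed shares should sit close to the expected ones, because the sequence covers many orders of magnitude rather than clustering in a narrow range.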

It is named after physicist Frank Benford, who stated it in 1938, although it had been previously stated by Simon Newcomb in 1881.

A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability

P(d)=\log_{10}(d+1)-\log_{10}(d)=\log_{10} \left(\frac{d+1}{d}\right)=\log_{10} \left(1+\frac{1}{d}\right).

Numerically, the leading digits have the following distribution in Benford's law, where d is the leading digit and P(d) the probability:
d    P(d)
1    30.1%
2    17.6%
3    12.5%
4    9.7%
5    7.9%
6    6.7%
7    5.8%
8    5.1%
9    4.6%
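
The percentages can be reproduced directly from the formula P(d) = log10(1 + 1/d). The following is a minimal sketch; the names benford_probability and benford_table are chosen for this example only.

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the leading digit equals d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Map each possible leading digit to its expected probability.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford_table.items():
    print(f"{d}    {p:.1%}")
```

The nine probabilities sum to 1, since the product of the terms (d + 1)/d for d = 1 to 9 telescopes to 10 and log10(10) = 1.
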
Applications

Accounting fraud detection
In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's Law ought to show up any anomalous results.[15] Following this idea, Mark Nigrini showed that Benford's Law could be used in forensic accounting and auditing as an indicator of accounting and expenses fraud. In practice, applications of Benford's Law for fraud detection routinely use more than the first digit.
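
The "simple comparison" of observed and expected first-digit frequencies could be sketched as follows. The use of Pearson's chi-square statistic is an assumption here (the text above does not name a specific test), as are the function names and the synthetic sample. With nine digits there are 8 degrees of freedom, for which the conventional 5% critical value is about 15.507.

```python
import math
from collections import Counter

def leading_digit(x) -> int:
    """Return the first non-zero digit of a non-zero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation with 17 significant digits puts the
    # leading digit in the first character of the string.
    return int(f"{x:.16e}"[0])

def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of first digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Synthetic stand-in for made-up figures: every leading digit equally common.
uniform_looking = [d * 100 + 37 for d in range(1, 10)] * 50
print(benford_chi_square(uniform_looking))
```

A statistic well above the critical value only flags the data for closer inspection; it is an indicator, not proof of fraud.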

Legal status
In the United States, evidence based on Benford's law has been admitted in criminal cases at the federal, state, and local levels.

Election data
Benford's Law has been invoked as evidence of fraud in the 2009 Iranian elections, and also used to analyze other election results. However, other experts consider Benford's Law essentially useless as a statistical indicator of election fraud in general.

Macroeconomic data
Similarly, the macroeconomic data the Greek government reported to the European Union before entering the eurozone was shown to be probably fraudulent using Benford's law, albeit years after the country joined.

Genome data
The number of open reading frames and their relationship to genome size differs between eukaryotes and prokaryotes, with the former showing a log-linear relationship and the latter a linear relationship. Benford's law has been used to test this observation, with an excellent fit to the data in both cases.

Scientific fraud detection
A test of regression coefficients in published papers showed agreement with Benford's law. As a comparison group, subjects were asked to fabricate statistical estimates. The fabricated results failed to obey Benford's law.

Distributions that can be expected to obey Benford's Law
When the mean is greater than the median and the skew is positive
Numbers that result from mathematical combination of numbers: e.g., quantity × price
Transaction level data: e.g., disbursements, sales

Distributions that would not be expected to obey Benford's Law
Where numbers are assigned sequentially: e.g., check numbers, invoice numbers
Where numbers are influenced by human thought: e.g., prices set by psychological thresholds ($1.99)
Accounts with a large number of firm-specific numbers: e.g., accounts set up to record $100 refunds
Accounts with a built-in minimum or maximum
Where no transaction is recorded
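
To see why sequentially assigned numbers do not fit, consider a hypothetical block of check numbers running from 100 to 999 (the range is an assumption picked for this sketch). Every leading digit then appears equally often, far from the roughly 30% that Benford's law expects for the digit 1.

```python
from collections import Counter

# Hypothetical sequential check numbers, 100 through 999.
check_numbers = range(100, 1000)

counts = Counter(int(str(n)[0]) for n in check_numbers)
shares = {d: counts[d] / len(check_numbers) for d in range(1, 10)}

for d, share in shares.items():
    print(f"{d}    {share:.1%}")
```

Each digit accounts for exactly one ninth of the numbers, so a first-digit test on such a field would flag it even though nothing is wrong with the data.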

YouTube link for a quick overview:
https://www.youtube.com/watch?v=vIsDjbhbADY
