Thursday, 14 May 2015

Chebyshev’s Inequality

Chebyshev’s theorem refers to several theorems, all proven by Russian mathematician Pafnuty Chebyshev. They include: Chebyshev’s inequality, Bertrand’s postulate, Chebyshev’s sum inequality and Chebyshev’s equioscillation theorem. Chebyshev’s inequality is the theorem most often used in stats. 
It states that no more than 1/k² of a distribution’s values can be more than k standard deviations away from the mean. With a normal distribution, you already know how much of the data falls within k standard deviations of the mean. If you have a distribution that isn’t normal, Chebyshev’s inequality helps you find out what percentage of the data is clustered around the mean.
Chebyshev’s Inequality relates to how the numbers in a set are spread out. In layman’s terms, the formula helps you figure out how many values fall inside and outside the standard deviation. The standard deviation tells you how far values are from the average of the set. In a normal distribution, roughly two-thirds of the values fall within one standard deviation on either side of the mean. In statistics, the result is often referred to as Chebyshev’s Theorem (as opposed to Chebyshev’s Inequality). 
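For example, take k = 2: the inequality says that at most 1/2² = 1/4 of the values can sit more than two standard deviations from the mean, so at least 75% of any distribution with a finite variance must lie within two standard deviations. A normal distribution does much better (roughly 95% inside that range), but Chebyshev’s bound is the guarantee you get with no assumptions about the shape of the distribution.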
Chebyshev’s Inequality lets you bound (with very little information about the distribution) the probability of values lying far from the mean. Given that X is a random variable, A stands for the mean of the set, K is the number of standard deviations, and Y is the value of the standard deviation, the formula reads as follows:

 Pr(|X − A| ≥ KY) ≤ 1/K²

In words: the probability that the absolute value of the difference between X and A is greater than or equal to K times Y is less than or equal to one divided by K squared.
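As a rough check of the bound, here is a short sketch in Python with NumPy (my choice of tools, not the post’s), comparing the observed fraction of far-out values against 1/K² for a skewed, non-normal sample:

    import numpy as np

    rng = np.random.default_rng(0)

    # Draw a decidedly non-normal sample (the exponential distribution is heavily skewed).
    x = rng.exponential(scale=1.0, size=100_000)

    a = x.mean()   # A: the mean of the set
    y = x.std()    # Y: the standard deviation

    for k in (1.5, 2, 3):
        # Empirical fraction of values at least k standard deviations from the mean
        observed = np.mean(np.abs(x - a) >= k * y)
        bound = 1 / k**2   # Chebyshev's guarantee: Pr(|X - A| >= K*Y) <= 1/K^2
        print(f"k={k}: observed {observed:.4f} <= bound {bound:.4f}")

The observed fraction always comes in under the bound; for well-behaved distributions it is usually far under it, since Chebyshev’s guarantee is deliberately conservative.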


Applications of Chebyshev's Inequality

The formula was used, together with calculus, to develop the weak version of the law of large numbers. This law states that as a sample grows, its average should get closer to the theoretical mean. A simple example: when rolling a six-sided die, the expected average is 3.5. A sample of 5 rolls may give drastically different results. Roll the die 20 times and the average should begin approaching 3.5. As you add more and more rolls, the average should keep nearing 3.5 until it reaches it, or becomes so close that the two are practically equal.
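A small simulation sketch of this, again in Python with NumPy (my choice of tools here, not something from the original post), shows the running average of die rolls drifting toward 3.5 as the sample grows:

    import numpy as np

    rng = np.random.default_rng(42)

    # Simulate fair six-sided die rolls and watch the sample average approach 3.5.
    for n in (5, 20, 1_000, 100_000):
        rolls = rng.integers(1, 7, size=n)   # integers from 1 to 6 inclusive
        print(f"{n:>7} rolls: average = {rolls.mean():.3f}")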
Another application is in bounding the difference between the mean and the median of a set of numbers. Using a one-sided version of Chebyshev’s Inequality, also known as Cantelli’s inequality, you can prove that the absolute value of the difference between the median and the mean is always less than or equal to the standard deviation. This is handy for checking whether a median you derived is plausible.
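A quick sketch of that check, in the same Python/NumPy style (the sample distributions below are arbitrary examples of my own, not from the post):

    import numpy as np

    rng = np.random.default_rng(7)

    # A few differently shaped samples to check |mean - median| <= standard deviation.
    samples = {
        "exponential": rng.exponential(1.0, 10_000),
        "uniform": rng.uniform(0, 10, 10_000),
        "lognormal": rng.lognormal(0.0, 1.0, 10_000),
    }

    for name, x in samples.items():
        gap = abs(x.mean() - np.median(x))
        sd = x.std()
        print(f"{name:>12}: |mean - median| = {gap:.3f}  <=  std dev = {sd:.3f}")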
