Monday, 27 April 2015

Benford's Law

Benford's law, also called the First-Digit Law, refers to the frequency distribution of digits in many (but not all) real-life sources of data. In this distribution, 1 occurs as the leading digit about 30% of the time, while larger digits occur in that position less frequently; 9 appears as the first digit less than 5% of the time. Benford's law also concerns the expected distribution for digits beyond the first, which approaches a uniform distribution as the digit position increases.

It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude.

It is named after physicist Frank Benford, who stated it in 1938, although it had been previously stated by Simon Newcomb in 1881.

A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability

P(d)=\log_{10}(d+1)-\log_{10}(d)=\log_{10} \left(\frac{d+1}{d}\right)=\log_{10} \left(1+\frac{1}{d}\right).

Numerically, the leading digits have the following distribution in Benford's law, where d is the leading digit and P(d) the probability:
d    P(d)
1    30.1%
2    17.6%
3    12.5%
4     9.7%
5     7.9%
6     6.7%
7     5.8%
8     5.1%
9     4.6%
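
These probabilities can be reproduced with a few lines of Python (a minimal sketch of the formula above):

import math

# Benford first-digit probabilities: P(d) = log10(1 + 1/d)
for d in range(1, 10):
    print(d, f"{math.log10(1 + 1/d):.1%}")
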
Applications

Accounting fraud detection
In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's Law ought to show up any anomalous results.[15] Following this idea, Mark Nigrini showed that Benford's Law could be used in forensic accounting and auditing as an indicator of accounting and expenses fraud. In practice, applications of Benford's Law for fraud detection routinely use more than the first digit.
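
As an illustration of that idea (a rough sketch, not Nigrini's actual tests; the helper names and the list of amounts below are made up for the example), one can tabulate the first digits of a set of reported figures and compare them with the Benford expectation in Python:

import math
from collections import Counter

def first_digit(x):
    # Leading non-zero digit of a positive number (works for ints and floats)
    return int(f"{abs(x):e}"[0])

def benford_screen(amounts):
    # Compare observed first-digit frequencies with Benford's expectation
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    n = sum(counts.values())
    for d in range(1, 10):
        expected = math.log10(1 + 1/d)
        observed = counts.get(d, 0) / n
        print(f"d={d}  expected={expected:.3f}  observed={observed:.3f}")

# Hypothetical expense figures; in practice one would use thousands of records
benford_screen([123.45, 1890.00, 47.10, 1504.99, 980.00, 112.00, 2650.75, 19.99])

Large, persistent gaps between the expected and observed frequencies are a reason for closer inspection, not by themselves proof of fraud.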

Legal status
In the United States, evidence based on Benford's law has been admitted in criminal cases at the federal, state, and local levels.

Election data
Benford's Law has been invoked as evidence of fraud in the 2009 Iranian elections, and also used to analyze other election results. However, other experts consider Benford's Law essentially useless as a statistical indicator of election fraud in general.

Macroeconomic data
Similarly, the macroeconomic data the Greek government reported to the European Union before entering the eurozone was shown to be probably fraudulent using Benford's law, albeit years after the country joined.

Genome data
The number of open reading frames and their relationship to genome size differs between eukaryotes and prokaryotes, with the former showing a log-linear relationship and the latter a linear relationship. Benford's law has been used to test this observation, with an excellent fit to the data in both cases.

Scientific fraud detection
A test of regression coefficients in published papers showed agreement with Benford's law. As a comparison, a group of subjects was asked to fabricate statistical estimates; the fabricated results failed to obey Benford's law.

Distributions that can be expected to obey Benford's Law
When the mean is greater than the median and the skew is positive
Numbers that result from mathematical combination of numbers: e.g., quantity × price
Transaction level data: e.g., disbursements, sales

Distributions that would not be expected to obey Benford's Law
Where numbers are assigned sequentially: e.g., check numbers, invoice numbers
Where numbers are influenced by human thought: e.g., prices set by psychological thresholds ($1.99)
Accounts with a large number of firm-specific numbers: e.g., accounts set up to record $100 refunds
Accounts with a built-in minimum or maximum
Where no transaction is recorded

YouTube link for a quick understanding -->
https://www.youtube.com/watch?v=vIsDjbhbADY

k-Means: Step-By-Step Example


As a simple illustration of a k-means algorithm, consider the following data set consisting of the scores of two variables on each of seven individuals:

Subject     A      B
1           1.0    1.0
2           1.5    2.0
3           3.0    4.0
4           5.0    7.0
5           3.5    5.0
6           4.5    5.0
7           3.5    4.5

This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the A and B values of the two individuals furthest apart (using the Euclidean distance measure) define the initial cluster means, giving:
Group      Individual    Mean Vector (centroid)
Group 1    1             (1.0, 1.0)
Group 2    4             (5.0, 7.0)

The remaining individuals are now examined in sequence and allocated to the cluster to which they are closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated each time a new member is added. This leads to the following series of steps:
Step    Cluster 1 Individuals    Cluster 1 Centroid    Cluster 2 Individuals    Cluster 2 Centroid
1       1                        (1.0, 1.0)            4                        (5.0, 7.0)
2       1, 2                     (1.2, 1.5)            4                        (5.0, 7.0)
3       1, 2, 3                  (1.8, 2.3)            4                        (5.0, 7.0)
4       1, 2, 3                  (1.8, 2.3)            4, 5                     (4.2, 6.0)
5       1, 2, 3                  (1.8, 2.3)            4, 5, 6                  (4.3, 5.7)
6       1, 2, 3                  (1.8, 2.3)            4, 5, 6, 7               (4.1, 5.4)

Now the initial partition has changed, and the two clusters at this stage have the following characteristics:
Cluster      Individuals     Mean Vector (centroid)
Cluster 1    1, 2, 3         (1.8, 2.3)
Cluster 2    4, 5, 6, 7      (4.1, 5.4)

But we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare each individual's distance to its own cluster mean and to that of the opposite cluster (for example, individual 1 lies √((1.0 − 1.8)² + (1.0 − 2.3)²) ≈ 1.5 from its own centroid and √((1.0 − 4.1)² + (1.0 − 5.4)²) ≈ 5.4 from the other). We find:
Individual    Distance to mean (centroid) of Cluster 1    Distance to mean (centroid) of Cluster 2
1             1.5                                         5.4
2             0.4                                         4.3
3             2.1                                         1.8
4             5.7                                         1.8
5             3.2                                         0.7
6             3.8                                         0.6
7             2.8                                         1.1

Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than to its own (Cluster 1). In other words, each individual's distance to its own cluster mean should be smaller than its distance to the other cluster's mean (which is not the case for individual 3). Thus, individual 3 is relocated to Cluster 2, resulting in the new partition:
Cluster      Individuals        Mean Vector (centroid)
Cluster 1    1, 2               (1.3, 1.5)
Cluster 2    3, 4, 5, 6, 7      (3.9, 5.1)

The iterative relocation would now continue from this new partition until no more relocations occur. In this example, however, each individual is now nearer to its own cluster mean than to that of the other cluster, so the iteration stops and the latest partitioning is taken as the final cluster solution.
It is also possible that the k-means algorithm will not settle on a final solution (assignments keep changing from one pass to the next). In that case it is a good idea to stop the algorithm after a pre-chosen maximum number of iterations.
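
The whole walkthrough can be reproduced with a short Python sketch (an illustration of the procedure described above, not a production k-means implementation; the helper names are our own):

import math
from itertools import combinations

# The seven observations from the table above
data = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
        5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}

def dist(p, q):
    # Euclidean distance between two points
    return math.hypot(p[0] - q[0], p[1] - q[1])

def centroid(ids):
    # Mean vector of the named individuals
    pts = [data[i] for i in ids]
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

# Seed: the two individuals furthest apart define the initial cluster means
a, b = max(combinations(data, 2), key=lambda pair: dist(data[pair[0]], data[pair[1]]))
clusters = [[a], [b]]

# First pass: assign each remaining individual to the nearest centroid,
# recalculating the centroid each time a member is added
for i in data:
    if i in (a, b):
        continue
    cents = [centroid(c) for c in clusters]
    nearest = min((0, 1), key=lambda k: dist(data[i], cents[k]))
    clusters[nearest].append(i)

# Relocation passes: move individuals to the nearer centroid until nothing changes,
# with a pre-chosen maximum number of iterations as a safeguard
for _ in range(100):
    cents = [centroid(c) for c in clusters]
    moved = False
    for i in data:
        current = 0 if i in clusters[0] else 1
        best = min((0, 1), key=lambda k: dist(data[i], cents[k]))
        if best != current:
            clusters[current].remove(i)
            clusters[best].append(i)
            moved = True
    if not moved:
        break

print(clusters)                         # two clusters: {1, 2} and {3, 4, 5, 6, 7}
print([centroid(c) for c in clusters])  # their mean vectors

With the seven observations above, this prints the final partition {1, 2} and {3, 4, 5, 6, 7} and the corresponding centroids, matching the result of the manual walkthrough.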

Sunday, 26 April 2015

How is perceptual mapping used in consumer research?



Perceptual Mapping

Perceptual mapping is a diagrammatic technique used by marketers in an attempt to visually display the perceptions of customers or potential customers. Typically the position of a product, product line, brand, or company is displayed relative to their competition.

Some perceptual maps use different size circles to indicate the sales volume or market share of the various competing products.

  • Perceptual maps help marketers understand where the consumer ranks their company in terms of characteristics and in comparison to competing companies.
  • Perceptual maps can display consumers' ideal points that reflect their ideal combinations of product characteristics.
  • When creating a new product, a company should look for a space that is currently unoccupied by competitors and that has a high concentration of consumer desire (ideal points).
  • A perceptual map is usually based more on a marketer's knowledge of an industry than market research.


  • Source: Boundless. “Perceptual Mapping.” Boundless Marketing. Boundless, 14 Nov. 2014. Retrieved 26 Apr. 2015 from https://www.boundless.com/marketing/textbooks/boundless-marketing-textbook/consumer-marketing-4/competitive-perceptual-positioning-39/perceptual-mapping-197-6893/

    Price elasticity
    • The measurement of how changing one economic variable affects others. For example: "If I lower the price of my product, how much more will I sell?" "If I raise the price, how much less will I sell?" "If we learn that a resource is becoming scarce, will people scramble to acquire it?"
    Demand void
    • Areas without any significant consumer desires; typically found in ideal point maps of perceptual mapping.

    Perceptual Map Of Competing Products


    Perceptual maps commonly have two dimensions even though they are capable of having several.

    For example, in this perceptual map you can see consumer perceptions of various automobiles on the two dimensions of sportiness/conservative and classy/affordable.

    This sample of consumers felt that Porsche cars were the sportiest and classiest of the ones in the study. They felt that Plymouth cars were the most practical and conservative. Cars that are positioned close to each other were seen as similar on the relevant dimensions by the consumer.

    For example, consumers saw Buick, Chrysler, and Oldsmobile as similar. They are close competitors and form a competitive grouping. A company considering the introduction of a new model will look for an area on the map free from competitors.
    Perceptual mapping
    This is an example of a perceptual map.


    Perceptual Map Of a Consumer's Ideal

    Many perceptual maps also display consumers' ideal points.

    These points reflect ideal combinations of the two product characteristics as seen by a consumer. This diagram shows a study of consumers' ideal points in the alcohol product space.

    Each dot represents one respondent's ideal combination of the two dimensions. Areas where there is a cluster of ideal points (such as A) indicate a market segment.

    Areas without ideal points are sometimes referred to as demand voids.


    Perceptual Map of Ideal Points in the Alcohol Product Space
    Ideal point maps reflect ideal combinations of two product characteristics as seen by a consumer. This helps marketers accurately target their message to consumers based on consumer desires.



    Combining the Competing Products and Ideal Points Maps

    A company considering introducing a new product will look for areas with a high density of ideal points.

    They will also look for areas without competitive rivals. This is best done by placing both the ideal points and the competing products on the same map. This map displays various aspirin products as seen on the dimensions of effectiveness and gentleness.

    It also shows two ideal vectors. This study indicates that there is one segment that is more concerned with effectiveness than harshness, and another segment that is more interested in gentleness than strength.



    Combination Map of Competing Products and Ideal Points
    A combination map allows companies to find a space that has unmet consumer desires.



    Intuitive Maps

    Perceptual maps need not come from a detailed study. There are also intuitive maps (also called judgmental maps or consensus maps) that are created by marketers based on their understanding of their industry.

    The value of this type of map is questionable, as they often just give the appearance of credibility to management's preconceptions.

    When detailed marketing research studies are done, methodological problems can arise, but at least the information is coming directly from the consumer.

    There is an assortment of statistical procedures (preference regression, multi-dimensional scaling) that can be used to convert the raw data collected in a survey into a perceptual map.

    Some techniques are constructed from perceived differences between products, others are constructed from perceived similarities.

    Still others are constructed from cross price elasticity of demand data from electronic scanners.

    Source: Boundless. “Perceptual Mapping.” Boundless Marketing. Boundless, 14 Nov. 2014.

    Perceptual Mapping

    PERMAP is a program that uses multidimensional scaling (MDS) to reduce multiple pairwise relationships to 2-D pictures, commonly called perceptual maps.
    The fundamental purpose of PERMAP is to uncover hidden structure that might be residing in a complex data set. A unique feature of PERMAP is that it embeds the mapping techniques in an interactive, graphical system that minimizes several difficulties associated with multidimensional scaling practices. It is particularly effective at exposing artifacts due to local minima, incomplete convergence, and the effects of outliers.
    PERMAP takes object-to-object proximity values (also called similarities, dissimilarities, correlations, distances, interactions, psychological distances, dependencies, preferences, etc.) and uses multidimensional scaling (MDS) to make a map that shows the relationships between the objects. PERMAP performs classical metric and non-metric MDS analyses in one, two, three, … or eight dimensions, for one-mode two-way or two-mode two-way data, with up to 1000 objects and with missing values allowed.
    Another important aspect of perceptual maps is that they are forgiving of missing or imprecise data points. Whereas some analytical techniques cannot tolerate missing elements in the input matrix, MDS results are often unaffected.
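
PERMAP itself is a standalone program, but the same idea can be sketched in Python with scikit-learn's MDS class (an illustrative analogue, not PERMAP's own code; the object labels and dissimilarity values below are invented):

import numpy as np
from sklearn.manifold import MDS

# A small, made-up symmetric dissimilarity matrix (diagonal zero, values non-negative)
labels = ["Object A", "Object B", "Object C", "Object D"]
D = np.array([[0.0, 2.0, 5.0, 6.0],
              [2.0, 0.0, 4.0, 5.5],
              [5.0, 4.0, 0.0, 1.5],
              [6.0, 5.5, 1.5, 0.0]])

# Metric MDS on a precomputed dissimilarity matrix, projected onto two dimensions
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)

for name, (x, y) in zip(labels, coords):
    print(f"{name}: ({x:6.2f}, {y:6.2f})")

Objects with small dissimilarities end up close together on the resulting 2-D map, which is the same principle behind a PERMAP perceptual map.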

    Proximity is some measure of likeness or nearness, or difference or distance, between objects. It can be either a similarity (called a resemblance in some disciplines) or a dissimilarity. If the proximity value gets larger when objects become more alike or closer in some sense, then the proximity is a similarity. If the opposite is the case, the proximity is a dissimilarity.
    If an object's proximity to itself (the diagonal of the matrix) takes the maximum value, for example 1, the matrix is called a similarity matrix; if an object's proximity to itself is 0, the matrix is called a dissimilarity matrix.

    An Attribute is some aspect of an object.  It may be called a factor, characteristic, trait, property, component, quantity, variable, dimension (not a good choice in MDS work, but occasionally seen), parameter, and so forth.  The attributes should be presented in a form where each is normalized  (standardized) to some kind of range or standard deviation, but Permap can do the normalizing internally if so desired. 

    PERMAP's data files use free-form data entry. All values must be non-negative and the diagonal values must be zero. The data can be separated with space(s), a comma, or both. The dissimilarity values are introduced with the keyword DISSIMILARITYLIST, which is all one word; if the data are dissimilarities, these diagonal values must be zero by definition.
    If your proximity information is in the form of similarities instead of dissimilarities, then replace the keyword DISSIMILARITYLIST with SIMILARITYLIST and be sure that the diagonal values are all equal and are not exceeded by any other similarity value. There is no space before the "LIST" part of the keyword, and capitalization is not important.
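
Based on that description, a minimal dissimilarity input for three objects might look roughly like the following (a hypothetical sketch only; consult PERMAP's documentation for the exact file layout, headers, and object-labelling conventions):

DISSIMILARITYLIST
0, 2.3, 4.1
2.3, 0, 1.8
4.1, 1.8, 0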

    Example: Distance between different cities

    Step 1: Distance matrix was created in Excel



    Step 2: Copy the data to notepad as follows –



    Step 3: Open the notepad file in PERMAP. Once the file is loaded, click on START.



    Field Movements (Mirror, Rotate, Move, Zoom)

    Occasionally you will want to control the final orientation of a map in order to make a simple comparison with previous results, or you might want to expand a map to inspect a small, congested area. These needs can be satisfied by mirroring, rotating, moving, or zooming in. These operations are known as "field movements". The field movement controls are activated by clicking the Field button or right-clicking the mouse on an open area.
    •    If Mirror is chosen, then clicking near an axis will cause the map to be mirrored about that axis. 
    •    If Rotate is chosen, then dragging the mouse about the center of the map will cause the object set to rotate about the center.
    •    If Move is chosen, then dragging the mouse in any direction will cause the object set to move in that direction. 
    •    If Zoom is chosen, then dragging the mouse away from the center of the map will cause the object set to expand, and vice versa. 



    PERMAP lets you drag-and-drop objects in and out of the active set while the map is evolving and being displayed. Therefore, single objects can be taken out and placed in "Parked objects" to see how their removal affects the result.