Welcome to Business Analytics blog :): Purpose, Linear Equation & Assumptions : Discriminant Analysis

Discriminant analysis creates an equation that minimizes the probability
of misclassifying cases into their respective groups or categories.

The discriminant score is a weighted linear combination (sum) of the
discriminating variables.

The purposes of discriminant analysis (DA):

Discriminant function analysis (DA) undertakes the same task as multiple linear regression:
it predicts an outcome. However, multiple linear regression is limited to cases where the
dependent variable (on the Y axis) is an interval variable, so that the weighted combination
of predictors produces, through the regression equation, estimated population mean values
of Y for given values of the X variables.

But many interesting variables are categorical, such as political party voting intention, migrant/non-migrant status,
making a profit or not, holding a particular credit card, owning, renting or paying a mortgage
on a house, employed/unemployed, satisfied versus dissatisfied employees, whether customers
are likely to buy a product or not, what distinguishes Stellar Bean clients from
Gloria Beans clients, whether a person is a credit risk or not, etc.

DA is used when:

1)The dependent variable is categorical and the predictors (IVs) are at the interval level, such as
age, income, attitudes, perceptions and years of education, although dummy variables can be used
as predictors, as in multiple regression. (In logistic regression, by contrast, the IVs can be at
any level of measurement.)
2)The dependent variable has more than two categories. Binary logistic regression is limited to a
dichotomous dependent variable, whereas DA can handle any number of groups.

Discriminant analysis linear equation:
=======================================

DA involves determining a linear equation, as in regression, that predicts which
group a case belongs to.

The form of the equation or function is:

$$D = v_1 X_1 + v_2 X_2 + v_3 X_3 + \cdots + v_i X_i + a$$
Where D = discriminant function (the discriminant score)
v = the discriminant coefficient or weight for that variable
X = respondent’s score for that variable
a = a constant
i = the number of predictor variables
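To make the equation concrete, here is a minimal Python sketch that evaluates D for a single case. The coefficients, scores and constant are made-up illustrative numbers, not estimates from real data, and the function name `discriminant_score` is hypothetical.

```python
def discriminant_score(weights, scores, constant):
    """Return D = v1*X1 + v2*X2 + ... + vi*Xi + a for one case.

    weights  -- the discriminant coefficients v1..vi
    scores   -- the case's scores X1..Xi on the same predictors
    constant -- the constant a
    """
    if len(weights) != len(scores):
        raise ValueError("need exactly one weight per predictor score")
    return sum(v * x for v, x in zip(weights, scores)) + constant


# Hypothetical coefficients and one respondent's scores (illustration only).
v = [0.5, 0.25]
x = [4, 8]
a = -3.0
print(discriminant_score(v, x, a))  # 0.5*4 + 0.25*8 - 3.0 = 1.0
```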

This function is similar to a regression equation or function.

The v’s are unstandardized discriminant coefficients, analogous to the b’s in the regression equation. They are chosen
to maximize the distance between the group means on D (one mean per category of the criterion, or dependent,
variable) relative to the spread of scores within the groups.
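For the two-group case, one standard way to obtain weights with this property is Fisher's linear discriminant: the weight vector is the inverse of the pooled within-group covariance matrix multiplied by the difference between the two group mean vectors. The pure-Python sketch below computes it; the helper names and the toy data are hypothetical. The direction is only defined up to a scale factor, so a statistics package may report a rescaled version of the same weights.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix (scatter divided by N - g)."""
    p = len(groups[0][0])
    scatter = [[0.0] * p for _ in range(p)]
    n_total = 0
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
        n_total += len(rows)
    dof = n_total - len(groups)
    return [[s / dof for s in row] for row in scatter]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        # Partial pivoting: bring the largest remaining entry into row c.
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_coefficients(group1, group2):
    """Weights v = Sw^-1 (mean1 - mean2) for a two-group discriminant."""
    diff = [x - y for x, y in zip(mean_vector(group1), mean_vector(group2))]
    return solve(pooled_within_cov([group1, group2]), diff)


# Toy data (hypothetical): the groups differ on X1 but not on X2.
g1 = [[2, 0], [4, 0], [3, 1], [3, -1]]
g2 = [[-1, 0], [1, 0], [0, 1], [0, -1]]
print([round(v, 3) for v in fisher_coefficients(g1, g2)])  # [4.5, 0.0]
```

X2 gets a weight of zero because, in this toy data, it does not separate the groups at all.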

Standardized discriminant coefficients can also be used, like beta weights in regression, to compare
predictors measured on different scales. Good predictors tend to have large weights. What you want this
function to do is maximize the distance between the categories, i.e. come up with an equation that has
strong discriminatory power between groups.
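A common way to obtain standardized coefficients is to multiply each unstandardized coefficient by the pooled within-group standard deviation of its predictor. The minimal sketch below assumes that convention; the function name and the numbers are hypothetical.

```python
import math


def standardize_coefficients(unstandardized, pooled_within_variances):
    """Scale each raw weight v_k by the pooled within-group SD of X_k."""
    if len(unstandardized) != len(pooled_within_variances):
        raise ValueError("need one variance per coefficient")
    return [v * math.sqrt(var)
            for v, var in zip(unstandardized, pooled_within_variances)]


# Hypothetical values: X1 is measured on a wide scale, X2 on a narrow one.
raw = [0.5, 2.0]
variances = [16.0, 0.25]
print(standardize_coefficients(raw, variances))  # [2.0, 1.0]
```

Although X2 has the larger raw weight, X1 has the larger standardized weight, so in this made-up example X1 would be judged the stronger predictor.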

After an existing set of data has been used to calculate the discriminant function
and classify cases, any new cases can then be classified. The number of discriminant
functions is one less than the number of groups (or the number of predictors, if that is
smaller). There is only one function for the basic two-group discriminant analysis.
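Both points can be sketched for the simple two-group case. The sketch assumes equal prior probabilities (or equal group sizes), in which case a new case is assigned to the group whose centroid (mean discriminant score) is nearer; that is the same as using the midpoint between the two centroids as the cutting score. It also encodes the general rule that the number of functions is the smaller of (number of groups minus one) and the number of predictors. The helper names, labels and centroid values are hypothetical.

```python
def number_of_functions(n_groups, n_predictors):
    """Maximum number of discriminant functions: min(g - 1, p)."""
    return min(n_groups - 1, n_predictors)


def classify_two_group(d, centroid_a, centroid_b, labels=("A", "B")):
    """Assign a case with discriminant score d to the nearer group centroid.

    Assumes equal priors (or equal group sizes), in which case the cutting
    score is the midpoint (centroid_a + centroid_b) / 2. Ties go to labels[0].
    """
    return labels[0] if abs(d - centroid_a) <= abs(d - centroid_b) else labels[1]


# Hypothetical centroids on D for two groups, and three new cases.
for d in (0.4, -2.5, 1.7):
    print(d, classify_two_group(d, 1.0, -1.0, labels=("buyer", "non-buyer")))
print(number_of_functions(2, 5))  # 1
print(number_of_functions(3, 4))  # 2
```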

Assumptions of discriminant analysis:
========================================
The major underlying assumptions of DA are:

1)the observations are a random sample;

2)each predictor variable is normally distributed within each group;

3)the cases in the initial classification are correctly allocated to their categories of the
dependent variable;

4)there must be at least two groups or categories, with each case belonging to only one
group so that the groups are mutually exclusive and collectively exhaustive (all cases
can be placed in a group);

5)each group or category must be well defined, clearly differentiated from any other
group(s) and natural. Putting a median split on an attitude scale is not a natural way to
form groups. Partitioning quantitative variables is only justifiable if there are easily
identifiable gaps at the points of division. For instance, three groups might correspond to
three available levels of housing loan amount;

6)the groups or categories should be defined before collecting the data;

7)the attribute(s) used to separate the groups should discriminate quite clearly between
the groups, so that group or category overlap is non-existent or minimal;

8)the group sizes of the dependent variable should not be grossly different, and each group
should contain at least five times as many cases as there are independent variables.
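The sample-size guideline in the last assumption is easy to check before running DA. A minimal sketch, assuming the group labels are held in a plain Python list (the function name and the example counts are hypothetical). Because "not grossly different" has no fixed cut-off, the sketch only reports the largest-to-smallest group ratio rather than judging it.

```python
from collections import Counter


def group_size_report(labels, n_predictors, per_predictor=5):
    """Check each group has at least per_predictor * n_predictors cases."""
    counts = Counter(labels)
    minimum = per_predictor * n_predictors
    too_small = sorted(g for g, n in counts.items() if n < minimum)
    imbalance = max(counts.values()) / min(counts.values())
    return {"counts": dict(counts), "minimum": minimum,
            "too_small": too_small, "imbalance": imbalance}


# Hypothetical housing-status groups with three predictors.
labels = ["own"] * 30 + ["rent"] * 25 + ["mortgage"] * 12
report = group_size_report(labels, n_predictors=3)
print(report["minimum"], report["too_small"], report["imbalance"])
# 15 ['mortgage'] 2.5
```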

There are several purposes of DA:

1)To investigate differences between groups on the basis of the attributes of the cases,
indicating which attributes contribute most to group separation. The descriptive technique
successively identifies the linear combinations of attributes, known as canonical
discriminant functions (equations), that contribute maximally to group separation.
Predictive DA addresses the question of how to assign new cases to groups: the DA
function uses a person’s scores on the predictor variables to predict the category to
which the individual belongs.

2)To determine the most parsimonious way to distinguish between groups.

3)To classify cases into groups. Statistical significance tests using chi square enable you
to see how well the function separates the groups.

4)To test theory by checking whether cases are classified as predicted.
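The chi-square significance test mentioned above is commonly based on Wilks' lambda: the determinant of the within-group sums-of-squares-and-cross-products (SSCP) matrix divided by the determinant of the total SSCP matrix, where a smaller lambda means better separation. Bartlett's approximation turns it into a chi-square statistic, -(N - 1 - (p + g)/2) ln(lambda), with p(g - 1) degrees of freedom (N cases, p predictors, g groups). The pure-Python sketch below computes these for all functions together; helper names and toy data are hypothetical. A p-value needs a chi-square distribution function, for example `scipy.stats.chi2.sf(chi2, df)`, which this sketch does not use.

```python
import math


def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sscp(rows, center):
    """Sums of squares and cross-products of rows around a center vector."""
    p = len(center)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m


def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in a]
    n = len(m)
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0:
            return 0.0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            det = -det  # a row swap flips the sign
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return det


def wilks_lambda_test(groups):
    """Return (Wilks' lambda, Bartlett chi-square, degrees of freedom)."""
    all_rows = [r for g in groups for r in g]
    n, p, g = len(all_rows), len(all_rows[0]), len(groups)
    # Within-group SSCP: each group's scatter around its own mean.
    w = [[0.0] * p for _ in range(p)]
    for rows in groups:
        s = sscp(rows, mean_vector(rows))
        for i in range(p):
            for j in range(p):
                w[i][j] += s[i][j]
    # Total SSCP: all cases around the grand mean.
    t = sscp(all_rows, mean_vector(all_rows))
    lam = determinant(w) / determinant(t)
    chi2 = -(n - 1 - (p + g) / 2) * math.log(lam)
    return lam, chi2, p * (g - 1)


# Toy data (hypothetical): two groups, two predictors.
g1 = [[2, 0], [4, 0], [3, 1], [3, -1]]
g2 = [[-1, 0], [1, 0], [0, 1], [0, -1]]
lam, chi2, df = wilks_lambda_test([g1, g2])
print(round(lam, 4), round(chi2, 3), df)  # 0.1818 8.524 2
```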

Summary:
=========

Discriminant analysis uses a collection of interval variables to predict a categorical variable
that may be a dichotomy or have more than two values. The technique involves finding a linear
combination of independent variables (predictors), the discriminant function, that maximizes
the differences between the groups of the categorical dependent variable.
Stepwise DA is also available to determine the best combinations of predictor variables.
Thus discriminant analysis is a tool for predicting group membership from a linear combination
of variables.

Saturday, 9 May 2015
