Discriminant Analysis
Discriminant analysis is a statistical method used to understand the relationship between a "dependent variable" and one or more "independent variables." A dependent variable is the variable that a researcher is trying to explain or predict from the values of the independent variables. Discriminant analysis is similar to regression analysis and analysis of variance (ANOVA); the principal difference concerns the nature of the dependent variable.
Discriminant analysis requires
the researcher to have measures of the dependent variable and all of the
independent variables for a large number of cases. In regression analysis and
ANOVA, the dependent variable must be a "continuous variable." A
continuous variable indicates the degree to which a subject possesses some
characteristic, so that the higher the value of the variable, the greater the
level of the characteristic. A good example of a continuous variable is a
person's income.
In discriminant analysis, the
dependent variable must be a "categorical
variable." The values of a categorical variable serve only to name
groups and do not necessarily indicate the degree to which some characteristic
is present. An example of a categorical variable is a measure indicating to
which one of several different market segments a customer belongs; another
example is a measure indicating whether or not a particular employee is a
"high potential" worker. The categories must be mutually exclusive;
that is, a subject can belong to one and only one of the groups indicated by
the categorical variable. While a categorical variable must have at least two
values (as in the "high potential" case), it may have numerous values
(as in the case of the market segmentation measure).
There are two basic steps in
discriminant analysis. The first involves estimating coefficients, or weighting
factors, that can be applied to the known characteristics of job candidates
(i.e., the independent variables) to calculate some measure of their tendency
or propensity to become high performers. This measure is called a
"discriminant function." Second, this information can then be used to
develop a decision rule that specifies some cut-off value for predicting which
job candidates are likely to become high performers.
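To make these two steps concrete, here is a minimal sketch in Python using scikit-learn's LinearDiscriminantAnalysis. The predictor names, the small data set, and the zero cut-off are all hypothetical, chosen only to mirror the high-performer example; this is an illustration, not a prescribed procedure.

```python
# A sketch of the two steps, assuming scikit-learn is available; the
# predictor names, data, and cut-off are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Known cases: years of education, motivation score, importance of family life.
X_known = np.array([
    [18, 9.0, 3.0], [16, 8.0, 4.5], [17, 8.5, 3.5], [15, 7.0, 5.0],  # high performers
    [12, 5.5, 8.0], [13, 6.5, 7.0], [12, 5.0, 8.5], [14, 6.0, 6.5],  # low performers
])
y_known = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = high performer, 0 = low performer

# Step 1: estimate the discriminant function from the known cases.
lda = LinearDiscriminantAnalysis().fit(X_known, y_known)

# Step 2: score new job candidates and apply a cut-off to predict which
# candidates are likely to become high performers (scores above zero here).
X_candidates = np.array([[16.0, 7.0, 5.0], [13.0, 6.0, 7.0]])
scores = lda.decision_function(X_candidates)
print(scores)       # discriminant scores for the candidates
print(scores > 0)   # True where a high performer is predicted
```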
The tendency of an individual to
become a high performer can be written as a linear equation. The values of the
various predictors of high performer status (i.e., independent variables) are
multiplied by "discriminant function coefficients" and these products
are added together to obtain a predicted discriminant function score. This
score is used in the second step to predict a job candidate's likelihood of
becoming a high performer. Suppose that you were to use three different
independent variables in the discriminant analysis. Then the discriminant
function has the following form:
D = B1X1 + B2X2 + B3X3

where

D = discriminant function score,
Bi = discriminant function coefficient relating independent variable i to the discriminant function score,
Xi = value of independent variable i.
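As a quick worked example of this formula, the snippet below computes the weighted sum for a single candidate; the coefficient and predictor values are made up for illustration.

```python
# Hypothetical coefficients and (standardized) predictor values for one candidate.
coefficients = {"education": 0.75, "motivation": 0.60, "family_importance": -0.45}
candidate    = {"education": 1.20, "motivation": 0.80, "family_importance": -0.30}

# D = B1*X1 + B2*X2 + B3*X3
D = sum(coefficients[name] * candidate[name] for name in coefficients)
print(f"Discriminant function score: {D:.2f}")
```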
The equation is quite similar to a regression equation. However, conventional regression analysis should not be used in place of discriminant analysis here, because the dependent variable would have only two values (high performer and low performer) and would thus violate important assumptions of the regression model. Discriminant analysis does not have these limitations with respect to the dependent variable.
There are various tests of
significance that can be used in discriminant analysis. One widely used test
statistic is based on Wilks' lambda, which provides an assessment of the
discriminating power of the function derived from the analysis. If this value
is found to be statistically significant, then the set of independent variables
can be assumed to differentiate between the groups of the categorical variable.
This test, which is analogous to the F-ratio test in ANOVA and regression, is
useful in evaluating the overall adequacy of the analysis.
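One practical way to obtain this test, assuming the statsmodels and pandas libraries are available, is through a one-way MANOVA, which reports Wilks' lambda for the same hypothesis. The column names and data in this sketch are hypothetical.

```python
# A sketch of the significance test, assuming statsmodels and pandas are
# installed; the column names and data are hypothetical.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.DataFrame({
    "group":      ["high", "high", "high", "high", "low", "low", "low", "low"],
    "education":  [18, 16, 17, 15, 12, 13, 12, 14],
    "motivation": [9.0, 8.0, 8.5, 7.0, 5.5, 6.5, 5.0, 6.0],
    "family":     [3.0, 4.5, 3.5, 5.0, 8.0, 7.0, 8.5, 6.5],
})

# A one-way MANOVA tests whether the independent variables jointly
# differentiate the groups; its output includes Wilks' lambda, the
# corresponding F approximation, and the p-value.
manova = MANOVA.from_formula("education + motivation + family ~ group", data=df)
print(manova.mv_test())
```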
Once the analysis is completed,
the discriminant function coefficients can be used to assess the contributions
of the various independent variables to the tendency of an employee to be a
high performer. The discriminant function coefficients are analogous to regression
coefficients and range between values of -1.0 and 1.0. The first box in
Figure 1 provides hypothetical results of the discriminant
analysis. The second box provides the within-group averages for the
discriminant function for the two categories of the dependent variable. Note
that the high performers have an average score of 1.45 on the discriminant
function, while the low performers have an average score of -.89. The
discriminant function is treated as a standardized variable, so it has a mean
of zero and a standard deviation of one. The average values of the discriminant
function scores are meaningful only in that they help us interpret the
coefficients. Since the high performers are at the upper end of the scale, all
of the positive coefficients indicate that the greater the value of those
variables, the greater the likelihood of a worker being a high performer (e.g.,
education, motivation).
The magnitudes of the
coefficients also tell us something about the relative contributions of the
independent variables. The closer the value of a coefficient is to zero, the
weaker it is as a predictor of the dependent variable. On the other hand, the
closer the value of a coefficient is to either 1.0 or -1.0, the stronger it is
as a predictor of the dependent variable. In this example, then, years of
education and ability to handle stress both have positive coefficients, though
the latter is quite weak. Finally, individuals who place high importance on
family life are less likely to be high performers than those who do not.
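The sketch below shows one way these interpretive quantities might be computed, again with hypothetical data: it reports the within-group average discriminant scores and coefficients rescaled by the pooled within-group standard deviations so their magnitudes can be compared. The rescaling convention is a common one, not something prescribed here.

```python
# A sketch of interpreting a fitted discriminant function; the data and
# predictor names are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

names = ["years of education", "motivation", "importance of family life"]
X = np.array([
    [18, 9.0, 3.0], [16, 8.0, 4.5], [17, 8.5, 3.5], [15, 7.0, 5.0],  # high performers
    [12, 5.5, 8.0], [13, 6.5, 7.0], [12, 5.0, 8.5], [14, 6.0, 6.5],  # low performers
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

lda = LinearDiscriminantAnalysis().fit(X, y)
scores = lda.transform(X).ravel()   # discriminant scores, centered on the overall mean

# Within-group averages of the discriminant function scores (the overall sign
# of the function is arbitrary; only the separation between groups matters).
for g, label in [(1, "high performers"), (0, "low performers")]:
    print(f"Mean score, {label}: {scores[y == g].mean():+.2f}")

# Rescale the raw coefficients by the pooled within-group standard deviations
# so that their relative magnitudes can be compared across predictors.
within_ss = sum(((X[y == g] - X[y == g].mean(axis=0)) ** 2).sum(axis=0)
                for g in np.unique(y))
pooled_sd = np.sqrt(within_ss / (len(y) - 2))
for name, b in zip(names, lda.scalings_[:, 0] * pooled_sd):
    print(f"{name:28s} {b:+.2f}")
```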
The second step in discriminant
analysis involves predicting to which group in the dependent variable a
particular case belongs. A subject's discriminant score can be translated into
a probability of being in a particular group by means of Bayes' rule. Separate
probabilities are computed for each group and the subject is assigned to the
group with the highest probability. Another test of the adequacy of a model is
the degree to which known cases are correctly classified. As in other statistical
procedures, it is generally preferable to test the model on a set of cases that
were not used to estimate the model's parameters. This provides a more
conservative test of the model. Thus, a set of cases should, if possible, be
saved for this purpose. Once the analysis is complete, the results can be used
to predict the work potential of job candidates and, ideally, to improve the
selection process.
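A sketch of this second step, using scikit-learn's posterior probabilities (a Bayes-rule assignment) and a held-out set of cases for a more conservative classification table, might look like the following; the data are simulated purely for illustration.

```python
# A sketch of the classification step, assuming scikit-learn is available;
# the data are simulated purely for illustration.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                       # 1 = high performer
X = rng.normal(size=(n, 3)) + np.outer(y, [1.0, 0.8, -0.6])

# Hold out cases that are not used to estimate the model, so the
# classification table provides a more conservative test of the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# predict_proba applies Bayes' rule: a posterior probability is computed for
# each group and the case is assigned to the group with the higher probability.
posterior = lda.predict_proba(X_test)
predicted = lda.predict(X_test)

print("Posterior probabilities for the first three cases:\n", posterior[:3].round(3))
print("Proportion of held-out cases correctly classified:",
      accuracy_score(y_test, predicted))
print("Classification table:\n", confusion_matrix(y_test, predicted))
```

In practice, the held-out cases would be employees whose later performance is already known, so that the predicted and actual classifications can be compared.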