












Centroid
Mean value for the discriminant Z scores of all objects within
a particular category or group. For example, a two-group discriminant
analysis has two centroids, one for the objects in each of the
two groups.
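
As a minimal sketch of the definition, the centroids can be computed by averaging the discriminant Z scores within each group. The scores and the group labels "A" and "B" below are made-up illustrative values, not data from any real analysis.

```python
from collections import defaultdict


def group_centroids(z_scores, groups):
    """Return the mean discriminant Z score for each group label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for z, g in zip(z_scores, groups):
        totals[g] += z
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical Z scores for a two-group analysis.
z = [-1.2, -0.8, -1.0, 0.9, 1.1, 1.3]
labels = ["A", "A", "A", "B", "B", "B"]
centroids = group_centroids(z, labels)
print(centroids)
```

A two-group analysis yields exactly two entries in the result, one per group.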









Chebychev inequality
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$
we have, for all $t > 0$, $P(|X - \mu| \ge t) \le \sigma^2 / t^2$.
This follows from Markov's inequality applied to $(X - \mu)^2$.
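
The bound P(|X - mu| >= t) <= sigma^2 / t^2 can be checked exactly on a small discrete distribution. The sketch below uses a fair six-sided die as an illustrative choice; the helper name `chebyshev_check` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def chebyshev_check(values, probs, t):
    """Return (exact tail probability, Chebychev bound) for a discrete X."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    tail = sum(p for v, p in zip(values, probs) if abs(v - mean) >= t)
    return tail, var / t ** 2


# Fair die: mean 7/2, variance 35/12.
faces = [Fraction(k) for k in range(1, 7)]
probs = [Fraction(1, 6)] * 6
tail, bound = chebyshev_check(faces, probs, t=2)
print(tail, bound)
```

Here the tail probability is well below the bound, which illustrates that the inequality is often loose.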









Classification Trees
Classifiers which partition the examples on one feature at a
time.









Classifier
A rule to assign a class (or 'doubt' or 'outlier') to new examples.









Codebook Vectors
Representative examples of a probability distribution. The term
comes from vector quantization.









Collinearity
Relationship between two (collinearity) or more (multicollinearity)
variables. Variables exhibit complete collinearity if their
correlation coefficient is 1 and a complete lack of collinearity
if their correlation coefficient is 0.
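
The two extremes can be illustrated with a hand-written Pearson correlation. This is only a sketch: the variables `x1`, `x2` and `x3` are invented so that one pair is an exact linear function of each other and the other pair is uncorrelated.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [3.0, 5.0, 7.0, 9.0]    # x2 = 2 * x1 + 1: complete collinearity
x3 = [1.0, -1.0, -1.0, 1.0]  # constructed to be uncorrelated with x1

r_complete = pearson_r(x1, x2)
r_none = pearson_r(x1, x3)
print(r_complete, r_none)
```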









Compact set
A subset $K$ of $\mathbb{R}^p$ is compact if it is closed and
bounded, that is, $\|x\| \le c$ for all $x \in K$ for some
$c < \infty$. Compact sets are also called compacta.









Condition Index
Measure of the relative amount of variance associated with an
eigenvalue so that a large condition index indicates a high
degree of collinearity.
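
One commonly used form of the condition index, assumed here because the entry does not give a formula, is the square root of the ratio of the largest eigenvalue to each eigenvalue. The sketch applies it to the 2x2 correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r; the value r = 0.9 is illustrative.

```python
import math


def condition_indices(eigenvalues):
    """sqrt(largest eigenvalue / eigenvalue) for each eigenvalue (assumed form)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


# Two predictors with correlation r: eigenvalues are 1 + r and 1 - r.
r = 0.9
indices = condition_indices([1 + r, 1 - r])
print(indices)
```

As r approaches 1 the smallest eigenvalue approaches 0 and its condition index grows without bound, matching the statement that a large index signals strong collinearity.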









Consistent
An estimator is consistent if in large samples it converges
to the true parameter value (when there is one).









Concave
A function $f$ is concave if $-f$ is convex.









Convex
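A function $f$ is convex if $f(\lambda x + (1 - \lambda) y) \le
\lambda f(x) + (1 - \lambda) f(y)$ for all $x$, $y$ and all
$0 \le \lambda \le 1$.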
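
The standard defining inequality for convexity, f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) for 0 <= lam <= 1, can be spot-checked numerically. The sketch below tests it on a small grid of hypothetical points; passing such a check is only a necessary condition, not a proof of convexity, while a single failure does disprove it.

```python
def satisfies_convexity(f, points, weights, tol=1e-12):
    """Check the convexity inequality on every pair of points and every weight."""
    return all(
        f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + tol
        for x in points
        for y in points
        for lam in weights
    )


points = [-2.0, -0.5, 0.0, 1.0, 3.0]
weights = [0.0, 0.25, 0.5, 0.75, 1.0]

square_passes = satisfies_convexity(lambda x: x * x, points, weights)
neg_square_passes = satisfies_convexity(lambda x: -x * x, points, weights)
print(square_passes, neg_square_passes)
```

Since x*x passes, its negative -x*x is the concave counterpart, and it fails the convexity check as expected.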









Cook's Distance
Summary measure of the influence of a single case (observation)
based on the total changes in all other residuals when the case
is deleted from the estimation process. Large values (usually
greater than 1) indicate substantial influence by the case in
affecting the estimated regression coefficients.
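
For a simple linear regression, Cook's distance can be computed without refitting by using the standard closed form D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, s^2 the residual mean square and p the number of coefficients. The sketch below assumes one predictor plus an intercept (p = 2), and the data are invented so that the last case is unusual.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each case in a one-predictor least-squares fit."""
    n = len(xs)
    p = 2  # intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual mean square
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [
        e * e / (p * s2) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: the last case departs from the trend of the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
distances = cooks_distances(xs, ys)
most_influential = max(range(len(distances)), key=distances.__getitem__)
print(most_influential, distances[most_influential])
```

In this example only the last case exceeds the rule-of-thumb value of 1 mentioned in the entry.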









COVRATIO
Measure of the influence of a single observation on the entire
set of estimated regression coefficients. A value close to 1
indicates little influence. If the absolute value of COVRATIO
minus 1 exceeds 3p/n (where p is the number of independent variables
plus 1, and n is the sample size), the observation is deemed to be
influential based on this measure.
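
The cutoff rule can be sketched as a small helper that flags cases with |COVRATIO - 1| > 3p/n. The function name `covratio_flags` and the COVRATIO values are hypothetical; computing COVRATIO itself is left to a regression package.

```python
def covratio_flags(covratios, num_predictors):
    """Flag cases whose COVRATIO differs from 1 by more than 3p/n."""
    n = len(covratios)      # sample size
    p = num_predictors + 1  # independent variables plus 1
    cutoff = 3 * p / n
    return [abs(c - 1.0) > cutoff for c in covratios]


# Hypothetical COVRATIO values for n = 10 cases and one predictor.
values = [1.02, 0.97, 1.65, 1.10, 0.30, 1.05, 0.99, 1.01, 0.95, 1.08]
flags = covratio_flags(values, num_predictors=1)
print(flags)
```

With n = 10 and p = 2 the cutoff is 0.6, so only the third and fifth cases are flagged.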









Cross-Validation
A method of evaluating parameters or classifiers by dividing
the training set into several parts, and in turn using one part
to test the procedure fitted to the remaining parts. Sometimes
used to refer to leave-one-out (or ordinary) cross-validation,
where every example is dropped in turn. This term is much abused;
it does not mean the use of a test set or validation set. Generalized
cross-validation is a measure of the performance of a regularized
classifier.
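
The procedure can be sketched as follows: split the training set into k parts, fit on k - 1 of them, test on the remaining part, and rotate. The nearest-mean classifier and the one-dimensional data are illustrative assumptions chosen to keep the example short; setting k equal to the number of examples gives leave-one-out cross-validation.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        yield train_idx, test_idx


def fit_nearest_mean(xs, ys):
    """Fit a toy classifier: store the mean of x for each class label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_nearest_mean(means, x):
    """Assign x to the class whose stored mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


def cv_error_rate(xs, ys, k):
    """Fraction of examples misclassified when each is held out once."""
    errors = 0
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = fit_nearest_mean(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        errors += sum(
            predict_nearest_mean(model, xs[i]) != ys[i] for i in test_idx
        )
    return errors / len(xs)


# Hypothetical, well-separated one-dimensional training set.
xs = [0.1, 0.4, 0.5, 0.9, 2.1, 2.4, 2.6, 3.0]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
three_fold_error = cv_error_rate(xs, ys, k=3)
loo_error = cv_error_rate(xs, ys, k=len(xs))
print(three_fold_error, loo_error)
```

Every example is used for testing exactly once and is never in the training part of its own fold, which is what distinguishes this from simply holding out a single test set.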






