-*- mode: org -*-
Explicitly about an ROC point of view
*  47(<-573): Multi-class ROC analysis from a multi-objective optimisation perspective

The receiver operating characteristic (ROC) has become a standard tool
for the analysis and comparison of classifiers when the costs of
misclassification are unknown. There has been relatively little work,
however, examining ROC for more than two classes. Here we discuss and
present an extension to the standard two-class ROC for multi-class
problems. We define the ROC surface for the Q-class problem in terms
of a multi-objective optimisation problem in which the goal is to
simultaneously minimise the Q(Q-1) misclassification rates, when the
misclassification costs and parameters governing the classifier's
behaviour are unknown. We present an evolutionary algorithm to locate
the Pareto front, the optimal trade-off surface between
misclassifications of different types. The use of the Pareto optimal
surface to compare classifiers is discussed and we present a
straightforward multi-class analogue of the Gini coefficient. The
performance of the evolutionary algorithm is illustrated on a
synthetic three-class problem, for both k-nearest neighbour and
multi-layer perceptron classifiers. (c) 2005 Elsevier B.V. All rights
reserved.
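
As a rough illustration of the objectives described above, the sketch below extracts the Q(Q-1) misclassification rates from a confusion matrix and tests Pareto dominance between two classifiers. It is not the paper's evolutionary algorithm. The function names, the example matrices, and the convention that rows are true classes normalised by true-class counts are all assumptions made for this example.

```python
import numpy as np

def misclassification_rates(confusion):
    """Return the Q(Q-1) off-diagonal rates of a Q x Q confusion matrix.

    Assumption: rows are true classes, columns are predicted classes, and
    each rate is normalised by the number of samples in the true class.
    """
    confusion = np.asarray(confusion, dtype=float)
    rates = confusion / confusion.sum(axis=1, keepdims=True)
    off_diagonal = ~np.eye(confusion.shape[0], dtype=bool)
    return rates[off_diagonal]

def dominates(a, b):
    """True if a is no worse than b in every rate and strictly better in one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

# Hypothetical confusion matrices for two three-class classifiers.
confusion_a = [[8, 1, 1], [2, 7, 1], [0, 2, 8]]
confusion_b = [[7, 2, 1], [2, 6, 2], [0, 2, 8]]
rates_a = misclassification_rates(confusion_a)
rates_b = misclassification_rates(confusion_b)
print(len(rates_a), dominates(rates_a, rates_b))
```

A classifier whose rate vector is not dominated by any other candidate would sit on the Pareto front in this formulation.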

2006


* 137(<-275): A two-stage evolutionary algorithm based on sensitivity and accuracy for multi-class problems

The machine learning community has traditionally used correct
classification rates or accuracy (C) values to measure classifier
performance and has generally avoided presenting classification levels
of each class in the results, especially for problems with more than
two classes. C values alone are insufficient because they cannot
capture the myriad of contributing factors that differentiate the
performance of two different classifiers. Receiver Operating
Characteristic (ROC) analysis is an alternative that addresses these
difficulties, but it can only be used for two-class problems. For this
reason, this paper proposes a new approach for analysing classifiers
based on two measures: C and sensitivity (S) (i.e., the minimum of
accuracies obtained for each class). These measures are optimised
through a two-stage evolutionary process that applies two sequential
fitness functions: entropy (E) in the first stage and a new fitness
function, area (A), in the second stage. By using these fitness
functions, the
C level was optimised in the first stage, and the S value of the
classifier was generally improved without significantly reducing C in
the second stage. This two-stage approach improved S values in the
generalisation set (whereas an evolutionary algorithm (EA) based only
on the S measure obtains worse S levels) and obtained both high C
values and good classification levels for each class. The methodology
was applied to solve 16 benchmark classification problems and two
complex real-world problems in analytical chemistry and predictive
microbiology. It obtained promising results when compared to other
competitive multiclass classification algorithms and a multi-objective
alternative based on E and S. (C) 2012 Elsevier Inc. All rights
reserved.
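
The two measures named above can be computed directly from a confusion matrix. The following is a minimal sketch, assuming rows are true classes and columns are predicted classes; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import numpy as np

def accuracy_and_min_sensitivity(confusion):
    """Return (C, S): overall accuracy and the minimum per-class accuracy.

    Assumption: rows are true classes and columns are predicted classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    c = correct.sum() / confusion.sum()
    per_class = correct / confusion.sum(axis=1)
    return c, per_class.min()

# Hypothetical imbalanced three-class problem: the small third class is
# classified poorly, yet overall accuracy stays high.
confusion = [[90, 5, 5], [4, 45, 1], [6, 2, 2]]
c, s = accuracy_and_min_sensitivity(confusion)
print(f"C = {c:.3f}, S = {s:.3f}")
```

In this example C is about 0.856 while S is 0.2, which shows why C alone can hide a badly classified class.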

2012


* 433(<-670): Multiobjective genetic optimization of diagnostic classifiers with implications for generating receiver operating characteristic curves

It is well understood that binary classifiers have two implicit
objective functions (sensitivity and specificity) describing their
performance. Traditional methods of classifier training attempt to
combine these two objective functions (or two analogous class
performance measures) into one so that conventional scalar
optimization techniques can be utilized. This involves incorporating a
priori information into the aggregation method so that the resulting
performance of the classifier is satisfactory for the task at hand. We
have investigated the use of a niched Pareto multiobjective genetic
algorithm (GA) for classifier optimization. With niched Pareto GAs,
an objective vector is optimized instead of a scalar function,
eliminating the need to aggregate classification objective
functions. The niched Pareto GA returns a set of optimal solutions
that are equivalent in the absence of any information regarding the
preferences of the objectives. The a priori knowledge that was used
for aggregating the objective functions in conventional classifier
training can instead be applied post-optimization to select one
of the series of solutions returned from the multiobjective genetic
optimization. We have applied this technique to train a linear
classifier and an artificial neural network (ANN), using simulated
datasets. The performances of the solutions returned from the
multiobjective genetic optimization represent a series of optimal
(sensitivity, specificity) pairs, which can be thought of as operating
points on a receiver operating characteristic (ROC) curve. All
possible ROC curves for a given dataset and classifier lie on or
below the ROC curve generated by the niched Pareto genetic
optimization.
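
To make the idea of optimal (sensitivity, specificity) pairs concrete, the sketch below filters a set of candidate operating points down to the non-dominated ones, with both objectives maximised. It is a plain exhaustive filter, not the niched Pareto GA from the paper, and the function name and candidate values are invented for illustration.

```python
def pareto_front(points):
    """Keep (sensitivity, specificity) pairs that no other pair dominates.

    Both objectives are maximised. A pair is dominated when another pair is
    at least as good in both objectives and strictly better in one.
    """
    front = []
    for i, (sens_i, spec_i) in enumerate(points):
        dominated = any(
            sens_j >= sens_i
            and spec_j >= spec_i
            and (sens_j > sens_i or spec_j > spec_i)
            for j, (sens_j, spec_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((sens_i, spec_i))
    return sorted(front)

# Hypothetical operating points from five candidate classifiers.
candidates = [(0.60, 0.95), (0.80, 0.85), (0.70, 0.80), (0.95, 0.50), (0.90, 0.50)]
front = pareto_front(candidates)
print(front)
```

The surviving pairs can be read as operating points along an ROC curve, from which one may be chosen afterwards using whatever preference information is available.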

1999