
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 10, No. 8, 2019
Information theory, introduced by Cover and Thomas [9], has been widely applied in filter methods, where information measures are used to evaluate a band's relevance and to quantify the amount of information contained in images. This
paper contributes to the knowledge in the area of hyperspectral
dimensionality reduction by proposing a new approach based
on normalized synergic correlation. The proposed method aims
to overcome the limitations of current state-of-the-art filter band selection methods, such as overestimation of band significance, which leads to the selection of redundant and irrelevant bands. The new evaluation method selects the band
that has maximum relevance, minimum redundancy and
maximum normalized synergy with the previously selected
bands. This paper reviews state-of-the-art band selection methods, highlighting their common limitations and comparing their performance against the proposed algorithm. Experiments are carried out on three benchmark hyperspectral images provided by NASA: "AVIRIS Indiana Pine" [10], "Pavia University" and "ROSIS Salinas" [11]. Classification results are generated using SVM [12][13] and KNN [14] classifiers to demonstrate the effectiveness and the classification accuracy improvement of the proposed approach.
The rest of the paper is structured as follows. Section 2 describes the fundamentals of information theory and reviews state-of-the-art band selection methods. Section 3 presents the proposed normalized max synergy (NMS) algorithm. Section 4 outlines the experiments conducted on the three datasets and analyzes the results. Finally, Section 5 concludes the paper.
II. BACKGROUND ON INFORMATION THEORY BASED
APPROACHES
In this section, we describe some basic concepts about
information theory and feature selection, which will be used to
build the proposed hyperspectral band selection algorithm.
The information theory proposed by Cover and Thomas [9] has been widely applied in filter methods, where information measures are used to assess the relevance and discriminative power of each feature.
Definition 1: The Shannon entropy, given in (1), quantifies the amount of information contained in a random variable X.
H(X) = -\sum_{x \in X} P(x) \log P(x)  (1)
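As an illustration, the entropy in (1) can be estimated from observed samples with a simple plug-in estimator. This is a sketch, not code from the paper; the function name and the choice of base-2 logarithms (entropy in bits) are ours.

```python
import numpy as np

def shannon_entropy(values):
    """Plug-in estimate of H(X) = -sum_x P(x) log2 P(x) from discrete samples."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()          # empirical probabilities P(x)
    return -np.sum(p * np.log2(p))

# A fair coin is maximally uncertain: 1 bit of information.
print(shannon_entropy([0, 1, 0, 1]))   # 1.0
```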
Since Shannon entropy H(X) is defined for a single variable and is independent of the class, mutual information between two random variables was introduced to measure the statistical dependence between features, and between features and the class.
Definition 2: The mutual information (MI) of a pair of variables, given in (2), represents their degree of dependence in the probabilistic sense: it is the reduction of uncertainty about one random variable obtained through knowledge of another.
MI(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}  (2)
MI(X;Y) = H(X) + H(Y) - H(X,Y)  (3)
MI(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)  (4)
In (2), P(X,Y) is the joint probability distribution and P(X), P(Y) are the marginal distributions.
In (3), H(X) and H(Y) are the Shannon entropies of the variables X and Y respectively, and H(X,Y) is their joint entropy. The mutual information can also be formulated using the conditional entropy, as shown in (4).
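Identity (3) gives a direct way to estimate MI from paired samples. A minimal sketch with a plug-in discrete estimator follows; the function names and the base-2 logarithm are our illustrative choices, not from the paper.

```python
import numpy as np

def entropy(values):
    """Plug-in estimate of H(X) from observed discrete samples, in bits."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y), following identity (3)."""
    # Encode each (x, y) pair as a single symbol to count joint outcomes.
    pairs = [f"{a}|{b}" for a, b in zip(x, y)]
    return entropy(x) + entropy(y) - entropy(pairs)

# Identical variables: MI equals the full entropy, 1 bit here.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

Note that MI of two independent variables estimated this way is zero, matching the non-negativity property below.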
Mutual information has the following properties:
- Non-negativity: MI(X;Y) >= 0, with equality if and only if X and Y are independent.
- Symmetry: MI(X;Y) = MI(Y;X).
In a broad survey of the feature selection literature, we identified several information theory-based filters [15]; a selection of the best-known criteria is presented below. In the results section, these methods are applied to hyperspectral data and compared with our proposed approach.
Battiti [16] proposed to use mutual information for variable selection in the Mutual Information-based Feature Selection (MIFS) algorithm. In this approach, the number of variables is fixed in advance, and at each step the variable is chosen that maximizes its mutual information with the class, penalized by its mutual information with the variables already selected. Formally, the variable selected by the MIFS algorithm is the one that maximizes the following objective function:
J(F_i) = MI(C;F_i) - \beta \sum_{F_s \in S} MI(F_i;F_s)  (5)
The factor β in (5) controls the redundancy term MI(Fi,Fs) and has a great influence on the selection algorithm. Several authors, such as Bollacker and Ghosh [17], use different values of β without justification; in practice its value is determined experimentally and depends on the data. The problem becomes apparent when the selected subset grows large and the redundancy term outweighs the relevance term: the algorithm then selects features merely because they are not redundant, not because they are relevant to the class.
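For concreteness, the MIFS greedy loop behind criterion (5) can be sketched as follows. This is a simplified sketch assuming already-discretized bands; the plug-in MI estimator and all names are our illustrative choices, not Battiti's implementation.

```python
import numpy as np

def discrete_mi(x, y):
    """Plug-in MI estimator: MI = H(X) + H(Y) - H(X,Y), in bits."""
    def h(v):
        _, c = np.unique(v, return_counts=True)
        p = c / c.sum()
        return -np.sum(p * np.log2(p))
    return h(x) + h(y) - h([f"{a}|{b}" for a, b in zip(x, y)])

def mifs_select(bands, labels, k, beta=0.5):
    """Greedy MIFS: at each step pick the band maximizing
    MI(C;Fi) - beta * sum over selected Fs of MI(Fi;Fs)."""
    relevance = [discrete_mi(b, labels) for b in bands]
    selected, remaining = [], list(range(len(bands)))
    for _ in range(k):
        scores = [relevance[i]
                  - beta * sum(discrete_mi(bands[i], bands[j]) for j in selected)
                  for i in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two identical, class-predictive bands and one irrelevant band, the redundancy penalty lowers the duplicate's score on the second step, illustrating how β trades relevance against redundancy.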
As a consequence, several variants of the MIFS algorithm have been proposed in recent years to overcome its limitations. Kwak and Choi [18] proposed the MIFS-U algorithm as an improvement of MIFS.
J(F_i) = MI(C;F_i) - \beta \sum_{F_s \in S} \frac{MI(C;F_s)}{H(F_s)} MI(F_i;F_s)  (6)
Peng et al. [19] also analyzed the limitations of the previous selection approaches and proposed the robust minimum redundancy maximum relevance (mRMR) approach, in which the redundancy term in (7) is divided by the cardinality of the selected subset.
J(F_i) = MI(C;F_i) - \frac{1}{|S|} \sum_{F_s \in S} MI(F_i;F_s)  (7)
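The only change from MIFS is that the redundancy penalty in (7) is averaged over the selected set S, so it no longer grows with |S|. A sketch of the scoring rule, with an illustrative plug-in discrete MI estimator (names are our assumptions):

```python
import numpy as np

def mi(x, y):
    """Plug-in discrete MI estimator: MI = H(X) + H(Y) - H(X,Y), in bits."""
    def h(v):
        _, c = np.unique(v, return_counts=True)
        p = c / c.sum()
        return -np.sum(p * np.log2(p))
    return h(x) + h(y) - h([f"{a}|{b}" for a, b in zip(x, y)])

def mrmr_score(candidate, labels, selected):
    """mRMR criterion (7): relevance MI(C;Fi) minus the MEAN redundancy
    over the selected set S, instead of the raw sum used by MIFS."""
    rel = mi(candidate, labels)
    if not selected:
        return rel
    return rel - sum(mi(candidate, s) for s in selected) / len(selected)
```

A band identical to one already selected scores its relevance minus its full MI with that band, so exact duplicates are strongly penalized regardless of how many bands have been chosen.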
Asma et al. [20] proposed a hybrid strategy combining the mRMR filter with a Fanno-based wrapper strategy to select the relevant hyperspectral bands. Yang and Moody [21]