1. Introduction
Challenges surrounding high-dimensional data have attracted considerable attention since the emergence of the internet and the rapid development of information technology. The curse of dimensionality has become prominent in machine learning, pattern recognition, data mining, computer vision, and recommender systems. Accordingly, dimensionality reduction techniques have developed rapidly. Feature selection and feature extraction are the two main categories of dimensionality reduction. Feature extraction maps the data to a lower-dimensional representation, whereas feature selection seeks the most discriminative subset of the original features, which can be used to interpret medical, industrial, financial, and other data.
Feature selection is particularly important for medical data, since the selected features can be used in disease diagnosis or in optimizing treatments. To this end, a line of research in bioinformatics and pattern recognition is devoted to high-dimensional DNA microarray datasets. These data have significantly more features than samples and must therefore be treated as an underdetermined system [1,2]. Although a DNA microarray stores thousands of genes from different samples, only a small portion of the encoded information is related to disease [3]. The gene selection problem in bioinformatics is thus equivalent to a feature selection task in machine learning.
In the context of disease diagnosis and progression analysis, demographic, spirometry, and computed tomography (CT) features have been collected and combined to predict chronic obstructive pulmonary disease (COPD) progression, exacerbation, and hospitalization. Selecting the most prominent and discriminative of these features improved the accuracy of hospitalization prediction and of discrimination between COPD and asthma [4-6].
Features with no effect on the output are considered irrelevant, while redundant features are combinations of other features and therefore add no information. Including irrelevant and redundant features increases the complexity of the system, which degrades the performance of the learning algorithm and increases the computation time [7].
Feature selection is used for dimensionality reduction, obtaining the most informative subset of features by minimizing redundancy and maximizing relevance, with the aim of increasing accuracy and decreasing computation time. Feature selection also decreases task complexity and thus the probability of overfitting [8]. The peaking phenomenon states that, for a fixed amount of training data, the error of a classifier first decreases as features are added but rises again once too many features are included [9]. There are three strategies for feature selection: (a) filter-based, (b) wrapper-based, and (c) embedded techniques. In filter-based methods, features are ranked and assessed according to criteria computed from the statistical and intrinsic characteristics of the dataset; there is no interaction with any classifier. The Laplacian score [10], mutual information, and minimum-redundancy maximum-relevance (MRMR) are among the best-known filter-based techniques [11-15]. In wrapper-based methods, the features are evaluated through the learning algorithm itself, so the best subset of features is selected according to the output of the learner. Sequential forward selection, sequential backward selection, floating search [16], and recursive SVM [17] are conventional wrapper-based feature selection techniques in machine learning.
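To make the filter strategy concrete, the following minimal sketch ranks features by their mutual information with the class label, independently of any classifier. The synthetic data, the scikit-learn routines, and the choice of k are illustrative assumptions, not the specific methods of [11-15]:

```python
# Minimal filter-style feature ranking via mutual information.
# Data and parameter choices here are hypothetical, for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic high-dimensional data: many more features than are informative.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=10, random_state=0)

# Score every feature against the label without involving any classifier.
scores = mutual_info_classif(X, y, random_state=0)

# Keep the k highest-scoring features (k is an arbitrary choice here).
k = 10
selected = np.argsort(scores)[::-1][:k]
print("selected feature indices:", selected)
```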
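By contrast, a wrapper method scores candidate subsets through the learner itself. The sketch below uses sequential forward selection driven by a classifier's cross-validated accuracy; the estimator, subset size, and data are again illustrative assumptions rather than a prescribed configuration:

```python
# Minimal wrapper-style sketch: sequential forward selection,
# where the learning algorithm's performance guides the search.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=8, random_state=0)

# The learner itself evaluates candidate subsets, unlike a filter method.
estimator = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(estimator, n_features_to_select=8,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature mask:", sfs.get_support())
```

Because every candidate subset is refit and scored by the learner, wrapper methods tend to find subsets better matched to the classifier than filter methods do, at a substantially higher computational cost.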