reporting them accordingly. Typically, IDSs fall into two categories: network-based
intrusion detection systems (NIDS) and host-based intrusion detection systems (HIDS).
A NIDS monitors inbound and outbound network traffic. It captures all network traffic
and evaluates the individual packets to detect abnormal activities [3]. A HIDS monitors a
computer or host system to detect malicious activities. It detects intrusions by monitoring
system calls, application activities, process scheduling, login attempts, and system
configuration on a specific machine. If any malicious activity or change occurs in the system,
it generates an
alarm [2,3,4]. Moreover, there are three general intrusion detection types: anomaly-based,
misuse (signature-based), and hybrid. Anomaly-based intrusion detection approaches
rely on behavior: they build a user profile and use it as a baseline that defines
normal user activities. Observed activities are compared with this baseline, and a deviation
generates an alarm to alert the network administrator. Anomaly detection can detect unknown
attacks, but it has a high false positive rate [3,4]. The misuse intrusion detection method
stores attack patterns or signatures and compares them to network traffic. If a pattern
matches, it generates an intrusion alarm. This method can detect only known attacks [3,4].
The hybrid method combines anomaly-based and signature-based intrusion detection
and can detect both known and unknown attacks.
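The three detection types can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the cited works: the signature strings, the single "logins per hour" feature, the mean and standard deviation baseline, and the three-sigma threshold are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical signature database: substrings of known attacks (illustrative only).
SIGNATURES = {"sql_injection": "' OR 1=1", "path_traversal": "../.."}

def misuse_detect(payload):
    """Misuse detection: return the names of known signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

def build_baseline(normal_values):
    """Profile normal behaviour as (mean, standard deviation) of one feature."""
    return mean(normal_values), stdev(normal_values)

def anomaly_detect(value, baseline, threshold=3.0):
    """Anomaly detection: flag values far from the baseline mean (assumed 3-sigma rule)."""
    mu, sigma = baseline
    return abs(value - mu) > threshold * sigma

def hybrid_detect(payload, value, baseline):
    """Hybrid detection: raise an alarm if either detector fires."""
    return bool(misuse_detect(payload)) or anomaly_detect(value, baseline)

# Logins per hour observed during normal operation (made-up numbers).
baseline = build_baseline([4, 5, 6, 5, 4, 6, 5])

print(misuse_detect("GET /?id=' OR 1=1"))              # known attack pattern
print(anomaly_detect(50, baseline))                    # far from the baseline
print(hybrid_detect("GET /index.html", 5, baseline))   # neither detector fires
```

The sketch also shows the trade-off described above: the signature check cannot flag a payload whose pattern is not stored, while the baseline check flags any unusual value, including a harmless burst of logins.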
One of the major challenges in detecting attacks or malicious activities is analyzing large
amounts of data. Intrusion detection systems face big data challenges [2]. Therefore, feature
selection plays a key role in reducing big data problems by selecting the most relevant
features [2]. Feature selection is a machine learning technique that selects the most relevant
subset of the original feature set, one that achieves high detection accuracy compared with
the original feature set [5]. By reducing the dimensionality of datasets through the removal
of irrelevant and redundant features, machine learning algorithms can make classification
predictions more efficiently. Selecting the best feature subsets can reduce training and
testing time, improve detection rates, reduce the false alarm rate, and create lightweight
datasets that can be used to build IDSs for real-time and online attack detection [2]. There
are three broad categories of feature selection methods: filter, wrapper, and hybrid
methods [6].
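As a rough illustration of reducing dimensionality by removing irrelevant and redundant features, the Python sketch below drops constant columns and columns that nearly duplicate a column already kept. The feature names, the values, and the 0.95 redundancy threshold are hypothetical, and this simple rule is only one of many possible choices, not the procedure of the cited works.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def prune_features(columns, redundancy=0.95):
    """Drop constant columns, then columns that nearly duplicate an earlier kept one."""
    kept = []
    for name, values in columns.items():
        if len(set(values)) == 1:
            continue  # constant column: treated here as irrelevant
        if any(abs(pearson(values, columns[k])) >= redundancy for k in kept):
            continue  # almost perfectly correlated with a kept feature: redundant
        kept.append(name)
    return kept

# Hypothetical traffic records, one list per feature (made-up values).
columns = {
    "duration":   [1, 2, 3, 4, 5, 6],
    "bytes_sent": [10, 80, 20, 90, 30, 70],
    "bytes_kb":   [0.01, 0.08, 0.02, 0.09, 0.03, 0.07],  # bytes_sent in other units
    "protocol":   [6, 6, 6, 6, 6, 6],                    # same value in every record
}

print(prune_features(columns))
```

Here four features shrink to two, so any classifier trained afterwards handles half as many inputs per record.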
The filter-based feature selection approach uses independent algorithms to select features
and applies external algorithms to evaluate the performance of the selected features [6,7].
This approach uses a statistical measure to evaluate the relationship between each input
variable and the target variable. It selects the input variables that correlate strongly with
the target variable and treats them as the most relevant features. This method is easy to
apply because it does
not use learning algorithms in the feature selection procedure [1]. The wrapper method is
“wrapped” around the learning algorithm. This selection method uses a learning algorithm to
select the important feature subsets. The wrapper uses a search algorithm to
explore different feature subsets, and the worth of each subset is
evaluated by a learner [6]. The wrapper method is computationally expensive but
more precise than other feature selection approaches [1,2]. Hybrid methods
combine the benefits of filter-based and wrapper-based methods to obtain the best
feature subset by using a learning algorithm.
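The difference between the three families can be sketched in a few lines of Python. The choices below are assumptions made for illustration only: Pearson correlation as the filter statistic, greedy forward search as the wrapper's search algorithm, a 1-nearest-neighbour learner scored by leave-one-out accuracy, and a hybrid that runs the wrapper on a filter shortlist. The data are made up.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def filter_select(X, y, k):
    """Filter: rank features by |correlation with the target|; no learner is involved."""
    scores = {j: abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour learner on the given features."""
    hits = 0
    for i, row in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: sum((row[f] - X[j][f]) ** 2 for f in features))
        hits += y[nearest] == y[i]
    return hits / len(X)

def wrapper_select(X, y, k):
    """Wrapper: greedy forward search; every candidate subset is scored by the learner."""
    selected = []
    while len(selected) < k:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        best = max(candidates, key=lambda f: loo_accuracy(X, y, selected + [f]))
        selected.append(best)
    return selected

def hybrid_select(X, y, shortlist_size, k):
    """Hybrid (one possible arrangement): the filter shortlists, the wrapper refines."""
    shortlist = filter_select(X, y, shortlist_size)
    reduced = [[row[f] for f in shortlist] for row in X]
    return [shortlist[i] for i in wrapper_select(reduced, y, k)]

# Made-up records: features 0 and 2 track the label, feature 1 is noise.
X = [[0.1, 5, 1.0], [0.2, 1, 1.2], [0.3, 4, 0.9],
     [0.9, 2, 3.1], [1.0, 5, 2.9], [1.1, 1, 3.3]]
y = [0, 0, 0, 1, 1, 1]

print(filter_select(X, y, 2))
print(wrapper_select(X, y, 2))
print(hybrid_select(X, y, 2, 2))
```

The cost difference is visible in the structure: the filter computes one statistic per feature, whereas the wrapper re-evaluates the learner for every candidate subset it visits.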