Effective Metaheuristic Based Classifiers for Multiclass Intrusion
Detection
Zareen FATIMA1, Arshad Ali2
1Department of Computer Science, University of Lahore, Lahore, Pakistan,
ORCID iD: https://orcid.org/0000-0002-1198-5883
2FAST School of Computing, NUCES, Lahore Block B Faisal Town, Lahore, 54770, Pakistan,
ORCID iD: https://orcid.org/0000-0003-0562-7403
Abstract: Network security has become the biggest concern in the area of cyber security
because of the exponential growth in computer networks and applications. Intrusion
detection plays an important role in the security of information systems and network
devices. The purpose of an intrusion detection system (IDS) is to detect malicious activities
and then generate an alarm against them. The large amount of data is one of the key
problems in detecting attacks. Most intrusion detection systems use all features of a dataset
to evaluate their models, which results in a low detection rate, high computational time and
heavy use of computing resources. For fast attack detection, an IDS needs lightweight data,
and feature selection plays a key role in selecting the best features to achieve maximum
accuracy. This research conducts experiments on two updated attack datasets, UNSW-NB15
and CICDDoS2019, and proposes a wrapper-based Genetic Algorithm (GA) feature selection
method combined with ensemble classifiers. The GA selects the best feature subsets and
achieves high accuracy, a high detection rate (DR) and a low false alarm rate (FAR) compared
to existing approaches. This research focuses on multi-class classification and implements
two ensemble methods, stacking and bagging, to detect different types of attacks. The results
show that GA significantly improves accuracy with the stacking ensemble classifier.
Keywords: Intrusion detection, genetic algorithm, feature selection, UNSW-NB15,
CICDDoS2019
1. Introduction
Due to the rapid growth in computer networks and applications, network security has become a
major challenge in the cyber security field. Intrusion detection plays a key role in detecting
different attacks in computer networks [1]. The goal of an Intrusion Detection System (IDS) is to
monitor network traffic to identify any malicious activity that can breach the confidentiality and
integrity of information. Accordingly, the IDS alerts the network administrator or system about
such activities or attacks [2]. An attack is a malicious activity or unauthorized access that
affects the network system and can gain access to confidential data. Typically, there are five
major categories of attacks: denial of service (DoS), brute force, probe, user to root (U2R) and
remote to local (R2L) attacks. An IDS is capable of responding to different attacks and
reporting them accordingly. IDSs are typically divided into two categories, i.e., network-based
intrusion detection systems (NIDS) and host-based intrusion detection systems (HIDS).
A NIDS monitors inbound and outbound network traffic: it captures the whole network traffic
and evaluates the individual packets to detect abnormal activities [3]. A HIDS monitors a
computer or host system to detect malicious activities. It detects intrusions by monitoring
system calls, application activities, process scheduling, login attempts and the system
configuration on a specific machine. If any malicious activity or change occurs in the system,
it generates an alarm [2,3,4].
*Corresponding author: arshad.ali1@nu.edu.pk
Moreover, there are three general intrusion detection types: anomaly-based, misuse
(signature-based) and hybrid. Anomaly-based intrusion detection approaches rely on
behavior profiling: they determine a user profile and use it as a baseline that defines normal
user activities. When observed activity deviates from this baseline, the IDS generates an
alarm to alert the network administrator. Anomaly detection can detect unknown attacks but
suffers from a high false positive rate [3,4]. The misuse intrusion detection method stores
attack patterns or signatures and compares them to network traffic. If a pattern matches, it
generates an intrusion alarm. This method can detect only known attacks [3,4]. The hybrid
method combines anomaly-based and signature-based intrusion detection and can detect
both known and unknown attacks.
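To make the contrast between anomaly-based and misuse detection concrete, here is a minimal Python sketch. It assumes a single numeric metric (logins per minute) profiled as a mean and standard deviation for the anomaly check, and plain substring signatures for the misuse check; the three-standard-deviation threshold, the helper names and the data are hypothetical illustrations, not part of any IDS discussed in this paper.

```python
import statistics


def build_baseline(normal_values):
    """Profile normal behaviour as the mean and standard deviation of one metric."""
    return statistics.mean(normal_values), statistics.pstdev(normal_values)


def anomaly_alarm(value, baseline, threshold=3.0):
    """Alarm when a value lies more than `threshold` std devs from the baseline mean."""
    mean, std = baseline
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold


def signature_alarm(payload, signatures):
    """Alarm when the payload contains any known attack signature."""
    return any(sig in payload for sig in signatures)


# Hypothetical profile: logins per minute observed during normal operation
baseline = build_baseline([4, 5, 6, 5, 4, 6, 5])
print(anomaly_alarm(5, baseline))   # False: within the normal profile
print(anomaly_alarm(40, baseline))  # True: an unseen burst is still flagged

# Hypothetical signature set
signatures = {"' OR 1=1 --", "../../etc/passwd"}
print(signature_alarm("GET /index.html", signatures))        # False
print(signature_alarm("GET /../../etc/passwd", signatures))  # True
```

The anomaly check flags the burst without any prior knowledge of it, while the signature check only recognises patterns already stored in its set, which is the trade-off between the two approaches.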
One of the major challenges in detecting attacks or malicious activities is analyzing large
amounts of data, and intrusion detection systems face big data challenges [2]. Feature
selection therefore plays a key role in mitigating big data problems by selecting the most
relevant features [2]. Feature selection is a machine learning technique that selects the most
relevant subset of the original feature set, one that achieves high detection accuracy
compared to the original feature set [5]. By reducing the dimensionality of datasets and
removing irrelevant and redundant features, machine learning algorithms can make more
efficient classification predictions. Selecting the best feature subsets can reduce training and
testing time, improve detection rates, reduce false alarm rates, and create lightweight datasets
that can be used to build IDSs for real-time and online attack detection [2]. There are three
broad categories of feature selection methods: filter, wrapper, and hybrid methods [6].
Filter-based feature selection approaches select features with algorithms that are independent
of the classifier, and external algorithms are then applied to evaluate the performance of the
selected features [6,7]. This approach uses statistical measures to evaluate the relationship
between each input variable and the target variable. It selects the input variables that have a
strong correlation with the target variable and considers them the most relevant features.
This method can be applied easily because it does not use learning algorithms in the feature
selection procedure [1]. The wrapper method is "wrapped" around the learning algorithm: it
uses a learning algorithm to select the important feature subsets. A search algorithm explores
different feature subsets, and the worth of each subset is evaluated by a learner [6]. The
wrapper method is computationally expensive but, in comparison to other feature selection
approaches, it is more accurate [1,2]. Hybrid methods combine the benefits of filter-based and
wrapper-based methods to obtain the best feature subset by using a learning algorithm.
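The difference between the filter and wrapper approaches can be sketched in Python as follows. This is an illustration under stated assumptions: the filter scores each feature by its absolute Pearson correlation with a numeric target, and the wrapper performs greedy forward selection driven by an `evaluate(subset)` callback that stands in for training and scoring a learner. The helper names, the toy data and the toy scores are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation between one feature column and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_select(X, y, k):
    """Filter: rank features by |correlation| with the target; no learner is trained."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def wrapper_forward_select(n_features, evaluate, k):
    """Wrapper: greedily add the feature whose inclusion maximises evaluate(subset).

    `evaluate` is a hypothetical callback that would train and score a learner.
    """
    selected = []
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        best = max(remaining, key=lambda j: evaluate(selected + [j]))
        selected.append(best)
    return selected


# Toy data: feature 2 is most (negatively) correlated with the label, feature 1 is noise
X = [[1, 7, 9], [2, 3, 8], [3, 8, 2], [4, 1, 1]]
y = [0, 0, 1, 1]
print(filter_select(X, y, 2))  # [2, 0]

# Hypothetical per-feature gains standing in for a learner's validation accuracy
toy_gain = {0: 0.6, 1: 0.1, 2: 0.8}
print(wrapper_forward_select(3, lambda s: sum(toy_gain[j] for j in s), 2))  # [2, 0]
```

In a real wrapper, `evaluate` retrains the learner for every candidate subset, which is why wrappers cost far more than filters.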
In this research, we propose a wrapper-based FS technique based on the Genetic Algorithm
(GA) that generates optimal feature sets by using the Naïve Bayes (NB) ML algorithm as its
fitness function. To evaluate the performance of the proposed method, we use two intrusion
detection datasets, UNSW-NB15 and CICDDoS2019. These datasets are recent and contain
the most up-to-date attacks. The UNSW-NB15 dataset is widely used to evaluate feature
selection methods, but CICDDoS2019 is not commonly used.
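A GA wrapper with an NB fitness function of the kind described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the fitness is taken to be the hold-out accuracy of a Gaussian Naïve Bayes model (the NB variant and the evaluation split are not specified here), and the GA settings (population 20, 30 generations, tournament size 3, one-point crossover, 10% bit-flip mutation, elitism of two) as well as the helper names `gaussian_nb_accuracy` and `ga_select` are hypothetical.

```python
import math
import random


def gaussian_nb_accuracy(X_train, y_train, X_test, y_test, subset):
    """Fitness: hold-out accuracy of a Gaussian Naive Bayes model using only `subset`."""
    if not subset:
        return 0.0
    model = {}
    for c in set(y_train):
        rows = [[x[j] for j in subset] for x, t in zip(X_train, y_train) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # variance floor keeps constant features from causing division by zero
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(X_train)), means, variances)

    def log_posterior(x, c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
            for j, m, v in zip(subset, means, variances))

    correct = sum(max(model, key=lambda c: log_posterior(x, c)) == t
                  for x, t in zip(X_test, y_test))
    return correct / len(y_test)


def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """GA over bit masks (1 = keep feature) with elitism, tournaments,
    one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # avoid retraining NB for masks already evaluated
            cache[key] = fitness([j for j, bit in enumerate(mask) if bit])
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=score)  # tournament selection
            p2 = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - b if rng.random() < p_mut else b for b in child]  # mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return [j for j, bit in enumerate(best) if bit]


# Toy data: feature 0 separates the classes, features 1 and 2 are pure noise
data_rng = random.Random(1)
y = [c for c in (0, 1) for _ in range(30)]
X = [[10 * c + data_rng.gauss(0, 1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for c in y]
train = [i for i in range(len(y)) if i % 3]
test = [i for i in range(len(y)) if i % 3 == 0]


def fitness(subset):
    return gaussian_nb_accuracy([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test], [y[i] for i in test], subset)


print(ga_select(3, fitness))
```

Caching fitness by mask matters in practice, because every evaluation retrains the classifier and the same masks recur across generations.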
The contributions and purpose of this research are as follows:
  • We use recent attack datasets that contain the most up-to-date attacks.
  • First, we select a GA-based feature selection method. To compute the fitness
    function, we use the Naïve Bayes (NB) learning algorithm in the GA process.
  • Second, to evaluate each selected feature set, we apply two ensemble-based
    classifiers, stacking and bagging.
  • Finally, we compare our proposed method with existing methods. The results show a
    significant improvement in performance.
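Both ensemble strategies named in the contributions can be sketched in a few lines of Python. The base learners (nearest centroid and 1-nearest-neighbour), the table-lookup meta-learner and the hold-out ("blending") form of stacking are simplifying assumptions chosen to keep the example dependency-free; they are not the classifiers used in this work, and common stacking implementations instead train the meta-learner on cross-validated base predictions. The cluster labels and coordinates are hypothetical.

```python
import random
from collections import Counter


def fit_centroid(X, y):
    """Nearest-centroid base learner: one mean vector per class."""
    cents = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def fit_1nn(X, y):
    """1-nearest-neighbour base learner."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(x, d[0])))[1]


def bagging(X, y, fit, n_estimators=11, seed=0):
    """Train `fit` on bootstrap samples and combine the models by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # draw with replacement
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


def stacking(X, y, fits, holdout=0.3, seed=0):
    """Hold-out ("blending") stacking: base predictions on a held-out split train a
    meta-learner, here a lookup table from prediction tuples to the majority label."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - holdout))
    base_idx, meta_idx = idx[:cut], idx[cut:]
    bases = [f([X[i] for i in base_idx], [y[i] for i in base_idx]) for f in fits]
    table = {}
    for i in meta_idx:
        key = tuple(b(X[i]) for b in bases)
        table.setdefault(key, Counter())[y[i]] += 1

    def predict(x):
        key = tuple(b(x) for b in bases)
        if key in table:
            return table[key].most_common(1)[0][0]
        return bases[0](x)  # unseen prediction pattern: fall back to the first base learner

    return predict


# Toy multi-class traffic: three well-separated clusters with hypothetical labels
rng = random.Random(2)
X, y = [], []
for label, (cx, cy) in {"normal": (0, 0), "dos": (10, 0), "probe": (0, 10)}.items():
    for _ in range(20):
        X.append([cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)])
        y.append(label)

bag = bagging(X, y, fit_centroid)
stack = stacking(X, y, [fit_centroid, fit_1nn])
print(bag([9.8, 0.1]), stack([0.3, 9.7]))
```

Bagging averages many copies of one learner to reduce variance, whereas stacking learns how to combine the outputs of different learners.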
The remainder of the paper is organized as follows. Section 2 describes the related work.
Section 3 presents the proposed IDS methodology. Section 4 presents the experiments and
discusses the results. Section 5 concludes the paper.
2. Related Work
This section gives an overview of relevant research that used machine learning techniques
in the IDS domain, as well as an overview of several IDS frameworks and solutions.
In [8], the authors applied a feature selection method to the UNSW-NB15 dataset. Features
were selected using the information gain obtained through the XGBoost classifier, and
features with a higher information gain score were considered more important. The authors
selected the 23 most important features and applied the XGBoost classifier to predict
different attack types, achieving 75.88% accuracy on the testing dataset.
In [9], statistical and heuristic-based feature selection methods (forward selection and
backward elimination) were applied to the DARPA dataset. Resilient back-propagation neural
networks were used for classification, which increased accuracy and required less training
and testing time than a traditional neural network. The heuristic and statistical methods
selected features for each class type. The accuracy difference between them is very small
and may not be statistically significant, because the sizes of the five classes differ greatly.
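Backward elimination, one of the search strategies used in [9], can be sketched as follows. This is a minimal sketch assuming an `evaluate(subset)` callback that returns a score to maximise; in [9] that role would be played by the classifier's performance, whereas here it is a hypothetical toy scorer, and the feature names are illustrative only.

```python
def backward_elimination(features, evaluate, min_features=1, tol=0.0):
    """Start from all features and repeatedly drop the one whose removal hurts the
    score least; stop when every removal would lower the score by more than `tol`."""
    selected = list(features)
    current = evaluate(selected)
    while len(selected) > min_features:
        candidates = [(evaluate([f for f in selected if f != g]), g) for g in selected]
        best_score, drop = max(candidates, key=lambda pair: pair[0])
        if best_score < current - tol:
            break
        selected.remove(drop)
        current = best_score
    return selected


# Hypothetical scorer: two informative features plus a small penalty per feature
useful = {"duration": 0.3, "src_bytes": 0.5}
score = lambda subset: sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)
print(backward_elimination(["duration", "src_bytes", "noise1", "noise2"], score))
# ['duration', 'src_bytes']
```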
In [7], the Fisher Score algorithm was used to select the best features. The authors used the
CICIDS2017 dataset and classified records as benign or DDoS using SVM, KNN and Decision
Tree (DT) algorithms. The Fisher Score algorithm reduced the number of features from 80 to
30. Ranked by importance, the "Fwd Packet Length Mean" feature was considered the most
important feature for detecting intrusions. With a 60% reduction in dataset size, the success
rate of KNN increased, DT accuracy did not change and SVM accuracy decreased.
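A common definition of the Fisher score for feature j is F(j) = Σ_k n_k (μ_kj − μ_j)² / Σ_k n_k σ_kj², where n_k, μ_kj and σ_kj² are the size, mean and variance of class k on feature j, and μ_j is the overall mean. The Python sketch below implements that definition with hypothetical toy values; it is not claimed to be the exact variant used in [7].

```python
def fisher_score(column, labels):
    """Fisher score of one feature: between-class scatter over within-class scatter."""
    mu = sum(column) / len(column)
    between = within = 0.0
    for c in set(labels):
        vals = [x for x, t in zip(column, labels) if t == c]
        mu_c = sum(vals) / len(vals)
        var_c = sum((x - mu_c) ** 2 for x in vals) / len(vals)
        between += len(vals) * (mu_c - mu) ** 2
        within += len(vals) * var_c
    return between / within if within > 0 else float("inf")


def fisher_select(X, labels, k):
    """Keep the indices of the k features with the highest Fisher score."""
    scores = [fisher_score([row[j] for row in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


labels = ["benign", "benign", "ddos", "ddos"]
print(fisher_score([1, 2, 9, 10], labels))  # 64.0: classes far apart, tight within
print(fisher_score([1, 9, 2, 10], labels))  # 0.015625: classes overlap
```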
In [10], the authors presented a two-stage classifier approach. In the first stage, incoming
traffic is divided into TCP, UDP or other protocols, and each record is then classified as
normal or attack. During preprocessing in the first stage, features were selected using
information gain. In the second stage, multi-class classification is used to identify the attack
type. On the full dataset with all classes, the Reduced Error Pruning Tree (REPTree) was used
for classification. The experiments were performed on the UNSW-NB15 and NSL-KDD datasets.
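Information gain, used for feature selection in [10] and [12], is usually defined as IG(Y; X) = H(Y) − Σ_v P(X = v) H(Y | X = v). The Python sketch below implements this entropy-based definition for discrete features; continuous features would first need discretization, and the records are hypothetical. Note that the XGBoost "gain" mentioned for [8] is a different, tree-split-based quantity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v), for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical records: protocol fully determines the label, flag carries no information
protocol = ["tcp", "tcp", "udp", "udp"]
flag = ["a", "b", "a", "b"]
labels = ["attack", "attack", "normal", "normal"]
print(information_gain(protocol, labels))  # 1.0
print(information_gain(flag, labels))      # 0.0
```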
In [11], the researchers presented a filter-based approach that uses the XGBoost algorithm
for feature reduction. The experiments were performed on the UNSW-NB15 dataset. XGBoost
calculates the F-measure score of each feature in the given dataset, and high-scoring features
are selected as important features. The number of features was reduced from 42 to 19. The
following ML algorithms were applied to the selected feature subspace: ANN, LR, KNN, SVM
and DT.
The authors of [12] created a new dataset, CICDDoS2019, which contains 11 DDoS attack
types; the training dataset has 11 classes. They also applied the information gain method to
select the important features, and applied different ML classifiers (ID3, RF, NB and LR) to
measure accuracy. They evaluated performance using three metrics: precision, recall and
F1-score.
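For a multi-class problem such as CICDDoS2019, precision, recall and F1-score are computed per class (one class against the rest) and are often summarised by their unweighted mean (macro average). A minimal Python sketch with hypothetical labels:

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for a multi-class problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p gets a false positive
            fn[t] += 1  # true class t gets a false negative
    metrics = {}
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    per_class = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in per_class.values()) / len(per_class)


# Hypothetical labels loosely modelled on DDoS attack classes
y_true = ["syn", "syn", "udp", "udp", "benign"]
y_pred = ["syn", "udp", "udp", "udp", "benign"]
print(per_class_metrics(y_true, y_pred)["syn"])  # precision 1.0, recall 0.5
print(round(macro_f1(y_true, y_pred), 4))        # 0.8222
```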
In [13], the authors applied ensemble feature selection methods to the CICIDS2017 dataset
and reduced the number of features from 69 to 10. Gini importance, permutation importance
and drop-column importance were used as feature selection methods, and a random forest
classifier was used for evaluation. Permutation importance was considered the best method,
offering a balance between the accuracy of drop-column importance and the low
computational cost of Gini importance. Comparing the original 69 features with the 10
selected features showed only a minor F1-score difference of 0.2 and a false positive rate of
approximately 0.
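Permutation importance, favoured in [13], measures how much a fitted model's accuracy drops when one feature column is shuffled; unlike drop-column importance it needs no retraining. The Python sketch below assumes rows are lists and uses a hypothetical toy model in place of the random forest used in [13].

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled.

    X is a list of lists; `predict` maps one row to a label.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted model that only looks at feature 0
model = lambda row: int(row[0] > 0.5)
X = [[0, 5], [1, 3], [0, 7], [1, 1], [0, 2], [1, 9]]
y = [0, 1, 0, 1, 0, 1]
print(permutation_importance(model, X, y))  # feature 1 scores 0.0: the model ignores it
```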
The authors of [14] proposed a multi-objective feature selection method based on NSGA-II and
logistic regression. Two schemes were used: binomial logistic regression for binary-class
datasets and multinomial logistic regression for multi-class datasets. C4.5, RF and NB
classifiers were used to evaluate the best feature subsets. The CICIDS2017, NSL-KDD and
UNSW-NB15 datasets were used in the experiments. For NSL-KDD, 9-19 features were selected
in the binary-class setting and 19 in the multi-class setting. For CICIDS2017, 7-25 features
were selected in the binary-class setting and 33 in the multi-class setting. For UNSW-NB15,
8-17 features were selected in the binary-class setting and 11 in the multi-class setting. The
method achieves better accuracy in the binary-class setting than in the multi-class setting.
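The core of NSGA-II, as used in [14], is non-dominated sorting. A minimal sketch of its first step is below, assuming two objectives to minimise (error rate and number of selected features; the exact objectives of [14] are an assumption here). Crowding distance, the genetic operators and the logistic-regression evaluation are omitted, and the candidate subsets and their scores are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates (NSGA-II's first front)."""
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj) for o, other in candidates.items() if o != name)]


# Hypothetical feature subsets scored as (error rate, number of features)
candidates = {"A": (0.10, 12), "B": (0.12, 8), "C": (0.15, 8), "D": (0.10, 15), "E": (0.30, 3)}
print(pareto_front(candidates))  # ['A', 'B', 'E']
```

Subsets on this front trade accuracy against size, which is why multi-objective methods report a range of subset sizes rather than a single one.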
In [15], the authors presented an IDS framework for feature selection that uses an
evolutionary genetic algorithm approach with SVM (GA-SVM). The paper proposed a new
fitness function based on three evaluation parameters: FPR, TPR and the number of selected
features. SVM is also used to classify the different types of attacks. KDD CUP 99 and