Effective Metaheuristic Based Classifiers for Multiclass Intrusion

Detection

Zareen FATIMA1, Arshad Ali2

1Department of Computer Science, University of Lahore, Lahore, Pakistan,

ORCID iD: https://orcid.org/0000-0002-1198-5883

2FAST School of Computing, NUCES, Lahore Block B Faisal Town, Lahore, 54770, Pakistan,

ORCID iD: https://orcid.org/0000-0003-0562-7403

Abstract: Network security has become one of the biggest concerns in cyber security

because of the exponential growth in computer networks and applications. Intrusion

detection plays an important role in securing information systems and network

devices. The purpose of an intrusion detection system (IDS) is to detect malicious activities

and raise an alarm against them. The large volume of data is one of

the key problems in detecting attacks. Most intrusion detection systems use all

features of a dataset to evaluate their models, which results in a low detection rate, high

computational time, and heavy use of computing resources. For fast attack detection, an IDS

needs lightweight data. A feature selection method plays a key role in selecting the best features

to achieve maximum accuracy. This research conducts experiments on

two recent attack datasets, UNSW-NB15 and CICDDoS2019. This work proposes a

wrapper-based Genetic Algorithm (GA) feature selection method with ensemble

classifiers. The GA selects the best feature subsets and achieves high accuracy and detection rate

(DR) and a low false alarm rate (FAR) compared to existing approaches. This research

focuses on multi-class classification and implements two ensemble methods, stacking and

bagging, to detect different types of attacks. The results show that the GA improves accuracy

significantly with the stacking ensemble classifier.

Keywords: Intrusion detection, genetic algorithm, feature selection, UNSW-NB15,

CICDDoS2019

1. Introduction

Due to the rapid growth in computer networks and applications, network security has become a

major challenge in the cyber security field. Intrusion detection plays a key role in detecting different

attacks in computer networks [1]. The goal of an Intrusion Detection System (IDS) is to

monitor network traffic to identify any malicious activity that can breach the confidentiality and

integrity of information. Accordingly, the IDS alerts the network administrator or system about

such activities or attacks [2]. An attack is a malicious activity or unauthorized access that

affects the network system and can gain access to confidential data. Typically, there are five

major categories of attacks: denial of service (DoS), brute force, probe, user to

root, and remote to local attacks. An IDS is capable of responding to different attacks and

reporting them accordingly. Typically, IDSs are divided into two categories: network-based

intrusion detection systems (NIDS) and host-based intrusion detection systems (HIDS).

A NIDS monitors inbound and outbound network traffic. It captures the whole network traffic

and evaluates the individual packets to detect abnormal activities [3]. A HIDS monitors a computer

or host system to detect malicious activities. It detects intrusions by monitoring the system

calls, application activities, process scheduling, login attempts, and system configuration on a

specific machine. If any malicious activity or change occurs in the system, it generates an

alarm [2,3,4]. Moreover, there are three general intrusion detection types: anomaly-based,

misuse (signature-based), and hybrid. Anomaly-based intrusion detection approaches

rely on behavior: they build a user profile and use it as a baseline that defines

normal user activities. Observed activities are compared with this baseline, and a deviation generates an

alarm to alert the network administrator. Anomaly detection can detect unknown attacks, but

it has a high false positive rate [3,4]. The misuse intrusion detection method stores

attack patterns or signatures and compares them to network traffic. If a pattern matches, it

generates an intrusion alarm. This method can detect only known attacks [3,4]. The hybrid

method is a combination of both anomaly and signature-based intrusion detection methods

and can detect both known and unknown attacks.

*Corresponding author: arshad.ali1@nu.edu.pk
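
The three detection types can be contrasted in a few lines of Python. This is only an illustrative sketch: the signature set, the choice of a single numeric behavior measure (for example, requests per minute), the three-standard-deviation threshold, and all function names are assumptions made here, not details taken from the cited works.

```python
from statistics import mean, stdev

# Hypothetical signature database: byte patterns of known attacks.
SIGNATURES = {b"' OR 1=1", b"/etc/passwd"}

def misuse_detect(payload: bytes) -> bool:
    """Signature-based: alarm only if a stored pattern occurs in the payload."""
    return any(sig in payload for sig in SIGNATURES)

def build_baseline(normal_values):
    """Anomaly-based: summarize normal behavior as (mean, standard deviation)."""
    return mean(normal_values), stdev(normal_values)

def anomaly_detect(value, baseline, threshold=3.0):
    """Alarm when the observation deviates strongly from the baseline."""
    mu, sigma = baseline
    return abs(value - mu) > threshold * sigma

def hybrid_detect(payload, value, baseline):
    """Hybrid: alarm if either the signature or the anomaly check fires."""
    return misuse_detect(payload) or anomaly_detect(value, baseline)
```

With a baseline built from `[10, 12, 11, 9, 13]`, an observation of 40 triggers the anomaly check even though its payload matches no signature, which is the sense in which anomaly detection can catch unknown attacks.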

One of the major challenges in detecting attacks or malicious activities is analyzing large

amounts of data. Intrusion detection systems face big data challenges [2]. Therefore, feature

selection plays a key role in reducing big data problems by selecting the most relevant

features [2]. Feature selection is a machine learning technique that selects the most relevant

subset of the original feature set, one that achieves higher detection accuracy than the

original feature set [5]. By reducing the dimensionality of datasets and removing the irrelevant and

redundant features, machine learning algorithms can make more efficient classification

predictions. Selecting the best feature subsets can reduce training and testing time,

improve detection rates, reduce the false alarm rate, and create lightweight datasets that can be

used to build IDSs for real-time and online attack detection [2]. There are three

broad families of feature selection methods: filter, wrapper, and hybrid

methods [6].

The filter-based feature selection approach uses independent algorithms to select features and

applies external algorithms to evaluate the performance of the selected features [6,7]. This

approach uses a statistical measure to evaluate the relationship between each input variable and

the target variable. It selects the input variables that have a strong correlation with the target variable

and considers them the most relevant features. This method can be applied easily because it does

not use learning algorithms in the feature selection procedure [1]. The wrapper method is

“wrapped” around the learning algorithm. This selection method uses a learning algorithm to

select the important feature subsets. The wrapper uses a search algorithm to

explore different feature subsets, and the worth of each subset is

evaluated by a learner [6]. The wrapper method is computationally expensive but, in

comparison to other feature selection approaches, it is more precise [1,2]. Hybrid methods

combine the benefits of filter-based and wrapper-based methods to obtain the best

feature subset by using a learning algorithm.
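
As a rough illustration of the filter idea, the sketch below ranks features by the absolute Pearson correlation between each feature and a numeric target and keeps the top `k`. The choice of Pearson correlation as the statistical measure, the function names, and the toy column names are assumptions made for this example; a wrapper method would instead score whole feature subsets with a learner.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def filter_select(columns, target, k):
    """columns maps a feature name to its values; return the k names
    with the strongest absolute correlation to the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (hypothetical): 0 = normal, 1 = attack.
columns = {
    "duration": [1, 2, 3, 4, 5, 6],
    "noise": [5, 1, 4, 2, 6, 3],
    "bytes": [10, 10, 10, 90, 95, 99],
}
target = [0, 0, 0, 1, 1, 1]
selected = filter_select(columns, target, k=2)
```

No learning algorithm is involved at any point, which is why filter methods are cheap compared with wrappers.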

In this research, we propose a wrapper-based feature selection (FS) technique based on the Genetic Algorithm

(GA) that generates optimal feature sets by using the Naïve Bayes (NB) machine learning (ML) algorithm as its

fitness function. To evaluate the performance of the proposed method, we use two intrusion

detection datasets, UNSW-NB15 and CICDDoS2019. These datasets are recent and contain the

most up-to-date attacks. The UNSW-NB15 dataset is widely used in the evaluation of feature selection

methods, but CICDDoS2019 is not commonly used.
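
The general shape of a GA wrapper with a Naive Bayes fitness function can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the Gaussian form of Naive Bayes, validation accuracy as the fitness value, tournament selection, one-point crossover, bit-flip mutation, elitism, seeding the population with the full feature set, and every parameter value are assumptions chosen for brevity, and the function names are hypothetical.

```python
import math
import random

def nb_accuracy(X_train, y_train, X_val, y_val, mask):
    """Fitness: validation accuracy of Gaussian Naive Bayes on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    model = {}  # label -> (log prior, [(mean, variance) per selected feature])
    for label in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        params = []
        for j in cols:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6  # smoothing
            params.append((mu, var))
        model[label] = (math.log(len(rows) / len(y_train)), params)

    def log_posterior(label, x):
        prior, params = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            for j, (mu, var) in zip(cols, params)
        )

    hits = sum(
        max(model, key=lambda lab: log_posterior(lab, x)) == y
        for x, y in zip(X_val, y_val)
    )
    return hits / len(y_val)

def ga_select(X_train, y_train, X_val, y_val, pop_size=12, generations=15,
              mutation_rate=0.1, seed=0):
    """Return (best_mask, best_fitness); mask[j] == 1 keeps feature j."""
    rng = random.Random(seed)
    n = len(X_train[0])  # assumes at least two features

    def fitness(mask):
        return nb_accuracy(X_train, y_train, X_val, y_val, mask)

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # Random bit masks, plus the full feature set as one starting individual.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)
```

Because the full feature set is one of the starting individuals and the best mask is always carried over, the returned subset never scores below the full set on the validation data used as fitness. Each fitness call retrains the classifier, which is the computational cost that wrapper methods are known for.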

The contributions of this research are as follows:

• We use recent attack datasets that contain the most up-to-date attacks.

• First, we apply a GA-based feature selection method. To compute the fitness

function, we use the Naïve Bayes (NB) learning algorithm in the GA process.

• Second, to evaluate each selected feature set, we apply two ensemble-based

classifiers, stacking and bagging.

• Finally, we compare our proposed method with existing methods. The results show a

significant improvement in performance.
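
For readers unfamiliar with bagging, the sketch below shows its core mechanism: each base model is trained on a bootstrap sample (drawn with replacement), and the ensemble predicts by majority vote. The nearest-centroid base learner, the number of models, and the function names are assumptions made for this illustration; the base classifiers actually used in this work are not specified in this section. Stacking, which trains a meta-learner on the predictions of the base models, is not shown.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(model, row):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, model[label]))
    return min(sorted(model), key=dist)

def bagging_fit(X, y, n_models=11, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy multi-class data (hypothetical): three well-separated traffic classes.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [21, 0], [20, 1]]
y = ["normal"] * 3 + ["dos"] * 3 + ["probe"] * 3
models = bagging_fit(X, y)
```

Calling `bagging_predict(models, [10.2, 10.4])` then returns the label chosen by most of the eleven base models.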

The remainder of the paper is organized as follows. Section 2 describes the related work.

Section 3 presents the proposed IDS methodology. Section 4 presents the experiments

and discussion of the results. Section 5 provides the conclusion of this paper.

2. Related Work

This section gives an overview of relevant research works that used machine learning

techniques in the domain of IDS. This section also provides an overview of several IDS

frameworks and solutions.

In [8], the authors applied a feature selection method to the UNSW-NB15 dataset. In

this study, features were selected by using the information gain obtained through the XGBoost

classifier. Features with a higher information gain score were considered more important.

They selected the 23 most important features and applied the XGBoost classifier to

predict different attack types. On the testing dataset, they achieved 75.88% accuracy.
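
To make the notion of an information gain score concrete, the following sketch computes the textbook entropy-based information gain of a discrete feature with respect to the class labels. It is an assumption-laden illustration: the gain reported by XGBoost is derived from the loss reduction of tree splits and is computed differently, and the function names and toy values here are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy that remains after
    splitting the samples by the value of the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

labels = ["normal", "normal", "dos", "dos"]
gain_proto = information_gain(["tcp", "tcp", "udp", "udp"], labels)  # separates the classes
gain_flag = information_gain(["a", "b", "a", "b"], labels)           # tells nothing
```

Features would then be ranked by this score and the highest-scoring ones retained.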

In [9], statistical and heuristic-based feature selection methods (forward selection and

backward elimination) were used for feature selection. These methods were applied to the

DARPA dataset. Resilient backpropagation neural networks were used for classification; they

increased the accuracy and required less training and testing time than a traditional neural network.

The heuristic and statistical methods selected the features for each class type. The accuracy

difference is very small and may not be statistically significant because the sizes of the five

classes differ greatly.

In [7], the Fisher Score algorithm was used to select the best features. The authors used the CICIDS2017

dataset and classified the traffic as benign or DDoS by using SVM, KNN, and Decision

Tree (DT) algorithms. The Fisher Score algorithm reduced the features from 80 to 30. According to

the order of importance, the “Fwd Packet Length Mean” feature was considered the most

important feature for detecting intrusions. With a 60% reduction in the size of the dataset, the success rate

of KNN increased, DT accuracy did not change, and SVM accuracy decreased.
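
A common definition of the Fisher score of a single feature is the ratio of between-class scatter to within-class scatter, so a feature scores highly when class means are far apart and values within each class are tightly grouped. The sketch below implements that common definition; whether [7] uses exactly this variant is an assumption, and the function name and toy values are hypothetical.

```python
def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    n = len(values)
    overall_mean = sum(values) / n
    by_class = {}
    for v, y in zip(values, labels):
        by_class.setdefault(y, []).append(v)
    between = within = 0.0
    for vals in by_class.values():
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        between += len(vals) * (mu - overall_mean) ** 2
        within += len(vals) * var
    return between / within if within > 0 else float("inf")

labels = [0, 0, 0, 1, 1, 1]                               # 0 = benign, 1 = DDoS
informative = fisher_score([1, 2, 3, 7, 8, 9], labels)    # classes well separated
uninformative = fisher_score([1, 9, 2, 8, 3, 7], labels)  # classes overlap
```

Ranking all features by this score and keeping the top ones yields a reduced feature set without training any classifier.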

In [10], the author presented a two-stage classifier approach. In the first stage, incoming traffic is divided

into TCP, UDP, or other protocols and then classified as normal or attack. During

preprocessing in the first stage, features were selected by using information gain. In the

second stage, multiclass classification is used to identify the attack type. On the full dataset

with all classes, it used a Reduced Error Pruning Tree (REPTree) for classification. The

experiments were performed on the UNSW-NB15 and NSL-KDD datasets.

In [11], researchers presented a filter-based approach that uses the XGBoost algorithm for

feature reduction. The experiments were performed on the UNSW-NB15 dataset. XGBoost

calculates the F-measure score for each feature of the given dataset, and high-scoring features

are selected as important features. The number of features was reduced from 42 to 19.

The following ML algorithms were applied to the selected feature subspace: ANN, LR,

KNN, SVM, and DT.

The authors of [12] created a new dataset, CICDDoS2019, which contains 11 DDoS attack

types. This dataset has 11 classes in the training dataset. They also applied the information gain method to

select the important features. The authors applied different ML classifiers, ID3, RF, NB, and

LR, to obtain accuracy. They evaluated the performance by using three metrics: precision,

recall, and F1 score.
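
For a multi-class problem, precision, recall, and F1 score are usually computed per class by treating that class as positive and all others as negative. The sketch below follows that standard one-vs-rest definition; the function name, the convention of returning 0.0 when a denominator is zero, and the toy labels are assumptions made for this example.

```python
def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)} using one-vs-rest counts."""
    result = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        result[c] = (precision, recall, f1)
    return result

y_true = ["dos", "dos", "dos", "normal", "normal", "probe"]
y_pred = ["dos", "dos", "normal", "normal", "normal", "dos"]
metrics = per_class_metrics(y_true, y_pred)
```

Here the single "probe" sample is misclassified, so all three of its metrics are 0.0 even though overall accuracy is 4 out of 6, which shows why per-class metrics matter on imbalanced attack data.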

In [13], the authors applied ensemble feature selection methods to the CICIDS2017 dataset and

reduced the features from 69 to 10. Gini importance, permutation importance, and drop-column

importance were used as feature selection methods. A random forest classifier was used for

evaluation. Permutation importance was considered the best compromise between the accuracy of drop-column

importance and the computational cost of Gini importance. The comparison of the original 69

features with the 10 selected features showed a minor difference in F1-score (0.2) and in false positive

rate (approximately 0).
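
Permutation importance measures how much a trained model's score drops when the values of one feature are shuffled, which breaks the link between that feature and the labels. The sketch below illustrates the idea with accuracy as the score and a fixed rule standing in for a trained model; the stand-in predictor, the number of repeats, and the function names are assumptions, not the setup of [13].

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Stand-in for a trained model (hypothetical): it only looks at feature 0.
def predict(row):
    return int(row[0] > 5)

X = [[i, (i * 7) % 10] for i in range(10)]
y = [int(i > 5) for i in range(10)]
importances = permutation_importance(predict, X, y)
```

Because the stand-in model ignores feature 1, shuffling it cannot change any prediction and its importance is exactly zero, whereas shuffling feature 0 lowers accuracy. Unlike drop-column importance, no retraining is needed, which keeps the cost moderate.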

The author of [14] proposed a multi-objective feature selection method based on NSGA-II and

logistic regression. Two schemes were used in this work: binomial logistic regression for

binary-class datasets and multinomial

logistic regression for multi-class datasets. C4.5, RF, and NB classifiers were used to evaluate the best feature

subsets. The CICIDS2017, NSL-KDD, and UNSW-NB15 datasets were used for the experiments.

For the NSL-KDD dataset, 9-19 features were selected in the binary case and 19 in the multi-class case. For the

CICIDS2017 dataset, 7-25 features were selected in the binary case and 33 in the multi-class case. For the UNSW-NB15

dataset, 8-17 features were selected in the binary case and 11 in the multi-class case. This approach gives better

accuracy in binary classification than in multi-class classification.

In [15], the authors presented an IDS framework for feature selection that uses an evolutionary

genetic algorithm approach with SVM (GA-SVM). This paper proposed a new fitness

function that uses three evaluation parameters: FPR, TPR, and the number of selected

features. SVM is also used for classifying the different types of attacks. KDD CUP 99 and
