annotation. In section 4, we provide an overview of the methods
utilized in this paper. In section 5, we present the results and
discussion. Finally, in section 6, we provide the conclusion and
recommendations for future work.
2 RELATED WORK
In the following, we review a sample of the prior work most closely
related to this study.
Online Website Analysis.
Researchers have held that diverse
constituents might be subject to increased risks when using free
content websites, given the evolution of online services and web
applications. These risks have been examined across various website
features, including digital certificates, content, and addressing
infrastructure [4]. In another study, component- and website-level
analyses were conducted to understand vulnerabilities utilizing
two main off-the-shelf tools, VirusTotal and Sucuri [3], linking free
content websites to significant threats.
Privacy Practices Reporting.
Mindful of the implicit security
cost, another work has looked into the interplay between privacy
policies and the quality of those websites. Namely, the prior work
examined user comprehension of risks linked to service use through
privacy policy understanding [5]. The researchers passed several
filtered privacy policies into a custom pipeline that annotates the
policies against various categories (e.g., first- and third-party usage,
data retention) [14]. The authors found that the privacy policies
of free content websites are vague, lack essential policy elements,
or are lax in specifying the responsibilities of the service provider
(website owner) against possible compromise and exposure of user
data. On the other hand, they found that the privacy policies of
the premium content websites are more transparent and elaborate
about reporting their practices on data gathering, sharing, and
retention [5].
Tracking and Website Structure.
Another study has contributed
to this field by revealing the tracking mechanisms of corporate
ownership [17]. To comprehend the web tracking phenomenon
and subsequently craft material policies to regulate it, the authors
argued that it is imperative to know the actual degree and reach
of corporations that may be subject to the increased regulations.
The most significant finding in this research was that 78.07 percent
of websites within Alexa's top million instigated third-party
HTTP requests to a domain owned by Google. Furthermore, the
researchers observed that the overall trend shown by past surveys
is not only that many website users value privacy but also
that the present state of online privacy is an area of material
concern. Concerning measurement, the same study highlights that
the level of tracking on the web is on the rise and shows no signs
of abating.
3 DATASET AND DATA ANNOTATION
Websites.
For this study, we compiled a dataset that contains 1,562
websites, with 834 free content websites and 728 premium websites,
which have been used in prior work [3-5]. In selecting those
websites, we consider their popularity while maintaining a balance
per website sub-category. To determine the popularity of a
website, we used the results of the search engines Bing, DuckDuckGo,
and Google as a proxy, where highly ranked websites are considered
popular. To balance the dataset, we undertook a manual verification
approach to vet each website across the sub-categories (see below).
Namely, we sorted the websites into five categories based on the
content they predominantly serve: software, music, movies, games,
or books. The following are the free and premium content website
counts per category: books (154 free, 195 premium), games (80 free,
113 premium), movies (331 free, 152 premium), music (83 free, 86
premium), and software (186 free, 182 premium).
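The per-category counts above determine the overall dataset totals. The following minimal sketch, which simply re-encodes the reported numbers and is not part of our measurement pipeline, verifies that the per-category breakdown is consistent with the totals:

```python
# Per-category website counts as reported above (illustrative re-encoding only).
counts = {
    "books":    {"free": 154, "premium": 195},
    "games":    {"free": 80,  "premium": 113},
    "movies":   {"free": 331, "premium": 152},
    "music":    {"free": 83,  "premium": 86},
    "software": {"free": 186, "premium": 182},
}

total_free = sum(c["free"] for c in counts.values())        # 834 free content websites
total_premium = sum(c["premium"] for c in counts.values())  # 728 premium websites
assert total_free + total_premium == 1562                   # matches the dataset size
```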
Dataset annotation.
For our analysis, we augment the dataset in
various ways. We primarily focused on information reflecting
users' exposure to risk [4]. We determine whether a website
is malicious or benign using the VirusTotal API [24]. VirusTotal
is a framework that offers cyber threat detection, which helps us
analyze, detect, and correlate threats while reducing the required
effort through automation. Specifically, the API allowed us to identify
malicious IP addresses, domains, or URLs associated with the
websites we use for augmentation.
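The sketch below illustrates how such an annotation step can be automated against the VirusTotal v3 domains endpoint. It is a minimal illustration, not our production pipeline; the API key placeholder and the one-engine labeling threshold are assumptions for exposition only.

```python
import requests

VT_API_KEY = "YOUR_API_KEY"  # placeholder; a valid VirusTotal key is required


def label_domain(domain: str) -> str:
    """Query VirusTotal for a domain and label it malicious or benign.

    The >= 1 malicious-engine threshold is an illustrative assumption,
    not necessarily the labeling rule used in this study.
    """
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/domains/{domain}",
        headers={"x-apikey": VT_API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    return "malicious" if stats.get("malicious", 0) >= 1 else "benign"
```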
CMS’s.
Since this work aims to understand the role of software
(CMS, in particular) used across websites and its contribution to
threat exposure, we follow a two-step approach: (1) website crawl-
ing and (2) manual inspection and annotation. First, we crawl each
of the websites and inspect its elements to find the source folder for
the website. From the source folder, we list the source and content
for each website to identify the CMS used to develop this website.
This approach requires us to build a database of the different available
CMS's to allow automation of the annotation through regular
expression matching. We cross-validate our annotation utilizing existing
online tools used for CMS detection. We use CMS-detector [9]
and w3techs [25], two popular tools, to extract the CMS's used for
the list of websites. For automation, we build a wrapper that prepares
the query with the website, retrieves the response of the CMS
used from the corresponding tool, and compares it to the manually
identified set in the previous step. Among the CMS's identified,
WordPress is the most popular, followed by Drupal, Django, Next.js,
Laravel, CodeIgniter, and DataLife. In total, we find 77 unique CMS's
used across the different websites, not including websites that rely
on a custom-coded CMS.
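The regular-expression matching step can be sketched as follows. The signature database shown here is a hypothetical miniature stand-in for the much larger set we compiled covering 77 CMS's, and the individual fingerprints are illustrative assumptions rather than our exact rules.

```python
import re
import requests

# Hypothetical miniature signature database; the study's database covers
# 77 unique CMS's and many more fingerprints per CMS.
CMS_SIGNATURES = {
    "WordPress":   re.compile(r"/wp-content/|/wp-includes/", re.I),
    "Drupal":      re.compile(r"/sites/default/files/|Drupal\.settings", re.I),
    "Django":      re.compile(r"csrfmiddlewaretoken", re.I),
    "Next.js":     re.compile(r"/_next/static/", re.I),
    "Laravel":     re.compile(r"laravel_session|/vendor/laravel/", re.I),
    "CodeIgniter": re.compile(r"ci_session", re.I),
    "DataLife":    re.compile(r"DataLife Engine", re.I),
}


def detect_cms(url: str) -> str:
    """Fetch a page and match its HTML against known CMS fingerprints."""
    html = requests.get(url, timeout=30).text
    for cms, pattern in CMS_SIGNATURES.items():
        if pattern.search(html):
            return cms
    return "custom/unknown"
```

In our pipeline, the label produced by this matching step is then compared against the labels returned by CMS-detector and w3techs for cross-validation.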
Vulnerabilities.
Our dataset's final augmentation and annotation
are the vulnerability count and patching patterns. For each CMS, we
crawl the results available in various portals concerning the current
version of the CMS to identify the associated vulnerabilities. Namely,
we crawl such information from cvedetails [11], snyk.io [22],
openbugbounty [19], and Wordfence [12]. Finally, to determine whether
a vulnerability is patched or not (thus counting the number of
unpatched vulnerabilities), we query cybersecurity-help [10].
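Once the per-portal crawls are merged, the per-CMS counts reduce to a simple aggregation. The sketch below assumes a flattened record format (the field names are hypothetical) in which each crawled vulnerability carries the CMS it affects and a patched flag derived from the cybersecurity-help lookup.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_vulnerabilities(records: Iterable[Mapping]) -> dict:
    """Aggregate crawled vulnerability records into per-CMS counts.

    Each record is assumed to look like (hypothetical format):
      {"cms": "WordPress", "cve": "CVE-2023-XXXX", "patched": False}
    where the 'patched' flag reflects the cybersecurity-help query.
    """
    total, unpatched = Counter(), Counter()
    for rec in records:
        total[rec["cms"]] += 1
        if not rec.get("patched", False):
            unpatched[rec["cms"]] += 1
    return {cms: {"total": total[cms], "unpatched": unpatched[cms]} for cms in total}
```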
4 ANALYSIS METHODS
The key motivation behind our analysis is to understand the po-
tential contribution of CMS’s to the (in)security of free content
websites, which has already been established in prior work, as
highlighted in section 2. To achieve this goal, we pursue two directions.
The first is a holistic analysis geared toward understanding
the distribution of various features associated with free content
and premium websites (combined). The second is a fine-grained