Do Content Management Systems Impact the Security of Free
Content Websites? A Correlation Analysis
Mohammed Alaqdhi
University of Central Florida
Orlando, FL, USA
malqadhi@Knights.ucf.edu
Abdulrahman Alabduljabbar
University of Central Florida
Orlando, FL, USA
jabbar@Knights.ucf.edu
Kyle Thomas
University of Central Florida
Orlando, FL, USA
kthomas4031@Knights.ucf.edu
Saeed Salem
Qatar University
Doha, Qatar
saeed.salem@qu.edu.qa
DaeHun Nyang
Ewha Womans University
Seoul, Republic of Korea
saeed.salem@ndsu.edu
David Mohaisen
University of Central Florida
Orlando, FL, USA
mohaisen@ucf.edu
ABSTRACT
This paper investigates potential causes of the vulnerabilities of
free content websites in order to address their risks and
maliciousness. Assembling more than 1,500 websites with free and
premium content, we identify their content management systems (CMS’s)
and malicious attributes. We use frequency analysis at both the
aggregate level and per category of content (books, games, movies,
music, and software), utilizing the unpatched vulnerabilities, total
vulnerabilities, malicious count, and percentiles to uncover trends
and affinities of usage and maliciousness of CMS’s and their
contribution to those websites. Moreover, we find that, despite the
significant number of custom-coded websites, the use of CMS’s is
pervasive, with varying trends across types and categories. Finally,
we find that even a small number of unpatched vulnerabilities in
popular CMS’s could be a potential cause of significant maliciousness.
1 INTRODUCTION
Today, free content websites are an essential part of the Internet,
providing ample resources to users in the form of free books, movies,
software, and games, among others. Free content websites have always
been a focal point of debate and covered in various studies [2, 8, 13].
The main questions around the study of free content websites have been
their security and privacy: what are the direct and indirect costs
associated with using those websites? Those costs have been studied by
contrasting free content websites with premium websites (websites that
provide similar content but charge fees) across multiple analysis
dimensions, including their vulnerabilities in the code base,
infrastructure utilization, and the richness of their privacy
policies [1, 21, 23].
For instance, some of the prior work reported a higher level of
maliciousness in free content websites than in premium websites
[16, 18], as flagged by various scanners (e.g., virustotal.com), which
makes the free content websites unsafe for their users to visit [3].
Digital certificates, a key component in ensuring the confidentiality
and integrity of the communication between browsers and those
websites [7], are shown to be problematic in many ways. For example,
the certificates of those websites tend to have mismatched domain
names as a result of poor website migration, or are even expired [4].
The privacy policies of those websites are also shown to be limited,
or may not cover various essential policy elements that are expected
in general and are present in the privacy policies of premium content
websites [5].
Despite the importance and coverage of the various studies in the
literature [4, 5], they fall short in various aspects, particularly in
understanding and identifying the root cause of the lack of security
and privacy in free content websites. The contrast provided in the
literature highlights that free content websites are a source of
lurking risks and vulnerabilities that could expose users and their
data to significant security costs. These risks show up in their
eventual detection as malicious, the lack of expressiveness in their
privacy policies, and their significantly lax certificate quality, all
in contrast to premium content websites. However, there is no study
that looks into the various potential contributors to the
vulnerability, particularly in those websites’ code, to better
understand a mitigation strategy for the associated risks.
To address this gap in the existing literature on the understanding of
the security of free content websites, we revisit the security
analysis of free content websites through code-based analysis. The
critical insight we utilize for our study is that the security of any
website is best understood by understanding the codebase of its
content, and the shared code in particular. In essence, we hypothesize
that many of the vulnerabilities associated with those websites could
be caused by a repeated pattern in their codebase due to the
utilization of third-party components, libraries, or just insecure
coding practices, as is the case with many web technologies. We find
that we can understand the repeated patterns by studying the
utilization of third-party content management systems (CMS’s), which
are heavily utilized in today’s website development.
Contributions. In this paper, we contribute to the state-of-the-art by
analyzing and contrasting the security of free content websites
through the lens of CMS analysis using 1,562 websites. We annotate the
websites with their malicious attributes and systematically evaluate
the role of CMS as a contributing factor. We find that a significant
number of the websites (44%) use CMS’s, which come with
vulnerabilities and contribute to maliciousness. We find that the use
pattern of CMS’s is unique across different types of websites and
categories. The top-used CMS’s have several aspects in common, such as
unpatched vulnerabilities, which help explain the maliciousness of
websites using them.
Organization. The rest of this paper is organized as follows. In
section 2, we review the related work. In section 3, we review our
dataset and its annotation. In section 4, we provide an overview of
the methods utilized in this paper. In section 5, we provide the
results and the discussion. Finally, in section 6, we provide the
conclusion and recommendations for future work.
arXiv:2210.12083v1 [cs.CR] 21 Oct 2022
Conference’17, July 2017, Washington, DC, USA Mohammed Alaqdhi, Abdulrahman Alabduljabbar, Kyle Thomas, Saeed Salem, DaeHun Nyang, and David Mohaisen
2 RELATED WORK
In the following, we sample and review the most related pieces of
prior work to the work presented in this study.
Online Website Analysis. Researchers have held that diverse
constituents might be subject to increased risks when using free
content websites, given the evolution of online services and web
applications. These risks have been examined across various website
features, including digital certificates, content, and addressing
infrastructure [4]. In another study, component and website-level
analyses were conducted to understand vulnerabilities utilizing two
main off-the-shelf tools, VirusTotal and Sucuri [3], linking free
content websites to significant threats.
Privacy Practices Reporting. Mindful of the implicit security cost,
another work has looked into the interplay between privacy policies
and the quality of those websites. Namely, the prior work examined
user comprehension of risks linked to service use through privacy
policy understanding [5]. The researchers passed several filtered
privacy policies into a custom pipeline that annotates the policies
against various categories (e.g., first and third-party usage, data
retention) [14]. The authors found that the privacy policies of free
content websites are vague, lack essential policy elements, or are lax
in specifying the responsibilities of the service provider (website
owner) against possible compromise and exposure of user data. On the
other hand, they found that the privacy policies of the premium
content websites are more transparent and elaborate in reporting their
practices on data gathering, sharing, and retention [5].
Tracking and Website Structure. Another study has contributed to this
field by revealing the tracking mechanisms of corporate ownership
[17]. To comprehend the web tracking phenomenon and subsequently craft
material policies to regulate it, the authors argued that it is
imperative to know the actual degree and reach of corporations that
may be subject to the increased regulations. The most significant
finding in this research was that 78.07 percent of websites within
Alexa’s top million initiated third-party HTTP requests to a domain
owned by Google. Furthermore, the researchers observed that the
overall trend shown by past surveys is not only that many website
users value privacy but also that the present state of online privacy
is an area of material anxiety. Concerning measurement, the same study
highlights that the level of tracking on the web is on the rise and
shows no indication of abating.
3 DATASET AND DATA ANNOTATION
Websites. For this study, we compiled a dataset that contains 1,562
websites, with 834 free content websites and 728 premium websites,
which have been used in prior work [3–5]. In selecting those websites,
we consider their popularity while maintaining a balance per the
sub-category of a website. To determine the popularity of a website,
we used the results of the search engines Bing, DuckDuckGo, and Google
as a proxy, where highly ranked websites are considered popular. To
balance the dataset, we undertook a manual verification approach to
vet each website across the sub-category (see below). Namely, we
sorted the websites into five categories based on the content they
predominantly serve: software, music, movies, games, or books. The
following are the free and premium content website counts per
category: books (154 free, 195 premium), games (80 free, 113 premium),
movies (331 free, 152 premium), music (83 free, 86 premium), and
software (186 free, 182 premium).
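As a quick consistency check, the per-category counts above can be summed to confirm the reported totals of 834 free and 728 premium websites (1,562 overall). The following is a minimal sketch; the dictionary layout and the function names are illustrative assumptions, while the numbers are taken directly from the text.

```python
# Per-category website counts as reported in the text: (free, premium).
CATEGORY_COUNTS = {
    "books": (154, 195),
    "games": (80, 113),
    "movies": (331, 152),
    "music": (83, 86),
    "software": (186, 182),
}


def totals(counts):
    """Return (free_total, premium_total, overall_total)."""
    free = sum(f for f, _ in counts.values())
    premium = sum(p for _, p in counts.values())
    return free, premium, free + premium


def free_share(counts):
    """Fraction of free content websites within each category."""
    return {name: f / (f + p) for name, (f, p) in counts.items()}


if __name__ == "__main__":
    print(totals(CATEGORY_COUNTS))
    for name, share in free_share(CATEGORY_COUNTS).items():
        print(f"{name}: {share:.1%} free")
```

The per-category shares make the imbalance visible: movies is the only category where free content websites clearly outnumber premium ones.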
Dataset annotation. For our analysis, we augment the dataset in
various ways. We primarily focused on information reflecting users’
exposure to risk [4]. We determine whether a website is malicious or
benign using the VirusTotal API [24]. VirusTotal is a framework that
offers cyber threat detection, which helps us analyze, detect, and
correlate threats while reducing the required effort through
automation. Specifically, the API allowed us to identify malicious IP
addresses, domains, or URLs associated with the websites we use for
augmentation.
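The labeling step can be sketched as follows. This is not the authors' pipeline: it assumes the public VirusTotal v3 domain report endpoint and its `last_analysis_stats` field, and the `threshold` parameter (how many engines must flag a domain before it is called malicious) is a hypothetical choice that the text does not specify.

```python
import json
import urllib.request

# VirusTotal v3 domain report endpoint; the API key is sent in the x-apikey header.
VT_DOMAIN_URL = "https://www.virustotal.com/api/v3/domains/{domain}"


def fetch_domain_report(domain, api_key):
    """Fetch the domain report as a dict (needs network access and a valid key)."""
    request = urllib.request.Request(
        VT_DOMAIN_URL.format(domain=domain),
        headers={"x-apikey": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def malicious_count(report):
    """Number of engines that flagged the domain as malicious (0 if absent)."""
    stats = (
        report.get("data", {})
        .get("attributes", {})
        .get("last_analysis_stats", {})
    )
    return int(stats.get("malicious", 0))


def label(report, threshold=1):
    """Label a report; `threshold` is an assumed cut-off, not from the paper."""
    return "malicious" if malicious_count(report) >= threshold else "benign"
```

Keeping the parsing separate from the network call means the labeling rule can be applied to stored reports and re-run with a different threshold without querying the API again.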
CMS’s. Since this work aims to understand the role of software (CMS,
in particular) used across websites and its contribution to threat
exposure, we follow a two-step approach: (1) website crawling and (2)
manual inspection and annotation. First, we crawl each of the websites
and inspect its elements to find the source folder for the website.
From the source folder, we list the source and content for each
website to identify the CMS used to develop this website. This
approach requires us to build a database of the different available
CMS’s to allow automation of the annotation through regular expression
matching. We cross-validate our annotation utilizing existing online
tools used for CMS detection. We use CMS-detector [9] and w3techs
[25], two popular tools, to extract the CMS’s used for the list of
websites. For automation, we build a wrapper that prepares the query
with the website, retrieves the response of the CMS used from the
corresponding tool, and compares it to the manually identified set in
the previous step. Among the CMS’s identified, WordPress is the most
popular, followed by Drupal, Django, Next.js, Laravel, CodeIgniter,
and DataLife. In total, we find 77 unique CMS’s used across the
different websites, not including websites that rely on a custom-coded
CMS.
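The regular expression matching and cross-validation described above can be sketched as follows. The signature table is a small hypothetical stand-in for the authors' CMS database (which is not given in the text), the patterns are common markers chosen for illustration only, and the tool names passed to `cross_validate` are placeholders.

```python
import re

# Hypothetical signature database: illustrative patterns, not the authors' list.
CMS_SIGNATURES = {
    "WordPress": [r"/wp-content/", r"/wp-includes/"],
    "Drupal": [r"/sites/default/files/", r"Drupal\.settings"],
    "DataLife": [r"/engine/classes/", r"dle_root"],
}

# Many CMS's announce themselves in a <meta name="generator"> tag.
GENERATOR_META = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def detect_cms(html):
    """Return the name of the first matching CMS, or None if nothing matches."""
    meta = GENERATOR_META.search(html)
    if meta:
        generator = meta.group(1).lower()
        for cms in CMS_SIGNATURES:
            if cms.lower() in generator:
                return cms
    for cms, patterns in CMS_SIGNATURES.items():
        if any(re.search(p, html, re.IGNORECASE) for p in patterns):
            return cms
    return None


def cross_validate(own_label, tool_labels):
    """Return the sorted names of external tools that disagree with our label."""
    own = (own_label or "").lower()
    return sorted(
        tool
        for tool, tool_label in tool_labels.items()
        if (tool_label or "").lower() != own
    )
```

A website for which `detect_cms` returns `None` would fall into the custom-coded group, and any non-empty result from `cross_validate` marks a website for manual re-inspection.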
Vulnerabilities. Our dataset’s final augmentation and annotation are
the vulnerability count and patching patterns. For each CMS, we crawl
the results available in various portals concerning the current
version of the CMS to identify the associated vulnerability. Namely,
we crawl such information from cvedetails [11], snyk.io [22],
openbugbounty [19], and Wordfence [12]. Finally, to determine whether
a vulnerability is patched or not (thus counting the number of
unpatched vulnerabilities), we query cybersecurity-help [10].
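Once the portal results are collected, the total and unpatched vulnerability counts per CMS reduce to a simple tally. The sketch below assumes a hypothetical record type with a CMS name, a vulnerability identifier, and a patched flag, and it deduplicates entries that more than one portal reports; both the record layout and the deduplication rule are assumptions, since the text does not describe how overlapping results are merged.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    """Hypothetical record for one vulnerability reported by a portal."""
    cms: str
    identifier: str
    patched: bool


def tally(vulnerabilities):
    """Map each CMS to (total, unpatched), counting each identifier once."""
    total = Counter()
    unpatched = Counter()
    seen = set()
    for vuln in vulnerabilities:
        key = (vuln.cms, vuln.identifier)
        if key in seen:  # same entry reported by more than one portal
            continue
        seen.add(key)
        total[vuln.cms] += 1
        if not vuln.patched:
            unpatched[vuln.cms] += 1
    return {cms: (total[cms], unpatched[cms]) for cms in total}


if __name__ == "__main__":
    # Made-up records for illustration only; not data from the paper.
    records = [
        Vulnerability("cms_a", "EXAMPLE-0001", patched=True),
        Vulnerability("cms_a", "EXAMPLE-0002", patched=False),
        Vulnerability("cms_a", "EXAMPLE-0002", patched=False),  # duplicate
        Vulnerability("cms_b", "EXAMPLE-0003", patched=True),
    ]
    print(tally(records))
```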
4 ANALYSIS METHODS
The key motivation behind our analysis is to understand the potential
contribution of CMS’s to the (in)security of free content websites,
which has been established already in the prior work, as highlighted
in section 2. To achieve this goal, we pursue two directions. The
first is a holistic analysis geared toward understanding the
distribution of various features associated with free content and
premium websites (combined). The second is a fine-grained