Towards the Detection of Malicious Java Packages
[PRE-PRINT]
Piergiorgio Ladisa, Henrik Plate, Matias Martinez, Olivier Barais, Serena Elisa Ponta
Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories, and the development of vetting strategies to detect such attacks is the subject of ongoing research. Despite its popularity, the Java ecosystem is less explored than others in the context of supply chain attacks.
In this paper, we present indicators of malicious behavior that can be observed statically through the analysis of Java bytecode. Then we evaluate how such indicators and their combinations perform when detecting malicious code injections. We do so by injecting three malicious payloads taken from real-world examples into the Top-10 most popular Java libraries from libraries.io.
We found that the analysis of strings in the constant pool and of sensitive APIs in the bytecode instructions aids in the task of detecting malicious Java packages by significantly reducing the amount of information to review, thus also making manual triage possible.
Citing this paper
This is a pre-print of the paper that appears in the Proceedings of the 2022 ACM Workshop on Software Supply Chain Offensive Research
and Ecosystem Defenses (SCORED ’22), November 11, 2022, Los Angeles, CA, USA.
If you wish to cite this work, please refer to it as follows:
@INPROCEEDINGS{ladisa22towardsjava,
author={Piergiorgio Ladisa and Henrik Plate and Matias Martinez and Olivier Barais and Serena Elisa Ponta},
booktitle={2022 ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED ’22)},
title={Towards the Detection of Malicious Java Packages},
year=2022,
}
arXiv:2210.03998v1 [cs.CR] 8 Oct 2022
Towards the Detection of Malicious Java Packages
Piergiorgio Ladisa
SAP Security Research
Mougins, France
University of Rennes 1/INRIA/IRISA
Rennes, France
piergiorgio.ladisa@sap.com
piergiorgio.ladisa@irisa.fr
Henrik Plate
SAP Security Research
Mougins, France
henrik.plate@sap.com
Matias Martinez
Université Polytechnique
Hauts-de-France
Valenciennes, France
matias.martinez@uphf.fr
Olivier Barais
University of Rennes 1/INRIA/IRISA
Rennes, France
olivier.barais@irisa.fr
Serena Elisa Ponta
SAP Security Research
Mougins, France
serena.ponta@sap.com
ABSTRACT
Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories, and the development of vetting strategies to detect such attacks is the subject of ongoing research. Despite its popularity, the Java ecosystem is less explored than others in the context of supply chain attacks.
In this paper we present indicators of malicious behavior that can be observed statically through the analysis of Java bytecode. Then we evaluate how such indicators and their combinations perform when detecting malicious code injections. We do so by injecting three malicious payloads taken from real-world examples into the Top-10 most popular Java libraries from libraries.io.
We found that the analysis of strings in the constant pool and of sensitive APIs in the bytecode instructions aids in the task of detecting malicious Java packages by significantly reducing the amount of information to review, thus also making manual triage possible.
CCS CONCEPTS
Security and privacy → Malware and its mitigation.
KEYWORDS
Open-Source Security, Supply Chain Attacks, Malware Detection
1 INTRODUCTION
Today’s software supply chains make extensive use of open-source
components. Despite the clear advantages, the lack of transparency
and reliance on unknown stakeholders and systems pose several
risks.
Open-Source Software (OSS) supply chain attacks are characterized by the injection of malicious code into open-source components as a means of spreading malware [24], and there exist many attack vectors [18]. Package repositories for OSS (e.g., npm, PyPI, Maven Central) are commonly used by downstream users to consume OSS packages, and several scientific works focus on vetting mechanisms at scale. Most of those tackle interpreted languages (e.g., JavaScript, Python), whereas the Java ecosystem is less explored despite its popularity [7, 8].
Based on the study of real-world attacks, our goal is to find and evaluate indicators of malicious behavior in Java packages. Since the usual way of consuming the latter is through pre-compiled JARs, we focus on Java bytecode. Indicators of malicious behavior can be used by package repositories vetting the submitted packages or by downstream users checking the downloaded dependencies.
We set out to answer the following research questions:
RQ1 – What are some of the possible indicators of malicious behavior that can be observed from the bytecode?
RQ2 – How do these indicators and their combinations perform in the detection of malicious Java packages?
To answer those questions we analyze both the constant pool (e.g., to detect obfuscated strings) and bytecode instructions (e.g., to detect sensitive APIs). We assess the performance of the identified indicators by analyzing the Top-10 Java projects from libraries.io¹, both the original, benign ones as well as infected ones, containing malicious payloads taken from three real-world attacks.
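To make the first of these two inputs concrete, the following is a minimal sketch of how the string entries of a class file's constant pool could be listed with the Java standard library alone. It is an illustration under stated assumptions and not the tooling used in this paper: the class name ConstantPoolStrings and the method utf8Entries are hypothetical, and the entry sizes are those defined for constant pool tags in the JVM class file format.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the paper): lists CONSTANT_Utf8 entries of a class file.
class ConstantPoolStrings {

    static List<String> utf8Entries(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file (bad magic number)");
        }
        data.readUnsignedShort();             // minor_version
        data.readUnsignedShort();             // major_version
        int count = data.readUnsignedShort(); // constant_pool_count

        List<String> strings = new ArrayList<>();
        // Valid constant pool indexes run from 1 to count - 1.
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1:  // Utf8: u2 length followed by modified UTF-8 bytes
                    strings.add(data.readUTF());
                    break;
                case 3: case 4:             // Integer, Float
                case 9: case 10: case 11:   // Fieldref, Methodref, InterfaceMethodref
                case 12: case 17: case 18:  // NameAndType, Dynamic, InvokeDynamic
                    skip(data, 4);
                    break;
                case 5: case 6:             // Long, Double occupy two pool slots
                    skip(data, 8);
                    i++;
                    break;
                case 7: case 8: case 16:    // Class, String, MethodType
                case 19: case 20:           // Module, Package
                    skip(data, 2);
                    break;
                case 15:                    // MethodHandle
                    skip(data, 3);
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return strings;
    }

    private static void skip(DataInputStream data, int n) throws IOException {
        data.readFully(new byte[n]); // fails with EOFException on truncated input
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConstantPoolStrings path/to/Some.class
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            utf8Entries(in).forEach(System.out::println);
        }
    }
}
```

Note that Utf8 entries hold class names, member names, and descriptors as well as string literals, so a real analysis would still need to separate those roles (for instance by following the String entries that point to them).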
The remainder of the paper is organized as follows. Section 2 presents related work. Section 3 motivates the need for improving the detection of malicious Java Archives (JARs) in the context of OSS supply chain attacks. Section 4 describes background information. Section 5 presents our static analysis of Java Virtual Machine (JVM) bytecode and answers RQ1. Section 6 answers RQ2 by evaluating the indicators of malicious behavior in the Java bytecode. Section 7 discusses the limitations of our approach, while Section 8 presents the conclusions and discusses future work.
2 RELATED WORKS
Table 1 shows work to date about the detection of malicious open-
source packages.
Seja et al. [
31
] propose a machine learning-based approach
for the automated detection of malicious npm packages trained
on a labeled dataset. We port some of the considered features in
the context of Java, in particular the concept of sensitive APIs
(e.g., process creation, dynamic code generation) and the usage of
Shannon entropy to detect obfuscation. While they apply the latter
at the le level to detect the presence of compiled or minied code,
we apply it to the strings found in the Java class le’s constant pool.
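As a brief illustration of this measure, the sketch below computes the Shannon entropy of a single string, in bits per character, from its character frequencies: H = -Σ p(c) log2 p(c). It is only a sketch under our own assumptions: the class and method names are hypothetical, and no decision threshold is implied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not from the paper): per-string Shannon entropy.
class StringEntropy {

    /** Returns the Shannon entropy of s in bits per character (0 for an empty string). */
    static double shannonEntropy(String s) {
        if (s.isEmpty()) {
            return 0.0;
        }
        // Count how often each character occurs.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Sum -p * log2(p) over the observed characters.
        double entropy = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / s.length();
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Prints the entropy of each command-line argument.
        for (String arg : args) {
            System.out.println(arg + " -> " + shannonEntropy(arg));
        }
    }
}
```

A string made of one repeated character has entropy 0, while a string of four distinct characters has entropy 2 bits per character. Encoded or encrypted strings tend toward the higher end of the range for their alphabet, which is what makes the measure a candidate indicator of obfuscation.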
¹ https://libraries.io/
Vu et al. [34] analyze the discrepancy between source code and the deployed package in PyPI as a way to detect malicious injections in the Python ecosystem. Scalco et al. [29] do the same in the context of JavaScript. Conversely, we do not consider the source code.
Duan et al. [13] propose a classifier based on both dynamic and static analysis to classify packages in npm, PyPI, and RubyGems. Among the selected features for the static analysis, they also suggest considering sensitive APIs and performing data flow analysis to highlight dangerous flows.
Ohm et al. [25] leverage sandboxes to collect forensic artifacts related to the execution of malicious JavaScript and Python packages and describe the observed dynamic behaviors. In another work [23] they propose a clustering model based on signatures produced from the Abstract Syntax Tree (AST) of malicious JavaScript samples. Our approach is instead only static and focuses on Java.
Garret et al. [16] propose an anomaly detection approach based on the observation of code features in JavaScript (e.g., opening of connections, read/write to the file system).
Fass et al. [14] extract features from the AST of JavaScript code to build a classifier capable of detecting obfuscation.
In the scope of Java malware detection, related works focus on
the detection of malicious code in applets or purely malicious JARs.
Schlumberger et al. [30] propose a static approach for applets based on machine learning. Among the selected features they consider sensitive APIs (e.g., for obfuscation and code behavior). Compared to their work, our focus is on OSS packages that may contain a small portion of malicious code, while malicious applets do not need to piggyback on existing benign functionalities. In addition, some of the APIs for applets are not relevant in the scope of Java libraries (e.g., APIs for MIDlets).
Pinheiro et al. [28] propose a dynamic approach for the automated detection of malicious JARs. They extract forensic features related to the execution of purely malicious samples in a sandboxed environment to train a classifier based on artificial neural networks. Instead, we perform a static analysis of packages where malicious code was injected.
Other relevant works come from the Android ecosystem [10, 11, 19, 20], especially the ones about the static inspection of Dalvik bytecode. In this context, Aafer et al. [9] analyze Android malware samples to extract their commonly used APIs, then build a KNN classifier. As opposed to their work, not having many malicious samples available, our search for relevant APIs is based on the manual inspection of malicious packages. Specific aspects of the Android ecosystem make the problem of detecting malicious Java libraries different. On the one hand, malware running on mobile devices has different objectives, e.g., financial gain by sending SMS or reading contacts. On the other hand, there are technical differences between Java for Android and for the JVM (e.g., permissions, intents, or APIs existing only for Android).
3 MOTIVATION
This section motivates our work by presenting OSS supply chain attacks and reports on the detection capabilities of popular Antiviruses (AVs).
Reference                  Year   Ruby   Python   JavaScript   Java
Sejfia et al. [31]         2022   -      -        ✓            -
Scalco et al. [29]         2022   -      -        ✓            -
Duan et al. [13]           2021   ✓      ✓        ✓            -
Vu et al. [34]             2021   -      ✓        -            -
Ohm et al. [25]            2020   -      ✓        ✓            -
Ohm et al. [23]            2020   -      -        ✓            -
Garret et al. [16]         2019   -      -        ✓            -
Fass et al. [14]           2018   -      -        ✓            -
Table 1: Ecosystems covered by recent scientific works on the detection of malicious open-source packages. (): here we intend the case of JVM bytecode
3.1 Open-Source Software Supply Chain Attacks
Listing 1: Malicious code snippet from HttpServlet.java contained in com.github.codingandcoding:servlet-api@3.2.0
    protected void doGet(HttpServletRequest req)
            throws ServletException, IOException {
        Runtime.getRuntime()
            .exec("bash -c {echo,YmFz**SHORTENED**JjE=}|{base64,-d}|{bash,-i}");
    }
OSS supply chain attacks target open-source components as a means of spreading malware. As Ladisa et al. [18] pointed out, there are many possible attack vectors.
As demonstrated by multiple examples of malicious OSS packages [24], most of them have a small fraction of harmful code hidden in a bigger corpus of legitimate code. This is similar to the case of piggybacking in Android malware, where a legitimate application (carrier) is repackaged by grafting malicious code (rider) onto it [11, 19, 20].
Listing 1 shows the malicious code snippet present in the package com.github.codingandcoding:servlet-api@3.2.0. Its source version consists of 149 files, 77 of which are Java files, for a total of 13458 Lines Of Code (LOC). The malicious payload is present in only one line of code of the file HttpServlet.java, which consists of a total of 749 LOC.
3.2 VirusTotal Scan
To understand the level of detection by common AVs, we submitted the 2886 samples available in the Backstabber’s Knife Collection (BKC) [24] (813 Ruby samples, 261 in Python, 1807 in JavaScript, and 4 in Java) to VirusTotal² (as of July 2022). The latter offers the possibility to inspect files or hashes with over 70 Antivirus (AV) software products.
Figure 1 depicts the percentage of responses provided by all the AVs accessible through the VirusTotal API. The results show the average calculated per ecosystem.
² https://www.virustotal.com