Towards the Detection of Malicious Java Packages PRE-PRINT Piergiorgio Ladisa Henrik Plate Matias Martinez Olivier Barais Serena Elisa Ponta


Towards the Detection of Malicious Java Packages

[PRE-PRINT]

Piergiorgio Ladisa, Henrik Plate, Matias Martinez, Olivier Barais, Serena Elisa Ponta

Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories, and the development of vetting strategies to detect such attacks is ongoing research. Despite its popularity, the Java ecosystem is the least explored one in the context of supply chain attacks.

In this paper, we present indicators of malicious behavior that can be observed statically through the analysis of Java bytecode. Then we evaluate how such indicators and their combinations perform when detecting malicious code injections. We do so by injecting three malicious payloads taken from real-world examples into the Top-10 most popular Java libraries from libraries.io.

We found that the analysis of strings in the constant pool and of sensitive APIs in the bytecode instructions aids in the task of detecting malicious Java packages by significantly reducing the amount of information to inspect, thus also making manual triage possible.

Citing this paper

This is a pre-print of the paper that appears in the Proceedings of the 2022 ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED ’22), November 11, 2022, Los Angeles, CA, USA.

If you wish to cite this work, please refer to it as follows:

@INPROCEEDINGS{ladisa22towardsjava,
  author={Piergiorgio Ladisa and Henrik Plate and Matias Martinez and Olivier Barais and Serena Elisa Ponta},
  booktitle={2022 ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED ’22)},
  title={Towards the Detection of Malicious Java Packages},
  year=2022,
}

arXiv:2210.03998v1 [cs.CR] 8 Oct 2022


Towards the Detection of Malicious Java Packages

Piergiorgio Ladisa

SAP Security Research

Mougins, France

University of Rennes 1/INRIA/IRISA

Rennes, France

piergiorgio.ladisa@sap.com

piergiorgio.ladisa@irisa.fr

Henrik Plate

SAP Security Research

Mougins, France

henrik.plate@sap.com

Matias Martinez

Université Polytechnique Hauts-de-France

Valenciennes, France

matias.martinez@uphf.fr

Olivier Barais

University of Rennes 1/INRIA/IRISA

Rennes, France

olivier.barais@irisa.fr

Serena Elisa Ponta

SAP Security Research

Mougins, France

serena.ponta@sap.com

ABSTRACT

Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories, and the development of vetting strategies to detect such attacks is ongoing research. Despite its popularity, the Java ecosystem is the least explored one in the context of supply chain attacks.

In this paper we present indicators of malicious behavior that can be observed statically through the analysis of Java bytecode. Then we evaluate how such indicators and their combinations perform when detecting malicious code injections. We do so by injecting three malicious payloads taken from real-world examples into the Top-10 most popular Java libraries from libraries.io.

We found that the analysis of strings in the constant pool and of sensitive APIs in the bytecode instructions aids in the task of detecting malicious Java packages by significantly reducing the amount of information to inspect, thus also making manual triage possible.

CCS CONCEPTS

• Security and privacy → Malware and its mitigation.

KEYWORDS

Open-Source Security, Supply Chain Attacks, Malware Detection

1 INTRODUCTION

Today’s software supply chains make extensive use of open-source components. Despite the clear advantages, the lack of transparency and reliance on unknown stakeholders and systems pose several risks.

Open-Source Software (OSS) supply chain attacks are characterized by the injection of malicious code into open-source components as a means of spreading malware [ ], and there exist many attack vectors [ ]. Package repositories for OSS (e.g., npm, PyPI, Maven Central) are commonly used by downstream users to consume OSS packages, and several scientific works focus on vetting mechanisms at scale. Most of those tackle interpreted languages (e.g., JavaScript, Python), whereas the Java ecosystem is less explored despite its popularity [7, 8].

Based on the study of real-world attacks, our goal is to find and evaluate indicators of malicious behavior in Java packages. Since the usual way of consuming the latter is through pre-compiled JARs, we focus on Java bytecode. Indicators of malicious behavior can be used by package repositories vetting the submitted packages or by downstream users checking the downloaded dependencies.

We set out to answer the following research questions:

RQ1: What are some of the possible indicators of malicious behavior that can be observed from the bytecode?

RQ2: How do these indicators and their combinations perform in the detection of malicious Java packages?

To answer those questions we analyze both the constant pool (e.g., to detect obfuscated strings) and the bytecode instructions (e.g., to detect sensitive APIs). We assess the performance of the identified indicators by analyzing the Top-10 Java projects from libraries.io, considering both the original, benign versions and infected versions containing malicious payloads taken from three real-world attacks.
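To make the two sources of information more concrete, the following is a minimal sketch, not the tooling used in this paper, of how the constant pool of a class file can be read with the Java standard library alone. The class name ConstantPoolScan, its fields, and the prefix-based sensitiveCalls check are hypothetical and introduced only for illustration. The sketch also simplifies the approach: it lists the method references found in the constant pool as a rough proxy for sensitive APIs, whereas the analysis described here looks at the bytecode instructions themselves.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads only the constant pool of a class file. */
final class ConstantPoolScan {
    /** Values of CONSTANT_String entries, i.e. string literals used by the class. */
    final List<String> stringConstants = new ArrayList<>();
    /** Referenced methods, formatted as owner.name followed by the descriptor. */
    final List<String> methodRefs = new ArrayList<>();

    static ConstantPoolScan parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            int[] tag = new int[count];
            int[] first = new int[count];       // first u2 operand of an entry
            int[] second = new int[count];      // second u2 operand of an entry
            String[] utf8 = new String[count];
            for (int i = 1; i < count; i++) {
                tag[i] = in.readUnsignedByte();
                switch (tag[i]) {
                    case 1: utf8[i] = in.readUTF(); break;       // Utf8
                    case 3: case 4: in.readInt(); break;         // Integer, Float
                    case 5: case 6: in.readLong(); i++; break;   // Long, Double use two slots
                    case 7: case 8: case 16: case 19: case 20:   // entries with one u2 operand
                        first[i] = in.readUnsignedShort();
                        break;
                    case 9: case 10: case 11: case 12: case 17: case 18: // two u2 operands
                        first[i] = in.readUnsignedShort();
                        second[i] = in.readUnsignedShort();
                        break;
                    case 15:                                     // MethodHandle: u1 + u2
                        in.readUnsignedByte();
                        in.readUnsignedShort();
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag[i]);
                }
            }
            ConstantPoolScan scan = new ConstantPoolScan();
            for (int i = 1; i < count; i++) {
                if (tag[i] == 8) {                               // String -> Utf8
                    scan.stringConstants.add(utf8[first[i]]);
                } else if (tag[i] == 10 || tag[i] == 11) {       // (Interface)Methodref
                    String owner = utf8[first[first[i]]];        // Class -> Utf8 name
                    int nameAndType = second[i];
                    scan.methodRefs.add(owner + "." + utf8[first[nameAndType]]
                            + utf8[second[nameAndType]]);
                }
            }
            return scan;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Method references that start with one of the given owner.name prefixes. */
    List<String> sensitiveCalls(List<String> prefixes) {
        List<String> hits = new ArrayList<>();
        for (String ref : methodRefs) {
            for (String prefix : prefixes) {
                if (ref.startsWith(prefix)) {
                    hits.add(ref);
                    break;
                }
            }
        }
        return hits;
    }
}
```

Under these assumptions, calling ConstantPoolScan.parse on the bytes of a class extracted from a JAR and then sensitiveCalls with a prefix such as "java/lang/Runtime.exec" would list references to process creation, while stringConstants exposes the literals that a string-level check can inspect. The prefix list is left to the caller because the set of sensitive APIs is not fixed by this sketch.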

The remainder of the paper is organized as follows. Section 2 presents related works. Section 3 motivates the need for improving the detection of malicious Java Archives (JARs) in the context of OSS supply chain attacks. Section 4 describes background information. Section 5 presents our static analysis of Java Virtual Machine (JVM) bytecode and answers RQ1. Section 6 answers RQ2 by evaluating the indicators of malicious behavior in the Java bytecode. Section 7 discusses the limitations of our approach, while Section 8 highlights the conclusions and discusses future works.

2 RELATED WORKS

Table 1 shows work to date about the detection of malicious open-source packages.

Seja et al. [

] propose a machine learning-based approach

for the automated detection of malicious npm packages trained

on a labeled dataset. We port some of the considered features in

the context of Java, in particular the concept of sensitive APIs

(e.g., process creation, dynamic code generation) and the usage of

Shannon entropy to detect obfuscation. While they apply the latter

at the le level to detect the presence of compiled or minied code,

we apply it to the strings found in the Java class le’s constant pool.
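As an illustration of the entropy feature, the sketch below computes the Shannon entropy of a single string in bits per character, H = -Σ p(c) log2 p(c), where p(c) is the relative frequency of character c in the string. The class name StringEntropy is hypothetical, and no decision threshold is implied: how high a value must be before a string is treated as suspicious is not specified in this section.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: Shannon entropy of a string, in bits per character. */
final class StringEntropy {
    static double shannon(String s) {
        if (s.isEmpty()) {
            return 0.0; // no characters, no uncertainty
        }
        // Count how often each character occurs.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Sum -p * log2(p) over the observed characters.
        double entropy = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / s.length();
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }
}
```

A string made of one repeated character has entropy 0, while a string whose characters are all distinct reaches the maximum for its length, log2 of the number of characters. Encoded or encrypted payloads tend to sit closer to that upper end than identifiers or natural-language messages do.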

1 https://libraries.io/


Vu et al. [34] analyze the discrepancy between source code and the deployed package in PyPI as a way to detect malicious injections in the Python ecosystem. Scalco et al. [29] perform the same in the context of JavaScript. Conversely, we do not consider the source code.

Duan et al. [13] propose a classifier based on both dynamic and static analysis to classify packages in npm, PyPI, and RubyGems. Among the selected features for the static analysis, they also suggest considering sensitive APIs and performing data flow analysis to highlight dangerous flows.

Ohm et al. [25] leverage sandboxes to collect forensic artifacts related to the execution of malicious JavaScript and Python packages and describe the observed dynamic behaviors. In another work [23] they propose a clustering model based on signatures produced from the Abstract Syntax Tree (AST) of malicious JavaScript samples. Our approach is instead only static and focuses on Java.

Garret et al. [16] propose an anomaly detection approach based on the observation of code features in JavaScript (e.g., opening of connections, read/write to the file system).

Fass et al. [14] extract features from the AST of JavaScript code to build a classifier capable of detecting obfuscation.

In the scope of Java malware detection, related works focus on the detection of malicious code in applets or purely malicious JARs.

Schlumberger et al. [ ] propose a static approach for applets based on machine learning. Among the selected features they consider sensitive APIs (e.g., for obfuscation and code behavior). Compared to their work, our focus is on OSS packages that may contain a small portion of malicious code, while malicious applets do not need to piggyback on existing benign functionalities. In addition, some of the APIs for applets are not relevant in the scope of Java libraries (e.g., APIs for MIDlets).

Pinheiro et al. [ ] propose a dynamic approach for the automated detection of malicious JARs. They extract forensic features related to the execution of purely malicious samples in a sandboxed environment to train a classifier based on artificial neural networks. Instead, we perform a static analysis of packages where malicious code was injected.

Other relevant works come from the Android ecosystem [ ], especially the ones about the static inspection of Dalvik bytecode. In this case, Aafer et al. [ ] analyze Android malware samples to extract their commonly used APIs, then build a KNN classifier. As opposed to their work, not having many malicious samples available, our search for relevant APIs is based on the manual inspection of malicious packages. Specific aspects of the Android ecosystem make the problem of detecting malicious Java libraries different. On the one hand, malware running on mobile devices has different objectives, e.g., financial gain by sending SMS or reading contacts. On the other hand, there are technical differences between Java for Android and for the JVM (e.g., permissions, intents, or APIs existing only for Android).

3 MOTIVATION

This section motivates our work by presenting OSS supply chain attacks and reports on the detection capabilities of popular Antiviruses (AVs).

| Reference          | Year | Ruby | Python | JavaScript | Java∗ |
|--------------------|------|------|--------|------------|-------|
| Sejfia et al. [31] | 2022 |      |        | ✓          |       |
| Scalco et al. [29] | 2022 |      |        | ✓          |       |
| Duan et al. [13]   | 2021 | ✓    | ✓      | ✓          |       |
| Vu et al. [34]     | 2021 |      | ✓      |            |       |
| Ohm et al. [25]    | 2020 |      | ✓      | ✓          |       |
| Ohm et al. [23]    | 2020 |      |        | ✓          |       |
| Garret et al. [16] | 2019 |      |        | ✓          |       |
| Fass et al. [14]   | 2018 |      |        | ✓          |       |

Table 1: Ecosystems covered by recent scientific works on the detection of malicious open-source packages. (∗): here we mean the case of JVM bytecode.

3.1 Open-Source Software Supply Chain Attacks

Listing 1: Malicious code snippet from HttpServlet.java contained in com.github.codingandcoding:servlet-api@3.2.0

    protected void doGet(HttpServletRequest req)
        throws ServletException, IOException {
      Runtime.getRuntime()
        .exec("bash -c {echo,YmFz**SHORTENED**JjE=}|{base64,-d}|{bash,-i}");

OSS supply chain attacks target open-source components as a means of spreading malware. As Ladisa et al. [ ] pointed out, there are many possible attack vectors.

Multiple examples of malicious OSS packages [ ] show that most of them contain a small fraction of harmful code hidden in a larger corpus of legitimate code. This is similar to the case of piggybacking in Android malware, where a legitimate application (carrier) is repackaged by grafting malicious code (rider) onto it [11, 19, 20].

Listing 1 shows the malicious code snippet present in the package com.github.codingandcoding:servlet-api@3.2.0. Its source version consists of 149 files, 77 of which are in Java format, and a total of 13458 Lines Of Code (LOC). The malicious payload is present in only one line of code of the file HttpServlet.java, which consists of a total of 749 LOC.

3.2 VirusTotal Scan

To understand the level of detection by common AVs, we submitted the 2886 samples available in the Backstabber’s Knife Collection (BKC) [ ] (813 Ruby samples, 261 in Python, 1807 in JavaScript, and 4 in Java) to VirusTotal (as of July 2022). The latter offers the possibility to inspect files or hashes with over 70 Antivirus (AV) software products.

Figure 1 depicts the percentage of responses provided by all the AVs accessible through the VirusTotal API. The results show the average calculated per ecosystem.

2 https://www.virustotal.com
