
Comparing One with Many — Solving Binary2source Function Matching Under
Function Inlining
ANG JIA, MING FAN, XI XU, WUXIA JIN, and HAIJUN WANG, Xi’an Jiaotong University, China
QIYI TANG, SEN NIE, and SHI WU, Tencent Security Keen Lab, China
TING LIU, Xi’an Jiaotong University, China
Binary2source function matching is a fundamental task for many security applications, including Software Component Analysis
(SCA). Existing binary2source matching works apply a “1-to-1” mechanism, in which one binary function is matched
against one source function. However, we discovered that such a mapping can be “1-to-n” (one query binary function maps
to multiple source functions) due to function inlining.
To help conduct binary2source function matching under function inlining, we propose a method named O2NMatcher to generate
Source Function Sets (SFSs) as the matching targets for binary functions with inlining. We first propose a model named
ECOCCJ48 for inlined call site prediction. To train this model, we leverage compilable OSS to generate a dataset with
labeled call sites (inlined or not), extract several features from the call sites, and design a compiler-opt-based
multi-label classifier by inspecting the inlining correlations between different compilations. Then, we use this model to
predict the labels of call sites in uncompilable OSS projects without compilation and obtain the labeled function call
graphs of these projects. Next, we regard the construction of SFSs as a sub-tree generation problem and design root node
selection and edge extension rules to construct SFSs automatically. Finally, these SFSs are added to the corpus of source
functions and compared with binary functions with inlining. We conduct several experiments to evaluate the effectiveness
of O2NMatcher, and the results show that our method increases the performance of existing works by 6% and exceeds all the
state-of-the-art works.¹
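The SFS construction step can be illustrated with a minimal sketch. It assumes a call graph whose edges already carry a predicted "inlined" label, and it substitutes two simplified rules that are our own assumptions rather than the rules designed in this paper: any function with at least one inlined callee is taken as a root, and a sub-tree is extended only along edges labeled as inlined. The names `build_sfs`, `select_roots`, and the toy graph are hypothetical.

```python
from collections import deque

# Toy labeled call graph (hypothetical): caller -> [(callee, predicted_inlined)].
call_graph = {
    "main": [("parse", True), ("log", False)],
    "parse": [("read_byte", True)],
    "log": [("fmt", True)],
}


def select_roots(graph):
    """Assumed root rule: a function is a root if it inlines at least one callee."""
    return sorted(
        caller
        for caller, edges in graph.items()
        if any(inlined for _, inlined in edges)
    )


def build_sfs(graph, root):
    """Assumed extension rule: follow only edges labeled as inlined.

    Returns the set of source functions that a binary function compiled
    from `root` would contain under the predicted labels.
    """
    sfs = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee, inlined in graph.get(caller, []):
            if inlined and callee not in sfs:
                sfs.add(callee)
                queue.append(callee)
    return sfs


# One SFS per selected root; each set is a candidate "1-to-n" matching target.
source_function_sets = {root: build_sfs(call_graph, root) for root in select_roots(call_graph)}
```

In this toy graph, the binary function for `main` would be compared against the set containing `main`, `parse`, and `read_byte`, while `log` is excluded because its call site is labeled as not inlined.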
CCS Concepts: • Software and its engineering → Search-based software engineering; Maintaining software; • Security and
privacy → Software and application security.
Additional Key Words and Phrases: Binary2source Matching, Function Inlining, “1-to-n”, Source Function Sets
1 INTRODUCTION
Most software today is not developed entirely from scratch. Instead, developers rely on a range of open-source
components to create their applications [38]. According to a report published by Gartner [29], over 90% of development
organizations stated that they rely on open-source components. Although using open-source components helps to
finish projects more quickly and reduce costs, dependence on risky open-source components brings software supply chain
security risks [31]. For example, due to code reuse, a single vulnerability (e.g., Heartbleed [1] in OpenSSL [13]) may
spread across thousands of software products, leaving 17% (around half a million) of the Internet’s secure web servers
vulnerable.
To avoid vulnerable dependence, Software Component Analysis (SCA) [20] is proposed to discover software’s
dependence on Open Source Software (OSS) projects. Usually, the SCA service provider maintains a large OSS codebase.
When commercial software companies send their released binary executables, the SCA service provider compares these
binaries with the OSS projects and returns a report of the OSS components that the queried executables contain.
¹ New Paper
Authors’ addresses: Ang Jia, jiaang@stu.xjtu.edu.cn; Ming Fan, mingfan@mail.xjtu.edu.cn; Xi Xu, xx19960325@stu.xjtu.edu.cn;
Wuxia Jin, jinwuxia@mail.xjtu.edu.cn; Haijun Wang, hjwang.china@gmail.com, Xi’an Jiaotong University, Shaanxi, Xi’an,
China, 710049; Qiyi Tang, dodgetang@tencent.com; Sen Nie, snie@tencent.com; Shi Wu, shiwu@tencent.com, Tencent Security
Keen Lab, Shanghai, China, 710049; Ting Liu, tingliu@mail.xjtu.edu.cn, Xi’an Jiaotong University, Xi’an, China.
Manuscript submitted to ACM 1
arXiv:2210.15159v1 [cs.SE] 27 Oct 2022