On the Effectiveness of Hybrid Pooling in
Mixup-Based Graph Learning for Language Processing
Zeming Dong^a, Qiang Hu^{b,∗}, Zhenya Zhang^a, Yuejun Guo^c, Maxime Cordy^b, Mike Papadakis^b, Yves Le Traon^b,
Jianjun Zhao^a
^a Kyushu University
^b University of Luxembourg
^c Luxembourg Institute of Science and Technology

∗Corresponding author
Abstract
Graph neural network (GNN)-based graph learning has been popular in natural language and programming
language processing, particularly in text and source code classification. Typically, GNNs are constructed by incorporating
alternating layers, which learn transformations of graph node features, along with graph pooling layers that use
graph pooling operators (e.g., Max-pooling) to effectively reduce the number of nodes while preserving the semantic
information of the graph. Recently, to enhance GNNs in graph learning tasks, Manifold-Mixup, a data augmentation
technique that produces synthetic graph data by linearly mixing a pair of graph data and their labels, has been widely
adopted. However, the performance of Manifold-Mixup can be highly affected by graph pooling operators, and few
studies have been dedicated to uncovering such effects. To bridge this gap, we take an early step to
explore how graph pooling operators affect the performance of Mixup-based graph learning. To that end, we conduct
a comprehensive empirical study by applying Manifold-Mixup to a formal characterization of graph pooling based on
11 graph pooling operators (9 hybrid pooling operators, 2 non-hybrid pooling operators). The experimental results
on both natural language datasets (Gossipcop, Politifact) and programming language datasets (JAVA250, Python800)
demonstrate that hybrid pooling operators are more effective for Manifold-Mixup than the standard Max-pooling and
the state-of-the-art graph multiset transformer (GMT) pooling, in terms of producing more accurate and robust GNN
models.
Keywords: Hybrid Pooling, Data Augmentation, Graph Learning, Manifold-Mixup, Language Processing
1. Introduction
Since texts, as well as source code, can be represented as graph-structured data [1, 2], graph neural network
(GNN)-based graph learning has been increasingly applied to both natural language processing (NLP) [3] and
programming language (PL) understanding [4, 5]. The application of GNNs has achieved remarkable results; e.g.,
Allamanis et al. [6] utilize GNNs to learn the syntax tree and data flow representations of source code, by which they
manage to accomplish several software engineering tasks, such as code completion and defect detection.
Typically, high-quality training data, including features and their corresponding labels, are necessary to train GNN
models with competitive performance. However, preparing the labeled data is often not easy, especially in the context
of source code labeling that requires advanced expertise [7]. To alleviate the data labeling issue, data augmentation
has been proposed to enhance training data by modifying original data. As the state-of-the-art in data augmentation,
Mixup [8] achieves impressive results in different tasks. Take image classification as an example: Mixup synthesizes
new images and labels as additional training data by first selecting two raw images at random from the original training
data and then linearly mixing their features and labels. Recent research [9, 10] demonstrates that Manifold-Mixup, the
specialized application of Mixup on graph-structured data [10], focusing on interpolating graph-level embeddings, can
also achieve great performance for graph-structured data. With the success of Manifold-Mixup, utilizing Mixup-based
data augmentation in graph learning has emerged as a mainstream paradigm.
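To make the interpolation concrete, the following is a minimal Python sketch of Mixup's mixing step; the function name and the default value of alpha are illustrative assumptions, not the settings used in this paper:

import torch

def mixup(x1, y1, x2, y2, alpha=0.2):
    # Draw the mixing ratio lam from Beta(alpha, alpha), as in Mixup [8].
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x1 + (1.0 - lam) * x2  # linear mix of the features
    y_mix = lam * y1 + (1.0 - lam) * y2  # same mix of the one-hot labels
    return x_mix, y_mix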
As indicated by existing studies [10, 11], Mixup-based graph learning is mainly influenced by two factors, namely,
1) the hyperparameters in Mixup itself, such as the Mixup ratio that balances the proportion of the source data, and 2)
the Mixup strategies that are associated with representation generation. Of these factors, the hyperparameter issue is
a common one across several different Mixup-applied fields and has been extensively studied in fields such as image
classification [12, 13]. However, the second issue, concerning Mixup strategies, is highly correlated with the context in which
Mixup is applied, and for graph-structured data, its influence has not been well studied.
In the context of Mixup-based graph learning, as shown in Figure 1, Manifold-Mixup is fed with the inputs from
the graph pooling layer, which uses graph pooling operators (e.g., Max-pooling) to produce coarsened representations
of the given graph while preserving its semantic information. Namely, this layer is the key to the representation generation
of Manifold-Mixup, in that Manifold-Mixup generates augmented training data by interpolating these representations.
Therefore, the performance of Manifold-Mixup can be highly affected by the graph pooling operators. Recent
works [14, 15, 16] have attempted to systematically analyze the importance of graph pooling in representation
generation; however, the question of how different graph pooling operators affect the effectiveness of
Mixup-based graph learning still remains open.
In this paper, we tackle this problem by empirically analyzing the differences when Mixup is applied to different
graph representations generated by different graph pooling operators. Specifically, we focus on two types of graph
pooling methods, namely, standard pooling methods and a unifying formulation of hybrid (mixture) pooling operators.
For standard pooling, we consider Max-pooling, which is the most widely used operator [15], and the state-of-the-art graph
multiset transformer (GMT) pooling [17], a global pooling layer based on multi-head attention that captures
node interactions based on structural dependencies. For hybrid pooling, we extend the prior work
[16, 18] and design 9 types of hybrid pooling strategies, whose details are introduced in Table 1. Here, GMT
and the hybrid pooling operators are considered more advanced strategies. We conduct empirical experiments to evaluate
the effectiveness of graph learning using Manifold-Mixup [11] under different hybrid pooling operators. In total,
our experiments cover diverse types of datasets, including two programming languages (Java and Python) and one
natural language (English), and consider two widely studied graph-level classification tasks (program
classification and fake news detection) and six GNN model architectures. Based on that, we answer the following
research questions:
RQ1: How effective are hybrid pooling operators for enhancing the accuracy of Mixup-based graph learning?
The results on NLP datasets (Gossipcop and Politifact, used for fake news detection) show that the hybrid pooling
operator Type 1 (M_sum(P_att, P_max)) outperforms GMT by up to 4.38% accuracy. On PL datasets (JAVA250 and
Python800, used for problem classification), the hybrid pooling operator Type 1 (M_sum(P_att, P_max)) also surpasses
GMT by up to 2.36% accuracy.
RQ2: How effective are hybrid pooling operators for enhancing the robustness of Mixup-based graph learning?
The results demonstrate that, in terms of robustness, the hybrid pooling operator Type 6 (M_concat(P_att, P_sum)) surpasses
GMT by up to 23.23% in fake news detection, while the hybrid pooling operator Type 1 (M_sum(P_att, P_max)) outperforms
GMT by up to 10.23% in program classification.
RQ3: How does the hyperparameter setting affect the effectiveness of Manifold-Mixup when hybrid pooling
operators are applied? According to [8], the hyperparameter λ denotes the interpolation ratio, and it is sampled from
a Beta distribution with a shape parameter α (λ ∼ Beta(α, α)). Existing works [10, 11] show that the setting of λ
affects the performance of Mixup. Therefore, we study the effectiveness of Manifold-Mixup when using
hybrid pooling operators under different hyperparameters of Mixup. Experimental results indicate that a smaller value
of the hyperparameter leads to better robustness and accuracy.
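To see why the shape parameter matters, the following standalone snippet (illustrative only, not the paper's experiment code) samples λ ∼ Beta(α, α) for several values of α; a small α concentrates λ near 0 or 1, so mixed samples stay close to one of the two originals, while a large α pushes λ toward 0.5 and produces more heavily blended samples:

import numpy as np

rng = np.random.default_rng(0)
for alpha in (0.1, 1.0, 10.0):
    lam = rng.beta(alpha, alpha, size=100_000)
    # Mean distance of lam from the endpoints {0, 1}: lower values mean
    # the synthetic samples remain closer to one of the original samples.
    print(f"alpha={alpha}: mean min(lam, 1 - lam) = {np.minimum(lam, 1 - lam).mean():.3f}")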
In summary, the contributions of this paper are as follows:
• This is the first work that explores the potential influence of graph pooling operators on Mixup-based graph-structured data augmentation. To facilitate reproducibility, our code and data are available online¹.
• We discuss and further extend the hybrid pooling operators from existing works.
• The comprehensive empirical analysis demonstrates that hybrid pooling is a better choice for Mixup-based graph-structured data augmentation.

¹ https://github.com/zemingd/HybridPool4Mixup
2. Background and Related Work
2.1. Graph Data Classification
Researchers have proposed multiple approaches for the text classification task that analyze the data based on its
graph structure. For instance, Yao et al. [19] constructed text graph data by using words and documents as nodes. To
further enhance text classification performance, Zhang et al. [20] proposed graph-based word interaction to capture
contextual word relationships. Similar to text data, by capturing the relationships among different components (e.g.,
variables and operators) in the code, source code can also be represented structurally as graph data [5, 21].
Zhou et al. [7] integrated four separate subgraph representations of source code into one joint graph.
Furthermore, to improve generalization, Allamanis et al. [22] offered four code rewrite rules, such as variable
renaming and comment deletion, as a data augmentation for graph-level program classification.
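For intuition, the toy snippet below sketches how such a word-document graph can be assembled; it is our illustrative assumption of the construction in [19], which additionally weights edges (e.g., word-document edges by TF-IDF):

import networkx as nx

# Documents and words both become nodes; occurrence defines the edges.
docs = {"d1": "graph learning", "d2": "graph pooling"}
G = nx.Graph()
for doc_id, text in docs.items():
    G.add_node(doc_id, kind="document")
    for word in text.split():
        G.add_node(word, kind="word")
        G.add_edge(doc_id, word)  # unweighted here; [19] uses TF-IDF weights
print(G.number_of_nodes(), G.number_of_edges())  # 5 nodes, 4 edges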
Different from the above works, our study focuses on enhancing the performance of graph classification with
Mixup-based data augmentation.
2.2. Graph Pooling Operators
Graph pooling [23, 16] plays a crucial role in capturing relevant structural information of the entire graph. Existing
works [24, 25, 26] have proposed basic graph pooling methods, such as summing or averaging all of the node
features. However, such pooling methods treat the information of every node identically, which can lose structural
information. To solve this problem, researchers [27, 28, 29, 30] dropped nodes with lower scores using a learnable
scoring function, which compresses the graph and alleviates the impact of irrelevant nodes while preserving the important
structural information. Additionally, to locate tightly related communities on a graph, recent works [31, 32, 33, 34]
have cast graph pooling as a node clustering problem, where related nodes are aggregated into the
same cluster. To combine these advantages, Ranjan et al. [35] first clustered nearby nodes locally and then dropped clusters
with lower scores. In addition, there is a family of attention-based pooling methods [36, 37, 38, 17] that score
nodes with an attention mechanism to weight the relevance of nodes to the current graph-level task. Besides, different
from the above standard pooling methods, Nguyen et al. [18] leveraged a mixture of Sum-pooling and Max-pooling
for graph classification.
In our study, we examine the application of Manifold-Mixup in graph-level classification, incorporating both
hybrid pooling and standard pooling techniques. Moreover, different from existing research [18], we specifically
include attention pooling because it has been proven effective for GNN model training [29, 36]. We extend
the original three types of hybrid pooling operators [18] to the current nine; one such operator is sketched below.
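The following is a minimal sketch of one such operator in the spirit of Table 1 (e.g., Type 1, M_sum(P_att, P_max)); the class name, the attention scorer, and the single-graph interface are our illustrative assumptions rather than the exact operators evaluated later:

import torch
import torch.nn as nn

class HybridPool(nn.Module):
    # Combines two readouts of the node embeddings H ([num_nodes, d])
    # into a single graph embedding.
    def __init__(self, d):
        super().__init__()
        self.scorer = nn.Linear(d, 1)  # learnable node scores for P_att

    def p_max(self, H):
        return H.max(dim=0).values  # MAXPOOL readout

    def p_att(self, H):
        w = torch.softmax(self.scorer(H), dim=0)  # attention weights over nodes
        return (w * H).sum(dim=0)  # AttentionPOOL readout

    def forward(self, H):
        # M_sum mixture: element-wise sum of the two readouts (Type 1 style);
        # replacing the sum with torch.cat would give an M_concat mixture.
        return self.p_att(H) + self.p_max(H)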
2.3. Mixup
Due to its effectiveness in graph-structured data processing and its promising performance on graph-specific
downstream tasks, e.g., graph classification, the GNN has recently received considerable attention. Meanwhile, as a
sophisticated data augmentation method, Mixup [8] was initially proposed and implemented within the domain of
image classification [39]. Specifically, Mixup randomly selects a pair of images from the training data, linearly
combines their features and labels, and synthesizes a new image and label, which are treated as augmented training
data. By combining two different data points linearly, Mixup effectively smooths the data distribution in the feature
space. This smoothing helps mitigate the sharpness of decision boundaries between different classes, reducing
susceptibility to overfitting [40]. Due to its remarkable efficacy in classification tasks, it has subsequently been
gradually applied to NLP [41] and source code learning [10].
In NLP, Guo et al. [41] proposed two basic strategies of Mixup for augmenting data: one was word
embedding-based, and the other was sentence embedding-based. After either kind of embedding, the
features of the input data can be mixed to synthesize new data in vector space. To address the difficulty of mixing text
data in its raw format, Sun et al. [42] mixed text data from a transformer-based pre-trained architecture. Chen et al. [43]
increased the size of augmented samples by interpolating text data in hidden space. Zhang et al. [44] generated extra
labeled sequences in each iteration to augment the scale of training data. Unlike previous works, some researchers
considered the raw text itself to augment the input data. Yoon et al. [45] synthesized new text data from two raw
inputs by span-based mixing to replace the hidden vectors. In source code learning, Dong et al. [10] proposed a
Mixup-based data augmentation method that linearly interpolates the features of a pair of programs, as well as their
labels, for GNN model training.

Figure 1: Overview of Mixup-based graph learning via hybrid pooling. Graph-structured data {x^G} are processed by the GNN alternating layers into node embeddings {x^V}; the graph pooling layer applies hybrid pooling functions built from three readout functions (SUMPOOL, MAXPOOL, and AttentionPOOL) to produce the graph embedding; Manifold-Mixup then mixes the graph embeddings and label embeddings {y^G} before the classifier.
In our work, we do not simply employ Mixup for graph-structured data classification. Instead, we explore how
different graph pooling operators affect the effectiveness of Mixup.
2.4. Empirical Study on Data Augmentation
Recently, there has been a surge of empirical studies exploring the topic of data augmentation. In NLP,
Konno et al. [46] presented an empirical analysis to evaluate the effectiveness of contextual data augmentation
(CDA) in improving the quality of the augmented training data, in comparison to [MASK]-based augmentation and
linguistically-controlled masking. To assist practitioners in selecting suitable augmentation strategies, Chen et al. [47]
conducted a comprehensive empirical study on 11 datasets, encompassing topics such as news classification, inference
tasks, paraphrasing tasks, and single-sentence tasks. In source code learning, Yu et al. [48] conducted an empirical
study on three program-related downstream tasks, namely method naming, code commenting, and clone detection,
aiming to validate the effectiveness of data augmentation designed with 18 program transformation
methods that preserve both semantics and syntax-naturalness. Dong et al. [21] presented a meticulous empirical study
that evaluated the effectiveness of data augmentation methods adapted from the domains of NLP and
graph learning for source code learning, rigorously examining the impact of these augmentation techniques
on various tasks related to source code analysis and understanding.
Different from existing empirical studies, our work focuses primarily on examining the influence of graph pooling
operators on Mixup-based data augmentation, a topic that has received limited attention thus far.
3. Mixup-Based Graph Learning via Hybrid Pooling
3.1. Overview
Figure 1 provides an overview of GNNs for graph classification, demonstrating the application of Manifold-Mixup
after the hybrid pooling layer. Concretely, first, GNNs process the input graph-structured data and transform them into
node attributes {x_i^V}_{i=1}^{n}, where V is the vertex set. Then, after learning the features of each node, the hybrid pooling
layer produces the entire graph embedding by utilizing three fundamental types of readout functions: Sum-pooling
(SUMPOOL), Max-pooling (MAXPOOL), and Attention-pooling (AttentionPOOL). Finally, as shown in Eq. (1),
Manifold-Mixup is applied to randomly mix two selected graph embeddings x_i^G and x_j^G, and their ground-truth labels
y_i^G and y_j^G (one-hot), as the new training set. This augmented training set is then used for training the classifier.
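To make this step concrete, here is a minimal PyTorch-style sketch of one Manifold-Mixup training step at the graph-embedding level; the interfaces (model, pool, clf, and the batch fields) are illustrative stand-ins for the components in Figure 1 rather than the released implementation, and the soft-target cross_entropy call requires PyTorch 1.10 or later:

import torch
import torch.nn.functional as F

def mixup_train_step(model, pool, clf, batch, num_classes, alpha=1.0):
    H = model(batch.x, batch.edge_index)   # node embeddings from the GNN layers
    g = pool(H, batch.batch)               # graph embeddings, shape [B, d]
    y = F.one_hot(batch.y, num_classes).float()  # one-hot label embeddings
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(g.size(0))       # random pairing inside the batch
    g_mix = lam * g + (1 - lam) * g[perm]  # mix the graph-level embeddings
    y_mix = lam * y + (1 - lam) * y[perm]  # mix the one-hot labels
    return F.cross_entropy(clf(g_mix), y_mix)  # soft-target cross-entropy loss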