On the Effectiveness of Hybrid Pooling in
Mixup-Based Graph Learning for Language Processing
Zeming Dong^a, Qiang Hu^{b,∗}, Zhenya Zhang^a, Yuejun Guo^c, Maxime Cordy^b, Mike Papadakis^b, Yves Le Traon^b,
Jianjun Zhao^a
^a Kyushu University
^b University of Luxembourg
^c Luxembourg Institute of Science and Technology

∗Corresponding author
Abstract
Graph neural network (GNN)-based graph learning has been popular in natural language and programming
language processing, particularly in text and source code classification. Typically, GNNs are constructed by incorporating
alternating layers, which learn transformations of graph node features, along with graph pooling layers that use
graph pooling operators (e.g., Max-pooling) to effectively reduce the number of nodes while preserving the semantic
information of the graph. Recently, to enhance GNNs in graph learning tasks, Manifold-Mixup, a data augmentation
technique that produces synthetic graph data by linearly mixing a pair of graph data and their labels, has been widely
adopted. However, the performance of Manifold-Mixup can be highly affected by graph pooling operators, and few
studies have been dedicated to uncovering such effects. To bridge this gap, we take an early step to
explore how graph pooling operators affect the performance of Mixup-based graph learning. To that end, we conduct
a comprehensive empirical study by applying Manifold-Mixup to a formal characterization of graph pooling based on
11 graph pooling operators (9 hybrid pooling operators, 2 non-hybrid pooling operators). The experimental results
on both natural language datasets (Gossipcop, Politifact) and programming language datasets (JAVA250, Python800)
demonstrate that hybrid pooling operators are more effective for Manifold-Mixup than the standard Max-pooling and
the state-of-the-art graph multiset transformer (GMT) pooling, in terms of producing more accurate and robust GNN
models.
Keywords: Hybrid Pooling, Data Augmentation, Graph Learning, Manifold-Mixup, Language Processing
1. Introduction
Since texts, as well as source code, can be represented as graph-structured data [1, 2], graph neural network
(GNN)-based graph learning has been increasingly applied to both natural language processing (NLP) [3] and
programming language (PL) understanding [4, 5]. The application of GNNs has achieved remarkable results; e.g.,
Allamanis et al. [6] utilize GNNs to learn the syntax tree and data flow representations of source code, by which they
manage to accomplish several software engineering tasks, such as code completion and defect detection.
Typically, high-quality training data, including features and their corresponding labels, are necessary to train GNN
models with competitive performance. However, preparing the labeled data is often not easy, especially in the context
of source code labeling that requires advanced expertise [7]. To alleviate the data labeling issue, data augmentation
has been proposed to enhance training data by modifying original data. As the state-of-the-art in data augmentation,
Mixup [8] achieves impressive results in different tasks. Take image classification as an example: Mixup synthesizes
new images and labels as additional training data by first selecting two raw images at random from the original training
data and then linearly mixing their features and labels. Recent research [9, 10] demonstrates that Manifold-Mixup, the
specialized application of Mixup on graph-structured data [10], focusing on interpolating graph-level embeddings, can
also achieve great performance for graph-structured data. With the success of Manifold-Mixup, utilizing Mixup-based
data augmentation in graph learning has emerged as a mainstream paradigm.
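To make the interpolation concrete, the following is a minimal Python sketch of Mixup's mixing step; the function name and the default value of alpha are illustrative assumptions, not the settings used in this paper:

import torch

def mixup(x1, y1, x2, y2, alpha=0.2):
    # Draw the mixing ratio lam from Beta(alpha, alpha), as in Mixup [8].
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x1 + (1.0 - lam) * x2  # linear mix of the features
    y_mix = lam * y1 + (1.0 - lam) * y2  # same mix of the one-hot labels
    return x_mix, y_mix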
As indicated by existing studies [10, 11], Mixup-based graph learning is mainly influenced by two factors, namely,
1) the hyperparameters in Mixup itself, such as the Mixup ratio that balances the proportion of the source data, and 2)
the Mixup strategies that are associated with representation generation. Of these factors, the hyperparameter issue is
a common one across several different Mixup-applied fields and has been extensively studied in fields such as image
classification [12, 13]. However, the second issue, concerning Mixup strategies, is highly correlated with the context in which
Mixup is applied, and for graph-structured data, its influence has not been well studied.
In the context of Mixup-based graph learning, as shown in Figure 1, Manifold-Mixup is fed with the inputs from
the graph pooling layer, which uses graph pooling operators (e.g., Max-pooling) to produce coarsened representations
of the given graph while preserving its semantic information. Namely, this layer is the key to the representation generation
of Manifold-Mixup, in that Manifold-Mixup generates augmented training data by interpolating these representations.
Therefore, the performance of Manifold-Mixup can be highly affected by the graph pooling operators. Recent
works [14, 15, 16] have attempted to systematically analyze the importance of graph pooling in representation
generation; however, the question of how different graph pooling operators affect the effectiveness of
Mixup-based graph learning still remains open.
In this paper, we tackle this problem by empirically analyzing the differences when Mixup is applied to different
graph representations generated by different graph pooling operators. Specifically, we focus on two types of graph
pooling methods, namely, standard pooling methods and a unifying formulation of hybrid (mixture) pooling operators.
For standard pooling, we consider Max-pooling, which is the most widely used operator [15], and the state-of-the-art graph
multiset transformer (GMT) pooling [17], a global pooling layer based on multi-head attention that captures
node interactions based on structural dependencies. For hybrid pooling, we extend the prior work
[16, 18] and design 9 types of hybrid pooling strategies, whose details are introduced in Table 1. Here, GMT
and the hybrid pooling operators are considered more advanced strategies. We conduct empirical experiments to evaluate
the effectiveness of graph learning using Manifold-Mixup [11] under different hybrid pooling operators. In total,
our experiments cover diverse types of datasets, including two programming languages (Java and Python) and one
natural language (English), and consider two widely studied graph-level classification tasks (program
classification and fake news detection) and six GNN model architectures. Based on that, we answer the following
research questions:
RQ1: How effective are hybrid pooling operators for enhancing the accuracy of Mixup-based graph learning?
The results on NLP datasets (Gossipcop and Politifact, used for fake news detection) show that the hybrid pooling
operator Type 1 (M_sum(P_att, P_max)) outperforms GMT by up to 4.38% accuracy. On PL datasets (JAVA250 and
Python800, used for problem classification), the hybrid pooling operator Type 1 (M_sum(P_att, P_max)) also surpasses
GMT by up to 2.36% accuracy.
RQ2: How effective are hybrid pooling operators for enhancing the robustness of Mixup-based graph learning?
The results demonstrate that, in terms of robustness, the hybrid pooling operator Type 6 (M_concat(P_att, P_sum)) surpasses
GMT by up to 23.23% in fake news detection, while the hybrid pooling operator Type 1 (M_sum(P_att, P_max)) outperforms
GMT by up to 10.23% in program classification.
RQ3: How does the hyperparameter setting affect the effectiveness of Manifold-Mixup when hybrid pooling
operators are applied? According to [8], the hyperparameter λ denotes the interpolation ratio, and it is sampled from
a Beta distribution with a shape parameter α (λ ∼ Beta(α, α)). Existing works [10, 11] show that the setting of λ
affects the performance of Mixup. Therefore, we study the effectiveness of Manifold-Mixup when using
hybrid pooling operators under different hyperparameters of Mixup. Experimental results indicate that a smaller value
of the hyperparameter leads to better robustness and accuracy.
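To see why the shape parameter matters, the following standalone snippet (illustrative only, not the paper's experiment code) samples λ ∼ Beta(α, α) for several values of α; a small α concentrates λ near 0 or 1, so mixed samples stay close to one of the two originals, while a large α pushes λ toward 0.5 and produces more heavily blended samples:

import numpy as np

rng = np.random.default_rng(0)
for alpha in (0.1, 1.0, 10.0):
    lam = rng.beta(alpha, alpha, size=100_000)
    # Mean distance of lam from the endpoints {0, 1}: lower values mean
    # the synthetic samples remain closer to one of the original samples.
    print(f"alpha={alpha}: mean min(lam, 1 - lam) = {np.minimum(lam, 1 - lam).mean():.3f}")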
In summary, the contributions of this paper are as follows:
• This is the first work that explores the potential influence of graph pooling operators on Mixup-based graph-structured data augmentation. To facilitate reproducibility, our code and data are available online¹.
• We discuss and further extend the hybrid pooling operators from existing works.
• The comprehensive empirical analysis demonstrates that hybrid pooling is a better choice for Mixup-based graph-structured data augmentation.

¹ https://github.com/zemingd/HybridPool4Mixup
2. Background and Related Work
2.1. Graph Data Classification
Researchers have proposed multiple approaches for the text classification task that analyze the data based on its
graph structure. For instance, Yao et al. [19] constructed text graph data by using words and documents as nodes. To
further enhance text classification performance, Zhang et al. [20] proposed graph-based word interaction to capture
contextual word relationships. Similar to text data, by capturing the relationships among different components (e.g.,
variables and operators) in the code, source code can also be represented structurally as graph data [5, 21].
Zhou et al. [7] integrated four separate subgraph representations of source code into one joint graph.
Furthermore, to improve generalization, Allamanis et al. [22] offered four code rewrite rules, such as variable
renaming and comment deletion, as a data augmentation for graph-level program classification.
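For intuition, the toy snippet below sketches how such a word-document graph can be assembled; it is our illustrative assumption of the construction in [19], which additionally weights edges (e.g., word-document edges by TF-IDF):

import networkx as nx

# Documents and words both become nodes; occurrence defines the edges.
docs = {"d1": "graph learning", "d2": "graph pooling"}
G = nx.Graph()
for doc_id, text in docs.items():
    G.add_node(doc_id, kind="document")
    for word in text.split():
        G.add_node(word, kind="word")
        G.add_edge(doc_id, word)  # unweighted here; [19] uses TF-IDF weights
print(G.number_of_nodes(), G.number_of_edges())  # 5 nodes, 4 edges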
Different from the above works, our study focuses on enhancing the performance of graph classification with
Mixup-based data augmentation.
2.2. Graph Pooling Operators
Graph pooling [23, 16] plays a crucial role in capturing relevant structural information of the entire graph. Existing
works [24, 25, 26] have proposed basic graph pooling methods, such as summing or averaging all of the node
features. However, such pooling methods treat the information of every node identically, which can lose structural
information. To solve this problem, researchers [27, 28, 29, 30] dropped nodes with lower scores using a learnable
scoring function, which compresses the graph and alleviates the impact of irrelevant nodes while preserving the important
structural information. Additionally, to locate tightly related communities on a graph, recent works [31, 32, 33, 34]
have cast graph pooling as a node clustering problem, where related nodes are aggregated into the
same cluster. To combine these advantages, Ranjan et al. [35] first clustered nearby nodes locally and then dropped clusters
with lower scores. In addition, there is a family of attention-based pooling methods [36, 37, 38, 17] that score
nodes with an attention mechanism to weight the relevance of nodes to the current graph-level task. Besides, different
from the above standard pooling methods, Nguyen et al. [18] leveraged a mixture of Sum-pooling and Max-pooling
for graph classification.
In our study, we examine the application of Manifold-Mixup in graph-level classification, incorporating both
hybrid pooling and standard pooling techniques. Moreover, different from existing research [18], we specifically
include attention pooling because it has been proven effective for GNN model training [29, 36]. We extend
the original three types of hybrid pooling operators [18] to the current nine; one such operator is sketched below.
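The following is a minimal sketch of one such operator in the spirit of Table 1 (e.g., Type 1, M_sum(P_att, P_max)); the class name, the attention scorer, and the single-graph interface are our illustrative assumptions rather than the exact operators evaluated later:

import torch
import torch.nn as nn

class HybridPool(nn.Module):
    # Combines two readouts of the node embeddings H ([num_nodes, d])
    # into a single graph embedding.
    def __init__(self, d):
        super().__init__()
        self.scorer = nn.Linear(d, 1)  # learnable node scores for P_att

    def p_max(self, H):
        return H.max(dim=0).values  # MAXPOOL readout

    def p_att(self, H):
        w = torch.softmax(self.scorer(H), dim=0)  # attention weights over nodes
        return (w * H).sum(dim=0)  # AttentionPOOL readout

    def forward(self, H):
        # M_sum mixture: element-wise sum of the two readouts (Type 1 style);
        # replacing the sum with torch.cat would give an M_concat mixture.
        return self.p_att(H) + self.p_max(H)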
2.3. Mixup
Due to its effectiveness in graph-structured data processing and its promising performance on graph-specific
downstream tasks, e.g., graph classification, the GNN has recently received considerable attention. Meanwhile, as a
sophisticated data augmentation method, Mixup [8] was initially proposed and implemented within the domain of
image classification [39]. Specifically, Mixup randomly selects a pair of images from the training data, linearly
combines their features and labels, and synthesizes a new image and label, which are treated as augmented training
data. By combining two different data points linearly, Mixup effectively smooths the data distribution in the feature
space. This smoothing helps mitigate the sharpness of decision boundaries between different classes, reducing
susceptibility to overfitting [40]. Due to its remarkable efficacy in classification tasks, it has subsequently been
gradually applied to NLP [41] and source code learning [10].
In NLP, Guo et al. [41] proposed two basic strategies of Mixup for augmenting data: one was word
embedding-based, and the other was sentence embedding-based. After either kind of embedding, the
features of the input data can be mixed to synthesize new data in vector space. To address the difficulty of mixing text
data in its raw format, Sun et al. [42] mixed text data from a transformer-based pre-trained architecture. Chen et al. [43]
increased the size of augmented samples by interpolating text data in hidden space. Zhang et al. [44] generated extra
labeled sequences in each iteration to augment the scale of training data. Unlike previous works, some researchers
considered the raw text itself to augment the input data. Yoon et al. [45] synthesized new text data from two raw
inputs by span-based mixing to replace the hidden vectors. In source code learning, Dong et al. [10] proposed a
Mixup-based data augmentation method that linearly interpolates the features of a pair of programs, as well as their
labels, for GNN model training.

Figure 1: Overview of Mixup-based graph learning via hybrid pooling. Graph-structured data {x^G} are processed by the GNN alternating layers into node embeddings {x^V}; the graph pooling layer applies hybrid pooling functions built from three readout functions (SUMPOOL, MAXPOOL, and AttentionPOOL) to produce the graph embedding; Manifold-Mixup then mixes the graph embeddings and label embeddings {y^G} before the classifier.
In our work, we do not simply employ Mixup for graph-structured data classification. Instead, we explore how
different graph pooling operators affect the effectiveness of Mixup.
2.4. Empirical Study on Data Augmentation
Recently, there has been a surge of empirical studies exploring the topic of data augmentation. In NLP,
Konno et al. [46] presented an empirical analysis to evaluate the effectiveness of contextual data augmentation
(CDA) in improving the quality of the augmented training data, in comparison to [MASK]-based augmentation and
linguistically-controlled masking. To assist practitioners in selecting suitable augmentation strategies, Chen et al. [47]
conducted a comprehensive empirical study on 11 datasets, encompassing topics such as news classification, inference
tasks, paraphrasing tasks, and single-sentence tasks. In source code learning, Yu et al. [48] conducted an empirical
study on three program-related downstream tasks, namely method naming, code commenting, and clone detection,
aiming to validate the effectiveness of data augmentation designed with 18 program transformation
methods that preserve both semantics and syntax-naturalness. Dong et al. [21] presented a meticulous empirical study
that evaluated the effectiveness of data augmentation methods adapted from the domains of NLP and
graph learning for source code learning, rigorously examining the impact of these augmentation techniques
on various tasks related to source code analysis and understanding.
Different from existing empirical studies, our work focuses primarily on examining the influence of graph pooling
operators on Mixup-based data augmentation, a topic that has received limited attention thus far.
3. Mixup-Based Graph Learning via Hybrid Pooling
3.1. Overview
Figure 1 provides an overview of GNNs for graph classification, demonstrating the application of Manifold-Mixup
after the hybrid pooling layer. Concretely, first, GNNs process the input graph-structured data and transform them into
node attributes {x_i^V}_{i=1}^{n}, where V is the vertex set. Then, after learning the features of each node, the hybrid pooling
layer produces the entire graph embedding by utilizing three fundamental types of readout functions: Sum-pooling
(SUMPOOL), Max-pooling (MAXPOOL), and Attention-pooling (AttentionPOOL). Finally, as shown in Eq. (1),
Manifold-Mixup is applied to randomly mix two selected graph embeddings x_i^G and x_j^G, and their ground-truth labels
y_i^G and y_j^G (one-hot), as the new training set. This augmented training set is then used for training the classifier.
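To make this step concrete, here is a minimal PyTorch-style sketch of one Manifold-Mixup training step at the graph-embedding level; the interfaces (model, pool, clf, and the batch fields) are illustrative stand-ins for the components in Figure 1 rather than the released implementation, and the soft-target cross_entropy call requires PyTorch 1.10 or later:

import torch
import torch.nn.functional as F

def mixup_train_step(model, pool, clf, batch, num_classes, alpha=1.0):
    H = model(batch.x, batch.edge_index)   # node embeddings from the GNN layers
    g = pool(H, batch.batch)               # graph embeddings, shape [B, d]
    y = F.one_hot(batch.y, num_classes).float()  # one-hot label embeddings
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(g.size(0))       # random pairing inside the batch
    g_mix = lam * g + (1 - lam) * g[perm]  # mix the graph-level embeddings
    y_mix = lam * y + (1 - lam) * y[perm]  # mix the one-hot labels
    return F.cross_entropy(clf(g_mix), y_mix)  # soft-target cross-entropy loss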