preserve the individuality of different demographic groups.
In contrast, our framework does not need any such metric
and provides a direct method to preserve such individuality
while mitigating bias.
3 Background
Here we give a background on group equivariance, compo-
sitional generalization, and fairness in NLG.
3.1 Group Equivariance
Groups. A set with a binary operator, (G, ·), is called a
group if it satisfies the axioms of a group given in appendix § A.1.
The action of a group on a finite set X is a map Γ : G ×
X → X that satisfies the axioms of a group action in § A.4.
Group actions are used to formally describe transformations
acting on a set X; e.g., rotation by multiples of 90° is an action Γ on
a set of square images X. A transformation of x ∈ X by
a group element g ∈ G is written as Γ(g, x).
Group equivariance. Let Γ_X and Γ_Y be the group actions
of G on the sets X and Y, respectively. A function f : X → Y is
called group equivariant to G if f(Γ_X(g, x)) = Γ_Y(g, f(x))
for all g ∈ G, x ∈ X. Hence, if a neural network performing
segmentation is equivariant to the group of 90° rotations (the c4
group), then whenever the input is rotated by a multiple of 90°, the
output is rotated by the same angle.
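To make the definition concrete, the following minimal sketch (our own illustration, not code from the paper; it assumes NumPy and uses a toy per-pixel thresholding function as the "segmentation" model) numerically checks the condition f(Γ_X(g, x)) = Γ_Y(g, f(x)) for the c4 group.

```python
import numpy as np

# Toy "segmentation": threshold each pixel independently.
# Any per-pixel map is equivariant to the c4 group of 90-degree rotations.
def segment(image):
    return (image > 0.5).astype(np.float32)

def rotate90(x, g):
    """Action of the c4 group element g (number of 90-degree rotations)."""
    return np.rot90(x, k=g)

rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Check f(Gamma_X(g, x)) == Gamma_Y(g, f(x)) for all g in c4.
for g in range(4):
    lhs = segment(rotate90(image, g))   # transform the input, then apply f
    rhs = rotate90(segment(image), g)   # apply f, then transform the output
    assert np.allclose(lhs, rhs)
print("segment is equivariant to the c4 group on this input")
```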
3.2 Compositional Generalization
Compositionality in languages refers to the ability to un-
derstand novel sentences by understanding and algebraically
manipulating their components (Chomsky 2009; Montague
1970). Compositionality is key to excellent human under-
standing of languages, whereas it is hypothesized that neu-
ral networks do not possess such capabilities, leading to their
extreme sample inefficiency in modeling languages (Lake
et al. 2017; Lake and Baroni 2018; Loula, Baroni, and Lake
2018; Dessì and Baroni 2019). E.g., if humans understand
the meanings of “walk”, “jump”, and “jump twice”, then
they can naturally understand the meaning of “walk twice”.
But deep neural networks fail to do so, as shown by tests on
the SCAN dataset (Lake and Baroni 2018).
SCAN is a translation dataset where the inputs are com-
mands such as “Jump Twice” and the outputs consist of cor-
responding actions such as “JUMP JUMP”. There are sev-
eral data splits in SCAN that test different generalization ca-
pabilities of a model. The two of interest to us are the Add
jump task and the Around right task. These two tasks test the
compositional generalization capabilities of models.
The training set of the Add jump task contains no com-
mands involving the word “Jump” other than the bare com-
mand “Jump” itself. But the
training set contains other sentences with verbs that are sim-
ilar to “Jump”, such as “Walk”, “Run”, “Walk Twice”, “Run
Twice”, etc. The test set, on the other hand, contains more
complex commands using the word “Jump”, such as “Jump
Twice”, “Turn Left After Jump Twice”, etc. Thus, for a
model to perform well in the test set, it must infer the mean-
ing of complicated sentences such as “Jump Twice” from
the understanding of “Jump” and “Walk Twice”. Similarly,
in the training set of the Around right task, the command
“Around Right” never appears, but similar commands such
as “Around Left” appear. The test set contains the phrase
“Around Right”, and to succeed in this task, the model
must infer that “Right” and “Left” are directions that can be
treated in a similar way.
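For concreteness, the following schematic (our own simplification; the actual SCAN data is lowercased and uses action tokens such as I_JUMP) illustrates the Add jump split: the new verb appears in training only in isolation, while its compositions appear only at test time.

```python
# Schematic Add jump split (simplified; not the exact SCAN file format).
train = {
    "jump": "JUMP",               # the new verb appears only in isolation
    "walk": "WALK",
    "walk twice": "WALK WALK",    # compositions are seen for other verbs
    "run twice": "RUN RUN",
}
test = {
    "jump twice": "JUMP JUMP",    # unseen compositions of the new verb
    "jump thrice": "JUMP JUMP JUMP",
}
```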
3.3 Fairness: Mitigating Biases in NLG
As discussed in § 2, Sheng et al. (2019) show that lan-
guage models such as GPT2 exhibit biases towards certain
demographic groups in their generated texts. These biases
are often subtle and are not easily detectable using senti-
ment classifiers. Hence, they introduce the concept of re-
gard towards various demographic groups and provide a task
to detect bias, in terms of regard, in texts generated by mod-
els. They consider three sets of demographic groups for
this task: a) [“man”, “woman”], b) [“Black”, “White”], c)
[“gay”, “straight”]. These sets correspond to gender, race,
and sexual orientation, respectively. The task consists of two
types of contexts: respect and occupation. Each type has
five context sentences, and models generate texts for each
of these sentences. The respect task tests the biases in the
model’s respect towards various demographic groups, e.g.
‘The XYZ was known for’, where XYZ is replaced by any
demographic group. The occupation task tests the bias in the
model’s descriptions of occupations for different demographic
groups, e.g. ‘The XYZ worked as’, where XYZ is replaced by
any demographic group. Sheng et al. (2019) also develop an
automatic regard classifier by applying transfer learning to
BERT on a human-annotated dataset. This classifier labels
any generated sentence as negative, neutral, positive, or other,
and is shown to match human labels of regard for texts with
around 80% accuracy. We use this
regard classifier in our experiments for fairness in NLG.
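As an illustration of how the bias-probing prompts are assembled (a minimal sketch based only on the two templates quoted above; the helper name make_prompts and the exact strings are ours, not from Sheng et al. (2019)):

```python
# Build bias-probing prompts from the two context templates quoted above.
demographic_pairs = [("man", "woman"), ("Black", "White"), ("gay", "straight")]
templates = {
    "respect": "The {} was known for",
    "occupation": "The {} worked as",
}

def make_prompts():
    prompts = []
    for context_type, template in templates.items():
        for pair in demographic_pairs:
            for group in pair:
                prompts.append((context_type, group, template.format(group)))
    return prompts

for context_type, group, prompt in make_prompts():
    print(f"[{context_type}] {prompt} ...")  # prompt fed to the language model
```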
4 Equi-Tuning
We motivate equi-tuning as a method that minimizes a dis-
tance between the features obtained by a pretrained model
and any equivariant model when the dataset contains all
the transformations from a discrete group. We show that
the solution obtained corresponds to the Reynolds opera-
tor (Sturmfels 2008) applied to the pretrained model, which
directly implies certain universality properties.
Let M : X ⊂ R^n → Y ⊂ R^m be a pretrained model.
Further, let Γ_X and Γ_Y be group actions of the group G
on X and Y, respectively. We construct a model M_G that is
equivariant to the actions of a finite group G and also minimizes
the sum of the distances between the features M(Γ_X(g, x)) and
M_G(Γ_X(g, x)) for any x, for all g ∈ G. The idea is that M_G
loses little pretrained knowledge from M while also being
equivariant to G. We assume that the group actions are well
defined, which is true for a wide range of cases, including all
cases considered in this paper. Formally, for any x ∈ X, we
want to solve the following optimization problem:
\[
\min_{M_G(x)} \ \sum_{g \in G} \big\lVert M(\Gamma_X(g, x)) - M_G(\Gamma_X(g, x)) \big\rVert_2^2
\quad \text{s.t.} \quad M_G(\Gamma_X(g, x)) = \Gamma_Y(g, M_G(x)) \ \ \text{for all } g \in G. \tag{1}
\]
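Although the closed-form solution of (1) is derived later, the group-averaging (Reynolds-operator) structure alluded to above can be sketched as follows for the c4 group acting by 90° rotations on image-shaped inputs and outputs. This is our own illustrative PyTorch sketch (the wrapper name equitune_c4 is ours), not the paper's reference implementation.

```python
import torch

def equitune_c4(M, x):
    """Group-averaged (Reynolds-operator-style) wrapper of a pretrained model M.

    Illustrative sketch for the c4 group acting by 90-degree rotations on both
    the input and the output; x has shape (batch, channels, height, width).
    """
    outputs = []
    for g in range(4):
        x_g = torch.rot90(x, k=g, dims=(-2, -1))                # Gamma_X(g, x)
        y_g = M(x_g)                                            # M(Gamma_X(g, x))
        outputs.append(torch.rot90(y_g, k=-g, dims=(-2, -1)))   # undo the action on the output
    # Averaging the aligned outputs over the group makes the wrapper c4-equivariant.
    return torch.stack(outputs).mean(dim=0)
```

The pretrained model M can then be finetuned through such a wrapper, so the resulting model is equivariant by construction while staying close to M's features.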