
When a test sample (or batch of samples) arrives for inference, it carries information, such as domain information and visual information, that can be utilized in the test-time training phase. Typically, the test-time training phase fine-tunes the model, or important statistics such as prototypes, while completing the auxiliary unsupervised task. Inference on the main task is then performed with the updated model (or statistics) in the test phase. From the perspective of the setting, test-time training obeys the rule of no access to test data during training, which coincides with domain generalization. From the perspective of the test-time training process, the whole model or part of it is updated to adapt to the test sample, which coincides with domain adaptation. Test-time training therefore places very few requirements on the data and the model, yet can counter a certain degree of domain shift. The early network structure for test-time training from [12] is a multi-task training framework: one main task and one self-supervised auxiliary task share a feature extractor, and each task has an independent classifier, called a head. The self-supervised rotation-prediction task [1] is chosen as the auxiliary task for the test-time procedure, and simple multi-task training is used in the first phase. In the test-time training phase of [12], the test sample(s) is rotated to perform the auxiliary task and update the feature extractor. In the test phase, inference is performed with the updated feature extractor and the original classifier. This network structure is considered one of the most classical structures for test-time training methods, and we therefore conduct our theoretical analysis of MixTTT and ordinary TTT on this framework.
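As a concrete illustration of this framework, the following is a minimal PyTorch-style sketch of a shared feature extractor with a main classification head and a four-way rotation-prediction head [1]. The module names (`TTTNet`, `extractor`, `main_head`, `aux_head`) and layer sizes are our own placeholders, not the architecture of [12].

```python
import torch
import torch.nn as nn

class TTTNet(nn.Module):
    """Shared feature extractor with a main head and a rotation (auxiliary) head."""
    def __init__(self, feature_dim=512, num_classes=10):
        super().__init__()
        # Shared feature extractor (a small conv stack as a stand-in for the
        # backbone used in practice).
        self.extractor = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim), nn.ReLU(),
        )
        # Main-task head: standard classification.
        self.main_head = nn.Linear(feature_dim, num_classes)
        # Auxiliary head: predicts one of the four rotations {0, 90, 180, 270}.
        self.aux_head = nn.Linear(feature_dim, 4)

    def forward(self, x):
        z = self.extractor(x)
        return self.main_head(z), self.aux_head(z)

# First phase: simple multi-task training with a joint loss L = L_main + L_aux.
```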
Test-time training aims for optimization of the auxiliary task to indirectly improve the main task through the updated shared model. This can be realized in two ways. The first relies on the auxiliary task minimizing the domain discrepancy between the training set and the test samples, so that the main task naturally produces more accurate results. Most test-time training methods follow this approach, but it demands a batch of test samples. The second approach depends on the cooperation of the main task and the auxiliary task on certain datasets: the auxiliary task is expected to help dig out the inherent properties of the test sample. For example, by performing the auxiliary task, the visual information of the test sample can be better extracted, so optimizing the auxiliary task indirectly optimizes the main task. This approach normally does not restrict the number of test samples, but it requires a more delicate test-time update process to keep the two tasks well related, especially on unseen samples. As mentioned above, uncontrolled optimization of the auxiliary task causes overfitting. Moreover, when some parts of the model change too much, the static part used for the main task no longer works well on top of them.
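To make the update procedure and this mismatch risk concrete, the sketch below performs the standard (un-mixed) test-time update on the `TTTNet`-style model above: only the shared extractor is fine-tuned on the rotation loss of the incoming test sample(s), and inference then uses the updated extractor together with the static main head. The helper `rotate_batch`, the step count, and the learning rate are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Return four rotated copies of x (assumed square images) and their rotation labels."""
    rotated = torch.cat([torch.rot90(x, k, dims=(-2, -1)) for k in range(4)])
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return rotated, labels

def test_time_update(model, x_test, steps=10, lr=1e-3):
    adapted = copy.deepcopy(model)          # keep the trained model intact
    opt = torch.optim.SGD(adapted.extractor.parameters(), lr=lr)
    for _ in range(steps):                  # uncontrolled steps risk overfitting the aux task
        xr, yr = rotate_batch(x_test)
        _, aux_logits = adapted(xr)
        loss = F.cross_entropy(aux_logits, yr)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Test phase: updated extractor, original (static) main head.
    with torch.no_grad():
        main_logits, _ = adapted(x_test)
    return main_logits
```

Because only `extractor` is optimized while `main_head` stays frozen, a large drift of the extractor directly produces the mismatch between the updated and static parts discussed above.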
In this paper, we propose to utilize mixup between the training data and the test sample(s) to mitigate the above problems in the test-time procedure; a minimal sketch of this mixed update is given after the contribution list below. Our method can be applied in both the first-approach and the second-approach settings, without specific requirements on the number of test sample(s). As a result, our add-on module allows test-time training to improve the main task at full strength. In summary, our contributions are three-fold:
• We identify an important problem in test-time training: the model mismatch between the updated part and the static part that arises when accomplishing the auxiliary task.
• We show through theoretical analysis that using mixup in test-time training brings implicit control of the model change beyond that of the original test-time training.
• MixTTT can be seen as an add-on module, with no specific requirement on the number of test samples, that can further boost the performance of existing test-time training related methods.
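The sketch below illustrates the mixed test-time update referred to above: retained training samples are mixed with the incoming test sample(s) before the auxiliary rotation update, so that the adaptation step stays anchored to the training distribution. The Beta-distributed mixing coefficient follows the standard mixup convention; the function name, batch-tiling details, and hyperparameters are illustrative assumptions rather than the exact MixTTT procedure.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_test_time_step(adapted, optimizer, x_train, x_test, alpha=0.4):
    """One auxiliary update step on a mixture of training and test images."""
    lam = float(np.random.beta(alpha, alpha))
    # Tile the test sample(s) so they can be mixed with a full training batch.
    reps = -(-x_train.size(0) // x_test.size(0))          # ceiling division
    x_test_rep = x_test.repeat(reps, 1, 1, 1)[: x_train.size(0)]
    x_mix = lam * x_train + (1.0 - lam) * x_test_rep
    # Auxiliary rotation task on the mixed images (four rotated copies plus labels).
    xr = torch.cat([torch.rot90(x_mix, k, dims=(-2, -1)) for k in range(4)])
    yr = torch.arange(4, device=x_mix.device).repeat_interleave(x_mix.size(0))
    _, aux_logits = adapted(xr)
    loss = F.cross_entropy(aux_logits, yr)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the step would be called in place of the plain auxiliary update inside `test_time_update`, leaving the rest of the test-time procedure, including inference with the original main head, unchanged.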