Mixup for Test-Time Training
Bochao Zhang1
csbczhang@comp.hkbu.edu.hk
Rui Shao2
rui.shao@ntu.edu.sg
Jingda Du1
csjddu@comp.hkbu.edu.hk
PC Yuen1
pcyuen@comp.hkbu.edu.hk
1Hong Kong Baptist University
Hong Kong SAR, China
2Nanyang Technological University
Singapore
Abstract
Test-time training provides a new approach to solving the problem of domain shift.
In this framework, a test-time training phase is inserted between the training phase and
the test phase. During the test-time training phase, parts of the model are usually updated
with the test sample(s), and the updated model is then used in the test phase. However,
utilizing test samples for test-time training has some limitations. First, it can lead to
overfitting to the test-time procedure and thus hurt performance on the main task. Moreover,
updating part of the model without changing the other parts induces a mismatch problem,
which makes it hard to perform better on the main task. To relieve these problems, we
propose to use mixup in test-time training (MixTTT), which controls the change of the
model's parameters while completing the test-time procedure. We theoretically show that it
alleviates the mismatch between the updated part and the static part of the model for the
main task, acting as a specific regularization effect for test-time training. MixTTT can be
used as an add-on module in general test-time-training-based methods to further improve
their performance. Experimental results show the effectiveness of our method.
1 Introduction
In deep learning based methods, the training phase and the test phase are usually strictly
separated. For example, a dataset is typically divided into three parts: a training set, a
validation set, and a test set. Models are trained on the training set, hyperparameters are
chosen according to performance on the validation set, and the well-trained model is then
evaluated on the test set. However, a good model is expected to perform well not only on
the test set from the same dataset but also on other datasets; that is, under distribution
shifts the model should still enjoy high generalization ability and keep its high performance.
This problem is studied in many fields under different settings, such as domain adaptation,
domain generalization, and adversarial learning. Test-time training opens a new learning
paradigm and provides a new strategy to counter this problem.
Test-time training inserts a test-time training phase between the normal training phase
and the test phase. The model is first fully trained in the training phase. Then, when a test
sample (or batch of samples) arrives for inference, it carries information that can be utilized
in the test-time training phase, such as domain information and visual information. The
test-time training phase usually fine-tunes the model or important statistics (e.g. prototypes)
while completing an auxiliary unsupervised task. Finally, inference on the main task is
performed with the updated model (or statistics) in the test phase. From the setting point of
view, test-time training obeys the rule of no access to test data during training, which
coincides with domain generalization. From the process point of view, the whole model or
part of it is updated to adapt to the test sample, which coincides with domain adaptation.
Test-time training thus places very few requirements on the data and model, yet can counter
a certain degree of domain shift. The early network structure of test-time training from [12]
is a multi-task training framework: a main task and a self-supervised auxiliary task share a
feature extractor, and each task has an independent classifier, called a head. The
self-supervised rotation-prediction task [1] is chosen as the auxiliary task for the test-time
procedure, and simple multi-task training is used for the first phase. In the test-time
training phase of [12], the test sample(s) is rotated to perform the auxiliary task and update
the feature extractor. In the test phase, inference is performed with the updated feature
extractor and the original classifier. This network structure is considered one of the most
classical structures for test-time training methods; therefore, we conduct our theoretical
analysis of MixTTT and ordinary TTT based on this framework.
© 2022. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.
arXiv:2210.01640v1 [cs.LG] 4 Oct 2022
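The classical framework above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the names (Encoder, main_head, aux_head, ttt_step), the toy network, and the shapes are our own assumptions; only the structure (shared extractor, two heads, rotation-based test-time update of the extractor alone) follows the description of [12].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Shared feature extractor used by both the main and auxiliary task."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
    def forward(self, x):
        return self.pool(F.relu(self.conv(x))).flatten(1)

main_head = nn.Linear(8, 10)  # main-task classifier head (10 toy classes)
aux_head = nn.Linear(8, 4)    # rotation head: predicts 0/90/180/270 degrees

def rotate_batch(x):
    """Build the four rotations of each image plus the rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

def ttt_step(encoder, x_test, lr=1e-2):
    """One test-time update: fine-tune only the encoder on the
    self-supervised rotation task; both heads stay fixed."""
    opt = torch.optim.SGD(encoder.parameters(), lr=lr)
    xr, yr = rotate_batch(x_test)
    loss = F.cross_entropy(aux_head(encoder(xr)), yr)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

encoder = Encoder()
x = torch.randn(2, 3, 16, 16)      # incoming test sample(s)
aux_loss = ttt_step(encoder, x)    # test-time training phase
logits = main_head(encoder(x))     # test phase: updated encoder, original head
```

Note that the optimizer only holds the encoder's parameters, so the auxiliary update leaves the main head untouched, which is exactly the source of the mismatch problem discussed above.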
Test-time training aims for optimization of the auxiliary task to indirectly improve the
main task through the updated shared model. This can be realized in two ways. The first
relies on the auxiliary task minimizing the domain discrepancy between the training set and
the test samples; the main task then naturally produces more accurate results. Most test-time
training methods follow this approach, but in such cases a batch of test samples is demanded.
The second approach depends on the cooperation of the main task and the auxiliary task on
certain datasets. The auxiliary task is expected to help uncover the inherent properties of
the test sample; for example, by performing the auxiliary task, visual information of the test
sample can be better extracted, so that optimizing the auxiliary task indirectly optimizes the
main task. This approach normally does not restrict the number of test samples, but it
requires a more delicate test-time update process to preserve the good relation between the
two tasks, especially on unseen samples. As mentioned above, uncontrolled optimization of
the auxiliary task causes overfitting. Moreover, if some parts of the model change too much,
the static part used for the main task will no longer work well on top of them.
In this paper, we propose to utilize mixup between training data and the test sample(s)
to mitigate the above problems in the test-time procedure. Our method can be applied in
both the first-approach setting and the second-approach setting, without any specific
requirement on the number of test samples. As a result, our add-on module allows test-time
training to improve the main task at full strength. In summary, our contribution is threefold:
• We identify an important problem in test-time training: the model mismatch between
the updated part and the static part when accomplishing the auxiliary task.
• We show through theoretical analysis that using mixup in test-time training brings
implicit control of model change beyond original test-time training.
• MixTTT can be used as an add-on module, without any specific requirement on the
number of test samples, which can further boost the performance of existing test-time
training related methods.
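The core mixing step behind MixTTT can be sketched as follows. This is only an illustration of the idea stated above, mixing stored training data with the incoming test sample before the test-time update, so that the update input stays close to the training distribution and parameter change is implicitly limited; the function name, the Beta-distributed coefficient, and the toy shapes are our assumptions, not the paper's exact recipe.

```python
import numpy as np

def mix_with_training(x_train, x_test, alpha=0.2, rng=None):
    """Convex-combine a stored training sample with the test sample.

    lam ~ Beta(alpha, alpha), as in standard mixup; the mixed input is
    what the test-time update would see instead of the raw test sample.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x_train + (1.0 - lam) * x_test
    return x_mix, lam

x_train = np.zeros((3, 8, 8))  # a stored training image (toy values)
x_test = np.ones((3, 8, 8))    # the incoming test image (toy values)
x_mix, lam = mix_with_training(x_train, x_test)
```

Because the mixed input lies on the segment between the two samples, the auxiliary-task gradient is computed at a point partway back toward the training domain, which is the implicit control on model change that the theoretical analysis formalizes.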
2 Related Work
2.1 Test-time training
Test-time training [12] opens a new learning paradigm for solving the domain shift problem.
The underlying belief is that when a test sample arrives for inference, it would be wasteful
not to exploit it. The core idea is to utilize the test sample information to optimize some
parts of the model with an unsupervised auxiliary task, and then to evaluate the test sample
with the updated model on the main task. Following this idea, test-time adaptation
appeared, which usually requires many test samples to perform adaptation. Under the
test-time adaptation setting, many domain adaptation techniques can be borrowed and
utilized; in such methods, the aim of the auxiliary task is to minimize the discrepancy
between the source and target domains. [14] estimates normalization statistics and updates
affine transformations to reduce prediction entropy as the auxiliary task. [8] performs
contrastive learning on the test sample batch with its augmented versions, and additionally
aligns the mean and variance of features from the test sample batch with those of the
training set. Currently, most test-time methods belong to test-time adaptation; relatively
few papers address the single-test-sample test-time training problem. Our method fits both
the multiple-test-sample and the single-test-sample test-time procedure.
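The entropy objective that [14] minimizes at test time can be written down directly. The sketch below shows only the objective on a batch of predictions (in that method, just the normalization affine parameters are updated against it); the function names and toy logits are ours.

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def prediction_entropy(logits):
    """Mean Shannon entropy of the softmax predictions over a batch.

    Lowering this drives the model toward confident predictions on the
    test batch, which is the adaptation signal.
    """
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

confident = np.array([[8.0, 0.0, 0.0]])  # peaked prediction -> low entropy
uncertain = np.array([[1.0, 1.0, 1.0]])  # uniform prediction -> high entropy
```

A uniform prediction over K classes attains the maximum entropy log K, so minimizing this quantity only changes the model when the test batch is predicted with uncertainty.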
Test-time training itself can also be used as an add-on module in domain generalization
methods to improve performance. [4] does not literally update the model during testing;
instead, it updates prototypes in a memory bank, and the test sample is finally classified
based on the distance between its feature vector and the adjusted prototype representations.
[9] does not explicitly mention test-time training, but it shares the idea of utilizing the
test sample for optimization to obtain better inference. It aims to project the target
sample onto the source-domain manifold through an inference-time procedure, thus obtaining
a more accurate inference outcome.
Approaches from other fields, such as medical semantic segmentation [3, 5] and face anti-
spoofing [11, 15], also combine the idea of test-time training with specific domain
knowledge to solve the domain shift problem.
2.2 Mixup
The core idea of mixup [20] is that convex combinations of sample pairs and their labels
form a general vicinal distribution. Previous experience shows that samples drawn from the
vicinal distribution increase the amount of training data and relieve overfitting.
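The convex combination described above is a one-liner. The following minimal sketch follows the standard mixup recipe of [20] with a Beta(alpha, alpha) coefficient; the function name and toy arrays are illustrative.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Draw lam ~ Beta(alpha, alpha) and mix both inputs and labels."""
    rng = rng or np.random.default_rng(42)
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2  # mixed input
    y = lam * y1 + (1.0 - lam) * y2  # mixed (soft) label
    return x, y, lam

x1, x2 = np.full(4, 2.0), np.zeros(4)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot labels
x, y, lam = mixup_pair(x1, y1, x2, y2)
```

Since the labels are mixed with the same coefficient as the inputs, the soft label still sums to one, and the sampled point lies on the vicinal distribution between the two training examples.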
[13] claims that manifold mixup, as a regularization method, gives smoother decision
boundaries and better regularization during the training stage. More mixup-based
augmentation methods have gradually appeared as well: [19] further improves performance on
localization, and [6] gives a more effective strategy for cutting and mixing sample pairs.
Mixup as a data processing method is also used to solve domain generalization and domain
adaptation problems. [16] distinguishes two kinds of mixup: mixing samples from two
different domains and mixing samples from all domains; the second shows good performance
on the Visual Decathlon benchmark [10]. [17, 18] both utilize the concepts of mixup and
adversarial training for domain adaptation, but in different ways. [18] mixes source-domain
data with target-domain data, thus filling the gap between the two domains with mixed
samples, while [17] instead mixes up samples within the source domain and within the target
domain.