
When a test sample (or batch of samples) arrives for inference, it carries information, such as domain information and visual information, that can be utilized in the test-time training phase. Typically, the test-time training phase fine-tunes the model, or important statistics such as prototypes, while completing the auxiliary unsupervised task. Inference on the main task is then performed with the updated model (or statistics) in the test phase. From the perspective of the setting, test-time training obeys the rule of no access to test data during training, which coincides with domain generalization. From the perspective of the test-time training process, the whole model or part of it is updated to adapt to the test sample, which coincides with domain adaptation. Test-time training therefore places very few requirements on the data and the model, yet can counter a certain degree of domain shift. The early network structure for test-time training from [12] is a multi-task training framework: one main task and one self-supervised auxiliary task share a feature extractor, and each task has an independent classifier, called a head. The self-supervised rotation-prediction task [1] is chosen as the auxiliary task for the test-time procedure, and simple multi-task training is used in the first phase. In the test-time training phase of [12], the test sample(s) is rotated to perform the auxiliary task and update the feature extractor. In the test phase, inference is performed with the updated feature extractor and the original classifier. This network structure is considered one of the most classical structures for test-time training methods, and we therefore conduct our theoretical analysis of MixTTT and ordinary TTT on this framework.
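As a concrete illustration of this framework, the following is a minimal PyTorch-style sketch of a shared feature extractor with a main classification head and a four-way rotation-prediction head [1]. The module names (`TTTNet`, `extractor`, `main_head`, `aux_head`) and layer sizes are our own placeholders, not the architecture of [12].

```python
import torch
import torch.nn as nn

class TTTNet(nn.Module):
    """Shared feature extractor with a main head and a rotation (auxiliary) head."""
    def __init__(self, feature_dim=512, num_classes=10):
        super().__init__()
        # Shared feature extractor (a small conv stack as a stand-in for the
        # backbone used in practice).
        self.extractor = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim), nn.ReLU(),
        )
        # Main-task head: standard classification.
        self.main_head = nn.Linear(feature_dim, num_classes)
        # Auxiliary head: predicts one of the four rotations {0, 90, 180, 270}.
        self.aux_head = nn.Linear(feature_dim, 4)

    def forward(self, x):
        z = self.extractor(x)
        return self.main_head(z), self.aux_head(z)

# First phase: simple multi-task training with a joint loss L = L_main + L_aux.
```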
Test-time training aims for optimization of the auxiliary task to indirectly improve the main task through the updated shared model. This can be realized in two ways. The first relies on the auxiliary task minimizing the domain discrepancy between the training set and the test samples, so that the main task naturally produces more accurate results. Most test-time training methods follow this approach, but it demands a batch of test samples. The second approach depends on the cooperation of the main task and the auxiliary task on certain datasets: the auxiliary task is expected to help dig out the inherent properties of the test sample. For example, by performing the auxiliary task, the visual information of the test sample can be better extracted, so optimizing the auxiliary task indirectly optimizes the main task. This approach normally does not restrict the number of test samples, but it requires a more delicate test-time update process to keep the two tasks well related, especially on unseen samples. As mentioned above, uncontrolled optimization of the auxiliary task causes overfitting. Moreover, when some parts of the model change too much, the static part used for the main task no longer works well on top of them.
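To make the update procedure and this mismatch risk concrete, the sketch below performs the standard (un-mixed) test-time update on the `TTTNet`-style model above: only the shared extractor is fine-tuned on the rotation loss of the incoming test sample(s), and inference then uses the updated extractor together with the static main head. The helper `rotate_batch`, the step count, and the learning rate are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Return four rotated copies of x (assumed square images) and their rotation labels."""
    rotated = torch.cat([torch.rot90(x, k, dims=(-2, -1)) for k in range(4)])
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return rotated, labels

def test_time_update(model, x_test, steps=10, lr=1e-3):
    adapted = copy.deepcopy(model)          # keep the trained model intact
    opt = torch.optim.SGD(adapted.extractor.parameters(), lr=lr)
    for _ in range(steps):                  # uncontrolled steps risk overfitting the aux task
        xr, yr = rotate_batch(x_test)
        _, aux_logits = adapted(xr)
        loss = F.cross_entropy(aux_logits, yr)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Test phase: updated extractor, original (static) main head.
    with torch.no_grad():
        main_logits, _ = adapted(x_test)
    return main_logits
```

Because only `extractor` is optimized while `main_head` stays frozen, a large drift of the extractor directly produces the mismatch between the updated and static parts discussed above.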
In this paper, we propose to utilize mixup between the training data and the test sample(s) to mitigate the above problems in the test-time procedure; a minimal sketch of this mixed update is given after the contribution list below. Our method can be applied in both the first-approach and the second-approach settings, without specific requirements on the number of test sample(s). As a result, our add-on module allows test-time training to improve the main task at full strength. In summary, our contributions are three-fold:
• We identify an important problem in test-time training: the model mismatch between the updated part and the static part that arises when accomplishing the auxiliary task.
• We show through theoretical analysis that using mixup in test-time training brings implicit control of the model change beyond that of the original test-time training.
• MixTTT can be seen as an add-on module, with no specific requirement on the number of test samples, that can further boost the performance of existing test-time training related methods.
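The sketch below illustrates the mixed test-time update referred to above: retained training samples are mixed with the incoming test sample(s) before the auxiliary rotation update, so that the adaptation step stays anchored to the training distribution. The Beta-distributed mixing coefficient follows the standard mixup convention; the function name, batch-tiling details, and hyperparameters are illustrative assumptions rather than the exact MixTTT procedure.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_test_time_step(adapted, optimizer, x_train, x_test, alpha=0.4):
    """One auxiliary update step on a mixture of training and test images."""
    lam = float(np.random.beta(alpha, alpha))
    # Tile the test sample(s) so they can be mixed with a full training batch.
    reps = -(-x_train.size(0) // x_test.size(0))          # ceiling division
    x_test_rep = x_test.repeat(reps, 1, 1, 1)[: x_train.size(0)]
    x_mix = lam * x_train + (1.0 - lam) * x_test_rep
    # Auxiliary rotation task on the mixed images (four rotated copies plus labels).
    xr = torch.cat([torch.rot90(x_mix, k, dims=(-2, -1)) for k in range(4)])
    yr = torch.arange(4, device=x_mix.device).repeat_interleave(x_mix.size(0))
    _, aux_logits = adapted(xr)
    loss = F.cross_entropy(aux_logits, yr)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the step would be called in place of the plain auxiliary update inside `test_time_update`, leaving the rest of the test-time procedure, including inference with the original main head, unchanged.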