model explanations. We evaluate the faithfulness of
rationales extracted using eight feature attribution
approaches and three select-then-predict models
over six text classification tasks with chronological
data splits. Our contributions are as follows:
• We find that faithfulness is not consistent under temporal concept drift for rationales extracted with feature attribution methods (e.g. it decreases or increases depending on the method), with an attention-based method demonstrating the most robust faithfulness scores across datasets;
• We empirically show that select-then-predict models can be used in asynchronous settings when they achieve performance comparable to the full-text model;
• We demonstrate that sufficiency is not a trustworthy evaluation metric for explanation faithfulness, in either synchronous or asynchronous settings.
2 Related Work
2.1 Temporal Concept Drift in NLP
Temporal model deterioration describes the drop in system performance when a system is evaluated on chronologically newer data (Jaidka et al., 2018; Gorman and Bedrick, 2019). This has been linked to changes in the data distribution, also known as concept drift in early studies (Schlimmer and Granger, 1986; Widmer and Kubat, 1993). Previous work has demonstrated the impact of temporal concept drift on model performance by assessing temporal generalization (Lazaridou et al., 2021; Søgaard et al., 2021; Agarwal and Nenkova, 2021; Röttger and Pierrehumbert, 2021). Søgaard et al. (2021) studied several factors that affect the true difference in system performance, such as temporal drift, variations in text length and adversarial data distributions. They found that temporal variation is the most important factor for performance degradation and suggested including chronological data splits in model evaluation. Chalkidis and Søgaard (2022) also noted that evaluating on random splits with the same temporal distribution as the training data consistently overestimates model performance at test time in multi-label classification problems.
Previous work on mitigating temporal concept drift includes automatically identifying semantic drift of words over time (Tsakalidis et al., 2019; Giulianelli et al., 2020; Rosin and Radinsky, 2022; Montariol et al., 2021). Efforts have also been made to mitigate the impact of temporal concept drift on model prediction performance (Lukes and Søgaard, 2018; Röttger and Pierrehumbert, 2021; Loureiro et al., 2022; Chalkidis and Søgaard, 2022) and to develop time-aware models (Dhingra et al., 2022; Rijhwani and Preotiuc-Pietro, 2020; Dhingra et al., 2021; Rosin and Radinsky, 2022). For example, both Röttger and Pierrehumbert (2021) and Loureiro et al. (2022) observed performance improvements when continuing to fine-tune their models on chronologically newer data. While the impact of temporal concept drift on model performance has received particular attention, to the best of our knowledge, no previous work has examined its impact on model explanations.
2.2 Concept Drift and Model Explanations
Poerner et al. (2018) compared explanation quality between tasks with short and long textual contexts. More recently, Chrysostomou and Aletras (2022a) studied model explanations in out-of-domain settings (i.e. under concept drift) using train and test data from different domains. Their results showed that the faithfulness of out-of-domain explanations unexpectedly increases, i.e. it outperforms the faithfulness of in-domain explanations. This is interesting given that performance degradation due to concept drift is often expected in domain adaptation (Schlimmer and Granger, 1986; Widmer and Kubat, 1993; Chan and Ng, 2006; Gama et al., 2014).
3 Extracting Explanations
We extract explanations using two standard ap-
proaches: (i) post-hoc methods; and (ii) select-
then-predict models.
3.1 Post-hoc Explanation Methods
For post-hoc explanations, we fine-tune a BERT-base model for each task on its synchronous training set and extract explanations using post-hoc feature attribution methods for all synchronous and asynchronous test sets. We use eight widely used feature attribution methods, following Chrysostomou and Aletras (2021a,b).
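To make the extraction pipeline concrete, the following is a minimal sketch of obtaining post-hoc token importance scores from a fine-tuned classifier, using attention scores (the first method listed below) as the example. It assumes a Hugging Face Transformers BERT checkpoint; the checkpoint path and the choice of last-layer [CLS] attention averaged over heads are illustrative assumptions rather than the exact configuration used in this work.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# "bert-finetuned-task" is a hypothetical checkpoint path standing in for a
# BERT-base model fine-tuned on one task's synchronous training set.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-finetuned-task", output_attentions=True
)
model.eval()

text = "an example document from the asynchronous test set"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer of shape
# (batch, num_heads, seq_len, seq_len). As one illustrative choice, take the
# last layer's attention from the [CLS] token, averaged over heads, and
# normalize it into a per-token importance distribution.
last_layer = outputs.attentions[-1]                 # (1, heads, seq, seq)
cls_attention = last_layer[0, :, 0, :].mean(dim=0)  # (seq,)
scores = cls_attention / cls_attention.sum()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in sorted(zip(tokens, scores.tolist()), key=lambda x: -x[1])[:5]:
    print(f"{tok}\t{score:.3f}")
```

The other attribution methods plug into the same pipeline: only the step that turns model internals (attention, gradients, or both) into per-token scores changes.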
• Attention (α): Token importance is computed using the corresponding normalized attention scores (Jain et al., 2020).