We argue that, beyond diachronic word embeddings, there is a need for dynamic word embeddings that capture not only temporal shifts in language but also, for instance, semantic shifts between domains or regional differences. It is therefore important that such embeddings can be trained on small datasets. To this end, we propose two generalizations of Dynamic Word2Vec. Our first method, Word2Vec with Structure Constraint (W2VConstr), learns domain-specific embeddings under a regularizer derived from any given structure. This method performs well when a suitable graph structure is known a priori. For the more general case where no structure information is given, we propose our second method, Word2Vec with Structure Prediction (W2VPred), which learns domain-specific embeddings and the structure of the sub-corpora at the same time. W2VPred simultaneously solves three central problems that arise with word embedding representations:
1. Words in the sub-corpora are embedded in the same vector space, and are therefore directly comparable without post-alignment.
2. The different representations are trained simultaneously on the whole corpus as well as on the sub-corpora, which makes embeddings for both general and domain-specific words robust, due to the information exchange between sub-corpora.
3. The estimated graph structure can be used for confirmatory evaluation when a reasonable prior structure is given. Together with W2VConstr, W2VPred identifies cases where the given structure is not ideal and suggests a refined structure that leads to improved embedding performance; we call this method Word2Vec with Denoised Structure Constraint. When no structure is given, W2VPred provides insights into the structure of the sub-corpora, e.g., the similarity between authors or scientific domains.
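To make the structure regularization concrete, below is a minimal sketch of a graph-regularized objective in the spirit of Dynamic Word2Vec, assuming that each sub-corpus (slice) comes with a PPMI matrix that is factorized into a slice-specific embedding matrix, and that a non-negative weight matrix over slices couples related embeddings; all names and the exact form of the loss are illustrative, not our precise implementation.

```python
import numpy as np

def structure_regularized_loss(Y, U, W, lam=1.0, tau=1.0):
    """Sketch of a structure-regularized embedding objective.

    Y : list of (V x V) PPMI matrices, one per sub-corpus (slice)
    U : list of (V x d) embedding matrices, one per slice
    W : (S x S) non-negative structure weights between slices
        (fixed a priori for W2VConstr, learned jointly for W2VPred)
    """
    S = len(Y)
    loss = 0.0
    for s in range(S):
        # reconstruction: the slice embeddings should factorize its PPMI matrix
        loss += 0.5 * np.linalg.norm(Y[s] - U[s] @ U[s].T) ** 2
        # ridge penalty keeps the embeddings bounded
        loss += lam * np.linalg.norm(U[s]) ** 2
        # structure penalty pulls embeddings of related slices together
        for t in range(S):
            loss += tau * W[s, t] * np.linalg.norm(U[s] - U[t]) ** 2
    return loss
```

In W2VConstr the weights W are fixed to the given structure, whereas in W2VPred they are treated as additional parameters and optimized together with the embeddings, so that the learned W can be read off as the predicted sub-corpora structure.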
All our methods rely on static word embeddings, as opposed to the contextualized word embeddings that are currently in widespread use. Since we learn one representation per slice, such as a year or an author, and thus consider a much broader context than contextualized embeddings, we are able to find meaningful structure between the corresponding slices. Another main advantage is that our methods do not require any pre-training and can be run on a single GPU.
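As an illustration of point 1 above, slice-specific embeddings that live in one shared space can be compared across slices directly, e.g. via cosine similarity; the snippet below is a hypothetical usage example with illustrative names, reusing the per-slice embedding matrices from the sketch above.

```python
import numpy as np

def cross_slice_similarity(U, vocab, word, slice_a, slice_b):
    """Cosine similarity of the same word in two slices (e.g. two decades
    or two authors); no post-alignment is needed because all slice
    embeddings share one vector space."""
    i = vocab[word]                        # row index of the word
    va, vb = U[slice_a][i], U[slice_b][i]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

A low similarity then indicates that the word is used differently in the two slices.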
We test our methods on four datasets with different structures (sequences, trees and general graphs), domains (news, Wikipedia, high literature) and languages (English and German). Using numerous established evaluation methods, we show that W2VConstr and W2VPred significantly outperform baseline methods with regard to general as well as domain-specific embedding quality. We also show that W2VPred is able to predict the structure of a given corpus, outperforming all baselines. Additionally, we provide robust heuristics for selecting hyperparameters based on proxy measurements when the true structure is not known. Finally, we show how W2VPred can be used in an exploratory setting to raise novel research questions in the field of Digital Humanities. Our code is available at github.com/stephaniebrandl/domain-word-embeddings.
2 Related Work
Various approaches to track, detect and quantify semantic shifts in text over time have been proposed (Kim et al., 2014; Kulkarni et al., 2015; Hamilton et al., 2016; Zhang et al., 2016; Marjanen et al., 2019).
This research is driven by the hypothesis that semantic shifts occur, e.g., over time (Bleich et al., 2016), across viewpoints (Azarbonyad et al., 2017), in political debates (Reese and Lewis, 2009), or as a result of cultural developments (Lansdall-Welfare et al., 2017). Analysing those shifts can be crucial in political and social studies, but also in literary studies, as we show in Section 5.
Typically, methods first train individual static embeddings for different timestamps and then align them afterwards (e.g., Kulkarni et al., 2015; Hamilton et al., 2016; Kutuzov et al., 2018; Devlin et al., 2018; Jawahar and Seddah, 2019; Hofmann et al., 2020; see also the comprehensive survey by Tahmasebi et al., 2018). Other approaches, which deal with more general structures (Azarbonyad et al., 2017; Gonen et al., 2020) and more general applications (Zeng et al., 2017; Shoemark et al., 2019), also rely on post-alignment of static word embeddings (Grave et al., 2019).
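For contrast with the alignment-free approach above, such post-alignment of independently trained static embeddings is commonly performed with an orthogonal Procrustes rotation over a shared vocabulary; the sketch below is a generic illustration of that idea, not the exact procedure of any of the cited works.

```python
import numpy as np

def procrustes_align(X_source, X_target):
    """Rotate X_source onto X_target with an orthogonal map.

    X_source, X_target : (V x d) embedding matrices over a shared
    vocabulary, trained independently, e.g. on two time slices.
    """
    # SVD of the cross-covariance yields the optimal orthogonal rotation
    U, _, Vt = np.linalg.svd(X_source.T @ X_target)
    R = U @ Vt
    return X_source @ R
```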
With the rise of larger language models such as Bidirectional Encoder Representations from Transformers (BERT) and, with them, contextualized embeddings, a part of