
these questions on electoral programs from the German 2021 elections, comparing party similarities against a ground truth built from structured data. We find that our hypothesis is borne out: we can achieve competitive results in modelling party proximity with textual data, provided that the text representations are optimized to capture the differences across parties and normalized to follow a distribution that is appropriate for computing text similarity. More surprisingly, we find that completely unstructured data reach higher correlations than more informed settings that consider exclusively claims and/or their policy domain. We make our code and data available for replicability.¹

¹https://github.com/tceron/capture_similarity_between_political_parties.git
Paper structure. The paper is structured as follows. Section 2 provides an overview of related work. Section 3 describes the data we work with and our ground truth. Section 4 presents our modeling approach. Sections 5 and 6 discuss the experimental setup and our results. Section 7 concludes.
2 Related Work
2.1 Party Characterization
The characterization of parties is an important topic in political science, and it has previously been attempted with NLP models. Most studies, however, have focused on methods to place parties along the left-to-right ideological dimension. An early example is Laver et al. (2003), who investigate the scaling of political texts associated with parties (such as manifestos or legislative speeches) with a supervised bag-of-words approach, using position scores provided by human domain experts. Others have instead implemented unsupervised word-frequency methods for party positioning, in order to avoid picking up on biases in the annotated data and to scale up to large amounts of text from different political contexts (Slapin and Proksch, 2008). More recent studies have sought to overcome the drawbacks of word-frequency models, such as their reliance on topics and their failure to recognize synonymous words as similar: Glavaš et al. (2017) and Nanni et al. (2022), for example, combine distributional semantics methods with a graph-based score propagation algorithm to capture party positions on the left-right dimension.
Our study differs from previous ones in two main aspects. First, our aim is not to place parties along a left-to-right political dimension but to assess party similarity in a latent multidimensional space of policy positions and ideologies. Second, our focus is not on the use of specific vocabulary, but on representations of whole sentences. In other words, our proposed models work well if they manage to learn how political viewpoints are expressed at the sentence level in party manifestos.
2.2 Optimizing Text Representations for Similarity
Fine-tuning. Recent years have seen rapid advances in the area of neural language models, including models such as BERT, RoBERTa, or GPT-3 (Devlin et al., 2019; Liu et al., 2020; Brown et al., 2020). The sentence-encoding capabilities of these models make them generally applicable to text classification and similarity tasks (Cer et al., 2018). For both classification and similarity, it has been found that pre-trained models already show respectable performance, but that fine-tuning them on task-related data is crucial to optimize the models' predictions – essentially telling the model which aspects of the input matter for the task at hand.
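For similarity, this usage can be sketched as follows: a minimal illustration assuming the sentence-transformers library, where the checkpoint name and example sentences are purely illustrative. A pre-trained encoder maps sentences to vectors, and similarity is read off as the cosine between them.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint; any pre-trained sentence encoder would do.
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

sentences = [
    "We will raise the minimum wage.",       # invented example sentences
    "The minimum wage must be increased.",
    "We propose cutting corporate taxes.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Pairwise cosine similarities; the first two sentences should score highest.
print(util.cos_sim(embeddings, embeddings))
```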
On the similarity side, a well-known language model is Sentence-BERT (SBERT; Reimers and Gurevych, 2019), a siamese and triplet network based on BERT (Devlin et al., 2019) or RoBERTa (Liu et al., 2020) which aims at better encoding the similarities between sequences of text. SBERT comes with its own fine-tuning schema, which is informed by ranked pairs or triplets and tunes the text representations to respect the preferences expressed by the fine-tuning data. Of course, this raises the question of how to obtain such fine-tuning data: the study experiments both with manually annotated datasets (for entailment and paraphrasing tasks) and with heuristic document structure information, assuming that sentences from the same Wikipedia section are semantically closer and sentences from different sections are further away.
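The triplet variant of this schema might look as follows: a sketch using the sentence-transformers library, in which the `triplets` iterable is a hypothetical stand-in for whatever (anchor, positive, negative) tuples the heuristic produces.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from a pre-trained encoder; mean pooling is added automatically.
model = SentenceTransformer("bert-base-uncased")

# With the Wikipedia heuristic, the positive comes from the same section
# as the anchor and the negative from a different section.
train_examples = [
    InputExample(texts=[anchor, positive, negative])
    for anchor, positive, negative in triplets  # hypothetical iterable
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)

# Fine-tune so that anchors end up closer to positives than to negatives.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```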
Parallel results are reported by Gao et al. (2021) for their SimCSE model, which reaches even better results when fine-tuned with contrastive learning: they likewise compare a setting based on manually annotated data from an inference dataset with a heuristic setting that pairs each sentence with its own dropout-perturbed encoding as a positive example and treats the other sentences in the batch as negative examples.
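This dropout heuristic amounts to a standard contrastive (InfoNCE) objective. A minimal PyTorch sketch, where `encode` is a hypothetical sentence encoder kept in training mode so that each call applies a different dropout mask:

```python
import torch
import torch.nn.functional as F

def simcse_loss(encode, sentences, temperature=0.05):
    # Two passes over the same batch; dropout makes the views differ.
    z1 = encode(sentences)  # (batch_size, dim), first dropout view
    z2 = encode(sentences)  # (batch_size, dim), second dropout view
    # Cosine similarity between every view-1 / view-2 pair in the batch.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1)
    # Each sentence's own second view (the diagonal) is its positive;
    # all other sentences in the batch serve as in-batch negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim / temperature, labels)
```

The temperature of 0.05 follows the value used by Gao et al. (2021).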