
NLP-based CSS. Recent CSS studies have lever-
aged LLMs for a variety of complex tasks. Ziems
et al. (2024) conduct a comprehensive evalua-
tion of LLMs, pointing out their weaknesses in
tasks which require understanding of subjective
expert taxonomies that deviate from the training
data of LLMs (such as implicit hate and empa-
thy classification). LLMs enhance text-as-data methods in the social sciences, particularly for analyzing political ideology (Wu et al., 2023), but struggle with social language understanding, where they are often outperformed by fine-tuned models (Choi et al., 2023). Zhang et al. (2023) introduce SPARROW,
a benchmark showing ChatGPT’s limitations in
sociopragmatic understanding across languages.
In the context of German migration debates, Blokker et al. (2020) and Zaberer et al. (2023) fine-tune transformer-based language models to classify claims in German newspapers. Chen et al.
(2022) apply LLM-based classification on German
social media posts to study public controversies
over the course of one decade. In contrast to these approaches, we apply LLMs to longitudinal historical data and explore a new, challenging task: fine-grained detection of social solidarity.
Analysis of parliamentary debates using NLP
tools. Abercrombie and Batista-Navarro (2020a)
review 61 studies on sentiment and position-taking
within parliamentary contexts, covering dictionary-
based sentiment scoring, statistical machine learn-
ing, and other conventional NLP methods. In
terms of specific methodologies, studies often de-
ploy: (i) shallow classifiers, where Lai et al. (2020)
use SVM, Naïve Bayes, and Logistic Regression
for multilingual stance detection; (ii) deep learn-
ing approaches, with Abercrombie and Batista-
Navarro (2020b) applying BERT, Al Hamoud et al.
(2022) exploring LSTM variants, and Sawhney
et al. (2020) introducing GPolS for political speech
analysis; (iii) probabilistic models, as in Vilares
and He (2017)’s Bayesian approach to identify top-
ics and perspectives in debates. With German po-
litical debates, Müller-Hansen et al. (2021) use
topic modeling to study shifts in German parlia-
mentary discussions on coal due to changes in en-
ergy policy, while Walter et al. (2021) employ di-
achronic word embeddings to track antisemitic and
anti-communist biases in these debates. More re-
cently, Bornheim et al. (2023) apply Llama 2 to automate speaker attribution in German parliamentary debates from 2017 to 2021. Our research goes beyond this by adopting recent powerful LLMs to track changes in a specific social concept, solidarity, in plenary debates spanning three centuries.
Social solidarity in NLP. Previous studies of so-
cial solidarity in NLP have largely focused on so-
cial media platforms. For example, Santhanam et al. (2019) study how emojis are used to express solidarity on social media during Hurricane Irma in 2017 and the Paris terrorist attacks of November 2015. Ils et al. (2021) consider solidarity in Eu-
ropean social media discourse around COVID-19.
Eger et al. (2022) extend this work by examining
how design choices, like keyword selection and
language, affect assessments of solidarity changes
over time. Compared to these works, we use a
similar methodological setup (annotate data and
infer trends), but focus on parliamentary debates
instead of social media, employ a much more fine-
grained sociological framework (Thijssen, 2012),
and use LLMs for systematic categorization and
examination of solidarity types over time.
3 Data
We obtain data from two sources: (i) Open Data,
covering Bundestag (en.: federal diet) protocols
from 1949 until today; and (ii) Reichstagspro-
tokolle, covering Reichstag (en.: imperial diet) protocols until 1945.¹ We use the OCR-scanned version from Walter et al. (2021). Links to the data, models, etc. used are in Appendix D. For the Reichstag data, we apply preprocessing steps similar to those of Walter et al. (2021) (e.g., removal of OCR artifacts), but
keep German umlauts, capitalization, and punctua-
tion. We automatically split the data into individual
sittings and collect metadata such as the date, period,
and number of each sitting, which we manually
check and correct. Additionally, we remove interjections and split the text into sentences using NLTK (Bird et al., 2009), resulting in 19.1M sentences. We release this dataset of plenary protocols from German political debates (DeuParl), consisting of 9,923 sittings from 1867 to 2022, on GitHub.²
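To illustrate this step, a minimal Python sketch of the interjection removal and sentence splitting follows; the per-sitting file layout and the parenthesis-based interjection pattern are simplifying assumptions rather than our exact pipeline.

# Minimal sketch (assumptions: one plain-text file per sitting; interjections
# appear in parentheses, e.g. "(Beifall bei der SPD)").
import re
from pathlib import Path

import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt")  # Punkt models used by sent_tokenize

INTERJECTION = re.compile(r"\([^)]*\)")  # hypothetical interjection pattern

def split_sitting(path: Path) -> list[str]:
    """Remove interjections and split one sitting into German sentences."""
    text = INTERJECTION.sub(" ", path.read_text(encoding="utf-8"))
    return sent_tokenize(text, language="german")

sentences = []
for sitting in sorted(Path("protocols").glob("*.txt")):  # assumed layout
    sentences.extend(split_sitting(sitting))
print(f"{len(sentences):,} sentences")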
To select keywords, we (i) train a Word2Vec
model (Mikolov et al., 2013) on our dataset to iden-
tify words with vector representations similar to
Migrant (en.: migrant) and Frau (en.: woman); (ii)
manually expand this list with intuitively relevant
terms; (iii) from both lists, we filter for those which
¹Volkskammer (en.: East German parliament) protocols could not be included due to lack of availability.
²https://github.com/DominikBeese/DeuParl-v2