Fine-Grained Detection of Solidarity for Women and Migrants in
155 Years of German Parliamentary Debates
Aida Kostikova1, Benjamin Paassen1, Dominik Beese2,
Ole Pütz1, Gregor Wiedemann3, Steffen Eger4,5
1Bielefeld University, 2TU Darmstadt, 3Hans-Bredow-Institut,
4University of Mannheim, 5University of Technology Nuremberg
aida.kostikova@uni-bielefeld.de
Abstract
Solidarity is a crucial concept to understand
social relations in societies. In this paper, we
explore fine-grained solidarity frames to study
solidarity towards women and migrants in Ger-
man parliamentary debates between 1867 and
2022. Using 2,864 manually annotated text
snippets (with a cost exceeding 18k Euro), we
evaluate large language models (LLMs) like
Llama 3, GPT-3.5, and GPT-4. We find that
GPT-4 outperforms other LLMs, approaching
human annotation quality. Using GPT-4, we
automatically annotate more than 18k further
instances (with a cost of around 500 Euro)
across 155 years and find that solidarity with
migrants outweighs anti-solidarity but that fre-
quencies and solidarity types shift over time.
Most importantly, group-based notions of (anti-
)solidarity fade in favor of compassionate soli-
darity, focusing on the vulnerability of migrant
groups, and exchange-based anti-solidarity, fo-
cusing on the lack of (economic) contribution.
Our study highlights the interplay of histori-
cal events, socio-economic needs, and politi-
cal ideologies in shaping migration discourse
and social cohesion. We also show that power-
ful LLMs, if carefully prompted, can be cost-
effective alternatives to human annotation for
hard social scientific tasks.
1 Introduction
Solidarity is a crucial concept for understanding
how societies achieve and maintain stability and
cohesion (Reynolds, 2014), and it plays a critical
role in shaping policies (Laitinen and Pessi, 2014).
Traditionally, solidarity relied on common identity
and reciprocity, potentially excluding out-groups
like migrants (Hopman and Knijn, 2022) and
reinforcing social boundaries and hierarchies
(Anthias, 2014). However, growing diversity and
increasing socio-economic, political, religious, and
cultural complexities of modern societies
(Kymlicka, 2020) call such traditional forms of
solidarity into question. Simultaneously, recent populist
[Figure 1 (diagram): the example text "Immigrants must receive the guaranteed minimum wage just like everyone else and the same support as everyone else." is assigned one of the high-level labels Solidarity, Mixed, Anti-solidarity, or None. Solidarity and anti-solidarity are split into four subtypes:
Group-based: solidarity based on shared identity; anti-solidarity based on perceived differences.
Exchange-based: solidarity based on valuing for contributions; anti-solidarity based on imbalance in contribution.
Compassionate: solidarity based on support for marginalized groups; anti-solidarity based on denying help to vulnerable.
Empathic: solidarity based on respect for diversity; anti-solidarity based on disregard of groups' differences.]
Figure 1: Annotation scheme based on Thijssen (2012).
The scheme categorizes statements into solidarity, anti-solidarity,
mixed, and none (high-level). At the fine-grained
level, solidarity and anti-solidarity are further
divided into group-based, exchange-based, compassionate,
and empathic subtypes.
movements challenge policies that support equality,
such as equal opportunity or reproductive rights of
women (Inglehart and Norris, 2016). These evolving
complexities motivate a deeper and broader
study of social solidarity, namely (i) a fine-grained
exploration of different forms of social solidarity
to reflect its multifaceted nature (Oosterlynck and
Bouchaute, 2013), and (ii) a broader historical analysis
to trace its evolution from the 19th century to
today (Banting and Kymlicka, 2017). In this work,
we contribute to such a systematic study of solidarity
by tracing fine-grained notions of solidarity and
anti-solidarity towards two target groups, women
and migrants, in political speech, namely German
parliamentary debates from 1867 to 2022 (Walter
et al., 2021). We employ the social solidarity framework
by Thijssen (2012) that incorporates rational
(group-based and exchange-based) and emotive
arXiv:2210.04359v3 [cs.CL] 21 Nov 2024
Table 1 (columns: gold standard label and date; translation of the original German text)

(1) Compassionate solidarity towards women (June 29, 1961)
“In connection with § 1708 BGB, the Bundestag has set the age of 18 as the limit for the obligation to provide maintenance. In the transitional provisions, this stipulation has been repealed for those who had already reached the age of 16 on January 1, 1962. My faction finds this regulation unfair, as it would exempt significant groups of people from this maintenance obligation. Especially women who have made great efforts to send their children to higher education, for example, would have to bear these costs alone. [...]”

(2) Exchange-based anti-solidarity towards migrants (Apr. 19, 2018)
“[...] Let me also add: Migration is not necessarily successful – you always act as if that is great – it can fail, and it fails in particular when the immigrants’ qualifications are low. In 2013, before the so-called refugee wave, 40 percent of immigrants from non-EU countries had no qualifications. Since the wave of refugees, stabbings have increased by 20 percent, and we have imported anti-Semitism in the country. Does this make for an outstandingly successful migration?”

(3) Mixed stance towards migrants (Feb. 2, 1982)
“[...] We must accept that in a few years we will again need a higher number of foreign workers in the Federal Republic, as Mr. Urbaniak hinted earlier. In reality, therefore, we must commit to effective integration, which admittedly requires [...] that there can be no exceptions, no alternative, regarding the recruitment stop and the prevention of illegal immigration. [...]”

(4) None case (women) (June 17, 2015)
“[...] ‘We want to be free people!’ There is probably no better phrase to open today’s debate here in the German Bundestag about the popular uprising of 1953. [...] We remember women and men who, 62 years ago, showed great courage because they wanted to change the course of their country’s development and their own lives, because they wanted to be free people.”

Table 1: Example sentences from our dataset showing (anti-)solidarity towards women/migrants. Bold text is the
main sentence, the other sentences are for context. Original German texts, as well as examples of mixed stance and
none, are available in Table 3 in the Appendix.
(compassionate and empathic) elements of solidarity
(see Fig. 1 and Section 4 for more details
on the typology). We focus on migrants, who are central to
solidarity discourse in European and German politics
(Thränhardt, 1993; Faist, 1994; Fröhlich, 2023;
Lehr, 2015), and on women as an “oppressed majority”
historically marginalized from public life (Calloni,
2020). As manual annotation of (anti-)solidarity
concepts in all parliamentary proceedings over this
155-year period using traditional sociological methods
is practically infeasible, we explore the use of
language models for this complex task. In particular,
we assess the efficacy of BERT, Llama-3,
GPT-3.5, and GPT-4 in detecting expressions of various
(anti-)solidarity types in parliamentary texts,
aiming to identify the best-performing model for
our large-scale analysis. From an NLP perspective,
this task is semantically and pragmatically challenging
because (i) expressions of (anti-)solidarity
are often implied rather than explicitly stated in the
text, and their meaning is affected by the political
and historical context in which they are made (see
the examples in Table 1; Sravanthi et al., 2024); (ii)
German data, especially the evolving German language
over 155 years, is under-represented in common
training data sets for LLMs, which may affect performance
(Ahuja et al., 2023; Qin et al., 2024; Liu
et al., 2024); and (iii) LLMs might struggle with annotating
complex sociological concepts, achieving
lower quality and reliability compared to human
annotators (Wang et al., 2021; Ding et al., 2022;
Zhu et al., 2023; Pangakis et al., 2023).
Our contributions are: (i) We provide a human-annotated
training and evaluation dataset of 2,864
text snippets, which required 40+ hours weekly
from 4-5 annotators over nine months, totaling an
investment of approximately 18k Euro; (ii) we conduct
a comparative analysis of LLMs on a complex
sociological task in which pre-trained language
models (esp. GPT-4) outperform the open-source
model Llama-3-70B-Instruct, as well as models
fine-tuned for this task (BERT and fine-tuned GPT-3.5);
(iii) we provide fine-grained insights into solidarity
discourse concerning migrants in Germany in the
last 155 years across different political parties.
We make our code and data available at
https://github.com/DominikBeese/FairGer.
2 Related work
Our work connects to (i) computational social science
(CSS), (ii) analysis of political data (parliamentary
debates), and (iii) the emergent field of
analysis of social solidarity using NLP approaches.
NLP-based CSS. Recent CSS studies have lever-
aged LLMs for a variety of complex tasks. Ziems
et al. (2024) conduct a comprehensive evalua-
tion of LLMs, pointing out their weaknesses in
tasks which require understanding of subjective
expert taxonomies that deviate from the training
data of LLMs (such as implicit hate and empathy
classification). LLMs enhance text-as-data
methods in social sciences, particularly in analyzing
political ideology (Wu et al., 2023), but
struggle with social language understanding and are
often outperformed by fine-tuned models (Choi et al.,
2023). Zhang et al. (2023) introduce SPARROW,
a benchmark showing ChatGPT’s limitations in
sociopragmatic understanding across languages.
In exploring German migration debates, Blokker
et al. (2020) and Zaberer et al. (2023) utilize fine-
tuning of transformer-based language models to
classify claims in German newspapers. Chen et al.
(2022) apply LLM-based classification on German
social media posts to study public controversies
over the course of one decade. In contrast to these
approaches, we apply LLMs to longitudinal histor-
ical data and explore it for a new challenging task,
fine-grained detection of social solidarity.
Analysis of parliamentary debates using NLP
tools. Abercrombie and Batista-Navarro (2020a)
review 61 studies on sentiment and position-taking
within parliamentary contexts, covering dictionary-
based sentiment scoring, statistical machine learn-
ing, and other conventional NLP methods. In
terms of specific methodologies, studies often de-
ploy: (i) shallow classifiers, where Lai et al. (2020)
use SVM, Naïve Bayes, and Logistic Regression
for multilingual stance detection; (ii) deep learn-
ing approaches, with Abercrombie and Batista-
Navarro (2020b) applying BERT, Al Hamoud et al.
(2022) exploring LSTM variants, and Sawhney
et al. (2020) introducing GPolS for political speech
analysis; (iii) probabilistic models, as in Vilares
and He (2017)’s Bayesian approach to identify topics
and perspectives in debates. For German political
debates, Müller-Hansen et al. (2021) use
topic modeling to study shifts in German parlia-
mentary discussions on coal due to changes in en-
ergy policy, while Walter et al. (2021) employ di-
achronic word embeddings to track antisemitic and
anti-communist biases in these debates. More re-
cently, Bornheim et al. (2023) apply Llama 2 to
automate speaker attribution in German parliamen-
tary debates from 2017-2021. Our research goes
beyond this by adopting recent powerful LLMs to
track changes of a specific social concept, solidar-
ity, in plenary debates from three centuries.
Social solidarity in NLP. Previous studies of social
solidarity in NLP have largely focused on social
media platforms. For example, Santhanam
et al. (2019) study how emojis are used to express
solidarity in social media during Hurricane Irma
in 2017 and the Paris terrorist attacks of November
2015. Ils et al. (2021) consider solidarity in European
social media discourse around COVID-19.
Eger et al. (2022) extend this work by examining
how design choices, like keyword selection and
language, affect assessments of solidarity changes
over time. Compared to these works, we use a
similar methodological setup (annotate data and
infer trends), but focus on parliamentary debates
instead of social media, employ a much more fine-grained
sociological framework (Thijssen, 2012),
and use LLMs for systematic categorization and
examination of solidarity types over time.
3 Data
We obtain data from two sources: (i) Open Data,
covering Bundestag (en.: federal diet) protocols
from 1949 until today; and (ii) Reichstagsprotokolle,
covering Reichstag (en.: imperial diet) protocols
until 1945.1 We use the OCR-scanned version
from Walter et al. (2021). Links to data, models,
etc. used are in Appendix D. For the Reichstag
data, we apply preprocessing steps similar to Walter
et al. (2021) (e.g., removal of OCR artifacts), but
keep German umlauts, capitalization, and punctuation.
We automatically split the data into individual
sittings and collect metadata like the date, period
and number of each sitting, which we manually
check and correct. Additionally, we remove interjections
and split the text into sentences using
NLTK (Bird et al., 2009), resulting in 19.1M sentences.
We release this dataset of plenary protocols
from German political debates (DeuParl), consisting
of 9,923 sittings from 1867 to 2022, on GitHub.2
1 Volkskammer (en.: Eastern German parliament) protocols
could not be included due to lack of availability.
2 https://github.com/DominikBeese/DeuParl-v2
To select keywords, we (i) train a Word2Vec
model (Mikolov et al., 2013) on our dataset to identify
words with vector representations similar to
Migrant (en.: migrant) and Frau (en.: woman); (ii)
manually expand this list with intuitively relevant
terms; (iii) from both lists, keep only those which
appear at least 200 times in the dataset. This resulted
in 32 keywords for Migrant and 18 keywords
for Frau. These range from general terms like Migrant,
Immigrant and Frau to period-specific terms, such
as Vertriebene (en.: expellees) and Bürgerkriegsflüchtlinge
(en.: civil war refugees), or social roles,
such as Mütter (en.: mothers) and Hausfrauen
(en.: housewives). See the full list of keywords
and further preprocessing in Appendix A. For a detailed
keyword distribution across the dataset, see
Fig. 11 and Fig. 12 in the Appendix.
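The frequency filter in step (iii) of the keyword selection can be illustrated with a minimal Python sketch. The function name `filter_keywords`, the regex tokenization, and the case-sensitive exact matching are assumptions made for illustration only; the matching rules actually used (e.g., for inflected forms) are not specified here. The threshold of 200 comes from the text.

```python
import re
from collections import Counter


def filter_keywords(candidates, sentences, min_count=200):
    """Keep candidate keywords that occur at least `min_count` times.

    Hypothetical helper: counts exact, case-sensitive token matches.
    """
    counts = Counter()
    for sentence in sentences:
        # \w matches Unicode letters in Python 3, so umlauts stay inside tokens.
        counts.update(re.findall(r"\w+", sentence))
    return [kw for kw in candidates if counts[kw] >= min_count]


# Toy corpus; the real input would be the DeuParl sentences.
toy_sentences = ["Die Frau und der Migrant sprachen."] * 3 + ["Die Mütter warten."]
kept = filter_keywords(["Frau", "Migrant", "Mütter"], toy_sentences, min_count=2)
print(kept)
```

On the toy corpus with a threshold of 2, only the two terms that appear three times survive; with the threshold of 200 on the full corpus, the same logic would discard rare candidates from both the embedding-based and the manually expanded list.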
Using these keywords, we extract 58k main
sentences (instances) for migrants and 131k for
women from DeuParl, expanding each with three
preceding and three following sentences for context,
resulting in a total of (i) 463k sentences
(9.79M tokens) for migrants and (ii) 1.58M sentences
(32.82M tokens) for women. Fig. 2 shows
the number of instances over time.3 Fig. 7 in the
Appendix shows yearly relative frequencies of sentences
with terms related to women and migrants
in the entire dataset. It is notable that both Frau
and Migrant terms represent a minor fraction of the
discourse, typically under 0.02. Periodic spikes in
mentions likely align with historical and societal
changes, such as post-WWII for Migrant.
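As a rough illustration of the instance extraction described above, the following Python sketch collects each keyword-bearing sentence together with up to three preceding and three following sentences. The function name `extract_instances`, the dictionary layout, the whitespace tokenization, and the truncation of context at the boundaries of a sitting are all assumptions for this example, not details confirmed by the text.

```python
def extract_instances(sentences, keywords, window=3):
    """Return one instance per keyword-bearing sentence, with its context.

    Hypothetical helper: `sentences` is the ordered sentence list of one
    sitting; context is cut off at the start and end of that list.
    """
    instances = []
    for i, sentence in enumerate(sentences):
        # Crude tokenization for the sketch: strip basic punctuation, split.
        tokens = set(sentence.replace(".", " ").replace(",", " ").split())
        if tokens & keywords:
            instances.append({
                "main": sentence,
                "before": sentences[max(0, i - window):i],
                "after": sentences[i + 1:i + 1 + window],
            })
    return instances


sitting = [f"Satz {n}." for n in range(1, 6)] + ["Die Frau spricht.", "Ende."]
result = extract_instances(sitting, {"Frau"})
print(result[0]["before"], result[0]["after"])
```

In this toy sitting the main sentence has a full window of three preceding sentences but only one following sentence, which shows how instances near the edge of a sitting would end up with shorter context under this assumption.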
Figure 2: Number of instances in the Woman and Migrant
datasets in each year. Fig. 7 in the Appendix
illustrates the relative frequency of instances in both
datasets.
4 Data annotation
To obtain ground truth data for model training and
evaluation, we annotated 2,864 instances with five
annotators (all student assistants, with specializations
in social science or computer science). The
annotation was performed over a duration of nine
months. In the first three months, we iteratively
refined the annotation guidelines and monitored
the inter-rater agreement (measured by Cohen’s
Kappa) until it converged (see
Section 4.2 for exact scores) and annotators began
annotating independently.
3 We note that the dataset is sparse in the period from 1933
to 1949, i.e. during the NS dictatorship and the immediate
post-war period until the first parliament after the war was
elected in 1949.
4.1 Annotation task design
For the manual annotation, we consider the target
sentence together with its three preceding and three
following sentences as context. We first select a high-level
category (solidarity, anti-solidarity, mixed, none).
Solidarity or anti-solidarity cases are then further
distinguished into frames as defined by Thijssen
and Verheyen (2022): group-based, compassionate,
exchange-based, and empathic. We describe each
of the included variables below.
High-level categories. Based on Lahusen and
Grasso (2018) and Ils et al. (2021), we define sol-
idarity as willingness to share resources, directly
or indirectly, or support for target groups, and anti-
solidarity as statements restricting resources, show-
ing unwillingness to support, or implying exclusion
of these groups. Texts with both supporting and op-
posing expressions are labeled mixed, while neutral
or unrelated texts are labeled none.
Group-based solidarity is coded for texts em-
phasizing shared identity and common goals
among group members, whereas group-based anti-
solidarity emphasizes out-group exclusion based
on perceived differences.
Compassionate solidarity is coded for texts supporting
marginalized groups, emphasizing their
need for protection, while compassionate anti-solidarity
dismisses these groups by considering
them already in a good position, minimizing their
need for support or protection.
Exchange-based solidarity is coded when texts
highlight the economic contributions of “exchange
partners” and potential rewards or further contribu-
tions. Conversely, exchange-based anti-solidarity
calls for penalizing groups perceived as receiving
more than they contribute.
Empathic solidarity is coded when a speaker
expresses respect for individual differences, see-
ing social diversity as beneficial, while empathic
anti-solidarity arises when differences are used as
grounds for exclusion or neglect.
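The two-level scheme defined above can be written down as a small data structure, which makes the constraint explicit that subtypes apply only to solidarity and anti-solidarity. This is a sketch under assumptions: the class name `Annotation`, the string labels, and the combined label format are illustrative choices, not the authors' data format.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_LEVEL = {"solidarity", "anti-solidarity", "mixed", "none"}
SUBTYPES = {"group-based", "exchange-based", "compassionate", "empathic"}


@dataclass(frozen=True)
class Annotation:
    """One label for an instance (hypothetical representation)."""
    high_level: str
    subtype: Optional[str] = None

    def __post_init__(self):
        if self.high_level not in HIGH_LEVEL:
            raise ValueError(f"unknown high-level label: {self.high_level}")
        needs_subtype = self.high_level in {"solidarity", "anti-solidarity"}
        if needs_subtype and self.subtype not in SUBTYPES:
            raise ValueError("solidarity and anti-solidarity require a subtype")
        if not needs_subtype and self.subtype is not None:
            raise ValueError("mixed and none take no subtype")

    @property
    def fine_grained(self) -> str:
        """Combined label, e.g. 'compassionate solidarity' or 'mixed'."""
        if self.subtype is None:
            return self.high_level
        return f"{self.subtype} {self.high_level}"


label = Annotation("anti-solidarity", "exchange-based")
print(label.fine_grained)
```

Read this way, the scheme yields ten fine-grained labels: four subtypes for each of solidarity and anti-solidarity, plus mixed and none.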
Annotators also provided elaborate explanations:
they identified (anti-)solidarity resources,
highlighted expressions of (anti-)solidarity, and
marked self-position (the speaker’s own viewpoint) and
other-position (addressing or criticizing others’
viewpoints). The full annotation process and a detailed
example are described in Appendix C and Fig. 15
in the Appendix, respectively.
4.2 Annotation results
(a) Annotators’ agreement (b) Annotators vs. Model
Figure 3: Fig. 3a shows the confusion matrix between
human annotators; Fig. 3b shows agreement between
the best model and human annotators on a test set from
one of the three splits. The former is aggregated over
all pairwise comparisons of annotators, thus the matrix
is symmetric.
While initial agreement levels were low, by the
time annotators began working independently, they
achieved a pairwise agreement with a Cohen’s
Kappa of 0.42 on a fine-grained level and 0.62
on a high level. We observe three main sources of
disagreement in annotation: misclassification of none
cases, confusion between mixed stance and anti-solidarity,
and overlap within solidarity and anti-solidarity
subtypes (see Fig. 6 in the Appendix).
This confusion is often due to overlapping characteristics
or the presence of multiple subtypes within
the text; moreover, this annotation task is inherently
subjective, which can lead to differing interpretations.
This is further evidenced by our average
agreement scores. Table 6 in the Appendix provides
examples of annotator divergence, explaining
why multiple labels could be correct, which
gives insight into more difficult instances. However,
there was almost no confusion between solidarity
and anti-solidarity. Annotator agreement is less
stable before 1930 and stabilizes in
subsequent years (see Fig. 14a in the Appendix).
Although these variations can stem from the complexities
of historical language and diverse interpretations
of past events, they might also stem from
the unbalanced distribution of human-annotated
data over the decades (see Fig. 4).
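For readers unfamiliar with the agreement measure, Cohen's Kappa for two annotators is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. The sketch below computes it in plain Python and averages it over all annotator pairs; the function names and the toy labels are illustrative assumptions, and the exact aggregation used for the reported scores may differ.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two annotators' label proportions.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average Kappa over all annotator pairs (assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)


ann_1 = ["solidarity", "solidarity", "none", "anti-solidarity"]
ann_2 = ["solidarity", "none", "none", "anti-solidarity"]
kappa = cohens_kappa(ann_1, ann_2)
print(round(kappa, 3))
```

In the toy example the annotators agree on 3 of 4 items (p_o = 0.75) and chance agreement is 5/16, giving a Kappa of 7/11, which illustrates how Kappa discounts raw agreement.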
Our dataset comprises 2,864 annotated instances,
1,437 for migrants and 1,427 for women. We note
that anti-solidarity accounts for 13.5% of instances
and is more common for migrants (12.1%) than for
women (1.4%) (see Table 5a in the Appendix). 368
instances in our dataset (referred to as curated)
were reviewed by a social science expert to provide
a reliable comparison benchmark for evaluation of
our models. Other consensus mechanisms for the final
labels in the human-annotated dataset and their
distribution are shown in Table 5b in the Appendix.
Figure 4: Distribution of instances in the human-annotated
dataset across time and target groups. See Fig. 8
in the Appendix for the plots for each group separately,
and Table 4 for the actual numbers of instances in the
human-annotated dataset.
5 Models and experiments
To determine the most cost-effective model
(in terms of both performance and cost) for our
large-scale sociological analysis, we evaluate various
models, including Llama-3-70B-Instruct,
gpt-4-1106-preview, and base and instruction-finetuned
gpt-3.5-turbo-0125, on the high-level
and fine-grained (anti-)solidarity annotation tasks,
assessing how closely they approach human-level
performance. Once the
quality of the models is assured, we apply the
best-performing model, GPT-4, at large scale to
determine trends in Section 7.
Data. We use a 70/15/15 train/dev/test split for all
Migrant and Woman annotated data, which gives us
approx. 2000 train, 400 dev and 430 test instances.
We ensure the reliability of each test set by allocating
approximately 40% curated and 45% majority labels
(a majority label is
assigned when more than half of the annotators agree
on the same label). We create 3 random data
splits and calculate performance metrics as the
average score of the 3 runs on the test sets.4
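The majority rule and the repeated 70/15/15 splitting can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the seeds, and the rounding are hypothetical, and the plain random shuffle does not reproduce the allocation of curated and majority labels to the test set described above.

```python
import random
from collections import Counter
from statistics import mean


def majority_label(votes):
    """Label chosen by more than half of the annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def split_70_15_15(items, seed):
    """Shuffle with a fixed seed and cut into train/dev/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.70 * len(shuffled))
    n_dev = int(0.15 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])


# Three random splits over 2,864 instance ids (seeds are arbitrary here).
splits = [split_70_15_15(range(2864), seed) for seed in (0, 1, 2)]
sizes = [tuple(len(part) for part in split) for split in splits]


def average_score(test_scores):
    """Final metric: mean of the per-split test scores."""
    return mean(test_scores)


print(sizes[0], majority_label(["solidarity", "solidarity", "mixed"]))
```

Averaging over several splits reduces the dependence of the reported score on one particular test set, which matters for a dataset of this size with rare classes.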
4 These sets are fully used for training and evaluating baseline
models; for inference-based experiments with Llama-3,