Fine-Grained Detection of Solidarity for Women and Migrants in 155 Years of German Parliamentary Debates Aida Kostikova1 Benjamin Paassen1 Dominik Beese2


Fine-Grained Detection of Solidarity for Women and Migrants in

155 Years of German Parliamentary Debates

Aida Kostikova1, Benjamin Paassen1, Dominik Beese2,

Ole Pütz1, Gregor Wiedemann3, Steffen Eger4,5

1Bielefeld University, 2TU Darmstadt, 3Hans-Bredow-Institut,

4University of Mannheim, 5University of Technology Nuremberg

aida.kostikova@uni-bielefeld.de

Abstract

Solidarity is a crucial concept to understand social relations in societies. In this paper, we explore fine-grained solidarity frames to study solidarity towards women and migrants in German parliamentary debates between 1867 and 2022. Using 2,864 manually annotated text snippets (with a cost exceeding 18k Euro), we evaluate large language models (LLMs) like Llama 3, GPT-3.5, and GPT-4. We find that GPT-4 outperforms other LLMs, approaching human annotation quality. Using GPT-4, we automatically annotate more than 18k further instances (with a cost of around 500 Euro) across 155 years and find that solidarity with migrants outweighs anti-solidarity but that frequencies and solidarity types shift over time. Most importantly, group-based notions of (anti-)solidarity fade in favor of compassionate solidarity, focusing on the vulnerability of migrant groups, and exchange-based anti-solidarity, focusing on the lack of (economic) contribution. Our study highlights the interplay of historical events, socio-economic needs, and political ideologies in shaping migration discourse and social cohesion. We also show that powerful LLMs, if carefully prompted, can be cost-effective alternatives to human annotation for hard social scientific tasks.

1 Introduction

Solidarity is a crucial concept for understanding how societies achieve and maintain stability and cohesion (Reynolds, 2014), and it plays a critical role in shaping policies (Laitinen and Pessi, 2014). Traditionally, solidarity relied on common identity and reciprocity, potentially excluding out-groups like migrants (Hopman and Knijn, 2022) and reinforcing social boundaries and hierarchies (Anthias, 2014). However, growing diversity and increasing socio-economic, political, religious, and cultural complexities of modern societies (Kymlicka, 2020) call such traditional forms of solidarity into question. Simultaneously, recent populist

[Figure 1 (diagram): the example text “Immigrants must receive the guaranteed minimum wage just like everyone else and the same support as everyone else.” is assigned one of the high-level labels solidarity, mixed, anti-solidarity, or none (“This text expresses...”). Solidarity and anti-solidarity are each split into four subtypes, each “based on...”:
Group-based: shared identity (solidarity); perceived differences (anti-solidarity).
Exchange-based: valuing for contributions (solidarity); imbalance in contribution (anti-solidarity).
Compassionate: support for marginalized groups (solidarity); denying help to vulnerable (anti-solidarity).
Empathic: respect for diversity (solidarity); disregard of groups' differences (anti-solidarity).]

Figure 1: Annotation scheme based on Thijssen (2012). The scheme categorizes statements into solidarity, anti-solidarity, mixed, and none (high-level). At the fine-grained level, solidarity and anti-solidarity are further divided into group-based, exchange-based, compassionate, and empathic subtypes.

movements challenge policies that support equality, such as equal opportunity or reproductive rights of women (Inglehart and Norris, 2016). These evolving complexities motivate a deeper and broader study of social solidarity, namely (i) a fine-grained exploration of different forms of social solidarity to reflect its multifaceted nature (Oosterlynck and Bouchaute, 2013), and (ii) a broader historical analysis to trace its evolution from the 19th century to today (Banting and Kymlicka, 2017). In this work, we contribute to such a systematic study of solidarity by tracing fine-grained notions of solidarity and anti-solidarity towards two target groups, women and migrants, in political speech, namely German parliamentary debates from 1867 to 2022 (Walter et al., 2021). We employ the social solidarity framework by Thijssen (2012) that incorporates rational (group-based and exchange-based) and emotive

arXiv:2210.04359v3 [cs.CL] 21 Nov 2024

Gold standard label (date), followed by the translation of the original German text:

(1) Compassionate solidarity towards women (June 29, 1961)
“In connection with § 1708 BGB, the Bundestag has set the age of 18 as the limit for the obligation to provide maintenance. In the transitional provisions, this stipulation has been repealed for those who had already reached the age of 16 on January 1, 1962. My faction finds this regulation unfair, as it would exempt significant groups of people from this maintenance obligation. Especially women who have made great efforts to send their children to higher education, for example, would have to bear these costs alone. [...]”

(2) Exchange-based anti-solidarity towards migrants (Apr. 19, 2018)
“[...] Let me also add: Migration is not necessarily successful – you always act as if that is great – it can fail, and it fails in particular when the immigrants’ qualifications are low. In 2013, before the so-called refugee wave, 40 percent of immigrants from non-EU countries had no qualifications. Since the wave of refugees, stabbings have increased by 20 percent, and we have imported anti-Semitism in the country. Does this make for an outstandingly successful migration?”

(3) Mixed stance towards migrants (Feb. 2, 1982)
“[...] We must accept that in a few years we will again need a higher number of foreign workers in the Federal Republic, as Mr. Urbaniak hinted earlier. In reality, therefore, we must commit to effective integration, which admittedly requires [...] that there can be no exceptions, no alternative, regarding the recruitment stop and the prevention of illegal immigration. [...]”

(4) None case (women) (June 17, 2015)
“[...] ‘We want to be free people!’ There is probably no better phrase to open today’s debate here in the German Bundestag about the popular uprising of 1953. [...] We remember women and men who, 62 years ago, showed great courage because they wanted to change the course of their country’s development and their own lives, because they wanted to be free people.”

Table 1: Example sentences from our dataset showing (anti-)solidarity towards women/migrants. Bold text is the main sentence, the other sentences are for context. Original German texts, as well as examples of mixed stance and none, are available in Table 3 in the Appendix.

(compassionate and empathic) elements of solidarity (refer to Fig. 1 and Section 4 for more details on the typology). We focus on migrants, central for solidarity discourse in European and German politics (Thränhardt, 1993; Faist, 1994; Fröhlich, 2023; Lehr, 2015), and women as an “oppressed majority” historically marginalized from public life (Calloni, 2020). As manual annotation of (anti-)solidarity concepts in all parliamentary proceedings over this 155-year period using traditional sociological methods is practically infeasible, we explore the use of language models for this complex task. In particular, we assess the efficacy of BERT, Llama-3, GPT-3.5, and GPT-4 in detecting expressions of various (anti-)solidarity types in parliamentary texts, aiming to identify the best performing model for our large-scale analysis. From an NLP perspective, this task is semantically and pragmatically challenging because (i) expressions of (anti-)solidarity are often implied rather than explicitly stated in the text and their meaning is affected by the political and historical context in which they are made (see the examples in Table 1; Sravanthi et al., 2024); (ii) German data, especially evolving German language over 155 years, is under-represented in common training data sets for LLMs, which may affect performance (Ahuja et al., 2023; Qin et al., 2024; Liu et al., 2024); and (iii) LLMs might struggle with annotating complex sociological concepts, achieving lower quality and reliability compared to human annotators (Wang et al., 2021; Ding et al., 2022; Zhu et al., 2023; Pangakis et al., 2023).

Our contributions are: (i) We provide a human annotated training & evaluation dataset of 2,864 text snippets, which required 40+ hours weekly from 4-5 annotators over nine months, totaling an investment of approximately 18k Euro; (ii) we conduct a comparative analysis of LLMs on a complex sociological task in which pre-trained language models (esp. GPT-4) outperform the open-source model Llama-3-70B-Instruct, as well as models fine-tuned for this task (BERT, fine-tuned GPT-3.5); (iii) we provide fine-grained insights into solidarity discourse concerning migrants in Germany in the last 155 years across different political parties.

We make our code and data available at https://github.com/DominikBeese/FairGer.

2 Related work

Our work connects to (i) computational social science (CSS), (ii) analysis of political data (parliamentary debates), and (iii) the emergent field of analysis of social solidarity using NLP approaches.

NLP-based CSS. Recent CSS studies have leveraged LLMs for a variety of complex tasks. Ziems et al. (2024) conduct a comprehensive evaluation of LLMs, pointing out their weaknesses in tasks which require understanding of subjective expert taxonomies that deviate from the training data of LLMs (such as implicit hate and empathy classification). LLMs enhance text-as-data methods in social sciences, particularly in analyzing political ideology (Wu et al., 2023), but struggle with social language understanding and are often outperformed by fine-tuned models (Choi et al., 2023). Zhang et al. (2023) introduced SPARROW, a benchmark showing ChatGPT’s limitations in sociopragmatic understanding across languages. In exploring German migration debates, Blokker et al. (2020) and Zaberer et al. (2023) utilize fine-tuning of transformer-based language models to classify claims in German newspapers. Chen et al. (2022) apply LLM-based classification on German social media posts to study public controversies over the course of one decade. In contrast to these approaches, we apply LLMs to longitudinal historical data and explore it for a new challenging task, fine-grained detection of social solidarity.

Analysis of parliamentary debates using NLP tools. Abercrombie and Batista-Navarro (2020a) review 61 studies on sentiment and position-taking within parliamentary contexts, covering dictionary-based sentiment scoring, statistical machine learning, and other conventional NLP methods. In terms of specific methodologies, studies often deploy: (i) shallow classifiers, where Lai et al. (2020) use SVM, Naïve Bayes, and Logistic Regression for multilingual stance detection; (ii) deep learning approaches, with Abercrombie and Batista-Navarro (2020b) applying BERT, Al Hamoud et al. (2022) exploring LSTM variants, and Sawhney et al. (2020) introducing GPolS for political speech analysis; (iii) probabilistic models, as in Vilares and He (2017)’s Bayesian approach to identify topics and perspectives in debates. With German political debates, Müller-Hansen et al. (2021) use topic modeling to study shifts in German parliamentary discussions on coal due to changes in energy policy, while Walter et al. (2021) employ diachronic word embeddings to track antisemitic and anti-communist biases in these debates. More recently, Bornheim et al. (2023) apply Llama 2 to automate speaker attribution in German parliamentary debates from 2017-2021. Our research goes beyond this by adopting recent powerful LLMs to track changes of a specific social concept, solidarity, in plenary debates from three centuries.

Social solidarity in NLP. Previous studies of social solidarity in NLP have largely focused on social media platforms. For example, Santhanam et al. (2019) study how emojis are used to express solidarity in social media during Hurricane Irma in 2017 and the Paris terrorist attacks of November 2015. Ils et al. (2021) consider solidarity in European social media discourse around COVID-19. Eger et al. (2022) extend this work by examining how design choices, like keyword selection and language, affect assessments of solidarity changes over time. Compared to these works, we use a similar methodological setup (annotate data and infer trends), but focus on parliamentary debates instead of social media, employ a much more fine-grained sociological framework (Thijssen, 2012), and use LLMs for systematic categorization and examination of solidarity types over time.

3 Data

We obtain data from two sources: (i) Open Data, covering Bundestag (en.: federal diet) protocols from 1949 until today; and (ii) Reichstagsprotokolle, covering Reichstag (en.: imperial diet) protocols until 1945.1 We use the OCR-scanned version from Walter et al. (2021). Links to data, models, etc. used are in Appendix D. For the Reichstag data, we apply preprocessing steps similar to Walter et al. (2021) (e.g., removal of OCR artifacts), but keep German umlauts, capitalization, and punctuation. We automatically split the data into individual sittings and collect metadata like the date, period and number of each sitting, which we manually check and correct. Additionally, we remove interjections and split the text into sentences using NLTK (Bird et al., 2009), resulting in 19.1M sentences. We release this dataset of plenary protocols from German political debates (DeuParl) consisting of 9,923 sittings from 1867 to 2022 on GitHub.2

1 Volkskammer (en.: East German parliament) protocols could not be included due to lack of availability.
2 https://github.com/DominikBeese/DeuParl-v2

To select keywords, we (i) train a Word2Vec model (Mikolov et al., 2013) on our dataset to identify words with vector representations similar to Migrant (en.: migrant) and Frau (en.: woman); (ii) manually expand this list with intuitively relevant terms; (iii) from both lists, filter for those which appear at least 200 times in the dataset. This resulted in 32 keywords for Migrant and 18 keywords for Frau. These range from general terms like Migrant, Immigrant and Frau to period-specific terms, such as Vertriebene (en.: expellees) and Bürgerkriegsflüchtlinge (en.: civil war refugees), or social roles, such as Mütter (en.: mothers) and Hausfrauen (en.: housewives). See the full list of keywords and further preprocessing in Appendix A. For a detailed keyword distribution across the dataset, see Fig. 11 and Fig. 12 in the Appendix.
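The frequency filter in step (iii) can be sketched as follows. This is a minimal illustration rather than the released code: the function name `filter_keywords`, the regex tokenizer, and the case-sensitive exact-token matching are assumptions made here; only the threshold of 200 occurrences comes from the text.

```python
import re
from collections import Counter


def filter_keywords(candidates, sentences, min_count=200):
    """Keep the candidate keywords that occur at least `min_count` times.

    `candidates` is the merged list of embedding neighbours and manually
    added terms; `sentences` is an iterable of sentence strings.
    Matching is case-sensitive on whole tokens (an assumption).
    """
    wanted = set(candidates)
    counts = Counter()
    for sentence in sentences:
        # \w+ is Unicode-aware for str patterns, so umlauts stay intact.
        for token in re.findall(r"\w+", sentence):
            if token in wanted:
                counts[token] += 1
    return sorted(c for c in wanted if counts[c] >= min_count)


if __name__ == "__main__":
    toy_sentences = ["Die Frau spricht.", "Migrant und Frau.", "Die Mütter warten."]
    # A tiny threshold is used here only because the toy corpus is tiny.
    print(filter_keywords(["Frau", "Migrant", "Mütter"], toy_sentences, min_count=2))
```

On the real corpus the default threshold of 200 would apply; inflected forms would need to be listed as separate candidates under this exact-match assumption.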

Using these keywords, we extract 58k main sentences (instances) for migrants and 131k for women from DeuParl, expanding each with three preceding and three following sentences for context, resulting in a total of (i) 463k sentences (9.79M tokens) for migrants and (ii) 1.58M sentences (32.82M tokens) for women. Fig. 2 shows the number of instances over time.3 Fig. 7 in the Appendix shows yearly relative frequencies of sentences with terms related to women and migrants in the entire dataset. It is notable that both Frau and Migrant terms represent a minor fraction of the discourse, typically under 0.02. Periodic spikes in mentions likely align with historical and societal changes, such as post-WWII for Migrant.
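The extraction of a main sentence together with its three preceding and three following sentences could look roughly like the sketch below. The function name `extract_instances`, the dictionary layout, and the choice to truncate windows at the boundaries of the input list (for example, one sitting) are assumptions for illustration, not details given in the text.

```python
import re


def extract_instances(sentences, keywords, window=3):
    """Return one instance per sentence that contains a keyword.

    Each instance holds the main sentence plus up to `window` preceding
    and following sentences; windows are cut off at the list boundaries.
    """
    keyword_set = set(keywords)
    instances = []
    for i, sentence in enumerate(sentences):
        tokens = set(re.findall(r"\w+", sentence))
        if tokens & keyword_set:
            instances.append({
                "main": sentence,
                "before": sentences[max(0, i - window):i],
                "after": sentences[i + 1:i + 1 + window],
            })
    return instances


if __name__ == "__main__":
    sitting = ["a", "Die Frau b", "c", "d", "e", "f", "g"]
    for instance in extract_instances(sitting, ["Frau"]):
        print(instance)
```

Keeping the context separate from the main sentence makes it straightforward to mark which sentence is being labeled when the instance is shown to an annotator or a model.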

Figure 2: Number of instances in the Woman and Migrant dataset in each year. Fig. 7 in the Appendix illustrates the relative frequency of instances in both datasets.

3 We note that the dataset is sparse in the period from 1933 to 1949, i.e. during the NS dictatorship and the immediate after-war period until the first parliament after the war was elected in 1949.

4 Data annotation

To obtain ground truth data for model training and evaluation, we annotated 2864 instances with five annotators (all student assistants, with specializations in social science or computer science). The annotation was performed over a duration of nine months. In the first three months, we iteratively refined the annotation guidelines and monitored the inter-rater agreement (measured by Cohen’s Kappa) until inter-rater agreement converged (see Section 4.2 for exact scores) and annotators began annotating independently.

4.1 Annotation task design

For the manual annotation, we take the target sentence and three preceding and following sentences for context into account. We first select a high-level category (solidarity, anti-solidarity, mixed, none). Solidarity or anti-solidarity cases are then further distinguished into frames as defined by Thijssen and Verheyen (2022): group-based, compassionate, exchange-based, and empathic. We describe each of the included variables below.

High-level categories. Based on Lahusen and Grasso (2018) and Ils et al. (2021), we define solidarity as willingness to share resources, directly or indirectly, or support for target groups, and anti-solidarity as statements restricting resources, showing unwillingness to support, or implying exclusion of these groups. Texts with both supporting and opposing expressions are labeled mixed, while neutral or unrelated texts are labeled none.

Group-based solidarity is coded for texts em-

phasizing shared identity and common goals

among group members, whereas group-based anti-

solidarity emphasizes out-group exclusion based

on perceived differences.

Compassionate solidarity is coded for texts sup-

porting marginalized groups, emphasizing their

need for protection, while compassionate anti-

solidarity dismisses these groups by considering

them already in a good position, minimizing their
need for support or protection.

Exchange-based solidarity is coded when texts

highlight the economic contributions of “exchange

partners” and potential rewards or further contribu-

tions. Conversely, exchange-based anti-solidarity

calls for penalizing groups perceived as receiving

more than they contribute.

Empathic solidarity is coded when a speaker

expresses respect for individual differences, seeing social diversity as beneficial, while empathic

anti-solidarity arises when differences are used as

grounds for exclusion or neglect.
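The two-level label scheme described above can be written down as a small data structure. The sketch below is one possible encoding, not the authors' implementation: the class name `Annotation`, the label strings, and the rule that a subtype is required exactly for solidarity and anti-solidarity are assumptions derived from the description.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_LEVEL = {"solidarity", "anti-solidarity", "mixed", "none"}
SUBTYPES = {"group-based", "exchange-based", "compassionate", "empathic"}


@dataclass(frozen=True)
class Annotation:
    """One label: a high-level category plus an optional subtype."""
    high_level: str
    subtype: Optional[str] = None

    def __post_init__(self):
        if self.high_level not in HIGH_LEVEL:
            raise ValueError(f"unknown high-level label: {self.high_level}")
        needs_subtype = self.high_level in {"solidarity", "anti-solidarity"}
        if needs_subtype and self.subtype not in SUBTYPES:
            raise ValueError("solidarity and anti-solidarity need a valid subtype")
        if not needs_subtype and self.subtype is not None:
            raise ValueError("mixed and none do not take a subtype")

    @property
    def fine_grained(self) -> str:
        """Fine-grained label, e.g. 'compassionate solidarity' or 'none'."""
        if self.subtype is None:
            return self.high_level
        return f"{self.subtype} {self.high_level}"


if __name__ == "__main__":
    print(Annotation("solidarity", "compassionate").fine_grained)
    print(Annotation("none").fine_grained)
```

Under this encoding there are ten fine-grained labels: four subtypes for each of solidarity and anti-solidarity, plus mixed and none.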

Annotation involved elaborate explanations, with identification of (anti-)solidarity resources and highlighting expressions of (anti-)solidarity, as well as self- (speaker’s own viewpoint) and other-position (addressing or criticizing others’ viewpoints). The full annotation process and a detailed example are described in Appendix C and Fig. 15 in the Appendix, respectively.

4.2 Annotation results

(a) Annotators’ agreement (b) Annotators vs. Model

Figure 3: Fig. 3a shows the confusion matrix between

human annotators; Fig. 3b shows agreement between

the best model and human annotators on a test set from

one of the three splits. The former is aggregated over

all pairwise comparisons of annotators, thus the matrix

is symmetric.

While initial agreement levels were low, by the time annotators began working independently, they achieved a pairwise agreement with a Cohen’s Kappa of 0.42 on a fine-grained level and 0.62 on a high level. We observe three main disagreement issues in annotation: misclassification of none cases, confusion between mixed stance and anti-solidarity, and overlap within solidarity and anti-solidarity subtypes (see Fig. 6 in the Appendix). This confusion is often due to overlapping characteristics or the presence of multiple subtypes within the text; moreover, this annotation task is inherently subjective, which can lead to differing interpretations. This is further evidenced by our average agreement scores. Table 6 in the Appendix provides examples of annotator divergence, explaining why multiple labels could be correct, which gives insight into more difficult instances. However, there was almost no confusion between solidarity and anti-solidarity. We note less stability in annotator agreement before 1930, stabilizing in subsequent years (see Fig. 14a in the Appendix). Although these variations can stem from the complexities of historical language and diverse interpretations of past events, they might also stem from the unbalanced distribution of human annotated data over the decades (see Fig. 4).
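For readers unfamiliar with the agreement measure, the sketch below computes Cohen's Kappa for one pair of annotators and averages it over all pairs. It is an illustrative implementation under a simplifying assumption, namely that every annotator labeled the same items in the same order; the function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average Kappa over all annotator pairs.

    `annotations` maps an annotator id to that annotator's label list.
    """
    pairs = list(combinations(sorted(annotations), 2))
    return sum(cohens_kappa(annotations[a], annotations[b]) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy = {
        "ann1": ["solidarity", "solidarity", "none", "none"],
        "ann2": ["solidarity", "none", "none", "none"],
    }
    print(mean_pairwise_kappa(toy))
```

Because the fine-grained level has more labels than the high level, chance-corrected agreement is expected to be harder to reach there, which is consistent with the lower fine-grained score reported above.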

Our dataset comprises 2864 annotated instances, 1437 for migrants and 1427 for women. We note that anti-solidarity accounts for 13.5% of instances, being more common among migrants (12.1%) than women (1.4%) (see Table 5a in the Appendix). 368 instances in our dataset (referred to as curated) were reviewed by a social science expert to provide a reliable comparison benchmark for evaluation of our models. Other consensus mechanisms for the final labels in the human-annotated dataset, and their distribution, are shown in Table 5b in the Appendix.

Figure 4: Distribution of instances in the human anno-

tated dataset across time and target groups. See Fig. 8

in the Appendix for the plots for each group separately;

and Table 4 for the actual numbers of instances in the

human annotated dataset.

5 Models and experiments

To determine the most cost-effective model (in terms of both performance and cost) for our large-scale sociological analysis, we evaluate various models, including Llama-3-70B-Instruct, gpt-4-1106-preview, and base and instruction-finetuned gpt-3.5-turbo-0125, on the high-level and fine-grained (anti-)solidarity annotation tasks, assessing how closely they approach human-level performance. Once the quality of the models is assured, we apply the best performing model, GPT-4, at large scale to determine trends in Section 7.

Data. We use a 70/15/15 train/dev/test split for all Migrant and Woman annotated data, which gives us approx. 2000 train, 400 dev and 430 test instances. We ensure reliability of the test sets by allocating approximately 40% curated and 45% majority (labels assigned when more than half of annotators agree on the same label) labels. We create 3 random data splits, and calculate performance metrics as the average score of the 3 runs on the test sets.4
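The splitting and averaging procedure can be illustrated with a short sketch. It covers only the random 70/15/15 partition and the averaging over three splits; the allocation of curated and majority labels to the test sets is not modeled. The function names, the seeds, and the rounding of split sizes are assumptions made for this example.

```python
import random


def make_splits(items, seeds=(0, 1, 2), train=0.70, dev=0.15):
    """Create one shuffled train/dev/test partition per seed."""
    splits = []
    for seed in seeds:
        shuffled = list(items)
        random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
        n_train = round(len(shuffled) * train)
        n_dev = round(len(shuffled) * dev)
        splits.append({
            "train": shuffled[:n_train],
            "dev": shuffled[n_train:n_train + n_dev],
            "test": shuffled[n_train + n_dev:],  # remainder, roughly 15%
        })
    return splits


def mean_test_score(splits, score_fn):
    """Average `score_fn(test_set)` over all splits."""
    return sum(score_fn(split["test"]) for split in splits) / len(splits)


if __name__ == "__main__":
    instance_ids = list(range(2864))  # stand-ins for the annotated instances
    splits = make_splits(instance_ids)
    # `len` is a placeholder scorer; a real one would evaluate a model.
    print(mean_test_score(splits, len))
```

In practice `score_fn` would run a model on the test instances and return a metric, so that the reported number is the mean over the three test sets.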

These sets are fully used for training and evaluating base-

line models; for inference-based experiments with Llama-3,
