ized frequencies of the inflected forms produced
by participants – and ratings – i.e., the average
rating assigned to a given past tense form on a
well-formedness scale. They then implemented
two computational models, one rule-based and one
analogy-based, and computed the correlation between
each model's probabilities for the past tense forms of
nonce verbs and the corresponding human measures.
They found that the rule-based model more
accurately accounts for nonce word inflection.
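To make this kind of comparison concrete, the sketch below correlates model probabilities with the two human measures. It is a minimal illustration, assuming hypothetical parallel lists of scores for the same candidate past tense forms; all values and variable names are invented, and A&H's actual analysis may differ in its details.

```python
# A minimal sketch of correlating model scores with human measures.
# All data below are invented for illustration only.
from scipy.stats import pearsonr, spearmanr

model_probs = [0.82, 0.05, 0.64, 0.11]       # P(form | nonce verb) under a model
human_prod_probs = [0.75, 0.10, 0.58, 0.20]  # normalized production frequencies
human_ratings = [6.1, 2.3, 5.4, 3.0]         # mean well-formedness ratings

# Correlate model probabilities with each human measure.
r_prod, _ = pearsonr(model_probs, human_prod_probs)
rho_rating, _ = spearmanr(model_probs, human_ratings)
print(f"Pearson r vs. production probabilities: {r_prod:.3f}")
print(f"Spearman rho vs. ratings: {rho_rating:.3f}")
```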
After several years of progress for neural networks,
including state-of-the-art results on morphological
inflection (Kann and Schütze, 2016; Cotterell et al.,
2016), this debate was revisited by Kirov and
Cotterell (2018, K&C), who examined modern neural
networks. They trained a bidirectional LSTM
(Hochreiter and Schmidhuber, 1997) with attention
(Bahdanau et al., 2015) on English past tense
inflection and, in experiments quantifying model
accuracy on a held-out set of real English verbs,
showed that it addresses many of the shortcomings
pointed out by Pinker and Prince (1988). They
concluded that the LSTM is, in fact, capable of
modeling English past tense inflection.
They also applied the model to the wug experiment
from A&H and found a positive correlation with
human production probabilities that was slightly
higher than that of A&H's rule-based model.
Corkery et al. (2019, C&al.) reproduced this
experiment and additionally compared against the
average human rating that each past tense form
received in A&H's dataset. They found that the
neural network from K&C produced probabilities that
were sensitive to random initialization – showing
high variance in the resulting correlations with
humans – and typically did not correlate better than
the rule-based model from A&H. They then designed
an experiment where inflected forms were sampled
from several differently initialized models, so that
the frequencies of each form could be aggregated in
a similar fashion to the adult production
probabilities – but the results still favored A&H's
rule-based model. They hypothesized that the model's
overconfidence in the most likely inflection (i.e.,
the regular inflection class) leads to
uncharacteristically low variance in its predictions
for unknown words.
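The aggregation step C&al. describe can be sketched as follows. This is an illustrative reconstruction, not their implementation: `production_probabilities` and the `make_toy_sampler` stand-ins are invented names, and real samplers would decode from LSTMs trained with different random seeds.

```python
# Sketch: pool sampled past tense forms from several independently
# initialized models, then normalize counts into production
# probabilities, mirroring how human responses are aggregated.
from collections import Counter
import random

def production_probabilities(verb, samplers, samples_per_model=100):
    counts = Counter()
    for sample in samplers:              # one sampler per trained model
        for _ in range(samples_per_model):
            counts[sample(verb)] += 1
    total = sum(counts.values())
    return {form: count / total for form, count in counts.items()}

# Toy stand-ins for per-seed model samplers (invented for illustration).
def make_toy_sampler(seed):
    rng = random.Random(seed)
    return lambda verb: rng.choices(
        [verb + "ed", verb[:-3] + "ung"],  # e.g. "spling" -> "splinged"/"splung"
        weights=[0.8, 0.2],
    )[0]

samplers = [make_toy_sampler(seed) for seed in range(5)]
print(production_probabilities("spling", samplers))
```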
German Noun Plural
McCurdy et al. (2020a,
M&al.) applied an LSTM to the task of German
noun plural inflection to investigate a hypothesis
from Marcus et al. (1995, M95), who attributed the
outputs of neural models to their susceptibility to
the most frequent pattern observed during training,
stressing that, as a result, neural approaches fail to
learn the patterns of infrequent inflection classes.
German nouns inflect for the singular/plural
distinction. There are five plural suffixes, none of
which constitutes a regular majority: /-(e)n/, /-e/,
/-er/, /-s/, and /-∅/. M95 had built a dataset of
monosyllabic German noun wugs and investigated human
behavior when inflecting the plural form,
distinguishing between phonologically familiar
environments (Rhymes) and unfamiliar ones
(Non-Rhymes).
The German plural system, they argued, was an
important test for neural networks since it presents
multiple productive inflection rules, all of which
are minority inflection classes by frequency. This
is in contrast to the dichotomy of the regular and
irregular English past tense. M&al. collected their
own human production probabilities and ratings
for these wugs, and then compared those to LSTM
productions. Humans were prompted with each
wug preceded by the neuter determiner, both to
control for the fact that neural inflection models of
German noun plurals are sensitive to grammatical
gender (Goebel and Indefrey, 2000) and because
humans do not have a majority preference for
monosyllabic, neuter nouns (Clahsen et al., 1992).
The /-s/ inflection class, which is highly
infrequent, appears in a wide range of phonological
contexts, which has led some researchers to suggest
that it is the default class for German noun plurals,
and thus the regular inflection, despite its
infrequent use. M&al. found that it was preferred by
humans more in Non-Rhyme contexts than in Rhymes,
but the LSTM model showed the opposite preference,
undermining the hypothesis that LSTMs model human
generalization behavior. /-s/ was additionally
predicted less accurately than the other inflection
classes on a held-out test set of real noun
inflections.
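A per-class breakdown of held-out accuracy, of the kind this finding rests on, can be sketched in a few lines. The code below is a generic illustration with an invented `predict_plural` interface and toy items, not M&al.'s evaluation code.

```python
# Sketch: break held-out accuracy down by gold inflection class.
from collections import defaultdict

def accuracy_by_class(test_items, predict_plural):
    correct, total = defaultdict(int), defaultdict(int)
    for noun, gold_plural, infl_class in test_items:
        total[infl_class] += 1
        correct[infl_class] += predict_plural(noun) == gold_plural
    return {c: correct[c] / total[c] for c in total}

# Toy items and a trivial "predictor" that always appends /-e/.
items = [("Hund", "Hunde", "-e"), ("Auto", "Autos", "-s"), ("Kind", "Kinder", "-er")]
print(accuracy_by_class(items, lambda noun: noun + "e"))
# -> {'-e': 1.0, '-s': 0.0, '-er': 0.0}
```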
They found that the most frequent inflection
class in the training data for monosyllabic, neuter
contexts, /-e/, was over-generalized by the LSTM
when compared to human productions. The most
frequent class overall, /-(e)n/ (though infrequent in
the neuter context), was applied by humans quite
frequently to nonce nouns, but rarely by the LSTM.
They additionally found that /-er/, which is as
infrequent as /-s/, could be accurately predicted on
the test set, and that the null inflection /-∅/,
which is generally frequent but extremely rare in the
monosyllabic, neuter setting, was never predicted for the