The boundaries of meaning: a case study in neural
machine translation
Yuri Balashov
Department of Philosophy, University of Georgia, Athens, GA 30602, USA
yuri@uga.edu
ORCID ID: 0000-0001-7369-2122
October 4, 2022
Abstract
The success of deep learning in natural language processing raises intriguing
questions about the nature of linguistic meaning and ways in which it can
be processed by natural and artificial systems. One such question has to do
with subword segmentation algorithms widely employed in language modeling,
machine translation, and other tasks since 2016. These algorithms often cut
words into semantically opaque pieces, such as ‘period’, ‘on’, ‘t’, and ‘ist’ in
‘period|on|t|ist’. The system then represents the resulting segments in a dense
vector space, which is expected to model grammatical relations among them.
This representation may in turn be used to map ‘period|on|t|ist’ (English) to
‘par|od|ont|iste’ (French). Thus, instead of being modeled at the lexical level,
translation is reformulated more generally as the task of learning the best
bilingual mapping between the sequences of subword segments of two languages;
and sometimes even between pure character sequences: ‘p|e|r|i|o|d|o|n|t|i|s|t’ →
‘p|a|r|o|d|o|n|t|i|s|t|e’. Such subword segmentations and alignments are at
work in highly efficient end-to-end machine translation systems, despite their
allegedly opaque nature. The computational value of such processes is
unquestionable. But do they have any linguistic or philosophical plausibility? I
attempt to cast light on this question by reviewing the relevant details of the
subword segmentation algorithms and by relating them to important philosophical
and linguistic debates, in the spirit of making artificial intelligence more
transparent and explainable.
Keywords: Opacity; Deep learning; Computational linguistics; Neural machine
translation; Subword segmentation
Published in Inquiry. doi:10.1080/0020174X.2022.2113429
I wish to thank the referees for the very helpful comments and suggestions. This work was
partially supported by a M. G. Michael Award from the Franklin College of Arts and Sciences at
the University of Georgia, AY 2023.
arXiv:2210.00613v1 [cs.CL] 2 Oct 2022
1 Introduction: Quine and Kaplan on the insignificance
of ‘nine’ in ‘canine’
Words can be split into smaller segments in different ways. Some of them are illus-
trated below:
(1) a. canines → canine|s (canine.PL)
b. canine → ca|nine
c. canine → can|in|e
d. canine → cani|ne
e. canine → c|a|n|i|n|e
(1a) is a typical case of morphemic segmentation dividing words into morphemes,
the smallest units of meaning contributing to the whole according to the rules of
morphosemantics. The segments in (1b – 1d), on the other hand, cut across
morpheme boundaries and are, in this sense, accidental. (1e) is the limit case of purely
orthographic or character segmentation which appears to have nothing to do with
semantics. (1b) and (1c), and perhaps (1e), are different from (1d) in that some
segments in the former, but not in the latter, are meaningful when considered on
their own: witness ‘nine’ in (1b), ‘can’ and ‘in’ in (1c), and ‘a’ (the indefinite article)
and ‘i’ (a lowercased personal pronoun) in (1e). But these items do not contribute their
usual meaning to the whole and are, for that reason, semantically inert or irrelevant.
The contexts in which they appear are usually deemed to be semantically opaque.
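For concreteness, segmentations like (1b – 1e) can be generated mechanically by cutting a string at arbitrary character offsets. The following minimal sketch is purely illustrative: the cut points are chosen by hand, not produced by any of the algorithms discussed later.

```python
def segment(word, boundaries):
    """Cut `word` at the given character offsets and return the pieces."""
    cuts = [0] + sorted(boundaries) + [len(word)]
    return [word[i:j] for i, j in zip(cuts, cuts[1:])]

# Hand-picked cut points reproducing (1b)-(1e):
print(segment("canine", [2]))              # ['ca', 'nine']
print(segment("canine", [3, 5]))           # ['can', 'in', 'e']
print(segment("canine", [4]))              # ['cani', 'ne']
print(segment("canine", range(1, 6)))      # ['c', 'a', 'n', 'i', 'n', 'e']
```

Nothing in the cutting procedure itself distinguishes the morphemic split in (1a) from the accidental ones; that distinction has to come from elsewhere.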
Cases like (1b), (1c), and (1e) were made famous by Quine (1960, §30) and
Kaplan (1969). Quine drew a stark contrast between the occurrence of singular
terms like ‘nine’ in semantically transparent contexts such as
(2) Nine is greater than seven.
and in modal and propositional-attitude contexts, which he regarded as hopelessly
opaque due to their resistance to substitution and existential generalization:
(3) Necessarily, nine is greater than seven.
(4) Frank believes that nine is greater than seven.
Thus (4) may be true (assuming Frank knows his arithmetic) and (5) false (if he is
astronomically challenged):
(5) Frank believes that the number of planets is greater than seven.
Hence, we cannot coherently speak of some number, no matter how it is designated,
that the predicate ‘λx (Frank believes that x is greater than seven)’ is true of:
(6) #(∃x) (Frank believes that x is greater than seven).
Quine motivated his pessimism about (3) – (6) by assimilating the occurrence of
‘nine’ and other similar expressions in contexts such as (3) and (4) to their occurrence
in (1b) and their analogs:
We are not unaccustomed to passing over occurrences that somehow “do not
count” — ‘mary’ in ‘summary’, ‘can’ in ‘canary’; and we can allow similarly
for all non-referential occurrences of terms, once we know what to look out for
(Quine 1960, 144).
Kaplan’s approach, in contrast, was more optimistic. Getting inspiration from
Frege’s notion of referential shift (Frege 1892), he took the occurrences of ‘nine’ in (3)
and (4) to be fully transparent but denoting, not the number nine, but themselves
(i.e. the expression ‘nine’, as in Kaplan (1969)), or their sense (in his version of
intensional logic). With the aid of additional resources this allows one to make full
sense of (3) – (6):
(3′) ∃α(∆(α, nine) & N⌜α is greater than seven⌝).
(4′) ∃α(∆(α, nine) & Frank B⌜α is greater than seven⌝).
(5′) ∃β(∆(β, nine) & ¬Frank B⌜β is greater than seven⌝).
(6′) ∃x(x is a number ∧ ∃α(∆(α, x) & Frank B⌜α is greater than seven⌝)).
where α and β range over expressions, ‘N’ and ‘B’ are sentential analogs of the
necessity and belief operators, and ‘∆’ is Church’s denotation predicate adapted by
Kaplan. One can fully expect all of (3′) – (6′) to be true.
As Kaplan notes (Kaplan 1969), this is only the first step in a good, Fregean
direction, “ripe with insight.” And his early response to Quine is just the tip of an
iceberg.1 I began with this classic exchange because it provides a useful background
and a point of reference for my case study. What really matters for it is not where
Quine and Kaplan disagree but where they agree: that no semantic sense can be
made of the occurrence of ‘nine’ in ‘canine’ — see (1b) above — or, for that matter,
of the occurrences of the subword segments in (1c – 1e). To paraphrase Kaplan,
semantic concerns — substitution, existential generalization, and contribution to
the meaning of the whole — are simply inappropriate to (1b – 1e) alike.2,3 This
seems to be a reasonable common ground.
The goal of this paper is to argue that recent developments in computational
linguistics may prompt us to be more open-minded about this common ground. As
recently noted by a leading researcher (Koehn 2020, 229),
1I.e. the ongoing debate on propositional attitude reports. For a recent overview, see Nelson
(2022).
2Presumably, neither Quine nor Kaplan would object to a standard morphosemantic analysis
of (1a).
3Quine’s take on character segmentation such as (1e) is notable in the present context. He
intimates (Quine 1960, 143–4, 189–90) that spelling or orthographic transcription may be preferable
to quotation because, unlike quotation, orthographic transcription generates not even an illusion of
transparency. I revisit character segmentation in Section 3.2
In the onslaught of deep learning on natural language processing, the linguistic
concept of the word has survived, even as concepts such as morphology or
syntax have been relegated to be just latent properties of language that can be
discovered automatically in the intermediate representations of neural network
models. But even that may fall. Maybe the atomic unit of language should
be just the consonants and vowels, or in their written form, a character in
the writing system — a letter in Latin script, a logograph or just a stroke in
Chinese.
Koehn is speaking of the semantic import of the “intermediate hidden vector
representation” of subword pieces and separate characters, such as (1a – 1e) above, not
simply of their initial encoding in the form of useful numerical indices.
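The contrast Koehn draws can be made concrete. The initial encoding maps each subword type to an arbitrary integer index carrying no semantic content; the intermediate representation associates each index with a dense real-valued vector. The sketch below is a toy illustration only: the vocabulary and the vectors (random here, learned parameters in a trained model) are invented for the purpose.

```python
import random

# Initial encoding: arbitrary integer indices, semantically inert.
vocab = {"period": 0, "on": 1, "t": 2, "ist": 3}

# Dense representation: each index is paired with a real-valued vector.
# In a trained network these vectors are learned; here they are random.
random.seed(0)
dim = 4
embeddings = [[random.uniform(-1, 1) for _ in range(dim)]
              for _ in range(len(vocab))]

tokens = ["period", "on", "t", "ist"]
vectors = [embeddings[vocab[tok]] for tok in tokens]
print(len(vectors), len(vectors[0]))  # 4 4
```

Whatever semantic import the pieces acquire, on Koehn’s picture, resides in the geometry of the learned vectors, not in the indices themselves.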
Such claims require careful examination, and the devil may be in the details.
Language translation, I submit, is a natural place to examine them. According
to the conventional wisdom, meaning representation and meaning transfer are at
the very core of translation.4 Thinkers as different as Schleiermacher, Heidegger,
Benjamin, Quine, and Davidson approached this idea from rather different angles.5
Jakobson (1959, 232) put it in a slogan: “The meaning of any linguistic sign is
its translation into some further, alternative sign.” On this view, the meaning of
‘dog’ has much, if not everything, to do with the fact that it is variously translated
as chien, Hund, and perro. But this is just a starting point. ‘dog|s’ is translated
as chien|s or chien|nes, and ‘kick the bucket’ as casser sa pipe (“break his pipe”).
Signs, or “semantic atoms,” therefore, may be word-internal functional morphemes
such as ‘-s’, or entire idiomatic phrases; they may be smaller or larger than words.
In a broader perspective, different languages describe (model, represent) the extra-
linguistic reality (i.e. who did what to whom) in very different ways reflected in
numerous and often crosscutting typologies. For example, the “one morpheme per
word” pattern of isolating analytic languages, such as Chinese (Dawson and Phelan
2016, 171):
(7) a. [wɔ mən tan tɕin]
I PLURAL play piano
“We are playing the piano”
b. [wɔ mən tan tɕin lə]
I PLURAL play piano PAST
“We played the piano”
4The conventional wisdom has been challenged from several directions usefully characterized by
the translation scholar Rachel Weissbrod as follows: “[1] translation cannot transfer meaning; [2]
meaning is not what translators are supposed to transfer; [3] translators are authorized to create
meaning rather than transferring it; (4) translation studies is not about meaning” (Weissbrod 2018,
289). I return to the relationship between translation and meaning at the end of the paper.
5For recent discussions of their views on translation, see Rawling and Wilson (2018).
is contrasted with the morphological processes in synthetic agglutinative languages
like Turkish in which words are formed by concatenating multiple morphemes with
clean boundaries:6
(8) masalarımda (Turkish)
masa (‘desk’) + lar (plural) + ım (‘my’: possessive) + da (‘at/on’: locative)
on my desks
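Because the morpheme boundaries in (8) are clean, the surface word is nothing more than the concatenation of its morphemes. This can be illustrated with a trivial sketch (the morphemes and glosses are those of the example above):

```python
# Morphemes of Turkish 'masalarımda', with the glosses given in (8).
# Agglutinative boundaries are clean, so the surface form is plain concatenation.
morphemes = [("masa", "desk"),
             ("lar", "PLURAL"),
             ("ım", "my:POSSESSIVE"),
             ("da", "at/on:LOCATIVE")]

word = "".join(m for m, _ in morphemes)
print(word)                                # masalarımda
print("|".join(m for m, _ in morphemes))   # masa|lar|ım|da
```

In fusional or polysynthetic languages, by contrast, no such simple concatenative recipe recovers the surface form, which is what makes the typological contrast below consequential for segmentation.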
In polysynthetic languages such as Yupik, Chukchi, Sora, and Tiwi, highly complex
words may be formed by combining several stems and affixes (i.e. both lexical and
functional morphemes):7
(9) [angyaghllangyugtuq] (Yupik)
[angya-ghlla-ng-yug-tuq]
boat-AUGMENT-ACQUIRE-DESIDERATIVE-3SG
“He wants to acquire a big boat”
(10) [ŋɛnədʒdʒadarsiəm] (Sora)
[ŋɛn-ədʒ-dʒa-dar-si-əm]
I-not-received-cooked.rice-hand-you.SG
“I will not receive cooked rice from your hands”
Translating between languages of different types is far from straightforward. In
many cases it requires mapping subword sequences to word or phrase sequences and
vice versa, and sometimes mapping a single long word into an entire sentence. It
requires dealing with structures both below and above the word level whose rela-
tionship, traditionally studied in morphosyntax, may be complicated.
Even more intriguingly, Marian NMT8 — a state-of-the-art neural machine
translation engine, developed primarily by the Microsoft Translator team and widely
used in production — translates the word ‘periodontist’ from English to its nearest
neighbor French as parodontiste, with the source and target segmented and aligned
as shown below:9
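The subword segmentation step in such a pipeline can be sketched as the greedy application of an ordered merge table, in the style of byte-pair encoding. The merge list below is hypothetical, hand-crafted purely to reproduce the English segmentation from the abstract; a production system like Marian learns its merges from corpus statistics, and its actual vocabulary will differ.

```python
def apply_merges(word, merges):
    """Greedily apply an ordered list of BPE-style merges to a character sequence."""
    seq = list(word)
    for a, b in merges:
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)   # fuse the adjacent pair into one segment
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq

# Hypothetical, hand-crafted merge table (for illustration only):
merges = [("p", "e"), ("pe", "r"), ("per", "i"), ("peri", "o"),
          ("perio", "d"), ("o", "n"), ("i", "s"), ("is", "t")]
print("|".join(apply_merges("periodontist", merges)))  # period|on|t|ist
```

The French side, ‘par|od|ont|iste’, would arise in the same way from a merge table learned on French text.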
6http://www.turkishtextbook.com/adding-word-endings-agglutination
7Examples from Veselovská (2009, 47) and Dawson and Phelan (2016, 175).
8https://marian-nmt.github.io
9Segmentation and alignment based on the OPUS-CAT implementation of Marian NMT (Nieminen 2021); trained on over 100M English-French sentence pairs from the OPUS collection of multilingual corpora (https://opus.nlpl.eu).