The boundaries of meaning: a case study in neural
machine translation
Yuri Balashov
Department of Philosophy, University of Georgia, Athens, GA 30602, USA
yuri@uga.edu
ORCID ID: 0000-0001-7369-2122
October 4, 2022
Abstract
The success of deep learning in natural language processing raises intriguing
questions about the nature of linguistic meaning and ways in which it can
be processed by natural and artificial systems. One such question has to do
with subword segmentation algorithms widely employed in language modeling,
machine translation, and other tasks since 2016. These algorithms often cut
words into semantically opaque pieces, such as ‘period’, ‘on’, ‘t’, and ‘ist’ in
‘period|on|t|ist’. The system then represents the resulting segments in a dense
vector space, which is expected to model grammatical relations among them.
This representation may in turn be used to map ‘period|on|t|ist’ (English) to
‘par|od|ont|iste’ (French). Thus, instead of being modeled at the lexical level,
translation is reformulated more generally as the task of learning the best
bilingual mapping between the sequences of subword segments of two languages;
and sometimes even between pure character sequences: ‘p|e|r|i|o|d|o|n|t|i|s|t’ →
‘p|a|r|o|d|o|n|t|i|s|t|e’. Such subword segmentations and alignments are at
work in highly efficient end-to-end machine translation systems, despite their
allegedly opaque nature. The computational value of such processes is
unquestionable. But do they have any linguistic or philosophical plausibility? I
attempt to cast light on this question by reviewing the relevant details of the
subword segmentation algorithms and by relating them to important philosophical
and linguistic debates, in the spirit of making artificial intelligence more
transparent and explainable.
Keywords: Opacity; Deep learning; Computational linguistics; Neural machine
translation; Subword segmentation
Published in Inquiry. doi:10.1080/0020174X.2022.2113429
I wish to thank the referees for the very helpful comments and suggestions. This work was
partially supported by a M. G. Michael Award from the Franklin College of Arts and Sciences at
the University of Georgia, AY 2023.
arXiv:2210.00613v1 [cs.CL] 2 Oct 2022
1 Introduction: Quine and Kaplan on the insignificance
of ‘nine’ in ‘canine’
Words can be split into smaller segments in different ways. Some of them are illus-
trated below:
(1) a. canines → canine|s (canine.PL)
b. canine → ca|nine
c. canine → can|in|e
d. canine → cani|ne
e. canine → c|a|n|i|n|e
(1a) is a typical case of morphemic segmentation dividing words into morphemes,
the smallest units of meaning contributing to the whole according to the rules of
morphosemantics. The segments in (1b – 1d), on the other hand, cut across
morpheme boundaries and are, in this sense, accidental. (1e) is the limit case of purely
orthographic or character segmentation which appears to have nothing to do with
semantics. (1b) and (1c), and perhaps (1e), are different from (1d) in that some
segments in the former, but not in the latter, are meaningful when considered on
their own: witness ‘nine’ in (1b), ‘can’ and ‘in’ in (1c), and ‘a’ (the indefinite article)
and ‘i’ (a lowercased personal pronoun) in (1e). But these items do not contribute their
usual meaning to the whole and are, for that reason, semantically inert or irrelevant.
The contexts in which they appear are usually deemed to be semantically opaque.
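For concreteness, segmentations like (1b – 1e) can be generated mechanically by cutting a string at arbitrary character offsets. The following minimal sketch is purely illustrative: the cut points are chosen by hand, not produced by any of the algorithms discussed later.

```python
def segment(word, boundaries):
    """Cut `word` at the given character offsets and return the pieces."""
    cuts = [0] + sorted(boundaries) + [len(word)]
    return [word[i:j] for i, j in zip(cuts, cuts[1:])]

# Hand-picked cut points reproducing (1b)-(1e):
print(segment("canine", [2]))              # ['ca', 'nine']
print(segment("canine", [3, 5]))           # ['can', 'in', 'e']
print(segment("canine", [4]))              # ['cani', 'ne']
print(segment("canine", range(1, 6)))      # ['c', 'a', 'n', 'i', 'n', 'e']
```

Nothing in the cutting procedure itself distinguishes the morphemic split in (1a) from the accidental ones; that distinction has to come from elsewhere.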
Cases like (1b), (1c), and (1e) were made famous by Quine (1960, §30) and
Kaplan (1969). Quine drew a stark contrast between the occurrence of singular
terms like ‘nine’ in semantically transparent contexts such as
(2) Nine is greater than seven.
and in modal and propositional-attitude contexts, which he regarded as hopelessly
opaque due to their resistance to substitution and existential generalization:
(3) Necessarily, nine is greater than seven.
(4) Frank believes that nine is greater than seven.
Thus (4) may be true (assuming Frank knows his arithmetic) and (5) false (if he is
astronomically challenged):
(5) Frank believes that the number of planets is greater than seven.
Hence, we cannot coherently speak of some number, no matter how it is designated,
that the predicate ‘λx (Frank believes that x is greater than seven)’ is true of:
(6) #(∃x) (Frank believes that x is greater than seven).
Quine motivated his pessimism about (3) – (6) by assimilating the occurrence of
‘nine’ and other similar expressions in contexts such as (3) and (4) to their occurrence
in (1b) and their analogs:
We are not unaccustomed to passing over occurrences that somehow “do not
count” — ‘mary’ in ‘summary’, ‘can’ in ‘canary’; and we can allow similarly
for all non-referential occurrences of terms, once we know what to look out for
(Quine 1960, 144).
Kaplan’s approach, in contrast, was more optimistic. Getting inspiration from
Frege’s notion of referential shift (Frege 1892), he took the occurrences of ‘nine’ in (3)
and (4) to be fully transparent but denoting, not the number nine, but themselves
(i.e. the expression ‘nine’, as in Kaplan (1969)), or their sense (in his version of
intensional logic). With the aid of additional resources this allows one to make full
sense of (3) – (6):
(3′) ∃α(∆(α, nine) & N⌜α is greater than seven⌝).
(4′) ∃α(∆(α, nine) & Frank B⌜α is greater than seven⌝).
(5′) ∃β(∆(β, nine) & ¬Frank B⌜β is greater than seven⌝).
(6′) ∃x(x is a number ∧ ∃α(∆(α, x) & Frank B⌜α is greater than seven⌝)).
where α and β range over expressions, ‘N’ and ‘B’ are sentential analogs of the
necessity and belief operators, and ‘∆’ is Church’s denotation predicate adapted by
Kaplan. One can fully expect all of (3′) – (6′) to be true.
As Kaplan notes (Kaplan 1969), this is only the first step in a good, Fregean
direction, “ripe with insight.” And his early response to Quine is just the tip of an
iceberg.1 I began with this classic exchange because it provides a useful background
and a point of reference for my case study. What really matters for it is not where
Quine and Kaplan disagree but where they agree: that no semantic sense can be
made of the occurrence of ‘nine’ in ‘canine’ — see (1b) above — or, for that matter,
of the occurrences of the subword segments in (1c – 1e). To paraphrase Kaplan,
semantic concerns — substitution, existential generalization, and contribution to
the meaning of the whole — are simply inappropriate to (1b – 1e) alike.2,3 This
seems to be a reasonable common ground.
The goal of this paper is to argue that recent developments in computational
linguistics may prompt us to be more open-minded about this common ground. As
recently noted by a leading researcher (Koehn 2020, 229),
1I.e. the ongoing debate on propositional attitude reports. For a recent overview, see Nelson
(2022).
2Presumably, neither Quine nor Kaplan would object to a standard morphosemantic analysis
of (1a).
3Quine’s take on character segmentation such as (1e) is notable in the present context. He
intimates (Quine 1960, 143–4, 189–90) that spelling or orthographic transcription may be preferable
to quotation because, unlike quotation, orthographic transcription generates not even an illusion of
transparency. I revisit character segmentation in Section 3.2
In the onslaught of deep learning on natural language processing, the linguistic
concept of the word has survived, even as concepts such as morphology or
syntax have been relegated to be just latent properties of language that can be
discovered automatically in the intermediate representations of neural network
models. But even that may fall. Maybe the atomic unit of language should
be just the consonants and vowels, or in their written form, a character in
the writing system — a letter in Latin script, a logograph or just a stroke in
Chinese.
Koehn is speaking of the semantic import of the “intermediate hidden vector
representation” of subword pieces and separate characters, such as (1a – 1e) above, not
simply of their initial encoding in the form of useful numerical indices.
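The contrast Koehn draws can be made concrete. The initial encoding maps each subword type to an arbitrary integer index carrying no semantic content; the intermediate representation associates each index with a dense real-valued vector. The sketch below is a toy illustration only: the vocabulary and the vectors (random here, learned parameters in a trained model) are invented for the purpose.

```python
import random

# Initial encoding: arbitrary integer indices, semantically inert.
vocab = {"period": 0, "on": 1, "t": 2, "ist": 3}

# Dense representation: each index is paired with a real-valued vector.
# In a trained network these vectors are learned; here they are random.
random.seed(0)
dim = 4
embeddings = [[random.uniform(-1, 1) for _ in range(dim)]
              for _ in range(len(vocab))]

tokens = ["period", "on", "t", "ist"]
vectors = [embeddings[vocab[tok]] for tok in tokens]
print(len(vectors), len(vectors[0]))  # 4 4
```

Whatever semantic import the pieces acquire, on Koehn’s picture, resides in the geometry of the learned vectors, not in the indices themselves.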
Such claims require careful examination, and the devil may be in the details.
Language translation, I submit, is a natural place to examine them. According
to the conventional wisdom, meaning representation and meaning transfer are at
the very core of translation.4 Thinkers as different as Schleiermacher, Heidegger,
Benjamin, Quine, and Davidson approached this idea from rather different angles.5
Jakobson (1959, 232) put it in a slogan: “The meaning of any linguistic sign is
its translation into some further, alternative sign.” On this view, the meaning of
‘dog’ has much, if not everything, to do with the fact that it is variously translated
as chien, Hund, and perro. But this is just a starting point. ‘dog|s’ is translated
as chien|s or chien|nes, and ‘kick the bucket’ as casser sa pipe (“break his pipe”).
Signs, or “semantic atoms,” therefore, may be word-internal functional morphemes
such as ‘-s’, or entire idiomatic phrases; they may be smaller or larger than words.
In a broader perspective, different languages describe (model, represent) the extra-
linguistic reality (i.e. who did what to whom) in very different ways reflected in
numerous and often crosscutting typologies. For example, the “one morpheme per
word” pattern of isolating analytic languages, such as Chinese (Dawson and Phelan
2016, 171):
(7) a. [wɔ mən tan tɕin]
I PLURAL play piano
“We are playing the piano”
b. [wɔ mən tan tɕin lə]
I PLURAL play piano PAST
“We played the piano”
4The conventional wisdom has been challenged from several directions usefully characterized by
the translation scholar Rachel Weissbrod as follows: “[1] translation cannot transfer meaning; [2]
meaning is not what translators are supposed to transfer; [3] translators are authorized to create
meaning rather than transferring it; (4) translation studies is not about meaning” (Weissbrod 2018,
289). I return to the relationship between translation and meaning at the end of the paper.
5For recent discussions of their views on translation, see Rawling and Wilson (2018).
is contrasted with the morphological processes in synthetic agglutinative languages
like Turkish in which words are formed by concatenating multiple morphemes with
clean boundaries:6
(8) masalarımda (Turkish)
masa (‘desk’) + lar (plural) + ım (‘my’: possessive) + da (‘at/on’: locative)
on my desks
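Because the morpheme boundaries in (8) are clean, the surface word is nothing more than the concatenation of its morphemes. This can be illustrated with a trivial sketch (the morphemes and glosses are those of the example above):

```python
# Morphemes of Turkish 'masalarımda', with the glosses given in (8).
# Agglutinative boundaries are clean, so the surface form is plain concatenation.
morphemes = [("masa", "desk"),
             ("lar", "PLURAL"),
             ("ım", "my:POSSESSIVE"),
             ("da", "at/on:LOCATIVE")]

word = "".join(m for m, _ in morphemes)
print(word)                                # masalarımda
print("|".join(m for m, _ in morphemes))   # masa|lar|ım|da
```

In fusional or polysynthetic languages, by contrast, no such simple concatenative recipe recovers the surface form, which is what makes the typological contrast below consequential for segmentation.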
In polysynthetic languages such as Yupik, Chukchi, Sora, and Tiwi, highly complex
words may be formed by combining several stems and affixes (i.e. both lexical and
functional morphemes):7
(9) [angyaghllangyugtuq] (Yupik)
[angya-ghlla-ng-yug-tuq]
boat-AUGMENT-ACQUIRE-DESIDERATIVE-3SG
“He wants to acquire a big boat”
(10) [ŋɛnədʒdʒadarsiəm] (Sora)
[ŋɛn-ədʒ-dʒa-dar-si-əm]
I-not-received-cooked.rice-hand-you.SG
“I will not receive cooked rice from your hands”
Translating between languages of different types is far from straightforward. In
many cases it requires mapping subword sequences to word or phrase sequences and
vice versa, and sometimes mapping a single long word into an entire sentence. It
requires dealing with structures both below and above the word level whose rela-
tionship, traditionally studied in morphosyntax, may be complicated.
Even more intriguingly, Marian NMT8 — a state-of-the-art neural machine
translation engine, developed primarily by the Microsoft Translator team and widely
used in production — translates the word ‘periodontist’ from English to its nearest
neighbor French as parodontiste, with the source and target segmented and aligned
as shown below:9
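The subword segmentation step in such a pipeline can be sketched as the greedy application of an ordered merge table, in the style of byte-pair encoding. The merge list below is hypothetical, hand-crafted purely to reproduce the English segmentation from the abstract; a production system like Marian learns its merges from corpus statistics, and its actual vocabulary will differ.

```python
def apply_merges(word, merges):
    """Greedily apply an ordered list of BPE-style merges to a character sequence."""
    seq = list(word)
    for a, b in merges:
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)   # fuse the adjacent pair into one segment
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq

# Hypothetical, hand-crafted merge table (for illustration only):
merges = [("p", "e"), ("pe", "r"), ("per", "i"), ("peri", "o"),
          ("perio", "d"), ("o", "n"), ("i", "s"), ("is", "t")]
print("|".join(apply_merges("periodontist", merges)))  # period|on|t|ist
```

The French side, ‘par|od|ont|iste’, would arise in the same way from a merge table learned on French text.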
6http://www.turkishtextbook.com/adding-word-endings-agglutination
7Examples from Veselovská (2009, 47) and Dawson and Phelan (2016, 175).
8https://marian-nmt.github.io
9Segmentation and alignment based on the OPUS-CAT implementation of Marian NMT (Nieminen 2021); trained on over 100M English-French sentence pairs from the OPUS collection of multilingual corpora (https://opus.nlpl.eu).