
Modeling the Graphotactics of Low-Resource Languages Using Sequential GANs

Isaac Wasserman

Haverford College / Haverford, PA

University of Pennsylvania / Philadelphia, PA

isaacrw@seas.upenn.edu

Abstract

Generative Adversarial Networks (GANs) have been shown to aid in the creation of artificial data in situations where large amounts of real data are difficult to come by. This issue is especially salient in computational linguistics, where researchers are often tasked with modeling the complex morphological and grammatical processes of low-resource languages. This paper discusses the implementation and testing of a GAN that attempts to model and reproduce the graphotactics of a language using only 100 example strings. These artificial, yet graphotactically compliant, strings are meant to aid in modeling the morphological inflection of low-resource languages.

1 Introduction

1.1 Task

In 2019, Anastasopoulos and Neubig attracted wide attention with the multilingual morphological inflection model for low-resource languages (Anastasopoulos and Neubig, 2019) that they submitted to the SIGMORPHON 2019 shared task (McCarthy et al., 2019). All submitted models were pretrained on high-resource languages of similar ancestry to the target language, allowing many of them to greatly exceed the performance of previous attempts at low-resource morphological inflection. However, what allowed Anastasopoulos and Neubig’s model to outperform the other submissions was its use of data “hallucination”.

To perform this hallucination, they aligned the lemma with its inflected form, extracted the stem, and generated new artificial examples by replacing this stem with randomly generated strings of equal length drawn from the language’s alphabet. (The alignment process assumes that the lemma and inflected form share a common substring.) Though this random substitution may seem haphazard, the approach yielded an additional 10% accuracy, on average, over versions of the model that used only cross-lingual transfer.
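The hallucination step described above can be sketched roughly as follows. This is a simplified illustration, not Anastasopoulos and Neubig’s implementation: it assumes the stem is simply the longest common substring of the lemma and inflected form (found with Python’s `difflib.SequenceMatcher`), and the function name `hallucinate` is hypothetical.

```python
import random
from difflib import SequenceMatcher


def hallucinate(lemma: str, inflected: str, alphabet: str,
                rng: random.Random) -> tuple[str, str]:
    """Return a new (lemma, inflected) pair with the shared stem randomized.

    Hypothetical helper. Simplifying assumption: the "stem" is the longest
    common substring of the two forms, not a proper morphological alignment.
    """
    matcher = SequenceMatcher(None, lemma, inflected, autojunk=False)
    # Match(a, b, size): start in lemma, start in inflected form, length.
    m = matcher.find_longest_match(0, len(lemma), 0, len(inflected))
    if m.size == 0:
        raise ValueError("lemma and inflected form share no common substring")
    # Random replacement of equal length, drawn uniformly from the alphabet.
    fake_stem = "".join(rng.choice(alphabet) for _ in range(m.size))
    new_lemma = lemma[:m.a] + fake_stem + lemma[m.a + m.size:]
    new_inflected = inflected[:m.b] + fake_stem + inflected[m.b + m.size:]
    return new_lemma, new_inflected


rng = random.Random(0)
print(hallucinate("walk", "walked", "abcdefghijklmnopqrstuvwxyz", rng))
```

For a pair like walk/walked, the shared substring walk is replaced by four random letters while the -ed affix is kept. The longest common substring is only a coarse stand-in for the stem: for geben/gegeben it also swallows the -en ending, which is one reason a proper alignment is preferable.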

A better-informed approach to stem generation could plausibly improve the accuracy of the inflection model further. Given the demonstrated ability of GANs to produce photorealistic yet completely contrived images, they are potentially well suited to such a task. The experiments detailed in this paper attempt to develop a technique for generating fake word stems that provide more relevant information to the inflection model of Anastasopoulos and Neubig (2019), thereby increasing the accuracy of its inflections. Modeling the graphotactics of the target language with a GAN should make it possible to produce strings that more accurately reflect the character sequences the language permits.
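The notion of a “permitted character sequence” can be made concrete with a simple proxy. The sketch below is an illustrative assumption, not a method from this paper: it scores a candidate stem by the fraction of its character bigrams, including word-initial and word-final ones, that occur in the training strings. The function names, the `#` boundary symbol, and the toy training strings are all hypothetical.

```python
from typing import Iterable

BOUNDARY = "#"  # hypothetical word-boundary symbol, assumed absent from the alphabet


def bigrams(word: str) -> list[tuple[str, str]]:
    """Adjacent character pairs of a word, padded with boundary markers."""
    padded = BOUNDARY + word + BOUNDARY
    return list(zip(padded, padded[1:]))


def attested_bigrams(words: Iterable[str]) -> set[tuple[str, str]]:
    """Every boundary-padded bigram that occurs in the training strings."""
    seen: set[tuple[str, str]] = set()
    for w in words:
        seen.update(bigrams(w))
    return seen


def bigram_coverage(candidate: str, attested: set[tuple[str, str]]) -> float:
    """Fraction of the candidate's bigrams that are attested in training.

    1.0 means every adjacent pair, including the word-initial and
    word-final ones, was seen in the training strings.
    """
    grams = bigrams(candidate)
    return sum(g in attested for g in grams) / len(grams)


training = ["kata", "taka", "mata"]  # toy strings, not real data
seen = attested_bigrams(training)
print(bigram_coverage("maka", seen))  # 1.0: every bigram is attested
print(bigram_coverage("xkqa", seen))  # 0.2: only the word-final ("a", "#") is attested
```

Uniformly random stems tend to score lower on such a measure than strings that respect the language’s graphotactics. Because it only inspects adjacent pairs, bigram coverage is at best a partial check on graphotactic compliance.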

1.2 Generative Adversarial Networks

Generative adversarial networks are a class of unsupervised machine learning architectures, most commonly used for image generation. These networks consist of a generator and a discriminator that are trained simultaneously on a set of data representing a class or domain; this domain could be anything from photos of human faces to time series of hourly temperatures. The generator is tasked with producing “fake” examples that fall within this domain without ever seeing any real examples from the training set. Meanwhile, the discriminator is fed a mix of fake examples (from the generator) and real examples and is tasked with classifying each as real or fake. The respective goals of the generator and discriminator constitute a zero-sum game, in which the generator constantly tries to outsmart the discriminator while the discriminator hones its ability to distinguish between in-domain and out-of-domain examples.
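For reference, the zero-sum game described above is usually written as the minimax value function of the original GAN formulation (Goodfellow et al., 2014). The paragraph above does not specify which objective its sequential GAN uses, so this should be read as the generic form only. Here D(x) is the discriminator’s estimated probability that x is real, G(z) is the generator’s output for a noise vector z drawn from a prior p_z, and p_data is the distribution of real strings.

```latex
\min_{G}\max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes V by assigning high probability to real examples and low probability to generated ones; the generator minimizes the same quantity, so one player’s gain is exactly the other’s loss.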

Though GANs are most often applied to im-

Technically speaking, the generator and discriminator are most often trained one after another on a repeated basis.

arXiv:2210.14409v1 [cs.CL] 26 Oct 2022
