
such as deletion or insertion after a token (Awasthi et al., 2019; Malmi et al., 2019). In contrast, alternative approaches (Gupta et al., 2019) train the agent to explicitly generate free-form edit actions and to iteratively reconstruct the text by interacting with an environment that alters the text according to these actions. Such sequence-level action generation (Branavan et al., 2009; Guu et al., 2017; Elgohary et al., 2021) permits far more flexible action designs that are not limited to token-level operations, and is advantageous given the narrowed problem space and the dynamic context during editing (Shi et al., 2020).
Figure 1 contrasts the sequence-tagging and sequence-generation mechanisms with end-to-end generation. Both methods support multiple rounds of sequence refinement (Ge et al., 2018; Liu et al., 2021) and imitation learning (IL) (Pomerleau, 1991), in which an agent learns from the demonstrations of an expert policy and later imitates the memorized behavior to act independently (Schaal, 1996). On the one hand, IL for sequence tagging is in essence standard supervised learning and has therefore attracted significant interest and been widely adopted (Agrawal et al., 2021; Yao et al., 2021; Agrawal and Carpuat, 2022), achieving good results in the token-level action generation setting (Gu et al., 2019; Reid and Zhong, 2021). On the other hand, IL for sequence-level action generation is less well defined, even though its principle has been followed in text editing (Shi et al., 2020) and many other tasks (Chen et al., 2021). A major obstacle is that training operates on state-action demonstrations in which the encodings of states and actions can differ greatly (Gu et al., 2018). For instance, the mismatch in the length dimension between states and actions makes auto-regressive modeling, which benefits from a single, uniform representation, tricky to implement.
To tackle the issues above, we reformulate
text editing as an imitation game controlled by a
Markov Decision Process (MDP). To begin with,
we define the input sequence as the initial state, the
required operations as action sequences, and the
output target sequence as the goal state. A learning agent needs to imitate an expert policy, respond to observed states with actions, and interact with the environment until the editing eventually succeeds.
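For intuition, the following minimal Python sketch pictures one such episode; the environment class and the fixed-length keep/delete/insert action encoding are illustrative assumptions for exposition rather than the exact design used in our experiments.

from dataclasses import dataclass, field

# Illustrative fixed-length action format (an assumption, not the exact
# design): one edit symbol per source token, where "K" keeps the token,
# "D" deletes it, and ("I", tok) keeps it and inserts `tok` after it.
def apply_edit(tokens, action):
    out = []
    for tok, op in zip(tokens, action):
        if op == "K":
            out.append(tok)
        elif isinstance(op, tuple) and op[0] == "I":
            out.extend([tok, op[1]])
        # "D": drop the token
    return out

@dataclass
class EditEnv:
    """Minimal MDP view of text editing: the state is the current token
    sequence, an action rewrites it, and the episode ends once the state
    matches the goal sequence."""
    state: list
    goal: list
    history: list = field(default_factory=list)

    def step(self, action):
        self.state = apply_edit(self.state, action)
        self.history.append(action)
        return self.state, self.state == self.goal

# e.g. restoring a missing operator, in the spirit of the AOR benchmark
env = EditEnv(state="1 2 = 3".split(), goal="1 + 2 = 3".split())
_, done = env.step([("I", "+"), "K", "K", "K"])
assert done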
To convert existing input-output data into state-action pairs, we utilize trajectory generation (TG), which leverages dynamic programming (DP) to efficiently search for the minimum edit operations under a predefined edit metric; we then backtrace the explored editing paths and automatically express the operations as action sequences.
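As a rough sketch of this translation (assuming a token-level Levenshtein metric; the actual edit metric and action encoding are configurable), the DP table can be backtraced into a minimal edit script:

def trajectory(src, tgt):
    """Backtrace a Levenshtein DP table to express a minimal edit script
    from `src` to `tgt` as (op, position, token) actions. Illustrative
    only; the operation names are not the exact action encoding."""
    n, m = len(src), len(tgt)
    # dp[i][j] = minimal number of operations turning src[:i] into tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete src[i-1]
                           dp[i][j - 1] + 1,         # insert tgt[j-1]
                           dp[i - 1][j - 1] + cost)  # keep / substitute
    # backtrace from the bottom-right corner to collect the operations
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            if src[i - 1] != tgt[j - 1]:
                ops.append(("SUB", i - 1, tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("DEL", i - 1, None))
            i -= 1
        else:
            ops.append(("INS", i, tgt[j - 1]))
            j -= 1
    return ops[::-1]

# e.g. correcting "1 * 2 = 3" into "1 + 2 = 3" needs one substitution
print(trajectory("1 * 2 = 3".split(), "1 + 2 = 3".split()))
# [('SUB', 1, '+')]

Any minimum-cost path is acceptable; the backtrace above prefers diagonal moves, yielding keep/substitute operations where possible.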
Regarding the length misalignment, we first exploit the flexibility of sequence-level actions to fix all actions to the same length. Second, we employ a linear layer after the encoder to transform the length dimension of the context matrix into the action length. On this basis, we introduce a dual decoders (D2) structure that not only parallelizes decoding but also retains the ability to capture interdependencies among action tokens.
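To illustrate the length transformation alone, the following PyTorch sketch assumes a fixed source length and omits the rest of the D2 decoder; the module and dimension names are ours for exposition.

import torch
import torch.nn as nn

class LengthProjector(nn.Module):
    """Map an encoder context of shape (batch, src_len, hidden) to
    (batch, act_len, hidden) with a linear layer over the length
    dimension, so a fixed-length action sequence can be decoded in
    parallel. Sketch only; the full D2 structure adds more on top."""

    def __init__(self, src_len: int, act_len: int):
        super().__init__()
        self.proj = nn.Linear(src_len, act_len)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        x = context.transpose(1, 2)   # (batch, hidden, src_len)
        x = self.proj(x)              # (batch, hidden, act_len)
        return x.transpose(1, 2)      # (batch, act_len, hidden)

# e.g. project a length-9 source context onto 4 action positions
ctx = torch.randn(2, 9, 256)
print(LengthProjector(9, 4)(ctx).shape)  # torch.Size([2, 4, 256])

Because the action length is fixed in advance, all action positions can then be decoded in parallel from the projected context.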
Taking a further step, we propose trajectory augmentation (TA) as a remedy for the distribution shift problem from which most IL methods suffer (Ross et al., 2011).
Through a suite of three Arithmetic Equation (AE) benchmarks (Shi et al., 2020), namely Arithmetic Operators Restoration (AOR), Arithmetic Equation Simplification (AES), and Arithmetic Equation Correction (AEC), we confirm the superiority of our learning paradigm. In particular, D2 consistently outperforms standard autoregressive models in terms of performance, efficiency, and robustness.
In theory, our methods also apply to other imitation learning scenarios in which a reward function is available to further improve the agent. In this work, we primarily focus on a proof of concept of our learning paradigm, realized as supervised behavior cloning (BC) in the context of text editing. To this end, our contributions¹ are as follows:
1. We frame text editing as an imitation game formally defined as an MDP, allowing the highest degree of flexibility to design actions at the sequence level.
2. We employ TG to translate input-output data into state-action demonstrations for IL.
3. We introduce D2, a novel non-autoregressive decoder, boosting the learning in terms of accuracy, efficiency, and robustness.
4. We propose a corresponding TA technique to mitigate the distribution shift from which IL often suffers.
2 Imitation Game
We aim to cast text editing as an imitation game by defining the task as recurrent sequence generation, as presented in Figure 2(a). In this section, we describe the major components of our proposal: (1) the problem definition, (2) the data translation, (3) the model structure, and (4) a solution to distribution shift.
¹Code and data are publicly available on GitHub.