pression, a task that shortens a source sentence while retaining its basic meaning (Jing, 2000; Knight and Marcu, 2000; McDonald, 2006). For example, the compression task has been formulated as integer linear programming optimization over syntactic trees (Clarke and Lapata, 2006a), or as a sequence labelling problem using recurrent neural networks (RNNs) (Filippova et al., 2015; Klerke et al., 2016; Kamigaito et al., 2018). These methods rely, explicitly or implicitly, on dependency grammar. Pre-trained language models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019) can encode features beyond dependency structure (Kamigaito and Okumura, 2020), bringing predicted sentences closer to reference sentences.
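To make the labelling formulation concrete, the sketch below frames compression as binary keep/delete tagging with a bidirectional RNN. It is an illustrative toy model, not the architecture of any cited work; the vocabulary size and hyperparameters are placeholders.

import torch
import torch.nn as nn

class DeletionTagger(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # A bidirectional LSTM reads the whole sentence, so each
        # keep/delete decision sees both left and right context.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)  # logits for {delete, keep}

    def forward(self, token_ids):            # (batch, seq_len)
        states, _ = self.lstm(self.emb(token_ids))
        return self.out(states)              # (batch, seq_len, 2)

tagger = DeletionTagger()
logits = tagger(torch.randint(0, 10000, (1, 12)))  # random 12-token input
keep_mask = logits.argmax(dim=-1)                  # 1 = keep the token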
All of these methods rely on parallel datasets that label the spans to be deleted. However, what gets deleted in sentence compression differs from what gets deleted in revision. Filippova and Altun (2013) created the Google dataset from the titles and first sentences of news articles, so the information retained in each first sentence depends on its title. While this construction is useful for removing excessive information, the deleted spans are not necessarily wordiness.
Deleting does not solve everything in revision. We can revise "in this report I will conduct a study of ants and the setup of their colonies" to "in this report I will study ants and their colonies", taking advantage of the noun-and-verb homograph "study". However, the more concise version "this report studies ants" (Commnet) requires changing "study" to its third-person singular form, which deletion alone cannot achieve.
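The snippet below replays this example as deletion labels over tokens (labels chosen by hand for illustration): the verb reading of the homograph "study" survives, but no sequence of deletions can produce the inflected form "studies".

# Hand-picked keep (1) / delete (0) labels for the example sentence.
tokens = ("in this report i will conduct a study of ants "
          "and the setup of their colonies").split()
keep = [1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
compressed = " ".join(t for t, k in zip(tokens, keep) if k)
print(compressed)  # in this report i will study ants and their colonies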
3.2 Replacing as in Paraphrase Generation
Word choice matters as well, so we also revise by paraphrasing weak expressions into stronger words. Paraphrase generation rewrites a sentence's grammar and reselects its words while retaining its meaning. Paraphrasing matters in academic writing because it helps avoid plagiarism. Rule-based or statistical machine paraphrasing substitutes words with synonyms found in lexical databases and decodes syntax according to template sentences; such rigid methods may undermine creativity (Bui et al., 2021). Pre-trained neural language models like GPT (Radford et al., 2019) or BART (Lewis et al., 2020) paraphrase more accurately (Hegde and Patil, 2020). Through paraphrasing, we can replace the verb phrase "conduct a study" with the verb "study" in the example above, rather than deleting words and relying on the noun-and-verb homograph to keep the sentence syntactically correct.
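As a sketch of this neural approach, the snippet below runs beam-search paraphrasing with a seq2seq model through the transformers library. The checkpoint path is hypothetical and stands in for any BART- or T5-style model fine-tuned for paraphrase generation.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "path/to/paraphrase-finetuned-bart"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

source = ("in this report i will conduct a study of ants "
          "and the setup of their colonies")
inputs = tokenizer(source, return_tensors="pt")
# Beam search lets the decoder rewrite freely, e.g. replacing the verb
# phrase "conduct a study" with an inflected verb rather than deleting.
output_ids = model.generate(**inputs, num_beams=5, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))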
Machine revision is a kind of paraphrase generation, but the converse is not true: current paraphrase generation does not require concision in the generated sentences. Automatically annotated datasets for paraphrasing include ParaNMT (Wieting and Gimpel, 2018) and Twitter (Lan et al., 2017), as well as repurposed noisy datasets such as MSCOCO (Lin et al., 2014) and WikiAnswers (Fader et al., 2013). We may adapt parallel paraphrase datasets to train revision models, as investigated in Section 5.
3.3 Other Related Tasks
Summarization produces a shorter text from one or several documents while retaining most of their meaning (Paulus et al., 2018), which makes it similar to sentence compression. In practice, however, summarization welcomes novel words, allows specifying the output length (Kikuchi et al., 2016), and removes much more information than sentence compression does. Datasets include XSum (Narayan et al., 2018), CNN/DM (Hermann et al., 2015), WikiHow (Koupaee and Wang, 2018), NYT (Sandhaus, 2008), DUC-2004 (Over et al., 2007), and Gigaword (Rush et al., 2015), where summaries are generally shorter than one-tenth the length of the source documents. On the other hand, sentence summarization (Chopra et al., 2016) applies summarization methods to sentence compression datasets, retaining more information and possibly generating new words.
Text simplification modifies vocabulary and syntax for easier reading while retaining approximate meaning (Omelianchuk et al., 2021). Hand-crafted syntactic rules (Siddharthan, 2006; Carroll et al., 1999; Chandrasekar et al., 1996) and simplification driven by aligned sentences (Yatskar et al., 2010) have been explored. Corpora such as Turk (Xu et al., 2016) and PWKP (Zhu et al., 2010) are compiled from Wikipedia and Simple English Wikipedia (Coster and Kauchak, 2011). Rules for simplification may deviate from those for revision; e.g., text simplification sometimes encourages prepositional phrases (Xu et al., 2016). Still, adapting these approaches may benefit academic revision for concision.
Fluency editing (Napoles et al., 2017) not only corrects grammatical errors but also paraphrases the text to sound more native. Its paraphrasing component is constrained so that outputs represent a higher level of English proficiency than the inputs. As a constrained paraphrase task, fluency editing may