Counterfactual Recipe Generation: Exploring Compositional
Generalization in a Realistic Scenario
Xiao Liu1, Yansong Feng1,2, Jizhi Tang3, Chengang Hu1and Dongyan Zhao1,4,5
1Wangxuan Institute of Computer Technology, Peking University
2The MOE Key Laboratory of Computational Linguistics, Peking University
3Baidu Inc., Beijing, China
4Beijing Institute for General Artificial Intelligence
5State Key Laboratory of Media Convergence Production Technology and Systems
{lxlisa,fengyansong,hcg,zhaody}@pku.edu.cn
tangjizhi@baidu.com
Abstract
People can acquire knowledge in an unsuper-
vised manner by reading, and compose the
knowledge to make novel combinations. In
this paper, we investigate whether pretrained
language models can perform compositional
generalization in a realistic setting: recipe gen-
eration. We design the counterfactual recipe
generation task, which asks models to mod-
ify a base recipe according to the change of
an ingredient. This task requires composi-
tional generalization at two levels: the sur-
face level of incorporating the new ingredi-
ent into the base recipe, and the deeper level
of adjusting actions related to the changing
ingredient. We collect a large-scale recipe
dataset in Chinese for models to learn culi-
nary knowledge, and a subset of action-level
fine-grained annotations for evaluation. We
finetune pretrained language models on the
recipe corpus, and use unsupervised counter-
factual generation methods to generate modi-
fied recipes. Results show that existing mod-
els have difficulties in modifying the ingredi-
ents while preserving the original text style,
and often miss actions that need to be ad-
justed. Although pretrained language mod-
els can generate fluent recipe texts, they fail
to truly learn and use the culinary knowledge
in a compositional way. Code and data are
available at https://github.com/xxxiaol/counterfactual-recipe-generation.
1 Introduction
Reading is an effective way to gain knowledge.
When people read, mental processes like structured
information extraction and rule discovery go on in
our brains (Gibson and Levin, 1975). In the case
of cooking, we read recipes of various dishes, gain
knowledge of ingredients and flavors, and compose
Corresponding author.
Figure 1: An example of the recipe learning process of humans for the dish red-braised crucian carp.
them to cook other dishes.1
This process involves knowledge acquisition and
composition. As shown in Figure 1, when people
read recipes, they distill the knowledge of flavors
and ingredients, e.g., soy sauce is usually used to get the red-braised flavor, and people often make diagonal cuts on fish so it marinates better. People can then
cook new dishes like red-braised crucian carp by
composing existing knowledge about how to form
the red-braised flavor and how to cook fish.
We expect models to acquire culinary knowl-
edge unsupervisedly, and be able to use the knowl-
edge skillfully, e.g., composing new dishes. Cur-
rent recipe processing tasks do not examine this
1 We provide explanations of concepts related to recipes in Table 1 for better understanding.
arXiv:2210.11431v1 [cs.CL] 20 Oct 2022
Figure 2: The counterfactual recipe generation task and the two levels of compositional competencies examined.
ability explicitly. Recipe understanding tasks usually evaluate models on a specific aspect under supervision, like identifying ingredient states or relationships between actions. Recipe generation
tasks evaluate whether models can generate fluent
recipes, but do not investigate whether the gener-
ation ability relies on simple word correlation or
culinary knowledge composition.
Current models exhibit strong compositional generalization ability in the understanding of synthetic texts (Lake, 2019; Nye et al., 2020; Weißenhorn et al., 2022), but few of them conduct experiments on realistic data, which is more challenging in two ways: (1) there are far more synonymous expressions in real-world texts, like the various mentions of ingredients and actions in recipes (Fang et al., 2022), whereas synthetic data often use a single expression for one meaning (Keysers et al., 2019); (2) the knowledge in synthetic data is expressed accurately and clearly, whereas the knowledge in data provided by real users varies. For example, when forming the red-braised flavor, most people use soy sauce while a few use rock sugar instead. Moreover,
previous compositional generalization tasks mainly
focus on semantic parsing and language grounding,
while we aim to examine models in the form of
natural language generation.
We propose the Counterfactual Recipe Generation task to examine models' compositional generalization ability in understanding and generating recipe texts. This task requires models to answer the question: given a recipe of a dish, how will the recipe change if we replace or add one ingredient? As shown in Figure 2, models are asked to modify the base recipe of red-braised pork to form a new recipe of red-braised crucian carp while preserving the original cooking and writing style.
We develop baseline methods to solve the task, as existing compositional generalization models do not fit the form of the counterfactual generation task.
We finetune pretrained language models (PLMs)
on our collected Chinese recipe corpus to learn culi-
nary knowledge, and use prevalent unsupervised
counterfactual generation frameworks to generate
counterfactual recipes given {base dish, base recipe,
target dish}, where the target dish differs from the
base dish in a main ingredient.
Instead of annotating gold target recipes, we evaluate models at two levels of compositional competencies, which requires less annotation effort. The surface level (L1) is the fusion of the changing ingredient and the base recipe: the new ingredient should be added to the recipe, and the replaced ingredient should be removed. For instance, the original step wash the pork needs to be changed to wash the crucian carp. We evaluate the coverage ratio of the added and replaced ingredients, and the degree of recipe modification. Results show that existing PLMs can hardly cover the new ingredient and delete the replaced ingredient without making unnecessary modifications to the base recipe.
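These surface-level checks can be approximated with simple string-level measures. The sketch below is illustrative only: the helper names and the exact modification measure are ours, not the paper's metric definitions.

```python
import difflib

def ingredient_coverage(recipe: str, new_ing: str, old_ing: str) -> dict:
    """L1 checks: is the new ingredient mentioned, is the replaced one gone?"""
    return {
        "new_covered": new_ing in recipe,
        "old_removed": old_ing not in recipe,
    }

def modification_degree(base: str, generated: str) -> float:
    """Rough degree of modification: 1 - character-level similarity."""
    return 1.0 - difflib.SequenceMatcher(None, base, generated).ratio()

base = "1. Wash the pork. 2. Blanch the pork in a pot."
generated = "1. Wash the crucian carp. 2. Blanch the crucian carp in a pot."
print(ingredient_coverage(generated, "crucian carp", "pork"))
print(round(modification_degree(base, generated), 2))
```

A good counterfactual recipe should cover the new ingredient, drop the old one, and keep the modification degree low for steps unrelated to the change.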
The deeper level (L2) is the fusion of actions related to the changing ingredient and the base recipe: actions to process the new ingredient should
be inserted, and actions only related to the replaced
ingredient should be deleted. This is a much harder
problem involving decompositions and composi-
tions of actions. In the case of Figure 2, a model
should know blanch is related to pork and is not
suitable for a crucian carp, in order to delete the
action blanch the pork. Also, the action make diag-
onal cuts is widely used in fish dishes, and should
be added to the recipe. Our action-level evaluations
show that most models fail to either remove all the
irrelevant actions, or insert all the necessary ones,
indicating that current PLMs have not fully learned
the patterns of ingredient processing.
Our main contributions are as follows: 1) We
propose the counterfactual recipe generation task
to test models’ compositional generalization ability
in a realistic scenario. 2) We collect a large-scale
Chinese recipe dataset and build a counterfactual
recipe generation testbed with fine-grained action-
level annotations, which can also increase the re-
search diversity for procedural text understanding.
3) We examine models’ compositional generaliza-
tion ability from two levels. Our experiments show
current PLMs are unable to modify the ingredient
and preserve the original text style simultaneously,
and often miss actions related to the changing ingredient that need to be adjusted. Further analysis
reveals that current models are still far from human
experts in the deep understanding of procedural
texts, like tracking entities and learning implicit
patterns.
2 Related Work
2.1 Recipe Processing
Recipes are a common type of procedural texts,
describing the actions a chef needs to perform to
cook a specific dish. Recipe comprehension tasks
include entity state tracking (Bosselut et al., 2018), recipe structure extraction (Kiddon et al., 2015; Donatelli et al., 2021), and anaphora resolution (Fang et al., 2022). These tasks involve understanding the relationships between ingredients and actions and test models' abilities under fine-grained
supervision. In contrast, we evaluate whether mod-
els can understand the recipes and compose them
unsupervisedly. Recipe generation tasks ask mod-
els to create recipes from a given title. Kiddon
et al. (2016); H. Lee et al. (2020) provide an in-
gredient list, Majumder et al. (2019) add user’s
historical preference into consideration, and Sakib
et al. (2021) generate recipes from an action graph.
Li et al. (2021) introduce the recipe editing task,
which expects models to edit a base recipe to meet
dietary constraints; and Antognini et al. (2022) it-
eratively rewrite recipes to satisfy users’ feedback.
Our task is similar to the recipe editing tasks in that a base recipe is included in the input, but we go beyond simple ingredient substitution in recipes, and also evaluate whether the actions associated with the ingredients are altered by models.
2.2 Compositional Generalization
Compositional generalization is the ability to understand and produce novel combinations of previously seen components and constructions (Chomsky, 1956). To measure the compositional generalization ability of models, a line of research (Lake and Baroni, 2018; Ruis et al., 2020) designs tasks
that map sentences into action sequences, and splits
the data into training and testing sets from differ-
ent data distributions, e.g., according to different
lengths or primitive commands. Motivated by these
works, we divide our data into finetuning corpus
and test set based on flavors and ingredients, which
are basic components of a dish. Other composi-
tional generalization works conduct experiments
on semantic parsing (Keysers et al., 2019; Kim and Linzen, 2020) and language grounding (Johnson et al., 2017). To the best of our knowledge, our task
is the first compositional generalization task in the
form of natural language generation.
Our task mainly differs from previous ones in
measuring compositional generalization in a real-
istic setting, where all texts are natural rather than
synthetic. The variation in natural language, both
the variation of expressions and the variation of
knowledge provided by different users, brings extra challenges. Shaw et al. (2021) also address the challenge of natural language variation in compositional generalization, but they experiment on semantic parsing, where the knowledge is highly consistent and can be induced with grammar rules.
3 Task Definition
We formulate our task in the form of counterfactual generation (Qin et al., 2019):

p(y' | y, x_y, x'),   (1)

where y is the base recipe, x_y is the ingredient set of y, x' is the adjusted ingredient set that replaces or adds one main ingredient, and y' is the target recipe to generate. In Figure 2's example, y is the base recipe of red-braised pork, and x' differs from x_y in changing pork to crucian carp.
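Concretely, one evaluation instance pairs a base recipe with an adjusted ingredient set. A minimal sketch of this structure (the field names are ours, for illustration, not the authors' data format):

```python
from dataclasses import dataclass

@dataclass
class CounterfactualInstance:
    base_dish: str           # d_b, e.g. red-braised pork
    target_dish: str         # d_t, differs in one main ingredient
    base_recipe: str         # y
    base_ingredients: set    # x_y
    target_ingredients: set  # x'

inst = CounterfactualInstance(
    base_dish="red-braised pork",
    target_dish="red-braised crucian carp",
    base_recipe="1. Wash the pork. 2. Blanch the pork in a pot. ...",
    base_ingredients={"pork", "soy sauce", "ginger"},
    target_ingredients={"crucian carp", "soy sauce", "ginger"},
)

# The model must generate y', the recipe of the target dish.
# The symmetric difference recovers the changed ingredient pair:
changed = inst.base_ingredients ^ inst.target_ingredients
print(changed)
```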
4 Data Preparation
4.1 The XIACHUFANG Recipe Dataset
We collect a novel Chinese dataset XIACHUFANG
of 1,550,151 recipes from xiachufang.com, a
popular Chinese recipe sharing website. Compared to the commonly used English recipe dataset Recipe1M+, XIACHUFANG contains 1.5 times as many recipes. The website provides a list of common
dishes. We map the recipe titles to these dishes, and
find 1,242,206 recipes belonging to 30,060 dishes.
A dish has 41.3 recipes on average. The average
length of a recipe is 224 characters. Organizing
recipes in terms of dishes helps us learn what different people have in common when cooking a dish; these shared steps are often the necessary actions.
We select 50 dish pairs <d_b (base dish), d_t (target dish)> for evaluation. The two dishes share
Word Definition Examples
Dish Food prepared in a particular way. Red-braised pork, Kung Pao chicken
Recipe Instructions for preparing a dish.
Ingredient Part of the foods that are combined to make a dish. Pork, Chicken
Flavor The taste expression of a dish. Red-braised, Scorched chile (the flavor of Kung Pao)
Action An event described in the recipe, centering on a verb. Wash the pork, Blanch the pork
Table 1: Key concepts we used when analyzing recipes.
Base Dish Target Dish
Spicy Crayfish Spicy Chicken Feet
Kung Pao Chicken Kung Pao Shrimp Balls
Stir-fried Baby Cabbage Stir-fried Loofah
Cold Tofu Cold Tofu with Century Egg
Fried Beef Fried Beef with Carrot
Table 2: Examples of dish pairs used in evaluation.
the same flavor and differ in one principal ingredient in the dish name.2 We randomly sample 50
recipes of each base dish as the base recipes, and
form 2,500 evaluation instances in total. By letting
models modify different recipes for the same dish,
we better measure whether models have actually
learned the culinary knowledge and can apply the
knowledge flexibly.
Table 2 shows several dish pairs used in our
evaluation, and the full list is in Appendix A.1.
In the first three lines of Table 2, the target dish
replaces one ingredient of the base dish; and in
the last two lines, one ingredient is added to the
target dish. The latter situation is rare in the dataset
(only 8%), but it presents additional challenges, as
simple substitution in recipes does not work, and
models have to generate reasonable actions to add
the new ingredient in the right place.
We regard the recipes that do not belong to the
50 dish pairs as the recipe corpus. The corpus size
is 1,479,764. We expect models to learn cooking
knowledge unsupervisedly from this corpus.
The chosen dish pairs meet the following criteria:
they are common in Chinese cuisine; recipes of the
dishes have not been seen by the PLMs we used;
and models have the opportunity to learn about the
ingredients and flavors from the recipe corpus. For
example, for the ingredient crayfish, models can
learn how to process it from recipes of stir-fried
crayfish,garlic crayfish,chilled crayfish, etc. De-
tails of the selection criteria are in Appendix A.1.
2 Some auxiliary ingredients may also change accordingly,
but we only focus on the changes directly associated with the
principal ingredient changed in the dish name.
4.2 Pivot Actions
Pivot actions are actions that differ between a dish
pair, like blanch and make diagonal cuts in the case
of <red-braised pork, red-braised crucian carp>.
Since there is no gold standard for the modified
recipes, we evaluate the quality of the modified
recipes by collecting the pivot actions. For the dish pair <d_b, d_t>, the pivot action set, P, contains both actions to remove, P_R, and actions to insert, P_I. P_R are actions that may appear in the recipes of d_b but are not appropriate for d_t; P_I are actions that are not needed for d_b but should appear in the recipes of d_t. In the example of Figure 2, blanch the pork belongs to P_R and make diagonal cuts belongs to P_I.
It is hard to ask annotators to write pivot ac-
tions from scratch, and checking all the actions in
the recipes is inefficient. We observe that actions
that frequently occur in the recipes of a dish are
more likely to be necessary for the dish. Taking
advantage of the abundant recipes of each dish in
XIACHUFANG, we categorize actions based on fre-
quency, and ask annotators to annotate the vague
ones. Figure 3 shows the semi-automatic pivot
action collection workflow.
Recipe Parsing.
We first parse the recipe into a list of actions. An action (v, igs, tools) is centered on a verb, often accompanied by ingredients and cooking tools. The definition is consistent with previous recipe processing works (Donatelli et al., 2021), and the parsing details are in Appendix A.2.
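A minimal sketch of this action representation (ours, for illustration; the authors' actual parser is described in Appendix A.2):

```python
from typing import NamedTuple

class Action(NamedTuple):
    verb: str           # v
    ingredients: tuple  # igs
    tools: tuple

# The steps "Wash the pork" and "Blanch the pork in a pot" would parse to:
actions = [
    Action(verb="wash", ingredients=("pork",), tools=()),
    Action(verb="blanch", ingredients=("pork",), tools=("pot",)),
]
print([a.verb for a in actions])
```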
Pilot Study.
We conduct a pilot annotation to
verify the validity of categorizing actions according
to frequency. We randomly select 10 dish pairs to
check whether annotators agree with the automatic
categorizing results.
For each action a, we calculate its frequencies, f_b and f_t, according to its appearances in the recipes of d_b and d_t, respectively, and categorize it as P_R (actions to remove) if f_b > α × f_t and f_t < τ_R (here α is a coefficient greater than 1).
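The P_R rule can be sketched as below. The text is cut off before the remaining categories are fully specified, so the symmetric P_I rule and all threshold values here are our assumptions, not the paper's exact criteria:

```python
def categorize_action(f_b: float, f_t: float, alpha: float = 2.0,
                      tau_r: float = 0.1, tau_i: float = 0.1) -> str:
    """Categorize an action by its frequency in base- vs. target-dish recipes.

    f_b, f_t: fraction of d_b / d_t recipes containing the action.
    alpha (> 1), tau_r, tau_i: placeholder hyperparameters (ours).
    """
    if f_b > alpha * f_t and f_t < tau_r:
        return "P_R"      # frequent in d_b, rare in d_t -> remove
    if f_t > alpha * f_b and f_b < tau_i:
        return "P_I"      # assumed symmetric rule -> insert
    return "uncertain"    # vague cases go to human annotators

# "blanch" in 60% of red-braised pork recipes, 2% of crucian carp recipes:
print(categorize_action(0.6, 0.02))
# "make diagonal cuts" shows the reverse pattern:
print(categorize_action(0.02, 0.6))
```

Actions that fall into neither bucket are the "vague ones" that the workflow sends to annotators.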