Counterfactual Recipe Generation: Exploring Compositional
Generalization in a Realistic Scenario
Xiao Liu1, Yansong Feng1,2, Jizhi Tang3, Chengang Hu1and Dongyan Zhao1,4,5
1Wangxuan Institute of Computer Technology, Peking University
2The MOE Key Laboratory of Computational Linguistics, Peking University
3Baidu Inc., Beijing, China
4Beijing Institute for General Artificial Intelligence
5State Key Laboratory of Media Convergence Production Technology and Systems
{lxlisa,fengyansong,hcg,zhaody}@pku.edu.cn
tangjizhi@baidu.com
Abstract
People can acquire knowledge in an unsuper-
vised manner by reading, and compose the
knowledge to make novel combinations. In
this paper, we investigate whether pretrained
language models can perform compositional
generalization in a realistic setting: recipe gen-
eration. We design the counterfactual recipe
generation task, which asks models to mod-
ify a base recipe according to the change of
an ingredient. This task requires composi-
tional generalization at two levels: the sur-
face level of incorporating the new ingredi-
ent into the base recipe, and the deeper level
of adjusting actions related to the changing
ingredient. We collect a large-scale recipe
dataset in Chinese for models to learn culi-
nary knowledge, and a subset of action-level
fine-grained annotations for evaluation. We
finetune pretrained language models on the
recipe corpus, and use unsupervised counter-
factual generation methods to generate modi-
fied recipes. Results show that existing mod-
els have difficulties in modifying the ingredi-
ents while preserving the original text style,
and often miss actions that need to be ad-
justed. Although pretrained language mod-
els can generate fluent recipe texts, they fail
to truly learn and use the culinary knowledge
in a compositional way. Code and data are
available at https://github.com/xxxiaol/counterfactual-recipe-generation.
1 Introduction
Reading is an effective way to gain knowledge.
When people read, mental processes like structured
information extraction and rule discovery go on in
our brains (Gibson and Levin, 1975). In the case
of cooking, we read recipes of various dishes, gain
knowledge of ingredients and flavors, and compose
Corresponding author.
Figure 1: An example of the recipe learning process of humans for the dish red-braised crucian carp.
them to cook other dishes.1
This process involves knowledge acquisition and
composition. As shown in Figure 1, when people
read recipes, they distill the knowledge of flavors
and ingredients, e.g., soy sauce is usually used to get the red-braised flavor, and people often make diagonal cuts on fish so it marinates better. People can then
cook new dishes like red-braised crucian carp by
composing existing knowledge about how to form
the red-braised flavor and how to cook fish.
We expect models to acquire culinary knowl-
edge unsupervisedly, and be able to use the knowl-
edge skillfully, e.g., composing new dishes. Cur-
rent recipe processing tasks do not examine this
1 We provide explanations of concepts related to recipes in Table 1 for better understanding.
arXiv:2210.11431v1 [cs.CL] 20 Oct 2022
Figure 2: The counterfactual recipe generation task and the two levels of compositional competencies examined.
ability explicitly. Recipe understanding tasks usually evaluate models on a specific aspect under supervision, like identifying ingredient states or relationships between actions. Recipe generation
tasks evaluate whether models can generate fluent
recipes, but do not investigate whether the gener-
ation ability relies on simple word correlation or
culinary knowledge composition.
Current models exhibit strong compositional generalization ability in the understanding of synthetic texts (Lake, 2019; Nye et al., 2020; Weißenhorn et al., 2022), but few of them conduct experiments on realistic data, which is more challenging in two ways: (1) there are far more synonymous expressions in real-world texts, like the various mentions of ingredients and actions in recipes (Fang et al., 2022), whereas synthetic data often use a single expression for one meaning (Keysers et al., 2019); (2) the knowledge in synthetic data is expressed accurately and clearly, whereas the knowledge in data provided by real users varies. For example, when forming the red-braised flavor, most people use soy sauce while a few use rock sugar instead. Moreover,
previous compositional generalization tasks mainly
focus on semantic parsing and language grounding,
while we aim to examine models in the form of
natural language generation.
We propose the Counterfactual Recipe Generation task to examine models' compositional generalization ability in understanding and generating recipe texts. This task requires models to answer the question: given a recipe of a dish, how will the recipe change if we replace or add one ingredient? As shown in Figure 2, models are asked to modify the base recipe of red-braised pork to form a new recipe of red-braised crucian carp while preserving the original cooking and writing style.
We develop baseline methods to solve the task, as existing compositional generalization models do not fit the form of the counterfactual generation task.
We finetune pretrained language models (PLMs)
on our collected Chinese recipe corpus to learn culi-
nary knowledge, and use prevalent unsupervised
counterfactual generation frameworks to generate
counterfactual recipes given {base dish, base recipe,
target dish}, where the target dish differs from the
base dish in a main ingredient.
Instead of annotating gold target recipes, we evaluate models at two levels of compositional competencies, which requires less annotation effort. The surface level (L1) is the fusion of the changing ingredient and the base recipe: the new ingredient should be added to the recipe, and the replaced ingredient should be removed. For instance, the original step wash the pork needs to be changed to wash the crucian carp. We evaluate the coverage ratio of the added and replaced ingredients, and the degree of recipe modification. Results show that existing PLMs can hardly cover the new ingredient and delete the replaced ingredient without making unnecessary modifications to the base recipe.
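These surface-level checks can be approximated with simple string-level measures. The sketch below is illustrative only: the helper names and the exact modification measure are ours, not the paper's metric definitions.

```python
import difflib

def ingredient_coverage(recipe: str, new_ing: str, old_ing: str) -> dict:
    """L1 checks: is the new ingredient mentioned, is the replaced one gone?"""
    return {
        "new_covered": new_ing in recipe,
        "old_removed": old_ing not in recipe,
    }

def modification_degree(base: str, generated: str) -> float:
    """Rough degree of modification: 1 - character-level similarity."""
    return 1.0 - difflib.SequenceMatcher(None, base, generated).ratio()

base = "1. Wash the pork. 2. Blanch the pork in a pot."
generated = "1. Wash the crucian carp. 2. Blanch the crucian carp in a pot."
print(ingredient_coverage(generated, "crucian carp", "pork"))
print(round(modification_degree(base, generated), 2))
```

A good counterfactual recipe should cover the new ingredient, drop the old one, and keep the modification degree low for steps unrelated to the change.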
The deeper level (L2) is the fusion of actions related to the changing ingredient and the base recipe: actions to process the new ingredient should
be inserted, and actions only related to the replaced
ingredient should be deleted. This is a much harder
problem involving decompositions and composi-
tions of actions. In the case of Figure 2, a model
should know blanch is related to pork and is not
suitable for a crucian carp, in order to delete the
action blanch the pork. Also, the action make diag-
onal cuts is widely used in fish dishes, and should
be added to the recipe. Our action-level evaluations
show that most models fail to either remove all the
irrelevant actions, or insert all the necessary ones,
indicating that current PLMs have not fully learned
the patterns of ingredient processing.
Our main contributions are as follows: 1) We
propose the counterfactual recipe generation task
to test models’ compositional generalization ability
in a realistic scenario. 2) We collect a large-scale
Chinese recipe dataset and build a counterfactual
recipe generation testbed with fine-grained action-
level annotations, which can also increase the re-
search diversity for procedural text understanding.
3) We examine models’ compositional generaliza-
tion ability from two levels. Our experiments show
current PLMs are unable to modify the ingredient
and preserve the original text style simultaneously,
and often miss actions related to the changing ingredient that need to be adjusted. Further analysis
reveals that current models are still far from human
experts in the deep understanding of procedural
texts, like tracking entities and learning implicit
patterns.
2 Related Work
2.1 Recipe Processing
Recipes are a common type of procedural texts,
describing the actions a chef needs to perform to
cook a specific dish. Recipe comprehension tasks
include entity state tracking (Bosselut et al., 2018), recipe structure extraction (Kiddon et al., 2015; Donatelli et al., 2021), and anaphora resolution (Fang et al., 2022). These tasks involve understanding the relationships between ingredients and actions and test models' abilities under fine-grained
supervision. In contrast, we evaluate whether mod-
els can understand the recipes and compose them
unsupervisedly. Recipe generation tasks ask mod-
els to create recipes from a given title. Kiddon
et al. (2016); H. Lee et al. (2020) provide an in-
gredient list, Majumder et al. (2019) add user’s
historical preference into consideration, and Sakib
et al. (2021) generate recipes from an action graph.
Li et al. (2021) introduce the recipe editing task,
which expects models to edit a base recipe to meet
dietary constraints; and Antognini et al. (2022) it-
eratively rewrite recipes to satisfy users’ feedback.
Our task is similar to the recipe editing tasks in that a base recipe is included in the input, but we go beyond simple ingredient substitution in recipes, and also evaluate whether the actions associated with the ingredients are altered by models.
2.2 Compositional Generalization
Compositional generalization is the ability to understand and produce novel combinations of previously seen components and constructions (Chomsky, 1956). To measure the compositional generalization ability of models, a line of research (Lake and Baroni, 2018; Ruis et al., 2020) designs tasks
that map sentences into action sequences, and splits
the data into training and testing sets from differ-
ent data distributions, e.g., according to different
lengths or primitive commands. Motivated by these
works, we divide our data into finetuning corpus
and test set based on flavors and ingredients, which
are basic components of a dish. Other composi-
tional generalization works conduct experiments
on semantic parsing (Keysers et al., 2019; Kim and Linzen, 2020) and language grounding (Johnson et al., 2017). To the best of our knowledge, our task
is the first compositional generalization task in the
form of natural language generation.
Our task mainly differs from previous ones in
measuring compositional generalization in a real-
istic setting, where all texts are natural rather than
synthetic. The variation in natural language, both
the variation of expressions and the variation of
knowledge provided by different users, brings extra challenges. Shaw et al. (2021) also address the challenge of natural language variation in compositional generalization, but they experiment on semantic parsing, where the knowledge is highly consistent and can be induced with grammar rules.
3 Task Definition
We formulate our task in the form of counterfactual generation (Qin et al., 2019):

p(y' | y, x_y, x'),   (1)

where y is the base recipe, x_y is the ingredient set of y, x' is the adjusted ingredient set that replaces or adds one main ingredient, and y' is the target recipe to generate. In Figure 2's example, y is the base recipe of red-braised pork, and x' differs from x_y in changing pork to crucian carp.
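Concretely, one evaluation instance pairs a base recipe with an adjusted ingredient set. A minimal sketch of this structure (the field names are ours, for illustration, not the authors' data format):

```python
from dataclasses import dataclass

@dataclass
class CounterfactualInstance:
    base_dish: str           # d_b, e.g. red-braised pork
    target_dish: str         # d_t, differs in one main ingredient
    base_recipe: str         # y
    base_ingredients: set    # x_y
    target_ingredients: set  # x'

inst = CounterfactualInstance(
    base_dish="red-braised pork",
    target_dish="red-braised crucian carp",
    base_recipe="1. Wash the pork. 2. Blanch the pork in a pot. ...",
    base_ingredients={"pork", "soy sauce", "ginger"},
    target_ingredients={"crucian carp", "soy sauce", "ginger"},
)

# The model must generate y', the recipe of the target dish.
# The symmetric difference recovers the changed ingredient pair:
changed = inst.base_ingredients ^ inst.target_ingredients
print(changed)
```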
4 Data Preparation
4.1 The XIACHUFANG Recipe Dataset
We collect a novel Chinese dataset XIACHUFANG
of 1,550,151 recipes from xiachufang.com, a
popular Chinese recipe sharing website. Compared to the commonly used English recipe dataset Recipe1M+, XIACHUFANG contains 1.5 times as many recipes. The website provides a list of common
dishes. We map the recipe titles to these dishes, and
find 1,242,206 recipes belonging to 30,060 dishes.
A dish has 41.3 recipes on average. The average
length of a recipe is 224 characters. Organizing
recipes in terms of dishes helps us learn what different people have in common when cooking a dish; these shared steps are often the necessary actions.
We select 50 dish pairs <d_b (base dish), d_t (target dish)> for evaluation. The two dishes share
Word Definition Examples
Dish Food prepared in a particular way. Red-braised pork, Kung Pao chicken
Recipe Instructions for preparing a dish.
Ingredient Part of the foods that are combined to make a dish. Pork, Chicken
Flavor The taste expression of a dish. Red-braised, Scorched chile (the flavor of Kung Pao)
Action An event described in the recipe, centering on a verb. Wash the pork, Blanch the pork
Table 1: Key concepts we used when analyzing recipes.
Base Dish Target Dish
Spicy Crayfish Spicy Chicken Feet
Kung Pao Chicken Kung Pao Shrimp Balls
Stir-fried Baby Cabbage Stir-fried Loofah
Cold Tofu Cold Tofu with Century Egg
Fried Beef Fried Beef with Carrot
Table 2: Examples of dish pairs used in evaluation.
the same flavor and differ in one principal ingredient in the dish name.2 We randomly sample 50
recipes of each base dish as the base recipes, and
form 2,500 evaluation instances in total. By letting
models modify different recipes for the same dish,
we better measure whether models have actually
learned the culinary knowledge and can apply the
knowledge flexibly.
Table 2 shows several dish pairs used in our
evaluation, and the full list is in Appendix A.1.
In the first three lines of Table 2, the target dish
replaces one ingredient of the base dish; and in
the last two lines, one ingredient is added to the
target dish. The latter situation is rare in the dataset
(only 8%), but it presents additional challenges, as
simple substitution in recipes does not work, and
models have to generate reasonable actions to add
the new ingredient in the right place.
We regard the recipes that do not belong to the
50 dish pairs as the recipe corpus. The corpus size
is 1,479,764. We expect models to learn cooking
knowledge unsupervisedly from this corpus.
The chosen dish pairs meet the following criteria:
they are common in Chinese cuisine; recipes of the
dishes have not been seen by the PLMs we used;
and models have the opportunity to learn about the
ingredients and flavors from the recipe corpus. For
example, for the ingredient crayfish, models can
learn how to process it from recipes of stir-fried
crayfish,garlic crayfish,chilled crayfish, etc. De-
tails of the selection criteria are in Appendix A.1.
2 Some auxiliary ingredients may also change accordingly,
but we only focus on the changes directly associated with the
principal ingredient changed in the dish name.
4.2 Pivot Actions
Pivot actions are actions that differ between a dish
pair, like blanch and make diagonal cuts in the case
of <red-braised pork, red-braised crucian carp>.
Since there is no gold standard for the modified
recipes, we evaluate the quality of the modified
recipes by collecting the pivot actions. For the dish pair <d_b, d_t>, the pivot action set, P, contains both actions to remove, P_R, and actions to insert, P_I. P_R are actions that may appear in the recipes of d_b but are not appropriate for d_t; P_I are actions that are not needed for d_b but should appear in the recipes of d_t. In the example of Figure 2, blanch the pork belongs to P_R and make diagonal cuts belongs to P_I.
It is hard to ask annotators to write pivot ac-
tions from scratch, and checking all the actions in
the recipes is inefficient. We observe that actions
that frequently occur in the recipes of a dish are
more likely to be necessary for the dish. Taking
advantage of the abundant recipes of each dish in
XIACHUFANG, we categorize actions based on fre-
quency, and ask annotators to annotate the vague
ones. Figure 3 shows the semi-automatic pivot
action collection workflow.
Recipe Parsing.
We first parse the recipe into a list of actions. An action (v, igs, tools) is centered on a verb, often accompanied by ingredients and cooking tools. The definition is consistent with previous recipe processing works (Donatelli et al., 2021), and the parsing details are in Appendix A.2.
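A minimal sketch of this action representation (ours, for illustration; the authors' actual parser is described in Appendix A.2):

```python
from typing import NamedTuple

class Action(NamedTuple):
    verb: str           # v
    ingredients: tuple  # igs
    tools: tuple

# The steps "Wash the pork" and "Blanch the pork in a pot" would parse to:
actions = [
    Action(verb="wash", ingredients=("pork",), tools=()),
    Action(verb="blanch", ingredients=("pork",), tools=("pot",)),
]
print([a.verb for a in actions])
```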
Pilot Study.
We conduct a pilot annotation to
verify the validity of categorizing actions according
to frequency. We randomly select 10 dish pairs to
check whether annotators agree with the automatic
categorizing results.
For each action a, we calculate its frequencies, f_b and f_t, according to its appearances in the recipes of d_b and d_t, respectively, and categorize it as P_R (actions to remove) if f_b > α × f_t and f_t < τ_R (here α is a coefficient greater than 1).
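The P_R rule can be sketched as below. The text is cut off before the remaining categories are fully specified, so the symmetric P_I rule and all threshold values here are our assumptions, not the paper's exact criteria:

```python
def categorize_action(f_b: float, f_t: float, alpha: float = 2.0,
                      tau_r: float = 0.1, tau_i: float = 0.1) -> str:
    """Categorize an action by its frequency in base- vs. target-dish recipes.

    f_b, f_t: fraction of d_b / d_t recipes containing the action.
    alpha (> 1), tau_r, tau_i: placeholder hyperparameters (ours).
    """
    if f_b > alpha * f_t and f_t < tau_r:
        return "P_R"      # frequent in d_b, rare in d_t -> remove
    if f_t > alpha * f_b and f_b < tau_i:
        return "P_I"      # assumed symmetric rule -> insert
    return "uncertain"    # vague cases go to human annotators

# "blanch" in 60% of red-braised pork recipes, 2% of crucian carp recipes:
print(categorize_action(0.6, 0.02))
# "make diagonal cuts" shows the reverse pattern:
print(categorize_action(0.02, 0.6))
```

Actions that fall into neither bucket are the "vague ones" that the workflow sends to annotators.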