
Distilling Task-specific Logical Rules from Large Pre-trained Models

Tao Chen1,3*, Luxin Liu2, Xuepeng Jia2, Baoliang Cui2,
Haihong Tang2 & Siliang Tang1,3†
1Zhejiang University  2Alibaba Group
3Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies
{ttc, siliang}@zju.edu.cn
{xique.llx, jiaxuepeng.jxp}@alibaba-inc.com
{moqing.cbl, piaoxue}@taobao.com

*Work done during an internship at Alibaba.
†Corresponding author.
Abstract

Logical rules, being both transferable and explainable, are widely used as weakly supervised signals for many downstream tasks such as named entity tagging. To reduce the human effort of writing rules, previous researchers adopt an iterative approach to automatically learn logical rules from several seed rules. However, obtaining more seed rules requires extra human annotation at heavy cost, and limited by the size and quality of the seed rules, the performance of previous systems is bounded. In this paper, we develop a novel framework, STREAM, to distill task-specific logical rules from large pre-trained models. Specifically, we borrow recent prompt-based language models as the knowledge expert to yield initial seed rules, and based on a high-quality instance pool that acts as an intermediary, we keep teaching the expert to fit our task and learning task-specific logical rules from the pool. Experiments on three public named entity tagging benchmarks demonstrate the effectiveness of our proposed framework: with several predefined prompt templates, our system gains significant improvements over previous state-of-the-art methods.
1 Introduction
Following the supervised learning paradigm, researchers resort to human annotation to obtain training data for specific tasks such as named entity tagging. Though accurate, manual annotation is quite expensive and time-consuming. In real scenarios, logical rules often serve as a source of weak supervision that provides abundant weakly supervised data for various downstream models, and compared with labeling data, applying rules can cover more application domains with better interpretability. Therefore, rule-based weakly supervised systems (Figure 1) have attracted considerable attention in recent years.

[Figure 1: Schematic diagram of a typical rule-based weakly supervised named entity tagging system. An expert writes logical rules (e.g., (1) PD → Disease, (2) nicotine → Chemical), the rules label data such as "Thirty PD patients participated in the study.", and models are trained on the weakly supervised data. Our goal in this work is to learn logical rules without any manual seed rules, corresponding to the dotted area in the figure.]
In fact, it is not easy to develop an accurate and complete rule system, as logical rules are usually summarized by human experts and the building process requires extensive domain knowledge. Besides, there is no evaluation metric to guide annotators in selecting valuable rules, so the usability and quality of the acquired rules cannot be guaranteed. In this sense, how to build a reliable rule system with limited human effort remains an important challenge.
To address the above issue, previous researchers have focused on the automatic construction of logical rules, typically starting from a few seed rules and learning new logical rules with pre-defined similarity measures in an iterative manner. Though proven effective, these systems still require manually constructed seed rules as a cold start. Limited by human effort, the set of seed rules is usually small, so the system performance is bounded.
In this work, we propose an automated framework STREAM to diStill Task-specific logical Rules from large prE-trAined Models. Specifically, (1) in order to get rid of the restrictions of seed rules, we first ask large pre-trained models for help. As prompt-based pre-trained models have the zero-shot ability to generate candidate entity types, we design two appropriate prompt templates and achieve automatic acquisition of seed rules through the consistency of the prompt models' outputs. (2) Once seed rules are obtained, we form a high-quality instance pool to train the downstream task model, continuously add potential instances to the pool, and distill new logical rules from the pool in an iterative manner. (3) Based on the converged instance pool, we further fine-tune a new prompt-based model with more suitable prompt templates to obtain more reliable seed rules and yield a better downstream task model. Compared with previous methods, our system no longer relies on manual seed rules or dictionaries, but only needs several prompt templates.
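To make step (1) concrete, the following is a minimal sketch of how seed rules might be harvested from a masked language model via two prompt templates and an output-consistency check. The templates, the bert-base-uncased model choice, and the exact agreement criterion are our illustrative assumptions, not the paper's actual configuration.

```python
from typing import Optional

from transformers import pipeline

# Zero-shot typing of a candidate entity with a masked language model.
# The model and templates below are illustrative, not the paper's exact setup.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

TEMPLATES = [
    "{sentence} {entity} is a type of [MASK].",
    "{sentence} {entity} refers to a [MASK].",
]

def top_type(sentence: str, entity: str, template: str) -> str:
    """Return the most likely [MASK] filler as a candidate entity type."""
    prompt = template.format(sentence=sentence, entity=entity)
    return fill_mask(prompt, top_k=1)[0]["token_str"].strip()

def seed_rule(sentence: str, entity: str) -> Optional[tuple]:
    """Keep (entity, type) as a seed rule only if both prompt
    templates agree on the prediction (the consistency filter)."""
    answers = {top_type(sentence, entity, t) for t in TEMPLATES}
    if len(answers) == 1:               # the two prompts agree
        return (entity, answers.pop())  # e.g. ("PD", "disease")
    return None                         # inconsistent -> discard

print(seed_rule("Thirty PD patients participated in the study.", "PD"))
```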
Experiments on three public named entity tagging benchmarks demonstrate the effectiveness of our proposed framework STREAM, which shows consistent improvements over several baseline models and far exceeds the state-of-the-art (SOTA) systems. Besides, we perform a detailed ablation study to analyze the quality of the obtained seed rules, the convergence of our proposed iterative framework, and some specific cases of learned logical rules. Accordingly, the major contributions of our work are summarized as follows:
(1) We introduce large pre-trained prompt-based models to end the dilemma that logical rule learning systems require seed rules as a starting point.

(2) We develop an effective and stable framework to distill logical rules in an iterative manner, which combines prompt-based fine-tuning and rule distillation to achieve mutual enhancement.

(3) We conduct detailed experiments to illustrate the effectiveness and rationality of our framework: with several predefined prompt templates, the performance of our method surpasses previous rule learning systems based on manual rules.
2 Methodology
2.1 Overview
In this work, we adopt named entity tagging as the specific downstream task to allow comparison with previous work on learning logical rules (Li et al., 2021). The diagram of STREAM is visualized in Figure 3.
2.2 Logical Rules
In real scenarios, logical rules can appear in various forms. For convenience, we define logical rules in the unified form of "if p then q" (i.e., p → q). In the named entity tagging task, "p" can be any logical expression and "q" is the corresponding entity category. For example, a logical rule may look like: "if the entity's lexical string is PD1, then its corresponding entity label should be disease".

As demonstrated in previous work (Zhou and Su, 2002), we define five meta logical rules that tag named entities based on their lexical, contextual, and syntactic information. In addition, some combinations of simple logical rules are also considered.
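In code, such a rule can be viewed as a predicate p over a candidate mention together with a label q. The following minimal Python sketch, our own illustrative formalization rather than anything specified in the paper, is reused in the later examples:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    """A logical rule 'if p then q': a predicate p over a candidate
    mention (with its context features) and an entity label q."""
    name: str
    predicate: Callable[[dict], bool]  # p: mention features -> bool
    label: str                         # q: the entity category

    def apply(self, mention: dict) -> Optional[str]:
        """Return the label if the predicate fires, else None."""
        return self.label if self.predicate(mention) else None

# The running example: if the entity's lexical string is "PD",
# then its label should be "disease".
pd_rule = Rule("TOKENSTRING == PD", lambda m: m["text"] == "PD", "disease")
print(pd_rule.apply({"text": "PD"}))  # -> disease
```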
2.2.1 Meta Logical Rules
Following the existing literature, our pre-defined meta rules are: (1) the TOKENSTRING rule matches the entity's lexical string; (2) the PRENGRAM rule matches the entity's preceding context tokens; (3) the POSTNGRAM rule matches the entity's succeeding context tokens; (4) the POSTAG rule matches the entity's part-of-speech tags; (5) the DEPENDENCYREL rule matches the dependency relations of the entity and its headword.
[Figure 2: Dependency parsing example for "Thirty PD patients participated in the study", with part-of-speech tags NUM, PROPN, NOUN, VERB, ADP, DET, NOUN and dependency relations nummod, compound, nsubj, prep, pobj, and det.]
Figure 2 shows an example with its dependency structure. In this sentence, the word PD is a potential disease entity, and the following logical rules may exist:

TOKENSTRING == PD → disease
PRENGRAM == thirty → disease
POSTNGRAM == patients → disease
POSTAG == PROPN → disease
DEPENDENCYREL == (compound, patients) → disease
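Continuing the hypothetical Rule sketch from Section 2.2, these five rules could be instantiated as follows; the mention-feature keys (text, prev, next, pos, dep) are our own encoding, not a format prescribed by the paper:

```python
# Features of the mention "PD" in "Thirty PD patients participated in
# the study", following the dependency parse in Figure 2.
mention = {
    "text": "PD",                      # lexical string
    "prev": "thirty",                  # preceding context token (lowercased)
    "next": "patients",                # succeeding context token
    "pos": "PROPN",                    # part-of-speech tag
    "dep": ("compound", "patients"),   # (dependency relation, headword)
}

meta_rules = [
    Rule("TOKENSTRING == PD",     lambda m: m["text"] == "PD",       "disease"),
    Rule("PRENGRAM == thirty",    lambda m: m["prev"] == "thirty",   "disease"),
    Rule("POSTNGRAM == patients", lambda m: m["next"] == "patients", "disease"),
    Rule("POSTAG == PROPN",       lambda m: m["pos"] == "PROPN",     "disease"),
    Rule("DEPENDENCYREL == (compound, patients)",
         lambda m: m["dep"] == ("compound", "patients"), "disease"),
]

for rule in meta_rules:
    print(rule.name, "->", rule.apply(mention))  # every rule fires: disease
```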
In fact, the above simple rules may sometimes fail to work; therefore we introduce complex rules, which combine several simple rules into compound rules via logical connectives including and (∧), or (∨), and negation (¬). For example, only a mention that satisfies both the rule POSTNGRAM == patients and the rule POSTAG == PROPN can be a disease entity.
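Under the same hypothetical encoding, compound rules reduce to boolean combinations of the simple predicates, e.g.:

```python
def rule_and(r1: Rule, r2: Rule) -> Rule:
    """r1 ∧ r2: fires only when both predicates hold (labels must agree)."""
    assert r1.label == r2.label
    return Rule(f"({r1.name}) AND ({r2.name})",
                lambda m: r1.predicate(m) and r2.predicate(m), r1.label)

def rule_or(r1: Rule, r2: Rule) -> Rule:
    """r1 ∨ r2: fires when either predicate holds (labels must agree)."""
    assert r1.label == r2.label
    return Rule(f"({r1.name}) OR ({r2.name})",
                lambda m: r1.predicate(m) or r2.predicate(m), r1.label)

# Negation (¬) is analogous: wrap the predicate with `not`.

# The paper's example: followed by "patients" AND tagged PROPN -> disease.
compound = rule_and(meta_rules[2], meta_rules[3])
print(compound.apply(mention))  # -> disease
```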
2.2.2 Logical Rules Mining
After defining the form of the meta logical rules, we traverse the entire training set and recall all potential rules that satisfy the format of the meta rules.
1 PD: Parkinson's disease.
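A self-contained sketch of this recall pass, under the same hypothetical mention encoding as above, counting how often each candidate pattern co-occurs with each entity label so that low-support candidates can later be filtered:

```python
from collections import Counter

def mine_candidate_rules(mentions: list) -> Counter:
    """One pass over (weakly) labeled mentions, recalling every
    (meta-rule type, pattern, label) triple observed in the data."""
    candidates = Counter()
    for m in mentions:
        label = m["label"]
        candidates[("TOKENSTRING", m["text"], label)] += 1
        candidates[("PRENGRAM", m["prev"], label)] += 1
        candidates[("POSTNGRAM", m["next"], label)] += 1
        candidates[("POSTAG", m["pos"], label)] += 1
        candidates[("DEPENDENCYREL", m["dep"], label)] += 1
    return candidates

corpus = [{"text": "PD", "prev": "thirty", "next": "patients",
           "pos": "PROPN", "dep": ("compound", "patients"),
           "label": "disease"}]
print(mine_candidate_rules(corpus).most_common(3))
```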