
Distilling Task-specific Logical Rules from Large Pre-trained Models

Tao Chen1,3*, Luxin Liu2, Xuepeng Jia2, Baoliang Cui2,
Haihong Tang2 & Siliang Tang1,3†
1Zhejiang University  2Alibaba Group
3Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies
{ttc, siliang}@zju.edu.cn
{xique.llx, jiaxuepeng.jxp}@alibaba-inc.com
{moqing.cbl, piaoxue}@taobao.com

*Work done during an internship at Alibaba.
†Corresponding author.
Abstract

Logical rules, being both transferable and explainable, are widely used as weakly supervised signals for many downstream tasks such as named entity tagging. To reduce the human effort of writing rules, previous researchers adopt an iterative approach to automatically learn logical rules from several seed rules. However, obtaining more seed rules requires extra human annotation at heavy cost, and limited by the size and quality of the seed rules, the performance of previous systems is bounded. In this paper, we develop a novel framework, STREAM, to distill task-specific logical rules from large pre-trained models. Specifically, we borrow recent prompt-based language models as the knowledge expert to yield initial seed rules, and based on a high-quality instance pool that acts as an intermediary, we keep teaching the expert to fit our task and learning task-specific logical rules from the pool. Experiments on three public named entity tagging benchmarks demonstrate the effectiveness of our proposed framework: with several predefined prompt templates, our system gains significant improvements over previous state-of-the-art methods.
1 Introduction
Following the supervised learning paradigm, researchers resort to human annotation to obtain training data for specific tasks such as named entity tagging. Though accurate, manual annotation is quite expensive and time-consuming. In real scenarios, logical rules often serve as a source of weak supervision that provides abundant weakly supervised data for various downstream models, and compared with labeling data, applying rules can cover more application domains with better interpretability. Therefore, rule-based weakly supervised systems (Figure 1) have attracted considerable attention in recent years.

[Figure 1: Schematic diagram of a typical rule-based weakly supervised named entity tagging system. An expert writes logical rules (e.g., (1) PD → Disease, (2) nicotine → Chemical), the rules label data such as "Thirty PD patients participated in the study.", and models are trained on the weakly supervised data. Our goal in this work is to learn logical rules without any manual seed rules, corresponding to the dotted area in the figure.]
In fact, it is not easy to develop an accurate and complete rule system, as logical rules are usually summarized by human experts and the building process requires extensive domain knowledge. Besides, there is no evaluation metric to guide annotators in selecting valuable rules, so the usability and quality of the acquired rules cannot be guaranteed. In this sense, how to build a reliable rule system with limited human effort remains an important challenge.
To address the above issue, previous researchers have focused on the automatic construction of logical rules, typically starting from a few seed rules and learning new logical rules with pre-defined similarity measures in an iterative manner. Though proven effective, these systems still require manually constructed seed rules as a cold start. Limited by human effort, the set of seed rules is usually small, so the system performance is bounded.
In this work, we propose an automated framework STREAM to diStill Task-specific logical Rules from large prE-trAined Models. Specifically, (1) in order to get rid of the restrictions of seed rules, we first ask large pre-trained models for help. As prompt-based pre-trained models have the zero-shot ability to generate candidate entity types, we design two appropriate prompt templates and achieve automatic acquisition of seed rules through the consistency of the prompt models' outputs. (2) Once seed rules are obtained, we form a high-quality instance pool to train the downstream task model, continuously add potential instances to the pool, and distill new logical rules from the pool in an iterative manner. (3) Based on the converged instance pool, we further fine-tune a new prompt-based model with more suitable prompt templates to obtain more reliable seed rules and yield a better downstream task model. Compared with previous methods, our system no longer relies on manual seed rules or dictionaries, but only needs several prompt templates.
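To make step (1) concrete, the following is a minimal sketch of how seed rules might be harvested from a masked language model via two prompt templates and an output-consistency check. The templates, the bert-base-uncased model choice, and the exact agreement criterion are our illustrative assumptions, not the paper's actual configuration.

```python
from typing import Optional

from transformers import pipeline

# Zero-shot typing of a candidate entity with a masked language model.
# The model and templates below are illustrative, not the paper's exact setup.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

TEMPLATES = [
    "{sentence} {entity} is a type of [MASK].",
    "{sentence} {entity} refers to a [MASK].",
]

def top_type(sentence: str, entity: str, template: str) -> str:
    """Return the most likely [MASK] filler as a candidate entity type."""
    prompt = template.format(sentence=sentence, entity=entity)
    return fill_mask(prompt, top_k=1)[0]["token_str"].strip()

def seed_rule(sentence: str, entity: str) -> Optional[tuple]:
    """Keep (entity, type) as a seed rule only if both prompt
    templates agree on the prediction (the consistency filter)."""
    answers = {top_type(sentence, entity, t) for t in TEMPLATES}
    if len(answers) == 1:               # the two prompts agree
        return (entity, answers.pop())  # e.g. ("PD", "disease")
    return None                         # inconsistent -> discard

print(seed_rule("Thirty PD patients participated in the study.", "PD"))
```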
Experiments on three public named entity tagging benchmarks demonstrate the effectiveness of our proposed framework STREAM, which shows consistent improvements over several baseline models and far exceeds the state-of-the-art (SOTA) systems. Besides, we perform a detailed ablation study to analyze the quality of the obtained seed rules, the convergence of our proposed iterative framework, and some specific cases of learned logical rules. Accordingly, the major contributions of our work are summarized as follows:
(1) We introduce large pre-trained prompt-based models to end the dilemma that logical rule learning systems require seed rules as a starting point.

(2) We develop an effective and stable framework to distill logical rules in an iterative manner, which combines prompt-based fine-tuning and rule distillation to achieve mutual enhancement.

(3) We conduct detailed experiments to illustrate the effectiveness and rationality of our framework: with several predefined prompt templates, the performance of our method surpasses previous rule learning systems based on manual rules.
2 Methodology
2.1 Overview
In this work, we adopt named entity tagging as the specific downstream task to allow comparison with previous work on learning logical rules (Li et al., 2021). The diagram of STREAM is visualized in Figure 3.
2.2 Logical Rules
In real scenarios, logical rules can appear in various forms. For convenience, we define logical rules in the unified form of "if p then q" (i.e., p → q). In the named entity tagging task, "p" can be any logical expression and "q" is the corresponding entity category. For example, a logical rule may look like: "if the entity's lexical string is PD1, then its corresponding entity label should be disease".

As demonstrated in previous work (Zhou and Su, 2002), we define five meta logical rules that tag named entities based on their lexical, contextual, and syntactic information. In addition, some combinations of simple logical rules are also considered.
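In code, such a rule can be viewed as a predicate p over a candidate mention together with a label q. The following minimal Python sketch, our own illustrative formalization rather than anything specified in the paper, is reused in the later examples:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    """A logical rule 'if p then q': a predicate p over a candidate
    mention (with its context features) and an entity label q."""
    name: str
    predicate: Callable[[dict], bool]  # p: mention features -> bool
    label: str                         # q: the entity category

    def apply(self, mention: dict) -> Optional[str]:
        """Return the label if the predicate fires, else None."""
        return self.label if self.predicate(mention) else None

# The running example: if the entity's lexical string is "PD",
# then its label should be "disease".
pd_rule = Rule("TOKENSTRING == PD", lambda m: m["text"] == "PD", "disease")
print(pd_rule.apply({"text": "PD"}))  # -> disease
```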
2.2.1 Meta Logical Rules
Following the existing literature, our pre-defined meta rules are: (1) the TOKENSTRING rule matches the entity's lexical string; (2) the PRENGRAM rule matches the entity's preceding context tokens; (3) the POSTNGRAM rule matches the entity's succeeding context tokens; (4) the POSTAG rule matches the entity's part-of-speech tags; (5) the DEPENDENCYREL rule matches the dependency relations of the entity and its headword.
[Figure 2: Dependency parsing example for "Thirty PD patients participated in the study", with part-of-speech tags NUM, PROPN, NOUN, VERB, ADP, DET, NOUN and dependency relations nummod, compound, nsubj, prep, pobj, and det.]
Figure 2 shows an example with its dependency structure. In this sentence, the word PD is a potential disease entity, and the following logical rules may exist:

TOKENSTRING == PD → disease
PRENGRAM == thirty → disease
POSTNGRAM == patients → disease
POSTAG == PROPN → disease
DEPENDENCYREL == (compound, patients) → disease
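Continuing the hypothetical Rule sketch from Section 2.2, these five rules could be instantiated as follows; the mention-feature keys (text, prev, next, pos, dep) are our own encoding, not a format prescribed by the paper:

```python
# Features of the mention "PD" in "Thirty PD patients participated in
# the study", following the dependency parse in Figure 2.
mention = {
    "text": "PD",                      # lexical string
    "prev": "thirty",                  # preceding context token (lowercased)
    "next": "patients",                # succeeding context token
    "pos": "PROPN",                    # part-of-speech tag
    "dep": ("compound", "patients"),   # (dependency relation, headword)
}

meta_rules = [
    Rule("TOKENSTRING == PD",     lambda m: m["text"] == "PD",       "disease"),
    Rule("PRENGRAM == thirty",    lambda m: m["prev"] == "thirty",   "disease"),
    Rule("POSTNGRAM == patients", lambda m: m["next"] == "patients", "disease"),
    Rule("POSTAG == PROPN",       lambda m: m["pos"] == "PROPN",     "disease"),
    Rule("DEPENDENCYREL == (compound, patients)",
         lambda m: m["dep"] == ("compound", "patients"), "disease"),
]

for rule in meta_rules:
    print(rule.name, "->", rule.apply(mention))  # every rule fires: disease
```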
In fact, the above simple rules may sometimes fail to work; therefore we introduce complex rules, which combine several simple rules into compound rules via logical connectives including and (∧), or (∨), and negation (¬). For example, only a mention that satisfies both the rule POSTNGRAM == patients and the rule POSTAG == PROPN can be a disease entity.
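Under the same hypothetical encoding, compound rules reduce to boolean combinations of the simple predicates, e.g.:

```python
def rule_and(r1: Rule, r2: Rule) -> Rule:
    """r1 ∧ r2: fires only when both predicates hold (labels must agree)."""
    assert r1.label == r2.label
    return Rule(f"({r1.name}) AND ({r2.name})",
                lambda m: r1.predicate(m) and r2.predicate(m), r1.label)

def rule_or(r1: Rule, r2: Rule) -> Rule:
    """r1 ∨ r2: fires when either predicate holds (labels must agree)."""
    assert r1.label == r2.label
    return Rule(f"({r1.name}) OR ({r2.name})",
                lambda m: r1.predicate(m) or r2.predicate(m), r1.label)

# Negation (¬) is analogous: wrap the predicate with `not`.

# The paper's example: followed by "patients" AND tagged PROPN -> disease.
compound = rule_and(meta_rules[2], meta_rules[3])
print(compound.apply(mention))  # -> disease
```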
2.2.2 Logical Rules Mining
After defining the form of the meta logical rules, we traverse the entire training set and recall all potential rules that satisfy the format of the meta rules.
1 PD: Parkinson's disease.
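A self-contained sketch of this recall pass, under the same hypothetical mention encoding as above, counting how often each candidate pattern co-occurs with each entity label so that low-support candidates can later be filtered:

```python
from collections import Counter

def mine_candidate_rules(mentions: list) -> Counter:
    """One pass over (weakly) labeled mentions, recalling every
    (meta-rule type, pattern, label) triple observed in the data."""
    candidates = Counter()
    for m in mentions:
        label = m["label"]
        candidates[("TOKENSTRING", m["text"], label)] += 1
        candidates[("PRENGRAM", m["prev"], label)] += 1
        candidates[("POSTNGRAM", m["next"], label)] += 1
        candidates[("POSTAG", m["pos"], label)] += 1
        candidates[("DEPENDENCYREL", m["dep"], label)] += 1
    return candidates

corpus = [{"text": "PD", "prev": "thirty", "next": "patients",
           "pos": "PROPN", "dep": ("compound", "patients"),
           "label": "disease"}]
print(mine_candidate_rules(corpus).most_common(3))
```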