
KSAT: Knowledge-infused Self Attention Transformer -
Integrating Multiple Domain-Specific Contexts
Kaushik Roy
kaushikr@email.sc.edu
Artificial Intelligence Institute
University of South Carolina
USA
Yuxin Zi
yzi@email.sc.edu
Artificial Intelligence Institute
University of South Carolina
USA
Vignesh Narayanan
vignar@sc.edu
Artificial Intelligence Institute
University of South Carolina
USA
Manas Gaur
manas@umbc.edu
KAI2, University of Maryland
Baltimore County
USA
Amit Sheth
amit@sc.edu
Artificial Intelligence Institute
University of South Carolina
USA
ABSTRACT
Domain-specific language understanding requires integrating multiple pieces of relevant contextual information. For example, we see both suicide- and depression-related behavior (multiple contexts) in the text "I have a gun and feel pretty bad about my life, and it wouldn't be the worst thing if I didn't wake up tomorrow". Domain specificity in self-attention architectures is handled by fine-tuning on excerpts from relevant domain-specific resources (datasets and external knowledge, e.g., medical textbook chapters on mental health diagnosis related to suicide and depression). We propose a modified self-attention architecture, the Knowledge-infused Self Attention Transformer (KSAT), that achieves the integration of multiple domain-specific contexts through the use of external knowledge sources. KSAT introduces knowledge-guided biases in dedicated self-attention layers for each knowledge source to accomplish this. In addition, KSAT provides mechanics for controlling the trade-off between learning from data and learning from knowledge. Our quantitative and qualitative evaluations show that (1) the KSAT architecture provides novel human-understandable ways to precisely measure and visualize the contributions of the infused domain contexts, and (2) KSAT performs competitively with other knowledge-infused baselines and significantly outperforms baselines that use fine-tuning for domain-specific tasks.
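The knowledge-guided bias and the data/knowledge trade-off described above can be sketched minimally as follows. The precomputed bias matrix `K_bias`, the mixing weight `lam`, and all shapes are illustrative assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_biased_attention(X, Wq, Wk, Wv, K_bias, lam):
    """One self-attention layer with an additive knowledge-guided bias.

    X:      (n, d) token embeddings
    K_bias: (n, n) pairwise relatedness scores precomputed from one
            external knowledge source (hypothetical preprocessing step)
    lam:    trade-off in [0, 1] between data-driven attention scores
            and knowledge-derived scores
    Returns the layer output and the attention weights; the weights can
    be inspected to measure the knowledge source's contribution.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])    # standard scaled dot-product
    biased = (1 - lam) * scores + lam * K_bias   # knowledge-guided bias
    weights = softmax(biased, axis=-1)
    return weights @ V, weights

# Toy usage: a knowledge source that relates each token only to itself.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, weights = knowledge_biased_attention(X, Wq, Wk, Wv, np.eye(n), lam=0.5)
```

In a multi-source setting, one such layer per knowledge source (each with its own `K_bias`) would realize the dedicated per-source self-attention layers described above.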
KEYWORDS
knowledge graphs, language models, knowledge-infusion
1 MOTIVATION
Solving domain-specific tasks such as mental health diagnosis (MHD) and triaging requires integrating relevant contextual information from data and knowledge sources. Self-Attention based Language Models (SAMs) capture an aggregated broader context from domain-agnostic, voluminous training corpora [1]. Fine-tuning SAMs on domain-specific corpora achieves domain-specific context capture [2, 3]. However, SAM architectures are black-box in nature [4]. Consequently, fine-tuned SAM architectures do not lend themselves to the robust evaluation of the open research aims: (R1) relevant domain-specific context coverage, and (R2) the influence of knowledge context traded off against the data context in downstream tasks [5, 6]. We propose a modified self-attention architecture, the Knowledge-infused Self Attention Transformer (KSAT), to address these aims. KSAT performs well on select domain-specific tasks (see Section 2.2) while lending itself to a robust human-understandable evaluation of R1 and R2. Thus KSAT provides a substantial step towards fostering AI-user trust and satisfaction [7, 8].
2 BACKGROUND
2.1 Related Work
Prior approaches that are relevant to R1 and R2 and incorporate multiple knowledge contexts can be broadly categorized, based on the knowledge-infusion technique, as (1) knowledge-modulated SAMs and (2) knowledge-infused input-embedding-based SAMs [9, 10]. The former uses knowledge to guide the self-attention mechanism in SAMs, and the latter embeds the knowledge into a vector space before passing the inputs into SAMs. Here, we briefly summarize their contributions towards R1 and R2. Both Category (1) and Category (2) methods' domain coverage is evaluated through performance on domain-specific task descriptions (R1). These methods' ablations highlight the contributions of knowledge context (R2). However, the numerical outputs from the model components (projection matrices and vectors) do not easily lend themselves to human-understandable scrutiny. Explainable AI techniques (post-processing of the numerical outputs that transforms them into human-understandable information) are required to confirm the authors' perspectives [11]. Post-processing-based explanations are local approximations of the SAM reasoning for particular inputs and therefore do not present the global picture, casting doubt on the validity of the SAM evaluation. KSAT presents a SAM architecture whose numerical outputs lend themselves to robust human-understandable evaluations of R1 and R2.
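The two baseline categories above can be contrasted with a minimal sketch. The mixing weight `alpha`, the mask-based modulation, and the function names are illustrative assumptions rather than any specific published method:

```python
import numpy as np

def infuse_input_embeddings(X, K_emb, alpha=0.5):
    """Category (2): mix knowledge embeddings into the token embeddings
    before an unmodified SAM consumes them (alpha is a hypothetical
    mixing weight; real systems may gate or concatenate instead)."""
    return (1 - alpha) * X + alpha * K_emb

def modulate_attention_scores(scores, K_mask):
    """Category (1): use knowledge to guide self-attention, here by
    suppressing scores for token pairs unrelated in the knowledge source."""
    return np.where(K_mask > 0, scores, -1e9)

# Toy usage of both categories on tiny inputs.
X = np.ones((3, 4))
K_emb = 2 * np.ones((3, 4))
mixed = infuse_input_embeddings(X, K_emb, alpha=0.5)

scores = np.zeros((3, 3))
K_mask = np.eye(3)  # knowledge relates each token only to itself
masked = modulate_attention_scores(scores, K_mask)
```

In both categories the knowledge signal is folded into learned dense vectors or masked scores, which is why their contributions resist direct human-understandable inspection.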
2.2 Task Description, Data, and External
Knowledge Sources
Although the KSAT architecture broadly applies to any domain-specific task, we choose the specific task of Mental Health Diagnostic Assistance for Suicidal Tendencies by Gaur et al. [12]. We denote this dataset as MHDA. The data contains high-quality expert annotations on Reddit posts from suicide-related subreddits.
arXiv:2210.04307v2 [cs.CL] 24 Jun 2023