Specializing Scope Graph Resolution Queries

2025-05-03 0 0 883.97KB 14 页 10玖币
侵权投诉
Specializing Scope Graph Resolution Queries
Aron Zwaan
Software Technology
Delft University of Technology
Delft, Netherlands
a.s.zwaan@tudel.nl
Abstract
To warrant programmer productivity, type checker results
should be correct and available quickly. Correctness can be
provided when a type checker implementation corresponds
to a declarative type system specication. Statix is a type
system specication language which achieves this by au-
tomatically deriving type checker implementations from
declarative typing rules. A key feature of Statix is that it
uses scope graphs for declarative specication of name reso-
lution. However, compared to hand-written type checkers,
type checkers derived from Statix specications have sub-
optimal run time performance.
In this paper, we identify and resolve a performance bottle-
neck in the Statix solver, namely part of the name resolution
algorithm, using partial evaluation. To this end, we introduce
a tailored procedural intermediate query resolution language,
and provide a specializer that translates declarative queries
to this language.
Evaluating this specializer by comparing type checking
run time performance on three benchmarks (Apache Com-
mons CSV, IO, and Lang3), shows that our specializer im-
proves query resolution time up to 7.7x, which reduces the
total type checking run time by 38 – 48%.
CCS Concepts: Software and its engineering
Do-
main specic languages; Semantics;
Theory of compu-
tation
Regular languages;
Program semantics
;
Graph
algorithms analysis.
Keywords:
scope graphs, graph query resolution, specializa-
tion, partial evaluation, declarative languages
ACM Reference Format:
Aron Zwaan. 2022. Specializing Scope Graph Resolution Queries.
In Proceedings of the 15th ACM SIGPLAN International Conference
on Software Language Engineering (SLE ’22), December 06–07, 2022,
Auckland, New Zealand. ACM, New York, NY, USA, 14 pages. hps:
//doi.org/10.1145/3567512.3567523
Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for prot or commercial advantage and that copies
bear this notice and the full citation on the rst page. Copyrights for third-
party components of this work must be honored. For all other uses, contact
the owner/author(s).
SLE ’22, December 06–07, 2022, Auckland, New Zealand
©2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9919-7/22/12.
hps://doi.org/10.1145/3567512.3567523
1 Introduction
Developers, whether they use a general-purpose or a domain-
specic language (DSL), use static name and type analysis to
understand and evolve their code. However, implementing a
type checker takes signicant time and eort. In particular,
implementing name binding correctly is challenging, as it re-
quires careful staging of program traversals [18]. Therefore,
type checker frameworks that abstract over name resolu-
tion scheduling, such as Pacak et al. [15] (based on Datalog),
Wyk et al. [31], Hedin and Magnusson [5] (using attribute
grammars), and Van Antwerpen et al. [27] (using constraint
programming and scope graphs) have been developed. These
frameworks ensure executable type checkers can be devel-
oped with signicantly reduced eort.
Interpreting such declarative specications often requires
intricate logic. Generally, the more a language abstracts from
implementation details, the more complicated an interpreter
or compiler will be. However, this comes with the risk of
introducing signicant run time overhead, resulting in sub-
optimal performance compared to low-level approaches.
In this paper, we improve the performance of type check-
ers based on scope graphs [11, 27]. Scope graphs are an es-
tablished approach to modeling name-binding structure. In
this model, the scoping structure and declarations of a pro-
gram are represented in a graph. References can be resolved
using a versatile graph query mechanism. Scope graphs
are embedded in the Statix DSL for type system specica-
tion [18, 27]. This DSL allows high-level specication of
type systems using declarative inference rules. It has a well-
dened declarative semantics, which allows reasoning over
type-systems while abstracting over operational details. The
Statix solver interprets specications as constraint programs,
which yields executable type checkers that are sound with
respect to the declarative semantics. Case studies using Statix
have shown that scope graphs are expressive enough to sup-
port type systems with non-lexical bindings (e.g., imports
and inheritance), structural types, and parametric polymor-
phism [11, 27]. In addition, they allow language-parametric
denition of editor services, such as semantic code comple-
tion [16], renaming [9, 10] and inlining [25]. The expressive-
ness, declarativity, and additional services makes it especially
suitable for DSLs and rapid language prototyping.
However, type checkers derived from Statix specications
are rather slow. For example, type checking the Apache Com-
mons IO library takes 14.7 seconds with the concurrent solver
1
arXiv:2210.06121v1 [cs.PL] 12 Oct 2022
SLE ’22, December 06–07, 2022, Auckland, New Zealand Aron Zwaan
using 8 cores and even 73.4 seconds on a single core [28].
On the same machine, a full compilation using
javac
takes
roughly 3 seconds on 8 cores, and 5 seconds on a single core.
In this paper, we apply partial evaluation to resolve a
newly identied performance bottleneck in Statix’ scope
graph query resolution algorithm. Our evaluation shows
that the approach ensures query resolution is up to 7.7x
faster than traditional query resolution on average. This
improves the performance of Statix-based type checkers by
38 – 48% on Java projects, which is a signicant step forward
to applying generated type checkers on larger codebases.
In summary, our contributions are as follows:
We explain the scope graph query resolution algo-
rithm, and identify a performance bottleneck in the
algorithm (section 3).
We introduce an intermediate language that makes
scope graph traversal order and partial query result
combination explicit (section 4).
We present a specializer from traditional scope graph
queries to our new intermediate language (section 5).
We evaluate the correctness and performance of our
approach (section 6). We show that specializing queries
makes scope graph query resolution up to 7.7x faster.
2 Partial Evaluation for DSL Interpreters
In this section, we provide a brief introduction to partial
evaluation (popularized by Futamura [4]), and explain why
we think it is especially benecial for declarative languages
such as Statix. From the perspective of partial evaluation, a
program can be seen as a function from inputs to an output
𝑂
.
Some of these inputs may be known statically (
𝑆
), while some
of them may vary per invocation (
𝐷
). Then, the signature of
a program looks as follows:
prog :𝑆×𝐷𝑂
Aspecializer takes such a program and its static input, and
returns a residual program
prog𝑆
:
𝐷𝑂
. When generating
prog𝑆
, it performs the part of the computation that does not
depend on 𝐷, making prog𝑆generally faster than prog.
2.1 Partial Evaluation for Interpreters
This pattern can easily be applied to programming languages.
In that case,
prog
is an interpreter that evaluates the seman-
tics of a program (its static input
𝑆
) with respect to some
arguments
𝐷
. The residual program is essentially a compiled
version of 𝑆. This is called the rst Futamura projection.
Specialization is generally only benecial when a program
is executed multiple times. However, Futamura argues that
specializing an interpreter to a program may already be ben-
ecial when executing the program once, as programs may
repeatedly evaluate a particular piece of code. Specializing
repeatedly executed program fragments removes the inter-
pretation overhead, which might outweigh the run time costs
of compilation. This eect becomes stronger when the com-
putational complexity of interpreting particular language
constructs is high. That is, the more overhead an interpreter
introduces, the more benecial specialization will be.
Declarative languages are languages in which a user spec-
ies intent rather than procedure. The logic to compute a
result that corresponds to the intent is then implemented in
the interpreter (or compiler) of the language. Thus, a declar-
ative language moves part of the complexity of a problem
from the program or programmer to its interpreter.
Having an interpreter with intricate logic means that
declarative languages are susceptible to introducing rela-
tively more run time overhead compared to non-declarative
languages. Interpreters of declarative languages might have
to execute non-trivial algorithms in order to evaluate a pro-
gram. For that reason, partial evaluation might be relatively
benecial for declarative languages.
2.2 Application to Statix
Applying partial evaluation to Statix introduces a few prob-
lems. In fact, it is as complex as nding a compiler from a
constraint language with an embedded scope graph logic
to an imperative language, such as Java. Such a compiler
should ensure that all internal scheduling of rule selection
and query resolution is handled correctly. We regard this as
an open research challenge that is too complicated to solve in
one step. Instead, in this paper, we specialize only a compu-
tationally complex part of the interpreter, namely the query
resolution algorithm, to a specication. This yields a speci-
cation in which the query resolution constraints are partially
evaluated, but other constraints are not. This specication
can then be interpreted by a constraint solver without using
the query resolution algorithm, ensuring faster execution.
To characterize this approach, regard a Statix specication
𝐶𝑄𝐶𝑂
as a collection of query constraints
𝐶𝑄
and other
constraints
𝐶𝑂
. The
symbol indicates that these groups are
mutually embedded in actual specications. Our approach
is then summarized in the following functions:
specialize :𝐶𝑄𝐶𝑂𝐶
𝑄𝐶𝑂
solve :𝐶
𝑄𝐶𝑂×𝑃𝑇
Here
specialize
specializes the query resolution algorithm
with respect to particular query constraints, yielding spe-
cialized queries (
𝐶
𝑄
). These constraints are embedded in the
original AST, yielding a partially specialized specication
𝐶
𝑄𝐶𝑂
. When type-checking a concrete object program
𝑃
, this specialized specication is interpreted by an adapted
solver solve, returning a typing 𝑇.
These specialized queries
𝐶
𝑄
cannot be represented in
either Statix itself or Java (the language in which the solver is
written). Thus, we introduce a tailored intermediate language
to express those. For this language, we provide an interpreter,
2
Specializing Scope Graph Resolution eries SLE ’22, December 06–07, 2022, Auckland, New Zealand
𝑠𝑙𝑠𝑥↦→ x:N
VAR
𝑠𝜆
P
𝑠𝑙
P
𝑠𝑓↦→ f:NNVAR 𝑠𝑦↦→ y:N
VAR
let x=6in
let f = fun y . x* y
in f 7 x
1
2
3
Figure 1. PCF program with Scope Graph and Query
which the Statix solver uses instead of the name resolution
algorithm.
3 Resolving Queries in Scope Graphs
In this section, we introduce scope graphs and query resolu-
tion. Section 3.1 introduces scope graphs and three parame-
ters of scope graph queries using two examples. After that,
section 3.2 explains how Statix interprets these query param-
eters. Then, we explain that repeated querying of some of
these parameters makes query resolution slow, which moti-
vates the eort to optimize it (section 3.3). Finally, section 3.4
provides the full resolution algorithm, which is required to
understand the remainder of the paper.
3.1 Query Resolution by Example
We consider two examples of queries in scope graphs. These
examples motivate the declarative query language of Statix,
as well as explaining the resolution algorithm that interprets
such queries. Each example discusses the scope graph of a
program, and the resolution of a particular query in that
scope graph.
Example 1.
In g. 1, a small PCF program, containing
two let-bindings, a function denition and its application, is
shown. Next to the program, the scope graph is depicted. In
a scope graph, each ‘scope’ is modeled by a node. In this case,
𝑠𝑙
and
𝑠𝑙
represent the bodies of the let expressions, while
𝑠𝜆
models the body of the function. The P-labeled edges, such
as the edge from
𝑠𝑙
to
𝑠𝑙
, ensure declarations of outer scopes
will be visible in inner scopes. Nodes
𝑠𝑥
,
𝑠𝑦
, and
𝑠𝑓
model
the declarations of the
x
,
y
, and
f
variables, respectively.
Therefore, these scopes map to a datum (e.g.
x
:
N
for
𝑠𝑥
)
that indicates the name and type of the declaration.
References are modeled using queries in scope graphs. In
the code, a reference
x
is highlighted. This reference corre-
sponds to the dashed box in the scope graph. The box points
to
𝑠𝜆
, because
x
occurs in the body of the function expression.
Eventually, the query resolves to
𝑠𝑥
via
𝑠𝑙
, which is indeed
the declaration of xin the outer let.
For this paper, we are particularly interested in how this
query result was computed. This is indicated by the num-
bered, dashed edges. When starting the query in
𝑠𝜆
, the al-
gorithm rst traverses the
VAR
edge to
𝑠𝑦
. Then it checks
whether
𝑠𝑦
is a valid declaration for the reference. Since this
is not the case, the algorithm continues by traversing the P
edge to
𝑠𝑙
(step 2). From there, the
VAR
edge to
𝑠𝑥
is traversed.
The algorithm nds that
𝑠𝑥
is a valid declaration, and returns
that as the environment to which the query evaluates.
In Statix, one would not write a resolution procedure as
shown above directly, as Statix is meant to be a declarative
specication language. Instead, Statix interprets a high-level
description of valid query answers using a generic algorithm,
yielding the behavior as shown above. So how can we de-
scribe the query resolution procedure in a high level fashion?
The given example already shows two of the three query
parameters that determine how a query resolves to an envi-
ronment. First, the query resolution algorithm decided that
𝑠𝑦
should not be in the environment, while
𝑠𝑥
should. That
is expressed using a data well-formedness condition
D
, which
is a unary predicate over datums. A possible declaration is
only included in the environment when its datum matches
D
. In this case, the predicate only accepts declarations with
name
x
. Second, the algorithm decided to traverse
VAR
edges
before Pedges. This corresponds to the intuition that local
declaration are prefered over (i.e., shadow) more distant dec-
larations. In Statix, this is modeled using a strict partial order
over labels (refered to as label order). For this example, the
label order
VAR <
Pindicates that
VAR
edges should be
traversed rst.
Example 2.
Figure 2 shows an example program in Lan-
guage with Modules (LM, introduced by Néron et al. [11]). In
this scope graph, scope
𝑠
represent the global scope. Scopes
𝑠𝐴
,
𝑠𝐵
,
𝑠𝐶
,
𝑠𝐷
and
𝑠𝐸
have a double role: they model the decla-
ration of a module as well as its body. Therefore, they have
incoming
MOD
edges and a datum as well as inner decla-
rations. All modules have a Pedge back to their enclosing
context. Finally, the imports are modeled using Iedges.
When resolving
x
in
𝑠𝐸
, the resolution algorithm rst tra-
verses the
VAR
edge to
𝑠𝑦
(step 1). Because that declaration
does not match, it continues traversing the Iedge to
𝑠𝐵
(step
2). That scope does not contain valid declarations but it has
Pand Iedges to
𝑠𝐴
, and a
MOD
edge to
𝑠𝐶
. However, refer-
ence resolution should not traverse those. Clearly, traversing
the Pedge would be incorrect. After all, module
E
only im-
ports
A.B
, which should not bring declarations from
A
in
scope. Whether traversing the Iedge is valid depends on the
language. Languages with transitive imports would allow
traversing multiple subsequent Iedges, while language with
non-transitive imports would not. As LM has non-transitive
imports, we do not traverse the Iedge to
𝑠𝐴
. Similarly, the
MOD
edge to
𝑠𝐶
should not be traversed, as modules should
not be imported implicitly. Instead, the query resolves to
declaration
𝑠2
in
𝑠𝐶
via the import edge from
𝑠𝐸
(step 3 and
4). Now, we still need to consider whether it is necessary
to traverse the Pedge to
𝑠𝐷
. When imports shadow the sur-
rounding scope, that is not required, as
𝑠2
would shadow
any result from
𝑠𝐷
. However, for the sake of the example,
3
摘要:

SpecializingScopeGraphResolutionQueriesAronZwaanSoftwareTechnologyDelftUniversityofTechnologyDelft,Netherlandsa.s.zwaan@tudelft.nlAbstractTowarrantprogrammerproductivity,typecheckerresultsshouldbecorrectandavailablequickly.Correctnesscanbeprovidedwhenatypecheckerimplementationcorrespondstoadeclarati...

展开>> 收起<<
Specializing Scope Graph Resolution Queries.pdf

共14页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:14 页 大小:883.97KB 格式:PDF 时间:2025-05-03

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 14
客服
关注