Specializing Scope Graph Resolution Queries

2025-05-03 0 0 883.97KB 14 页 10玖币

侵权投诉

Aron Zwaan

Software Technology

Delft University of Technology

Delft, Netherlands

a.s.zwaan@tudel.nl

Abstract

To warrant programmer productivity, type checker results

should be correct and available quickly. Correctness can be

provided when a type checker implementation corresponds

to a declarative type system specication. Statix is a type

system specication language which achieves this by au-

tomatically deriving type checker implementations from

declarative typing rules. A key feature of Statix is that it

uses scope graphs for declarative specication of name reso-

lution. However, compared to hand-written type checkers,

type checkers derived from Statix specications have sub-

optimal run time performance.

In this paper, we identify and resolve a performance bottle-

neck in the Statix solver, namely part of the name resolution

algorithm, using partial evaluation. To this end, we introduce

a tailored procedural intermediate query resolution language,

and provide a specializer that translates declarative queries

to this language.

Evaluating this specializer by comparing type checking

run time performance on three benchmarks (Apache Com-

mons CSV, IO, and Lang3), shows that our specializer im-

proves query resolution time up to 7.7x, which reduces the

total type checking run time by 38 – 48%.

CCS Concepts: •Software and its engineering →

Do-

main specic languages; Semantics;

•Theory of compu-

tation →

Regular languages;

Program semantics

;

Graph

algorithms analysis.

Keywords:

scope graphs, graph query resolution, specializa-

tion, partial evaluation, declarative languages

ACM Reference Format:

Aron Zwaan. 2022. Specializing Scope Graph Resolution Queries.

In Proceedings of the 15th ACM SIGPLAN International Conference

on Software Language Engineering (SLE ’22), December 06–07, 2022,

Auckland, New Zealand. ACM, New York, NY, USA, 14 pages. hps:

//doi.org/10.1145/3567512.3567523

Permission to make digital or hard copies of part or all of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for prot or commercial advantage and that copies

bear this notice and the full citation on the rst page. Copyrights for third-

party components of this work must be honored. For all other uses, contact

the owner/author(s).

SLE ’22, December 06–07, 2022, Auckland, New Zealand

ACM ISBN 978-1-4503-9919-7/22/12.

hps://doi.org/10.1145/3567512.3567523

1 Introduction

Developers, whether they use a general-purpose or a domain-

specic language (DSL), use static name and type analysis to

understand and evolve their code. However, implementing a

type checker takes signicant time and eort. In particular,

implementing name binding correctly is challenging, as it re-

quires careful staging of program traversals [18]. Therefore,

type checker frameworks that abstract over name resolu-

tion scheduling, such as Pacak et al. [15] (based on Datalog),

Wyk et al. [31], Hedin and Magnusson [5] (using attribute

grammars), and Van Antwerpen et al. [27] (using constraint

programming and scope graphs) have been developed. These

frameworks ensure executable type checkers can be devel-

oped with signicantly reduced eort.

Interpreting such declarative specications often requires

intricate logic. Generally, the more a language abstracts from

implementation details, the more complicated an interpreter

or compiler will be. However, this comes with the risk of

introducing signicant run time overhead, resulting in sub-

optimal performance compared to low-level approaches.

In this paper, we improve the performance of type check-

ers based on scope graphs [11, 27]. Scope graphs are an es-

tablished approach to modeling name-binding structure. In

this model, the scoping structure and declarations of a pro-

gram are represented in a graph. References can be resolved

using a versatile graph query mechanism. Scope graphs

are embedded in the Statix DSL for type system specica-

tion [18, 27]. This DSL allows high-level specication of

type systems using declarative inference rules. It has a well-

dened declarative semantics, which allows reasoning over

type-systems while abstracting over operational details. The

Statix solver interprets specications as constraint programs,

which yields executable type checkers that are sound with

respect to the declarative semantics. Case studies using Statix

have shown that scope graphs are expressive enough to sup-

port type systems with non-lexical bindings (e.g., imports

and inheritance), structural types, and parametric polymor-

phism [11, 27]. In addition, they allow language-parametric

denition of editor services, such as semantic code comple-

tion [16], renaming [9, 10] and inlining [25]. The expressive-

ness, declarativity, and additional services makes it especially

suitable for DSLs and rapid language prototyping.

However, type checkers derived from Statix specications

are rather slow. For example, type checking the Apache Com-

mons IO library takes 14.7 seconds with the concurrent solver

arXiv:2210.06121v1 [cs.PL] 12 Oct 2022

SLE ’22, December 06–07, 2022, Auckland, New Zealand Aron Zwaan

using 8 cores and even 73.4 seconds on a single core [28].

On the same machine, a full compilation using

javac

takes

roughly 3 seconds on 8 cores, and 5 seconds on a single core.

In this paper, we apply partial evaluation to resolve a

newly identied performance bottleneck in Statix’ scope

graph query resolution algorithm. Our evaluation shows

that the approach ensures query resolution is up to 7.7x

faster than traditional query resolution on average. This

improves the performance of Statix-based type checkers by

38 – 48% on Java projects, which is a signicant step forward

to applying generated type checkers on larger codebases.

In summary, our contributions are as follows:

•

We explain the scope graph query resolution algo-

rithm, and identify a performance bottleneck in the

algorithm (section 3).

•

We introduce an intermediate language that makes

scope graph traversal order and partial query result

combination explicit (section 4).

•

We present a specializer from traditional scope graph

queries to our new intermediate language (section 5).

•

We evaluate the correctness and performance of our

approach (section 6). We show that specializing queries

makes scope graph query resolution up to 7.7x faster.

2 Partial Evaluation for DSL Interpreters

In this section, we provide a brief introduction to partial

evaluation (popularized by Futamura [4]), and explain why

we think it is especially benecial for declarative languages

such as Statix. From the perspective of partial evaluation, a

program can be seen as a function from inputs to an output

𝑂

Some of these inputs may be known statically (

𝑆

), while some

of them may vary per invocation (

𝐷

). Then, the signature of

a program looks as follows:

prog :𝑆×𝐷→𝑂

Aspecializer takes such a program and its static input, and

returns a residual program

prog𝑆

𝐷→𝑂

. When generating

prog𝑆

, it performs the part of the computation that does not

depend on 𝐷, making prog𝑆generally faster than prog.

2.1 Partial Evaluation for Interpreters

This pattern can easily be applied to programming languages.

In that case,

prog

is an interpreter that evaluates the seman-

tics of a program (its static input

𝑆

) with respect to some

arguments

𝐷

. The residual program is essentially a compiled

version of 𝑆. This is called the rst Futamura projection.

Specialization is generally only benecial when a program

is executed multiple times. However, Futamura argues that

specializing an interpreter to a program may already be ben-

ecial when executing the program once, as programs may

repeatedly evaluate a particular piece of code. Specializing

repeatedly executed program fragments removes the inter-

pretation overhead, which might outweigh the run time costs

of compilation. This eect becomes stronger when the com-

putational complexity of interpreting particular language

constructs is high. That is, the more overhead an interpreter

introduces, the more benecial specialization will be.

Declarative languages are languages in which a user spec-

ies intent rather than procedure. The logic to compute a

result that corresponds to the intent is then implemented in

the interpreter (or compiler) of the language. Thus, a declar-

ative language moves part of the complexity of a problem

from the program or programmer to its interpreter.

Having an interpreter with intricate logic means that

declarative languages are susceptible to introducing rela-

tively more run time overhead compared to non-declarative

languages. Interpreters of declarative languages might have

to execute non-trivial algorithms in order to evaluate a pro-

gram. For that reason, partial evaluation might be relatively

benecial for declarative languages.

2.2 Application to Statix

Applying partial evaluation to Statix introduces a few prob-

lems. In fact, it is as complex as nding a compiler from a

constraint language with an embedded scope graph logic

to an imperative language, such as Java. Such a compiler

should ensure that all internal scheduling of rule selection

and query resolution is handled correctly. We regard this as

an open research challenge that is too complicated to solve in

one step. Instead, in this paper, we specialize only a compu-

tationally complex part of the interpreter, namely the query

resolution algorithm, to a specication. This yields a speci-

cation in which the query resolution constraints are partially

evaluated, but other constraints are not. This specication

can then be interpreted by a constraint solver without using

the query resolution algorithm, ensuring faster execution.

To characterize this approach, regard a Statix specication

𝐶𝑄⊎𝐶𝑂

as a collection of query constraints

𝐶𝑄

and other

constraints

𝐶𝑂

. The

⊎

symbol indicates that these groups are

mutually embedded in actual specications. Our approach

is then summarized in the following functions:

specialize :𝐶𝑄⊎𝐶𝑂→𝐶∗

𝑄⊎𝐶𝑂

solve :𝐶∗

𝑄⊎𝐶𝑂×𝑃→𝑇

Here

specialize

specializes the query resolution algorithm

with respect to particular query constraints, yielding spe-

cialized queries (

𝐶∗

𝑄

). These constraints are embedded in the

original AST, yielding a partially specialized specication

𝐶∗

𝑄⊎𝐶𝑂

. When type-checking a concrete object program

𝑃

, this specialized specication is interpreted by an adapted

solver solve, returning a typing 𝑇.

These specialized queries

𝐶∗

𝑄

cannot be represented in

either Statix itself or Java (the language in which the solver is

written). Thus, we introduce a tailored intermediate language

to express those. For this language, we provide an interpreter,

Specializing Scope Graph Resolution eries SLE ’22, December 06–07, 2022, Auckland, New Zealand

𝑠𝑙𝑠𝑥↦→ x:N

VAR

𝑠𝜆

𝑠𝑙′

𝑠𝑓↦→ f:N→NVAR 𝑠𝑦↦→ y:N

VAR

let x=6in

let f = fun y . x* y

in f 7 x

Figure 1. PCF program with Scope Graph and Query

which the Statix solver uses instead of the name resolution

algorithm.

3 Resolving Queries in Scope Graphs

In this section, we introduce scope graphs and query resolu-

tion. Section 3.1 introduces scope graphs and three parame-

ters of scope graph queries using two examples. After that,

section 3.2 explains how Statix interprets these query param-

eters. Then, we explain that repeated querying of some of

these parameters makes query resolution slow, which moti-

vates the eort to optimize it (section 3.3). Finally, section 3.4

provides the full resolution algorithm, which is required to

understand the remainder of the paper.

3.1 Query Resolution by Example

We consider two examples of queries in scope graphs. These

examples motivate the declarative query language of Statix,

as well as explaining the resolution algorithm that interprets

such queries. Each example discusses the scope graph of a

program, and the resolution of a particular query in that

scope graph.

Example 1.

In g. 1, a small PCF program, containing

two let-bindings, a function denition and its application, is

shown. Next to the program, the scope graph is depicted. In

a scope graph, each ‘scope’ is modeled by a node. In this case,

𝑠𝑙

and

𝑠𝑙′

represent the bodies of the let expressions, while

𝑠𝜆

models the body of the function. The P-labeled edges, such

as the edge from

𝑠𝑙′

𝑠𝑙

, ensure declarations of outer scopes

will be visible in inner scopes. Nodes

𝑠𝑥

𝑠𝑦

, and

𝑠𝑓

model

the declarations of the

, and

variables, respectively.

Therefore, these scopes map to a datum (e.g.

for

𝑠𝑥

)

that indicates the name and type of the declaration.

References are modeled using queries in scope graphs. In

the code, a reference

is highlighted. This reference corre-

sponds to the dashed box in the scope graph. The box points

𝑠𝜆

, because

occurs in the body of the function expression.

Eventually, the query resolves to

𝑠𝑥

via

𝑠𝑙

, which is indeed

the declaration of xin the outer let.

For this paper, we are particularly interested in how this

query result was computed. This is indicated by the num-

bered, dashed edges. When starting the query in

𝑠𝜆

, the al-

gorithm rst traverses the

VAR

edge to

𝑠𝑦

. Then it checks

whether

𝑠𝑦

is a valid declaration for the reference. Since this

is not the case, the algorithm continues by traversing the P

edge to

𝑠𝑙

(step 2). From there, the

VAR

edge to

𝑠𝑥

is traversed.

The algorithm nds that

𝑠𝑥

is a valid declaration, and returns

that as the environment to which the query evaluates.

In Statix, one would not write a resolution procedure as

shown above directly, as Statix is meant to be a declarative

specication language. Instead, Statix interprets a high-level

description of valid query answers using a generic algorithm,

yielding the behavior as shown above. So how can we de-

scribe the query resolution procedure in a high level fashion?

The given example already shows two of the three query

parameters that determine how a query resolves to an envi-

ronment. First, the query resolution algorithm decided that

𝑠𝑦

should not be in the environment, while

𝑠𝑥

should. That

is expressed using a data well-formedness condition

, which

is a unary predicate over datums. A possible declaration is

only included in the environment when its datum matches

. In this case, the predicate only accepts declarations with

name

. Second, the algorithm decided to traverse

VAR

edges

before Pedges. This corresponds to the intuition that local

declaration are prefered over (i.e., shadow) more distant dec-

larations. In Statix, this is modeled using a strict partial order

over labels (refered to as label order). For this example, the

label order

VAR <

Pindicates that

VAR

edges should be

traversed rst.

Example 2.

Figure 2 shows an example program in Lan-

guage with Modules (LM, introduced by Néron et al. [11]). In

this scope graph, scope

𝑠

represent the global scope. Scopes

𝑠𝐴

𝑠𝐵

𝑠𝐶

𝑠𝐷

and

𝑠𝐸

have a double role: they model the decla-

ration of a module as well as its body. Therefore, they have

incoming

MOD

edges and a datum as well as inner decla-

rations. All modules have a Pedge back to their enclosing

context. Finally, the imports are modeled using Iedges.

When resolving

𝑠𝐸

, the resolution algorithm rst tra-

verses the

VAR

edge to

𝑠𝑦

(step 1). Because that declaration

does not match, it continues traversing the Iedge to

𝑠𝐵

(step

2). That scope does not contain valid declarations but it has

Pand Iedges to

𝑠𝐴

, and a

MOD

edge to

𝑠𝐶

. However, refer-

ence resolution should not traverse those. Clearly, traversing

the Pedge would be incorrect. After all, module

only im-

ports

A.B

, which should not bring declarations from

scope. Whether traversing the Iedge is valid depends on the

language. Languages with transitive imports would allow

traversing multiple subsequent Iedges, while language with

non-transitive imports would not. As LM has non-transitive

imports, we do not traverse the Iedge to

𝑠𝐴

. Similarly, the

MOD

edge to

𝑠𝐶

should not be traversed, as modules should

not be imported implicitly. Instead, the query resolves to

declaration

𝑠2

𝑠𝐶

via the import edge from

𝑠𝐸

(step 3 and

4). Now, we still need to consider whether it is necessary

to traverse the Pedge to

𝑠𝐷

. When imports shadow the sur-

rounding scope, that is not required, as

𝑠2

would shadow

any result from

𝑠𝐷

. However, for the sake of the example,

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

SpecializingScopeGraphResolutionQueriesAronZwaanSoftwareTechnologyDelftUniversityofTechnologyDelft,Netherlandsa.s.zwaan@tudelft.nlAbstractTowarrantprogrammerproductivity,typecheckerresultsshouldbecorrectandavailablequickly.Correctnesscanbeprovidedwhenatypecheckerimplementationcorrespondstoadeclarati...

展开>> 收起<<

Specializing Scope Graph Resolution Queries.pdf

共14页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Specializing Scope Graph Resolution Queries

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: