Data types as a more ergonomic frontend for Grammar-Guided Genetic Programming_2

2025-04-27 0 0 576.83KB 9 页 10玖币
侵权投诉
Data types as a more ergonomic frontend for
Grammar-Guided Genetic Programming
Guilherme Espada
gjespada@ciencias.ulisboa.pt
LASIGE, Faculdade de Ciências da
Universidade de Lisboa
Portugal
Leon Ingelse
lingelse@lasige.di.fc.ul.pt
LASIGE, Faculdade de Ciências da
Universidade de Lisboa
Portugal
Paulo Canelas
pacsantos@ciencias.ulisboa.pt
School of Computer Science
Carnegie Mellon University
USA
LASIGE, Faculdade de Ciências da
Universidade de Lisboa
Portugal
Pedro Barbosa
psbarbosa@ciencias.ulisboa.pt
LASIGE, Faculdade de Ciências da
Universidade de Lisboa
Portugal
Instituto de Medicina Molecular
Portugal
Alcides Fonseca
alcides@ciencias.ulisboa.pt
LASIGE, Faculdade de Ciências da
Universidade de Lisboa
Portugal
Abstract
Genetic Programming (GP) is an heuristic method that can
be applied to many Machine Learning, Optimization and
Engineering problems. In particular, it has been widely used
in Software Engineering for Test-case generation, Program
Synthesis and Improvement of Software (GI).
Grammar-Guided Genetic Programming (GGGP) approaches
allow the user to rene the domain of valid program solu-
tions. Backus Normal Form is the most popular interface for
describing Context-Free Grammars (CFG) for GGGP. BNF
and its derivatives have the disadvantage of interleaving the
grammar language and the target language of the program.
We propose to embed the grammar as an internal Domain-
Specic Language in the host language of the framework.
This approach has the same expressive power as BNF and
EBNF while using the host language type-system to take
advantage of all the existing tooling: linters, formatters, type-
checkers, autocomplete, and legacy code support. These tools
have a practical utility in designing software in general, and
GP systems in particular.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for prot or commercial advantage and that copies bear
this notice and the full citation on the rst page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specic permission and/or a fee. Request
permissions from permissions@acm.org.
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY
©2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06... $15.00
hps://doi.org/XXXXXXX.XXXXXXX
We also present Meta-Handlers, user-dened overrides
of the tree-generation system. This technique extends our
object-oriented encoding with more practicability and ex-
pressive power than existing CFG approaches, achieving the
same expressive power of Attribute Grammars, but without
the grammar vs target language duality.
Furthermore, we evidence that this approach is feasible,
showing an example Python implementation as proof. We
also compare our approach against textual BNF-representations
w.r.t. expressive power and ergonomics. These advantages
do not come at the cost of performance, as shown by our
empirical evaluation on 5 benchmarks of our example imple-
mentation against PonyGE2. We conclude that our approach
has better ergonomics with the same expressive power and
performance of textual BNF-based grammar encodings.
CCS Concepts: Computer systems organization Em-
bedded systems
;Redundancy; Robotics;
Networks
Network reliability.
Keywords:
Grammar-guided Genetic Programming, Strongly-
Typed Genetic Programming, Genetic Programming Frame-
work
ACM Reference Format:
Guilherme Espada, Leon Ingelse, Paulo Canelas, Pedro Barbosa,
and Alcides Fonseca. 2018. Data types as a more ergonomic frontend
for Grammar-Guided Genetic Programming. In Proceedings of Make
sure to enter the correct conference title from your rights conrmation
emai (Conference acronym ’XX). ACM, New York, NY, USA, 9 pages.
hps://doi.org/XXXXXXX.XXXXXXX
arXiv:2210.04826v1 [cs.PL] 10 Oct 2022
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Espada et al.
1 Introduction
Genetic Programming (GP) has been applied to several do-
mains [
1
], including test-case generation [
6
], program syn-
thesis [
10
] and automatic optimization of programs [
18
].
Being a search-based method, GP often requires the user to
encode domain-specic restrictions to the possible solutions.
Throughout the Genetic Programming procedure, many
solutions (also called programs or trees) are generated, mod-
ied and combined in order to search the one that has the
best performance or solves the problem. One of the main
challenges of using GP is to dene the boundaries of the
search-space in a way that is performant and ecient. En-
coding domain-knowledge in the denition of the search
space is desirable, so many candidate, but invalid solutions
are removed from the search.Grammar-Guided and Strongly-
Typed Genetic Programming (GGGP [
33
] and STGP [
21
])
are two popular families of approaches to guide the random
solution generation by the user. To restrict valid solutions,
GGGP relies on user-dened grammars while STGP relies
on user-dened type annotations.
Within each family of approaches, there are dierent trade-
os between expressive power and computational complex-
ity. Context-Free Grammars [
33
] can be used with little over-
head in processing, while Attribute Grammars [
5
] can en-
code more complex restrictions. Similarly, polymorphic type
systems [
37
] are more expressive than monomorphic ones,
but they have a higher overhead during the evolutionary
process.
However, in practice, popular implementations of these
systems are limited to Context-Free grammars (CFG) (e.g.,
PonyGE2 [
7
]) and base types (e.g., DEAP [
11
] and Jenet-
ics [36]).
Problem Statement
GGGP requires users to express the search-space of valid pro-
grams in the Backus Normal Form (BNF) or one of its deriva-
tions, such as the Extended Backus Normal Form (EBNF).
We consider BNF and its derivatives to be a textual editing
medium of the target language, as concepts in the target
language (like types, functions, literals) have no meaning in
the grammar language. As such, users are treating the target
language as character streams, instead of treating them as
semantic code units, as they do in code editors.
The BNF approach is not ergonomic nor error-prone due
to the interleaving of two languages: the BNF meta-syntax
and the syntax of the target language, i.e., the language of
the generated programs. As an example, Listing 1 depicts
two common errors that both novices and experts do while
writing grammars. These are not detected immediately, and
are typically only revealed during execution.
Ideally, these two bugs would be detected immediately in
most development environments: the type-checker would
identify a missing import, and the type mismatch in the
1<S> ::= <S> "+" <S> | "x" | 1 | np.array([<A>])
2<A> ::= [0-9]+,<A> | [0-9]+
Listing 1.
Example of a grammar with two possible bugs:
the
np
module may not be imported, and the
+
operator may
be applied to a string and integer, causing a crash.
1<Real> ::= <RealLit> | DATASET[GE_RANGE:3] | <
RealLit> + <RealLit> | np.average(<Vector>)
2<RealLit> ::= <d>.<d>
3<d> ::= GE_RANGE:10<d> |
4<Vector> ::= DATASET[3 + GE_RANGE:3]
Listing 2. PonyGE2 grammar for a subset of VectorialGP
operator. Is it really necessary that, to implement a grammar,
developers have to lose access to an array of tooling that
has shown to be useful in software engineering? Or is there
a more ergonomic interface that is integrated into the host
language ecosystem, providing access to code editors, syntax
highlighting, linting, formatting, test suites, etc...?
If there is, such an approach could reduce the barrier to
adoption of GGGP, STGP, and similar systems, increasing
the productivity of practitioners and reducing the number of
possible bugs that are not caught earlier due the mismatch
between the host language and the grammar language.
It is important to notice that usability has been identied
as a relevant open challenge for GP both in 2010 [
25
], and
again in 2020, when it was identied as one of the 10 major
challenges for the following 15 years [
30
]. This work ad-
dresses exactly that problem, especially in the case where
practitioners encode domain information to improve the
performance of GP.
Approach
Our rst contribution is an object-oriented encoding of gram-
mars to be used in GGGP, regardless of representation. In
short, non-terminals are represented as abstract classes (or
interfaces, or traits) and productions are represented as con-
crete classes that implement the corresponding abstract class.
As a running example, we will use grammars to describe
the VectorialGP example in Azzali et al
. [2]
. Listing 2 de-
scribes that subset using a PonyGE2’s avor of BNF, while
Listing 3 encodes the same grammar using the proposed
approach, both implemented in Python.
There is a direct correspondence between the abstract and
concrete types, and the non-terminals and productions of
the grammar. This direct correspondence is an advantage of
this approach, that does not add more cognitive overhead
when using the Object-Oriented encoding.
摘要:

DatatypesasamoreergonomicfrontendforGrammar-GuidedGeneticProgrammingGuilhermeEspadagjespada@ciencias.ulisboa.ptLASIGE,FaculdadedeCiênciasdaUniversidadedeLisboaPortugalLeonIngelselingelse@lasige.di.fc.ul.ptLASIGE,FaculdadedeCiênciasdaUniversidadedeLisboaPortugalPauloCanelaspacsantos@ciencias.ulisboa....

展开>> 收起<<
Data types as a more ergonomic frontend for Grammar-Guided Genetic Programming_2.pdf

共9页,预览2页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:9 页 大小:576.83KB 格式:PDF 时间:2025-04-27

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 9
客服
关注