Shortest Edit Path Crossover: A Theory-driven Solution to the Permutation
Problem in Evolutionary Neural Architecture Search
Xin Qiu 1   Risto Miikkulainen 1 2
Abstract
Population-based search has recently emerged as
a possible alternative to Reinforcement Learn-
ing (RL) for black-box neural architecture search
(NAS). It performs well in practice even though
it is not theoretically well understood. In particu-
lar, whereas traditional population-based search
methods such as evolutionary algorithms (EAs)
draw much power from crossover operations, it
is difficult to take advantage of them in NAS.
The main obstacle is believed to be the permu-
tation problem: The mapping between genotype
and phenotype in traditional graph representations
is many-to-one, leading to a disruptive effect of
standard crossover. This paper presents the first
theoretical analysis of the behaviors of mutation,
crossover and RL in black-box NAS, and proposes
a new crossover operator based on the shortest edit
path (SEP) in graph space. The SEP crossover
is shown theoretically to overcome the permuta-
tion problem, and as a result, have a better ex-
pected improvement compared to mutation, stan-
dard crossover, and RL. Further, it empirically outperforms
these other methods on state-of-the-art
NAS benchmarks. The SEP crossover therefore
allows taking full advantage of population-based
search in NAS, and the underlying theory can
serve as a foundation for deeper understanding of
black-box NAS methods in general.
1. Introduction
Neural architecture search (NAS), a technique for automat-
ically designing architectures for neural networks, outper-
forms human-designed models in many tasks (Zoph et al.,
1 Cognizant AI Labs, San Francisco, USA. 2 The University of Texas at Austin, Austin, USA. Correspondence to: Xin Qiu <qiuxin.nju@gmail.com>, Risto Miikkulainen <risto@cognizant.com>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
2018; Chen et al., 2018; Miikkulainen et al., 2021). One major branch of NAS approaches are the black-box NAS methods, which require only zeroth-order information about the objectives. While reinforcement learning (RL) contributed to the early success of black-box NAS methods (Zoph & Le, 2017), population-based search has recently emerged as a popular and empirically more powerful alternative (Real et al., 2017; 2019; Ying et al., 2019), achieving SOTA performance in various benchmarks and real-world domains (Real et al., 2017; Elsken et al., 2019; Real et al., 2019; So et al., 2021; Gao et al., 2022). Population-based NAS is usually based on evolutionary algorithms (EAs) (Liu et al., 2021), which mimic natural evolution by maintaining a population of solutions and evolving them through mutation and crossover. Mutation provides for local search (i.e., refinement), while crossover implements a directed global search, and thus constitutes the engine behind evolutionary discovery. However, most recent evolutionary NAS methods are limited to mutation only (Real et al., 2017; Fernando et al., 2017; Liu et al., 2018; Elsken et al., 2019; Real et al., 2019; So et al., 2021; Co-Reyes et al., 2021; Gao et al., 2022), which has also been used extensively in simple hill-climbing/local search methods (White et al., 2021a;b).
The main obstacle in applying crossover to NAS is the permutation problem (Radcliffe, 1992; 1993), also known as the competing conventions problem (Montana & Davis, 1989; Schaffer et al., 1992). This problem is due to isomorphisms in graph space, i.e., functionally identical architectures are mapped to different encodings/representations, making crossover operations disruptive. A number of possible solutions to this problem have been proposed in the neuroevolution community (Thierens, 1996; Stanley & Miikkulainen, 2002; Dragoni et al., 2014; Mahmood et al., 2007; Wang et al., 2018; Uriot & Izzo, 2020). However, they either only work on fixed or constrained network topologies, or are limited to one particular algorithm or search space; none of them generalize to arbitrary graphs or architectures such as those that might arise from NAS. Moreover, prior work has focused only on empirical verification without a theoretical analysis of potential solutions. Theoretical understanding of the search efficiency of mutation, crossover, and RL is still lacking in black-box NAS.
arXiv:2210.14016v4 [cs.NE] 29 May 2023
To meet the above challenges, this paper first proposes a new
crossover operator based on shortest edit path (SEP) in the
original graph space. The SEP crossover does not impose
any constraints on other algorithmic components or appli-
cation scope, thereby forming a simple and generalizable
solution to the permutation problem. Second, a theory is
derived for analyzing mutation, standard crossover, RL, and
the proposed SEP crossover in the NAS domain. The SEP
crossover is shown to have the best expected improvement in
terms of graph edit distance (GED) between the found archi-
tecture and the global optimum. Third, empirical results on
SOTA NAS benchmarks further verify the theoretical analy-
sis, demonstrating that the SEP approach is effective. It thus
allows taking full advantage of population-based search, and
serves as a theoretical foundation for further research on
methods for NAS and similar problems. All source code for reproducing the experimental results is provided at https://github.com/cognizant-ai-labs/sepx-paper.
2. Related Work
NAS NAS approaches can generally be categorized into two groups: one-shot methods and black-box methods (Mehta et al., 2022). In one-shot approaches (Liu et al., 2019; Dong & Yang, 2019; Chen et al., 2021), a supernet is trained to represent the entire search space. The overall training cost is reduced significantly; however, these approaches can only be run on small cell-based search spaces with a complete graph (Mehta et al., 2022; Zela et al., 2020), and the search objectives must be differentiable. In contrast, although computationally more expensive, black-box methods have no restrictions on the search space or objectives, making them a more general solution to NAS. Thus, this paper will focus on black-box NAS.
Black-box NAS Black-box NAS methods, also called zeroth-order methods, iteratively generate architectures for evaluation, and then use the outcome to update the search strategy. There are four main types of search strategies in black-box NAS approaches: random search (Li & Talwalkar, 2020; Yu et al., 2020), RL (Zoph & Le, 2017; Zoph et al., 2018), evolutionary search (Real et al., 2017; 2019), and local search (White et al., 2021a;b). Local search, whether used alone or together with neural predictors (e.g., Bayesian models), is based on operations that are essentially the same as mutation (although different terminology may be used) (White et al., 2021a;b). It can therefore be seen as equivalent to mutation-only evolutionary search with a population size of one. The one search strategy that is significantly different from evolutionary methods is RL, and it will thus be included in the theoretical analysis in this paper. The theory developed in this paper thus covers most of the search strategies in black-box NAS.
RL-based black-box NAS RL-based methods work by iteratively sampling architectures using an RL agent, then collecting the prediction accuracies as the reward for updating the policy. Zoph & Le (2017) successfully generated well-performing convolutional networks and recurrent cells using a specially designed recurrent neural network as the agent. Zoph et al. (2018) further showed that the approach finds architectures that transfer well between different datasets. In a recent empirical study (Ying et al., 2019), a simple RL agent based on multinomial probability distributions was found to perform significantly better on NAS-Bench-101 than previous RL-based NAS methods. This RL controller is analyzed in this paper as well.
Evolutionary Black-box NAS Evolutionary NAS methods work by improving a population of architectures over time (Liu et al., 2021). To generate new offspring architectures, two operators can be used: a random edge/node mutation applied to an existing architecture, and crossover to recombine two existing architectures. Architectures that do not perform well are removed periodically from the population, and the best-performing architecture is returned in the end. While crossover is a powerful operator, most existing methods rely on mutation only because of the permutation problem. It is this problem that this work aims to solve, in order to take full advantage of the evolutionary approach in NAS.
The permutation problem and existing solutions The permutation problem has been discussed in the neuroevolution community for many years. One common solution is simply to get rid of crossover completely during evolution (Angeline et al., 1994; Yao & Liu, 1998). Indeed, almost all newly developed evolutionary NAS methods avoid using a crossover operator (Real et al., 2017; Fernando et al., 2017; Liu et al., 2018; Elsken et al., 2019; Real et al., 2019; So et al., 2021; Co-Reyes et al., 2021; Gao et al., 2022). For instance, Real et al. (2017) reported that crossover operators were included in their initial experiments, but no performance improvement was observed, and therefore only mutation was deployed in their final AmoebaNet algorithm.
A number of principled solutions have been proposed to overcome the permutation problem as well. Many of them require that the network topologies are fixed. For instance, Thierens (1996) proposed a non-redundant encoding for matching neurons during crossover, Uriot & Izzo (2020) developed a safe crossover through a neural alignment mechanism, and Gangwani & Peng (2018) used genetic distillation to improve crossover. Further, Dragoni et al. (2014) proposed a generalization where the population can include different topologies, but only parents with a similar topology can be crossed over.
Other solutions have been developed for special cases, making them inapplicable to arbitrary architectures. For instance, the unit-alignment method (Sun et al., 2020) utilizes a special encoding that works only for CNN-based architectures. A graph matching recombination operator (Mahmood et al., 2007) only applies to parents with very different qualities: it mimics the behavior of mutating the weaker parent towards the stronger parent, so the offspring does not differ greatly from its parents. A modular inheritable crossover (He et al., 2021) was developed for a specific cell-based structure, and the default order of the nodes is preserved when performing crossover, without any node matching or reordering; as a result, the permutation problem still remains. The historical markings in NEAT-based algorithms (Stanley & Miikkulainen, 2002; Miikkulainen et al., 2019) are intended to be used together with other mechanisms in NEAT, and cannot be directly applied to arbitrary architectures.

In contrast to these existing solutions, the proposed SEP crossover does not place any constraints on the encoding or other algorithmic components, and can be directly applied to arbitrary architectures.
3. The Shortest Edit Path Crossover
In this section, the permutation problem is first described, and a solution to it is proposed in the form of the Shortest Edit Path Crossover.
Given two neural architectures as parents, a crossover oper-
ator generates an offspring architecture by recombining the
two parents. The crossover design consists of the encoding
(i.e. genotype) and the recombination strategy, with the goal
of properly integrating the information in both parents. The
permutation problem arises when the same architecture (i.e.
phenotype) can have multiple distinct genotypes. As a re-
sult, crossover on these genotypes has a disruptive effect on
the information encoded in the parents, leading to damaged
offspring (Stanley & Miikkulainen,2002).
In order to propose a solution, let us first define a repre-
sentation of the neural network architecture and a distance
metric between two architectures. A neural architecture is
a computation graph that can always be represented by an
attributed directed graph, defined as:
Definition 3.1 (Directed graph). A directed graph $G$ consists of a set of vertices $V = \{v_i \mid i = 1, 2, \ldots, n\}$, where $n$ is the number of vertices and each $v_i$ denotes a vertex (node), and a set of directed edges $E = \{e_{i,j} \mid i, j \in 1, 2, \ldots, n\}$, where $e_{i,j}$ denotes a directed edge from $v_i$ to $v_j$. The order of a directed graph $G$ equals the number of its vertices, represented by $|G|$. For an attributed directed graph, a function $\gamma_v$ assigns an attribute (e.g., an integer) to each vertex, and a function $\gamma_e$ assigns an attribute to each edge.
In the context of NAS, each vertex with an attribute denotes
an operation in a neural architecture, and the directed edges
denote data flows. The similarity between two architectures
can then be measured by the graph edit distance (GED)
between their corresponding graphs, defined as:
Definition 3.2 (Graph edit distance). A graph edit operation is defined as a function $\delta: \mathcal{G} \to \mathcal{G}$ that applies an elementary graph edit to transform a graph $G$ into another graph $G'$. In standard neural architecture search, the set of elementary graph edits typically includes vertex deletion/insertion, edge deletion/insertion, and vertex attribute substitution. An edit path is defined as a sequence of graph edit operations $\delta = \langle \delta_1, \delta_2, \ldots, \delta_d \rangle$, where $d$ is the length of the edit path. Application of $\delta$ to a graph is equivalent to applying each edit sequentially: $\delta(G) = \delta_d \circ \cdots \circ \delta_2 \circ \delta_1(G)$. The graph edit distance between two graphs $G_1$ and $G_2$ is then defined as
$$\mathrm{GED}(G_1, G_2) = \min_{\delta \in \Delta(G_1, G_2)} \sum_{i=1}^{d} c(\delta_i),$$
where $\Delta(G_1, G_2)$ denotes the set of all edit paths that transform $G_1$ to an isomorphism of $G_2$ (including $G_2$ itself), $\delta = \langle \delta_1, \delta_2, \ldots, \delta_d \rangle$, and $c(\delta_i)$ is the cost of edit $\delta_i$. In this work, all types of edit operations are defined to have the same cost of 1. As a result, the edit path that minimizes the total edit cost, $\delta^*_{G_1, G_2}$, equals the shortest edit path between $G_1$ and $G_2$. Thus, $\mathrm{GED}(G_1, G_2) = d^*_{G_1, G_2}$, where $d^*_{G_1, G_2}$ is the length of this shortest edit path. Note that $\delta^*_{G_1, G_2}$ may not be unique, and thus there may exist multiple shortest edit paths that have the same length.
The proposed SEP crossover is then defined as follows:

Definition 3.3 (Shortest edit path (SEP) crossover). Given two attributed directed graphs $G_1$ and $G_2$, suppose $\delta^*_{G_1, G_2} = \langle \delta^*_1, \delta^*_2, \ldots, \delta^*_{d^*_{G_1, G_2}} \rangle$. The SEP crossover generates an offspring graph $G_{\text{new}}$ by
$$G_{\text{new}} = \delta^*_{\pi_r(\lceil d^*_{G_1, G_2}/2 \rceil)} \circ \delta^*_{\pi_r(\lceil d^*_{G_1, G_2}/2 \rceil - 1)} \circ \delta^*_{\pi_r(\lceil d^*_{G_1, G_2}/2 \rceil - 2)} \circ \cdots \circ \delta^*_{\pi_r(2)} \circ \delta^*_{\pi_r(1)}(G_1),$$
where $\pi_r$ is a random permutation of the $d^*_{G_1, G_2}$ indices, $\pi_r: 1, 2, \ldots, d^*_{G_1, G_2} \to \pi(1), \pi(2), \ldots, \pi(d^*_{G_1, G_2})$, and $\lceil \cdot \rceil$ denotes the ceiling function. In other words, the SEP crossover shuffles the edits in the SEP between the parents randomly, selects half of them randomly, and applies them to one of the parents to obtain the offspring.
This operator is motivated by a common observation in the
literature (e.g., Ying et al., 2019; White et al., 2021a; Mehta et al., 2022) that the differences in predictive performance
between two architectures are positively correlated with
their GEDs. This observation suggests that the edits in the
SEP encode fundamental differences between two architec-
tures that matter to predictive performance. An offspring
that lies in the middle of this SEP can explore the search
regions where the parents have fundamental discrepancies.
At the same time, the offspring can automatically preserve
those common substructures between parents, avoiding un-
necessary disruptive behaviors, and thus avoiding the per-
mutation problem. A visual demo showing how the SEP
crossover resolves the permutation problem is provided in
Appendix A.3.
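As a concrete illustration, the operator in Definition 3.3 can be sketched in a few lines of Python. This is our own toy sketch, not the authors' implementation: it assumes the shortest edit path has already been computed, and represents each edit as a function on a graph (here, a graph is just a set of directed edges).

```python
import math
import random

def sep_crossover(parent, shortest_edit_path, rng=random.Random(0)):
    """Illustrative SEP crossover (Definition 3.3): shuffle the edits on
    the shortest edit path between the parents, then apply a random half
    of them (ceil(d*/2)) to one parent to produce the offspring."""
    edits = list(shortest_edit_path)
    rng.shuffle(edits)                    # random permutation pi_r
    k = math.ceil(len(edits) / 2)         # keep ceil(d*/2) edits
    offspring = parent
    for edit in edits[:k]:
        offspring = edit(offspring)
    return offspring

# Toy demo: a "graph" is a set of edges; each edit adds or removes one edge.
g1 = {("a", "b"), ("b", "c")}
path = [
    lambda g: g | {("c", "d")},           # insert edge c->d
    lambda g: g - {("b", "c")},           # delete edge b->c
    lambda g: g | {("a", "c")},           # insert edge a->c
]
child = sep_crossover(g1, path)
print(len(path), "edits on the SEP; offspring:", sorted(child))
```

Since each of the three toy edits changes exactly one edge, any offspring lies exactly two edits away from `g1`, i.e., halfway along the SEP, as the definition intends.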
4. Theoretical Analysis
In this section, the SEP crossover, standard crossover, muta-
tion, and RL approaches to NAS will be analyzed theoreti-
cally, showing that the SEP crossover has an advantage in
improving the expected quality of generated graphs. The
fundamental concepts are defined first in Section 4.1, lead-
ing to new interpretations of graph edit distance, crossover
and mutation based on attributed adjacency matrices. Feasi-
bility assumptions are then declared, and theorems derived
for expected improvement for SEP, standard crossover, and
mutation. Section 4.2 focuses on RL: It interprets RL in
terms of the same fundamental concepts, defines two ex-
treme cases whose combinations span the possible states
of the RL process, and derives theorems for expected im-
provement for both. Section 4.3 then brings these theorems
together, showing that the SEP crossover results in more
improvement than the other methods in common NAS se-
tups. Section 4.4 further verifies the robustness of the SEP
crossover under inaccurate GED calculations. All proofs
and lemmas are included in Appendix A.1. For clarity, a full list of mathematical symbols is provided in Appendix A.2.
4.1. Expected Improvement with Crossover and
Mutation
First, let us define the basic concepts:
Definition 4.1 (Attributed adjacency matrix). An attributed adjacency matrix (AA-matrix) $A^{G}$ is a representation of an attributed directed graph. It is an $n \times n$ matrix, where $n$ is the number of vertices in $G$. The entry in the $i$th row and $j$th column is represented by $A^{G}_{i,j}$. $A^{G}_{i,j} = 0$ if there is no edge from $v_i$ to $v_j$, and $A^{G}_{i,j} = \gamma_e(e_{i,j})$ if there exists an edge from $v_i$ to $v_j$, for $i, j \in 1, 2, \ldots, n$ and $i \neq j$. $A^{G}_{i,i} = \gamma_v(v_i)$, for $i \in 1, 2, \ldots, n$.
Definition 4.2 (Permutation matrix). Given a permutation $\pi$ of $n$ elements, $\pi: 1, 2, \ldots, n \to \pi(1), \pi(2), \ldots, \pi(n)$, a permutation matrix $P^{\pi}$ can be constructed by permuting the columns or rows of an $n \times n$ identity matrix $I_n$ according to $\pi$. In this work, a column permutation of $I_n$ is performed to obtain $P^{\pi}$. The entry in the $i$th row and $j$th column is represented by $P^{\pi}_{i,j}$, with $P^{\pi}_{i,j} = 1$ if $j = \pi(i)$, and $P^{\pi}_{i,j} = 0$ otherwise.
Definition 4.3 (Null vertex). A null vertex has no connections to other existing vertices in a graph. It is assigned a special "null" attribute, which means that it does not have any impact on the original graph. Null vertices are added to an existing graph only for convenience of theoretical analysis, and they do not affect the calculation of GEDs.
Based on the above definitions, GED, crossover and muta-
tion can be interpreted from the AA-matrix perspective:
Definition 4.4 (AA-matrix-based interpretation of GED). Two graphs $G_1$ and $G_2$ can both be extended to have the same order $n = \max(|G_1|, |G_2|)$ by adding null vertices. The extended $G_1$ and $G_2$ are denoted as $\hat{G}_1$ and $\hat{G}_2$. Calculating the GED between $G_1$ and $G_2$ can then be defined as
$$\mathrm{GED}(G_1, G_2) = \min_{\pi \in S_n} d(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top}),$$
where $d(A, B) = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbb{1}_{A_{i,j} \neq B_{i,j}}$, $m \times n$ is the order of both $A$ and $B$, $\mathbb{1}_{\text{condition}}$ is 1 if the condition is true and 0 otherwise (i.e., $d(A, B)$ counts the number of different entries between two matrices with the same shape), and $S_n$ denotes the set of all permutations of $\{1, 2, 3, \ldots, n\}$. The permutation that minimizes $d(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top})$ is denoted as $\pi^*_{\hat{G}_1, \hat{G}_2}$, and the permuted AA-matrix of $\hat{G}_2$ is denoted as $A^{\hat{G}_2 \to \hat{G}_1} = P^{\pi^*_{\hat{G}_1, \hat{G}_2}} A^{\hat{G}_2} (P^{\pi^*_{\hat{G}_1, \hat{G}_2}})^{\top}$.
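Definition 4.4 can be checked directly on small graphs. The following brute-force sketch (illustrative only; enumerating all $n!$ permutations is exponential, which is why practical GED computation must approximate) evaluates $\min_{\pi} d(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top})$ using plain nested lists as matrices:

```python
from itertools import permutations

def d(A, B):
    """Count differing entries between two same-shape square matrices."""
    n = len(A)
    return sum(A[i][j] != B[i][j] for i in range(n) for j in range(n))

def permute(A, pi):
    """Return P_pi A P_pi^T: entry (i, j) becomes A[pi(i)][pi(j)]."""
    n = len(A)
    return [[A[pi[i]][pi[j]] for j in range(n)] for i in range(n)]

def ged(A1, A2):
    """Brute-force GED of Definition 4.4 for equal-order AA-matrices."""
    n = len(A1)
    return min(d(A1, permute(A2, pi)) for pi in permutations(range(n)))

# Two 3-vertex graphs; diagonal entries are vertex attributes,
# off-diagonal entries are edges (1) or absences (0).
A1 = [[2, 1, 0],
      [0, 3, 1],
      [0, 0, 2]]
A2 = [[3, 0, 1],   # the same graph with vertices 1 and 2 swapped
      [1, 2, 0],
      [0, 0, 2]]
print(ged(A1, A2))  # -> 0: isomorphic graphs have GED 0
```

Because the two matrices describe isomorphic graphs, the minimization finds a permutation under which all entries match, confirming that GED is invariant to vertex relabeling.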
Remark 4.5. In the context of standard neural architecture search, assume $\gamma_e(\cdot)$ always assigns 1 to any existing edge, and $\gamma_v(\cdot)$ assigns 0 to the "null" attribute and positive integers to other types of vertex attributes (each type of attribute has its own unique integer). Then the differences between $A^{\hat{G}_1}$ and $A^{\hat{G}_2 \to \hat{G}_1}$ correspond to the shortest edit path that transforms $G_1$ to $G_2$ in the following way: (1) add a vertex with attribute $A^{\hat{G}_2 \to \hat{G}_1}_{i,i}$, if $A^{\hat{G}_1}_{i,i} = 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,i} > 0$; (2) delete vertex $v_i$ from $G_1$, if $A^{\hat{G}_1}_{i,i} > 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,i} = 0$; (3) change the attribute of vertex $v_i$ to $A^{\hat{G}_2 \to \hat{G}_1}_{i,i}$, if $A^{\hat{G}_1}_{i,i} > 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,i} > 0$; (4) add an edge from $v_i$ to $v_j$, if $A^{\hat{G}_1}_{i,j} = 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,j} = 1$, $i \neq j$; (5) delete the edge from $v_i$ to $v_j$, if $A^{\hat{G}_1}_{i,j} = 1$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,j} = 0$, $i \neq j$. Note that when adding an edge, the origin $v_i$ and/or destination $v_j$ may be newly added vertices.
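The case analysis in Remark 4.5 translates directly into code. The following hypothetical helper (our illustration, assuming the two AA-matrices are already aligned by the optimal permutation and $\gamma_e \equiv 1$) lists the edit operations implied by the matrix differences:

```python
def edits_from_matrices(A1, B):
    """List the shortest-edit-path operations implied by Remark 4.5,
    given A1 = AA-matrix of G1 and B = aligned AA-matrix of G2.
    Diagonal entries are vertex attributes (0 = null vertex);
    off-diagonal entries are edges (1) or absences (0)."""
    n = len(A1)
    edits = []
    for i in range(n):                      # cases (1)-(3): vertices
        a, b = A1[i][i], B[i][i]
        if a == 0 and b > 0:
            edits.append(("add_vertex", i, b))
        elif a > 0 and b == 0:
            edits.append(("delete_vertex", i))
        elif a > 0 and b > 0 and a != b:
            edits.append(("substitute_attr", i, b))
    for i in range(n):                      # cases (4)-(5): edges
        for j in range(n):
            if i != j:
                if A1[i][j] == 0 and B[i][j] == 1:
                    edits.append(("add_edge", i, j))
                elif A1[i][j] == 1 and B[i][j] == 0:
                    edits.append(("delete_edge", i, j))
    return edits

A1 = [[2, 1, 0],
      [0, 3, 0],
      [0, 0, 0]]     # vertex 3 is a null vertex
B  = [[2, 1, 0],
      [0, 4, 1],
      [0, 0, 2]]     # vertex 2's attribute changed; vertex 3 added with an edge
print(edits_from_matrices(A1, B))
# -> [('substitute_attr', 1, 4), ('add_vertex', 2, 2), ('add_edge', 1, 2)]
```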
Definition 4.6 (AA-matrix-based interpretation of crossover). Assume two graphs $G_1$ and $G_2$ are extended to have the same order by adding null vertices, resulting in $\hat{G}_1$ and $\hat{G}_2$. A crossover between $G_1$ and $G_2$ is defined as the process of generating an offspring graph $G_{\text{new}}$ by recombining $A^{\hat{G}_1}$ and $A^{\hat{G}_2}$: $A^{\hat{G}_{\text{new}}} = r(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top})$, where the function $r(A, B)$ returns a matrix that inherits each entry from $A$ or $B$ with probability 0.5. That is, if $C = r(A, B)$, then $p(C_{i,j} = A_{i,j}) = p(C_{i,j} = B_{i,j}) = 0.5$ for any valid $i, j$. $P^{\pi}$ is a permutation matrix based on permutation $\pi$, which is decided by the specific crossover operator utilized. For the SEP crossover, $\pi = \pi^*_{\hat{G}_1, \hat{G}_2}$, which minimizes the GED between $G_1$ and $G_2$. For the standard crossover, since the vertices may be in any order in the original AA-matrix representation and there is no particular vertex/edge matching mechanism during crossover, a purely random permutation $\pi_{\text{rand}}$ is used to represent this randomness. The result, $A^{\hat{G}_{\text{new}}}$, is the AA-matrix of the generated new graph with null vertices. By removing all null vertices from $\hat{G}_{\text{new}}$, the offspring graph $G_{\text{new}}$ is obtained.
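The recombination function $r(\cdot, \cdot)$ of Definition 4.6 is simply entry-wise uniform inheritance. A minimal sketch (our illustration, with the permutation assumed to have been applied to the second matrix already):

```python
import random

def recombine(A, B, rng=random.Random(1)):
    """r(A, B) from Definition 4.6: each entry of the offspring matrix
    is inherited from A or B with probability 0.5."""
    n = len(A)
    return [[A[i][j] if rng.random() < 0.5 else B[i][j]
             for j in range(n)] for i in range(n)]

A = [[1, 1], [0, 2]]
B = [[3, 0], [1, 2]]
C = recombine(A, B)
# Every entry of C comes from the corresponding entry of A or B.
assert all(C[i][j] in (A[i][j], B[i][j]) for i in range(2) for j in range(2))
```

Whether this inheritance is constructive or disruptive depends entirely on the alignment $\pi$ applied beforehand, which is exactly the distinction between the SEP and standard crossovers.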
Definition 4.7 (AA-matrix-based interpretation of mutation). Given a graph $G_1$, a mutation operation is defined as the process of generating an offspring graph $G_{\text{new}}$ by mutating $G_1$. In standard NAS, allowed mutations to $G_1$ include vertex deletion/insertion, edge deletion/insertion, and vertex attribute substitution. In the AA-matrix representation, a mutation operation is then defined as $A^{\hat{G}_{\text{new}}} = m(A^{\hat{G}_1})$, where the function $m(A)$ alters each element of $A$ with an equal probability $p_m$, and $p_m$ is usually selected so that on average one element is altered during each mutation operation. Here $\hat{G}_1$ is the extended graph of $G_1$ with null vertices, so that node additions can be performed in $A^{\hat{G}_1}$ (by changing a null vertex to a vertex with a valid attribute). An element $A^{\hat{G}_1}_{i,j}$ is altered by randomly resampling an allowed value that is different from the original $A^{\hat{G}_1}_{i,j}$. The result, $A^{\hat{G}_{\text{new}}}$, is the AA-matrix of the generated new graph with null vertices. By removing all null vertices from $\hat{G}_{\text{new}}$, the mutated offspring graph $G_{\text{new}}$ is obtained.
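The mutation operator $m(\cdot)$ of Definition 4.7 can be sketched similarly, assuming binary edge entries and a small hypothetical attribute set; choosing $p_m = 1/n^2$ alters one entry per operation on average:

```python
import random

def mutate(A, vertex_attrs=(0, 1, 2, 3), rng=random.Random(0)):
    """m(A) from Definition 4.7: alter each entry independently with
    probability p_m = 1/n^2, resampling a different allowed value."""
    n = len(A)
    p_m = 1.0 / (n * n)          # on average one entry altered per call
    M = [row[:] for row in A]
    for i in range(n):
        for j in range(n):
            if rng.random() < p_m:
                if i == j:       # vertex attribute (0 = null vertex)
                    choices = [a for a in vertex_attrs if a != A[i][i]]
                else:            # edge entry: flip 0 <-> 1
                    choices = [1 - A[i][j]]
                M[i][j] = rng.choice(choices)
    return M

A = [[2, 1, 0], [0, 3, 1], [0, 0, 0]]
M = mutate(A)
changed = sum(M[i][j] != A[i][j] for i in range(3) for j in range(3))
print("entries altered:", changed)
```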
Next, in order to define a performance metric for compar-
ing different crossover and mutation operators, a realistic
assumption needs to be made about the search space:
Locality in NAS search spaces means that close architec-
tures (in terms of GED) tend to have similar performance.
Random-walk autocorrelation (RWA; Weinberger,1990) is
a commonly used metric to measure such locality. Strong
autocorrelation of prediction accuracies of architectures dur-
ing a random walk, in which each move is a graph edit
operation, has been consistently observed in many existing
NAS benchmarks or studies (Ying et al.,2019;White et al.,
2021a;Mehta et al.,2022). This observation leads to the
following assumption:
Assumption 4.8 (Positive correlation between GED and fitness/reward difference). If $\mathrm{GED}(G_i, G_j) < \mathrm{GED}(G_i, G_k)$, then $E(|f(G_i) - f(G_j)|) < E(|f(G_i) - f(G_k)|)$, where $f(G)$ returns the fitness/reward of $G$, i.e., the prediction accuracy.
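The random-walk autocorrelation evidence behind Assumption 4.8 can be illustrated on a toy landscape. This is a stand-in experiment of our own (a bit-string encoding with a smooth surrogate fitness, not an actual NAS benchmark):

```python
import random

def rwa(fitnesses, lag=1):
    """Random-walk autocorrelation (Weinberger, 1990) at a given lag."""
    n = len(fitnesses)
    mean = sum(fitnesses) / n
    var = sum((f - mean) ** 2 for f in fitnesses) / n
    cov = sum((fitnesses[i] - mean) * (fitnesses[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

# Random walk in which each step is one "edit" (a single bit flip).
rng = random.Random(0)
x = [0] * 20
fitness = lambda bits: sum(bits) / len(bits)   # smooth stand-in fitness
trace = []
for _ in range(2000):
    i = rng.randrange(len(x))
    x[i] ^= 1
    trace.append(fitness(x))
print(round(rwa(trace), 3))  # high positive value indicates strong locality
```

On a landscape with locality, single-edit neighbors have similar fitness, so the lag-1 autocorrelation along the walk is close to 1; a rugged landscape would drive it toward 0.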
Suppose $G_{\text{opt}}$ is the global optimal graph (i.e., the target of the evolutionary search), $G_1$ and $G_2$ are the two parents to undergo crossover or mutation, and $G_{\text{new}}$ is the generated offspring. For convenience of theoretical analysis, $G_{\text{opt}}$, $G_1$, and $G_2$ are extended to have the same order $n = \max(|G_{\text{opt}}|, |G_1|, |G_2|)$ by adding null vertices. The extended $G_{\text{opt}}$ is denoted as $\hat{G}_{\text{opt}}$, and $\hat{G}_1$, $\hat{G}_2$, and $\hat{G}_{\text{new}}$ have the same meaning as in Definitions 4.6 and 4.7.
Given Assumption 4.8, a direct measurement of the progress of the entire search is $\mathrm{GED}(G_{\text{opt}}, G_{\text{new}})$, and the ultimate goal is to minimize it so that a good solution can be generated. $\mathrm{GED}(G_{\text{opt}}, G_{\text{new}}) = d^*_{G_{\text{opt}}, G_{\text{new}}} = d(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$ can be decomposed into $d_v(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}}) + d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$, where $d_v(A, B) = \sum_i \mathbb{1}_{A_{i,i} \neq B_{i,i}}$ counts only the number of different diagonal entries, i.e., the differences in vertex attributes, and $d_e(A, B) = \sum_i \sum_{j \neq i} \mathbb{1}_{A_{i,j} \neq B_{i,j}}$ counts the number of different non-diagonal entries, i.e., the differences in edges/connections, thereby measuring the topological similarity.
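The decomposition above is straightforward to state in code; a minimal sketch splitting the entry differences of two aligned AA-matrices into the vertex-attribute part $d_v$ and the edge part $d_e$:

```python
def d_v(A, B):
    """Vertex-attribute differences: differing diagonal entries."""
    return sum(A[i][i] != B[i][i] for i in range(len(A)))

def d_e(A, B):
    """Edge differences: differing off-diagonal entries."""
    n = len(A)
    return sum(A[i][j] != B[i][j]
               for i in range(n) for j in range(n) if i != j)

A = [[2, 1, 0], [0, 3, 1], [0, 0, 2]]
B = [[2, 0, 0], [0, 4, 1], [1, 0, 2]]
assert d_v(A, B) == 1               # vertex 2's attribute differs
assert d_e(A, B) == 2               # two edge entries differ
assert d_v(A, B) + d_e(A, B) == 3   # together they give d(A, B)
```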
In order to derive a performance metric, let us consider two factors. First, $d_v(\cdot)$ only covers $n$ elements, whereas $d_e(\cdot)$ covers $n \cdot (n-1)$ elements. We have $n \cdot (n-1) \gg n$ as $n$ increases, so $d_e(\cdot)$ is the dominant factor in deciding $\mathrm{GED}(G_{\text{opt}}, G_{\text{new}})$. Second, the modeling of vertex attributes varies a lot across different NAS spaces, e.g., they have different numbers of usable attributes and different constraints on vertex attribute assignments. In contrast, $\gamma_e(\cdot) = 1$ can simply be used for all valid edges in most NAS spaces, leading to generality of any theoretical conclusions. These two factors suggest that $d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$ is a representative quantitative metric for comparing different crossover and mutation operators theoretically. For simplicity, we will use $d_{e, G_1, G_2}$ to denote $d_e(A^{G_1}, A^{G_2 \to G_1})$.
Accordingly, the main performance metric for crossover and mutation can now be defined as follows:

Definition 4.9 (Expected improvement of crossover and mutation). This work focuses on the expected improvement in terms of topological similarity to the global optimal graph. More specifically, expected improvement refers to $E(\max(d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_1 \to \hat{G}_{\text{opt}}}) - d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}}), 0))$, which compares the offspring graph $G_{\text{new}}$ with one parent graph $G_1$ in terms of the expected edge/connection differences to $G_{\text{opt}}$. The $\max(\cdot, 0)$ part takes into account the selection pressure in standard EAs; that is, only an offspring that is better than its parent can survive and become the next parent.
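The expected improvement of Definition 4.9 can also be estimated empirically. The following Monte-Carlo sketch is our own illustration, not part of the paper's proofs: identity alignment stands in for the optimal permutation used by a SEP-style operator, and a random vertex relabeling stands in for the arbitrary ordering seen by standard crossover; both are compared on random binary edge matrices.

```python
import random
from itertools import product

def d_e(A, B, n):
    """Off-diagonal (edge) differences between two n x n matrices."""
    return sum(A[i][j] != B[i][j]
               for i, j in product(range(n), repeat=2) if i != j)

def improvement(aligned, n=6, trials=2000, rng=random.Random(0)):
    """Monte-Carlo estimate of E(max(d_e(opt, parent) - d_e(opt, child), 0))."""
    def near(G, k):
        # a parent a few random edge flips away from the optimum
        H = [row[:] for row in G]
        for _ in range(k):
            i, j = rng.randrange(n), rng.randrange(n)
            if i != j:
                H[i][j] ^= 1
        return H
    total = 0.0
    for _ in range(trials):
        opt = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        p1, p2 = near(opt, 4), near(opt, 4)
        if not aligned:
            # standard crossover sees an arbitrary vertex relabeling of p2
            pi = list(range(n))
            rng.shuffle(pi)
            p2 = [[p2[pi[i]][pi[j]] for j in range(n)] for i in range(n)]
        child = [[p1[i][j] if rng.random() < 0.5 else p2[i][j]
                  for j in range(n)] for i in range(n)]
        total += max(d_e(opt, p1, n) - d_e(opt, child, n), 0)
    return total / trials

print("aligned (SEP-style):", improvement(True))
print("unaligned (standard):", improvement(False))
```

In this toy setting the aligned operator should show a clearly larger improvement: relabeling one parent scatters its entries relative to the optimum, so the unaligned offspring inherits mostly mismatched edges, which is the disruptive effect the permutation problem describes.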
As the penultimate step, three lemmas are derived in Appendix A.1 to assist the proofs regarding expected improvement. According to Lemmas A.1 and A.2, any $\pi$ can be chosen to analyze the behaviors of SEP crossover, standard crossover, and mutation, without affecting the result of $d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$. Lemma A.3 further derives the lower bound for the common parts in $G_{\text{opt}}$, $G_1$, and $G_2$. Now, choose $\pi = \pi_1 = \pi^*_{\hat{G}_{\text{opt}}, \hat{G}_1}$ and $\pi_2 = \pi^*_{\hat{G}_1, \hat{G}_2}$ so that there are at least $n_s = \max(n^2 - d^*_{\hat{G}_{\text{opt}}, \hat{G}_1} - d^*_{\hat{G}_1, \hat{G}_2}, 0)$ common entries among $A^{\hat{G}_{\text{opt}}}$, $A^{\hat{G}_1 \to \hat{G}_{\text{opt}}}$, and $A^{\hat{G}_2 \to \hat{G}_1}$, where $A^{\hat{G}_1 \to \hat{G}_{\text{opt}}} = A^{\hat{G}_1}$. Regarding the remaining entries, the following assumption is made:
Assumption 4.10 (Uniform distribution of differences). The entries that are different between $A^{\hat{G}_1}$ and $A^{\hat{G}_2 \to \hat{G}_1}$ are assumed to be uniformly distributed on the positions other than those $n_s$ common entries.
With these lemmas and assumption, the expected improve-
ment of SEP crossover, standard crossover and mutation can
be derived:
Theorem 4.11 (Expected improvement of SEP crossover).