Shortest Edit Path Crossover: A Theory-driven Solution to the Permutation
Problem in Evolutionary Neural Architecture Search
Xin Qiu 1   Risto Miikkulainen 1 2
Abstract
Population-based search has recently emerged as
a possible alternative to Reinforcement Learn-
ing (RL) for black-box neural architecture search
(NAS). It performs well in practice even though
it is not theoretically well understood. In particu-
lar, whereas traditional population-based search
methods such as evolutionary algorithms (EAs)
draw much power from crossover operations, it
is difficult to take advantage of them in NAS.
The main obstacle is believed to be the permu-
tation problem: The mapping between genotype
and phenotype in traditional graph representations
is many-to-one, leading to a disruptive effect of
standard crossover. This paper presents the first
theoretical analysis of the behaviors of mutation,
crossover and RL in black-box NAS, and proposes
a new crossover operator based on the shortest edit
path (SEP) in graph space. The SEP crossover
is shown theoretically to overcome the permuta-
tion problem, and as a result, have a better ex-
pected improvement compared to mutation, stan-
dard crossover, and RL. Further, it empirically outperforms
these other methods on state-of-the-art
NAS benchmarks. The SEP crossover therefore
allows taking full advantage of population-based
search in NAS, and the underlying theory can
serve as a foundation for deeper understanding of
black-box NAS methods in general.
1. Introduction
Neural architecture search (NAS), a technique for automat-
ically designing architectures for neural networks, outper-
forms human-designed models in many tasks (Zoph et al.,
1 Cognizant AI Labs, San Francisco, USA. 2 The University of Texas at Austin, Austin, USA. Correspondence to: Xin Qiu <qiuxin.nju@gmail.com>, Risto Miikkulainen <risto@cognizant.com>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
2018; Chen et al., 2018; Miikkulainen et al., 2021). One major branch of NAS approaches are the black-box NAS methods, which require only zeroth-order information about the objectives. While reinforcement learning (RL) contributed to the early success of black-box NAS methods (Zoph & Le, 2017), population-based search has recently emerged as a popular and empirically more powerful alternative (Real et al., 2017; 2019; Ying et al., 2019), achieving SOTA performance in various benchmarks and real-world domains (Real et al., 2017; Elsken et al., 2019; Real et al., 2019; So et al., 2021; Gao et al., 2022). Population-based NAS is usually based on evolutionary algorithms (EAs) (Liu et al., 2021), which mimic natural evolution by maintaining a population of solutions and evolving them through mutation and crossover. Mutation provides for local search (i.e., refinement), while crossover implements a directed global search, and thus constitutes the engine behind evolutionary discovery. However, most recent evolutionary NAS methods are limited to mutation only (Real et al., 2017; Fernando et al., 2017; Liu et al., 2018; Elsken et al., 2019; Real et al., 2019; So et al., 2021; Co-Reyes et al., 2021; Gao et al., 2022), which has also been used extensively in simple hill-climbing/local search methods (White et al., 2021a;b).
The main obstacle in applying crossover to NAS is the permutation problem (Radcliffe, 1992; 1993), also known as the competing conventions problem (Montana & Davis, 1989; Schaffer et al., 1992). This problem is due to isomorphisms in graph space, i.e., functionally identical architectures are mapped to different encodings/representations, making crossover operations disruptive. A number of possible solutions to this problem have been proposed in the neuroevolution community (Thierens, 1996; Stanley & Miikkulainen, 2002; Dragoni et al., 2014; Mahmood et al., 2007; Wang et al., 2018; Uriot & Izzo, 2020). However, they either only work on fixed or constrained network topologies, or are limited to one particular algorithm or search space; none of them generalize to arbitrary graphs or architectures such as those that might arise from NAS. Moreover, prior work has focused only on empirical verification without a theoretical analysis of potential solutions. Theoretical understanding of the search efficiency of mutation, crossover, and RL is still lacking in black-box NAS.
arXiv:2210.14016v4 [cs.NE] 29 May 2023
To meet the above challenges, this paper first proposes a new
crossover operator based on shortest edit path (SEP) in the
original graph space. The SEP crossover does not impose
any constraints on other algorithmic components or appli-
cation scope, thereby forming a simple and generalizable
solution to the permutation problem. Second, a theory is
derived for analyzing mutation, standard crossover, RL, and
the proposed SEP crossover in the NAS domain. The SEP
crossover is shown to have the best expected improvement in
terms of graph edit distance (GED) between the found archi-
tecture and the global optimum. Third, empirical results on
SOTA NAS benchmarks further verify the theoretical analy-
sis, demonstrating that the SEP approach is effective. It thus
allows taking full advantage of population-based search, and
serves as a theoretical foundation for further research on
methods for NAS and similar problems. All source code for reproducing the experimental results is provided at https://github.com/cognizant-ai-labs/sepx-paper.
2. Related Work
NAS NAS approaches can generally be categorized into two groups: one-shot methods and black-box methods (Mehta et al., 2022). In one-shot approaches (Liu et al., 2019; Dong & Yang, 2019; Chen et al., 2021), a supernet is trained to represent the entire search space. The overall training cost is reduced significantly; however, these approaches can only be run on small cell-based search spaces with a complete graph (Mehta et al., 2022; Zela et al., 2020), and the search objectives must be differentiable. In contrast, although computationally more expensive, black-box methods have no restrictions on the search space or objectives, making them a more general solution to NAS. Thus, this paper will focus on black-box NAS.
Black-box NAS Black-box NAS methods, also called zeroth-order methods, iteratively generate architectures for evaluation, and then use the outcome to update the search strategy. There are four main types of search strategies in black-box NAS approaches: random search (Li & Talwalkar, 2020; Yu et al., 2020), RL (Zoph & Le, 2017; Zoph et al., 2018), evolutionary search (Real et al., 2017; 2019), and local search (White et al., 2021a;b). Local search, whether used alone or together with neural predictors (e.g., Bayesian models), is based on operations that are essentially the same as mutation (although different terminology may be used) (White et al., 2021a;b). It can therefore be seen as equivalent to mutation-only evolutionary search with a population size of one. The one search strategy that is significantly different from evolutionary methods is RL, and it will thus be included in the theoretical analysis in this paper. The theory developed in this paper thus covers most of the search strategies in black-box NAS.
RL-based black-box NAS RL-based methods work by iteratively sampling architectures using an RL agent, then collecting the prediction accuracies as the reward for updating the policy. Zoph & Le (2017) successfully generated well-performing convolutional networks and recurrent cells using a specially designed recurrent neural network as the agent. Zoph et al. (2018) further showed that the approach finds architectures that transfer well between different datasets. In a recent empirical study (Ying et al., 2019), a simple RL agent based on multinomial probability distributions was found to perform significantly better on NAS-Bench-101 than previous RL-based NAS methods. This RL controller is analyzed in this paper as well.
Evolutionary Black-box NAS Evolutionary NAS methods work by improving a population of architectures over time (Liu et al., 2021). To generate new offspring architectures, two operators can be used: a random edge/node mutation applied to an existing architecture, and crossover to recombine two existing architectures. Architectures that do not perform well are removed periodically from the population, and the best-performing architecture is returned in the end. While crossover is a powerful operator, most existing methods rely on mutation only because of the permutation problem. It is this problem that this work aims to solve, in order to take full advantage of the evolutionary approach in NAS.
The permutation problem and existing solutions The permutation problem has been discussed in the neuroevolution community for many years. One common solution is simply to get rid of crossover completely during evolution (Angeline et al., 1994; Yao & Liu, 1998). Indeed, almost all newly developed evolutionary NAS methods avoid using a crossover operator (Real et al., 2017; Fernando et al., 2017; Liu et al., 2018; Elsken et al., 2019; Real et al., 2019; So et al., 2021; Co-Reyes et al., 2021; Gao et al., 2022). For instance, Real et al. (2017) reported that crossover operators were included in their initial experiments, but no performance improvement was observed, and therefore only mutation was deployed in their final AmoebaNet algorithm.
A number of principled solutions have been proposed to overcome the permutation problem as well. Many of them require that the network topologies are fixed. For instance, Thierens (1996) proposed a non-redundant encoding for matching neurons during crossover, Uriot & Izzo (2020) developed a safe crossover through a neural alignment mechanism, and Gangwani & Peng (2018) used genetic distillation to improve crossover. Further, Dragoni et al. (2014) proposed a generalization where the population can include different topologies, but only parents with a similar topology can be crossed over.
Other solutions have been developed for special cases, making them inapplicable to arbitrary architectures. For instance, the unit-alignment method (Sun et al., 2020) utilizes a special encoding that works only for CNN-based architectures. A graph matching recombination operator (Mahmood et al., 2007) only applies to parents with very different qualities: it mimics the behavior of mutating the weaker parent towards the stronger parent, so the offspring does not differ greatly from its parents. A modular inheritable crossover (He et al., 2021) was developed for a specific cell-based structure, and the default order of the nodes is preserved when performing crossover, without any node matching or reordering; as a result, the permutation problem still remains. The historical markings in NEAT-based algorithms (Stanley & Miikkulainen, 2002; Miikkulainen et al., 2019) are intended to be used together with other mechanisms in NEAT, and cannot be directly applied to arbitrary architectures.

In contrast to these existing solutions, the proposed SEP crossover does not place any constraints on the encoding or other algorithmic components, and can be directly applied to arbitrary architectures.
3. The Shortest Edit Path Crossover
In this section, the permutation problem is first described, and a solution to it is proposed in the form of the Shortest Edit Path Crossover.
Given two neural architectures as parents, a crossover oper-
ator generates an offspring architecture by recombining the
two parents. The crossover design consists of the encoding
(i.e. genotype) and the recombination strategy, with the goal
of properly integrating the information in both parents. The
permutation problem arises when the same architecture (i.e.
phenotype) can have multiple distinct genotypes. As a re-
sult, crossover on these genotypes has a disruptive effect on
the information encoded in the parents, leading to damaged
offspring (Stanley & Miikkulainen,2002).
In order to propose a solution, let us first define a repre-
sentation of the neural network architecture and a distance
metric between two architectures. A neural architecture is
a computation graph that can always be represented by an
attributed directed graph, defined as:
Definition 3.1 (Directed graph). A directed graph $G$ consists of a set of vertices $V = \{v_i \mid i = 1, 2, \ldots, n\}$, where $n$ is the number of vertices and each $v_i$ denotes a vertex (node), and a set of directed edges $E = \{e_{i,j} \mid i, j \in 1, 2, \ldots, n\}$, where $e_{i,j}$ denotes a directed edge from $v_i$ to $v_j$. The order of a directed graph $G$ equals the number of its vertices, represented by $|G|$. For an attributed directed graph, a function $\gamma_v$ assigns an attribute (e.g., an integer) to each vertex, and a function $\gamma_e$ assigns an attribute to each edge.
In the context of NAS, each vertex with an attribute denotes
an operation in a neural architecture, and the directed edges
denote data flows. The similarity between two architectures
can then be measured by the graph edit distance (GED)
between their corresponding graphs, defined as:
Definition 3.2 (Graph edit distance). A graph edit operation is defined as a function $\delta: \mathcal{G} \to \mathcal{G}$ that applies an elementary graph edit to transform a graph $G$ into another graph $G'$. In standard neural architecture search, the set of elementary graph edits typically includes vertex deletion/insertion, edge deletion/insertion, and vertex attribute substitution. An edit path is defined as a sequence of graph edit operations $\delta = \langle \delta_1, \delta_2, \ldots, \delta_d \rangle$, where $d$ is the length of the edit path. Application of $\delta$ to a graph is equivalent to applying each edit sequentially: $\delta(G) = \delta_d \circ \cdots \circ \delta_2 \circ \delta_1(G)$. The graph edit distance between two graphs $G_1$ and $G_2$ is then defined as
$$\mathrm{GED}(G_1, G_2) = \min_{\delta \in \Delta(G_1, G_2)} \sum_{i=1}^{d} c(\delta_i),$$
where $\Delta(G_1, G_2)$ denotes the set of all edit paths that transform $G_1$ to an isomorphism of $G_2$ (including $G_2$ itself), $\delta = \langle \delta_1, \delta_2, \ldots, \delta_d \rangle$, and $c(\delta_i)$ is the cost of edit $\delta_i$. In this work, all types of edit operations are defined to have the same cost of 1. As a result, the edit path that minimizes the total edit cost, $\delta^*_{G_1, G_2}$, equals the shortest edit path between $G_1$ and $G_2$. Thus, $\mathrm{GED}(G_1, G_2) = d^*_{G_1, G_2}$, where $d^*_{G_1, G_2}$ is the length of this shortest edit path. Note that $\delta^*_{G_1, G_2}$ may not be unique, and thus there may exist multiple shortest edit paths that have the same length.
The proposed SEP crossover is then defined as follows:

Definition 3.3 (Shortest edit path (SEP) crossover). Given two attributed directed graphs $G_1$ and $G_2$, suppose $\delta^*_{G_1, G_2} = \langle \delta^*_1, \delta^*_2, \ldots, \delta^*_{d^*_{G_1, G_2}} \rangle$. The SEP crossover generates an offspring graph $G_{\text{new}}$ by
$$G_{\text{new}} = \delta^*_{\pi_r(\lceil d^*_{G_1, G_2}/2 \rceil)} \circ \delta^*_{\pi_r(\lceil d^*_{G_1, G_2}/2 \rceil - 1)} \circ \delta^*_{\pi_r(\lceil d^*_{G_1, G_2}/2 \rceil - 2)} \circ \cdots \circ \delta^*_{\pi_r(2)} \circ \delta^*_{\pi_r(1)}(G_1),$$
where $\pi_r$ is a random permutation of the $d^*_{G_1, G_2}$ indices, $\pi_r: 1, 2, \ldots, d^*_{G_1, G_2} \to \pi(1), \pi(2), \ldots, \pi(d^*_{G_1, G_2})$, and $\lceil \cdot \rceil$ denotes the ceiling function. In other words, the SEP crossover shuffles the edits in the SEP between the parents randomly, selects half of them randomly, and applies them to one of the parents to obtain the offspring.
This operator is motivated by a common observation in the
literature (e.g., Ying et al., 2019; White et al., 2021a; Mehta et al., 2022) that the differences in predictive performance
between two architectures are positively correlated with
their GEDs. This observation suggests that the edits in the
SEP encode fundamental differences between two architec-
tures that matter to predictive performance. An offspring
that lies in the middle of this SEP can explore the search
regions where the parents have fundamental discrepancies.
At the same time, the offspring can automatically preserve
those common substructures between parents, avoiding un-
necessary disruptive behaviors, and thus avoiding the per-
mutation problem. A visual demo showing how the SEP
crossover resolves the permutation problem is provided in
Appendix A.3.
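As a concrete illustration, the operator in Definition 3.3 can be sketched in a few lines of Python. This is our own toy sketch, not the authors' implementation: it assumes the shortest edit path has already been computed, and represents each edit as a function on a graph (here, a graph is just a set of directed edges).

```python
import math
import random

def sep_crossover(parent, shortest_edit_path, rng=random.Random(0)):
    """Illustrative SEP crossover (Definition 3.3): shuffle the edits on
    the shortest edit path between the parents, then apply a random half
    of them (ceil(d*/2)) to one parent to produce the offspring."""
    edits = list(shortest_edit_path)
    rng.shuffle(edits)                    # random permutation pi_r
    k = math.ceil(len(edits) / 2)         # keep ceil(d*/2) edits
    offspring = parent
    for edit in edits[:k]:
        offspring = edit(offspring)
    return offspring

# Toy demo: a "graph" is a set of edges; each edit adds or removes one edge.
g1 = {("a", "b"), ("b", "c")}
path = [
    lambda g: g | {("c", "d")},           # insert edge c->d
    lambda g: g - {("b", "c")},           # delete edge b->c
    lambda g: g | {("a", "c")},           # insert edge a->c
]
child = sep_crossover(g1, path)
print(len(path), "edits on the SEP; offspring:", sorted(child))
```

Since each of the three toy edits changes exactly one edge, any offspring lies exactly two edits away from `g1`, i.e., halfway along the SEP, as the definition intends.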
4. Theoretical Analysis
In this section, the SEP crossover, standard crossover, muta-
tion, and RL approaches to NAS will be analyzed theoreti-
cally, showing that the SEP crossover has an advantage in
improving the expected quality of generated graphs. The
fundamental concepts are defined first in Section 4.1, lead-
ing to new interpretations of graph edit distance, crossover
and mutation based on attributed adjacency matrices. Feasi-
bility assumptions are then declared, and theorems derived
for expected improvement for SEP, standard crossover, and
mutation. Section 4.2 focuses on RL: It interprets RL in
terms of the same fundamental concepts, defines two ex-
treme cases whose combinations span the possible states
of the RL process, and derives theorems for expected im-
provement for both. Section 4.3 then brings these theorems
together, showing that the SEP crossover results in more
improvement than the other methods in common NAS se-
tups. Section 4.4 further verifies the robustness of the SEP
crossover under inaccurate GED calculations. All proofs
and lemmas are included in Appendix A.1. For clarity, a full list of mathematical symbols is provided in Appendix A.2.
4.1. Expected Improvement with Crossover and
Mutation
First, let us define the basic concepts:
Definition 4.1 (Attributed adjacency matrix). An attributed adjacency matrix (AA-matrix) $A^{G}$ is a representation of an attributed directed graph. It is an $n \times n$ matrix, where $n$ is the number of vertices in $G$. The entry in the $i$th row and $j$th column is represented by $A^{G}_{i,j}$. $A^{G}_{i,j} = 0$ if there is no edge from $v_i$ to $v_j$, and $A^{G}_{i,j} = \gamma_e(e_{i,j})$ if there exists an edge from $v_i$ to $v_j$, for $i, j \in 1, 2, \ldots, n$ and $i \neq j$. $A^{G}_{i,i} = \gamma_v(v_i)$, for $i \in 1, 2, \ldots, n$.
Definition 4.2 (Permutation matrix). Given a permutation $\pi$ of $n$ elements, $\pi: 1, 2, \ldots, n \to \pi(1), \pi(2), \ldots, \pi(n)$, a permutation matrix $P^{\pi}$ can be constructed by permuting the columns or rows of an $n \times n$ identity matrix $I_n$ according to $\pi$. In this work, a column permutation of $I_n$ is performed to obtain $P^{\pi}$. The entry in the $i$th row and $j$th column is represented by $P^{\pi}_{i,j}$, with $P^{\pi}_{i,j} = 1$ if $j = \pi(i)$, and $P^{\pi}_{i,j} = 0$ otherwise.
Definition 4.3 (Null vertex). A null vertex has no connections to other existing vertices in a graph. It is assigned a special "null" attribute, which means that it does not have any impact on the original graph. Null vertices are added to an existing graph only for convenience of theoretical analysis, and they do not affect the calculation of GEDs.
Based on the above definitions, GED, crossover and muta-
tion can be interpreted from the AA-matrix perspective:
Definition 4.4 (AA-matrix-based interpretation of GED). Two graphs $G_1$ and $G_2$ can both be extended to have the same order $n = \max(|G_1|, |G_2|)$ by adding null vertices. The extended $G_1$ and $G_2$ are denoted as $\hat{G}_1$ and $\hat{G}_2$. Calculating the GED between $G_1$ and $G_2$ can then be defined as
$$\mathrm{GED}(G_1, G_2) = \min_{\pi \in S_n} d(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top}),$$
where $d(A, B) = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbb{1}_{A_{i,j} \neq B_{i,j}}$, $m \times n$ is the order of both $A$ and $B$, $\mathbb{1}_{\text{condition}}$ is 1 if the condition is true and 0 otherwise (i.e., $d(A, B)$ counts the number of different entries between two matrices with the same shape), and $S_n$ denotes the set of all permutations of $\{1, 2, 3, \ldots, n\}$. The permutation that minimizes $d(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top})$ is denoted as $\pi^*_{\hat{G}_1, \hat{G}_2}$, and the permuted AA-matrix of $\hat{G}_2$ is denoted as $A^{\hat{G}_2 \to \hat{G}_1} = P^{\pi^*_{\hat{G}_1, \hat{G}_2}} A^{\hat{G}_2} (P^{\pi^*_{\hat{G}_1, \hat{G}_2}})^{\top}$.
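Definition 4.4 can be checked directly on small graphs. The following brute-force sketch (illustrative only; enumerating all $n!$ permutations is exponential, which is why practical GED computation must approximate) evaluates $\min_{\pi} d(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top})$ using plain nested lists as matrices:

```python
from itertools import permutations

def d(A, B):
    """Count differing entries between two same-shape square matrices."""
    n = len(A)
    return sum(A[i][j] != B[i][j] for i in range(n) for j in range(n))

def permute(A, pi):
    """Return P_pi A P_pi^T: entry (i, j) becomes A[pi(i)][pi(j)]."""
    n = len(A)
    return [[A[pi[i]][pi[j]] for j in range(n)] for i in range(n)]

def ged(A1, A2):
    """Brute-force GED of Definition 4.4 for equal-order AA-matrices."""
    n = len(A1)
    return min(d(A1, permute(A2, pi)) for pi in permutations(range(n)))

# Two 3-vertex graphs; diagonal entries are vertex attributes,
# off-diagonal entries are edges (1) or absences (0).
A1 = [[2, 1, 0],
      [0, 3, 1],
      [0, 0, 2]]
A2 = [[3, 0, 1],   # the same graph with vertices 1 and 2 swapped
      [1, 2, 0],
      [0, 0, 2]]
print(ged(A1, A2))  # -> 0: isomorphic graphs have GED 0
```

Because the two matrices describe isomorphic graphs, the minimization finds a permutation under which all entries match, confirming that GED is invariant to vertex relabeling.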
Remark 4.5. In the context of standard neural architecture search, assume $\gamma_e(\cdot)$ always assigns 1 to any existing edge, and $\gamma_v(\cdot)$ assigns 0 to the "null" attribute and positive integers to other types of vertex attributes (each type of attribute has its own unique integer). Then the differences between $A^{\hat{G}_1}$ and $A^{\hat{G}_2 \to \hat{G}_1}$ correspond to the shortest edit path that transforms $G_1$ to $G_2$ in the following way: (1) add a vertex with attribute $A^{\hat{G}_2 \to \hat{G}_1}_{i,i}$, if $A^{\hat{G}_1}_{i,i} = 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,i} > 0$; (2) delete vertex $v_i$ from $G_1$, if $A^{\hat{G}_1}_{i,i} > 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,i} = 0$; (3) change the attribute of vertex $v_i$ to $A^{\hat{G}_2 \to \hat{G}_1}_{i,i}$, if $A^{\hat{G}_1}_{i,i} > 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,i} > 0$; (4) add an edge from $v_i$ to $v_j$, if $A^{\hat{G}_1}_{i,j} = 0$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,j} = 1$, $i \neq j$; (5) delete the edge from $v_i$ to $v_j$, if $A^{\hat{G}_1}_{i,j} = 1$ and $A^{\hat{G}_2 \to \hat{G}_1}_{i,j} = 0$, $i \neq j$. Note that when adding an edge, the origin $v_i$ and/or destination $v_j$ may be newly added vertices.
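The case analysis in Remark 4.5 translates directly into code. The following hypothetical helper (our illustration, assuming the two AA-matrices are already aligned by the optimal permutation and $\gamma_e \equiv 1$) lists the edit operations implied by the matrix differences:

```python
def edits_from_matrices(A1, B):
    """List the shortest-edit-path operations implied by Remark 4.5,
    given A1 = AA-matrix of G1 and B = aligned AA-matrix of G2.
    Diagonal entries are vertex attributes (0 = null vertex);
    off-diagonal entries are edges (1) or absences (0)."""
    n = len(A1)
    edits = []
    for i in range(n):                      # cases (1)-(3): vertices
        a, b = A1[i][i], B[i][i]
        if a == 0 and b > 0:
            edits.append(("add_vertex", i, b))
        elif a > 0 and b == 0:
            edits.append(("delete_vertex", i))
        elif a > 0 and b > 0 and a != b:
            edits.append(("substitute_attr", i, b))
    for i in range(n):                      # cases (4)-(5): edges
        for j in range(n):
            if i != j:
                if A1[i][j] == 0 and B[i][j] == 1:
                    edits.append(("add_edge", i, j))
                elif A1[i][j] == 1 and B[i][j] == 0:
                    edits.append(("delete_edge", i, j))
    return edits

A1 = [[2, 1, 0],
      [0, 3, 0],
      [0, 0, 0]]     # vertex 3 is a null vertex
B  = [[2, 1, 0],
      [0, 4, 1],
      [0, 0, 2]]     # vertex 2's attribute changed; vertex 3 added with an edge
print(edits_from_matrices(A1, B))
# -> [('substitute_attr', 1, 4), ('add_vertex', 2, 2), ('add_edge', 1, 2)]
```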
Definition 4.6 (AA-matrix-based interpretation of crossover). Assume two graphs $G_1$ and $G_2$ are extended to have the same order by adding null vertices, resulting in $\hat{G}_1$ and $\hat{G}_2$. A crossover between $G_1$ and $G_2$ is defined as the process of generating an offspring graph $G_{\text{new}}$ by recombining $A^{\hat{G}_1}$ and $A^{\hat{G}_2}$: $A^{\hat{G}_{\text{new}}} = r(A^{\hat{G}_1}, P^{\pi} A^{\hat{G}_2} (P^{\pi})^{\top})$, where the function $r(A, B)$ returns a matrix that inherits each entry from $A$ or $B$ with probability 0.5. That is, if $C = r(A, B)$, then $p(C_{i,j} = A_{i,j}) = p(C_{i,j} = B_{i,j}) = 0.5$ for any valid $i, j$. $P^{\pi}$ is a permutation matrix based on permutation $\pi$, which is decided by the specific crossover operator utilized. For the SEP crossover, $\pi = \pi^*_{\hat{G}_1, \hat{G}_2}$, which minimizes the GED between $G_1$ and $G_2$. For the standard crossover, since the vertices may be in any order in the original AA-matrix representation and there is no particular vertex/edge matching mechanism during crossover, a purely random permutation $\pi_{\text{rand}}$ is used to represent this randomness. The result, $A^{\hat{G}_{\text{new}}}$, is the AA-matrix of the generated new graph with null vertices. By removing all null vertices from $\hat{G}_{\text{new}}$, the offspring graph $G_{\text{new}}$ is obtained.
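The recombination function $r(\cdot, \cdot)$ of Definition 4.6 is simply entry-wise uniform inheritance. A minimal sketch (our illustration, with the permutation assumed to have been applied to the second matrix already):

```python
import random

def recombine(A, B, rng=random.Random(1)):
    """r(A, B) from Definition 4.6: each entry of the offspring matrix
    is inherited from A or B with probability 0.5."""
    n = len(A)
    return [[A[i][j] if rng.random() < 0.5 else B[i][j]
             for j in range(n)] for i in range(n)]

A = [[1, 1], [0, 2]]
B = [[3, 0], [1, 2]]
C = recombine(A, B)
# Every entry of C comes from the corresponding entry of A or B.
assert all(C[i][j] in (A[i][j], B[i][j]) for i in range(2) for j in range(2))
```

Whether this inheritance is constructive or disruptive depends entirely on the alignment $\pi$ applied beforehand, which is exactly the distinction between the SEP and standard crossovers.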
Definition 4.7 (AA-matrix-based interpretation of mutation). Given a graph $G_1$, a mutation operation is defined as the process of generating an offspring graph $G_{\text{new}}$ by mutating $G_1$. In standard NAS, allowed mutations to $G_1$ include vertex deletion/insertion, edge deletion/insertion, and vertex attribute substitution. In the AA-matrix representation, a mutation operation is then defined as $A^{\hat{G}_{\text{new}}} = m(A^{\hat{G}_1})$, where the function $m(A)$ alters each element of $A$ with an equal probability $p_m$, and $p_m$ is usually selected so that on average one element is altered during each mutation operation. Here $\hat{G}_1$ is the extended graph of $G_1$ with null vertices, so that node additions can be performed in $A^{\hat{G}_1}$ (by changing a null vertex to a vertex with a valid attribute). An element $A^{\hat{G}_1}_{i,j}$ is altered by randomly resampling an allowed value that is different from the original $A^{\hat{G}_1}_{i,j}$. The result, $A^{\hat{G}_{\text{new}}}$, is the AA-matrix of the generated new graph with null vertices. By removing all null vertices from $\hat{G}_{\text{new}}$, the mutated offspring graph $G_{\text{new}}$ is obtained.
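The mutation operator $m(\cdot)$ of Definition 4.7 can be sketched similarly, assuming binary edge entries and a small hypothetical attribute set; choosing $p_m = 1/n^2$ alters one entry per operation on average:

```python
import random

def mutate(A, vertex_attrs=(0, 1, 2, 3), rng=random.Random(0)):
    """m(A) from Definition 4.7: alter each entry independently with
    probability p_m = 1/n^2, resampling a different allowed value."""
    n = len(A)
    p_m = 1.0 / (n * n)          # on average one entry altered per call
    M = [row[:] for row in A]
    for i in range(n):
        for j in range(n):
            if rng.random() < p_m:
                if i == j:       # vertex attribute (0 = null vertex)
                    choices = [a for a in vertex_attrs if a != A[i][i]]
                else:            # edge entry: flip 0 <-> 1
                    choices = [1 - A[i][j]]
                M[i][j] = rng.choice(choices)
    return M

A = [[2, 1, 0], [0, 3, 1], [0, 0, 0]]
M = mutate(A)
changed = sum(M[i][j] != A[i][j] for i in range(3) for j in range(3))
print("entries altered:", changed)
```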
Next, in order to define a performance metric for compar-
ing different crossover and mutation operators, a realistic
assumption needs to be made about the search space:
Locality in NAS search spaces means that close architec-
tures (in terms of GED) tend to have similar performance.
Random-walk autocorrelation (RWA; Weinberger,1990) is
a commonly used metric to measure such locality. Strong
autocorrelation of prediction accuracies of architectures dur-
ing a random walk, in which each move is a graph edit
operation, has been consistently observed in many existing
NAS benchmarks or studies (Ying et al.,2019;White et al.,
2021a;Mehta et al.,2022). This observation leads to the
following assumption:
Assumption 4.8 (Positive correlation between GED and fitness/reward difference). If $\mathrm{GED}(G_i, G_j) < \mathrm{GED}(G_i, G_k)$, then $E(|f(G_i) - f(G_j)|) < E(|f(G_i) - f(G_k)|)$, where $f(G)$ returns the fitness/reward of $G$, i.e., the prediction accuracy.
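The random-walk autocorrelation evidence behind Assumption 4.8 can be illustrated on a toy landscape. This is a stand-in experiment of our own (a bit-string encoding with a smooth surrogate fitness, not an actual NAS benchmark):

```python
import random

def rwa(fitnesses, lag=1):
    """Random-walk autocorrelation (Weinberger, 1990) at a given lag."""
    n = len(fitnesses)
    mean = sum(fitnesses) / n
    var = sum((f - mean) ** 2 for f in fitnesses) / n
    cov = sum((fitnesses[i] - mean) * (fitnesses[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

# Random walk in which each step is one "edit" (a single bit flip).
rng = random.Random(0)
x = [0] * 20
fitness = lambda bits: sum(bits) / len(bits)   # smooth stand-in fitness
trace = []
for _ in range(2000):
    i = rng.randrange(len(x))
    x[i] ^= 1
    trace.append(fitness(x))
print(round(rwa(trace), 3))  # high positive value indicates strong locality
```

On a landscape with locality, single-edit neighbors have similar fitness, so the lag-1 autocorrelation along the walk is close to 1; a rugged landscape would drive it toward 0.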
Suppose $G_{\text{opt}}$ is the global optimal graph (i.e., the target of the evolutionary search), $G_1$ and $G_2$ are the two parents to undergo crossover or mutation, and $G_{\text{new}}$ is the generated offspring. For convenience of theoretical analysis, $G_{\text{opt}}$, $G_1$, and $G_2$ are extended to have the same order $n = \max(|G_{\text{opt}}|, |G_1|, |G_2|)$ by adding null vertices. The extended $G_{\text{opt}}$ is denoted as $\hat{G}_{\text{opt}}$, and $\hat{G}_1$, $\hat{G}_2$, and $\hat{G}_{\text{new}}$ have the same meaning as in Definitions 4.6 and 4.7.
Given Assumption 4.8, a direct measurement of the progress of the entire search is $\mathrm{GED}(G_{\text{opt}}, G_{\text{new}})$, and the ultimate goal is to minimize it so that a good solution can be generated. $\mathrm{GED}(G_{\text{opt}}, G_{\text{new}}) = d^*_{G_{\text{opt}}, G_{\text{new}}} = d(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$ can be decomposed into $d_v(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}}) + d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$, where $d_v(A, B) = \sum_i \mathbb{1}_{A_{i,i} \neq B_{i,i}}$ counts only the number of different diagonal entries, i.e., the differences in vertex attributes, and $d_e(A, B) = \sum_i \sum_{j \neq i} \mathbb{1}_{A_{i,j} \neq B_{i,j}}$ counts the number of different non-diagonal entries, i.e., the differences in edges/connections, thereby measuring the topological similarity.
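The decomposition above is straightforward to state in code; a minimal sketch splitting the entry differences of two aligned AA-matrices into the vertex-attribute part $d_v$ and the edge part $d_e$:

```python
def d_v(A, B):
    """Vertex-attribute differences: differing diagonal entries."""
    return sum(A[i][i] != B[i][i] for i in range(len(A)))

def d_e(A, B):
    """Edge differences: differing off-diagonal entries."""
    n = len(A)
    return sum(A[i][j] != B[i][j]
               for i in range(n) for j in range(n) if i != j)

A = [[2, 1, 0], [0, 3, 1], [0, 0, 2]]
B = [[2, 0, 0], [0, 4, 1], [1, 0, 2]]
assert d_v(A, B) == 1               # vertex 2's attribute differs
assert d_e(A, B) == 2               # two edge entries differ
assert d_v(A, B) + d_e(A, B) == 3   # together they give d(A, B)
```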
In order to derive a performance metric, let us consider two factors. First, $d_v(\cdot)$ only covers $n$ elements, whereas $d_e(\cdot)$ covers $n \cdot (n-1)$ elements. We have $n \cdot (n-1) \gg n$ as $n$ increases, so $d_e(\cdot)$ is the dominant factor in deciding $\mathrm{GED}(G_{\text{opt}}, G_{\text{new}})$. Second, the modeling of vertex attributes varies a lot across different NAS spaces, e.g., they have different numbers of usable attributes and different constraints on vertex attribute assignments. In contrast, $\gamma_e(\cdot) = 1$ can simply be used for all valid edges in most NAS spaces, leading to generality of any theoretical conclusions. These two factors suggest that $d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$ is a representative quantitative metric for comparing different crossover and mutation operators theoretically. For simplicity, we will use $d_{e, G_1, G_2}$ to denote $d_e(A^{G_1}, A^{G_2 \to G_1})$.
Accordingly, the main performance metric for crossover and mutation can now be defined as follows:

Definition 4.9 (Expected improvement of crossover and mutation). This work focuses on the expected improvement in terms of topological similarity to the global optimal graph. More specifically, expected improvement refers to $E(\max(d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_1 \to \hat{G}_{\text{opt}}}) - d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}}), 0))$, which compares the offspring graph $G_{\text{new}}$ with one parent graph $G_1$ in terms of the expected edge/connection differences to $G_{\text{opt}}$. The $\max(\cdot, 0)$ part takes into account the selection pressure in standard EAs; that is, only an offspring that is better than its parent can survive and become the next parent.
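The expected improvement of Definition 4.9 can also be estimated empirically. The following Monte-Carlo sketch is our own illustration, not part of the paper's proofs: identity alignment stands in for the optimal permutation used by a SEP-style operator, and a random vertex relabeling stands in for the arbitrary ordering seen by standard crossover; both are compared on random binary edge matrices.

```python
import random
from itertools import product

def d_e(A, B, n):
    """Off-diagonal (edge) differences between two n x n matrices."""
    return sum(A[i][j] != B[i][j]
               for i, j in product(range(n), repeat=2) if i != j)

def improvement(aligned, n=6, trials=2000, rng=random.Random(0)):
    """Monte-Carlo estimate of E(max(d_e(opt, parent) - d_e(opt, child), 0))."""
    def near(G, k):
        # a parent a few random edge flips away from the optimum
        H = [row[:] for row in G]
        for _ in range(k):
            i, j = rng.randrange(n), rng.randrange(n)
            if i != j:
                H[i][j] ^= 1
        return H
    total = 0.0
    for _ in range(trials):
        opt = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        p1, p2 = near(opt, 4), near(opt, 4)
        if not aligned:
            # standard crossover sees an arbitrary vertex relabeling of p2
            pi = list(range(n))
            rng.shuffle(pi)
            p2 = [[p2[pi[i]][pi[j]] for j in range(n)] for i in range(n)]
        child = [[p1[i][j] if rng.random() < 0.5 else p2[i][j]
                  for j in range(n)] for i in range(n)]
        total += max(d_e(opt, p1, n) - d_e(opt, child, n), 0)
    return total / trials

print("aligned (SEP-style):", improvement(True))
print("unaligned (standard):", improvement(False))
```

In this toy setting the aligned operator should show a clearly larger improvement: relabeling one parent scatters its entries relative to the optimum, so the unaligned offspring inherits mostly mismatched edges, which is the disruptive effect the permutation problem describes.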
As the penultimate step, three lemmas are derived in Appendix A.1 to assist the proofs regarding expected improvement. According to Lemmas A.1 and A.2, any $\pi$ can be chosen to analyze the behaviors of SEP crossover, standard crossover, and mutation, without affecting the result of $d_e(A^{\hat{G}_{\text{opt}}}, A^{\hat{G}_{\text{new}} \to \hat{G}_{\text{opt}}})$. Lemma A.3 further derives the lower bound for the common parts in $G_{\text{opt}}$, $G_1$, and $G_2$. Now, choose $\pi = \pi_1 = \pi^*_{\hat{G}_{\text{opt}}, \hat{G}_1}$ and $\pi_2 = \pi^*_{\hat{G}_1, \hat{G}_2}$ so that there are at least $n_s = \max(n^2 - d^*_{\hat{G}_{\text{opt}}, \hat{G}_1} - d^*_{\hat{G}_1, \hat{G}_2}, 0)$ common entries among $A^{\hat{G}_{\text{opt}}}$, $A^{\hat{G}_1 \to \hat{G}_{\text{opt}}}$, and $A^{\hat{G}_2 \to \hat{G}_1}$, where $A^{\hat{G}_1 \to \hat{G}_{\text{opt}}} = A^{\hat{G}_1}$. Regarding the remaining entries, the following assumption is made:
Assumption 4.10 (Uniform distribution of differences). The entries that are different between $A^{\hat{G}_1}$ and $A^{\hat{G}_2 \to \hat{G}_1}$ are assumed to be uniformly distributed on the positions other than those $n_s$ common entries.
With these lemmas and assumption, the expected improve-
ment of SEP crossover, standard crossover and mutation can
be derived:
Theorem 4.11 (Expected improvement of SEP crossover).