process of MCTS-VS, after optimizing a subset
xM
of variables where
M⊆[D]
denotes the indexes
of the variables, a set
D
of sampled points will be generated, and the pair
(M,D)
will be recorded
into a set D, called information set. The variable score vector is based on D, and calculated as
s=
X
(M,D)∈DX
(xi,yi)∈D
yi·g(M)
X
(M,D)∈D
|D| · g(M)
,(2)
where the function
g: 2[D]→ {0,1}D
gives the Boolean vector representation of a variable index
subset
M⊆[D]
(i.e., the
i
-th element of
g(M)
is
1
if
i∈M
, and 0 otherwise), and
/
is the element-
wise division. Each dimension of
P(M,D)∈DP(xi,yi)∈D yi·g(M)
is the sum of query evaluations
using each variable, and each dimension of
P(M,D)∈D|D| · g(M)
is the number of queries using
each variable. Thus, the
i
-th element of variable score
s
, representing the importance of the
i
-th
variable
xi
, is actually measured by the average goodness of all the sampled points that are generated
by optimizing a subset of variables containing
xi
. The variable score
s
will be used to define the
value of each tree node of MCTS as well as for node expansion.
In MCTS-VS, the root of the tree represents all variables. A tree node
X
represents a subset of
variables, whose index set is denoted by
AX⊆[D]
, and it stores the value
vX
and the number
nX
of visits, which are used to calculate the value of UCB as in Eq. (1). The value
vX
is defined as
the average score (i.e., importance) of the variables contained by
X
, which can be calculated by
s·g(AX)/|AX|
, where
g(AX)
is the Boolean vector representation of
AX
and
|AX|
is the size of
AX, i.e., the number of variables in node X.
At each iteration, MCTS-VS first recursively selects a node with larger UCB until a leaf node (denoted
as
X
), which is regarded as containing important variables. Note that if we optimize the subset
xAX
of variables represented by the leaf
X
directly, the variables in
xAX
will have the same score (because
they are optimized together), and their relative importance cannot be further distinguished. Thus,
MCTS-VS uniformly selects a variable index subset
M
from
AX
at random, and employs BO to
optimize
xM
as well as
xAX\M
; this process is repeated for several times. After that, the information
set
D
will be augmented by the pairs of the selected variable index subset
M
(or
AX\M
) and the
corresponding sampled points generated by BO. The variable score vector
s
will be updated using
this new
D
. Based on
s
, the variable index set
AX
represented by the leaf
X
will be divided into two
disjoint subsets, containing variables with larger and smaller scores (i.e., important and unimportant
variables), respectively, and the leaf
X
will be bifurcated into two child nodes accordingly. Finally,
the
v
values of these two children will be calculated using the variable score vector
s
, and back-
propagation will be performed to update the vvalue and the number of visits of the nodes along the
current path of the tree.
MCTS-VS can be equipped with any specific BO optimizer, resulting in the concrete algorithm
MCTS-VS-BO, where BO is used to optimize the selected subsets of variables during the running
of MCTS-VS. Compared with LA-MCTS [
40
], MCTS-VS applies MCTS to partition the variables
instead of the search space, and thus can be more scalable. Compared with the previous variable
selection method Dropout [
21
], MCTS-VS can select important variables automatically instead of
randomly selecting a fixed number of variables in each iteration. Next we introduce it in detail.
3.1 Details of MCTS-VS
The procedure of MCTS-VS is described in Algorithm 1. In line 1, it first initializes the information
set
D
. In particular, a variable index subset
Mi
is randomly sampled from
[D]
, and the Latin
hypercube sampling [
24
] is used to generate two sets (denoted as
Di
and
D¯
i
) of
Ns
points to form
the two pairs of
(Mi,Di)
and
(¯
Mi,D¯
i)
, where
¯
Mi= [D]\Mi
. This process will be repeated for
Nv
times, resulting in the initial
D={(Mi,Di),(¯
Mi,D¯
i)}Nv
i=1
. The variable score vector
s
is calculated
using this initial
D
in line 3, and the Monte Carlo tree is initialized in line 4 by adding only a root
node, whose
v
value is calculated according to
s
and number of visits is 0. MCTS-VS uses the
variable
t
to record the number of evaluations it has performed, and thus
t
is set to
2×Nv×Ns
in
line 5 as the initial Dcontains 2×Nv×Nssampled points in total.
In each iteration (i.e., lines 7–28) of MCTS-VS, it selects a leaf node
X
by UCB in line 10, and
optimizes the variables (i.e.,
xAX
) represented by
X
in lines 13–23. Note that to measure the
relative importance of variables in
xAX
, MCTS-VS optimizes different subsets of variables of
xAX
4