Neural Estimation of Submodular Functions
with Applications to Differentiable Subset Selection
Abir De, Soumen Chakrabarti
Indian Institute of Technology Bombay
{abir,soumen}@cse.iitb.ac.in
Abstract
Submodular functions and their variants, through their ability to characterize diversity and coverage, have emerged as a key tool for data selection and summarization. Many recent approaches to learning submodular functions suffer from limited expressiveness. In this work, we propose FLEXSUBNET, a family of flexible neural models for both monotone and non-monotone submodular functions. To fit a latent submodular function from (set, value) observations, FLEXSUBNET applies a concave function on modular functions in a recursive manner. We do not draw the concave function from a restricted family, but rather learn it from data using a highly expressive neural network that implements a differentiable quadrature procedure. Such an expressive neural model for concave functions may be of independent interest. Next, we extend this setup to provide a novel characterization of monotone $\alpha$-submodular functions, a recently introduced notion of approximate submodular functions. We then use this characterization to design a novel neural model for such functions. Finally, we consider learning submodular set functions under distant supervision in the form of (perimeter-set, high-value-subset) pairs. This yields a novel subset selection method based on an order-invariant, yet greedy, sampler built around the above neural set functions. Our experiments on synthetic and real data show that FLEXSUBNET outperforms several baselines.
1 Introduction
Owing to their strong characterization of diversity and coverage, submodular functions and their extensions, viz., weak and approximate submodular functions, have emerged as powerful machinery for data selection tasks [46, 84, 35, 66, 91, 63, 5, 23, 75]. We propose trainable parameterized families of submodular functions under two supervision regimes.
In the first setting, the goal is to estimate the submodular function from (set, value) pairs, where the function outputs the value of an input set. This problem is hard in the worst case, even with poly($n$) value-oracle queries [31]. It has applications in auction design, where one may wish to learn a player's valuation function from her bids [6]. In the second setting, the task is to learn the submodular function under the supervision of (perimeter-set, high-value-subset) pairs, where the high-value-subset potentially maximizes the underlying function over all other subsets of the perimeter-set. The trained function is expected to extract high-value subsets from freshly-specified perimeter sets. This scenario has applications in itemset prediction in recommendation [78, 79], data summarization [50, 2, 4, 10, 74], etc.
1.1 Our contributions
Driven by the above motivations, we propose (i) a novel family of highly expressive neural models for submodular and $\alpha$-submodular functions, which can be estimated under the supervision of both (set, value) and (perimeter-set, high-value-subset) pairs; and (ii) a novel permutation-adversarial training method for differentiable subset selection, which efficiently trains submodular and $\alpha$-submodular functions from (perimeter-set, high-value-subset) pairs. We provide more details below.
Neural models for submodular functions.
We design FLEXSUBNET, a family of flexible neural models for monotone submodular, non-monotone submodular, and monotone $\alpha$-submodular functions.
— Monotone submodular functions. We model a monotone submodular function using a recursive neural model which outputs a concave-composed submodular function [27, 76, 52] at every step of the recursion. Specifically, it first computes
a linear combination of the submodular function computed in the previous step and a modular function, and then applies a concave function to the result to produce the next submodular function.
— Monotone $\alpha$-submodular functions. Our proposed model for submodular functions rests on the principle that a concave-composed submodular function is always submodular. However, to the best of our knowledge, no analogous result is known for $\alpha$-submodular functions. We address this gap by showing that an $\alpha$-submodular function can be represented by applying a mapping $\phi$ to a positive modular set function, where $\phi$ satisfies a second-order differential inequality. Subsequently, we provide a neural model that is a universal approximator of $\phi$, which in turn is used to model an $\alpha$-submodular function.
— Non-monotone submodular functions. By applying a non-monotone concave function to a modular function, we extend our model to non-monotone submodular functions.
Several recent models learn subclasses of submodular functions [53, 77, 79]. Bilmes and Bai [7] present a thorough theoretical characterization of the benefits of network depth with fixed concave functions, in a general framework called deep submodular functions (DSF). DSF leaves open all design choices: the number of layers, their widths, the DAG topology, and the choice of concave functions. All-to-all attention has replaced domain-driven topology design in much of NLP [18]. Set transformers [51] would therefore be a natural alternative to compare against DSF, but need more memory and computation. Here we explore the following third, somewhat extreme trade-off: we restrict the topology to a single recursive chain, thus providing an effectively plug-and-play model with no topology and minimal hyperparameter choices (mainly the length of the chain). However, we compensate with a more expressive, trainable concave function that is shared across all nodes of the chain. Our experiments show that our strategy improves ease of training and predictive accuracy beyond both set transformers and various DSF instantiations with fixed concave functions.
Permutation-insensitive differentiable subset selection.
It is common [77, 70, 63] to select a subset from a given dataset by sequentially sampling elements using a softmax distribution obtained from the outputs of a set function on various sets of elements. When learning set functions from (perimeter-set, high-value-subset) pairs, this protocol naturally makes the training process order-sensitive. To mitigate this, we propose a novel max-min optimization. This formulation sets up a game between an adversarial permutation generator and the set function learner: the former generates worst-case permutations that minimize the likelihood of the training subsets, while the latter keeps maximizing the likelihood until the estimated parameters become permutation-agnostic. To this end, we use a Gumbel-Sinkhorn neural network [57, 68, 17, 69] as a neural surrogate for hard permutations, which speeds up training and lets us avoid combinatorial search over large permutation spaces.
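For reference, a minimal sketch of the Gumbel-Sinkhorn relaxation we rely on is given below; the function name, hyperparameters, and usage are illustrative rather than the exact architecture used in our training loop.

```python
import torch

def gumbel_sinkhorn(log_scores: torch.Tensor, tau: float = 1.0,
                    n_iters: int = 20, noise: bool = True) -> torch.Tensor:
    """Relax an (n, n) matrix of permutation scores into a doubly-stochastic matrix."""
    if noise:
        # Gumbel(0, 1) perturbation makes the relaxed permutation stochastic.
        gumbel = -torch.log(-torch.log(torch.rand_like(log_scores) + 1e-20) + 1e-20)
        log_scores = log_scores + gumbel
    log_alpha = log_scores / tau
    for _ in range(n_iters):
        # Sinkhorn normalization in log space: rows, then columns.
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)
    return torch.exp(log_alpha)

# Usage: scores come from the permutation generator network.
scores = torch.randn(5, 5, requires_grad=True)
P_soft = gumbel_sinkhorn(scores, tau=0.5)   # near-doubly-stochastic soft permutation
```

As the temperature tau shrinks and the number of Sinkhorn iterations grows, the output approaches a hard permutation matrix, while remaining differentiable so that gradients from the subset-likelihood objective can reach the permutation generator.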
Experiments.
We first experiment with several submodular set functions and synthetically generated examples, which
show that FLEXSUBNET recovers the function more accurately than several baselines. Later experiments with several
real datasets on product recommendation reveal that FLEXSUBNET can predict the purchased items more effectively
and efficiently than several baselines.
2 Related work
Deep set functions.
Recent years have witnessed a surge of interest in deep learning of set functions. Zaheer et al. [86] showed that any set function can be modeled by applying a symmetric aggregator to the feature vectors of the underlying set members. Lee et al. [51] proposed a transformer-based neural architecture to model set functions. However, their work does not focus on modeling or learning submodular functions in particular. Deep set functions enforce permutation invariance by using symmetric aggregators [86, 65, 64], which have several applications, e.g., character counting [51], set anomaly detection [51], graph embedding design [34, 47, 80, 71], etc. However, they often suffer from limited expressiveness, as shown by Wagstaff et al. [82]. Some works aim to overcome this limitation by combining a sequence encoder with a learned permutation-invariant network structure [62, 68]. However, none of them learns an underlying submodular model in the context of subset selection.
Learning functions with shape constraints.
Our double-quadrature strategy for concavity is inspired by a series of recent efforts to fit functions with shape constraints for various learning tasks. Wehenkel and Louppe [83] proposed universal monotone neural networks (UMNN), a significant early step in learning univariate monotone functions by numerical integration of a non-negative integrand returned by a 'universal' network; this paved the path for universal monotone function modeling. Gupta et al. [33] extended this to multidimensional shape constraints for supervised learning tasks, for situations where some features complement or dominate others, or a learnt function $y = f(x)$ should be unimodal. Such constraints can be expressed as linear inequalities, and can therefore be enforced using projected stochastic gradient descent. Gupta et al. [32] widened the scope further to situations where more general constraints have to be enforced on gradients. In the context of density estimation and variational inference, a popular technique is to transform between simple and complex distributions via invertible and differentiable mappings using normalizing flows [48], where coupling functions can be implemented as monotone networks. Our work provides a bridge between shape constraints, universal concavity, and differentiable subset selection.
Deep submodular functions (DSF).
Early work predominantly modeled a trainable submodular function as a mixture of fixed submodular functions [53, 77]. If training instances do not fit their 'basis' of hand-picked submodular functions, limited expressiveness results. In the quest for 'universal' submodular function networks, Bilmes and Bai [7] and Bai et al. [3] undertook a thorough theoretical inquiry into the effect of network structure on expressiveness. Specifically, they modeled submodular functions as an aggregate of concave functions of modular functions, computed in a topological order along a directed acyclic graph (DAG), driven by the fact that a concave function of a monotone submodular function is a submodular function [26, 27, 76]. But DSF provides no practical prescription for picking the concave functions; each application of DSF will need an extensive search over these design spaces.
Subset selection.
Subset selection, especially under submodular or approximately submodular profit, enjoys an efficient greedy maximization routine which admits an approximation guarantee. Consequently, a wide variety of set function optimization tasks focus on representing the underlying objective as an instance of a submodular function. Broadly, subset selection has myriad applications in machine learning, e.g., data summarization [4], feature selection [44], influence maximization in social networks [43, 13, 14, 87], opinion dynamics [11, 12, 14, 89, 49], efficient learning [20, 45], human-assisted learning [16, 15, 60], etc. However, these works do not aim to learn the underlying submodular function from training subsets.
Set prediction.
Our work is also related to set prediction. Zhang et al. [90] use an encoder-decoder architecture for set prediction. Rezatofighi et al. [67] provide a deep probabilistic model for set prediction. However, they aim to predict an output set rather than the set function.
Differentiable subset selection.
Existing trainable subset selection methods [77, 50] often adopt a max-margin optimization approach. However, this requires solving one submodular optimization problem in each epoch, which renders it computationally expensive. On the other hand, Tschiatschek et al. [79] provide a probabilistic soft-greedy model which can generate, and be trained on, a permutation of subset elements; but the trained model then becomes sensitive to this specific permutation. Tschiatschek et al. [79] overcome this challenge by presenting several permutations to the learner, which can be inefficient.
One-sided smoothness.
We would like to highlight that our characterization of $\alpha$-submodular functions is a special case of one-sided smoothness (OSS) proposed in [29, 28]. However, the significance of these characterizations differs between their work and ours. First, they consider $\gamma$-meta submodular functions, which are a different generalization of submodular functions than $\alpha$-submodular functions. Second, the OSS characterization they provide is for the multilinear extension of $\gamma$-meta submodular functions, whereas we characterize $\alpha$-submodular functions themselves, which allows direct construction of our neural models.
Sample complexity in the context of learning submodular functions.
Goemans et al. [31] provide an algorithm which outputs a function $\hat{f}(S)$ that approximates an arbitrary monotone submodular function $f(S)$ within a factor of $O(\sqrt{n}\log n)$ using poly($n$) queries on $f$. Their algorithm assumes a powerful active-probe setting where $f$ can be queried with arbitrary sets. In contrast, Balcan and Harvey [6] consider a more realistic passive setup used in a supervised learning scenario, and design an algorithm which approximates $f(S)$ within a factor of $O(\sqrt{n})$.
3 Design of FLEXSUBNET
In this section, we first present our notation and then propose a family of flexible neural network models for monotone and non-monotone submodular functions and $\alpha$-submodular functions.
3.1 Notation and preliminary results
We denote $V = \{1, 2, \ldots, |V|\}$ as the ground set or universal set of elements, and $S, T \subseteq V$ as subsets of $V$. Each element $s \in V$ may be associated with a feature vector $z_s$. Given a set function $F : 2^V \to \mathbb{R}$, we define the marginal utility $F(s \mid S) := F(S \cup \{s\}) - F(S)$. The function $F$ is called monotone if $F(s \mid S) \ge 0$ whenever $S \subseteq V$ and $s \in V \setminus S$; $F$ is called $\alpha$-submodular with $\alpha > 0$ if $F(s \mid S) \ge \alpha\, F(s \mid T)$ whenever $S \subseteq T$ and $s \in V \setminus T$ [35, 88, 21]. As a special case, $F$ is submodular if $\alpha = 1$, and $F$ is modular if $F(s \mid S) = F(s \mid T)$. Here, $F(\cdot)$ is called normalized if $F(\emptyset) = 0$. Similarly, a real function $f : \mathbb{R} \to \mathbb{R}$ is normalized if $f(0) = 0$. Unless otherwise stated, we only consider normalized set functions in this paper. We quote a key result often used for neural modeling of submodular functions [27, 76, 52].
Proposition 1. Given a set function $F : 2^V \to \mathbb{R}_+$ and a real-valued function $\varphi : \mathbb{R} \to \mathbb{R}$, (i) the set function $\varphi(F(\cdot))$ is monotone submodular if $F$ is monotone submodular and $\varphi$ is an increasing concave function; and, (ii) $\varphi(F(\cdot))$ is non-monotone submodular if $F$ is positive modular and $\varphi$ is a non-monotone concave function.
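As a quick illustration of these definitions and of Proposition 1(i), the following toy snippet (with a hand-picked modular function) verifies by brute force that the square root of a positive modular function satisfies the diminishing-returns condition on a small ground set.

```python
import itertools, math

V = [0, 1, 2, 3]                                # small ground set
m = {0: 0.5, 1: 1.0, 2: 2.0, 3: 0.25}           # positive modular weights

def F(S):                                        # concave (sqrt) composed with modular
    return math.sqrt(sum(m[s] for s in S))

def marginal(s, S):                              # F(s | S) = F(S ∪ {s}) - F(S)
    return F(set(S) | {s}) - F(S)

def subsets(X):
    X = sorted(X)
    return [set(c) for r in range(len(X) + 1) for c in itertools.combinations(X, r)]

# Diminishing returns: F(s | S) >= F(s | T) whenever S ⊆ T and s ∉ T.
ok = all(marginal(s, S) >= marginal(s, T) - 1e-12
         for T in subsets(V)
         for S in subsets(T)
         for s in set(V) - T)
print("submodular:", ok)                         # expected: True
```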
3.2 Monotone submodular functions
Overview.
Our model for monotone submodular functions consists of a neural network which cascades the underlying functions in a recursive manner for $N$ steps. Specifically, to compute the submodular function $F^{(n)}(\cdot)$ at step $n$, it first linearly combines the submodular function $F^{(n-1)}(\cdot)$ computed in the previous step with a trainable positive modular function $m^{(n)}(\cdot)$, and then applies a monotone concave activation function $\varphi$ to the result.
Recursive model. We model the submodular function $F_\theta(\cdot)$ as follows:
$F^{(0)}(S) = m^{(0)}_\theta(S); \quad F^{(n)}(S) = \varphi_\theta\big(\lambda F^{(n-1)}(S) + (1-\lambda)\, m^{(n)}_\theta(S)\big); \quad F_\theta(S) = F^{(N)}(S)$   (1)
where the iterations are indexed by $1 \le n \le N$, and $\{m^{(n)}_\theta(\cdot)\}$ is a sequence of positive modular functions, driven by a neural network with parameters $\theta$. $\lambda \in [0, 1]$ is a tunable or trained parameter. We apply a linear layer with positive weights to each feature vector $z_s$ to compute the value of $m^{(n)}_\theta(s)$, and then compute $m^{(n)}_\theta(S) = \sum_{s \in S} m^{(n)}_\theta(s)$. Moreover, $\varphi_\theta$ is an increasing concave function which, as we shall see later, is modeled using neural networks. Under these conditions, one can use Proposition 1(i) to easily show that $F_\theta(S)$ is a monotone submodular function (Appendix B).
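For concreteness, a minimal sketch of the recursion in Eq. (1) is given below; the class and argument names are illustrative, and the modular and concave components are treated as black-box callables whose concrete parameterizations appear in Section 3.5.

```python
import torch
import torch.nn as nn

class RecursiveSubmodular(nn.Module):
    """Sketch of Eq. (1): F^(0)=m^(0)(S); F^(n)=phi(lam*F^(n-1) + (1-lam)*m^(n)(S))."""

    def __init__(self, modular_fns, concave_fn, lam: float = 0.5):
        super().__init__()
        # modular_fns[0..N]: each maps the (|S|, d) feature matrix of S to a positive scalar.
        self.modular_fns = nn.ModuleList(modular_fns)
        self.concave_fn = concave_fn      # shared increasing concave function varphi_theta
        self.lam = lam                    # lambda in [0, 1]; could also be a trained parameter

    def forward(self, Z: torch.Tensor) -> torch.Tensor:
        # Z: (|S|, d) feature vectors of the elements of S.
        F = self.modular_fns[0](Z)        # F^(0)(S) = m^(0)_theta(S)
        for m_n in self.modular_fns[1:]:  # N recursion steps
            F = self.concave_fn(self.lam * F + (1.0 - self.lam) * m_n(Z))
        return F                          # F_theta(S) = F^(N)(S)
```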
3.3 Monotone $\alpha$-submodular functions
Our characterization of submodular functions in Eq. (1) is based on Proposition 1(i), which implies that a concave-composed submodular function is submodular. However, to the best of our knowledge, a similar characterization of $\alpha$-submodular functions is lacking in the literature. To address this gap, we first introduce a novel characterization of monotone $\alpha$-submodular functions and then use it to design a recursive model for such functions.
Novel characterization of $\alpha$-submodular functions.
In the following, we show how to characterize an $\alpha$-submodular function using a differential inequality (proven in Appendix B).
Theorem 2. Given a function $\phi : \mathbb{R} \to \mathbb{R}_+$ and a modular function $m : V \to [0, 1]$, the set function $F(S) = \phi\big(\sum_{s \in S} m(s)\big)$ is monotone $\alpha$-submodular for $|S| \le k$, if $\phi(x)$ is increasing in $x$ and $\frac{d^2\phi(x)}{dx^2} \le \frac{1}{k} \log\frac{1}{\alpha} \cdot \frac{d\phi(x)}{dx}$.
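As a quick illustrative sanity check (this example is for intuition only), take $\alpha \in (0, 1)$ and set $c = \frac{1}{k}\log\frac{1}{\alpha} > 0$. The choice $\phi(x) = e^{cx} - 1$ on $x \ge 0$ satisfies $\phi(0) = 0$ and $\frac{d\phi(x)}{dx} = c\, e^{cx} > 0$, so $\phi$ is increasing and normalized; moreover, $\frac{d^2\phi(x)}{dx^2} = c^2 e^{cx} = \frac{1}{k}\log\frac{1}{\alpha} \cdot \frac{d\phi(x)}{dx}$, so the differential inequality holds with equality. Hence $F(S) = \exp\big(\frac{1}{k}\log\frac{1}{\alpha}\sum_{s \in S} m(s)\big) - 1$ is monotone $\alpha$-submodular on sets of size at most $k$.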
The above theorem also implies that for $\alpha = 1$, $F(S) = \phi\big(\sum_{s \in S} m(s)\big)$ is monotone submodular if $\phi(x)$ is concave in $x$, which reduces to a particular case of Proposition 1(i). Once we design $F(S)$ by applying $\phi$ to a modular function, our next goal is to design a more expressive and flexible model in a recursive manner similar to Eq. (1). To this end, we extend Proposition 1 to the case of $\alpha$-submodular functions.
Proposition 3. Given a monotone $\alpha$-submodular function $F(\cdot)$, $\varphi(F(S))$ is monotone $\alpha$-submodular if $\varphi(\cdot)$ is an increasing concave function. Here, $\alpha$ remains the same for $F$ and $\varphi(F(\cdot))$.
Note that when $F(S)$ is submodular, i.e., $\alpha = 1$, the above result reduces to Proposition 1 in the context of concave-composed modular functions.
Recursive model.
Similar to Eq. (1) for submodular functions, our model for $\alpha$-submodular functions is also driven by a recursive model which maintains an $\alpha$-submodular function and updates its value recursively for $N$ iterations. However, in contrast to FLEXSUBNET, where the underlying submodular function is initialized with a positive modular function, we initialize the corresponding $\alpha$-submodular function $F^{(0)}(S)$ with $\phi_\theta(m^{(0)}_\theta(S))$, where $\phi_\theta$ is a trainable function satisfying the conditions of Theorem 2. Then we recursively apply a trainable monotone concave function to $F^{(n-1)}(S)$ for $n \in [N]$ to build a flexible model for $\alpha$-submodular functions. Formally, we have:
$F^{(0)}(S) = \phi_\theta\big(m^{(0)}_\theta(S)\big); \quad F^{(n)}(S) = \varphi_\theta\big(\lambda F^{(n-1)}(S) + (1-\lambda)\, m^{(n)}_\theta(S)\big); \quad F_\theta(S) = F^{(N)}(S)$   (2)
with $\lambda \in [0, 1]$. Here, $m_\theta$, $\phi_\theta$ and $\varphi_\theta$ are realized using neural networks parameterized by $\theta$. Then, using Proposition 3 and Theorem 2, we can show that $F_\theta(S)$ is $\alpha$-submodular in $S$ (proven in Proposition 6 in Appendix B).
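A minimal sketch of how Eq. (2) modifies the recursion of Eq. (1) is given below (names are illustrative): only the initialization changes, replacing the modular start with $\phi_\theta(m^{(0)}_\theta(S))$, where `phi_theta` stands for any trainable map satisfying Theorem 2's differential inequality.

```python
import torch
import torch.nn as nn

class RecursiveAlphaSubmodular(nn.Module):
    """Sketch of Eq. (2): F^(0)=phi(m^(0)(S)); F^(n)=varphi(lam*F^(n-1) + (1-lam)*m^(n)(S))."""

    def __init__(self, modular_fns, concave_fn, phi_theta, lam: float = 0.5):
        super().__init__()
        self.modular_fns = nn.ModuleList(modular_fns)
        self.concave_fn = concave_fn   # increasing concave varphi_theta, as in Eq. (1)
        self.phi_theta = phi_theta     # map obeying Theorem 2's inequality (initialization only)
        self.lam = lam

    def forward(self, Z: torch.Tensor) -> torch.Tensor:
        F = self.phi_theta(self.modular_fns[0](Z))   # F^(0)(S) = phi_theta(m^(0)_theta(S))
        for m_n in self.modular_fns[1:]:             # same recursion as Eq. (1)
            F = self.concave_fn(self.lam * F + (1.0 - self.lam) * m_n(Z))
        return F
```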
3.4 Non-monotone submodular functions
In contrast to monotone set functions, non-monotone submodular functions can be built by applying a concave function on top of only a modular function, rather than a submodular function (Proposition 1(i) vs. (ii)). To this end, we model a non-monotone submodular function $F_\theta(\cdot)$ as follows:
$F_\theta(S) = \psi_\theta(m_\theta(S))$   (3)
where $m_\theta(\cdot)$ is a positive modular function but $\psi_\theta(\cdot)$ can be a non-monotone concave function. Both $m_\theta$ and $\psi_\theta$ are trainable functions realized using neural networks parameterized by $\theta$. One can use Proposition 1(ii) to show that $F_\theta(S)$ is a non-monotone submodular function.
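To illustrate Eq. (3), the toy example below composes a hand-picked non-monotone concave $\psi$ (a downward parabola) with a positive modular function; in FLEXSUBNET, $\psi_\theta$ is instead learned.

```python
m = {0: 0.3, 1: 0.6, 2: 0.9}               # positive modular weights (toy example)

def psi(x):                                 # non-monotone concave: rises, then falls
    return x * (1.5 - x)

def F(S):                                   # Eq. (3): F(S) = psi(m(S))
    return psi(sum(m[s] for s in S))

# Concavity of psi makes F submodular (Proposition 1(ii)); the values below
# rise and then drop, showing that F is non-monotone.
print(F({0}), F({0, 1}), F({0, 1, 2}))      # 0.36, 0.54, -0.54
```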
3.5 Neural parameterization of $m_\theta$, $\varphi_\theta$, $\psi_\theta$, $\phi_\theta$
We complete the neural parameterization introduced in Sections 3.2–3.4. Each model has two types of component functions: (i) the modular set function $m_\theta$; and (ii) the concave functions $\varphi_\theta$ (Eqs. (1) and (2)) and $\psi_\theta$ (Eq. (3)), and the non-concave function $\phi_\theta$ (Eq. (2)). While modeling $m_\theta$ is simple and straightforward, designing neural models for the other components, i.e., $\varphi_\theta$, $\psi_\theta$ and $\phi_\theta$, is non-trivial. As mentioned before, because we cannot rely on the structural complexity of our 'network' (which is a simple linear recursion) or a judiciously pre-selected library of concave functions [7], we need to invest more capacity in the concave function, effectively making it universal.
Monotone submodular function.
Our model for monotone submodular functions described in Eq. (1) consists of two neural components: the sequence of modular functions $\{m^{(n)}_\theta\}$ and $\varphi_\theta$.
— Parameterization of $m_\theta$. We model the modular function $m^{(n)}_\theta : 2^V \to \mathbb{R}_+$ in Eq. (1) as $m^{(n)}_\theta(S) = \sum_{s \in S} \theta \cdot z_s$, where $z_s$ is the feature vector of each element $s \in S$ and both $\theta$ and $z_s$ are non-negative.
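One possible realization of this positive modular function as a neural module is sketched below; enforcing non-negative weights via a softplus reparameterization is an illustrative choice, since the construction only requires $\theta \ge 0$.

```python
import torch
import torch.nn as nn
from torch.nn.functional import softplus

class PositiveModular(nn.Module):
    """m_theta(S) = sum_{s in S} theta . z_s, with theta kept non-negative."""

    def __init__(self, dim: int):
        super().__init__()
        self.raw_theta = nn.Parameter(torch.zeros(dim))   # unconstrained parameters

    def forward(self, Z: torch.Tensor) -> torch.Tensor:
        # Z: (|S|, dim) matrix of non-negative feature vectors z_s for s in S.
        theta = softplus(self.raw_theta)                   # reparameterization keeps theta >= 0
        return (Z @ theta).sum()                           # sum of per-element scores over S
```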
— Parameterization of $\varphi_\theta$. Recall that $\varphi_\theta$ is an increasing concave function. We model it using the fact that a differentiable function is concave if its second derivative is non-positive. We focus on capturing the second derivative of the underlying function with a neural network of arbitrary capacity that provides a non-positive output. Hence, we use a positive neural network $h_\theta$ to model the second derivative:
$\frac{d^2\varphi_\theta(x)}{dx^2} = -h_\theta(x) \le 0.$   (4)
Now, since $\varphi_\theta$ is increasing, we have:
$\frac{d\varphi_\theta(x)}{dx} = \int_{b=x}^{b=\infty} h_\theta(b)\, db \ge 0 \;\Longrightarrow\; \varphi_\theta(x) = \int_{a=0}^{a=x} \int_{b=a}^{b=\infty} h_\theta(b)\, db\, da.$   (5)
Here, $\varphi_\theta(\cdot)$ is normalized, i.e., $\varphi_\theta(0) = 0$, which ensures that $\{F^{(n)}_\theta\}$ in Eq. (1) are also normalized. An offset in Eq. (5) allows a nonzero initial value of $\varphi_\theta(\cdot)$, if required. Note that monotonicity and concavity of $\varphi_\theta$ can be achieved by restricting $h_\theta$ to be positive. Such a constraint can be ensured by setting $h_\theta(\cdot) = \mathrm{ReLU}(\Lambda^{(h)}_\theta(\cdot))$, where $\Lambda^{(h)}_\theta(\cdot)$ is any complex neural network. Hence, $\varphi_\theta(\cdot)$ represents a class of universal approximators of normalized increasing concave functions, if $\Lambda^{(h)}_\theta(\cdot)$ is a universal approximator of continuous functions [36].
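A minimal numerical sketch of this double quadrature is given below; the fixed trapezoidal grid and the finite upper limit B are illustrative stand-ins for the differentiable quadrature used in practice, and inputs are assumed to lie in [0, B].

```python
import torch
import torch.nn as nn

class ConcaveByQuadrature(nn.Module):
    """varphi(x) = int_0^x int_a^B h(b) db da, with h >= 0 (B truncates the infinite limit)."""

    def __init__(self, hidden: int = 32, B: float = 10.0, n_grid: int = 128):
        super().__init__()
        self.h_net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.B, self.n_grid = B, n_grid

    def h(self, b: torch.Tensor) -> torch.Tensor:
        # h_theta = ReLU(Lambda_theta): a non-negative integrand of arbitrary capacity.
        return torch.relu(self.h_net(b.unsqueeze(-1))).squeeze(-1)

    def dphi(self, a: torch.Tensor) -> torch.Tensor:
        # Inner integral: varphi'(a) = int_a^B h(b) db, by the trapezoidal rule.
        t = torch.linspace(0.0, 1.0, self.n_grid, device=a.device)
        b = a.unsqueeze(-1) + (self.B - a).unsqueeze(-1) * t
        return torch.trapezoid(self.h(b), b, dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Outer integral: varphi(x) = int_0^x varphi'(a) da, so varphi(0) = 0.
        t = torch.linspace(0.0, 1.0, self.n_grid, device=x.device)
        a = x.unsqueeze(-1) * t
        vals = self.dphi(a.reshape(-1)).reshape(a.shape)
        return torch.trapezoid(vals, a, dim=-1)
```

Composed with the positive modular module above and used as the shared concave function in the recursion sketches, this gives one possible end-to-end instantiation of Eq. (1).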
For a monotone submodular function of the form of a concave-composed modular function, we have the following result (proven in Appendix B).
Proposition 4. Given a universal set $V$, a constant $\epsilon > 0$ and a submodular function $F(S) = \varphi\big(\sum_{s \in S} m(z_s)\big)$ where $z_s \in \mathbb{R}^d$, $S \subseteq V$, and $0 \le m(z) < \infty$ for all $z \in \mathbb{R}^d$. Then there exist two fully connected neural networks $m_{\theta_1}$ and $h_{\theta_2}$