
universal monotone function modeling. Gupta et al. [33] extended this to multidimensional shape constraints for supervised learning tasks, for situations where features complement or dominate others, or where a learnt function y = f(x) should be unimodal. Such constraints can be expressed as linear inequalities, and can therefore be enforced using projected stochastic gradient descent. Gupta et al. [32] widened the scope further to situations where more general constraints must be enforced on gradients. In the context of density estimation and variational inference, a popular technique is to transform between simple and complex distributions via invertible and differentiable mappings using normalizing flows [48], where coupling functions can be implemented as monotone networks. Our work provides a bridge between shape constraints, universal concavity and differentiable subset selection.
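To make the shape-constrained building blocks concrete, the following is a minimal sketch, in PyTorch, of a monotone network: constraining weights to be non-negative (via softplus) and using increasing activations guarantees that the output is non-decreasing in every input. This is an illustrative construction of our own, not the architecture of any cited work; the layer sizes and activation choices are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneMLP(nn.Module):
    """Minimal monotone network sketch: non-negative weights and increasing
    activations make the output non-decreasing in every input coordinate."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.raw_w1 = nn.Parameter(torch.randn(hidden_dim, in_dim) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(hidden_dim))
        self.raw_w2 = nn.Parameter(torch.randn(1, hidden_dim) * 0.1)
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        h = torch.tanh(F.linear(x, F.softplus(self.raw_w1), self.b1))  # increasing activation
        return F.linear(h, F.softplus(self.raw_w2), self.b2)           # non-negative output weights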
Deep submodular functions (DSF).
Early work predominantly modeled a trainable submodular function as a mixture of fixed submodular functions [53, 77]. If training instances do not fit this ‘basis’ of hand-picked submodular functions, expressiveness is limited. In the quest for ‘universal’ submodular function networks, Bilmes and Bai [7] and Bai et al. [3] undertook a thorough theoretical inquiry into the effect of network structure on expressiveness. Specifically, they modeled submodular functions as an aggregate of concave functions of modular functions, computed in a topological order along a directed acyclic graph (DAG), driven by the fact that a concave function of a monotone submodular function is itself submodular [26, 27, 76]. However, DSF provides no practical prescription for picking the concave functions; each application of DSF requires an extensive search over this design space.
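As a concrete illustration of this construction, the sketch below, in PyTorch, applies a fixed concave, increasing function (here sqrt, an illustrative assumption) to a learned non-negative modular (weighted-sum) function of the selected elements; a concave increasing function of a non-negative modular function is monotone submodular. This is a minimal single-unit sketch, not the full DAG-structured DSF of [7, 3].

import torch
import torch.nn as nn
import torch.nn.functional as F

class DSFUnit(nn.Module):
    """Single DSF-style unit: concave function of a non-negative modular function."""
    def __init__(self, num_features):
        super().__init__()
        # Unconstrained parameters; softplus keeps the modular weights non-negative,
        # which makes the aggregate monotone in the selected subset.
        self.raw_w = nn.Parameter(torch.randn(num_features))

    def forward(self, X, mask):
        # X: (n, num_features) non-negative element features; mask: (n,) 0/1 subset indicator.
        w = F.softplus(self.raw_w)
        modular = ((mask.unsqueeze(1) * X) @ w).sum()   # m(S) = sum of per-element scores over S
        return torch.sqrt(modular + 1e-8)               # concave, increasing => monotone submodular

# Usage: f = DSFUnit(num_features=4); value = f(X, mask) for non-negative features X.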
Subset selection.
Subset selection, especially under a submodular or approximately submodular profit, enjoys an efficient greedy maximization routine which admits an approximation guarantee (see the sketch after this paragraph). Consequently, a wide variety of set function optimization tasks focus on representing the underlying objective as an instance of a submodular function. Broadly, subset selection has myriad applications in machine learning, e.g., data summarization [4], feature selection [44], influence maximization in social networks [43, 13, 14, 87], opinion dynamics [11, 12, 14, 89, 49], efficient learning [20, 45], human-assisted learning [16, 15, 60], etc. However, these works do not aim to learn the underlying submodular function from training subsets.
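The greedy routine referenced above can be sketched as follows; it assumes a monotone submodular set function f and a cardinality budget k (both names are illustrative), and repeatedly adds the element with the largest marginal gain, which yields the classic (1 - 1/e) approximation guarantee for monotone submodular objectives.

def greedy_maximize(f, ground_set, k):
    """Greedy maximization of a set function under a cardinality constraint.

    f: callable mapping a frozenset to a real value (assumed monotone submodular).
    Returns a set of at most k elements; a (1 - 1/e)-approximation for such f.
    """
    selected = frozenset()
    for _ in range(k):
        # Pick the element with the largest marginal gain f(S U {e}) - f(S).
        best_elem, best_gain = None, float("-inf")
        for e in ground_set - selected:
            gain = f(selected | {e}) - f(selected)
            if gain > best_gain:
                best_elem, best_gain = e, gain
        if best_elem is None:
            break
        selected = selected | {best_elem}
    return selected

# Usage: greedy_maximize(f, set(range(n)), k) for any monotone submodular f on {0, ..., n-1}.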
Set prediction.
Our work is also related to set prediction. Zhang et al. [90] use an encoder-decoder architecture for set prediction. Rezatofighi et al. [67] provide a deep probabilistic model for set prediction. However, they aim to predict an output set rather than the set function.
Differentiable subset selection.
Existing trainable subset selection methods [77, 50] often adopt a max-margin optimization approach. However, this requires solving one submodular optimization problem at each epoch, which renders it computationally expensive. On the other hand, Tschiatschek et al. [79] provide a probabilistic soft-greedy model which can generate, and be trained on, a permutation of subset elements. But then the trained model becomes sensitive to this specific permutation. Tschiatschek et al. [79] overcome this challenge by presenting several permutations to the learner, which can be inefficient.
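As a rough illustration of the probabilistic-greedy idea, the sketch below samples a permutation of elements one step at a time, drawing the next element with probability proportional to the exponentiated marginal gain and accumulating the log-likelihood for training. This is a generic construction under our own assumptions (temperature parameter, gain parameterization), not the exact model of Tschiatschek et al. [79].

import torch

def soft_greedy_sample(marginal_gain, ground_set, k, temperature=1.0):
    """Sample up to k elements sequentially, each drawn with probability proportional
    to exp(marginal gain / temperature); returns the selection and its log-likelihood."""
    selected, log_prob = [], torch.tensor(0.0)
    remaining = list(ground_set)
    for _ in range(min(k, len(remaining))):
        gains = torch.stack([marginal_gain(selected, e) for e in remaining])
        probs = torch.softmax(gains / temperature, dim=0)
        idx = torch.multinomial(probs, 1).item()
        log_prob = log_prob + torch.log(probs[idx])   # accumulate for likelihood-based training
        selected.append(remaining.pop(idx))
    return selected, log_prob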
One-sided smoothness.
We would like to highlight that our characterization of α-submodular functions is a special case of one-sided smoothness (OSS) proposed in [29, 28]. However, the significance of these characterizations differs between their work and ours. First, they consider γ-meta submodular functions, which are a different generalisation of submodular functions than α-submodular functions. Second, the OSS characterization they provide is for the multilinear extension of γ-meta submodular functions, whereas we characterize α-submodular functions themselves, which allows direct construction of our neural models.
Sample complexity in the context of learning submodular functions.
Goemans et al. [31] provided an algorithm which outputs a function f̂(S) that approximates an arbitrary monotone submodular function f(S) within a factor O(√n log n) using poly(n) queries on f. Their algorithm considers a powerful active probe setting where f can be queried with arbitrary sets. In contrast, Balcan and Harvey [6] consider a more realistic passive setup used in a supervised learning scenario, and design an algorithm which obtains an approximation of f(S) within a factor of O(√n).
3 Design of FLEXSUBNET
In this section, we first present our notation and then propose a family of flexible neural network models for monotone and non-monotone submodular functions, as well as α-submodular functions.