
universal monotone function modeling. Gupta et al. [33] extended this to multidimensional shape constraints for supervised learning tasks, for situations where features complement or dominate others, or where a learnt function y = f(x) should be unimodal. Such constraints can be expressed as linear inequalities, and can therefore be enforced using projected stochastic gradient descent. Gupta et al. [32] widened the scope further to situations where more general constraints must be enforced on gradients. In the context of density estimation and variational inference, a popular technique is to transform between simple and complex distributions via invertible and differentiable mappings using normalizing flows [48], where coupling functions can be implemented as monotone networks. Our work provides a bridge between shape constraints, universal concavity and differentiable subset selection.
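To make the shape-constrained building blocks concrete, the following is a minimal sketch, in PyTorch, of a monotone network: constraining weights to be non-negative (via softplus) and using increasing activations guarantees that the output is non-decreasing in every input. This is an illustrative construction of our own, not the architecture of any cited work; the layer sizes and activation choices are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneMLP(nn.Module):
    """Minimal monotone network sketch: non-negative weights and increasing
    activations make the output non-decreasing in every input coordinate."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.raw_w1 = nn.Parameter(torch.randn(hidden_dim, in_dim) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(hidden_dim))
        self.raw_w2 = nn.Parameter(torch.randn(1, hidden_dim) * 0.1)
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        h = torch.tanh(F.linear(x, F.softplus(self.raw_w1), self.b1))  # increasing activation
        return F.linear(h, F.softplus(self.raw_w2), self.b2)           # non-negative output weights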
Deep submodular functions (DSF).
Early work predominantly modeled a trainable submodular function as a mixture of fixed submodular functions [53, 77]. If training instances do not fit this ‘basis’ of hand-picked submodular functions, expressiveness is limited. In the quest for ‘universal’ submodular function networks, Bilmes and Bai [7] and Bai et al. [3] undertook a thorough theoretical inquiry into the effect of network structure on expressiveness. Specifically, they modeled submodular functions as an aggregate of concave functions of modular functions, computed in a topological order along a directed acyclic graph (DAG), driven by the fact that a concave function of a monotone submodular function is itself submodular [26, 27, 76]. However, DSF provides no practical prescription for picking the concave functions; each application of DSF requires an extensive search over this design space.
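As a concrete illustration of this construction, the sketch below, in PyTorch, applies a fixed concave, increasing function (here sqrt, an illustrative assumption) to a learned non-negative modular (weighted-sum) function of the selected elements; a concave increasing function of a non-negative modular function is monotone submodular. This is a minimal single-unit sketch, not the full DAG-structured DSF of [7, 3].

import torch
import torch.nn as nn
import torch.nn.functional as F

class DSFUnit(nn.Module):
    """Single DSF-style unit: concave function of a non-negative modular function."""
    def __init__(self, num_features):
        super().__init__()
        # Unconstrained parameters; softplus keeps the modular weights non-negative,
        # which makes the aggregate monotone in the selected subset.
        self.raw_w = nn.Parameter(torch.randn(num_features))

    def forward(self, X, mask):
        # X: (n, num_features) non-negative element features; mask: (n,) 0/1 subset indicator.
        w = F.softplus(self.raw_w)
        modular = ((mask.unsqueeze(1) * X) @ w).sum()   # m(S) = sum of per-element scores over S
        return torch.sqrt(modular + 1e-8)               # concave, increasing => monotone submodular

# Usage: f = DSFUnit(num_features=4); value = f(X, mask) for non-negative features X.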
Subset selection.
Subset selection, especially under a submodular or approximately submodular profit, enjoys an efficient greedy maximization routine which admits an approximation guarantee (see the sketch after this paragraph). Consequently, a wide variety of set function optimization tasks focus on representing the underlying objective as an instance of a submodular function. Broadly, subset selection has myriad applications in machine learning, e.g., data summarization [4], feature selection [44], influence maximization in social networks [43, 13, 14, 87], opinion dynamics [11, 12, 14, 89, 49], efficient learning [20, 45], human-assisted learning [16, 15, 60], etc. However, these works do not aim to learn the underlying submodular function from training subsets.
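The greedy routine referenced above can be sketched as follows; it assumes a monotone submodular set function f and a cardinality budget k (both names are illustrative), and repeatedly adds the element with the largest marginal gain, which yields the classic (1 - 1/e) approximation guarantee for monotone submodular objectives.

def greedy_maximize(f, ground_set, k):
    """Greedy maximization of a set function under a cardinality constraint.

    f: callable mapping a frozenset to a real value (assumed monotone submodular).
    Returns a set of at most k elements; a (1 - 1/e)-approximation for such f.
    """
    selected = frozenset()
    for _ in range(k):
        # Pick the element with the largest marginal gain f(S U {e}) - f(S).
        best_elem, best_gain = None, float("-inf")
        for e in ground_set - selected:
            gain = f(selected | {e}) - f(selected)
            if gain > best_gain:
                best_elem, best_gain = e, gain
        if best_elem is None:
            break
        selected = selected | {best_elem}
    return selected

# Usage: greedy_maximize(f, set(range(n)), k) for any monotone submodular f on {0, ..., n-1}.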
Set prediction.
Our work is also related to set prediction. Zhang et al. [90] use an encoder-decoder architecture for set prediction. Rezatofighi et al. [67] provide a deep probabilistic model for set prediction. However, they aim to predict an output set rather than the set function.
Differentiable subset selection.
Existing trainable subset selection methods [77, 50] often adopt a max-margin optimization approach. However, this requires solving one submodular optimization problem at each epoch, which renders it computationally expensive. On the other hand, Tschiatschek et al. [79] provide a probabilistic soft-greedy model which can generate, and be trained on, a permutation of subset elements. But then the trained model becomes sensitive to this specific permutation. Tschiatschek et al. [79] overcome this challenge by presenting several permutations to the learner, which can be inefficient.
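As a rough illustration of the probabilistic-greedy idea, the sketch below samples a permutation of elements one step at a time, drawing the next element with probability proportional to the exponentiated marginal gain and accumulating the log-likelihood for training. This is a generic construction under our own assumptions (temperature parameter, gain parameterization), not the exact model of Tschiatschek et al. [79].

import torch

def soft_greedy_sample(marginal_gain, ground_set, k, temperature=1.0):
    """Sample up to k elements sequentially, each drawn with probability proportional
    to exp(marginal gain / temperature); returns the selection and its log-likelihood."""
    selected, log_prob = [], torch.tensor(0.0)
    remaining = list(ground_set)
    for _ in range(min(k, len(remaining))):
        gains = torch.stack([marginal_gain(selected, e) for e in remaining])
        probs = torch.softmax(gains / temperature, dim=0)
        idx = torch.multinomial(probs, 1).item()
        log_prob = log_prob + torch.log(probs[idx])   # accumulate for likelihood-based training
        selected.append(remaining.pop(idx))
    return selected, log_prob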
One-sided smoothness.
We would like to highlight that our characterization of α-submodular functions is a special case of one-sided smoothness (OSS) proposed in [29, 28]. However, the significance of these characterizations differs between their work and ours. First, they consider γ-meta submodular functions, which are a different generalisation of submodular functions than α-submodular functions. Second, the OSS characterization they provide is for the multilinear extension of γ-meta submodular functions, whereas we characterize α-submodular functions themselves, which allows direct construction of our neural models.
Sample complexity in the context of learning submodular functions.
Goemans et al. [31] provided an algorithm which outputs a function f̂(S) that approximates an arbitrary monotone submodular function f(S) within a factor O(√n log n) using poly(n) queries on f. Their algorithm considers a powerful active probe setting where f can be queried with arbitrary sets. In contrast, Balcan and Harvey [6] consider a more realistic passive setup used in a supervised learning scenario, and design an algorithm which obtains an approximation of f(S) within a factor of O(√n).
3 Design of FLEXSUBNET
In this section, we first present our notation and then propose a family of flexible neural network models for monotone and non-monotone submodular functions, as well as α-submodular functions.