Game Theoretic Rating in N-player general-sum games with Equilibria

Luke Marris¹ ² Marc Lanctot¹ Ian Gemp¹ Shayegan Omidshafiei³ Stephen McAleer⁴ Jerome Connor¹ Karl Tuyls¹ Thore Graepel²
Abstract
Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting. Traditionally, only transitive dependencies between strategies have been used to rate strategies (e.g. Elo); however, recent work has expanded ratings to utilize game theoretic solutions to better rate strategies in non-transitive games. This work generalizes these ideas and proposes novel algorithms suitable for N-player, general-sum rating of strategies in normal-form games according to the payoff rating system. This enables well-established solution concepts, such as equilibria, to be leveraged to efficiently rate strategies in games with complex strategic interactions, which arise in multiagent training and real-world interactions between many agents. We empirically validate our methods on real-world normal-form data (Premier League) and multiagent reinforcement learning agent evaluation.
1. Introduction
Traditionally, rating systems assume transitive dependen-
cies of strategies in a game (such as Elo (Elo, 1978) and
TrueSkill (Herbrich et al., 2007)). That is, there exists an
unambiguous ordering of all strategies according to their
relative strengths. This ignores all other interesting inter-
actions between strategies, including cycles where strategy
S beats P beats R beats S in the classic game of Rock, Pa-
per, Scissors (Table 2). Many interesting games have this
so-called “strategic” dimension (Czarnecki et al., 2020), or
“gamescapes” (Balduzzi et al., 2018), that cannot be captured
by pairwise transitivity constraints.
Game theoretic rating of strategies is an emerging area of
study which seeks to overcome some of these drawbacks.
*Equal contribution. ¹DeepMind ²University College London ³Google ⁴Carnegie Mellon University. Correspondence to: Luke Marris <marris@deepmind.com>.
These methods can be employed in normal-form games, or
in empirical games, constructed normal-form games where
strategies are policies competing in a multiagent interac-
tion (e.g. a simulation or a game) and the payoffs are ap-
proximate expected returns of the players employing these
policies (Wellman, 2006; Walsh et al., 2002; Tuyls et al.,
2020).
The Nash Average (NA) (Balduzzi et al., 2018) algorithm
proposed a way of rating strategies in two-player, zero-sum,
normal-form games. This approach is known as maximal
lottery (Kreweras, 1965; Fishburn, 1984) in social choice
theory, where it first arose, and is so fundamental it has
been rediscovered across many fields (Brandt, 2017). In
particular, NA proposed two applications of rating: agent-
vs-agent interactions and agent-vs-task interactions. NA
possesses a number of interesting properties: its ratings are
invariant to strategy duplication, and it captures interesting
non-transitive interactions between strategies. However, the
technique is difficult to apply outside of two-player, zero-sum domains due to computational tractability and equilibrium selection difficulties. More recent work, α-Rank (Omidshafiei et al., 2019), sought to remedy this by introducing a novel, computationally feasible solution concept based on the stationary distribution of a discrete-time evolutionary process. Its main advantages are its uniqueness and efficient computation in N-player and general-sum games.
This work expands game theoretic rating techniques to established equilibrium concepts: correlated equilibrium (CE) (Aumann, 1974) and coarse correlated equilibrium (CCE) (Moulin & Vial, 1978). In Section 2 we provide background
to rating algorithms, game theory, and equilibrium based
solution concepts. In particular, we describe CEs and CCEs
that are suitable in the N-player, general-sum setting. In
Section 3 we define a novel general rating definition: payoff
rating, which is equivalent to NA if the game is two-player
zero-sum. Payoff rating is the expected payoff under a joint
distribution, conditioned on taking a certain strategy. The
choice of joint distribution is what provides payoff ratings
with its interesting properties. In Section 4 we suggest joint
distributions to parameterize game theoretic rating algo-
rithms. In Section 5 we test these algorithms on instances of
N-player, general-sum games using real-world data. Finally,
Section 6 discusses the connections of this work to other areas of machine learning.
2. Preliminaries
First we cover background on game theory (Section 2.1),
rating and ranking (Section 2.2), equilibria (Section 2.3),
and game theoretic rating (Section 2.4).
2.1. Empirical Game Theory
A normal-form game is a single-step game where all players take an action simultaneously, and each receives a payoff. With $n$ players, we denote a joint strategy as the tuple $a = (a_1, \ldots, a_n) \in \mathcal{A}$, where $a_p \in \mathcal{A}_p$ and $\mathcal{A}_p$ is the strategy space available to player $p$. The payoff given to player $p$ from a joint strategy $a$ is $G_p(a)$. A player's objective is to maximize their payoff in the presence of other players, who are maximizing their own payoffs.
It is possible to construct normal-form game representations from observations of much larger systems. This process is known as empirical game theoretic analysis (EGTA) (Wellman, 2006; Walsh et al., 2002). The most common example in artificial intelligence is when studying a set of policies¹ in large extensive-form games (e.g., Chess or Go). Often the set of policies is too large to enumerate entirely, so we retain a subset of them and track their performance against one another, thereby constructing a normal-form game from the policies' expected returns within the environment (Lanctot et al., 2017). For example, a given season of the Premier League can be modeled as a normal-form game involving a set of 20 team policies, out of many possible football teams². EGTA has recently proved essential in scaling multiagent reinforcement learning (MARL) to human-level StarCraft (Vinyals et al., 2019).
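To make the construction concrete, the following is a minimal sketch (not from the paper) of assembling an empirical payoff matrix from hypothetical per-policy match results; the team names and point values are made up.

```python
import numpy as np

# Hypothetical match records between three policies, averaged over many
# simulated matches. Entry [i][j] is the average points policy i earned
# against policy j (3 = win, 1 = draw, 0 = loss), as in a football league.
policies = ["TeamA", "TeamB", "TeamC"]
avg_points = np.array([
    [0.0, 2.1, 1.4],
    [0.6, 0.0, 2.4],
    [1.2, 0.3, 0.0],
])

# The empirical normal-form game: a symmetric two-player game where each
# "player" picks one of the policies and receives its average points as payoff.
G1 = avg_points        # payoff to the row player
G2 = avg_points.T      # payoff to the column player
print(G1, G2, sep="\n")
```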
2.2. Rating and Ranking
Ranking is the problem of assigning a partial ordering to
a set. Rating is the more general problem of assigning a
scalar value to each element of a set, which then could be
used to describe a ranking. The simplest rating procedure
is to take the mean performance of a strategy against all
other strategies. Viewed through a game theoretic lens, this
is equivalent to assuming each opponent strategy is equally
likely to be encountered: the opponent player is playing a
uniform distribution. The key drawback of this approach
is that it is heavily influenced by the strategies available,
and that an opponent player, in practice, will favour their
strongest strategies. This argument is made thoroughly in
Balduzzi et al. (2018).
¹A mapping from information states to actions.
²The full permutation of all players and coaches in the world.
Perhaps the best known rating algorithm is Elo (Elo, 1978). It is used to rate players in many two-player, zero-sum games, most famously in Chess. It is also commonly used for evaluating artificial agents in two-player settings (Silver et al., 2016; Schrittwieser et al., 2019). The skill (Elo) of each competitor, $a_1$, is described using a single variable $r(a_1)$ which maps to a win probability against other competitors, $a_2$: $G_1(a_1, a_2) = \left(1 + 10^{-(r(a_1) - r(a_2))/400}\right)^{-1}$. This therefore defines a two-player, symmetric, constant-sum game where competitors are strategies, with payoff defined as $G_2 = 1 - G_1 = G_1^T$. It is only suitable for describing highly transitive games. Multi-dimensional Elo (Balduzzi et al., 2018) is a method that attempts to overcome the limitations of Elo by introducing additional variables for each competitor which describe cyclic structure in the game. This gives a more accurate approximation of the payoff; however, it does not provide a way of rating strategies on its own³. Decomposed Elo (Jaderberg et al., 2019) is a method that works for $m$ vs $m$ constant-sum, team games. It is capable of assigning a rating to each competitor in the team as well as a rating for the team. However, it is also only suitable for transitive games, where both team compositions and between-team interactions are transitive.
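For reference, the Elo model above can be evaluated directly; the sketch below uses made-up ratings and only illustrates the win-probability formula and the induced payoff matrix.

```python
import numpy as np

def elo_win_prob(r1, r2):
    """Probability that a competitor rated r1 beats one rated r2 under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-(r1 - r2) / 400.0))

# Hypothetical ratings for three competitors.
ratings = np.array([1400.0, 1600.0, 1800.0])

# Payoff matrix G1 of the induced two-player, symmetric, constant-sum game.
G1 = elo_win_prob(ratings[:, None], ratings[None, :])
G2 = 1.0 - G1          # equals G1.T under the Elo model
print(np.round(G1, 3))
```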
2.3. Equilibria
The joint strategy distribution over the set of all joint strategies is denoted $\sigma(a) = \sigma(a_1, \ldots, a_n)$, where $\sum_{a \in \mathcal{A}} \sigma(a) = 1$ and $\sigma(a) \geq 0 \;\; \forall a \in \mathcal{A}$. Furthermore we define the marginal strategy distribution as $\sigma(a_p) = \sum_{a_{-p} \in \mathcal{A}_{-p}} \sigma(a_p, a_{-p})$, where $-p = \{1, \ldots, p-1, p+1, \ldots, n\}$ denotes all players other than $p$. Sometimes it is possible to factorize the joint strategy distribution into its marginals, $\sigma(a) = \prod_p \sigma(a_p)$. Finally, the conditional distribution $\sigma(a_{-p}|a_p) = \frac{\sigma(a)}{\sigma(a_p)}$ is defined if $\sigma(a_p) > 0$. Sometimes we may denote the joint in terms of one player's strategies versus all other players' strategies: $\sigma(a) \equiv \sigma(a_1, \ldots, a_n) \equiv \sigma(a_p, a_{-p})$.
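These quantities are simple to compute numerically; a minimal sketch with an arbitrary (randomly generated) three-player joint distribution follows.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up joint distribution over 3 players with 2, 3 and 2 strategies each.
sigma = rng.random((2, 3, 2))
sigma /= sigma.sum()                       # sum_a sigma(a) = 1, sigma(a) >= 0

p = 1                                      # player index (0-based)
other_axes = tuple(i for i in range(sigma.ndim) if i != p)

# Marginal sigma(a_p): sum the joint over all other players' strategies.
marginal_p = sigma.sum(axis=other_axes)

# Conditional sigma(a_{-p} | a_p) = sigma(a) / sigma(a_p), defined when sigma(a_p) > 0.
shape = [1] * sigma.ndim
shape[p] = sigma.shape[p]
conditional = sigma / marginal_p.reshape(shape)

print(marginal_p, conditional.sum(axis=other_axes))  # the latter is all ones
```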
A popular class of solution concepts is equilibrium based: joint distributions, $\sigma(a)$, where, under certain definitions,
no player has incentive to deviate. The most well known
is Nash equilibrium (Nash, 1951) (NE), which is tractable,
interchangeable and unexploitable in two-player, zero-sum
games (Shoham & Leyton-Brown, 2009). NEs are always
factorizable joint distributions. A related solution concept is
correlated equilibrium (Aumann, 1974) (CE) which is more
suitable for N-player, general-sum settings where players
are allowed to coordinate strategies with each other if it is
mutually beneficial. Furthermore, CEs are more compatible
with the Bayesian perspective, and arise as a result of learn-
ing rules (Foster & Vohra, 1997; Cesa-Bianchi & Lugosi,
2006). The mechanism of implementing a CE is via a corre-
³It assigns vectors, not scalars, to strategies.
lation device which samples a joint strategy from a known
public distribution and recommends the sampled strategy
secretly to each player. A distribution is in correlated equi-
librium if no player is incentivised to unilaterally deviate
from the recommendation after receiving it. CEs that are fac-
torizable are also NEs. An additional solution concept, the
coarse correlated equilibrium (Moulin & Vial, 1978) (CCE),
requires players to commit to the recommendation before
it has been sampled. It is less computationally expensive
and permits even higher equilibrium payoffs. These sets are related to each other: NE ⊆ CE ⊆ CCE. The empirical average joint policy of no-regret learning algorithms in self-play is known to converge to the set of CCEs (Foster & Vohra, 1997; Hart & Mas-Colell, 2000).
All these equilibria have approximate forms which are parameterized by the approximation parameter $\epsilon$, which describes the maximum allowed incentive to deviate to a best response (across all players). There are two common methods of defining an approximate equilibrium: the standard approximate equilibrium (Shoham & Leyton-Brown, 2009) describes the bound on the incentive to deviate under the joint, and the well-supported (WS) approximate equilibrium (Goldberg & Papadimitriou, 2006) describes the bound on the incentive to deviate under the conditionals. When $\epsilon = 0$, these definitions become equivalent. The standard method has the property that any $\epsilon > \epsilon_{\min}$ will permit a full-support equilibrium (Section A), where $\epsilon_{\min} \leq 0$ is the minimum $\epsilon$ that permits a feasible solution in a game. Each player may have an individual tolerance to deviation, $\epsilon_p$. For the rest of this work we only consider the standard definition; similar derivations can be adopted for the well-supported definition.
CEs can be defined in terms of linear inequality constraints, defined in terms of the deviation gain of a player: $A^{\text{CE}}_p(a'_p, a_p, a_{-p}) = G_p(a'_p, a_{-p}) - G_p(a_p, a_{-p})$, for all $a'_p \neq a_p \in \mathcal{A}_p$, $a_{-p} \in \mathcal{A}_{-p}$, $\forall p$, where each constraint corresponds to a pair of strategies: $a_p$ deviating to $a'_p$.

$\epsilon$-CE: $\quad \sum_{a_{-p} \in \mathcal{A}_{-p}} \sigma(a_p, a_{-p}) \, A^{\text{CE}}_p(a'_p, a_p, a_{-p}) \leq \epsilon_p \quad (1)$

CCEs can be derived from the CE definition by summing the constraints over the strategies available to a player, $\sum_{a_p \in \mathcal{A}_p}(\cdot)$; therefore there is only a constraint for each possible deviation $a'_p \in \mathcal{A}_p$, $\forall p$, with a deviation gain of $A^{\text{CCE}}_p(a'_p, a) = G_p(a'_p, a_{-p}) - G_p(a)$.

$\epsilon$-CCE: $\quad \sum_{a \in \mathcal{A}} \sigma(a_p, a_{-p}) \, A^{\text{CCE}}_p(a'_p, a) \leq \epsilon_p \quad (2)$

NEs have similar definitions to CEs but have an extra constraint: the joint distribution factorizes, $\prod_p \sigma(a_p) = \sigma(a)$, resulting in nonlinear constraints⁴.

$\epsilon$-NE: $\quad \sum_{a_{-p} \in \mathcal{A}_{-p}} \prod_q \sigma(a_q) \, A^{\text{CE}}_p(a'_p, a_p, a_{-p}) \leq \epsilon_p \quad (3)$
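The CCE condition in Equation (2) is straightforward to check for a given joint distribution; the sketch below (with an illustrative Rock, Paper, Scissors example and a hypothetical helper name) evaluates the deviation gains for an N-player game stored as a list of payoff tensors.

```python
import numpy as np

def cce_deviation_gains(payoffs, sigma):
    """For each player p and deviation strategy a'_p, return
    sum_a sigma(a) * [G_p(a'_p, a_{-p}) - G_p(a)], the left side of Eq. (2)."""
    gains = []
    for p, G_p in enumerate(payoffs):
        expected = (sigma * G_p).sum()          # expected payoff under the joint
        sigma_minus_p = sigma.sum(axis=p)       # marginal over the other players
        player_gains = []
        for dev in range(G_p.shape[p]):
            dev_payoff = (sigma_minus_p * np.take(G_p, dev, axis=p)).sum()
            player_gains.append(dev_payoff - expected)
        gains.append(np.array(player_gains))
    return gains

# Rock, Paper, Scissors: the uniform joint is an exact CCE (all gains <= 0).
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
payoffs = [rps, -rps]                           # two-player, zero-sum
sigma = np.full((3, 3), 1.0 / 9.0)
epsilon = 0.0
print(all((g <= epsilon + 1e-9).all() for g in cce_deviation_gains(payoffs, sigma)))
```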
When a distribution is in equilibrium, no player has incentive
to unilaterally deviate from it to achieve a better payoff.
There can, however, be many equilibria in a game; choosing
amongst these is known as the equilibrium selection problem
(Harsanyi & Selten, 1988).
For NEs it has been suggested to use a maximum entropy cri-
terion (MENE) (Balduzzi et al., 2018), which always exists
and is unique in two-player, constant-sum settings. Another
strategy is to regularize the NE of the game with Shannon
entropy resulting in the quantal response equilibrium (QRE)
(McKelvey & Palfrey, 1995). There exists a continuum of
QREs starting at the uniform distribution, finishing at the
limiting logit equilibrium (LLE), which is unique for almost
all games. Solvers (Gemp et al., 2021) can find LLEs, even
in scenarios with stochastic payoffs.
(C)CEs permit a convex polytope of valid solutions which are defined by their linear inequality constraints. Convex functions can be used to uniquely select from this set. Multiple objectives have been proposed to select from the set of valid solutions including maximum entropy, $-\sum_a \sigma(a) \ln(\sigma(a))$ (ME(C)CE) (Ortiz et al., 2007), maximum Gini, $1 - \sum_a \sigma(a)^2$ (MG(C)CE) (Marris et al., 2021a), and maximum welfare⁵, $\sum_p \sum_a \sigma(a) G_p(a)$ (MW(C)CE).
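To make the selection step concrete, the following is a naive sketch of maximum-entropy selection over the CCE polytope for a tiny two-player game, using a general-purpose solver; it is not the scalable method of the cited works, and the game, tolerance, and function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy

# Tiny two-player game (here RPS); payoffs G[p] are indexed by (a_1, a_2).
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
G = [rps, -rps]
shape = rps.shape
epsilon = 0.0                                   # CCE approximation parameter

def neg_entropy(x):
    return xlogy(x, x).sum()                    # sum_a sigma(a) ln sigma(a)

def cce_slack(x):
    """epsilon minus the deviation gain for every (player, deviation) pair; must be >= 0."""
    sigma = x.reshape(shape)
    slacks = []
    for p in range(2):
        expected = (sigma * G[p]).sum()
        sigma_minus_p = sigma.sum(axis=p)
        for dev in range(shape[p]):
            gain = (sigma_minus_p * np.take(G[p], dev, axis=p)).sum() - expected
            slacks.append(epsilon - gain)
    return np.array(slacks)

x0 = np.full(rps.size, 1.0 / rps.size)
res = minimize(
    neg_entropy, x0, method="SLSQP",
    bounds=[(0.0, 1.0)] * rps.size,
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0},
                 {"type": "ineq", "fun": cce_slack}],
)
print(res.x.reshape(shape).round(3))            # MECCE of RPS: the uniform joint
```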
2.4. Game Theoretic Rating
It is natural to formulate rating problems as rating strategies
in a normal-form game. For example, a football league is a
symmetric, two-player game where strategies are teams and
payoffs are win probabilities or points. Teams can therefore
be ranked by analysing this empirical game. Single player
multi-task reinforcement learning can be formulated as a
game between an agent player and an environment player,
with strategies describing different policies and different
tasks respectively (Balduzzi et al., 2018). We call the prob-
lem of assigning a rating to each player’s strategies the game
theoretic rating problem. While any joint distribution can
be used to calculate a rating (Section 3), game theoretic
distributions, such as ones that are in equilibrium have a
number of advantages.
Nash Average (Balduzzi et al., 2018) leverages the proper-
ties of NE to rate strategies in two-player, zero-sum games
according to the payoff rating definition (Section 3.1). This
approach can also be used to rate relative strengths between
populations of strategies in sub-games: relative population
performance (RPP) (Balduzzi et al., 2019; Czarnecki et al.,
⁴This is why NEs are harder to compute than (C)CEs.
⁵Not convex and hence not always unique.
2020). It also has an interesting property that it is invariant
to strategy repeats.
Using NEs to rate general-sum games has not been explored; however, LLEs (McKelvey & Palfrey, 1995) could be leveraged to select an equilibrium that is unique in almost all games. However, LLEs are difficult to compute, and sophisticated solvers such as gambit-logit (Turocy, 2005; McKelvey et al., 2016) do not scale well to large N-player games (Gemp et al., 2021). In contrast, (C)CEs, which have not been considered as rating algorithms until now, a) have a convex
optimization formulation, b) have unique principled equi-
librium selection, c) can capture coordination in games, d)
are established and understood, and e) have a number of
interesting rating properties.
3. Game Theoretic Rating
We introduce a novel generalized rating for N-player, general-sum games: the payoff rating. The definition functions for arbitrary joint strategy distributions; however, we propose using equilibrium distributions to ensure the rating is game theoretic.
3.1. Payoff Rating
The rating is defined in terms of the payoff, $G_p$, and the joint distribution players are assumed to be playing under, $\sigma$. We define the payoff rating:

$r^\sigma_p(a_p) = \frac{\partial}{\partial \sigma(a_p)} \sum_{a \in \mathcal{A}} G_p(a) \, \sigma(a) \quad (4)$
$\qquad\quad\; = \frac{\partial}{\partial \sigma(a_p)} \sum_{a_{-p} \in \mathcal{A}_{-p}} G_p(a_p, a_{-p}) \, \sigma(a_{-p}|a_p) \, \sigma(a_p)$
$\qquad\quad\; = \sum_{a_{-p} \in \mathcal{A}_{-p}} G_p(a_p, a_{-p}) \, \sigma(a_{-p}|a_p) \quad (5)$
Theorem 3.1 (Nash Average Equivalence). When using an MENE for the joint strategy distribution in two-player, zero-sum games, payoff rating is equivalent to Nash Average (NA).

Proof. For NE, a player's strategies are independent of the other player's strategies, $\sigma(a_2|a_1) = \sigma(a_2)$. Therefore $r^\sigma_1(a_1) = \sum_{a_2 \in \mathcal{A}_2} G_1(a_1, a_2)\sigma(a_2)$ and $r^\sigma_2(a_2) = \sum_{a_1 \in \mathcal{A}_1} G_2(a_1, a_2)\sigma(a_1)$, which is the definition of NA.
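This equivalence is easy to verify numerically; a small sketch on Rock, Paper, Scissors (where the maximum entropy NE is uniform for both players) follows.

```python
import numpy as np

# RPS payoffs for player 1; player 2 receives the negation.
G1 = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)

# The (maximum entropy) Nash equilibrium of RPS: both players uniform.
nash1 = np.full(3, 1.0 / 3.0)
nash2 = np.full(3, 1.0 / 3.0)
sigma = np.outer(nash1, nash2)             # factorized joint, sigma(a) = sigma(a_1) sigma(a_2)

# Payoff rating (Eq. 5) under the factorized joint ...
rating1 = (G1 * sigma).sum(axis=1) / sigma.sum(axis=1)
# ... equals the Nash Average rating, sum_{a_2} G_1(a_1, a_2) sigma(a_2).
nash_avg1 = G1 @ nash2
print(np.allclose(rating1, nash_avg1))     # True
```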
This definition has two interpretations: a) the change in the player's payoff under a joint strategy distribution, $\sum_{a \in \mathcal{A}} G_p(a)\sigma(a)$, with respect to the probability of selecting that strategy, $\sigma(a_p)$; and b) the expected strategy payoff under a joint strategy distribution conditioned on that strategy. When defined, the payoff rating is bounded between the minimum and maximum values of a strategy's payoff, $\min_{a_{-p}} G_p(a_p, a_{-p}) \leq r^\sigma_p(a_p) \leq \max_{a_{-p}} G_p(a_p, a_{-p})$.
Note the mathematical edge case that strategies with zero marginal probability, $\sigma(a_p) = 0$, have undefined conditional probability, $\sigma(a_{-p}|a_p)$, and therefore have undefined payoff rating. Consider a symmetric two-player zero-sum transitive game where strategy $S$ dominates $A$, and $A$ dominates $W$. Many game theoretic distributions (including NE, CE and CCE) will place all probability mass on $(S, S)$, leaving strategies $A$ and $W$ with undefined ratings. This may be unsatisfying for two reasons: firstly, there could be a further ordering between $A$ and $W$, such that $S > A > W$ is reflected in the ranking; and secondly, all strategies should receive a rating value. It could be argued that if a strategy dominates all others then an ordering over the rest is redundant. However, there are ways to achieve such an ordering: a) with approximate equilibria (Section 4), certain joint strategies (such as the $\epsilon_{\min}^{+}$-MECCE) are guaranteed to place at least some mass on all strategies (Section A), b) assign $r^\sigma_p(a_p) = \min_a G_p(a)$ for undefined values, and c) rate using a sub-game with dominating strategies pruned.
3.2. Joint Strategy Distributions
We now turn our discussion to the joint distributions we
measure such a rating under. The most ubiquitous approach
is the uniform distribution which is equivalent to calculating
the mean payoff across all opponent strategies. As discussed
previously, this approach does not consider any interesting
dynamics of the game. It is, however, the distribution with
maximum entropy and therefore makes the fewest assump-
tions (Jaynes, 1957).
In order to be more game theoretic, it is beneficial to use distributions that are in certain types of equilibrium. Firstly, consider the definitions of several equilibria (Equations (1)-(3)). These equations are linear⁶ inequality constraints between strategies, so already closely resemble a partial ordering.
Rankings are nothing more than partial orderings between
elements. Secondly, values of the payoff ratings depend
entirely on the payoffs under distributions that all players
are not incentivized to deviate from. Therefore this set of
joint distributions are representative of ones which rational
agents may employ in practice. In contrast, the uniform
distribution is rarely within an equilibrium set. Therefore,
we argue, equilibrium distributions are a much more natu-
ral approach. Further mathematical justification is given in
Section B.
It is possible to mix the opinionated properties of an equilibrium with the zero-assumption properties of the uniform: there exists a principled continuum between the uniform distribution and an equilibrium distribution (Marris et al., 2021a) to achieve this balance. The uniform distribution is recovered when using a large enough approximation parameter, $\epsilon_p \geq \epsilon^{\text{uni}}_p$. The value of $\epsilon^{\text{uni}}_p$ depends on the solution concept, and can be determined directly from a payoff (Section A).
⁶In joint distribution space.
3.3. Properties of Equilibria Ratings
Naively, one may want a rating method to differentiate the
strategies it is rating. Game theoretic rating does the oppo-
site: it groups strategies into similar ratings that should not
be differentiated, such as strategies that are in a cycle with
one another (Tables 1a, and 1b). We call this phenomenon
the grouping property. Other properties, such as strategic
dominance resulting in dominated ratings, and consistent
ratings over repeated strategies or between players in a sym-
metric game can also be achieved when using the maximum
entropy criterion (Section B).
4. Rating Algorithms
A generalized payoff rating algorithm (Algorithm 1) is there-
fore parameterized over an equilibrium concept, and an
equilibrium selection criterion. This section makes some
recommendations on suitable parameterizations.
Algorithm 1 Generalized Payoff Rating
1: $\sigma(a) \leftarrow \textsc{ConceptAndSelection}(G(a), \epsilon)$
2: for $p \leftarrow 1 \ldots n$ do
3: $\quad r^\sigma_p(a_p) \leftarrow \sum_{a_{-p} \in \mathcal{A}_{-p}} G_p(a_p, a_{-p}) \, \sigma(a_{-p}|a_p)$
4: return $(r^\sigma_1(a_1), \ldots, r^\sigma_n(a_n))$
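A direct translation of the rating step of Algorithm 1 into numpy might look as follows; this sketch assumes the joint distribution sigma has already been produced by some concept-and-selection routine, and the example game and names are illustrative.

```python
import numpy as np

def payoff_ratings(payoffs, sigma):
    """Payoff rating r_p(a_p) = sum_{a_{-p}} G_p(a_p, a_{-p}) sigma(a_{-p} | a_p)
    for every player p, given payoff tensors payoffs[p] and a joint distribution
    sigma produced by some equilibrium concept and selection criterion."""
    n = sigma.ndim
    ratings = []
    for p in range(n):
        other_axes = tuple(i for i in range(n) if i != p)
        marginal_p = sigma.sum(axis=other_axes)          # sigma(a_p)
        numerator = (payoffs[p] * sigma).sum(axis=other_axes)
        with np.errstate(invalid="ignore", divide="ignore"):
            ratings.append(numerator / marginal_p)       # undefined where sigma(a_p) = 0
    return ratings

# Example: a biased RPS-like game rated under the uniform joint distribution.
G1 = np.array([[0.0, -1.0, 2.0], [1.0, 0.0, -1.0], [-2.0, 1.0, 0.0]])
payoffs = [G1, -G1]
sigma = np.full((3, 3), 1.0 / 9.0)
print([r.round(3) for r in payoff_ratings(payoffs, sigma)])
```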
4.1. $\epsilon_{\min}^{+}$-MECCE Payoff Rating
We recommend using Coarse Correlated Equilibrium (CCE) as the joint strategy distribution and maximum entropy (ME) as the equilibrium selection function. We consider the solution as $\epsilon \to \epsilon_{\min}$ from above (or equivalently, with a sufficiently small increment, $\epsilon = \epsilon_{\min}^{+}$), where $\epsilon_{\min} \leq 0$ is the minimum approximation parameter that permits a feasible solution (Section A) (Marris et al., 2021b). We call the resulting rating the $\epsilon_{\min}^{+}$-MECCE Payoff Rating.
Using CCEs as the solution concept has a number of advantages: a) full joint distributions allow cooperative as well as competitive games to be rated, whereas factorizable distributions such as NE struggle with cooperative components of a game; b) CCEs are more tractable to compute than CEs and NEs; c) full-support CCEs only require a single variable per strategy to define⁷; d) they are amenable to equilibrium selection because they permit a convex polytope of solutions; e) under a CCE, no player has an incentive to unilaterally deviate from the joint (possibly correlated) distribution to any of their own strategies, since it would not result in a higher payoff; and f) the empirical joint strategy of no-regret algorithms in self-play converges to a CCE.
⁷With the payoff tensor.
In combination with CCEs, ME with any $\epsilon > \epsilon_{\min}$ (Section A) spreads at least some mass over all joint strategies (“full support” (Ortiz et al., 2007)), meaning that the conditional distribution, and hence the payoff rating, is always well defined. This equilibrium selection method is also invariant under affine transforms (Marris et al., 2021a) of the payoff, scales well to large numbers of players and strategies, and is principled in that it makes minimal assumptions about the distribution (Jaynes, 1957). Empirically, it groups strategies within strategic cycles with each other. Using a solution near $\epsilon_{\min}$ allows a strong, high-value equilibrium to be selected, which is particularly important for coordination games.
4.2. $\epsilon/\epsilon^{\text{uni}}$-MECCE Payoff Rating
A drawback of using $\epsilon = \epsilon_{\min}^{+}$ is that sometimes (usually when strategies are strictly dominated by others) the distribution needs to be computed to a very high precision, otherwise numerical issues will complicate the calculation of the conditional distributions.
In order to mitigate this problem, we can use an approximate equilibrium distribution which spreads more mass. It is advantageous to normalize the approximation parameter (Marris et al., 2021a) by $\epsilon^{\text{uni}}$, where $\epsilon^{\text{uni}}$ is the minimum $\epsilon$ that permits the uniform distribution in the feasible set. When $\epsilon/\epsilon^{\text{uni}} = 1$ the uniform distribution is selected by ME; when $\epsilon/\epsilon^{\text{uni}} = 0$ the MECCE solution is recovered. For some games, it is possible to set $\epsilon/\epsilon^{\text{uni}} < 0$ to produce ratings with very robust distributions. This is similar in spirit to the continuum of QREs (McKelvey & Palfrey, 1995). Figure 7 shows how ratings change with $\epsilon/\epsilon^{\text{uni}}$ for a two-player, zero-sum game.
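As a concrete illustration (a sketch under the CCE constraints, collapsing per-player tolerances to a single scalar), $\epsilon^{\text{uni}}$ can be obtained by evaluating the largest deviation gain of the uniform joint distribution.

```python
import numpy as np

def epsilon_uni_cce(payoffs):
    """Smallest epsilon for which the uniform joint satisfies the CCE constraints,
    i.e. the largest CCE deviation gain under the uniform distribution."""
    shape = payoffs[0].shape
    sigma = np.full(shape, 1.0 / np.prod(shape))
    worst = -np.inf
    for p, G_p in enumerate(payoffs):
        expected = (sigma * G_p).sum()
        sigma_minus_p = sigma.sum(axis=p)
        for dev in range(shape[p]):
            gain = (sigma_minus_p * np.take(G_p, dev, axis=p)).sum() - expected
            worst = max(worst, gain)
    return worst

G1 = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
print(epsilon_uni_cce([G1, -G1]))   # 0.0 for RPS: the uniform joint is already a CCE
```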
5. Experiments
In order to build intuition and demonstrate the flexibility
of the rating algorithms presented, this section shows rat-
ings for a number of standard games and games built from real-world data. We compare against uniform and α-Rank rating methods.
Section F contains further experiments.
5.1. Standard Normal Form Games
First let us consider the payoffs, equilibria, and payoff ratings of some two-player normal-form games (Table 1). RPS
has three strategies in a cycle, and therefore equilibrium
ratings dictate that we cannot differentiate between their
strengths. This is true even if the cycles are biased (Ta-
ble 1a) for the MECCE rating. In this case the probability