

Clustering-based Tile Embedding (CTE): A Representation for Level Designs with Skewed Tile Distributions

Mrunal Jadhav, Matthew Guzdial

Computing Science Department, Amii

University of Alberta

mrunalsu,guzdial@ualberta.ca

Abstract

There has been significant research interest in Procedural Level Generation via Machine Learning (PLGML), applying ML techniques to automated level generation. One recent trend is in the direction of learning representations for level design via embeddings, such as tile embeddings. Tile Embeddings are continuous vector representations of game levels unifying their visual, contextual and behavioural information. However, the original tile embedding struggled to generate levels with skewed tile distributions, for instance Super Mario Bros. (SMB), wherein a majority of tiles represent the background. To remedy this, we present a modified tile embedding representation referred to as Clustering-based Tile Embedding (CTE). Further, we employ clustering to discretize the continuous CTE representation and present a novel two-step level generation process to leverage both these representations. We evaluate the performance of our approach in generating levels for seen and unseen games with skewed tile distributions and outperform the original tile embeddings.

Introduction

Procedural Content Generation via Machine Learning (PCGML) involves training machine learning models on existing game data to generate new content such as levels, characters, stories, and music (Summerville et al. 2018). Due to limited publicly available datasets, particular games have received a disproportionate amount of attention from PCGML researchers, especially when it comes to level design. Thus we identify a problem of diversity in Procedural Level Generation via Machine Learning (PLGML).

To address this problem, PLGML researchers have resorted to constructing their own training corpora. For example, game level information can be represented as images (Schubert, Awiszus, and Rosenhahn 2022; Chen et al. 2020), gameplay videos (Summerville et al. 2016a), or as abstractions of in-game object behaviour (Guzdial and Riedl 2016a; Summerville et al. 2020). An example of this practice and a valuable contribution to the PLGML community is the Video Game Level Corpus (VGLC) (Summerville et al. 2016b), which provides an annotated training corpus for level generation research. The VGLC represents a level with characters called tiles. A rich body of literature has leveraged this representation to generate levels using various machine learning algorithms such as autoencoders, GANs, and LSTMs (Summerville and Mateas 2016; Sarkar et al. 2020; Sarkar, Yang, and Cooper 2020; Giacomello, Lanzi, and Loiacono 2018; Thakkar et al. 2019). However, the significant amount of human effort that goes into converting game levels to this representation limits the number of games represented in this format.

Rather than relying on hand-authored representations for level design, recent research has looked into learning these representations (Karth et al. 2021; Mawhorter et al. 2021; Trivedi et al. 2022; Rabii and Cook 2021). We previously introduced tile embeddings as a domain-independent vector representation of levels (Jadhav and Guzdial 2021). An autoencoder was trained to take in mechanical affordances and the local pixel context of a tile, and learned a representation unifying these pieces of information. Tile embeddings have shown promising results in generating levels for games with a good mix of tiles, such as Lode Runner. However, we found that tile embeddings struggled to generate levels with imbalanced tile distributions. For example, we observed that a tile embedding-based LSTM level generator for Super Mario Bros. resulted in empty levels (Figure 1(b)). This is a common problem in PCGML when the process of sampling new levels is greedy and biased towards the tile with the highest probability (in the case of SMB: empty sky tiles) (Snodgrass and Ontañón 2013).

Traditional PLGML approaches have taken advantage of the discrete nature of the VGLC representation to alleviate the issue of skewed tile distributions. For instance, a level generator can be trained on the VGLC or any discrete representation such that, given a sequence of previous tiles in a level, it predicts a distribution over the likelihood of possible next tiles. When generating a new level, tiles at each position can be sampled from this probability distribution (Summerville and Mateas 2016). This sampling process solves the problem of producing empty levels encountered with a greedy tile selection strategy (Figure 1(b)).
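The difference between greedy selection and sampling can be illustrated with a minimal sketch. The tile names and probabilities below are hypothetical and chosen only to mimic a sky-heavy distribution; they are not taken from the paper, and a real generator would predict a new distribution at every position.

```python
import random
from collections import Counter

# Hypothetical next-tile distribution for one position in a sky-heavy level.
# Tile names and probabilities are illustrative assumptions.
NEXT_TILE_PROBS = {"sky": 0.85, "ground": 0.08, "brick": 0.04, "enemy": 0.03}


def greedy_tile(probs):
    """Always return the single most likely tile."""
    return max(probs, key=probs.get)


def sampled_tile(probs, rng):
    """Draw a tile in proportion to its predicted probability."""
    tiles = list(probs)
    weights = [probs[t] for t in tiles]
    return rng.choices(tiles, weights=weights, k=1)[0]


rng = random.Random(0)
greedy_row = [greedy_tile(NEXT_TILE_PROBS) for _ in range(200)]
sampled_row = [sampled_tile(NEXT_TILE_PROBS, rng) for _ in range(200)]

print(Counter(greedy_row))   # only "sky" appears
print(Counter(sampled_row))  # mostly "sky", with other tiles mixed in
```

Greedy selection collapses to the majority tile, whereas sampling preserves the minority tiles at roughly their predicted rates.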

arXiv:2210.12789v1 [cs.LG] 23 Oct 2022

In order to enable sampling in our level generator, we learn a discrete representation by clustering learned tile embeddings. Thus in the presented work, we leverage the benefits of learning simultaneous discrete and continuous representations to improve level generation for games with skewed tile distributions. This allows us to approximate the benefits of a discrete representation like the VGLC without the cost of hand-processing training data. The main contributions presented in this work are as follows:

• We introduce Cluster-based Tile Embeddings (CTE), which differ from our original tile embeddings (Jadhav and Guzdial 2021) by the incorporation of edge information and a cluster-based loss.

• We present a novel two-step level generation pipeline based on discretizing our new embedding representation.

• We demonstrate and compare the performance of our CTE representation against both the original tile embeddings and the VGLC representation at the task of level generation for games with skewed tile distributions.

• We demonstrate our approach’s ability to generate levels for two games that no prior PLGML approach has attempted: Bugs Bunny Crazy Castle and Genghis Khan, based solely on images of their levels.

Related Work

In this section we discuss prior work that has investigated the role of clustering in game level design, as our approach learns to discretize our tile embedding using clustering. Clustering is an unsupervised machine learning technique to discover groupings in data.

Guzdial and Riedl (2016b) employed clustering to help learn probabilistic graphical models for Mario level design. Snodgrass (2018) proposed an approach to automatically identify sets of tiles, based on Markov random fields and clustering. Similar to these approaches, we use clustering as part of our representation learning. However, these previous studies have based their clustering decisions solely on RGB representations. In our presented work, along with the RGB representation of a tile, we also incorporate behavioural and edge information.

Yang, Sarkar, and Cooper (2020) employed a Variational Autoencoder with a Gaussian mixture as a prior distribution (GMVAE) for level generation. Their work relies on clustering to identify similar 16 × 16 chunks from levels of multiple games. The learned components of the Gaussian Mixture Model are then used to generate new chunks of the same style. Karth et al. (2021) proposed neurosymbolic map generation using a VQ-VAE and Wave Function Collapse (WFC). A VQ-VAE quantizes patches of level images to a finite tileset on which WFC is applied to generate levels. While Karth et al. (2021) focused on discretizing large patches of level design images, our work extracts representations of individual 16 × 16 tiles, similar to Yang, Sarkar, and Cooper (2020). Like both approaches, our work also uses clustering for level generation. However, our approach differs by learning continuous and discrete representations of tiles and utilising both for level generation.

To the best of our knowledge, we are the first to tie clustering and embeddings together for representation learning in PCGML. However, this approach has been explored in other fields like reinforcement learning for games. Liu et al. (2020) introduced the shrinkage effect in training an encoder for extracting representations of players in professional ice hockey. It allows the model to transfer information between the observations of different players such that statistically similar players lead to similar representations under similar game contexts. We draw a parallel to this work and implement a clustering loss to enforce intrinsic clustering and assign similar representations to tiles with similar RGB pixel representations, affordances and edges.

System Overview

The goal of this work is to learn an improved tile embedding for games with skewed tile distributions for the task of level generation. Towards this objective, we begin this section by discussing our modifications to the original tile embedding autoencoder to learn our new Cluster-based Tile Embeddings (CTE). Next, we explain the limitations of an LSTM level generator trained on the original tile embedding representation for games with skewed tile distributions. We then present our novel two-step level generation pipeline that learns a discretization of our CTE through clustering and leverages both representations for level generation.

CTE: Cluster-based Tile Embeddings

The VGLC tile-based representation of a level L is an h × w dimensional array. Here h and w are the height and width of the level, respectively. Each character of L is called a tile, which is associated with a 16 × 16 pixel representation in a level image and a corresponding set of affordances. Affordances convey a tile’s mechanical behaviour.

Our original tile embedding work employed a dual-branched autoencoder to learn a 256-dimensional embedding vector representation of a tile (Jadhav and Guzdial 2021). The network accepted two inputs: 1) a 3 × 3 grid with the candidate tile at the centre and its neighbours surrounding it, each in the 16 × 16 × 3 RGB pixel representation (48 × 48 × 3 in total), and 2) the candidate tile’s 13-dimensional one-hot affordance vector. To compare more easily to the original tile embedding work, we utilise the same set of games (Super Mario Bros., Kid Icarus, Megaman, Lode Runner and Legend of Zelda) as our training corpus and maintain the same tile-affordance mapping. The tile-based level data is taken from the VGLC corpus¹ and the JSON files for tile-affordance mapping are from the original tile embedding implementation². We make two modifications to the training of the original autoencoder to better handle level design tasks for games with skewed tile distributions and refer to the newly extracted 256-dimensional embedding vector as the Cluster-based Tile Embedding (CTE).
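As a rough illustration of the pixel-context input described above, the following sketch cuts a 48 × 48 × 3 window around a candidate tile from a level image. The function name `tile_context`, the zero-padding at level borders, and the random test image are assumptions made for this example; the text does not state how border tiles are handled.

```python
import numpy as np

TILE = 16  # tile edge length in pixels, as stated in the text


def tile_context(level_img, row, col):
    """Return the 48x48x3 pixel context of the tile at (row, col).

    level_img is an (h*16, w*16, 3) RGB array. Border tiles are handled by
    zero-padding one tile on every side, which is an assumption of this
    sketch rather than something the text specifies.
    """
    padded = np.pad(level_img, ((TILE, TILE), (TILE, TILE), (0, 0)))
    # In padded coordinates the 3x3 window starts exactly at (row, col) * TILE.
    top, left = row * TILE, col * TILE
    return padded[top:top + 3 * TILE, left:left + 3 * TILE]


# A hypothetical level of 3 x 4 tiles filled with random pixels.
rng = np.random.default_rng(0)
level = rng.integers(0, 256, size=(3 * TILE, 4 * TILE, 3), dtype=np.uint8)

context = tile_context(level, row=1, col=2)
affordances = np.zeros(13, dtype=np.float32)  # 13-dimensional affordance vector
affordances[0] = 1.0  # which index means what is hypothetical here

print(context.shape)      # (48, 48, 3)
print(affordances.shape)  # (13,)
```

The centre 16 × 16 block of `context` is the candidate tile itself, and the surrounding blocks are its eight neighbours.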

Incorporating Edge Information: When applying the original tile embeddings to games where the affordance information was unknown, we found that the latent space representations depended predominantly on coloured pixel information of a tile. For instance, an empty blue sky tile was placed close to a solid blue brick tile. To discourage this, we included edge information into our embedding. Canny edge detection (Canny 1986) is a common algorithm for identifying edge information. We convert the 16 × 16 × 3 pixel representation of a tile to grayscale and apply the Canny edge detection algorithm to obtain a 16 × 16 edge feature vector. Thus for each candidate tile, we feed three inputs to our autoencoder: the pixel representation of the candidate tile along with its neighbours (48 × 48 × 3), a 13-dimensional multi-hot affordance vector and 16 × 16 edge features.

¹ https://github.com/TheVGLC/TheVGLC

² https://github.com/js-mrunal/tile_embeddings

Figure 1: SMB LSTM level generator outputs with: (a) VGLC representation (b) original tile embedding (c) CTE. We also include good (d) and bad (e) examples for our two-step CTE level generation process.

Clustering Loss: In the original tile embedding work, the learned latent space was fairly continuous, without clear separation between types of tiles. Learning more distinct groups can improve the utility of a final representation (Hershey et al. 2016). With an aim to push representations of similar elements closer while keeping representations of dissimilar elements apart, we introduce an explicit cluster-based loss $L_c$ in the training process. For this cluster-based loss, we must cluster our data prior to training our autoencoder. The idea is to leverage the clusters as a guide for representation learning. For each candidate tile, its 16 × 16 × 3 RGB pixel representation, 13-dimensional multi-hot affordance vector, and 16 × 16 edge vector are fed to a Gaussian Mixture Model (GMM) (Reynolds 2009).

A tile can belong to multiple clusters. For instance, it is appropriate to assign a Cannon in MegaMan to a cluster of Hazards as well as to a cluster of Solids. We rely on a GMM in order to account for such potential overlap in tile groups. We pick an elbow point based on the Silhouette score and Bayesian Information Criterion (BIC) to determine the optimal number of clusters (Rousseeuw 1987; Schwarz 1978). For the given VGLC dataset, we observe an elbow point at 10 clusters.
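The soft cluster membership that motivates the use of a GMM can be sketched with a toy mixture. The example below is a deliberately simplified one-dimensional, two-component case with made-up parameters; the model described above operates on much higher-dimensional tile features with 10 clusters, and fitting the mixture and choosing the number of clusters are not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Posterior probability of each mixture component given x."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical clusters on a made-up 1-D feature axis.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

near_first = responsibilities(0.0, weights, means, stds)
in_between = responsibilities(2.0, weights, means, stds)

print(near_first)  # dominated by the first cluster
print(in_between)  # [0.5, 0.5]: the point is shared equally by both clusters
```

A point midway between two components receives a split assignment, which is the behaviour a hard clustering method would not provide.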

We compute our clustering loss ($L_c$) as the categorical cross entropy error between the GMM cluster assignment of a given tile and its corresponding embedding during training. Along with $L_c$, our loss function includes the mean squared error on the reconstructed edge feature vector ($L_e$), the mean squared error over the reconstructed image data ($L_i$) and the binary cross entropy loss on the reconstructed affordances ($L_a$). In totality, the loss function can be mathematically represented as:

$$\text{Total loss} = 0.5\,L_i + 1.5\,L_a + 0.5\,L_e + 0.5\,L_c \tag{1}$$

To accurately embed affordance information, we increase the relative weight of its reconstruction.
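The weighted sum in Equation 1 can be written out as a short sketch in plain Python. The helper names, the toy input values, and the idea that the network outputs a predicted cluster distribution for the clustering term are assumptions made for illustration; only the four loss types and their weights come from the text.

```python
import math


def mse(target, pred):
    """Mean squared error over two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)


def binary_cross_entropy(target, pred, eps=1e-7):
    """Mean binary cross entropy, with predictions clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(target, pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def categorical_cross_entropy(target, pred, eps=1e-7):
    """Cross entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def total_loss(l_i, l_a, l_e, l_c):
    """Weighted sum from Equation 1: affordances carry the largest weight."""
    return 0.5 * l_i + 1.5 * l_a + 0.5 * l_e + 0.5 * l_c


# Toy, flattened stand-ins for the four reconstruction targets (hypothetical values).
l_i = mse([0.2, 0.8, 0.5], [0.25, 0.7, 0.5])              # image pixels
l_e = mse([0.0, 1.0, 1.0], [0.1, 0.9, 0.8])               # edge features
l_a = binary_cross_entropy([1.0, 0.0, 1.0], [0.9, 0.2, 0.7])  # affordances
l_c = categorical_cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])  # cluster assignment

print(total_loss(l_i, l_a, l_e, l_c))
```

With equal raw losses, the affordance term contributes three times as much to the total as any other term, which reflects the heavier weight given to affordance reconstruction.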

Level Generation for Super Mario Bros.

In this section, we describe the difficulty in generating SMB levels using an LSTM trained on the original tile embeddings and CTE, which motivated our novel two-step level generation process described below.

Problems with SMB Level Generation: We train two LSTM models, one on the original tile embeddings and the other on our CTE representation, for SMB. We follow the training process from Jadhav and Guzdial (2021). Sampling from an LSTM trained on a continuous representation is deterministic and hence, for a given seed input, these models generate only one output, as shown in Figure 1(b) and (c) respectively. In both cases we feed in the same 200 tiles of flat ground as input. While the CTE representation helps
