Clustering-based Tile Embedding (CTE): A Representation for Level Designs with
Skewed Tile Distributions
Mrunal Jadhav, Matthew Guzdial
Computing Science Department, Amii
University of Alberta
{mrunalsu,guzdial}@ualberta.ca
Abstract
There has been significant research interest in Procedural
Level Generation via Machine Learning (PLGML), apply-
ing ML techniques to automated level generation. One re-
cent trend is in the direction of learning representations for
level design via embeddings, such as tile embeddings. Tile
Embeddings are continuous vector representations of game
levels unifying their visual, contextual and behavioural
information. However, the original tile embedding struggled to
generate levels with skewed tile distributions. One example is
Super Mario Bros. (SMB), in which a majority of tiles represent
the background. To remedy this, we present a modified tile
embedding representation referred to as Clustering-based Tile
Embedding (CTE). Further, we employ clustering to discretize the
continuous CTE representation and present a novel two-step level
generation process to leverage both these representations. We
evaluate the performance of our approach in
generating levels for seen and unseen games with skewed tile
distributions and outperform the original tile embeddings.
Introduction
Procedural Content Generation via Machine Learning
(PCGML) involves training machine learning models on ex-
isting game data to generate new content such as levels,
characters, stories, and music (Summerville et al. 2018).
Due to limited publicly available datasets, particular games
have received a disproportionate amount of attention from
PCGML researchers, especially when it comes to level de-
sign. Thus we identify a problem of diversity in Procedural
Level Generation via Machine Learning (PLGML).
To address this problem, PLGML researchers have re-
sorted to constructing their own training corpora. For ex-
ample, game level information can be represented as im-
ages (Schubert, Awiszus, and Rosenhahn 2022; Chen et al.
2020), gameplay videos (Summerville et al. 2016a), or as
abstractions of in-game object behaviour (Guzdial and Riedl
2016a; Summerville et al. 2020). An example of this prac-
tice and a valuable contribution to the PLGML community is
the Video Game Level Corpus (VGLC) (Summerville et al.
2016b), which provides an annotated training corpus for
level generation research. The VGLC represents a level with
characters called tiles. A rich body of literature has leveraged
this representation to generate levels using various machine
learning algorithms such as autoencoders, GANs, and
LSTMs (Summerville and Mateas 2016; Sarkar et al. 2020;
Sarkar, Yang, and Cooper 2020; Giacomello, Lanzi, and
Loiacono 2018; Thakkar et al. 2019). However, the significant
amount of human effort that goes into converting game
levels to this representation limits the number of games
represented in this format.
Copyright © 2022, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
Rather than relying on hand-authored representations for
level design, recent research has looked into learning these
representations (Karth et al. 2021; Mawhorter et al. 2021;
Trivedi et al. 2022; Rabii and Cook 2021). We previously
introduced tile embeddings as a domain-independent vector
representation of levels (Jadhav and Guzdial 2021). An au-
toencoder was trained to take in mechanical affordances and
the local pixel context of a tile, and learned a representa-
tion unifying these pieces of information. Tile embeddings
have shown promising results in generating levels for games
with a good mix of tiles, such as Lode Runner. However,
we found that tile embeddings struggled to generate
levels with imbalanced tile distributions. For example, we
observed that a tile embedding-based LSTM level generator
for Super Mario Bros. resulted in empty levels (Figure 1(b)).
This is a common problem in PCGML when the process of
sampling new levels is greedy and biased towards the tile
with the highest probability (in the case of SMB: empty sky
tiles) (Snodgrass and Ontañón 2013).
Traditional PLGML approaches have taken advantage of
the discrete nature of the VGLC representation to alleviate
the issue of skewed tile distributions. For instance, a level
generator can be trained on the VGLC or any discrete rep-
resentation such that given a sequence of previous tiles in a
level, it predicts a distribution over the likelihood of possible
next tiles. When generating a new level, tiles at each posi-
tion can be sampled from this probability distribution (Sum-
merville and Mateas 2016). This sampling process solves
the problem of producing empty levels encountered with a
greedy tile selection strategy (Figure 1(b)).
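The difference between greedy selection and sampling can be illustrated with a minimal sketch. The next-tile distribution below is hypothetical (it is not taken from a trained model), and the helper names `greedy_pick` and `sample_pick` are introduced here purely for illustration.

```python
import random

def greedy_pick(distribution):
    """Return the tile with the highest predicted probability."""
    return max(distribution, key=distribution.get)

def sample_pick(distribution, rng):
    """Draw a tile in proportion to its predicted probability."""
    tiles = list(distribution)
    weights = [distribution[t] for t in tiles]
    return rng.choices(tiles, weights=weights, k=1)[0]

# Hypothetical next-tile distribution for a sky-heavy level:
# '-' = empty sky, 'X' = ground, '?' = question block.
next_tile = {"-": 0.90, "X": 0.07, "?": 0.03}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
greedy_row = "".join(greedy_pick(next_tile) for _ in range(200))
sampled_row = "".join(sample_pick(next_tile, rng) for _ in range(200))

print(set(greedy_row))           # only the majority tile appears
print(sorted(set(sampled_row)))  # minority tiles can appear as well
```

Under a skewed distribution, the greedy picker always returns the majority tile, whereas sampling lets rarer tiles appear roughly in proportion to their probability.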
arXiv:2210.12789v1 [cs.LG] 23 Oct 2022
In order to enable sampling in our level generator, we
learn a discrete representation by clustering learned tile
embeddings. Thus in the presented work, we leverage the
benefits of learning simultaneous discrete and continuous
representations to improve level generation for games with
skewed tile distributions. This allows us to approximate the
benefits of a discrete representation like the VGLC without
the cost of hand-processing training data. The main
contributions presented in this work are as follows:
• We introduce Cluster-based Tile Embeddings (CTE),
  which differ from our original tile embeddings (Jadhav
  and Guzdial 2021) by the incorporation of edge
  information and a cluster-based loss.
• We present a novel two-step level generation pipeline
  based on discretizing our new embedding representation.
• We demonstrate and compare the performance of our
  CTE representation against both the original tile
  embeddings and the VGLC representation at the task of level
  generation for games with skewed tile distributions.
• We demonstrate our approach’s ability to generate
  levels for two games that no prior PLGML approach has
  attempted: Bugs Bunny Crazy Castle and Genghis Khan,
  based solely on images of their levels.
Related Work
In this section we discuss prior work that has investigated
the role of clustering in game level design, as our approach
learns to discretize our tile embedding using clustering.
Clustering is an unsupervised machine learning technique
to discover groupings in data.
(Guzdial and Riedl 2016b) employed clustering to help
learn probabilistic graphical models for Mario level design.
(Snodgrass 2018) proposed an approach to automatically
identify sets of tiles, based on Markov Random fields and
clustering. Similar to these approaches, we use clustering as
part of our representation learning. However, these previous
studies have based their clustering decisions solely on RGB
representations. In our presented work, along with the RGB
representation of a tile, we also incorporate behavioural and
edge information.
(Yang, Sarkar, and Cooper 2020) employed a Variational
Autoencoder with a Gaussian mixture as a prior distribu-
tion (GMVAE) for level generation. Their work relies on
clustering to identify similar (16 × 16) chunks from levels
of multiple games. The learned components of the Gaus-
sian Mixture Model are then used to generate new chunks of
the same style. (Karth et al. 2021) proposed neurosymbolic
map generation using a VQ-VAE and Wave Function Col-
lapse (WFC). A VQ-VAE quantizes patches of level images
to a finite tileset on which WFC is applied to generate lev-
els. While (Karth et al. 2021) focused on discretizing large
patches of level design images, our work extracts representations
of individual 16 × 16 tiles, similar to (Yang, Sarkar,
and Cooper 2020). Like both approaches, our work also uses
clustering for level generation. However, our approach dif-
fers by learning continuous and discrete representations of
tiles and utilising both for level generation.
To the best of our knowledge, we are the first to tie clus-
tering and embeddings together for representation learning
in PCGML. However, this approach has been explored in
other fields like reinforcement learning for games. (Liu et al.
2020) introduced the shrinkage effect in training an encoder
for extracting representations of players in professional ice
hockey. It allows the model to transfer information between
the observations of different players such that statistically
similar players lead to similar representations under similar
game contexts. We draw a parallel to this work and implement
a clustering loss to enforce intrinsic clustering and assign
similar representations to tiles with similar RGB pixel
representations, affordances and edges.
System Overview
The goal of this work is to learn an improved tile embed-
ding for games with skewed tile distributions for the task
of level generation. Towards this objective, we begin this
section by discussing our modifications to the original tile
embedding autoencoder to learn our new Cluster-based Tile
Embeddings (CTE). Next, we explain the limitations of an
LSTM level generator trained on the original tile embedding
representation for games with skewed tile distributions. We
then present our novel two-step level generation pipeline that
learns a discretization of our CTE through clustering and
leverages both representations for level generation.
CTE: Cluster-based Tile Embeddings
The VGLC tile-based representation of a level L is an h × w
dimensional array. Here h and w are the height and width
of the level, respectively. Each character of L is called a tile
which is associated with a 16 × 16 pixel representation in a
level image and a corresponding set of affordances.
Affordances convey a tile’s mechanical behaviour.
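As a rough illustration of this representation, the sketch below parses a small text level into an h × w array and measures how skewed its tile distribution is. The level fragment is hypothetical and is not copied from a VGLC file; the tile characters are chosen only for the example.

```python
from collections import Counter

# Hypothetical 4 x 8 level fragment; the characters are illustrative
# and are not copied from a VGLC file.
level_text = """\
--------
-----?--
--------
XXXXXXXX"""

# L is an h x w array: one row per line, one tile character per column.
L = [list(row) for row in level_text.splitlines()]
h, w = len(L), len(L[0])

# Count how often each tile occurs and convert counts to shares.
counts = Counter(tile for row in L for tile in row)
shares = {tile: n / (h * w) for tile, n in counts.items()}

print(h, w)                   # 4 8
print(counts.most_common(1))  # [('-', 23)]
```

Even in this tiny fragment the background tile accounts for 23 of 32 positions, which is the kind of skew the text is concerned with.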
Our original tile embedding work employed a dual
branched autoencoder to learn a 256-dimensional embed-
ding vector representation of a tile (Jadhav and Guzdial
2021). The network accepted two inputs: 1) a 3 × 3 grid of
16 × 16 × 3 RGB tiles, with the candidate tile at the centre and
its neighbours surrounding it (48 × 48 × 3 in total), and
2) the candidate tile’s 13-dimensional one-hot affordance
vector. To compare more easily to the original tile embedding
work, we utilise the same set of games (Super Mario
ding work, we utilise the same set of games (Super Mario
Bros., Kid Icarus, Megaman, Lode Runner and Legend of
Zelda) as our training corpus and maintain the same tile-
affordance mapping. The tile-based level data is taken from
the VGLC corpus¹ and the JSON files for tile-affordance
mapping are from the original tile embedding implementation².
We make two modifications to the training of the original
autoencoder to better handle level design tasks for games
with skewed tile distributions and refer to the newly
extracted 256-dimensional embedding vector as the Cluster-based
Tile Embedding (CTE).
¹ https://github.com/TheVGLC/TheVGLC
² https://github.com/js-mrunal/tile_embeddings
Figure 1: SMB LSTM level generator outputs with: (a) VGLC representation (b) original tile embedding (c) CTE. We also
include good (d) and bad (e) examples for our two-step CTE level generation process.
Incorporating Edge Information: When applying the
original tile embeddings to games where the affordance
information was unknown, we found that the latent space
representations depended predominantly on coloured pixel
information of a tile. For instance, an empty blue sky tile was
placed close to a solid blue brick tile. To discourage this, we
incorporated edge information into our embedding. Canny edge
detection (Canny 1986) is a common algorithm for identifying
edge information. We convert the 16 × 16 × 3 pixel
representation of a tile to grayscale and apply the Canny edge
detection algorithm to obtain a 16 × 16 edge feature vector.
Thus for each candidate tile, we feed three inputs to our
autoencoder: the pixel representation of the candidate tile
along with its neighbours (48 × 48 × 3), a 13-dimensional
multi-hot affordance vector and (16 × 16) edge features.
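The edge-feature step can be sketched as follows. This is not the Canny algorithm itself: as a simplifying assumption it thresholds the Sobel gradient magnitude only, whereas Canny additionally applies Gaussian smoothing, non-maximum suppression and hysteresis thresholding. The colours, the threshold value and the helper names `to_gray` and `edge_map` are hypothetical.

```python
def to_gray(tile):
    """Convert a grid of (r, g, b) pixels to luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in tile]

def edge_map(gray, threshold=100.0):
    """Binary edge map from Sobel gradient magnitude (borders left at 0)."""
    n, m = len(gray), len(gray[0])
    edges = [[0] * m for _ in range(n)]
    for y in range(1, n - 1):
        for x in range(1, m - 1):
            # Horizontal and vertical Sobel responses at pixel (y, x).
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges

SKY = (92, 148, 252)   # hypothetical sky colour
BRICK = (200, 76, 12)  # hypothetical brick colour

# A flat sky tile, and a tile split into sky (left) and brick (right).
sky_tile = [[SKY] * 16 for _ in range(16)]
split_tile = [[SKY] * 8 + [BRICK] * 8 for _ in range(16)]

sky_edges = edge_map(to_gray(sky_tile))      # 16 x 16, all zeros
split_edges = edge_map(to_gray(split_tile))  # ones along the boundary
```

A flat tile produces an empty edge map while a tile with internal structure does not, which is the extra signal intended to separate tiles that share a dominant colour.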
Clustering Loss: In the original tile embedding work, the
learned latent space was fairly continuous, without clear sep-
aration between types of tiles. Learning more distinct groups
can improve the utility of a final representation (Hershey
et al. 2016). With an aim to push representations of similar
elements closer while keeping representations of dissimilar
elements apart, we introduce an explicit cluster-based loss
$L_c$ in the training process. For this cluster-based loss, we
must cluster our data prior to training our autoencoder. The
idea is to leverage the clusters as a guide for representation
learning. For each candidate tile, its 16 × 16 × 3 RGB pixel
representation, 13-dimensional multi-hot affordance vector,
and 16 × 16 edge vector are fed to a Gaussian Mixture Model
(GMM) (Reynolds 2009).
A tile can belong to multiple clusters. For instance, it is
appropriate to assign a Cannon in MegaMan to a clus-
ter of Hazards as well as to a cluster of Solids. We rely on
a GMM in order to account for such potential overlap in
tile groups. We pick an elbow point based on the Silhou-
ette score and Bayesian Information Criterion (BIC) to de-
termine the optimal number of clusters (Rousseeuw 1987;
Schwarz 1978). For the given VGLC dataset, we observe an
elbow point at 10 clusters.
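To make the notion of overlapping membership concrete, the sketch below computes the soft assignments (posterior responsibilities) of a Gaussian mixture. It is a deliberately reduced illustration: it uses a one-dimensional mixture with two hand-picked components, whereas the features described above are high-dimensional and the mixture parameters would be fitted to data. All numbers and function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def responsibilities(x, components):
    """Posterior probability of each mixture component given x."""
    joint = [w * gaussian_pdf(x, mean, std) for (w, mean, std) in components]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical 1-D mixture: (weight, mean, std) per component.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

near_first = responsibilities(0.0, components)  # almost entirely component 0
in_between = responsibilities(2.0, components)  # shared between both

print([round(r, 3) for r in near_first])
print([round(r, 3) for r in in_between])
```

A point midway between the two components receives a substantial share of both, which is the behaviour a hard clustering method would not capture.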
We compute our clustering loss ($L_c$) as the categorical
cross entropy error between the GMM cluster assignment of
a given tile and its corresponding embedding during training.
Along with $L_c$, our loss function includes the mean
squared error on the reconstructed edge feature vector ($L_e$),
the mean squared error over the reconstructed image data
($L_i$) and the binary cross entropy loss on the reconstructed
affordances ($L_a$). In totality, the loss function can be
mathematically represented as:
\[ \text{Total loss} = (0.5\,L_i) + (1.5\,L_a) + (0.5\,L_e) + (0.5\,L_c) \tag{1} \]
To accurately embed affordance information, we increase
the relative weight of its reconstruction.
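The weighting in Equation 1 can be sketched in plain Python. The four component losses follow the definitions in the text; how the cluster prediction is read out of the network is not specified here, so the sketch simply assumes a predicted probability vector over clusters. The toy inputs and function names are hypothetical.

```python
import math

def mse(target, pred):
    """Mean squared error between two equal-length vectors."""
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)

def binary_cross_entropy(target, pred, eps=1e-7):
    """Mean binary cross entropy over independent 0/1 targets."""
    total = 0.0
    for t, p in zip(target, pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(target)

def categorical_cross_entropy(target, pred, eps=1e-7):
    """Cross entropy between a target distribution and a prediction."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))

def total_loss(l_i, l_a, l_e, l_c):
    """Weighted sum from Equation 1."""
    return 0.5 * l_i + 1.5 * l_a + 0.5 * l_e + 0.5 * l_c

# Toy values standing in for real reconstructions (assumed, not from the paper).
l_i = mse([0.0, 1.0], [0.0, 0.5])                               # image
l_e = mse([1.0, 0.0], [1.0, 0.0])                               # edges
l_a = binary_cross_entropy([1, 0], [0.5, 0.5])                  # affordances
l_c = categorical_cross_entropy([0, 1, 0], [0.25, 0.5, 0.25])   # cluster
loss = total_loss(l_i, l_a, l_e, l_c)
print(loss)
```

With equal raw losses, the affordance term contributes three times as much as each of the other terms, reflecting the increased weight on affordance reconstruction.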
Level Generation for Super Mario Bros.
In this section, we describe the difficulty in generating SMB
levels using an LSTM trained on the original tile embed-
dings and CTE, which motivated our novel two-step level
generation process described below.
Problems with SMB Level Generation: We train two
LSTM models, one on the original tile embeddings and the
other on our CTE representation, for SMB. We follow the
training process from (Jadhav and Guzdial 2021). Sampling
from an LSTM trained on a continuous representation is de-
terministic and hence for a given seed input, these models
generate only one output as shown in Figure 1(b) and (c)
respectively. In both cases we feed in the same 200 tiles
of flat ground as input. While the CTE representation helps