
skewed tile distributions. This allows us to approximate the
benefits of a discrete representation like the VGLC without
the cost of hand-processing training data. The main contri-
butions presented in this work are as follows:
• We introduce Cluster-based Tile Embeddings (CTE),
which differ from our original tile embeddings (Jadhav
and Guzdial 2021) by the incorporation of edge informa-
tion and a cluster-based loss.
• We present a novel two-step level generation pipeline
based on discretizing our new embedding representation.
• We compare the performance of our CTE representation
against both the original tile embeddings and the VGLC
representation on the task of level generation for games
with skewed tile distributions.
• We demonstrate our approach’s ability to generate levels
for two games that no prior PCGML approach has
attempted, Bugs Bunny Crazy Castle and Genghis Khan,
based solely on images of their levels.
Related Work
In this section we discuss prior work that has investigated
the role of clustering in game level design, as our approach
learns to discretize our tile embedding using clustering.
Clustering is an unsupervised machine learning technique
to discover groupings in data.
Guzdial and Riedl (2016b) employed clustering to help
learn probabilistic graphical models for Mario level design.
Snodgrass (2018) proposed an approach to automatically
identify sets of tiles, based on Markov random fields and
clustering. Similar to these approaches, we use clustering as
part of our representation learning. However, these previous
studies have based their clustering decisions solely on RGB
representations. In our presented work, along with the RGB
representation of a tile, we also incorporate behavioural and
edge information.
Yang, Sarkar, and Cooper (2020) employed a Variational
Autoencoder with a Gaussian mixture as a prior distribution
(GMVAE) for level generation. Their work relies on
clustering to identify similar 16 × 16 chunks from levels
of multiple games. The learned components of the Gaussian
Mixture Model are then used to generate new chunks of
the same style. Karth et al. (2021) proposed neurosymbolic
map generation using a VQ-VAE and Wave Function Collapse
(WFC). The VQ-VAE quantizes patches of level images
to a finite tileset, on which WFC is applied to generate levels.
While Karth et al. (2021) focused on discretizing large
patches of level design images, our work extracts representations
of individual 16 × 16 tiles, similar to Yang, Sarkar,
and Cooper (2020). Like both approaches, our work also uses
clustering for level generation. However, our approach differs
by learning continuous and discrete representations of
tiles and utilising both for level generation.
To the best of our knowledge, we are the first to tie clus-
tering and embeddings together for representation learning
in PCGML. However, this approach has been explored in
other fields like reinforcement learning for games. Liu et al.
(2020) introduced the shrinkage effect in training an encoder
for extracting representations of players in professional ice
hockey. It allows the model to transfer information between
the observations of different players such that statistically
similar players lead to similar representations under similar
game contexts. We draw a parallel to this work and implement
a clustering loss to enforce intrinsic clustering and assign
similar representations to tiles with similar RGB pixel
representations, affordances and edges.
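To make the idea of a clustering loss concrete, the following is a minimal sketch of one common formulation: the mean squared distance between each embedding and the centroid of its assigned cluster. The exact form of the loss is not given in this section, so this k-means-style penalty, the function name cluster_loss, and the toy 2-dimensional vectors are all assumptions for illustration only.

```python
def cluster_loss(embeddings, assignments, centroids):
    """Mean squared distance between each embedding and its assigned centroid.

    Assumed formulation: a k-means-style penalty, not necessarily the
    loss used by the authors.
    """
    total = 0.0
    for vec, cluster in zip(embeddings, assignments):
        centre = centroids[cluster]
        total += sum((v - c) ** 2 for v, c in zip(vec, centre))
    return total / len(embeddings)


# Toy 2-dimensional embeddings (the real embeddings are 256-dimensional).
embeddings = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
assignments = [0, 0, 1]                      # cluster index per embedding
centroids = [[1.0, 0.0], [10.0, 10.0]]       # one centroid per cluster
loss = cluster_loss(embeddings, assignments, centroids)
```

Adding a term of this kind to a reconstruction loss penalises embeddings that drift away from their cluster, which is one way to encourage similar tiles to receive similar representations.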
System Overview
The goal of this work is to learn an improved tile embed-
ding for games with skewed tile distributions for the task
of level generation. Towards this objective, we begin this
section by discussing our modifications to the original tile
embedding autoencoder to learn our new Cluster-based Tile
Embeddings (CTE). Next, we explain the limitations of an
LSTM level generator trained on the original tile embedding
representation for games with skewed tile distributions. We
then present our novel two-step level generation pipeline that
learns a discretization of our CTE through clustering and
leverages both representations for level generation.
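The discretization step of the pipeline can be sketched as follows, under stated assumptions. This section does not name the clustering algorithm, so plain k-means (Lloyd's algorithm) is used here as a stand-in, and the helper names, the deterministic initialisation, and the toy 2-dimensional embeddings are hypothetical. Only the mapping from continuous embeddings to discrete cluster ids is shown; the generation step that uses both representations is not.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means; initialised from the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(dim) / len(group) for dim in zip(*group)]
    return centroids


# Toy embeddings standing in for 256-dimensional tile embeddings.
embeddings = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2]]
centroids = kmeans(embeddings, k=2)

# Discrete representation: each tile is replaced by its cluster id.
tile_ids = [nearest(e, centroids) for e in embeddings]
```

Each cluster id then plays the role of a discrete tile type, much like a character in a VGLC level, while the centroid retains the continuous embedding it stands for.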
CTE: Cluster-based Tile Embeddings
The VGLC tile-based representation of a level L is an h × w
dimensional array. Here h and w are the height and width
of the level, respectively. Each character of L is called a tile
which is associated with a 16 × 16 pixel representation in a
level image and a corresponding set of affordances. Affor-
dances convey a tile’s mechanical behaviour.
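As an illustration of this representation, the sketch below parses a small text level into an h × w grid and computes the pixel region that each tile occupies in the level image. The level string, the characters used, and the affordance names are hypothetical examples rather than entries from the VGLC legend.

```python
TILE_SIZE = 16  # pixels per tile side, as stated in the text

# Hypothetical tile-to-affordance mapping, for illustration only.
AFFORDANCES = {
    "-": [],
    "X": ["solid"],
    "?": ["solid", "breakable"],
}


def parse_level(text):
    """Turn a text level into an h x w grid (a list of rows of characters)."""
    rows = [line for line in text.splitlines() if line]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    return [list(row) for row in rows]


def tile_pixel_box(row, col, tile_size=TILE_SIZE):
    """Return (top, left, bottom, right) pixel bounds of the tile at (row, col)."""
    top, left = row * tile_size, col * tile_size
    return top, left, top + tile_size, left + tile_size


level = parse_level("----\n-?--\nXXXX\n")
h, w = len(level), len(level[0])
image_size = (h * TILE_SIZE, w * TILE_SIZE)   # (height, width) in pixels
box = tile_pixel_box(1, 1)                    # region of the '?' tile
tile_affordances = AFFORDANCES[level[1][1]]
```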
Our original tile embedding work employed a dual
branched autoencoder to learn a 256-dimensional embed-
ding vector representation of a tile (Jadhav and Guzdial
2021). The network accepted two inputs: 1) a 3 × 3 grid of
tiles, with the candidate tile at the centre and its neighbours
surrounding it, each tile in its 16 × 16 × 3 RGB pixel
representation (48 × 48 × 3 in total),
2) the candidate tile’s 13-dimensional one-hot affordance
vector. To compare more easily to the original tile embed-
ding work, we utilise the same set of games (Super Mario
Bros., Kid Icarus, Megaman, Lode Runner and Legend of
Zelda) as our training corpus and maintain the same tile-
affordance mapping. The tile-based level data is taken from
the VGLC corpus¹ and the JSON files for tile-affordance
mapping are from the original tile embedding implementation².
We make two modifications to the training of the original
autoencoder to better handle level design tasks for games
with skewed tile distributions and refer to the newly ex-
tracted 256-dimensional embedding vector as the Cluster-
based Tile Embedding (CTE).
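The two network inputs described above can be sketched as follows. This is a minimal illustration, not the original implementation: the helper names are hypothetical, images are plain nested lists, and filling out-of-level pixels with a constant colour is an assumption, since the handling of border tiles is not described here.

```python
TILE = 16             # tile side in pixels (from the text)
CONTEXT = 3           # 3 x 3 tiles around the candidate tile
NUM_AFFORDANCES = 13  # length of the affordance vector (from the text)


def make_image(h_tiles, w_tiles, fill=(0, 0, 0)):
    """Build an H x W x 3 image as nested lists (stand-in for a level image)."""
    return [[list(fill) for _ in range(w_tiles * TILE)]
            for _ in range(h_tiles * TILE)]


def context_window(image, row, col, pad=(0, 0, 0)):
    """Crop the 48 x 48 x 3 neighbourhood centred on tile (row, col).

    Pixels outside the level are filled with `pad` (an assumption).
    """
    height, width = len(image), len(image[0])
    top, left = (row - 1) * TILE, (col - 1) * TILE
    size = CONTEXT * TILE
    window = []
    for y in range(top, top + size):
        line = []
        for x in range(left, left + size):
            inside = 0 <= y < height and 0 <= x < width
            line.append(list(image[y][x]) if inside else list(pad))
        window.append(line)
    return window


def affordance_vector(active, size=NUM_AFFORDANCES):
    """Binary vector with a 1 at every index listed in `active`."""
    return [1 if i in active else 0 for i in range(size)]


image = make_image(2, 2, fill=(92, 148, 252))  # 32 x 32 pixel toy level
window = context_window(image, 0, 0)           # candidate tile in the corner
affordances = affordance_vector({0})           # index 0 is a placeholder
```

For the corner tile in this toy level, the top-left part of the window is padding, the centre 16 × 16 block is the candidate tile, and the remaining blocks are whichever neighbours exist.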
Incorporating Edge Information: When applying the
original tile embeddings to games where the affordance in-
formation was unknown, we found that the latent space rep-
resentations depended predominantly on coloured pixel in-
formation of a tile. For instance, an empty blue sky tile was
placed close to a solid blue brick tile. To discourage this, we
incorporated edge information into our embedding. Canny edge
¹ https://github.com/TheVGLC/TheVGLC
² https://github.com/js-mrunal/tile_embeddings