FCT-GAN: Enhancing Table Synthesis via Fourier
Transform
Zilong Zhao∗, Robert Birke†, Lydia Y. Chen¶
∗TU Delft, Netherlands z.zhao-8@tudelft.nl
†ABB Research, Switzerland birke@ieee.org
¶TU Delft, Netherlands lydiaychen@ieee.org
Abstract—Synthetic tabular data emerges as an alternative
for sharing knowledge while adhering to restrictive data access
regulations, e.g., European General Data Protection Regulation
(GDPR). Mainstream state-of-the-art tabular data synthesiz-
ers draw methodologies from Generative Adversarial Networks
(GANs), which are composed of a generator and a discriminator.
While convolutional neural networks have been shown to be a better
architecture than fully connected networks for tabular data
synthesis, two key properties of tabular data are overlooked:
(i) the global correlation across columns, and (ii) the invariance
of synthesis to column permutations of the input data. To address the
above problems, we propose a Fourier conditional tabular gen-
erative adversarial network (FCT-GAN). We introduce feature
tokenization and Fourier networks to construct a transformer-
style generator and discriminator, and capture both local and
global dependencies across columns. The tokenizer captures
local spatial features and transforms original data into tokens.
Fourier networks transform tokens to the frequency domain and
multiply them element-wise by a learnable filter. Extensive evaluation
on benchmarks and real-world data shows that FCT-GAN can
synthesize tabular data with high machine learning utility (up to
27.8% better than state-of-the-art baselines) and high statistical
similarity to the original data (up to 26.5% better), while
maintaining the global correlation across columns, especially on
high-dimensional datasets.
I. INTRODUCTION
While data sharing is crucial for knowledge development,
privacy concerns and strict regulations (e.g., European General
Data Protection Regulation (GDPR)) limit its full effective-
ness. An emerging solution is to leverage synthetic data
generated by machine learning models. Synthetic data generation has been
powered by generative adversarial networks (GANs) [6] for
various types of data, e.g., images [9], text-to-image [19] and
tables [24].
Synthetic tabular data emerges as a prominent research
direction because of its ample application scenarios in areas
such as medicine [4] and finance [1]. Compared to image
data, one key difference of tabular data is that it is composed
of different types of columns such as continuous, categorical
or mixed variables. Therefore, GANs designed for image
synthesis cannot be directly applied to tabular data. Previous
works [24], [26], [27] propose feature engineering solutions
for different types of data, such as one-hot encoding for
categorical variables. One-hot encoding is shown [24] to better
recover the categorical variable distribution for tabular GANs
and capture inter-dependency across all the columns. However,
one-hot encoding inevitably increases the data dimensions.
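The dimension growth caused by one-hot encoding can be illustrated with a minimal sketch. The toy table, its column names, and the `one_hot` helper are hypothetical and only serve the illustration; they are not taken from the paper's pipeline.

```python
import numpy as np


def one_hot(column):
    """One-hot encode a sequence of categorical values (illustrative helper)."""
    categories = sorted(set(column))
    index = {value: i for i, value in enumerate(categories)}
    encoded = np.zeros((len(column), len(categories)), dtype=np.float32)
    encoded[np.arange(len(column)), [index[v] for v in column]] = 1.0
    return encoded, categories


# Hypothetical toy table: one continuous and two categorical columns.
age = np.array([23.0, 41.0, 35.0, 58.0], dtype=np.float32)
job = ["nurse", "clerk", "engineer", "clerk"]
city = ["Delft", "Zurich", "Delft", "Baden"]

job_encoded, job_categories = one_hot(job)
city_encoded, city_categories = one_hot(city)

# Three original columns become 1 + 3 + 3 encoded columns.
encoded_table = np.hstack([age[:, None], job_encoded, city_encoded])
print("original columns:", 3, "encoded columns:", encoded_table.shape[1])
```

Each categorical column contributes one encoded column per distinct category, so tables with many high-cardinality categorical columns grow quickly in width.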
High-dimensional data¹ is challenging for tabular GANs to
learn global relations from. Prior studies [24], [26], [27] show that
tabular GAN algorithms that adopt CNNs as generator
and discriminator achieve better synthesis quality than those using
purely fully-connected neural networks. This is because
CNNs extract local spatial features well. The first limitation
of directly adopting CNNs to model tabular data is that they
may overlook global relations between columns due to the limited size
of the convolution filter. This limitation is exacerbated when one-hot
encoding is applied to categorical variables. Secondly,
while permuting columns, e.g., reordering the columns by their
types, does not have any semantic meaning, the local feature
representation extracted by convolution layers is distorted. When
using CNN for tabular GANs, one table row is transformed
into one fixed-size image by mapping each column value to a
pixel. The relationship between highly distant pixels, e.g., the
pixel in the upper left corner and the pixel in the lower right
corner, may not influence image classification in a real image.
But for tabular data wrapped as an image, these two pixels
can represent highly correlated columns.
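The effect of wrapping a row into an image can be sketched as follows. The zero-padding to a square shape and the helper names `row_to_image` and `pixel_distance` are assumptions made for illustration; the text above only states that each column value is mapped to a pixel of a fixed-size image.

```python
import math

import numpy as np


def row_to_image(row):
    """Zero-pad a 1-D row to the next square length and reshape it to 2-D."""
    side = math.ceil(math.sqrt(len(row)))
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: len(row)] = row
    return padded.reshape(side, side)


def pixel_distance(position_a, position_b, side):
    """Chebyshev distance between two flat positions in a side x side grid."""
    row_a, col_a = divmod(position_a, side)
    row_b, col_b = divmod(position_b, side)
    return max(abs(row_a - row_b), abs(col_a - col_b))


row = np.arange(16, dtype=np.float32)  # 16 encoded column values, labelled 0..15
image = row_to_image(row)
side = image.shape[0]

# Columns 0 and 15 land in opposite corners of the 4 x 4 image.
# Two pixels share a 3 x 3 kernel window only if their distance is at most 2.
far_apart = pixel_distance(0, 15, side)

# A semantically meaningless permutation (swap columns 1 and 15) moves the
# same pair of columns next to each other.
order = list(range(16))
order[1], order[15] = order[15], order[1]
permuted_row = row[order]
new_position = int(np.where(permuted_row == 15)[0][0])
close_together = pixel_distance(0, new_position, side)

print("distance before permutation:", far_apart)
print("distance after permutation:", close_together)
```

Under these assumptions, whether a single convolution filter can see two correlated columns together depends only on the arbitrary column order, which is the distortion described above.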
To address the above two limitations, we propose a condi-
tional tabular GAN with Fourier Network blocks (FNBs). The
objective of the FNBs is to learn the interactions among spatial
locations in the frequency domain. We use FNBs for both
the discriminator and generator with different designs. The
Fourier layer, which is the key part of an FNB, contains three
operations: (i) 2D discrete Fourier transform, (ii) element-wise
multiplication between frequency-domain features and learn-
able weights and (iii) 2D inverse discrete Fourier transform.
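The three operations of the Fourier layer can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the class name, the single-channel 2D token map, the random (untrained) weights, and the use of the real-input transforms `rfft2`/`irfft2` to keep the output real-valued are all assumptions.

```python
import numpy as np


class FourierLayer:
    """Sketch of a Fourier layer: 2D DFT -> element-wise filter -> inverse 2D DFT."""

    def __init__(self, height, width, seed=0):
        rng = np.random.default_rng(seed)
        # One complex weight per retained frequency. rfft2 keeps only
        # width // 2 + 1 columns because the input is real-valued.
        shape = (height, width // 2 + 1)
        # Random stand-in for weights that would be learned during training.
        self.weights = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
        self.shape = (height, width)

    def __call__(self, tokens):
        # (i) 2D discrete Fourier transform of the token map.
        spectrum = np.fft.rfft2(tokens)
        # (ii) Element-wise multiplication with the frequency-domain weights.
        filtered = spectrum * self.weights
        # (iii) 2D inverse discrete Fourier transform back to the spatial domain.
        return np.fft.irfft2(filtered, s=self.shape)


layer = FourierLayer(height=4, width=6)
tokens = np.random.default_rng(1).standard_normal((4, 6))
output = layer(tokens)

# Perturb a single input position and count how many outputs change.
bumped = tokens.copy()
bumped[0, 0] += 1.0
changed = np.abs(layer(bumped) - output) > 1e-9
print(output.shape, int(changed.sum()), "of", tokens.size, "outputs changed")
```

Because every frequency coefficient depends on all spatial positions, a change at one position can affect outputs across the whole map, in contrast to a small convolution kernel whose reach is limited to its window.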
Furthermore, we process input data with transformer-style
tokenization. A CNN-based filter is applied to the original
data to capture local spatial features and transform them into
feature tokens. Fourier layers transform the tokens into the frequency
domain, where the learnable weights are applied to all
frequencies to learn the global relations. Our results show that
FCT-GAN outperforms the state of the art (SOTA) by up to 27.8%
in machine learning utility and 26.5% in statistical similarity
on 7 datasets. Thanks to the Fourier blocks' ability to capture local
and global relations, our results also show that, across three
different column orders, FCT-GAN has the least variation in
synthesis quality of all compared methods. The experiment, with
one high dimensional dataset, which 3 SOTA algorithms fail
¹ In this paper, dimension refers to the number of columns.
arXiv:2210.06239v1 [cs.LG] 12 Oct 2022