omics data, it is nearly impossible for clinicians to analyse multi-omics data. For this
reason, they tend to focus on the values of specific biomarkers. However, because a tumour is
heterogeneous and complex, multi-omics data analysis is vital to obtaining a complete picture
of it.
Modern machine learning algorithms, especially deep neural networks, have been shown to
work well with high-dimensional data. Deep learning has made massive progress in
tasks like object recognition, object detection and semantic segmentation in the visual domain.
It has also made strides in speech and natural language processing on tasks such as machine
translation, speech recognition and question answering. The algorithms developed for the tasks
mentioned above require processing high-dimensional inputs. In this work, we developed Self-
Supervised Learning (SSL) methods for multi-omics data to provide supervision to the model
from unlabelled data. We explored various SSL pretext tasks on top of the usual reconstruction
task with autoencoders. Among the SSL techniques we implemented are contrastive learning,
recovering data from corrupted versions of itself, and aligning representations across omics
modalities; a minimal sketch of the alignment objective follows below.
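To make the alignment objective concrete, the sketch below shows an InfoNCE-style contrastive loss between embeddings of two omics views of the same patients. It is an illustrative sketch, not our exact implementation; the modality names, function name, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(z_rna, z_methyl, temperature=0.1):
    """InfoNCE-style loss (a sketch; names and temperature are assumptions).
    z_rna, z_methyl: (batch, dim) embeddings of the same patients produced
    by two modality-specific encoders. Matching rows are positive pairs."""
    z_rna = F.normalize(z_rna, dim=1)
    z_methyl = F.normalize(z_methyl, dim=1)
    # Pairwise cosine similarities between the two views, scaled by temperature.
    logits = z_rna @ z_methyl.t() / temperature
    targets = torch.arange(z_rna.size(0), device=z_rna.device)
    # Symmetrise: classify the correct partner in both directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```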
The low-dimensional representations that our model produces from high-dimensional
multi-omics data can be considered "computational biomarkers". A model trained on large
datasets becomes adept at producing such biomarkers and can then be used to produce good
representations for smaller datasets. Furthermore, as the model learns from tumours diagnosed
early, it produces better representations for such tumours. Therefore, even if the dataset at
hand contains no samples of tumours sequenced early, the fact that the model was pre-trained
on a large dataset containing many such samples makes it better at early diagnosis.
2. Literature Review
Self-supervised learning (SSL) has been extensively applied to representation learning in
various domains such as natural language processing,4–6 audio, and images.7–9 These methods
mainly exploit spatial, semantic and temporal structural relationships in the data, through
novel pretext tasks, data augmentation methods and model architectures.
Because tabular data lack the relationships mentioned above, such methods can be less
effective. For instance, augmentation methods used on images, such as scaling and rotation,
cannot be applied directly to tabular data. For these reasons, SSL techniques have not been
explored enough on tabular data.10 One corruption scheme that does carry over to tabular
data is sketched below.
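As a concrete example, the following sketch corrupts a batch of tabular samples by masking random features and resampling them from each column's empirical marginal, in the spirit of the corruption used by VIME (discussed below). The masking probability, function name, and array layout are illustrative assumptions.

```python
import numpy as np

def corrupt_tabular(x, p_mask=0.3, rng=None):
    """Mask random entries of a (batch, n_features) matrix and replace them
    with values resampled from the same column (its empirical marginal), so
    corrupted entries remain plausible feature values. Returns the corrupted
    batch and the binary mask. A sketch; p_mask is an assumption."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) < p_mask
    # Shuffle each column independently to draw from its marginal distribution.
    shuffled = np.stack([rng.permutation(x[:, j]) for j in range(x.shape[1])],
                        axis=1)
    x_tilde = np.where(mask, shuffled, x)
    return x_tilde, mask.astype(np.float32)
```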
An autoencoder is a deep network that consists of an encoder and a decoder.11 The
encoder is trained to map the input to a latent representation, while the decoder is trained
to reconstruct the input from this latent representation. A popular work on images is the
denoising autoencoder (DAE).12 It is built on the hypothesis that partially destroyed inputs
should yield a latent representation similar to that of the original inputs. In that work, the
authors investigated an autoencoder's robustness to partial corruption of its inputs: the input
is corrupted and fed to the autoencoder, whose job is to recover the original "clean" input.
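The sketch below shows a minimal denoising autoencoder for tabular inputs, assuming additive Gaussian corruption and illustrative layer sizes (the original DAE applies masking noise to images, but the principle of reconstructing the clean input from a corrupted one is the same).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAutoencoder(nn.Module):
    """Encoder maps a (corrupted) input to a latent code; decoder
    reconstructs the clean input from that code. Layer sizes are
    illustrative assumptions, not the reference implementation."""
    def __init__(self, n_features, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_features))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def dae_loss(model, x_clean, noise_std=0.1):
    # Corrupt the input (Gaussian noise here), then reconstruct the clean one.
    x_noisy = x_clean + noise_std * torch.randn_like(x_clean)
    x_hat, _ = model(x_noisy)
    return F.mse_loss(x_hat, x_clean)
```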
A group of researchers developed VIME,10 a novel SSL framework for tabular data. They
proposed two pretext tasks, feature vector estimation and mask vector estimation. The former aims