
• a weighted min-max entropy loss for learning from the
unlabeled data, which gives more weight to the unlabeled
samples that are closer to the labeled samples
• a novel algorithm for sensor calibration that can leverage
even the unlabeled sensor data in the target domain. To
the best of our knowledge, semi-supervised deep learning
has not been used for sensor calibration before.
The data and code to implement the work described in
this paper are available at https://github.com/madhavlab/2022
SSDA airquality/.
II. RELATED WORKS
A. Domain Adaptation
Generally, a domain adaptation problem is characterized by a
source distribution $S(x, y)$ and a target distribution $T(x, y)$,
where $x$ is the input and $y$ is the corresponding label.
Such a problem typically exhibits covariate
shift, where the marginal distributions of the two
domains differ, i.e., $S(x) \neq T(x)$. To minimize this covariate
shift, several feature-based approaches have been proposed
in which the model learns a transformation of the feature space
to correct the domain shift between $S(x)$ and $T(x)$. In
[13], the authors propose a new adaptation layer in the model
that learns to remove the distributional discrepancy between
the two domains by minimizing a distributional distance, the
Maximum Mean Discrepancy (MMD). In [14], the authors propose the Deep
Adaptation Network (DAN), where the mean embeddings
of representations from different domains are matched in a
reproducing kernel Hilbert space (RKHS). In [15] and [16], the
authors propose a correlation-alignment-based loss to minimize
the domain discrepancy. In the past few years, there has been
extensive research on adversarial methods for domain
adaptation [17]–[22]. In [17], the authors propose learning
deep features which are discriminative for the task at hand,
whilst invariant with respect to the shift between domains.
In [19], the authors propose minimizing the hypothesis discrepancy
between multiple source domains and the target domain so that
the learned representations are domain invariant. The work
[20] extends the method of [17] with a two-stage
algorithm that first learns a source encoder and task hypothesis
from the labeled source data, and then learns a target encoder
through adversarial training. In [21], the authors propose learning
a new domain-invariant representation by minimizing the margin
discrepancy distance between the encoded source and target
domains. In [22], the model also learns domain-invariant features
through multilinear conditional adversarial training between the
feature extractor and the domain classifier. Other domain
adaptation methods [7], [23], [24] not only align feature
distributions from different domains, but also focus on
learning better class-discriminative boundaries. We adapt the idea
of [7] for the regression task presented in this paper.
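As a concrete illustration of the feature-alignment objectives mentioned above, the following minimal sketch computes two discrepancy measures between source and target feature samples: a biased estimate of the squared Maximum Mean Discrepancy, and a correlation-alignment (CORAL-style) distance between feature covariances. The Gaussian kernel, the `gamma` value, the function names, and the synthetic features are assumptions made for this example; they are not taken from [13]–[16].

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two feature samples."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

def coral_loss(source, target):
    """Squared Frobenius distance between feature covariances, scaled by 1/(4 d^2)."""
    d = source.shape[1]
    c_s = np.cov(source, rowvar=False)
    c_t = np.cov(target, rowvar=False)
    return np.sum((c_s - c_t) ** 2) / (4.0 * d * d)

# Synthetic features (illustrative only): the target sample is shifted and rescaled.
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(200, 2))
target_feats = 1.5 * rng.normal(size=(200, 2)) + 2.0

print("MMD^2 :", mmd2(source_feats, target_feats))
print("CORAL :", coral_loss(source_feats, target_feats))
```

In a feature-based method, a quantity of this kind would be added to the task loss so that the learned features of the two domains are pulled together during training.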
B. Semi-supervised Learning
Supervised machine learning has been highly successful in
various applications. However, it needs labeled data for training
the models. In many applications, unlabeled
data is plentiful but labeled data is not easy to
obtain. Semi-supervised learning methods [25]–[27] have been
developed to leverage the unlabeled data by learning good
representations from it and subsequently mapping it to the
target labels by learning from the labeled data. Contrastive
learning [28] is one of the most popular ways to learn from
unlabeled data. Semi-supervised learning is also used for
enhancing domain adaptation [7], [23], [29].
One of the problems with limited labeled data is class imbalance,
where not all classes are represented in a balanced
way in the labeled data. In supervised learning, several approaches
have been proposed to handle it [30], [31]. For semi-supervised
learning too, there are some approaches for handling class
imbalance [32], [33]. These approaches are mostly designed
for classification tasks. For regression tasks, the imbalance
problem becomes more challenging as there is no finite
number of classes in regression.
C. Sensor Calibration
Dense deployment of continuous ambient air quality monitoring
stations (CAAQMS) can provide highly reliable real-time
PM2.5 values, but is limited by the high cost of these
stations [10]. Small, portable low-cost sensor devices
(LCSD), in contrast, can be deployed densely, but at the expense
of the reliability of their measurements [11], [12]. This loss of
reliability in the values measured by low-cost sensors creates the
need to calibrate LCSD against CAAQMS [34].
Extensive research has been carried out on low-cost sensor
calibration in the past few years [35], [36]. In [37], the authors
propose a linear regressor and a Gaussian process regressor
for the calibration of low-cost PM2.5 sensors. The work [34]
finds a quadratic calibration model to be better than its linear
counterpart. Statistical methods, such as ARIMA-based models,
have also been suggested for calibrating low-cost sensors
[38]. The work [11] proposes a Mahalanobis-distance-based
weighted K-nearest neighbour algorithm with a learned metric
for calibration. Neural-network-based methods, such as fully
connected neural networks [39], [40], convolutional neural
networks (CNN) [41], and recurrent neural networks [42], have
also been found effective in achieving state-of-the-art
results for sensor calibration. However, none of these works leverages
domain adaptation for calibration.
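To make the comparison between linear and quadratic calibration models concrete, the sketch below fits both to synthetic data and reports the held-out error of each. The data-generating curve, noise level, train/test split, and helper names are assumptions made for this example; they are not measurements or code from [34] or [37].

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not real sensor measurements):
# the reference PM2.5 value is a mildly nonlinear, noisy function of the raw reading.
raw = rng.uniform(5.0, 150.0, size=500)  # low-cost sensor reading
reference = 2.0 + 0.8 * raw + 0.004 * raw ** 2 + rng.normal(0.0, 3.0, size=500)

# Hold out the last 100 samples for evaluation.
raw_train, raw_test = raw[:400], raw[400:]
ref_train, ref_test = reference[:400], reference[400:]

def fit_calibration(x, y, degree):
    """Least-squares polynomial mapping from raw readings to reference values."""
    return np.poly1d(np.polyfit(x, y, deg=degree))

def rmse(pred, truth):
    """Root-mean-square error between predictions and ground truth."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

linear_model = fit_calibration(raw_train, ref_train, degree=1)
quadratic_model = fit_calibration(raw_train, ref_train, degree=2)

rmse_linear = rmse(linear_model(raw_test), ref_test)
rmse_quadratic = rmse(quadratic_model(raw_test), ref_test)

print("linear RMSE   :", rmse_linear)
print("quadratic RMSE:", rmse_quadratic)
```

On data with curvature of this kind the quadratic fit has the lower held-out error. Whether that holds for a given sensor depends on its actual response, which this synthetic example does not model.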
Two recent works [43], [44] use domain
adaptation for sensor calibration. The work [43] applies a simple
fine-tuning-based domain adaptation technique for calibration. In
[44], the authors apply a model-agnostic meta-learning
technique for few-shot domain-adaptation-based calibration.
However, neither of these domain-adaptation-based calibration methods
leverages the unlabeled data from the target domain, which is
available in abundance.
III. PROPOSED METHOD
In semi-supervised domain adaptation, we are given labeled
data and their corresponding ground truth from the source
domain, $D^{s,l} = \{(x_i^{s,l}, y_i^{s,l})\}_{i=1}^{m_s}$. From the target domain, we
are given a few data-label pairs, $D^{t,l} = \{(x_i^{t,l}, y_i^{t,l})\}_{i=1}^{m_t}$, and
a large amount of unlabeled data, $D^{t,u} = \{(x_i^{t,u})\}_{i=1}^{m_u}$, where
$m_s \gg m_t$.
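The three data sets in this setup can be pictured as plain arrays. The sketch below builds synthetic placeholders for the labeled source set, the small labeled target set, and the large unlabeled target set, and draws one mini-batch from each. The sizes, feature count, variable names, and sampling scheme are all assumptions made for illustration, not details of the proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and feature count, chosen only so that m_s >> m_t.
m_s, m_t, m_u, n_features = 5000, 50, 2000, 4

# Source domain: labeled inputs with ground truth.
x_sl = rng.normal(size=(m_s, n_features))
y_sl = rng.normal(size=m_s)

# Target domain: a few labeled pairs (inputs shifted to mimic a domain gap).
x_tl = rng.normal(loc=0.5, size=(m_t, n_features))
y_tl = rng.normal(size=m_t)

# Target domain: many unlabeled inputs; no labels exist for these.
x_tu = rng.normal(loc=0.5, size=(m_u, n_features))

def sample_batch(rng, batch_size=32):
    """Draw one mini-batch from each of the three sets (scheme is an assumption)."""
    i_s = rng.choice(m_s, size=batch_size, replace=False)
    i_t = rng.choice(m_t, size=batch_size, replace=False)
    i_u = rng.choice(m_u, size=batch_size, replace=False)
    return (x_sl[i_s], y_sl[i_s]), (x_tl[i_t], y_tl[i_t]), x_tu[i_u]

(src_x, src_y), (tgt_x, tgt_y), unl_x = sample_batch(rng)
print(src_x.shape, src_y.shape, tgt_x.shape, tgt_y.shape, unl_x.shape)
```

Only the first two batches carry labels, so a supervised loss can be computed on them directly, while the third can contribute only through a label-free objective.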