Leveraging unsupervised data and domain
adaptation for deep regression in low-cost sensor
calibration
Swapnil Dey, Vipul Arora, Sachchida Nand Tripathi
Abstract—Air quality monitoring is becoming an essential task
with rising awareness about air quality. Low-cost air quality
sensors are easy to deploy but are not as reliable as the costly
and bulky reference monitors. The low-cost sensors can be
calibrated against the reference monitors with the help of deep
learning. In this paper, we cast the task of sensor calibration
as a semi-supervised domain adaptation problem and propose a
novel solution for it. The problem is challenging because it
is a regression problem with covariate shift and label gap. We use
histogram loss instead of the mean squared or mean absolute error
commonly used for regression, and find it useful against
covariate shift. To handle the label gap, we propose weighting
of samples for adversarial entropy optimization. In experimental
evaluations, the proposed scheme outperforms many competitive
baselines, which are based on semi-supervised and supervised
domain adaptation, in terms of $R^2$ score and mean absolute error.
Ablation studies show the relevance of each proposed component
in the entire scheme.
Index Terms—air quality monitoring, regression, sensor calibration,
semi-supervised domain adaptation, unsupervised learning.
I. INTRODUCTION
Deep learning based models achieve remarkable performance
with labeled data. However, many practical scenarios
face a scarcity of labeled data alongside an abundance of
unlabeled data. Semi-supervised approaches have been developed
to make use of the large amounts of unlabeled data in such scenarios.
Another challenge faced in real-world tasks is the mismatch
in the data distributions across domains. Models trained
in one domain, called the source domain, are unable to generalize
well to another, called the target domain. Domain adaptation
[1] based approaches come to the rescue here. While there have
been several works on semi-supervised domain adaptation for
classification [2]–[6], only a limited number of works address
this challenge for regression. In this paper, we focus on a
regression problem where a large amount of labeled data is
available in the source domain, while only very limited labeled
data, along with a large amount of unlabeled data, is available
in the target domain. We apply our approach to the calibration of
low-cost PM$_{2.5}$ sensors for air quality monitoring.
Swapnil Dey and Vipul Arora are with the Department of Electrical
Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India
(e-mail: swapon@iitk.ac.in, vipular@iitk.ac.in).
Sachchida Nand Tripathi is with the Department of Civil Engineering,
Indian Institute of Technology Kanpur, and the Centre for Environmental
Science and Engineering, Indian Institute of Technology Kanpur, Kanpur
208016, India (e-mail: snt@iitk.ac.in).
Equal contribution
Saito et al. [7] use a Minimax Entropy (MME) based approach
for semi-supervised domain adaptation in classification problems.
The goal there is not just to align the feature distribution
in the target domain with that in the source domain, but also
to learn better class-discriminative boundaries in the target
domain. They achieve this by minimizing the entropy with
respect to the feature extractor and maximizing it with
respect to the classifier. In this paper, we extend their method
to regression.
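The quantity at the center of this min-max game is the entropy of the predicted class distribution on unlabeled target samples. The following minimal sketch, written with the Python standard library only, shows how that entropy behaves for a confident and an uncertain prediction. The logits are hypothetical values chosen for illustration, and the adversarial training loop itself is not shown.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy H(p) = -sum_k p_k log p_k, using the natural log."""
    return -sum(p * math.log(p + eps) for p in probs)

# Hypothetical predictor outputs for two unlabeled target samples.
confident = softmax([8.0, 0.5, 0.1, 0.0])  # mass concentrated on one class
uncertain = softmax([1.0, 1.0, 1.0, 1.0])  # uniform over four classes

h_confident = entropy(confident)
h_uncertain = entropy(uncertain)  # equals log(4) for a uniform distribution
```

As described above, the feature extractor is updated to decrease this entropy while the classifier is updated to increase it.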
Since regression does not involve the discrete classes that are
needed for the MME approach, we convert the continuous
target into a probability mass function by binning. Apart from
allowing semi-supervised domain adaptation, histogram
based regression brings other benefits too. Imani and White
[8] find that a histogram based loss function improves regression
performance by regularizing the learning. In this paper, we
adapt the MME based approach to regression problems.
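As an illustration of the binning idea, the sketch below turns a continuous target into a probability mass function over equal-width bins, scores a predicted distribution with cross-entropy, and recovers a point estimate as the expected bin center. The one-hot target, the equal-width bins, the range of 0 to 100 and all function names are assumptions made for this example; the binning scheme actually used in the paper may differ.

```python
import math

def make_bin_edges(y_min, y_max, num_bins):
    """Equal-width bin edges covering [y_min, y_max] (assumed layout)."""
    width = (y_max - y_min) / num_bins
    return [y_min + k * width for k in range(num_bins + 1)]

def to_histogram_target(y, edges):
    """One-hot pmf that puts all mass on the bin containing y."""
    num_bins = len(edges) - 1
    width = edges[1] - edges[0]
    k = int((y - edges[0]) // width)
    k = min(max(k, 0), num_bins - 1)  # clamp values outside the range
    return [1.0 if j == k else 0.0 for j in range(num_bins)]

def histogram_loss(target_pmf, predicted_pmf, eps=1e-12):
    """Cross-entropy between the target pmf and the predicted pmf."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_pmf, predicted_pmf))

def expected_value(predicted_pmf, edges):
    """Point estimate: probability-weighted average of the bin centers."""
    centers = [(edges[k] + edges[k + 1]) / 2 for k in range(len(edges) - 1)]
    return sum(p * c for p, c in zip(predicted_pmf, centers))

edges = make_bin_edges(0.0, 100.0, 5)       # five bins of width 20
target = to_histogram_target(37.0, edges)   # 37 falls in the bin [20, 40)
predicted = [0.1, 0.6, 0.2, 0.05, 0.05]     # hypothetical model output
loss = histogram_loss(target, predicted)
estimate = expected_value(predicted, edges)
```

Because the model now outputs a distribution over bins, entropy-based objectives designed for classification can be applied to it directly.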
The label set of the large number of source and the limited number
of target domain labeled inputs could differ from that of the
unlabeled target domain inputs. We liken this to having
novel classes in the unlabeled target domain inputs that are not seen
in the labeled source and target domain inputs. These novel
classes could have a negative effect on learning. Peng et
al. [9] propose a distance based weighting framework for
semi-supervised learning in classification. We adapt the same
framework for domain adaptation in the regression task at
hand.
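To make the idea of distance based weighting concrete, the sketch below assigns each unlabeled feature vector a weight that decays with its distance to the nearest labeled feature vector, and then uses those weights to average per-sample entropies. The exponential form, the `temperature` parameter, the Euclidean metric and all names are hypothetical choices for illustration; they are not taken from [9] or from the method proposed here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def proximity_weights(unlabeled, labeled, temperature=1.0):
    """Weight each unlabeled feature by closeness to its nearest labeled one.

    weight = exp(-d_min / temperature) is an assumed form, used here only
    to show that nearby samples receive weights close to 1 and distant
    samples receive weights close to 0.
    """
    weights = []
    for u in unlabeled:
        d_min = min(euclidean(u, l) for l in labeled)
        weights.append(math.exp(-d_min / temperature))
    return weights

def weighted_mean(values, weights):
    """Weighted average of per-sample values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

labeled_feats = [[0.0, 0.0], [1.0, 0.0]]
unlabeled_feats = [[0.1, 0.0], [5.0, 5.0]]  # one near, one far
weights = proximity_weights(unlabeled_feats, labeled_feats)

per_sample_entropy = [0.2, 1.3]             # hypothetical entropies
weighted_entropy = weighted_mean(per_sample_entropy, weights)
```

With such weights, the distant sample contributes very little to the entropy term, which limits the influence of unlabeled samples whose labels are likely to fall outside the labeled range.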
Air pollution monitoring is generally done using costly
high-fidelity monitors [10]. These days, with increased awareness
about air quality, the need for low-cost solutions is
being felt. Low-cost sensor devices (LCSD) are affordable
and portable but offer less reliable measurements [11], [12].
Deep calibration methods have been found to be effective in
improving the precision and robustness of LCSD. The
supervision for these deep calibration models comes from co-locating
the LCSD with the costly reference monitors, which
is cumbersome. Hence, labeled data is scarce,
while unlabeled data is abundant. Various
factors, such as environmental conditions, geographical
location and sensor characteristics, introduce a mismatch between
domains and make domain adaptation a necessity. We apply
the proposed semi-supervised domain adaptation method to
the sensor calibration problem.
The main contributions of this paper are:
• a novel semi-supervised domain adaptation algorithm for
regression tasks
• a weighted min-max entropy loss for learning from the
unlabeled data by giving more weight to the unlabeled
samples that are closer to the labeled samples
• a novel algorithm for sensor calibration that can leverage
even the unlabeled sensor data at the target domain. To
the best of our knowledge, semi-supervised deep learning
has not been used for sensor calibration before.
arXiv:2210.00521v1 [cs.LG] 2 Oct 2022
The data and code to implement the work described in
this paper are available at
https://github.com/madhavlab/2022_SSDA_airquality/.
II. RELATED WORKS
A. Domain Adaptation
Generally, a domain adaptation problem is characterized by a
source distribution, $S(x, y)$, and a target distribution, $T(x, y)$,
where $x$ is the input and $y$ is the corresponding label.
It is further characterized by covariate
shift, where the marginal distributions of the two
domains differ, i.e., $S(x) \neq T(x)$. To minimize this covariate
shift, several feature-based approaches have been proposed
where the model learns a transformation of the feature space
to correct the domain shift between $S(x)$ and $T(x)$. In
[13], the authors propose a new adaptation layer in the model
that learns to forget the distributional discrepancy between
the two domains by minimizing a distributional distance, the
Maximum Mean Discrepancy (MMD). In [14], the authors propose the Deep
Adaptation Network (DAN), where the mean embeddings
of representations from different domains are matched in a
reproducing kernel Hilbert space (RKHS). In [15] and [16], the
authors propose a correlation alignment based loss to minimize
the domain discrepancy. In the past few years, there has been
extensive research on adversarial methods for domain
adaptation [17]–[22]. In [17], the authors propose learning
deep features that are discriminative for the task at hand
while being invariant to the shift between domains.
In [19], the authors propose minimizing the hypothesis discrepancy
between multiple source domains and the target domain so that
the learned representations are domain invariant. The work
[20] extends the method of [17] by proposing a two-stage
algorithm that first learns a source encoder and task hypothesis
from the labeled source data, and later learns a target encoder
through adversarial training. In [21], the authors suggest learning
a new domain-invariant representation by minimizing the margin
discrepancy distance between the encoded source and target
domains. In [22], the model also learns domain-invariant features
through multi-linear conditional adversarial training between the
feature extractor and the domain classifier. There are other domain
adaptation methods [7], [23], [24] which not only align feature
distributions from different domains, but also focus on
learning better class-discriminative boundaries. We adapt the idea
of [7] for the regression task presented in this paper.
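For readers unfamiliar with the Maximum Mean Discrepancy mentioned above, the following sketch computes its simplest form: the squared distance between the mean feature vectors of the source and target samples, which corresponds to a linear kernel. The methods in [13] and [14] operate in a reproducing kernel Hilbert space with richer kernels, so this is a simplified stand-in and not their implementation. The feature values are made up for illustration.

```python
def mean_embedding(features):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]

def linear_mmd_squared(source, target):
    """Squared MMD under a linear kernel: ||mean(source) - mean(target)||^2."""
    mu_s = mean_embedding(source)
    mu_t = mean_embedding(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

# Hypothetical 2-D features from the two domains.
source_feats = [[0.0, 1.0], [2.0, 3.0]]
target_feats = [[1.0, 2.0], [3.0, 6.0]]

gap = linear_mmd_squared(source_feats, target_feats)
same = linear_mmd_squared(source_feats, source_feats)  # zero for identical sets
```

A feature extractor trained to reduce such a gap is pushed toward representations whose statistics match across the two domains.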
B. Semi-supervised Learning
Supervised machine learning has had remarkable success in
various applications. However, it needs labeled data for training
the models. There are many applications where unlabeled
data is plentiful but labeled data is not easy to
obtain. Semi-supervised learning methods [25]–[27] have been
developed to leverage the unlabeled data by learning good
representations from it and subsequently mapping them to the
target labels by learning from the labeled data. Contrastive
learning [28] is one of the most popular ways to learn from
unlabeled data. Semi-supervised learning is also used for
enhancing domain adaptation [7], [23], [29].
One of the problems with limited labeled data is class imbalance,
where not all classes may be represented in a balanced
way in the labeled data. In supervised learning, several approaches
have been proposed to address it [30], [31]. For semi-supervised
learning too, there are some approaches for handling class
imbalance [32], [33]. These approaches are mostly designed
for classification tasks. For regression tasks, the imbalance
problem becomes more challenging as there is no finite
number of classes in regression.
C. Sensor Calibration
Dense deployment of continuous ambient air quality monitoring
stations (CAAQMS) can provide highly reliable real-time
PM$_{2.5}$ values, but is held back by their high
cost [10]. In contrast, small, portable low-cost sensor devices
(LCSD) can be deployed densely, at the expense of the reliability
of their measurements [11], [12]. This compromise in the reliability
of the values measured by low-cost sensors raises the need for
calibration of LCSD against CAAQMS [34].
Extensive research has been carried out on low-cost sensor
calibration in the past few years [35], [36]. In [37], the authors
propose a linear regressor and a Gaussian process regressor
for the calibration of low-cost PM$_{2.5}$ sensors. The work [34]
finds a quadratic calibration model to be better than its linear
counterpart. Statistical methods, such as ARIMA based models,
have also been suggested for calibrating low-cost sensors
[38]. The work [11] proposes a Mahalanobis distance based
weighted K-nearest neighbour algorithm with a learned metric
for calibration. Neural network based methods such as fully
connected neural networks [39], [40], convolutional neural
networks (CNN) [41], and recurrent neural networks [42] have
also been found to be effective in achieving state-of-the-art
results for sensor calibration. But none of these works leverages
domain adaptation for calibration.
There are two recent works [43], [44] which involve domain
adaptation for sensor calibration. The work [43] applies a simple
fine-tuning based domain adaptation technique for calibration. In
[44], the authors propose applying a model-agnostic meta-learning
technique for few-shot domain adaptation based calibration.
But neither of these domain adaptation based calibration methods
leverages the unlabeled data from the target domain that is
available in abundance.
III. PROPOSED METHOD
In semi-supervised domain adaptation, we are given labeled
data and their corresponding ground truth from the source
domain, $D_{s,l} = \{(x_i^{s,l}, y_i^{s,l})\}_{i=1}^{m_s}$. From the target domain, we
are given a few data-label pairs, $D_{t,l} = \{(x_i^{t,l}, y_i^{t,l})\}_{i=1}^{m_t}$, and
a large amount of unlabeled data, $D_{t,u} = \{(x_i^{t,u})\}_{i=1}^{m_u}$, where
$m_s \gg m_t$.