Leveraging unsupervised data and domain adaptation for deep regression in low-cost sensor calibration

Swapnil Dey∗, Vipul Arora∗, Sachchida Nand Tripathi

Abstract—Air quality monitoring is becoming an essential task with rising awareness about air quality. Low-cost air quality sensors are easy to deploy but are not as reliable as the costly and bulky reference monitors. The low-quality sensors can be calibrated against the reference monitors with the help of deep learning. In this paper, we translate the task of sensor calibration into a semi-supervised domain adaptation problem and propose a novel solution to it. The problem is challenging because it is a regression problem with covariate shift and label gap. We use histogram loss instead of mean squared or mean absolute error, which are commonly used for regression, and find it useful against covariate shift. To handle the label gap, we propose weighting of samples for adversarial entropy optimization. In experimental evaluations, the proposed scheme outperforms many competitive baselines, which are based on semi-supervised and supervised domain adaptation, in terms of R² score and mean absolute error. Ablation studies show the relevance of each proposed component in the entire scheme.

Index Terms—air quality monitoring, regression, sensor calibration, semi-supervised domain adaptation, unsupervised learning.

I. INTRODUCTION

Deep learning based models achieve remarkable performance with labeled data. However, many practical scenarios face a scarcity of labeled data, while there is an abundance of unlabeled data. Semi-supervised approaches have been developed to make use of the large amount of unlabeled data in those scenarios. Another challenge faced in real-world tasks is the mismatch in the data distributions across domains. Models trained in one domain, called the source domain, are unable to generalize well to the other, called the target domain. Domain adaptation [1] based approaches come to the rescue here. While there have been several works on semi-supervised domain adaptation for classification [2]–[6], only a limited number of works address this challenge for regression. In this paper, we focus on a regression problem where a large amount of labeled data is available in the source domain, while only a very limited amount of labeled data, along with a large amount of unlabeled data, is available in the target domain. We apply our approach to the calibration of low-cost PM2.5 sensors for air quality monitoring.

Swapnil Dey and Vipul Arora are with the Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India (e-mail: swapon@iitk.ac.in, vipular@iitk.ac.in).

Sachchida Nand Tripathi is with the Department of Civil Engineering, Indian Institute of Technology Kanpur, and the Centre for Environmental Science and Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India (e-mail: snt@iitk.ac.in).

∗Equal contribution

Saito et al. [7] use a Minimax Entropy (MME) based approach for semi-supervised domain adaptation in classification problems. The goal there is not just to align the feature distribution in the target domain with that in the source domain, but also to learn better class-discriminative boundaries in the target domain. They achieve this by minimizing the entropy of the predictions on unlabeled target data with respect to the feature extractor and maximizing it with respect to the classifier. In this paper, we extend their method to regression.
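The opposing roles of the two entropy objectives can be sketched in plain Python. This is only an illustration under stated assumptions: the names `softmax`, `entropy`, and `minimax_objectives` and the trade-off weight `lam` are hypothetical, the supervised loss term is omitted, and a real implementation would obtain gradients from a deep learning framework rather than from these scalar values.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy H(p) = -sum_k p_k log p_k (natural log)."""
    return -sum(p * math.log(p + eps) for p in probs)

def minimax_objectives(unlabeled_logits, lam=0.1):
    """Return (feature_extractor_loss, classifier_loss) for unlabeled inputs.

    Both values are losses to be minimized. The feature extractor
    minimizes the mean prediction entropy; the classifier minimizes
    its negative, which is the same as maximizing the entropy.
    `lam` is an assumed trade-off weight.
    """
    h = sum(entropy(softmax(z)) for z in unlabeled_logits) / len(unlabeled_logits)
    return lam * h, -lam * h
```

The two returned values always sum to zero, which makes the adversarial nature of the objective explicit: any change that lowers one loss raises the other.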

Since regression does not involve the discrete classes that the MME approach needs, we convert the continuous target into a probability mass function by binning. Apart from allowing semi-supervised domain adaptation, histogram-based regression brings other benefits too. Imani and White [8] find that a histogram-based loss function improves regression performance by regularizing the learning. In this paper, we amend the MME-based approach for regression problems.
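A minimal sketch of the binning step follows. Everything specific in it is an assumption made for illustration: the range `[lo, hi]`, the number of bins, the optional Gaussian smoothing width `sigma`, and the function names are hypothetical, and the text above does not say whether the target mass is placed in one bin or spread over neighbouring bins. The sketch supports both and recovers a scalar prediction as the mean over bin centres.

```python
import math

def target_to_pmf(y, lo, hi, n_bins, sigma=0.0):
    """Turn a scalar target y into a PMF over n_bins equal-width bins on [lo, hi].

    sigma == 0 gives a one-hot PMF; sigma > 0 spreads the mass with a
    Gaussian centred at y, truncated to [lo, hi] and renormalized.
    """
    y = min(max(y, lo), hi)  # clip targets that fall outside the range
    width = (hi - lo) / n_bins
    if sigma <= 0.0:
        k = min(int((y - lo) / width), n_bins - 1)
        return [1.0 if j == k else 0.0 for j in range(n_bins)]
    edges = [lo + j * width for j in range(n_bins + 1)]
    # Gaussian CDF evaluated at each bin edge
    cdf = [0.5 * (1.0 + math.erf((e - y) / (sigma * math.sqrt(2.0)))) for e in edges]
    mass = [cdf[j + 1] - cdf[j] for j in range(n_bins)]
    total = sum(mass)
    return [m / total for m in mass]

def pmf_mean(pmf, lo, hi):
    """Scalar prediction: expected value of the bin centres under the PMF."""
    width = (hi - lo) / len(pmf)
    return sum(p * (lo + (j + 0.5) * width) for j, p in enumerate(pmf))

def histogram_loss(target_pmf, predicted_pmf, eps=1e-12):
    """Cross-entropy between the target PMF and the predicted PMF."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_pmf, predicted_pmf))
```

With a PMF in place of a scalar output, the network's prediction has a well-defined entropy, which is what an entropy-based adaptation objective requires.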

The label set covered by the large number of labeled source-domain inputs and the limited number of labeled target-domain inputs could differ from that of the unlabeled target-domain inputs. We liken this to having novel classes in the unlabeled target-domain inputs that are not seen in the labeled source- and target-domain inputs. These novel classes could have a negative effect on learning. Peng et al. [9] propose a distance-based weighting framework for semi-supervised learning in classification. We adapt the same framework for domain adaptation in the regression task at hand.
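The idea of distance-based weighting can be illustrated with a small sketch. The specific form used here is an assumption, not the formulation of [9] or of this paper: each unlabeled sample is weighted by an exponential decay of its Euclidean distance to the nearest labeled sample, with a hypothetical `temperature` parameter, so that unlabeled samples far from any labeled one contribute less to a loss computed on unlabeled data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def proximity_weights(unlabeled_feats, labeled_feats, temperature=1.0):
    """Weight in (0, 1] per unlabeled sample.

    Assumed form: exp(-d_min / temperature), where d_min is the distance
    to the nearest labeled sample in feature space.
    """
    weights = []
    for u in unlabeled_feats:
        d_min = min(euclidean(u, l) for l in labeled_feats)
        weights.append(math.exp(-d_min / temperature))
    return weights

def weighted_mean(values, weights):
    """Weighted average of per-sample values, e.g. per-sample entropies."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Passing per-sample prediction entropies as `values` would give a weighted entropy term in which samples near the labeled data dominate.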

Air pollution monitoring is generally done using costly high-fidelity monitors [10]. These days, with increased awareness about air quality, the need for low-cost solutions is being felt. Low-cost sensor devices (LCSD) are affordable and portable but offer less reliable measurements [11], [12]. Deep calibration methods have been found to be effective in improving the precision and robustness of LCSD. The supervision for these deep calibration models comes from co-locating the LCSD with the costly reference monitors, which is cumbersome. Hence, labeled data is scarce while unlabeled data is abundant. Various factors, such as the environmental conditions, geographical location and sensor characteristics, bring about a mismatch between domains and make domain adaptation a necessity. We apply the proposed semi-supervised domain adaptation method to the sensor calibration problem.

The main contributions of this paper are:

• a novel semi-supervised domain adaptation algorithm for regression tasks
• a weighted min-max entropy loss for learning from the unlabeled data by giving more weight to the unlabeled samples that are closer to the labeled samples
• a novel algorithm for sensor calibration that can leverage even the unlabeled sensor data in the target domain. To the best of our knowledge, semi-supervised deep learning has not been used for sensor calibration before.

arXiv:2210.00521v1 [cs.LG] 2 Oct 2022

The data and the code to implement the work described in this paper are available at https://github.com/madhavlab/2022 SSDA airquality/.

II. RELATED WORKS

A. Domain Adaptation

Generally, a domain adaptation problem is characterized by a source distribution $S(x, y)$ and a target distribution $T(x, y)$, where $x$ is the input and $y$ is the corresponding label. It exhibits covariate shift, where the marginal distributions of the two domains differ, i.e., $S(x) \neq T(x)$. To minimize this covariate shift, several feature-based approaches have been proposed where the model learns a transformation of the feature space to correct the domain shift between $S(x)$ and $T(x)$. In [13], the authors propose a new adaptation layer in the model that learns to forget the distributional discrepancy between the two domains by minimizing a distributional distance (Maximum Mean Discrepancy). In [14], the authors propose the Deep Adaptation Network (DAN), where the mean embeddings of representations from different domains are matched in a reproducing kernel Hilbert space (RKHS). In [15] and [16], the authors propose a correlation-alignment-based loss to minimize the domain discrepancy. In the past few years, there has been extensive research on adversarial methods for domain adaptation [17]–[22]. In [17], the authors propose learning deep features that are discriminative for the task at hand while invariant to the shift between domains. In [19], the authors propose minimizing the hypothesis discrepancy between multiple source domains and the target domain to make the learned representations domain invariant. The work [20] extends the method of [17] by proposing a two-stage algorithm that first learns a source encoder and task hypothesis from the labeled source data, and later learns a target encoder through adversarial training. In [21], the authors suggest learning a new domain-invariant representation by minimizing the margin discrepancy distance between the encoded source and target domains. In [22], the model also learns domain-invariant features through multi-linear conditional adversarial training between the feature extractor and the domain classifier. There are other domain adaptation methods [7], [23], [24] that not only align feature distributions from different domains, but also aim to learn better class-discriminative boundaries. We adapt the idea of [7] for the regression task presented in this paper.
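To make the Maximum Mean Discrepancy mentioned for [13] concrete, the following sketch computes the standard biased estimate of the squared MMD between two sets of feature vectors. The RBF kernel and its `bandwidth` are assumptions chosen for illustration; the kernels and the network layers on which [13] and [14] evaluate the discrepancy are not specified in this section.

```python
import math

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between source and target samples.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]
    """
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))
```

The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is why it can serve as a training penalty that pulls source and target features together.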

B. Semi-supervised Learning

Supervised machine learning has had remarkable success in various applications. However, it needs labeled data for training the models. There are many applications where unlabeled data is plentiful but labeled data is not easy to obtain. Semi-supervised learning methods [25]–[27] have been developed to leverage the unlabeled data by learning good representations from it and subsequently mapping them to the target labels by learning from the labeled data. Contrastive learning [28] is one of the most popular ways to learn from unlabeled data. Semi-supervised learning is also used for enhancing domain adaptation [7], [23], [29].

One of the problems with limited labeled data is class imbalance, where not all classes are represented in a balanced way in the labeled data. In supervised learning, several approaches have been proposed to address it [30], [31]. For semi-supervised learning too, there are some approaches for handling class imbalance [32], [33]. These approaches are mostly designed for classification tasks. For regression tasks, the imbalance problem becomes more challenging, as there is no finite number of classes in regression.

C. Sensor Calibration

Dense deployment of continuous ambient air quality monitoring stations (CAAQMS) can provide highly reliable real-time PM2.5 values, but at a high cost [10]. In contrast, small, portable low-cost sensor devices (LCSD) can be deployed densely, at the expense of the reliability of their measurements [11], [12]. This loss of reliability raises the need to calibrate LCSD against CAAQMS [34].

Extensive research has been carried out on low-cost sensor calibration over the past few years [35], [36]. The authors of [37] propose a linear regressor and a Gaussian process regressor for the calibration of low-cost PM2.5 sensors. The work [34] finds a quadratic calibration model to be better than its linear counterpart. Statistical methods, such as ARIMA-based models, have also been suggested for calibrating low-cost sensors [38]. The work [11] proposes a Mahalanobis-distance-based weighted K-nearest neighbour algorithm with a learned metric for calibration. Neural network based methods, such as fully connected neural networks [39], [40], convolutional neural networks (CNN) [41], and recurrent neural networks [42], have also been found to be effective in achieving state-of-the-art results for sensor calibration. But none of these works leverage domain adaptation for calibration.
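The simplest of the calibration models listed above, a linear regressor fitted on co-located readings, can be sketched as follows. This is a generic ordinary least squares fit for illustration, not the implementation of [37]; the function names are hypothetical, and any readings passed to it in an example are synthetic rather than measured data.

```python
def fit_linear_calibration(raw, reference):
    """Ordinary least squares fit of reference = slope * raw + intercept.

    `raw` holds low-cost sensor readings and `reference` the co-located
    reference monitor readings taken at the same times.
    """
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def calibrate(raw_value, slope, intercept):
    """Map one raw sensor reading to a calibrated value."""
    return slope * raw_value + intercept

def mean_absolute_error(predicted, truth):
    """Mean absolute error between calibrated and reference values."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)
```

Such a fit needs co-located pairs from the deployment site, which is exactly the labeled data that is scarce in the target domain.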

There are two recent works [43], [44] that involve domain adaptation for sensor calibration. The work [43] applies a simple fine-tuning-based domain adaptation technique for calibration. In [44], the authors propose applying a model-agnostic meta-learning technique for few-shot domain adaptation based calibration. But neither of these domain adaptation based calibration methods leverages the unlabeled data from the target domain that is available in abundance.

III. PROPOSED METHOD

In semi-supervised domain adaptation, we are given labeled data and their corresponding ground truth from the source domain $D_{s,l} = \{(x_i^{s,l}, y_i^{s,l})\}_{i=1}^{m_s}$. From the target domain, we are given a few data-label pairs $D_{t,l} = \{(x_i^{t,l}, y_i^{t,l})\}_{i=1}^{m_t}$ and a large amount of unlabeled data $D_{t,u} = \{(x_i^{t,u})\}_{i=1}^{m_u}$, where $m_s \gg m_t$.
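The three data sets in this setup can be held in a small container such as the one below. The class name `SSDADataset`, its field names, and the representation of an input as a list of floats are all hypothetical choices made for illustration; they are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Features = List[float]  # one input x, e.g. a vector of raw sensor readings

@dataclass
class SSDADataset:
    """Container for the semi-supervised domain adaptation setting."""
    source_labeled: List[Tuple[Features, float]]  # D_{s,l}: m_s (x, y) pairs
    target_labeled: List[Tuple[Features, float]]  # D_{t,l}: m_t (x, y) pairs
    target_unlabeled: List[Features]              # D_{t,u}: m_u inputs, no labels

    def sizes(self) -> Tuple[int, int, int]:
        """Return (m_s, m_t, m_u)."""
        return (len(self.source_labeled),
                len(self.target_labeled),
                len(self.target_unlabeled))
```

Keeping the unlabeled target inputs in a separate field makes it explicit that no ground truth accompanies them.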
