Coupling User Preference with External Rewards to Enable Driver-centered and Resource-aware EV Charging Recommendation

Chengyin Li, Zheng Dong, Nathan Fisher, and Dongxiao Zhu
Department of Computer Science, Wayne State University, Detroit MI 48201, USA
{cyli, dong, fishern, dzhu}@wayne.edu
Abstract. Electric Vehicle (EV) charging recommendation that both
accommodates user preference and adapts to the ever-changing external
environment arises as a cost-effective strategy to alleviate the range anxi-
ety of private EV drivers. Previous studies focus on centralized strategies
to achieve optimized resource allocation, particularly useful for privacy-
indifferent taxi fleets and fixed-route public transits. However, a private EV driver seeks a more personalized and resource-aware charging recommendation that is tailor-made to accommodate the user preference (when and where to charge) yet sufficiently adaptive to the spatiotemporal mismatch between charging supply and demand. Here we propose
a novel Regularized Actor-Critic (RAC) charging recommendation ap-
proach that would allow each EV driver to strike an optimal balance
between the user preference (historical charging pattern) and the exter-
nal reward (driving distance and wait time). Experimental results on
two real-world datasets demonstrate the unique features and superior performance of our approach over competing methods.
Keywords: Actor critic · Charging recommendation · Electric vehicle (EV) · User preference · External reward.
1 Introduction
Electric Vehicles (EVs) are becoming popular personal transportation tools due to their reduced carbon footprint and more intelligent driving experience compared with conventional internal combustion vehicles [9]. Meanwhile, the miles per charge of an EV is limited by its battery capacity; together with the sparse allocation of charging stations (CSs) and excessive wait/charge time, this is a major driver of the so-called range anxiety, especially for private EV drivers. Recently, developing intelligent driver-centered charging recommendation algorithms has emerged as a cost-effective strategy to ensure sufficient utilization of the existing charging infrastructure and a satisfactory user experience [17, 19].
Existing charging recommendation studies mainly focus on public EVs (e.g., electric taxis and buses) [2, 17]. With relatively fixed schedule routines and no privacy or user preference considerations, charging recommendation for public transits can be made entirely to optimize CS resource utilization.
[Figure 1 graphic: panels (A) and (B); panel (B) shows (i) a driver's spatiotemporal charging behavior (day, hour, SOC, visited CSs) feeding the next-CS recommendation, and (ii) per-CS inherent user preference and external reward (e.g., occupancy information).]
Fig. 1. Driver-centered and resource-aware charging recommendation. (A) Centralized charging recommendation enables optimized resource allocation, where bi-directional information sharing between the server and EVs is assumed. (B) Driver-centered charging recommendation considers user preference and external reward, where only mono-directional information sharing (e.g., the occupancy information of all CSs) from the server to each EV is required (green dotted line). Therefore, private information of an EV, such as its GPS location, is not uploaded to the server (pink dotted line).
In general, these algorithms leverage a global server that monitors all the CSs in a city (Fig. 1A). Charging recommendation can be fulfilled upon request from public EVs, which send their GPS locations and state of charge (SOC) to the server. This kind of recommendation gives each EV an optimal driving and wait time before charging. Instead of using one single global server, many servers can also be distributed across a city [1, 2] to reduce the recommendation latency for public EVs.
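As an illustration of this server-side selection, the sketch below picks the CS with the smallest estimated driving-plus-wait time among stations reachable with the remaining charge. It is a minimal sketch only: the cost model, function names, and the full-range assumption are our own simplifications, not the cited systems' implementations.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ChargingStation:
    cs_id: int
    lat: float
    lon: float
    expected_wait_min: float  # current estimated wait time at this CS


def recommend_cs_centralized(ev_lat: float, ev_lon: float, soc: float,
                             stations: List[ChargingStation],
                             avg_speed_kmh: float = 40.0,
                             full_range_km: float = 300.0) -> Optional[ChargingStation]:
    """Pick the CS minimizing estimated driving time plus expected wait
    time, among CSs reachable with the EV's remaining state of charge."""
    def drive(cs: ChargingStation):
        # Great-circle (haversine) distance in km, then minutes at avg speed.
        dlat = math.radians(cs.lat - ev_lat)
        dlon = math.radians(cs.lon - ev_lon)
        a = (math.sin(dlat / 2) ** 2
             + math.cos(math.radians(ev_lat)) * math.cos(math.radians(cs.lat))
             * math.sin(dlon / 2) ** 2)
        dist_km = 2 * 6371.0 * math.asin(math.sqrt(a))
        return dist_km, 60.0 * dist_km / avg_speed_kmh

    best, best_cost = None, float("inf")
    for cs in stations:
        dist_km, t_drive_min = drive(cs)
        if dist_km > soc * full_range_km:
            continue  # unreachable with the remaining charge
        cost = t_drive_min + cs.expected_wait_min
        if cost < best_cost:
            best, best_cost = cs, cost
    return best
```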
Although server-centralized methods are highly resource-aware with respect to the charging availability of CSs, they rarely accommodate individual user preferences for charging and even carry the risk of private data leakage (e.g., GPS locations) for private EVs. Thus, the centralized strategy would also impair the trustworthiness [6, 11, 12] of the charging recommendation. A driver-centered, rather than server-centralized, charging recommendation strategy would be preferred for a private EV to follow its user preference without leaking private information. In this setting (Fig. 1B), a sequence of on-EV charging event records (when and which CS) reflects the personal charging preference of a private EV driver. To make a driver-centered charging recommendation also resource-aware, a public platform for sharing the availability of CSs is needed.
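To make the information flow of Fig. 1B concrete, the following minimal sketch separates the data that stays on the EV from the single piece of information pulled from the shared platform; all class and field names are illustrative assumptions rather than part of the proposed system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChargingEvent:
    """One on-EV record of a past charging session (stays on the EV)."""
    cs_id: int        # which CS was used
    day_of_week: int  # 0-6: the "when" part of the user preference
    hour_of_day: int  # 0-23


@dataclass
class DriverState:
    history: List[ChargingEvent]   # private: never uploaded to the server
    soc: float                     # private: current state of charge
    cs_occupancy: Dict[int, float] = field(default_factory=dict)  # public feed

    def refresh_occupancy(self, platform_feed: Dict[int, float]) -> None:
        # Mono-directional sharing: the EV only *reads* the public occupancy
        # feed; no GPS location or charging history leaves the vehicle.
        self.cs_occupancy = dict(platform_feed)
```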
Recent research on next POI (Point Of Interest) recommendation, which is centered on each user, can be adapted to solve the charging recommendation problem for private EVs by viewing each CS as a POI. Different from collaborative filtering-based general recommendation, which learns similarities between users and items [10], next POI recommendation algorithms attempt to predict the most likely next POI a user will visit based on the historical trajectory [4, 13, 23, 24, 26]. Although
these methods indeed model user preferences, they are neither resource-aware
nor adapted to the ever-changing external environment.
As such, a desirable charging recommender for a private EV requires: (1) learning the user preference from its historical charging patterns to achieve a driver-centered recommendation, and (2) obtaining a good external reward (optimal driving and wait time before charging) to achieve a resource-aware recommendation (Fig. 1B). By treating private EV charging recommendation as a next POI recommendation problem, reinforcement learning can be utilized to maximize the external reward (a shorter driving and wait time before charging) by exploring possible CSs for each recommendation. To couple user preference with the external reward, we propose a novel charging recommendation framework, Regularized Actor-Critic (RAC), for private EVs. The critic evaluates the actor's predicted action with a resource-aware value over all CSs, representing the external reward, while the actor is reinforced by this reward and simultaneously regularized by the driver's user preference. Both the actor and the critic are based on deep neural networks (DNNs).
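As a rough illustration of how such a regularized actor update could look, the sketch below combines a policy-gradient term weighted by the critic's value with a preference-regularization term. This is only a sketch: the cross-entropy form of the regularizer and the trade-off weight `lam` are assumptions for exposition, not the exact RAC objective.

```python
import torch
import torch.nn.functional as F

def rac_actor_loss(action_logits: torch.Tensor, action_taken: int,
                   critic_value: torch.Tensor, preference_dist: torch.Tensor,
                   lam: float = 0.5) -> torch.Tensor:
    """Sketch of a regularized actor objective for one transition.

    action_logits:   actor scores over all CSs, shape (num_cs,)
    critic_value:    critic's evaluation of the taken action (scalar tensor)
    preference_dist: driver's historical charging distribution over CSs,
                     shape (num_cs,), summing to 1
    lam:             trade-off between external reward and user preference
    """
    log_probs = F.log_softmax(action_logits, dim=-1)

    # Policy-gradient term: reinforce actions the critic values highly.
    pg_term = -log_probs[action_taken] * critic_value.detach()

    # Regularization term: keep the policy close to the driver's
    # historical charging pattern (cross-entropy against preference_dist).
    reg_term = -(preference_dist * log_probs).sum()

    return pg_term + lam * reg_term
```

In this sketch, the critic would be trained separately toward the observed driving-plus-wait-time reward, and `lam` would control how driver-centered versus resource-aware the resulting recommendation is.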
We summarize the main contributions of this work as follows: (1) we design and develop a novel framework, RAC, to give driver-centered and resource-aware on-EV charging recommendations; (2) RAC is tailor-made for each driver, accommodating the driver's inherent user preference while adapting to the ever-changing external reward; and (3) we propose a warm-up training technique to solve the cold-start recommendation problem for new EV drivers.
2 Related Work
Next POI recommendation has attracted much attention recently in location-based analysis. There are two lines of POI recommendation methods: (1) following user preference learned from the regularities of sequentially visited POIs, and (2) exploiting external incentives by maximizing the utility (reward) of recommendations.
For the first line of research, earlier works primarily attempt to solve the sequential next-item recommendation problem using temporal features. For example, [13] introduces the Factorizing Personalized Markov Chain (FPMC), which captures the sequential dependency between the recent and next items, as well as the general taste of a user, using a combination of matrix factorization and Markov chains for next-basket recommendation. [26] proposes a time-related Long Short-Term Memory (LSTM) network to capture both long- and short-term sequential influence for next-item recommendation. [5] attempts to model users' preference drift over time to achieve a better user experience in next-item recommendation. These next-item recommendation approaches only use temporal features, whereas next POI recommendation needs both temporal and geospatial features.
More recent studies of next POI recommendation not only model temporal relations but also consider geospatial context, such as ST-RNN [7] and ATST-LSTM [3]. [4] proposes a hierarchical extension of LSTM to encode spatial and temporal contexts for general location recommendation. [24] introduces a spatiotemporal gated network model that leverages a time gate and a distance gate to control the effect of the last visited POI on next POI recommendation. [23] extends these gates with a power-law attention mechanism that puts more attention on nearby POIs and explores subsequence patterns for next POI recommendation. [21] develops a long- and short-term preference learning model that considers sequential and context information for next POI recommendation. User preference-based methods can achieve strong performance by following users' previous experience; however, they are restricted from making novel recommendations beyond that experience.
Although only a few studies exploit external incentives, such methods can help explore new possibilities for next POI recommendation. Charging recommendation with multi-agent reinforcement learning has been applied to public EVs [16, 22], where private information from each EV is inevitably required. [8] proposes an inverse reinforcement learning method for next-visit action recommendation by maximizing the reward a user gains when discovering new, relevant, and non-popular POIs. This study uses an optimal POI selection policy (the POI visit trajectory of a similar group of users) as the guidance. As such, it is only applicable to centralized charging recommendation for privacy-indifferent public transit fleets, where charging events are aggregated at the central server to learn the user group; it is not applicable to the driver-centered EV charging recommendation problem we are tackling, in which each individual charging pattern must be learned without data sharing across drivers. Besides the inverse reinforcement learning approach, [25] introduces deep reinforcement learning for news recommendation, and [18] proposes supervised reinforcement learning for treatment recommendation. These methods are also based on learning similar user groups and thus are not directly applicable to the driver-centered EV charging recommendation task, which is further subject to resource and geospatial constraints.
Although existing approaches utilize spatiotemporal, social-network, and/or contextual information for effective next POI recommendation, they do not possess the desirable features for CS recommendation, which are (1) driver-centered: the trade-off between the driver's charging preference and the external reward is tuned for each driver, particularly for new drivers; and (2) resource-aware: there is usually a capacity constraint on a CS but not on a social check-in POI.
3 Problem Formulation
Each EV driver is considered as an agent, and the trustworthy server that collects occupancy information of all the CSs represents the external, ever-changing environment. We consider charging recommendation as a finite-horizon MDP problem with a state space $\mathcal{S}$, an action space $\mathcal{A}$, and a reward function $r: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$. At each time point $t$, an EV driver with the current state $s_t \in \mathcal{S}$ chooses an action $a_t$, i.e., the one-hot encoding of a CS, based on a stochastic policy $\pi_\theta(a \mid s)$, where $\theta$ is the set of parameters, and receives a reward $r_t$ from the spatiotemporal environment. Our objective is to learn a stochastic policy $\pi_\theta$ that maximizes the expected cumulative reward.
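Formally, writing $T$ for the horizon and $\gamma \in (0, 1]$ for a discount factor (the standard MDP ingredients), this amounts to finding
\[
\theta^{*} = \arg\max_{\theta} J(\theta), \qquad
J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t=1}^{T} \gamma^{\,t-1}\, r_t\Big],
\]
where larger rewards $r_t$ correspond to shorter driving and wait time before charging.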