Predicting Blossom Date of Cherry Tree With Support Vector Machine and Recurrent Neural Network

Hongyi Zheng, Yanyu Chen, Zihan Zhang
Abstract: Our project probes the relationship between temperatures and the blossom date of cherry trees. Through modeling, future flowering becomes predictable, helping the public plan travel and avoid pollen season. Predicting the exact date on which the cherry trees will blossom can be viewed as a multi-class classification problem, so we applied the multi-class Support Vector Classifier (SVC) and a Recurrent Neural Network (RNN), particularly Long Short-Term Memory (LSTM), to formulate the problem. In the end, we evaluate and compare the performance of these approaches to find out which one might be more applicable in reality.
I. INTRODUCTION
Many plants have high ornamental value during specific phenophases, and plant phenology correlates highly with the seasonal vegetation landscape. Determining the span and spatiotemporal patterns of the tourism season for ornamental plants could provide tourism administrators and the tourists themselves with a theoretical basis for making travel arrangements. The cherry blossom, the focus of our investigation, is widely distributed in the northern hemisphere, including Japan, China, and the United States, and is tremendously important both culturally and economically. According to The National News, during the 2018 hanami season an estimated 63 million people traveled to and within Japan (more than 40% of foreign visitors), with total spending of around $2.7 billion.
Our primary objective is to implement ML techniques to predict the exact future peak blossom date of cherry trees given past sequential daily temperature records (average, max, min, etc.). Such causality and correlation are inspired by Zhang's observation: among all meteorological features, daily average temperature correlates most strongly with the first flowering date and full flowering date of ornamental plants (Magnolia, Subhirtella) in the Beijing area [1]. The accumulated temperature over a consecutive time span describes the growing process of plants, and if the value exceeds a certain threshold, the tree will blossom. The paper also suggests that adding further factors and features, such as relative humidity, solar radiation, and wind speed measurements, might improve prediction accuracy. We identify the problem as a multi-class classification problem. We implement SVM to evaluate the non-sequential time interval and an LSTM RNN to describe the interaction between specific timestamps in a sequential series. The effectiveness and universality of these two approaches in the temperature forecasting field have been examined and presented previously [2, 3]: SVM is preferred for its good compromise between simplicity and accuracy, while an artificial neural network is more applicable than a regression model when predicting accumulated temperature.
Based on our online literature review, this research direction is novel and largely uncultivated. Present studies mostly apply thermal-time-based or process-based phenology models and statistical parameterizations, but ML applications are scarce [4-6]. Because of its sensitivity to winter and early spring temperatures, the timing of cherry blossoms is an ideal indicator of the impacts of climate change on tree phenology. Thus, our results might give insight into developing adaptation strategies to climate change in horticulture, conservation planning, restoration, and other related disciplines. In practice, our model could provide tourism guidance (more manageable schedules) and pollen season alerts, and might also inspire agricultural planting and induce financial benefits.
II. DATA
Our dataset is twofold: full-flowering (>70%) dates and historical series of meteorological data. Both raw datasets are expected to be consecutive time series, and we select the dates in their intersection. Our target regions are Washington D.C. and Kyoto, two cities renowned for their cherry blossom festivals and with comparable geographical features (similar latitude, coastal location). For Kyoto, the flowering date data is provided by Yasuki Aono from Osaka Prefecture University, recording the vegetative cycle of the local cherry trees since 810 AD [7]. We select the span from 1881 to the present, since the earlier temperature data is missing. For D.C., the data source is the United States Environmental Protection Agency, with records from 1921-2016 for the main type of cherry tree around the Tidal Basin [8]. These peak bloom dates will serve as labels for our classification algorithm. Furthermore, the detailed historical temperature data are from the Japan Meteorological Agency and the U.S. National Oceanic and Atmospheric Administration. The latter includes multiple daily weather features, such as humidity, precipitation, evapotranspiration, and wind speed. However, these features are lacking on the Kyoto side, and this shortage restricts the performance of our model, as shown later. Our primary preprocessing consists of cleaning missing values and changing data formats. For instance, we convert the original "Month-Day-Year" date representation (timestamp type) to "day of the year" (int type), thus eliminating the potential alignment error caused by leap years. If the average temperature (reported by the measuring station) is missing,
we calculate the average of the maximum and minimum
temperature on that specific day and fill in the value.
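A minimal pandas sketch of this preprocessing step, assuming hypothetical file and column names (DATE, TAVG, TMAX, TMIN); the actual fields in the JMA and NOAA exports may be labeled differently:

```python
import pandas as pd

# Hypothetical file and column names; the raw JMA/NOAA exports may differ.
df = pd.read_csv("kyoto_daily_temperature.csv", parse_dates=["DATE"])

# Convert the "Month-Day-Year" timestamp into an integer day-of-year index,
# so that dates align across years regardless of calendar formatting.
df["DAY_OF_YEAR"] = df["DATE"].dt.dayofyear

# If the station-reported daily average is missing, fall back to the mean
# of that day's maximum and minimum temperatures.
df["TAVG"] = df["TAVG"].fillna((df["TMAX"] + df["TMIN"]) / 2.0)
```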
III. METHODOLOGY
To select and implement the most suitable model, the first
step is to decide whether we should make it a multi-class
classification problem or a regression problem.
On the one hand, it is quite intuitive to interpret it as a regression problem: for each date, we only need to output a number n indicating the number of days between the date of prediction and the estimated full flowering date, and n could be any value greater than 0.
On the other hand, we could also interpret this prediction problem as a multi-class classification problem, in which we only focus on estimating the peak blossom date within k days. In this case, the output would be a vector of length k + 1 containing the probabilities of class 0 to class k. Class 0 represents that the estimated peak blossom date is more than k days away, while for i ∈ {1, 2, ..., k}, class i represents that the estimated peak blossom date is i day(s) away.
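As an illustration of this labeling scheme, the sketch below (with k = 10, matching the class range used in our SVM section) maps the number of days remaining until full bloom onto classes 0 through k; the helper name is ours, not part of any library:

```python
import numpy as np

K = 10  # prediction horizon in days; class 0 means "more than K days away"

def make_label(days_until_bloom: int, k: int = K) -> int:
    """Class i (1 <= i <= k): peak bloom is i day(s) away; class 0: more than k days away."""
    return days_until_bloom if 1 <= days_until_bloom <= k else 0

# Example: 25 days out -> class 0, 10 days out -> class 10, 3 days out -> class 3
labels = np.array([make_label(d) for d in [25, 10, 3, 1]])  # -> [0, 10, 3, 1]
```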
We finally decided to treat it as a multi-class classification problem, based on the consideration that the multi-class classification approach focuses on a relatively short time span (e.g. 10 or 20 days) and thus could provide more accurate predictions. Although with this method we cannot predict the full flowering date if it is more than k days away, a prediction of the peak blossom date that far in advance would be neither valuable nor accurate.
We will implement two different types of models, a Support Vector Machine (SVM) classifier and a Long Short-Term Memory (LSTM) model, to conduct the multi-class classification task.
A. Support Vector Machine approach
A multi-class Support Vector Machine (SVM) is essentially a combination of many binary SVM classifiers. One-vs-One (OVO) and One-vs-Rest (OVR) are the two common schemes used to build a multi-class SVM. In our problem, we explicitly choose the OVO scheme to construct our multi-class SVC, for two reasons [9].
Fig. 1: OVR-OVO schemes
First, multi-class classifiers using the OVO scheme do not generate ambiguous regions that further enlarge the bias in the final prediction, a bias that initially results from our imbalanced training dataset. Specifically, in Figure 1, the separation regions of the OVR multi-class classifier fail to cover the whole data space. If an input point (X_i, y_i) lies in the marked white ambiguous region, the OVR-SVC will be confused and will pick a random class near (X_i, y_i) as the output instead of choosing the one with the largest probability. This kind of prediction is highly susceptible to misclassification in our problem: when trained on unbalanced data, the resulting SVC inevitably favors the majority class, which appears most frequently, and predicts it more accurately than the minority class, which appears least frequently [10]. In other words, ambiguous regions introduce additional unfair errors, making the classification results of our multi-class SVC even more imbalanced [9]. Hence, we chose the OVO scheme over OVR to obtain an SVC that predicts labels more precisely.
Secondly, classifiers in the OVO scheme are more stable and independent than those in the OVR scheme, since “dependent binary classifiers could increase learning instability” [9]. Ill-conditioned systems are always undesirable; therefore, we naturally prefer OVO over OVR.
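For illustration only, both schemes can be constructed explicitly in scikit-learn by wrapping a binary SVC in the corresponding meta-estimator; this sketch contrasts the two wrappers and is not our final model configuration:

```python
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

base = SVC(kernel="rbf", C=1.0, gamma="scale")

# OVO: trains k(k-1)/2 pairwise classifiers and takes a majority vote.
ovo_clf = OneVsOneClassifier(base)

# OVR: trains k "class j vs. rest" classifiers; ambiguous regions are resolved
# by comparing the k decision-function values, which is what we want to avoid.
ovr_clf = OneVsRestClassifier(base)

# Either wrapper is then fit on the same (X_train, y_train) split.
```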
Given $l$ training data points $(x_1, y_1), \dots, (x_l, y_l)$, where $x_i \in \mathbb{R}^n$, $i = 1, \dots, l$, and $y_i \in \{0, \dots, 10\}$ is the class of $x_i$, the primal problem for each binary soft-margin classifier in our multi-class SVM is:

$$
\begin{aligned}
\min_{w, b, \xi} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \\
\text{subject to} \quad & y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad i = 1, \dots, l,
\end{aligned}
$$

in which $\frac{1}{2} w^T w$ is the margin maximizer, $C \sum_{i=1}^{l} \xi_i$ is the penalty term, $\xi_i$ is the slack variable, and $C$ is the penalization parameter controlling the tolerance of $\xi_i$. Since the OVO approach is applied, this model will generate $k(k-1)/2$ sub-classifiers in total, each of which gives us a decision boundary function $f_i(x) = w_i^T \phi(x) + b_i$.
Here, because our data are not linearly separable, we need to transform the feature space so that the data become separable in another dimension. We therefore apply the RBF kernel trick to complete this transformation. Specifically, we choose the RBF kernel rather than a linear or polynomial kernel mainly because it generates more flexible boundaries. The Gaussian radial basis kernel function is $K(x, \bar{x}) = \exp(-\gamma \|x - \bar{x}\|^2)$.

The final output of the eventual SVM model is $\operatorname{argmax}_i f_i$: the class receiving the most votes from the $k(k-1)/2$ sub-classifiers will be the final output of our multi-class SVM model. Figure 2 shows the detailed flow of our SVM method.
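The following scikit-learn sketch shows one way to assemble this pipeline, using randomly generated placeholder features and labels in place of our real temperature data; SVC trains its one-vs-one sub-classifiers internally and aggregates their votes:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for the real features (e.g. rolling temperature
# statistics) and labels in {0, ..., 10}.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.integers(0, 11, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# RBF-kernel SVC; decision_function_shape="ovo" exposes the raw pairwise votes.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale", decision_function_shape="ovo"),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```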
Yet the preparation is not complete. Recall that our dataset is highly imbalanced: if we left the imbalance problem unsolved and directly did the train-test split to train and test the models, the resulting SVM classifier would be misleading, since it would achieve high accuracy due to its preference for the majority class but fail to generalize to the minority classes. We come up with two approaches to reduce the influence of imbalanced data in our SVC [10].
The first approach is to adjust the weight of the penalization parameter $C_j$ for each class proportionally in the primal equation, according to the rule $w_j = \frac{n}{k \cdot n_j}$, where $n$ is the total number of training samples and $n_j$ is the number of samples in class $j$, so that the per-class penalty becomes $C_j = w_j \cdot C$.
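A brief sketch of this re-weighting, assuming the placeholder split from the previous snippet; scikit-learn's "balanced" class weights implement the same w_j = n / (k · n_j) rule and scale each class's penalty to C_j = w_j · C:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Inspect the per-class weights w_j = n / (k * n_j) for the training labels.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
print(dict(zip(classes, np.round(weights, 2))))

# class_weight="balanced" applies the same rule inside SVC, giving C_j = w_j * C.
weighted_svc = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")
weighted_svc.fit(X_train, y_train)
```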