Predicting Blossom Date of Cherry Tree With
Support Vector Machine and Recurrent Neural
Network
Hongyi Zheng, Yanyu Chen, Zihan Zhang
Abstract—Our project probes the relationship between temperatures
and the blossom date of cherry trees. Modeling this relationship
makes future flowering predictable, helping the public plan
travel and avoid pollen season. Predicting the exact date on which
cherry trees will blossom can be viewed as a multi-class
classification problem, so we apply a multi-class Support
Vector Classifier (SVC) and a Recurrent Neural Network (RNN),
specifically Long Short-Term Memory (LSTM), to the problem.
Finally, we evaluate and compare the performance of these
approaches to determine which is more applicable in practice.
I. INTRODUCTION
Many plants have high ornamental value during specific
phenophases, and plant phenology correlates highly with sea-
sonal vegetation landscape. Determination of the span and
spatiotemporal patterns of the tourism season for ornamental
plants could provide tourism administrators and the tourists
themselves with a theoretical basis for making travel
arrangements. The cherry blossom, the focus of our investigation,
is widely distributed in the northern hemisphere, including Japan,
China, and the United States, and is tremendously important both
culturally and economically. According to a review in The National
News, an estimated 63 million people traveled to and within Japan
during the 2018 hanami season (more than 40% of foreign visitors),
with total spending of around $2.7 billion.
Our primary objective is to apply ML techniques to predict
the exact future peak blossom date of cherry trees from past
sequential daily temperature records (average, max, min, etc.).
This relationship is motivated by Zhang’s observation: among all
meteorological features, daily average temperature correlates
most strongly with the first flowering date and full flowering
date of ornamental plants (Magnolia, Subhirtella) in the Beijing
area [1]. The accumulated temperature over a consecutive time
span describes the growing process of plants, and once the value
exceeds a certain threshold, the tree blossoms. The paper also
suggests that adding further features, such as relative humidity,
solar radiation, and wind speed, might improve prediction
accuracy. We frame the task as a multi-class classification
problem. We implement SVM to evaluate the non-sequential time
interval and an LSTM RNN to describe the interaction between
specific timestamps in the sequential series. The effectiveness
and generality of these two approaches in the temperature
forecasting field are examined and presented in [2,3],
respectively: SVM is preferred for its good compromise between
simplicity and accuracy, and an artificial neural network is more
applicable than a regression model when predicting accumulated
temperature.
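The accumulated-temperature idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the SVM or LSTM models used in this work. The function name `first_day_above_threshold`, the parameter `base_temp`, and the example numbers are hypothetical, and subtracting a base temperature is an assumption borrowed from common growing-degree-day formulations rather than something stated in [1].

```python
from typing import Optional, Sequence


def first_day_above_threshold(
    daily_mean_temps: Sequence[float],
    base_temp: float,
    threshold: float,
) -> Optional[int]:
    """Return the 1-based day on which accumulated temperature first
    exceeds `threshold`, or None if it never does.

    `base_temp` and `threshold` are illustrative parameters (assumptions),
    not values taken from the paper.
    """
    accumulated = 0.0
    for day, temp in enumerate(daily_mean_temps, start=1):
        # Assumption: only warmth above the base temperature counts.
        accumulated += max(temp - base_temp, 0.0)
        if accumulated > threshold:
            return day
    return None
```

For example, with daily means of 2, 6, 8, 10, and 12 degrees, a base of 5, and a threshold of 8, the running totals are 0, 1, 4, 9, and 16, so the function returns day 4.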
Based on our online literature review, this research area is
novel and largely unexplored. Present studies mostly apply
thermal-time-based or process-based phenology models and
statistical parameterizations, while ML applications are scarce
[4–6]. Because of its sensitivity to winter and early spring
temperatures, the timing of cherry blossoms is an ideal indicator
of the impacts of climate change on tree phenology. Thus, our
results might give insight into developing adaptation strategies
to climate change in horticulture, conservation planning,
restoration, and other related disciplines. In practice, our model
could provide tourism guidance (more manageable schedules) and
pollen season alerts, and could possibly inform agricultural
planting and bring financial benefits.
II. DATA
Our dataset is twofold: full-flowering (>70%) dates and
historical series of phenological data. Both raw datasets are
expected to be consecutive time series, and we select the dates
on which they intersect. Our target regions are Washington, D.C.
and Kyoto, two cities that are renowned for their cherry blossom
festivals and have comparable geographical features (similar
latitude, coastal). For Kyoto, the flowering date data are
provided by Yasuki Aono of Osaka Prefecture University and record
the vegetative cycle of the local cherry trees since 810 AD [7].
We select the span from 1881 to the present because earlier
temperature data are missing. For D.C., the data source is the
United States Environmental Protection Agency, with records from
1921 to 2016 for the main type of cherry tree around the Tidal
Basin [8]. These peak bloom dates serve as labels for our
classification algorithm. Furthermore,
the detailed historical temperature data are from the Japan
Meteorological Agency and the U.S. National Oceanic and
Atmospheric Administration. The latter includes multiple daily
weather features, such as humidity, precipitation,
evapotranspiration, and wind speed. However, these features are
lacking on the Kyoto side, a shortage that restricts the
performance of our model in what follows. Our primary
preprocessing consists of cleaning missing values and changing
the data format. For instance, we convert the original date
representation "Month-Day-Year" (timestamp type) to "Date of the
Year" (int type), thus eliminating potential errors caused by
leap years. If the average
temperature (reported by the measuring station) is missing,
arXiv:2210.04406v1 [cs.LG] 10 Oct 2022