On Designing Day Ahead and Same Day Ridership Level Prediction Models for City-Scale Transit Networks Using Noisy APC Data

Jose Paolo Talusan1, Ayan Mukhopadhyay1, Dan Freudberg2, and Abhishek Dubey1
1Vanderbilt University
2Nashville Metropolitan Transit Authority
Abstract—The ability to accurately predict public transit rid-
ership demand benefits passengers and transit agencies. Agencies
will be able to reallocate buses to handle under- or over-utilized
bus routes, improving resource utilization, and passengers will
be able to adjust and plan their schedules to avoid overcrowded
buses and maintain a certain level of comfort. However, accurately
predicting occupancy is a non-trivial task. Various factors,
such as heterogeneity, evolving ridership patterns, exogenous
events like weather, and other stochastic variables, make the task
much more challenging. With the progress of big data, transit
authorities now have access to real-time passenger occupancy
information for their vehicles. The amount of data generated is
staggering. While there is no shortage of data, it must still be
cleaned, processed, augmented, and merged before any useful
information can be generated. In this paper, we propose the use
and fusion of data from multiple sources, cleaned, processed, and
merged together, for use in training machine learning models
to predict transit ridership. We use data that spans a 2-year
period (2020-2022) incorporating transit, weather, traffic, and
calendar data. The resulting data, which equates to 17 million
observations, is used to train separate models for the trip and
stop level prediction. We evaluate our approach on real-world
transit data provided by the public transit agency of Nashville,
TN. We demonstrate that the trip level model based on XGBoost
and the stop level model based on LSTM outperform the baseline
statistical model across the entire transit service day.
I. INTRODUCTION
Public transportation is a vital component of any modern
metropolitan city. Access to reliable public transit
is known to improve quality of life, reduce carbon
emissions, and have an overall positive effect on social
equity. However, the availability of public transit does not
guarantee that it is reliable and accessible. On the contrary,
transit systems are more often over-stretched or underdeveloped.
As a result, much of the existing work focuses on improving the
accessibility and reliability of public transit.
Traditional measures of reliable transit systems include trip
frequency, punctuality, and travel time. In response, plenty of
work has been done with the goal of improving travel times
by identifying and reducing causes of delay [1]. However, an
often overlooked element of reliability is the perceived comfort
of riders [2], which can be seen as a direct consequence
of vehicle occupancy and capacity. A frequently overcrowded
bus can prevent potential commuters from even considering
public transit. Conversely, consistently low rider demand can be
seen as an underutilization of already constrained resources.
This duality of public transit is often caused by the agencies’
constant struggle with providing increased transit coverage
amidst highly heterogeneous ridership demand.
With the progress of big data, transit authorities now have
access to and are able to provide real-time passenger occupancy
information for their vehicles. Transit agencies such
as the Nashville Metropolitan Transit Authority (MTA) use
Automated Passenger Counter (APC) systems that provide
stop-level estimates of passenger boarding and alighting. This
information has been integrated by apps such as Transit1,
which, in addition to allowing potential riders to see an
estimated future passenger occupancy, also use crowdsourcing
to collect occupancy information from riders onboard in an
effort to improve service accuracy. From the perspective of
passengers, this helps them choose departure times to match
their desired comfort level. For the agencies, this can be a
reference for them to optimize their services by allocating
resources according to predicted ridership demand. Thus,
accurately predicting the maximum occupancy of each vehicle
in a public transit system is pivotal in improving perceived
reliability, resource optimization, and rider comfort.
Achieving highly accurate occupancy prediction, however,
is a difficult task. A number of factors can
affect demand, ranging from short-term, high-impact factors such
as sporting events and festivals to long-term factors such as
school schedules and seasons. Additionally, stochastic traffic
conditions along the route can cause variation in ridership,
further increasing uncertainty. Another issue that can affect
prediction is sensor data noise. As with any system that
relies on a fleet of sensors and a large database, there are
bound to be inconsistencies and errors. This is especially true
for APC systems, where passenger boarding and alighting
information is recorded using infrared sensors installed on
vehicle doors [3]. These sensors can produce erratic and misleading
counts, which creates the need for data preparation
and augmentation to ensure that the data is reliable and useful.
1 https://transitapp.com/
arXiv:2210.04989v1 [cs.LG] 10 Oct 2022
In this paper, we implement an end-to-end framework for
predicting occupancy at both the stop and route levels. This
ensures that our method can react to both short and long-term
changes in the public transit system. We do this by analyzing
and combining different spatio-temporal data such as weather,
traffic, and APC data to develop a model for bus occupancy.
First, we investigate how data can be augmented and merged
to provide features that would expose the relationship with
bus occupancy. Second, we build different models for bus-stop
and transit-route levels. Finally, we demonstrate and compare
our approach using actual APC data from the public transit
agency of Nashville, TN. The main contribution of this paper
is a data cleaning and augmentation method
that processes and cleans raw APC data. Raw APC data is
often noisy and suffers from a variety of issues. Augmenting and
cleaning the data ensures that the data used to train models is valid.
We generate passenger occupancy from alighting and boarding
information.
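As a rough illustration of how occupancy can be derived from boarding and alighting counts, the sketch below accumulates the net passenger change stop by stop along a single trip. The function name, the input layout, and the clipping of negative or over-capacity loads are assumptions made for this example; they are not taken from the cleaning procedure used in this paper.

```python
from typing import Iterable, List, Optional, Tuple


def occupancy_from_counts(
    stops: Iterable[Tuple[int, int]],
    capacity: Optional[int] = None,
) -> List[int]:
    """Return the running passenger load after each stop of one trip.

    `stops` holds (boardings, alightings) pairs in stop order.
    Clipping is an illustrative assumption: noisy APC counts can push
    the running total below zero or above the vehicle capacity.
    """
    load = 0
    loads = []
    for boardings, alightings in stops:
        load += boardings - alightings
        if load < 0:
            # More alightings than passengers on board: treat as noise.
            load = 0
        if capacity is not None and load > capacity:
            # Optional upper bound, only applied when a capacity is given.
            load = capacity
        loads.append(load)
    return loads


# Hypothetical trip with four stops; the last stop has a miscount.
example_trip = [(5, 0), (3, 2), (0, 4), (1, 5)]
print(occupancy_from_counts(example_trip))
```

Clipping is only one way to handle inconsistent counts. It hides the error rather than correcting it, so a production pipeline would likely flag such trips for further inspection.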
Organization: The rest of this paper is organized as fol-
lows. In Section II, we give an overview of the state-of-the-
art in occupancy prediction. In Section III, we present and
formulate the problem. We then discuss in-depth the APC
data in Section IV and the issues accompanying the dataset. In
Section VI, we validate our proposed models using real-world
data from Nashville, TN. Finally, in Section VII, we give our
conclusions.
II. RELATED WORK
In this section we discuss the current state-of-the-art meth-
ods used in public transit occupancy prediction.
A. Occupancy Prediction
Given the importance of public transit and the increasing
ubiquity of available vehicle data, research in the field of
occupancy prediction, also known as passenger flow or transit
demand prediction, has been flourishing. A considerable
amount of work has been done on understanding and mapping the
occupancy level in public transport.
Short-term passenger demand forecasting methods fall into one of
two categories: parametric and non-parametric approaches.
Traditionally, parametric approaches such as historical
averaging [4] and autoregressive integrated moving average
(ARIMA) [5] have been used to predict not only demand
but also traffic flow, travel times, and vehicle speed. Ever since it
was established, ARIMA has been known to perform well in
modeling linear and stationary time series. However, ARIMA's
shortcomings in taking into account seasonality and capturing
non-linear relationships in data are also well known.
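To make the idea of historical averaging concrete, the following minimal sketch predicts occupancy as the mean of past observations that share the same route, day of week, and hour. The grouping key, the function names, and the sample values are hypothetical; this is not the formulation of [4] nor the baseline evaluated in this paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int, int]  # (route, day_of_week, hour), an assumed grouping


def fit_historical_average(
    history: Iterable[Tuple[str, int, int, float]]
) -> Dict[Key, float]:
    """Average past occupancy for each (route, day_of_week, hour) group."""
    groups = defaultdict(list)
    for route, day_of_week, hour, occupancy in history:
        groups[(route, day_of_week, hour)].append(occupancy)
    return {key: mean(values) for key, values in groups.items()}


def predict(model: Dict[Key, float], route: str, day_of_week: int,
            hour: int, default: float = 0.0) -> float:
    """Look up the group average; fall back to `default` for unseen groups."""
    return model.get((route, day_of_week, hour), default)


# Hypothetical observations: (route, day_of_week, hour, occupancy).
history = [("R1", 0, 8, 20), ("R1", 0, 8, 30), ("R1", 1, 8, 10)]
model = fit_historical_average(history)
print(predict(model, "R1", 0, 8))
```

A lookup of this kind captures recurring weekly patterns but cannot react to weather, traffic, or special events.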
In contrast, non-parametric approaches build a non-linear
relationship between the input and output variables without
any prior knowledge. These methods gained popularity as a
consequence of the rapidly increasing availability of data
from systems such as Advanced Public Transportation Systems
(APTS) and Advanced Traveler Information Systems [3].
These techniques have been proven effective at forecasting
demand based on data gathered through smart cards [6],
[7]. Toque et al. [8] used Random Forest (RF) and LSTM
neural networks trained on smart card data to predict travel
demand. By creating multiple temporal units neural networks
(MTUNN) and parallel ensemble neural networks (PENN),
Tsai et al. [9] showed that these networks can outperform predictions based
on statistical analysis of historical data.
Incorporating other spatio-temporal datasets such as weather
and special events has also been explored. Karnberger et
al. [10] considered the effect of exogenous events on public
transportation ridership. Meanwhile, Zhou et al. [11] combined
smart data and weather information and found that while riders
are more resilient to changes in weather, it still has an effect
on the overall demand. Finally, Wood et al. generated models of
passenger occupancy and demand at the next-stop/any-stop
level based on APC and weather data [12] and showed
that even simpler models such as RF and LSTM provide
reliable estimates of future data when trained with historical
information if demand patterns are fairly stable.
There has been plenty of work done in the field of public
transportation with a special focus on improving reliability
through understanding and forecasting passenger demand.
However, our work is distinct in three ways. First, our work
aims to provide occupancy prediction at both the stop and trip
levels separately by forecasting short- and long-term demand.
Second, we work on APC data, which is fundamentally different
from smart card data, the data commonly used by
prior work. Smart cards are embedded with integrated circuits
that enable them to process information or, in this case, allow for
contactless ticketing on mass transit. These cards are
much more accurate and complete in their data collection [13],
[14], due in part to the requirement that passengers swipe after getting
on and before getting off the vehicle. In contrast, APC data is
much noisier and introduces far more uncertainty in data
collection and processing. Third, we focus on implementing
this for the entire public transport system and not on a few
select routes.
III. PROBLEM STATEMENT
Based on our conversations with the transit agency, they
want to be able to identify particular trips and stops that
experience overcrowding. Overcrowding increases the chances
of passengers not being able to get on the bus and decreases
their overall satisfaction and willingness to take public transit
again in the future. Knowing the maximum occupancy at
the trip and stop level will allow the agency to react and prepare
accordingly by increasing bus dispatch frequency, thereby
decreasing headway.
The primary objective of this work is to provide accurate
occupancy prediction for public transit vehicles. The goal is to
be able to reliably and efficiently forecast maximum ridership
demand at both stop and trip levels. The problem then is:
given a fleet of heterogeneous vehicles2, each equipped with
automated passenger count systems, how can we model
and accurately predict the maximum occupancy at any trip or
stop in the future?
2 In this work, we use the terms vehicle and bus interchangeably
to refer to public transit vehicles.
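The trip-level quantity in the problem statement above can be made concrete with a small sketch: given per-stop loads for a set of trips, the prediction target for each trip is the largest load observed along it. The record layout and the function name are assumptions chosen for illustration, not the data schema used in this paper.

```python
from typing import Dict, Iterable, Tuple


def max_occupancy_per_trip(
    records: Iterable[Tuple[str, int, int]]
) -> Dict[str, int]:
    """Return the peak load of each trip.

    Each record is (trip_id, stop_sequence, load), where `load` is the
    number of passengers on board after the stop. This layout is a
    hypothetical one chosen for the example.
    """
    peaks: Dict[str, int] = {}
    for trip_id, _stop_sequence, load in records:
        if trip_id not in peaks or load > peaks[trip_id]:
            peaks[trip_id] = load
    return peaks


# Hypothetical loads for two trips.
records = [("t1", 1, 5), ("t1", 2, 12), ("t1", 3, 7), ("t2", 1, 3)]
print(max_occupancy_per_trip(records))
```

A model for the trip level would then be trained to predict these peak values for future trips, while a stop-level model would target the load at each individual stop.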