On Designing Day Ahead and Same Day Ridership Level Prediction Models for City-Scale Transit Networks Using Noisy APC Data



Jose Paolo Talusan¹, Ayan Mukhopadhyay¹, Dan Freudberg², and Abhishek Dubey¹

¹Vanderbilt University

²Nashville Metropolitan Transit Authority

Abstract—The ability to accurately predict public transit ridership demand benefits both passengers and transit agencies. Agencies can reallocate buses to handle under- or over-utilized bus routes, improving resource utilization, and passengers can adjust and plan their schedules to avoid overcrowded buses and maintain a certain level of comfort. However, accurately predicting occupancy is a non-trivial task. Factors such as heterogeneity, evolving ridership patterns, exogenous events like weather, and other stochastic variables make the task much more challenging. With the progress of big data, transit authorities now have access to real-time passenger occupancy information for their vehicles, and the amount of data generated is staggering. While there is no shortage of data, it must still be cleaned, processed, augmented, and merged before any useful information can be extracted. In this paper, we propose fusing data from multiple sources, cleaned, processed, and merged together, to train machine learning models that predict transit ridership. We use data spanning a 2-year period (2020-2022) that incorporates transit, weather, traffic, and calendar data. The resulting dataset, comprising 17 million observations, is used to train separate models for trip-level and stop-level prediction. We evaluate our approach on real-world transit data provided by the public transit agency of Nashville, TN, and demonstrate that the trip-level model based on XGBoost and the stop-level model based on LSTM outperform the baseline statistical model across the entire transit service day.


I. INTRODUCTION

Public transportation is a vital component of any modern metropolitan city. Access to reliable public transit is known to have an impact in many areas, such as improved quality of life, reduced carbon emissions, and an overall positive effect on social equity. However, the availability of public transit does not guarantee that it is reliable and accessible; on the contrary, transit systems are often over-stretched or underdeveloped. As a result, much of the work in this area focuses on improving the accessibility and reliability of public transit.

Traditional measures of transit reliability include trip frequency, punctuality, and travel time. Accordingly, much work has aimed to improve travel times by identifying and reducing causes of delay [1]. However, an often overlooked element of reliability is the perceived comfort of riders [2], which is a direct consequence of vehicle occupancy and capacity. A frequently overcrowded bus can deter potential commuters from even considering public transit. Conversely, consistently low rider demand represents an underutilization of already constrained resources. This duality is often caused by agencies' constant struggle to provide increased transit coverage amid highly heterogeneous ridership demand.

With the progress of big data, transit authorities now have access to, and are able to provide, real-time passenger occupancy information for their vehicles. Transit agencies such as the Nashville Metropolitan Transit Authority (MTA) use Automated Passenger Counter (APC) systems that provide stop-level estimates of passenger boardings and alightings. This information has been integrated into apps such as Transit¹, which, in addition to allowing potential riders to see estimated future passenger occupancy, also uses crowdsourcing to collect occupancy information from riders onboard in an effort to improve service accuracy. From the perspective of passengers, this helps them choose departure times that match their desired comfort level. For agencies, it can serve as a reference for optimizing their services by allocating resources according to predicted ridership demand. Thus, accurately predicting the maximum occupancy of each vehicle in a public transit system is pivotal to improving perceived reliability, resource optimization, and rider comfort.

Achieving highly accurate occupancy prediction, however, is a difficult task. A number of factors can affect demand, ranging from short-term, high-impact factors such as sporting events and festivals to long-term factors such as school schedules and seasons. Additionally, stochastic traffic conditions along the route can cause variation in ridership, further increasing uncertainty. Another issue that can affect prediction is sensor data noise. As with any system that relies on a fleet of sensors and a large database, there are bound to be inconsistencies and errors. This is especially true for APC systems, where passenger boarding and alighting information is recorded using infrared sensors installed on vehicle doors [3], which can produce erratic and misleading counts. This highlights the need for data preparation and augmentation to ensure that the data is reliable and useful.
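One simple way to screen trips for this kind of sensor noise, sketched here under our own assumptions rather than as the paper's cleaning procedure, is to check two invariants that error-free counts should satisfy: the running occupancy never goes negative, and total boardings roughly equal total alightings over a completed trip. The `StopEvent` record layout and the `tolerance` fraction are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StopEvent:
    """One APC record: counts observed at a single stop on a trip (hypothetical layout)."""
    stop_sequence: int
    boardings: int
    alightings: int


def trip_is_plausible(events: list[StopEvent], tolerance: float = 0.1) -> bool:
    """Return True if a trip's APC counts pass two basic consistency checks.

    Assumptions (not from the paper): the bus starts and ends the trip empty,
    and a boarding/alighting imbalance above `tolerance` of total boardings
    indicates sensor noise.
    """
    occupancy = 0
    total_on = total_off = 0
    for ev in sorted(events, key=lambda e: e.stop_sequence):
        if ev.boardings < 0 or ev.alightings < 0:
            return False  # negative counts are always recording errors
        occupancy += ev.boardings - ev.alightings
        total_on += ev.boardings
        total_off += ev.alightings
        if occupancy < 0:
            return False  # more passengers alighted than were ever on board
    if total_on == 0:
        return total_off == 0
    return abs(total_on - total_off) <= tolerance * total_on
```

Trips that fail either check could then be dropped, repaired, or down-weighted; which of these is appropriate depends on how much data the agency can afford to discard.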

¹https://transitapp.com/

arXiv:2210.04989v1 [cs.LG] 10 Oct 2022

In this paper, we implement an end-to-end framework for predicting occupancy at both the stop and route levels, which allows our method to react to both short- and long-term changes in the public transit system. We do this by analyzing and combining different spatio-temporal data, such as weather, traffic, and APC data, to develop a model for bus occupancy. First, we investigate how data can be augmented and merged to provide features that expose their relationship with bus occupancy. Second, we build separate models for the bus-stop and transit-route levels. Finally, we demonstrate and compare our approach using actual APC data from the public transit agency of Nashville, TN. The main contribution of this paper is a data cleaning and augmentation method that processes raw APC data. Raw APC data is often noisy and suffers from a variety of issues; cleaning and augmentation ensure that the data used to train the models is valid. We generate passenger occupancy from the alighting and boarding information.
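The exact procedure for deriving occupancy is not given in this section. A common reconstruction, shown here as a minimal sketch, is a running sum of boardings minus alightings along the stop sequence, with a trip's maximum occupancy taken as the peak of that profile. Clamping the running load at zero is our own assumption for absorbing small negative drifts caused by noisy sensors.

```python
from __future__ import annotations


def occupancy_profile(boardings: list[int], alightings: list[int]) -> list[int]:
    """Passengers on board after each stop, assuming the bus starts empty.

    A running total that would go negative (a symptom of APC miscounts) is
    clamped to zero; this clamping rule is an illustrative assumption.
    """
    if len(boardings) != len(alightings):
        raise ValueError("boardings and alightings must align stop by stop")
    profile = []
    load = 0
    for on, off in zip(boardings, alightings):
        load = max(0, load + on - off)
        profile.append(load)
    return profile


def max_trip_occupancy(boardings: list[int], alightings: list[int]) -> int:
    """Peak load over the trip; 0 for a trip with no recorded stops."""
    return max(occupancy_profile(boardings, alightings), default=0)
```

For example, boardings `[4, 6, 2, 0]` and alightings `[0, 1, 5, 6]` give the profile `[4, 9, 6, 0]` and a maximum occupancy of 9. Because of the clamping, the load is carried explicitly in a loop rather than computed as a plain cumulative sum.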

Organization: The rest of this paper is organized as follows. In Section II, we give an overview of the state of the art in occupancy prediction. In Section III, we present and formulate the problem. We then discuss the APC data and its accompanying issues in depth in Section IV. In Section VI, we validate our proposed models using real-world data from Nashville, TN. Finally, in Section VII, we give our conclusions.

II. RELATED WORK

In this section, we discuss the current state-of-the-art methods used in public transit occupancy prediction.

A. Occupancy Prediction

Given the importance of public transit and the increasing ubiquity of available vehicle data, research in occupancy prediction, also known as passenger flow or transit demand prediction, has been flourishing. There is a considerable body of work on understanding and mapping occupancy levels in public transport.

Short-term passenger demand forecasting approaches fall into one of two categories: parametric and non-parametric. Traditionally, parametric approaches such as historical averaging [4] and autoregressive integrated moving average (ARIMA) [5] have been used to predict not only demand but also traffic flow, travel times, and vehicle speed. Since its introduction, ARIMA has been known to perform well in modeling linear and stationary time series. However, ARIMA's shortcomings in accounting for seasonality and capturing non-linear relationships in data are also well known.
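Historical averaging is the simplest baseline of this kind. A minimal sketch, assuming each record is keyed by trip, day of week, and hour (a hypothetical grouping, not necessarily the one behind the paper's baseline), predicts the mean of all past observations that share the same key:

```python
from collections import defaultdict
from statistics import mean


def fit_historical_average(records):
    """Build a lookup of mean occupancy per (trip_id, weekday, hour) key.

    records: iterable of (trip_id, weekday, hour, observed_max_occupancy).
    """
    groups = defaultdict(list)
    for trip_id, weekday, hour, occ in records:
        groups[(trip_id, weekday, hour)].append(occ)
    return {key: mean(values) for key, values in groups.items()}


def predict_historical_average(model, trip_id, weekday, hour, fallback=0.0):
    """Return the historical mean for the key, or `fallback` for unseen keys."""
    return model.get((trip_id, weekday, hour), fallback)
```

With history `[("T1", 0, 8, 30), ("T1", 0, 8, 40), ("T1", 1, 8, 25)]`, the prediction for trip `T1` on weekday 0 at hour 8 is 35. The model has no notion of trend or exogenous inputs, which is exactly the limitation that motivates the non-parametric methods discussed next.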

In contrast, non-parametric approaches learn a non-linear relationship between the input and output variables without prior assumptions about its form. These methods gained popularity as a consequence of the rapidly increasing availability of data from systems such as Advanced Public Transportation Systems (APTS) and Advanced Traveler Information Systems (ATIS) [3]. These techniques have proven effective at forecasting demand based on data gathered through smart cards [6], [7]. Toque et al. [8] used Random Forest (RF) and LSTM neural networks trained on smart card data to predict travel demand. Tsai et al. [9] developed multiple temporal units neural networks (MTUNN) and parallel ensemble neural networks (PENN) and showed that these models can outperform predictions based on statistical analysis of historical data.
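Non-parametric forecasters such as RF and LSTM are typically trained on lagged windows of a series: a fixed number of past values as input and the next value as target. The sketch below builds those windows and then, purely for illustration, fits a k-nearest-neighbours regressor, a different non-parametric method swapped in because it needs no external libraries. The window length `lags` and neighbour count `k` are hypothetical parameters.

```python
import math


def make_windows(series, lags):
    """Turn a 1-D series into (X, y) pairs: `lags` past values -> next value."""
    X, y = [], []
    for t in range(lags, len(series)):
        X.append(series[t - lags:t])
        y.append(series[t])
    return X, y


def knn_predict(X, y, query, k=3):
    """Average the targets of the k training windows closest to `query`.

    No functional form is assumed for the input-output relationship, which is
    what makes the method non-parametric. Requires at least one training window.
    """
    dists = sorted(
        (math.dist(window, query), target) for window, target in zip(X, y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)
```

On a repeating series `[10, 20, 30, 10, 20, 30, 10, 20, 30]` with `lags=2`, the query `[10, 20]` matches past windows exactly, so the prediction is 30.0.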

Incorporating other spatio-temporal datasets, such as weather and special events, has also been explored. Karnberger et al. [10] considered the effect of exogenous events on public transportation ridership. Meanwhile, Zhou et al. [11] combined smart card data and weather information and found that while riders are relatively resilient to changes in weather, weather still affects overall demand. Finally, Wood et al. [12] modeled passenger occupancy and demand at the next-stop/any-stop level based on APC and weather data and showed that even simpler models such as RF and LSTM provide reliable estimates of future data when trained with historical information, provided that demand patterns are fairly stable.
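Fusing weather with APC records requires aligning two sources sampled at different times. A minimal sketch, assuming weather arrives as timestamped observations (for example hourly) and each APC row carries a `timestamp` field (hypothetical field names, not the paper's pipeline), attaches the most recent weather observation at or before each APC event, a backward as-of join:

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime


def attach_weather(
    apc_rows: list[dict], weather_rows: list[tuple[datetime, dict]]
) -> list[dict]:
    """Attach to each APC row the latest weather observation at or before it.

    apc_rows:     dicts with a 'timestamp' key holding a datetime
    weather_rows: (observation_time, weather_dict) tuples, sorted here by time
    Rows earlier than the first weather observation get weather=None.
    """
    weather_rows = sorted(weather_rows, key=lambda row: row[0])
    times = [t for t, _ in weather_rows]
    merged = []
    for row in apc_rows:
        # Index of the last observation whose time is <= the APC timestamp.
        i = bisect_right(times, row["timestamp"]) - 1
        weather = weather_rows[i][1] if i >= 0 else None
        merged.append({**row, "weather": weather})
    return merged
```

Looking backward rather than to the nearest observation avoids leaking future weather into a feature, which matters when the merged data is used for day-ahead or same-day forecasting. In pandas, `merge_asof` performs the same backward as-of join on sorted frames.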

There has been plenty of work in the field of public transportation with a special focus on improving reliability through understanding and forecasting passenger demand. However, our work is distinct in three ways. First, our work aims to provide occupancy prediction at the stop and trip levels separately by forecasting short- and long-term demand. Second, we work on APC data, which is fundamentally different from the smart card data commonly used in prior work. Smart cards are embedded with integrated circuits that enable them to process information or, in this case, allow contactless ticketing on mass transit. These cards are much more accurate and complete in their data collection [13], [14], due in part to requiring passengers to swipe upon boarding and before alighting. In contrast, APC data is much noisier and introduces far more uncertainty into data collection and processing. Third, we focus on implementing this for the entire public transport system rather than for a few select routes.

III. PROBLEM STATEMENT

Based on our conversations with the transit agency, it wants to be able to identify particular trips and stops that experience overcrowding. Overcrowding increases the chance that passengers cannot board the bus, decreasing their overall satisfaction and willingness to take public transit again in the future. Knowing the maximum occupancy at the trip and stop levels will allow the agency to react and prepare accordingly by increasing bus dispatch frequency, thereby decreasing headway.
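Once maximum occupancy is predicted, identifying trips that may need extra service reduces to comparing predicted load against vehicle capacity. A sketch follows, in which the per-vehicle-type capacities and the 0.85 load-factor threshold are hypothetical values an agency would set, not figures from the paper:

```python
def flag_overcrowded(predictions, capacity_by_vehicle, threshold=0.85):
    """Return sorted trip IDs whose predicted load factor meets `threshold`.

    predictions:         dict trip_id -> (vehicle_type, predicted_max_occupancy)
    capacity_by_vehicle: dict vehicle_type -> passenger capacity
    Load factor = predicted maximum occupancy / capacity of the assigned vehicle.
    """
    flagged = []
    for trip_id, (vehicle_type, max_occ) in predictions.items():
        load_factor = max_occ / capacity_by_vehicle[vehicle_type]
        if load_factor >= threshold:
            flagged.append(trip_id)
    return sorted(flagged)
```

Normalizing by the capacity of the assigned vehicle type, rather than using a single occupancy cutoff, keeps the flag meaningful for a heterogeneous fleet in which the same passenger count can be comfortable on one bus and crowded on another.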

The primary objective of this work is to provide accurate occupancy prediction for public transit vehicles. The goal is to reliably and efficiently forecast maximum ridership demand at both the stop and trip levels. The problem, then, is: given a fleet of heterogeneous vehicles², each equipped with an automated passenger counting system, how can we model and accurately predict the maximum occupancy at any future trip or stop?

²In this work, we use the terms vehicle and bus interchangeably to refer to public transit vehicles.
