
A fairness assessment of mobility-based COVID-19 case prediction models
Abdolmajid Erfani 1*, and Vanessa Frias-Martinez 2, 3
1 Department of Civil and Environmental Engineering, University of Maryland, 1173 Glenn
Martin Hall, College Park, MD 20742, USA.
2 College of Information Studies, University of Maryland, College Park, MD 20742, USA.
3 University of Maryland Institute for Advanced Computer Studies, University of Maryland,
College Park, MD 20742, USA
* Corresponding Author: erfani@umd.edu
ABSTRACT
In light of the outbreak of COVID-19, analyzing and measuring human mobility has become
increasingly important. A wide range of studies have explored spatiotemporal trends,
examined associations with other variables, evaluated non-pharmaceutical interventions (NPIs),
and predicted or simulated COVID-19 spread using mobility data. Despite the benefits of publicly
available mobility data, a key question remains unanswered: are models using mobility data
performing equitably across demographic groups? We hypothesize that bias in the mobility data
used to train the predictive models might lead to unfairly less accurate predictions for certain
demographic groups. To test our hypothesis, we applied two mobility-based COVID infection
prediction models at the county level in the United States using SafeGraph data, and correlated
model performance with sociodemographic traits. Findings revealed that there is a systematic bias
in model performance toward certain demographic characteristics. Specifically, the models tend
to favor large, highly educated, wealthy, young, and urban counties. We hypothesize that the
mobility data currently used by many predictive models tends to capture less information about
older, poorer, and less educated people and people from rural regions, which in turn negatively impacts the
accuracy of COVID-19 predictions in these areas. Ultimately, this study points to the need for
improved data collection and sampling approaches that allow for an accurate representation of the
mobility patterns across demographic groups.
INTRODUCTION
The interactions between human mobility and epidemic spread have been studied to an unprecedented extent
during the COVID-19 pandemic 1-8. With these efforts, nonpharmaceutical interventions (such as
national lockdowns) have been evaluated for their effectiveness and socio-economic impact on
different groups 9-11, models have been developed to predict disease spatial diffusion 12,13, and
scenarios have been modeled to assess their outcomes 14-17. Studies have demonstrated that
mobility data are a meaningful proxy measure of social distancing 18, affect viral spreading 19,20,
and are useful for predicting the spread of COVID-19 21-23.
In particular, to control the spread of new cases and plan efficiently for hospital needs and
capacities during an epidemic, public health decision-makers require accurate predictions of future
case numbers 7. For example, a study by Ilin et al. (2021) showed that changes in mobility can be
used to predict COVID-19 cases. Their study demonstrated that public mobility data can be used
to develop reduced-form and simple models that mimic the behavior of more sophisticated
epidemiological models for predicting COVID-19 cases on a 10-day basis 21. Another study
examined several state-of-the-art machine learning models and statistical methods and
demonstrated how mobility data can improve prediction trends when used as exogenous
information in models 22.
As discussed, mobility data from anonymized smartphones have been shown to improve COVID-19
case prediction models. However, mobility data bias has received little attention in this
predictive context. Only a handful of papers report demographic bias in mobility
data due to differences in smartphone ownership and use 24-26. Moreover, since data providers are not
transparent about how mobility data are collected, or about the socio-economic and demographic
groups represented in them, directly measuring and correcting bias in mobility data is difficult. In
this study, we hypothesize that the presence of socio-economic and demographic bias in the
mobility data used to train COVID-19 case prediction models might result in unfairly less
accurate predictions for particular socio-economic and demographic groups. Unfair predictions
provided to decision makers, e.g., predictions of COVID-19 cases for minority groups that are
lower than reality, could in turn be used to unfairly assign more resources to population groups
that do not necessarily need them.
To test our hypothesis, we evaluated the performance of two types of mobility-based COVID-19
case prediction models widely used by decision makers due to their interpretability: linear regressions
and time series models. In contrast to more complex epidemiological models, which are hard to tune
due to their parametric nature, and deep learning models, which are difficult to interpret, linear models
and time series models are easy to train and test 21,27-29. The models were trained using SafeGraph’s
mobility data, and performance was measured via predictive errors. To assess the fairness of the
predictions, we analyzed the relationship between the model prediction errors and specific socio-
economic and demographic features at the county level in the United States and across the two
model types. Evaluating the performance of two diverse interpretable models allowed us to
account for potential algorithmic bias, i.e., bias introduced by the algorithm itself 30,31. If unfair
predictions are pervasive across types of models trained and tested with the same data, we can
partially attribute the unfairness to the mobility data itself.
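
The fairness check described above can be illustrated with a minimal sketch. It assumes mean absolute error as the per-county error measure and Pearson correlation as the measure of association; both are illustrative choices, not necessarily the ones used in this study. The county labels, the income feature, and all numbers are made-up toy values.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted case counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: three counties with a median income feature and
# observed versus predicted case counts (all values invented for illustration).
counties = {
    "A": {"income": 40_000, "actual": [10, 12, 15], "predicted": [4, 20, 9]},
    "B": {"income": 60_000, "actual": [10, 12, 15], "predicted": [7, 15, 12]},
    "C": {"income": 80_000, "actual": [10, 12, 15], "predicted": [9, 13, 14]},
}

# One error value per county, then its association with the feature.
errors = [mean_absolute_error(c["actual"], c["predicted"]) for c in counties.values()]
incomes = [c["income"] for c in counties.values()]
error_income_corr = pearson(incomes, errors)

print(f"correlation between income and prediction error: {error_income_corr:.2f}")
```

In this toy example the correlation is negative, meaning that the error shrinks as income grows. If the same sign appeared for both model types trained on the same mobility data, that would be consistent with the argument that the data, rather than a particular algorithm, contributes to the unfairness.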
MATERIAL AND METHODS
In our study, we use mobility data from SafeGraph to build COVID-19 case prediction models;
and we explore model performance across socio-economic and demographic features to potentially
identify unfair results for specific groups, i.e., differences in error distributions across social groups.
We next describe the three types of datasets, all of which are publicly available.
Human mobility
We used SafeGraph’s publicly-available human mobility data at the county level in the US.
SafeGraph uses location information extracted from smartphones to provide aggregate data
characterizing mobility in terms of visit volumes to types of places and volumes of origin-
destination (OD) flows 32. For this study specifically, we used the data publicly available in the
origin-destination-time (ODT) platform 33, which computes OD flows between counties as the
aggregation of trips that start at an individual’s home county location (origin), with a destination
defined as a stay location within a county for longer than a minute. OD flows between all counties
in the US were collected throughout all days of the year 2020. Figure 1.a illustrates how the
average number of trips at county level across the US changed over the year 2020. The trends of
mobility change shown in Figure 1.a are similar to those reported by various US studies using
mobility data 34,35.
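
As a rough illustration of how daily OD flows might be turned into an average number of trips per county, the sketch below sums trips by origin county and day and then averages across counties. The record layout (date, origin FIPS code, destination FIPS code, trip count) is an assumption made for this example and is not the ODT platform's actual schema; counting trips by origin county is also an assumption about how the average is defined. All values are invented.

```python
from collections import defaultdict

# Hypothetical OD records: (date, origin_county, destination_county, trips).
od_flows = [
    ("2020-03-01", "24033", "24031", 120),
    ("2020-03-01", "24033", "11001", 80),
    ("2020-03-01", "24031", "24033", 100),
    ("2020-03-02", "24033", "24031", 60),
    ("2020-03-02", "24031", "24033", 40),
]


def daily_trips_by_origin(flows):
    """Sum trips leaving each origin county on each day."""
    totals = defaultdict(int)
    for date, origin, _destination, trips in flows:
        totals[(date, origin)] += trips
    return dict(totals)


def average_trips_per_county(flows):
    """Average the per-county daily totals across counties, for each day."""
    by_date = defaultdict(list)
    for (date, _origin), trips in daily_trips_by_origin(flows).items():
        by_date[date].append(trips)
    return {date: sum(values) / len(values) for date, values in by_date.items()}


avg = average_trips_per_county(od_flows)
for date in sorted(avg):
    print(date, avg[date])
```

Plotting such a per-day average over a full year would give a curve of the kind described for Figure 1.a, under the assumptions stated above.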