1 Introduction
The bike share system (BSS) market has rapidly expanded in recent years and is expected to triple by 2030 (
Fishman and Allan, 2019;
Straits Research, 2021). The traditional BSS is station-based, allowing travellers to pick up and return bikes at designated locations (
Kou and Cai, 2019). BSSs allow travellers to use bikes on an as-needed basis, either for a fee or for free, providing a convenient and accessible mobility option, especially for first/last-mile trips (
Bachand-Marleau et al., 2012;
Fishman, 2016). As part of the sharing economy and as a viable substitute for short private car-based trips, BSS also has the potential to reduce greenhouse gas emissions (
Shaheen et al., 2010;
Kou et al., 2020;
Zhou et al., 2023). However, these benefits may remain unattained if BSSs are not well planned or managed. An unplanned or poorly managed BSS can lead to over- and under-supply, inconvenient parking, lower service levels, street safety issues, and suboptimal business operations (
Regue and Recker, 2014;
Chen et al., 2016). The rapid growth in BSSs and the necessity for intelligent forward-looking design has led researchers to study the implementation and improvement of these systems (
Luo et al., 2020;
Kou and Cai, 2021a).
Demand prediction for BSS stations is the foundation of planning and managing BSSs. Due to spatial and temporal imbalance in BSS demand, it is common for BSSs to suffer from either undersupply or oversupply of bikes at different stations (
Li et al., 2015;
Chen et al., 2016). Inaccurate demand prediction can cascade to improper bike rebalancing, increased operational costs, reduced user satisfaction, and more greenhouse gas emissions. Therefore, system operators and researchers have devoted significant attention to predicting the demands of BSS stations (
El-Assi et al., 2017;
Hyland et al., 2018;
Zhou et al., 2018;
Kou and Cai, 2021b).
Existing demand prediction studies have used traditional regression, machine learning, or deep learning models to predict BSS demand in different ways. For instance, Médard de Chardon and Caruso (
2015) used a linear regression-based method to estimate daily BSS trip demand using station-level data from six cities. Hulot et al. (
2018) used linear regression and machine learning models to predict hourly demand and recommended intervals for rebalancing the bikes. Convolutional neural network-based models have been used to predict bike inflow and outflow at stations and station-level hourly demand (
Chai et al., 2018;
Lin et al., 2018;
Yu et al., 2018). Cluster-based regressions have been used to predict pickups and returns of bikes to stations with similar characteristics and to predict citywide bike usage (
Chen et al., 2016;
Jia et al., 2019;
Li and Zheng, 2020). Additionally, a few studies have investigated residual correction to reveal hidden stochasticity in time series data to improve prediction (
Kim et al., 2022;
Zheng et al., 2023). To improve prediction accuracy, previous studies have also integrated spatiotemporal variables and analysed their role in shaping demand. Bao et al. (
2017) investigated bike share travel patterns and trip purposes by combining smart card data and points of interest (POIs). Zhou (
2015) and Lin et al. (
2020) revealed the spatiotemporal patterns of bike sharing behavior and identified influential factors, such as the built environment, affecting bike share trip demand.
Various features have been incorporated in demand prediction as well, in addition to spatiotemporal information. Yang et al. (
2016;
2019) created a probabilistic spatiotemporal model using dynamic networks, time factors, and weather. Hulot et al. (
2018) predicted demand with temporal and weather variables using linear regression and machine learning models. Singhvi et al. (
2015) used temporal, demographic, and weather factors in their pairwise model. Chen et al. (
2016) used temporal and weather factors but added social events (such as city festivals, parades, or traffic accidents) in their cluster-based prediction. However, despite the advancement of demand forecasting techniques, existing studies have mainly focused on how to improve the prediction of demand. The accuracy of demand prediction is only known a posteriori, after the prediction model has been developed, and it requires extensive domain-specific feature engineering from the researchers. No existing studies have addressed the fundamental predictability of demands at stations, and there is a gap in understanding how the intrinsic randomness (i.e., the inherent variation of BSS demands due to unpredictable factors such as weather conditions and human behaviors) governs the limit of future demand prediction. By understanding the randomness governing demand levels, system operators and city administrators can better manage and maintain stations. Meanwhile, little research has evaluated how temporally invariant determinants shape the random nature of demand patterns at individual stations. These temporally invariant determinants intrinsically characterize the functionalities of different zones in cities and shape the heterogeneities among bike share stations. They are relatively stable over time compared with time-dependent determinants such as seasonality and unique events. There exists a research gap in studying the variables accounting for such randomness, through which we could gain knowledge about heterogeneous station-level demand predictability even prior to the launch of a BSS in a city.
To evaluate the accuracy of the predictive models, current demand prediction studies have primarily applied a single model to the entire system and evaluated prediction performance at the system level. These studies rely on station-level data and employ evaluation metrics such as root mean squared error (RMSE) and mean absolute percentage error (MAPE), but inspecting only the aggregated outcome at the system level may fail to capture the variation and anomalies among stations and dilute the local understanding of predictive performance at the station level (
Li et al., 2015;
Médard de Chardon and Caruso, 2015;
Singhvi et al., 2015;
El Sibai et al., 2018;
Liu et al., 2022). For instance, station-level understanding is critical for rebalancing individual stations and maintaining efficient system operation. Consequently, beyond global average performance, there is a knowledge gap regarding how prediction errors vary at high resolution and how the inherent predictability of demand differs across individual stations (
He and Shin, 2020).
To address the aforementioned research gaps, this study quantified the randomness and predictability rooted in the time series of bike check-out demand at individual stations, based on entropy and predictability measures from information theory. The calculation of these two metrics was based on one year's data, which can capture relatively long-term variations and seasonal changes. To test the validity of our measurements, we compared the station-level entropy and predictability with the empirical prediction performance of two benchmark prediction models: Auto Regressive Moving Average (ARMA) and XGBoost. This establishes a viable mapping from model-based accuracy/error to model-free measurements, allowing demand prediction performance to be anticipated without prior feature engineering or prediction algorithms. Additionally, to further examine how temporally invariant factors of a city impact such intrinsic randomness of BSS demands, this work used a random forest regression model to identify the most significant temporally invariant factors (i.e., features that remain consistent over the study period, without significant variation, such as infrastructure) contributing to station-level entropy and predictability. In the context of the existing literature, this study makes three primary contributions:
(1) We adopted entropy and predictability as model-free measurements for investigating the inherent station-level demand predictability.
(2) Through empirical experiments, we further established these two measurements as representative of practical prediction algorithm performance, without the need for feature engineering or building prediction models.
(3) We offered managerial insights for system operators and city authorities considering launching a new BSS or expanding an existing one, by identifying the key factors impacting the entropy and predictability of individual stations.
The subsequent sections of the paper are organized as follows. Section 2 introduces the data, data processing methods, demand randomness measurements, benchmark demand prediction algorithms, and evaluation metrics. In Section 3, we present the overview of the computed entropy and predictability. We also show the association between the entropy/predictability and empirical performances achieved by demand prediction models at the individual station-level. In addition, we determine the most notable factors influencing station demands’ entropy and predictability. Last, Section 4 draws inferences and implications about demand predictability from the results, discusses the limitations, and suggests future research directions.
2 Data and method
2.1 Data and data processing
In this study, our goal was to quantify the inherent randomness and predictability of bike check-out demands at individual bike share stations, as well as to identify which city factors contribute to such randomness. To achieve this, we collected one year of historical Divvy trip records (366 days, Aug. 1, 2015 – Aug. 1, 2016) as demand data, where the record for each trip includes the trip ID, start and end time of the trip, the check-out and check-in station IDs and names, and trip duration. There are 534 stations in total with complete trip records within our study period. Moreover, we partitioned the pre-processed historical trip data into two sets: One for training/analysis, spanning 361 days, and the other for testing, spanning 5 days. We selected Chicago as our case study city because of its long history of operating a station-based BSS (active since 2013) and Divvy's wide service coverage, spanning both downtown and suburban areas of the city. The start time and check-out station ID for each ride were extracted from the trip records for this study. Subsequently, we aggregated check-out demands at individual stations based on a certain time interval,
$\Delta t$, which could range from hourly ($\Delta t = 1$ h) to daily ($\Delta t = 24$ h) segments, based on the intervals most used in previous works (Hulot et al., 2018; Kou and Cai, 2021a). In this study, we selected the time interval of four hours ($\Delta t = 4$ h) as the base scenario, which resulted in a series of six demand data points for each station per day (we also conducted a sensitivity analysis to examine how using different time intervals may impact the results). We therefore formulated the series of observed check-out demand levels at station $i$ over the one-year timespan $T$ as
$$X_i = \{x_i(1), x_i(2), \ldots, x_i(T)\},$$
where $x_i(t)$ indicates the aggregated check-out demand level at time step $t$. For each of the 534 Divvy stations, we also acquired data consisting of station ID, name, latitude, longitude, and capacity. We used this station information as source data to construct the spatial network variables listed in Tab.1. Additionally, we collected the points of interest (POIs) within a 1000 ft (304.8 m) radius around each station based on the Google Maps Places API (application program interface). POIs encompass specific functional locations, such as bus stations, schools, and parking lots. Furthermore, we sourced socio-demographic data, specifically per capita income and population density, from the American Community Survey (
US Census Bureau, 2012) at the census-tract level. The spatial network, POIs, and socio-demographic data prepared above were utilized in Section 2.3 to analyze the key factors that contribute to the entropy and predictability.
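As an illustration of the demand aggregation step described above (a minimal sketch, not the authors' code; column names such as start_time and from_station_id are assumptions about the raw Divvy schema), the trip records can be binned into per-station check-out counts with pandas:

```python
import pandas as pd

def build_demand_series(trips: pd.DataFrame, interval: str = "4h") -> pd.DataFrame:
    """Return a (time step x station) table of check-out counts."""
    trips = trips.copy()
    trips["start_time"] = pd.to_datetime(trips["start_time"])
    demand = (
        trips
        .set_index("start_time")
        .groupby("from_station_id")
        .resample(interval)
        .size()                      # number of check-outs per station per interval
        .unstack(level=0, fill_value=0)
    )
    return demand

# Example: 4-hour bins give six demand observations per station per day.
# demand = build_demand_series(pd.read_csv("divvy_trips.csv"), interval="4h")
```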
2.2 Measures of demand randomness and predictability
In the context of BSS demand prediction, prediction models have been developed to statistically fit the relationship between the spatiotemporal variables and the bike usage demands, via either linear or nonlinear approaches (
Senter, 2008). How much of the demand can be predicted depends on the degree of uncertainty rooted in the demand pattern at a station. Therefore, we harnessed the concept of entropy in information theory (
Shannon, 1948) to measure the degree of randomness in a sequence of demands at individual stations (Section 2.2.1). In addition to the entropy, we computed each station's predictability to capture the upper limit to which these demands can be correctly forecasted (Section 2.2.2). These two measurements thereby provide a quantitative assessment of the inherent predictability of demands at each installed bike share station, based on a stable and relatively long-term (one-year) trend of demands rather than various short-term shifts (
Lu et al., 2013). Additionally, entropy and predictability imply the station-wise theoretical upper bound of unexplained uncertainty and potential margin of error associated with a given demand prediction model (
Song et al., 2010). These two metrics can serve as
a priori model-free evaluation for fundamental station demand patterns.
2.2.1 Entropy
Given the sequence of observed check-out demand levels $X_i$ for each station $i$, we utilized three entropy measurements to reflect the randomness of a station's demands using different amounts of information, as proposed by Song et al. (2010):
(1) The random entropy $S_i^{\mathrm{rand}} = \log_2 N_i$, which indicates the disorder of a station's demand level, under the assumption that each demand level is observed with equal probability among the $N_i$ unique demand levels (with the resolution of one bike check-out). As the random entropy increases, more diverse levels of demand are likely to be observed at a specific station.
(2) The temporal-uncorrelated entropy $S_i^{\mathrm{unc}} = -\sum_{j=1}^{N_i} p_j \log_2 p_j$, which characterizes the heterogeneity of observed demand levels, where $p_j$ is the probability that the $j$th demand level is observed among the $N_i$ unique demand levels. The temporal-uncorrelated entropy considers both the number of unique demand levels and their frequency of being observed over the timespan.
(3) The actual entropy $S_i^{\mathrm{actual}} = -\sum_{X_i' \subset X_i} P(X_i') \log_2 P(X_i')$, which incorporates the probability of an observed demand level as well as the order in which demand levels are observed and the persistence of an observed demand level (whether the observed demand stays at a certain level for multiple time windows), where $P(X_i')$ is the probability of finding a subsequence $X_i'$ in the observed full sequence $X_i$. Due to its computational complexity, the actual entropy can be estimated based on Lempel-Ziv data compression (Kontoyiannis et al., 1998).
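As an illustration (not the authors' code), the following sketch computes the three entropy measures for one station's demand sequence; the Lempel-Ziv part uses the common $n \log_2 n / \sum_i \Lambda_i$ approximation of the Kontoyiannis et al. (1998) estimator, which is an assumption about the exact variant used:

```python
import numpy as np
from collections import Counter

def random_entropy(seq):
    """S_rand = log2(N), with N the number of distinct demand levels."""
    return float(np.log2(len(set(seq))))

def uncorrelated_entropy(seq):
    """S_unc = -sum_j p_j * log2(p_j) over the observed demand levels."""
    counts = np.array(list(Counter(seq).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def actual_entropy(seq):
    """Lempel-Ziv estimate of the entropy rate: S ~ n*log2(n) / sum(Lambda_i),
    where Lambda_i is the length of the shortest substring starting at i
    that has not appeared before position i."""
    s = [str(x) for x in seq]
    n = len(s)
    lambdas = 0
    for i in range(n):
        k = 1
        while i + k <= n and _contains(s[:i], s[i:i + k]):
            k += 1
        lambdas += k
    return float(n * np.log2(n) / lambdas)

def _contains(history, pattern):
    """True if `pattern` occurs as a contiguous subsequence of `history`."""
    m = len(pattern)
    return any(history[j:j + m] == pattern for j in range(len(history) - m + 1))
```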
2.2.2 Predictability
Naturally, a demand sequence with greater entropy has more randomness in its demand pattern, which in turn decreases the predictability of future demands at the station. Considering the entropy ($S_i$) that represents the disorder of a series of demands, the upper bound of predictability ($\Pi_i^{\max}$) that could be attained by a predictive algorithm for correctly predicting future demands at a station is subject to Fano's inequality (Fano and Hawkins, 1961), when a station's demands with entropy $S_i$ range between $N_i$ distinct levels (Song et al., 2010):
$$\Pi_i \leq \Pi_i^{\max}, \qquad (2.1)$$
where $\Pi_i^{\max}$ has a relationship with $S_i$ as outlined in Eq. (2.2):
$$S_i = H(\Pi_i^{\max}) + (1 - \Pi_i^{\max}) \log_2 (N_i - 1), \qquad (2.2)$$
and Eq. (2.3) describes the binary entropy function:
$$H(\Pi_i^{\max}) = -\Pi_i^{\max} \log_2 \Pi_i^{\max} - (1 - \Pi_i^{\max}) \log_2 (1 - \Pi_i^{\max}). \qquad (2.3)$$
By solving Eqs. (2.2) and (2.3) based on the specified $S_i$ and $N_i$ calculated from a station's demand series, we could compute the predictability $\Pi_i^{\max}$. Based on the three types of entropy defined in Section 2.2.1, for each station $i$, we can thereby denote the random predictability $\Pi_i^{\mathrm{rand}}$, temporal-uncorrelated predictability $\Pi_i^{\mathrm{unc}}$, and actual predictability $\Pi_i^{\mathrm{actual}}$. Higher values of $\Pi$ indicate that the demands could be better predicted. Comparing these three predictability metrics enables us to examine how temporal correlations within an individual station's demand sequence enhance potential predictive accuracy (
Lu et al., 2013). This also aligns with existing works presented in Zhou (
2015) and Lin et al. (
2020) that incorporated different spatial and temporal variables into the prediction model with the aim to reduce prediction errors.
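To make the computation concrete, a minimal sketch (assuming SciPy is available; not the authors' implementation) solves Eqs. (2.2) and (2.3) numerically for $\Pi^{\max}$ given a station's entropy $S$ and number of distinct demand levels $N$:

```python
import numpy as np
from scipy.optimize import brentq

def max_predictability(S: float, N: int) -> float:
    """Solve S = H(Pi) + (1 - Pi) * log2(N - 1) for the upper bound Pi (Eq. 2.2),
    where H(Pi) is the binary entropy function of Eq. (2.3)."""
    if N <= 1 or S < 1e-6:
        return 1.0  # a single demand level or (near-)zero entropy is fully predictable
    S = min(S, np.log2(N) * (1 - 1e-9))  # entropy cannot exceed log2(N)

    def fano(pi):
        h = -pi * np.log2(pi) - (1 - pi) * np.log2(1 - pi)
        return h + (1 - pi) * np.log2(N - 1) - S

    # The right-hand side decreases monotonically for Pi in (1/N, 1),
    # so a bracketing root-finder recovers the unique solution.
    return brentq(fano, 1.0 / N + 1e-12, 1.0 - 1e-12)

# Example (using the entropy sketch above):
# pi_actual = max_predictability(actual_entropy(demand_seq), len(set(demand_seq)))
```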
Since the entropy and predictability are fully determined by the time series of demands itself, we also performed a sensitivity analysis to examine how the entropy and predictability of each station vary with different demand observation intervals ($\Delta t$). Specifically, we computed the entropy and predictability corresponding to six distinct demand observation intervals ($\Delta t$): 1 h, 2 h, 4 h, 6 h, 12 h, and 24 h. These observation intervals embody the potential granularity at which operators can perform demand monitoring and analysis for each station.
2.3 Station-level temporally-constant features
The entropy and predictability of a station’s demand can be measured using historical demand data. However, for planning new systems or expanding an existing system to build new stations, such historical demand data would be unavailable. To examine whether the intrinsic demand randomness is related to temporally-constant and city-specific features that are available prior to launching new stations, we adopted the temporally-constant features of individual stations in Chicago as proposed in Kou and Cai (
2021b) and analyzed how such features contribute to the entropy and predictability across stations. The features are listed in Tab.1 and they can be classified into three categories: Land use characterization, spatial network information of stations, and socio-demographics. Furthermore, we applied a random forest regression model to explore the importance of temporally-constant variables to the entropy and predictability of demands at each station. A random forest regression model is a machine learning model that trains multiple decision trees on different subsets of the dataset, aggregating their results to improve overall predictive performance (
Biau and Scornet, 2016). We selected such a model because of its powerful regression capability as well as its interpretability with respect to factor significance (Belgiu and Drăguţ, 2016). Specifically, the variables listed in Tab.1 are treated as input variables in the model, while the entropy $S$ and predictability $\Pi$ computed in Section 2.2 are the dependent variables, respectively. Once a random forest regression model is trained, we can extract the feature importance according to each feature's contribution to making correct regressions and reducing loss. This enables interpretability in understanding the contribution of temporally-constant features to the entropy and predictability of demands.
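As a sketch of this workflow (assumed, not the authors' exact pipeline; station_features and pi_actual are hypothetical variable names holding the Tab.1 variables and the station-level predictability), one could use scikit-learn's random forest and its impurity-based feature importances:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def rank_feature_importance(features: pd.DataFrame, target) -> pd.Series:
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    # Five-fold cross-validation, as in the paper, to check the regression quality.
    scores = cross_val_score(model, features, target, cv=5,
                             scoring="neg_root_mean_squared_error")
    print("CV RMSE:", -scores.mean())
    model.fit(features, target)
    return pd.Series(model.feature_importances_,
                     index=features.columns).sort_values(ascending=False)

# importance = rank_feature_importance(station_features, pi_actual)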
2.4 Benchmark prediction algorithms and evaluation metrics
The entropy and predictability measurements discussed in Section 2.2 establish the theoretical upper limit of the predictive power of demand prediction algorithms. To verify the relationship between these theoretical upper limits and the practical predictive performance at the station level, we implemented two widely used benchmark algorithms for BSS demand prediction: ARMA and XGBoost. For each prediction performance metric (detailed in Section 2.4.3) of the predictive algorithms, we modelled its relationship with the entropy/predictability and calculated $R^2$ scores, which show how well such a relationship holds. We also conducted a sensitivity analysis to examine whether this relationship persists when using different time intervals (i.e., the six time intervals described in Section 2.2 were analyzed), examining the fitted curves and associated $R^2$ scores. Please note that we rounded the prediction outputs to the nearest integer, as the demand level (i.e., the number of bikes being checked out) can only be an integer.
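A minimal sketch of this step, assuming an exponential form for the entropy-RMSE relationship and a logistic form for the predictability-CS relationship (the exact parameterizations are not specified in the text, so these are illustrative), is given below:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(s, a, b):
    """Assumed form for the entropy-RMSE association."""
    return a * np.exp(b * s)

def logistic(pi, top, k, pi0):
    """Assumed form for the predictability-CS association."""
    return top / (1.0 + np.exp(-k * (pi - pi0)))

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Example: fit station-level entropy against RMSE (both 1-D arrays over stations).
# params, _ = curve_fit(exponential, entropy, rmse, p0=(1.0, 1.0))
# print("R^2:", r_squared(rmse, exponential(entropy, *params)))
```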
2.4.1 ARMA
The ARMA algorithm is a widely recognized statistical method for time series forecasting, which has been applied to prediction tasks in various domains, such as stock prices, inventory, and disease infections (
Saboia, 1977;
Nochai and Nochai, 2006;
Chen et al., 2008;
Benvenuto et al., 2020). The ARMA model consists of two main parameters:
(order of autoregression (AR)) and
(order of moving average (MA)), denoted as
, which is equivalent to an ARIMA model without differencing a time series. The future demand
to be forecasted at a specific station
at time step
is a linear combination of past demands and past errors, formulated as follows:
where
and
are the coefficients,
is the random noise with
and variance
, and
and
are the orders of AR and MA polynomials, respectively. In our study, individual stations had their own ARMA models and models were trained on stations’ historical check-out demand data (i.e., the training dataset introduced in Section 2.1) separately, so the ARMA model is station specific. In addition, the parameters of each
model were determined through automatic ARMA algorithm (
Hyndman and Khandakar, 2008).
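For illustration, a per-station sketch using pmdarima (whose auto_arima implements the Hyndman and Khandakar (2008) procedure; forcing d = 0 keeps the model an ARMA without differencing) might look as follows; variable names and the 30-step horizon (5 days of 4-hour intervals) are illustrative assumptions:

```python
import numpy as np
import pmdarima as pm

def forecast_station(train, horizon):
    """Fit a station-specific ARMA(p, q) (ARIMA with d = 0) and forecast `horizon` steps."""
    model = pm.auto_arima(train, d=0, seasonal=False, suppress_warnings=True)
    forecast = np.asarray(model.predict(n_periods=horizon))
    # Demand levels are counts, so round to the nearest non-negative integer.
    return np.clip(np.rint(forecast), 0, None)

# Each station gets its own model; e.g., predicting a 5-day test horizon
# from one station's 361-day training series:
# y_hat = forecast_station(demand["station_42"][:-30], horizon=30)
```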
2.4.2 XGBoost
XGBoost is a tree boosting machine learning algorithm that is able to handle high-dimensional data with selected features (Chen and Guestrin, 2016). Within an XGBoost model, the output forecasted demand $\hat{x}_i(t)$ at station $i$ at time step $t$ is predicted by using $K$ additive functions in a tree-ensemble model:
$$\hat{x}_i(t) = \sum_{k=1}^{K} f_k(\mathbf{z}_i(t)), \quad f_k \in \mathcal{F}, \qquad (2.5)$$
where $\mathcal{F}$ is the space of regression trees; each $f_k$ represents an independent tree structure within the additive functions; and $\mathbf{z}_i(t)$ corresponds to the set of input variables for station $i$ at time step $t$. To be more specific, the input variables featuring each station include both the time-invariant variables described in Tab.1 and weather variables (average temperature, average humidity, average wind speed, average precipitation, and average pressure).
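A minimal sketch of the tree-ensemble prediction with the XGBoost library is shown below; the hyperparameters and the construction of the feature matrix are assumptions, not the authors' settings:

```python
import numpy as np
import xgboost as xgb

def fit_xgboost(X_train: np.ndarray, y_train: np.ndarray) -> xgb.XGBRegressor:
    model = xgb.XGBRegressor(
        n_estimators=300,        # K additive regression trees
        max_depth=6,
        learning_rate=0.1,
        objective="reg:squarederror",
    )
    model.fit(X_train, y_train)
    return model

# Predictions are rounded to integer demand levels, as in the text:
# y_hat = np.clip(np.rint(fit_xgboost(X_train, y_train).predict(X_test)), 0, None)
```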
2.4.3 Evaluation metrics for prediction algorithms
The models introduced in Sections 2.4.1 and 2.4.2 were trained on the training dataset, and their prediction performances were then evaluated on the testing dataset. We selected Root Mean Square Error (RMSE) and Cumulative Scores (CS) as the performance metrics, which represent the absolute and relative errors, respectively. Specifically, RMSE is defined as:
$$\mathrm{RMSE}_i = \sqrt{\frac{1}{D} \sum_{t=1}^{D} \left( \hat{x}_i(t) - x_i(t) \right)^2}, \qquad (2.6)$$
where $\hat{x}_i(t)$ and $x_i(t)$ are the predicted and observed demands at station $i$ at time step $t$, and $D$ is the total number of tested time steps. The value of RMSE ranges from 0 to $+\infty$. A lower RMSE indicates a better fitted regression model, while a higher RMSE indicates a larger prediction error.
Because RMSE could be highly influenced by the stations’ capacity (larger stations tend to have higher RMSE), we further employed CS adopted from Kocer (
2013) and Niu et al. (
2016) as a measurement of relative prediction accuracy. The value of CS ranges from 0 to 1 and is defined as:
$$\mathrm{CS}_i = \frac{D_{\eta}}{D}, \qquad (2.7)$$
where $D$ is the total number of tested time steps for each station, and $D_{\eta}$ refers to the number of predicted demand values that do not exceed the lower and upper percentage bounds of the ground truth demand. In this study, we set the percentage bound $\eta$ as 10%. A CS value closer to 1 signifies higher prediction accuracy.
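The two metrics can be computed directly; the sketch below follows Eqs. (2.6) and (2.7), with the handling of zero ground-truth demand in CS being an assumption, since the text does not specify it:

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def cumulative_score(y_true: np.ndarray, y_pred: np.ndarray, eta: float = 0.10) -> float:
    """Share of predictions falling within +/- eta of the ground-truth demand.
    When the ground truth is zero, only an exact-zero prediction counts as a hit."""
    lower = (1 - eta) * y_true
    upper = (1 + eta) * y_true
    hits = (y_pred >= lower) & (y_pred <= upper)
    return float(hits.mean())
```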
3 Results and discussions
3.1 Overview of station-level demands, entropy, and predictability
Fig.1(a) shows the distribution of demand levels
observed in each 4-hour time interval at individual stations during the study period of 361 days. Aligning with previous studies, e.g., Kou and Cai (
2019), the distribution of demand is heavy tailed. This indicates that within each 4-hour window, the majority of stations have low check-out demand while the number of stations with high-volume demands is much smaller.
However, low demands do not necessarily mean low entropy or high predictability. We calculated the entropy and predictability across all stations based on their demand sequences in the study period. The resulting distributions for $S^{\mathrm{rand}}$, $S^{\mathrm{unc}}$, and $S^{\mathrm{actual}}$, as well as $\Pi^{\mathrm{rand}}$, $\Pi^{\mathrm{unc}}$, and $\Pi^{\mathrm{actual}}$, are illustrated in Fig.1(b) and 1(c), respectively. $S^{\mathrm{unc}}$ encodes the additional frequency information of demands compared with $S^{\mathrm{rand}}$, which only considers the number of unique demand levels $N$. Such additional information explains part of the randomness rooted in the time series of demands; thus $S^{\mathrm{unc}}$ exhibits a left shift (i.e., lower entropy in general) compared with $S^{\mathrm{rand}}$. Likewise, $S^{\mathrm{actual}}$ encodes the additional temporal order information on top of $S^{\mathrm{unc}}$, so $S^{\mathrm{actual}}$ is more left shifted than $S^{\mathrm{unc}}$. At the peak of the $S^{\mathrm{rand}}$ distribution, one can expect that, if users come to rent bikes at stations randomly, $2^{S^{\mathrm{rand}}}$ different demand levels can be observed at a station on average. In contrast, the actual entropy reveals the real uncertainty considering the sequence of demands that could be observed at a station. Surprisingly, the actual entropy has two major peaks, at 0.1 and 2.5, respectively. Stations with $S^{\mathrm{actual}} \approx 0.1$ have almost no demand uncertainty: The next demand that can be forecasted at those stations is almost always at the same level, suggesting that these stations are likely to have no check-out demand but could randomly witness one check-out. In contrast, for stations with $S^{\mathrm{actual}} \approx 2.5$, their future demands are likely to be at $2^{2.5} \approx 5.7$ (i.e., less than six) random levels. A consistent conclusion is observed in terms of predictability. We find that $\Pi^{\mathrm{rand}} < \Pi^{\mathrm{unc}} < \Pi^{\mathrm{actual}}$ for demands at individual stations on average as more information is incorporated. With only the information on the number of unique demand levels, the majority of stations exhibit very low predictability. As shown in Fig.1(c), the distribution of the actual predictability $\Pi^{\mathrm{actual}}$ has two peaks. One group of stations has nearly perfect predictability for demands, which aligns with our observation that these stations have almost no check-out demands. The other group of stations has predictability around 0.65. In other words, no matter how good the predictive algorithm is, the future demand levels of stations with $\Pi^{\mathrm{actual}} \approx 0.65$ can be predicted with a maximum accuracy of 65%. Therefore, $\Pi^{\mathrm{actual}}$ serves as the intrinsic limit on the predictability of an individual station. Notably, the two-peaked distributions of $S^{\mathrm{actual}}$ and $\Pi^{\mathrm{actual}}$ indicate a polarized demand predictability rooted in stations. Although the overall station demands are very low across stations, a significant number of stations exhibit high entropy and reduced predictability. This implies that system operators should pay particular attention to these stations.
We depicted Fig.1(d) and 1(e) to further reconcile the apparent contradiction that, overall, stations have low observed demands yet the distributions of $S^{\mathrm{actual}}$ and $\Pi^{\mathrm{actual}}$ have two peaks. The group of stations with low predictability (high entropy) spans a broad spectrum of unique historically observed demand levels, from 0 to approximately 100; in particular, the peak corresponds to an average of 30 unique demand levels. By contrast, the stations associated with high predictability (low entropy) span a narrower range of unique levels, from 0 to around 25, with an average of 1. Therefore, while both groups of stations could experience low levels of demand, the stations within the high entropy group can be anticipated to present a broader spectrum of demand variation. The increased variation of the demand spectrum at individual stations introduces more randomness into the time series of observed demands, which inherently leads to higher entropy and lower predictability. In addition, the contrast between the polarized stations is also illustrated in Fig.2, which displays the overall spatial distribution of stations with varying $S^{\mathrm{actual}}$ and $\Pi^{\mathrm{actual}}$. Stations with high $S^{\mathrm{actual}}$ but low $\Pi^{\mathrm{actual}}$ (smaller circles in dark red) are mainly located within the downtown area, whereas stations with low $S^{\mathrm{actual}}$ but high $\Pi^{\mathrm{actual}}$ (larger circles in dark blue) are mostly distributed in the suburban regions.
3.2 Relationship between entropy/predictability and prediction performance
In this section, we confirm that the entropy and predictability, which measure the randomness rooted in a station's demands, are feasible, model-free estimators of the empirical prediction errors and accuracy of demand prediction models, respectively.
In Fig.3(a)–3(f), the RMSE of each data point represents the average RMSE over the testing horizon of a single station derived from a single predictive algorithm. The results reveal an exponential association between the entropy over one year's historical demands at individual stations and the resulting RMSE values from future demand prediction; this relationship is quantified by the $R^2$ values obtained for ARMA and XGBoost when correlating $S$ and RMSE. Likewise, Fig.3(g)–3(l) demonstrate a logistic relationship between the predictability of demands and the CS values, where $R^2$ ranges from 0.81 to 0.87 for ARMA between CS and $\Pi$, while $R^2$ ranges from 0.59 to 0.74 for XGBoost between CS and $\Pi$. Note that we paired RMSE with entropy and CS with predictability because each pair of values shares the same unit of measurement: RMSE and entropy are expressed in terms of the absolute magnitude of errors, while CS and predictability are expressed on a percentage scale of accuracy.
Fig.3(i) and 3(l) validate predictability as a theoretical limit for prediction accuracy, which is consistent with the upper limit identified in Lu et al. (
2013). Entropy and predictability capture the theoretical limits for the predictive analysis of BSS demand prediction, offering an approachable upper bound of predictive power for such BSS demand data. For instance, for the group of stations with lower average $\Pi^{\mathrm{actual}}$, their CS values stay below 0.5 in both the ARMA and XGBoost models. Even though some stations are expected to reach their maximum $\Pi^{\mathrm{actual}}$, they still exhibit practical CS values under 0.5, verifying that $\Pi^{\mathrm{actual}}$ bounds the empirical prediction accuracy. This is in line with the data point distribution shown in Fig.3(i) and 3(l), where the corresponding practical CS from the predictive models is always smaller than the theoretical predictability. That being said, by applying the same demand prediction model across all stations and regressing the models on certain variables to forecast station-level demands, it can be anticipated that the prediction errors across different stations are bounded by their inherent predictability. Furthermore, it is worth noting that entropy and predictability are model-free estimators, as they are derived exclusively from historical demands at individual stations. Their calculations are independent of any predictive model and serve as a priori measurements in lieu of extensive domain-specific feature engineering for the input variables of the prediction models.
3.3 The impact of features on entropy and predictability
After verifying that entropy and predictability are feasible, model-free representations of practical prediction performance, we then identified which temporally-invariant factors are the most significant contributors to the entropy and predictability of station demands. In this work, we ran the random forest regression model whose inputs are the temporally-constant variables listed in Tab.1 and whose outputs are either the set of station-level entropy $\{S_i\}$ or the set of predictability $\{\Pi_i\}$. Based on five-fold cross validation, our results show that the model achieved an RMSE of 0.66 for entropy and 0.83 for predictability. This highlights that a statistically significant relationship holds between the temporally-constant factors and the entropy/predictability of a station's demands.
In Tab.2, we present the three most significant temporally-constant attributes affecting entropy and predictability, respectively (see the full list of ranked feature importance in Supplementary Information Tables A1 and A2). Our findings indicate that per capita income, spatial eccentricity, and the number of parking lots hold significant importance for both entropy and predictability. Notably, each of these top three variables comes from a different variable category: Per capita income is a socio-demographic variable; spatial eccentricity is a spatial network variable; and the number of parking lots is a land use variable.
In Fig.4, we showcase the specific quantities of the three temporally-constant variables identified above for the ten stations with the highest and the ten with the lowest actual predictability $\Pi^{\mathrm{actual}}$, respectively, in order to examine the positive/negative impact of these variables on entropy and predictability. One can observe that per capita income negatively impacts predictability, since stations with lower predictability exhibit higher per capita income. Similarly, the number of parking lots negatively influences predictability. On the other hand, spatial eccentricity positively influences predictability, with more eccentric stations exhibiting higher predictability. Furthermore, in Fig.5, we depict the spatial distribution of stations in the entire system and their associated three temporally-constant variables, showing the spatial interplay between entropy and predictability and the three variables across the service region. For per capita income (Fig.5(a)), we observed that the stations exhibiting higher predictability and lower entropy in Fig.2 are typically located in suburban areas with lower per capita income. Conversely, stations with lower predictability are often found in the downtown area with higher per capita income. In terms of spatial eccentricity, one can observe that in suburban areas, stations with more predictable demands are sparsely located, far from their neighbouring stations. In contrast, stations with less predictable demands tend to be concentrated in the urban center, where stations are denser. Additionally, for parking lots, stations exhibiting lower predictability have more parking lots surrounding them in the downtown area compared with their more predictable, lower entropy counterparts.
The significant city- or system-specific, temporally-constant variables impacting entropy and predictability can be interpreted in several ways. From the significance of per capita income, we can infer that in census tracts with higher wealth, more stations could serve leisure-based or recreational trips (
Stromberg, 2015), which would induce more entropy compared with stations that serve regular commute- or errand-based trips. On the other hand, high entropy in high-income areas could be a symptom of access to various transportation options in high per-capita-income places (Smith et al., 2020), inducing a greater variety of trips and options in those places. The significance of spatial eccentricity implies that stations located farther from other stations are more predictable; that is, the demands of suburban stations are generally more predictable. This could be the result of suburban places having lower accessibility to diverse trip purposes, or it may be representative of low overall use at suburban stations. The quantity of parking spaces may signal an area's capacity to accommodate various trip purposes. Fig.5(c) reveals that parking lots mainly concentrate in the downtown region. Considering the functionality of BSS for first/last mile trips, customers are able to park their private cars in the parking spaces (e.g., public parking lots and street parking) and continue their subsequent trips using shared bikes in the downtown area. Such a capacity for hosting various trips increases the variation in the number of trips that can be anticipated in the downtown area, which leads to high entropy and in turn reduces the predictability of demands.
3.4 Sensitivity analysis
3.4.1 Entropy and predictability under different observation intervals
The entropy and predictability are distributed differently under different demand observation intervals. Fig.6 suggests the distinct effects of demand observation frequencies: Increasing the frequency of demand observations enhances the predictability and decreases the entropy of those stations initially characterized by low predictability. However, this increase in observation frequency appears to have no impact on the stations that already exhibit high levels of predictability. More specifically, in Fig.6(a) and 6(d), as the frequency of observation lowers, stations tend to have higher $S^{\mathrm{rand}}$ and lower $\Pi^{\mathrm{rand}}$, respectively. This is because demands accumulate to higher levels when monitored less frequently. For instance, a demand level of one unit in each of two consecutive hours would aggregate to a demand level of two units within a two-hour observation window. This is also in line with the distribution of demand levels displayed in Fig.6(g), where the peak slightly shifts towards zero and the variance of the distribution decreases when implementing more frequent monitoring. Moreover, in Fig.6(b), 6(c), 6(e) and 6(f), one can observe that the peaks of the high predictability stations (i.e., low entropy stations) stay aligned regardless of variations in the monitoring frequency. This is because these stations are typically associated with very low demand levels (typically zero or one), resulting in unvarying time series of demands even though the monitoring changes drastically from hourly to daily. In contrast, it is also noticeable that the peaks of the low predictability stations shift towards higher predictability when shortening the monitoring interval from 24 hours to 1 hour.
This sensitivity analysis also delivers implications for potential operational strategies to mitigate the low predictability associated with certain stations. Operators could monitor demands heterogeneously across stations situated in various locations: The downtown area may necessitate more frequent monitoring (such as every half an hour or every hour); conversely, suburban areas may require less frequent observations (such as every 12 hours or every day). By shortening the time interval between demand observations, the time series of demands tends to be less random, so operators can anticipate more accurate predictions with fewer errors in their predictive models. This strategy also suggests a practical way to balance operational resources across different service regions while maintaining service reliability and predictability.
3.4.2 Relationship between theoretical upper limits and practical predictive performances under different observation intervals
Our sensitivity analysis solidifies the relationship between entropy/predictability (theoretical upper limits) and RMSE/CS (practical predictive performances) identified in Section 3.2. Fig.7 presents the fitted curves under different observation intervals for both predictive algorithms with both performance metrics, and Tab.3 lists the associated $R^2$ score of each fitted curve in Fig.7. As in Fig.3, RMSE and entropy follow an exponential relationship, while the relationship between CS and predictability is logistic. When the demands are monitored more frequently (i.e., the time interval is smaller), the $R^2$ of the corresponding fitted curve becomes higher. This is because, for a given fixed period (i.e., 366 days in this work), observing at a higher frequency results in more data points for both training and testing, so the models are better trained.
4 Conclusions
In summary, this study measured the inherent randomness of station demands via entropy and predictability based on information theory. We validated these metrics against empirical performance metrics to show that entropy and predictability can function as model-free estimators to anticipate the error and accuracy of prediction models. Furthermore, we identified the three most significant temporally-constant factors (per capita income, spatial eccentricity, and the number of parking lots) that influence station demand entropy and predictability. The results obtained from this study can serve as a priori implications for BSS management, operation, expansion, and launch, without the need for domain-specific feature engineering or building demand prediction algorithms.
For cities with active BSSs, our findings imply that there are inherent errors and randomness that existing predictive models might fail to capture, so operators could apply customized monitoring strategies in different service regions to improve demand predictability and service reliability. For instance, for stations with high entropy and low predictability, monitoring and maintenance (e.g., rebalancing) can be employed more frequently so as to improve operational functionality. On the other hand, for cities considering an expansion of or investment in a BSS, the identified temporally-invariant key factors associated with high entropy stations can be qualitatively taken into account when deciding where to add new stations. Moreover, measuring the entropy and predictability at the station level allows for a finer resolution of operation among stations and offers the possibility to tailor management to each station. Thus, these results have direct implications for operators making decisions about station monitoring, rebalancing, capacity design, siting, or decommissioning.
In a broader context, for cities without BSSs, municipal officials and potential system operators can utilize entropy and predictability to link temporally-constant city and system characteristics to potential demand prediction efficacy. These a priori measurements can help inform the future planning and design of BSSs as well as corresponding infrastructure, such as parking lots and bike lanes, in a city. Our approach is also adaptable: when significant changes occur in the temporally-constant city characteristics, operators can re-apply it to update their planning decisions for the next stage.
Suggested next stages of this research are 1) to perform a large-scale analysis to generalize the entropy and predictability findings to other cities, and 2) to compare entropy and predictability with other evaluation metrics (which could emphasize how often predictions err rather than by how much) using more predictive algorithms. In the first case, BSS demand prediction performance could be heavily influenced by the number of high-entropy stations in a specific system, so case study selection and the generalization of results must consider this aspect of BSSs. A large-scale analysis could help determine the generalizability of these methods and provide more insight into the dynamics of individual BSSs throughout the world. In the second case, it could be just as valuable to know how often demand prediction is wrong as by how much it is wrong, especially for dynamic rebalancing of bikes. Therefore, an evaluation metric that captures the temporal prediction error of demand could be useful to relate to the different forms of entropy and predictability calculated herein. Furthermore, the exponential and logistic associations found in Section 3.2 were slightly different from the linear association identified in the literature on entropy. Investigating the reason for this is a direction for future research.
One challenge in BSS research regarding demand prediction has been the absence of trip purpose data. Without knowing the purpose of people’s trips at bike sharing stations, it has been more difficult to predict the demand. High entropy at stations could suggest a diversity and heterogeneity of trip purposes, whereas stations with low entropy could mean consistent travellers using BSS for similar trip purposes. Future research could explore how entropy is related to trip purpose for predicting demand, when trip purpose data become available. Furthermore, future research can explore in more detail the operational costs of high- versus low-entropy stations. If a relationship between entropy measurement and maintenance costs of a station exists, our method could be employed to anticipate the operational costs of adding new stations or, in the case of cities without active BSS yet, investing in a BSS.