At a time of the very visible effects of the climate impact on our urban lives, some cities have become unbreathable, and greenhouse gas emissions are produced by buildings heating and cooling networks and all-round petrol transport. At a time when transport has become the first emitter of CO2, we need to imagine, propose, other ways of occupying urban space. This calls for a better understanding of the spatial distributions of facilities and population (1–7). The information age and the online mapping revolution allow us to globally study the interactions of humans with their built and natural environment (8–13). Pioneering work in multicity studies has uncovered scaling laws relating population to distribution of facilities and socioeconomic activities at macroscopic scale (3, 6, 14–16). It has been asserted, for example, that more populated cities are more efficient in their per capita consumption (3, 4), and their occupation diversity can be modeled as social networks embedded in space (10). However, a systematic understanding of the interplay of the urban form, their facilities distribution, and their accessibility at multiple scales remains an elusive task.
At the country scale, when maximizing for the accessibility of population to a fixed number of facilities, Gastner and Newman (17) demonstrated a simple two-thirds power law between the optimal density of facilities d and their population density ρ. The power law was fitted by allocating 5000 facilities in the continental United States using population data within more than 8 million census blocks. In this case, each facility covers an area about the size of a county (∼1000 km2). In a follow up study, Um et al. (18) proposed distinct optimization goals to differentiate public services, such as fire stations and public schools, from commercial facilities, such as banks and restaurants. Public service facilities aim to minimize the overall distance between people and the facilities and follow d ∝ ρ2/3. However, in the case of profit-driven facilities, which have the goal of maximizing the number of potential customers, the power law has an exponent close to 1, that is, d ∝ ρ. The authors found alignment in the analytical optimization and empirical distributions in the United States and South Korea, confirming the 2/3 exponent for public services and the 1 exponent for profit-driven facilities. The simple power law at city scale reveals the equilibrium of empirical allocation of resources across cities with different population. However, distributing facilities at fine scale within cities, where the coverage area per facility is of few blocks (∼10 km2), results in more heterogeneous settlements of population with different socioeconomic characteristics. Studies of accessibility within cities merit attention for science-informed land use planning and the redistribution of public services after disasters and evacuations (19–23). Forward-looking approaches for planning facilities in cities would also consider individuals’ preferences to facilities via mining mobility patterns. Zhou et al. (24) introduced a location-based social network dataset to derive the demand for different types of cultural resources and identified the urban regions with lack of venues. While efforts have been devoted to address the optimal allocation problem in specific cities (24–27), systematic understanding of the optimal distribution of facilities is still lacking from the urban science perspective.
To contribute in this direction, we propose a multicity study that measures the accessibility of city blocks to different types of facilities through their road networks and investigate the role of population distributions. While at large scale, travel cost can be substituted by the Euclidean distance from residents to the facilities, road networks and geographic constraints play important roles for human mobility within cities (28–31). It has been well established that road network properties affect the daily journeys of residents (32–34), their urban form (35, 36), and their accessibility (29, 37). As a complement to most studies devoted to travel costs of commuters, we analyze in this work the road network distance of individuals to the nearest amenity of various types, dividing the space in high-resolution blocks of constant area of 1 km2. For each city and facility type, we optimally redistribute the existing facilities and compare the result with their empirical distribution. We observe that, in the redistribution, some blocks increase their accessibility and others decrease it. This implies that, to make the best use of the existing facilities for a more equitable accessibility, some blocks would benefit, whereas others would have facilities removed. At the city level, the gap between the empirical facility distribution and the optimal planning offers the opportunity to assess the planning quality of facilities in diverse cities. We also revisit the power law between facility and population densities and observe that the two-thirds power law is not followed by the empirical cases, and it is observed in the optimal scenario only when the number of facilities is small compared to the total number of blocks in the city.
We further investigate optimal distributions of facilities by modeling its average travel distance in different cities as a function of the number of facilities to assign. A model of this quantity is derived on both synthetic and real-world cities and fits different cities well with only two free parameters. Furthermore, this gives us a universal function between the average travel distance and the number of facilities controlled by the urban form derived from the population distribution. As an application case, we estimate the number of facilities required to achieve a given accessibility via the proposed function in 12 real-world cities.
Empirical distribution of facilities
We select three cities [Boston, Los Angeles (LA), and New York City (NYC)] in the United States and three cities (Doha, Dubai, and Riyadh) in the Gulf Cooperation Countries (GCC) to study the empirical distribution of facilities. For each city, we collect the population in blocks with a spatial resolution of 30 arc sec (1 km2 near the equator) from LandScan (38), road networks with the OpenStreetMap (39), and facilities from the Foursquare (40) service application. These novel, rich, and publicly available datasets have proven value in transportation planning (34, 41, 42), land use studies (43, 44), and human activity modeling (45–47). The boundary of each city is drawn along with the Metroplex, encompassing both urban and rural regions. Figure 1 depicts the road network, population density, and 10 selected types of facilities (e.g., hospitals and schools) in NYC and Doha. The statistical information of six cities is summarized in Table 1. For clarity, all variables and notations introduced in this work are summarized in note S1. The distribution of all available facilities in Foursquare data for the six cities is presented in fig. S1. Details of the data sets are described in Materials and Methods and note S2. Figure S2 presents the distribution of population and different facility categories as a function of the distance from the central business district, indicating the diversity of the selected cities. Specifically, it can be observed that Doha and Dubai have more facilities that are located in the highly populated areas, whereas Boston has the majority of facilities located near the city center where fewer people reside. Discrepancy in the distributions between population and facilities can also be observed in LA, NYC, and Riyadh. In these three cities, the population density peaks near the city center, but the facilities are distributed more uniformly across the city.
It is noteworthy that, for calculating facility density and total number of facilities, we first merge the same type of facilities (e.g., hospitals) located in the same block as one facility. Thus, the number of facilities thereafter refers to the number of blocks accommodating a given type of facility, denoted by N. We define Nmax as the total number of blocks in one city. Furthermore, as nearly unpopulated blocks do not weigh in the calculations of accessibility, we define Nocc as the number of occupied blocks given by the blocks with population over a threshold. We set the threshold as 500 in real-world cities, which is commonly used to distinguish between urban and rural regions. The ratio between the number of blocks occupied by facilities N and populated blocks Nocc is denoted by Docc. Table 1 reports Docc of the 10 selected types of facilities in the six cities of study. As an example, Docc of hospital in Boston equals to 0.11, indicating that about 11% of populated blocks are occupied by hospitals.
To quantify the accessibility of the population to facilities, previous work used the Voronoi cell around each facility as a proxy of the tendency of individuals to select the closest facility in a Euclidean distance (17, 18). However, within cities, the distance that people travel in the road networks is constrained by the infrastructure and the landscape. In this context, the routing distance is a better proxy of the accessibility from the place of residence to each amenity. Figure S2C compares the distributions of routing distance of the actual and optimal locations of facilities versus the Euclidean distances, respectively. Our findings confirm that the optimal strategy based on the Euclidean distance achieves similar costs to the actual distribution of facilities, which is much less effective than the strategy that optimizes for routing distance.
Optimal distribution of facilities to maximize overall accessibility
Accessibility indicates the level of service of facilities to the residents. In network science, accessibility is defined as the ease of reaching points of interest within a given cost budget (48–50). How to allocate the facilities to maximize the overall accessibility in cities is one of the most essential concerns of facility planning. From this point of view, we redistribute the facilities by minimizing the total routing distance of population to their nearest facilities. In the following, we refer to this redistribution as the optimal scenario. Likewise, the empirical distribution of facilities is referred to as the actual scenario. Specifically, among the Nmax blocks of one city, we denote as facility-tagged the N blocks that are occupied by a given type of facility in the actual scenario and redistribute the same number of facilities in the optimal scenario. The shortest distance between any pair of two blocks is calculated using the Dijkstra’s algorithm in the road network. The idea is to find a new set of N blocks and label them as facility-tagged such as it minimizes the total population-weighted travel distance from all Nmax blocks to the newly selected N blocks. This optimal allocation problem in networks is known as the p-median problem and, here, it is solved with an efficient algorithm proposed by Resende and Werneck (51) (Materials and Methods).
The difference of the travel distance between the actual and the optimal scenarios assesses the quality of the distribution and therefore, of the accessibility in different cities. In each scenario, each residential block is associated with the facility that can be reached in the shortest routing distance. The block is linked to itself if it is occupied by a facility. It is important to note that we do not consider in the present study the capacity of facilities as a constraint, i.e., the number of people using the same facility is not limited. We group the set of blocks served by the same facility and define them as a service community. Considering hospitals as an example, we present in Fig. 2 (A and B) the service communities in Boston in the actual and optimal scenarios, respectively. The color of each cluster depicts the total population
in the service community of the jth facility. The communities in optimal scenario are more uniform in both size and population compared to those in the actual scenario. Particularly in the actual scenario, the communities have small area in downtown Boston but large in the rural area, revealing the uneven distribution of hospitals.
To quantify the disparities between blocks in the level of service for a given type of facility, we compare the actual and optimal travel distances to facilities. We define a gain index of the ith block as
and li are the shortest travel distances from the ith block to its nearest facility in the actual and optimal scenarios, respectively. An ri >1 identifies that the block is better served by the facility in the actual than the optimal scenario. Residents living in these blocks benefit more from the distribution of facilities than they would in the scenario of social optimum. In Fig. 2C, we illustrate in Boston the ri of each block to hospitals in a logarithmic scale. The blocks in green, near to hospitals, are located in the central, southern, and northeastern areas, while the blocks in red have lower accessibility to hospitals when compared with the optimal scenario and are located in the northern, southwestern, and southeastern areas. This has some resemblance with the spatial distribution of wealth in Boston metropolitan area (52). The actual travel distance
and the gain index ri in the ith block to hospitals for six cities are presented in fig. S3.
Although the inequality of the distribution of facilities can be visually observed from Fig. 2C, for comparing the inequality across facility types and between cities, we compute the Gini coefficient of ri of all blocks per facility type per city, as illustrated in fig. S4A. We observe that the Gini coefficients of all selected facility types in Boston are similar and around 0.5. NYC has the most discrepancies in the Gini coefficients over the 10 facility types, where the distributions of schools, parks, pharmacies, banks, and bars are more equitable than others due to their high densities (see Table 1). In the GCC cities, fire stations are the most equitably distributed facilities, while bars, hospitals, parks, and pharmacies are distributed less equitably than others. The Lorenz curves and the values of the Gini coefficients per facility type are presented in fig. S4B. The three cities in the United States are generally planned more equally than the GCC cities.
Thereafter, we compare the difference in accessibility across cities to various facility types. Figure 3A presents the average travel distances in the actual scenario (
) and optimal scenario (L) to the 10 selected types of amenities. The first row displays the facilities with higher densities in the United States cities: banks, pharmacies, schools, parks, and bars. Next come hospitals and supermarkets, followed by concert halls, soccer fields, and fire stations, which have the lowest densities. As expected, the lower the density, the longer the travel distance there is to them. Note that the accessibilities to parks, fire stations, and bars have the largest differences between the United States and GCC cities mainly due to lower availability in the latter. To compare the travel distance in different cities in the same order, we exhibit the scatterplots of
and L versus Docc, the ratio between N and Nocc, in Fig. 3 (B and C). The discrepancy of actual travel distance
among the six cities is mainly caused by the difference in facility planning strategy and urban form. As expected, the optimal travel distance L displays a more uniform tendency than
, revealing the potential of modeling L with the number of facilities N.
An interesting measure is the improvement of overall accessibility if the locations of facilities are optimally redistributed at a city scale. To that end, we define the optimality index R for a given type of facility at city level as the ratio between the average travel distance to the nearest facilities in the optimal and actual scenarios
(2)where pi is the population in the ith block. R ranges from 0 to 1, with 1 indicating that the facilities are optimally distributed in reality. In note S3 and fig. S4C, we discuss the change of R with N/Nmax by introducing two extreme planning strategies, random and population-weighted assignments, described in note S3. We observe that R score of actual planning is mostly between the two extreme strategies, except Riyadh, in which R is even lower than random assignment. This suggests the imbalance between facility locations and service delivery in Riyadh (53). Besides, we observe that the R score of actual planning is the highest when N/Nmax is the smallest for cities, except LA and NYC. For the two extreme strategies, we observe R is u-shaped as a function of N, except for LA. This suggests a higher R for both small and large N values. This is because, for a small N, simply allocating the facilities in the most crowded blocks would shorten the total travel cost to a great extent, while for a large N, most blocks are occupied by facilities. The R score of LA keeps flat compared to other cities mainly due to the polycentric distribution of population, indicating that a small number of facilities cannot efficiently serve most of the population.
Figure 3D depicts the box plot of R of the 10 types of facilities in the six cities. R is generally lower with larger facility density, suggesting that the gaps between actual and optimal distribution are larger. For example, hospitals and fire stations have much lower density than bars, but their R scores are larger. Public services need to be uniformly distributed, while commercial ones do not. The more available facilities are banks, pharmacies, schools, parks, and bars, with an R between 0.4 and 0.5 on average, revealing that the average travel distance could be reduced by 50% if all facilities are planned in the optimal locations.
Revisiting scaling law between facility and population densities
Previous work has related the facility density to population density as a power function both in the actual and optimal scenarios at the national scale (18). Here, through introducing the road networks, we dissect these power laws in the two scenarios in diverse cities. We calculate both facility and population densities in the service communities, as shown in Fig. 2 (A and B). Specifically,
is approximated by the product of the number of blocks
and the average block area in the city, that is,
. Taking hospitals as an example, their densities versus the population densities of the service communities in the actual scenario over the six cities are illustrated in Fig. 4A. The full lines represent the fitted power law functions with least squares method and with communities with more than 500 residents. Cities have different exponents and the r2 scores of the fitting are less than 0.5 in most cases. These results show that, although the two-thirds power law was found for public facilities at county resolution (18), we do not find a uniform law between facility and population densities at finer resolutions, i.e., intra-city community level.
Once facilities are optimally redistributed in the city, the service communities are reorganized accordingly. The fitted power laws between the distribution of hospitals and population in optimal scenario of the six cities are shown in Fig. 4B. The fitted exponents are closer to 2/3 and have a larger r2, and the 95% confidence intervals are narrower than those in Fig. 4A, depicting the actual scenario. The exponents for the 10 selected types of facilities in the actual and optimal scenarios are reported in table S1. As expected, cities have different exponents for both actual and optimal scenarios. In all cases, we observe that the optimal exponents deviate from the analytical 2/3 previously reported when the facilities are optimally distributed by a Euclidean distance at national case (17). Sources of difference are both the constraints introduced by the road networks and the higher density of facilities to be distributed.
For a comprehensive understanding of the existence of the power laws, we optimally allocate varying number of facilities N in our six cities of study and in synthetic cities. In Fig. 4 (C and D), we relate the β to Docc, the ratio of N to Nocc, and observe 2/3 when Docc < 0.2(0.1) for the real-world (synthetic) cities. We simulate controlled scenarios via four synthetic or toy cities of size 100 ×100, with population distributions depicted in Fig. 5A. Note that the population threshold is set as 50 in toy cities to count Nocc, and the total population is fixed as half million, which is about 1/10 of the studied cities. We find that the curves of diverse cities collapse into a single one, indicating that the difference in the change of β across cities is mainly caused by different Nocc. In the toy cities, we notice that the change of β is not monotonous. It stays around 2/3 when Docc is below 0.1. Subsequently, β decreases with Docc as more facilities are assigned to the low-density regions and then increases as facilities start to refill the high-density regions. After all high-density blocks are assigned with facilities, β starts to drop to zero, implying that all blocks are filled with facilities. The same fluctuation of β is not observed in real-world cities because the large and low-density regions are not segregated like in the synthetic cities. In summary, in the optimal scenario, the two-thirds power law can be found for a limited number of facilities but tends to disappear for larger values of N.
Modeling accessibility to optimally distributed facilities
In Fig. 3B, we see that Docc is the most determinant factor to decrease the average distance
to a facility independent of its type and city. In Fig. 3C, we observe that these decreasing functions L(Docc) collapse for the optimal distributions in each city. Following up on this observation, we explore further the relation between travel distance in optimal scenario L and the number of facilities N for diverse cities with various geographic constraints and population distributions. To this end, we designed 17 toy cities with different levels of urban centrality; 4 of these are illustrated in Fig. 5A. Population distributions of the toy cities are generated by a two-dimensional Gaussian function (e.g., cities a and b) or a mixture of several two-dimensional Gaussian functions (e.g., cities c and d). The toy cities have the same population of half million and are equal sized, consisting of 100 × 100 blocks. The size of each block is set to 1 km2, and the travel cost between two blocks is calculated with the Euclidean distance between their centroids. We measure the centrality of a city by computing the urban centrality index (UCI), proposed by Pereira et al. (54), of the population distribution (Materials and Methods). UCI ranges from 0 to 1, with 0 indicating the totally polycentric—with the population of the city uniformly distributed—and 1 indicating totally monocentric—with all the population residing in one block. In addition, we include 12 real-world cities for further exploration, the six aforementioned to which we add: Paris, Barcelona, London, Dublin, Mexico City, and Melbourne. Population distributions of four selected cities are illustrated in Fig. 5B. Paris is the most monocentric, with a UCI of 0.50 and most residents residing in the urban region, while Melbourne is the most polycentric, with a UCI of 0.08 and residents dispersed over the city.
For an estimate of the optimal travel distance L in each city, we first assume that the in-block travel distance is constant lmin = 0.5 km, and the average travel distance within a service community approximates to
denotes the geometric factor in the community;
denotes the area of the occupied blocks (17). Then, L is expressed as the sum of two terms, the first for the population in the N blocks with facilities and the second for the population in the Nmax–N blocks without facilities
(3)where P is the total population in the city, and
denotes the population in the service community of the jth facility after removing the block where the jth facility is located, that is,
. We find that
follows power law relation to the total area in community
in most cities, that is,
(fig. S5A). We assume that
is constant in each city, written as gcity, and
denoting the average block area in the city. Then, we can rewrite Eq. 3 as
(4)where p(N) denotes the share of population in blocks with facilities; A and λ are both constant. More details of this derivation can be found in note S4.1.
We further study how the share of population in blocks without facilities is related to the number of facilities N, and find that 1 − p(N) ≈ e−αN when N ≪ Nocc (see details in fig. S5B and notes S4.2 and S4.3). Thereby, we could model L as
(5)where the number of facilities N is the main variable that determines L. While α controls the relation between p(N) and N, A and λ are two free parameters to calibrate. The model of L(N) summarizes the fact that to model L, the only two essential ingredients are the number of facilities to allocate N and the distribution of population in space.
Next, we numerically assign the optimal distribution of facilities given varying number of facilities for both toy and real-world cities. We present the average travel distance L versus the number of facilities N in the toy and real-world cities in log-log plots in Fig. 5 (C and D). In Fig. 5C, we see that for the same N, the global travel costs in polycentric cities are larger than in the monocentric ones. To validate the proposed function L(N), we first calibrate α by fitting 1 − p(N) = e−αN per city (fig. S5B), and then calibrate the two free parameters A and λ in Eq. 5 with the simulated L. All parameters are presented in table S2. The fitted L(N) values are shown with lines in Fig. 5 (C and D). The simulated and modeled L values are presented separately for each city in fig. S6, showing good results under various empirical conditions.
For seeking a universal function to approach the simulated L in diverse cities, we use λ in Eq. 5 as a constant, fixing its average empirical value
in the 12 real-world cities. Combining the observation that Nmax is inversely proportional to α and
(note S4.1), we can expect that
. Figure S7C confirms this, showing that
. We can rewrite Eq. 5 as follows
This function with only one free parameter α suggests that we are able to rescale N with α to collapse the curves of L in all cities into one, as shown in Fig. 5F that depicts Eq. 6 as solid line. The same rescaling of N in toy cities is presented in Fig. 5E, where the collapse is not as good as in the real cities due to the divergent values of λ of toy cities in table S2. Next, we go beyond the average distance L and plot the distribution of travel distances when keeping αN fixed (fig. S7, F and G). In all cases, the travel distance follows a gamma distribution. This universality suggests that (i) given a certain αN, all real-world cities can reach comparable accessibility; and (ii) the overall accessibility in the optimal scenario depends not only on the availability of the resources but also on the settlement of population, independently from the road network and total area of the city.
Empirically, the decay of population share in blocks without facilities α depends on the population distribution in space. Taking into account that unpopulated blocks are not ideal when optimizing accessibility, Nocc is a better variable to express α. A good agreement α = 1.833/Nocc (R2 = 0.96) over the 12 real-world cities is shown in Fig. 5G, suggesting that α can be estimated by Nocc. Given that α = 1.833/Nocc and the universal relation of L(αN), we can explain the collapses found in Figs. 3C and 4 (C and D).
As a concrete application of this universal model for optimal distance of facilities, in Eq. 6, we can plan for facilities by, for example, extracting how many facilities are needed for varying levels of accessibility to a given type of service. In this context, the number of facilities N can be estimated with the inverse function of Eq. 6. As the second term in Eq. 6 dominates the L for a limited N, we simply invert the second term to estimate N, given by
, where W( · ) is the ProductLog or the Lambert W function (note S4.5). Figure 5H presents the estimated and simulated N versus L for two limiting cases, LA, in which the approximation agrees well with the simulation, and Barcelona, in which the approximation underestimates N. The results of other real-world cities are depicted in fig. S8, showing in general a good agreement between the analytical approximation via the Lambert W function and the numerical simulations.
As cities differ in their form, economy, and population distribution, the interplay between population and facility distributions is challenging to plan. The accessibility of facilities is constrained by their availability, the road network, and means of transportation. While efforts are devoted to managing daily commuting and transit-oriented developments, the planning of the distribution of different urban facilities deserves attention to a paradigm shift toward walkable cities. We present a framework that uses publicly available data to compare the optimal and the actual accessibility of various facility types at the resolution of urban blocks. This allows us to efficiently pinpoint blocks that are underserved, i.e., those where people have to travel longer distances to reach the facilities they need compared to the social optimum. By relocating the facilities to optimize the global travel distance, we find that the relation between facility and population densities follows the scaling law, d ∝ ρβ only in the limit of few or limited number of facilities, regardless of the differences in road network structures. The observed exponent β is generally around 2/3 if the number of facilities is diluted or less than 10% of the occupied blocks, and it starts to decay for larger number of facilities. This confirms the continuous limit for diluted number of facilities presented at national scale (18). We observe that the empirical conditions within cities do not follow the continuous approximation for the power law with population density because facilities are not equally planned, and the number of facilities is large in comparison with the number of populated blocks.
To gain further insights when the number of facilities is large, we analytically model the average travel distance L in the optimal scenario versus the number of facilities N and three parameters. Parameter α represents the rate of the population share in blocks without facilities, and the other two parameters can be approximated as constant among cities. A universal expression L(αN) is verified with 17 synthetic cities and 12 real-world cities depicting diverse urban forms. Furthermore, the travel distance to optimally distributed facilities follows a gamma distribution for all cities once αN is fixed. This function can be applied to estimate the number of facilities needed to offer services to people within a given accessibility in average. The results estimated with the derived function find a good match to the numerical simulations that require solving the optimal distribution of facilities. When relating α to the urban form, we uncover that centralized cities require less facilities than polycentric cities to achieve the same levels of accessibility. Applications of this framework could be to optimally reallocate resources that provide emergency services, such as the placement of shelters, ambulances, or mobile petrol stations in the event of natural disasters.
The optimal planning of facilities in this work supposes that all residents equally need the resources, and the accessibility is measured from their places of residence. In reality, the socioeconomic segregation in cities results in heterogeneous needs for resources. Cities in different social systems and economic development levels also exhibit different needs for various types of facilities that would need to be taken into account for economic considerations. On the other hand, people’s needs are naturally dynamic and change in time and space owing to their time-varying mobility behavior. All these factors result in complex interactions between the allocation of facilities and settlements of residents and can be considerable avenues for future research. Another important avenue is to consider the limited capacity of facilities in the optimal planning. This became ever more evident when distributing the health care system resources during the outbreak of a pandemic, such as the COVID-19 in 2020.
Acknowledgments: Funding: This work was supported by the QCRI-CSAIL, the Berkeley DeepDrive (BDD), and the University of California Institute of Transportation Studies (UC ITS) research grants. Author contributions: Y.X., L.E.O., S.A., and M.C.G. conceived the research and designed the analyses. Y.X. and S.A. collected the data. Y.X. and L.E.O. performed the analyses. M.C.G. and Y.X. wrote the paper. M.C.G. supervised the research. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.