Rising demand for agricultural land is a major driver of deforestation, contributing to biodiversity loss and climate change (1, 2). A range of actors have long argued that securing the property rights of local populations and improving commons management may help protect forests (3, 4). A recent “Global Call to Action” backed by more than 300 organizations argued that “insecure land rights are a global crisis…undermining our ability to confront climate change” (5). Conservationists have typically concentrated on tenure interventions focused on forests, such as the creation of protected areas. However, the ability of protected areas to preserve forests is limited (6, 7), and there can be important trade-offs between conserving ecosystems in this way and increasing economic activity (8). While many low-income countries have undertaken programs to improve rural land governance, they have often focused on improving the security of agricultural land (9). The impact on forests of such an approach is, however, uncertain (10, 11).
This paper evaluates the impact on forest cover of a large-scale randomized control trial of a rural land rights program undertaken in Benin. The country has one of the highest deforestation rates in the world, having lost approximately a quarter of its forested area between 1990 and 2015 (12). The main cause of this deforestation is expanding agriculture (1, 13). In 2009, the government of Benin rolled out an experimental scale-up of the Plans Fonciers Ruraux (PFR) program with the objective of both increasing agricultural production and protecting natural resources. The program is a bundle of several interventions broadly serving to formalize and support traditional local land governance systems, lying at the frontier of new land policies being developed in Africa (14).
The random assignment of treated villages in this context provides us with a unique opportunity to identify the causal impact of a land formalization program on forest loss. Previous studies of the impact of land rights on forest loss have generally analyzed reforms that created protected areas or demarcated large areas of forest for community management (15–20). A relatively small number of studies have looked at the impact of agricultural land rights security on deforestation, but a recent meta-analysis found that these typically did not construct credible counterfactuals (21).
Land registration program in Benin
The land registration program in Benin, known as the PFR program, demarcated landholdings within villages, documented usage rights, and created institutions to facilitate conflict resolution (see fig. S1 and “Background” section in the Supplementary Materials). The theory behind the intervention was that it would both increase agricultural productivity and improve natural resource management. On the one hand, demarcation and certification would improve farmers’ tenure security on agricultural plots, leading to intensification and a reduction in the need to clear new land. On the other hand, resolving conflicts, clarifying boundaries, and documenting usage rights would reduce the transaction costs involved in cooperation. This enhanced cooperation could then facilitate the effective self-governance of common-pool resources including forests (3).
Most of the landholdings demarcated by the PFR program contained agricultural plots, but forested areas were also demarcated in several ways. At times, areas of communally managed forest are documented explicitly in the PFR—as an example, Fig. 1A shows a demarcated area with some tree cover labeled explicitly as a domaine sacré (sacred ground) within the PFR. In a similar vein, we also find demarcated areas within the PFRs labeled as domaine du village (village land) or réserve villageoise (village reserve). Areas of forest also frequently occur on areas registered in an individual’s name, such as those in Fig. 1B, although even here, typically, the individual is described as a representative of a lineage or collectivité. This means that there is a substantial amount of tree cover within the demarcated areas—according to our highest resolution measure of tree cover (from RapidEye), 12% of pixels within the demarcated areas are classified as trees in the baseline year, which is very similar to the tree cover in the 5-km buffers that we use in our main analysis. Even when forested areas are not included within the PFR, they may be effectively demarcated by the program. For instance, Fig. 1C shows a small forest that is encircled by demarcated landholdings, while Fig. 1D displays an area of forest that lies between two demarcated villages. Overall, therefore, the PFR program effectively demarcated many forested areas in addition to agricultural plots.
Impact of the PFR program on tree cover loss
Our evaluation exploits the randomized rollout of the PFR program across the country. The program selected 575 eligible villages across the country and then organized 80 lotteries to randomly pick 300 villages for treatment (fig. S2). Activities began in treatment villages in 2009, and to this date, the program has not been expanded to the control villages (fig. S3 provides a timeline).
We measure the impact on forest loss using three different sources (see “Data” section for more details—summary statistics are given in table S1). Our primary measure is from a publicly available dataset of global annual forest loss between 2000 and 2017 constructed by Hansen et al. (22) from Landsat satellites. We also construct our own measure of forest loss from 2010 using higher-resolution data from RapidEye satellites (23). Last, we analyze the impact on fires measured by MODIS (Moderate Resolution Imaging Spectroradiometer) satellites (24) because fires are commonly used to clear land in Benin.
We construct an annual village-level panel by aggregating forest loss and fires within nonoverlapping 5-km buffers centered on the villages (fig. S4). Because the area of forest loss and number of fires are highly skewed (fig. S5), we use the inverse hyperbolic sine (IHS) of each variable. We estimate the impact of the PFR with the following ordinary least squares regression
is the average of the outcome variable over the posttreatment years (2009 to 2017), βl is the treatment effect on villages in lottery l,
is the average of the outcome variable over the pretreatment years (2001 to 2008), and Xv is a set of village-level controls including those variables found to be unbalanced across treatment and control groups. Note that we cannot control for the pretreatment average when we use the RapidEye measure of forest loss as we have no data from before 2010, so instead, we control for pretreatment forest loss as measured by Landsat. In regressions using forest loss, we additionally control for forest cover at baseline. Because the treatment probability varies across lotteries, we estimate separate treatment effects βl for each lottery l and then present the average treatment effect weighting βl by the share of villages in lottery l (25).
The main findings of our work are presented in Table 1. The table shows that the PFR treatment reduced forest loss and fires in the areas containing treated villages. More precisely, the point estimates in columns 1 to 3 suggest a reduction in tree cover loss of around 20% and a reduction in fires of 5% (26). Using the Landsat measure, we estimate that 2700 ha of tree cover was lost in the treated villages between 2009 and 2017, and hence, the point estimate suggests that the program led to 600 ha of extra tree cover by the end of 2017. When we construct a composite index using information from all three of these variables in column 4, we estimate a reduction in the treatment group of around 11% of an SD. In columns 5 and 6, we take the annual percentage of tree cover loss as our outcome variable and find a reduction of 0.1 to 0.3 percentage points. For the Landsat measure, the coefficient implies that the percentage of tree cover lost is 18% lower than in control villages.
We conducted several further analyses to assess the robustness of the results (see “Robustness checks” section for more details). The robustness checks included using buffers of 3 or 10 km rather than 5 km (table S2a), making the dependent variable the villages’ rank within the sample instead of the IHS (table S2b), using alternative sets of control variables (table S3a) and calculating P values using randomization inference (table S3b). In all these cases, the magnitude and significance of the coefficients of interest remain largely unchanged. We also interact the treatment dummy with predetermined observable variables and find no clear pattern of treatment effect heterogeneity (table S4).
Are there negative spillover effects?
Might the reduction in tree cover loss be the result of a displacement rather than a net reduction in tree cover loss? This could occur in two ways. First, tree cover loss may be displaced over time if villagers anticipate the formalization and clear land beforehand. Second, tree cover loss may be displaced geographically if the program makes it relatively harder to clear trees in the treated village.
To investigate the possibility of temporal displacement, we first undertake our analysis at the monthly level using data on the number of fires (see “Displacement and spillovers” section in the Supplementary Materials and Methods for more details). We find no significant impact of treatment on fires in the months in between the information campaign and formalization (table S5a). We also see no evidence of anticipatory deforestation when we plot annual regression coefficients (fig. S7). Last, we find no evidence of greater tree cover loss or fires during the period of the information campaign in those areas where it took place (table S5b).
To test for spatial displacement, we first consider spillovers to nearby villages by estimating the impact of villages within 5 km having been treated (table S6). We find no evidence of negative spillovers in either treatment or control villages. We also find no evidence of greater forest loss or fires in areas that are near treated villages but were probably not demarcated (table S7).
What explains the reduction in tree cover loss?
Evidence from the pattern of tree cover loss and survey data suggests three likely mechanisms through which the PFR program reduced tree cover loss. First, the program may have increased farmers’ investment in existing plots, increasing productivity and, hence, reducing the need to clear further land. This is supported by the results of two other studies that found the program led to increased investment (14, 27). We also find evidence consistent with this in our analysis of community surveys, which are reported in Table 2. The regressions in this table use the same estimation technique as outlined in Eq. 1 but are restricted to the smaller set of villages that were surveyed in 2011. In column 1 of Table 2, we regress a binary variable that takes a value of 1 if the community leader interviewed reported land clearing as a technique that was used in the village to acquire land. We can see that this is significantly less likely to be the case in treated villages. We also find a significant reduction in tree cover loss directly adjacent to agricultural plots, which is likely to be the area most at risk from the expansion of agriculture (panel b of table S8).
Second, the increase in tenure security may have guaranteed farmers that they had enough land for future production, and hence reduced the incentive to claim land through clearing. In column 2 of Table 2, we note that treated villages are significantly less likely to experience border conflicts, presumably as a result of the demarcation process. In our qualitative survey, one interviewee responded that “since the project placed markers, everybody knows their parcels and there are no more land problems.” This mechanism may have operated both at an individual level and at the level of the village. In column 3 of Table 2, we note that community leaders are significantly less likely to report that their village has a conflict with a neighboring village over border encroachment in the past year. This fall in intervillage conflict is therefore likely to have reduced the incentive of villagers to clear land on disputed village boundaries to enhance their claims.
Third, the delimitation of landholdings and the creation of local land committees are likely to have reduced the transaction costs of commons management. We see in column 4 of Table 2 that treated villages report a significantly higher level of trust in institutions to resolve land conflicts. Moreover, respondents in treated villages were significantly more likely to say that communal areas were managed by a local committee such as the conseil du village, rather than an individual (column 5 of Table 2). This is consistent with the PFR program’s stated aim of encouraging participation in land governance (28). We also observe in column 6 of Table 2 that treated villages are significantly more likely to restrict access to communal forests than control villages. The qualitative interviews we undertook in 2017 suggest that restrictions on land clearing such as these were effective, with one interviewee telling us that “Today you can’t go clear land like before. If you want more land to cultivate, you can go rent the demarcated plot of another villager.”
MATERIALS AND METHODS
The Millenium Challenge Corporation (MCC)–supported PFR program aimed to produce 300 censuses of landholdings and land rights held in rural villages across 40 communes throughout Benin (communes are subregional units equivalent to districts—there are 77 communes in Benin). Selection into the program was done in three steps. First, all villages in each of the 40 communes received an information campaign. The intention was to inform villages about the program and invite them to apply for a chance to receive 1 of the 300 PFRs. Second, proposals were reviewed against preestablished selection criteria. The following criteria were used: poverty index, potential for commercial activities, regional market integration, local interest in promoting gender equality, infrastructure for economic activities, adherence to the PFR application procedures, incidence of land conflicts, and production of main crops. From this review, a list of eligible villages was produced. Third, each commune organized lotteries to randomly select villages within the eligible pool into the program.
Overall, around 1200 villages applied for the program, out of the 2062 that were informed. Of these 1200 villages, 575 met the eligibility criteria. These 575 villages that were part of the lotteries are not representative of all villages in Benin—they are, for instance, more rural, more ethnically diverse, and report a greater number of land conflicts than ineligible villages. This is not a problem for our identification because we only look at differences between eligible villages, but it may be a concern for external validity. We therefore pay attention to these variables when we test for heterogeneous treatment effects (see “External validity and heterogeneity” section in the Supplementary Materials and Methods for more details). Moreover, we construct a propensity score for each village estimating how likely it would be deemed eligible based on observable characteristics and test for whether the treatment effect is heterogeneous in this variable.
To select treatment and control villages, the implementing agency of the MCC in Benin organized 80 public lotteries, 2 in each commune. Each pair of lotteries was structured to allow for villages sampled in the 2006 national household survey [Enquête modulaire integrée sur les conditions de vie (EMICoV)] to be overrepresented in the program, thus allowing for the EMICoV survey to be used for evaluation purposes. Because the EMICoV survey uses a random sampling strategy at the commune level, this should not affect the validity of our identification. We account for this lottery stratification in our econometric analysis. Figure S2 shows the different steps of the selection process. Land demarcation activities were undertaken in 98% of the treated villages and only in 2 of the 275 control villages. The two control villages that received land demarcation activities served as replacement villages for two of the treated villages that dropped out of the program shortly after the lotteries.
Goldstein et al. (14) analyze the short-term impact on household agricultural activity. They find an increase in investment on treated parcels, with important heterogeneity across genders. In particular, female-managed landholdings in treated villages are more likely to be left fallow. The surveys used in this analysis did not ask questions about uncultivated plots or deforestation directly, but we use some questions in the survey to document how land registration activities have changed land management practices at the communal levels to produce the results in Table 2.
The first step in constructing our dataset is to locate all the 575 eligible villages. For 283 of the villages, we can use the GPS coordinates of cultivated plots, which were surveyed by Goldstein et al. (14), and we take the village centroid to be the center of the surveyed plots. For the remaining villages, we use a combination of sources, because no single source contains the location of all our villages. These sources include data from the Institut National de la Statistique et de l’Analyse Économique (INSAE), the location of infrastructure projects collected by EMICoV, and maps from l’Institut Géographique National du Bénin (IGN-Bénin) and INSAE. We estimate the quality of each source on the basis of how close the coordinates are to the centroids calculated in the surveyed villages and locate each village according to the highest-quality source, which gives a position of the village.
We then construct buffers around each village. We do not make use of village maps, because we do not have maps of village boundaries for the control villages, and, instead, use the same procedure for treatment and control villages. These are circles with a 5-km diameter by default, which we ensure to be nonintersecting by assigning areas to the closest village when within 5 km of more than one village. These buffers are displayed in fig. S4. The 5-km figure is somewhat arbitrary, and we therefore test for robustness to alternative distances. The buffers generally seem consistent with the surveyed plot data—see the villages in fig. S8, for example—but we do not construct buffers based on the surveyed plot data because the surveyed plots are likely to be unrepresentatively close to the village centers.
Our primary measure of forest loss comes from version 1.5 of the Global Forest Change dataset constructed by Hansen et al. (22) using Landsat satellite data. For each 30 m × 30 m pixel globally, they construct a measure of forest cover in 2000 and whether each pixel experienced forest loss between 2000 and 2017. To reduce noise, we only consider grid cells with more than 25% forest cover in 2000, in accordance with previous work using this data (30, 31). To calculate the total forest loss within a village buffer, we weigh pixels by the percentage of forest cover in 2000—deforestation of a pixel that had 80% forest cover therefore counts for double that of a pixel that had 40% forest cover. In this way, we form an estimate of the total area of forest loss within a village buffer for each year between 2001 and 2017.
Our second outcome measure estimates tree cover loss using data from the RapidEye satellites, which have a 5-m pixel resolution. For each village, we compared two images taken between December and February (the driest season) in different years to estimate tree loss between the earliest year possible and 2016. For the northern part of the country, it was possible to use images from 2010, but for the southern part, the earliest usable images were from 2012. The pixels were then classified using supervised classification by maximum likelihood estimation to obtain maps of vegetal cover. Pixels were then classified as suffering tree cover loss if they switched from being classified as forest in the earlier year to bare soil in the latter year. Our village-level measure is the total area of the pixels classified as suffering tree cover loss between the 2 years divided by the number of years between the images.
In addition to measuring forest loss directly, we also use data on fires because burning is a key tool in forest clearance and is generally correlated with deforestation. Data on fires are from the MODIS Collection 6 Active Fire Product (24). This product detects fires in 1-km pixels that are burning at the time of overpass under relatively cloud-free conditions using a contextual algorithm, where thresholds are first applied to the observed middle-infrared and thermal infrared brightness temperature, and then false detections are rejected by examining the brightness temperature relative to neighboring pixels. We sum the number of fires within each buffer for each year between 2002 and 2017, weighting fires by the confidence measure calculated by Giglio et al. (24).
We use all three of these measures in our main analysis because each source has its relative advantages. One potential problem with using direct tree cover loss measures is that they do not distinguish between permanent deforestation and temporary clearances (1). We observe in our tree cover loss data patterns that appear to be the result of managed logging, which we would not anticipate being affected by the PFR program. An advantage of using data on fires is that fires are not used to clear in this way and, hence, are more likely to be correlated with the kind of tree cover loss we are most interested in. A further advantage of using the fire data is that the observation frequency is much higher. Although the Hansen et al. data assign a year to any measured forest loss, in their validation exercise, they find the year only accords with MODIS data 75.2% of the time (22). The main advantage of the RapidEye-based measure is that it has a much higher resolution than the measure derived by Hansen et al. This is particularly helpful when analyzing where tree cover loss occurred near plot boundaries, because the 30-m pixels often span both sides of a demarcated boundary. An additional advantage of these data is that we calibrated the measure using imagery from Benin only, and hence, it may be more sensitive to changes in tree cover than the Hansen et al. measure, which has global cover. When we correlate the relevant variation in the three measures in the posttreatment period, we find the two measures of forest loss have a correlation of 0.33 with each other, and both have a correlation of 0.24 with the measure of fires. This suggests that they bring different information to the analysis. We therefore also construct a composite index containing information from all the available outcome measures—further details can be found below in the “Econometric specification” section.
In table S1, we present summary statistics of various variables for the period 2001 to 2008 across treatment and control villages. Note that, according to our tree cover measure constructed using the Global Forest Change data, the average village had around 80 ha of tree cover in 2000 but had lost roughly 1 ha each year before 2009. This annual rate of forest loss of around 1.25% is slightly higher than the national rate according to the Hansen et al. data, which is 0.8%. Both are comparable to the 1% national rate estimated by the Food and Agriculture Organization of the United Nations over a similar period (12).
The random assignment of the program at the village level establishes our identification, and we exploit within-lottery variation to estimate the average treatment effect of being selected for the PFR program. As displayed in Eq. 1, we estimate a vector of treatment effects, one for each lottery, and then take the weighted average of these, which produces an unbiased and consistent estimate of the average treatment effect (25).
We test for balance across treatment and control groups in column 5 of table S1 by undertaking the regression detailed in Eq. 1 with various predetermined variables as the outcome variable. In this case, the set of control variables Xv includes only lottery dummies and an indicator of whether the village was surveyed or not. We control for lottery fixed effects because the probability of treatment varies across lottery. The source used to determine the village’s location is also correlated with a village’s treatment status, because treated villages are more likely to be surveyed, and where possible, we use the surveys to locate the villages.
In general, we can see that most variables are balanced across the two samples—of the 40 variables we include, we find significant differences at the 10% level for only 4 of them and at the 5% level for only 1. This is consistent with the balance found across previously surveyed villages by Goldstein et al. (14). We include the four unbalanced variables—namely, total tree cover in the village buffer, the distance to the nearest protected area, the share of houses with metal roofs, and whether the village is classified as urban—in the vector of control variables Xv in all the following regressions unless otherwise stated.
Independently of the satellites used, our main variables of interests show that the distributions of area deforested and the number of fires per buffer are highly skewed. For instance, 95% of our observations have an area of tree loss of less than 5 ha, but our maximum value is more than 100 ha. Similarly, while the median observation has less than one fire (adjusted for confidence), the top 1% of our observations have more than 15 fires. If we regress these variables linearly on treatment status, results are likely to be driven by these outliers. This is concerning to us because we believe these outliers may well be the result of measurement error and will therefore simply add noise—a large amount of forest loss, for instance, may be the result of the cutting of a plantation, while a large number of observed fires may represent repeated observations of the same out of control wildfire rather than repeated land clearances.
In our main analysis, we therefore use the IHS of each of these variables. Transforming skewed data to reduce the influence of extreme values and comply with the normality assumptions of linear regression models is good practice (32–35), and the IHS is widely used because it approximates the natural logarithm while allowing the retention of zero-valued observations (26, 36).
To accommodate other possible transformations, our baseline specification includes three more outcome variables in addition to the IHS of the measures described above. First, we construct a composite outcome index, which combines the information in the first three outcome variables. In particular, we convert the three initial outcome measures into z scores by subtracting the mean and dividing by the SD, and then we take the average of the available scores to produce a combined index. We then demean and divide this index by the SD to ease interpretation. Second, for our measures of tree cover loss, we divide by baseline tree cover instead of taking the IHS. This creates two measures of the percentage of tree cover loss, reducing skewness caused by large variations in baseline tree cover. Note, however, that this can lead to extreme values when villages have a small amount of tree cover. For instance, there is one village in the treatment group that lost 45% of its tree cover according to RapidEye (no other village lost more than 20%)—the village in question only has 475 m2 of tree cover, the sixth lowest in our sample. Note also that the percentage of tree cover lost is undefined for areas that had no tree cover at baseline, and therefore, the number of observations is typically smaller in regressions with this outcome measure. For both measures of tree cover, we do not find any significant difference in the number of villages with no cover when we check for balance in the same way as we do for other variables in table S1.
Table S2b considers how sensitive our results are to our choice of dependent variable. An alternative methodology to reduce the influence of outliers is to rank villages each year according to the outcome measure, normalize the rank around zero, and then take this as the dependent variable. The results of this process are reported in columns 1 to 3, and we note that there is still a statistically significant impact of treatment for two of the three measures. The coefficient using the RapidEye measure of tree loss is insignificant, but it is not significantly different from either of the coefficients in columns 1 and 3—in all three cases, it appears that treatment reduces the ranking of treated villages by around 10 places. One way of interpreting these coefficients is to ask what percentage decrease in the base outcome variable in the control group would have led to a coefficient of zero in these regressions. For the case of tree cover loss measured by Landsat, for instance, we find this to be a 20.3% reduction in tree cover loss, which is very similar to the magnitude we estimated in the baseline specification. In columns 4 to 6, we consider an extreme transformation whereby we simply record whether any forest loss—or any fires—are recorded in the village buffer each year. In this case, we find small insignificant coefficients, which suggests that the variation driving our result is on the intensive margin. This is perhaps not unexpected given that, on average, villages experience tree cover loss/fires in most years. In columns 7 to 10, we use the raw outcome measures, which we know to be highly skewed. When we use an ordinary least square (OLS) regression in columns 7 to 9, the coefficients are insignificant, consistent with this specification being vulnerable to extreme values unrelated to the PFR program. We see that there is one village in the treatment group, Ologo, that experienced more than a hundred times the median amount of deforestation between 2009 and 2017. Looking at the satellite data, we can see that this cannot be a result of the treatment—instead, it is driven by the cutting of a large-scale industrial plantation that was near the village but not part of the PFR program. For the number of fires, as well as using the OLS, we can also undertake a Poisson regression, because the raw fire measure is at its base a count variable. We do this in column 10, and here, the coefficient is statistically significant at the 10% level and corresponds to an incidence rate ratio of 0.93, meaning that treatment led to a 7% reduction in the number of fires. This estimate is very close to that of the baseline specification in Table 1.
We may also wonder how robust our measure is to using alternative sources of forest cover data. We therefore construct a measure of forest cover loss using the MODIS Vegetation Continuous Fields product (37). This product estimates forest cover continuously each year within 250-m pixels. Because the resolution is much lower than Landsat or RapidEye, we do not measure tree loss as going to 0% tree cover but instead measure the number of pixels in which there was a significant reduction in forest cover, calculated using the pixel-level SDs that the product supplies. Column 11 of table S2b then reports the results when we take the IHS of the area, which observed significant tree loss, while column 12 reports the results when we simply measure whether any pixels experienced significant tree loss. In both cases, we obtain coefficients very similar to those estimated when using the Global Forest Change product produced by Hansen et al. (22).
Table S3a considers how robust our results are to using alternative sets of control variables. In panel 1, we include only the pretreatment average outcome as a control variable, removing the other variables that are included in the baseline specification (i.e., those variables that are unbalanced at baseline). In panel 2, we replace the control variables that were previously in Xv with the full set of 30 predetermined variables, which are listed in table S1. In panel 3, we then use the “post–double selection” lasso technique to select the most relevant control variables from among this set of 30 and their interactions (38, 39). In all three cases, we can see that our results generally do not change substantially and remain significant.
Another concern is that our SEs may be inappropriate given spatial correlation in both the treatment and the outcome variables. In panel 3 of table S3b, we therefore present SEs calculated in two alternative ways. In parentheses below, the coefficients are SEs, which allow for spatial correlation over 10 km using the Stata routine reg2hdfespatial developed by Thiemo Fetzer, based on the method developed by Hsiang et al. (40) and Conley (41). Below this, we then give the P values using randomization inference. These are calculated by permuting the assigned treatment status across villages 1000 times using the Stata command ritest and then calculating the regression coefficient in each case (42). We can note that P values remain very close to those in the baseline specification.
In panel 4 of table S3b, we estimate the average treatment effect using inverse probability weighting (IPW) instead of estimating lottery-specific treatment effects. We do not include lottery-fixed effects but instead weigh each observation as follows. Each treated observation is weighted by the inverse of the proportion of villages in the lottery who were assigned to the treatment condition, and each control village is weighted by the inverse of the proportion of villages in the lottery who were assigned to the control condition. Note that eight villages are assigned zero weight and are therefore dropped from the regression, because they are in lotteries where all villages were treated. This occurred because of quotas on the required number of villages to be treated within each lottery. These villages do not contribute directly to the estimated treatment effect in our main specification but remain in the regression because they influence the estimated coefficients on the control variables. We can see from the table that coefficients estimated using IPW are very similar to those estimated using our main technique. The SEs given by IPW are typically slightly larger than those in Table 1, but the P values given through randomization inference are very similar.
Figure S6 displays dot plots of the first four outcome variables. These are constructed according to the technique outlined in Gerber and Green (43), whereby we first regress both the outcome variable and the treatment status on the set of controls in separate regressions. We then plot the average of the residuals within each group (treatment and control within each lottery) against the average of the residual of the treatment status within this group. The size of the circle plotted corresponds to the sum of the normalized inverse probability weights within each group—this is equal to the number of villages in the corresponding lottery. Plotting the variables in this way gives us a sense of what the treatment effect looks like. We can first note that the treatment effect is small relative to the variation across lotteries. This is consistent with our regression results—in column 4 of Table 1, for instance, we estimate the treatment effect on the composite outcome index to be 10% of the SD across villages. A second point we can note from this figure is that the results do not appear to be driven by outliers. While there are groups with large residual values both above and below the mean, these do not appear to be consistently driving the difference between the treatment and control groups.