
5.3 Population distribution maps


This chapter discusses the various sources for population distribution maps. It also gives an example from Grenada on the generation of population distribution at building level, by combining building footprints and census data at enumeration district level. 

Keywords: Population distribution, Dasymetric mapping, Gridded population data, GPW, GRUMP, LandScan, WorldPop

Authors: Manzul Kumar Hazarika, Syams Nashrrullah, Mujeeb Alam and Cees van Westen


Introduction

High-resolution and spatially accurate data on population distribution are very important for disaster and risk management. Rapid urbanization and population growth over the past decade have brought population distribution into the spotlight from a risk perspective. The census is the only consistent source of demographic data with a wide geographic scope. It is the most reliable information for describing households, neighborhoods, cities, and countries. Census data are also used as a benchmark for studying population changes (trend and direction), and are a key input for population projections, including gender, poverty, labor force, and employment. Census data are the basis on which governments develop policy and manage and evaluate development programs. However, collecting census data is costly and resource intensive, and in most countries it is done only once a decade. Furthermore, census data are often available only at an aggregated level.

Population distribution maps are mainly based on population counts from census data, which use administrative units as the basic map unit. Since the 1990s, spatially gridded population distribution maps have also been developed by incorporating land use information and other ancillary data relevant to the distribution of population. The following sections explain the concept of population distribution models and introduce the current global gridded population datasets: the Gridded Population of the World (GPW) database, the Global Rural-Urban Mapping Project (GRUMP), LandScan, and WorldPop. An example of local-level population data is also described to demonstrate the application for the small island countries of the Caribbean.

Objectives

  • Understand the concept of population distribution models.
  • Familiarize the reader with the available global population distribution maps.

Description

Population distributions are commonly displayed using choropleth maps of decennial census data. Choropleth maps aggregate population data by administrative units (census tracts or block groups) whose boundaries do not always reflect the natural distribution of human populations. Dasymetric mapping, in contrast, is a method for mapping the distribution of population relative to land use or other ancillary data, such as building footprints, locations of roads, slope, and elevation. Dasymetric mapping depicts the quantitative distribution of population using boundaries that divide an area into zones of relative homogeneity. In dasymetric mapping, a source layer of population data is converted into a surface, and an ancillary data layer is added to the surface with a weighting scheme applied to the cells that fall within the zonal boundaries of the ancillary data. Dasymetric maps are closely related to choropleth maps, but they differ in several ways:

  • Zonal boundaries on choropleth maps are demarcated by administrative units, while zonal boundaries on dasymetric maps are based on statistical zones derived from ancillary information.
  • Choropleth zones have varying levels of internal homogeneity, while dasymetric zones are designed to be internally homogeneous.

 

Figure 1: (Left) Choropleth map of population distribution in part of Grenada, (Right) Dasymetric map of population distribution in part of Belize taken from WorldPop

A gridded population distribution transforms population data from their native spatial units, which are usually administrative units of varying resolutions, to a global grid of quadrilateral cells at a certain spatial resolution. The idea of gridding population distribution is not new, though the earliest population grids showed little or no improvement over data available by administrative boundaries. Currently, the most widely known global population grids are the Gridded Population of the World (GPW), the Global Rural-Urban Mapping Project (GRUMP), LandScan, and WorldPop. GPW provides population estimates at a resolution of roughly 5 km, while GRUMP and LandScan offer gridded datasets at a resolution of roughly 1 km. WorldPop, a more recent set of regional and country-specific models, provides an open-access archive of spatial demographic datasets for Africa, Asia, and Central and South America at a resolution of 100 m. These datasets vary in their modeling techniques and in the types of input data used for their construction.

It is important to note that not all of the currently available global datasets are useful for the Caribbean countries, given their small area in comparison with the resolution of the maps. For example, the GPW dataset cannot show the population distribution of Grenada well, since the country has a total area of only around 310 square kilometers. Belize, on the other hand, is larger than the other Caribbean countries, and WorldPop datasets can be used to describe its population distribution (Figure 1, right). It is therefore necessary to develop a population model for the small island countries of the Caribbean, taking the existing global population distribution models as examples. The methodology behind each of these population distribution maps is briefly explained below.

Gridded Population of the World (GPW)

GPW v3 is the most recent version of GPW. It provides some improvements over the earlier versions, especially in the input data and production methods, and in the quality of the output, which allows identification of anomalies and errors introduced during processing. The basic method for the development of GPW v3 is more or less the same as the method used for GPW v1 and GPW v2. A proportional allocation algorithm (also known as areal weighting) is applied to the highest-resolution census data or the most recent survey data available, at the level of the smallest administrative units, to produce a global grid of quadrilateral cells at a resolution of 2.5 arc-minutes, or approximately 4.6 km at the equator. The algorithm works on the assumption that the variable being modeled, in this case population, is distributed evenly over the administrative unit. Each grid cell is assigned a portion of the total population of the administrative unit it falls within, in proportion to the share of the unit's area that the grid cell takes up. The main drawback of the GPW dataset is that little or no ancillary data are used in the population distribution model, so the output is unlikely to be realistic, especially within large units with significant variation in land cover.
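The proportional allocation step can be illustrated with a minimal Python sketch. It is not the GPW production code: the function name, the cell identifiers, and the numbers are hypothetical, and the overlap areas between the administrative unit and the grid cells are assumed to have been computed beforehand in a GIS.

```python
def areal_weighting(unit_population, unit_area_km2, overlap_areas_km2):
    """Allocate a unit's population to grid cells in proportion to overlap area.

    overlap_areas_km2 maps a cell id to the area of the administrative unit
    that falls inside that cell (assumed to be precomputed in a GIS).
    """
    if unit_area_km2 <= 0:
        raise ValueError("unit_area_km2 must be positive")
    # Population is assumed to be spread evenly over the unit, so each cell
    # receives the same share of people as its share of the unit's area.
    return {
        cell: unit_population * area / unit_area_km2
        for cell, area in overlap_areas_km2.items()
    }


# Hypothetical unit: 1,200 people on 12 km2, overlapping three grid cells.
overlaps = {"cell_a": 6.0, "cell_b": 4.0, "cell_c": 2.0}
cell_population = areal_weighting(1200, 12.0, overlaps)
```

In this example half of the unit lies in cell_a, so cell_a receives half of the people, regardless of where they actually live. This is the limitation that the ancillary data in the other datasets try to address.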


Global Rural-Urban Mapping Project (GRUMP)

The first version of GRUMP builds on the GPW v3 datasets, but incorporates urban areas based on observations of satellite night-lights collected from a series of NOAA satellites over several decades. The night-light data and other ancillary data are used to develop an urban-rural “mask,” or urban extents grid, which identifies those areas that appear to be urbanized. GRUMP v1 also includes a geo-referenced database of urban settlements with populations greater than 5,000 persons. A mass-conserving algorithm called GRUMPe (Global Rural Urban Mapping Programme) was developed to produce the population distribution model; it reallocates people into rural and urban areas within each administrative unit. The final stage is to convert the output coverage from GRUMPe into grid cells at a resolution of 30 arc-seconds, or approximately 1 km at the equator.
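The idea of mass conservation can be shown with a deliberately simplified Python sketch. It is not the GRUMPe algorithm itself: it assumes that one administrative unit has a known urban population (for example from a settlement database) and known numbers of urban and rural cells, and all names and values are hypothetical.

```python
def reallocate_urban_rural(unit_population, urban_population,
                           n_urban_cells, n_rural_cells):
    """Split a unit's population between its urban and rural cells.

    Returns (people per urban cell, people per rural cell). The unit total
    is conserved: whatever is not placed in urban cells goes to rural cells.
    """
    if n_urban_cells < 0 or n_rural_cells < 0 or n_urban_cells + n_rural_cells == 0:
        raise ValueError("the unit needs at least one cell")
    # The urban population can never exceed the unit total.
    urban = min(urban_population, unit_population)
    if n_urban_cells == 0:
        urban = 0.0              # no urban cells: everyone is rural
    if n_rural_cells == 0:
        urban = unit_population  # no rural cells: everyone is urban
    rural = unit_population - urban
    per_urban = urban / n_urban_cells if n_urban_cells else 0.0
    per_rural = rural / n_rural_cells if n_rural_cells else 0.0
    return per_urban, per_rural


# Hypothetical unit: 10,000 people, of whom 7,000 live in 5 urban cells.
per_urban, per_rural = reallocate_urban_rural(10000, 7000, 5, 60)
```

Multiplying the two results by the number of cells of each type returns the original 10,000 people, which is what "mass-conserving" means here.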

LandScan

The LandScan population distribution uses a multi-layered, dasymetric, spatial modeling approach that is also referred to as a “smart interpolation” technique. In LandScan models, typical dasymetric modeling is improved by incorporating multiple ancillary or indicator data layers. The modeling process uses sub-national census data for each country together with ancillary datasets, including land cover, roads, slope, urban areas, village locations, and high-resolution imagery analysis. Cells are preferentially weighted for the possible occurrence of population during a day, based on the spatial data and on the socioeconomic and cultural understanding of an area. A “likelihood” coefficient is calculated for each cell within each country and applied to the census counts; the total population of each census area is then allocated to its cells in proportion to these coefficients, at a spatial resolution of 30 arc-seconds, or approximately 1 km. The resulting population count is an ambient, or average day/night, population count.
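The proportional allocation by likelihood coefficient can be sketched in a few lines of Python. This is only an illustration of weighted (dasymetric) allocation, not the LandScan model: how the coefficients are derived from the ancillary layers is not shown, and the cell names and coefficient values below are invented.

```python
def allocate_by_likelihood(census_count, likelihood):
    """Distribute a census count over cells in proportion to their weights.

    likelihood maps a cell id to a non-negative coefficient; a coefficient
    of zero (e.g. open water) means the cell receives no population.
    """
    total_weight = sum(likelihood.values())
    if total_weight <= 0:
        raise ValueError("at least one cell needs a positive coefficient")
    return {
        cell: census_count * weight / total_weight
        for cell, weight in likelihood.items()
    }


# Hypothetical census area with 5,000 people and four cells.
coefficients = {"urban_core": 8.0, "roadside": 1.5, "steep_forest": 0.5, "water": 0.0}
cell_population = allocate_by_likelihood(5000, coefficients)
```

Because the weights are normalized by their sum, the cell values always add up to the census count of the area, whatever scale the coefficients are expressed in.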

WorldPop

The WorldPop project was initiated in October 2013 to combine the AfriPop, AsiaPop and AmeriPop population mapping projects. The gridded population distribution model uses dasymetric modelling methods, combining local high-resolution census data with detailed satellite imagery-based mapping of settlements and land cover information. The new WorldPop population distribution model follows the dasymetric methodology with an improved mapping approach based on random forest regression trees. In brief, a semi-automated dasymetric modeling approach built on the "Random Forest" estimation technique integrates census data and a wide range of open-access geospatial datasets in a flexible manner. A combination of remotely sensed and ancillary datasets (e.g. settlement locations, land cover, roads, building maps, health facility locations, VIIRS nightlights, vegetation, and topography) contributes to the modeled dasymetric weights. The “Random Forest” model is used to generate a gridded prediction of population density at 3 arc-seconds, or approximately 100 m spatial resolution, which is then used as the weighting surface to redistribute census counts at country level.

The widely available global population distribution maps and their specific characteristics are summarized in Table 1.

Table 1: Summary of global population distribution maps

| | GPW, v3 | GRUMP, v1 | LandScan | WorldPop |
|---|---|---|---|---|
| Website | http://sedac.ciesin.columbia.edu/data/collection/gpw-v3 | http://sedac.ciesin.columbia.edu/data/collection/grump-v1 | http://www.ornl.gov/sci/landscan | http://www.worldpop.org.uk |
| Resolution | 2.5 arc-minutes (~4.6 km at the equator) | 30 arc-seconds (~1 km at the equator) | 30 arc-seconds (~1 km at the equator) | 3 arc-seconds (~100 m at the equator) |
| Scope | Global | Global | Global | Africa, Asia, and America |
| Product license | Free of charge | Free of charge | Free of charge for U.S. Federal Government agencies. License fees for all other requests (humanitarian organizations, educational research, and commercial organizations) are determined on a case-by-case basis. | Free of charge |
| Input data | Census data; administrative boundaries; coastlines | Census data; administrative boundaries; coastlines; satellite night-time light-derived urban extents | Census data; administrative boundaries; land cover; coastlines; high-resolution imagery; elevation and slope; roads; populated areas (urban boundaries) and populated points (towns and villages) | Census data; administrative boundaries; land cover; detailed settlement information from sources such as high-resolution imagery, OpenStreetMap (OSM), and GRUMP |
| Products | Population count grids (1990, 1995, 2000); population density grids (1990, 1995, 2000); population count grid future estimates (2005, 2010, 2015); population density grid future estimates (2005, 2010, 2015); national boundaries (2000); subnational boundaries (1990, 1995, 2000); national identifier grids (1990, 1995, 2000, 2005, 2010, 2015); coastlines (2000); centroids of administrative units (1990, 1995, 2000, 2005, 2010, 2015) | Population count grids (1990, 1995, 2000); population density grids (1990, 1995, 2000); urban settlement points (1990, 1995, 2000); urban-extents grids (1995); land/geographic unit area grids (1990); national boundaries (1990); national identifier grids (1990, 1995, 2000); coastlines (2000) | Population count grids (most recently for 2011) | Population count grids (2010, 2015, 2020). Available only for selected regions and countries: numbers of live births per grid square (2010, 2012, 2015, 2020, 2025, 2030, and 2035); numbers of pregnancies per grid square (2010, 2012, 2015, 2020, 2025, 2030, and 2035); estimates of proportion of people per grid square living in poverty (2008); estimates of number of people in each 5-year age group per grid square (2014) |
| Authors/Developers | Center for International Earth Science Information Network (CIESIN), Columbia University | Center for International Earth Science Information Network (CIESIN), Columbia University | Oak Ridge National Laboratory (ORNL) | Dr. Andrew Tatem (University of Southampton, UK / Fogarty International Center, National Institutes of Health, USA); Dr. Catherine Linard (Universite Libre de Bruxelles, Belgium); Dr. Andrea Gaughan (University of Louisville, USA) |
| Selected publications | Balk et al., 2001; Balk et al., 2004 | Balk et al., 2005; Balk et al., 2010 | Bhaduri et al., 2002; Bhaduri et al., 2007; Dobson et al., 2000 | Linard et al., 2011; Gaughan et al., 2013; Tatem et al., 2007 |
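To see what these angular resolutions mean for a small island, the following Python sketch converts arc-seconds to an approximate cell size at the equator and estimates how many cells would cover Grenada. It is a rough approximation under stated assumptions: square cells, an equatorial circumference of about 40,075 km, and the total area of roughly 310 km² quoted earlier; the function names are our own.

```python
import math

# Equatorial circumference of the Earth divided by 360 degrees.
EQUATOR_KM_PER_DEGREE = 40075.017 / 360.0


def cell_size_km(arc_seconds, latitude_deg=0.0):
    """Approximate east-west width of a grid cell in km at a given latitude."""
    degrees = arc_seconds / 3600.0
    return degrees * EQUATOR_KM_PER_DEGREE * math.cos(math.radians(latitude_deg))


def approx_cell_count(area_km2, arc_seconds):
    """Rough number of square cells (sized at the equator) covering an area."""
    side = cell_size_km(arc_seconds)
    return area_km2 / (side * side)


GRENADA_AREA_KM2 = 310  # approximate total area quoted in this chapter

for name, arc_seconds in [("GPW", 150), ("GRUMP/LandScan", 30), ("WorldPop", 3)]:
    print(name,
          round(cell_size_km(arc_seconds), 3), "km cell,",
          round(approx_cell_count(GRENADA_AREA_KM2, arc_seconds)), "cells")
```

At 2.5 arc-minutes (150 arc-seconds) a cell is about 4.6 km wide, so the whole of Grenada is covered by roughly 14 cells, whereas a 3 arc-second grid yields tens of thousands of cells.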

Local Level Population Data in the Caribbean: the example of Grenada

The global population datasets described above are not very relevant for most of the Caribbean islands, as each island would be covered by only a few cells of the global population distribution maps. Population information should be available at a much finer resolution. This type of data has to come from a national census, in which a lot of information is collected at household level and then generalized to enumeration districts. Household-level data are generally not available for privacy reasons. Information at the enumeration district level may be available, although in some countries there is a lack of willingness to share even this level of data with other organizations. This example shows the situation in Grenada, where census data were shared.

The latest census data for the study were provided by the Central Statistical Office (CSO), Ministry of Finance, Government of Grenada. The statistics office collects a whole range of data, including demographic, housing, health, education, income, and crime data (see Use Case 7.5), and stores them in a relational database for analysis and reporting. In general, a comprehensive database is available at the CSO. Data are collected at the household level and relationally linked to Enumeration Districts and parishes (the administrative boundaries; there are 8 parishes in Grenada). An Enumeration District (ED) is a polygon containing a certain number of houses that is established specifically to facilitate the census survey. Grenada is subdivided into 287 EDs (see Figure 2), including two small islands. However, an ED is not an administrative unit, and there is no fixed number of households per ED. Based on the available data, the number of households in an ED ranges from 3 to 467, with an average of 126.

Figure 2: Enumeration Districts in Grenada with population information.

53 ED with demographic data.jpg

 

To make a proper exposure and risk analysis, it is important to bring this information down to building level. Exposure analysis is an important step in risk analysis. It is basically an analysis of the spatial relationship between hazard footprints and elements-at-risk such as buildings. It is usually carried out to identify the elements that are exposed to a particular hazard and subsequently to determine the physical vulnerability of the exposed elements. The results of such an analysis are used in risk management and planning.
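The spatial relationship at the core of exposure analysis can be sketched as a point-in-polygon test. The Python example below is a simplified illustration, not the procedure used in this study: it assumes that each building is represented by its centroid and already carries a population estimate, that the hazard footprint is a single simple polygon, and all coordinates and values are invented.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def exposed_population(buildings, hazard_polygon):
    """Sum the population of buildings whose centroid lies in the footprint."""
    return sum(
        b["population"]
        for b in buildings
        if point_in_polygon(b["x"], b["y"], hazard_polygon)
    )


# Hypothetical hazard footprint (a 10 x 10 square) and three buildings.
hazard = [(0, 0), (10, 0), (10, 10), (0, 10)]
buildings = [
    {"id": 1, "x": 2, "y": 3, "population": 4.0},
    {"id": 2, "x": 12, "y": 5, "population": 3.5},
    {"id": 3, "x": 9, "y": 9, "population": 2.5},
]
exposed = exposed_population(buildings, hazard)
```

Buildings 1 and 3 fall inside the footprint, so the exposed population in this example is 6.5 persons. In practice a GIS overlay would be used and building outlines rather than centroids could be intersected with the hazard map.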

We therefore prepared a population distribution map to model how the population is spatially distributed over the island. The map was generated using census data as the main source of population and building footprints as the auxiliary source, employing the basic concepts of dasymetric mapping. The principal concept of dasymetric mapping is the disaggregation of spatial data to a finer unit of analysis, using supplementary (ancillary) data to help refine the locations of the population or other phenomenon being mapped (Maantay & Andrew, 2009). To use census data (population) and building footprints together to produce a population distribution map for exposure analysis, the two datasets had to be combined. In other words, the task was to attach household population to building footprints. The workflow for the preparation of the population distribution map based on 2011 census data is presented in Figure 3.

Figure 3: Flowchart illustrating the procedure for generating a detailed population map for Grenada 

53 flowchart population mapping.jpg

After the characterization of buildings, illustrated in Data Management Book section 5.2, all buildings had to be linked geographically to the census Enumeration Districts. In ArcGIS, spatial overlay techniques were used to select, ED by ED, all buildings located inside the boundary of each ED and to assign the unique Enumeration District ID to those buildings. In this way a relation was built between the census data and the building footprint map. However, we discovered that the number of dwellings in each ED was higher than the actual number of households reported in the census. The reason was that the quality of the building footprint map was not very good. Many buildings had been digitized in several small pieces (polygons), so that a single building was sometimes composed of several polygons, some of them very small and representing sheds and similar structures. To relate the number of households in each ED reported in the census to the residential houses in the building footprint map, the multiple small polygons and other small temporary structures had to be cleaned from the database without deleting the main polygons. By carefully observing building sizes on the satellite imagery in each ED and analyzing the average polygon size in the building footprint map for each ED, thresholds were set to remove small polygons. Polygons smaller than 10 m², 17 m², 26 m², 37 m², 50 m², or 100 m² were removed, depending on the average size of buildings in each ED.
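The cleaning step can be expressed as a simple filter once every polygon carries its ED identifier and area. The Python sketch below is illustrative only: the work described here was done in ArcGIS, the rule linking average building size to a threshold is not specified in this chapter, and the ED identifiers, areas, and threshold assignments are hypothetical (the thresholds are simply picked from the values listed above).

```python
def remove_small_polygons(polygons, thresholds_m2):
    """Keep only polygons at least as large as the threshold of their ED.

    polygons: list of dicts with "id", "ed_id" and "area_m2".
    thresholds_m2: minimum area per ED, chosen by the analyst for each ED.
    """
    kept = []
    for poly in polygons:
        threshold = thresholds_m2[poly["ed_id"]]
        if poly["area_m2"] >= threshold:
            kept.append(poly)
    return kept


# Hypothetical footprints in two EDs with different thresholds.
polygons = [
    {"id": 1, "ed_id": "ED_A", "area_m2": 85.0},
    {"id": 2, "ed_id": "ED_A", "area_m2": 8.5},    # likely a shed
    {"id": 3, "ed_id": "ED_B", "area_m2": 45.0},   # small for an ED with large buildings
    {"id": 4, "ed_id": "ED_B", "area_m2": 160.0},
]
thresholds = {"ED_A": 10, "ED_B": 50}
cleaned = remove_small_polygons(polygons, thresholds)
kept_ids = [p["id"] for p in cleaned]
```

Here polygons 2 and 3 are dropped and polygons 1 and 4 are kept. The thresholds are passed in as data, so they can be adjusted per ED after visual inspection of the imagery.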

A separate file was created containing only the residential buildings (dwellings). The objective was to assign a population to each dwelling. It was impossible to determine the exact number of persons living in each residential building, since there was no common attribute between the household information in the census data and the building footprint map from the physical planning unit. A rule based on the size of the residential building was not practical either, because the buildings were not well digitized. The obvious choice was to distribute the population of each ED from the census equally over the dwellings in the same ED. The population distribution map for Grenada was therefore generated by dividing the total population of each ED by the total number of residential buildings in that ED. The resulting map is shown in Figure 4.
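The equal-share rule is a single division per ED. A minimal Python sketch follows, assuming that each dwelling record already holds the ID of the ED it falls in; the ED identifiers and population figures are hypothetical.

```python
from collections import Counter


def population_per_dwelling(ed_population, dwellings):
    """Assign each dwelling an equal share of its ED's census population.

    ed_population: total census population per ED id.
    dwellings: list of dicts with "id" and "ed_id" (residential buildings only).
    """
    dwellings_per_ed = Counter(d["ed_id"] for d in dwellings)
    return {
        d["id"]: ed_population[d["ed_id"]] / dwellings_per_ed[d["ed_id"]]
        for d in dwellings
    }


# Hypothetical data: two EDs with three and two dwellings.
ed_population = {"ED_A": 12, "ED_B": 9}
dwellings = [
    {"id": 101, "ed_id": "ED_A"},
    {"id": 102, "ed_id": "ED_A"},
    {"id": 103, "ed_id": "ED_A"},
    {"id": 201, "ed_id": "ED_B"},
    {"id": 202, "ed_id": "ED_B"},
]
persons = population_per_dwelling(ed_population, dwellings)
```

Each dwelling in ED_A receives 4.0 persons and each dwelling in ED_B receives 4.5, so the values are fractional averages rather than head counts, and the sum over all dwellings equals the census total.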

Figure 4: Resulting population map at building level generated by integrating census data and building footprints.

53 Population per building.jpg

These population maps are clearly only a first estimate, as they assume an equal number of people per dwelling and do not consider population in other occupancy types. The temporal variation of population (e.g. daytime-nighttime or seasonal variation) is also not taken into account. More work is needed to be able to generate such data for the target countries. It starts with the willingness to share existing data, such as census data, and to integrate the surveys carried out by different agencies into a single shareable database. The Central Statistical Office of Grenada conducted the census in 2011 but did not collect the geographical coordinates of the buildings where households live. This limits the utility of the census data for other applications such as risk assessment. In the next census, the office may plan to collect building coordinates as well. Alternatively, through a joint effort of all concerned organizations in the country, building footprints could be collected again with better quality, so that at the time of the census the field enumerators simply assign the unique code of each building footprint to the respective household. A third option would be for the census department to geo-locate the buildings in each ED. However, before taking such an initiative, the quality of the existing building footprints has to be improved, or the buildings have to be freshly digitized from the latest high-resolution satellite imagery.

References

Balk, D., Deichmann, U., Yetman, G. (2001). Transforming Population Data for Interdisciplinary Usages: From census to grid.

Balk, D., Yetman, G. (2004). The Global Distribution of Population: Evaluating the gains in resolution refinement. Center for International Earth Science Information Network (CIESIN), Columbia University.

Balk, D., Brickman, M., Anderson, B., Pozzi, F., Yetman, G. (2005). Mapping global urban and rural population distributions: Estimates of future global population distribution to 2015. FAO Working Paper: Environment and Natural Resources.

Balk, D., Yetman, G., de Sherbinin, A. (2010). Construction of Gridded Population and Poverty Data Sets from Different Data Sources. E-Proceedings of European Forum for Geostatistics Conference, 5-7 October 2010, Tallinn, Estonia.

Bhaduri, B., Bright, E., Coleman, P., Dobson, J. (2002). LandScan: Locating people is what matters. Geo Informatics, 5(2), pp. 34–37.

Bhaduri, B., Bright, E., Coleman, P., Urban, M.L. (2007). LandScan USA: A high-resolution geospatial and temporal modeling approach for population distribution and dynamics. GeoJournal, 69(1), pp. 103–117.

Dobson, J.E., Bright, E.A., Coleman, P.R., Durfee, R.C., Worley, B.A. (2000). LandScan: A global population database for estimating populations at risk. Photogrammetric Engineering and Remote Sensing, 66(7), pp. 849–857.

Eicher, C.L., Brewer, C.A. (2001). Dasymetric Mapping and Areal Interpolation: Implementation and Evaluation. Cartography and Geographic Information Science, 28(2), pp. 125–138.

Gaughan, A.E., Stevens, F.R., Linard, C., Jia, P., Tatem, A.J. (2013). High Resolution Population Distribution Maps for Southeast Asia in 2010 and 2015. PLoS ONE 8(2): e55882.

Jordan, L. (2014). Beyond Borders: Gridded Demographic Data Sets and Web Mapping Applications for Population, Development, and Environmental Research. Paper presented at the Population Association of America 2014 Annual Meeting, 1-3 May 2014, Boston MA.

Linard, C., Gilbert, M., Tatem, A.J. (2011). Assessing the use of global land cover data for guiding large area population distribution modelling. GeoJournal, 76(5), pp. 525–538.

Maantay, J., Andrew, M. (2009). Mapping urban risk: Flood hazards, race, & environmental justice in New York. Applied Geography, 29(1), pp. 111–124. doi:10.1016/j.apgeog.2008.08.002.

Rose, A.N., Bright, E. (2014). The LandScan Global Population Distribution Project: Current State of the Art and Prospective Innovation. Paper presented at the Population Association of America 2014 Annual Meeting, 1-3 May 2014, Boston MA.

Tatem, A.J., Noor, A.M., von Hagen, C., Di Gregorio, A., Hay, S.I. (2007). High Resolution Population Maps for Low Income Nations: Combining Land Cover and Census in East Africa. PLoS ONE 2(12): e1298.

van Westen, C.J., Alkema, D., Damen, M.C.J., Kerle, N., Kingma, N.C. (2011). Multi-hazard risk assessment: Distance education course Guide book. United Nations University – ITC School on Disaster Geo-information Management (UNU-ITC DGIM).

Last update: 
15-07-2016
