This repository has been archived by the owner on Oct 7, 2022. It is now read-only.

Aggregate polygons of one shapefile by another #44

Open
mikefab opened this issue Oct 13, 2018 · 0 comments

Problem description

When working with region-based datasets, we often find that the polygons in one dataset do not align geospatially with the polygons in another.

Examples:

  • Cell tower coverage areas vs. administrative divisions
  • Voronoi diagram of Ebola outbreak points vs. administrative divisions
  • Health zones vs. population administrative divisions

One feature we want for the MagicBox Platform is the ability to aggregate the values attached to the polygons of one shapefile/GeoJSON/CSV onto the polygons of another shapefile/GeoJSON/CSV. For each of the examples above:

  • Cell tower coverage areas determine where cellphone calls originate. To measure mobility, i.e. how people move from one place to another, we need to know where people are at a given time, and call locations tell us that. But we are not interested in human movement from one coverage area to another; rather, we want human movement from one administrative division ("admin" for short) to another. To generate this "mobility by admin" data, we have to aggregate the mobility values from the "coverage area" shapefile/GeoJSON/CSV to the "admins" shapefile/GeoJSON/CSV. The output will be a new column in the "admins" file: the mobility_value column. Summarized problem statement: Aggregate mobility values by admins, based on mobility values by cell tower coverage areas.
  • From the Ebola outbreak points, we can generate a Voronoi diagram showing the probability of newly contracted cases. These Voronoi polygons of course won't fall squarely on the admin polygons. What we want, then, is to assign the new_case_probability value to the admins, so that we know which admins are more vulnerable than others and thus need more attention and resources from the government and aid agencies. Summarized problem statement: Aggregate new case probabilities by admins, based on the Voronoi diagram of Ebola outbreak points.
  • We have the number of Zika cases per health zone, but those values will be more helpful to MagicBox data scientists if they are expressed per population admin. A derived use case is to connect this case-per-admin data with the mobility-per-admin data above and plot the (potential) spread of disease. Alternatively, we might have data on the medical capacity per health zone (e.g. the number of doctors/nurses/community health workers/vaccines). If we can aggregate these values by population admin, governments and policy-makers will know how well-served or underserved each admin is. Summarized problem statement: Aggregate [a numeric attribute] by admins, based on [that attribute's values] by health zones.

All three cases follow the same pattern:

  • Input: geofile_1, geofile_2, an attribute_of_interest that exists in geofile_1
  • Output: geofile_2 augmented with attribute_of_interest
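The issue does not say how values should be redistributed when polygons overlap only partially. One possible approach, offered here only as a sketch, is area-weighted aggregation: each polygon in geofile_1 gives each polygon in geofile_2 a share of its value proportional to the overlapping area. The example below uses only the standard library and assumes convex polygons with counter-clockwise vertices, planar coordinates, and a count-like attribute spread uniformly over its source polygon. All function names and the sample data are hypothetical. For rates or probabilities such as new_case_probability, an area-weighted mean would probably be more appropriate than a sum.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # assumption: convex, vertices in counter-clockwise order


def area(poly: Polygon) -> float:
    """Absolute polygon area via the shoelace formula (0.0 for an empty polygon)."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def side(a: Point, b: Point, p: Point) -> float:
    """Cross product; >= 0 when p is on or to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip(subject: Polygon, clipper: Polygon) -> Polygon:
    """Sutherland-Hodgman clipping of `subject` against a convex CCW `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break  # nothing left to clip: the polygons do not overlap
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        current, output = output, []
        prev = current[-1]
        for cur in current:
            d_prev, d_cur = side(a, b, prev), side(a, b, cur)
            if (d_prev >= 0) != (d_cur >= 0):
                # The segment prev->cur crosses the clip edge: keep the crossing point.
                t = d_prev / (d_prev - d_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if d_cur >= 0:
                output.append(cur)
            prev = cur
    return output


def aggregate(source: List[Tuple[Polygon, float]],
              target: Dict[str, Polygon]) -> Dict[str, float]:
    """Give each target polygon an area-proportional share of every source value."""
    result = {name: 0.0 for name in target}
    for poly, value in source:
        src_area = area(poly)
        if src_area == 0:
            continue  # skip degenerate source polygons
        for name, target_poly in target.items():
            result[name] += value * area(clip(poly, target_poly)) / src_area
    return result


# Hypothetical data: two square admins and two rectangular coverage areas.
admins = {
    "admin_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "admin_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
coverage_areas = [
    ([(0, 0), (3, 0), (3, 2), (0, 2)], 90.0),  # straddles both admins
    ([(3, 0), (4, 0), (4, 2), (3, 2)], 10.0),  # lies entirely inside admin_B
]
result = aggregate(coverage_areas, admins)
print(result)
```

In this toy case the first coverage area has two thirds of its area in admin_A and one third in admin_B, so its value of 90 splits into 60 and 30, and the total across admins stays equal to the total across coverage areas. Real shapefiles contain concave and multi-part polygons in geographic coordinates, so a production version would need a proper geometry library and a suitable projection.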

An example use case

Here you see a Voronoi diagram over a country shapefile. The generating points are arbitrary, but we can assume they are cell tower locations (pairs of coordinates to be exact).
[Screenshot: Voronoi diagram overlaid on a country shapefile]

This Medium article discusses this use case in depth, though it goes one step further with the output. Rather than stopping at one (origin_admin, destination_admin, mobility_value) row/tuple per pair in the resulting geofile, it asks for a mobility matrix. The latter is easily generated from the former, though.
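As a rough illustration of that last step, the sketch below pivots (origin_admin, destination_admin, mobility_value) rows into a square matrix. The function name, the admin labels, and the values are made up for the example, and the choice to sum duplicate origin/destination pairs is an assumption rather than something the issue specifies.

```python
from typing import List, Tuple

Row = Tuple[str, str, float]  # (origin_admin, destination_admin, mobility_value)


def mobility_matrix(rows: List[Row]) -> Tuple[List[str], List[List[float]]]:
    """Return sorted admin labels and a matrix where matrix[i][j] is the
    mobility from admins[i] to admins[j]; missing pairs stay at 0.0."""
    admins = sorted({admin for origin, dest, _ in rows for admin in (origin, dest)})
    index = {admin: i for i, admin in enumerate(admins)}
    matrix = [[0.0] * len(admins) for _ in admins]
    for origin, dest, value in rows:
        # Assumption: repeated (origin, destination) pairs are summed.
        matrix[index[origin]][index[dest]] += value
    return admins, matrix


# Hypothetical rows, as they might come out of the aggregated geofile.
rows = [("A", "B", 12.0), ("B", "A", 7.0), ("A", "C", 3.0)]
admins, matrix = mobility_matrix(rows)
for label, line in zip(admins, matrix):
    print(label, line)
```

Rows index the origin admin and columns index the destination admin, so the matrix is generally not symmetric.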
