
Tasks: Updating Census data

James McKinney edited this page Feb 26, 2017 · 7 revisions

Every five years, we must update the Census data on which Represent relies. The last update was for Census 2016.

First, manually compare the tables for Census division and subdivision types across Census years. Note any new types to be integrated. Then, update the software and spreadsheets as described below.
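The comparison of type tables can be sketched as a set difference. This is only an illustration: the function name diff_types is hypothetical, and the type codes below are abbreviated placeholders ("XX" is invented), not the actual Statistics Canada tables.

```python
def diff_types(previous, current):
    """Return (added, removed) type codes between two Census years."""
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


# Placeholder type codes for illustration only; copy the real ones
# from the Census division and subdivision type tables for each year.
types_previous = {"CY", "T", "VL", "MD"}
types_current = {"CY", "T", "VL", "MD", "XX"}

added, removed = diff_types(types_previous, types_current)
print("New types to integrate:", added)
print("Types no longer used:", removed)
```

Any code reported as new is a candidate for census_division_type_names or census_subdivision_type_names in utils.rb.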

Software

The following repositories may require updates, in this order:

ocd-division-ids

  • utils.rb: Update census_division_type_names and census_subdivision_type_names
  • ca_census_divisions.rb: Update names
  • ca_census_subdivisions.rb: Update names, name
  • ca_municipal_subdivisions.rb: Update names, posts_count, has_children, type_map, census_subdivisions_on, census_subdivisions_sk
  • ca_provinces_and_territories.rb: Update rows
  • ca_regions.rb: Update rows
  • classes.rb: Update normalize, @name_mappings, @type_patterns

Also, in scripts/country-ca/, grep for \b\d{7}\b|ocd-division/country:ca to find constants.
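If grep is not convenient, the same search can be sketched in Python. The pattern is the one given above; the function name find_constants and the "*.rb" glob are assumptions made for this example.

```python
import re
from pathlib import Path

# Same pattern as in the text: a standalone 7-digit code, or an OCD division ID prefix.
PATTERN = re.compile(r"\b\d{7}\b|ocd-division/country:ca")


def find_constants(root, glob="*.rb"):
    """Return (path, line number, line) for every line under root that matches PATTERN."""
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_constants("scripts/country-ca"):
        print(f"{path}:{number}: {line}")
```

The word boundaries mean that an 8-digit number is not reported, only exactly seven consecutive digits.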

Regenerate the identifiers and update country-ca.csv in represent-canada-data and scrapers-ca:

curl -O https://raw.githubusercontent.com/opencivicdata/ocd-division-ids/master/identifiers/country-ca.csv

represent-canada

If Canada adds or removes a province or territory, update key_map in finder/static/js/data.js.

Also, grep for ocd-division/country:ca to find constants.

represent-canada-data

  • tasks.py: In the spreadsheet task, update the StatCan URLs and the for-loops that follow them
  • constants.py: Update municipal_subdivisions
  • boundaries/ca_cd/definition.py: Update URLs and run invoke shapefiles --base=boundaries/ca_cd
  • boundaries/ca_csd/definition.py: Update URLs and run invoke shapefiles --base=boundaries/ca_csd

Also, grep for (?<!'division_id': )'ocd-division/country:ca|[^\n,:-]\b\d{7}\b[^&.<]|[^\n',/]ocd-division/country:ca|[^/]cs?d: to find constants except in division_id keys of definition files, manifest, country-ca.csv, file paths, and data files.
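The pattern above uses a negative lookbehind, so it needs a PCRE-capable tool (for example grep -P) or a regex engine such as Python's re module. The sketch below compiles the same pattern and shows how each alternative behaves on a few sample lines. The sample lines and the function name is_flagged are made up for illustration.

```python
import re

# Pattern copied from the text, split by alternative for readability.
PATTERN = re.compile(
    r"(?<!'division_id': )'ocd-division/country:ca"  # quoted ID, unless it is a division_id value
    r"|[^\n,:-]\b\d{7}\b[^&.<]"                       # bare 7-digit code not preceded by , : or -
    r"|[^\n',/]ocd-division/country:ca"               # ID not preceded by a quote, comma or slash
    r"|[^/]cs?d:"                                     # "cd:" or "csd:" not inside a longer ID path
)


def is_flagged(line):
    """Return True if the line contains a constant that needs review."""
    return PATTERN.search(line) is not None


samples = [
    "    'division_id': 'ocd-division/country:ca/csd:3520005',",  # skipped by the lookbehind
    "    'parent': 'ocd-division/country:ca/province:on',",       # flagged by the first alternative
    "if geographic_code == 3520005:",                             # flagged by the second alternative
    "census = 'csd:3520005'",                                     # flagged by the last alternative
]
for sample in samples:
    print(is_flagged(sample), sample)
```

Only the first sample is skipped: the lookbehind excludes the division_id value, and the other alternatives exclude the code after a colon and the csd: segment after a slash.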

Run:

  • ruby boundaries/ca_qc_districts/sets.rb and its following steps
  • invoke definitions
  • invoke definitions --base=../represent-canada-private-data
  • ../represent-canada/manage.py analyzeshapefiles -d . > manifest
  • invoke spreadsheet --base=. --private-base=../represent-canada-private-data

Note: The geographic codes in the following files are validated by the definitions task:

  • boundaries/ca_nb_wards/definition.py
  • boundaries/ca_ns_districts/definition.py
  • boundaries/ca_on_waterloo_wards/definition.py
  • boundaries/ca_qc_districts/definition.py

represent-canada-private-data

Also, grep for (?<!'division_id': )'ocd-division/country:ca|[^\n,:-]\b\d{7}\b[^&.<]|[^\n',/]ocd-division/country:ca|[^/]\bcs?d: to find constants.

Note: The geographic codes in the following files are validated by the definitions task:

  • boundaries/ca_sk_divisions/definition.py

scrapers-ca

  • tasks.py: In get_definition, update the StatCan URLs and the for-loops that follow them

Also, grep for (?<!division_id = )'ocd-division/country:ca|[^:]\b\d{7}\b|[^\n',]ocd-division/country:ca to find constants except in division_id variables of __init__.py files.

Run: invoke tidy

scrapers_ca_app

  • reports/management/commands/status.py: Update the StatCan URLs and the for-loops that follow them

Also, in reports/, grep for [^:]\b\d{7}\b|ocd-division/country:ca to find constants.

Run: heroku run pupa dbinit ca

Spreadsheets

The following spreadsheets store Census codes, names and populations:

Boundaries data request progress is validated by the spreadsheet task in represent-canada-data. To validate and update the others, from scrapers-ca, run:

invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/1AmLQD2KwSpz3B4eStLUPmUQJmOOjRLI3ZUZSD5xUTWM/pub?gid=0&single=true&output=csv' Code Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/1AmLQD2KwSpz3B4eStLUPmUQJmOOjRLI3ZUZSD5xUTWM/pub?gid=743638453&single=true&output=csv' Code Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI/pub?gid=0&single=true&output=csv' Identifier Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI/pub?gid=1&single=true&output=csv' Identifier Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI/pub?gid=2&single=true&output=csv' Identifier Name
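Each command above takes a published CSV URL followed by the headers of the identifier and name columns. As a rough idea of what such a check involves, here is a minimal sketch that compares those two columns against a reference mapping. It is not the actual validate_spreadsheet implementation: the function check_rows, its parameters, and the sample data are all assumptions for illustration.

```python
import csv
import io


def check_rows(csv_text, identifier_header, name_header, expected_names):
    """Return (identifier, message) pairs for rows that disagree with expected_names."""
    problems = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        identifier = row[identifier_header].strip()
        name = row[name_header].strip()
        seen.add(identifier)
        if identifier not in expected_names:
            problems.append((identifier, "unknown identifier"))
        elif expected_names[identifier] != name:
            problems.append((identifier, f"expected {expected_names[identifier]!r}, got {name!r}"))
    # Report reference entries that the spreadsheet does not list at all.
    for identifier in sorted(set(expected_names) - seen):
        problems.append((identifier, "missing from spreadsheet"))
    return problems


# Sample data for illustration only.
sample_csv = "Code,Name\n1000001,Alpha\n1000002,Betta\n"
reference = {"1000001": "Alpha", "1000002": "Beta", "1000003": "Gamma"}
for identifier, message in check_rows(sample_csv, "Code", "Name", reference):
    print(identifier, message)
```

In this sketch the reference mapping would come from the new Census year's codes and names, so any renamed, added, or dissolved division shows up as a reported problem.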

You will need to update the populations in Data catalog contact information and Boundaries data request progress.