How can we structure the datasets?

Holger Drewes edited this page Aug 18, 2015 · 3 revisions

One issue with the existing OpenSpending platform is that it does not make clear what data is available: is data for a particular body of government (e.g. a city or federal state) available? Is it up to date? Is it based on information from a trustworthy source?

These challenges could be addressed with better, more domain-specific metadata. As more metadata becomes available about the coverage of the data (both in time and in institutions), it should also become easier to map out regions or countries systematically.

The outcome of this discussion will be spendb/fiscalmodel, a re-usable Python package with metadata.

Dataset-level metadata

  • Country
  • Language
  • Time period
    • fiscal years vs. actual coverage
  • Level of government
    • international
    • national
    • state / federal / province / district
    • regional
    • municipal
    • cf. Wikipedia.
  • Classification
    • government
    • semi-governmental body (e.g. QUANGOs, PPPs, IGOs, public schools etc.)
    • not-for-profit organisations
    • for-profit companies
  • Publisher / Subject (e.g. WB BOOST about Gov of Kenya)
    • Title
    • License (official works, public domain, openly licensed, copyright pending)
    • Informally disclosed (y/n)
    • URL
    • Released as data / pdf / ...
    • Uploaded by government
  • Status
    • "proposed", "approved", "adjusted", or "executed"
    • Planning
    • Government draft
    • Budget
    • Report
    • Audited report
  • Activity
    • upstream vs. downstream
    • Budget
    • Procurement
    • Spending
    • Report
  • Flow
    • Expenditure
    • Revenue
  • Sector / Topics
  • Granularity
    • programs
    • items
    • projects
    • payments
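
To make the list above more concrete, here is a minimal sketch of how a few of these fields might be expressed in Python. It is not the spendb/fiscalmodel API: the class names, field names, the use of ISO-style codes, and the example values are all assumptions chosen for illustration, and only a subset of the fields is covered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class GovernmentLevel(Enum):
    """Level of government, following the list above (names are hypothetical)."""
    INTERNATIONAL = "international"
    NATIONAL = "national"
    STATE = "state"
    REGIONAL = "regional"
    MUNICIPAL = "municipal"


class Flow(Enum):
    EXPENDITURE = "expenditure"
    REVENUE = "revenue"


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    ADJUSTED = "adjusted"
    EXECUTED = "executed"


@dataclass
class DatasetMetadata:
    """Hypothetical container for a subset of the dataset-level metadata."""
    title: str
    country: str              # assumed: a two-letter country code
    language: str             # assumed: a two-letter language code
    fiscal_years: List[int]   # fiscal years the dataset claims to cover
    level: GovernmentLevel
    flow: Flow
    status: Status
    publisher: Optional[str] = None
    subject: Optional[str] = None
    source_url: Optional[str] = None

    def covers_year(self, year: int) -> bool:
        """Return True if the given fiscal year is listed as covered."""
        return year in self.fiscal_years


# Illustrative values only; the years are made up for the example.
example = DatasetMetadata(
    title="Example budget",
    country="KE",
    language="en",
    fiscal_years=[2013, 2014],
    level=GovernmentLevel.NATIONAL,
    flow=Flow.EXPENDITURE,
    status=Status.APPROVED,
    publisher="WB BOOST",
    subject="Gov of Kenya",
)
```

Using enumerations for the closed lists (level, flow, status) is one possible design choice: it would let the platform answer coverage questions such as "which national expenditure datasets cover 2014?" without relying on free-text matching.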

Dimensions

  • measure (bool)
    • money
    • related statistic
  • temporal
    • end of period
    • beginning of period
    • year
  • functional classification
  • economic classification
  • institutional classification
  • products classification
  • company name
  • public body identifier
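
The dimension types listed above could be captured as a small vocabulary against which each column of a dataset is checked. The following is only a sketch: the grouping into four kinds ("measure", "temporal", "classification", "entity"), the identifier spellings, and the `Dimension` class are assumptions made for illustration, not part of any existing package.

```python
from dataclasses import dataclass

# Hypothetical vocabulary derived from the list above.
DIMENSION_TYPES = {
    "measure": {"money", "related_statistic"},
    "temporal": {"period_start", "period_end", "year"},
    "classification": {"functional", "economic", "institutional", "products"},
    "entity": {"company_name", "public_body_identifier"},
}


@dataclass(frozen=True)
class Dimension:
    """Maps one source column to a dimension kind and subtype."""
    column: str
    kind: str
    subtype: str

    def __post_init__(self):
        # Reject combinations that are not in the vocabulary.
        if self.subtype not in DIMENSION_TYPES.get(self.kind, set()):
            raise ValueError(
                f"unknown dimension {self.kind}/{self.subtype} for column {self.column!r}"
            )

    @property
    def is_measure(self) -> bool:
        """Corresponds to the 'measure (bool)' flag in the list above."""
        return self.kind == "measure"


amount = Dimension(column="amount", kind="measure", subtype="money")
year = Dimension(column="fiscal_year", kind="temporal", subtype="year")
```

Validating at construction time means a mis-typed mapping fails early, before any data is loaded.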

Better ways to run the modelling

  1. Select all monetary facets
  2. Select all timestamps in the dataset
  3. How is data classified in this dataset?
    • Functional
    • Economic
    • Programmes/Products
    • Institutional (e.g. by Department)
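
The first two steps above could be supported by pre-selecting likely columns from a sample of the data, leaving the remaining columns as candidates for the classification question in step 3. The sketch below uses only the standard library; the function names, the accepted date formats, and the sample rows are assumptions for illustration, and the heuristics are deliberately naive (for instance, a four-digit amount such as "1500" would be mistaken for a year).

```python
from datetime import datetime


def is_number(value: str) -> bool:
    """True if the string parses as a number (thousands separators removed)."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def is_date(value: str) -> bool:
    """True if the string matches one of a few assumed date formats."""
    for fmt in ("%Y-%m-%d", "%Y"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def suggest_columns(rows):
    """Sort column names into monetary, temporal and other candidates."""
    suggestions = {"monetary": [], "temporal": [], "other": []}
    for col in rows[0].keys():
        values = [row[col] for row in rows if row[col] != ""]
        # Dates are checked first because a bare year also parses as a number.
        if values and all(is_date(v) for v in values):
            suggestions["temporal"].append(col)
        elif values and all(is_number(v) for v in values):
            suggestions["monetary"].append(col)
        else:
            suggestions["other"].append(col)
    return suggestions


# Made-up sample rows, as they might come out of csv.DictReader.
rows = [
    {"year": "2014", "amount": "1200.50", "department": "Health", "date": "2014-03-01"},
    {"year": "2014", "amount": "980.00", "department": "Education", "date": "2014-04-15"},
]
suggestions = suggest_columns(rows)
```

The user would then confirm or correct the suggested monetary and temporal columns, and be asked how each of the "other" columns classifies the data (functional, economic, programmes/products, or institutional).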