Since the extraction module can split the metrics of a single dataset over several output files, the aggregation module merges the metrics of each dataset. Once the metrics are aggregated, the aggregation module constructs a taxonomy tree for each dataset. The metrics and the taxonomy tree are written to a CartoDB table that serves as the back end for the extension.
- click: this package is used to create a command line interface.
- nesting: this package is used to construct the taxonomy tree.
- requests: this package is used to communicate with the CartoDB API.
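The module builds its taxonomy tree with `nesting`. That package's API is not reproduced here. As a rough, dependency-free illustration of what "constructing a taxonomy tree" means, the sketch below nests per-taxon counts by rank using plain dicts. The record shape, the `RANKS` list and the function names are assumptions made for illustration. They are not the module's actual data format.

```python
# Hypothetical rank order; the real module's ranks may differ.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def build_taxonomy_tree(records, ranks=RANKS):
    """Nest counts by rank, summing them at every level of the tree.

    `records` is a list of dicts such as
    {"kingdom": "Animalia", "phylum": "Chordata", "count": 3}.
    A missing rank stops the descent at the deepest known level.
    """
    root = {"name": "root", "count": 0, "children": {}}
    for record in records:
        count = record.get("count", 1)
        node = root
        node["count"] += count
        for rank in ranks:
            name = record.get(rank)
            if not name:
                break
            child = node["children"].setdefault(
                name, {"name": name, "rank": rank, "count": 0, "children": {}}
            )
            child["count"] += count
            node = child
    return root


def to_list_tree(node):
    """Replace children dicts with lists, dropping empty ones, for JSON output."""
    out = {key: value for key, value in node.items() if key != "children"}
    if node["children"]:
        out["children"] = [to_list_tree(child) for child in node["children"].values()]
    return out
```

Keeping children in a dict while building makes each lookup by taxon name cheap. The conversion to lists happens once, just before serialising.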
- `src/aggregator.py`: contains two classes. The `ReportAggregator` class reads all JSON files in a given directory and merges the data into one JSON structure (in practice, a Python dict). This results in one set of metric counts per dataset. The `CartoDBWriter` class writes the data to a CartoDB table using the `requests` package.
- `src/test_aggregate_reports.py`: contains unit tests for the `ReportAggregator`. The easiest way to run these tests is with the `nose` testing package.
- `bin/aggregate_metrics.py`: Python script to run from the command line. It uses the `ReportAggregator` and `CartoDBWriter` classes to aggregate the data and write it to CartoDB.
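Below is a minimal sketch of the kind of merge `ReportAggregator` performs. It assumes each output file maps dataset keys to (possibly nested) dicts of integer counts. `merge_counts` and `aggregate_directory` are hypothetical names, and the real class may handle the data differently.

```python
import json
import os


def merge_counts(target, source):
    """Recursively add the counts in `source` into `target`, in place.

    Numbers are summed; nested dicts are merged key by key.
    """
    for key, value in source.items():
        if isinstance(value, dict):
            merge_counts(target.setdefault(key, {}), value)
        else:
            target[key] = target.get(key, 0) + value
    return target


def aggregate_directory(directory):
    """Merge every *.json file in `directory` into one dict keyed by dataset key."""
    aggregated = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".json"):
            continue  # only JSON files are treated as metric output
        with open(os.path.join(directory, filename)) as handle:
            merge_counts(aggregated, json.load(handle))
    return aggregated
```

Summing counts rather than overwriting them is what makes chunked output safe. A dataset whose metrics were split over two files ends up with the same totals as if it had been processed in one piece.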
- Install the requirements:

      $ pip install click
      $ pip install nesting
      $ pip install requests

- Put all the output files of the extraction module in a single directory. Make sure no other JSON files are in there.
- Create a `settings.json` file. That file should contain a JSON object that stores the CartoDB API key under the key `api_key`, for example `{"api_key": "YOUR_API_KEY"}`.
- Run the aggregator:

      $ python bin/aggregate_metrics.py <data directory> <settings.json>

  - `<data directory>`: a directory containing the chunks of metric data. The metric data should be in JSON and ordered by dataset key.
  - `<settings.json>`: the settings file containing the `api_key` used to contact the CartoDB API.
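The aggregator reads every JSON file in the data directory, so a quick pre-flight check of both arguments can prevent a failed run. The sketch below uses only the standard library. `load_api_key` and `find_metric_files` are hypothetical helpers, not functions from `bin/aggregate_metrics.py`. The sketch assumes `settings.json` is kept outside the data directory.

```python
import glob
import json
import os


def load_api_key(settings_path):
    """Read the CartoDB API key stored under "api_key" in settings.json."""
    with open(settings_path) as handle:
        settings = json.load(handle)
    try:
        return settings["api_key"]
    except KeyError:
        raise ValueError(f"{settings_path} has no 'api_key' entry") from None


def find_metric_files(data_directory):
    """Return the sorted list of *.json files the aggregator would read.

    Every JSON file here is treated as metric output, which is why the
    directory must not contain any other JSON files.
    """
    if not os.path.isdir(data_directory):
        raise NotADirectoryError(data_directory)
    return sorted(glob.glob(os.path.join(data_directory, "*.json")))
```

Listing the files before a run makes it easy to spot a stray JSON file that would otherwise be merged into the metrics.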
The script aggregates all metric data in `<data directory>` and writes the results to the CartoDB table. Every dataset key and every response from the CartoDB API is printed to standard output. Redirect this output to a file if you need it for later analysis.
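The write step can be sketched roughly as follows: one aggregated dataset is sent to CartoDB with `requests`. The sketch assumes the CartoDB SQL API endpoint `https://<account>.cartodb.com/api/v2/sql` (newer accounts use `carto.com`), which takes the query in the `q` parameter and the key in `api_key`. The table `dataset_metrics` and its columns are hypothetical, and `CartoDBWriter` may build its requests differently. Like the script, the sketch prints each dataset key together with the API response.

```python
import json

# Hypothetical endpoint template and table name: adjust to your account.
CARTODB_SQL_URL = "https://{account}.cartodb.com/api/v2/sql"
TABLE = "dataset_metrics"


def sql_literal(text):
    """Quote a string for SQL by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_insert(dataset_key, metrics, taxonomy, table=TABLE):
    """Build one INSERT storing the metrics and taxonomy tree as JSON text."""
    return "INSERT INTO {} (dataset_key, metrics, taxonomy) VALUES ({}, {}, {})".format(
        table,
        sql_literal(dataset_key),
        sql_literal(json.dumps(metrics, sort_keys=True)),
        sql_literal(json.dumps(taxonomy, sort_keys=True)),
    )


def write_dataset(account, api_key, dataset_key, metrics, taxonomy):
    """POST one INSERT to the CartoDB SQL API and echo the key and response."""
    import requests  # imported here so build_insert() works without requests installed

    response = requests.post(
        CARTODB_SQL_URL.format(account=account),
        data={"q": build_insert(dataset_key, metrics, taxonomy), "api_key": api_key},
    )
    print(dataset_key, response.status_code, response.text)
    return response
```

Sending the query in a POST body rather than a URL parameter avoids URL length limits when a taxonomy tree is large.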