This repo contains all the code needed to calculate the monthly Wikimedia movement metrics related to content and contributors. It has three main dependencies:
- This code is designed to run on one of the SWAP servers and will not work elsewhere.
- The contributors-related metrics are calculated from the mediawiki_history dataset.
- The content-related metrics are calculated from the AQS API.
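
As a rough illustration of the AQS dependency, the sketch below builds a request URL for the public AQS "new pages" endpoint. The choice of endpoint, the `new_pages_url` helper, and all parameter values are assumptions for illustration only; the notebooks in this repo may query different AQS endpoints.

```python
from urllib.parse import quote

# Base of the public Wikimedia REST API metrics routes (AQS).
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def new_pages_url(project, start, end,
                  editor_type="all-editor-types",
                  page_type="content",
                  granularity="monthly"):
    """Build a URL for the AQS edited-pages/new endpoint.

    Hypothetical helper, not part of this repo. `start` and `end`
    are dates written as YYYYMMDD strings.
    """
    parts = [
        "edited-pages", "new",
        project, editor_type, page_type, granularity, start, end,
    ]
    # Percent-encode each path segment, then join them with slashes.
    return AQS_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


if __name__ == "__main__":
    print(new_pages_url("en.wikipedia", "20190101", "20190201"))
```

The helper only constructs the URL; fetching it (for example with `requests.get`) returns JSON that would then need to be aggregated into the monthly metric.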
For more details about our monthly reporting process, see mw:Product Analytics/Movement metrics.
For a full list of metric definitions, see mw:Wikimedia Product/Data dictionary.
- Clone this onto one of the SWAP hosts.
- In any order, run the two notebooks numbered 01:
  - 01a-editor-month-table.ipynb: creates or updates an intermediate editor-month table in the neilpquinn Hive database.
  - 01b-new-editor-table.ipynb: creates or updates an intermediate table of new editors in the neilpquinn Hive database.
- Run the notebook 02-calculation.ipynb, which actually calculates the metrics (some of them using the editor-month and new editor tables calculated in the previous step) and inserts them into metrics.tsv.
- Run the notebook 03-report.ipynb, which does a few simple transformations on the metrics and produces the table of values needed for the final report, as well as a graph of each metric.
- Do any analysis you need to understand major trends (drawing on the analysis notes in past months' slides if needed). The analysis folder has a variety of notebooks you could reuse; if you do new analysis, consider keeping it in an existing or new notebook in this folder, so it can be reused in the future.
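
The notebook steps above are normally run interactively, but the required order can be sketched as a small script. This is a minimal sketch, assuming `jupyter nbconvert` is available on the SWAP host and that the notebooks run without manual input; the `build_command` and `run_all` helpers are hypothetical and not part of this repo.

```python
import subprocess
from typing import List

# Notebook names come from the steps above. The two 01 notebooks
# may run in either order, but both must finish before 02.
NOTEBOOKS = [
    "01a-editor-month-table.ipynb",
    "01b-new-editor-table.ipynb",
    "02-calculation.ipynb",
    "03-report.ipynb",
]


def build_command(notebook: str) -> List[str]:
    """Return the nbconvert command that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(repo_dir: str = ".", dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run, just print) each notebook command in order."""
    commands = [build_command(nb) for nb in NOTEBOOKS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if a notebook fails.
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a safe way to check the order before executing anything against Hive.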