A script that gets a specified number of the most recently merged PRs from a GitHub repository, anonymizes sensitive data, and saves the PRs to a file that is later used for PR analysis in the co-create tool.
- python_graphql_client (for retrieving PR data from GitHub)
- faker (for anonymizing sensitive data)
There is a requirements.txt file, but if you prefer to install the dependencies directly:
pip3 install python_graphql_client
pip3 install faker
python3 src/get_prs.py organization repository number-of-prs-to-get your-github-token
For example, to get the last 15 merged PRs from the https://github.com/symfony/polyfill repository, you would use:
python3 src/get_prs.py symfony polyfill 15 your-github-token
A personal GitHub token can be generated from this page. You'll need to select the repo scope in order to get the data needed for the PR analysis. You can see what data is retrieved using GraphQL in queries.py.
If your organization requires SSO, you'll need to authorize the token using SSO in the Personal access tokens (classic) page.
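As a rough illustration of how the token and the GraphQL query fit together, here is a minimal sketch. The query text, the `build_request` and `fetch_prs` helpers, and the limit of 20 comments are assumptions made for this example and are not taken from queries.py; the actual query in the repository may request different fields.

```python
# Hypothetical query: the real one lives in queries.py and may differ.
PR_QUERY = """
query ($owner: String!, $name: String!, $count: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequests(last: $count, states: [MERGED]) {
      nodes {
        number
        author { login }
        comments(first: 20) {
          nodes {
            author { login }
            bodyText
          }
        }
      }
    }
  }
}
"""


def build_request(organization, repository, count, token):
    """Collect everything needed for one GraphQL call (no network access)."""
    return {
        "endpoint": "https://api.github.com/graphql",
        "headers": {"Authorization": f"Bearer {token}"},
        "query": PR_QUERY,
        "variables": {"owner": organization, "name": repository, "count": count},
    }


def fetch_prs(organization, repository, count, token):
    """Send the request with python_graphql_client (requires network access)."""
    # Imported here so the rest of the sketch works without the package installed.
    from python_graphql_client import GraphqlClient

    request = build_request(organization, repository, count, token)
    client = GraphqlClient(endpoint=request["endpoint"], headers=request["headers"])
    return client.execute(query=request["query"], variables=request["variables"])
```

A token without the repo scope, or one that has not been authorized for SSO, would typically cause the call in `fetch_prs` to return an error response rather than PR data.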
Data is anonymized by default. The following data is anonymized:
- GitHub usernames of the PR author and reviewers/commenters ('login' field in the GraphQL response from GitHub), and
- PR comment bodies ('bodyText' in GraphQL)
Every username is replaced with a substitute. The script also produces a file with the mapping from usernames to the substitutes that were used; its name is prefixed with Username_substitutes.
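The anonymization described above can be sketched roughly as follows. This is not the script's actual code: the `make_anonymizer` helper, the `user_N` naming scheme, and the comment placeholder text are assumptions chosen to keep the example free of dependencies, whereas the script itself uses faker to produce substitutes.

```python
def make_anonymizer():
    """Return an anonymize function plus the username mapping it fills in."""
    substitutes = {}  # real username -> substitute (hypothetical "user_N" scheme)

    def substitute_for(username):
        # Reuse the same substitute every time a username reappears.
        if username not in substitutes:
            substitutes[username] = f"user_{len(substitutes) + 1}"
        return substitutes[username]

    def anonymize(node):
        # Walk the nested response and return an anonymized copy.
        if isinstance(node, dict):
            result = {}
            for key, value in node.items():
                if key == "login" and isinstance(value, str):
                    result[key] = substitute_for(value)
                elif key == "bodyText" and isinstance(value, str):
                    result[key] = "[comment text removed]"
                else:
                    result[key] = anonymize(value)
            return result
        if isinstance(node, list):
            return [anonymize(item) for item in node]
        return node

    return anonymize, substitutes


anonymize, substitutes = make_anonymizer()
pr = {
    "author": {"login": "alice"},
    "comments": {"nodes": [{"author": {"login": "bob"}, "bodyText": "LGTM"}]},
}
anonymized_pr = anonymize(pr)
```

After the call, `substitutes` holds the username-to-substitute mapping, which is the kind of data the Username_substitutes file records.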
To turn off anonymization, you can pass --plain as the last argument.
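For reference, the command line described in this README could be parsed along these lines. This is only a sketch using argparse; the `build_parser` helper and the attribute names are hypothetical, and the actual script may read its arguments differently (for example, directly from sys.argv).

```python
import argparse


def build_parser():
    """Describe the four positional arguments and the optional --plain flag."""
    parser = argparse.ArgumentParser(prog="get_prs.py")
    parser.add_argument("organization", help="GitHub organization or user")
    parser.add_argument("repository", help="repository name")
    parser.add_argument("number_of_prs", type=int, help="how many merged PRs to get")
    parser.add_argument("token", help="personal GitHub token")
    parser.add_argument(
        "--plain",
        action="store_true",
        help="turn off anonymization (data is anonymized by default)",
    )
    return parser


# Parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(
    ["symfony", "polyfill", "15", "your-github-token", "--plain"]
)
anonymize_data = not args.plain
```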
Potentially sensitive data that is not anonymized:
- the name of the repository (if it is private), and
- PR numbers, which are used in the PR URLs in the following format: https://github.com/symfony/polyfill/pull/427 (repository name + PR number)
It is possible to anonymize this data as well, but in that case the URLs in the PR analysis report won't work.
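To see why, consider how a PR URL is assembled from exactly these values. The `pr_url` helper below is hypothetical and only mirrors the URL format shown above.

```python
def pr_url(organization, repository, number):
    """Build a PR URL from the organization, repository name, and PR number."""
    return f"https://github.com/{organization}/{repository}/pull/{number}"


url = pr_url("symfony", "polyfill", 427)
```

If the repository name or the PR number were replaced with substitutes, the resulting URL would no longer point to the real PR.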