This repository contains a set of Shell and Python scripts for downloading, unzipping, cleaning, and filtering GitHub event data specifically for the nodejs/node repository. The data is then processed and analyzed using Polars in Node.js to generate key metrics and visualizations.
Prerequisite: Poetry, for managing Python dependencies.
First, install the required dependencies and execute the script to download and preprocess the data:
poetry install
sh process_gharchive.sh
This will generate a cleaned JSON file at data/final/node.json. This file should then be copied to the node-metrics directory to be used for further analysis.
cp data/final/node.json node-metrics/data/node.json
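The filtering step can be pictured with a minimal Python sketch. It is not the code in process_gharchive.sh; it assumes the input is newline-delimited JSON in which each event carries a repo.name field of the form owner/repo. The function names filter_repo_events and read_archive are hypothetical and used only for illustration.

```python
import gzip
import json
from typing import Iterable, Iterator

# Assumption: the repository of interest, as named in this README.
TARGET_REPO = "nodejs/node"


def filter_repo_events(lines: Iterable[str], repo: str = TARGET_REPO) -> Iterator[dict]:
    """Yield parsed events whose repo.name equals `repo`.

    Blank lines, malformed JSON, and events without a repo object are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        repo_info = event.get("repo")
        if isinstance(repo_info, dict) and repo_info.get("name") == repo:
            yield event


def read_archive(path: str) -> Iterator[str]:
    """Stream text lines from a gzip-compressed, newline-delimited JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from fh
```

A caller could combine the two as list(filter_repo_events(read_archive(path))), where path points at one downloaded hourly archive. Streaming line by line keeps memory use flat even when an archive is large.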
The Node.js part of the project processes and visualizes the pre-built JSON data using the Polars library and Chart.js. It calculates key metrics such as the most active contributors, the rolling mean of pull requests over time, and the number of open issues.
To set up and run the analysis:
cd node-metrics
npm install
node src/index.js
This will generate server-side charts to visualize the metrics.
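Two of the metrics mentioned above, the most active contributors and a rolling mean, can be illustrated with a small sketch. It is written in plain Python rather than with Polars in Node.js, so it shows the idea and not the project's implementation. It assumes each event has an actor.login field; most_active and rolling_mean are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple


def most_active(events: Iterable[dict], top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the `top_n` actor logins ranked by number of events."""
    counts = Counter(
        e["actor"]["login"]
        for e in events
        if isinstance(e.get("actor"), dict) and "login" in e["actor"]
    )
    return counts.most_common(top_n)


def rolling_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing rolling mean over `window` points.

    The first `window - 1` positions are None because the window is not yet full.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out
```

For example, applying rolling_mean to a list of daily pull request counts with window=7 would give a weekly trailing average, which smooths out day-to-day noise before charting.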
Scripts for data download, extraction, cleaning, and filtering. Outputs a final JSON file with the processed nodejs/node events.
Analyzes the pre-processed JSON data. Uses Polars for data manipulation and Chart.js for visualization. Includes modular components for data loading, metric calculation, and chart rendering.
Ensure you have all dependencies installed before running the scripts.