Generate CodeQL Models-as-Data (MaD) models (sources, sinks, and summaries) from existing CodeQL databases and export them in formats suitable for:
- Data extensions (YAML) for CodeQL packs
- Customization libraries (`.qll`)
- Bundled packs containing generated customizations
- Raw JSON for further processing
 
- Automated download of CodeQL databases via the Code Scanning API (when a token is provided)
- Multiple export formats: `json`, `extensions`, `customizations`, `bundle`
- GitHub Action + GH CLI extension + direct CLI usage
- Automatic language detection from database metadata (fallback to manual selection)
- Caching support (skip with `--disable-cache`)
- Supports (current): `java`, `csharp`

Currently limited to the languages enforced in the code (`CODEQL_LANGUAGES`):
- Java
 - C#
 
Requests / PRs to add more languages are welcome once the upstream model generator queries support them.

GitHub Action:

    - name: Generate CodeQL Summaries
      uses: advanced-security/codeql-summarize@v0.2.0
      with:
        projects: ./projects.json
        token: ${{ secrets.CODEQL_SUMMARY_GENERATOR_TOKEN }}
        format: extensions
        output: ./generated

GH CLI extension:

    gh extension install advanced-security/gh-codeql-summarize
    gh codeql-summarize --help

Example:


    gh codeql-summarize \
      --format bundle \
      --input examples/projects.json \
      --output ./examples

Direct CLI usage (from a clone):

    git clone https://github.com/advanced-security/codeql-summarize.git
    cd codeql-summarize
    pipenv install --dev  # or pip install -e . if a setup is added later
    pipenv run python -m codeqlsummarize --help

Minimal invocation (using a local database + explicit language):


    python -m codeqlsummarize \
      -db /path/to/codeql-db \
      -l java \
      -f json \
      -o ./out

| Input | Description | Default |
|---|---|---|
| `project` | Single repository (owner/name) to summarize | (none) |
| `projects` | Path to a JSON file mapping language to list of repositories | `./projects.json` |
| `language` | Comma-separated language list (overrides auto-detect) | (auto) |
| `format` | Export format: json, extensions, customizations, bundle | `extensions` |
| `output` | Output directory (or file for certain formats) | `./` |
| `repository` | GitHub repository context (fallback for project) | `${{ github.repository }}` |
| `token` | GitHub token used to download databases | `${{ github.token }}` |

Notes:
- To download CodeQL databases, the token must have appropriate permissions (typically `security_events:read` / `repo`, depending on visibility). A fine-grained PAT with Code scanning read access is recommended.
- If a database cannot be downloaded, it will be skipped.
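
The minimal invocation and the `format` input above can be wrapped in a small script. The following is a minimal sketch, not part of the tool: `build_command` and `EXPORT_FORMATS` are hypothetical names, and only the flags shown in the minimal invocation (`-db`, `-l`, `-f`, `-o`) are used. It also checks the constraint from the formats table that `customizations` expects a `.qll` output file.

```python
import sys

# Export formats listed in this README.
EXPORT_FORMATS = {"json", "extensions", "customizations", "bundle"}


def build_command(database, language, export_format, output):
    """Build the argument list for a local-database run (hypothetical helper)."""
    if export_format not in EXPORT_FORMATS:
        raise ValueError(f"unknown format: {export_format}")
    # The formats table states that customizations requires -o <file>.qll.
    if export_format == "customizations" and not output.endswith(".qll"):
        raise ValueError("customizations requires an output file ending in .qll")
    return [
        sys.executable, "-m", "codeqlsummarize",
        "-db", database,
        "-l", language,
        "-f", export_format,
        "-o", output,
    ]


command = build_command("/path/to/codeql-db", "java", "json", "./out")
```

The resulting list can be passed to `subprocess.run(command, check=True)` from an environment where `codeqlsummarize` is importable.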
 
Example (`examples/projects.json`):

    {
      "java": ["ESAPI/esapi-java-legacy"]
    }

Structure: `<language>` → array of `<owner>/<repo>` strings.

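
Before a run, the structure described above can be checked. This is a sketch under assumptions: `parse_projects`, `SUPPORTED_LANGUAGES`, and `REPO_PATTERN` are hypothetical names, and the tool itself may validate the file differently or not at all.

```python
import json
import re

# Languages this README lists as currently supported.
SUPPORTED_LANGUAGES = {"java", "csharp"}
# Assumed shape of a repository entry: owner/repo with exactly one slash.
REPO_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def parse_projects(text):
    """Parse projects.json content and return {language: [owner/repo, ...]}."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must map languages to repository lists")
    for language, repos in data.items():
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language}")
        if not isinstance(repos, list):
            raise ValueError(f"{language}: expected a list of repositories")
        for repo in repos:
            if not isinstance(repo, str) or not REPO_PATTERN.match(repo):
                raise ValueError(f"{language}: expected owner/repo, got {repo!r}")
    return data


projects = parse_projects('{"java": ["ESAPI/esapi-java-legacy"]}')
```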

| Format | Description | Output Shape |
|---|---|---|
| `json` | Raw rows per model type | One JSON file per database / summary (future enhancement) |
| `extensions` | Data extensions YAML under a CodeQL pack structure | Writes `.yml` under `generated/` inside the detected pack |
| `customizations` | Single `.qll` customization library aggregating models | Requires `-o <file>.qll` |
| `bundle` | Initializes / updates a CodeQL pack containing generated customizations | Creates / updates pack in output dir |

`bundle` will (if necessary) create a pack (e.g. `java-summarize/`) and generate per-repository `.qll` files plus a `Customizations.qll` aggregator.
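
For orientation, the `extensions` format targets CodeQL data extension files. The sketch below renders model rows into that general layout. It follows the publicly documented data extension structure (`extensions` / `addsTo` / `data`) rather than this tool's exporter; `render_data_extension`, the pack name `codeql/java-all`, and the example row are illustrative assumptions.

```python
import json


def render_data_extension(pack, extensible, rows):
    """Render rows as a data extension document (hypothetical helper)."""
    lines = [
        "extensions:",
        "  - addsTo:",
        f"      pack: {pack}",
        f"      extensible: {extensible}",
        "    data:",
    ]
    for row in rows:
        # A JSON array is also a valid YAML flow sequence.
        lines.append(f"      - {json.dumps(row)}")
    return "\n".join(lines) + "\n"


# Made-up sink row: package, type, subtypes, name, signature, ext,
# input, kind, provenance.
example_rows = [
    ["org.example", "Db", True, "query", "(String)", "",
     "Argument[0]", "sql-injection", "df-generated"],
]
document = render_data_extension("codeql/java-all", "sinkModel", example_rows)
```

Emitting each row with `json.dumps` keeps quoting and booleans consistent without requiring a YAML library.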

| Variable | Purpose |
|---|---|
| `GITHUB_TOKEN` | Default token for API calls (Actions) |
| `GITHUB_REPOSITORY` | Default repo context (owner/name) |
| `RUNNER_TEMP` | Temp directory root (Actions) |
| `DEBUG` | If set (non-empty) enables debug logging |

The tool skips repositories whose databases cannot be fetched or located, logging warnings rather than stopping the entire run.
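
The skip-and-warn behaviour, together with the `DEBUG` variable, can be pictured roughly as follows. This is an illustrative sketch only: `summarize_all`, `DatabaseUnavailable`, and the `fetch_database` callback are hypothetical and do not mirror the tool's internals.

```python
import logging
import os

# Any non-empty DEBUG value enables debug logging, as described above.
level = logging.DEBUG if os.environ.get("DEBUG") else logging.INFO
logging.basicConfig(level=level)
logger = logging.getLogger("summarize-sketch")


class DatabaseUnavailable(Exception):
    """Hypothetical error for a database that cannot be fetched or located."""


def summarize_all(repositories, fetch_database):
    """Process each repository, skipping those without an available database."""
    processed, skipped = [], []
    for repo in repositories:
        try:
            database = fetch_database(repo)
        except DatabaseUnavailable as err:
            # Warn and move on instead of aborting the whole run.
            logger.warning("Skipping %s: %s", repo, err)
            skipped.append(repo)
            continue
        processed.append((repo, database))
    return processed, skipped
```

Returning the skipped repositories alongside the processed ones lets a caller report them at the end of a run.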

- Maintain a `projects.json` file listing target repositories per language.
- Schedule a workflow (e.g. nightly) to regenerate models.
- Commit or publish the generated Data Extensions / Pack as needed.
- Consume generated models in downstream CodeQL analysis.
 
Run tests:

    pipenv run python -m unittest -v

Lint / format:

    pipenv run black .

See CONTRIBUTING.md. Please open an issue before large changes.
See SECURITY.md.
See SUPPORT.md. For general questions open a GitHub issue.
- Limited language set (Java, C#)
 - No throttling handling for parallel downloads yet
 - No direct GitHub language detection fallback implemented
 - JSON exporter minimal (subject to enhancement)
 
Licensed under the MIT License – see LICENSE.
- @GeekMasher – Author
 - @zbazztian – Major contributor