


Copilot AI commented Nov 4, 2025

Fix GitHub Domain Redaction in Agent Log Ingestion ✅

Problem

GitHub domains were being redacted during the agent log ingestion phase. The sanitization logic in sanitize.cjs only allowed domains from a static allowlist, which didn't account for:

  • The current GitHub API root (api.github.com for GitHub.com, but different for GitHub Enterprise)
  • The current GitHub server URL and its variations
  • Raw content domains (raw.githubusercontent.com and variations)

This caused legitimate GitHub URLs to be replaced with (redacted) in agent logs, making debugging and analysis difficult.

Solution ✅

  • Investigate the issue and understand the codebase
  • Add environment variables to pass GitHub server URL and API URL to sanitization
  • Update sanitize.cjs to extract and allow domains from GitHub context URLs
  • Add raw content domain support for GitHub.com and GitHub Enterprise
  • Add tests to verify GitHub domains are not redacted
  • Run tests and validate the fix
  • Format and lint code
  • Merge main branch

Implementation Details

1. Compiler Changes (pkg/workflow/compiler_yaml.go)

Added two new environment variables to the "Ingest agent output" step:

env:
  GITHUB_SERVER_URL: ${{ github.server_url }}
  GITHUB_API_URL: ${{ github.api_url }}

These variables provide the current GitHub deployment's server and API URLs to the sanitization JavaScript code.
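A minimal sketch of how a Node.js script in that step might pick up the two variables, assuming it reads them straight from `process.env`. The helper name `readGitHubContextUrls` is hypothetical. GitHub Actions also lists `GITHUB_SERVER_URL` and `GITHUB_API_URL` among its default runner environment variables, so the same lookup applies whether or not a step maps them explicitly.

```javascript
// Hypothetical helper (not from sanitize.cjs): collect the GitHub context URLs
// exported by the "Ingest agent output" step, ignoring unset or blank values.
function readGitHubContextUrls(env = process.env) {
  return [env.GITHUB_SERVER_URL, env.GITHUB_API_URL]
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .map((value) => value.trim());
}
```

Filtering out blank values keeps a missing or empty variable from reaching URL parsing later on.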

2. JavaScript Sanitization Updates (pkg/workflow/js/sanitize.cjs)

New extractDomainsFromUrl() function:

  • Parses GitHub URLs and extracts hostnames
  • For github.com:
    • Adds api.github.com (API endpoint)
    • Adds raw.githubusercontent.com (raw content)
    • Adds *.githubusercontent.com (wildcard for all githubusercontent subdomains)
  • For custom GitHub Enterprise domains:
    • Adds api. prefix (e.g., api.github.example.com)
    • Adds raw. prefix (e.g., raw.github.example.com)
  • Handles invalid URLs gracefully (returns empty array)
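The rules above can be sketched as follows. This is a hypothetical reconstruction from the bullet list, not the actual `sanitize.cjs` source. One detail is an assumption: a hostname that already starts with `api.` is returned without extra prefixes, so that `GITHUB_API_URL` contributes only its own hostname. That matches the result lists in the examples.

```javascript
// Hypothetical sketch of extractDomainsFromUrl(); not the sanitize.cjs source.
function extractDomainsFromUrl(url) {
  if (typeof url !== "string" || url.trim() === "") {
    return [];
  }

  let hostname;
  try {
    hostname = new URL(url.trim()).hostname.toLowerCase();
  } catch {
    // Invalid URLs contribute no domains.
    return [];
  }

  const domains = [hostname];
  if (hostname === "github.com") {
    // GitHub.com serves its API and raw content from separate hosts.
    domains.push("api.github.com", "raw.githubusercontent.com", "*.githubusercontent.com");
  } else if (!hostname.startsWith("api.")) {
    // Assumption: an Enterprise server host gets api. and raw. variants,
    // while a host that is already an API root is returned as-is.
    domains.push(`api.${hostname}`, `raw.${hostname}`);
  }
  return domains;
}
```

For example, `https://github.company.com` yields `github.company.com`, `api.github.company.com`, and `raw.github.company.com`.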

Updated sanitizeContent() function:

  • Reads GITHUB_SERVER_URL and GITHUB_API_URL from environment
  • Extracts domains from these URLs dynamically using extractDomainsFromUrl()
  • Appends extracted domains to the allowed domains list
  • Removes duplicates using Set for efficient deduplication
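A minimal sketch of the merge step, assuming `GH_AW_ALLOWED_DOMAINS` is a comma-separated list (its format is not stated in this PR). `parseConfiguredDomains` and `mergeAllowedDomains` are hypothetical names. Spreading into a `Set` drops duplicates while preserving first-seen order, so the configured domains stay at the front of the list.

```javascript
// Hypothetical helper; assumes GH_AW_ALLOWED_DOMAINS is comma-separated.
function parseConfiguredDomains(raw) {
  if (typeof raw !== "string") {
    return [];
  }
  return raw
    .split(",")
    .map((domain) => domain.trim().toLowerCase())
    .filter((domain) => domain !== "");
}

// Append every extracted list to the configured domains, then deduplicate.
// A Set keeps insertion order, so the first occurrence of each domain wins.
function mergeAllowedDomains(configured, ...extractedLists) {
  return [...new Set([...configured, ...extractedLists.flat()])];
}
```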

3. Test Updates

Modified tests (to handle GitHub environment variables in test environment):

  • sanitize_output.test.cjs - "should respect custom allowed domains from environment"
  • sanitize_output.test.cjs - "should handle empty environment variable gracefully"
  • collect_ndjson_output.test.cjs - "should handle custom allowed domains from environment"

New tests:

  • sanitize_output.test.cjs - "should allow GitHub domains from environment variables"
    • Validates dynamic extraction from custom GitHub Enterprise URLs
    • Verifies both server and API URLs are properly allowed
    • Confirms raw content domains are allowed
    • Confirms custom domains are still respected
  • sanitize_output.test.cjs - "should allow raw.githubusercontent.com for github.com"
    • Validates that GitHub.com gets raw.githubusercontent.com support
    • Verifies raw content URLs are not redacted

All tests properly save/restore environment variables to prevent cross-test interference.
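One framework-agnostic way to get that save/restore behavior is a small wrapper like the hypothetical `withEnv` below (not taken from the PR's test files). Variables that were originally unset are restored with `delete`, because Node.js converts any value assigned to a `process.env` key into a string, so assigning `undefined` would store the literal `"undefined"`.

```javascript
// Hypothetical test helper: temporarily apply env overrides, then restore.
function withEnv(overrides, fn) {
  const saved = {};
  for (const [key, value] of Object.entries(overrides)) {
    saved[key] = process.env[key]; // undefined if the variable was unset
    if (value === undefined) {
      delete process.env[key];
    } else {
      process.env[key] = value;
    }
  }
  try {
    return fn();
  } finally {
    // Restore in a finally block so a failing assertion cannot leak overrides.
    for (const [key, value] of Object.entries(saved)) {
      if (value === undefined) {
        delete process.env[key];
      } else {
        process.env[key] = value;
      }
    }
  }
}
```

In a test runner with `beforeEach`/`afterEach` hooks, the save and restore halves can be split across those hooks instead.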

How It Works

The sanitization process now follows these steps:

  1. Read configured domains from GH_AW_ALLOWED_DOMAINS (from workflow network config)
  2. Extract GitHub domains from GITHUB_SERVER_URL (e.g., github.com → [github.com, api.github.com, raw.githubusercontent.com, *.githubusercontent.com])
  3. Extract API domains from GITHUB_API_URL (e.g., api.github.com)
  4. Merge all domains (configured + GitHub server + GitHub API)
  5. Remove duplicates for efficient lookup
  6. Apply filtering to content using the merged allowlist
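Step 6 can be sketched as a hostname check plus a URL rewrite. Several choices here are assumptions that this PR does not confirm: plain entries match only their exact hostname, a `*.` entry matches any subdomain but not the bare domain, the URL regex is deliberately simple, and each disallowed URL is replaced wholesale with `(redacted)`. `isHostAllowed` and `redactDisallowedUrls` are hypothetical names.

```javascript
// Hypothetical sketch of the filtering step; matching rules are assumptions.
function isHostAllowed(hostname, allowedDomains) {
  const host = hostname.toLowerCase();
  return allowedDomains.some((entry) => {
    const rule = entry.toLowerCase();
    if (rule.startsWith("*.")) {
      // "*.githubusercontent.com" -> any host ending in ".githubusercontent.com"
      return host.endsWith(rule.slice(1));
    }
    return host === rule;
  });
}

// Replace every http(s) URL whose host is not allowed with "(redacted)".
function redactDisallowedUrls(text, allowedDomains) {
  return text.replace(/https?:\/\/[^\s<>"'`)\]]+/g, (match) => {
    let hostname;
    try {
      hostname = new URL(match).hostname;
    } catch {
      return "(redacted)"; // unparseable URLs are treated as disallowed
    }
    return isHostAllowed(hostname, allowedDomains) ? match : "(redacted)";
  });
}
```

Keeping the hostname check separate from the text rewrite makes the wildcard rule easy to test on its own.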

Examples

GitHub.com Environment

GITHUB_SERVER_URL=https://github.com
GITHUB_API_URL=https://api.github.com
GH_AW_ALLOWED_DOMAINS=example.com

// Result: [example.com, github.com, api.github.com, raw.githubusercontent.com, *.githubusercontent.com]
// ✓ https://github.com/repo → allowed
// ✓ https://api.github.com/v1 → allowed
// ✓ https://raw.githubusercontent.com/owner/repo/main/file.txt → allowed
// ✓ https://example.com/page → allowed
// ✗ https://malicious.com → redacted

GitHub Enterprise Environment

GITHUB_SERVER_URL=https://github.company.com
GITHUB_API_URL=https://api.github.company.com
GH_AW_ALLOWED_DOMAINS=internal.company.com

// Result: [internal.company.com, github.company.com, api.github.company.com, raw.github.company.com]
// ✓ https://github.company.com/repo → allowed
// ✓ https://api.github.company.com/v1 → allowed
// ✓ https://raw.github.company.com/owner/repo/main/file.txt → allowed
// ✓ https://internal.company.com/doc → allowed
// ✗ https://external.com → redacted
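This example uses an `api.` subdomain for the Enterprise API root, which is the shape used by GitHub Enterprise Cloud with data residency (`api.SUBDOMAIN.ghe.com`). GitHub Enterprise Server typically reports `https://HOSTNAME/api/v3` instead. The hypothetical end-to-end sketch below mirrors the extractDomainsFromUrl() bullet list under assumed rules (including a comma-separated `GH_AW_ALLOWED_DOMAINS`). It shows that both shapes yield the same allowlist, because a path-based API URL shares the server's hostname.

```javascript
// Hypothetical end-to-end sketch of steps 1-5; not the sanitize.cjs source.
function extractDomainsFromUrl(url) {
  if (typeof url !== "string" || url.trim() === "") return [];
  let hostname;
  try {
    hostname = new URL(url.trim()).hostname.toLowerCase();
  } catch {
    return [];
  }
  if (hostname === "github.com") {
    return [hostname, "api.github.com", "raw.githubusercontent.com", "*.githubusercontent.com"];
  }
  // Assumed rule: Enterprise hosts gain api./raw. variants; API hosts do not.
  return hostname.startsWith("api.") ? [hostname] : [hostname, `api.${hostname}`, `raw.${hostname}`];
}

function buildAllowedDomains(env) {
  // Assumption: GH_AW_ALLOWED_DOMAINS is a comma-separated list.
  const configured = (env.GH_AW_ALLOWED_DOMAINS || "")
    .split(",")
    .map((domain) => domain.trim().toLowerCase())
    .filter((domain) => domain !== "");
  return [
    ...new Set([
      ...configured,
      ...extractDomainsFromUrl(env.GITHUB_SERVER_URL),
      ...extractDomainsFromUrl(env.GITHUB_API_URL),
    ]),
  ];
}

// Same server, two API URL shapes.
const subdomainApi = buildAllowedDomains({
  GITHUB_SERVER_URL: "https://github.company.com",
  GITHUB_API_URL: "https://api.github.company.com",
  GH_AW_ALLOWED_DOMAINS: "internal.company.com",
});
const pathApi = buildAllowedDomains({
  GITHUB_SERVER_URL: "https://github.company.com",
  GITHUB_API_URL: "https://github.company.com/api/v3",
  GH_AW_ALLOWED_DOMAINS: "internal.company.com",
});
```

Both calls produce `[internal.company.com, github.company.com, api.github.company.com, raw.github.company.com]`, the same list as the Result line in the example.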

Verification ✅

  • ✅ All JavaScript tests pass (648 tests)
  • ✅ All Go unit tests pass
  • ✅ Code formatting validated (prettier)
  • ✅ Linting passed
  • ✅ Raw content domain support verified
  • ✅ Main branch merged successfully

Impact

This fix ensures that URLs on the current GitHub deployment's domains (including raw content URLs) are no longer redacted in agent logs, regardless of:

  • GitHub deployment type (GitHub.com vs GitHub Enterprise)
  • Custom domain configurations
  • Network permission settings

The current deployment's GitHub domains are now always allowed in addition to any configured allowed domains, making log analysis and debugging significantly easier.

Original prompt

Investigate the agent log ingestion phase. It seems that GitHub domains are being redacted away. Aside from the domains list, it should allow the current GitHub API root and its variations.

See logs at https://github.com/githubnext/gh-aw/actions/runs/19059684375/job/54436801071#step:43:1



Copilot AI and others added 2 commits November 4, 2025 15:35
- Add GITHUB_SERVER_URL and GITHUB_API_URL environment variables to sanitization step
- Update sanitize.cjs to extract and allow GitHub domains from environment
- Add extractDomainsFromUrl helper function to parse GitHub URLs
- Update tests to handle GitHub environment variables properly
- Ensure GitHub domains are always allowed in addition to configured domains

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan

pelikhan commented Nov 4, 2025

@copilot also support raw content domains

Copilot AI changed the title [WIP] Investigate agent log ingestion phase for GitHub domains to Fix GitHub domain redaction in agent log ingestion Nov 4, 2025
Copilot AI requested a review from pelikhan November 4, 2025 15:43
- Update extractDomainsFromUrl to include raw.githubusercontent.com for github.com
- Add raw.hostname variation for GitHub Enterprise domains
- Add wildcard *.githubusercontent.com support
- Add test for raw.githubusercontent.com support
- Update existing test to verify raw content domains are allowed

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Nov 4, 2025

@copilot also support raw content domains

Added raw content domain support in commit 0d86a5b. The extractDomainsFromUrl function now includes:

For GitHub.com:

  • raw.githubusercontent.com
  • *.githubusercontent.com

For GitHub Enterprise:

  • raw.{hostname} (e.g., raw.github.company.com)

Raw content URLs are now automatically allowed based on the GITHUB_SERVER_URL environment variable.

@pelikhan pelikhan marked this pull request as ready for review November 4, 2025 15:56
Copilot AI review requested due to automatic review settings November 4, 2025 15:56
@pelikhan pelikhan enabled auto-merge (squash) November 4, 2025 15:56
@pelikhan pelikhan disabled auto-merge November 4, 2025 15:56
@pelikhan

pelikhan commented Nov 4, 2025

@copilot merge main


Copilot AI left a comment


Pull Request Overview

This PR enhances the URL sanitization system to automatically allow GitHub domains from environment variables, enabling better support for GitHub Enterprise deployments with custom domains.

  • Adds extractDomainsFromUrl() function to extract domain variations from GitHub server and API URLs
  • Updates sanitization logic to dynamically include GitHub domains from GITHUB_SERVER_URL and GITHUB_API_URL environment variables
  • Adds comprehensive test coverage for custom domain handling and environment variable cleanup

Reviewed Changes

Copilot reviewed 62 out of 62 changed files in this pull request and generated no comments.

Summary per file:

  • pkg/workflow/js/sanitize.cjs: Adds domain extraction function and dynamic GitHub domain detection
  • pkg/workflow/js/sanitize_output.test.cjs: Adds tests for environment variable handling and domain extraction
  • pkg/workflow/js/collect_ndjson_output.test.cjs: Updates test to properly clean up environment variables
  • pkg/workflow/compiler_yaml.go: Adds GITHUB_SERVER_URL and GITHUB_API_URL to workflow environment
  • .github/workflows/*.lock.yml: Generated workflow files with updated sanitization logic


…og-ingestion-phase

# Conflicts:
#	.github/workflows/smoke-copilot.firewall.lock.yml
#	.github/workflows/smoke-opencode.lock.yml

Copilot AI commented Nov 4, 2025

@copilot merge main

Merged main in commit 8525ab7. Resolved conflicts by removing deleted workflow files (smoke-copilot.firewall and smoke-opencode) and recompiled all lock files. All tests pass.

@pelikhan pelikhan merged commit 2ca0632 into main Nov 4, 2025
3 checks passed
@pelikhan pelikhan deleted the copilot/investigate-log-ingestion-phase branch November 4, 2025 16:06
Copilot AI requested a review from pelikhan November 4, 2025 16:06