@bokelley
Contributor

Problem

When the AdCP server restarts and regenerates schemas, it issues new ETags and timestamps even though the schema content is identical. As a result, 120+ files appear as modified in git although only their metadata changed. This noise clutters git history and makes it hard to tell whether a schema really changed.

Root Cause

  1. AdCP server regenerates schemas → new ETags in response headers
  2. refresh_adcp_schemas.py downloads → updates .meta files with new ETags
  3. generate_schemas.py sees different ETags → regenerates Python files
  4. Result: 120+ "modified" files with only timestamp/ETag changes

Solution

Replace ETag-based tracking with content-based hashing:

  • Hash the actual JSON schema content (normalized for consistency)
  • Store content hash in generated Python file headers
  • Skip regeneration if content hash unchanged
  • Add early exit optimization to skip expensive datamodel-codegen calls
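
A minimal sketch of the content-hashing idea, assuming normalization means re-serializing the parsed JSON with sorted keys and fixed separators. The helper name `schema_content_hash` is hypothetical and is not taken from generate_schemas.py:

```python
import hashlib
import json


def schema_content_hash(schema_text: str) -> str:
    """Return an MD5 fingerprint of a JSON schema, ignoring formatting and key order."""
    parsed = json.loads(schema_text)
    # Sorted keys and fixed separators make the serialization deterministic,
    # so whitespace or key-order differences do not change the hash.
    normalized = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Same schema, different formatting and key order
compact = '{"type": "object", "title": "Task"}'
pretty = '{\n  "title": "Task",\n  "type": "object"\n}'
same = schema_content_hash(compact) == schema_content_hash(pretty)
```

Because the hash depends only on the parsed content, a new ETag or download timestamp has no effect on it. MD5 is used here purely as a fingerprint, not for security.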

Changes

Modified scripts/generate_schemas.py:

  1. add_etag_metadata_to_generated_files():

    • Now hashes schema content instead of tracking server ETags
    • Compares the existing schema_hash header with the current content hash
    • Skips file updates when content is unchanged
    • Header format: # schema_hash: abc123def456

  2. generate_schemas_from_json():

    • Adds an early check that compares the schema directory hash with SCHEMA_HASH in __init__.py
    • Skips calling datamodel-codegen entirely if schemas are unchanged
    • Avoids expensive regeneration when not needed
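
The two checks above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in generate_schemas.py: the helper names (`stored_hash`, `write_if_changed`, `directory_hash`, `needs_regeneration`) are hypothetical, the directory hash here covers raw file bytes of `*.json` files rather than normalized content, and `SCHEMA_HASH` is assumed to be a plain string assignment in `__init__.py`.

```python
import hashlib
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed on-disk formats: a "# schema_hash: <hex>" comment in each generated
# file and a SCHEMA_HASH = "<hex>" assignment in __init__.py.
HEADER_RE = re.compile(r"^# schema_hash: ([0-9a-f]+)$", re.MULTILINE)
INIT_RE = re.compile(r'^SCHEMA_HASH = "([0-9a-f]+)"$', re.MULTILINE)


def stored_hash(path: Path, pattern: re.Pattern) -> Optional[str]:
    """Return the hash recorded in an existing file, or None if absent."""
    if not path.exists():
        return None
    match = pattern.search(path.read_text(encoding="utf-8"))
    return match.group(1) if match else None


def write_if_changed(py_file: Path, body: str, content_hash: str) -> bool:
    """Rewrite a generated file only when its recorded hash differs."""
    if stored_hash(py_file, HEADER_RE) == content_hash:
        return False  # content unchanged: leave the file untouched
    py_file.write_text(f"# schema_hash: {content_hash}\n{body}", encoding="utf-8")
    return True


def directory_hash(schema_dir: Path) -> str:
    """Fingerprint every *.json file in a directory, in sorted order."""
    digest = hashlib.md5()
    for path in sorted(schema_dir.glob("*.json")):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_regeneration(schema_dir: Path, init_file: Path) -> bool:
    """Early exit check: True only if the schemas differ from the last run."""
    return stored_hash(init_file, INIT_RE) != directory_hash(schema_dir)


# Usage: the first run has no recorded hash; after recording it, the check passes.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "task-type.json").write_text('{"type": "string"}', encoding="utf-8")
    init_file = root / "__init__.py"
    first_run = needs_regeneration(root, init_file)
    init_file.write_text(f'SCHEMA_HASH = "{directory_hash(root)}"\n', encoding="utf-8")
    second_run = needs_regeneration(root, init_file)
```

In this sketch a caller would invoke datamodel-codegen only when `needs_regeneration` returns True, and would use `write_if_changed` per file so that unchanged outputs keep their bytes and git sees no modification.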

New Schemas (from AdCP spec):

  • webhook-payload - Webhook payload structure for async task notifications
  • task-type - Valid AdCP task types enum

Benefits

Before fix:

  • Server restart → 120+ files modified (metadata only)
  • Clutters git history with noise
  • CI runs unnecessarily
  • Developers confused about "changes"

After fix:

  • Server restart → 0 files modified (schema content unchanged)
  • Clean git status
  • CI skips if no real changes
  • Only updates when schemas actually change

Testing

Tested the workflow:

  1. Generated schemas with content hashes
  2. Simulated server restart (downloaded new .meta files with different ETags)
  3. Re-ran generation → skipped everything (content unchanged)
  4. ✅ git status clean - no spurious modifications
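
The manual steps above could also be captured as an automated check. The following is a hedged sketch only: the `.meta` layout (a JSON file with an `etag` key next to the schema) and all function names are assumptions made for illustration, not details confirmed by this PR.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def content_hash(schema_file: Path) -> str:
    """Hash only the schema body; the sibling .meta file is never read."""
    parsed = json.loads(schema_file.read_text(encoding="utf-8"))
    normalized = json.dumps(parsed, sort_keys=True)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def simulate_restart(schema_file: Path, new_etag: str) -> None:
    """Mimic a refresh after a server restart: same schema, new ETag in .meta."""
    # Hypothetical layout: "<schema>.json.meta" stored beside the schema file.
    meta_file = schema_file.with_name(schema_file.name + ".meta")
    meta_file.write_text(json.dumps({"etag": new_etag}), encoding="utf-8")


def hash_survives_restart() -> bool:
    """Return True if the content hash is identical before and after a new ETag."""
    with tempfile.TemporaryDirectory() as tmp:
        schema_file = Path(tmp) / "task-type.json"
        schema_file.write_text('{"type": "string", "enum": ["a"]}', encoding="utf-8")
        simulate_restart(schema_file, "etag-1")
        before = content_hash(schema_file)
        simulate_restart(schema_file, "etag-2")
        return before == content_hash(schema_file)
```

A check like this guards against a future change that accidentally feeds ETag or timestamp metadata back into the hash.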

Code Review

  • ✅ Logic is correct and well-tested
  • ✅ MD5 usage appropriate for content fingerprinting (not security)
  • ✅ JSON normalization ensures deterministic hashing
  • ✅ No security vulnerabilities
  • ✅ Performance improvements from early exit optimization

- Replace ETag-based tracking with schema content hashing in generate_schemas.py
- Hash actual JSON schema content instead of server ETags/timestamps
- Skip regeneration when schema content unchanged (early exit optimization)
- Add new webhook-payload and task-type schemas from AdCP spec

Problem: AdCP server restarts regenerate schemas with new ETags but identical
content, causing 120+ files to show as modified with only metadata changes.

Solution: Hash normalized JSON schema content and only update generated files
when content actually changes. Server restarts no longer create git noise.

Benefits:
- Clean git status after schema refresh with no content changes
- Avoids expensive datamodel-codegen calls when unnecessary
- Only commits when schemas genuinely change
- Reduces CI noise and developer confusion
@bokelley bokelley merged commit 5625955 into main Oct 28, 2025
9 checks passed
danf-newton pushed a commit to Newton-Research-Inc/salesagent that referenced this pull request Nov 24, 2025
adcontextprotocol#649)