@bokelley
Contributor

Summary

Fixes unnecessary meta file updates caused by weak ETag changes on the schema server, even when content is identical.

Problem

  • Weak ETags (W/"...") change on every server restart/deployment
  • The schema validator rewrote all 59 meta files whenever an ETag changed
  • This created git noise from ETag/timestamp-only changes even though no schema had changed
  • Commits became harder to review, with 50+ files marked as changed

Solution

Added content hash verification using SHA-256:

  1. Compute SHA-256 hash of JSON schema content (with sorted keys)
  2. Compare content hashes before updating files
  3. Skip file writes if content hash matches despite new ETag
  4. Store content_hash in metadata alongside etag and timestamps
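
As an illustration of steps 1 and 2, here is a minimal sketch of a key-order-independent content hash. The function name `compute_content_hash` is hypothetical and the exact serialization options used in the validator may differ; the point is that `sort_keys=True` makes two payloads with the same content hash identically.

```python
import hashlib
import json


def compute_content_hash(schema: dict) -> str:
    """Return a SHA-256 hex digest of the schema, independent of key order."""
    # sort_keys=True yields a canonical serialization, so dicts with the same
    # content produce the same digest regardless of insertion order.
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the hashes match.
a = {"title": "example", "type": "object"}
b = {"type": "object", "title": "example"}
print(compute_content_hash(a) == compute_content_hash(b))  # True
```

Hashing the parsed JSON rather than the raw response bytes means whitespace or key-order differences in the server output do not register as a change.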

Changes

  • tests/e2e/adcp_schema_validator.py:

    • Add content_hash computation and comparison in _download_schema_index() and _download_schema()
    • Only update files when content actually changes
  • docs/schema-caching-strategy.md:

    • Document content_hash field in metadata format
    • Explain weak ETag behavior and solution
  • schemas/v1/*.meta:

    • Add content_hash field to all 59 meta files
    • Update index.json to v2.2.0 (new webhook-payload and task-type schemas)
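
The skip-write behavior described for the validator could look roughly like the sketch below. This is not the actual `_download_schema()` code: `save_if_changed`, the file layout, and the reduced meta format (only `etag` and `content_hash`, with timestamps omitted) are assumptions made for illustration. It also shows one possible way to stay compatible with old meta files that lack `content_hash`, by deriving the hash from the cached schema.

```python
import hashlib
import json
from pathlib import Path


def content_hash(data: dict) -> str:
    """SHA-256 of the JSON content, serialized with sorted keys."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()


def save_if_changed(schema_path: Path, meta_path: Path, data: dict, etag: str) -> bool:
    """Write schema and meta only when the content changed. Returns True if written."""
    new_hash = content_hash(data)
    old_hash = None
    if meta_path.exists():
        old_hash = json.loads(meta_path.read_text()).get("content_hash")
        if old_hash is None and schema_path.exists():
            # Old meta file without content_hash: derive it from the cached schema.
            old_hash = content_hash(json.loads(schema_path.read_text()))
    if old_hash == new_hash and schema_path.exists():
        # Content is identical; a new weak ETag alone does not justify a write.
        return False
    schema_path.write_text(json.dumps(data, indent=2))
    meta = {"etag": etag, "content_hash": new_hash}  # timestamps omitted in this sketch
    meta_path.write_text(json.dumps(meta, indent=2))
    return True
```

Calling `save_if_changed` a second time with the same content and a different ETag returns `False` and leaves both files untouched, which is what keeps the meta files out of the git diff.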

Test Plan

  • Run schema validator twice - second run shows no file changes
  • Pre-commit hooks pass (all 38 checks)
  • Unit tests pass (850 passed)
  • Future weak ETag changes won't trigger meta file updates
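
The "second run shows no file changes" check can also be automated. The sketch below is one possible approach, not part of this PR: the helpers `snapshot` and `changed_files` are hypothetical names, and how the validator is invoked between the two snapshots is left to the caller.

```python
import hashlib
from pathlib import Path


def snapshot(directory: Path) -> dict[str, str]:
    """Map each file's path (relative to directory) to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(directory)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or modified between two snapshots."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))
```

Taking `snapshot(Path("schemas/v1"))` before and after a second validator run and asserting that `changed_files(before, after)` is empty gives the same signal as an empty `git status`, without depending on git.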

Impact

✅ No more false updates when weak ETags change
✅ Significantly reduced git noise in pull requests
✅ All ETag-based caching benefits maintained
✅ Backward compatible - works with old meta files too

🤖 Generated with Claude Code

Problem:
- Weak ETags (W/"...") change on every server restart/deploy
- Meta files were updating even when schema content was identical
- Created unnecessary git noise from ETag/timestamp-only changes
- 59 meta files changed despite no actual schema changes

Solution:
- Add SHA-256 content hash to meta files
- Compare content hashes before updating files
- Only update meta files when content actually changes
- Weak ETag changes alone no longer trigger file updates

Changes:
- tests/e2e/adcp_schema_validator.py:
  - Add content_hash to metadata in both _download_schema_index and _download_schema
  - Compute SHA-256 hash of JSON content (with sorted keys)
  - Skip file updates if content_hash matches despite new ETag
  - Return cached data without writing to avoid git noise

- docs/schema-caching-strategy.md:
  - Document content_hash field in metadata format
  - Explain weak ETag behavior and why content hash is needed
  - Update implementation examples to show content hash verification

- schemas/v1/*.meta:
  - Add content_hash field to all 59 meta files
  - Update index.json to v2.2.0 (new webhook-payload and task-type schemas)

Impact:
- Future server restarts won't cause meta file churn
- Only real schema changes will show in git diff
- Reduces noise in pull requests and commits
- Maintains all ETag-based caching benefits

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@bokelley bokelley merged commit 20b0a16 into main Oct 28, 2025
9 checks passed
EmmaLouise2018 pushed a commit that referenced this pull request Oct 29, 2025
danf-newton pushed a commit to Newton-Research-Inc/salesagent that referenced this pull request Nov 24, 2025
…extprotocol#659)
