Skip to content

Fix AdCP schema caching: Replace time-based with ETag validation#396

Merged
bokelley merged 7 commits intomainfrom
bokelley/confirm-budget-number
Oct 14, 2025
Merged

Fix AdCP schema caching: Replace time-based with ETag validation#396
bokelley merged 7 commits intomainfrom
bokelley/confirm-budget-number

Conversation

@bokelley
Copy link
Collaborator

Problem

24-hour time-based cache was inappropriate for rapidly evolving AdCP spec:

  • Assumed schemas valid for 24 hours (wrong during active development)
  • Couldn't detect changes within cache window
  • Led to persistent stale schemas causing test failures
  • budget.json schema persisted in cache despite not existing in official spec

Solution: ETag-Based HTTP Conditional GET

Replaced time-based caching with server-driven validation:

  • ✅ Always checks with server using If-None-Match header
  • ✅ Server returns 304 Not Modified if unchanged (use cache)
  • ✅ Server returns 200 OK if changed (download new version)
  • ✅ Falls back to cache if server unavailable (resilient)

Key Changes

1. Schema Validator (tests/e2e/adcp_schema_validator.py)

  • Deprecated time-based _is_cache_valid() logic
  • Implemented ETag-based conditional GET in download methods
  • Added metadata file tracking (.meta files store ETags)
  • Graceful fallback to cache on network errors

2. Schema Refresh Tool (scripts/refresh_adcp_schemas.py)

  • Clean slate approach: deletes all cached schemas before refresh
  • Handles both .json and .meta files
  • Verifies no outdated references remain (e.g., budget.json)

3. Budget Format Fix (tests/e2e/adcp_request_builder.py)

  • Fixed top-level budget: plain number (was object)
  • Fixed package budget: plain number (was object)
  • Matches official AdCP spec (currency from pricing_option_id)

4. Cache Metadata (.gitignore)

  • Ignore .meta files (local ETag cache metadata)

Benefits

  • Always fresh - detects changes immediately, no 24h delay
  • Bandwidth efficient - only downloads when schemas actually change
  • Resilient - falls back to cache if server unavailable
  • Zero maintenance - automatic in online mode

Schema Updates

  • Deleted obsolete budget.json (doesn't exist in official spec)
  • Added new schemas from official source (performance-feedback, etc.)
  • All schemas now validated against https://adcontextprotocol.org

Documentation

  • docs/schema-caching-strategy.md - Technical implementation guide
  • BUDGET_FORMAT_FINDINGS.md - Budget format analysis
  • SCHEMA_CACHE_SOLUTION.md - Solution overview

Testing

  • ✅ All unit tests pass (697 passed)
  • ✅ All integration tests pass (192 passed)
  • ✅ ETag caching validated manually
  • ✅ Schema refresh tested with dry-run and actual execution

Migration Notes

No action required - ETag caching is automatic and backward compatible.

bokelley and others added 7 commits October 14, 2025 00:49
## Problem
24-hour time-based cache was inappropriate for rapidly evolving AdCP spec:
- Assumed schemas valid for 24 hours (wrong during active development)
- Couldn't detect changes within cache window
- Led to persistent stale schemas causing test failures
- budget.json schema persisted in cache despite not existing in official spec

## Solution: ETag-Based HTTP Conditional GET
Replaced time-based caching with server-driven validation:
- Always checks with server using If-None-Match header
- Server returns 304 Not Modified if unchanged (use cache)
- Server returns 200 OK if changed (download new version)
- Falls back to cache if server unavailable (resilient)

### Key Changes
1. **Schema Validator** (tests/e2e/adcp_schema_validator.py):
   - Deprecated time-based _is_cache_valid() logic
   - Implemented ETag-based conditional GET in download methods
   - Added metadata file tracking (.meta files store ETags)
   - Graceful fallback to cache on network errors

2. **Schema Refresh Tool** (scripts/refresh_adcp_schemas.py):
   - Clean slate approach: deletes all cached schemas before refresh
   - Handles both .json and .meta files
   - Verifies no outdated references remain (e.g., budget.json)

3. **Budget Format Fix** (tests/e2e/adcp_request_builder.py):
   - Fixed top-level budget: plain number (was object)
   - Fixed package budget: plain number (was object)
   - Matches official AdCP spec (currency from pricing_option_id)

4. **Cache Metadata** (.gitignore):
   - Ignore .meta files (local ETag cache metadata)

### Benefits
✅ Always fresh - detects changes immediately, no 24h delay
✅ Bandwidth efficient - only downloads when schemas actually change
✅ Resilient - falls back to cache if server unavailable
✅ Zero maintenance - automatic in online mode

### Schema Updates
- Deleted obsolete budget.json (doesn't exist in official spec)
- Added new schemas from official source (performance-feedback, etc.)
- All schemas now validated against https://adcontextprotocol.org

## Documentation
- docs/schema-caching-strategy.md - Technical implementation guide
- BUDGET_FORMAT_FINDINGS.md - Budget format analysis
- SCHEMA_CACHE_SOLUTION.md - Solution overview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
BUDGET_FORMAT_FINDINGS.md and SCHEMA_CACHE_SOLUTION.md were temporary
analysis documents. The technical documentation in docs/schema-caching-strategy.md
is sufficient.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…mode

Fixes from python-expert review (#396):

1. Fixed metadata cleanup (Issue #1):
   - Delete old .meta files before saving new ones
   - Prevents stale ETag issues when schemas change
   - Applied to both index and schema downloads

2. Removed dead code (Issue #2):
   - Deleted deprecated _is_cache_valid() method
   - Was unused and confusing

3. Added --offline-schemas flag (Issue #6):
   - New pytest flag for offline development
   - Creates adcp_validator fixture with offline mode support
   - Useful for airplane mode / no network scenarios

These changes address the must-fix issues identified in the expert review
before CI completes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
E2E tests were failing because code expected budget.total but budget is now
a plain number per AdCP spec.

Fixed get_total_budget() method to handle:
- Plain numbers (new AdCP spec format)
- Dict format {total, currency} (backward compat)
- Budget object with .total attribute (legacy)

This maintains backward compatibility while supporting the spec-compliant
plain number format.

Fixes CI E2E test failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Line 4135: Use request_currency instead of req.budget.currency
- Lines 4776-4780: Add hasattr() check before accessing .currency attribute
- Lines 3571-3580: Add type annotation for start_time to satisfy mypy
- Lines 4783-4786: Use currency_stmt variable name to avoid type conflict
- Fixes 'float' object has no attribute 'currency' error in E2E tests

Related to AdCP v2.4 budget format (plain number instead of object)
- Use inline type comment for start_time to avoid redefining parameter
- Add isinstance assertion to clarify start_time is datetime when not 'asap'
- Use currency_stmt variable name to avoid SQLAlchemy type conflict
- Convert media_buy.currency to str to fix mypy assignment error

These changes resolve mypy errors in the budget currency handling code.
Issue: Schema validation was failing because pricing-options schemas
(cpm-fixed-option, cpm-auction-option, etc.) were not being downloaded.

Root cause: _preload_schema_references() only downloaded direct references,
not references within those references (non-recursive).

Solution: Made _preload_schema_references() recursive with cycle detection:
- Added _visited set to track already-preloaded schemas
- Recursively preload references found in each schema
- Prevents infinite loops via visited set

This ensures all nested schema references are downloaded and cached before
validation runs, eliminating 'could not resolve schema reference' warnings.

Fixes E2E test: test_valid_get_products_response

Also includes updated cached schemas from adcontextprotocol.org with latest
pricing-options schemas and minor updates to existing schemas.
@bokelley bokelley merged commit 8060ec2 into main Oct 14, 2025
8 checks passed
danf-newton pushed a commit to Newton-Research-Inc/salesagent that referenced this pull request Nov 24, 2025
…bid#396)

* Fix AdCP schema caching: Replace time-based with ETag validation

## Problem
24-hour time-based cache was inappropriate for rapidly evolving AdCP spec:
- Assumed schemas valid for 24 hours (wrong during active development)
- Couldn't detect changes within cache window
- Led to persistent stale schemas causing test failures
- budget.json schema persisted in cache despite not existing in official spec

## Solution: ETag-Based HTTP Conditional GET
Replaced time-based caching with server-driven validation:
- Always checks with server using If-None-Match header
- Server returns 304 Not Modified if unchanged (use cache)
- Server returns 200 OK if changed (download new version)
- Falls back to cache if server unavailable (resilient)

### Key Changes
1. **Schema Validator** (tests/e2e/adcp_schema_validator.py):
   - Deprecated time-based _is_cache_valid() logic
   - Implemented ETag-based conditional GET in download methods
   - Added metadata file tracking (.meta files store ETags)
   - Graceful fallback to cache on network errors

2. **Schema Refresh Tool** (scripts/refresh_adcp_schemas.py):
   - Clean slate approach: deletes all cached schemas before refresh
   - Handles both .json and .meta files
   - Verifies no outdated references remain (e.g., budget.json)

3. **Budget Format Fix** (tests/e2e/adcp_request_builder.py):
   - Fixed top-level budget: plain number (was object)
   - Fixed package budget: plain number (was object)
   - Matches official AdCP spec (currency from pricing_option_id)

4. **Cache Metadata** (.gitignore):
   - Ignore .meta files (local ETag cache metadata)

### Benefits
✅ Always fresh - detects changes immediately, no 24h delay
✅ Bandwidth efficient - only downloads when schemas actually change
✅ Resilient - falls back to cache if server unavailable
✅ Zero maintenance - automatic in online mode

### Schema Updates
- Deleted obsolete budget.json (doesn't exist in official spec)
- Added new schemas from official source (performance-feedback, etc.)
- All schemas now validated against https://adcontextprotocol.org

## Documentation
- docs/schema-caching-strategy.md - Technical implementation guide
- BUDGET_FORMAT_FINDINGS.md - Budget format analysis
- SCHEMA_CACHE_SOLUTION.md - Solution overview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Remove temporary analysis docs from repo root

BUDGET_FORMAT_FINDINGS.md and SCHEMA_CACHE_SOLUTION.md were temporary
analysis documents. The technical documentation in docs/schema-caching-strategy.md
is sufficient.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Address expert review feedback: fix metadata cleanup and add offline mode

Fixes from python-expert review (prebid#396):

1. Fixed metadata cleanup (Issue prebid#1):
   - Delete old .meta files before saving new ones
   - Prevents stale ETag issues when schemas change
   - Applied to both index and schema downloads

2. Removed dead code (Issue prebid#2):
   - Deleted deprecated _is_cache_valid() method
   - Was unused and confusing

3. Added --offline-schemas flag (Issue prebid#6):
   - New pytest flag for offline development
   - Creates adcp_validator fixture with offline mode support
   - Useful for airplane mode / no network scenarios

These changes address the must-fix issues identified in the expert review
before CI completes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix budget handling: support both number and object formats

E2E tests were failing because code expected budget.total but budget is now
a plain number per AdCP spec.

Fixed get_total_budget() method to handle:
- Plain numbers (new AdCP spec format)
- Dict format {total, currency} (backward compat)
- Budget object with .total attribute (legacy)

This maintains backward compatibility while supporting the spec-compliant
plain number format.

Fixes CI E2E test failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix budget.currency access for plain number budgets

- Line 4135: Use request_currency instead of req.budget.currency
- Lines 4776-4780: Add hasattr() check before accessing .currency attribute
- Lines 3571-3580: Add type annotation for start_time to satisfy mypy
- Lines 4783-4786: Use currency_stmt variable name to avoid type conflict
- Fixes 'float' object has no attribute 'currency' error in E2E tests

Related to AdCP v2.4 budget format (plain number instead of object)

* Add mypy type annotations for budget currency handling

- Use inline type comment for start_time to avoid redefining parameter
- Add isinstance assertion to clarify start_time is datetime when not 'asap'
- Use currency_stmt variable name to avoid SQLAlchemy type conflict
- Convert media_buy.currency to str to fix mypy assignment error

These changes resolve mypy errors in the budget currency handling code.

* Fix recursive schema preloading for nested references

Issue: Schema validation was failing because pricing-options schemas
(cpm-fixed-option, cpm-auction-option, etc.) were not being downloaded.

Root cause: _preload_schema_references() only downloaded direct references,
not references within those references (non-recursive).

Solution: Made _preload_schema_references() recursive with cycle detection:
- Added _visited set to track already-preloaded schemas
- Recursively preload references found in each schema
- Prevents infinite loops via visited set

This ensures all nested schema references are downloaded and cached before
validation runs, eliminating 'could not resolve schema reference' warnings.

Fixes E2E test: test_valid_get_products_response

Also includes updated cached schemas from adcontextprotocol.org with latest
pricing-options schemas and minor updates to existing schemas.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant