Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
The GET /face-clusters/{id}/images endpoint returns image metadata as a serialized JSON string rather than a parsed object, violating the API schema contract and breaking client-side implementations.
Current Behavior
API Response:
```json
{
  "images": [
    {
      "id": 123,
      "metadata": "{\"camera\":\"Canon EOS\",\"location\":\"NYC\"}"
    }
  ]
}
```
Metadata is returned as a serialized string.
Expected Behavior:
```json
{
  "images": [
    {
      "id": 123,
      "metadata": {
        "camera": "Canon EOS",
        "location": "NYC"
      }
    }
  ]
}
```
Metadata should be a proper JSON object.
Root Cause
File: face_clusters.py
Function: db_get_images_by_cluster_id (lines ~267-305)
The function retrieves the metadata column directly from the database without deserializing it. Since most SQL databases store JSON as text, the raw string is passed through to the Pydantic response model, which expects a dictionary type.
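For illustration, the failing path presumably forwards the raw column value straight into the response payload, roughly like this (a reconstruction based on the described behavior, not the actual source; `cursor` and the field names mirror the snippet in the proposed fix below):

```python
# Hypothetical sketch of the current pass-through behavior (not the real code):
# whatever the driver returns for the metadata column is forwarded unchanged,
# so a TEXT/JSON string reaches the Pydantic response model as a str.
for row in cursor.fetchall():
    images.append({
        'id': row['id'],
        'metadata': row['metadata'],  # still a serialized JSON string here
    })
```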
Steps to Reproduce
Insert a face record with JSON metadata:
```sql
INSERT INTO faces (cluster_id, image_id, metadata)
VALUES (1, 100, '{"camera":"Canon EOS","location":"NYC"}');
```
Call the API endpoint:
```bash
curl http://localhost:8000/api/face-clusters/1/images
```
Observe that the response contains stringified metadata instead of an object
Impact
API Contract Violation: Response doesn't match GetClusterImagesResponse schema
Frontend Errors: Client code cannot access metadata.camera directly without additional parsing
Type Safety: TypeScript/JavaScript clients receive incorrect types
Test Failures: Unit tests expecting dict types will fail with string values
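For reference, the response schema presumably declares `metadata` as an object; a minimal sketch of what `GetClusterImagesResponse` might look like (only the schema name comes from the codebase, the field layout below is an assumption):

```python
from typing import List, Optional

from pydantic import BaseModel


class ClusterImage(BaseModel):
    # Illustrative sub-model; the real project may name and shape this differently.
    id: int
    metadata: Optional[dict] = None  # contract expects a parsed object, not a str


class GetClusterImagesResponse(BaseModel):
    images: List[ClusterImage]
```

With a declaration along these lines, a stringified metadata value either fails validation or forces the field to be typed loosely, which is exactly the contract violation described above.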
Proposed Solution
Parse the metadata field before constructing the response:
```python
import json  # needed for json.loads below, if not already imported at module level
import logging

logger = logging.getLogger(__name__)


def db_get_images_by_cluster_id(cluster_id: int):
    """Retrieve all images for a specific face cluster."""
    # `cursor` and `query` come from the surrounding module code (elided here).
    cursor.execute(query, (cluster_id,))
    images = []
    for row in cursor.fetchall():
        # Deserialize metadata if it is stored as a JSON string
        metadata = row['metadata']
        if metadata and isinstance(metadata, str):
            try:
                metadata = json.loads(metadata)
            except json.JSONDecodeError:
                logger.warning(f"Invalid JSON in metadata for image {row['id']}")
                metadata = None
        images.append({
            'id': row['id'],
            'metadata': metadata,  # now properly typed as a dict
            # ... other fields
        })
    return images
```
Key improvements:
Handles both string and dict inputs (database-agnostic)
Graceful error handling for malformed JSON
Logging for debugging
Maintains backward compatibility
Alternative approach:
If there's an existing utility function like parse_metadata() in the codebase, we should use that for consistency:
```python
from utils.image_utils import parse_metadata

metadata = parse_metadata(row['metadata'])
```
Testing Plan
Unit test: metadata stored as JSON string is correctly parsed
Unit test: metadata already as dict passes through unchanged
Unit test: invalid JSON is handled gracefully (returns null/empty dict)
Integration test: API response validation passes
Verify existing tests in test_face_clusters.py still pass
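As a sketch of the first unit tests (the import path, fixture names, and seeding details below are assumptions, not existing project fixtures):

```python
import pytest

from backend.face_clusters import db_get_images_by_cluster_id  # import path assumed


@pytest.fixture
def seeded_db():
    # Hypothetical fixture: seed the row from the reproduction steps above
    # (cluster_id=1, image_id=100, metadata stored as a JSON string),
    # plus one row whose metadata column contains malformed JSON.
    ...


def test_metadata_string_is_parsed(seeded_db):
    images = db_get_images_by_cluster_id(1)
    assert isinstance(images[0]['metadata'], dict)
    assert images[0]['metadata'] == {'camera': 'Canon EOS', 'location': 'NYC'}


def test_invalid_metadata_returns_none(seeded_db):
    # With the proposed fix, malformed JSON yields None rather than raising.
    images = db_get_images_by_cluster_id(1)
    assert all(img['metadata'] is None or isinstance(img['metadata'], dict)
               for img in images)
```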
Files to Modify
backend/face_clusters.py (primary fix)
tests/test_face_clusters.py (add test cases)
Additional Context
This is a common issue when working with JSON columns in SQL databases: SQLite stores JSON as TEXT, and PostgreSQL has native JSON support but can still return strings depending on the driver configuration. The fix ensures consistent behavior across database backends.
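As a quick stdlib illustration of the SQLite case: a JSON document stored in a TEXT column always comes back from the driver as a plain string, so an explicit deserialization step is needed on the way out.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faces (id INTEGER PRIMARY KEY, metadata TEXT)")
conn.execute("INSERT INTO faces (metadata) VALUES (?)",
             (json.dumps({"camera": "Canon EOS", "location": "NYC"}),))

raw = conn.execute("SELECT metadata FROM faces").fetchone()[0]
print(type(raw))              # <class 'str'> -- the driver returns the raw TEXT
print(type(json.loads(raw)))  # <class 'dict'> -- explicit parsing is required
```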
I'd be happy to submit a PR with the fix and comprehensive tests! Let me know if you'd prefer a specific approach or if there are existing utilities I should use.
Record
- I agree to follow this project's Code of Conduct
Checklist before Submitting
- Have you updated docs for it?
- Have you added unit tests?
- Have you made sure unit tests pass?
- Have you made sure code formatting is correct?
- Do your changes pass all tests?