Skip to content

BUG: Face cluster API returns metadata as JSON string instead of object #705

@VasuS609

Description

@VasuS609

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The GET /face-clusters/{id}/images endpoint returns image metadata as a serialized JSON string rather than a parsed object, violating the API schema contract and breaking client-side implementations.

Current Behavior

API Response:
json{
"images": [
{
"id": 123,
"metadata": "{"camera":"Canon EOS","location":"NYC"}"
}
]
}
Metadata is returned as a string
Expected Behavior:
json{
"images": [
{
"id": 123,
"metadata": {
"camera": "Canon EOS",
"location": "NYC"
}
}
]
}

Metadata should be a proper JSON object
Root Cause
File: face_clusters.py
Function: db_get_images_by_cluster_id (lines ~267-305)
The function retrieves the metadata column directly from the database without deserializing it. Since most SQL databases store JSON as text, the raw string is passed through to the Pydantic response model, which expects a dictionary type.
Steps to Reproduce

Insert a face record with JSON metadata:

sql INSERT INTO faces (cluster_id, image_id, metadata)
VALUES (1, 100, '{"camera":"Canon EOS","location":"NYC"}');

Call the API endpoint:

bash curl http://localhost:8000/api/face-clusters/1/images

Observe the response contains stringified metadata instead of an object

Impact

API Contract Violation: Response doesn't match GetClusterImagesResponse schema
Frontend Errors: Client code cannot access metadata.camera directly without additional parsing
Type Safety: TypeScript/JavaScript clients receive incorrect types
Test Failures: Unit tests expecting dict types will fail with string values

Proposed Solution
Parse the metadata field before constructing the response:
pythondef db_get_images_by_cluster_id(cluster_id: int):
"""Retrieve all images for a specific face cluster."""
cursor.execute(query, (cluster_id,))
images = []

for row in cursor.fetchall():
    # Deserialize metadata if stored as JSON string
    metadata = row['metadata']
    if metadata and isinstance(metadata, str):
        try:
            metadata = json.loads(metadata)
        except json.JSONDecodeError:
            logger.warning(f"Invalid JSON in metadata for image {row['id']}")
            metadata = None
    
    images.append({
        'id': row['id'],
        'metadata': metadata,  # Now properly typed as dict
        # ... other fields
    })

return images

Key improvements:

Handles both string and dict inputs (database-agnostic)
Graceful error handling for malformed JSON
Logging for debugging
Maintains backward compatibility

Alternative approach:
If there's an existing utility function like parse_metadata() in the codebase, we should use that for consistency:
pythonfrom utils.image_utils import parse_metadata

metadata = parse_metadata(row['metadata'])

Testing Plan

Unit test: metadata stored as JSON string is correctly parsed
Unit test: metadata already as dict passes through unchanged
Unit test: invalid JSON is handled gracefully (returns null/empty dict)
Integration test: API response validation passes
Verify existing tests in test_face_clusters.py still pass

Files to Modify

backend/face_clusters.py (primary fix)
tests/test_face_clusters.py (add test cases)

Additional Context
This is a common issue when working with JSON columns in SQL databases. SQLite stores JSON as TEXT, PostgreSQL has native JSON support but can return strings depending on the driver configuration. The fix ensures consistent behavior across database backends.

I'd be happy to submit a PR with the fix and comprehensive tests! Let me know if you'd prefer a specific approach or if there are existing utilities I should use.

Record

  • I agree to follow this project's Code of Conduct

Checklist before Submitting

  • Have you updated docs for it?
  • Have you added unit tests?
  • Have you made sure unit tests pass?
  • Have you made sure code formatting is correct?
  • Do Your changes passes all tests?

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions