Conversation

@VasuS609 (Contributor) commented Dec 10, 2025

What does this PR do?

Fixes the issue where db_get_images_by_cluster_id returned metadata as a
JSON string instead of a Python dict. This caused inconsistencies while
processing images on the backend.

What was changed?

  • Parses the metadata field using json.loads before returning the response.
  • Ensures all returned metadata is a proper dict.

Why is this needed?

Downstream functions expect metadata as a dictionary. Returning it as a string
was causing unexpected behavior and required extra parsing.
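
For concreteness, a tiny illustration of the mismatch (keys and values here are made up for the example):

import json

# What the endpoint used to return: a JSON string.
metadata_as_string = '{"width": 800, "height": 600}'

# What downstream code expects: a dict.
metadata_as_dict = json.loads(metadata_as_string)

# Indexing the string form fails (metadata_as_string["width"] raises TypeError),
# while the dict form works as intended:
assert metadata_as_dict["width"] == 800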

How was it tested?

  • Ran the backend locally.
  • Retrieved images by cluster ID and verified metadata is now a dict.

Related Issue

Fixes #705

Summary by CodeRabbit

  • Bug Fixes

    • Fixed image metadata handling in cluster operations—metadata is now properly normalized and formatted for consistency.
    • Improved logging system stability and prevented potential logging loop issues.
  • Documentation

    • Updated API schema to enforce stricter validation rules for image metadata fields, ensuring data conformity.



coderabbitai bot commented Dec 10, 2025

Walkthrough

The PR fixes a bug where face cluster metadata was returned as a serialized JSON string instead of a parsed object, adds logging interception improvements with recursion prevention, and updates the OpenAPI schema to reflect the corrected metadata structure.

Changes

  • Metadata normalization (backend/app/database/face_clusters.py): Added import of image_util_parse_metadata; in db_get_images_by_cluster_id, metadata is now normalized by parsing it to a dict with the utility function before being returned in the response.
  • Logging handler improvements (backend/app/logging/setup_logging.py): Replaced the log interception logic with a recursion-guarded approach using an _in_intercept_handler flag; records are formatted with self.format(record) before printing to stdout, and errors are reported via stderr with proper cleanup in a finally block.
  • OpenAPI schema update (docs/backend/backend_python/openapi.json): Removed "additionalProperties": true from the ImageInCluster.Metadata schema definition, restricting metadata objects to defined properties only while retaining the null alternative.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Extra attention: Review the recursion-prevention logic in setup_logging.py to ensure the _in_intercept_handler flag is correctly set/cleared in all code paths, particularly error scenarios and edge cases.
  • Cross-file validation: Verify that the metadata normalization in face_clusters.py properly handles all edge cases (null, malformed JSON) before returning to clients.

Possibly related PRs

Suggested labels

bug, backend

Suggested reviewers

  • rahulharpal1603

Poem

🐰 A string that masqueraded as an object,
Now parsed with care—no more to neglect!
Logs guard their steps with flags held tight,
Schema and truth now shine bright.
Hops of joy for metadata's plight! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Linked Issues check ⚠️ Warning: The PR partially addresses the Issue #705 objectives by parsing metadata, but it uses the image_util_parse_metadata utility instead of json.loads with error handling, and it lacks the test additions and documentation updates specified in the issue requirements. Resolution: ensure graceful error handling for malformed JSON (log a warning, set metadata to None), add unit tests for string/dict/invalid JSON cases, and add integration test validation as specified in Issue #705.
  • Out of Scope Changes check ⚠️ Warning: The PR includes a schema change to openapi.json that removes the additionalProperties allowance from Metadata, which is not mentioned in the Issue #705 requirements and may represent a breaking API change. Resolution: clarify whether the OpenAPI schema change is intentional; if it is not directly related to the metadata parsing fix, consider moving it to a separate PR with proper justification and testing.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The title accurately and concisely summarizes the primary fix: returning metadata as a dict instead of a string in the db_get_images_by_cluster_id function.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
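
For reference, a rough sketch of the unit tests the Linked Issues check asks for (pytest style; the import path is assumed from the review context below, and the fallback behavior follows the review's description of the utility rather than the issue's "set metadata to None" wording):

from app.utils.images import image_util_parse_metadata  # path assumed from backend/app/utils/images.py


def test_parse_metadata_from_json_string():
    assert image_util_parse_metadata('{"width": 800}') == {"width": 800}


def test_parse_metadata_passes_dict_through():
    assert image_util_parse_metadata({"width": 800}) == {"width": 800}


def test_parse_metadata_invalid_json_falls_back_to_empty_dict():
    assert image_util_parse_metadata("not-json") == {}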


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/database/face_clusters.py (1)

310-318: Correct fix for the metadata parsing issue.

The metadata normalization correctly addresses the bug where metadata was returned as a JSON string instead of a dict. The image_util_parse_metadata function handles all the edge cases mentioned in the issue (string parsing, dict passthrough, malformed JSON → empty dict).

Minor consideration: The return type annotation List[Dict[str, Union[str, int]]] on line 257 doesn't fully reflect that metadata is now a Mapping[str, Any]. Consider updating the type hint for better type safety if the project uses strict type checking.
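
A possible adjustment, sketched with a simplified signature (the real function in face_clusters.py may take different parameters and build a fuller response; this is illustration only):

from typing import Any, Dict, List


# Sketch only: widen the value type so the parsed metadata dict fits the annotation.
def db_get_images_by_cluster_id(cluster_id: int) -> List[Dict[str, Any]]:
    """Return image rows for a cluster, with 'metadata' already parsed to a dict."""
    ...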

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d07d817 and a9dfbc7.

📒 Files selected for processing (3)
  • backend/app/database/face_clusters.py (2 hunks)
  • backend/app/logging/setup_logging.py (1 hunks)
  • docs/backend/backend_python/openapi.json (0 hunks)
💤 Files with no reviewable changes (1)
  • docs/backend/backend_python/openapi.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T16:40:22.622Z
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/logging/setup_logging.py:224-243
Timestamp: 2025-10-13T16:40:22.622Z
Learning: In the PictoPy project's centralized logging system (backend/app/logging/setup_logging.py and sync-microservice/app/logging/setup_logging.py), stack traces should not be shown in logs. When rerouting logs through InterceptHandler, exc_info and stack_info should not be preserved.

Applied to files:

  • backend/app/logging/setup_logging.py
🧬 Code graph analysis (2)
backend/app/logging/setup_logging.py (1)
sync-microservice/app/logging/setup_logging.py (1)
  • format (63-110)
backend/app/database/face_clusters.py (1)
backend/app/utils/images.py (1)
  • image_util_parse_metadata (496-513)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Backend Tests
🔇 Additional comments (1)
backend/app/database/face_clusters.py (1)

4-4: LGTM!

Good reuse of the existing image_util_parse_metadata utility function, which already handles all edge cases (None, string, dict, malformed JSON) as required by the issue.
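
For readers who have not opened backend/app/utils/images.py, here is an approximation of the contract described above (dict passthrough, JSON-string parsing, empty dict on malformed input). It is a sketch, not the actual implementation, and the handling of None is an assumption:

import json
import logging
from typing import Any, Dict, Union

logger = logging.getLogger(__name__)


def parse_metadata_sketch(raw: Union[str, Dict[str, Any], None]) -> Dict[str, Any]:
    """Approximate behavior of image_util_parse_metadata, for illustration only."""
    if isinstance(raw, dict):
        return raw
    if raw is None:
        return {}  # assumption: None normalizes to an empty dict
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        logger.warning("Malformed metadata JSON; falling back to an empty dict")
        return {}
    return parsed if isinstance(parsed, dict) else {}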

Comment on lines +238 to +252
# Prevent infinite recursion
if getattr(record, "_in_intercept_handler", False):
    return

record._in_intercept_handler = True

try:
    # Just print directly to avoid recursion issues
    msg = self.format(record)
    print(msg, file=sys.stdout)
except Exception as e:
    print(f"Logging error: {e}", file=sys.stderr)
finally:
    if hasattr(record, "_in_intercept_handler"):
        delattr(record, "_in_intercept_handler")
@coderabbitai coderabbitai bot Dec 10, 2025

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for InterceptHandler implementations across the codebase
rg -n "class InterceptHandler" --type py -A 10

Repository: AOSSIE-Org/PictoPy

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Also check if there's a sync-microservice version as mentioned in learnings
find . -type f -name "setup_logging.py" | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Get full context of the emit method and surrounding code in backend version
sed -n '200,260p' backend/app/logging/setup_logging.py | cat -n

Repository: AOSSIE-Org/PictoPy

Length of output: 180


🌐 Web query:

Python logging Formatter exc_info stack_info behavior format method

💡 Result:

Short answer — how Formatter.format() handles exc_info and stack_info:

  • When you call a logging method with exc_info truthy or stack_info=True, those values are stored on the LogRecord (record.exc_info and record.stack_info). The Formatter.format(record) method:
    1. computes record.message from record.msg and record.args,
    2. uses your format string to produce a first-line result,
    3. if record.exc_info is truthy it calls Formatter.formatException(record.exc_info) and appends the resulting traceback text to the formatted string,
    4. if record.stack_info is truthy it calls Formatter.formatStack(record.stack_info) and appends that text as well.
  • The default formatException uses the traceback-printing machinery (traceback.print_exception) to produce a standard exception traceback. The default formatStack returns the supplied stack-info string (traceback.print_stack output). You can override formatException/formatStack to change how those blocks are rendered. [1][2]

References:

  • Python logging docs: Formatter.formatException / formatStack and description of exc_info and stack_info behavior. [1][2]
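
A small standalone demonstration of that behavior (illustrative only, not project code):

import logging
import sys

formatter = logging.Formatter("%(levelname)s %(message)s")

try:
    1 / 0
except ZeroDivisionError:
    # Build a record carrying exc_info, as logger.error(..., exc_info=True) would.
    record = logging.LogRecord(
        name="demo", level=logging.ERROR, pathname=__file__, lineno=0,
        msg="division failed", args=(), exc_info=sys.exc_info(),
    )

# The formatted string is "ERROR division failed" followed by the full traceback,
# because Formatter.format() appends formatException(record.exc_info).
print(formatter.format(record))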

Stack traces are being preserved in formatted output, contradicting project requirements.

The current implementation calls self.format(record) without clearing exc_info or stack_info from the LogRecord. Python's Formatter.format() method automatically appends exception tracebacks and stack information to the formatted string if these fields are set. This directly violates the project learning that "exc_info and stack_info should not be preserved" when rerouting through InterceptHandler.

To comply with project requirements, clear these fields before formatting:

record.exc_info = None
record.stack_info = None
msg = self.format(record)

The recursion guard and error handling structure are sound, but the stack trace preservation must be addressed.
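
Combining the diff above with this suggestion, the guarded body of InterceptHandler.emit might end up looking roughly like this (a sketch, not the final implementation):

# Prevent infinite recursion
if getattr(record, "_in_intercept_handler", False):
    return

record._in_intercept_handler = True

try:
    # Drop traceback/stack data before formatting, per the project convention.
    record.exc_info = None
    record.stack_info = None
    msg = self.format(record)
    print(msg, file=sys.stdout)
except Exception as e:
    print(f"Logging error: {e}", file=sys.stderr)
finally:
    if hasattr(record, "_in_intercept_handler"):
        delattr(record, "_in_intercept_handler")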

🤖 Prompt for AI Agents
In backend/app/logging/setup_logging.py around lines 238 to 252, the handler
calls self.format(record) while record.exc_info and record.stack_info may still
be set, causing formatted output to include stack traces which violates the
project rule to not preserve these fields; before calling self.format(record)
clear them by setting record.exc_info = None and record.stack_info = None, then
format and print as before, leaving the existing recursion guard, exception
handling, and final deletion of record._in_intercept_handler unchanged.

@VasuS609 (Contributor, Author) commented:

Thank you for the automated analysis!
This issue is unrelated to the changes in this PR (which only modifies metadata parsing behavior).
The logging-related finding is valid, but I believe it should be handled in a separate PR to keep changes atomic.

Let me know if you'd like me to open a dedicated issue/PR for that fix.

@coderabbitai coderabbitai bot replied:

@VasuS609, you're absolutely right that keeping changes atomic is the best practice here! I appreciate you acknowledging the logging concern.

Since you mentioned the logging fix should be handled separately, would you like me to create a GitHub issue for it? I can open one with the details about the stack trace preservation issue in the InterceptHandler class, including:

  • The specific problem (exc_info/stack_info being preserved in formatted output)
  • Reference to the project requirement from the learnings
  • Link back to this PR and comment thread

Just let me know, and I'll create the issue and assign it to you (or you can create it manually if you prefer). Either way, keeping this PR focused on the metadata parsing fix makes sense! 👍


🧠 Learnings used
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/logging/setup_logging.py:224-243
Timestamp: 2025-10-13T16:40:22.622Z
Learning: In the PictoPy project's centralized logging system (backend/app/logging/setup_logging.py and sync-microservice/app/logging/setup_logging.py), stack traces should not be shown in logs. When rerouting logs through InterceptHandler, exc_info and stack_info should not be preserved.

Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/database/images.py:115-115
Timestamp: 2025-10-15T07:13:34.502Z
Learning: In the PictoPy backend and sync-microservice, exception logging using `logger.error(f"...")` without stack traces (i.e., without `exc_info=True` or `logger.exception()`) is acceptable for console logging, as this is a deliberate design decision made with the maintainer for console-only logs.



Development

Successfully merging this pull request may close these issues.

BUG: Face cluster API returns metadata as JSON string instead of object
