Conversation

@VasuS609 (Contributor) commented Dec 10, 2025

What does this PR do?

Fixes the issue where db_get_images_by_cluster_id returned metadata as a
JSON string instead of a Python dict. This caused inconsistencies while
processing images on the backend.

What was changed?

  • Parses the metadata field using json.loads before returning the response.
  • Ensures all returned metadata is a proper dict.

Why is this needed?

Downstream functions expect metadata as a dictionary. Returning it as a string
was causing unexpected behavior and required extra parsing.
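
For concreteness, a tiny illustration of the mismatch (keys and values here are made up for the example):

import json

# What the endpoint used to return: a JSON string.
metadata_as_string = '{"width": 800, "height": 600}'

# What downstream code expects: a dict.
metadata_as_dict = json.loads(metadata_as_string)

# Indexing the string form fails (metadata_as_string["width"] raises TypeError),
# while the dict form works as intended:
assert metadata_as_dict["width"] == 800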

How was it tested?

  • Ran the backend locally.
  • Retrieved images by cluster ID and verified metadata is now a dict.

Related Issue

Fixes #705

Summary by CodeRabbit

  • Bug Fixes

    • Fixed image metadata handling in cluster operations—metadata is now properly normalized and formatted for consistency.
    • Improved logging system stability and prevented potential logging loop issues.
  • Documentation

    • Updated API schema to enforce stricter validation rules for image metadata fields, ensuring data conformity.



coderabbitai bot commented Dec 10, 2025

Walkthrough

The PR fixes a bug where face cluster metadata was returned as a serialized JSON string instead of a parsed object, adds logging interception improvements with recursion prevention, and updates the OpenAPI schema to reflect the corrected metadata structure.

Changes

  • Metadata normalization (backend/app/database/face_clusters.py): Added import of image_util_parse_metadata; in db_get_images_by_cluster_id, metadata is now normalized by parsing it to a dict with the utility function before being returned in the response.
  • Logging handler improvements (backend/app/logging/setup_logging.py): Replaced the log interception logic with a recursion-guarded approach using an _in_intercept_handler flag; records are formatted with self.format(record) before printing to stdout, and errors are reported via stderr with proper cleanup in a finally block.
  • OpenAPI schema update (docs/backend/backend_python/openapi.json): Removed "additionalProperties": true from the ImageInCluster.Metadata schema definition, restricting metadata objects to defined properties only while retaining the null alternative.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Extra attention: Review the recursion-prevention logic in setup_logging.py to ensure the _in_intercept_handler flag is correctly set/cleared in all code paths, particularly error scenarios and edge cases.
  • Cross-file validation: Verify that the metadata normalization in face_clusters.py properly handles all edge cases (null, malformed JSON) before returning to clients.

Possibly related PRs

Suggested labels

bug, backend

Suggested reviewers

  • rahulharpal1603

Poem

🐰 A string that masqueraded as an object,
Now parsed with care—no more to neglect!
Logs guard their steps with flags held tight,
Schema and truth now shine bright.
Hops of joy for metadata's plight! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Linked Issues check ⚠️ Warning: The PR partially addresses the Issue #705 objectives by parsing metadata, but it uses the image_util_parse_metadata utility instead of json.loads with error handling, and it lacks the test additions and documentation updates specified in the issue requirements. Resolution: ensure graceful error handling for malformed JSON (log a warning, set metadata to None), add unit tests for string/dict/invalid JSON cases, and add integration test validation as specified in Issue #705.
  • Out of Scope Changes check ⚠️ Warning: The PR includes a schema change to openapi.json that removes the additionalProperties allowance from Metadata, which is not mentioned in the Issue #705 requirements and may represent a breaking API change. Resolution: clarify whether the OpenAPI schema change is intentional; if it is not directly related to the metadata parsing fix, consider moving it to a separate PR with proper justification and testing.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The title accurately and concisely summarizes the primary fix: returning metadata as a dict instead of a string in the db_get_images_by_cluster_id function.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
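
For reference, a rough sketch of the unit tests the Linked Issues check asks for (pytest style; the import path is assumed from the review context below, and the fallback behavior follows the review's description of the utility rather than the issue's "set metadata to None" wording):

from app.utils.images import image_util_parse_metadata  # path assumed from backend/app/utils/images.py


def test_parse_metadata_from_json_string():
    assert image_util_parse_metadata('{"width": 800}') == {"width": 800}


def test_parse_metadata_passes_dict_through():
    assert image_util_parse_metadata({"width": 800}) == {"width": 800}


def test_parse_metadata_invalid_json_falls_back_to_empty_dict():
    assert image_util_parse_metadata("not-json") == {}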


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/database/face_clusters.py (1)

310-318: Correct fix for the metadata parsing issue.

The metadata normalization correctly addresses the bug where metadata was returned as a JSON string instead of a dict. The image_util_parse_metadata function handles all the edge cases mentioned in the issue (string parsing, dict passthrough, malformed JSON → empty dict).

Minor consideration: The return type annotation List[Dict[str, Union[str, int]]] on line 257 doesn't fully reflect that metadata is now a Mapping[str, Any]. Consider updating the type hint for better type safety if the project uses strict type checking.
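
A possible adjustment, sketched with a simplified signature (the real function in face_clusters.py may take different parameters and build a fuller response; this is illustration only):

from typing import Any, Dict, List


# Sketch only: widen the value type so the parsed metadata dict fits the annotation.
def db_get_images_by_cluster_id(cluster_id: int) -> List[Dict[str, Any]]:
    """Return image rows for a cluster, with 'metadata' already parsed to a dict."""
    ...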

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d07d817 and a9dfbc7.

📒 Files selected for processing (3)
  • backend/app/database/face_clusters.py (2 hunks)
  • backend/app/logging/setup_logging.py (1 hunks)
  • docs/backend/backend_python/openapi.json (0 hunks)
💤 Files with no reviewable changes (1)
  • docs/backend/backend_python/openapi.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T16:40:22.622Z
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/logging/setup_logging.py:224-243
Timestamp: 2025-10-13T16:40:22.622Z
Learning: In the PictoPy project's centralized logging system (backend/app/logging/setup_logging.py and sync-microservice/app/logging/setup_logging.py), stack traces should not be shown in logs. When rerouting logs through InterceptHandler, exc_info and stack_info should not be preserved.

Applied to files:

  • backend/app/logging/setup_logging.py
🧬 Code graph analysis (2)
backend/app/logging/setup_logging.py (1)
sync-microservice/app/logging/setup_logging.py (1)
  • format (63-110)
backend/app/database/face_clusters.py (1)
backend/app/utils/images.py (1)
  • image_util_parse_metadata (496-513)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Backend Tests
🔇 Additional comments (1)
backend/app/database/face_clusters.py (1)

4-4: LGTM!

Good reuse of the existing image_util_parse_metadata utility function, which already handles all edge cases (None, string, dict, malformed JSON) as required by the issue.
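
For readers who have not opened backend/app/utils/images.py, here is an approximation of the contract described above (dict passthrough, JSON-string parsing, empty dict on malformed input). It is a sketch, not the actual implementation, and the handling of None is an assumption:

import json
import logging
from typing import Any, Dict, Union

logger = logging.getLogger(__name__)


def parse_metadata_sketch(raw: Union[str, Dict[str, Any], None]) -> Dict[str, Any]:
    """Approximate behavior of image_util_parse_metadata, for illustration only."""
    if isinstance(raw, dict):
        return raw
    if raw is None:
        return {}  # assumption: None normalizes to an empty dict
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        logger.warning("Malformed metadata JSON; falling back to an empty dict")
        return {}
    return parsed if isinstance(parsed, dict) else {}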

Comment on lines +238 to +252
# Prevent infinite recursion
if getattr(record, "_in_intercept_handler", False):
    return

record._in_intercept_handler = True

try:
    # Just print directly to avoid recursion issues
    msg = self.format(record)
    print(msg, file=sys.stdout)
except Exception as e:
    print(f"Logging error: {e}", file=sys.stderr)
finally:
    if hasattr(record, "_in_intercept_handler"):
        delattr(record, "_in_intercept_handler")
@coderabbitai coderabbitai bot Dec 10, 2025

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for InterceptHandler implementations across the codebase
rg -n "class InterceptHandler" --type py -A 10

Repository: AOSSIE-Org/PictoPy

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Also check if there's a sync-microservice version as mentioned in learnings
find . -type f -name "setup_logging.py" | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Get full context of the emit method and surrounding code in backend version
sed -n '200,260p' backend/app/logging/setup_logging.py | cat -n

Repository: AOSSIE-Org/PictoPy

Length of output: 180


🌐 Web query:

Python logging Formatter exc_info stack_info behavior format method

💡 Result:

Short answer — how Formatter.format() handles exc_info and stack_info:

  • When you call a logging method with exc_info truthy or stack_info=True, those values are stored on the LogRecord (record.exc_info and record.stack_info). The Formatter.format(record) method:
    1. computes record.message from record.msg and record.args,
    2. uses your format string to produce a first-line result,
    3. if record.exc_info is truthy it calls Formatter.formatException(record.exc_info) and appends the resulting traceback text to the formatted string,
    4. if record.stack_info is truthy it calls Formatter.formatStack(record.stack_info) and appends that text as well.
  • The default formatException uses the traceback-printing machinery (traceback.print_exception) to produce a standard exception traceback. The default formatStack returns the supplied stack-info string (traceback.print_stack output). You can override formatException/formatStack to change how those blocks are rendered. [1][2]

References:

  • Python logging docs: Formatter.formatException / formatStack and description of exc_info and stack_info behavior. [1][2]
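
A small standalone demonstration of that behavior (illustrative only, not project code):

import logging
import sys

formatter = logging.Formatter("%(levelname)s %(message)s")

try:
    1 / 0
except ZeroDivisionError:
    # Build a record carrying exc_info, as logger.error(..., exc_info=True) would.
    record = logging.LogRecord(
        name="demo", level=logging.ERROR, pathname=__file__, lineno=0,
        msg="division failed", args=(), exc_info=sys.exc_info(),
    )

# The formatted string is "ERROR division failed" followed by the full traceback,
# because Formatter.format() appends formatException(record.exc_info).
print(formatter.format(record))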

Stack traces are being preserved in formatted output, contradicting project requirements.

The current implementation calls self.format(record) without clearing exc_info or stack_info from the LogRecord. Python's Formatter.format() method automatically appends exception tracebacks and stack information to the formatted string if these fields are set. This directly violates the project learning that "exc_info and stack_info should not be preserved" when rerouting through InterceptHandler.

To comply with project requirements, clear these fields before formatting:

record.exc_info = None
record.stack_info = None
msg = self.format(record)

The recursion guard and error handling structure are sound, but the stack trace preservation must be addressed.
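
Combining the diff above with this suggestion, the guarded body of InterceptHandler.emit might end up looking roughly like this (a sketch, not the final implementation):

# Prevent infinite recursion
if getattr(record, "_in_intercept_handler", False):
    return

record._in_intercept_handler = True

try:
    # Drop traceback/stack data before formatting, per the project convention.
    record.exc_info = None
    record.stack_info = None
    msg = self.format(record)
    print(msg, file=sys.stdout)
except Exception as e:
    print(f"Logging error: {e}", file=sys.stderr)
finally:
    if hasattr(record, "_in_intercept_handler"):
        delattr(record, "_in_intercept_handler")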

🤖 Prompt for AI Agents
In backend/app/logging/setup_logging.py around lines 238 to 252, the handler
calls self.format(record) while record.exc_info and record.stack_info may still
be set, causing formatted output to include stack traces which violates the
project rule to not preserve these fields; before calling self.format(record)
clear them by setting record.exc_info = None and record.stack_info = None, then
format and print as before, leaving the existing recursion guard, exception
handling, and final deletion of record._in_intercept_handler unchanged.

@VasuS609 (Contributor, Author) commented:

Thank you for the automated analysis!
This issue is unrelated to the changes in this PR (which only modifies metadata parsing behavior).
The logging-related finding is valid, but I believe it should be handled in a separate PR to keep changes atomic.

Let me know if you'd like me to open a dedicated issue/PR for that fix.

@coderabbitai coderabbitai bot replied:

@VasuS609, you're absolutely right that keeping changes atomic is the best practice here! I appreciate you acknowledging the logging concern.

Since you mentioned the logging fix should be handled separately, would you like me to create a GitHub issue for it? I can open one with the details about the stack trace preservation issue in the InterceptHandler class, including:

  • The specific problem (exc_info/stack_info being preserved in formatted output)
  • Reference to the project requirement from the learnings
  • Link back to this PR and comment thread

Just let me know, and I'll create the issue and assign it to you (or you can create it manually if you prefer). Either way, keeping this PR focused on the metadata parsing fix makes sense! 👍


🧠 Learnings used
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/logging/setup_logging.py:224-243
Timestamp: 2025-10-13T16:40:22.622Z
Learning: In the PictoPy project's centralized logging system (backend/app/logging/setup_logging.py and sync-microservice/app/logging/setup_logging.py), stack traces should not be shown in logs. When rerouting logs through InterceptHandler, exc_info and stack_info should not be preserved.

Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/database/images.py:115-115
Timestamp: 2025-10-15T07:13:34.502Z
Learning: In the PictoPy backend and sync-microservice, exception logging using `logger.error(f"...")` without stack traces (i.e., without `exc_info=True` or `logger.exception()`) is acceptable for console logging, as this is a deliberate design decision made with the maintainer for console-only logs.



Development

Successfully merging this pull request may close these issues.

BUG: Face cluster API returns metadata as JSON string instead of object
