-
Notifications
You must be signed in to change notification settings - Fork 571
Fix: Return metadata as dict instead of string in db_get_images_by_cluster_id #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe PR fixes a bug where face cluster metadata was returned as a serialized JSON string instead of a parsed object, adds logging interception improvements with recursion prevention, and updates the OpenAPI schema to reflect the corrected metadata structure. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/database/face_clusters.py (1)
310-318: Correct fix for the metadata parsing issue.The metadata normalization correctly addresses the bug where metadata was returned as a JSON string instead of a dict. The
image_util_parse_metadatafunction handles all the edge cases mentioned in the issue (string parsing, dict passthrough, malformed JSON → empty dict).Minor consideration: The return type annotation
List[Dict[str, Union[str, int]]]on line 257 doesn't fully reflect thatmetadatais now aMapping[str, Any]. Consider updating the type hint for better type safety if the project uses strict type checking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/database/face_clusters.py(2 hunks)backend/app/logging/setup_logging.py(1 hunks)docs/backend/backend_python/openapi.json(0 hunks)
💤 Files with no reviewable changes (1)
- docs/backend/backend_python/openapi.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T16:40:22.622Z
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/logging/setup_logging.py:224-243
Timestamp: 2025-10-13T16:40:22.622Z
Learning: In the PictoPy project's centralized logging system (backend/app/logging/setup_logging.py and sync-microservice/app/logging/setup_logging.py), stack traces should not be shown in logs. When rerouting logs through InterceptHandler, exc_info and stack_info should not be preserved.
Applied to files:
backend/app/logging/setup_logging.py
🧬 Code graph analysis (2)
backend/app/logging/setup_logging.py (1)
sync-microservice/app/logging/setup_logging.py (1)
format(63-110)
backend/app/database/face_clusters.py (1)
backend/app/utils/images.py (1)
image_util_parse_metadata(496-513)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests
🔇 Additional comments (1)
backend/app/database/face_clusters.py (1)
4-4: LGTM!Good reuse of the existing
image_util_parse_metadatautility function, which already handles all edge cases (None, string, dict, malformed JSON) as required by the issue.
| # Prevent infinite recursion | ||
| if getattr(record, "_in_intercept_handler", False): | ||
| return | ||
|
|
||
| record._in_intercept_handler = True | ||
|
|
||
| try: | ||
| # Just print directly to avoid recursion issues | ||
| msg = self.format(record) | ||
| print(msg, file=sys.stdout) | ||
| except Exception as e: | ||
| print(f"Logging error: {e}", file=sys.stderr) | ||
| finally: | ||
| if hasattr(record, "_in_intercept_handler"): | ||
| delattr(record, "_in_intercept_handler") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for InterceptHandler implementations across the codebase
rg -n "class InterceptHandler" --type py -A 10Repository: AOSSIE-Org/PictoPy
Length of output: 180
🏁 Script executed:
#!/bin/bash
# Also check if there's a sync-microservice version as mentioned in learnings
find . -type f -name "setup_logging.py" | head -20Repository: AOSSIE-Org/PictoPy
Length of output: 180
🏁 Script executed:
#!/bin/bash
# Get full context of the emit method and surrounding code in backend version
sed -n '200,260p' backend/app/logging/setup_logging.py | cat -nRepository: AOSSIE-Org/PictoPy
Length of output: 180
🌐 Web query:
Python logging Formatter exc_info stack_info behavior format method
💡 Result:
Short answer — how Formatter.format() handles exc_info and stack_info:
- When you call a logging method with exc_info truthy or stack_info=True, those values are stored on the LogRecord (record.exc_info and record.stack_info). The Formatter.format(record) method:
- computes record.message from record.msg and record.args,
- uses your format string to produce a first-line result,
- if record.exc_info is truthy it calls Formatter.formatException(record.exc_info) and appends the resulting traceback text to the formatted string,
- if record.stack_info is truthy it calls Formatter.formatStack(record.stack_info) and appends that text as well.
- The default formatException uses the traceback-printing machinery (traceback.print_exception) to produce a standard exception traceback. The default formatStack returns the supplied stack-info string (traceback.print_stack output). You can override formatException/formatStack to change how those blocks are rendered. [1][2]
References:
- Python logging docs: Formatter.formatException / formatStack and description of exc_info and stack_info behavior. [1][2]
Stack traces are being preserved in formatted output, contradicting project requirements.
The current implementation calls self.format(record) without clearing exc_info or stack_info from the LogRecord. Python's Formatter.format() method automatically appends exception tracebacks and stack information to the formatted string if these fields are set. This directly violates the project learning that "exc_info and stack_info should not be preserved" when rerouting through InterceptHandler.
To comply with project requirements, clear these fields before formatting:
record.exc_info = None
record.stack_info = None
msg = self.format(record)The recursion guard and error handling structure are sound, but the stack trace preservation must be addressed.
🤖 Prompt for AI Agents
In backend/app/logging/setup_logging.py around lines 238 to 252, the handler
calls self.format(record) while record.exc_info and record.stack_info may still
be set, causing formatted output to include stack traces which violates the
project rule to not preserve these fields; before calling self.format(record)
clear them by setting record.exc_info = None and record.stack_info = None, then
format and print as before, leaving the existing recursion guard, exception
handling, and final deletion of record._in_intercept_handler unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the automated analysis!
This issue is unrelated to the changes in this PR (which only modifies metadata parsing behavior).
The logging-related finding is valid, but I believe it should be handled in a separate PR to keep changes atomic.
Let me know if you'd like me to open a dedicated issue/PR for that fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VasuS609, you're absolutely right that keeping changes atomic is the best practice here! I appreciate you acknowledging the logging concern.
Since you mentioned the logging fix should be handled separately, would you like me to create a GitHub issue for it? I can open one with the details about the stack trace preservation issue in the InterceptHandler class, including:
- The specific problem (exc_info/stack_info being preserved in formatted output)
- Reference to the project requirement from the learnings
- Link back to this PR and comment thread
Just let me know, and I'll create the issue and assign it to you (or you can create it manually if you prefer). Either way, keeping this PR focused on the metadata parsing fix makes sense! 👍
🧠 Learnings used
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/logging/setup_logging.py:224-243
Timestamp: 2025-10-13T16:40:22.622Z
Learning: In the PictoPy project's centralized logging system (backend/app/logging/setup_logging.py and sync-microservice/app/logging/setup_logging.py), stack traces should not be shown in logs. When rerouting logs through InterceptHandler, exc_info and stack_info should not be preserved.
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/database/images.py:115-115
Timestamp: 2025-10-15T07:13:34.502Z
Learning: In the PictoPy backend and sync-microservice, exception logging using `logger.error(f"...")` without stack traces (i.e., without `exc_info=True` or `logger.exception()`) is acceptable for console logging, as this is a deliberate design decision made with the maintainer for console-only logs.
What does this PR do?
Fixes the issue where
db_get_images_by_cluster_idreturned metadata as aJSON string instead of a Python dict. This caused inconsistencies while
processing images on the backend.
What was changed?
json.loadsbefore returning the response.Why is this needed?
Downstream functions expect metadata as a dictionary. Returning it as a string
was causing unexpected behavior and required extra parsing.
How was it tested?
Related Issue
Fixes #705
Summary by CodeRabbit
Bug Fixes
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.