Feature: Judge Perspective Node #104
Conversation
Walkthrough
A comprehensive fact-checking and perspective-generation pipeline was introduced to the backend, leveraging Groq LLM, DuckDuckGo search, and LangGraph for orchestrating multi-step workflows. New modules handle claim extraction, web search, fact verification, sentiment analysis, perspective generation, and judgment. The frontend's hero section received a minor UI update with an informational paragraph.

Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Frontend
participant BackendAPI
participant Scraper
participant LangGraphWorkflow
participant SentimentNode
participant FactCheckNode
participant PerspectiveNode
participant JudgeNode
participant StoreNode
participant ErrorHandler
User->>Frontend: Submit article URL
Frontend->>BackendAPI: POST /process {url}
BackendAPI->>Scraper: Extract and clean article text
Scraper-->>BackendAPI: {cleaned_text}
BackendAPI->>LangGraphWorkflow: Start workflow with {cleaned_text}
LangGraphWorkflow->>SentimentNode: Analyze sentiment
SentimentNode-->>LangGraphWorkflow: {sentiment}
LangGraphWorkflow->>FactCheckNode: Extract and verify claims
FactCheckNode->>FactCheckNode: (Claim extraction, web search, LLM verification)
FactCheckNode-->>LangGraphWorkflow: {facts}
LangGraphWorkflow->>PerspectiveNode: Generate counter-perspective
PerspectiveNode-->>LangGraphWorkflow: {perspective}
LangGraphWorkflow->>JudgeNode: Judge perspective quality
JudgeNode-->>LangGraphWorkflow: {score}
alt Score < 70 and retries < 3
LangGraphWorkflow->>PerspectiveNode: Retry perspective generation
else Score >= 70 or retries >= 3
LangGraphWorkflow->>StoreNode: Store and send results
StoreNode-->>LangGraphWorkflow: {status: success}
end
LangGraphWorkflow-->>BackendAPI: {final result}
BackendAPI-->>Frontend: Return result
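To make the diagram concrete, here is a minimal sketch of the shared state the nodes could pass between them. The field names simply mirror the messages above; the actual state schema in new-backend/app/modules/langgraph_builder.py may use different names and types.

```python
# Hypothetical sketch of the workflow state implied by the sequence diagram;
# the real TypedDict in langgraph_builder.py may differ.
from typing import List, Optional, TypedDict


class GraphState(TypedDict, total=False):
    cleaned_text: str          # output of the scraper/cleaner
    sentiment: str             # set by the sentiment node
    facts: List[dict]          # verdicts produced by the fact-check node
    perspective: dict          # counter-perspective from generate_perspective
    score: int                 # 0-100 quality score from judge_perspective
    retries: int               # perspective regeneration attempts (capped at 3)
    status: str                # "success" or "error"
    error_from: Optional[str]  # node that raised the error, if any
    message: Optional[str]     # error message, if any
```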
Actionable comments posted: 19
🧹 Nitpick comments (18)
frontend/app/page.tsx (1)
118-121: Minor copy/UX polish – hyphenate “sign-in” for grammatical correctness
The new paragraph is a nice touch 👍.
Consider using the compound adjective “sign-in” (with a hyphen or NB-SP) to avoid the brief cognitive stumble some readers experience with “sign in”.

- <p className="mt-3 text-xs text-slate-500 dark:text-slate-400 animate-fade-in delay-600">
-   No sign in required. It’s completely free.
+ <p className="mt-3 text-xs text-slate-500 dark:text-slate-400 animate-fade-in delay-600">
+   No sign-in required. It’s completely free.
  </p>

new-backend/app/modules/langgraph_nodes/store_and_send.py (3)
1-1: Consider removing commented import or implementing the functionality.
The commented import suggests incomplete implementation. Either implement the vector store functionality or remove the comment if it's not needed yet.
11-17: Improve error handling specificity.
The generic exception handling could mask specific errors. Consider logging the full exception details and potentially handling specific exception types differently.

 except Exception as e:
-    print(f"some error occured in store_and_send:{e}")
+    logger.error(f"Error occurred in store_and_send: {str(e)}", exc_info=True)
     return {
         "status": "error",
         "error_from": "store_and_send",
-        "message": f"{e}",
+        "message": str(e),
     }
19-22: Validate state structure before spreading.
Using the spread operator **state without validation could lead to issues if the state contains unexpected keys or structures. Consider validating the state structure or being more explicit about which keys to include in the response.
new-backend/app/modules/facts_check/web_search.py (1)
4-15: Good implementation with room for improvement.
The core search functionality is well-implemented and returns a clean, structured format. However, consider the following improvements:
- Add error handling for search failures
- Replace print with proper logging
- Add input validation for the query parameter
+import logging
+
+logger = logging.getLogger(__name__)
+
 def search_duckduckgo(query, max_results=1):
+    if not query or not query.strip():
+        raise ValueError("Query cannot be empty")
+
     with DDGS() as ddgs:
-        results = ddgs.text(query, max_results=max_results)
-        print(results)
-        return [
-            {
-                "title": r["title"],
-                "snippet": r["body"],
-                "link": r["href"]
-            }
-            for r in results
-        ]
+        try:
+            results = ddgs.text(query, max_results=max_results)
+            logger.debug(f"Search results for '{query}': {results}")
+            return [
+                {
+                    "title": r.get("title", ""),
+                    "snippet": r.get("body", ""),
+                    "link": r.get("href", "")
+                }
+                for r in results
+            ]
+        except Exception as e:
+            logger.error(f"Search failed for query '{query}': {e}")
+            return []

new-backend/app/modules/langgraph_nodes/fact_check.py (2)
1-1: Remove the malformed commented import.
The commented import line contains a trailing backslash which would cause a syntax error if uncommented.
-# from app.modules.pipeline import run_fact_check_pipeline\
14-14: Fix the typo in the error message.
"occured" should be "occurred".

-    print(f"some error occured in fact_checking:{e}")
+    print(f"some error occurred in fact_checking:{e}")
28-28: Consider making the search delay configurable.
The 4-second delay is hardcoded, which may be too conservative for some use cases or insufficient for others depending on rate limits.

-    time.sleep(4)  # Add 4 second delay to prevent rate-limit
+    time.sleep(4)  # TODO: Make this configurable based on rate limit requirements
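One possible shape for that TODO, as a sketch only: read the delay from an environment variable. SEARCH_DELAY_SECONDS is a hypothetical name, not something defined in this PR.

```python
import os
import time

# Hypothetical: configure the inter-search pause via an environment variable,
# falling back to the current hard-coded 4 seconds.
SEARCH_DELAY_SECONDS = float(os.getenv("SEARCH_DELAY_SECONDS", "4"))


def polite_sleep() -> None:
    """Pause between DuckDuckGo queries to stay under rate limits."""
    time.sleep(SEARCH_DELAY_SECONDS)
```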
new-backend/app/modules/pipeline.py (1)

41-64: Remove commented-out code.
The commented-out implementation should be removed to keep the codebase clean, especially since it's now implemented in app.utils.fact_check_utils.

-# def run_fact_check_pipeline(state):
-
-#     result = run_claim_extractor_sdk(state)
-#     # Step 1: Extract claims
-#     raw_output = result["verifiable_claims"]
-
-#     # Match any line that starts with *, -, or • followed by text
-#     claims = re.findall(r"^[\*\-•]\s+(.*)", raw_output, re.MULTILINE)
-#     claims = [claim.strip() for claim in claims if claim.strip()]
-
-#     # Step 2: Search each claim with polite delay
-#     search_results = []
-#     for claim in claims:
-#         print(f"\n🔍Searching for claim...: {claim}")
-#         try:
-#             val = search_duckduckgo(claim)
-#             val[0]["claim"] = claim
-#             search_results.append(val[0])
-#         except Exception as e:
-#             print(f"❌ Search failed for: {claim} -> {e}")
-#         time.sleep(4)  # Add 4 second delay to prevent rate-limit
-
-#     final = run_fact_verifier_sdk(search_results)
-#     return final["verifications"]

new-backend/app/modules/scraper/cleaner.py (1)
35-35: Review the copyright pattern for potential over-removal.
The copyright regex r"© \d{4}.*" might remove legitimate copyright information that could be part of the article content, especially in articles about legal matters or publishing.
Consider making this pattern more specific:

-    r"© \d{4}.*",  # copyright lines
+    r"© \d{4}[^.]*\.|© \d{4}\s*all rights reserved",  # More specific copyright patterns

new-backend/app/utils/prompt_templates.py (1)
21-32: Consider adding JSON validation guidance.
The prompt requests JSON output but doesn't specify how to handle malformed JSON responses. Consider adding instructions for the model to ensure valid JSON formatting.
You could enhance the prompt with more specific JSON formatting instructions:
 Use *step-by-step reasoning* and return your output in this JSON format:
+
+Important: Ensure your response is valid JSON. Do not include any text before or after the JSON object.
38-43: Consider more robust score parsing.
The current regex \b(\d{1,3})\b will match any 1-3 digit number, which could potentially match unintended numbers in the response. Consider making the parsing more specific to scores.

-    # 5) Pull the first integer 0–100
-    m = re.search(r"\b(\d{1,3})\b", raw)
+    # Extract score - look for patterns like "85", "Score: 85", etc.
+    m = re.search(r"(?:score[:\s]*)?(\d{1,3})\b", raw, re.IGNORECASE)
35-38: Simplify conditional logic.
The elif after raise is unnecessary and can be simplified to improve readability.
Apply this diff to simplify the conditional:

 if not text:
     raise ValueError("Missing or empty 'cleaned_text' in state")
-elif not facts:
+if not facts:
     raise ValueError("Missing or empty 'facts' in state")
12-12: Add type hints for better code documentation.
The function signature lacks type hints, which reduces code clarity and IDE support.

-def run_claim_extractor_sdk(state):
+def run_claim_extractor_sdk(state: dict) -> dict:
23-28: Clean up string concatenation in system prompt.
The multi-line string concatenation with mixed quotes makes the code harder to read and maintain.

-    "You are an assistant that extracts v"
-    "erifiable factual claims from articles. "
-    "Each claim must be short, fact-based, and"
-    " independently verifiable through internet search. "
-    "Only return a list of 3 clear bullet-point claims."
+    "You are an assistant that extracts verifiable factual claims from articles. "
+    "Each claim must be short, fact-based, and independently verifiable through internet search. "
+    "Only return a list of 3 clear bullet-point claims."
60-60: Add type hints for function parameters and return value.
The function signature lacks type hints, which reduces code clarity and type safety.

-def run_fact_verifier_sdk(search_results):
+def run_fact_verifier_sdk(search_results: list) -> dict:
109-109: Remove debug print statement.
This appears to be leftover debug code that should be removed from production.
- print(content)
114-122: Improve error handling specificity.
The broad exception handling makes debugging difficult and could mask important errors.

-    except Exception as parse_err:
+    except (json.JSONDecodeError, KeyError, TypeError) as parse_err:
         print(f"❌ LLM JSON parse error: {parse_err}")
+        # Log the original content for debugging
+        print(f"Original content: {content}")
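If the narrower exception types are adopted, a small helper along these lines can also recover a JSON object that the model wraps in extra prose. This is a sketch, not code from the PR, and parse_llm_json is a hypothetical name.

```python
import json
import re
from typing import Optional


def parse_llm_json(content: str) -> Optional[dict]:
    """Best-effort parse of an LLM reply that should contain one JSON object."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # Fall back to the first {...} block embedded in surrounding text.
        match = re.search(r"\{.*\}", content, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                return None
    return None
```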
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
new-backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
frontend/app/page.tsx (1 hunks)
new-backend/app/modules/facts_check/llm_processing.py (1 hunks)
new-backend/app/modules/facts_check/web_search.py (1 hunks)
new-backend/app/modules/langgraph_builder.py (1 hunks)
new-backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
new-backend/app/modules/langgraph_nodes/fact_check.py (1 hunks)
new-backend/app/modules/langgraph_nodes/generate_perspective.py (1 hunks)
new-backend/app/modules/langgraph_nodes/judge.py (1 hunks)
new-backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
new-backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
new-backend/app/modules/pipeline.py (2 hunks)
new-backend/app/modules/scraper/cleaner.py (2 hunks)
new-backend/app/modules/scraper/extractor.py (1 hunks)
new-backend/app/routes/routes.py (2 hunks)
new-backend/app/utils/fact_check_utils.py (1 hunks)
new-backend/app/utils/prompt_templates.py (1 hunks)
new-backend/pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
new-backend/app/modules/langgraph_nodes/fact_check.py (1)
new-backend/app/utils/fact_check_utils.py (1)
run_fact_check_pipeline(8-31)
new-backend/app/modules/pipeline.py (1)
new-backend/app/modules/langgraph_builder.py (1)
build_langgraph(24-106)
new-backend/app/routes/routes.py (1)
new-backend/app/modules/pipeline.py (1)
run_langgraph_workflow(35-38)
new-backend/app/utils/fact_check_utils.py (2)
new-backend/app/modules/facts_check/web_search.py (1)
search_duckduckgo(4-15)
new-backend/app/modules/facts_check/llm_processing.py (2)
run_claim_extractor_sdk(12-57)
run_fact_verifier_sdk(60-137)
🪛 Pylint (3.3.7)
new-backend/app/modules/langgraph_nodes/generate_perspective.py
[refactor] 9-9: Too few public methods (0/2)
(R0903)
[refactor] 35-38: Unnecessary "elif" after "raise", remove the leading "el" from "elif"
(R1720)
🔇 Additional comments (9)
new-backend/app/modules/scraper/extractor.py (1)
40-40: Good addition for consistency in text extraction.
Adding no_fallback=True ensures more consistent extraction behavior by disabling fallback mechanisms, which is beneficial for a fact-checking pipeline where consistent text quality is important.
However, consider the trade-off: this may reduce extraction success rates for difficult-to-parse web pages.
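For context, this is roughly how that flag is passed when an extractor wraps trafilatura. It is a sketch under the assumption that extractor.py uses trafilatura; the actual call site is not shown in this review.

```python
# Sketch assuming the extractor wraps trafilatura; the real extractor.py
# may fetch pages and configure extraction differently.
from typing import Optional

import trafilatura


def extract_article_text(url: str) -> Optional[str]:
    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    # no_fallback=True skips the slower fallback extractors, trading a bit of
    # recall on hard-to-parse pages for more consistent output.
    return trafilatura.extract(downloaded, no_fallback=True)
```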
new-backend/app/routes/routes.py (1)
4-4: Good integration of the LangGraph workflow.
The import statement correctly brings in the new workflow function.
new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
8-10: Implement or remove commented vector DB code.
The commented code suggests this functionality is planned but not implemented. Consider either implementing it or removing the comments to avoid confusion.
Is the vector database functionality intended to be implemented in this PR, or should the commented code be removed?
new-backend/pyproject.toml (1)
9-19: LGTM! New dependencies align with the implemented features.
The added dependencies are appropriate for the fact-checking, sentiment analysis, and LangGraph workflow functionality being introduced.
new-backend/app/modules/scraper/cleaner.py (1)
32-71: LGTM! Comprehensive boilerplate removal patterns.
The extensive list of boilerplate patterns should significantly improve text cleaning quality by removing common web article noise and navigation elements.
new-backend/app/modules/langgraph_nodes/judge.py (1)
15-18: To locate and inspect the actual PerspectiveOutput definition, let's search for it and then pull in the surrounding code:

#!/bin/bash
set -e

echo "🔍 Searching for any references to PerspectiveOutput…"
rg -n "PerspectiveOutput" -A 5 || true
echo

echo "📂 Locating generate_perspective.py…"
fd generate_perspective.py || true
echo

echo "📄 Dumping first 200 lines of generate_perspective.py (if found)…"
TARGET=$(fd generate_perspective.py | head -n1)
if [ -n "$TARGET" ]; then
  echo "=== Contents of $TARGET ==="
  sed -n '1,200p' "$TARGET"
else
  echo "⚠️ generate_perspective.py not found in repo."
fi

new-backend/app/modules/langgraph_nodes/sentiment.py (1)
10-52: Well-implemented sentiment analysis function.
The implementation is clean and follows good practices (a minimal sketch of this shape follows the list):
- Proper error handling with try-catch
- Clear system prompt with specific instructions
- Appropriate model parameters (temperature=0.2, max_tokens=10)
- Consistent state management pattern
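Along those lines, here is a minimal sketch of what such a node could look like. It is illustrative only: the Groq client usage and parameters follow the notes above, but the actual sentiment.py prompt wording and model choice may differ.

```python
import os

from groq import Groq

client = Groq(api_key=os.getenv("GROQ_API_KEY"))


def analyze_sentiment(state: dict) -> dict:
    """Classify article sentiment and merge the label into the pipeline state."""
    try:
        completion = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Classify the overall sentiment of the article as exactly "
                        "one word: Positive, Negative, or Neutral."
                    ),
                },
                {"role": "user", "content": state["cleaned_text"]},
            ],
            model="gemma2-9b-it",  # assumption: same model family used elsewhere in this PR
            temperature=0.2,
            max_tokens=10,
        )
        sentiment = completion.choices[0].message.content.strip()
        return {**state, "sentiment": sentiment, "status": "success"}
    except Exception as e:
        return {"status": "error", "error_from": "sentiment", "message": str(e)}
```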
new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)
9-11: Pydantic model structure is appropriate.
The pylint warning about too few public methods is a false positive for Pydantic models, which are designed as data containers rather than behavior-rich classes.
new-backend/app/modules/langgraph_builder.py (1)
24-106: Well-structured LangGraph workflow.
The overall workflow design is solid (a condensed wiring sketch follows the list) with proper:
- State management using TypedDict
- Error handling paths for all nodes
- Logical flow from sentiment analysis through to final storage
- Retry logic for perspective generation based on quality scores
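As a reference point, here is a condensed sketch of how such a graph can be wired with LangGraph. Node names follow this review; SketchState and the placeholder callables are stand-ins, and the real build_langgraph() is more detailed (error edges for every node, a richer state schema, and retry bookkeeping).

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph


class SketchState(TypedDict, total=False):
    cleaned_text: str
    sentiment: str
    facts: list
    perspective: dict
    score: int
    retries: int
    status: str


def passthrough(state: SketchState) -> SketchState:
    # Placeholder node; the real callables live in app/modules/langgraph_nodes/.
    return state


def judge_perspective_router(state: SketchState) -> str:
    if state.get("status") == "error":
        return "error_handler"
    if state.get("score", 0) >= 70 or state.get("retries", 0) >= 3:
        return "store_and_send"
    return "generate_perspective"


graph = StateGraph(SketchState)
for name in ["sentiment", "fact_check", "generate_perspective",
             "judge_perspective", "store_and_send", "error_handler"]:
    graph.add_node(name, passthrough)

graph.set_entry_point("sentiment")
graph.add_edge("sentiment", "fact_check")
graph.add_edge("fact_check", "generate_perspective")
graph.add_edge("generate_perspective", "judge_perspective")
graph.add_conditional_edges("judge_perspective", judge_perspective_router)
graph.add_edge("store_and_send", END)
graph.add_edge("error_handler", END)

workflow = graph.compile()
```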
@@ -0,0 +1,11 @@
def error_handler(input):
🛠️ Refactor suggestion
Use a more descriptive parameter name.
The parameter name input shadows the built-in input() function. Consider using state or error_state for clarity.
-def error_handler(input):
+def error_handler(state):
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/error_handler.py at line 3, rename
the function parameter from `input` to a more descriptive name like `state` or
`error_state` to avoid shadowing the built-in `input()` function and improve
code clarity.
| print("Error detected!") | ||
| print(f"From: {input.get('error_from')}") | ||
| print(f"Message: {input.get('message')}") |
🛠️ Refactor suggestion
Replace print statements with proper logging.
Using print() statements is not ideal for production code. Consider using the logging module for better log management and control.
+import logging
+
+logger = logging.getLogger(__name__)
+
def error_handler(input):
- print("Error detected!")
- print(f"From: {input.get('error_from')}")
- print(f"Message: {input.get('message')}")
+ logger.error("Error detected!")
+ logger.error(f"From: {input.get('error_from')}")
+ logger.error(f"Message: {input.get('message')}")📝 Committable suggestion
| print("Error detected!") | |
| print(f"From: {input.get('error_from')}") | |
| print(f"Message: {input.get('message')}") | |
| import logging | |
| logger = logging.getLogger(__name__) | |
| def error_handler(input): | |
| logger.error("Error detected!") | |
| logger.error(f"From: {input.get('error_from')}") | |
| logger.error(f"Message: {input.get('message')}") |
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/error_handler.py around lines 4 to 6,
replace the print statements with calls to the logging module. Import the
logging module at the top if not already done, configure a logger, and use
logger.info or logger.error to log the error messages instead of print, ensuring
better log management and control.
| return {"status": "stopped_due_to_error", | ||
| "from": [input.get("error_from")], | ||
| "error": [input.get("message")] | ||
| } |
🛠️ Refactor suggestion
Add input validation and fix inconsistent return structure.
The function should validate that required keys exist in the input. Also, the return structure is inconsistent - from and error are returned as lists containing single values, which seems unnecessary.
return {"status": "stopped_due_to_error",
- "from": [input.get("error_from")],
- "error": [input.get("message")]
+ "error_from": input.get("error_from", "unknown"),
+ "message": input.get("message", "unknown error")
 }

📝 Committable suggestion
| return {"status": "stopped_due_to_error", | |
| "from": [input.get("error_from")], | |
| "error": [input.get("message")] | |
| } | |
| return {"status": "stopped_due_to_error", | |
| "error_from": input.get("error_from", "unknown"), | |
| "message": input.get("message", "unknown error") | |
| } |
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/error_handler.py around lines 8 to
11, add validation to check that the input dictionary contains the required keys
"error_from" and "message" before accessing them. Also, modify the return
statement to return the values of "error_from" and "message" directly as strings
instead of wrapping them in single-element lists, ensuring a consistent and
simpler return structure.
    data = run_langgraph_workflow(article_text)
    return data
🛠️ Refactor suggestion
Add error handling around workflow execution.
The workflow execution should be wrapped in try-catch blocks to handle potential failures gracefully and return appropriate HTTP error responses.
- data = run_langgraph_workflow(article_text)
- return data
+ try:
+ data = run_langgraph_workflow(article_text)
+ return data
+ except Exception as e:
+ logging.error(f"Workflow execution failed: {e}")
+ return {"error": "Failed to process article", "status": "error"}📝 Committable suggestion
try:
    data = run_langgraph_workflow(article_text)
    return data
except Exception as e:
    logging.error(f"Workflow execution failed: {e}")
    return {"error": "Failed to process article", "status": "error"}
🤖 Prompt for AI Agents
In new-backend/app/routes/routes.py around lines 23 to 24, the call to
run_langgraph_workflow lacks error handling. Wrap this call in a try-except
block to catch exceptions, and return an appropriate HTTP error response (such
as a 500 status code with an error message) if the workflow execution fails,
ensuring the API handles failures gracefully.
def store_and_send(state):
    # to store data in vector db
    try:
        print(state)
🛠️ Refactor suggestion
Replace print with proper logging.
Using print() for debugging is not ideal for production code. Consider using the logging module.
+import logging
+
+logger = logging.getLogger(__name__)
+
- print(state)
+ logger.info(f"Processing state: {state}")📝 Committable suggestion
import logging

logger = logging.getLogger(__name__)

# … other imports and code …

def store_and_send(state):
    # … existing logic before …
    logger.info(f"Processing state: {state}")
    # … existing logic after …
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/store_and_send.py at line 7, replace
the use of print(state) with a proper logging statement. Import the logging
module if not already imported, configure a logger for the module, and use
logger.debug or logger.info to log the state variable instead of printing it
directly.
graph.add_conditional_edges(
    "judge_perspective",
    lambda state: (
        "error_handler"
        if state.get("status") == "error"
        else (
            "store_and_send"
            if state.get("retries", 0) >= 3
            else "generate_perspective"
        )
        if state.get("score", 0) < 70
        else "store_and_send"
    )
)
🛠️ Refactor suggestion
Simplify complex conditional logic for better readability.
The nested ternary operators in the judge_perspective conditional edge are difficult to read and maintain. Consider breaking this into a separate function for clarity.
Create a separate function to handle the judge perspective routing logic:
+def judge_perspective_router(state):
+ """Route after judge_perspective based on status, score, and retries."""
+ if state.get("status") == "error":
+ return "error_handler"
+
+ score = state.get("score", 0)
+ retries = state.get("retries", 0)
+
+ if score >= 70:
+ return "store_and_send"
+
+ if retries >= 3:
+ return "store_and_send"
+
+ return "generate_perspective"
+
graph.add_conditional_edges(
"judge_perspective",
- lambda state: (
- "error_handler"
- if state.get("status") == "error"
- else (
- "store_and_send"
- if state.get("retries", 0) >= 3
- else "generate_perspective"
- )
- if state.get("score", 0) < 70
- else "store_and_send"
- )
+ judge_perspective_router
 )

📝 Committable suggestion
| graph.add_conditional_edges( | |
| "judge_perspective", | |
| lambda state: ( | |
| "error_handler" | |
| if state.get("status") == "error" | |
| else ( | |
| "store_and_send" | |
| if state.get("retries", 0) >= 3 | |
| else "generate_perspective" | |
| ) | |
| if state.get("score", 0) < 70 | |
| else "store_and_send" | |
| ) | |
| ) | |
| def judge_perspective_router(state): | |
| """Route after judge_perspective based on status, score, and retries.""" | |
| if state.get("status") == "error": | |
| return "error_handler" | |
| score = state.get("score", 0) | |
| retries = state.get("retries", 0) | |
| if score >= 70: | |
| return "store_and_send" | |
| if retries >= 3: | |
| return "store_and_send" | |
| return "generate_perspective" | |
| graph.add_conditional_edges( | |
| "judge_perspective", | |
| judge_perspective_router | |
| ) |
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_builder.py around lines 81 to 94, the
nested ternary operators in the lambda for judge_perspective make the
conditional logic hard to read. Refactor by extracting this lambda into a
separate named function that takes the state as input and returns the
appropriate next step string based on the conditions. Then pass this function to
add_conditional_edges instead of the inline lambda to improve clarity and
maintainability.
| f' "original_claim": "{claim}",\n' | ||
| f' "source_link": "{source}"\n' | ||
| "}" |
Fix potential JSON injection in prompt template.
Using f-strings to embed user data directly into JSON template strings can cause malformed JSON if the data contains quotes or special characters.
- f' "original_claim": "{claim}",\n'
- f' "source_link": "{source}"\n'
+ ' "original_claim": "' + claim.replace('"', '\\"') + '",\n'
+ ' "source_link": "' + source.replace('"', '\\"') + '"\n'Or better yet, use a proper JSON template approach:
json_template = {
"verdict": "True | False | Unverifiable",
"explanation": "...",
"original_claim": claim,
"source_link": source
}
template_str = json.dumps(json_template, indent=2)🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 94 to 96,
the current code uses f-strings to insert user data directly into a JSON string,
which risks JSON injection and malformed output if the data contains quotes or
special characters. To fix this, replace the f-string construction with a
dictionary representing the JSON structure, assign the user data to the
appropriate keys, and then serialize the dictionary to a JSON string using
json.dumps with proper indentation.
for result in search_results:
    source = result.get("link", "N/A")
    claim = result.get("claim", "N/A")
    evidence = (f"{result.get('title', '')}"
                f"\n{result.get('snippet', '')}"
                f"\nLink: {source}")

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a fact-checking assistant. "
                    "Your job is to determine whether the given"
                    " claim is True, False, or Unverifiable "
                    "based on the provided web search evidence."
                    " Keep it concise and structured."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Claim: {claim}\n\n"
                    f"Web Evidence:\n{evidence}\n\n"
                    "Based on this evidence, is the claim true?\n"
                    "Respond only in this JSON format:\n\n"
                    "{\n"
                    ' "verdict": "True" | "False" | "Unverifiable",\n'
                    ' "explanation": "...",\n'
                    f' "original_claim": "{claim}",\n'
                    f' "source_link": "{source}"\n'
                    "}"
                ),
            },
        ],
        model="gemma2-9b-it",
        temperature=0.3,
        max_tokens=256,
    )
🛠️ Refactor suggestion
Optimize API calls and fix potential performance issues.
Making sequential API calls for each search result could be slow and expensive. Consider batching or implementing rate limiting.
For better performance, consider:
- Implementing concurrent API calls with proper rate limiting
- Batching multiple claims in a single API call if the model supports it
- Adding retry logic for failed API calls
Example implementation for concurrent processing:
import asyncio
from concurrent.futures import ThreadPoolExecutor

def process_single_claim(result):
    # Move the existing logic here
    pass

# In the main function:
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(process_single_claim, result) for result in search_results]
    results_list = [future.result() for future in futures]
In new-backend/app/modules/facts_check/llm_processing.py between lines 65 and
103, the code makes sequential API calls for each search result, which can cause
slow performance and high costs. Refactor by moving the existing API call logic
into a separate function that processes a single claim, then use concurrent
execution (e.g., ThreadPoolExecutor or asyncio) to run multiple calls in
parallel with a controlled number of workers to avoid rate limits. Additionally,
consider batching multiple claims per API call if supported and add retry logic
to handle transient failures gracefully.
return {
    "claim": claim,
    "verifications": results_list,
    "status": "success",
}
Fix incorrect return value for 'claim' field.
The returned claim value comes from the last iteration of the loop, which is likely not the intended behavior.
return {
- "claim": claim,
"verifications": results_list,
"status": "success",
 }

If you need to return claims, consider returning all claims or restructuring the return value.
📝 Committable suggestion
return {
    "verifications": results_list,
    "status": "success",
}
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 125 to
129, the 'claim' field in the return dictionary incorrectly uses the last loop
iteration's claim value. To fix this, modify the code to return all claims
collected during the loop, either as a list or by restructuring the return value
to include all relevant claims instead of just one.
load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))
🛠️ Refactor suggestion
Add error handling for missing API key.
The Groq client is initialized at module level without checking if the API key exists, which could cause runtime errors when the module is imported.
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+def get_groq_client():
+ api_key = os.getenv("GROQ_API_KEY")
+ if not api_key:
+ raise ValueError("GROQ_API_KEY environment variable is required")
+ return Groq(api_key=api_key)
+
+client = get_groq_client()🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py at line 9, the Groq
client is initialized directly with an API key from the environment without
verifying its presence. To fix this, add a check to confirm the GROQ_API_KEY
environment variable is set before initializing the client. If the key is
missing, raise a clear exception or handle the error gracefully to prevent
runtime errors during module import.
This PR introduces the judge_perspective node to the LangGraph pipeline. This node evaluates the quality of a generated counter-perspective using a scoring model (gemma2-9b-it) provided via the Groq SDK.

🧠 How It Works
Input:
Takes the perspective object from the pipeline state.

Prompt:
A scoring prompt is constructed that asks the LLM to give a single integer score (0–100) based on:
Scoring:
The model response is parsed to extract the first valid integer in the range 0–100, and the score is added back to the pipeline state.
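For illustration, a minimal sketch of that scoring and parse step follows. It is a sketch only: the prompt wording and state handling in the actual judge node differ, and only the model name and the first-integer parse are taken from this PR's description.

```python
import re

from langchain_groq import ChatGroq

llm = ChatGroq(model="gemma2-9b-it", temperature=0)


def judge_perspective(state: dict) -> dict:
    """Score the generated counter-perspective and merge the score into state."""
    prompt = (
        "Rate the following counter-perspective from 0 to 100 for relevance, "
        "reasoning quality, and factual grounding. Reply with a single integer.\n\n"
        f"{state['perspective']}"
    )
    raw = llm.invoke(prompt).content
    # Pull the first integer from the reply and clamp it to the 0-100 range.
    match = re.search(r"\b(\d{1,3})\b", raw)
    score = min(int(match.group(1)), 100) if match else 0
    return {**state, "score": score, "status": "success"}
```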
Model Used:
gemma2-9b-it via langchain_groq.ChatGroq.

Sample Output State (after generate-perspective-node)
{ "cleaned_text": "The 2025 French Open men’s final at Roland Garros was more than just a sporting event — it was also a major celebrity moment.\n\nAs Carlos Alcaraz battled Jannik Sinner on the iconic clay courts of Paris, the stands were filled with famous faces from film, music, and sport.\n\nAmong those spotted in the crowd were singer and fashion icon Pharrell Williams, actors Natalie Portman, Lily Collins, Dustin Hoffman, and Eddie Redmayne, as well as filmmaker Spike Lee. Netflix star Taylor Zakhar-Perez and British Formula 1 driver George Russell also made an appearance, adding to the glamour and excitement of the championship match.\n\nRoland Garros has long been a favourite among stars, known for its unique combination of top-level tennis and Parisian flair. This year was no exception, with celebrity attendees enjoying both the high-stakes final and the stylish atmosphere of the grounds.\n\nTake a look at the various celebrities who turned up at the event:\n\nWhile all eyes were on Alcaraz and Sinner as they went head-to-head in a tense and athletic final, the buzz in the stands was equally electric. Fans and photographers alike turned their cameras to the VIP section, capturing moments of the celebrities enjoying the match, chatting between sets, and soaking in the summer sunshine.\n\nWith its mix of elite sport and high-profile guests, the Roland Garros men’s final once again proved that tennis can bring together the worlds of film, fashion, and speed — all in one unforgettable Paris afternoon.\n\nStay updated with the latest Trending, India , World and United States news. Follow all the latest updates on Israel Iran Conflict here on Livemint.\n\nBusiness NewsNewsUs NewsRoland Garros 2025: From Pharrell Williams to Natalie Portman, stars step out at French Open Men’s Final", "facts": [ { "verdict": "True", "explanation": "The provided article states that Carlos Alcaraz and Jannik Sinner played in the 2025 French Open men's final.", "original_claim": "**Carlos Alcaraz and Jannik Sinner played in the 2025 French Open men's final.**", "source_link": "https://www.cbssports.com/tennis/news/2025-french-open-what-carlos-alcaraz-jannik-sinner-said-after-all-time-mens-singles-final-at-roland-garros/" }, { "verdict": "Unverifiable", "explanation": "The article lists celebrities who attended the 2023 French Open final, not the 2025 final. ", "original_claim": "**Pharrell Williams attended the 2025 French Open men's final.**", "source_link": "https://www.essentiallysports.com/atp-tennis-news-pharell-williams-to-dustin-hoffman-celebrities-who-attended-the-french-open-final-between-alcaraz-and-sinner/" }, { "verdict": "Unverifiable", "explanation": "The provided evidence focuses on George Russell's participation in the Canadian Grand Prix, a Formula 1 race. It does not offer any information about his presence at the 2025 French Open.", "original_claim": "**George Russell, a British Formula 1 driver, was present at the 2025 French Open men's final.**", "source_link": "https://www.espn.com/f1/story/_/id/45519593/red-bull-protest-russell-canadian-gp-victory" } ], "sentiment": "Positive", "perspective": { "reasoning": "The article presents a positive sentiment towards the 2025 French Open men's final, highlighting the presence of celebrities in the crowd. However, a counter-perspective could argue that this focus on celebrity attendance detracts from the true significance of the event, which is the sporting competition itself. 
Step 1: Identify the main theme of the article, which is the intersection of sports and celebrity culture at the French Open. Step 2: Consider the potential drawbacks of this intersection, such as the distraction from the athletes' achievements and the sport's integrity. Step 3: Evaluate the verified facts provided, noting that some claims of celebrity attendance are unverifiable. Step 4: Reflect on the potential consequences of prioritizing celebrity presence over the sport itself, including the possible undermining of the event's integrity. Step 5: Formulate a counter-perspective that presents a more nuanced view of the event, acknowledging both the excitement of celebrity attendance and the potential risks to the sport's appreciation.", "perspective": "The 2025 French Open men's final, while a significant sporting event, may not be as glamorous or celebrity-studded as perceived. The presence of famous faces could be seen as a distraction from the true essence of the competition, which is the athletes' skill and dedication. Furthermore, the emphasis on celebrity attendees might overshadow the achievements of the players and the sport as a whole. The verified facts provided do not fully support the claims of celebrity attendance, which could indicate a focus on sensationalism over factual reporting. Ultimately, the intersection of sports and celebrity culture at events like the French Open can be seen as a double-edged sword, potentially undermining the integrity and appreciation of the sport itself." }, "score": 80, "retries": 1, "status": "success" } <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added a paragraph below the "Get Started" button on the landing page indicating "No sign in required. It’s completely free." with styled text and animation. * Introduced advanced backend workflows for article processing, including sentiment analysis, fact-checking, counter-perspective generation, and automated evaluation using language models. * Integrated DuckDuckGo web search to support fact verification. * Enhanced text cleaning with expanded boilerplate removal for improved article extraction. * **Bug Fixes** * Improved extraction reliability by disabling fallback mechanisms during article text extraction. * **Chores** * Added new dependencies for language modeling, web search, and natural language processing. <!-- end of auto-generated comment: release notes by coderabbit.ai -->