@ParagGhatage (Collaborator) commented Jun 26, 2025

This PR introduces the judge_perspective node to the LangGraph pipeline. This node evaluates the quality of a generated counter-perspective using a scoring model (gemma2-9b-it) provided via the Groq SDK.


🧠 How It Works

  • Input:
    Takes the perspective object from the pipeline state.

  • Prompt:
    A scoring prompt is constructed that asks the LLM to give a single integer score (0–100) based on:

    • Originality
    • Reasoning quality
    • Factual grounding
  • Scoring:
    The model response is parsed to extract the first valid integer in the range 0–100, and the score is added back to the pipeline state (a minimal sketch of the node follows this list).

  • Model Used:
    gemma2-9b-it via langchain_groq.ChatGroq.
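
Below is a minimal sketch of what this node might look like, assuming langchain_groq.ChatGroq. Function names and the prompt wording are illustrative; the actual implementation lives in new-backend/app/modules/langgraph_nodes/judge.py.

import re

from langchain_groq import ChatGroq

llm = ChatGroq(model="gemma2-9b-it", temperature=0.0)


def judge_perspective(state: dict) -> dict:
    """Score the generated counter-perspective on a 0-100 scale."""
    perspective = state["perspective"]["perspective"]
    prompt = (
        "Rate the following counter-perspective from 0 to 100, considering "
        "originality, reasoning quality, and factual grounding. "
        "Reply with a single integer only.\n\n"
        f"{perspective}"
    )
    raw = llm.invoke(prompt).content
    # Take the first 1-3 digit integer in the reply and clamp it to [0, 100]
    match = re.search(r"\b(\d{1,3})\b", raw)
    score = max(0, min(100, int(match.group(1)))) if match else 0
    return {**state, "score": score}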


Sample Output State (after the judge_perspective node has added the score)

{
    "cleaned_text": "The 2025 French Open men’s final at Roland Garros was more than just a sporting event — it was also a major celebrity moment.\n\nAs Carlos Alcaraz battled Jannik Sinner on the iconic clay courts of Paris, the stands were filled with famous faces from film, music, and sport.\n\nAmong those spotted in the crowd were singer and fashion icon Pharrell Williams, actors Natalie Portman, Lily Collins, Dustin Hoffman, and Eddie Redmayne, as well as filmmaker Spike Lee. Netflix star Taylor Zakhar-Perez and British Formula 1 driver George Russell also made an appearance, adding to the glamour and excitement of the championship match.\n\nRoland Garros has long been a favourite among stars, known for its unique combination of top-level tennis and Parisian flair. This year was no exception, with celebrity attendees enjoying both the high-stakes final and the stylish atmosphere of the grounds.\n\nTake a look at the various celebrities who turned up at the event:\n\nWhile all eyes were on Alcaraz and Sinner as they went head-to-head in a tense and athletic final, the buzz in the stands was equally electric. Fans and photographers alike turned their cameras to the VIP section, capturing moments of the celebrities enjoying the match, chatting between sets, and soaking in the summer sunshine.\n\nWith its mix of elite sport and high-profile guests, the Roland Garros men’s final once again proved that tennis can bring together the worlds of film, fashion, and speed — all in one unforgettable Paris afternoon.\n\nStay updated with the latest Trending, India , World and United States news. Follow all the latest updates on Israel Iran Conflict here on Livemint.\n\nBusiness NewsNewsUs NewsRoland Garros 2025: From Pharrell Williams to Natalie Portman, stars step out at French Open Men’s Final",
    "facts": [
        {
            "verdict": "True",
            "explanation": "The provided article states that Carlos Alcaraz and Jannik Sinner played in the 2025 French Open men's final.",
            "original_claim": "**Carlos Alcaraz and Jannik Sinner played in the 2025 French Open men's final.**",
            "source_link": "https://www.cbssports.com/tennis/news/2025-french-open-what-carlos-alcaraz-jannik-sinner-said-after-all-time-mens-singles-final-at-roland-garros/"
        },
        {
            "verdict": "Unverifiable",
            "explanation": "The article lists celebrities who attended the 2023 French Open final, not the 2025 final. ",
            "original_claim": "**Pharrell Williams attended the 2025 French Open men's final.**",
            "source_link": "https://www.essentiallysports.com/atp-tennis-news-pharell-williams-to-dustin-hoffman-celebrities-who-attended-the-french-open-final-between-alcaraz-and-sinner/"
        },
        {
            "verdict": "Unverifiable",
            "explanation": "The provided evidence focuses on George Russell's participation in the Canadian Grand Prix, a Formula 1 race. It does not offer any information about his presence at the 2025 French Open.",
            "original_claim": "**George Russell, a British Formula 1 driver, was present at the 2025 French Open men's final.**",
            "source_link": "https://www.espn.com/f1/story/_/id/45519593/red-bull-protest-russell-canadian-gp-victory"
        }
    ],
    "sentiment": "Positive",
    "perspective": {
        "reasoning": "The article presents a positive sentiment towards the 2025 French Open men's final, highlighting the presence of celebrities in the crowd. However, a counter-perspective could argue that this focus on celebrity attendance detracts from the true significance of the event, which is the sporting competition itself. Step 1: Identify the main theme of the article, which is the intersection of sports and celebrity culture at the French Open. Step 2: Consider the potential drawbacks of this intersection, such as the distraction from the athletes' achievements and the sport's integrity. Step 3: Evaluate the verified facts provided, noting that some claims of celebrity attendance are unverifiable. Step 4: Reflect on the potential consequences of prioritizing celebrity presence over the sport itself, including the possible undermining of the event's integrity. Step 5: Formulate a counter-perspective that presents a more nuanced view of the event, acknowledging both the excitement of celebrity attendance and the potential risks to the sport's appreciation.",
        "perspective": "The 2025 French Open men's final, while a significant sporting event, may not be as glamorous or celebrity-studded as perceived. The presence of famous faces could be seen as a distraction from the true essence of the competition, which is the athletes' skill and dedication. Furthermore, the emphasis on celebrity attendees might overshadow the achievements of the players and the sport as a whole. The verified facts provided do not fully support the claims of celebrity attendance, which could indicate a focus on sensationalism over factual reporting. Ultimately, the intersection of sports and celebrity culture at events like the French Open can be seen as a double-edged sword, potentially undermining the integrity and appreciation of the sport itself."
    },
"score": 80,
"retries": 1,
 "status": "success"
}



<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Added a paragraph below the "Get Started" button on the landing page indicating "No sign in required. It’s completely free." with styled text and animation.
  * Introduced advanced backend workflows for article processing, including sentiment analysis, fact-checking, counter-perspective generation, and automated evaluation using language models.
  * Integrated DuckDuckGo web search to support fact verification.
  * Enhanced text cleaning with expanded boilerplate removal for improved article extraction.

* **Bug Fixes**
  * Improved extraction reliability by disabling fallback mechanisms during article text extraction.

* **Chores**
  * Added new dependencies for language modeling, web search, and natural language processing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

coderabbitai bot commented Jun 26, 2025

Walkthrough

A comprehensive fact-checking and perspective-generation pipeline was introduced to the backend, leveraging Groq LLM, DuckDuckGo search, and LangGraph for orchestrating multi-step workflows. New modules handle claim extraction, web search, fact verification, sentiment analysis, perspective generation, and judgment. The frontend's hero section received a minor UI update with an informational paragraph.

Changes

| File(s)/Module(s) | Change Summary |
| --- | --- |
| frontend/app/page.tsx | Added a styled, animated paragraph below the "Get Started" button indicating no sign-in is required and the service is free. |
| new-backend/app/modules/facts_check/llm_processing.py | Added functions for claim extraction and fact verification using Groq LLM API, with error handling and structured responses. |
| new-backend/app/modules/facts_check/web_search.py | Introduced DuckDuckGo-based search utility for claim evidence retrieval. |
| new-backend/app/modules/langgraph_builder.py | Added a LangGraph-based workflow builder for state-driven text processing, defining nodes for each analytical step and error handling. |
| new-backend/app/modules/langgraph_nodes/error_handler.py | Added an error handler function for workflow error reporting and state updating. |
| new-backend/app/modules/langgraph_nodes/fact_check.py | Added a function to run the fact-checking step using a utility pipeline, with input validation and error handling. |
| new-backend/app/modules/langgraph_nodes/generate_perspective.py | Added structured LLM-based perspective generation with output validation, error handling, and retry logic. |
| new-backend/app/modules/langgraph_nodes/judge.py | Added a function to rate the generated perspective using LLM, extracting and clamping a score from the model output. |
| new-backend/app/modules/langgraph_nodes/sentiment.py | Added sentiment analysis using Groq LLM, with input validation and error handling. |
| new-backend/app/modules/langgraph_nodes/store_and_send.py | Added a placeholder function for storing results, with error handling and state updating. |
| new-backend/app/modules/pipeline.py | Integrated the LangGraph workflow, exposing a function to run the full pipeline on input state. |
| new-backend/app/modules/scraper/cleaner.py | Ensured NLTK corpora are available and expanded boilerplate removal patterns for text cleaning. |
| new-backend/app/modules/scraper/extractor.py | Modified trafilatura extraction to disable fallback extraction. |
| new-backend/app/routes/routes.py | Updated /process endpoint to run the full LangGraph workflow after scraping, returning the workflow result. |
| new-backend/app/utils/fact_check_utils.py | Added a utility to orchestrate claim extraction, web search, and fact verification with error handling and rate limiting. |
| new-backend/app/utils/prompt_templates.py | Added a prompt template for generating counter-perspectives with reasoning steps, using LangChain's prompt system. |
| new-backend/pyproject.toml | Added dependencies: dotenv, duckduckgo-search, groq, langchain, langchain-community, langchain-groq, langgraph, and nltk. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant BackendAPI
    participant Scraper
    participant LangGraphWorkflow
    participant SentimentNode
    participant FactCheckNode
    participant PerspectiveNode
    participant JudgeNode
    participant StoreNode
    participant ErrorHandler

    User->>Frontend: Submit article URL
    Frontend->>BackendAPI: POST /process {url}
    BackendAPI->>Scraper: Extract and clean article text
    Scraper-->>BackendAPI: {cleaned_text}
    BackendAPI->>LangGraphWorkflow: Start workflow with {cleaned_text}
    LangGraphWorkflow->>SentimentNode: Analyze sentiment
    SentimentNode-->>LangGraphWorkflow: {sentiment}
    LangGraphWorkflow->>FactCheckNode: Extract and verify claims
    FactCheckNode->>FactCheckNode: (Claim extraction, web search, LLM verification)
    FactCheckNode-->>LangGraphWorkflow: {facts}
    LangGraphWorkflow->>PerspectiveNode: Generate counter-perspective
    PerspectiveNode-->>LangGraphWorkflow: {perspective}
    LangGraphWorkflow->>JudgeNode: Judge perspective quality
    JudgeNode-->>LangGraphWorkflow: {score}
    alt Score < 70 and retries < 3
        LangGraphWorkflow->>PerspectiveNode: Retry perspective generation
    else Score >= 70 or retries >= 3
        LangGraphWorkflow->>StoreNode: Store and send results
        StoreNode-->>LangGraphWorkflow: {status: success}
    end
    LangGraphWorkflow-->>BackendAPI: {final result}
    BackendAPI-->>Frontend: Return result

Possibly related PRs

  • Added Sentiment Analysis Node using Groq SDK #101: Adds the same paragraph element in the frontend and implements the sentiment analysis node using Groq SDK, closely related to the sentiment node and frontend update here.
  • Base Langgraph setup with node files. #99: Establishes the base LangGraph structure and node files; this PR builds upon that foundation by adding detailed implementations of fact-checking, claim extraction, web search, and perspective generation.

Poem

🐇
A hop, a skip, a backend leap,
Fact-checking claims while you sleep!
With graphs and Groq, the workflow flows,
Sentiments judged, perspectives composed.
Free for all, no sign-in fuss—
The rabbit’s made it easy for us!
🌱✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f95f8a4 and 457a50a.

📒 Files selected for processing (1)
  • new-backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • new-backend/app/modules/langgraph_nodes/sentiment.py

coderabbitai bot left a comment

Actionable comments posted: 19

🧹 Nitpick comments (18)
frontend/app/page.tsx (1)

118-121: Minor copy/UX polish – hyphenate “sign-in” for grammatical correctness

The new paragraph is a nice touch 👍.
Consider using the compound adjective “sign-in” (with a hyphen or a non-breaking space) to avoid the brief cognitive stumble some readers experience with “sign in”.

-  <p className="mt-3 text-xs text-slate-500 dark:text-slate-400 animate-fade-in delay-600">
-     No sign in required. It’s completely free.
+  <p className="mt-3 text-xs text-slate-500 dark:text-slate-400 animate-fade-in delay-600">
+     No sign-in required. It’s completely free.
   </p>
new-backend/app/modules/langgraph_nodes/store_and_send.py (3)

1-1: Consider removing commented import or implementing the functionality.

The commented import suggests incomplete implementation. Either implement the vector store functionality or remove the comment if it's not needed yet.


11-17: Improve error handling specificity.

The generic exception handling could mask specific errors. Consider logging the full exception details and potentially handling specific exception types differently.

     except Exception as e:
-        print(f"some error occured in store_and_send:{e}")
+        logger.error(f"Error occurred in store_and_send: {str(e)}", exc_info=True)
         return {
             "status": "error",
             "error_from": "store_and_send",
-            "message": f"{e}",
+            "message": str(e),
         }

19-22: Validate state structure before spreading.

Using the spread operator **state without validation could lead to issues if the state contains unexpected keys or structures.

Consider validating the state structure or being more explicit about which keys to include in the response.
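
As an illustration, the response keys could be allow-listed explicitly (the key names below are hypothetical and would need to match the real state schema):

ALLOWED_KEYS = {"cleaned_text", "sentiment", "facts", "perspective", "score", "retries"}

def build_response(state: dict) -> dict:
    # Propagate only known keys, then stamp the status field.
    filtered = {k: v for k, v in state.items() if k in ALLOWED_KEYS}
    return {**filtered, "status": "success"}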

new-backend/app/modules/facts_check/web_search.py (1)

4-15: Good implementation with room for improvement.

The core search functionality is well-implemented and returns a clean, structured format. However, consider the following improvements:

  1. Add error handling for search failures
  2. Replace print with proper logging
  3. Add input validation for the query parameter
+import logging
+
+logger = logging.getLogger(__name__)
+
 def search_duckduckgo(query, max_results=1):
+    if not query or not query.strip():
+        raise ValueError("Query cannot be empty")
+        
     with DDGS() as ddgs:
-        results = ddgs.text(query, max_results=max_results)
-        print(results)
-        return [
-            {
-                "title": r["title"],
-                "snippet": r["body"],
-                "link": r["href"]
-            }
-            for r in results
-        ]
+        try:
+            results = ddgs.text(query, max_results=max_results)
+            logger.debug(f"Search results for '{query}': {results}")
+            return [
+                {
+                    "title": r.get("title", ""),
+                    "snippet": r.get("body", ""),
+                    "link": r.get("href", "")
+                }
+                for r in results
+            ]
+        except Exception as e:
+            logger.error(f"Search failed for query '{query}': {e}")
+            return []
new-backend/app/modules/langgraph_nodes/fact_check.py (2)

1-1: Remove the malformed commented import.

The commented import line contains a trailing backslash which would cause a syntax error if uncommented.

-# from app.modules.pipeline import run_fact_check_pipeline\

14-14: Fix the typo in the error message.

"occured" should be "occurred".

-        print(f"some error occured in fact_checking:{e}")
+        print(f"some error occurred in fact_checking:{e}")
new-backend/app/utils/fact_check_utils.py (1)

28-28: Consider making the search delay configurable.

The 4-second delay is hardcoded, which may be too conservative for some use cases or insufficient for others depending on rate limits.

-        time.sleep(4)  # Add 4 second delay to prevent rate-limit
+        time.sleep(4)  # TODO: Make this configurable based on rate limit requirements
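
For example, the delay could be read from an environment variable (the variable name below is a suggestion, not something defined in this PR):

import os
import time

# Hypothetical env-driven delay; falls back to the current 4 seconds.
SEARCH_DELAY_SECONDS = float(os.getenv("SEARCH_DELAY_SECONDS", "4"))

def polite_pause() -> None:
    """Sleep between DuckDuckGo queries to stay under rate limits."""
    time.sleep(SEARCH_DELAY_SECONDS)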
new-backend/app/modules/pipeline.py (1)

41-64: Remove commented-out code.

The commented-out implementation should be removed to keep the codebase clean, especially since it's now implemented in app.utils.fact_check_utils.

-# def run_fact_check_pipeline(state):
-
-#     result = run_claim_extractor_sdk(state)
-#     # Step 1: Extract claims
-#     raw_output = result["verifiable_claims"]
-
-#     # Match any line that starts with *, -, or • followed by text
-#     claims = re.findall(r"^[\*\-•]\s+(.*)", raw_output, re.MULTILINE)
-#     claims = [claim.strip() for claim in claims if claim.strip()]
-
-#     # Step 2: Search each claim with polite delay
-#     search_results = []
-#     for claim in claims:
-#         print(f"\n🔍Searching for claim...: {claim}")
-#         try:
-#             val = search_duckduckgo(claim)
-#             val[0]["claim"] = claim
-#             search_results.append(val[0])
-#         except Exception as e:
-#             print(f"❌ Search failed for: {claim} -> {e}")
-#         time.sleep(4)  # Add 4 second delay to prevent rate-limit
-
-#     final = run_fact_verifier_sdk(search_results)
-#     return final["verifications"]
new-backend/app/modules/scraper/cleaner.py (1)

35-35: Review the copyright pattern for potential over-removal.

The copyright regex r"© \d{4}.*" might remove legitimate copyright information that could be part of the article content, especially in articles about legal matters or publishing.

Consider making this pattern more specific:

-        r"© \d{4}.*",               # copyright lines
+        r"© \d{4}[^.]*\.|© \d{4}\s*all rights reserved",  # More specific copyright patterns
new-backend/app/utils/prompt_templates.py (1)

21-32: Consider adding JSON validation guidance.

The prompt requests JSON output but doesn't specify how to handle malformed JSON responses. Consider adding instructions for the model to ensure valid JSON formatting.

You could enhance the prompt with more specific JSON formatting instructions:

 Use *step-by-step reasoning* and return your output in this JSON format:
+
+Important: Ensure your response is valid JSON. Do not include any text before or after the JSON object.
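
On the consuming side, a defensive parser can back up the prompt instruction. A sketch, not code from this PR:

import json
import re

def parse_llm_json(raw: str) -> dict:
    """Parse a JSON object from an LLM reply, tolerating stray text around it."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if not match:
            raise
        return json.loads(match.group(0))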
new-backend/app/modules/langgraph_nodes/judge.py (1)

38-43: Consider more robust score parsing.

The current regex \b(\d{1,3})\b will match any 1-3 digit number, which could potentially match unintended numbers in the response. Consider making the parsing more specific to scores.

-        # 5) Pull the first integer 0–100
-        m = re.search(r"\b(\d{1,3})\b", raw)
+        # Extract score - look for patterns like "85", "Score: 85", etc.
+        m = re.search(r"(?:score[:\s]*)?(\d{1,3})\b", raw, re.IGNORECASE)
new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)

35-38: Simplify conditional logic.

The elif after raise is unnecessary and can be simplified to improve readability.

Apply this diff to simplify the conditional:

         if not text:
             raise ValueError("Missing or empty 'cleaned_text' in state")
-        elif not facts:
+        if not facts:
             raise ValueError("Missing or empty 'facts' in state")
new-backend/app/modules/facts_check/llm_processing.py (5)

12-12: Add type hints for better code documentation.

The function signature lacks type hints, which reduces code clarity and IDE support.

-def run_claim_extractor_sdk(state):
+def run_claim_extractor_sdk(state: dict) -> dict:

23-28: Clean up string concatenation in system prompt.

The multi-line string concatenation with mixed quotes makes the code harder to read and maintain.

-                        "You are an assistant that extracts v"
-                        "erifiable factual claims from articles. "
-                        "Each claim must be short, fact-based, and"
-                        " independently verifiable through internet search. "
-                        "Only return a list of 3 clear bullet-point claims."
+                        "You are an assistant that extracts verifiable factual claims from articles. "
+                        "Each claim must be short, fact-based, and independently verifiable through internet search. "
+                        "Only return a list of 3 clear bullet-point claims."

60-60: Add type hints for function parameters and return value.

The function signature lacks type hints, which reduces code clarity and type safety.

-def run_fact_verifier_sdk(search_results):
+def run_fact_verifier_sdk(search_results: list) -> dict:

109-109: Remove debug print statement.

This appears to be leftover debug code that should be removed from production.

-            print(content)

114-122: Improve error handling specificity.

The broad exception handling makes debugging difficult and could mask important errors.

-            except Exception as parse_err:
+            except (json.JSONDecodeError, KeyError, TypeError) as parse_err:
                 print(f"❌ LLM JSON parse error: {parse_err}")
+                # Log the original content for debugging
+                print(f"Original content: {content}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9232290 and 3b54f63.

⛔ Files ignored due to path filters (1)
  • new-backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • frontend/app/page.tsx (1 hunks)
  • new-backend/app/modules/facts_check/llm_processing.py (1 hunks)
  • new-backend/app/modules/facts_check/web_search.py (1 hunks)
  • new-backend/app/modules/langgraph_builder.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/fact_check.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/generate_perspective.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/judge.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
  • new-backend/app/modules/pipeline.py (2 hunks)
  • new-backend/app/modules/scraper/cleaner.py (2 hunks)
  • new-backend/app/modules/scraper/extractor.py (1 hunks)
  • new-backend/app/routes/routes.py (2 hunks)
  • new-backend/app/utils/fact_check_utils.py (1 hunks)
  • new-backend/app/utils/prompt_templates.py (1 hunks)
  • new-backend/pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
new-backend/app/modules/langgraph_nodes/fact_check.py (1)
new-backend/app/utils/fact_check_utils.py (1)
  • run_fact_check_pipeline (8-31)
new-backend/app/modules/pipeline.py (1)
new-backend/app/modules/langgraph_builder.py (1)
  • build_langgraph (24-106)
new-backend/app/routes/routes.py (1)
new-backend/app/modules/pipeline.py (1)
  • run_langgraph_workflow (35-38)
new-backend/app/utils/fact_check_utils.py (2)
new-backend/app/modules/facts_check/web_search.py (1)
  • search_duckduckgo (4-15)
new-backend/app/modules/facts_check/llm_processing.py (2)
  • run_claim_extractor_sdk (12-57)
  • run_fact_verifier_sdk (60-137)
🪛 Pylint (3.3.7)
new-backend/app/modules/langgraph_nodes/generate_perspective.py

[refactor] 9-9: Too few public methods (0/2)

(R0903)


[refactor] 35-38: Unnecessary "elif" after "raise", remove the leading "el" from "elif"

(R1720)

🔇 Additional comments (9)
new-backend/app/modules/scraper/extractor.py (1)

40-40: Good addition for consistency in text extraction.

Adding no_fallback=True ensures more consistent extraction behavior by disabling fallback mechanisms, which is beneficial for a fact-checking pipeline where consistent text quality is important.

However, consider the trade-off: this may reduce extraction success rates for difficult-to-parse web pages.
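
For reference, the extraction call presumably looks something like this sketch (the real extractor may pass additional options):

import trafilatura

def extract_text(html: str) -> str | None:
    # no_fallback=True skips trafilatura's slower fallback extractors,
    # trading some recall on hard-to-parse pages for more consistent output.
    return trafilatura.extract(html, no_fallback=True)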

new-backend/app/routes/routes.py (1)

4-4: Good integration of the LangGraph workflow.

The import statement correctly brings in the new workflow function.

new-backend/app/modules/langgraph_nodes/store_and_send.py (1)

8-10: Implement or remove commented vector DB code.

The commented code suggests this functionality is planned but not implemented. Consider either implementing it or removing the comments to avoid confusion.

Is the vector database functionality intended to be implemented in this PR, or should the commented code be removed?

new-backend/pyproject.toml (1)

9-19: LGTM! New dependencies align with the implemented features.

The added dependencies are appropriate for the fact-checking, sentiment analysis, and LangGraph workflow functionality being introduced.

new-backend/app/modules/scraper/cleaner.py (1)

32-71: LGTM! Comprehensive boilerplate removal patterns.

The extensive list of boilerplate patterns should significantly improve text cleaning quality by removing common web article noise and navigation elements.
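
The cleaning step presumably boils down to a loop like the sketch below (the patterns shown are a small illustrative subset, not the full list added in the PR):

import re

# Illustrative subset of boilerplate patterns; the PR defines many more.
BOILERPLATE_PATTERNS = [
    r"© \d{4}.*",                   # copyright lines
    r"Follow us on .*",             # social-media calls to action
    r"Sign up for .*newsletter.*",  # newsletter prompts
]

def strip_boilerplate(text: str) -> str:
    for pattern in BOILERPLATE_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return text.strip()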

new-backend/app/modules/langgraph_nodes/judge.py (1)

15-18: To locate and inspect the actual PerspectiveOutput definition, let’s search for it and then pull in the surrounding code:

#!/bin/bash
set -e

echo "🔍 Searching for any references to PerspectiveOutput…"
rg -n "PerspectiveOutput" -A 5 || true

echo
echo "📂 Locating generate_perspective.py…"
fd generate_perspective.py || true

echo
echo "📄 Dumping first 200 lines of generate_perspective.py (if found)…"
TARGET=$(fd generate_perspective.py | head -n1)
if [ -n "$TARGET" ]; then
  echo "=== Contents of $TARGET ==="
  sed -n '1,200p' "$TARGET"
else
  echo "⚠️  generate_perspective.py not found in repo."
fi
new-backend/app/modules/langgraph_nodes/sentiment.py (1)

10-52: Well-implemented sentiment analysis function.

The implementation is clean and follows good practices:

  • Proper error handling with try-catch
  • Clear system prompt with specific instructions
  • Appropriate model parameters (temperature=0.2, max_tokens=10)
  • Consistent state management pattern
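
The bullets above roughly correspond to a pattern like the following sketch, assuming the raw Groq SDK is used (the function name and prompt wording are illustrative, not the actual node):

import os
from groq import Groq

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def generate_sentiment(state: dict) -> dict:
    completion = client.chat.completions.create(
        model="gemma2-9b-it",
        temperature=0.2,
        max_tokens=10,
        messages=[
            {"role": "system", "content": "Classify the article's sentiment as Positive, Negative, or Neutral. Reply with one word."},
            {"role": "user", "content": state["cleaned_text"]},
        ],
    )
    return {**state, "sentiment": completion.choices[0].message.content.strip()}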
new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)

9-11: Pydantic model structure is appropriate.

The pylint warning about too few public methods is a false positive for Pydantic models, which are designed as data containers rather than behavior-rich classes.

new-backend/app/modules/langgraph_builder.py (1)

24-106: Well-structured LangGraph workflow.

The overall workflow design is solid with proper:

  • State management using TypedDict
  • Error handling paths for all nodes
  • Logical flow from sentiment analysis through to final storage
  • Retry logic for perspective generation based on quality scores
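
As a rough illustration of the structure just described, a LangGraph workflow of this shape is typically wired like the sketch below (node names, the state schema, and the omission of error paths are assumptions based on this review, not a copy of build_langgraph):

from typing import TypedDict

from langgraph.graph import END, StateGraph

class PipelineState(TypedDict, total=False):
    cleaned_text: str
    sentiment: str
    facts: list
    perspective: dict
    score: int
    retries: int
    status: str

def route_after_judge(state: PipelineState) -> str:
    # Accept the perspective once it scores well enough or retries are exhausted.
    if state.get("score", 0) >= 70 or state.get("retries", 0) >= 3:
        return "store_and_send"
    return "generate_perspective"

def build_sketch(nodes: dict):
    graph = StateGraph(PipelineState)
    for name, fn in nodes.items():
        graph.add_node(name, fn)  # e.g. "generate_sentiment", "fact_check", ...
    graph.set_entry_point("generate_sentiment")
    graph.add_edge("generate_sentiment", "fact_check")
    graph.add_edge("fact_check", "generate_perspective")
    graph.add_edge("generate_perspective", "judge_perspective")
    graph.add_conditional_edges("judge_perspective", route_after_judge)
    graph.add_edge("store_and_send", END)
    return graph.compile()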

@@ -0,0 +1,11 @@


def error_handler(input):

🛠️ Refactor suggestion

Use a more descriptive parameter name.

The parameter name input shadows the built-in input() function. Consider using state or error_state for clarity.

-def error_handler(input):
+def error_handler(state):
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/error_handler.py at line 3, rename
the function parameter from `input` to a more descriptive name like `state` or
`error_state` to avoid shadowing the built-in `input()` function and improve
code clarity.

Comment on lines 4 to 6
print("Error detected!")
print(f"From: {input.get('error_from')}")
print(f"Message: {input.get('message')}")

🛠️ Refactor suggestion

Replace print statements with proper logging.

Using print() statements is not ideal for production code. Consider using the logging module for better log management and control.

+import logging
+
+logger = logging.getLogger(__name__)
+
 def error_handler(input):
-    print("Error detected!")
-    print(f"From: {input.get('error_from')}")
-    print(f"Message: {input.get('message')}")
+    logger.error("Error detected!")
+    logger.error(f"From: {input.get('error_from')}")
+    logger.error(f"Message: {input.get('message')}")
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/error_handler.py around lines 4 to 6,
replace the print statements with calls to the logging module. Import the
logging module at the top if not already done, configure a logger, and use
logger.info or logger.error to log the error messages instead of print, ensuring
better log management and control.

Comment on lines 8 to 11
return {"status": "stopped_due_to_error",
"from": [input.get("error_from")],
"error": [input.get("message")]
}

🛠️ Refactor suggestion

Add input validation and fix inconsistent return structure.

The function should validate that required keys exist in the input. Also, the return structure is inconsistent - from and error are returned as lists containing single values, which seems unnecessary.

 return {"status": "stopped_due_to_error",
-        "from": [input.get("error_from")],
-        "error": [input.get("message")]
+        "error_from": input.get("error_from", "unknown"),
+        "message": input.get("message", "unknown error")
         }
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/error_handler.py around lines 8 to
11, add validation to check that the input dictionary contains the required keys
"error_from" and "message" before accessing them. Also, modify the return
statement to return the values of "error_from" and "message" directly as strings
instead of wrapping them in single-element lists, ensuring a consistent and
simpler return structure.

Comment on lines 23 to 24
    data = run_langgraph_workflow(article_text)
    return data

🛠️ Refactor suggestion

Add error handling around workflow execution.

The workflow execution should be wrapped in try-catch blocks to handle potential failures gracefully and return appropriate HTTP error responses.

-    data = run_langgraph_workflow(article_text)
-    return data
+    try:
+        data = run_langgraph_workflow(article_text)
+        return data
+    except Exception as e:
+        logging.error(f"Workflow execution failed: {e}")
+        return {"error": "Failed to process article", "status": "error"}
🤖 Prompt for AI Agents
In new-backend/app/routes/routes.py around lines 23 to 24, the call to
run_langgraph_workflow lacks error handling. Wrap this call in a try-except
block to catch exceptions, and return an appropriate HTTP error response (such
as a 500 status code with an error message) if the workflow execution fails,
ensuring the API handles failures gracefully.

def store_and_send(state):
    # to store data in vector db
    try:
        print(state)

🛠️ Refactor suggestion

Replace print with proper logging.

Using print() for debugging is not ideal for production code. Consider using the logging module.

+import logging
+
+logger = logging.getLogger(__name__)
+
-        print(state)
+        logger.info(f"Processing state: {state}")
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/store_and_send.py at line 7, replace
the use of print(state) with a proper logging statement. Import the logging
module if not already imported, configure a logger for the module, and use
logger.debug or logger.info to log the state variable instead of printing it
directly.

Comment on lines 81 to 94
    graph.add_conditional_edges(
        "judge_perspective",
        lambda state: (
            "error_handler"
            if state.get("status") == "error"
            else (
                "store_and_send"
                if state.get("retries", 0) >= 3
                else "generate_perspective"
            )
            if state.get("score", 0) < 70
            else "store_and_send"
        )
    )

🛠️ Refactor suggestion

Simplify complex conditional logic for better readability.

The nested ternary operators in the judge_perspective conditional edge are difficult to read and maintain. Consider breaking this into a separate function for clarity.

Create a separate function to handle the judge perspective routing logic:

+def judge_perspective_router(state):
+    """Route after judge_perspective based on status, score, and retries."""
+    if state.get("status") == "error":
+        return "error_handler"
+    
+    score = state.get("score", 0)
+    retries = state.get("retries", 0)
+    
+    if score >= 70:
+        return "store_and_send"
+    
+    if retries >= 3:
+        return "store_and_send"
+    
+    return "generate_perspective"
+
     graph.add_conditional_edges(
         "judge_perspective",
-        lambda state: (
-            "error_handler"
-            if state.get("status") == "error"
-            else (
-                "store_and_send"
-                if state.get("retries", 0) >= 3
-                else "generate_perspective"
-            )
-            if state.get("score", 0) < 70
-            else "store_and_send"
-            )
+        judge_perspective_router
     )
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_builder.py around lines 81 to 94, the
nested ternary operators in the lambda for judge_perspective make the
conditional logic hard to read. Refactor by extracting this lambda into a
separate named function that takes the state as input and returns the
appropriate next step string based on the conditions. Then pass this function to
add_conditional_edges instead of the inline lambda to improve clarity and
maintainability.

Comment on lines 94 to 96
f' "original_claim": "{claim}",\n'
f' "source_link": "{source}"\n'
"}"

⚠️ Potential issue

Fix potential JSON injection in prompt template.

Using f-strings to embed user data directly into JSON template strings can cause malformed JSON if the data contains quotes or special characters.

-                            f'  "original_claim": "{claim}",\n'
-                            f'  "source_link": "{source}"\n'
+                            '  "original_claim": "' + claim.replace('"', '\\"') + '",\n'
+                            '  "source_link": "' + source.replace('"', '\\"') + '"\n'

Or better yet, use a proper JSON template approach:

json_template = {
    "verdict": "True | False | Unverifiable",
    "explanation": "...",
    "original_claim": claim,
    "source_link": source
}
template_str = json.dumps(json_template, indent=2)
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 94 to 96,
the current code uses f-strings to insert user data directly into a JSON string,
which risks JSON injection and malformed output if the data contains quotes or
special characters. To fix this, replace the f-string construction with a
dictionary representing the JSON structure, assign the user data to the
appropriate keys, and then serialize the dictionary to a JSON string using
json.dumps with proper indentation.

Comment on lines 65 to 103
        for result in search_results:
            source = result.get("link", "N/A")
            claim = result.get("claim", "N/A")
            evidence = (f"{result.get('title', '')}"
                        f"\n{result.get('snippet', '')}"
                        f"\nLink: {source}")

            chat_completion = client.chat.completions.create(
                messages=[
                    {
                        "role": "system",
                        "content": (
                            "You are a fact-checking assistant. "
                            "Your job is to determine whether the given"
                            " claim is True, False, or Unverifiable "
                            "based on the provided web search evidence."
                            " Keep it concise and structured."
                        ),
                    },
                    {
                        "role": "user",
                        "content": (
                            f"Claim: {claim}\n\n"
                            f"Web Evidence:\n{evidence}\n\n"
                            "Based on this evidence, is the claim true?\n"
                            "Respond only in this JSON format:\n\n"
                            "{\n"
                            ' "verdict": "True" | "False" | "Unverifiable",\n'
                            ' "explanation": "...",\n'
                            f' "original_claim": "{claim}",\n'
                            f' "source_link": "{source}"\n'
                            "}"
                        ),
                    },
                ],
                model="gemma2-9b-it",
                temperature=0.3,
                max_tokens=256,
            )

🛠️ Refactor suggestion

Optimize API calls and fix potential performance issues.

Making sequential API calls for each search result could be slow and expensive. Consider batching or implementing rate limiting.

For better performance, consider:

  1. Implementing concurrent API calls with proper rate limiting
  2. Batching multiple claims in a single API call if the model supports it
  3. Adding retry logic for failed API calls

Example implementation for concurrent processing:

import asyncio
from concurrent.futures import ThreadPoolExecutor

def process_single_claim(result):
    # Move the existing logic here
    pass

# In the main function:
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(process_single_claim, result) for result in search_results]
    results_list = [future.result() for future in futures]
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py between lines 65 and
103, the code makes sequential API calls for each search result, which can cause
slow performance and high costs. Refactor by moving the existing API call logic
into a separate function that processes a single claim, then use concurrent
execution (e.g., ThreadPoolExecutor or asyncio) to run multiple calls in
parallel with a controlled number of workers to avoid rate limits. Additionally,
consider batching multiple claims per API call if supported and add retry logic
to handle transient failures gracefully.

Comment on lines 125 to 129
        return {
            "claim": claim,
            "verifications": results_list,
            "status": "success",
        }

⚠️ Potential issue

Fix incorrect return value for 'claim' field.

The returned claim value comes from the last iteration of the loop, which is likely not the intended behavior.

         return {
-            "claim": claim,
             "verifications": results_list,
             "status": "success",
         }

If you need to return claims, consider returning all claims or restructuring the return value.

🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 125 to
129, the 'claim' field in the return dictionary incorrectly uses the last loop
iteration's claim value. To fix this, modify the code to return all claims
collected during the loop, either as a list or by restructuring the return value
to include all relevant claims instead of just one.


load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

🛠️ Refactor suggestion

Add error handling for missing API key.

The Groq client is initialized at module level without checking if the API key exists, which could cause runtime errors when the module is imported.

-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+def get_groq_client():
+    api_key = os.getenv("GROQ_API_KEY")
+    if not api_key:
+        raise ValueError("GROQ_API_KEY environment variable is required")
+    return Groq(api_key=api_key)
+
+client = get_groq_client()
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py at line 9, the Groq
client is initialized directly with an API key from the environment without
verifying its presence. To fix this, add a check to confirm the GROQ_API_KEY
environment variable is set before initializing the client. If the key is
missing, raise a clear exception or handle the error gracefully to prevent
runtime errors during module import.

@ManavSarkar merged commit 8d48e5b into main Jul 18, 2025
1 check passed