Backend deployment #106

Merged
ManavSarkar merged 7 commits into main from backend-deployment on Jul 18, 2025

Conversation

ParagGhatage (Collaborator) commented Jul 7, 2025

Tasks Done:

  • Added Dockerfile and .dockerignore for backend deployment.

  • Created a Hugging Face Space for backend deployment and configured it.

  • Deployed the backend
    (URL: https://thunder1245-perspective-backend.hf.space/api/ )

  • Added a GitHub Actions workflow that deploys the backend to the Hugging Face Space on each push to the main branch (a rough sketch of the sync step follows the log below).

  • Tested the GitHub Actions workflow locally using act:

[🚀 Deploy Backend to HF Space/deploy   ]   ✅  Success - Main 📤 Sync backend code [3.237888166s]
[🚀 Deploy Backend to HF Space/deploy   ] ⭐ Run Complete job
[🚀 Deploy Backend to HF Space/deploy   ] Cleaning up container for job deploy
[🚀 Deploy Backend to HF Space/deploy   ]   ✅  Success - Complete job
[🚀 Deploy Backend to HF Space/deploy   ] 🏁  Job succeeded
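
For context, the "Sync backend code" step can be done with the huggingface_hub client in a few lines. The sketch below is only illustrative: the Space id and the HF_TOKEN secret name are assumptions, and the actual deploy-backend-to-hf.yml workflow may push via git instead.

```python
# Hypothetical sketch of the "Sync backend code" step; not the actual workflow.
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])  # assumes an HF_TOKEN repo secret

# Upload the backend folder to the Docker-based Space so each push to main
# redeploys the API served at https://thunder1245-perspective-backend.hf.space/api/
api.upload_folder(
    folder_path="backend",
    repo_id="Thunder1245/perspective-backend",  # assumed Space id
    repo_type="space",
)
```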

Summary by CodeRabbit

  • New Features

    • Added a new AI prompt template for generating counter-perspectives to articles.
    • Introduced a fact-checking pipeline integrating claim extraction, web search, and verification.
    • Added Pinecone vector store integration for managing embeddings.
    • Implemented new backend modules for generating, judging, and storing perspectives using advanced language models.
    • Enhanced frontend analysis loading and results pages with improved state management and error handling.
  • Bug Fixes

    • Improved the analysis loading workflow to handle API errors and ensure progress only starts after a successful response.
    • The analysis results page now displays the latest analysis data retrieved from session storage.
  • Style

    • Minor formatting and semicolon additions to the Bias Meter component for consistency.
  • Chores

    • Updated frontend dependencies to include axios for API requests.
    • Removed unused backend and new-backend files, dependencies, and configuration to streamline the codebase.
    • Added Docker and GitHub Actions workflows for backend deployment and containerization.
    • Added backend README and startup scripts for improved developer experience.

coderabbitai bot (Contributor) commented Jul 7, 2025

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (2)
  • backend/app/modules/langgraph_nodes/fact_check.py
  • backend/app/modules/pipeline.py

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This change removes the legacy backend and test files, replacing them with a new modular backend architecture. It introduces new FastAPI app setup, vector store integration, fact-checking and perspective generation modules, embedding utilities, and a Docker-based deployment workflow. The frontend is updated for analysis result handling, and dependency management is migrated to pyproject.toml.
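
As a concrete illustration of the new entrypoint described above, backend/main.py plausibly looks something like the sketch below. The /api/process route name is taken from the frontend call shown later in this review, and the PORT/7860 default from the deployment comments; everything else is an assumption rather than the actual implementation.

```python
# Illustrative sketch only; the real backend wires in the LangGraph pipeline.
import os

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ProcessRequest(BaseModel):
    url: str  # article URL submitted by the frontend


@app.post("/api/process")
async def process(req: ProcessRequest):
    # The real handler runs fact checking, perspective generation and
    # vector storage, then returns the analysis result to the frontend.
    return {"status": "received", "url": req.url}


if __name__ == "__main__":
    port = int(os.environ.get("PORT", 7860))  # Hugging Face Spaces sets PORT
    uvicorn.run(app, host="0.0.0.0", port=port)
```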

Changes

Files/Groups and Change Summary

  • backend/app/main.py, backend/app/routes.py, backend/app/services/*, backend/app/prompts/*, backend/app/scrapers/*, backend/app/test_perspective.py, backend/requirements.txt: Deleted legacy backend FastAPI app, all API routes, prompt templates, service modules, scrapers, and test script.
  • new-backend/main.py, new-backend/app/modules/langgraph_nodes/*, new-backend/app/utils/prompt_templates.py, new-backend/README.md: Deleted new-backend FastAPI entrypoint, all node modules, and documentation (now replaced by new backend structure).
  • backend/main.py, backend/app/db/vector_store.py, backend/app/modules/facts_check/*, backend/app/modules/langgraph_builder.py, backend/app/modules/langgraph_nodes/*, backend/app/modules/vector_store/*, backend/app/utils/*: Added new FastAPI app, Pinecone vector store setup, fact-checking modules, state graph builder, node implementations, embedding, chunking, and utility functions.
  • backend/pyproject.toml: Added new dependencies: search, NLP, LangChain, Pinecone, sentence-transformers, etc.
  • backend/Dockerfile, backend/.dockerignore, backend/README.md, backend/start.sh: Added Dockerfile, .dockerignore, new backend README, and startup script for containerized deployment.
  • .github/workflows/deploy-backend-to-hf.yml: Added GitHub Actions workflow for deploying backend to Hugging Face Spaces.
  • .gitignore: Added .github/act-events/ and .secrets to ignored files.
  • frontend/app/analyze/loading/page.tsx: Refactored analysis workflow to await API call before progress simulation; improved error handling.
  • frontend/app/analyze/results/page.tsx: Added retrieval and display of analysis results from sessionStorage as formatted JSON.
  • frontend/components/bias-meter.tsx: Minor formatting and semicolon additions; no logic changes.
  • frontend/package.json: Added axios dependency.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant Backend API
    participant FactCheck
    participant PerspectiveGen
    participant VectorStore

    User->>Frontend: Submit article URL
    Frontend->>Backend API: POST /api/analyze {url}
    Backend API->>FactCheck: Extract and verify claims
    FactCheck->>FactCheck: Web search & LLM verification
    FactCheck-->>Backend API: Verified facts
    Backend API->>PerspectiveGen: Generate counter-perspective
    PerspectiveGen-->>Backend API: Perspective result
    Backend API->>VectorStore: Store embeddings/chunks
    VectorStore-->>Backend API: Store confirmation
    Backend API-->>Frontend: Analysis results
    Frontend-->>User: Display results

Possibly related PRs

  • AOSSIE-Org/Perspective#97: Migrates backend to a new modular FastAPI structure, directly corresponding to the removal and replacement of the legacy backend in this PR.

Poem

A backend reborn, the old swept away,
With vectors and facts, we’re ready to play!
From claims to perspectives, embeddings we store,
Docker and scripts help us open the door.
The frontend now fetches results with delight—
This bunny’s code garden is looking just right!
🥕✨



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 20

🔭 Outside diff range comments (2)
new-backend/pyproject.toml (1)

6-6: Relax Python version requirement to 3.11+

Most of Python 3.13’s new capabilities (enhanced REPL, experimental free-threaded mode, preliminary JIT, richer tracebacks, stdlib import optimizations, etc.) are either developer-convenience features or still experimental. All core AI/ML and web-search libraries you’re using are fully compatible with Python 3.11+, so you can broaden your deployment options by targeting 3.11 instead:

-requires-python = ">=3.13"
+requires-python = ">=3.11"
new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)

49-55: Fix typo and improve error handling.

There's a typo in the error message and the exception handling could be more specific.

     except Exception as e:
-        print(f"some error occured in generate_perspective:{e}")
+        print(f"Error occurred in generate_perspective: {e}")
         return {
             "status": "error",
             "error_from": "generate_perspective",
-            "message": f"{e}",
+            "message": str(e),
         }

Consider catching specific exceptions (e.g., ValueError, KeyError) for better error handling.

🧹 Nitpick comments (20)
frontend/app/analyze/results/page.tsx (1)

23-23: Consider adding type safety for analysis data.

The analysisData state is typed as null but could benefit from proper TypeScript typing based on the expected API response structure.

-const [analysisData, setAnalysisData] = useState(null)
+const [analysisData, setAnalysisData] = useState<AnalysisResult | null>(null)

Consider defining an interface for the expected analysis result structure.

frontend/app/analyze/loading/page.tsx (1)

95-102: Optimize progress bar animation performance.

Updating progress every 100ms may cause unnecessary re-renders.

-const progressInterval = setInterval(() => {
-  setProgress((prev) => {
-    if (prev < 100) {
-      return prev + 1
-    }
-    return prev
-  })
-}, 100)
+const progressInterval = setInterval(() => {
+  setProgress((prev) => {
+    if (prev < 100) {
+      return Math.min(prev + 2, 100) // Increment by 2 every 200ms instead
+    }
+    return prev
+  })
+}, 200)

This reduces the update frequency while maintaining smooth animation.

new-backend/app/modules/scraper/cleaner.py (1)

2-10: Consider pre-downloading NLTK data during Docker build instead of runtime.

The current implementation downloads NLTK corpora at module import time, which can cause delays during application startup and potential network issues in production environments.

For containerized deployments, consider downloading NLTK data during the Docker build process instead:

# In Dockerfile
RUN python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt_tab')"

Then simplify the code to:

-try:
-    nltk.data.find('corpora/stopwords')
-    nltk.data.find('corpora/punkt_tab')
-
-except LookupError:
-    nltk.download('stopwords')
-    nltk.download('punkt_tab')
new-backend/app/modules/facts_check/web_search.py (1)

6-8: Consider using a more secure method for API key handling.

Storing API keys in environment variables is a good practice, but consider additional security measures for production deployments.

For enhanced security, consider:

  • Using a secrets management service
  • Implementing API key rotation
  • Adding logging for security monitoring (without exposing the key)
 def search_with_serpapi(query, max_results=1):
     api_key = os.getenv("SERPAPI_KEY")
     if not api_key:
-        raise ValueError("SERPAPI_KEY not set in environment")
+        raise ValueError("SERPAPI_KEY not set in environment")
+    
+    # Log API usage for monitoring (without exposing key)
+    print(f"Performing search with query: {query[:50]}...")
new-backend/.dockerignore (1)

1-2: Consider adding more comprehensive exclusions for production deployment.

The current exclusions are good, but consider adding common development and build artifacts:

 /.venv
 */.env
+*.pyc
+__pycache__/
+.git/
+.pytest_cache/
+*.log
+.DS_Store
+node_modules/
+.coverage
+htmlcov/
new-backend/app/modules/langgraph_nodes/sentiment.py (1)

34-34: Consider reducing temperature for more deterministic sentiment analysis.

A temperature of 0.2 might introduce unnecessary randomness for sentiment analysis, which should be deterministic.

-            temperature=0.2,
+            temperature=0.0,
new-backend/start.sh (2)

4-5: Consider optimizing the uv installation check.

The script installs uv unconditionally, which may be inefficient if it's already present. Consider checking if uv is available before installation.

# Install uv if not present
-pip install uv
+if ! command -v uv &> /dev/null; then
+    echo "Installing uv..."
+    pip install uv
+fi

8-9: Add error handling for critical operations.

Consider adding validation to ensure the sync operation succeeds before attempting to run the application.

# Sync environment and run app
-uv sync
-uv run main.py
+echo "Syncing dependencies..."
+uv sync || { echo "Failed to sync dependencies"; exit 1; }
+echo "Starting application..."
+uv run main.py
new-backend/app/utils/generate_chunk_id.py (1)

4-8: Consider collision risk with truncated hash.

The function truncates the SHA-256 hash to 15 characters, which reduces the collision resistance. While this is likely acceptable for article IDs, consider documenting this limitation or using a longer hash if uniqueness is critical.

For better collision resistance, consider using a longer hash:

-    return f"article-{hashed_text[:15]}"
+    return f"article-{hashed_text[:32]}"  # Use 32 characters for better uniqueness

Alternatively, add documentation about the collision risk:

def generate_id(text: str) -> str:
+    """Generate a unique ID for article text using SHA-256 hash.
+    
+    Note: Hash is truncated to 15 characters. While collision risk is low,
+    consider using full hash for critical applications.
+    """
new-backend/app/utils/prompt_templates.py (1)

3-32: Well-structured prompt template with minor enhancement suggestions.

The prompt template is well-designed with clear sections and structured output format. Consider adding guidance for edge cases where facts might be contradictory or insufficient.

Consider adding instructions for handling edge cases:

Generate a logical and respectful *opposite perspective* to the article.
+If the verified facts contradict the article's claims, acknowledge this in your reasoning.
+If insufficient facts are available, clearly state this limitation.
Use *step-by-step reasoning* and return your output in this JSON format:
new-backend/main.py (2)

28-30: Good deployment configuration with minor improvement suggestion.

The dynamic port configuration and host binding to 0.0.0.0 are appropriate for container deployment. Consider adding validation for the port value.

-    port = int(os.environ.get("PORT",  7860))
+    port = int(os.environ.get("PORT", 7860))
+    if not 1 <= port <= 65535:
+        raise ValueError(f"Invalid port number: {port}")

26-27: Consider adding environment validation.

While the import placement is fine, consider validating required environment variables at startup to fail fast if configuration is missing.

 if __name__ == "__main__":
     import uvicorn
     import os
+    
+    # Validate required environment variables
+    required_env_vars = ["GROQ_API_KEY", "PINECONE_API_KEY"]  # Adjust based on actual requirements
+    missing_vars = [var for var in required_env_vars if not os.getenv(var)]
+    if missing_vars:
+        raise EnvironmentError(f"Missing required environment variables: {missing_vars}")
new-backend/Dockerfile (1)

12-24: Consider cache configuration consistency.

The cache directory is set up but --no-cache flag is used during installation. This might be redundant or conflicting.

Consider either using the cache or removing the cache directory setup:

# Option 1: Use cache
-RUN uv sync --locked --no-cache
+RUN uv sync --locked

# Option 2: Remove cache directory if not using it
-ENV UV_CACHE_DIR=/app/.uv-cache
-RUN mkdir -p /app/.uv-cache && \
-    adduser --disabled-password --gecos "" appuser && \
-    chown -R appuser:appuser /app
+RUN adduser --disabled-password --gecos "" appuser && \
+    chown -R appuser:appuser /app
new-backend/app/modules/langgraph_nodes/fact_check.py (1)

14-14: Fix spelling errors in error messages.

There are typos in the error logging statements: "occured" should be "occurred".

Apply this diff to fix the spelling:

-            print(f"some error occured in fact_checking:{error_message}")
+            print(f"some error occurred in fact_checking:{error_message}")
-        print(f"some error occured in fact_checking:{e}")
+        print(f"some error occurred in fact_checking:{e}")

Also applies to: 22-22

new-backend/app/utils/fact_check_utils.py (2)

46-47: Consider explicit error handling for the final verification step.

The function returns a tuple where the second element can be None on success. Consider making the return type more explicit or handle potential failures in the verification step.

-    final = run_fact_verifier_sdk(search_results)
-    return final.get("verifications", []), None
+    final = run_fact_verifier_sdk(search_results)
+    if final.get("status") != "success":
+        return [], "Fact verification failed."
+    return final.get("verifications", []), None

40-40: Consider making the rate limiting delay configurable.

The hardcoded 5-second delay works for avoiding rate limits but could be made configurable for different environments or API providers.

-        time.sleep(5)  # ⏱️ Gentle delay to avoid DuckDuckGo ratelimit
+        time.sleep(5)  # ⏱️ Gentle delay to avoid SerpAPI ratelimit

Note: The comment mentions DuckDuckGo but the code uses SerpAPI.

new-backend/app/modules/langgraph_nodes/judge.py (1)

6-10: Consider increasing max_tokens for more reliable scoring.

The max_tokens=10 limit might be too restrictive for the LLM to provide consistent scoring responses, especially if the model occasionally includes explanatory text before the score.

     groq_llm = ChatGroq(
         model="gemma2-9b-it",
         temperature=0.0,
-        max_tokens=10,
+        max_tokens=50,
     )
new-backend/app/modules/vector_store/chunk_rag_data.py (1)

4-4: Add type hints for better code documentation.

The function lacks type hints which would improve code maintainability and IDE support. Consider adding them based on the expected input/output types.

+from typing import List, Dict, Any, Union
+
-def chunk_rag_data(data):
+def chunk_rag_data(data: Dict[str, Any]) -> List[Dict[str, Any]]:
new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)

14-19: Consider using environment variables for model configuration.

The model name and temperature are hardcoded. For better flexibility across different environments, consider loading these from environment variables.

+import os
+
-my_llm = "llama-3.3-70b-versatile"
+my_llm = os.getenv("GROQ_MODEL_NAME", "llama-3.3-70b-versatile")

 llm = ChatGroq(
     model=my_llm,
-    temperature=0.7
+    temperature=float(os.getenv("GROQ_TEMPERATURE", "0.7"))
 )
new-backend/app/modules/facts_check/llm_processing.py (1)

22-28: Fix spacing in prompt content.

There's a missing space in the prompt text that could affect the LLM's understanding.

                     "content": (
                         "You are an assistant that extracts "
                         "verifiable factual claims from articles. "
-                        "Each claim must be short, fact-based, and"
-                        " independently verifiable through internet search. "
+                        "Each claim must be short, fact-based, and "
+                        "independently verifiable through internet search. "
                         "Only return a list of 3 clear bullet-point claims."
                     ),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 501e9c2 and 56804b6.

⛔ Files ignored due to path filters (2)
  • frontend/package-lock.json is excluded by !**/package-lock.json
  • new-backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (39)
  • backend/app/main.py (0 hunks)
  • backend/app/prompts/opposite_perspective.py (0 hunks)
  • backend/app/prompts/related_topics.py (0 hunks)
  • backend/app/routes.py (0 hunks)
  • backend/app/scrapers/article_scraper.py (0 hunks)
  • backend/app/scrapers/clean_data.py (0 hunks)
  • backend/app/services/ai_service.py (0 hunks)
  • backend/app/services/analysis_service.py (0 hunks)
  • backend/app/services/counter_service.py (0 hunks)
  • backend/app/services/related_topics.py (0 hunks)
  • backend/app/services/summarization_service.py (0 hunks)
  • backend/app/test_perspective.py (0 hunks)
  • backend/requirements.txt (0 hunks)
  • frontend/app/analyze/loading/page.tsx (2 hunks)
  • frontend/app/analyze/results/page.tsx (3 hunks)
  • frontend/components/bias-meter.tsx (3 hunks)
  • frontend/package.json (2 hunks)
  • new-backend/.dockerignore (1 hunks)
  • new-backend/Dockerfile (1 hunks)
  • new-backend/README.md (1 hunks)
  • new-backend/app/db/vector_store.py (1 hunks)
  • new-backend/app/modules/facts_check/llm_processing.py (1 hunks)
  • new-backend/app/modules/facts_check/web_search.py (1 hunks)
  • new-backend/app/modules/langgraph_builder.py (5 hunks)
  • new-backend/app/modules/langgraph_nodes/fact_check.py (2 hunks)
  • new-backend/app/modules/langgraph_nodes/generate_perspective.py (2 hunks)
  • new-backend/app/modules/langgraph_nodes/judge.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
  • new-backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
  • new-backend/app/modules/scraper/cleaner.py (1 hunks)
  • new-backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
  • new-backend/app/modules/vector_store/embed.py (1 hunks)
  • new-backend/app/utils/fact_check_utils.py (1 hunks)
  • new-backend/app/utils/generate_chunk_id.py (1 hunks)
  • new-backend/app/utils/prompt_templates.py (1 hunks)
  • new-backend/app/utils/store_vectors.py (1 hunks)
  • new-backend/main.py (1 hunks)
  • new-backend/pyproject.toml (1 hunks)
  • new-backend/start.sh (1 hunks)
💤 Files with no reviewable changes (13)
  • backend/requirements.txt
  • backend/app/main.py
  • backend/app/prompts/related_topics.py
  • backend/app/prompts/opposite_perspective.py
  • backend/app/scrapers/clean_data.py
  • backend/app/services/analysis_service.py
  • backend/app/services/summarization_service.py
  • backend/app/scrapers/article_scraper.py
  • backend/app/services/related_topics.py
  • backend/app/test_perspective.py
  • backend/app/services/counter_service.py
  • backend/app/services/ai_service.py
  • backend/app/routes.py
🧰 Additional context used
🧬 Code Graph Analysis (4)
new-backend/app/modules/langgraph_nodes/fact_check.py (1)
new-backend/app/utils/fact_check_utils.py (1)
  • run_fact_check_pipeline (10-47)
new-backend/app/modules/langgraph_nodes/store_and_send.py (3)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
  • chunk_rag_data (4-73)
new-backend/app/modules/vector_store/embed.py (1)
  • embed_chunks (7-30)
new-backend/app/utils/store_vectors.py (1)
  • store (10-32)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
new-backend/app/utils/generate_chunk_id.py (1)
  • generate_id (4-8)
new-backend/app/modules/langgraph_builder.py (2)
new-backend/app/modules/langgraph_nodes/sentiment.py (1)
  • run_sentiment_sdk (10-53)
new-backend/app/modules/langgraph_nodes/error_handler.py (1)
  • error_handler (3-11)
🪛 Ruff (0.11.9)
new-backend/app/utils/store_vectors.py

32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

new-backend/app/modules/langgraph_nodes/store_and_send.py

13-13: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


15-15: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


21-21: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

new-backend/app/db/vector_store.py

14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


40-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (22)
frontend/package.json (1)

41-41: Axios dependency is up to date and secure

Version ^1.10.0 is the latest stable release (June 14, 2025) and has no known security advisories. No further action is needed.

frontend/components/bias-meter.tsx (1)

1-79: LGTM! Excellent formatting improvements.

The addition of semicolons and improved JSX formatting enhances code readability and aligns with TypeScript best practices.

new-backend/README.md (1)

1-10: LGTM! Proper Hugging Face Spaces configuration.

The YAML front matter is correctly configured for Hugging Face Spaces deployment with Docker SDK, which aligns with the PR objectives for backend deployment.

new-backend/app/modules/langgraph_nodes/sentiment.py (2)

35-35: The reduced token limit is appropriate for sentiment analysis.

Reducing max_tokens to 3 makes sense since the expected output is a single word (positive/negative/neutral), which helps ensure concise responses and reduces API costs.


39-39: Good practice to normalize sentiment output.

Converting sentiment to lowercase ensures consistent output format for downstream processing.

new-backend/app/utils/generate_chunk_id.py (1)

5-6: LGTM: Good input validation.

The input validation properly checks for both empty strings and correct type, which prevents common errors.

new-backend/app/utils/prompt_templates.py (1)

21-31: Good JSON structure specification.

The JSON format specification is clear and will help ensure consistent output parsing. The reasoning steps format encourages structured thinking.

new-backend/app/utils/store_vectors.py (1)

10-28: Well-structured function with good validation and logging.

The function properly validates input, handles the Pinecone upsert operation, and provides informative logging. The structure and error handling approach are solid.
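
A minimal sketch of what such a store() helper might look like, assuming the vector dicts produced by embed_chunks carry id, values, and metadata keys; the real function's name and signature may differ.

```python
# Hypothetical sketch of store_vectors.store(); names and signature assumed.
def store(index, vectors: list[dict]) -> None:
    """Upsert embedded chunks into the Pinecone index."""
    if not vectors:
        raise ValueError("No vectors to store")

    # Each vector looks like {"id": ..., "values": [...], "metadata": {...}}
    index.upsert(vectors=vectors)
    print(f"Stored {len(vectors)} vectors in Pinecone")
```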

new-backend/Dockerfile (1)

1-31: Good security practices with non-root user and proper structure.

The Dockerfile follows security best practices by using a non-root user and properly sets up the working directory. The port configuration for Hugging Face deployment is appropriate.

new-backend/app/modules/langgraph_nodes/fact_check.py (1)

11-20: Well-integrated pipeline with proper error handling.

The integration with the fact-checking pipeline is clean and maintains proper error handling. The function correctly handles both pipeline errors and exceptions while preserving the state structure.

new-backend/app/db/vector_store.py (2)

17-34: Well-designed index management with proper constants.

The index creation logic is sound with appropriate constants and conditional creation. The serverless specification for AWS US East 1 is properly configured.


5-7: Good practice for environment variable validation.

Proper validation of required environment variables with clear error messages.
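
For reference, conditional index creation with the Pinecone serverless spec mentioned above typically looks like the sketch below. The index name is an assumption; the 384 dimension and the aws/us-east-1 region come from the review comments.

```python
# Illustrative sketch of the index setup, not the repository's exact code.
import os

from pinecone import Pinecone, ServerlessSpec

api_key = os.getenv("PINECONE_API_KEY")
if not api_key:
    raise ValueError("PINECONE_API_KEY environment variable is not set")

pc = Pinecone(api_key=api_key)

INDEX_NAME = "perspective-index"  # assumed name
DIMENSION = 384  # matches all-MiniLM-L6-v2 embeddings

if INDEX_NAME not in pc.list_indexes().names():
    pc.create_index(
        name=INDEX_NAME,
        dimension=DIMENSION,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(INDEX_NAME)
```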

new-backend/app/modules/vector_store/embed.py (3)

13-18: Excellent input validation with clear error messages.

The validation logic properly checks chunk structure and provides detailed error messages including the problematic index, which aids in debugging.


20-30: Well-structured embedding and vector creation process.

The function efficiently processes text embeddings and creates properly formatted vectors for Pinecone storage. The data structure aligns well with the expected format.


4-4: Appropriate model choice for general text embeddings.

The "all-MiniLM-L6-v2" model is a good choice for general text embeddings, providing a good balance between performance and accuracy. The 384-dimensional output aligns with the vector store configuration.

new-backend/app/modules/langgraph_nodes/judge.py (1)

29-44: Excellent robust response parsing and score validation.

The implementation handles multiple response formats gracefully and includes proper bounds checking for the extracted score. The regex pattern effectively extracts integer values from the response.
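
The kind of parsing being praised here can be sketched as follows; the 1-10 score range and the function name are assumptions, not the actual code.

```python
# Hypothetical sketch of robust score extraction; not the repository's code.
import re


def extract_score(response_text: str, low: int = 1, high: int = 10) -> int:
    """Pull the first integer out of the LLM response and clamp it to range."""
    match = re.search(r"\d+", response_text)
    if not match:
        raise ValueError(f"No score found in response: {response_text!r}")
    return max(low, min(high, int(match.group())))
```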

new-backend/app/modules/langgraph_builder.py (2)

14-22: Excellent addition of typed state management.

The MyState TypedDict provides clear type definitions for all state variables, improving code maintainability and IDE support. The type annotations are comprehensive and match the expected data flow.
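
For readers unfamiliar with typed LangGraph state, a hypothetical shape along these lines is sketched below; the field names are inferred from the nodes discussed in this review, not copied from the source.

```python
# Hypothetical example of a typed LangGraph state; field names are guesses.
from typing import Optional, TypedDict


class MyState(TypedDict, total=False):
    article: str               # cleaned article text from the scraper
    sentiment: str             # "positive" / "negative" / "neutral"
    facts: list[dict]          # verified claims with verdict and explanation
    perspective: dict          # generated counter-perspective
    status: str                # "success" or "error"
    error_from: Optional[str]  # node that reported the error, if any
    message: Optional[str]     # error message, if any
```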


56-102: Verify LangGraph dependency for conditional edges and terminal marker

We couldn’t locate StateGraph or its add_conditional_edges implementation in the repo, nor could we import langgraph in the sandbox. Please confirm that your installed LangGraph version’s StateGraph API supports:

  • The add_conditional_edges(source: str, condition: Callable) method
  • The "__end__" terminal marker

Typical checks:

pip show langgraph  
python - <<EOF
import inspect
from langgraph.graph import StateGraph
print(inspect.signature(StateGraph.add_conditional_edges))
EOF

If unsupported, either revert to set_conditional_edges or bump your LangGraph dependency accordingly.
[new-backend/app/modules/langgraph_builder.py:56–102]

new-backend/app/modules/vector_store/chunk_rag_data.py (2)

6-32: Excellent comprehensive validation of input data.

The validation logic properly checks for required fields, validates data types, and handles both dictionary and object-based perspective data. The safety checks for perspective object attributes are particularly well-implemented.


44-68: Robust fact validation and chunk generation.

The implementation properly validates all required fact fields and generates well-structured chunks with comprehensive metadata. The enumeration approach for fact indexing is clean and maintainable.

new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)

35-38: Good input validation!

The validation of required state fields is well implemented with clear error messages.

new-backend/app/modules/facts_check/llm_processing.py (1)

108-110: Good markdown stripping implementation!

The regex pattern correctly handles both opening and closing markdown code blocks.

Comment on lines 42 to 49
const storedData = sessionStorage.getItem("analysisResult")
if (storedData) {
setAnalysisData(JSON.parse(storedData))
} else {
// fallback if user visits results page directly
// maybe redirect or show error
console.warn("No analysis result found")
}

🛠️ Refactor suggestion

Add error handling for sessionStorage parsing.

The JSON.parse operation could throw an error if the stored data is invalid.

const storedData = sessionStorage.getItem("analysisResult")
if (storedData) {
-  setAnalysisData(JSON.parse(storedData))
+  try {
+    setAnalysisData(JSON.parse(storedData))
+  } catch (error) {
+    console.error("Failed to parse analysis data:", error)
+    // Consider redirecting to analyze page or showing error message
+  }
} else {
  // fallback if user visits results page directly
  // maybe redirect or show error
  console.warn("No analysis result found")
}
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 42 to 49, the JSON.parse
call on sessionStorage data can throw an error if the stored data is invalid.
Wrap the JSON.parse call in a try-catch block to handle any parsing errors
gracefully. In the catch block, log the error and handle the failure case, such
as clearing the invalid data or showing an error message to the user.

Comment on lines 78 to 83
<div className="p-4">
<h1 className="text-2xl font-bold mb-4">Analysis Results</h1>
<pre className="bg-black p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap">
{JSON.stringify(analysisData, null, 2)}
</pre>
</div>

🛠️ Refactor suggestion

Improve JSON display styling and user experience.

The hardcoded black background doesn't respect theme preferences and raw JSON display might not be user-friendly.

-<div className="p-4">
-  <h1 className="text-2xl font-bold mb-4">Analysis Results</h1>
-  <pre className="bg-black p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap">
-    {JSON.stringify(analysisData, null, 2)}
-  </pre>
-</div>
+{analysisData && (
+  <div className="p-4">
+    <h1 className="text-2xl font-bold mb-4">Analysis Results</h1>
+    <pre className="bg-muted p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap border">
+      {JSON.stringify(analysisData, null, 2)}
+    </pre>
+  </div>
+)}

Consider replacing the raw JSON display with a structured, user-friendly presentation of the analysis results.

🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 78 to 83, the JSON display
uses a hardcoded black background and raw JSON stringifying, which ignores theme
preferences and is not user-friendly. Replace the raw JSON <pre> block with a
structured, styled component that formats the analysisData into readable
sections or tables, and use theme-aware styling instead of a fixed black
background to improve user experience and accessibility.

Comment on lines 64 to 66
const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
  url: storedUrl,
})

⚠️ Potential issue

Extract hardcoded API URL to environment configuration.

The hardcoded API endpoint should be configurable and not embedded in the code.

-const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
+const res = await axios.post(process.env.NEXT_PUBLIC_API_URL || "https://Thunder1245-perspective1.hf.space/api/process", {
  url: storedUrl,
})

Add the API URL to your environment variables in .env.local:

NEXT_PUBLIC_API_URL=https://Thunder1245-perspective1.hf.space/api/process
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 64 to 66, the API URL is
hardcoded in the axios.post call. To fix this, move the URL to an environment
variable by adding
NEXT_PUBLIC_API_URL=https://Thunder1245-perspective1.hf.space/api/process in
.env.local, then replace the hardcoded string with
process.env.NEXT_PUBLIC_API_URL in the axios.post call to make the endpoint
configurable.

Comment on lines 63 to 78
try {
  const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
    url: storedUrl,
  })

  // Save response to sessionStorage
  sessionStorage.setItem("analysisResult", JSON.stringify(res.data))

  // optional logging
  console.log("Analysis result saved")
  console.log(res)
} catch (err) {
  console.error("Failed to process article:", err)
  router.push("/analyze") // fallback in case of error
  return
}

🛠️ Refactor suggestion

Add user feedback during API processing.

The API call happens silently, leaving users unaware of the actual processing status.

Consider adding a loading state and user feedback:

const [currentStep, setCurrentStep] = useState(0)
const [progress, setProgress] = useState(0)
const [articleUrl, setArticleUrl] = useState("")
+const [isProcessing, setIsProcessing] = useState(false)
+const [apiError, setApiError] = useState<string | null>(null)
const router = useRouter()

// In the runAnalysis function:
try {
+  setIsProcessing(true)
  const res = await axios.post(process.env.NEXT_PUBLIC_API_URL, {
    url: storedUrl,
  })
+  setIsProcessing(false)
  
  // Save response to sessionStorage
  sessionStorage.setItem("analysisResult", JSON.stringify(res.data))
} catch (err) {
+  setIsProcessing(false)
+  setApiError("Failed to process article. Please try again.")
  console.error("Failed to process article:", err)
}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 63 to 78, the API call to
process the article happens without any user feedback, leaving users unaware of
the processing status. Introduce a loading state variable to track when the API
call is in progress. Before the try block, set the loading state to true, and in
both the success and catch blocks, set it back to false. Use this loading state
to conditionally render a loading indicator or message in the UI to inform users
that processing is underway.

Comment on lines 58 to 114
const runAnalysis = async () => {
  const storedUrl = sessionStorage.getItem("articleUrl")
  if (storedUrl) {
    setArticleUrl(storedUrl)

    try {
      const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
        url: storedUrl,
      })

      // Save response to sessionStorage
      sessionStorage.setItem("analysisResult", JSON.stringify(res.data))

      // optional logging
      console.log("Analysis result saved")
      console.log(res)
    } catch (err) {
      console.error("Failed to process article:", err)
      router.push("/analyze") // fallback in case of error
      return
    }

    // Progress and step simulation
    const stepInterval = setInterval(() => {
      setCurrentStep((prev) => {
        if (prev < steps.length - 1) {
          return prev + 1
        } else {
          clearInterval(stepInterval)
          setTimeout(() => {
            router.push("/analyze/results")
          }, 2000)
          return prev
        }
      })
    }, 2000)

    const progressInterval = setInterval(() => {
      setProgress((prev) => {
        if (prev < 100) {
          return prev + 1
        }
        return prev
      })
    }, 100)

    return () => {
      clearInterval(stepInterval)
      clearInterval(progressInterval)
    }
  } else {
    // Redirect back if no URL found
    router.push("/analyze")
    return
  }
}

runAnalysis()
}, [router])

⚠️ Potential issue

Fix memory leak and improve async handling.

The current implementation has several issues that could cause memory leaks and race conditions.

useEffect(() => {
-  const runAnalysis = async () => {
+  let mounted = true;
+  let stepInterval: NodeJS.Timeout;
+  let progressInterval: NodeJS.Timeout;
+
+  const runAnalysis = async () => {
    const storedUrl = sessionStorage.getItem("articleUrl")
    if (storedUrl) {
+      if (!mounted) return;
      setArticleUrl(storedUrl)

      try {
        const res = await axios.post(process.env.NEXT_PUBLIC_API_URL || "https://Thunder1245-perspective1.hf.space/api/process", {
          url: storedUrl,
        })

+        if (!mounted) return;
        // Save response to sessionStorage
        sessionStorage.setItem("analysisResult", JSON.stringify(res.data))

        // optional logging
        console.log("Analysis result saved")
        console.log(res)
      } catch (err) {
        console.error("Failed to process article:", err)
+        if (mounted) {
          router.push("/analyze") // fallback in case of error
+        }
        return
      }

      // Progress and step simulation
-      const stepInterval = setInterval(() => {
+      stepInterval = setInterval(() => {
        setCurrentStep((prev) => {
          if (prev < steps.length - 1) {
            return prev + 1
          } else {
            clearInterval(stepInterval)
            setTimeout(() => {
+              if (mounted) {
                router.push("/analyze/results")
+              }
            }, 2000)
            return prev
          }
        })
      }, 2000)

-      const progressInterval = setInterval(() => {
+      progressInterval = setInterval(() => {
        setProgress((prev) => {
          if (prev < 100) {
            return prev + 1
          }
          return prev
        })
      }, 100)

-      return () => {
-        clearInterval(stepInterval)
-        clearInterval(progressInterval)
-      }
    } else {
+      if (mounted) {
        router.push("/analyze")
+      }
    }
  }

  runAnalysis()
+
+  return () => {
+    mounted = false;
+    if (stepInterval) clearInterval(stepInterval);
+    if (progressInterval) clearInterval(progressInterval);
+  }
}, [router])
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 58 to 114, the runAnalysis
function sets up intervals inside an async function without proper cleanup,
causing potential memory leaks and race conditions. Refactor by moving the
interval setup outside the async function or use a React effect cleanup function
to clear intervals when the component unmounts or dependencies change. Also,
ensure async calls are properly awaited and handle component unmount scenarios
to avoid state updates on unmounted components.

Comment on lines 10 to 22
try:
    chunks = chunk_rag_data(state)
except KeyError as e:
    raise Exception(f"Missing required data field for chunking: {e}")
except Exception as e:
    raise Exception(f"Failed to chunk data: {e}")
try:
    vectors = embed_chunks(chunks)
    if vectors:
        print("embedding generated successfully!")
except Exception as e:
    raise Exception(f"failed to embed chunks: {e}")


🛠️ Refactor suggestion

Improve exception handling to follow Python best practices.

The exception handling should use raise ... from err or raise ... from None to properly chain exceptions and provide better debugging information.

         try:
             chunks = chunk_rag_data(state)
         except KeyError as e:
-            raise Exception(f"Missing required data field for chunking: {e}")
+            raise Exception(f"Missing required data field for chunking: {e}") from e
         except Exception as e:
-            raise Exception(f"Failed to chunk data: {e}")
+            raise Exception(f"Failed to chunk data: {e}") from e
         try:
             vectors = embed_chunks(chunks)
             if vectors:
                 print("embedding generated successfully!")
         except Exception as e:
-            raise Exception(f"failed to embed chunks: {e}")
+            raise Exception(f"failed to embed chunks: {e}") from e
🧰 Tools
🪛 Ruff (0.11.9)

13-13: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


15-15: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


21-21: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/store_and_send.py around lines 10 to
22, the exception handling currently raises new exceptions without chaining the
original ones, which loses traceback context. Update the raise statements to use
"raise Exception(...) from e" to properly chain exceptions and preserve
debugging information.

Comment on lines 40 to 42
facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                       "Verdict: {f['verdict']}\nExplanation: "
                       "{f['explanation']}" for f in state["facts"]])

⚠️ Potential issue

Fix f-string formatting bug.

The f-string formatting is broken. Only the first line has the f prefix, causing the variables in subsequent lines to not be interpolated.

-        facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
-                               "Verdict: {f['verdict']}\nExplanation: "
-                               "{f['explanation']}" for f in state["facts"]])
+        facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
+                               f"Verdict: {f['verdict']}\nExplanation: "
+                               f"{f['explanation']}" for f in state["facts"]])
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/generate_perspective.py around lines
40 to 42, the f-string formatting is incorrect because only the first line has
the f prefix, so variables in the following lines are not interpolated. Fix this
by adding the f prefix to each line inside the join list comprehension to ensure
all variables are properly formatted.

Comment on lines 9 to 10
client = Groq(api_key=os.getenv("GROQ_API_KEY"))


🛠️ Refactor suggestion

Validate API key existence.

The API key is loaded without validation. Consider checking if it exists to provide better error messages.

-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise ValueError("GROQ_API_KEY environment variable is not set")
+client = Groq(api_key=api_key)
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 9 to 10,
the API key for Groq client is loaded without checking if it exists. Add a
validation step after retrieving the API key from the environment variable to
verify it is not None or empty. If the key is missing, raise a clear and
descriptive error or log a message indicating the missing API key before
initializing the Groq client.

Comment on lines 78 to 82
"You are a fact-checking assistant. "
"Your job is to determine whether the given"
" claim is True, False"
"based on the provided web search evidence."
" Keep it concise and structured."

⚠️ Potential issue

Fix incomplete prompt text.

The prompt has spacing issues and an incomplete sentence about claim determination.

                         "content": (
                             "You are a fact-checking assistant. "
-                            "Your job is to determine whether the given"
-                            " claim is True, False"
+                            "Your job is to determine whether the given "
+                            "claim is True or False "
                             "based on the provided web search evidence."
                             " Keep it concise and structured."
                         ),
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 78 to 82,
the prompt string is missing spaces between concatenated parts and ends with an
incomplete sentence about claim determination. Fix this by adding necessary
spaces between the string segments and completing the sentence so it clearly
instructs the assistant to determine if the claim is True or False based on the
evidence, ensuring the prompt reads as a coherent, complete instruction.

Comment on lines 113 to 119
try:
    parsed = json.loads(content)
except Exception as parse_err:
    print(f"❌ LLM JSON parse error: {parse_err}")

results_list.append(parsed)


⚠️ Potential issue

Critical: Handle JSON parse errors properly.

The code continues to use parsed even if JSON parsing fails, which will cause an UnboundLocalError.

             # Try parsing the JSON response
             try:
                 parsed = json.loads(content)
+                results_list.append(parsed)
             except Exception as parse_err:
                 print(f"❌ LLM JSON parse error: {parse_err}")
-
-            results_list.append(parsed)
+                # Skip this result or add a default error result
+                results_list.append({
+                    "verdict": "Error",
+                    "explanation": f"Failed to parse LLM response: {parse_err}",
+                    "original_claim": claim,
+                    "source_link": source
+                })
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 113 to
119, the variable 'parsed' is used after a JSON parsing attempt without ensuring
it was successfully assigned, leading to a potential UnboundLocalError if
parsing fails. To fix this, modify the code to handle the exception properly by
either initializing 'parsed' to a safe default value before the try block or by
skipping appending to results_list when parsing fails, ensuring 'parsed' is only
used if JSON parsing succeeds.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 15

🧹 Nitpick comments (8)
backend/app/modules/scraper/cleaner.py (1)

2-2: Remove the unused NLTK import or add a comment justifying it.

The nltk library is imported but not used anywhere in the current implementation of clean_extracted_text. This adds unnecessary overhead to the module.

If NLTK functionality is planned for future use, consider adding a comment explaining this. Otherwise, remove the import:

-import nltk
backend/start.sh (1)

1-2: Prefer set -euo pipefail for safer Bash scripting

Adding -u and -o pipefail prevents silent failures from unset variables and broken pipelines.

-set -e
+set -euo pipefail
backend/README.md (3)

80-80: Fix comment typo: .loc should be .lock

-├── uv.lock               # .loc file like package-lock.json
+├── uv.lock               # .lock file similar to npm's package-lock.json

59-61: Specify a language for the fenced code block to appease markdownlint

-```
-http://localhost:8000/api/
-```
+```text
+http://localhost:8000/api/
+```

24-24: Minor grammar: add “the”

-### 1. Clone the repo & jump into backend folder
+### 1. Clone the repo & jump into the backend folder
backend/app/utils/generate_chunk_id.py (1)

4-8: Consider increasing hash length to reduce collision risk.

Using only 15 characters of SHA-256 provides ~60 bits of entropy, which may lead to collisions at scale. Consider increasing the length or using the full hash.

-    return f"article-{hashed_text[:15]}"
+    return f"article-{hashed_text[:32]}"  # 128 bits of entropy

Additionally, consider making the prefix configurable for better reusability:

-def generate_id(text: str) -> str:
+def generate_id(text: str, prefix: str = "article") -> str:
     if not text or not isinstance(text, str):
         raise ValueError("Text must be non-empty string")
     hashed_text = hashlib.sha256(text.encode("utf-8")).hexdigest()
-    return f"article-{hashed_text[:15]}"
+    return f"{prefix}-{hashed_text[:32]}"
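As a rough aside (not part of the suggested diff), the birthday bound makes the trade-off concrete. A small sketch, assuming each hex character contributes 4 bits of entropy:

import math

def ids_for_collision(bits: int, p: float = 0.5) -> float:
    # Birthday-bound approximation: n ≈ sqrt(2 * 2^bits * ln(1 / (1 - p)))
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))

print(f"{ids_for_collision(60):.2e}")   # 15 hex chars ≈ 60 bits → ~1.3e9 IDs for a 50% collision chance
print(f"{ids_for_collision(128):.2e}")  # 32 hex chars ≈ 128 bits → ~2.2e19 IDs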
.github/workflows/deploy-backend-to-hf.yml (1)

34-37: Remove unused rsync installation.

The workflow installs rsync but doesn't use it in the subsequent steps. The file synchronization is handled through git operations instead.

Remove the unused rsync installation:

-      - name: 📦 Install rsync
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y rsync
backend/app/modules/facts_check/llm_processing.py (1)

110-110: Replace print statement with proper logging.

Debug output should go through a proper logging framework rather than bare print() calls.

-            print(content)
+            # Consider using logging.debug(content) instead
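A minimal sketch of the logging-based alternative (logger name and function are illustrative, not taken from the PR):

import logging

logger = logging.getLogger(__name__)

def handle_llm_content(content: str) -> None:
    # Emitted only when the application configures DEBUG level, instead of always printing
    logger.debug("Raw LLM response: %s", content)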
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d5eaf0 and b02db47.

⛔ Files ignored due to path filters (1)
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • .github/workflows/deploy-backend-to-hf.yml (1 hunks)
  • .gitignore (1 hunks)
  • backend/.dockerignore (1 hunks)
  • backend/Dockerfile (1 hunks)
  • backend/README.md (1 hunks)
  • backend/app/db/vector_store.py (1 hunks)
  • backend/app/modules/facts_check/llm_processing.py (1 hunks)
  • backend/app/modules/facts_check/web_search.py (1 hunks)
  • backend/app/modules/langgraph_builder.py (5 hunks)
  • backend/app/modules/langgraph_nodes/fact_check.py (2 hunks)
  • backend/app/modules/langgraph_nodes/generate_perspective.py (1 hunks)
  • backend/app/modules/langgraph_nodes/judge.py (1 hunks)
  • backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
  • backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
  • backend/app/modules/scraper/cleaner.py (1 hunks)
  • backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
  • backend/app/modules/vector_store/embed.py (1 hunks)
  • backend/app/utils/fact_check_utils.py (1 hunks)
  • backend/app/utils/generate_chunk_id.py (1 hunks)
  • backend/app/utils/prompt_templates.py (1 hunks)
  • backend/app/utils/store_vectors.py (1 hunks)
  • backend/main.py (1 hunks)
  • backend/pyproject.toml (1 hunks)
  • backend/start.sh (1 hunks)
✅ Files skipped from review due to trivial changes (5)
  • backend/.dockerignore
  • .gitignore
  • backend/app/utils/prompt_templates.py
  • backend/pyproject.toml
  • backend/Dockerfile
🧰 Additional context used
🧬 Code Graph Analysis (5)
backend/app/utils/fact_check_utils.py (2)
backend/app/modules/facts_check/web_search.py (1)
  • search_with_serpapi (5-28)
backend/app/modules/facts_check/llm_processing.py (2)
  • run_claim_extractor_sdk (12-57)
  • run_fact_verifier_sdk (60-132)
backend/app/modules/vector_store/chunk_rag_data.py (1)
backend/app/utils/generate_chunk_id.py (1)
  • generate_id (4-8)
backend/app/modules/langgraph_nodes/store_and_send.py (3)
backend/app/modules/vector_store/chunk_rag_data.py (1)
  • chunk_rag_data (4-73)
backend/app/modules/vector_store/embed.py (1)
  • embed_chunks (7-30)
backend/app/utils/store_vectors.py (1)
  • store (10-32)
backend/app/modules/langgraph_nodes/fact_check.py (1)
backend/app/utils/fact_check_utils.py (1)
  • run_fact_check_pipeline (10-47)
backend/app/modules/langgraph_builder.py (2)
backend/app/modules/langgraph_nodes/sentiment.py (1)
  • run_sentiment_sdk (10-53)
backend/app/modules/langgraph_nodes/error_handler.py (1)
  • error_handler (3-11)
🪛 Ruff (0.11.9)
backend/app/db/vector_store.py

14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


40-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backend/app/modules/langgraph_nodes/store_and_send.py

13-13: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


15-15: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


21-21: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backend/app/utils/store_vectors.py

32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 actionlint (1.7.7)
.github/workflows/deploy-backend-to-hf.yml

30-30: shellcheck reported issue in this script: SC2086:info:2:31: Double quote to prevent globbing and word splitting

(shellcheck)

🪛 LanguageTool
backend/README.md

[uncategorized] ~24-~24: You might be missing the article “the” here.
Context: ...rted ### 1. Clone the repo & jump into backend folder ```bash git clone https://githu...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)
backend/README.md

59-59: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


76-76: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (27)
backend/app/modules/scraper/cleaner.py (1)

13-86: LGTM! Text cleaning implementation is well-structured.

The clean_extracted_text function is well-implemented with comprehensive boilerplate removal patterns and proper text processing logic. The function handles edge cases appropriately and maintains good readability.

backend/app/modules/langgraph_nodes/sentiment.py (2)

39-39: Good improvement for consistent output formatting.

Converting sentiment to lowercase ensures consistent output regardless of API response formatting.


35-35: Validate max_tokens adequacy for sentiment outputs

Reducing max_tokens to 3 is fine for single-word replies, but the Groq API may include punctuation or brief variations (e.g., “Positive.” or “The sentiment is positive”), which could exceed that limit. Please test against these edge cases and consider increasing to 5 tokens if needed.

• File: backend/app/modules/langgraph_nodes/sentiment.py:35

- max_tokens=3,
+ max_tokens=5,  # allow for punctuation or slight phrasing variations
backend/main.py (1)

28-30: Good port configuration with environment variable support.

Using environment variables for port configuration with a sensible default is a good practice for deployment flexibility.

backend/app/modules/facts_check/web_search.py (1)

5-28: Well-implemented search function with good error handling.

The function properly validates the API key, handles search parameters correctly, and processes results with graceful fallbacks for missing keys. The implementation is clean and follows good practices.
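For readers who want to see the shape of such a function, a minimal sketch of a SerpAPI lookup with a missing-key guard follows; it assumes the google-search-results package, and the parameter and field names are illustrative rather than the PR's exact code:

import os
from serpapi import GoogleSearch

def search_web(query: str, num_results: int = 3) -> list[dict]:
    api_key = os.getenv("SERPAPI_API_KEY")
    if not api_key:
        raise ValueError("SERPAPI_API_KEY environment variable is not set")
    results = GoogleSearch({"q": query, "num": num_results, "api_key": api_key}).get_dict()
    # Fall back gracefully when expected keys are missing
    return [
        {"title": r.get("title", ""), "link": r.get("link", ""), "snippet": r.get("snippet", "")}
        for r in results.get("organic_results", [])
    ]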

backend/app/modules/langgraph_nodes/fact_check.py (3)

1-1: Good integration of the new fact-checking pipeline.

The import of the comprehensive fact-checking pipeline replaces the previous placeholder implementation, improving functionality significantly.


11-20: Improved error handling with structured responses.

The updated logic properly handles errors from the pipeline and returns structured error responses, which is better than the previous placeholder approach.


30-30: Correct integration of verification results.

The function now properly returns the verifications from the pipeline as "facts" in the state, maintaining the expected output format.

backend/app/db/vector_store.py (2)

5-7: Good API key validation.

Proper validation of the required environment variable with clear error message.


22-34: Good index management logic.

The index creation logic properly checks for existence and creates with appropriate serverless configuration for AWS US East 1.
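The check-then-create pattern being described looks roughly like this sketch, assuming the v3+ pinecone client and a 384-dimension index to match all-MiniLM-L6-v2 embeddings (the index name is illustrative):

import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "perspective-index"  # illustrative name

if index_name not in pc.list_indexes().names():
    # Create the serverless index only when it does not already exist
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(index_name)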

backend/app/modules/vector_store/chunk_rag_data.py (5)

4-13: Excellent field validation.

Comprehensive validation of required fields with clear error messages. The list type check for facts is particularly good.


15-18: Smart handling of perspective data normalization.

The check for .dict() method allows for flexible input types (both dict and object with dict method).


28-32: Good safety validation for perspective object.

The validation ensures the perspective object has the required attributes before accessing them.


44-67: Thorough fact validation and processing.

The validation of each fact's required fields and the systematic chunk creation with unique IDs is well-implemented.


71-73: Appropriate error handling.

The catch-all exception handling with logging and re-raising preserves the original error while providing debugging information.
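The log-then-re-raise pattern being described (combined with the raise ... from err form Ruff flags elsewhere in this review) can be sketched as follows; the function body is illustrative, not the PR's code:

import logging

logger = logging.getLogger(__name__)

def chunk_article(data: dict) -> list[dict]:
    try:
        # Illustrative body: split the article text into fixed-size chunks
        text = data["article"]
        return [{"text": text[i:i + 500]} for i in range(0, len(text), 500)]
    except Exception as err:
        # Log the full traceback for debugging, then re-raise with the original error chained
        logger.exception("Failed to chunk RAG data")
        raise RuntimeError("chunking failed") from err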

backend/app/modules/vector_store/embed.py (4)

1-4: Good model choice and initialization.

The all-MiniLM-L6-v2 model is a solid choice for general text embeddings, providing good performance with reasonable computational requirements.


9-10: Proper handling of empty input.

Early return for empty chunks prevents unnecessary processing and potential errors.


12-18: Comprehensive chunk validation.

The validation ensures each chunk is a dictionary with the required 'text' field, providing clear error messages with indices for debugging.


20-30: Efficient embedding generation and vector construction.

The batch processing approach is efficient, and the vector construction properly maps each chunk to its embedding with preserved metadata.
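A stripped-down version of the batch-embed-then-zip pattern might look like this sketch; the PR's embed_chunks adds the validation noted above, and the field names in the returned vectors are assumptions:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_chunks(chunks: list[dict]) -> list[dict]:
    if not chunks:
        return []
    # One batched encode call is far cheaper than encoding chunk by chunk
    embeddings = model.encode([c["text"] for c in chunks])
    # Pair each chunk with its embedding while preserving its id and text
    return [
        {"id": c.get("id"), "values": emb.tolist(), "metadata": {"text": c["text"]}}
        for c, emb in zip(chunks, embeddings)
    ]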

.github/workflows/deploy-backend-to-hf.yml (1)

1-59: Well-structured CI/CD workflow for HF Space deployment.

The workflow correctly triggers on backend changes, handles authentication securely, and implements proper git operations for deployment.

backend/app/utils/fact_check_utils.py (1)

26-47: Excellent error handling and rate limiting implementation.

The search loop properly handles exceptions, logs outcomes, and includes appropriate delays to prevent rate limiting. The final verification step is well-integrated.
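The loop structure being praised is roughly this shape (the names and the one-second delay are illustrative assumptions):

import time

def run_searches(claims: list[str], search_fn) -> list[dict]:
    results = []
    for claim in claims:
        try:
            evidence = search_fn(claim)
            results.append({"claim": claim, "evidence": evidence, "status": "ok"})
            print(f"✅ Searched: {claim}")
        except Exception as err:
            results.append({"claim": claim, "evidence": [], "status": "error"})
            print(f"❌ Search failed for '{claim}': {err}")
        time.sleep(1)  # brief pause between calls to stay under provider rate limits
    return results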

backend/app/modules/langgraph_nodes/judge.py (2)

6-10: Appropriate configuration for scoring task.

The low max_tokens (10) is perfect for a simple scoring response, and zero temperature ensures consistent outputs.


31-43: Robust response parsing with proper error handling.

The code handles multiple response formats and includes proper score validation with clamping. The regex pattern correctly extracts numeric scores.
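The extract-and-clamp logic can be sketched as follows; the 0-10 scale and the fallback value are assumptions, not confirmed from the PR:

import re

def parse_score(raw: str, lo: float = 0.0, hi: float = 10.0) -> float:
    # Pull the first numeric token out of the LLM reply and clamp it into [lo, hi]
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if not match:
        return lo  # nothing numeric found: fall back to the minimum score
    return max(lo, min(hi, float(match.group())))

# parse_score("Score: 8.5/10") -> 8.5; parse_score("no digits here") -> 0.0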

backend/app/modules/langgraph_nodes/generate_perspective.py (1)

9-24: Excellent use of structured output and proper LLM configuration.

The Pydantic model ensures type safety, and the temperature setting (0.7) is appropriate for creative perspective generation.
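For readers unfamiliar with the structured-output approach, a minimal Pydantic v2 sketch (the field names are illustrative, not the PR's schema):

from pydantic import BaseModel

class Perspective(BaseModel):
    title: str
    argument: str
    supporting_points: list[str]

# An LLM asked to reply in JSON can then be validated into a typed object
raw_json = '{"title": "Counter-view", "argument": "...", "supporting_points": ["a", "b"]}'
perspective = Perspective.model_validate_json(raw_json)
print(perspective.title)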

backend/app/modules/langgraph_builder.py (2)

14-22: Excellent addition of typed state definition.

The TypedDict provides clear type hints for the state structure, improving code maintainability and IDE support.


56-102: Well-structured conditional logic with comprehensive error handling.

The graph correctly handles error propagation and implements a retry mechanism with scoring thresholds. The updated method calls align with the LangGraph API.
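A TypedDict state of the kind referenced here might look like the following sketch; the field names are guesses based on the nodes discussed in this review, not the PR's actual definition:

from typing import List, Optional, TypedDict

class GraphState(TypedDict, total=False):
    article: str
    sentiment: str
    facts: List[dict]
    perspective: Optional[dict]
    score: float
    error: Optional[str]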

backend/app/modules/facts_check/llm_processing.py (1)

12-58: Well-structured claim extraction function.

The function has proper error handling, input validation, and returns consistent response structure. The implementation follows good practices.

Comment on lines 4 to 10
try:
    nltk.data.find('corpora/stopwords')
    nltk.data.find('corpora/punkt_tab')

except LookupError:
    nltk.download('stopwords')
    nltk.download('punkt_tab')

🛠️ Refactor suggestion

Avoid downloading NLTK data during module import.

Downloading NLTK data during module import can cause several issues:

  • Slow import times
  • Potential failures in deployment environments with limited network access
  • Unnecessary downloads when NLTK functionality isn't used

Consider one of these alternatives:

Option 1: Remove the NLTK downloads entirely (if not needed):

-try:
-    nltk.data.find('corpora/stopwords')
-    nltk.data.find('corpora/punkt_tab')
-
-except LookupError:
-    nltk.download('stopwords')
-    nltk.download('punkt_tab')

Option 2: Lazy loading (if NLTK will be used):

-try:
-    nltk.data.find('corpora/stopwords')
-    nltk.data.find('corpora/punkt_tab')
-
-except LookupError:
-    nltk.download('stopwords')
-    nltk.download('punkt_tab')
+def _ensure_nltk_data():
+    """Ensure NLTK data is available when needed."""
+    try:
+        nltk.data.find('corpora/stopwords')
+        nltk.data.find('corpora/punkt_tab')
+    except LookupError:
+        nltk.download('stopwords')
+        nltk.download('punkt_tab')

Then call _ensure_nltk_data() only when NLTK functionality is actually needed.

Suggested change

Before (downloads at import time):

try:
    nltk.data.find('corpora/stopwords')
    nltk.data.find('corpora/punkt_tab')
except LookupError:
    nltk.download('stopwords')
    nltk.download('punkt_tab')

After (lazy helper):

# Remove the top-level try/except block and instead define a helper:
def _ensure_nltk_data():
    """Ensure NLTK data is available when needed."""
    try:
        nltk.data.find('corpora/stopwords')
        nltk.data.find('corpora/punkt_tab')
    except LookupError:
        nltk.download('stopwords')
        nltk.download('punkt_tab')

# …later, in whatever function actually uses NLTK:
# _ensure_nltk_data()
# tokens = nltk.word_tokenize(text)
# filtered = [w for w in tokens if w not in nltk.corpus.stopwords.words('english')]
🤖 Prompt for AI Agents
In backend/app/modules/scraper/cleaner.py around lines 4 to 10, avoid
downloading NLTK data during module import as it slows imports and can fail in
restricted environments. Refactor by moving the NLTK data check and download
logic into a separate function, for example, _ensure_nltk_data(), and call this
function only when NLTK functionality is actually needed during runtime instead
of at import time.

Comment on lines 8 to 9
uv sync
uv run main.py

🛠️ Refactor suggestion

Use exec so the app becomes PID 1 and receives signals

Without exec, the shell stays PID 1 in a container, preventing proper signal forwarding (e.g., SIGTERM on docker stop).

-uv run main.py
+exec uv run main.py
🤖 Prompt for AI Agents
In backend/start.sh at lines 8 to 9, the commands launching the app should use
`exec` to replace the shell process with the app process. Modify the lines to
prefix the commands with `exec` so that the app becomes PID 1 and can properly
receive and handle signals like SIGTERM, ensuring correct behavior in container
environments.

set -e

# Install uv if not present
pip install uv

🛠️ Refactor suggestion

Install uv only when absent and pin the version

Unconditionally invoking pip install uv adds unnecessary overhead on every start and may pull a newer, untested version. Guard the install and pin a version for reproducibility.

-# Install uv if not present
-pip install uv
+# Install uv if not present
+if ! command -v uv >/dev/null 2>&1; then
+  pip install --no-cache-dir --upgrade 'uv>=0.100.0'
+fi

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/start.sh at line 5, the script unconditionally runs 'pip install uv',
causing unnecessary overhead and potential version inconsistencies. Modify the
script to first check if 'uv' is already installed, and only run 'pip install
uv==<specific_version>' if it is absent. Replace <specific_version> with a fixed
version number to ensure reproducibility.

Comment on lines +28 to +29
cd new-backend
````

⚠️ Potential issue

Correct directory name in the quick-start step

The repo places this README under backend/, not new-backend/. The current command will fail.

-cd new-backend
+cd backend
🤖 Prompt for AI Agents
In backend/README.md around lines 28 to 29, the quick-start step uses the
incorrect directory name 'new-backend'. Change the directory name in the command
from 'new-backend' to 'backend' to match the actual folder where the README and
backend code reside.

Comment on lines 14 to 20
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

⚠️ Potential issue

CORS configuration is too permissive for production.

Allowing all origins (allow_origins=["*"]) with credentials enabled poses security risks. Consider restricting origins based on environment.

+import os
+
+# Configure CORS origins based on environment
+origins = ["*"] if os.getenv("ENVIRONMENT") == "development" else [
+    "https://yourdomain.com",
+    "https://www.yourdomain.com"
+]
+
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
+    allow_origins=origins,
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
🤖 Prompt for AI Agents
In backend/main.py around lines 14 to 20, the CORS middleware configuration is
too permissive by allowing all origins with credentials enabled, which is a
security risk in production. Modify the allow_origins parameter to restrict it
to a specific list of trusted origins based on the environment (e.g.,
development vs production). Ensure that allow_credentials remains True only if
the origins are explicitly specified and trusted.

Comment on lines 13 to 15
if state.get("status") != "success":
    print("❌ Claim extraction failed.")
    return [], "Claim extraction failed."

⚠️ Potential issue

Fix the status check logic.

The code checks state.get("status") but should check result.get("status") since result contains the response from the claim extraction SDK.

Apply this diff to fix the status check:

-    if state.get("status") != "success":
+    if result.get("status") != "success":
🤖 Prompt for AI Agents
In backend/app/utils/fact_check_utils.py around lines 13 to 15, the status check
incorrectly uses state.get("status") instead of result.get("status"). Update the
condition to check result.get("status") to correctly verify the claim extraction
SDK response status and handle failures properly.

Comment on lines 40 to 42
facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                       "Verdict: {f['verdict']}\nExplanation: "
                       "{f['explanation']}" for f in state["facts"]])

⚠️ Potential issue

Fix the string formatting syntax.

The f-string formatting is incorrect: only the first string literal carries the f prefix, so the placeholders in the continuation strings are emitted verbatim instead of being interpolated.

Apply this diff to fix the string formatting:

-        facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
-                               "Verdict: {f['verdict']}\nExplanation: "
-                               "{f['explanation']}" for f in state["facts"]])
+        facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
+                               f"Verdict: {f['verdict']}\nExplanation: "
+                               f"{f['explanation']}" for f in state["facts"]])
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/generate_perspective.py around lines 40
to 42, only the first string literal has the f prefix, so the placeholders in
the continuation literals are not interpolated. Fix this by adding the f prefix
to every continuation literal so all placeholders are evaluated within the
string literals.


load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

⚠️ Potential issue

Add API key validation before initializing Groq client.

The Groq client is initialized without checking if the API key exists. This could lead to runtime errors if the environment variable is not set.

Consider adding validation:

-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise ValueError("GROQ_API_KEY environment variable is not set")
+client = Groq(api_key=api_key)
🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py at line 9, the Groq client
is initialized directly with the API key from the environment without checking
if the key exists. Add a validation step before initializing the client to check
if the GROQ_API_KEY environment variable is set; if it is missing, raise an
appropriate error or handle it gracefully to prevent runtime failures.

Comment on lines 113 to 119
try:
    parsed = json.loads(content)
except Exception as parse_err:
    print(f"❌ LLM JSON parse error: {parse_err}")

results_list.append(parsed)


⚠️ Potential issue

Fix critical error: undefined variable usage after JSON parsing failure.

If JSON parsing fails, parsed is never assigned (or still holds a stale value from a previous iteration), yet it is appended to results_list on line 118, which raises an UnboundLocalError on the first failure.

Apply this fix to handle parsing errors properly:

             # Try parsing the JSON response
             try:
                 parsed = json.loads(content)
+                results_list.append(parsed)
             except Exception as parse_err:
                 print(f"❌ LLM JSON parse error: {parse_err}")
-
-            results_list.append(parsed)
+                # Skip this result or add error placeholder
+                results_list.append({
+                    "verdict": "Error",
+                    "explanation": f"Failed to parse LLM response: {parse_err}",
+                    "original_claim": claim,
+                    "source_link": source
+                })
🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py around lines 113 to 119,
the variable 'parsed' is appended to 'results_list' even if JSON parsing fails,
causing a NameError. To fix this, ensure that 'parsed' is only appended if JSON
parsing succeeds by moving the append statement inside the try block or adding a
conditional check that 'parsed' is defined before appending.

Comment on lines 120 to 124
return {
    "claim": claim,
    "verifications": results_list,
    "status": "success",
}

⚠️ Potential issue

Fix variable scope issue: claim in the return statement is a leftover loop variable.

The claim variable on line 121 references the last value from the loop iteration, not a function-level claim. This appears to be incorrect logic.

The return structure should probably not include a single claim when processing multiple claims. Consider:

         return {
-            "claim": claim,
             "verifications": results_list,
             "status": "success",
         }

Or if you need to return all claims, extract them properly:

         return {
-            "claim": claim,
+            "claims": [result.get("claim") for result in search_results],
             "verifications": results_list,
             "status": "success",
         }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py around lines 120 to 124,
the variable 'claim' used in the return statement is undefined or incorrectly
scoped as it refers to the last loop iteration variable rather than a
function-level claim. To fix this, remove the single 'claim' from the return
dictionary or replace it with a properly collected list of all claims processed.
Ensure the return structure accurately reflects the data processed, either by
returning all claims as a list or omitting the claim field if not applicable.

@ManavSarkar ManavSarkar merged commit 5dac470 into main Jul 18, 2025
1 check passed