
Added Doc strings and logger #115

Merged

ParagGhatage merged 6 commits into main from doc_strings_and_logger on Aug 16, 2025

Conversation

@ParagGhatage (Collaborator) commented Aug 14, 2025

Tasks done:

  • Added docstrings to each code file.
  • Replaced print statements with a logger.

Summary by CodeRabbit

  • New Features

    • Frontend can use NEXT_PUBLIC_API_URL; backend routes mounted under /api. End-to-end article analysis pipeline and a reusable prompt for generating counter-perspectives added. Deterministic IDs for articles introduced.
  • Refactor

    • Improved logging across the backend; clearer LLM/chat flows and structured outputs for better observability.
  • Documentation

    • Added frontend .env setup, corrected backend .env path, and fixed API key formatting.
  • Chores

    • Backend package renamed and obsolete startup script removed; frontend updated to use secure backend endpoints.

@coderabbitai bot (Contributor) commented Aug 14, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Load GROQ and Pinecone credentials from environment, initialize clients at import, add structured logging utility and module loggers, introduce pipeline orchestrator and deterministic chunk ID generator, return compiled LangGraph from build_langgraph, remove legacy start script, and apply docstring/formatting cleanups across frontend and backend.
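As a point of reference, here is a minimal sketch of what a deterministic chunk ID generator such as generate_id(text) could look like; the hashing scheme (SHA-256 over the chunk text) is an assumption for illustration, not confirmed by the diff.

import hashlib

def generate_id(text: str) -> str:
    """Return a deterministic ID for a chunk of text (illustrative sketch).

    The same input always yields the same ID, so re-upserting an unchanged
    chunk overwrites the existing vector instead of creating a duplicate.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()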

Changes

  • Frontend: env hints & endpoints
    Files: frontend/app/analyze/loading/page.tsx, frontend/app/analyze/results/page.tsx, frontend/app/page.tsx
    Summary: Added a commented hint for NEXT_PUBLIC_API_URL, replaced hard-coded backend URLs with canonical https endpoints in calls, and performed formatting/copy tweaks.

  • Docs / README
    Files: README.md
    Summary: Added frontend .env instructions, corrected the backend .env path, and fixed GROQ_API_KEY line formatting.

  • Logging utility
    Files: backend/app/logging/logging_config.py
    Summary: New setup_logger(name: str) returning a configured logger with console and file handlers and a shared formatter (a minimal sketch follows this list).

  • Backend: Pinecone vector store init
    Files: backend/app/db/vector_store.py
    Summary: Load PINECONE_API_KEY from env, initialize the Pinecone client, create/connect the perspective index (INDEX_NAME/DIMENSIONS/METRIC), expose a module-level index and constants, and switch prints to logger usage and explicit errors.

  • Backend: Groq clients & LLM wrappers
    Files: backend/app/modules/bias_detection/check_bias.py, backend/app/modules/facts_check/llm_processing.py, backend/app/modules/chat/llm_processing.py, backend/app/modules/langgraph_nodes/sentiment.py, backend/app/modules/langgraph_nodes/judge.py
    Summary: Initialize the Groq client from GROQ_API_KEY at import, add module loggers via setup_logger, replace prints with logger calls, add input validation and improved error logging, and make minor prompt formatting adjustments.

  • Chat / RAG & embeddings
    Files: backend/app/modules/chat/get_rag_data.py, backend/app/modules/chat/embed_query.py, backend/app/modules/vector_store/embed.py
    Summary: Initialize the Pinecone client/index at import, minor formatting/refactor in embedding/query paths, remove an explicit embedder.encode call in embed_query, and convert some dicts to single-line forms.

  • LangGraph: builder & nodes
    Files: backend/app/modules/langgraph_builder.py, backend/app/modules/langgraph_nodes/*
    Summary: Add a MyState TypedDict, return the compiled graph from build_langgraph(), add docstrings, introduce a PerspectiveOutput structured output, add retries/validation/logging in nodes, tighten error handling, and adjust some signatures/returns.

  • Pipeline orchestration & helpers
    Files: backend/app/modules/pipeline.py, backend/app/utils/generate_chunk_id.py, backend/app/utils/prompt_templates.py
    Summary: New orchestrator with _LANGGRAPH_WORKFLOW = build_langgraph(), run_scraper_pipeline(url) and run_langgraph_workflow(state) public APIs; add deterministic generate_id(text) and a generation_prompt ChatPromptTemplate.

  • Vector store helpers & chunking
    Files: backend/app/modules/vector_store/chunk_rag_data.py, backend/app/modules/vector_store/embed.py, backend/app/utils/store_vectors.py
    Summary: Add module loggers, stricter input validation, structured chunk metadata, try/except logging, an empty-vectors pre-check, and consolidated return formatting.

  • Scraper: extractor/cleaner/keywords
    Files: backend/app/modules/scraper/extractor.py, backend/app/modules/scraper/cleaner.py, backend/app/modules/scraper/keywords.py
    Summary: Added docstrings, standardized quoting (double quotes), small refactors and extra imports (readability, requests), formatting changes; functional behavior preserved.

  • Facts-check web search / utils
    Files: backend/app/modules/facts_check/web_search.py, backend/app/utils/fact_check_utils.py
    Summary: Add docstrings and load_dotenv() usage, switch prints to logging, reflow request formatting, and integrate run_fact_verifier_sdk into the fact-check pipeline.

  • Routes, main, packaging, startup
    Files: backend/app/routes/routes.py, backend/main.py, backend/pyproject.toml, backend/start.sh
    Summary: Added logging and a module-level logger, new Pydantic request models (URlRequest, ChatQuery), mounted the article router and updated the app description, changed the project name in pyproject (new-backend → backend), and deleted the legacy backend/start.sh.
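For orientation, a minimal sketch of what setup_logger in backend/app/logging/logging_config.py might look like, based only on the summary above (console and file handlers, shared formatter); the log file name, format string, and levels are assumptions, not taken from the PR.

import logging

def setup_logger(name: str) -> logging.Logger:
    """Return a logger with console and file handlers sharing one formatter (sketch)."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Avoid attaching duplicate handlers if called more than once.
        return logger

    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    file_handler = logging.FileHandler("app.log")  # assumed file name
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    return logger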

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant Frontend
  participant Backend
  participant LangGraph
  participant Pinecone as VectorStore

  User->>Frontend: Submit article URL
  Frontend->>Backend: POST /api/process (uses NEXT_PUBLIC_API_URL hint)
  Backend->>Backend: run_scraper_pipeline -> cleaned_text, keywords
  Backend->>LangGraph: _LANGGRAPH_WORKFLOW.invoke(state)
  LangGraph->>Pinecone: store_and_send(state) (chunking → embed → upsert)
  Pinecone-->>LangGraph: store result
  LangGraph-->>Backend: workflow result
  Backend-->>Frontend: aggregated response
sequenceDiagram
  participant Client
  participant API
  participant GroqLLM

  Client->>API: POST /api/chat (message)
  API->>GroqLLM: client.chat.completions.create(system, user) using GROQ_API_KEY
  GroqLLM-->>API: completion
  API-->>Client: answer

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

I’m a rabbit in the code so spry,
I hop on env vars, Pinecone in my eye.
Groq hums answers, LangGraph spins the art,
Chunks get tidy IDs and pipelines start.
Hooray — I nibbled docs and left a tiny heart. 🥕



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0549c21 and 59d3ec0.

📒 Files selected for processing (2)
  • frontend/app/analyze/loading/page.tsx (2 hunks)
  • frontend/app/analyze/results/page.tsx (4 hunks)


@coderabbitai bot left a comment

Actionable comments posted: 17

🔭 Outside diff range comments (18)
backend/app/modules/langgraph_nodes/store_and_send.py (1)

6-33: Replace print statements with logger and avoid logging full state/PII

This PR’s goal includes replacing prints with a logger. Current prints leak the entire state to stdout and don’t capture stacktraces on failures. Also, calling store(vectors) with an empty list will raise a ValueError upstream. Guard and log accordingly.

Proposed refactor:

+import logging
 from app.modules.vector_store.chunk_rag_data import chunk_rag_data
 from app.modules.vector_store.embed import embed_chunks
 from app.utils.store_vectors import store
 
+logger = logging.getLogger(__name__)
 
 def store_and_send(state):
     # to store data in vector db
     try:
-        print(state)
+        # Avoid logging the entire state to prevent PII leakage; log keys only.
+        logger.debug("store_and_send received state keys=%s", list(state.keys()))
         try:
             chunks = chunk_rag_data(state)
         except KeyError as e:
             raise Exception(f"Missing required data field for chunking: {e}")
         except Exception as e:
             raise Exception(f"Failed to chunk data: {e}")
         try:
             vectors = embed_chunks(chunks)
-            if vectors:
-                print("embedding generated successfully!")
+            if vectors:
+                logger.info("Embeddings generated successfully: count=%d", len(vectors))
         except Exception as e:
             raise Exception(f"failed to embed chunks: {e}")
 
-        store(vectors)
-        print("Vectors saved to Pinecone!")
+        if not vectors:
+            logger.warning("No vectors generated; skipping storage")
+            return {**state, "status": "success"}  # No-op but not an error
+
+        store(vectors)
+        logger.info("Stored %d vectors to Pinecone", len(vectors))
 
     except Exception as e:
-        print(f"some error occured in store_and_send:{e}")
+        logger.exception("Some error occurred in store_and_send")
         return {
             "status": "error",
             "error_from": "store_and_send",
             "message": f"{e}",
         }
     #  sending to frontend
     return {**state, "status": "success"}
backend/pyproject.toml (1)

18-18: Remove “logging” from dependencies; it’s part of the standard library

Adding “logging>=0.4.9.6” pulls an unnecessary PyPI package and could introduce confusion or supply-chain risk. Python’s logging is built-in.

-    "logging>=0.4.9.6",
README.md (1)

162-171: Fix list indentation, code fence language, and .env formatting (backend section)

Apply consistent list indentation, add a fenced code language, and remove spaces around = for dotenv compatibility.

-*Setup environment variables:*
-  - add .env file in `/backend`directory.
-  - add following environment variable in your .env file.
-  ```
-GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
-  ```
+*Setup environment variables:*
+- Add a .env file in the `/backend` directory.
+- Add the following environment variables to your .env file.
+```env
+GROQ_API_KEY=<groq_api_key>
+PINECONE_API_KEY=<your_pinecone_API_KEY>
+PORT=8000
+SEARCH_KEY=<your_Google_custom_search_engine_API_key>
+```
backend/app/modules/langgraph_nodes/sentiment.py (1)

48-53: Replace print with logger and include traceback

Use the logger with exception context instead of printing. Also consider avoiding logging raw user content elsewhere in this module to reduce PII leakage risk.

-        print(f"Error in sentiment_analysis: {e}")
+        logger.exception("Error in sentiment_analysis")

Add at the top of the file:

import logging
logger = logging.getLogger(__name__)

Optional: add a short docstring to run_sentiment_sdk describing inputs/outputs.

backend/app/modules/bias_detection/check_bias.py (2)

41-46: Parse and validate numeric bias score (0–100)

The LLM might return extra tokens; parse and clamp to the expected range to keep downstream code robust.

-        bias_score = chat_completion.choices[0].message.content.strip()
-        return {
-            "bias_score": bias_score,
-            "status": "success",
-        }
+        raw = chat_completion.choices[0].message.content.strip()
+        match = re.search(r"\d{1,3}", raw)
+        if not match:
+            raise ValueError(f"Non-numeric bias score: {raw!r}")
+        bias_score = max(0, min(int(match.group(0)), 100))
+        return {
+            "bias_score": bias_score,
+            "status": "success",
+        }

Also add at the top of this file if not present:

import re

48-54: Use logger with traceback in exception path

Replace print with logger.exception to retain stack traces and centralize logging.

-        print(f"Error in bias_detection: {e}")
+        logger.exception("Error in bias_detection")
backend/app/modules/chat/llm_processing.py (5)

29-37: Add error handling around chat completion call

The network call can raise exceptions (timeouts, auth errors). Capture/log and return a graceful error to callers.

-    response = client.chat.completions.create(
-        model="gemma2-9b-it",
-        messages=[
-            {"role": "system", "content": "Use only the context to answer."},
-            {"role": "user", "content": prompt},
-        ],
-    )
-
-    return response.choices[0].message.content
+    try:
+        response = client.chat.completions.create(
+            model="gemma2-9b-it",
+            messages=[
+                {"role": "system", "content": "Use only the context to answer."},
+                {"role": "user", "content": prompt},
+            ],
+        )
+        return response.choices[0].message.content
+    except Exception as e:
+        logger.exception("LLM call failed")
+        return "Sorry, I couldn't generate a response at this time."

Note: See logger addition in the next comment.


19-19: Replace print with structured logging (aligns with PR goal)

Printing the context leaks to stdout and contradicts the PR objective. Switch to a module logger and log at debug level.

+import logging
@@
+logger = logging.getLogger(__name__)
@@
-    print(context)
+    logger.debug("RAG context length=%d", len(context))

7-8: Guard against missing GROQ_API_KEY at startup

If GROQ_API_KEY is unset, client creation will fail later with confusing errors. Validate early.

-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise RuntimeError("GROQ_API_KEY is not set")
+client = Groq(api_key=api_key)

17-27: Cap/truncate the assembled context before building the prompt (fix required)

build_context currently concatenates all doc explanations/reasoning and is called directly in ask_llm — this can cause token overflows/cost spikes. rg shows build_context is defined and used only in backend/app/modules/chat/llm_processing.py, so it's safe to change the function signature/behavior.

What to change (concise):

  • Limit total characters/tokens and/or select top-k docs before joining.
  • Prefer top-k by a score in metadata (e.g. score/similarity) when available, otherwise fall back to longest/most relevant texts.
  • Replace printing with logging.debug and parameterize max size via config/env.

Suggested replacement (minimal patch):

# backend/app/modules/chat/llm_processing.py

def build_context(docs, max_chars=20000, top_k=20):
    entries = []
    for m in docs:
        meta = m.get("metadata", {}) or {}
        text = meta.get("explanation") or meta.get("reasoning") or ""
        score = meta.get("score") or meta.get("similarity") or 0
        entries.append({"text": text, "score": score})

    # prefer scored ordering if scores exist, else by text length
    if any(e["score"] for e in entries):
        entries.sort(key=lambda e: e["score"], reverse=True)
    else:
        entries.sort(key=lambda e: len(e["text"]), reverse=True)

    selected = []
    total = 0
    for e in entries[:top_k]:
        t = e["text"]
        if not t:
            continue
        if total + len(t) > max_chars:
            remain = max_chars - total
            if remain > 0:
                selected.append(t[:remain])
                total = max_chars
            break
        selected.append(t)
        total += len(t)

    return "\n".join(selected)


def ask_llm(question, docs):
    context = build_context(docs, max_chars=20000, top_k=20)
    # use logging.debug(...) instead of print in production
    prompt = f"""You are an assistant that answers based on context.

Context:
{context}

Question:
{question}
"""

Notes:

  • Adjust max_chars/top_k to your model's token limits (consider converting chars -> tokens if you have a tokenizer).
  • If you rely on downstream callers, update them for the new build_context signature (rg output shows only local usage now).
  • Replace print(context) with logging as appropriate.

1-37: Action required: replace remaining print(...) calls with logging across backend

The grep results show many leftover print statements in backend files. Replace them with logger calls and add a module-level logger (e.g. import logging; logger = logging.getLogger(name)). Only backend/app/utils/store_vectors.py currently defines a logger.

Files that need attention (path:line):

  • backend/main.py:51
  • backend/app/utils/fact_check_utils.py:14,21,29,35,37,39
  • backend/app/routes/routes.py:31,38,48
  • backend/app/db/vector_store.py:22,30
  • backend/app/modules/pipeline.py:26
  • backend/app/modules/vector_store/chunk_rag_data.py:72
  • backend/app/modules/langgraph_nodes/generate_perspective.py:54
  • backend/app/modules/chat/llm_processing.py:19
  • backend/app/modules/langgraph_nodes/store_and_send.py:9,19,24,27
  • backend/app/modules/langgraph_nodes/error_handler.py:2,3,4
  • backend/app/modules/langgraph_nodes/sentiment.py:48
  • backend/app/modules/langgraph_nodes/judge.py:48
  • backend/app/modules/langgraph_nodes/fact_check.py:14,22
  • backend/app/modules/bias_detection/check_bias.py:13,14,49
  • backend/app/modules/facts_check/llm_processing.py:52,110,116,127

Recommended changes (concise):

  • Add at the top of each module: import logging; logger = logging.getLogger(__name__).
  • Replace print(...) with appropriate logger levels: logger.debug/info/warning/error.
  • Ensure logging is configured once in the app entrypoint (backend/main.py) rather than using prints there.
  • Re-run the rg check to confirm no prints remain.

Example (backend/app/modules/chat/llm_processing.py):

  • Add:
    import logging
    logger = logging.getLogger(__name__)
  • Replace:
    print(context)
    with:
    logger.debug(context)
backend/app/modules/scraper/extractor.py (1)

26-31: Critical: incorrect requests.get call uses headers as params

requests.get(url, headers) treats headers as query params. This breaks headers and can leak data. Use the named headers kwarg.

-            res = requests.get(self.url, self.headers, timeout=10)
+            res = requests.get(self.url, headers=self.headers, timeout=10)
backend/app/db/vector_store.py (1)

20-31: Use logger instead of prints; chain exceptions with from e; fix typos

Two improvements recommended here:

  • Replace print with a module logger to align with PR goal.
  • Use exception chaining (B904) and fix message typos (“occurred”, “initializing”).

Apply this diff:

@@
-import os
-from pinecone import Pinecone, ServerlessSpec, CloudProvider, AwsRegion
+import os
+import logging
+from pinecone import Pinecone, ServerlessSpec, CloudProvider, AwsRegion
@@
-try:
-    # Initialize Pinecone client
-    pc = Pinecone(api_key=PINECONE_API_KEY)
-
-except Exception as e:
-    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+logger = logging.getLogger(__name__)
+try:
+    # Initialize Pinecone client
+    pc = Pinecone(api_key=PINECONE_API_KEY)
+except Exception as e:
+    raise RuntimeError("Error occurred while initializing Pinecone client") from e
@@
 if not pc.has_index(INDEX_NAME):
-    print(f"Creating index: {INDEX_NAME}")
+    logger.info("Creating index: %s", INDEX_NAME)
     pc.create_index(
         name=INDEX_NAME,
         dimension=DIMENSIONS,
         metric=METRIC,
         spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),
     )
 else:
-    print(f"Index '{INDEX_NAME}' already exists")
+    logger.info("Index '%s' already exists", INDEX_NAME)
@@
 try:
     # Connect to the index
     index = pc.Index(INDEX_NAME)
 except Exception as e:
-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+    raise RuntimeError(f"Error occurred while connecting to the index {INDEX_NAME}") from e

Also applies to: 32-36

frontend/app/analyze/loading/page.tsx (1)

128-131: Intervals leak: cleanup function returned inside inner async is ignored by useEffect

The cleanup returned from runAnalysis is not used by useEffect, so intervals may continue running after unmount.

Here’s a minimal fix to ensure cleanup is returned by the effect itself:

useEffect(() => {
  let stepInterval: ReturnType<typeof setInterval> | undefined;
  let progressInterval: ReturnType<typeof setInterval> | undefined;

  const runAnalysis = async () => {
    // ... existing logic ...
    stepInterval = setInterval(/* ... */);
    progressInterval = setInterval(/* ... */);
  };

  runAnalysis();

  return () => {
    if (stepInterval) clearInterval(stepInterval);
    if (progressInterval) clearInterval(progressInterval);
  };
}, [router]);
backend/app/utils/fact_check_utils.py (3)

13-16: Bug: Checking input state instead of extractor result status

This will never detect extractor failures correctly, because it checks the incoming state rather than the extractor's output; the input state may lack a status or carry a stale one. Check the result returned by run_claim_extractor_sdk instead.

-    if state.get("status") != "success":
-        print("❌ Claim extraction failed.")
-        return [], "Claim extraction failed."
+    if result.get("status") != "success":
+        return [], result.get("message", "Claim extraction failed.")

21-40: Replace prints with structured logging and implement the “polite delay” mentioned in comment

  • Replace all print calls with logger.info/warning/error for consistency and production readiness.
  • Implement a small delay (e.g., 1s) between search requests to respect provider rate limits.

Apply the following diff within this block:

-    print(f"🧠 Extracted claims: {claims}")
+    logger.info("Extracted claims: %s", claims)
@@
-        print(f"\n🔍 Searching for claim: {claim}")
+        logger.info("Searching for claim: %s", claim)
         try:
             results = search_google(claim)
             if results:
                 results[0]["claim"] = claim
                 search_results.append(results[0])
-                print(f"✅ Found result: {results[0]['title']}")
+                logger.info("Found result: %s", results[0]["title"])
             else:
-                print(f"⚠️ No search result for: {claim}")
+                logger.warning("No search result for: %s", claim)
         except Exception as e:
-            print(f"❌ Search failed for: {claim} -> {e}")
+            logger.exception("Search failed for: %s -> %s", claim, e)
+        # Be polite with search providers
+        time.sleep(1)

Add this near the imports (outside the shown range); time is needed for the sleep above:

import logging
import time

logger = logging.getLogger(__name__)

Do you want me to push a follow-up patch converting the remaining backend prints to use the module logger?


41-46: Propagate LLM verification errors to callers (fix required)

run_fact_verifier_sdk returns a status/message on failure; run_fact_check_pipeline currently swallows that and returns an empty list. Upstream callers already expect and handle an error tuple, so surface the LLM error instead of hiding it.

  • backend/app/modules/facts_check/llm_processing.py — run_fact_verifier_sdk (lines ~60–132): returns {"status":"success", "verifications": ...} or {"status":"error", "message": ...}
  • backend/app/utils/fact_check_utils.py — run_fact_check_pipeline (lines ~41–46): currently ignores final["status"] and returns final.get("verifications", []), None
  • backend/app/modules/langgraph_nodes/fact_check.py — caller (line 11) does: verifications, error_message = run_fact_check_pipeline(state) and checks error_message, so it will handle the propagated message

Suggested change:

-    final = run_fact_verifier_sdk(search_results)
-    return final.get("verifications", []), None
+    final = run_fact_verifier_sdk(search_results)
+    if final.get("status") != "success":
+        return [], final.get("message", "Fact verification failed.")
+    return final.get("verifications", []), None

Verified: the caller unpacks (verifications, error_message) and handles errors — apply the change to surface LLM verification failures.

backend/main.py (1)

33-40: CORS: Wildcard origin with allow_credentials=True is invalid in browsers

Browsers reject Access-Control-Allow-Origin: * when Access-Control-Allow-Credentials: true. Define explicit origins for credentialed requests, or turn credentials off for wildcard.

-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
+allowed_origins = os.getenv("CORS_ALLOWED_ORIGINS", "*")
+if allowed_origins == "*":
+    # Wildcard allowed, but credentials must be disabled to be valid in browsers
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=["*"],
+        allow_credentials=False,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
+else:
+    origins = [o.strip() for o in allowed_origins.split(",") if o.strip()]
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=origins,
+        allow_credentials=True,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )

Add this import near the top (outside the range):

import os

I can add a README note showing how to set CORS_ALLOWED_ORIGINS (comma-separated) for common deployments.

🧹 Nitpick comments (21)
backend/app/utils/store_vectors.py (1)

27-29: Prefer lazy/structured logging over f-strings in log messages

Using f-strings eagerly formats the message even when the log level is disabled. Switch to parameterized logging for performance and to enable structured logging later.

Apply:

-        logger.info(
-            f"Successfully stored {len(vectors)} vectors in namespace '{namespace}'"
-        )
+        logger.info(
+            "Successfully stored %d vectors in namespace '%s'",
+            len(vectors),
+            namespace,
+        )
backend/app/modules/langgraph_nodes/store_and_send.py (1)

9-21: Normalize error message casing/spelling

Minor consistency: capitalize messages and fix “occured” -> “occurred” to standardize logs and returned errors.

If you keep the message values user-visible, consider:

-            raise Exception(f"failed to embed chunks: {e}")
+            raise Exception(f"Failed to embed chunks: {e}")

And as shown above in the logger.exception message: “occurred”.

README.md (1)

139-141: Minor grammar and spacing

Add missing spaces and adjust capitalization for clarity.

  • “/frontenddirectory” -> “/frontend directory”
  • “add following environment variable” -> “Add the following environment variable”
backend/app/modules/scraper/keywords.py (4)

21-22: Be explicit about the sort key; future-proof against upstream changes

RAKE returns (score, phrase) tuples today, but being explicit avoids surprises if the return shape changes.

-    keywords = [phrase for score, phrase in sorted(keywords_with_scores, reverse=True)]
+    keywords = [
+        phrase for score, phrase in sorted(
+            keywords_with_scores, key=lambda t: t[0], reverse=True
+        )
+    ]

5-15: Add return type hints for clarity

The docstring declares List[str], but the signature lacks a return type. Add it for better IDE/type-checker support.

-def extract_keywords(text: str, max_keywords: int = 15):
+def extract_keywords(text: str, max_keywords: int = 15) -> list[str]:

1-3: Adjust typing import for richer return typing in extract_keyword_data

If you annotate extract_keyword_data’s return as Dict[str, Any], import Any.

-from typing import Dict
+from typing import Dict, Any

25-41: Optionally annotate extract_keyword_data return

Improves downstream usage and tooling support.

-def extract_keyword_data(text: str) -> Dict:
+def extract_keyword_data(text: str) -> Dict[str, Any]:
backend/app/modules/bias_detection/check_bias.py (1)

16-18: Error message mismatch with parameter name

The function takes text, not cleaned_text. Align the message.

-            raise ValueError("Missing or empty 'cleaned_text'")
+            raise ValueError("Missing or empty 'text'")
backend/app/modules/chat/get_rag_data.py (1)

15-17: Make namespace configurable via environment

Hard-coding namespace reduces flexibility across environments/tenants. Consider reading it from an env var with a safe default.

Example:

-    results = index.query(
-        vector=embeddings, top_k=top_k, include_metadata=True, namespace="default"
-    )
+    namespace = os.getenv("PINECONE_NAMESPACE", "default")
+    results = index.query(vector=embeddings, top_k=top_k, include_metadata=True, namespace=namespace)
backend/app/modules/vector_store/embed.py (1)

21-26: Use a list comprehension for vectors; improves clarity and performance

The loop is fine, but a comprehension is simpler and faster for pure construction.

-    vectors = []
-    for chunk, embedding in zip(chunks, embeddings):
-        vectors.append(
-            {"id": chunk["id"], "values": embedding, "metadata": chunk["metadata"]}
-        )
-    return vectors
+    return [
+        {"id": chunk["id"], "values": embedding, "metadata": chunk["metadata"]}
+        for chunk, embedding in zip(chunks, embeddings)
+    ]
backend/app/modules/chat/llm_processing.py (1)

11-14: LGTM on context builder formatting

Equivalent semantics; joins explanations or reasoning across docs. Consider filtering out empty strings to avoid stray newlines.
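A hedged illustration of that filtering (the function shape and metadata keys mirror the snippet shown earlier; this is not the file's actual code):

def build_context(docs):
    """Join per-doc explanation/reasoning, skipping empty strings (sketch)."""
    parts = (
        (doc.get("metadata", {}) or {}).get("explanation")
        or (doc.get("metadata", {}) or {}).get("reasoning")
        or ""
        for doc in docs
    )
    return "\n".join(p for p in parts if p)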

frontend/app/page.tsx (1)

107-111: A11y nit: mark decorative icons as hidden from assistive tech

These Globe icons appear decorative. Add aria-hidden to avoid noise for screen readers.

-              <Globe className="w-4 h-4 md:w-5 md:h-5 text-white" />
+              <Globe aria-hidden className="w-4 h-4 md:w-5 md:h-5 text-white" />
@@
-              <Globe className="w-3 h-3 md:w-4 md:h-4 text-white" />
+              <Globe aria-hidden className="w-3 h-3 md:w-4 md:h-4 text-white" />

Also applies to: 281-284

backend/app/modules/scraper/extractor.py (1)

81-93: Docstrings missing; add brief descriptions (aligns with PR goal)

This module and class lack docstrings, which contradicts the PR objective. Consider adding concise docstrings for the class and methods.

Example insertion after class definition:

 class Article_extractor:
+    """Extracts article content using multiple strategies (trafilatura, Newspaper3k, BS4+Readability),
+    returning the first successful result with a non-empty 'text' field."""

If you want, I can generate full docstrings for all methods in this module.

backend/app/modules/vector_store/chunk_rag_data.py (1)

48-53: Hoist fact field names to a module constant

Minor readability/maintainability improvement: define FACT_FIELDS once at module scope and reuse.

Apply something like:

FACT_FIELDS = ("original_claim", "verdict", "explanation", "source_link")

# In the loop:
for field in FACT_FIELDS:
    if field not in fact:
        raise ValueError(f"Missing required fact field: {field} in fact index {i}")
backend/app/db/vector_store.py (1)

27-27: Parameterize region via env and keep spec formatting—LGTM otherwise

The single-line ServerlessSpec call is fine. Consider allowing CLOUD/REGION via env for deployments across regions. Example envs: PINECONE_CLOUD=AWS, PINECONE_REGION=us-east-1.
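A possible sketch of that parameterization, assuming the SDK also accepts string values for cloud/region (the file currently uses the CloudProvider/AwsRegion enums); PINECONE_CLOUD and PINECONE_REGION are the example env names above, not existing configuration:

import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
spec = ServerlessSpec(
    cloud=os.getenv("PINECONE_CLOUD", "aws"),
    region=os.getenv("PINECONE_REGION", "us-east-1"),
)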

frontend/app/analyze/loading/page.tsx (1)

75-80: Use the normalized URL helper for API calls

Prevents double slashes and handles empty base (falls back to relative paths).

Apply this diff:

-            axios.post(`${backend_url}/api/process`, {
+            axios.post(makeUrl("/api/process"), {
               url: storedUrl,
             }),
-            axios.post(`${backend_url}/api/bias`, {
+            axios.post(makeUrl("/api/bias"), {
               url: storedUrl,
             }),
frontend/app/analyze/results/page.tsx (2)

48-51: Avoid toggling loading state in two places

You already handle loading in the second effect. Setting it here as well can cause flicker and redundant state updates.

-    if (storedBiasScore && storedData) {
-      setIsLoading(false);
-    }
+    // Let the second effect set isLoading once both are present

68-75: Prevent potential redirect race by setting the ref before push

You created isRedirecting but never set it, so the early return won’t trigger. Set it before router.push to avoid repeated redirects.

-    } else {
-      console.warn("No bias or data found. Redirecting...");
-      router.push("/analyze");
+    } else {
+      console.warn("No bias or data found. Redirecting...");
+      isRedirecting.current = true;
+      router.push("/analyze");
     }
backend/main.py (2)

49-52: Use logging instead of print for startup message

Aligns with “replace prints with logger” and integrates with uvicorn logging.

-    # Run development server
-    port = int(os.environ.get("PORT", 7860))
-    print(f"Server is running on http://0.0.0.0:{port}")
+    # Run development server
+    port = int(os.environ.get("PORT", 7860))
+    import logging
+    logger = logging.getLogger("uvicorn.error")
+    logger.info("Server is running on http://0.0.0.0:%s", port)

15-17: Docstring usage path may be inaccurate

Given this file is backend/main.py, verify the suggested command should be:

  • uvicorn backend.main:app --reload
backend/app/modules/langgraph_builder.py (1)

71-76: Redundant termination logic: pick either finish point or explicit end edge

You both:

  • add_conditional_edges("store_and_send", ...) to "end", and
  • set_finish_point("store_and_send").

Only one is needed. Consider removing the conditional edges for store_and_send.

-    graph.add_conditional_edges(
-        "store_and_send",
-        lambda x: ("error_handler" if x.get("status") == "error" else "__end__"),
-    )
-
-    graph.set_finish_point("store_and_send")
+    graph.set_finish_point("store_and_send")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ba87804 and 0455e36.

📒 Files selected for processing (27)
  • README.md (2 hunks)
  • backend/app/db/vector_store.py (2 hunks)
  • backend/app/modules/bias_detection/check_bias.py (2 hunks)
  • backend/app/modules/chat/embed_query.py (0 hunks)
  • backend/app/modules/chat/get_rag_data.py (1 hunks)
  • backend/app/modules/chat/llm_processing.py (2 hunks)
  • backend/app/modules/facts_check/web_search.py (1 hunks)
  • backend/app/modules/langgraph_builder.py (3 hunks)
  • backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
  • backend/app/modules/langgraph_nodes/fact_check.py (1 hunks)
  • backend/app/modules/langgraph_nodes/generate_perspective.py (2 hunks)
  • backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
  • backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
  • backend/app/modules/scraper/cleaner.py (4 hunks)
  • backend/app/modules/scraper/extractor.py (5 hunks)
  • backend/app/modules/scraper/keywords.py (2 hunks)
  • backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
  • backend/app/modules/vector_store/embed.py (1 hunks)
  • backend/app/routes/routes.py (0 hunks)
  • backend/app/utils/fact_check_utils.py (1 hunks)
  • backend/app/utils/store_vectors.py (1 hunks)
  • backend/main.py (2 hunks)
  • backend/pyproject.toml (1 hunks)
  • backend/start.sh (0 hunks)
  • frontend/app/analyze/loading/page.tsx (2 hunks)
  • frontend/app/analyze/results/page.tsx (4 hunks)
  • frontend/app/page.tsx (11 hunks)
💤 Files with no reviewable changes (3)
  • backend/app/routes/routes.py
  • backend/app/modules/chat/embed_query.py
  • backend/start.sh
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/app/utils/fact_check_utils.py (1)
backend/app/modules/facts_check/llm_processing.py (1)
  • run_fact_verifier_sdk (60-132)
backend/app/modules/langgraph_builder.py (6)
backend/app/modules/langgraph_nodes/error_handler.py (1)
  • error_handler (1-10)
backend/app/modules/langgraph_nodes/sentiment.py (1)
  • run_sentiment_sdk (10-53)
backend/app/modules/langgraph_nodes/fact_check.py (1)
  • run_fact_check (4-28)
backend/app/modules/langgraph_nodes/generate_perspective.py (1)
  • generate_perspective (24-60)
backend/app/modules/langgraph_nodes/judge.py (1)
  • judge_perspective (13-53)
backend/app/modules/langgraph_nodes/store_and_send.py (1)
  • store_and_send (6-34)
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py

36-36: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 markdownlint-cli2 (0.17.2)
README.md

139-139: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


140-140: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


141-141: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


163-163: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


164-164: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


165-165: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (15)
backend/pyproject.toml (2)

2-2: Project name change looks good

Renaming the project to “backend” aligns with the README and directory layout.


6-6: Confirm Python 3.13 requirement across dependencies

Pinning requires-python to ">=3.13" is aggressive. Some libraries may lag 3.13 support. Verify that FastAPI, Uvicorn, LangChain, Pinecone SDK, etc., are all 3.13-compatible in your target deploy environment.

If not strictly required, consider ">=3.10" or ">=3.11" which are more widely supported.

backend/app/modules/scraper/keywords.py (1)

41-41: LGTM on trailing comma

Trailing comma improves diffs and is consistent with the style elsewhere.

backend/app/modules/langgraph_nodes/sentiment.py (1)

28-30: LGTM: simplified prompt construction

Consolidating to a single f-string improves readability without changing behavior.

backend/app/modules/bias_detection/check_bias.py (1)

33-34: LGTM: unified f-string

The prompt construction change is a no-op functionally and reads cleaner.

backend/app/modules/langgraph_nodes/generate_perspective.py (1)

16-16: Model identifier validated — no change required

Confirmed: Groq's ChatGroq supports "llama-3.3-70b-versatile" (per Groq docs).

  • File: backend/app/modules/langgraph_nodes/generate_perspective.py — line 16: llm = ChatGroq(model=my_llm, temperature=0.7)
backend/app/modules/chat/llm_processing.py (1)

33-34: Good: add user message with prompt

Adding the user message fixes the common pitfall of sending only a system message. This should improve LLM adherence to the provided context.

frontend/app/page.tsx (1)

136-139: LGTM: copy edits and formatting

UI copy reflows, semicolons, and CTA tweaks look good. No functional changes introduced.

Also applies to: 150-151, 162-164, 178-184, 195-197, 237-239, 264-266, 286-295, 300-300

backend/app/modules/scraper/extractor.py (1)

17-22: LGTM: header formatting cleanup

Pure formatting; effective UA header preserved.

backend/app/modules/vector_store/chunk_rag_data.py (1)

55-67: LGTM on the fact chunk construction

Clear, consistent structure with metadata (including article_id). Looks good.

backend/app/modules/facts_check/web_search.py (1)

22-22: LGTM on minor formatting

Trailing comma in the returned list is harmless and consistent.

backend/app/utils/fact_check_utils.py (1)

2-5: Import of run_fact_verifier_sdk enabled correctly

Making run_fact_verifier_sdk available here aligns this module with the LLM verification flow. No issues spotted.

frontend/app/analyze/results/page.tsx (1)

22-22: Guard against missing NEXT_PUBLIC_API_URL

If the env var is not set at build time, the URL becomes undefined and axios will call “undefined/api/chat”. Fail fast or provide a user-facing error.

-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+const backend_url = process.env.NEXT_PUBLIC_API_URL ?? "";

Optionally add a runtime check inside handlers to show a friendly message when backend_url is empty (see chat handler suggestion below).

backend/app/modules/langgraph_builder.py (2)

8-9: Minor import formatting improvement LGTM

Trailing comma is fine and future-proofs additional imports.


57-70: Decision edge behavior differs from the PR summary

The code routes low scores (<70) to generate_perspective (unless retries>=3), not back to judge_perspective as described. Confirm intended behavior; if the loop should be judge->judge, adjust accordingly.

Comment on lines 13 to 15
        print(text)
        print(json.dumps(text))


🛠️ Refactor suggestion

⚠️ Potential issue

Avoid printing user content; use a logger and minimize PII exposure

Dumping the full article to stdout/stderr is a privacy and compliance risk. Use a logger and avoid logging raw text in production.

-        print(text)
-        print(json.dumps(text))
+        logger.debug("Bias detection invoked (input_length=%s)", len(text) if isinstance(text, str) else "n/a")

Add at the top:

import logging
logger = logging.getLogger(__name__)
🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py around lines 13-15, the code
prints full user/article text to stdout which risks exposing PII; remove the
print(text) and print(json.dumps(text)) calls and replace them with a logger
usage (add import logging and logger = logging.getLogger(__name__) at the top),
logging only non-sensitive minimal metadata instead of raw content — e.g., log
the text length, a deterministic hash/ID, or a redacted/truncated excerpt, and
ensure logging level is appropriate (debug/info) and that no raw article content
or PII is written to logs.

Comment on lines 19 to 24
    matches = []
    for match in results["matches"]:
        matches.append(
            {"id": match["id"], "score": match["score"], "metadata": match["metadata"]}
        )
    return matches

🛠️ Refactor suggestion

Avoid KeyError and simplify match extraction

Accessing results["matches"] can raise a KeyError if the client response changes or errors. Also, the append loop can be replaced with a concise, readable comprehension.

Apply this diff:

-    matches = []
-    for match in results["matches"]:
-        matches.append(
-            {"id": match["id"], "score": match["score"], "metadata": match["metadata"]}
-        )
-    return matches
+    return [
+        {"id": m.get("id"), "score": m.get("score"), "metadata": m.get("metadata")}
+        for m in results.get("matches", [])
+    ]
🤖 Prompt for AI Agents
In backend/app/modules/chat/get_rag_data.py around lines 19 to 24, replace the
manual loop that does results["matches"] and appends dicts with a safe, concise
list comprehension that uses results.get("matches", []) to avoid KeyError and
uses match.get("id"), match.get("score"), and match.get("metadata") for
defensive access; return the comprehension directly (e.g., return [{ "id":
match.get("id"), "score": match.get("score"), "metadata": match.get("metadata")
} for match in results.get("matches", [])]).

Comment on lines +11 to +13
    results = requests.get(
        f"https://www.googleapis.com/customsearch/v1?key={GOOGLE_SEARCH}&cx=f637ab77b5d8b4a3c&q={query}"
    )

⚠️ Potential issue

Encode query via params, add timeout and HTTP error handling; avoid hard-coded cx

Current call risks bad encoding and long hangs. Use params, set a timeout, check status, and make cx configurable via env. Also guard against missing items to avoid KeyError.

Apply this diff:

-def search_google(query):
-    results = requests.get(
-        f"https://www.googleapis.com/customsearch/v1?key={GOOGLE_SEARCH}&cx=f637ab77b5d8b4a3c&q={query}"
-    )
-    res = results.json()
+def search_google(query):
+    cx = os.getenv("SEARCH_CX")
+    if not GOOGLE_SEARCH or not cx:
+        raise RuntimeError("Missing Google Custom Search API configuration (SEARCH_KEY/SEARCH_CX)")
+    results = requests.get(
+        "https://www.googleapis.com/customsearch/v1",
+        params={"key": GOOGLE_SEARCH, "cx": cx, "q": query},
+        timeout=10,
+    )
+    results.raise_for_status()
+    res = results.json()

Additionally, harden the parsing (outside the changed hunk):

items = res.get("items") or []
if not items:
    return []
first = {
    "title": items[0].get("title", ""),
    "link": items[0].get("link", ""),
    "snippet": items[0].get("snippet", ""),
}

Note: Avoid logging the full request URL to prevent leaking the API key.

🤖 Prompt for AI Agents
In backend/app/modules/facts_check/web_search.py around lines 11 to 13, the
requests.get call should be replaced to use params (so the query is
URL-encoded), include a timeout, and avoid a hard-coded cx: read CX from an
environment variable (with a sensible default or raise if missing) and pass both
key and cx via the params dict; after the request call check for HTTP errors
(response.raise_for_status() or if response.status_code != 200 then
handle/raise) and catch requests.exceptions.Timeout/RequestException to handle
network errors instead of hanging; when parsing the JSON, guard against missing
"items" by using res.get("items") or [] and return [] if empty, and build the
first result using .get for title/link/snippet as shown in the suggested
snippet; do not log the full request URL (avoid including the API key) — log
only safe metadata if needed.

Comment on lines 1 to 10
def error_handler(input):
    print("Error detected!")
    print(f"From: {input.get('error_from')}")
    print(f"Message: {input.get('message')}")

    return {
        "status": "stopped_due_to_error",
        "from": [input.get("error_from")],
        "error": [input.get("message")],
    }

🛠️ Refactor suggestion

Replace prints with logger, add docstring, and avoid shadowing built-in input

This module still uses print statements and shadows the built-in name input. Switch to structured logging and rename the parameter to avoid confusion.

+import logging
+
-def error_handler(input):
-    print("Error detected!")
-    print(f"From: {input.get('error_from')}")
-    print(f"Message: {input.get('message')}")
+logger = logging.getLogger(__name__)
+
+def error_handler(err):
+    """Normalize and log an error emitted from a node in the LangGraph pipeline."""
+    logger.error("Error detected! from=%s message=%s", err.get("error_from"), err.get("message"))
@@
-    return {
-        "status": "stopped_due_to_error",
-        "from": [input.get("error_from")],
-        "error": [input.get("message")],
-    }
+    return {
+        "status": "stopped_due_to_error",
+        "from": err.get("error_from"),
+        "error": err.get("message"),
+    }
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/error_handler.py around lines 1 to 10,
the function shadows the built-in name input, uses print statements, and lacks a
docstring; rename the parameter to something like payload or data to avoid
shadowing, add a concise docstring describing the function and its
parameters/return, and replace the prints with structured logging (use the
module logger: logger = logging.getLogger(__name__)) logging the error source
and message via logger.error or logger.exception with payload fields; keep the
returned dictionary shape the same and ensure you import logging at top if not
present.

Comment on lines +6 to +10
    return {
        "status": "stopped_due_to_error",
        "from": [input.get("error_from")],
        "error": [input.get("message")],
    }

💡 Verification agent

🧩 Analysis chain

Avoid unintended schema change: 'from' and 'error' turned into lists

Returning lists instead of strings will likely break consumers expecting scalars. Unless downstream explicitly expects arrays here, keep these fields as strings.

Apply this diff:

-    return {
-        "status": "stopped_due_to_error",
-        "from": [input.get("error_from")],
-        "error": [input.get("message")],
-    }
+    return {
+        "status": "stopped_due_to_error",
+        "from": input.get("error_from"),
+        "error": input.get("message"),
+    }

If arrays are intended, confirm the schema and update all callers accordingly.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Show file (if present) ==="
if [ -f backend/app/modules/langgraph_nodes/error_handler.py ]; then
  sed -n '1,240p' backend/app/modules/langgraph_nodes/error_handler.py || true
else
  echo "File not found: backend/app/modules/langgraph_nodes/error_handler.py"
fi

echo
echo "=== Search repository for relevant usages ==="
rg --hidden -n --no-ignore-vcs "stopped_due_to_error" || true
rg --hidden -n --no-ignore-vcs "error_from" || true
rg --hidden -n --no-ignore-vcs "\"from\":\s*\[" || true
rg --hidden -n --no-ignore-vcs "\"error\":\s*\[" || true
rg --hidden -n --no-ignore-vcs "get\\(.*message" || true
rg --hidden -n --no-ignore-vcs "get\\(.*error" || true
rg --hidden -n --no-ignore-vcs "['\"]error['\"]\]\[0\]" || true
rg --hidden -n --no-ignore-vcs "['\"]from['\"]\]\[0\]" || true

Length of output: 3449


Fix required — return scalars for "from" and "error" in error_handler

Returning lists here changes the response schema and will likely break callers. Repo search shows other nodes set "error_from" as strings and I found no callers expecting arrays — only this file returns arrays.

Files to fix / review:

  • backend/app/modules/langgraph_nodes/error_handler.py — change the returned fields to scalars.
  • Caller locations that supply error_from (for verification): backend/app/modules/langgraph_nodes/sentiment.py, generate_perspective.py, fact_check.py, judge.py, backend/app/modules/facts_check/llm_processing.py, backend/app/modules/bias_detection/check_bias.py, backend/app/modules/langgraph_nodes/store_and_send.py

Apply this diff:

-    return {
-        "status": "stopped_due_to_error",
-        "from": [input.get("error_from")],
-        "error": [input.get("message")],
-    }
+    return {
+        "status": "stopped_due_to_error",
+        "from": input.get("error_from"),
+        "error": input.get("message"),
+    }

If arrays were intentionally introduced, confirm the schema and update all callers accordingly.

🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/error_handler.py around lines 6 to 10,
the function is returning "from" and "error" as single-element lists which
breaks the response schema; change both fields to scalars by returning the raw
values (e.g., input.get("error_from") and input.get("message")) instead of
wrapping them in lists, and ensure sensible defaults (empty string or None) if
keys are missing; after making this change, verify the listed caller files
(sentiment.py, generate_perspective.py, fact_check.py, judge.py,
backend/app/modules/facts_check/llm_processing.py,
backend/app/modules/bias_detection/check_bias.py, store_and_send.py) still work
with scalar values and update any caller that was (incorrectly) expecting arrays
or adjust the schema if arrays were intentional.

Comment on lines 12 to +16
     for i, chunk in enumerate(chunks):
         if not isinstance(chunk, dict) or "text" not in chunk:
             raise ValueError(
-                f"Invalid chunk structure at index {i}:"
-                " missing 'text' field"
+                f"Invalid chunk structure at index {i}: missing 'text' field"
             )

⚠️ Potential issue

Validate required keys to prevent KeyError later

You only validate the 'text' field, but later access 'id' and 'metadata' directly. Missing keys will raise KeyError at vector construction.

Apply this diff to strengthen validation:

-    # Validate chunk structure
-    for i, chunk in enumerate(chunks):
-        if not isinstance(chunk, dict) or "text" not in chunk:
-            raise ValueError(
-                f"Invalid chunk structure at index {i}: missing 'text' field"
-            )
+    # Validate chunk structure
+    for i, chunk in enumerate(chunks):
+        if not isinstance(chunk, dict):
+            raise ValueError(f"Invalid chunk at index {i}: expected dict, got {type(chunk).__name__}")
+        missing = [k for k in ("id", "text", "metadata") if k not in chunk]
+        if missing:
+            raise ValueError(f"Invalid chunk structure at index {i}: missing keys: {missing}")
🤖 Prompt for AI Agents
In backend/app/modules/vector_store/embed.py around lines 12 to 16, the current
validation only checks for 'text' but later code accesses 'id' and 'metadata',
which can cause KeyError; update the validation to ensure each chunk is a dict
and contains the keys 'text', 'id', and 'metadata' (optionally check that 'id'
is a string/int and 'metadata' is a dict), and raise a clear ValueError
indicating which key is missing and the chunk index so downstream vector
construction never receives malformed input.

Comment on lines +31 to 32
logger.error(f"Failed to store vectors in namespace '{namespace}': {e}")
raise RuntimeError(f"Vector storage failed: {e}")

🛠️ Refactor suggestion

Capture stacktrace when logging exceptions

Use logger.exception (or exc_info=True) to preserve the traceback. The current message loses valuable debugging context.

-        logger.error(f"Failed to store vectors in namespace '{namespace}': {e}")
+        logger.exception("Failed to store vectors in namespace '%s'", namespace)
         raise RuntimeError(f"Vector storage failed: {e}")
🧰 Tools
🪛 Ruff (0.12.2)

32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In backend/app/utils/store_vectors.py around lines 31 to 32, the exception is
being logged with logger.error which drops the traceback; replace that call with
logger.exception(f"Failed to store vectors in namespace '{namespace}': {e}") or
logger.error(..., exc_info=True) to preserve the stacktrace, and when re-raising
wrap the original exception (e.g., raise RuntimeError(f"Vector storage failed:
{e}") from e) so exception chaining is preserved.

Comment on lines 19 to 20
const backend_url = process.env.NEXT_PUBLIC_API_URL;


🛠️ Refactor suggestion

Guard undefined NEXT_PUBLIC_API_URL and prevent double slashes

If the env is unset, requests will hit "undefined/api/...". Normalize the base URL or default to relative paths.

Apply this diff:

-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+const rawBackendUrl = process.env.NEXT_PUBLIC_API_URL ?? "";
+// Remove trailing slashes to avoid `//api/...`
+const backend_url = rawBackendUrl.replace(/\/+$/, "");
+const makeUrl = (path: string) =>
+  `${backend_url}${path.startsWith("/") ? "" : "/"}${path}`;
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 19-20, guard against an
undefined NEXT_PUBLIC_API_URL and normalize it to avoid requests like
"undefined/api/..." or double slashes: read the env var, default to an empty
string or relative base when unset, strip any trailing slashes from the
configured base URL, and when building endpoint paths always join with a single
slash (e.g., ensure base has no trailing slash and prepend path with a single
leading slash) so resulting URLs are valid.

Comment on lines 78 to 95
   async function handleSendMessage(e: React.FormEvent) {
     e.preventDefault();
     if (!message.trim()) return;
     const newMessages = [...messages, { role: "user", content: message }];
     setMessages(newMessages);
     setMessage("");

-    const res = await axios.post("http://Thunder1245-perspective-backend.hf.space/api/chat", {
-      message: message
+    const res = await axios.post(`${backend_url}/api/chat`, {
+      message: message,
     });
-    const data = res.data;
-
-    console.log(data)
-
-    // 🔹 Step 2: Append LLM’s response
-    setMessages([...newMessages, { role: "assistant", content: data.answer }]);
-  };
+    const data = res.data;
+
+    console.log(data);
+
+    // 🔹 Step 2: Append LLM’s response
+    setMessages([...newMessages, { role: "assistant", content: data.answer }]);
+  }


🛠️ Refactor suggestion

Harden chat submit: add error handling, backend_url guard, and functional state updates

  • Add try/catch around the request.
  • Use functional setMessages to avoid stale closures.
  • Guard against missing backend_url.
  • Remove console.log to avoid noisy logs in production.
-  async function handleSendMessage(e: React.FormEvent) {
+  async function handleSendMessage(e: React.FormEvent) {
     e.preventDefault();
     if (!message.trim()) return;
-    const newMessages = [...messages, { role: "user", content: message }];
-    setMessages(newMessages);
-    setMessage("");
-
-    const res = await axios.post(`${backend_url}/api/chat`, {
-      message: message,
-    });
-    const data = res.data;
-
-    console.log(data);
-
-    // 🔹 Step 2: Append LLM’s response
-    setMessages([...newMessages, { role: "assistant", content: data.answer }]);
+    const userMessage = { role: "user", content: message.trim() };
+    setMessages((prev) => [...prev, userMessage]);
+    setMessage("");
+
+    if (!backend_url) {
+      console.error("NEXT_PUBLIC_API_URL is not configured");
+      setMessages((prev) => [
+        ...prev,
+        { role: "assistant", content: "Configuration error: backend URL not set." },
+      ]);
+      return;
+    }
+
+    try {
+      const res = await axios.post(`${backend_url}/api/chat`, {
+        message: userMessage.content,
+      });
+      const data = res.data;
+      setMessages((prev) => [
+        ...prev,
+        { role: "assistant", content: data.answer },
+      ]);
+    } catch (err) {
+      console.error("Chat request failed", err);
+      setMessages((prev) => [
+        ...prev,
+        { role: "assistant", content: "Sorry, I couldn't reach the server. Please try again." },
+      ]);
+    }
   }
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 78 to 95, the chat submit
handler lacks error handling, doesn't guard against a missing backend_url,
mutates state with potential stale closures, and leaves a console.log; wrap the
axios call and subsequent state updates in a try/catch, return or show an error
if backend_url is falsy before making the request, remove the console.log, and
update messages using functional updates (e.g., setMessages(prev => [...prev,
{role: "user", content: message}]) and later setMessages(prev => [...prev,
{role: "assistant", content: data?.answer ?? 'Error: no response'}])) so you
avoid stale state and handle missing/undefined responses.

Comment on lines +137 to 146

Setup environment variables:*
- add .env file in `/frontend`directory.
- add following environment variable in your .env file.
```
NEXT_PUBLIC_API_URL = http://localhost:8000

```

```bash

🛠️ Refactor suggestion

Fix list indentation, code fence language, and .env formatting

Address markdownlint issues and ensure .env syntax is valid (no spaces around =).

-Setup environment variables:*
-  - add .env file in `/frontend`directory.
-  - add following environment variable in your .env file.
-```
-NEXT_PUBLIC_API_URL = http://localhost:8000
-
-```
+Setup environment variables:
+- Add a .env file in the `/frontend` directory.
+- Add the following environment variable to your .env file.
+```env
+NEXT_PUBLIC_API_URL=http://localhost:8000
+```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

139-139: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


140-140: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


141-141: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In README.md around lines 137 to 146, fix the Markdown formatting: correct the
list indentation and capitalization, use a fenced code block with language
"env", and remove spaces around the equals sign in the .env example;
specifically update the bullet items to "Add a .env file in the `/frontend`
directory." and "Add the following environment variable to your .env file.",
replace the current code fence with ```env and show
NEXT_PUBLIC_API_URL=http://localhost:8000 (no spaces around =), then close the
fence with ``` to satisfy markdownlint and valid .env syntax.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (16)
backend/app/modules/vector_store/chunk_rag_data.py (1)

38-41: Validate cleaned_text type/emptiness early for clearer errors.

Currently, invalid cleaned_text types bubble up from generate_id. Add an explicit check.

Apply this diff:

         if not isinstance(data["facts"], list):
             raise ValueError("Facts must be a list")
 
-        # Validate perspective structure
+        # Validate cleaned_text
+        if not isinstance(data["cleaned_text"], str) or not data["cleaned_text"].strip():
+            raise ValueError("cleaned_text must be a non-empty string")
+
+        # Validate perspective structure
backend/app/utils/fact_check_utils.py (2)

45-47: Bug: checking the wrong object for status (state vs result)

You call run_claim_extractor_sdk(state) into result but then check state.get("status"), which will never reflect the extractor outcome.

Apply this diff:

-    if state.get("status") != "success":
-        print("❌ Claim extraction failed.")
-        return [], "Claim extraction failed."
+    if result.get("status") != "success":
+        return [], "Claim extraction failed."

53-71: Replace prints with logger, honor “polite delay,” and propagate verifier errors

  • Replace print calls with a module logger to match PR goals and avoid noisy stdout.
  • The docstring mentions a “polite delay” but none is implemented; add a small sleep to respect rate limits.
  • Handle verifier error by checking final.get("status").

Apply this diff:

+import logging
 from app.modules.facts_check.web_search import search_google
 from app.modules.facts_check.llm_processing import (
     run_claim_extractor_sdk,
     run_fact_verifier_sdk,
 )
 import re
 import time
+logger = logging.getLogger(__name__)
@@
-    print(f"🧠 Extracted claims: {claims}")
+    logger.debug("Extracted claims: %s", claims)
@@
-    for claim in claims:
-        print(f"\n🔍 Searching for claim: {claim}")
+    for claim in claims:
+        logger.info("Searching for claim: %s", claim)
         try:
             results = search_google(claim)
             if results:
                 results[0]["claim"] = claim
                 search_results.append(results[0])
-                print(f"✅ Found result: {results[0]['title']}")
+                logger.info("Found result: %s", results[0].get("title"))
             else:
-                print(f"⚠️ No search result for: {claim}")
+                logger.warning("No search result for: %s", claim)
         except Exception as e:
-            print(f"❌ Search failed for: {claim} -> {e}")
+            logger.exception("Search failed for claim: %s", claim)
+        # Polite delay to avoid hammering the search API
+        time.sleep(1)
@@
-    final = run_fact_verifier_sdk(search_results)
-    return final.get("verifications", []), None
+    final = run_fact_verifier_sdk(search_results)
+    if final.get("status") != "success":
+        return [], final.get("message", "Fact verification failed.")
+    return final.get("verifications", []), None

Also applies to: 73-79

backend/app/modules/langgraph_nodes/sentiment.py (1)

63-69: Replace print with logger and consider guarding missing API key

Use a module logger for errors. Also, constructing the Groq client with a missing API key yields confusing runtime failures. Fail fast with a clear message.

Apply this diff:

+import logging
 import os
 from groq import Groq
 from dotenv import load_dotenv
 
 load_dotenv()
 
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+logger = logging.getLogger(__name__)
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    # Fail fast with actionable error; avoid silent misconfigurations
+    raise RuntimeError("GROQ_API_KEY is not set")
+client = Groq(api_key=api_key)
@@
     except Exception as e:
-        print(f"Error in sentiment_analysis: {e}")
+        logger.exception("Error in sentiment_analysis")
         return {
             "status": "error",
             "error_from": "sentiment_analysis",
             "message": str(e),
         }
backend/app/utils/prompt_templates.py (1)

45-59: Align output schema with PerspectiveOutput and avoid fenced JSON

Your structured output model uses fields "perspective" and "reasoning" (string), but the prompt instructs "counter_perspective" and "reasoning_steps" (list) and wraps the JSON in a code fence. This will conflict with structured parsing.

Proposed fix: instruct the model to return flat JSON (no code fence) with keys that match the Pydantic model.

-Generate a logical and respectful *opposite perspective* to the article.
-Use *step-by-step reasoning* and return your output in this JSON format:
-
-```json
-{
-  "counter_perspective": "<your opposite point of view>",
-  "reasoning_steps": [
-    "<step 1>",
-    "<step 2>",
-    "<step 3>",
-    "...",
-    "<final reasoning>"
-  ]
-}
-```
+Generate a logical and respectful opposite perspective to the article.
+Use step-by-step reasoning and return ONLY valid JSON with this schema:
+{
+  "perspective": "<your opposite point of view>",
+  "reasoning": "<step-by-step reasoning in a single coherent paragraph>"
+}

If you prefer to keep a list of steps, update PerspectiveOutput to use reasoning_steps: list[str] and reflect this across the pipeline instead.
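For reference, the model shape that the aligned prompt assumes looks roughly like the sketch below (the real PerspectiveOutput lives in generate_perspective.py and may carry additional field metadata):

from pydantic import BaseModel

class PerspectiveOutput(BaseModel):
    """Structured output expected from the counter-perspective LLM call."""
    perspective: str  # the opposite point of view
    reasoning: str    # step-by-step reasoning condensed into one paragraph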

backend/app/modules/bias_detection/check_bias.py (1)

65-70: Parse and validate numeric bias score

The model can still return extra tokens; parse, validate [0..100], and return an integer.

-        bias_score = chat_completion.choices[0].message.content.strip()
-
-        return {
-            "bias_score": bias_score,
-            "status": "success",
-        }
+        raw = chat_completion.choices[0].message.content.strip()
+        match = re.search(r"\b\d{1,3}\b", raw)
+        if not match:
+            raise ValueError(f"Non-numeric bias score returned: {raw!r}")
+        score = int(match.group(0))
+        if not 0 <= score <= 100:
+            raise ValueError(f"Bias score out of range: {score}")
+        return {"bias_score": score, "status": "success"}

Add once at the top-level imports (outside the selected range):

import re
backend/app/modules/chat/llm_processing.py (2)

44-45: Replace printing context (PII) with safe logging metadata

Do not emit raw context; log only minimal metadata via logger.

-    print(context)
+    logger.debug("ask_llm invoked (context_length=%s, docs=%s)", len(context), len(docs) if docs else 0)

Add once at top of file (outside selected range):

import logging
logger = logging.getLogger(__name__)

54-62: Add error handling around LLM call and provide clearer failure mode

Wrap the call to handle network/auth/model errors; log exceptions and re-raise or return a controlled error.

-    response = client.chat.completions.create(
-        model="gemma2-9b-it",
-        messages=[
-            {"role": "system", "content": "Use only the context to answer."},
-            {"role": "user", "content": prompt},
-        ],
-    )
+    try:
+        response = client.chat.completions.create(
+            model="gemma2-9b-it",
+            messages=[
+                {"role": "system", "content": "Use only the context to answer."},
+                {"role": "user", "content": prompt},
+            ],
+        )
+    except Exception:
+        logger.exception("LLM request failed in ask_llm")
+        raise
backend/app/db/vector_store.py (3)

34-36: Use exception chaining and correct typos in error messages

Chain exceptions with "from e" and fix spelling for clarity.

-except Exception as e:
-    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+except Exception as e:
+    raise RuntimeError("Error occurred while initializing Pinecone client") from e
...
-except Exception as e:
-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+except Exception as e:
+    raise RuntimeError(f"Error occurred while connecting to index {INDEX_NAME}") from e

Also applies to: 58-58


43-50: Replace prints with logger to meet PR objective and avoid stdout noise

Use logging instead of print for index creation/existence messages.

-if not pc.has_index(INDEX_NAME):
-    print(f"Creating index: {INDEX_NAME}")
+if not pc.has_index(INDEX_NAME):
+    logger.info("Creating index: %s", INDEX_NAME)
@@
-else:
-    print(f"Index '{INDEX_NAME}' already exists")
+else:
+    logger.info("Index %r already exists", INDEX_NAME)

Add once at top of file (outside selected range):

import logging
logger = logging.getLogger(__name__)

Also applies to: 52-52


23-33: Avoid import-time side effects — make Pinecone init lazy or explicit

vector_store.py performs Pinecone initialization, may create the index, and opens a connection at import time; store_vectors.py imports index, which triggers those side effects during import.

Points to address

  • backend/app/db/vector_store.py
    • Module-level operations to move: reading/validating PINECONE_API_KEY, pc = Pinecone(...), pc.has_index(...) / pc.create_index(...), and index = pc.Index(...).
  • backend/app/utils/store_vectors.py
    • Line ~24: from app.db.vector_store import index — importing index causes the import-time side effects.

Suggested fix (minimal)

  • Replace module-level initialization with an explicit initializer or lazy getter, e.g.:
    • Expose def get_index(): (lazily initializes client/index and returns it) or def init_pinecone() called from application startup.
    • Move PINECONE_API_KEY validation into that function (do not raise on import).
  • Update callers (store_vectors.py) to call get_index() (or receive the index via DI) instead of importing index at module import.

Quick example:

  • Change store_vectors import:
    • from: from app.db.vector_store import index
    • to: from app.db.vector_store import get_index
    • then inside store(): index = get_index()

Reason: avoids tests and process startup failures (missing env var, network calls, index creation) at import time and allows easier testing/DI.
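A minimal sketch of the lazy getter (illustrative only; index creation is left to whatever vector_store.py already does, and reusing the module's INDEX_NAME constant is assumed):

import os
from functools import lru_cache
from pinecone import Pinecone

INDEX_NAME = "perspective"  # reuse the existing module constant in the real file

@lru_cache(maxsize=1)
def get_index():
    """Lazily initialize the Pinecone client and return the index handle."""
    api_key = os.getenv("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("PINECONE_API_KEY is not set")
    pc = Pinecone(api_key=api_key)
    # If the index may not exist yet, run the existing has_index/create_index logic here.
    return pc.Index(INDEX_NAME)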

backend/app/modules/langgraph_nodes/generate_perspective.py (1)

31-41: Mismatch between prompt output fields and PerspectiveOutput schema

Prompt templates currently instruct "counter_perspective" and "reasoning_steps", but this model expects "perspective" (str) and "reasoning" (str). Resolve by aligning the prompt (preferred) or the model fields.

  • Option A (preferred): Update prompt output keys to "perspective" and "reasoning" (see prompt_templates.py suggestion).
  • Option B: Change PerspectiveOutput to match prompt (e.g., reasoning_steps: list[str], counter_perspective: str) and adapt downstream users accordingly.
backend/app/modules/langgraph_builder.py (1)

47-55: Fix MyState typing: optional fields + perspective type mismatch with downstream usage

  • All fields except cleaned_text are optional in practice. Use Required/NotRequired to reflect this and avoid misleading type hints.
  • generate_perspective returns an object that judge_perspective accesses with getattr(..., "perspective", ...) (see langgraph_nodes/judge.py). Typing perspective as str is incorrect and will confuse maintainers and tools.

Apply this diff to the state definition:

-class MyState(TypedDict):
-    cleaned_text: str
-    facts: list[dict]
-    sentiment: str
-    perspective: str
-    score: int
-    retries: int
-    status: str
+class MyState(TypedDict):
+    cleaned_text: Required[str]
+    facts: NotRequired[list[dict]]
+    sentiment: NotRequired[str]
+    # judge_perspective expects an object with a .perspective attribute; allow object or str
+    perspective: NotRequired[object]
+    score: NotRequired[int]
+    retries: NotRequired[int]
+    status: NotRequired[str]

Optionally, if a concrete type exists (e.g., PerspectiveOutput), we can type it more precisely without importing at runtime:

# Place near imports
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from app.modules.langgraph_nodes.generate_perspective import PerspectiveOutput
# Then use:
# perspective: NotRequired["PerspectiveOutput | str"]
backend/app/modules/pipeline.py (3)

59-61: Replace print with logger (aligns with PR objective and avoids noisy stdout)

Use the standard logging module and log at debug level instead of printing.

Apply these diffs:

@@
-import json
+import json
+import logging
+logger = logging.getLogger(__name__)
@@
-    # Optional: pretty print raw_text for debugging
-    print(json.dumps(result, indent=2, ensure_ascii=False))
+    # Optional: pretty print result for debugging
+    logger.debug("Scraper pipeline result: %s", json.dumps(result, indent=2, ensure_ascii=False))

50-54: Harden against missing extractor output to avoid KeyError

Article_extractor.extract() may fail or change shape; indexing raw_text["text"] will raise KeyError. Guard and provide a clear error.

Apply this diff:

-    result = {}
-    cleaned_text = clean_extracted_text(raw_text["text"])
+    result = {}
+    try:
+        extracted_text = raw_text["text"]
+    except (TypeError, KeyError):
+        raise ValueError("Extractor returned no 'text' field")  # or return {"status": "error", ...}
+    cleaned_text = clean_extracted_text(extracted_text)

59-61: Replace remaining print(...) calls with logger (repository-wide)

Search of backend/app found 34 print(...) occurrences across 14 files; the print in backend/app/modules/pipeline.py (line 60) is still present.

Files/locations to fix:

  • backend/app/modules/pipeline.py:60
    • print(json.dumps(result, indent=2, ensure_ascii=False))
  • backend/app/utils/fact_check_utils.py:46, 53, 61, 67, 69, 71
  • backend/app/routes/routes.py:63, 70, 80
  • backend/app/db/vector_store.py:44, 52
  • backend/app/modules/bias_detection/check_bias.py:37, 38, 73
  • backend/app/modules/vector_store/chunk_rag_data.py:98
  • backend/app/modules/chat/llm_processing.py:44
  • backend/app/modules/langgraph_nodes/store_and_send.py:26, 36, 41, 44
  • backend/app/modules/langgraph_nodes/error_handler.py:17, 18, 19
  • backend/app/modules/langgraph_nodes/judge.py:65
  • backend/app/modules/langgraph_nodes/sentiment.py:64, 81 (commented)
  • backend/app/modules/langgraph_nodes/fact_check.py:31, 39
  • backend/app/modules/langgraph_nodes/generate_perspective.py:76
  • backend/app/modules/facts_check/llm_processing.py:77, 135, 141, 152

Please replace these debug prints with the project logger (e.g., logger.debug/info/error) or remove them if no longer needed.

♻️ Duplicate comments (11)
backend/app/modules/vector_store/chunk_rag_data.py (4)

41-59: Perspective normalization is incomplete; dict-shaped inputs will fail.

You compute perspective_data but never use it, and only support attribute access. Dict inputs with perspective/reasoning will raise. Normalize and use perspective_text/perspective_reasoning.

Apply this diff:

-        # Validate perspective structure
-        perspective_data = data["perspective"]
-        if hasattr(perspective_data, "dict"):
-            perspective_data = perspective_data.dict()
+        # Normalize perspective into text and reasoning
+        perspective = data["perspective"]
+        if hasattr(perspective, "dict"):
+            perspective = perspective.dict()
+        if isinstance(perspective, dict):
+            if "perspective" not in perspective or "reasoning" not in perspective:
+                raise ValueError("Perspective dict missing required fields: 'perspective' and 'reasoning'")
+            perspective_text = perspective["perspective"]
+            perspective_reasoning = perspective["reasoning"]
+        else:
+            if not (hasattr(perspective, "perspective") and hasattr(perspective, "reasoning")):
+                raise ValueError("Perspective object missing required fields")
+            perspective_text = perspective.perspective
+            perspective_reasoning = perspective.reasoning

49-59: Remove unused perspective_obj block; it’s superseded by normalization.

This block is redundant and causes the dict case to fail.

Apply this diff:

-        # Add counter-perspective chunk
-        perspective_obj = data["perspective"]
-
-        # Optional safety check
-
-        if not (
-            hasattr(perspective_obj, "perspective")
-            and hasattr(perspective_obj, "reasoning")
-        ):
-            raise ValueError("Perspective object missing required fields")
+        # Add counter-perspective chunk

97-99: Replace print with module logger and keep stack trace.

Use a logger per PR goal; logger.exception preserves traceback.

Apply this diff:

-    except Exception as e:
-        print(f"[Error] Failed to chunk the data: {e}")
-        raise
+    except Exception as e:
+        logger.exception("Failed to chunk the data")
+        raise

Add this near the imports (top of file):

import logging
logger = logging.getLogger(__name__)

60-67: Use normalized variables to avoid attribute errors for dict perspectives.

Apply this diff:

         chunks.append(
             {
                 "id": f"{article_id}-perspective",
-                "text": perspective_obj.perspective,
+                "text": perspective_text,
                 "metadata": {
                     "type": "counter-perspective",
-                    "reasoning": perspective_obj.reasoning,
+                    "reasoning": perspective_reasoning,
                     "article_id": article_id,
                 },
             }
         )
backend/app/modules/langgraph_nodes/fact_check.py (1)

30-44: Replace prints with logger and add function docstring (per earlier review)

This echoes the previous review’s guidance: use a module logger and document run_fact_check; also fix “occured” typo.

Apply this diff:

-from app.utils.fact_check_utils import run_fact_check_pipeline
+from app.utils.fact_check_utils import run_fact_check_pipeline
+import logging
+logger = logging.getLogger(__name__)
@@
-def run_fact_check(state):
+def run_fact_check(state: dict) -> dict:
+    """
+    Run the fact-check pipeline for a given state.
+
+    Args:
+        state (dict): Expects 'cleaned_text' and optional context.
+
+    Returns:
+        dict: On success, returns the updated state with 'facts' and 'status'='success'.
+              On failure, returns an error dict with 'status'='error', 'error_from', and 'message'.
+    """
     try:
         text = state.get("cleaned_text")
@@
-        if error_message:
-            print(f"some error occured in fact_checking:{error_message}")
+        if error_message:
+            logger.error("Error in fact_checking: %s", error_message)
             return {
                 "status": "error",
                 "error_from": "fact_checking",
                 "message": f"{error_message}",
             }
 
     except Exception as e:
-        print(f"some error occured in fact_checking:{e}")
+        logger.exception("Unexpected error in fact_checking")
         return {
             "status": "error",
             "error_from": "fact_checking",
-            "message": f"{e}",
+            "message": str(e),
         }
backend/app/modules/chat/get_rag_data.py (1)

45-50: Avoid KeyError and simplify match extraction (reuse earlier suggestion)

Return a safe, concise list comprehension that tolerates missing keys in the response.

Apply this diff:

-    matches = []
-    for match in results["matches"]:
-        matches.append(
-            {"id": match["id"], "score": match["score"], "metadata": match["metadata"]}
-        )
-    return matches
+    return [
+        {"id": m.get("id"), "score": m.get("score"), "metadata": m.get("metadata")}
+        for m in results.get("matches", [])
+    ]
backend/app/modules/bias_detection/check_bias.py (1)

37-39: Replace prints of user content with logger; avoid PII leakage

Raw article text is printed, and errors are printed. Use a logger and log only metadata (e.g., length). This was flagged previously.

-        print(text)
-        print(json.dumps(text))
+        logger.debug("Bias detection invoked (input_length=%s)", len(text) if isinstance(text, str) else "n/a")
...
-    except Exception as e:
-        print(f"Error in bias_detection: {e}")
+    except Exception as e:
+        logger.exception("Error in bias_detection")

Add once at top of file (outside the selected range):

import logging
logger = logging.getLogger(__name__)

Also applies to: 72-74

backend/app/db/vector_store.py (1)

58-58: Static analysis: prefer B904 (raise from e)

This addresses the Ruff B904 hint and improves debuggability. The diff above includes this fix.

backend/app/modules/langgraph_nodes/generate_perspective.py (3)

59-66: Fix f-strings and use local facts variable (bug)

Only the first line is an f-string; verdict and explanation are literal. Also iterate over the local facts variable instead of state["facts"].

-        facts_str = "\n".join(
-            [
-                f"Claim: {f['original_claim']}\n"
-                "Verdict: {f['verdict']}\nExplanation: "
-                "{f['explanation']}"
-                for f in state["facts"]
-            ]
-        )
+        facts_str = "\n".join(
+            [
+                f"Claim: {f['original_claim']}\n"
+                f"Verdict: {f['verdict']}\n"
+                f"Explanation: {f['explanation']}"
+                for f in facts
+            ]
+        )

75-81: Use logger in exception path and fix typo

Replace print with structured logging; "occured" -> "occurred".

-    except Exception as e:
-        print(f"some error occured in generate_perspective:{e}")
+    except Exception as e:
+        logger.exception("Error occurred in generate_perspective")
         return {
             "status": "error",
             "error_from": "generate_perspective",
-            "message": f"{e}",
+            "message": str(e),
         }

Add once at top of file (outside selected range):

import logging
logger = logging.getLogger(__name__)

82-82: Normalize structured LLM output to a plain dict for downstream/serialization

Returning a Pydantic model directly can cause serialization issues. Convert to dict and, if desired, expose specific fields consistently.

-    return {**state, "perspective": result, "status": "success"}
+    result_dict = (
+        result
+        if isinstance(result, dict)
+        else (result.model_dump() if hasattr(result, "model_dump") else result.dict())
+    )
+    # If downstream expects only the perspective text:
+    perspective_value = result_dict.get("perspective", result_dict)
+    return {**state, "perspective": perspective_value, "status": "success"}
🧹 Nitpick comments (21)
backend/app/utils/generate_chunk_id.py (2)

29-33: Deterministic short IDs LGTM; minor grammar nit in error message.

Logic is correct and concise. Consider fixing the error string grammar.

Apply this diff:

-        raise ValueError("Text must be non-empty string")
+        raise ValueError("Text must be a non-empty string")

29-33: Optional: expose prefix/length as constants to balance readability vs. collision risk.

15 hex chars (~60 bits) makes collisions extremely unlikely, but if IDs are externally visible or long-lived across datasets, consider defining PREFIX = "article-" and HASH_LEN = 15 at module scope for easy adjustment later (e.g., HASH_LEN=16 or 20).
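A sketch of what that could look like (assuming generate_id hashes the cleaned text with SHA-256; swap in whichever hash the module actually uses):

import hashlib

PREFIX = "article-"
HASH_LEN = 15  # 15 hex chars (~60 bits); raise to 16-20 if IDs become long-lived

def generate_id(text: str) -> str:
    """Return a deterministic short ID for an article's cleaned text."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Text must be a non-empty string")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{PREFIX}{digest[:HASH_LEN]}"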

backend/app/modules/vector_store/chunk_rag_data.py (1)

72-93: Optional: validate fact field types/emptiness for robustness.

Currently you only check presence. Consider asserting string types and non-empty for original_claim/explanation/source_link and a constrained set for verdict if applicable.

I can provide a minimal schema check snippet if you want stricter validation.
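For example, something along these lines (field names follow the fact chunks built in this module; the allowed verdict set is an assumption):

ALLOWED_VERDICTS = {"true", "false", "partially true", "unverified"}  # assumption

for i, fact in enumerate(data["facts"]):
    if not isinstance(fact, dict):
        raise ValueError(f"Fact at index {i} must be a dict")
    for key in ("original_claim", "explanation", "source_link"):
        value = fact.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"Fact at index {i}: '{key}' must be a non-empty string")
    if str(fact.get("verdict", "")).lower() not in ALLOWED_VERDICTS:
        raise ValueError(f"Fact at index {i}: unexpected verdict {fact.get('verdict')!r}")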

backend/app/modules/chat/embed_query.py (1)

25-31: Avoid import-time model initialization; add type hints and resilient error logging

Constructing SentenceTransformer at import time increases cold-start latency (especially in serverless) and makes module import fail if the model/env is misconfigured. Prefer lazy, cached initialization; also add a return type hint and structured logging.

Apply this diff to add lazy init, type hints, and logging:

+import logging
+from functools import lru_cache
 from sentence_transformers import SentenceTransformer
+logger = logging.getLogger(__name__)
 
-embedder = SentenceTransformer("all-MiniLM-L6-v2")
+@lru_cache(maxsize=1)
+def _get_embedder() -> SentenceTransformer:
+    # Pin the model name to ensure embedding dimension stability (384 for MiniLM-L6-v2)
+    return SentenceTransformer("all-MiniLM-L6-v2")
 
-def embed_query(query: str):
-    embeddings = embedder.encode(query).tolist()
-
-    return embeddings
+def embed_query(query: str) -> list[float]:
+    try:
+        return _get_embedder().encode(query).tolist()
+    except Exception:
+        logger.exception("Failed to embed query")
+        raise
backend/app/utils/fact_check_utils.py (3)

33-37: Optional: Validate Google API key presence before search loop

If GOOGLE_SEARCH or related env is missing, search_google may fail repeatedly. Consider early validation and a clear error before iterating claims.

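A tiny guard along these lines would do (the variable names are assumptions; match whatever web_search.py actually reads):

import os

def _require_search_credentials() -> None:
    """Fail fast if the Google search credentials are not configured."""
    missing = [k for k in ("GOOGLE_SEARCH_API_KEY", "GOOGLE_CSE_ID") if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing search credentials: {', '.join(missing)}")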

1-30: Docstring mentions behavior not enforced (polite delay)

The narrative promises a polite delay in searches; before the above refactor it was missing. Ensure the code and docstring stay in sync.


33-37: Potential downstream JSON-parse robustness issue in run_fact_verifier_sdk

In backend/app/modules/facts_check/llm_processing.py (snippet provided), parsed can be referenced even when json.loads fails. That function should guard the initialization of parsed, or skip appending on parse failure, to prevent a NameError and inconsistent outputs.

Would you like me to open a follow-up PR to harden run_fact_verifier_sdk? I can patch it to default parsed to a structured error object on parse failures and avoid prints.
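A self-contained sketch of the defensive pattern (the function name, the content/claim keys, and the fallback fields are illustrative, not the module's actual signature):

import json
import logging

logger = logging.getLogger(__name__)

def parse_verifications(raw_outputs: list[dict]) -> list[dict]:
    """Parse LLM verifier outputs defensively; never reference an unset `parsed`."""
    verifications = []
    for item in raw_outputs:
        content = item.get("content", "")
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError as parse_err:
            logger.error("LLM JSON parse error for claim %r: %s", item.get("claim"), parse_err)
            parsed = {
                "original_claim": item.get("claim"),
                "verdict": "unverified",
                "explanation": f"Could not parse verifier output: {parse_err}",
            }
        verifications.append(parsed)
    return verifications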

backend/app/modules/langgraph_nodes/sentiment.py (1)

26-62: Optional: add type hints to function for consistency

Add -> dict return annotation to run_sentiment_sdk for better IDE/type-checker support.

backend/app/modules/chat/get_rag_data.py (2)

34-36: Avoid import-time client/index initialization; validate API key and consider centralizing config

  • Import-time Pinecone client/index creation can break module import and hurts cold starts. Lazily initialize and cache instead.
  • Validate PINECONE_API_KEY and make index name configurable (env/constant) to prevent hard-coding and drift.

Apply this refactor:

-from dotenv import load_dotenv
-from app.modules.chat.embed_query import embed_query
-import os
-
-load_dotenv()
-
-pc = Pinecone(os.getenv("PINECONE_API_KEY"))
-index = pc.Index("perspective")
+from dotenv import load_dotenv
+from app.modules.chat.embed_query import embed_query
+import os
+import logging
+from functools import lru_cache
+
+logger = logging.getLogger(__name__)
+load_dotenv()
+
+@lru_cache(maxsize=1)
+def _get_index():
+    api_key = os.getenv("PINECONE_API_KEY")
+    if not api_key:
+        raise RuntimeError("PINECONE_API_KEY is not set")
+    index_name = os.getenv("PINECONE_INDEX_NAME", "perspective")
+    pc = Pinecone(api_key)
+    return pc.Index(index_name)
@@
-    results = index.query(
+    results = _get_index().query(
         vector=embeddings, top_k=top_k, include_metadata=True, namespace="default"
     )

Also consider moving Pinecone constants into a single vector_store module and reusing them here to avoid configuration drift.


13-24: Docstring says “Encodes the input query,” but the encoding now happens in embed_query

Given embed_query(query) returns the vector, this function no longer “encodes” itself; consider rephrasing to “Embeds the query and searches Pinecone.”

backend/app/utils/prompt_templates.py (2)

31-33: Remove stray quotes splitting the sentence in the template header

The single quotes around the line break will render literally in the prompt. Merge the sentence without quotes.

-You are an AI assistant that generates a well-reasoned '
-'counter-perspective to a given article.
+You are an AI assistant that generates a well-reasoned counter-perspective to a given article.

13-16: Docstring input type for facts is inaccurate

The prompt currently accepts a single formatted string for facts (facts_str), not a Python list. Update the docstring to avoid confusion.

-            facts (list): Verified factual information related to the article.
+            facts (str): Verified factual information related to the article (formatted string).
backend/app/modules/bias_detection/check_bias.py (3)

41-41: Error message references the wrong parameter name

The function parameter is "text", not "cleaned_text".

-            raise ValueError("Missing or empty 'cleaned_text'")
+            raise ValueError("Missing or empty 'text'")

30-33: Validate GROQ_API_KEY early to fail fast with a clear error

If the env var is missing, initializing the client succeeds but requests will fail later with an opaque error. Validate upfront.

Example (outside selected range):

api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY environment variable not set")
client = Groq(api_key=api_key)

25-33: Consider lazy client initialization for import-time side effects

Creating external clients at import can hinder testing and module import; consider lazy-init in the function or via a getter.
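A minimal lazy getter would look like this (sketch only; the function would then call _get_client() instead of using the module-level client):

import os
from functools import lru_cache
from groq import Groq

@lru_cache(maxsize=1)
def _get_client() -> Groq:
    """Create the Groq client on first use instead of at import time."""
    api_key = os.getenv("GROQ_API_KEY")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY environment variable not set")
    return Groq(api_key=api_key)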

backend/app/modules/chat/llm_processing.py (1)

30-33: Validate GROQ_API_KEY presence to fail fast (optional)

Similar to bias_detection, validate env upfront to avoid opaque runtime errors.

Example (outside selected range):

api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY environment variable not set")
client = Groq(api_key=api_key)
backend/app/modules/langgraph_builder.py (3)

1-31: Good, comprehensive module docstring, but the termination claim is misleading

Doc says “Ensures the graph terminates only after successful storage.” Yet there is an explicit error path to error_handler. Either:

  • update the doc to reflect error termination, or
  • wire error_handler to end (and avoid set_finish_point) so the doc matches behavior.

68-69: Nit: remove trailing comma in set_entry_point for readability

No functional change, but avoids a dangling tuple-like style.

Apply this diff:

-    graph.set_entry_point(
-        "sentiment_analysis",
-    )
+    graph.set_entry_point("sentiment_analysis")

90-103: Simplify nested conditional for judge routing for maintainability

The nested ternary is hard to scan. A small helper improves clarity and reduces risk of logic mistakes.

You can extract the routing logic:

def _route_from_judge(state: dict) -> str:
    if state.get("status") == "error":
        return "error_handler"
    score = state.get("score", 0)
    if score < 70:
        return "store_and_send" if state.get("retries", 0) >= 3 else "generate_perspective"
    return "store_and_send"

graph.add_conditional_edges("judge_perspective", _route_from_judge)
backend/app/modules/pipeline.py (2)

42-44: Consider lazy-initializing the compiled graph to avoid heavy work at import time

Compiling the graph at module import makes any import of pipeline.py perform potentially heavy work. Prefer lazy initialization with caching to reduce cold-start latency and make importing cheap.

Suggested pattern:

-# Compile once when module loads
-_LANGGRAPH_WORKFLOW = build_langgraph()
+# Lazily compile and cache when first needed
+_LANGGRAPH_WORKFLOW = None
+
+def _get_workflow():
+    global _LANGGRAPH_WORKFLOW
+    if _LANGGRAPH_WORKFLOW is None:
+        _LANGGRAPH_WORKFLOW = build_langgraph()
+    return _LANGGRAPH_WORKFLOW
@@
-    result = _LANGGRAPH_WORKFLOW.invoke(state)
+    result = _get_workflow().invoke(state)

65-68: Optional: wrap workflow invocation with error handling to normalize failures

If the compiled graph raises, you can return a consistent error structure (matching the rest of the pipeline).

Example:

def run_langgraph_workflow(state: dict):
    """Execute the pre-compiled LangGraph workflow."""
    try:
        return _get_workflow().invoke(state)
    except Exception as e:
        # If you have a logger configured:
        logger.exception("LangGraph workflow invocation failed")
        return {"status": "error", "error_from": "langgraph_workflow", "message": str(e)}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0455e36 and f7ba2c9.

📒 Files selected for processing (26)
  • backend/app/db/vector_store.py (3 hunks)
  • backend/app/modules/bias_detection/check_bias.py (3 hunks)
  • backend/app/modules/chat/embed_query.py (1 hunks)
  • backend/app/modules/chat/get_rag_data.py (2 hunks)
  • backend/app/modules/chat/llm_processing.py (3 hunks)
  • backend/app/modules/facts_check/llm_processing.py (1 hunks)
  • backend/app/modules/facts_check/web_search.py (2 hunks)
  • backend/app/modules/langgraph_builder.py (3 hunks)
  • backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
  • backend/app/modules/langgraph_nodes/fact_check.py (2 hunks)
  • backend/app/modules/langgraph_nodes/generate_perspective.py (3 hunks)
  • backend/app/modules/langgraph_nodes/judge.py (1 hunks)
  • backend/app/modules/langgraph_nodes/sentiment.py (2 hunks)
  • backend/app/modules/langgraph_nodes/store_and_send.py (2 hunks)
  • backend/app/modules/pipeline.py (1 hunks)
  • backend/app/modules/scraper/cleaner.py (4 hunks)
  • backend/app/modules/scraper/extractor.py (6 hunks)
  • backend/app/modules/scraper/keywords.py (3 hunks)
  • backend/app/modules/vector_store/chunk_rag_data.py (2 hunks)
  • backend/app/modules/vector_store/embed.py (1 hunks)
  • backend/app/routes/routes.py (1 hunks)
  • backend/app/utils/fact_check_utils.py (1 hunks)
  • backend/app/utils/generate_chunk_id.py (1 hunks)
  • backend/app/utils/prompt_templates.py (1 hunks)
  • backend/app/utils/store_vectors.py (2 hunks)
  • backend/main.py (2 hunks)
✅ Files skipped from review due to trivial changes (5)
  • backend/app/modules/facts_check/llm_processing.py
  • backend/app/modules/langgraph_nodes/judge.py
  • backend/app/modules/langgraph_nodes/store_and_send.py
  • backend/app/routes/routes.py
  • backend/app/modules/vector_store/embed.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • backend/app/modules/langgraph_nodes/error_handler.py
  • backend/app/modules/scraper/cleaner.py
  • backend/app/utils/store_vectors.py
  • backend/app/modules/scraper/keywords.py
  • backend/app/modules/facts_check/web_search.py
  • backend/app/modules/scraper/extractor.py
  • backend/main.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/app/utils/fact_check_utils.py (2)
backend/app/modules/facts_check/web_search.py (1)
  • search_google (30-42)
backend/app/modules/facts_check/llm_processing.py (2)
  • run_claim_extractor_sdk (37-82)
  • run_fact_verifier_sdk (85-157)
backend/app/modules/langgraph_builder.py (6)
backend/app/modules/langgraph_nodes/generate_perspective.py (1)
  • generate_perspective (46-82)
backend/app/modules/langgraph_nodes/store_and_send.py (1)
  • store_and_send (23-51)
backend/app/modules/langgraph_nodes/error_handler.py (1)
  • error_handler (16-25)
backend/app/modules/langgraph_nodes/sentiment.py (1)
  • run_sentiment_sdk (26-69)
backend/app/modules/langgraph_nodes/fact_check.py (1)
  • run_fact_check (21-45)
backend/app/modules/langgraph_nodes/judge.py (1)
  • judge_perspective (30-70)
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py

58-58: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (11)
backend/app/utils/generate_chunk_id.py (1)

1-23: Docstring addition is clear and useful.

Good overview, example, and behavior notes. This aligns with the PR objective to add docstrings.

backend/app/modules/vector_store/chunk_rag_data.py (2)

1-24: Module docstring reads well and documents the chunking flow.

Clear description of inputs/outputs and chunk types. Nicely done.


27-27: Import path check — OK (no change required)

Confirmed: backend/app/utils/generate_chunk_id.py exists and __init__.py files are present under backend/app/, so app is a package and the import is consistent.

  • backend/app/utils/generate_chunk_id.py — found
  • backend/app/__init__.py (and other app package __init__.py files) — present
  • backend/app/modules/vector_store/chunk_rag_data.py:27 — contains from app.utils.generate_chunk_id import generate_id
backend/app/modules/chat/embed_query.py (1)

1-20: Docstring addition improves discoverability — LGTM

Clear, concise module documentation aligned with the model usage.

backend/app/modules/langgraph_nodes/fact_check.py (1)

45-45: Return merge looks good

Merging the verifications into the original state while setting a success status is clean and predictable.

backend/app/modules/langgraph_nodes/sentiment.py (1)

44-46: Prompt formatting change is neutral — LGTM

Equivalent content formation with a single f-string; no behavior change.

backend/app/modules/langgraph_builder.py (3)

60-66: LGTM: Node registration is clear and explicit

Node names map cleanly to functions; this improves readability when traversing the graph.


71-81: OK on conditional routing; ensure error path reaches a terminal end

Routing to error_handler on status == "error" is correct. Make sure error_handler has a path to end (see comment below), otherwise some paths may not terminate cleanly.


104-111: Ensure error paths terminate and confirm StateGraph finish-point semantics

Quick summary: I couldn't find the StateGraph implementation in this repo (so I can't confirm what set_finish_point() does). However, error_handler has no outgoing edge and returns "status": "stopped_due_to_error" (not "error"), so error runs may dead-end. Recommend adding an explicit edge from error_handler -> end and re-evaluating the use of set_finish_point.

Files to check

  • backend/app/modules/langgraph_builder.py (store_and_send conditional + set_finish_point)
  • backend/app/modules/langgraph_nodes/error_handler.py (returns "stopped_due_to_error")

Suggested change

     graph.add_conditional_edges(
         "store_and_send",
         lambda x: ("error_handler" if x.get("status") == "error" else "__end__"),
     )

-    graph.set_finish_point("store_and_send")
+    # Ensure error path also terminates
+    graph.add_edge("error_handler", "__end__")
+    # Optionally keep or remove the finish point depending on StateGraph semantics:
+    # graph.set_finish_point("store_and_send")
 
     return graph.compile()

Manual verification requested:

  • Please confirm StateGraph semantics for set_finish_point() (why it was added).
  • Run a smoke test exercising success and error paths to ensure all runs reach end (or behave as intended).
backend/app/modules/pipeline.py (2)

1-32: Docstring is clear and helpful

Good overview of the pipeline stages and responsibilities.


1-32: Docstring mentions RAKE-based keywords; verify actual implementation

If extract_keywords uses a different algorithm (e.g., spaCy, yake, keybert), adjust the docstring to avoid drift.

Comment on lines +1 to +23
"""
llm_processing.py
-----------------
Handles Large Language Model (LLM) interactions for context-based question answering.

This module:
- Connects to the Groq API using credentials from environment variables.
- Builds a context string from retrieved documents.
- Sends user questions along with context to the LLM.
- Returns generated answers.

Functions:
build_context(docs: list[dict]) -> str:
Extracts relevant fields (explanation or reasoning) from document
metadata and combines them into a single context string.

ask_llm(question: str, docs: list[dict]) -> str:
Builds context from the provided documents, sends it along with the
question to the LLM, and returns the model's answer.

Environment Variables:
GROQ_API_KEY (str): API key for authenticating with Groq.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

PR objective gap: prints not replaced with logger

This file still uses print statements despite the PR goal to replace them with logger. The diffs above show how to switch to logging.

To verify repository-wide, run:


🏁 Script executed:

#!/bin/bash
rg -n --hidden --no-ignore-vcs '\bprint\(' backend | sed -n '1,200p'

Length of output: 3345


Replace remaining print() calls with logger

I scanned the repo and found multiple leftover print() calls that should be converted to logging (logger.debug/info/error) to meet the PR objective.

Files/locations needing attention:

  • backend/main.py:47 — print(f"Server is running on http://0.0.0.0:{port}")
  • backend/app/utils/fact_check_utils.py:46 — print("❌ Claim extraction failed.")
  • backend/app/utils/fact_check_utils.py:53 — print(f"🧠 Extracted claims: {claims}")
  • backend/app/utils/fact_check_utils.py:61 — print(f"\n🔍 Searching for claim: {claim}")
  • backend/app/utils/fact_check_utils.py:67 — print(f"✅ Found result: {results[0]['title']}")
  • backend/app/utils/fact_check_utils.py:69 — print(f"⚠️ No search result for: {claim}")
  • backend/app/utils/fact_check_utils.py:71 — print(f"❌ Search failed for: {claim} -> {e}")
  • backend/app/routes/routes.py:63 — print(bias_score)
  • backend/app/routes/routes.py:70 — print(json.dumps(article_text, indent=2))
  • backend/app/routes/routes.py:80 — print(answer)
  • backend/app/modules/pipeline.py:60 — print(json.dumps(result, indent=2, ensure_ascii=False))
  • backend/app/db/vector_store.py:44 — print(f"Creating index: {INDEX_NAME}")
  • backend/app/db/vector_store.py:52 — print(f"Index '{INDEX_NAME}' already exists")
  • backend/app/modules/chat/llm_processing.py:44 — print(context)
  • backend/app/modules/vector_store/chunk_rag_data.py:98 — print(f"[Error] Failed to chunk the data: {e}")
  • backend/app/modules/langgraph_nodes/store_and_send.py:26 — print(state)
  • backend/app/modules/langgraph_nodes/store_and_send.py:36 — print("embedding generated successfully!")
  • backend/app/modules/langgraph_nodes/store_and_send.py:41 — print("Vectors saved to Pinecone!")
  • backend/app/modules/langgraph_nodes/store_and_send.py:44 — print(f"some error occured in store_and_send:{e}")
  • backend/app/modules/langgraph_nodes/judge.py:65 — print(f"Error in judge_perspective: {e}")
  • backend/app/modules/langgraph_nodes/sentiment.py:64 — print(f"Error in sentiment_analysis: {e}")
  • backend/app/modules/langgraph_nodes/sentiment.py:81 — # print("Sentiment Output:", result) (commented)
  • backend/app/modules/langgraph_nodes/generate_perspective.py:76 — print(f"some error occured in generate_perspective:{e}")
  • backend/app/modules/langgraph_nodes/fact_check.py:31 — print(f"some error occured in fact_checking:{error_message}")
  • backend/app/modules/langgraph_nodes/fact_check.py:39 — print(f"some error occured in fact_checking:{e}")
  • backend/app/modules/langgraph_nodes/error_handler.py:17 — print("Error detected!")
  • backend/app/modules/langgraph_nodes/error_handler.py:18 — print(f"From: {input.get('error_from')}")
  • backend/app/modules/langgraph_nodes/error_handler.py:19 — print(f"Message: {input.get('message')}")
  • backend/app/modules/bias_detection/check_bias.py:37 — print(text)
  • backend/app/modules/bias_detection/check_bias.py:38 — print(json.dumps(text))
  • backend/app/modules/bias_detection/check_bias.py:73 — print(f"Error in bias_detection: {e}")
  • backend/app/modules/facts_check/llm_processing.py:77 — print(f"Error in claim_extraction: {e}")
  • backend/app/modules/facts_check/llm_processing.py:135 — print(content)
  • backend/app/modules/facts_check/llm_processing.py:141 — print(f"❌ LLM JSON parse error: {parse_err}")
  • backend/app/modules/facts_check/llm_processing.py:152 — print(f"🔥 Error in fact_verification: {e}")

Suggested minimal change pattern (example for backend/app/modules/chat/llm_processing.py):

  • At module top:
    • add: import logging
    • add: logger = logging.getLogger(__name__)
  • Replace:
    • print(context)
      with:
    • logger.debug(context) # or logger.info/error as appropriate

Please replace these prints across the repository and ensure a consistent logging configuration is used.
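For reference, a minimal sketch of the pattern in one of the affected modules, assuming the project's setup_logger helper is importable (a bare logging.getLogger(__name__) behaves the same if it is not); build_context below is a stand-in for the real implementation:

import logging

logger = logging.getLogger(__name__)  # or: setup_logger(__name__) from app.logging.logging_config

def build_context(docs: list[dict]) -> str:
    # Stand-in for the real context builder in llm_processing.py
    return "\n".join(
        (d.get("metadata", {}) or {}).get("explanation", "") for d in docs
    )

def ask_llm(question: str, docs: list[dict]) -> str:
    context = build_context(docs)
    logger.debug("Built context (%d chars)", len(context))  # was: print(context)
    return f"(answer to {question!r} based on {len(docs)} docs)"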

🤖 Prompt for AI Agents
In backend/app/modules/chat/llm_processing.py around lines 1 to 23, replace any
leftover print() calls (e.g., the print(context) at line ~44 referenced in the
review) with structured logging: add "import logging" at the top, create a
module logger via "logger = logging.getLogger(__name__)", and replace print(...)
with the appropriate logger method (logger.debug(...) for diagnostic context
output, logger.info/error where severity differs); ensure you use consistent
logger names and levels so the project's global logging configuration captures
these messages.

Comment on lines +36 to +39
return "\n".join(
f"{m['metadata'].get('explanation') or m['metadata'].get('reasoning', '')}"
for m in docs
)
🛠️ Refactor suggestion

Guard against missing metadata keys when building context

Accessing m['metadata'] can raise KeyError. Use .get with defaults.

-    return "\n".join(
-        f"{m['metadata'].get('explanation') or m['metadata'].get('reasoning', '')}"
-        for m in docs
-    )
+    return "\n".join(
+        (m.get("metadata", {}) or {}).get("explanation")
+        or (m.get("metadata", {}) or {}).get("reasoning", "")
+        for m in docs or []
+    )
🤖 Prompt for AI Agents
In backend/app/modules/chat/llm_processing.py around lines 36 to 39, the code
accesses m['metadata'] which can raise a KeyError; change the access to use
m.get('metadata', {}) and then read explanation via metadata.get('explanation')
or metadata.get('reasoning', '') so missing metadata or keys yield an empty
string, and keep the existing join logic to build the context.

Comment on lines +41 to +42
error_handler,
)
💡 Verification agent

🧩 Analysis chain

Logger adoption not complete across nodes

PR objective says “Replaced print statements with a logger,” but referenced nodes still use print (error_handler, fact_check, store_and_send, sentiment, judge). Migrate those to a structured logger for consistency.

Run this script to list remaining print() calls under backend/app/modules:


🏁 Script executed:

#!/bin/bash
rg -n --no-heading --line-number -S 'print\(' backend/app/modules | sed 's/^/print: /'

Length of output: 2540


Logger adoption not complete — replace remaining print() calls

Search found active print() usages under backend/app/modules. Migrate these to the structured logger (e.g., logger = logging.getLogger(__name__); logger.info/error/exception) for consistency.

Files/locations to fix:

  • backend/app/modules/pipeline.py:60 — print(json.dumps(result, indent=2, ensure_ascii=False))
  • backend/app/modules/chat/llm_processing.py:44 — print(context)
  • backend/app/modules/vector_store/chunk_rag_data.py:98 — print(f"[Error] Failed to chunk the data: {e}")
  • backend/app/modules/langgraph_nodes/judge.py:65 — print(f"Error in judge_perspective: {e}")
  • backend/app/modules/langgraph_nodes/sentiment.py:64 — print(f"Error in sentiment_analysis: {e}")
  • backend/app/modules/langgraph_nodes/sentiment.py:81 — # print("Sentiment Output:", result) (commented)
  • backend/app/modules/langgraph_nodes/store_and_send.py:26 — print(state)
  • backend/app/modules/langgraph_nodes/store_and_send.py:36 — print("embedding generated successfully!")
  • backend/app/modules/langgraph_nodes/store_and_send.py:41 — print("Vectors saved to Pinecone!")
  • backend/app/modules/langgraph_nodes/store_and_send.py:44 — print(f"some error occured in store_and_send:{e}")
  • backend/app/modules/langgraph_nodes/generate_perspective.py:76 — print(f"some error occured in generate_perspective:{e}")
  • backend/app/modules/langgraph_nodes/fact_check.py:31 — print(f"some error occured in fact_checking:{error_message}")
  • backend/app/modules/langgraph_nodes/fact_check.py:39 — print(f"some error occured in fact_checking:{e}")
  • backend/app/modules/langgraph_nodes/error_handler.py:17 — print("Error detected!")
  • backend/app/modules/langgraph_nodes/error_handler.py:18 — print(f"From: {input.get('error_from')}")
  • backend/app/modules/langgraph_nodes/error_handler.py:19 — print(f"Message: {input.get('message')}")
  • backend/app/modules/facts_check/llm_processing.py:77 — print(f"Error in claim_extraction: {e}")
  • backend/app/modules/facts_check/llm_processing.py:135 — print(content)
  • backend/app/modules/facts_check/llm_processing.py:141 — print(f"❌ LLM JSON parse error: {parse_err}")
  • backend/app/modules/facts_check/llm_processing.py:152 — print(f"🔥 Error in fact_verification: {e}")
  • backend/app/modules/bias_detection/check_bias.py:37 — print(text)
  • backend/app/modules/bias_detection/check_bias.py:38 — print(json.dumps(text))
  • backend/app/modules/bias_detection/check_bias.py:73 — print(f"Error in bias_detection: {e}")

Recommendation: replace these prints with logger calls and ensure the app config initializes logging handlers so messages are structured and configurable.

🤖 Prompt for AI Agents
backend/app/modules/langgraph_builder.py around lines 41-42 and multiple files
under backend/app/modules contain remaining print() calls; replace each print()
with a module-level structured logger (e.g., logger =
logging.getLogger(__name__)) and use the appropriate logger method
(logger.info/debug/warning/error/exception) instead of print, include exception
details via logger.exception or exc_info=True for catches, and ensure the
application logging configuration initializes handlers/formatters so messages
are structured and configurable; update the listed files/lines to use logger.*
calls and remove print() usages accordingly.

Comment on lines 44 to 45
from typing_extensions import TypedDict

🛠️ Refactor suggestion

Import NotRequired/Required for accurate TypedDict field optionality

To align the state typing with actual usage (most fields are optional and accessed with .get()), import NotRequired and Required here.

Apply this diff:

-from typing_extensions import TypedDict
+from typing_extensions import TypedDict, NotRequired, Required
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_builder.py around lines 44-45, the TypedDict
usage needs explicit NotRequired and Required imports so optional fields reflect
actual usage (most accessed with .get()). Update the import statement to also
import NotRequired and Required from typing_extensions, and then use NotRequired
for fields that are optional and Required for mandatory ones in the TypedDict
declarations to match runtime access patterns.
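For illustration, a sketch of what the state TypedDict could look like with explicit optionality; the field names below are drawn from the state keys used by these nodes and are not necessarily the full MyState definition:

from typing_extensions import TypedDict, NotRequired

class MyState(TypedDict):
    cleaned_text: str                   # required input for the pipeline
    facts: NotRequired[list[dict]]      # added by the fact-check node
    perspective: NotRequired[dict]      # added by generate_perspective
    retries: NotRequired[int]           # retry counter used for routing
    status: NotRequired[str]            # "success" / "error"

Note that Required is only needed for mandatory keys inside a TypedDict declared with total=False; with the default total=True, plain annotations are already required.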

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🔭 Outside diff range comments (3)
backend/app/db/vector_store.py (1)

38-39: Chain exceptions and fix typos in error messages (ruff B904)

Preserve original tracebacks with raise ... from e and fix spelling in messages.

-except Exception as e:
-    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+except Exception as e:
+    raise RuntimeError("Error occurred while initializing Pinecone client") from e
-except Exception as e:
-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+except Exception as e:
+    raise RuntimeError(f"Error occurred while connecting to the index {INDEX_NAME}") from e

Also applies to: 62-62

backend/app/utils/fact_check_utils.py (1)

78-80: Propagate verifier errors instead of returning ([], None)

Surface the error message from the verifier to callers.

-    final = run_fact_verifier_sdk(search_results)
-    return final.get("verifications", []), None
+    final = run_fact_verifier_sdk(search_results)
+    if final.get("status") != "success":
+        return [], final.get("message", "Fact verification failed.")
+    return final.get("verifications", []), None
backend/app/modules/langgraph_nodes/store_and_send.py (1)

31-36: Preserve exception chaining with ‘raise … from e’ (Ruff B904)

Re-raise with explicit cause to keep the original traceback and satisfy B904.

-        except KeyError as e:
-            raise Exception(f"Missing required data field for chunking: {e}")
-        except Exception as e:
-            raise Exception(f"Failed to chunk data: {e}")
+        except KeyError as e:
+            raise Exception(f"Missing required data field for chunking: {e}") from e
+        except Exception as e:
+            raise Exception(f"Failed to chunk data: {e}") from e
@@
-        except Exception as e:
-            raise Exception(f"failed to embed chunks: {e}")
+        except Exception as e:
+            raise Exception(f"Failed to embed chunks: {e}") from e

Also applies to: 37-41

♻️ Duplicate comments (4)
backend/app/modules/bias_detection/check_bias.py (1)

40-42: Do not log raw user/article text; log metadata only

This was flagged previously. Logging raw content risks PII leakage. Log minimal metadata (length) instead.

-        logger.debug(f"Raw article text: {text}")
-        logger.debug(f"JSON dump of text: {json.dumps(text)}")
+        logger.debug(
+            "Bias detection invoked (input_length=%s)",
+            len(text) if hasattr(text, "__len__") else "n/a",
+        )
backend/app/modules/vector_store/chunk_rag_data.py (1)

44-61: Bug: Normalized perspective_data is computed but unused; attribute access will fail for dict inputs

You normalize the incoming perspective into perspective_data but then ignore it and operate on perspective_obj via attribute access. If data["perspective"] is a dict (or a Pydantic model converted to dict), this raises “Perspective object missing required fields”. Use the normalized dict consistently and support both Pydantic v2 .model_dump() and v1 .dict().

Apply this diff to normalize and use the dict consistently:

-        # Validate perspective structure
-        perspective_data = data["perspective"]
-        if hasattr(perspective_data, "dict"):
-            perspective_data = perspective_data.dict()
+        # Normalize perspective structure to a dict
+        perspective_data = data["perspective"]
+        if hasattr(perspective_data, "model_dump"):
+            perspective_data = perspective_data.model_dump()
+        elif hasattr(perspective_data, "dict"):
+            perspective_data = perspective_data.dict()
+        elif not isinstance(perspective_data, dict):
+            raise ValueError("Perspective must be a dict or Pydantic model")
@@
-        # Add counter-perspective chunk
-        perspective_obj = data["perspective"]
-
-        # Optional safety check
-
-        if not (
-            hasattr(perspective_obj, "perspective")
-            and hasattr(perspective_obj, "reasoning")
-        ):
-            raise ValueError("Perspective object missing required fields")
+        # Add counter-perspective chunk
+        if not ("perspective" in perspective_data and "reasoning" in perspective_data):
+            raise ValueError("Perspective dict missing required fields: 'perspective', 'reasoning'")
@@
-                "text": perspective_obj.perspective,
+                "text": perspective_data["perspective"],
@@
-                    "reasoning": perspective_obj.reasoning,
+                    "reasoning": perspective_data["reasoning"],

Also applies to: 63-73

backend/app/modules/langgraph_nodes/generate_perspective.py (2)

62-69: Bug: missing f-strings and wrong iterable for facts_str

Only the first line is an f-string, so verdict/explanation render literally. Also, iterate over the local facts variable, not state["facts"].

-        facts_str = "\n".join(
-            [
-                f"Claim: {f['original_claim']}\n"
-                "Verdict: {f['verdict']}\nExplanation: "
-                "{f['explanation']}"
-                for f in state["facts"]
-            ]
-        )
+        facts_str = "\n".join(
+            [
+                f"Claim: {f['original_claim']}\n"
+                f"Verdict: {f['verdict']}\n"
+                f"Explanation: {f['explanation']}"
+                for f in facts
+            ]
+        )

85-85: Normalize structured LLM result to a plain dict for downstream + serialization

You return a Pydantic model in state["perspective"]. Downstream chunk_rag_data partially normalizes but then uses attribute access — this mismatch causes runtime/serialization issues. Return a plain dict.

-    return {**state, "perspective": result, "status": "success"}
+    result_dict = (
+        result
+        if isinstance(result, dict)
+        else (result.model_dump() if hasattr(result, "model_dump") else result.dict())
+    )
+    return {**state, "perspective": result_dict, "status": "success"}

Pair this with the corresponding normalization/use change suggested in chunk_rag_data.py.

🧹 Nitpick comments (16)
backend/app/logging/logging_config.py (1)

31-37: Avoid duplicate/parent handlers: disable propagation

If the application (or a library) configures root handlers, messages will be emitted twice. Set propagate to False once you add your own handlers.

     console_handler.setLevel(logging.INFO)
     console_handler.setFormatter(formatter)
     logger.addHandler(console_handler)
 
-    # File Handler
+    # File Handler
     file_handler = logging.FileHandler("app.log")
     file_handler.setLevel(logging.DEBUG)  # Keep detailed logs in file
     file_handler.setFormatter(formatter)
     logger.addHandler(file_handler)
 
+    # Prevent messages from bubbling up to root handlers and being duplicated
+    logger.propagate = False
backend/app/modules/facts_check/llm_processing.py (4)

139-141: Strip fenced JSON more robustly

Handle both json and plain fences, including surrounding whitespace/newlines.

-            content = re.sub(r"^```json|```$", "", content).strip()
+            content = re.sub(r"^```(?:json)?\s*|\s*```$", "", content, flags=re.DOTALL).strip()

150-154: Return shape: remove redundant ‘claim’ or structure per-claim

Returning a single claim (the last one processed) is misleading. Either drop it or return a per-claim mapping.

Minimal adjustment:

-        return {
-            "claim": claim,
-            "verifications": results_list,
-            "status": "success",
-        }
+        return {
+            "verifications": results_list,
+            "status": "success",
+        }

If you prefer a per-claim mapping:

# Example
results_list.append({"claim": claim, **parsed})

71-73: Consider reducing sensitive/debug payloads

Storing full extracted claims and raw LLM output in logs can leak content/PII. Log sizes or hashes instead.

Example:

logger.debug("Extracted claims (chars=%d)", len(extracted_claims))
logger.debug("LLM output (chars=%d)", len(content))

Also applies to: 140-141


35-38: Fail fast if GROQ_API_KEY is missing

Improve error discoverability when credentials are not configured.

api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable is required")
client = Groq(api_key=api_key)
backend/app/utils/fact_check_utils.py (1)

63-63: Nit: avoid leading newline in logs and prefer parameterized logging

Removes a cosmetic newline and avoids f-strings in logs.

-        logger.info(f"\n🔍 Searching for claim: {claim}")
+        logger.info("🔍 Searching for claim: %s", claim)
backend/app/modules/bias_detection/check_bias.py (2)

44-46: Fix error message to reference ‘text’, not ‘cleaned_text’

Aligns with the function signature and reduces confusion.

-        if not text:
-            logger.error("Missing or empty 'cleaned_text'")
-            raise ValueError("Missing or empty 'cleaned_text'")
+        if not text:
+            logger.error("Missing or empty 'text'")
+            raise ValueError("Missing or empty 'text'")

35-36: Optional: fail fast if GROQ_API_KEY is missing

Adds clearer feedback when credentials are not configured.

api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable is required")
client = Groq(api_key=api_key)
backend/app/modules/vector_store/chunk_rag_data.py (2)

33-33: Add type hints to improve readability and tooling

Declare parameter and return types for the public function.

-def chunk_rag_data(data):
+def chunk_rag_data(data: dict) -> list[dict]:

100-102: Minor: simplify exception logging message

logger.exception already attaches the traceback and exception info. Avoid interpolating e in the message to reduce duplication.

-        logger.exception(f"Failed to chunk the data: {e}")
+        logger.exception("Failed to chunk the data")
backend/app/modules/langgraph_nodes/store_and_send.py (2)

37-45: Handle empty vectors explicitly to avoid storage errors

store(vectors) raises on empty input. Guard against this and log instead of treating it as a hard error.

         try:
             vectors = embed_chunks(chunks)
-            if vectors:
-                logger.info(f"Embedding complete — {len(vectors)} vectors generated.")
+            if vectors:
+                logger.info(f"Embedding complete — {len(vectors)} vectors generated.")
+            else:
+                logger.warning("Embedding produced no vectors; skipping storage.")
+                return {**state, "status": "success"}
         except Exception as e:
             raise Exception(f"Failed to embed chunks: {e}") from e
-        
-        store(vectors)
-        logger.info("Vectors successfully stored in Pinecone.")
+        if vectors:
+            store(vectors)
+            logger.info("Vectors successfully stored in Pinecone.")

29-29: Consider redacting sensitive fields from state in debug logs

Depending on the pipeline, state may contain PII or large payloads. Redact or log only keys/summary in debug to keep logs lean and safe.
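One possible shape for such a summary; the sensitive key names below are assumptions and should be adjusted to whatever the pipeline actually puts in state:

SENSITIVE_KEYS = {"cleaned_text", "article_text"}  # assumed field names

def summarize_state(state: dict) -> dict:
    """Build a log-safe view of the state: redact sensitive fields, summarize large values."""
    summary = {}
    for key, value in state.items():
        if key in SENSITIVE_KEYS:
            summary[key] = "<redacted>"
        elif hasattr(value, "__len__"):
            summary[key] = f"<{type(value).__name__} len={len(value)}>"
        else:
            summary[key] = value
    return summary

# In store_and_send, instead of logging the raw state:
# logger.debug("Received state: %s", summarize_state(state))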

backend/app/modules/pipeline.py (2)

48-65: Add type hints for public functions

Public APIs benefit from explicit typing; also annotate run_langgraph_workflow’s return.

-def run_scraper_pipeline(url: str) -> dict:
+def run_scraper_pipeline(url: str) -> dict:
     ...
-def run_langgraph_workflow(state: dict):
+def run_langgraph_workflow(state: dict) -> dict:

Also applies to: 67-71


45-46: Optional: lazy-initialize the compiled graph

Compiling at import can slow startup and complicate testing. Consider lazy init with a module-level getter if import-time cost becomes an issue.
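A possible sketch of that getter, assuming build_langgraph() returns the compiled graph and the compiled graph exposes invoke() (as LangGraph compiled graphs do):

from app.modules.langgraph_builder import build_langgraph  # assumed import path

_compiled_graph = None

def get_graph():
    """Compile the LangGraph on first use and cache it for subsequent calls."""
    global _compiled_graph
    if _compiled_graph is None:
        _compiled_graph = build_langgraph()
    return _compiled_graph

def run_langgraph_workflow(state: dict) -> dict:
    """Run the workflow against the lazily compiled graph."""
    return get_graph().invoke(state)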

backend/app/modules/langgraph_nodes/generate_perspective.py (2)

49-49: Add a return type hint

Small improvement for readability and tooling.

-def generate_perspective(state):
+def generate_perspective(state: dict) -> dict:

39-46: Sanity check: model and prompt are module-level singletons

This is fine for performance, but if you anticipate hot-reload or environment-driven changes (e.g., model name), consider moving construction into a factory or reading from config.
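If that flexibility becomes necessary, a small cached factory keeps the construct-once behaviour while deferring construction past import time. Shown here with the Groq client for concreteness; the same shape applies to the model and prompt objects in this node:

import os
from functools import lru_cache
from groq import Groq

@lru_cache(maxsize=1)
def get_groq_client() -> Groq:
    """Create the Groq client on first use instead of at import time."""
    api_key = os.getenv("GROQ_API_KEY")
    if not api_key:
        raise ValueError("GROQ_API_KEY environment variable is required")
    return Groq(api_key=api_key)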

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f7ba2c9 and 0549c21.

⛔ Files ignored due to path filters (1)
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • backend/app/db/vector_store.py (2 hunks)
  • backend/app/logging/logging_config.py (1 hunks)
  • backend/app/modules/bias_detection/check_bias.py (3 hunks)
  • backend/app/modules/chat/llm_processing.py (2 hunks)
  • backend/app/modules/facts_check/llm_processing.py (5 hunks)
  • backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
  • backend/app/modules/langgraph_nodes/fact_check.py (2 hunks)
  • backend/app/modules/langgraph_nodes/generate_perspective.py (3 hunks)
  • backend/app/modules/langgraph_nodes/judge.py (2 hunks)
  • backend/app/modules/langgraph_nodes/sentiment.py (3 hunks)
  • backend/app/modules/langgraph_nodes/store_and_send.py (2 hunks)
  • backend/app/modules/pipeline.py (2 hunks)
  • backend/app/modules/vector_store/chunk_rag_data.py (2 hunks)
  • backend/app/routes/routes.py (2 hunks)
  • backend/app/utils/fact_check_utils.py (1 hunks)
  • backend/main.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • backend/app/routes/routes.py
  • backend/app/modules/langgraph_nodes/judge.py
  • backend/app/modules/langgraph_nodes/error_handler.py
  • backend/app/modules/langgraph_nodes/fact_check.py
  • backend/main.py
  • backend/app/modules/langgraph_nodes/sentiment.py
  • backend/app/modules/chat/llm_processing.py
🧰 Additional context used
🧬 Code Graph Analysis (8)
backend/app/utils/fact_check_utils.py (3)
backend/app/modules/facts_check/web_search.py (1)
  • search_google (30-42)
backend/app/modules/facts_check/llm_processing.py (2)
  • run_claim_extractor_sdk (40-87)
  • run_fact_verifier_sdk (90-162)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
backend/app/modules/bias_detection/check_bias.py (1)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
backend/app/modules/facts_check/llm_processing.py (1)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
backend/app/modules/pipeline.py (5)
backend/app/modules/scraper/extractor.py (1)
  • Article_extractor (31-110)
backend/app/modules/scraper/cleaner.py (1)
  • clean_extracted_text (32-105)
backend/app/modules/scraper/keywords.py (1)
  • extract_keywords (26-43)
backend/app/modules/langgraph_builder.py (1)
  • build_langgraph (57-111)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
backend/app/modules/vector_store/chunk_rag_data.py (2)
backend/app/utils/generate_chunk_id.py (1)
  • generate_id (29-33)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
backend/app/modules/langgraph_nodes/generate_perspective.py (1)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
backend/app/db/vector_store.py (1)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
backend/app/modules/langgraph_nodes/store_and_send.py (4)
backend/app/modules/vector_store/chunk_rag_data.py (1)
  • chunk_rag_data (33-102)
backend/app/modules/vector_store/embed.py (1)
  • embed_chunks (31-50)
backend/app/utils/store_vectors.py (1)
  • store (33-55)
backend/app/logging/logging_config.py (1)
  • setup_logger (4-39)
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py

62-62: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backend/app/modules/langgraph_nodes/store_and_send.py

41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (10)
backend/app/logging/logging_config.py (1)

21-25: LGTM: clear formatter and sensible console level

The formatter is readable and the console INFO threshold avoids noisy DEBUG logs in stdout.

Also applies to: 27-31

backend/app/db/vector_store.py (1)

48-48: LGTM: informative index lifecycle logs

The messages for index creation and existence are clear and actionable.

Also applies to: 56-56

backend/app/utils/fact_check_utils.py (1)

55-56: LGTM: clear logging for extraction and search flow

Informative messages for extracted claims, found results, and warnings on misses. Good operational visibility.

Also applies to: 70-74

backend/app/modules/bias_detection/check_bias.py (1)

68-75: LGTM: structured success/error responses and logging

Good use of logger.info for the score and logger.exception for failures, with a consistent response envelope.

backend/app/modules/vector_store/chunk_rag_data.py (1)

1-24: Docstring and logger integration look good

Clear module docstring outlining inputs/outputs and the logger setup align with the PR goals.

Also applies to: 28-31

backend/app/modules/langgraph_nodes/store_and_send.py (1)

1-15: Good: module docstring and logger usage

Docstring explains the workflow well, and print statements were correctly replaced with logger calls.

Also applies to: 21-24

backend/app/modules/pipeline.py (1)

1-32: Docstring and logger integration LGTM

Module-level docstring is clear; logger replaces prints consistently. The JSON debug dump is helpful for observability.

Also applies to: 39-43, 61-63, 70-71

backend/app/modules/langgraph_nodes/generate_perspective.py (3)

49-53: Verify retry semantics: incrementing retries on success may be unintended

retries is incremented before generation regardless of outcome. If routing conditions use retries to decide on backoff or stopping, incrementing on success can cause premature termination in later steps.

Would you confirm the intended meaning of retries? If it should increment only on failure, move the increment into the except path:

-    try:
-        retries = state.get("retries", 0)
-        state["retries"] = retries + 1
+    try:
+        retries = state.get("retries", 0)
@@
-    except Exception as e:
+    except Exception as e:
+        state["retries"] = retries + 1
         logger.exception(f"Error in generate_perspective: {e}")
         return {
             "status": "error",
             "error_from": "generate_perspective",
             "message": f"{e}",
         }

If the current behavior is intentional, please ignore.


23-29: Consistent logging usage is good

Using setup_logger and logger.exception preserves tracebacks and aligns with the PR intent.

Also applies to: 41-47, 78-85


1-20: Docstring is clear and helpful

Well-structured module docstring describing responsibilities and outputs.

Comment on lines +28 to 29
logger = setup_logger(__name__)

🛠️ Refactor suggestion

Avoid import-time network side effects; lazily initialize the index

Creating clients, checking/creating indexes, and connecting at import time makes the module fragile (e.g., breaks tests/migrations when env isn’t ready). Encapsulate initialization behind a function and cache the result.

Example approach:

# At module level
_index = None
_pc = None

def get_index():
    global _index, _pc
    if _index:
        return _index
    api_key = os.getenv("PINECONE_API_KEY")
    if not api_key:
        raise ValueError("PINECONE_API_KEY environment variable is required")
    try:
        _pc = Pinecone(api_key=api_key)
    except Exception as e:
        raise RuntimeError("Error occurred while initializing Pinecone client") from e

    if not _pc.has_index(INDEX_NAME):
        logger.info("Creating index: %s", INDEX_NAME)
        _pc.create_index(
            name=INDEX_NAME,
            dimension=DIMENSIONS,
            metric=METRIC,
            spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),
        )
    else:
        logger.info("Index '%s' already exists", INDEX_NAME)

    try:
        _index = _pc.Index(INDEX_NAME)
    except Exception as e:
        raise RuntimeError(f"Error occurred while connecting to the index {INDEX_NAME}") from e
    return _index

Would you like me to refactor the module accordingly and update call sites?

Also applies to: 46-55

Comment on lines +15 to +19
logger.setLevel(logging.DEBUG) # Log everything from DEBUG and above

# Avoid adding duplicate handlers if logger already set
if logger.handlers:
return logger
🛠️ Refactor suggestion

Prevent unintentional log-level overrides on subsequent calls

Calling setLevel(DEBUG) before checking existing handlers can silently override a logger that's already configured elsewhere. Move setLevel after the early return.

-    logger.setLevel(logging.DEBUG)  # Log everything from DEBUG and above
-
-    # Avoid adding duplicate handlers if logger already set
-    if logger.handlers:
-        return logger
+    # Avoid adding duplicate handlers if logger already set
+    if logger.handlers:
+        return logger
+    logger.setLevel(logging.DEBUG)  # Log everything from DEBUG and above
🤖 Prompt for AI Agents
In backend/app/logging/logging_config.py around lines 15 to 19,
setLevel(logging.DEBUG) is called before checking for existing handlers which
can inadvertently change an already-configured logger's level; move the setLevel
call to after the early-return (after the "if logger.handlers: return logger")
so the function exits without modifying level when handlers already exist, and
only apply logger.setLevel(logging.DEBUG) when initializing handlers for a fresh
logger.

Comment on lines +33 to +37
# File Handler
file_handler = logging.FileHandler("app.log")
file_handler.setLevel(logging.DEBUG) # Keep detailed logs in file
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
🛠️ Refactor suggestion

Use a rotating file handler with UTF-8 and guard file I/O errors

A plain FileHandler can grow unbounded and may fail on read-only filesystems. Prefer RotatingFileHandler, set encoding, and handle OSErrors gracefully.

-    # File Handler
-    file_handler = logging.FileHandler("app.log")
-    file_handler.setLevel(logging.DEBUG)  # Keep detailed logs in file
-    file_handler.setFormatter(formatter)
-    logger.addHandler(file_handler)
+    # File Handler (rotating)
+    try:
+        file_handler = logging.handlers.RotatingFileHandler(
+            "app.log", maxBytes=10_485_760, backupCount=3, encoding="utf-8"
+        )
+    except OSError as err:
+        logger.warning("File logging disabled (cannot open app.log): %s", err)
+    else:
+        file_handler.setLevel(logging.DEBUG)  # Keep detailed logs in file
+        file_handler.setFormatter(formatter)
+        logger.addHandler(file_handler)

Add this import at the top of the file:

import logging.handlers
🤖 Prompt for AI Agents
In backend/app/logging/logging_config.py around lines 33 to 37, replace the
plain FileHandler with a logging.handlers.RotatingFileHandler configured with
encoding='utf-8' and reasonable maxBytes and backupCount values; wrap the
handler creation in a try/except OSError block so that on failure you fall back
to a StreamHandler (stderr) and still set level and formatter; also add the
import "import logging.handlers" at the top of the file.

Comment on lines 143 to 149
try:
parsed = json.loads(content)
except Exception as parse_err:
print(f"LLM JSON parse error: {parse_err}")
logger.error(f"LLM JSON parse error: {parse_err}")

results_list.append(parsed)

⚠️ Potential issue

Bug: ‘parsed’ may be referenced before assignment when JSON parse fails

If json.loads raises, parsed is undefined and results_list.append(parsed) will raise UnboundLocalError. Provide a fallback object to keep the pipeline moving.

-            try:
-                parsed = json.loads(content)
-            except Exception as parse_err:
-                logger.error(f"LLM JSON parse error: {parse_err}")
-
-            results_list.append(parsed)
+            try:
+                parsed = json.loads(content)
+            except Exception as parse_err:
+                logger.error("LLM JSON parse error: %s; content=%r", parse_err, content[:500])
+                parsed = {
+                    "verdict": "Unknown",
+                    "explanation": "Failed to parse LLM output",
+                    "original_claim": claim,
+                    "source_link": source,
+                    "_raw": content,
+                }
+            results_list.append(parsed)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py around lines 143 to 149,
the try/except catches json.loads errors but still appends parsed even when
parsing failed, causing an UnboundLocalError; modify the except block to assign
a safe fallback (e.g., an empty dict or a minimal error object that includes the
parse error and optionally the raw content) to parsed before logging, so
results_list.append(parsed) always has a defined value; ensure the fallback
contains enough info for downstream handling (like {"_parse_error":
str(parse_err), "raw": content}) and keep the existing logger.error call.

Comment on lines +62 to +69
facts_str = "\n".join(
[
f"Claim: {f['original_claim']}\n"
"Verdict: {f['verdict']}\nExplanation: "
"{f['explanation']}"
for f in state["facts"]
]
)
💡 Verification agent

🧩 Analysis chain

Repo-wide consistency check for perspective representation

After normalizing generate_perspective to return a dict and updating chunk_rag_data to consume it, ensure there are no other sites expecting attribute access (e.g., .perspective).

Also applies to: 85-85


🏁 Script executed:

#!/bin/bash
# Find attribute-style and dict-style uses of 'perspective' across the repo.
set -euo pipefail
echo "== Attribute-style access =="
rg -n "\.perspective\b" -A 2 || true
echo
echo "== Dict-style access =="
rg -n "\['perspective'\]" -A 2 || true

Length of output: 842


Action: fix remaining attribute-style uses of perspective

I ran a repo-wide search — there are still attribute-style accesses that will break if generate_perspective now returns a dict. Update these locations to use dict-style access or adjust the data shape returned to preserve attribute access.

Files to fix:

  • backend/app/modules/vector_store/chunk_rag_data.py:66
    • Replace: "text": perspective_obj.perspective,
    • With: "text": perspective_obj['perspective'], (and similarly for other perspective fields)
  • frontend/app/analyze/results/page.tsx:161–163
    • Uses: perspective.perspective and perspective.reasoning
    • Confirm the frontend receives a plain JSON object (dot access is fine for JS). If the backend now nests/renames fields or returns a different shape, update these accesses to match (e.g., perspective['perspective'] or adjust the API/serialization).

Also re-check any other Python files for .perspective usage after making these changes.

Comment on lines 47 to 49
if state.get("status") != "success":
print("❌ Claim extraction failed.")
logger.error("❌ Claim extraction failed.")
return [], "Claim extraction failed."
⚠️ Potential issue

Bug: checking ‘state’ instead of claim-extractor ‘result’

This condition will almost always fail, short-circuiting the pipeline. Check the extractor’s return instead.

-    if state.get("status") != "success":
+    if result.get("status") != "success":
         logger.error("❌ Claim extraction failed.")
         return [], "Claim extraction failed."
🤖 Prompt for AI Agents
In backend/app/utils/fact_check_utils.py around lines 47 to 49, the code is
incorrectly checking state.get("status") instead of the claim extractor's
returned value, causing the pipeline to short-circuit; update the condition to
inspect the extractor result (e.g., result.get("status") or the actual variable
name returned by the extractor), log the error using that result when status !=
"success", and only then return the empty list and error string so the pipeline
proceeds correctly when the extractor succeeded.

@ParagGhatage ParagGhatage merged commit d7dee6c into main Aug 16, 2025
1 check was pending