Conversation
Caution: Review failed. The pull request is closed.

Walkthrough

Load GROQ and Pinecone credentials from environment, initialize clients at import, add structured logging utility and module loggers, introduce pipeline orchestrator and deterministic chunk ID generator, return compiled LangGraph from build_langgraph, remove legacy start script, and apply docstring/formatting cleanups across frontend and backend.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Frontend
    participant Backend
    participant LangGraph
    participant Pinecone as VectorStore
    User->>Frontend: Submit article URL
    Frontend->>Backend: POST /api/process (uses NEXT_PUBLIC_API_URL hint)
    Backend->>Backend: run_scraper_pipeline -> cleaned_text, keywords
    Backend->>LangGraph: _LANGGRAPH_WORKFLOW.invoke(state)
    LangGraph->>Pinecone: store_and_send(state) (chunking → embed → upsert)
    Pinecone-->>LangGraph: store result
    LangGraph-->>Backend: workflow result
    Backend-->>Frontend: aggregated response
```

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant GroqLLM
    Client->>API: POST /api/chat (message)
    API->>GroqLLM: client.chat.completions.create(system, user) using GROQ_API_KEY
    GroqLLM-->>API: completion
    API-->>Client: answer
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Poem
Actionable comments posted: 17
🔭 Outside diff range comments (18)
backend/app/modules/langgraph_nodes/store_and_send.py (1)
6-33: Replace print statements with logger and avoid logging full state/PII

This PR’s goal includes replacing prints with a logger. Current prints leak the entire state to stdout and don’t capture stacktraces on failures. Also, calling store(vectors) with an empty list will raise a ValueError upstream. Guard and log accordingly.
Proposed refactor:
```diff
+import logging
 from app.modules.vector_store.chunk_rag_data import chunk_rag_data
 from app.modules.vector_store.embed import embed_chunks
 from app.utils.store_vectors import store

+logger = logging.getLogger(__name__)

 def store_and_send(state):  # to store data in vector db
     try:
-        print(state)
+        # Avoid logging the entire state to prevent PII leakage; log keys only.
+        logger.debug("store_and_send received state keys=%s", list(state.keys()))
         try:
             chunks = chunk_rag_data(state)
         except KeyError as e:
             raise Exception(f"Missing required data field for chunking: {e}")
         except Exception as e:
             raise Exception(f"Failed to chunk data: {e}")
         try:
             vectors = embed_chunks(chunks)
-            if vectors:
-                print("embedding generated successfully!")
+            if vectors:
+                logger.info("Embeddings generated successfully: count=%d", len(vectors))
         except Exception as e:
             raise Exception(f"failed to embed chunks: {e}")
-        store(vectors)
-        print("Vectors saved to Pinecone!")
+        if not vectors:
+            logger.warning("No vectors generated; skipping storage")
+            return {**state, "status": "success"}  # No-op but not an error
+
+        store(vectors)
+        logger.info("Stored %d vectors to Pinecone", len(vectors))
     except Exception as e:
-        print(f"some error occured in store_and_send:{e}")
+        logger.exception("Some error occurred in store_and_send")
         return {
             "status": "error",
             "error_from": "store_and_send",
             "message": f"{e}",
         }
     # sending to frontend
     return {**state, "status": "success"}
```

backend/pyproject.toml (1)
18-18: Remove “logging” from dependencies; it’s part of the standard library

Adding “logging>=0.4.9.6” pulls an unnecessary PyPI package and could introduce confusion or supply-chain risk. Python’s logging is built-in.

```diff
- "logging>=0.4.9.6",
```

README.md (1)
162-171: Fix list indentation, code fence language, and .env formatting (backend section)

Apply consistent list indentation, add a fenced code language, and remove spaces around = for dotenv compatibility.

````diff
-*Setup environment variables:*
- - add .env file in `/backend`directory.
- - add following environment variable in your .env file.
- ```
-GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
- ```
+*Setup environment variables:*
+- Add a .env file in the `/backend` directory.
+- Add the following environment variables to your .env file.
+```env
+GROQ_API_KEY=<groq_api_key>
+PINECONE_API_KEY=<your_pinecone_API_KEY>
+PORT=8000
+SEARCH_KEY=<your_Google_custom_search_engine_API_key>
+```
````

backend/app/modules/langgraph_nodes/sentiment.py (1)
48-53: Replace print with logger and include traceback

Use the logger with exception context instead of printing. Also consider avoiding logging raw user content elsewhere in this module to reduce PII leakage risk.

```diff
- print(f"Error in sentiment_analysis: {e}")
+ logger.exception("Error in sentiment_analysis")
```

Add at the top of the file:

```python
import logging

logger = logging.getLogger(__name__)
```

Optional: add a short docstring to run_sentiment_sdk describing inputs/outputs.
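For illustration, a minimal docstring sketch; the parameter name and the returned keys are assumptions inferred from how the node is used elsewhere in this review, not quoted from the module:

```python
def run_sentiment_sdk(state):
    """Run sentiment analysis on the article text carried in `state`.

    Args:
        state: Pipeline state dict expected to contain the cleaned article text.

    Returns:
        dict: Updated state with the sentiment result on success, or
            {"status": "error", "error_from": "sentiment", "message": ...} on failure.
    """
    ...
```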
backend/app/modules/bias_detection/check_bias.py (2)
41-46: Parse and validate numeric bias score (0–100)

The LLM might return extra tokens; parse and clamp to the expected range to keep downstream code robust.

```diff
- bias_score = chat_completion.choices[0].message.content.strip()
- return {
-     "bias_score": bias_score,
-     "status": "success",
- }
+ raw = chat_completion.choices[0].message.content.strip()
+ match = re.search(r"\d{1,3}", raw)
+ if not match:
+     raise ValueError(f"Non-numeric bias score: {raw!r}")
+ bias_score = max(0, min(int(match.group(0)), 100))
+ return {
+     "bias_score": bias_score,
+     "status": "success",
+ }
```

Also add at the top of this file if not present:

```python
import re
```
48-54: Use logger with traceback in exception path

Replace print with logger.exception to retain stack traces and centralize logging.

```diff
- print(f"Error in bias_detection: {e}")
+ logger.exception("Error in bias_detection")
```

backend/app/modules/chat/llm_processing.py (5)
29-37: Add error handling around chat completion call

The network call can raise exceptions (timeouts, auth errors). Capture/log and return a graceful error to callers.

```diff
- response = client.chat.completions.create(
-     model="gemma2-9b-it",
-     messages=[
-         {"role": "system", "content": "Use only the context to answer."},
-         {"role": "user", "content": prompt},
-     ],
- )
-
- return response.choices[0].message.content
+ try:
+     response = client.chat.completions.create(
+         model="gemma2-9b-it",
+         messages=[
+             {"role": "system", "content": "Use only the context to answer."},
+             {"role": "user", "content": prompt},
+         ],
+     )
+     return response.choices[0].message.content
+ except Exception as e:
+     logger.exception("LLM call failed")
+     return "Sorry, I couldn't generate a response at this time."
```

Note: See logger addition in the next comment.
19-19: Replace print with structured logging (aligns with PR goal)

Printing the context leaks to stdout and contradicts the PR objective. Switch to a module logger and log at debug level.

```diff
+import logging
 @@
+logger = logging.getLogger(__name__)
 @@
- print(context)
+ logger.debug("RAG context length=%d", len(context))
```
7-8: Guard against missing GROQ_API_KEY at startup

If GROQ_API_KEY is unset, client creation will fail later with confusing errors. Validate early.

```diff
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise RuntimeError("GROQ_API_KEY is not set")
+client = Groq(api_key=api_key)
```
17-27: Cap/truncate the assembled context before building the prompt (fix required)

build_context currently concatenates all doc explanations/reasoning and is called directly in ask_llm — this can cause token overflows/cost spikes. rg shows build_context is defined and used only in backend/app/modules/chat/llm_processing.py, so it's safe to change the function signature/behavior.
What to change (concise):
- Limit total characters/tokens and/or select top-k docs before joining.
- Prefer top-k by a score in metadata (e.g. score/similarity) when available, otherwise fall back to longest/most relevant texts.
- Replace printing with logging.debug and parameterize max size via config/env.
Suggested replacement (minimal patch):
```python
# backend/app/modules/chat/llm_processing.py
def build_context(docs, max_chars=20000, top_k=20):
    entries = []
    for m in docs:
        meta = m.get("metadata", {}) or {}
        text = meta.get("explanation") or meta.get("reasoning") or ""
        score = meta.get("score") or meta.get("similarity") or 0
        entries.append({"text": text, "score": score})

    # prefer scored ordering if scores exist, else by text length
    if any(e["score"] for e in entries):
        entries.sort(key=lambda e: e["score"], reverse=True)
    else:
        entries.sort(key=lambda e: len(e["text"]), reverse=True)

    selected = []
    total = 0
    for e in entries[:top_k]:
        t = e["text"]
        if not t:
            continue
        if total + len(t) > max_chars:
            remain = max_chars - total
            if remain > 0:
                selected.append(t[:remain])
                total = max_chars
            break
        selected.append(t)
        total += len(t)
    return "\n".join(selected)


def ask_llm(question, docs):
    context = build_context(docs, max_chars=20000, top_k=20)
    # use logging.debug(...) instead of print in production
    prompt = f"""You are an assistant that answers based on context.
Context:
{context}

Question: {question}
"""
```
- Adjust max_chars/top_k to your model's token limits (consider converting chars -> tokens if you have a tokenizer; a rough token-based sketch follows these notes).
- If you rely on downstream callers, update them for the new build_context signature (rg output shows only local usage now).
- Replace print(context) with logging as appropriate.
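As a rough illustration of the chars-to-tokens conversion, here is a hedged sketch using tiktoken purely as an approximate tokenizer; the encoding name, the token budget, and the point where it plugs into build_context are assumptions, not part of the patch above:

```python
import tiktoken


def truncate_to_tokens(text: str, max_tokens: int, encoding_name: str = "cl100k_base") -> str:
    """Trim text to roughly max_tokens using tiktoken as an approximation."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return enc.decode(tokens[:max_tokens])


# Example: cap the assembled context at ~4000 tokens before building the prompt.
# context = truncate_to_tokens(build_context(docs), max_tokens=4000)
```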
1-37: Action required: replace remaining print(...) calls with logging across backend

The grep results show many leftover print statements in backend files. Replace them with logger calls and add a module-level logger (e.g. import logging; logger = logging.getLogger(__name__)). Only backend/app/utils/store_vectors.py currently defines a logger.
Files that need attention (path:line):
- backend/main.py:51
- backend/app/utils/fact_check_utils.py:14,21,29,35,37,39
- backend/app/routes/routes.py:31,38,48
- backend/app/db/vector_store.py:22,30
- backend/app/modules/pipeline.py:26
- backend/app/modules/vector_store/chunk_rag_data.py:72
- backend/app/modules/langgraph_nodes/generate_perspective.py:54
- backend/app/modules/chat/llm_processing.py:19
- backend/app/modules/langgraph_nodes/store_and_send.py:9,19,24,27
- backend/app/modules/langgraph_nodes/error_handler.py:2,3,4
- backend/app/modules/langgraph_nodes/sentiment.py:48
- backend/app/modules/langgraph_nodes/judge.py:48
- backend/app/modules/langgraph_nodes/fact_check.py:14,22
- backend/app/modules/bias_detection/check_bias.py:13,14,49
- backend/app/modules/facts_check/llm_processing.py:52,110,116,127
Recommended changes (concise):
- Add at top of each module: import logging; logger = logging.getLogger(__name__).
- Replace print(...) with appropriate logger levels: logger.debug/info/warning/error.
- Ensure logging is configured once in the app entrypoint (backend/main.py) rather than using prints there (see the sketch after this list).
- Re-run the rg check to confirm no prints remain.
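A minimal sketch of that one-time configuration in the entrypoint; the level and format string here are assumptions, not something this review prescribes:

```python
# backend/main.py (illustrative only)
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
```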
Example (backend/app/modules/chat/llm_processing.py):

- Add:

```python
import logging

logger = logging.getLogger(__name__)
```

- Replace:

```python
print(context)
```

with:
```python
logger.debug(context)
```

backend/app/modules/scraper/extractor.py (1)
26-31: Critical: incorrect requests.get call uses headers as params

requests.get(url, headers) treats headers as query params. This breaks headers and can leak data. Use the named headers kwarg.
```diff
- res = requests.get(self.url, self.headers, timeout=10)
+ res = requests.get(self.url, headers=self.headers, timeout=10)
```

backend/app/db/vector_store.py (1)
20-31: Use logger instead of prints; chain exceptions with `from e`; fix typos

Two improvements recommended here:
- Replace print with a module logger to align with PR goal.
- Use exception chaining (B904) and fix message typos (“occurred”, “initializing”).
Apply this diff:
```diff
@@
-import os
-from pinecone import Pinecone, ServerlessSpec, CloudProvider, AwsRegion
+import os
+import logging
+from pinecone import Pinecone, ServerlessSpec, CloudProvider, AwsRegion
@@
-try:
-    # Initialize Pinecone client
-    pc = Pinecone(api_key=PINECONE_API_KEY)
-
-except Exception as e:
-    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+logger = logging.getLogger(__name__)
+try:
+    # Initialize Pinecone client
+    pc = Pinecone(api_key=PINECONE_API_KEY)
+except Exception as e:
+    raise RuntimeError("Error occurred while initializing Pinecone client") from e
@@
 if not pc.has_index(INDEX_NAME):
-    print(f"Creating index: {INDEX_NAME}")
+    logger.info("Creating index: %s", INDEX_NAME)
     pc.create_index(
         name=INDEX_NAME,
         dimension=DIMENSIONS,
         metric=METRIC,
         spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),
     )
 else:
-    print(f"Index '{INDEX_NAME}' already exists")
+    logger.info("Index '%s' already exists", INDEX_NAME)
@@
 try:
     # Connect to the index
     index = pc.Index(INDEX_NAME)
 except Exception as e:
-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+    raise RuntimeError(f"Error occurred while connecting to the index {INDEX_NAME}") from e
```

Also applies to: 32-36
frontend/app/analyze/loading/page.tsx (1)
128-131: Intervals leak: cleanup function returned inside inner async is ignored by useEffect

The cleanup returned from runAnalysis is not used by useEffect, so intervals may continue running after unmount.
Here’s a minimal fix to ensure cleanup is returned by the effect itself:
useEffect(() => { let stepInterval: ReturnType<typeof setInterval> | undefined; let progressInterval: ReturnType<typeof setInterval> | undefined; const runAnalysis = async () => { // ... existing logic ... stepInterval = setInterval(/* ... */); progressInterval = setInterval(/* ... */); }; runAnalysis(); return () => { if (stepInterval) clearInterval(stepInterval); if (progressInterval) clearInterval(progressInterval); }; }, [router]);backend/app/utils/fact_check_utils.py (3)
13-16: Bug: Checking input state instead of extractor result status

This will never detect extractor failures correctly if the input state lacks or retains a different status. You should check the result returned by run_claim_extractor_sdk.
```diff
- if state.get("status") != "success":
-     print("❌ Claim extraction failed.")
-     return [], "Claim extraction failed."
+ if result.get("status") != "success":
+     return [], result.get("message", "Claim extraction failed.")
```
21-40: Replace prints with structured logging and implement the “polite delay” mentioned in comment
- Replace all print calls with logger.info/warning/error for consistency and production readiness.
- Implement a small delay (e.g., 1s) between search requests to respect provider rate limits.
Apply the following diff within this block:
- print(f"🧠 Extracted claims: {claims}") + logger.info("Extracted claims: %s", claims) @@ - print(f"\n🔍 Searching for claim: {claim}") + logger.info("Searching for claim: %s", claim) try: results = search_google(claim) if results: results[0]["claim"] = claim search_results.append(results[0]) - print(f"✅ Found result: {results[0]['title']}") + logger.info("Found result: %s", results[0]["title"]) else: - print(f"⚠️ No search result for: {claim}") + logger.warning("No search result for: %s", claim) except Exception as e: - print(f"❌ Search failed for: {claim} -> {e}") + logger.exception("Search failed for: %s -> %s", claim, e) + # Be polite with search providers + time.sleep(1)Add this near the imports (outside the shown range):
```python
import logging

logger = logging.getLogger(__name__)
```

Do you want me to push a follow-up patch converting the remaining backend prints to use the module logger?
41-46: Propagate LLM verification errors to callers (fix required)

run_fact_verifier_sdk returns a status/message on failure; run_fact_check_pipeline currently swallows that and returns an empty list. Upstream callers already expect and handle an error tuple, so surface the LLM error instead of hiding it.
- backend/app/modules/facts_check/llm_processing.py — run_fact_verifier_sdk (lines ~60–132): returns {"status":"success", "verifications": ...} or {"status":"error", "message": ...}
- backend/app/utils/fact_check_utils.py — run_fact_check_pipeline (lines ~41–46): currently ignores final["status"] and returns final.get("verifications", []), None
- backend/app/modules/langgraph_nodes/fact_check.py — caller (line 11) does: verifications, error_message = run_fact_check_pipeline(state) and checks error_message, so it will handle the propagated message
Suggested change:
```diff
- final = run_fact_verifier_sdk(search_results)
- return final.get("verifications", []), None
+ final = run_fact_verifier_sdk(search_results)
+ if final.get("status") != "success":
+     return [], final.get("message", "Fact verification failed.")
+ return final.get("verifications", []), None
```

Verified: the caller unpacks (verifications, error_message) and handles errors — apply the change to surface LLM verification failures.
backend/main.py (1)
33-40: CORS: Wildcard origin with allow_credentials=True is invalid in browsers

Browsers reject Access-Control-Allow-Origin: * when Access-Control-Allow-Credentials: true. Define explicit origins for credentialed requests, or turn credentials off for wildcard.
```diff
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
+allowed_origins = os.getenv("CORS_ALLOWED_ORIGINS", "*")
+if allowed_origins == "*":
+    # Wildcard allowed, but credentials must be disabled to be valid in browsers
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=["*"],
+        allow_credentials=False,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
+else:
+    origins = [o.strip() for o in allowed_origins.split(",") if o.strip()]
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=origins,
+        allow_credentials=True,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
```

Add this import near the top (outside the range):
```python
import os
```

I can add a README note showing how to set CORS_ALLOWED_ORIGINS (comma-separated) for common deployments.
🧹 Nitpick comments (21)
backend/app/utils/store_vectors.py (1)
27-29: Prefer lazy/structured logging over f-strings in log messages

Using f-strings eagerly formats the message even when the log level is disabled. Switch to parameterized logging for performance and to enable structured logging later.
Apply:
```diff
- logger.info(
-     f"Successfully stored {len(vectors)} vectors in namespace '{namespace}'"
- )
+ logger.info(
+     "Successfully stored %d vectors in namespace '%s'",
+     len(vectors),
+     namespace,
+ )
```

backend/app/modules/langgraph_nodes/store_and_send.py (1)
9-21: Normalize error message casing/spelling

Minor consistency: capitalize messages and fix “occured” -> “occurred” to standardize logs and returned errors.
If you keep the message values user-visible, consider:
- raise Exception(f"failed to embed chunks: {e}") + raise Exception(f"Failed to embed chunks: {e}")And as shown above in the logger.exception message: “occurred”.
README.md (1)
139-141: Minor grammar and spacing

Add missing spaces and adjust capitalization for clarity.
- “`/frontend`directory” -> “`/frontend` directory”
- “add following environment variable” -> “Add the following environment variable”
backend/app/modules/scraper/keywords.py (4)
21-22: Be explicit about the sort key; future-proof against upstream changes

RAKE returns (score, phrase) tuples today, but being explicit avoids surprises if the return shape changes.

```diff
- keywords = [phrase for score, phrase in sorted(keywords_with_scores, reverse=True)]
+ keywords = [
+     phrase for score, phrase in sorted(
+         keywords_with_scores, key=lambda t: t[0], reverse=True
+     )
+ ]
```
5-15: Add return type hints for clarity

The docstring declares List[str], but the signature lacks a return type. Add it for better IDE/type-checker support.

```diff
-def extract_keywords(text: str, max_keywords: int = 15):
+def extract_keywords(text: str, max_keywords: int = 15) -> list[str]:
```
1-3: Adjust typing import for richer return typing in extract_keyword_data

If you annotate extract_keyword_data's return as Dict[str, Any], import Any.

```diff
-from typing import Dict
+from typing import Dict, Any
```
25-41: Optionally annotate extract_keyword_data return

Improves downstream usage and tooling support.

```diff
-def extract_keyword_data(text: str) -> Dict:
+def extract_keyword_data(text: str) -> Dict[str, Any]:
```

backend/app/modules/bias_detection/check_bias.py (1)
16-18: Error message mismatch with parameter name

The function takes text, not cleaned_text. Align the message.

```diff
- raise ValueError("Missing or empty 'cleaned_text'")
+ raise ValueError("Missing or empty 'text'")
```

backend/app/modules/chat/get_rag_data.py (1)
15-17: Make namespace configurable via environment

Hard-coding namespace reduces flexibility across environments/tenants. Consider reading it from an env var with a safe default.
Example:
```diff
- results = index.query(
-     vector=embeddings, top_k=top_k, include_metadata=True, namespace="default"
- )
+ namespace = os.getenv("PINECONE_NAMESPACE", "default")
+ results = index.query(vector=embeddings, top_k=top_k, include_metadata=True, namespace=namespace)
```
21-26: Use a list comprehension for vectors; improves clarity and performance

The loop is fine, but a comprehension is simpler and faster for pure construction.

```diff
- vectors = []
- for chunk, embedding in zip(chunks, embeddings):
-     vectors.append(
-         {"id": chunk["id"], "values": embedding, "metadata": chunk["metadata"]}
-     )
- return vectors
+ return [
+     {"id": chunk["id"], "values": embedding, "metadata": chunk["metadata"]}
+     for chunk, embedding in zip(chunks, embeddings)
+ ]
```

backend/app/modules/chat/llm_processing.py (1)
11-14: LGTM on context builder formatting

Equivalent semantics; joins explanations or reasoning across docs. Consider filtering out empty strings to avoid stray newlines.
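For instance, a one-line sketch of that filter (the local variable names are assumptions about the surrounding code):

```python
# Drop empty explanation/reasoning strings so the joined context has no blank lines.
context = "\n".join(part for part in parts if part and part.strip())
```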
frontend/app/page.tsx (1)
107-111: A11y nit: mark decorative icons as hidden from assistive tech

These Globe icons appear decorative. Add aria-hidden to avoid noise for screen readers.

```diff
- <Globe className="w-4 h-4 md:w-5 md:h-5 text-white" />
+ <Globe aria-hidden className="w-4 h-4 md:w-5 md:h-5 text-white" />
@@
- <Globe className="w-3 h-3 md:w-4 md:h-4 text-white" />
+ <Globe aria-hidden className="w-3 h-3 md:w-4 md:h-4 text-white" />
```

Also applies to: 281-284
backend/app/modules/scraper/extractor.py (1)
81-93: Docstrings missing; add brief descriptions (aligns with PR goal)

This module and class lack docstrings, which contradicts the PR objective. Consider adding concise docstrings for the class and methods.
Example insertion after class definition:
```diff
 class Article_extractor:
+    """Extracts article content using multiple strategies (trafilatura, Newspaper3k, BS4+Readability),
+    returning the first successful result with a non-empty 'text' field."""
```

If you want, I can generate full docstrings for all methods in this module.
backend/app/modules/vector_store/chunk_rag_data.py (1)
48-53: Hoist fact field names to a module constant

Minor readability/maintainability improvement: define FACT_FIELDS once at module scope and reuse.
Apply something like:
```python
FACT_FIELDS = ("original_claim", "verdict", "explanation", "source_link")

# In the loop:
for field in FACT_FIELDS:
    if field not in fact:
        raise ValueError(f"Missing required fact field: {field} in fact index {i}")
```

backend/app/db/vector_store.py (1)
27-27: Parameterize region via env and keep spec formatting — LGTM otherwise

The single-line ServerlessSpec call is fine. Consider allowing CLOUD/REGION via env for deployments across regions. Example envs: PINECONE_CLOUD=AWS, PINECONE_REGION=us-east-1.
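A rough sketch of what that could look like; the default values and the assumption that ServerlessSpec accepts plain cloud/region strings are mine, not verified against this repo's Pinecone SDK version:

```python
import os

from pinecone import ServerlessSpec

# Assumption: ServerlessSpec accepts string values for cloud/region,
# so both can come straight from the environment.
cloud = os.getenv("PINECONE_CLOUD", "aws")
region = os.getenv("PINECONE_REGION", "us-east-1")

spec = ServerlessSpec(cloud=cloud, region=region)
```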
frontend/app/analyze/loading/page.tsx (1)
75-80: Use the normalized URL helper for API calls

Prevents double slashes and handles empty base (falls back to relative paths).
Apply this diff:
```diff
- axios.post(`${backend_url}/api/process`, {
+ axios.post(makeUrl("/api/process"), {
      url: storedUrl,
  }),
- axios.post(`${backend_url}/api/bias`, {
+ axios.post(makeUrl("/api/bias"), {
      url: storedUrl,
  }),
```

frontend/app/analyze/results/page.tsx (2)
48-51: Avoid toggling loading state in two places

You already handle loading in the second effect. Setting it here as well can cause flicker and redundant state updates.

```diff
- if (storedBiasScore && storedData) {
-   setIsLoading(false);
- }
+ // Let the second effect set isLoading once both are present
```
68-75: Prevent potential redirect race by setting the ref before push

You created isRedirecting but never set it, so the early return won't trigger. Set it before router.push to avoid repeated redirects.

```diff
  } else {
    console.warn("No bias or data found. Redirecting...");
+   isRedirecting.current = true;
    router.push("/analyze");
  }
```

backend/main.py (2)
49-52: Use logging instead of print for startup message

Aligns with “replace prints with logger” and integrates with uvicorn logging.

```diff
  # Run development server
  port = int(os.environ.get("PORT", 7860))
- print(f"Server is running on http://0.0.0.0:{port}")
+ import logging
+ logger = logging.getLogger("uvicorn.error")
+ logger.info("Server is running on http://0.0.0.0:%s", port)
```
15-17: Docstring usage path may be inaccurate

Given this file is backend/main.py, verify the suggested command should be:
- uvicorn backend.main:app --reload
backend/app/modules/langgraph_builder.py (1)
71-76: Redundant termination logic: pick either finish point or explicit end edge

You both:
- add_conditional_edges("store_and_send", ...) to "end", and
- set_finish_point("store_and_send").
Only one is needed. Consider removing the conditional edges for store_and_send.
```diff
- graph.add_conditional_edges(
-     "store_and_send",
-     lambda x: ("error_handler" if x.get("status") == "error" else "__end__"),
- )
-
- graph.set_finish_point("store_and_send")
+ graph.set_finish_point("store_and_send")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (27)
- README.md (2 hunks)
- backend/app/db/vector_store.py (2 hunks)
- backend/app/modules/bias_detection/check_bias.py (2 hunks)
- backend/app/modules/chat/embed_query.py (0 hunks)
- backend/app/modules/chat/get_rag_data.py (1 hunks)
- backend/app/modules/chat/llm_processing.py (2 hunks)
- backend/app/modules/facts_check/web_search.py (1 hunks)
- backend/app/modules/langgraph_builder.py (3 hunks)
- backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
- backend/app/modules/langgraph_nodes/fact_check.py (1 hunks)
- backend/app/modules/langgraph_nodes/generate_perspective.py (2 hunks)
- backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
- backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
- backend/app/modules/scraper/cleaner.py (4 hunks)
- backend/app/modules/scraper/extractor.py (5 hunks)
- backend/app/modules/scraper/keywords.py (2 hunks)
- backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
- backend/app/modules/vector_store/embed.py (1 hunks)
- backend/app/routes/routes.py (0 hunks)
- backend/app/utils/fact_check_utils.py (1 hunks)
- backend/app/utils/store_vectors.py (1 hunks)
- backend/main.py (2 hunks)
- backend/pyproject.toml (1 hunks)
- backend/start.sh (0 hunks)
- frontend/app/analyze/loading/page.tsx (2 hunks)
- frontend/app/analyze/results/page.tsx (4 hunks)
- frontend/app/page.tsx (11 hunks)
💤 Files with no reviewable changes (3)
- backend/app/routes/routes.py
- backend/app/modules/chat/embed_query.py
- backend/start.sh
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/app/utils/fact_check_utils.py (1)
backend/app/modules/facts_check/llm_processing.py (1)
run_fact_verifier_sdk(60-132)
backend/app/modules/langgraph_builder.py (6)
backend/app/modules/langgraph_nodes/error_handler.py (1)
- error_handler (1-10)

backend/app/modules/langgraph_nodes/sentiment.py (1)
- run_sentiment_sdk (10-53)

backend/app/modules/langgraph_nodes/fact_check.py (1)
- run_fact_check (4-28)

backend/app/modules/langgraph_nodes/generate_perspective.py (1)
- generate_perspective (24-60)

backend/app/modules/langgraph_nodes/judge.py (1)
- judge_perspective (13-53)

backend/app/modules/langgraph_nodes/store_and_send.py (1)
- store_and_send (6-34)
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py
36-36: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🪛 markdownlint-cli2 (0.17.2)
README.md
139-139: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
140-140: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
141-141: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
163-163: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
164-164: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
165-165: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (15)
backend/pyproject.toml (2)
2-2: Project name change looks good

Renaming the project to “backend” aligns with the README and directory layout.
6-6: Confirm Python 3.13 requirement across dependencies

Pinning requires-python to ">=3.13" is aggressive. Some libraries may lag 3.13 support. Verify that FastAPI, Uvicorn, LangChain, Pinecone SDK, etc., are all 3.13-compatible in your target deploy environment.
If not strictly required, consider ">=3.10" or ">=3.11" which are more widely supported.
backend/app/modules/scraper/keywords.py (1)
41-41: LGTM on trailing comma

Trailing comma improves diffs and is consistent with the style elsewhere.
backend/app/modules/langgraph_nodes/sentiment.py (1)
28-30: LGTM: simplified prompt construction

Consolidating to a single f-string improves readability without changing behavior.
backend/app/modules/bias_detection/check_bias.py (1)
33-34: LGTM: unified f-string

The prompt construction change is a no-op functionally and reads cleaner.
backend/app/modules/langgraph_nodes/generate_perspective.py (1)
16-16: Model identifier validated — no change required

Confirmed: Groq's ChatGroq supports "llama-3.3-70b-versatile" (per Groq docs).
- File: backend/app/modules/langgraph_nodes/generate_perspective.py — line 16:
```python
llm = ChatGroq(model=my_llm, temperature=0.7)
```

backend/app/modules/chat/llm_processing.py (1)
33-34: Good: add user message with prompt

Adding the user message fixes the common pitfall of sending only a system message. This should improve LLM adherence to the provided context.
frontend/app/page.tsx (1)
136-139: LGTM: copy edits and formatting

UI copy reflows, semicolons, and CTA tweaks look good. No functional changes introduced.
Also applies to: 150-151, 162-164, 178-184, 195-197, 237-239, 264-266, 286-295, 300-300
backend/app/modules/scraper/extractor.py (1)
17-22: LGTM: header formatting cleanup

Pure formatting; effective UA header preserved.
backend/app/modules/vector_store/chunk_rag_data.py (1)
55-67: LGTM on the fact chunk construction

Clear, consistent structure with metadata (including article_id). Looks good.
backend/app/modules/facts_check/web_search.py (1)
22-22: LGTM on minor formatting

Trailing comma in the returned list is harmless and consistent.
backend/app/utils/fact_check_utils.py (1)
2-5: Import of run_fact_verifier_sdk enabled correctly

Making run_fact_verifier_sdk available here aligns this module with the LLM verification flow. No issues spotted.
frontend/app/analyze/results/page.tsx (1)
22-22: Guard against missing NEXT_PUBLIC_API_URL

If the env var is not set at build time, the URL becomes undefined and axios will call “undefined/api/chat”. Fail fast or provide a user-facing error.
```diff
-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+const backend_url = process.env.NEXT_PUBLIC_API_URL ?? "";
```

Optionally add a runtime check inside handlers to show a friendly message when backend_url is empty (see chat handler suggestion below).
backend/app/modules/langgraph_builder.py (2)
8-9: Minor import formatting improvement LGTM

Trailing comma is fine and future-proofs additional imports.
57-70: Decision edge behavior differs from the PR summary

The code routes low scores (<70) to generate_perspective (unless retries>=3), not back to judge_perspective as described. Confirm intended behavior; if the loop should be judge->judge, adjust accordingly.
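To make the described behavior concrete, here is a hedged sketch of the routing this comment implies; the state keys (score, retries), the threshold, and the exact function shape are assumptions inferred from the comment, not code quoted from the PR:

```python
def route_after_judge(state: dict) -> str:
    """Pick the next node after judge_perspective, per the behavior described above."""
    if state.get("status") == "error":
        return "error_handler"
    if state.get("score", 0) < 70 and state.get("retries", 0) < 3:
        # Low score: regenerate the perspective rather than re-judging.
        return "generate_perspective"
    return "store_and_send"
```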
```python
print(text)
print(json.dumps(text))
```
🛠️ Refactor suggestion
Avoid printing user content; use a logger and minimize PII exposure
Dumping the full article to stdout/stderr is a privacy and compliance risk. Use a logger and avoid logging raw text in production.
```diff
- print(text)
- print(json.dumps(text))
+ logger.debug("Bias detection invoked (input_length=%s)", len(text) if isinstance(text, str) else "n/a")
```

Add at the top:
```python
import logging

logger = logging.getLogger(__name__)
```

🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py around lines 13-15, the code
prints full user/article text to stdout which risks exposing PII; remove the
print(text) and print(json.dumps(text)) calls and replace them with a logger
usage (add import logging and logger = logging.getLogger(__name__) at the top),
logging only non-sensitive minimal metadata instead of raw content — e.g., log
the text length, a deterministic hash/ID, or a redacted/truncated excerpt, and
ensure logging level is appropriate (debug/info) and that no raw article content
or PII is written to logs.
```diff
 matches = []
 for match in results["matches"]:
-    matches.append({
-        "id": match["id"],
-        "score": match["score"],
-        "metadata": match["metadata"]
-    })
+    matches.append(
+        {"id": match["id"], "score": match["score"], "metadata": match["metadata"]}
+    )
 return matches
```
🛠️ Refactor suggestion
Avoid KeyError and simplify match extraction
Accessing results["matches"] can raise a KeyError if the client response changes or errors. Also, the append loop can be replaced with a concise, readable comprehension.
Apply this diff:
- matches = []
- for match in results["matches"]:
- matches.append(
- {"id": match["id"], "score": match["score"], "metadata": match["metadata"]}
- )
- return matches
+ return [
+ {"id": m.get("id"), "score": m.get("score"), "metadata": m.get("metadata")}
+ for m in results.get("matches", [])
+ ]

📝 Committable suggestion
```python
return [
    {"id": m.get("id"), "score": m.get("score"), "metadata": m.get("metadata")}
    for m in results.get("matches", [])
]
```
🤖 Prompt for AI Agents
In backend/app/modules/chat/get_rag_data.py around lines 19 to 24, replace the
manual loop that does results["matches"] and appends dicts with a safe, concise
list comprehension that uses results.get("matches", []) to avoid KeyError and
uses match.get("id"), match.get("score"), and match.get("metadata") for
defensive access; return the comprehension directly (e.g., return [{ "id":
match.get("id"), "score": match.get("score"), "metadata": match.get("metadata")
} for match in results.get("matches", [])]).
```python
results = requests.get(
    f"https://www.googleapis.com/customsearch/v1?key={GOOGLE_SEARCH}&cx=f637ab77b5d8b4a3c&q={query}"
)
```
Encode query via params, add timeout and HTTP error handling; avoid hard-coded cx
Current call risks bad encoding and long hangs. Use params, set a timeout, check status, and make cx configurable via env. Also guard against missing items to avoid KeyError.
Apply this diff:
-def search_google(query):
- results = requests.get(
- f"https://www.googleapis.com/customsearch/v1?key={GOOGLE_SEARCH}&cx=f637ab77b5d8b4a3c&q={query}"
- )
- res = results.json()
+def search_google(query):
+ cx = os.getenv("SEARCH_CX")
+ if not GOOGLE_SEARCH or not cx:
+ raise RuntimeError("Missing Google Custom Search API configuration (SEARCH_KEY/SEARCH_CX)")
+ results = requests.get(
+ "https://www.googleapis.com/customsearch/v1",
+ params={"key": GOOGLE_SEARCH, "cx": cx, "q": query},
+ timeout=10,
+ )
+ results.raise_for_status()
+ res = results.json()

Additionally, harden the parsing (outside the changed hunk):
```python
items = res.get("items") or []
if not items:
    return []
first = {
    "title": items[0].get("title", ""),
    "link": items[0].get("link", ""),
    "snippet": items[0].get("snippet", ""),
}
```
🤖 Prompt for AI Agents
In backend/app/modules/facts_check/web_search.py around lines 11 to 13, the
requests.get call should be replaced to use params (so the query is
URL-encoded), include a timeout, and avoid a hard-coded cx: read CX from an
environment variable (with a sensible default or raise if missing) and pass both
key and cx via the params dict; after the request call check for HTTP errors
(response.raise_for_status() or if response.status_code != 200 then
handle/raise) and catch requests.exceptions.Timeout/RequestException to handle
network errors instead of hanging; when parsing the JSON, guard against missing
"items" by using res.get("items") or [] and return [] if empty, and build the
first result using .get for title/link/snippet as shown in the suggested
snippet; do not log the full request URL (avoid including the API key) — log
only safe metadata if needed.
```diff
 def error_handler(input):
     print("Error detected!")
     print(f"From: {input.get('error_from')}")
     print(f"Message: {input.get('message')}")

-    return {"status": "stopped_due_to_error",
-            "from": [input.get("error_from")],
-            "error": [input.get("message")]
-            }
+    return {
+        "status": "stopped_due_to_error",
+        "from": [input.get("error_from")],
+        "error": [input.get("message")],
+    }
```
🛠️ Refactor suggestion
Replace prints with logger, add docstring, and avoid shadowing built-in input
This module still uses print statements and shadows the built-in name input. Switch to structured logging and rename the parameter to avoid confusion.
+import logging
+
-def error_handler(input):
- print("Error detected!")
- print(f"From: {input.get('error_from')}")
- print(f"Message: {input.get('message')}")
+logger = logging.getLogger(__name__)
+
+def error_handler(err):
+ """Normalize and log an error emitted from a node in the LangGraph pipeline."""
+ logger.error("Error detected! from=%s message=%s", err.get("error_from"), err.get("message"))
@@
- return {
- "status": "stopped_due_to_error",
- "from": [input.get("error_from")],
- "error": [input.get("message")],
- }
+ return {
+ "status": "stopped_due_to_error",
+ "from": err.get("error_from"),
+ "error": err.get("message"),
+ }

📝 Committable suggestion
```python
import logging

logger = logging.getLogger(__name__)


def error_handler(err):
    """Normalize and log an error emitted from a node in the LangGraph pipeline."""
    logger.error("Error detected! from=%s message=%s", err.get("error_from"), err.get("message"))

    return {
        "status": "stopped_due_to_error",
        "from": err.get("error_from"),
        "error": err.get("message"),
    }
```
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/error_handler.py around lines 1 to 10,
the function shadows the built-in name input, uses print statements, and lacks a
docstring; rename the parameter to something like payload or data to avoid
shadowing, add a concise docstring describing the function and its
parameters/return, and replace the prints with structured logging (use the
module logger: logger = logging.getLogger(__name__)) logging the error source
and message via logger.error or logger.exception with payload fields; keep the
returned dictionary shape the same and ensure you import logging at top if not
present.
```python
    return {
        "status": "stopped_due_to_error",
        "from": [input.get("error_from")],
        "error": [input.get("message")],
    }
```
💡 Verification agent
🧩 Analysis chain
Avoid unintended schema change: 'from' and 'error' turned into lists
Returning lists instead of strings will likely break consumers expecting scalars. Unless downstream explicitly expects arrays here, keep these fields as strings.
Apply this diff:
- return {
- "status": "stopped_due_to_error",
- "from": [input.get("error_from")],
- "error": [input.get("message")],
- }
+ return {
+ "status": "stopped_due_to_error",
+ "from": input.get("error_from"),
+ "error": input.get("message"),
+ }

If arrays are intended, confirm the schema and update all callers accordingly.
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "=== Show file (if present) ==="
if [ -f backend/app/modules/langgraph_nodes/error_handler.py ]; then
  sed -n '1,240p' backend/app/modules/langgraph_nodes/error_handler.py || true
else
  echo "File not found: backend/app/modules/langgraph_nodes/error_handler.py"
fi
echo
echo "=== Search repository for relevant usages ==="
rg --hidden -n --no-ignore-vcs "stopped_due_to_error" || true
rg --hidden -n --no-ignore-vcs "error_from" || true
rg --hidden -n --no-ignore-vcs "\"from\":\s*\[" || true
rg --hidden -n --no-ignore-vcs "\"error\":\s*\[" || true
rg --hidden -n --no-ignore-vcs "get\\(.*message" || true
rg --hidden -n --no-ignore-vcs "get\\(.*error" || true
rg --hidden -n --no-ignore-vcs "['\"]error['\"]\]\[0\]" || true
rg --hidden -n --no-ignore-vcs "['\"]from['\"]\]\[0\]" || true
```

Length of output: 3449
Fix required — return scalars for "from" and "error" in error_handler
Returning lists here changes the response schema and will likely break callers. Repo search shows other nodes set "error_from" as strings and I found no callers expecting arrays — only this file returns arrays.
Files to fix / review:
- backend/app/modules/langgraph_nodes/error_handler.py — change the returned fields to scalars.
- Caller locations that supply error_from (for verification): backend/app/modules/langgraph_nodes/sentiment.py, generate_perspective.py, fact_check.py, judge.py, backend/app/modules/facts_check/llm_processing.py, backend/app/modules/bias_detection/check_bias.py, backend/app/modules/langgraph_nodes/store_and_send.py
Apply this diff:
- return {
- "status": "stopped_due_to_error",
- "from": [input.get("error_from")],
- "error": [input.get("message")],
- }
+ return {
+ "status": "stopped_due_to_error",
+ "from": input.get("error_from"),
+ "error": input.get("message"),
+ }

If arrays were intentionally introduced, confirm the schema and update all callers accordingly.
📝 Committable suggestion
```python
    return {
        "status": "stopped_due_to_error",
        "from": input.get("error_from"),
        "error": input.get("message"),
    }
```
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/error_handler.py around lines 6 to 10,
the function is returning "from" and "error" as single-element lists which
breaks the response schema; change both fields to scalars by returning the raw
values (e.g., input.get("error_from") and input.get("message")) instead of
wrapping them in lists, and ensure sensible defaults (empty string or None) if
keys are missing; after making this change, verify the listed caller files
(sentiment.py, generate_perspective.py, fact_check.py, judge.py,
backend/app/modules/facts_check/llm_processing.py,
backend/app/modules/bias_detection/check_bias.py, store_and_send.py) still work
with scalar values and update any caller that was (incorrectly) expecting arrays
or adjust the schema if arrays were intentional.
```diff
 for i, chunk in enumerate(chunks):
     if not isinstance(chunk, dict) or "text" not in chunk:
         raise ValueError(
-            f"Invalid chunk structure at index {i}:"
-            " missing 'text' field"
+            f"Invalid chunk structure at index {i}: missing 'text' field"
         )
```
Validate required keys to prevent KeyError later
You only validate the 'text' field, but later access 'id' and 'metadata' directly. Missing keys will raise KeyError at vector construction.
Apply this diff to strengthen validation:
- # Validate chunk structure
- for i, chunk in enumerate(chunks):
- if not isinstance(chunk, dict) or "text" not in chunk:
- raise ValueError(
- f"Invalid chunk structure at index {i}: missing 'text' field"
- )
+ # Validate chunk structure
+ for i, chunk in enumerate(chunks):
+ if not isinstance(chunk, dict):
+ raise ValueError(f"Invalid chunk at index {i}: expected dict, got {type(chunk).__name__}")
+ missing = [k for k in ("id", "text", "metadata") if k not in chunk]
+ if missing:
+ raise ValueError(f"Invalid chunk structure at index {i}: missing keys: {missing}")📝 Committable suggestion
```python
# Validate chunk structure
for i, chunk in enumerate(chunks):
    if not isinstance(chunk, dict):
        raise ValueError(f"Invalid chunk at index {i}: expected dict, got {type(chunk).__name__}")
    missing = [k for k in ("id", "text", "metadata") if k not in chunk]
    if missing:
        raise ValueError(f"Invalid chunk structure at index {i}: missing keys: {missing}")
```
🤖 Prompt for AI Agents
In backend/app/modules/vector_store/embed.py around lines 12 to 16, the current
validation only checks for 'text' but later code accesses 'id' and 'metadata',
which can cause KeyError; update the validation to ensure each chunk is a dict
and contains the keys 'text', 'id', and 'metadata' (optionally check that 'id'
is a string/int and 'metadata' is a dict), and raise a clear ValueError
indicating which key is missing and the chunk index so downstream vector
construction never receives malformed input.
| logger.error(f"Failed to store vectors in namespace '{namespace}': {e}") | ||
| raise RuntimeError(f"Vector storage failed: {e}") |
🛠️ Refactor suggestion
Capture stacktrace when logging exceptions
Use logger.exception (or exc_info=True) to preserve the traceback. The current message loses valuable debugging context.
- logger.error(f"Failed to store vectors in namespace '{namespace}': {e}")
+ logger.exception("Failed to store vectors in namespace '%s'", namespace)
raise RuntimeError(f"Vector storage failed: {e}")📝 Committable suggestion
| logger.error(f"Failed to store vectors in namespace '{namespace}': {e}") | |
| raise RuntimeError(f"Vector storage failed: {e}") | |
| logger.exception("Failed to store vectors in namespace '%s'", namespace) | |
| raise RuntimeError(f"Vector storage failed: {e}") |
🧰 Tools
🪛 Ruff (0.12.2)
32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In backend/app/utils/store_vectors.py around lines 31 to 32, the exception is
being logged with logger.error which drops the traceback; replace that call with
logger.exception(f"Failed to store vectors in namespace '{namespace}': {e}") or
logger.error(..., exc_info=True) to preserve the stacktrace, and when re-raising
wrap the original exception (e.g., raise RuntimeError(f"Vector storage failed:
{e}") from e) so exception chaining is preserved.
```ts
const backend_url = process.env.NEXT_PUBLIC_API_URL;
```
🛠️ Refactor suggestion
Guard undefined NEXT_PUBLIC_API_URL and prevent double slashes
If the env is unset, requests will hit "undefined/api/...". Normalize the base URL or default to relative paths.
Apply this diff:
-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+const rawBackendUrl = process.env.NEXT_PUBLIC_API_URL ?? "";
+// Remove trailing slashes to avoid `//api/...`
+const backend_url = rawBackendUrl.replace(/\/+$/, "");
+const makeUrl = (path: string) =>
+  `${backend_url}${path.startsWith("/") ? "" : "/"}${path}`;

📝 Committable suggestion
```ts
const rawBackendUrl = process.env.NEXT_PUBLIC_API_URL ?? "";
// Remove trailing slashes to avoid `//api/...`
const backend_url = rawBackendUrl.replace(/\/+$/, "");
const makeUrl = (path: string) =>
  `${backend_url}${path.startsWith("/") ? "" : "/"}${path}`;
```
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 19-20, guard against an
undefined NEXT_PUBLIC_API_URL and normalize it to avoid requests like
"undefined/api/..." or double slashes: read the env var, default to an empty
string or relative base when unset, strip any trailing slashes from the
configured base URL, and when building endpoint paths always join with a single
slash (e.g., ensure base has no trailing slash and prepend path with a single
leading slash) so resulting URLs are valid.
```diff
 async function handleSendMessage(e: React.FormEvent) {
   e.preventDefault();
   if (!message.trim()) return;
   const newMessages = [...messages, { role: "user", content: message }];
   setMessages(newMessages);
   setMessage("");

-  const res = await axios.post("http://Thunder1245-perspective-backend.hf.space/api/chat", {
-    message: message
+  const res = await axios.post(`${backend_url}/api/chat`, {
+    message: message,
   });
   const data = res.data;

-  console.log(data)
+  console.log(data);

   // 🔹 Step 2: Append LLM’s response
   setMessages([...newMessages, { role: "assistant", content: data.answer }]);
-};
+}
```
🛠️ Refactor suggestion
Harden chat submit: add error handling, backend_url guard, and functional state updates
- Add try/catch around the request.
- Use functional setMessages to avoid stale closures.
- Guard against missing backend_url.
- Remove console.log to avoid noisy logs in production.
- async function handleSendMessage(e: React.FormEvent) {
+ async function handleSendMessage(e: React.FormEvent) {
e.preventDefault();
if (!message.trim()) return;
- const newMessages = [...messages, { role: "user", content: message }];
- setMessages(newMessages);
- setMessage("");
-
- const res = await axios.post(`${backend_url}/api/chat`, {
- message: message,
- });
- const data = res.data;
-
- console.log(data);
-
- // 🔹 Step 2: Append LLM’s response
- setMessages([...newMessages, { role: "assistant", content: data.answer }]);
+ const userMessage = { role: "user", content: message.trim() };
+ setMessages((prev) => [...prev, userMessage]);
+ setMessage("");
+
+ if (!backend_url) {
+ console.error("NEXT_PUBLIC_API_URL is not configured");
+ setMessages((prev) => [
+ ...prev,
+ { role: "assistant", content: "Configuration error: backend URL not set." },
+ ]);
+ return;
+ }
+
+ try {
+ const res = await axios.post(`${backend_url}/api/chat`, {
+ message: userMessage.content,
+ });
+ const data = res.data;
+ setMessages((prev) => [
+ ...prev,
+ { role: "assistant", content: data.answer },
+ ]);
+ } catch (err) {
+ console.error("Chat request failed", err);
+ setMessages((prev) => [
+ ...prev,
+ { role: "assistant", content: "Sorry, I couldn't reach the server. Please try again." },
+ ]);
+ }
}

📝 Committable suggestion
```tsx
async function handleSendMessage(e: React.FormEvent) {
  e.preventDefault();
  if (!message.trim()) return;
  const userMessage = { role: "user", content: message.trim() };
  setMessages((prev) => [...prev, userMessage]);
  setMessage("");

  if (!backend_url) {
    console.error("NEXT_PUBLIC_API_URL is not configured");
    setMessages((prev) => [
      ...prev,
      { role: "assistant", content: "Configuration error: backend URL not set." },
    ]);
    return;
  }

  try {
    const res = await axios.post(`${backend_url}/api/chat`, {
      message: userMessage.content,
    });
    const data = res.data;
    setMessages((prev) => [
      ...prev,
      { role: "assistant", content: data.answer },
    ]);
  } catch (err) {
    console.error("Chat request failed", err);
    setMessages((prev) => [
      ...prev,
      { role: "assistant", content: "Sorry, I couldn't reach the server. Please try again." },
    ]);
  }
}
```
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 78 to 95, the chat submit
handler lacks error handling, doesn't guard against a missing backend_url,
mutates state with potential stale closures, and leaves a console.log; wrap the
axios call and subsequent state updates in a try/catch, return or show an error
if backend_url is falsy before making the request, remove the console.log, and
update messages using functional updates (e.g., setMessages(prev => [...prev,
{role: "user", content: message}]) and later setMessages(prev => [...prev,
{role: "assistant", content: data?.answer ?? 'Error: no response'}])) so you
avoid stale state and handle missing/undefined responses.
🛠️ Refactor suggestion
Fix list indentation, code fence language, and .env formatting
Address markdownlint issues and ensure .env syntax is valid (no spaces around =).
-Setup environment variables:*
- - add .env file in `/frontend`directory.
- - add following environment variable in your .env file.
-```
-NEXT_PUBLIC_API_URL = http://localhost:8000
-
-```
+Setup environment variables:
+- Add a .env file in the `/frontend` directory.
+- Add the following environment variable to your .env file.
+```env
+NEXT_PUBLIC_API_URL=http://localhost:8000
+```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
139-139: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
140-140: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
141-141: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In README.md around lines 137 to 146, fix the Markdown formatting: correct the
list indentation and capitalization, use a fenced code block with language
"env", and remove spaces around the equals sign in the .env example;
specifically update the bullet items to "Add a .env file in the `/frontend`
directory." and "Add the following environment variable to your .env file.",
replace the current code fence with ```env and show
NEXT_PUBLIC_API_URL=http://localhost:8000 (no spaces around =), then close the
fence with ``` to satisfy markdownlint and valid .env syntax.
Actionable comments posted: 4
🔭 Outside diff range comments (16)
backend/app/modules/vector_store/chunk_rag_data.py (1)
38-41: Validate cleaned_text type/emptiness early for clearer errors.
Currently, invalid cleaned_text types bubble up from generate_id. Add an explicit check.
Apply this diff:
 if not isinstance(data["facts"], list):
     raise ValueError("Facts must be a list")
-    # Validate perspective structure
+    # Validate cleaned_text
+    if not isinstance(data["cleaned_text"], str) or not data["cleaned_text"].strip():
+        raise ValueError("cleaned_text must be a non-empty string")
+
+    # Validate perspective structure
backend/app/utils/fact_check_utils.py (2)
45-47: Bug: checking the wrong object for status (state vs result)
You call run_claim_extractor_sdk(state) into result but then check state.get("status"), which will never reflect the extractor outcome.
Apply this diff:
-    if state.get("status") != "success":
-        print("❌ Claim extraction failed.")
-        return [], "Claim extraction failed."
+    if result.get("status") != "success":
+        return [], "Claim extraction failed."
53-71: Replace prints with logger, honor “polite delay,” and propagate verifier errors
- Replace print calls with a module logger to match PR goals and avoid noisy stdout.
- The docstring mentions a “polite delay” but none is implemented; add a small sleep to respect rate limits.
- Handle verifier error by checking final.get("status").
Apply this diff:
+import logging
 from app.modules.facts_check.web_search import search_google
 from app.modules.facts_check.llm_processing import (
     run_claim_extractor_sdk,
     run_fact_verifier_sdk,
 )
 import re
 import time
+logger = logging.getLogger(__name__)
@@
-    print(f"🧠 Extracted claims: {claims}")
+    logger.debug("Extracted claims: %s", claims)
@@
-    for claim in claims:
-        print(f"\n🔍 Searching for claim: {claim}")
+    for claim in claims:
+        logger.info("Searching for claim: %s", claim)
         try:
             results = search_google(claim)
             if results:
                 results[0]["claim"] = claim
                 search_results.append(results[0])
-                print(f"✅ Found result: {results[0]['title']}")
+                logger.info("Found result: %s", results[0].get("title"))
             else:
-                print(f"⚠️ No search result for: {claim}")
+                logger.warning("No search result for: %s", claim)
         except Exception as e:
-            print(f"❌ Search failed for: {claim} -> {e}")
+            logger.exception("Search failed for claim: %s", claim)
+        # Polite delay to avoid hammering the search API
+        time.sleep(1)
@@
-    final = run_fact_verifier_sdk(search_results)
-    return final.get("verifications", []), None
+    final = run_fact_verifier_sdk(search_results)
+    if final.get("status") != "success":
+        return [], final.get("message", "Fact verification failed.")
+    return final.get("verifications", []), None
Also applies to: 73-79
backend/app/modules/langgraph_nodes/sentiment.py (1)
63-69: Replace print with logger and consider guarding missing API key
Use a module logger for errors. Also, constructing the Groq client with a missing API key yields confusing runtime failures. Fail fast with a clear message.
Apply this diff:
+import logging
 import os
 from groq import Groq
 from dotenv import load_dotenv
 load_dotenv()
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+logger = logging.getLogger(__name__)
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    # Fail fast with actionable error; avoid silent misconfigurations
+    raise RuntimeError("GROQ_API_KEY is not set")
+client = Groq(api_key=api_key)
@@
     except Exception as e:
-        print(f"Error in sentiment_analysis: {e}")
+        logger.exception("Error in sentiment_analysis")
         return {
             "status": "error",
             "error_from": "sentiment_analysis",
             "message": str(e),
         }
backend/app/utils/prompt_templates.py (1)
45-59: Align output schema with PerspectiveOutput and avoid fenced JSON
Your structured output model uses fields "perspective" and "reasoning" (string), but the prompt instructs "counter_perspective" and "reasoning_steps" (list) and wraps the JSON in a code fence. This will conflict with structured parsing.
Proposed fix: instruct the model to return flat JSON (no code fence) with keys that match the Pydantic model.
-Generate a logical and respectful *opposite perspective* to the article.
-Use *step-by-step reasoning* and return your output in this JSON format:
-
-```json
-{
-  "counter_perspective": "<your opposite point of view>",
-  "reasoning_steps": [
-    "<step 1>",
-    "<step 2>",
-    "<step 3>",
-    "...",
-    "<final reasoning>"
-  ]
-}
-```
+Generate a logical and respectful opposite perspective to the article.
+Use step-by-step reasoning and return ONLY valid JSON with this schema:
+{
+  "perspective": "<your opposite point of view>",
+  "reasoning": "<step-by-step reasoning in a single coherent paragraph>"
+}
If you prefer to keep a list of steps, update PerspectiveOutput to use reasoning_steps: list[str] and reflect this across the pipeline instead.
backend/app/modules/bias_detection/check_bias.py (1)
65-70: Parse and validate numeric bias score
The model can still return extra tokens; parse, validate [0..100], and return an integer.
-    bias_score = chat_completion.choices[0].message.content.strip()
-
-    return {
-        "bias_score": bias_score,
-        "status": "success",
-    }
+    raw = chat_completion.choices[0].message.content.strip()
+    match = re.search(r"\b\d{1,3}\b", raw)
+    if not match:
+        raise ValueError(f"Non-numeric bias score returned: {raw!r}")
+    score = int(match.group(0))
+    if not 0 <= score <= 100:
+        raise ValueError(f"Bias score out of range: {score}")
+    return {"bias_score": score, "status": "success"}
Add once at the top-level imports (outside the selected range):
import re
backend/app/modules/chat/llm_processing.py (2)
44-45: Replace printing context (PII) with safe logging metadata
Do not emit raw context; log only minimal metadata via logger.
-    print(context)
+    logger.debug("ask_llm invoked (context_length=%s, docs=%s)", len(context), len(docs) if docs else 0)
Add once at top of file (outside selected range):
import logging

logger = logging.getLogger(__name__)
54-62: Add error handling around LLM call and provide clearer failure mode
Wrap the call to handle network/auth/model errors; log exceptions and re-raise or return a controlled error.
-    response = client.chat.completions.create(
-        model="gemma2-9b-it",
-        messages=[
-            {"role": "system", "content": "Use only the context to answer."},
-            {"role": "user", "content": prompt},
-        ],
-    )
+    try:
+        response = client.chat.completions.create(
+            model="gemma2-9b-it",
+            messages=[
+                {"role": "system", "content": "Use only the context to answer."},
+                {"role": "user", "content": prompt},
+            ],
+        )
+    except Exception:
+        logger.exception("LLM request failed in ask_llm")
+        raise
backend/app/db/vector_store.py (3)
34-36: Use exception chaining and correct typos in error messages
Chain exceptions with "from e" and fix spelling for clarity.
-except Exception as e:
-    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+except Exception as e:
+    raise RuntimeError("Error occurred while initializing Pinecone client") from e
...
-except Exception as e:
-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+except Exception as e:
+    raise RuntimeError(f"Error occurred while connecting to index {INDEX_NAME}") from e
43-50: Replace prints with logger to meet PR objective and avoid stdout noise
Use logging instead of print for index creation/existence messages.
-if not pc.has_index(INDEX_NAME): - print(f"Creating index: {INDEX_NAME}") +if not pc.has_index(INDEX_NAME): + logger.info("Creating index: %s", INDEX_NAME) @@ -else: - print(f"Index '{INDEX_NAME}' already exists") +else: + logger.info("Index %r already exists", INDEX_NAME)Add once at top of file (outside selected range):
import logging logger = logging.getLogger(__name__)Also applies to: 52-52
23-33: Avoid import-time side effects — make Pinecone init lazy or explicit
vector_store.py performs Pinecone initialization, may create the index, and opens a connection at import time; store_vectors.py imports
index, which triggers those side effects during import.
Points to address
- backend/app/db/vector_store.py
- Module-level operations to move: reading/validating PINECONE_API_KEY,
pc = Pinecone(...),pc.has_index(...)/pc.create_index(...), andindex = pc.Index(...).- backend/app/utils/store_vectors.py
- Line ~24:
from app.db.vector_store import index— importingindexcauses the import-time side effects.Suggested fix (minimal)
- Replace module-level initialization with an explicit initializer or lazy getter, e.g.:
- Expose
def get_index():(lazily initializes client/index and returns it) ordef init_pinecone()called from application startup.- Move PINECONE_API_KEY validation into that function (do not raise on import).
- Update callers (store_vectors.py) to call
get_index()(or receive the index via DI) instead of importingindexat module import.Quick example:
- Change store_vectors import:
- from:
from app.db.vector_store import index- to:
from app.db.vector_store import get_index- then inside store():
index = get_index()Reason: avoids tests and process startup failures (missing env var, network calls, index creation) at import time and allows easier testing/DI.
backend/app/modules/langgraph_nodes/generate_perspective.py (1)
31-41: Mismatch between prompt output fields and PerspectiveOutput schemaPrompt templates currently instruct "counter_perspective" and "reasoning_steps", but this model expects "perspective" (str) and "reasoning" (str). Resolve by aligning the prompt (preferred) or the model fields.
- Option A (preferred): Update prompt output keys to "perspective" and "reasoning" (see prompt_templates.py suggestion).
- Option B: Change PerspectiveOutput to match prompt (e.g., reasoning_steps: list[str], counter_perspective: str) and adapt downstream users accordingly.
backend/app/modules/langgraph_builder.py (1)
47-55: Fix MyState typing: optional fields + perspective type mismatch with downstream usage
- All fields except cleaned_text are optional in practice. Use Required/NotRequired to reflect this and avoid misleading type hints.
- generate_perspective returns an object that judge_perspective accesses with getattr(..., "perspective", ...) (see langgraph_nodes/judge.py). Typing perspective as str is incorrect and will confuse maintainers and tools.
Apply this diff to the state definition:
-class MyState(TypedDict): - cleaned_text: str - facts: list[dict] - sentiment: str - perspective: str - score: int - retries: int - status: str +class MyState(TypedDict): + cleaned_text: Required[str] + facts: NotRequired[list[dict]] + sentiment: NotRequired[str] + # judge_perspective expects an object with a .perspective attribute; allow object or str + perspective: NotRequired[object] + score: NotRequired[int] + retries: NotRequired[int] + status: NotRequired[str]Optionally, if a concrete type exists (e.g., PerspectiveOutput), we can type it more precisely without importing at runtime:
# Place near imports from typing import TYPE_CHECKING if TYPE_CHECKING: from app.modules.langgraph_nodes.generate_perspective import PerspectiveOutput # Then use: # perspective: NotRequired["PerspectiveOutput | str"]backend/app/modules/pipeline.py (3)
59-61: Replace print with logger (aligns with PR objective and avoids noisy stdout)Use the standard logging module and log at debug level instead of printing.
Apply these diffs:
@@ -import json +import json +import logging +logger = logging.getLogger(__name__) @@ - # Optional: pretty print raw_text for debugging - print(json.dumps(result, indent=2, ensure_ascii=False)) + # Optional: pretty print result for debugging + logger.debug("Scraper pipeline result: %s", json.dumps(result, indent=2, ensure_ascii=False))
50-54: Harden against missing extractor output to avoid KeyErrorArticle_extractor.extract() may fail or change shape; indexing raw_text["text"] will raise KeyError. Guard and provide a clear error.
Apply this diff:
- result = {} - cleaned_text = clean_extracted_text(raw_text["text"]) + result = {} + try: + extracted_text = raw_text["text"] + except (TypeError, KeyError): + raise ValueError("Extractor returned no 'text' field") # or return {"status": "error", ...} + cleaned_text = clean_extracted_text(extracted_text)
59-61: Replace remaining print(...) calls with logger (repository-wide)Search of backend/app found 34
print(...)occurrences across 14 files; the print in backend/app/modules/pipeline.py (line 60) is still present.Files/locations to fix:
- backend/app/modules/pipeline.py:60
- print(json.dumps(result, indent=2, ensure_ascii=False))
- backend/app/utils/fact_check_utils.py:46, 53, 61, 67, 69, 71
- backend/app/routes/routes.py:63, 70, 80
- backend/app/db/vector_store.py:44, 52
- backend/app/modules/bias_detection/check_bias.py:37, 38, 73
- backend/app/modules/vector_store/chunk_rag_data.py:98
- backend/app/modules/chat/llm_processing.py:44
- backend/app/modules/langgraph_nodes/store_and_send.py:26, 36, 41, 44
- backend/app/modules/langgraph_nodes/error_handler.py:17, 18, 19
- backend/app/modules/langgraph_nodes/judge.py:65
- backend/app/modules/langgraph_nodes/sentiment.py:64, 81 (commented)
- backend/app/modules/langgraph_nodes/fact_check.py:31, 39
- backend/app/modules/langgraph_nodes/generate_perspective.py:76
- backend/app/modules/facts_check/llm_processing.py:77, 135, 141, 152
Please replace these debug prints with the project logger (e.g., logger.debug/info/error) or remove them if no longer needed.
♻️ Duplicate comments (11)
backend/app/modules/vector_store/chunk_rag_data.py (4)
41-59: Perspective normalization is incomplete; dict-shaped inputs will fail.You compute perspective_data but never use it, and only support attribute access. Dict inputs with perspective/reasoning will raise. Normalize and use perspective_text/perspective_reasoning.
Apply this diff:
- # Validate perspective structure - perspective_data = data["perspective"] - if hasattr(perspective_data, "dict"): - perspective_data = perspective_data.dict() + # Normalize perspective into text and reasoning + perspective = data["perspective"] + if hasattr(perspective, "dict"): + perspective = perspective.dict() + if isinstance(perspective, dict): + if "perspective" not in perspective or "reasoning" not in perspective: + raise ValueError("Perspective dict missing required fields: 'perspective' and 'reasoning'") + perspective_text = perspective["perspective"] + perspective_reasoning = perspective["reasoning"] + else: + if not (hasattr(perspective, "perspective") and hasattr(perspective, "reasoning")): + raise ValueError("Perspective object missing required fields") + perspective_text = perspective.perspective + perspective_reasoning = perspective.reasoning
49-59: Remove unused perspective_obj block; it’s superseded by normalization.This block is redundant and causes the dict case to fail.
Apply this diff:
- # Add counter-perspective chunk - perspective_obj = data["perspective"] - - # Optional safety check - - if not ( - hasattr(perspective_obj, "perspective") - and hasattr(perspective_obj, "reasoning") - ): - raise ValueError("Perspective object missing required fields") + # Add counter-perspective chunk
97-99: Replace print with module logger and keep stack trace.Use a logger per PR goal; logger.exception preserves traceback.
Apply this diff:
- except Exception as e: - print(f"[Error] Failed to chunk the data: {e}") - raise + except Exception as e: + logger.exception("Failed to chunk the data") + raiseAdd this near the imports (top of file):
import logging logger = logging.getLogger(__name__)
60-67: Use normalized variables to avoid attribute errors for dict perspectives.Apply this diff:
chunks.append( { "id": f"{article_id}-perspective", - "text": perspective_obj.perspective, + "text": perspective_text, "metadata": { "type": "counter-perspective", - "reasoning": perspective_obj.reasoning, + "reasoning": perspective_reasoning, "article_id": article_id, }, } )backend/app/modules/langgraph_nodes/fact_check.py (1)
30-44: Replace prints with logger and add function docstring (per earlier review)This echoes the previous review’s guidance: use a module logger and document run_fact_check; also fix “occured” typo.
Apply this diff:
-from app.utils.fact_check_utils import run_fact_check_pipeline +from app.utils.fact_check_utils import run_fact_check_pipeline +import logging +logger = logging.getLogger(__name__) @@ -def run_fact_check(state): +def run_fact_check(state: dict) -> dict: + """ + Run the fact-check pipeline for a given state. + + Args: + state (dict): Expects 'cleaned_text' and optional context. + + Returns: + dict: On success, returns the updated state with 'facts' and 'status'='success'. + On failure, returns an error dict with 'status'='error', 'error_from', and 'message'. + """ try: text = state.get("cleaned_text") @@ - if error_message: - print(f"some error occured in fact_checking:{error_message}") + if error_message: + logger.error("Error in fact_checking: %s", error_message) return { "status": "error", "error_from": "fact_checking", "message": f"{error_message}", } except Exception as e: - print(f"some error occured in fact_checking:{e}") + logger.exception("Unexpected error in fact_checking") return { "status": "error", "error_from": "fact_checking", - "message": f"{e}", + "message": str(e), }backend/app/modules/chat/get_rag_data.py (1)
45-50: Avoid KeyError and simplify match extraction (reuse earlier suggestion)Return a safe, concise list comprehension that tolerates missing keys in the response.
Apply this diff:
- matches = [] - for match in results["matches"]: - matches.append( - {"id": match["id"], "score": match["score"], "metadata": match["metadata"]} - ) - return matches + return [ + {"id": m.get("id"), "score": m.get("score"), "metadata": m.get("metadata")} + for m in results.get("matches", []) + ]backend/app/modules/bias_detection/check_bias.py (1)
37-39: Replace prints of user content with logger; avoid PII leakageRaw article text is printed, and errors are printed. Use a logger and log only metadata (e.g., length). This was flagged previously.
- print(text) - print(json.dumps(text)) + logger.debug("Bias detection invoked (input_length=%s)", len(text) if isinstance(text, str) else "n/a") ... - except Exception as e: - print(f"Error in bias_detection: {e}") + except Exception as e: + logger.exception("Error in bias_detection")Add once at top of file (outside the selected range):
import logging logger = logging.getLogger(__name__)Also applies to: 72-74
backend/app/db/vector_store.py (1)
58-58: Static analysis: prefer B904 (raise from e)This addresses the Ruff B904 hint and improves debuggability. The diff above includes this fix.
backend/app/modules/langgraph_nodes/generate_perspective.py (3)
59-66: Fix f-strings and use local facts variable (bug)Only the first line is an f-string; verdict and explanation are literal. Also iterate over the local facts variable instead of state["facts"].
- facts_str = "\n".join( - [ - f"Claim: {f['original_claim']}\n" - "Verdict: {f['verdict']}\nExplanation: " - "{f['explanation']}" - for f in state["facts"] - ] - ) + facts_str = "\n".join( + [ + f"Claim: {f['original_claim']}\n" + f"Verdict: {f['verdict']}\n" + f"Explanation: {f['explanation']}" + for f in facts + ] + )
75-81: Use logger in exception path and fix typoReplace print with structured logging; "occured" -> "occurred".
- except Exception as e: - print(f"some error occured in generate_perspective:{e}") + except Exception as e: + logger.exception("Error occurred in generate_perspective") return { "status": "error", "error_from": "generate_perspective", - "message": f"{e}", + "message": str(e), }Add once at top of file (outside selected range):
import logging logger = logging.getLogger(__name__)
82-82: Normalize structured LLM output to a plain dict for downstream/serializationReturning a Pydantic model directly can cause serialization issues. Convert to dict and, if desired, expose specific fields consistently.
- return {**state, "perspective": result, "status": "success"} + result_dict = ( + result + if isinstance(result, dict) + else (result.model_dump() if hasattr(result, "model_dump") else result.dict()) + ) + # If downstream expects only the perspective text: + perspective_value = result_dict.get("perspective", result_dict) + return {**state, "perspective": perspective_value, "status": "success"}
🧹 Nitpick comments (21)
backend/app/utils/generate_chunk_id.py (2)
29-33: Deterministic short IDs LGTM; minor grammar nit in error message.
Logic is correct and concise. Consider fixing the error string grammar.
Apply this diff:
- raise ValueError("Text must be non-empty string") + raise ValueError("Text must be a non-empty string")
29-33: Optional: expose prefix/length as constants to balance readability vs. collision risk.
15 hex chars (~60 bits) makes collisions extremely unlikely, but if IDs are externally visible or long-lived across datasets, consider defining PREFIX = "article-" and HASH_LEN = 15 at module scope for easy adjustment later (e.g., HASH_LEN=16 or 20).
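A minimal sketch of what that could look like, assuming generate_id hashes the cleaned text with SHA-256 and truncates the hex digest (the constant names are illustrative, not the existing API):

```python
import hashlib

# Illustrative module-level constants; adjust HASH_LEN if collision risk matters more.
PREFIX = "article-"
HASH_LEN = 15  # ~60 bits of the hex digest


def generate_id(text: str) -> str:
    """Derive a deterministic short ID from non-empty text."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Text must be a non-empty string")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{PREFIX}{digest[:HASH_LEN]}"
```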
backend/app/modules/vector_store/chunk_rag_data.py (1)
72-93: Optional: validate fact field types/emptiness for robustness.
Currently you only check presence. Consider asserting string types and non-empty for original_claim/explanation/source_link and a constrained set for verdict if applicable.
I can provide a minimal schema check snippet if you want stricter validation.
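A sketch of what such a check could look like, using the field names mentioned above; the allowed verdict values are an assumption and should match whatever the verifier actually emits:

```python
REQUIRED_FACT_FIELDS = ("original_claim", "verdict", "explanation", "source_link")
ALLOWED_VERDICTS = {"true", "false", "unverified"}  # assumption; align with the real label set


def validate_fact(fact: dict) -> None:
    """Raise ValueError if a fact dict is missing fields or has empty/invalid values."""
    for field in REQUIRED_FACT_FIELDS:
        value = fact.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"Fact field '{field}' must be a non-empty string")
    if fact["verdict"].lower() not in ALLOWED_VERDICTS:
        raise ValueError(f"Unexpected verdict: {fact['verdict']!r}")
```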
backend/app/modules/chat/embed_query.py (1)
25-31: Avoid import-time model initialization; add type hints and resilient error logging
Constructing SentenceTransformer at import time increases cold-start latency (especially in serverless) and makes module import fail if the model/env is misconfigured. Prefer lazy, cached initialization; also add a return type hint and structured logging.
Apply this diff to add lazy init, type hints, and logging:
+import logging +from functools import lru_cache from sentence_transformers import SentenceTransformer +logger = logging.getLogger(__name__) -embedder = SentenceTransformer("all-MiniLM-L6-v2") +@lru_cache(maxsize=1) +def _get_embedder() -> SentenceTransformer: + # Pin the model name to ensure embedding dimension stability (384 for MiniLM-L6-v2) + return SentenceTransformer("all-MiniLM-L6-v2") -def embed_query(query: str): - embeddings = embedder.encode(query).tolist() - - return embeddings +def embed_query(query: str) -> list[float]: + try: + return _get_embedder().encode(query).tolist() + except Exception: + logger.exception("Failed to embed query") + raisebackend/app/utils/fact_check_utils.py (3)
33-37: Optional: Validate Google API key presence before search loop
If GOOGLE_SEARCH or related env is missing, search_google may fail repeatedly. Consider early validation and a clear error before iterating claims.
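One way this early check could look; a sketch only, and the environment variable name (GOOGLE_SEARCH here) is taken from the comment above and may differ in the codebase:

```python
import os


def ensure_search_credentials() -> None:
    """Fail fast before the claim loop if the search API is not configured."""
    # "GOOGLE_SEARCH" mirrors the variable named above; the real name may differ.
    if not os.getenv("GOOGLE_SEARCH"):
        raise RuntimeError("GOOGLE_SEARCH is not set; cannot run fact-check searches")
```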
1-30: Docstring mentions behavior not enforced (polite delay)
The narrative promises a polite delay in searches; before the above refactor it was missing. Ensure the code and docstring stay in sync.
33-37: Potential downstream JSON-parse robustness issue in run_fact_verifier_sdkReferencing backend/app/modules/facts_check/llm_processing.py (snippet provided), parsed can be referenced even if json.loads fails. That function should guard parsed initialization or skip appending on parse failure to prevent NameError and inconsistent outputs.
Would you like me to open a follow-up PR to harden run_fact_verifier_sdk? I can patch it to default parsed to a structured error object on parse failures and avoid prints.
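For illustration, the guard could look roughly like this; the function name and fallback fields are hypothetical, and the real verifier may structure its results differently:

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_verification(content: str) -> dict:
    """Return parsed JSON, or a structured error object if parsing fails."""
    try:
        return json.loads(content)
    except json.JSONDecodeError as parse_err:
        logger.warning("LLM JSON parse error: %s", parse_err)
        # A default object keeps downstream aggregation consistent instead of raising NameError.
        return {"verdict": "unverified", "explanation": f"Could not parse LLM output: {parse_err}"}
```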
backend/app/modules/langgraph_nodes/sentiment.py (1)
26-62: Optional: add type hints to function for consistency
Add -> dict return annotation to run_sentiment_sdk for better IDE/type-checker support.
backend/app/modules/chat/get_rag_data.py (2)
34-36: Avoid import-time client/index initialization; validate API key and consider centralizing config
- Import-time Pinecone client/index creation can break module import and hurts cold starts. Lazily initialize and cache instead.
- Validate PINECONE_API_KEY and make index name configurable (env/constant) to prevent hard-coding and drift.
Apply this refactor:
-from dotenv import load_dotenv -from app.modules.chat.embed_query import embed_query -import os - -load_dotenv() - -pc = Pinecone(os.getenv("PINECONE_API_KEY")) -index = pc.Index("perspective") +from dotenv import load_dotenv +from app.modules.chat.embed_query import embed_query +import os +import logging +from functools import lru_cache + +logger = logging.getLogger(__name__) +load_dotenv() + +@lru_cache(maxsize=1) +def _get_index(): + api_key = os.getenv("PINECONE_API_KEY") + if not api_key: + raise RuntimeError("PINECONE_API_KEY is not set") + index_name = os.getenv("PINECONE_INDEX_NAME", "perspective") + pc = Pinecone(api_key) + return pc.Index(index_name) @@ - results = index.query( + results = _get_index().query( vector=embeddings, top_k=top_k, include_metadata=True, namespace="default" )Also consider moving Pinecone constants into a single vector_store module and reusing them here to avoid configuration drift.
13-24: Docstring states “Encodes the input query” here — ensure embed_query already pre-encodesGiven embed_query(query) returns the vector, this function no longer “encodes” itself; consider rephrasing to “Embeds the query and searches Pinecone.”
backend/app/utils/prompt_templates.py (2)
31-33: Remove stray quotes splitting the sentence in the template header
The single quotes around the line break will render literally in the prompt. Merge the sentence without quotes.
-You are an AI assistant that generates a well-reasoned ' -'counter-perspective to a given article. +You are an AI assistant that generates a well-reasoned counter-perspective to a given article.
13-16: Docstring input type for facts is inaccurateThe prompt currently accepts a single formatted string for facts (facts_str), not a Python list. Update the docstring to avoid confusion.
- facts (list): Verified factual information related to the article. + facts (str): Verified factual information related to the article (formatted string).backend/app/modules/bias_detection/check_bias.py (3)
41-41: Error message references the wrong parameter nameThe function parameter is "text", not "cleaned_text".
- raise ValueError("Missing or empty 'cleaned_text'") + raise ValueError("Missing or empty 'text'")
30-33: Validate GROQ_API_KEY early to fail fast with a clear errorIf the env var is missing, initializing the client succeeds but requests will fail later with an opaque error. Validate upfront.
Example (outside selected range):
api_key = os.getenv("GROQ_API_KEY") if not api_key: raise RuntimeError("GROQ_API_KEY environment variable not set") client = Groq(api_key=api_key)
25-33: Consider lazy client initialization for import-time side effectsCreating external clients at import can hinder testing and module import; consider lazy-init in the function or via a getter.
backend/app/modules/chat/llm_processing.py (1)
30-33: Validate GROQ_API_KEY presence to fail fast (optional)Similar to bias_detection, validate env upfront to avoid opaque runtime errors.
Example (outside selected range):
api_key = os.getenv("GROQ_API_KEY") if not api_key: raise RuntimeError("GROQ_API_KEY environment variable not set") client = Groq(api_key=api_key)backend/app/modules/langgraph_builder.py (3)
1-31: Good, comprehensive module docstring, but the termination claim is misleadingDoc says “Ensures the graph terminates only after successful storage.” Yet there is an explicit error path to error_handler. Either:
- update the doc to reflect error termination, or
- wire error_handler to end (and avoid set_finish_point) so the doc matches behavior.
68-69: Nit: remove trailing comma in set_entry_point for readabilityNo functional change, but avoids a dangling tuple-like style.
Apply this diff:
- graph.set_entry_point( - "sentiment_analysis", - ) + graph.set_entry_point("sentiment_analysis")
90-103: Simplify nested conditional for judge routing for maintainabilityThe nested ternary is hard to scan. A small helper improves clarity and reduces risk of logic mistakes.
You can extract the routing logic:
def _route_from_judge(state: dict) -> str: if state.get("status") == "error": return "error_handler" score = state.get("score", 0) if score < 70: return "store_and_send" if state.get("retries", 0) >= 3 else "generate_perspective" return "store_and_send" graph.add_conditional_edges("judge_perspective", _route_from_judge)backend/app/modules/pipeline.py (2)
42-44: Consider lazy-initializing the compiled graph to avoid heavy work at import timeCompiling the graph at module import makes any import of pipeline.py perform potentially heavy work. Prefer lazy initialization with caching to reduce cold-start latency and make importing cheap.
Suggested pattern:
-# Compile once when module loads -_LANGGRAPH_WORKFLOW = build_langgraph() +# Lazily compile and cache when first needed +_LANGGRAPH_WORKFLOW = None + +def _get_workflow(): + global _LANGGRAPH_WORKFLOW + if _LANGGRAPH_WORKFLOW is None: + _LANGGRAPH_WORKFLOW = build_langgraph() + return _LANGGRAPH_WORKFLOW @@ - result = _LANGGRAPH_WORKFLOW.invoke(state) + result = _get_workflow().invoke(state)
65-68: Optional: wrap workflow invocation with error handling to normalize failuresIf the compiled graph raises, you can return a consistent error structure (matching the rest of the pipeline).
Example:
def run_langgraph_workflow(state: dict): """Execute the pre-compiled LangGraph workflow.""" try: return _get_workflow().invoke(state) except Exception as e: # If you have a logger configured: logger.exception("LangGraph workflow invocation failed") return {"status": "error", "error_from": "langgraph_workflow", "message": str(e)}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (26)
backend/app/db/vector_store.py(3 hunks)backend/app/modules/bias_detection/check_bias.py(3 hunks)backend/app/modules/chat/embed_query.py(1 hunks)backend/app/modules/chat/get_rag_data.py(2 hunks)backend/app/modules/chat/llm_processing.py(3 hunks)backend/app/modules/facts_check/llm_processing.py(1 hunks)backend/app/modules/facts_check/web_search.py(2 hunks)backend/app/modules/langgraph_builder.py(3 hunks)backend/app/modules/langgraph_nodes/error_handler.py(1 hunks)backend/app/modules/langgraph_nodes/fact_check.py(2 hunks)backend/app/modules/langgraph_nodes/generate_perspective.py(3 hunks)backend/app/modules/langgraph_nodes/judge.py(1 hunks)backend/app/modules/langgraph_nodes/sentiment.py(2 hunks)backend/app/modules/langgraph_nodes/store_and_send.py(2 hunks)backend/app/modules/pipeline.py(1 hunks)backend/app/modules/scraper/cleaner.py(4 hunks)backend/app/modules/scraper/extractor.py(6 hunks)backend/app/modules/scraper/keywords.py(3 hunks)backend/app/modules/vector_store/chunk_rag_data.py(2 hunks)backend/app/modules/vector_store/embed.py(1 hunks)backend/app/routes/routes.py(1 hunks)backend/app/utils/fact_check_utils.py(1 hunks)backend/app/utils/generate_chunk_id.py(1 hunks)backend/app/utils/prompt_templates.py(1 hunks)backend/app/utils/store_vectors.py(2 hunks)backend/main.py(2 hunks)
✅ Files skipped from review due to trivial changes (5)
- backend/app/modules/facts_check/llm_processing.py
- backend/app/modules/langgraph_nodes/judge.py
- backend/app/modules/langgraph_nodes/store_and_send.py
- backend/app/routes/routes.py
- backend/app/modules/vector_store/embed.py
🚧 Files skipped from review as they are similar to previous changes (7)
- backend/app/modules/langgraph_nodes/error_handler.py
- backend/app/modules/scraper/cleaner.py
- backend/app/utils/store_vectors.py
- backend/app/modules/scraper/keywords.py
- backend/app/modules/facts_check/web_search.py
- backend/app/modules/scraper/extractor.py
- backend/main.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/app/utils/fact_check_utils.py (2)
backend/app/modules/facts_check/web_search.py (1)
search_google(30-42)backend/app/modules/facts_check/llm_processing.py (2)
run_claim_extractor_sdk(37-82)run_fact_verifier_sdk(85-157)
backend/app/modules/langgraph_builder.py (6)
backend/app/modules/langgraph_nodes/generate_perspective.py (1)
generate_perspective(46-82)backend/app/modules/langgraph_nodes/store_and_send.py (1)
store_and_send(23-51)backend/app/modules/langgraph_nodes/error_handler.py (1)
error_handler(16-25)backend/app/modules/langgraph_nodes/sentiment.py (1)
run_sentiment_sdk(26-69)backend/app/modules/langgraph_nodes/fact_check.py (1)
run_fact_check(21-45)backend/app/modules/langgraph_nodes/judge.py (1)
judge_perspective(30-70)
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py
58-58: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (11)
backend/app/utils/generate_chunk_id.py (1)
1-23: Docstring addition is clear and useful.Good overview, example, and behavior notes. This aligns with the PR objective to add docstrings.
backend/app/modules/vector_store/chunk_rag_data.py (2)
1-24: Module docstring reads well and documents the chunking flow.Clear description of inputs/outputs and chunk types. Nicely done.
27-27: Import path check — OK (no change required)Confirmed: backend/app/utils/generate_chunk_id.py exists and init.py files are present under backend/app/, so
appis a package and the import is consistent.
- backend/app/utils/generate_chunk_id.py — found
- backend/app/init.py (and other app package init.py files) — present
- backend/app/modules/vector_store/chunk_rag_data.py:27 — contains
from app.utils.generate_chunk_id import generate_idbackend/app/modules/chat/embed_query.py (1)
1-20: Docstring addition improves discoverability — LGTMClear, concise module documentation aligned with the model usage.
backend/app/modules/langgraph_nodes/fact_check.py (1)
45-45: Return merge looks goodMerging the verifications into the original state while setting a success status is clean and predictable.
backend/app/modules/langgraph_nodes/sentiment.py (1)
44-46: Prompt formatting change is neutral — LGTMEquivalent content formation with a single f-string; no behavior change.
backend/app/modules/langgraph_builder.py (3)
60-66: LGTM: Node registration is clear and explicitNode names map cleanly to functions; this improves readability when traversing the graph.
71-81: OK on conditional routing; ensure error path reaches a terminal endRouting to error_handler on status == "error" is correct. Make sure error_handler has a path to end (see comment below), otherwise some paths may not terminate cleanly.
104-111: Ensure error paths terminate and confirm StateGraph finish-point semanticsQuick summary: I couldn't find the StateGraph implementation in this repo (so I can't confirm what set_finish_point() does). However, error_handler has no outgoing edge and returns "status": "stopped_due_to_error" (not "error"), so error runs may dead-end. Recommend adding an explicit edge from error_handler -> end and re-evaluating the use of set_finish_point.
Files to check
- backend/app/modules/langgraph_builder.py (store_and_send conditional + set_finish_point)
- backend/app/modules/langgraph_nodes/error_handler.py (returns "stopped_due_to_error")
Suggested change
graph.add_conditional_edges( "store_and_send", lambda x: ("error_handler" if x.get("status") == "error" else "__end__"), ) - graph.set_finish_point("store_and_send") + # Ensure error path also terminates + graph.add_edge("error_handler", "__end__") + # Optionally keep or remove the finish point depending on StateGraph semantics: + # graph.set_finish_point("store_and_send") return graph.compile()Manual verification requested:
- Please confirm StateGraph semantics for set_finish_point() (why it was added).
- Run a smoke test exercising success and error paths to ensure all runs reach end (or behave as intended).
backend/app/modules/pipeline.py (2)
1-32: Docstring is clear and helpful
Good overview of the pipeline stages and responsibilities.
1-32: Docstring mentions RAKE-based keywords; verify actual implementation
If extract_keywords uses a different algorithm (e.g., spaCy, yake, keybert), adjust the docstring to avoid drift.
| """ | ||
| llm_processing.py | ||
| ----------------- | ||
| Handles Large Language Model (LLM) interactions for context-based question answering. | ||
|
|
||
| This module: | ||
| - Connects to the Groq API using credentials from environment variables. | ||
| - Builds a context string from retrieved documents. | ||
| - Sends user questions along with context to the LLM. | ||
| - Returns generated answers. | ||
|
|
||
| Functions: | ||
| build_context(docs: list[dict]) -> str: | ||
| Extracts relevant fields (explanation or reasoning) from document | ||
| metadata and combines them into a single context string. | ||
|
|
||
| ask_llm(question: str, docs: list[dict]) -> str: | ||
| Builds context from the provided documents, sends it along with the | ||
| question to the LLM, and returns the model's answer. | ||
|
|
||
| Environment Variables: | ||
| GROQ_API_KEY (str): API key for authenticating with Groq. | ||
| """ |
💡 Verification agent
🧩 Analysis chain
PR objective gap: prints not replaced with logger
This file still uses print statements despite the PR goal to replace them with logger. The diffs above show how to switch to logging.
To verify repository-wide, run:
🏁 Script executed:
#!/bin/bash
rg -n --hidden --no-ignore-vcs '\bprint\(' backend | sed -n '1,200p'
Length of output: 3345
Replace remaining print() calls with logger
I scanned the repo and found multiple leftover print() calls that should be converted to logging (logger.debug/info/error) to meet the PR objective.
Files/locations needing attention:
- backend/main.py:47 — print(f"Server is running on http://0.0.0.0:{port}")
- backend/app/utils/fact_check_utils.py:46 — print("❌ Claim extraction failed.")
- backend/app/utils/fact_check_utils.py:53 — print(f"🧠 Extracted claims: {claims}")
- backend/app/utils/fact_check_utils.py:61 — print(f"\n🔍 Searching for claim: {claim}")
- backend/app/utils/fact_check_utils.py:67 — print(f"✅ Found result: {results[0]['title']}")
- backend/app/utils/fact_check_utils.py:69 — print(f"
⚠️ No search result for: {claim}") - backend/app/utils/fact_check_utils.py:71 — print(f"❌ Search failed for: {claim} -> {e}")
- backend/app/routes/routes.py:63 — print(bias_score)
- backend/app/routes/routes.py:70 — print(json.dumps(article_text, indent=2))
- backend/app/routes/routes.py:80 — print(answer)
- backend/app/modules/pipeline.py:60 — print(json.dumps(result, indent=2, ensure_ascii=False))
- backend/app/db/vector_store.py:44 — print(f"Creating index: {INDEX_NAME}")
- backend/app/db/vector_store.py:52 — print(f"Index '{INDEX_NAME}' already exists")
- backend/app/modules/chat/llm_processing.py:44 — print(context)
- backend/app/modules/vector_store/chunk_rag_data.py:98 — print(f"[Error] Failed to chunk the data: {e}")
- backend/app/modules/langgraph_nodes/store_and_send.py:26 — print(state)
- backend/app/modules/langgraph_nodes/store_and_send.py:36 — print("embedding generated successfully!")
- backend/app/modules/langgraph_nodes/store_and_send.py:41 — print("Vectors saved to Pinecone!")
- backend/app/modules/langgraph_nodes/store_and_send.py:44 — print(f"some error occured in store_and_send:{e}")
- backend/app/modules/langgraph_nodes/judge.py:65 — print(f"Error in judge_perspective: {e}")
- backend/app/modules/langgraph_nodes/sentiment.py:64 — print(f"Error in sentiment_analysis: {e}")
- backend/app/modules/langgraph_nodes/sentiment.py:81 — # print("Sentiment Output:", result) (commented)
- backend/app/modules/langgraph_nodes/generate_perspective.py:76 — print(f"some error occured in generate_perspective:{e}")
- backend/app/modules/langgraph_nodes/fact_check.py:31 — print(f"some error occured in fact_checking:{error_message}")
- backend/app/modules/langgraph_nodes/fact_check.py:39 — print(f"some error occured in fact_checking:{e}")
- backend/app/modules/langgraph_nodes/error_handler.py:17 — print("Error detected!")
- backend/app/modules/langgraph_nodes/error_handler.py:18 — print(f"From: {input.get('error_from')}")
- backend/app/modules/langgraph_nodes/error_handler.py:19 — print(f"Message: {input.get('message')}")
- backend/app/modules/bias_detection/check_bias.py:37 — print(text)
- backend/app/modules/bias_detection/check_bias.py:38 — print(json.dumps(text))
- backend/app/modules/bias_detection/check_bias.py:73 — print(f"Error in bias_detection: {e}")
- backend/app/modules/facts_check/llm_processing.py:77 — print(f"Error in claim_extraction: {e}")
- backend/app/modules/facts_check/llm_processing.py:135 — print(content)
- backend/app/modules/facts_check/llm_processing.py:141 — print(f"❌ LLM JSON parse error: {parse_err}")
- backend/app/modules/facts_check/llm_processing.py:152 — print(f"🔥 Error in fact_verification: {e}")
Suggested minimal change pattern (example for backend/app/modules/chat/llm_processing.py):
- At module top:
- add: import logging
- add: logger = logging.getLogger(name)
- Replace:
- print(context)
with: - logger.debug(context) # or logger.info/error as appropriate
- print(context)
Please replace these prints across the repository and ensure a consistent logging configuration is used.
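For concreteness, the pattern above applied to a module like backend/app/modules/chat/llm_processing.py might look roughly like this; a sketch, not the file's actual contents, with build_context simplified to a stand-in:

```python
import logging

logger = logging.getLogger(__name__)  # one module-level logger per file


def build_context(docs: list[dict]) -> str:
    # Simplified stand-in for the module's real helper.
    return "\n".join(d.get("metadata", {}).get("explanation", "") for d in docs)


def ask_llm(question: str, docs: list[dict]) -> str:
    context = build_context(docs)
    # Was: print(context) — log metadata at debug level instead of dumping the payload.
    logger.debug("ask_llm invoked (context_length=%d, docs=%d)", len(context), len(docs))
    return context  # stand-in for the Groq call
```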
🤖 Prompt for AI Agents
In backend/app/modules/chat/llm_processing.py around lines 1 to 23, replace any
leftover print() calls (e.g., the print(context) at line ~44 referenced in the
review) with structured logging: add "import logging" at the top, create a
module logger via "logger = logging.getLogger(__name__)", and replace print(...)
with the appropriate logger method (logger.debug(...) for diagnostic context
output, logger.info/error where severity differs); ensure you use consistent
logger names and levels so the project's global logging configuration captures
these messages.
| return "\n".join( | ||
| f"{m['metadata'].get('explanation') or m['metadata'].get('reasoning', '')}" | ||
| for m in docs | ||
| ) |
🛠️ Refactor suggestion
Guard against missing metadata keys when building context
Accessing m['metadata'] can raise KeyError. Use .get with defaults.
- return "\n".join(
- f"{m['metadata'].get('explanation') or m['metadata'].get('reasoning', '')}"
- for m in docs
- )
+ return "\n".join(
+ (m.get("metadata", {}) or {}).get("explanation")
+ or (m.get("metadata", {}) or {}).get("reasoning", "")
+ for m in docs or []
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return "\n".join( | |
| f"{m['metadata'].get('explanation') or m['metadata'].get('reasoning', '')}" | |
| for m in docs | |
| ) | |
| return "\n".join( | |
| (m.get("metadata", {}) or {}).get("explanation") | |
| or (m.get("metadata", {}) or {}).get("reasoning", "") | |
| for m in docs or [] | |
| ) |
🤖 Prompt for AI Agents
In backend/app/modules/chat/llm_processing.py around lines 36 to 39, the code
accesses m['metadata'] which can raise a KeyError; change the access to use
m.get('metadata', {}) and then read explanation via metadata.get('explanation')
or metadata.get('reasoning', '') so missing metadata or keys yield an empty
string, and keep the existing join logic to build the context.
```python
    error_handler,
)
```
💡 Verification agent
🧩 Analysis chain
Logger adoption not complete across nodes
PR objective says “Replaced print statements with a logger,” but referenced nodes still use print (error_handler, fact_check, store_and_send, sentiment, judge). Migrate those to a structured logger for consistency.
Run this script to list remaining print() calls under backend/app/modules:
🏁 Script executed:
#!/bin/bash
rg -n --no-heading --line-number -S 'print\(' backend/app/modules | sed 's/^/print: /'
Length of output: 2540
Logger adoption not complete — replace remaining print() calls
Search found active print() usages under backend/app/modules. Migrate these to the structured logger (e.g., logger = logging.getLogger(__name__); logger.info/error/exception) for consistency.
Files/locations to fix:
- backend/app/modules/pipeline.py:60 — print(json.dumps(result, indent=2, ensure_ascii=False))
- backend/app/modules/chat/llm_processing.py:44 — print(context)
- backend/app/modules/vector_store/chunk_rag_data.py:98 — print(f"[Error] Failed to chunk the data: {e}")
- backend/app/modules/langgraph_nodes/judge.py:65 — print(f"Error in judge_perspective: {e}")
- backend/app/modules/langgraph_nodes/sentiment.py:64 — print(f"Error in sentiment_analysis: {e}")
- backend/app/modules/langgraph_nodes/sentiment.py:81 — # print("Sentiment Output:", result) (commented)
- backend/app/modules/langgraph_nodes/store_and_send.py:26 — print(state)
- backend/app/modules/langgraph_nodes/store_and_send.py:36 — print("embedding generated successfully!")
- backend/app/modules/langgraph_nodes/store_and_send.py:41 — print("Vectors saved to Pinecone!")
- backend/app/modules/langgraph_nodes/store_and_send.py:44 — print(f"some error occured in store_and_send:{e}")
- backend/app/modules/langgraph_nodes/generate_perspective.py:76 — print(f"some error occured in generate_perspective:{e}")
- backend/app/modules/langgraph_nodes/fact_check.py:31 — print(f"some error occured in fact_checking:{error_message}")
- backend/app/modules/langgraph_nodes/fact_check.py:39 — print(f"some error occured in fact_checking:{e}")
- backend/app/modules/langgraph_nodes/error_handler.py:17 — print("Error detected!")
- backend/app/modules/langgraph_nodes/error_handler.py:18 — print(f"From: {input.get('error_from')}")
- backend/app/modules/langgraph_nodes/error_handler.py:19 — print(f"Message: {input.get('message')}")
- backend/app/modules/facts_check/llm_processing.py:77 — print(f"Error in claim_extraction: {e}")
- backend/app/modules/facts_check/llm_processing.py:135 — print(content)
- backend/app/modules/facts_check/llm_processing.py:141 — print(f"❌ LLM JSON parse error: {parse_err}")
- backend/app/modules/facts_check/llm_processing.py:152 — print(f"🔥 Error in fact_verification: {e}")
- backend/app/modules/bias_detection/check_bias.py:37 — print(text)
- backend/app/modules/bias_detection/check_bias.py:38 — print(json.dumps(text))
- backend/app/modules/bias_detection/check_bias.py:73 — print(f"Error in bias_detection: {e}")
Recommendation: replace these prints with logger calls and ensure the app config initializes logging handlers so messages are structured and configurable.
🤖 Prompt for AI Agents
backend/app/modules/langgraph_builder.py around lines 41-42 and multiple files
under backend/app/modules contain remaining print() calls; replace each print()
with a module-level structured logger (e.g., logger =
logging.getLogger(__name__)) and use the appropriate logger method
(logger.info/debug/warning/error/exception) instead of print, include exception
details via logger.exception or exc_info=True for catches, and ensure the
application logging configuration initializes handlers/formatters so messages
are structured and configurable; update the listed files/lines to use logger.*
calls and remove print() usages accordingly.
```python
from typing_extensions import TypedDict
```
🛠️ Refactor suggestion
Import NotRequired/Required for accurate TypedDict field optionality
To align the state typing with actual usage (most fields are optional and accessed with .get()), import NotRequired and Required here.
Apply this diff:
-from typing_extensions import TypedDict
+from typing_extensions import TypedDict, NotRequired, Required
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from typing_extensions import TypedDict, NotRequired, Required
```
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_builder.py around lines 44-45, the TypedDict
usage needs explicit NotRequired and Required imports so optional fields reflect
actual usage (most accessed with .get()). Update the import statement to also
import NotRequired and Required from typing_extensions, and then use NotRequired
for fields that are optional and Required for mandatory ones in the TypedDict
declarations to match runtime access patterns.
Actionable comments posted: 6
🔭 Outside diff range comments (3)
backend/app/db/vector_store.py (1)
38-39: Chain exceptions and fix typos in error messages (ruff B904)Preserve original tracebacks with
raise ... from eand fix spelling in messages.-except Exception as e: - raise RuntimeError(f"Error occured while intialising pinecone client:{e}") +except Exception as e: + raise RuntimeError("Error occurred while initializing Pinecone client") from e-except Exception as e: - raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}") +except Exception as e: + raise RuntimeError(f"Error occurred while connecting to the index {INDEX_NAME}") from eAlso applies to: 62-62
backend/app/utils/fact_check_utils.py (1)
78-80: Propagate verifier errors instead of returning ([], None)Surface the error message from the verifier to callers.
- final = run_fact_verifier_sdk(search_results) - return final.get("verifications", []), None + final = run_fact_verifier_sdk(search_results) + if final.get("status") != "success": + return [], final.get("message", "Fact verification failed.") + return final.get("verifications", []), Nonebackend/app/modules/langgraph_nodes/store_and_send.py (1)
31-36: Preserve exception chaining with ‘raise … from e’ (Ruff B904)Re-raise with explicit cause to keep the original traceback and satisfy B904.
- except KeyError as e: - raise Exception(f"Missing required data field for chunking: {e}") - except Exception as e: - raise Exception(f"Failed to chunk data: {e}") + except KeyError as e: + raise Exception(f"Missing required data field for chunking: {e}") from e + except Exception as e: + raise Exception(f"Failed to chunk data: {e}") from e @@ - except Exception as e: - raise Exception(f"failed to embed chunks: {e}") + except Exception as e: + raise Exception(f"Failed to embed chunks: {e}") from eAlso applies to: 37-41
♻️ Duplicate comments (4)
backend/app/modules/bias_detection/check_bias.py (1)
40-42: Do not log raw user/article text; log metadata onlyThis was flagged previously. Logging raw content risks PII leakage. Log minimal metadata (length) instead.
- logger.debug(f"Raw article text: {text}") - logger.debug(f"JSON dump of text: {json.dumps(text)}") + logger.debug( + "Bias detection invoked (input_length=%s)", + len(text) if hasattr(text, "__len__") else "n/a", + )backend/app/modules/vector_store/chunk_rag_data.py (1)
44-61: Bug: Normalized perspective_data is computed but unused; attribute access will fail for dict inputsYou normalize the incoming perspective into
perspective_databut then ignore it and operate onperspective_objvia attribute access. Ifdata["perspective"]is a dict (or a Pydantic model converted to dict), this raises “Perspective object missing required fields”. Use the normalized dict consistently and support both Pydantic v2.model_dump()and v1.dict().Apply this diff to normalize and use the dict consistently:
- # Validate perspective structure - perspective_data = data["perspective"] - if hasattr(perspective_data, "dict"): - perspective_data = perspective_data.dict() + # Normalize perspective structure to a dict + perspective_data = data["perspective"] + if hasattr(perspective_data, "model_dump"): + perspective_data = perspective_data.model_dump() + elif hasattr(perspective_data, "dict"): + perspective_data = perspective_data.dict() + elif not isinstance(perspective_data, dict): + raise ValueError("Perspective must be a dict or Pydantic model") @@ - # Add counter-perspective chunk - perspective_obj = data["perspective"] - - # Optional safety check - - if not ( - hasattr(perspective_obj, "perspective") - and hasattr(perspective_obj, "reasoning") - ): - raise ValueError("Perspective object missing required fields") + # Add counter-perspective chunk + if not ("perspective" in perspective_data and "reasoning" in perspective_data): + raise ValueError("Perspective dict missing required fields: 'perspective', 'reasoning'") @@ - "text": perspective_obj.perspective, + "text": perspective_data["perspective"], @@ - "reasoning": perspective_obj.reasoning, + "reasoning": perspective_data["reasoning"],Also applies to: 63-73
backend/app/modules/langgraph_nodes/generate_perspective.py (2)
62-69: Bug: missing f-strings and wrong iterable for facts_strOnly the first line is an f-string, so verdict/explanation render literally. Also, iterate over the local
factsvariable, notstate["facts"].- facts_str = "\n".join( - [ - f"Claim: {f['original_claim']}\n" - "Verdict: {f['verdict']}\nExplanation: " - "{f['explanation']}" - for f in state["facts"] - ] - ) + facts_str = "\n".join( + [ + f"Claim: {f['original_claim']}\n" + f"Verdict: {f['verdict']}\n" + f"Explanation: {f['explanation']}" + for f in facts + ] + )
85-85: Normalize structured LLM result to a plain dict for downstream + serializationYou return a Pydantic model in
state["perspective"]. Downstreamchunk_rag_datapartially normalizes but then uses attribute access — this mismatch causes runtime/serialization issues. Return a plain dict.- return {**state, "perspective": result, "status": "success"} + result_dict = ( + result + if isinstance(result, dict) + else (result.model_dump() if hasattr(result, "model_dump") else result.dict()) + ) + return {**state, "perspective": result_dict, "status": "success"}Pair this with the corresponding normalization/use change suggested in chunk_rag_data.py.
🧹 Nitpick comments (16)
backend/app/logging/logging_config.py (1)
31-37: Avoid duplicate/parent handlers: disable propagation

If the application (or a library) configures root handlers, messages will be emitted twice. Set propagate to False once you add your own handlers.

     console_handler.setLevel(logging.INFO)
     console_handler.setFormatter(formatter)
     logger.addHandler(console_handler)

-    # File Handler
+    # File Handler
     file_handler = logging.FileHandler("app.log")
     file_handler.setLevel(logging.DEBUG)  # Keep detailed logs in file
     file_handler.setFormatter(formatter)
     logger.addHandler(file_handler)

+    # Prevent messages from bubbling up to root handlers and being duplicated
+    logger.propagate = False

backend/app/modules/facts_check/llm_processing.py (4)
139-141: Strip fenced JSON more robustly

Handle both ```json and plain ``` fences, including surrounding whitespace/newlines.

-    content = re.sub(r"^```json|```$", "", content).strip()
+    content = re.sub(r"^```(?:json)?\s*|\s*```$", "", content, flags=re.DOTALL).strip()
150-154: Return shape: remove redundant 'claim' or structure per-claim

Returning a single claim (the last one processed) is misleading. Either drop it or return a per-claim mapping.

Minimal adjustment:

-    return {
-        "claim": claim,
-        "verifications": results_list,
-        "status": "success",
-    }
+    return {
+        "verifications": results_list,
+        "status": "success",
+    }

If you prefer a per-claim mapping:

# Example
results_list.append({"claim": claim, **parsed})
71-73: Consider reducing sensitive/debug payloads

Storing full extracted claims and raw LLM output in logs can leak content/PII. Log sizes or hashes instead.

Example:

logger.debug("Extracted claims (chars=%d)", len(extracted_claims))
logger.debug("LLM output (chars=%d)", len(content))

Also applies to: 140-141
35-38: Fail fast if GROQ_API_KEY is missing

Improve error discoverability when credentials are not configured.

api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable is required")
client = Groq(api_key=api_key)

backend/app/utils/fact_check_utils.py (1)
63-63: Nit: avoid leading newline in logs and prefer parameterized logging

Removes a cosmetic newline and avoids f-strings in logs.

-    logger.info(f"\n🔍 Searching for claim: {claim}")
+    logger.info("🔍 Searching for claim: %s", claim)

backend/app/modules/bias_detection/check_bias.py (2)
44-46: Fix error message to reference 'text', not 'cleaned_text'

Aligns with the function signature and reduces confusion.

-    if not text:
-        logger.error("Missing or empty 'cleaned_text'")
-        raise ValueError("Missing or empty 'cleaned_text'")
+    if not text:
+        logger.error("Missing or empty 'text'")
+        raise ValueError("Missing or empty 'text'")
35-36: Optional: fail fast if GROQ_API_KEY is missing

Adds clearer feedback when credentials are not configured.

api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable is required")
client = Groq(api_key=api_key)

backend/app/modules/vector_store/chunk_rag_data.py (2)
33-33: Add type hints to improve readability and tooling

Declare parameter and return types for the public function.

-def chunk_rag_data(data):
+def chunk_rag_data(data: dict) -> list[dict]:
100-102: Minor: simplify exception logging message

logger.exception already attaches the traceback and exception info. Avoid interpolating e in the message to reduce duplication.

-        logger.exception(f"Failed to chunk the data: {e}")
+        logger.exception("Failed to chunk the data")

backend/app/modules/langgraph_nodes/store_and_send.py (2)
37-45: Handle empty vectors explicitly to avoid storage errors

store(vectors) raises on empty input. Guard against this and log instead of treating it as a hard error.

     try:
         vectors = embed_chunks(chunks)
-        if vectors:
-            logger.info(f"Embedding complete — {len(vectors)} vectors generated.")
+        if vectors:
+            logger.info(f"Embedding complete — {len(vectors)} vectors generated.")
+        else:
+            logger.warning("Embedding produced no vectors; skipping storage.")
+            return {**state, "status": "success"}
     except Exception as e:
         raise Exception(f"Failed to embed chunks: {e}") from e
-
-    store(vectors)
-    logger.info("Vectors successfully stored in Pinecone.")
+    if vectors:
+        store(vectors)
+        logger.info("Vectors successfully stored in Pinecone.")
29-29: Consider redacting sensitive fields from state in debug logs

Depending on the pipeline, state may contain PII or large payloads. Redact or log only keys/summary in debug to keep logs lean and safe.
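For illustration, a minimal keys-only/redaction sketch, assuming the node state is a plain dict; the SENSITIVE_KEYS entries below are hypothetical field names, not the project's actual schema:

import logging

logger = logging.getLogger(__name__)

# Hypothetical fields that may carry large or sensitive payloads.
SENSITIVE_KEYS = {"cleaned_text", "facts", "perspective"}


def redact_state_for_logging(state: dict) -> dict:
    """Return a copy of the state that is safe to log: sensitive values become summaries."""
    safe = {}
    for key, value in state.items():
        if key in SENSITIVE_KEYS:
            size = len(value) if hasattr(value, "__len__") else "n/a"
            safe[key] = f"<redacted, size={size}>"
        else:
            safe[key] = value
    return safe


# Usage sketch inside store_and_send:
# logger.debug("store_and_send state: %s", redact_state_for_logging(state))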
backend/app/modules/pipeline.py (2)
48-65: Add type hints for public functions

Public APIs benefit from explicit typing; also annotate run_langgraph_workflow's return.

-def run_scraper_pipeline(url: str) -> dict:
+def run_scraper_pipeline(url: str) -> dict:
 ...
-def run_langgraph_workflow(state: dict):
+def run_langgraph_workflow(state: dict) -> dict:

Also applies to: 67-71
45-46: Optional: lazy-initialize the compiled graph

Compiling at import can slow startup and complicate testing. Consider lazy init with a module-level getter if import-time cost becomes an issue.
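A minimal sketch of such a getter, assuming build_langgraph() returns the compiled workflow and that callers invoke it with the pipeline state dict; the other function names are illustrative:

from functools import lru_cache

from app.modules.langgraph_builder import build_langgraph


@lru_cache(maxsize=1)
def get_workflow():
    """Compile the LangGraph workflow on first use and reuse the cached instance."""
    return build_langgraph()


def run_langgraph_workflow(state: dict) -> dict:
    # Compilation cost is paid on the first call instead of at import time.
    return get_workflow().invoke(state)

Tests can then monkeypatch get_workflow (or call get_workflow.cache_clear()) without triggering a real compile at import.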
backend/app/modules/langgraph_nodes/generate_perspective.py (2)
49-49: Add a return type hint

Small improvement for readability and tooling.

-def generate_perspective(state):
+def generate_perspective(state: dict) -> dict:
39-46: Sanity check: model and prompt are module-level singletons

This is fine for performance, but if you anticipate hot-reload or environment-driven changes (e.g., model name), consider moving construction into a factory or reading from config.
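If that ever becomes a concern, a rough sketch of a cached factory, assuming a LangChain ChatGroq model; the GROQ_MODEL variable, the default model name, and the Perspective schema in the usage comment are assumptions, not values taken from this PR:

import os
from functools import lru_cache
from typing import Optional

from langchain_groq import ChatGroq  # assumes the project already depends on langchain-groq


@lru_cache(maxsize=None)
def get_perspective_model(model_name: Optional[str] = None) -> ChatGroq:
    """Build and cache the chat model, reading the model name from the environment."""
    name = model_name or os.getenv("GROQ_MODEL", "llama-3.1-8b-instant")
    return ChatGroq(model=name, temperature=0)


# Usage sketch inside generate_perspective:
# model = get_perspective_model()
# result = model.with_structured_output(Perspective).invoke(prompt)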
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
- backend/app/db/vector_store.py (2 hunks)
- backend/app/logging/logging_config.py (1 hunks)
- backend/app/modules/bias_detection/check_bias.py (3 hunks)
- backend/app/modules/chat/llm_processing.py (2 hunks)
- backend/app/modules/facts_check/llm_processing.py (5 hunks)
- backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
- backend/app/modules/langgraph_nodes/fact_check.py (2 hunks)
- backend/app/modules/langgraph_nodes/generate_perspective.py (3 hunks)
- backend/app/modules/langgraph_nodes/judge.py (2 hunks)
- backend/app/modules/langgraph_nodes/sentiment.py (3 hunks)
- backend/app/modules/langgraph_nodes/store_and_send.py (2 hunks)
- backend/app/modules/pipeline.py (2 hunks)
- backend/app/modules/vector_store/chunk_rag_data.py (2 hunks)
- backend/app/routes/routes.py (2 hunks)
- backend/app/utils/fact_check_utils.py (1 hunks)
- backend/main.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- backend/app/routes/routes.py
- backend/app/modules/langgraph_nodes/judge.py
- backend/app/modules/langgraph_nodes/error_handler.py
- backend/app/modules/langgraph_nodes/fact_check.py
- backend/main.py
- backend/app/modules/langgraph_nodes/sentiment.py
- backend/app/modules/chat/llm_processing.py
🧰 Additional context used
🧬 Code Graph Analysis (8)
backend/app/utils/fact_check_utils.py (3)
- backend/app/modules/facts_check/web_search.py (1): search_google (30-42)
- backend/app/modules/facts_check/llm_processing.py (2): run_claim_extractor_sdk (40-87), run_fact_verifier_sdk (90-162)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)

backend/app/modules/bias_detection/check_bias.py (1)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)

backend/app/modules/facts_check/llm_processing.py (1)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)

backend/app/modules/pipeline.py (5)
- backend/app/modules/scraper/extractor.py (1): Article_extractor (31-110)
- backend/app/modules/scraper/cleaner.py (1): clean_extracted_text (32-105)
- backend/app/modules/scraper/keywords.py (1): extract_keywords (26-43)
- backend/app/modules/langgraph_builder.py (1): build_langgraph (57-111)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)

backend/app/modules/vector_store/chunk_rag_data.py (2)
- backend/app/utils/generate_chunk_id.py (1): generate_id (29-33)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)

backend/app/modules/langgraph_nodes/generate_perspective.py (1)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)

backend/app/db/vector_store.py (1)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)

backend/app/modules/langgraph_nodes/store_and_send.py (4)
- backend/app/modules/vector_store/chunk_rag_data.py (1): chunk_rag_data (33-102)
- backend/app/modules/vector_store/embed.py (1): embed_chunks (31-50)
- backend/app/utils/store_vectors.py (1): store (33-55)
- backend/app/logging/logging_config.py (1): setup_logger (4-39)
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py
62-62: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
backend/app/modules/langgraph_nodes/store_and_send.py
41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
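For context, a small generic illustration of the pattern B904 asks for (not code from this PR):

import json


def load_config(path: str) -> dict:
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except OSError as err:
        # Chain the original error so tracebacks show the real cause (satisfies B904).
        raise RuntimeError(f"Could not read config at {path}") from err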
🔇 Additional comments (10)
backend/app/logging/logging_config.py (1)
21-25: LGTM: clear formatter and sensible console level

The formatter is readable and the console INFO threshold avoids noisy DEBUG logs in stdout.
Also applies to: 27-31
backend/app/db/vector_store.py (1)
48-48: LGTM: informative index lifecycle logs

The messages for index creation and existence are clear and actionable.
Also applies to: 56-56
backend/app/utils/fact_check_utils.py (1)
55-56: LGTM: clear logging for extraction and search flow

Informative messages for extracted claims, found results, and warnings on misses. Good operational visibility.
Also applies to: 70-74
backend/app/modules/bias_detection/check_bias.py (1)
68-75: LGTM: structured success/error responses and logging

Good use of logger.info for the score and logger.exception for failures, with a consistent response envelope.
backend/app/modules/vector_store/chunk_rag_data.py (1)
1-24: Docstring and logger integration look good

Clear module docstring outlining inputs/outputs and the logger setup align with the PR goals.
Also applies to: 28-31
backend/app/modules/langgraph_nodes/store_and_send.py (1)
1-15: Good: module docstring and logger usage

Docstring explains the workflow well, and print statements were correctly replaced with logger calls.
Also applies to: 21-24
backend/app/modules/pipeline.py (1)
1-32: Docstring and logger integration LGTM

Module-level docstring is clear; logger replaces prints consistently. The JSON debug dump is helpful for observability.
Also applies to: 39-43, 61-63, 70-71
backend/app/modules/langgraph_nodes/generate_perspective.py (3)
49-53: Verify retry semantics: incrementing retries on success may be unintended

retries is incremented before generation regardless of outcome. If routing conditions use retries to decide on backoff or stopping, incrementing on success can cause premature termination in later steps.

Would you confirm the intended meaning of retries? If it should increment only on failure, move the increment into the except path:

-    try:
-        retries = state.get("retries", 0)
-        state["retries"] = retries + 1
+    try:
+        retries = state.get("retries", 0)
 @@
-    except Exception as e:
+    except Exception as e:
+        state["retries"] = retries + 1
         logger.exception(f"Error in generate_perspective: {e}")
         return {
             "status": "error",
             "error_from": "generate_perspective",
             "message": f"{e}",
         }

If the current behavior is intentional, please ignore.
23-29: Consistent logging usage is good

Using setup_logger and logger.exception preserves tracebacks and aligns with the PR intent.
Also applies to: 41-47, 78-85
1-20: Docstring is clear and helpful

Well-structured module docstring describing responsibilities and outputs.
logger = setup_logger(__name__)
🛠️ Refactor suggestion
Avoid import-time network side effects; lazily initialize the index
Creating clients, checking/creating indexes, and connecting at import time makes the module fragile (e.g., breaks tests/migrations when env isn’t ready). Encapsulate initialization behind a function and cache the result.
Example approach:
# At module level
_index = None
_pc = None
def get_index():
global _index, _pc
if _index:
return _index
api_key = os.getenv("PINECONE_API_KEY")
if not api_key:
raise ValueError("PINECONE_API_KEY environment variable is required")
try:
_pc = Pinecone(api_key=api_key)
except Exception as e:
raise RuntimeError("Error occurred while initializing Pinecone client") from e
if not _pc.has_index(INDEX_NAME):
logger.info("Creating index: %s", INDEX_NAME)
_pc.create_index(
name=INDEX_NAME,
dimension=DIMENSIONS,
metric=METRIC,
spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),
)
else:
logger.info("Index '%s' already exists", INDEX_NAME)
try:
_index = _pc.Index(INDEX_NAME)
except Exception as e:
raise RuntimeError(f"Error occurred while connecting to the index {INDEX_NAME}") from e
    return _index

Would you like me to refactor the module accordingly and update call sites?
Also applies to: 46-55
logger.setLevel(logging.DEBUG)  # Log everything from DEBUG and above

# Avoid adding duplicate handlers if logger already set
if logger.handlers:
    return logger
🛠️ Refactor suggestion
Prevent unintentional log-level overrides on subsequent calls
Calling setLevel(DEBUG) before checking existing handlers can silently override a logger that's already configured elsewhere. Move setLevel after the early return.
- logger.setLevel(logging.DEBUG) # Log everything from DEBUG and above
-
- # Avoid adding duplicate handlers if logger already set
- if logger.handlers:
- return logger
+ # Avoid adding duplicate handlers if logger already set
+ if logger.handlers:
+ return logger
+ logger.setLevel(logging.DEBUG) # Log everything from DEBUG and above📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Avoid adding duplicate handlers if logger already set
if logger.handlers:
    return logger
logger.setLevel(logging.DEBUG)  # Log everything from DEBUG and above
🤖 Prompt for AI Agents
In backend/app/logging/logging_config.py around lines 15 to 19,
setLevel(logging.DEBUG) is called before checking for existing handlers which
can inadvertently change an already-configured logger's level; move the setLevel
call to after the early-return (after the "if logger.handlers: return logger")
so the function exits without modifying level when handlers already exist, and
only apply logger.setLevel(logging.DEBUG) when initializing handlers for a fresh
logger.
# File Handler
file_handler = logging.FileHandler("app.log")
file_handler.setLevel(logging.DEBUG)  # Keep detailed logs in file
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
🛠️ Refactor suggestion
Use a rotating file handler with UTF-8 and guard file I/O errors
A plain FileHandler can grow unbounded and may fail on read-only filesystems. Prefer RotatingFileHandler, set encoding, and handle OSErrors gracefully.
- # File Handler
- file_handler = logging.FileHandler("app.log")
- file_handler.setLevel(logging.DEBUG) # Keep detailed logs in file
- file_handler.setFormatter(formatter)
- logger.addHandler(file_handler)
+ # File Handler (rotating)
+ try:
+ file_handler = logging.handlers.RotatingFileHandler(
+ "app.log", maxBytes=10_485_760, backupCount=3, encoding="utf-8"
+ )
+ except OSError as err:
+ logger.warning("File logging disabled (cannot open app.log): %s", err)
+ else:
+ file_handler.setLevel(logging.DEBUG) # Keep detailed logs in file
+ file_handler.setFormatter(formatter)
+ logger.addHandler(file_handler)Add this import at the top of the file:
import logging.handlers

🤖 Prompt for AI Agents
In backend/app/logging/logging_config.py around lines 33 to 37, replace the
plain FileHandler with a logging.handlers.RotatingFileHandler configured with
encoding='utf-8' and reasonable maxBytes and backupCount values; wrap the
handler creation in a try/except OSError block so that on failure you fall back
to a StreamHandler (stderr) and still set level and formatter; also add the
import "import logging.handlers" at the top of the file.
try:
    parsed = json.loads(content)
except Exception as parse_err:
    print(f"❌ LLM JSON parse error: {parse_err}")
    logger.error(f"LLM JSON parse error: {parse_err}")

results_list.append(parsed)
Bug: ‘parsed’ may be referenced before assignment when JSON parse fails
If json.loads raises, parsed is undefined and results_list.append(parsed) will raise UnboundLocalError. Provide a fallback object to keep the pipeline moving.
- try:
- parsed = json.loads(content)
- except Exception as parse_err:
- logger.error(f"LLM JSON parse error: {parse_err}")
-
- results_list.append(parsed)
+ try:
+ parsed = json.loads(content)
+ except Exception as parse_err:
+ logger.error("LLM JSON parse error: %s; content=%r", parse_err, content[:500])
+ parsed = {
+ "verdict": "Unknown",
+ "explanation": "Failed to parse LLM output",
+ "original_claim": claim,
+ "source_link": source,
+ "_raw": content,
+ }
+ results_list.append(parsed)Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py around lines 143 to 149,
the try/except catches json.loads errors but still appends parsed even when
parsing failed, causing an UnboundLocalError; modify the except block to assign
a safe fallback (e.g., an empty dict or a minimal error object that includes the
parse error and optionally the raw content) to parsed before logging, so
results_list.append(parsed) always has a defined value; ensure the fallback
contains enough info for downstream handling (like {"_parse_error":
str(parse_err), "raw": content}) and keep the existing logger.error call.
facts_str = "\n".join(
    [
        f"Claim: {f['original_claim']}\n"
        "Verdict: {f['verdict']}\nExplanation: "
        "{f['explanation']}"
        for f in state["facts"]
    ]
)
💡 Verification agent
🧩 Analysis chain
Repo-wide consistency check for perspective representation
After normalizing generate_perspective to return a dict and updating chunk_rag_data to consume it, ensure there are no other sites expecting attribute access (e.g., .perspective).
Also applies to: 85-85
🏁 Script executed:
#!/bin/bash
# Find attribute-style and dict-style uses of 'perspective' across the repo.
set -euo pipefail
echo "== Attribute-style access =="
rg -n "\.perspective\b" -A 2 || true
echo
echo "== Dict-style access =="
rg -n "\['perspective'\]" -A 2 || trueLength of output: 842
Action: fix remaining attribute-style uses of perspective
I ran a repo-wide search — there are still attribute-style accesses that will break if generate_perspective now returns a dict. Update these locations to use dict-style access or adjust the data shape returned to preserve attribute access.
Files to fix:
- backend/app/modules/vector_store/chunk_rag_data.py:66
  - Replace: "text": perspective_obj.perspective,
  - With: "text": perspective_obj['perspective'], (and similarly for other perspective fields; see the sketch after this list)
- frontend/app/analyze/results/page.tsx:161–163
  - Uses: perspective.perspective and perspective.reasoning
  - Confirm the frontend receives a plain JSON object (dot access is fine for JS). If the backend now nests/renames fields or returns a different shape, update these accesses to match (e.g., perspective['perspective']) or adjust the API/serialization.

Also re-check any other Python files for .perspective usage after making these changes.
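If both shapes need to be tolerated during the transition, one option is a small accessor helper; a sketch, with a hypothetical helper name:

from typing import Any


def get_perspective_field(perspective: Any, field: str) -> Any:
    """Read a field from a perspective that may be a dict or a Pydantic-style object."""
    if isinstance(perspective, dict):
        return perspective[field]
    return getattr(perspective, field)


# Usage sketch in chunk_rag_data:
# "text": get_perspective_field(data["perspective"], "perspective"),
# "reasoning": get_perspective_field(data["perspective"], "reasoning"),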
if state.get("status") != "success":
    print("❌ Claim extraction failed.")
    logger.error("❌ Claim extraction failed.")
    return [], "Claim extraction failed."
Bug: checking ‘state’ instead of claim-extractor ‘result’
This condition will almost always fail, short-circuiting the pipeline. Check the extractor’s return instead.
- if state.get("status") != "success":
+ if result.get("status") != "success":
logger.error("❌ Claim extraction failed.")
return [], "Claim extraction failed."📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if result.get("status") != "success":
    logger.error("❌ Claim extraction failed.")
    return [], "Claim extraction failed."
🤖 Prompt for AI Agents
In backend/app/utils/fact_check_utils.py around lines 47 to 49, the code is
incorrectly checking state.get("status") instead of the claim extractor's
returned value, causing the pipeline to short-circuit; update the condition to
inspect the extractor result (e.g., result.get("status") or the actual variable
name returned by the extractor), log the error using that result when status !=
"success", and only then return the empty list and error string so the pipeline
proceeds correctly when the extractor succeeded.
Tasks done-
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores