store_and_send Node: Embedding & Pinecone Storage + Frontend Sync #105

Conversation
…case before storing
Walkthrough

This update adds a fact-checking pipeline with LLM-based claim extraction and verification using the Groq API, integrated with SerpAPI web search. The LangGraph workflow is enhanced with typed states and updated nodes for sentiment, perspective generation, and judgment using LLMs. Vector storage is implemented via chunking, embedding, and Pinecone integration. NLTK dependencies and prompt templates are introduced.
Sequence Diagram(s)

sequenceDiagram
participant User
participant LangGraph
participant FactCheckNode
participant LLMProcessing
participant WebSearch
participant FactVerifier
participant PerspectiveGen
participant Judge
participant Chunker
User->>LangGraph: Submit article
LangGraph->>FactCheckNode: Process state
FactCheckNode->>LLMProcessing: Extract claims
LLMProcessing->>FactCheckNode: Return claims
FactCheckNode->>WebSearch: Search for each claim
WebSearch->>FactCheckNode: Return search results
FactCheckNode->>FactVerifier: Verify claims with evidence
FactVerifier->>FactCheckNode: Return verdicts
FactCheckNode->>LangGraph: Return facts
LangGraph->>PerspectiveGen: Generate counter-perspective
PerspectiveGen->>LangGraph: Return perspective
LangGraph->>Judge: Score perspective
Judge->>LangGraph: Return score
LangGraph->>Chunker: Chunk data for storage
Chunker->>LangGraph: Return chunks
LangGraph->>User: Send results
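For orientation, the wiring implied by this diagram can be sketched in a few lines of LangGraph. This is a hypothetical minimal reconstruction: the stub node bodies and edge list are assumptions, while the state keys and node names match the review comments further down (the real graph lives in new-backend/app/modules/langgraph_builder.py).

```python
# Hypothetical sketch only: stub bodies and edges are assumptions; the real
# wiring is in new-backend/app/modules/langgraph_builder.py.
from typing import TypedDict

from langgraph.graph import StateGraph


class MyState(TypedDict):
    cleaned_text: str
    facts: list[dict]
    sentiment: str
    perspective: str
    score: int
    retries: int
    status: str


def run_sentiment_sdk(state: MyState) -> dict:
    return {"sentiment": "neutral"}  # stub node body


def run_fact_check(state: MyState) -> dict:
    return {"facts": []}  # stub node body


def store_and_send(state: MyState) -> dict:
    return {"status": "success"}  # stub node body


graph = StateGraph(MyState)
graph.add_node("sentiment_analysis", run_sentiment_sdk)
graph.add_node("fact_check", run_fact_check)
graph.add_node("store_and_send", store_and_send)

graph.set_entry_point("sentiment_analysis")
graph.add_edge("sentiment_analysis", "fact_check")
graph.add_edge("fact_check", "store_and_send")
graph.add_edge("store_and_send", "__end__")

app = graph.compile()
result = app.invoke({"cleaned_text": "Example article text.", "facts": [],
                     "sentiment": "", "perspective": "", "score": 0,
                     "retries": 0, "status": ""})
```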
📜 Recent review details

Configuration used: CodeRabbit UI

📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
Actionable comments posted: 18
🧹 Nitpick comments (12)
new-backend/app/modules/langgraph_nodes/sentiment.py (1)
35-35: Consider increasing max_tokens for reliability.

While 3 tokens works for single-word responses, it might be too restrictive if the model occasionally includes punctuation or formatting. Consider using 5-10 tokens for better reliability.

- max_tokens=3,
+ max_tokens=5,

new-backend/app/utils/prompt_templates.py (1)
4-5: Fix string formatting issue.

There's an unnecessary line break in the string that creates awkward formatting.

-You are an AI assistant that generates a well-reasoned '
-'counter-perspective to a given article.
+You are an AI assistant that generates a well-reasoned counter-perspective to a given article.

new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
8-8: Remove debug print statement.

The debug print statement should be removed from production code as it may expose sensitive state information.
- print(state)new-backend/app/modules/scraper/cleaner.py (1)
new-backend/app/modules/scraper/cleaner.py (1)

2-2: Consider if NLTK import is actually needed.

The NLTK library is imported but not used anywhere in the current clean_extracted_text function, which only uses regex operations. If NLTK functionality is planned for future use, consider adding a comment explaining the intended usage.

new-backend/app/modules/vector_store/chunk_rag_data.py (1)
27-27: Fix typo in metadata key.

The key "explaination" should be "explanation" for correct spelling.

- "explaination": fact["explaination"],
+ "explanation": fact["explanation"],

new-backend/app/modules/langgraph_nodes/fact_check.py (1)
6-10: Consider removing redundant validation.

The function validates cleaned_text presence but then passes the entire state to run_fact_check_pipeline. Since the pipeline should handle missing text internally, this validation might be redundant. Consider either removing this check or ensuring the pipeline actually requires this validation:

 def run_fact_check(state):
     try:
-        text = state.get("cleaned_text")
-
-        if not text:
-            raise ValueError("Missing or empty 'cleaned_text' in state")
-
         verifications, error_message = run_fact_check_pipeline(state)

new-backend/app/modules/langgraph_nodes/judge.py (1)
39-43: Consider validating score range before clamping.

The current approach silently clamps out-of-range values, which might hide unexpected LLM outputs. Consider explicit validation for better error visibility.

 m = re.search(r"\b(\d{1,3})\b", raw)
 if not m:
     raise ValueError(f"Couldn't parse a score from: '{raw}'")
-score = max(0, min(100, int(m.group(1))))
+score = int(m.group(1))
+if not 0 <= score <= 100:
+    print(f"Warning: Score {score} outside expected range [0-100], clamping.")
+    score = max(0, min(100, score))

new-backend/app/modules/langgraph_builder.py (1)
52-54: Remove unnecessary trailing comma.

 graph.set_entry_point(
-    "sentiment_analysis",
+    "sentiment_analysis"
 )

new-backend/app/modules/langgraph_nodes/generate_perspective.py (2)
6-6: Remove redundant variable assignment.

The intermediate prompt variable is unnecessary since generation_prompt is only used once.

-prompt = generation_prompt
-# ... other code ...
-chain = prompt | structured_llm
+chain = generation_prompt | structured_llm

Also applies to: 24-24
50-50: Fix typo in error message.

- print(f"some error occured in generate_perspective:{e}")
+ print(f"Error occurred in generate_perspective: {e}")

new-backend/app/modules/facts_check/llm_processing.py (2)
107-109: Improve markdown code block parsing to handle case variations.

The current regex only handles lowercase "json" and might miss valid markdown blocks.

-# Strip markdown code blocks if present
-content = re.sub(r"^```json|```$", "", content).strip()
+# Strip markdown code blocks if present (case-insensitive)
+content = re.sub(r"^```(?:json|JSON)?|```$", "", content, flags=re.IGNORECASE).strip()
52-52: Maintain consistency in error logging format.

The codebase uses different styles for error messages - some with emojis (❌, 🔥) and some without. Consider standardizing the format.

-print(f"Error in claim_extraction: {e}")
+print(f"❌ Error in claim_extraction: {e}")
 # ... and ...
-print(f"🔥 Error in fact_verification: {e}")
+print(f"❌ Error in fact_verification: {e}")

Also applies to: 115-115, 132-132
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
new-backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
- new-backend/app/modules/facts_check/llm_processing.py (1 hunks)
- new-backend/app/modules/facts_check/web_search.py (1 hunks)
- new-backend/app/modules/langgraph_builder.py (5 hunks)
- new-backend/app/modules/langgraph_nodes/fact_check.py (2 hunks)
- new-backend/app/modules/langgraph_nodes/generate_perspective.py (2 hunks)
- new-backend/app/modules/langgraph_nodes/judge.py (1 hunks)
- new-backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
- new-backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
- new-backend/app/modules/scraper/cleaner.py (1 hunks)
- new-backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
- new-backend/app/utils/fact_check_utils.py (1 hunks)
- new-backend/app/utils/generate_chunk_id.py (1 hunks)
- new-backend/app/utils/prompt_templates.py (1 hunks)
- new-backend/pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
chunk_rag_data(4-34)
new-backend/app/modules/langgraph_nodes/fact_check.py (1)
new-backend/app/utils/fact_check_utils.py (1)
run_fact_check_pipeline(8-38)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
new-backend/app/utils/generate_chunk_id.py (1)
generate_id(4-7)
new-backend/app/modules/langgraph_builder.py (2)
new-backend/app/modules/langgraph_nodes/sentiment.py (1)
run_sentiment_sdk(10-53)
new-backend/app/modules/langgraph_nodes/error_handler.py (1)
error_handler(3-11)
🪛 Ruff (0.11.9)
new-backend/app/modules/langgraph_nodes/store_and_send.py
9-9: Local variable chunks is assigned to but never used
Remove assignment to unused variable chunks
(F841)
🪛 Flake8 (7.2.0)
new-backend/app/modules/langgraph_nodes/store_and_send.py
[error] 9-9: local variable 'chunks' is assigned to but never used
(F841)
🔇 Additional comments (7)
new-backend/pyproject.toml (1)
10-10: All added dependencies are up to date and free of known security issues

- duckduckgo-search (>=8.0.4)
  • Latest upstream: 2025.4.4 (released 2025-04-04)
  • No critical/high/medium vulnerabilities reported
  • Note: the package was removed from PyPI in April 2025 (non-security compliance/hardening reasons); verify your install source or consider vendoring if that poses an issue
- langchain-community (>=0.2.0)
  • Latest upstream: 0.2.19
  • No public advisories found
- langchain-groq (>=0.1.0)
  • Latest upstream: 0.3.4
  • No known vulnerabilities
- nltk (>=3.5)
  • Latest upstream: 3.9.1
  • No known security issues; earlier 3.8.x releases have been yanked

No further action required unless you need to pin to exact versions or address the PyPI removal of duckduckgo-search.
new-backend/app/modules/langgraph_nodes/sentiment.py (1)
39-39: No uppercase sentiment checks detected; lowercase conversion is safe

A search for downstream usage of "Positive", "Negative", or "Neutral" found only:

- The prompt in sentiment.py ("Positive, Negative, or Neutral.")
- The passthrough in generate_perspective.py: "sentiment": state.get("sentiment", "neutral")
- A JSON config using "baseColor": "neutral"

No code compares against uppercase sentiment values. Lowercasing won't break existing consumers.
new-backend/app/utils/prompt_templates.py (1)
3-32: Well-structured prompt template with clear instructions.

The prompt template is well-designed with clear instructions, appropriate placeholders, and structured JSON output format. This should work effectively for generating counter-perspectives. A sketch of this general shape follows below.
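For illustration, a template of this general shape might look like the following sketch; the actual wording and placeholder names in prompt_templates.py differ.

```python
# Illustrative only: the real template in prompt_templates.py differs in
# wording; placeholder names here are assumptions.
from langchain_core.prompts import ChatPromptTemplate

generation_prompt = ChatPromptTemplate.from_template(
    "You are an AI assistant that generates a well-reasoned "
    "counter-perspective to a given article.\n\n"
    "Article:\n{cleaned_text}\n\n"
    "Verified facts:\n{facts}\n\n"
    "Detected sentiment: {sentiment}\n\n"
    'Respond ONLY with JSON: {{"perspective": "<counter-perspective>"}}'
)
```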
new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
21-24: State structure matches frontend expectations
The MyState TypedDict in langgraph_builder.py defines exactly these keys:
- cleaned_text
- facts
- sentiment
- perspective
- score
- retries
- status
The store_and_send node returns {**state, "status": "success"}, preserving all existing fields and setting status correctly. No additional fields are required.

new-backend/app/modules/langgraph_nodes/fact_check.py (1)
11-20: Good integration with the new fact-checking pipeline.

The error handling structure is well-implemented and maintains consistency with the existing error response format.
new-backend/app/modules/langgraph_builder.py (2)
29-30: Method updates look good.

The changes to use run_sentiment_sdk, explicit error_handler.error_handler, add_conditional_edges, and "__end__" align with the updated API and improve code clarity.

Also applies to: 49-50, 56-61, 100-101
14-22: No type mismatch for perspective — keep as str

Verification shows that across the entire codebase:

- The only assignments to perspective are string operations (e.g. in backend/app/services/counter_service.py).
- There are no attribute-style accesses like obj.perspective.
- No PerspectiveOutput class or generate_perspective function returning an object was found.

Leave the TypedDict definition as-is:

 class MyState(TypedDict):
     cleaned_text: str
     facts: list[dict]
     sentiment: str
-    perspective: dict  # proposed change—remove
+    perspective: str
     score: int
     retries: int
     status: str

Likely an incorrect or invalid review comment.
# from app.utils.vector_store import save_to_vector_db
from app.modules.vector_store.chunk_rag_data import chunk_rag_data
💡 Verification agent
🧩 Analysis chain
Verify import path for chunk_rag_data function.
Ensure the import path is correct and the function exists at the specified location.
🏁 Script executed:
#!/bin/bash
# Verify the chunk_rag_data function exists at the imported path
fd -t f chunk_rag_data.py --exec cat {} \;

Length of output: 960
Integrate chunked data storage and clean up debug code
The import for chunk_rag_data is correct—no change needed there. The function is currently incomplete and contains leftover debug code:
• Re-enable and use your vector-store save function
- # from app.utils.vector_store import save_to_vector_db
+ from app.utils.vector_store import save_to_vector_db
  from app.modules.vector_store.chunk_rag_data import chunk_rag_data

• Remove the debug print and actually store the chunks
- chunks = chunk_rag_data(data)
- print("Chunks:", chunks)
+ chunks = chunk_rag_data(data)
+ save_to_vector_db(chunks)

• Add error handling around both chunking and storage calls to surface failures
• Ensure any unused variables are removed once storage is implemented
File to update:
new-backend/app/modules/langgraph_nodes/store_and_send.py
📝 Committable suggestion
# File: new-backend/app/modules/langgraph_nodes/store_and_send.py
from app.utils.vector_store import save_to_vector_db
from app.modules.vector_store.chunk_rag_data import chunk_rag_data

def store_and_send(data):
    # … any pre-processing …
    chunks = chunk_rag_data(data)
    save_to_vector_db(chunks)
    # … any post-processing …
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/store_and_send.py at the beginning of
the file, re-enable the import for the vector-store save function by
uncommenting it. In the function where chunk_rag_data is used, remove any debug
print statements and replace them with calls to the vector-store save function
to store the chunked data. Add try-except blocks around both the chunking and
storage operations to catch and log any errors that occur. Finally, remove any
variables that become unused after implementing the storage logic to keep the
code clean.
try:
    nltk.data.find('corpora/stopwords')
    nltk.data.find('corpora/punkt_tab')

except LookupError:
    nltk.download('stopwords')
    nltk.download('punkt_tab')
🛠️ Refactor suggestion
Move NLTK data initialization to avoid import-time delays.
Downloading NLTK data at module import time can significantly slow application startup. Consider moving this logic to a dedicated initialization function or lazy-loading approach.
-try:
-    nltk.data.find('corpora/stopwords')
-    nltk.data.find('corpora/punkt_tab')
-
-except LookupError:
-    nltk.download('stopwords')
-    nltk.download('punkt_tab')
+def ensure_nltk_data():
+    """Ensure required NLTK data is available, downloading if necessary."""
+    try:
+        nltk.data.find('corpora/stopwords')
+        nltk.data.find('corpora/punkt_tab')
+    except LookupError:
+        nltk.download('stopwords')
+        nltk.download('punkt_tab')

Then call ensure_nltk_data() when NLTK functionality is actually needed.
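For example, a hypothetical call site (this helper is not part of the PR) could defer the download to first use:

```python
# Hypothetical call site: tokenize_clean_text is not in the PR; it only
# illustrates deferring the corpus download to first real use.
from nltk.tokenize import word_tokenize


def tokenize_clean_text(text: str) -> list[str]:
    ensure_nltk_data()  # downloads corpora only if they are missing
    return word_tokenize(text)
```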
🤖 Prompt for AI Agents
In new-backend/app/modules/scraper/cleaner.py around lines 4 to 10, the NLTK
data check and download happen at import time, causing startup delays. Refactor
this code by moving the NLTK data initialization into a separate function, such
as ensure_nltk_data(), that performs the check and downloads if necessary. Then,
call this function only when NLTK functionality is required, avoiding delays
during module import.
| prompt = f""" | ||
| You are an expert evaluator. Please rate the following counter-perspective | ||
| on originality, reasoning quality, and factual grounding. Provide ONLY | ||
| a single integer score from 0 (very poor) to 100 (excellent). | ||
|
|
||
| === Perspective to score === | ||
| {text} | ||
| """ |
💡 Verification agent
🧩 Analysis chain
Clarify whether evaluating "perspective" or "counter-perspective".
The prompt mentions "counter-perspective" but the code extracts and processes "perspective". This inconsistency could affect the scoring accuracy.
🏁 Script executed:
#!/bin/bash
# Check how perspective is used across the codebase
rg -A 3 -B 3 "counter-perspective|perspective" --type py

Length of output: 20874
Align “perspective” terminology in judge prompt
There’s a mismatch between the code’s perspective attribute (which holds the generated counter-perspective) and the prompt wording. To avoid confusion, update the wording in judge.py (or rename the field) so they match.
• File: new-backend/app/modules/langgraph_nodes/judge.py
• Change the prompt wording on line 21 from "counter-perspective" to "perspective" (or, vice versa, rename the attribute to counter_perspective across your models).
• Example diff:
prompt = f"""
-You are an expert evaluator. Please rate the following counter-perspective
+You are an expert evaluator. Please rate the following perspective
on originality, reasoning quality, and factual grounding. Provide ONLY
a single integer score from 0 (very poor) to 100 (excellent).
=== Perspective to score ===
{text}
"""• (Optional) For full consistency, consider renaming the PerspectiveOutput.perspective field to counter_perspective in generate_perspective.py and update any downstream references (e.g. vector-store metadata).
📝 Committable suggestion
| prompt = f""" | |
| You are an expert evaluator. Please rate the following counter-perspective | |
| on originality, reasoning quality, and factual grounding. Provide ONLY | |
| a single integer score from 0 (very poor) to 100 (excellent). | |
| === Perspective to score === | |
| {text} | |
| """ | |
| prompt = f""" | |
| You are an expert evaluator. Please rate the following perspective | |
| on originality, reasoning quality, and factual grounding. Provide ONLY | |
| a single integer score from 0 (very poor) to 100 (excellent). | |
| === Perspective to score === | |
| {text} | |
| """ |
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/judge.py around lines 20 to 27, the
prompt text uses "counter-perspective" while the code attribute is named
"perspective," causing inconsistency. To fix this, update the prompt wording on
line 21 to replace "counter-perspective" with "perspective" so the terminology
matches. Optionally, for full consistency, rename the attribute in related files
like generate_perspective.py and update all references accordingly.
facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                       "Verdict: {f['verdict']}\nExplanation: "
                       "{f['explanation']}" for f in state["facts"]])
Fix f-string formatting bug.
Lines 41-42 are not f-strings but contain {} placeholders. This will result in literal {f['verdict']} text instead of the actual values.
-facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
-                       "Verdict: {f['verdict']}\nExplanation: "
-                       "{f['explanation']}" for f in state["facts"]])
+facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
+                       f"Verdict: {f['verdict']}\nExplanation: "
+                       f"{f['explanation']}" for f in state["facts"]])

📝 Committable suggestion
facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                       f"Verdict: {f['verdict']}\nExplanation: "
                       f"{f['explanation']}" for f in state["facts"]])
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/generate_perspective.py around lines
40 to 42, the string concatenation uses curly braces for variable placeholders
but the strings are not marked as f-strings, causing the placeholders to be
treated as literal text. Fix this by adding the 'f' prefix to all strings
containing placeholders so that the expressions inside curly braces are properly
evaluated and formatted.
load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))
🛠️ Refactor suggestion
Add validation for required API key.
The code should verify that the API key exists to fail fast with a clear error message.
 load_dotenv()

-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise ValueError("GROQ_API_KEY environment variable is not set")
+client = Groq(api_key=api_key)

📝 Committable suggestion
load_dotenv()

api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable is not set")
client = Groq(api_key=api_key)
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 7 to 9,
the code initializes the Groq client with an API key from the environment
without checking if the key exists. Add a validation step after loading the
environment variables to check if the "GROQ_API_KEY" is present. If it is
missing, raise an exception or exit with a clear error message indicating the
missing API key to ensure the application fails fast and clearly.
return {
    "claim": claim,
    "verifications": results_list,
    "status": "success",
}
Remove incorrect "claim" field from return value.
The function returns "claim": claim which uses the last claim from the loop. This is incorrect when processing multiple claims. The function already returns all verifications in the list.
 return {
-    "claim": claim,
     "verifications": results_list,
     "status": "success",
 }

📝 Committable suggestion
return {
    "verifications": results_list,
    "status": "success",
}
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 125 to
129, remove the "claim" field from the returned dictionary because it
incorrectly uses only the last claim from the loop. Instead, return only the
"verifications" list and the "status" to accurately represent all processed
claims.
Actionable comments posted: 1
♻️ Duplicate comments (1)
new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
9-17: Complete the implementation - chunks are created but not used.

The function creates chunks but doesn't store them in Pinecone as mentioned in the PR objectives. This is the same issue flagged in previous reviews and remains unresolved.
Based on the PR objectives, this function should:
- Generate embeddings for each chunk
- Store embeddings in Pinecone vector database
- Use the chunks variable that's currently unused
The current implementation is incomplete and doesn't fulfill the PR requirements.
🧹 Nitpick comments (2)
new-backend/app/modules/langgraph_nodes/store_and_send.py (2)
8-8: Remove debug print statement.

The debug print statement should be removed from production code.
- print(state)
12-14: Improve exception handling with proper chaining.

Use raise ... from err to preserve the original exception context for better debugging.

 except KeyError as e:
-    raise Exception(f"Missing required data field for chunking: {e}")
+    raise Exception(f"Missing required data field for chunking: {e}") from e
 except Exception as e:
-    raise Exception(f"Failed to chunk data: {e}")
+    raise Exception(f"Failed to chunk data: {e}") from e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- new-backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
- new-backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
- new-backend/app/utils/generate_chunk_id.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- new-backend/app/modules/vector_store/chunk_rag_data.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
chunk_rag_data(4-65)
🪛 Ruff (0.11.9)
new-backend/app/modules/langgraph_nodes/store_and_send.py
10-10: Local variable chunks is assigned to but never used
Remove assignment to unused variable chunks
(F841)
12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🪛 Flake8 (7.2.0)
new-backend/app/modules/langgraph_nodes/store_and_send.py
[error] 10-10: local variable 'chunks' is assigned to but never used
(F841)
Actionable comments posted: 2
♻️ Duplicate comments (1)
new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
1-3: Complete the implementation - vector storage still missing.

The save_to_vector_db import remains commented out while chunking and embedding have been added. This confirms the incomplete implementation identified in previous reviews.
🧹 Nitpick comments (1)
new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
5-5: Fix formatting: add blank line before function definition.

PEP 8 requires two blank lines before top-level function definitions.
+ def store_and_send(state):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
new-backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- new-backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
- new-backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
- new-backend/app/modules/vector_store/embed.py (1 hunks)
- new-backend/app/utils/generate_chunk_id.py (1 hunks)
- new-backend/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- new-backend/app/modules/vector_store/chunk_rag_data.py
- new-backend/app/utils/generate_chunk_id.py
- new-backend/pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (1)
new-backend/app/modules/langgraph_nodes/store_and_send.py (2)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
chunk_rag_data(4-73)
new-backend/app/modules/vector_store/embed.py (1)
embed_chunks(7-30)
🪛 Flake8 (7.2.0)
new-backend/app/modules/langgraph_nodes/store_and_send.py
[error] 5-5: expected 2 blank lines, found 1
(E302)
🪛 Ruff (0.11.9)
new-backend/app/modules/langgraph_nodes/store_and_send.py
12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
20-20: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (3)
new-backend/app/modules/vector_store/embed.py (3)
1-4: Good choice of embedding model and efficient global instance.

The SentenceTransformer model "all-MiniLM-L6-v2" is a well-balanced choice for general text embedding tasks, offering good performance with reasonable computational requirements. Using a global instance is efficient and avoids reloading the model on each function call.

7-10: Well-designed function signature and early return optimization.

The function has clear type hints and handles the empty input case efficiently with an early return.

13-30: Robust validation and efficient embedding generation.

The validation logic provides clear error messages with specific indices, making debugging easier. The embedding generation efficiently processes all texts at once, and the output format with "id", "values", and "metadata" is well-structured for vector database storage. A sketch of this shape follows below.
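Pieced together from these notes, embed_chunks plausibly has roughly this shape; this is a sketch reconstructed from the review comments, not the actual file contents:

```python
# Sketch reconstructed from the review notes above; the real embed.py may differ.
from sentence_transformers import SentenceTransformer

# Global instance so the model is loaded once, not per call.
model = SentenceTransformer("all-MiniLM-L6-v2")


def embed_chunks(chunks: list[dict]) -> list[dict]:
    """Embed chunk texts and return Pinecone-ready vectors."""
    if not chunks:
        return []

    # Validate with specific indices for easier debugging.
    for i, chunk in enumerate(chunks):
        if not chunk.get("text"):
            raise ValueError(f"Chunk at index {i} is missing 'text'")

    # Encode all texts in one batch for efficiency.
    texts = [chunk["text"] for chunk in chunks]
    embeddings = model.encode(texts)

    return [
        {
            "id": chunk["id"],
            "values": embedding.tolist(),
            "metadata": chunk.get("metadata", {}),
        }
        for chunk, embedding in zip(chunks, embeddings)
    ]
```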
print(state)
try:
    chunks = chunk_rag_data(state)
except KeyError as e:
    raise Exception(f"Missing required data field for chunking: {e}")
except Exception as e:
    raise Exception(f"Failed to chunk data: {e}")
🛠️ Refactor suggestion
Remove debug code and improve exception handling.
The print statement should be removed for production code. Additionally, exception handling should use proper chaining to preserve the original error context.
-print(state)
 try:
     chunks = chunk_rag_data(state)
 except KeyError as e:
-    raise Exception(f"Missing required data field for chunking: {e}")
+    raise Exception(f"Missing required data field for chunking: {e}") from e
 except Exception as e:
-    raise Exception(f"Failed to chunk data: {e}")
+    raise Exception(f"Failed to chunk data: {e}") from e

📝 Committable suggestion
try:
    chunks = chunk_rag_data(state)
except KeyError as e:
    raise Exception(f"Missing required data field for chunking: {e}") from e
except Exception as e:
    raise Exception(f"Failed to chunk data: {e}") from e
🧰 Tools
🪛 Ruff (0.11.9)
12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/store_and_send.py around lines 8 to
14, remove the print(state) debug statement and update the exception handling to
use proper exception chaining by adding "from e" when re-raising exceptions,
preserving the original error context.
try:
    vectors = embed_chunks(chunks)
    if vectors:
        print("embedding generated successfully!")
except Exception as e:
    raise Exception(f"failed to embed chunks: {e}")
🛠️ Refactor suggestion
Improve exception handling consistency.
Apply the same exception chaining improvement here as recommended for the chunking section.
 try:
     vectors = embed_chunks(chunks)
     if vectors:
         print("embedding generated successfully!")
 except Exception as e:
-    raise Exception(f"failed to embed chunks: {e}")
+    raise Exception(f"failed to embed chunks: {e}") from e

📝 Committable suggestion
try:
    vectors = embed_chunks(chunks)
    if vectors:
        print("embedding generated successfully!")
except Exception as e:
    raise Exception(f"failed to embed chunks: {e}") from e
🧰 Tools
🪛 Ruff (0.11.9)
20-20: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/store_and_send.py around lines 15 to
20, the exception handling raises a new Exception without preserving the
original traceback. Modify the except block to use "raise Exception(f'failed to
embed chunks: {e}') from e" to maintain exception chaining and improve
debugging.
…he pinecone with metadata and original text
Actionable comments posted: 4
♻️ Duplicate comments (5)
new-backend/app/modules/facts_check/llm_processing.py (2)
7-9: Add validation for required API key.

The code should verify that the API key exists to fail fast with a clear error message.

 load_dotenv()
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise ValueError("GROQ_API_KEY environment variable is not set")
+client = Groq(api_key=api_key)
load_dotenv() -client = Groq(api_key=os.getenv("GROQ_API_KEY")) +api_key = os.getenv("GROQ_API_KEY") +if not api_key: + raise ValueError("GROQ_API_KEY environment variable is not set") +client = Groq(api_key=api_key)
119-123: Remove incorrect "claim" field from return value.The function returns
"claim": claimwhich uses the last claim from the loop. This is incorrect when processing multiple claims. The function already returns all verifications in the list.return { - "claim": claim, "verifications": results_list, "status": "success", }new-backend/app/modules/langgraph_nodes/store_and_send.py (3)
9-9: Remove debug print statement.Debug print statements should be removed from production code or replaced with proper logging.
Apply this diff to remove the debug code:
- print(state)
10-15: Improve exception chaining for better error traceability.The exception handling should preserve the original error context as recommended by the static analysis tool.
Apply this diff to improve exception chaining:
except KeyError as e: - raise Exception(f"Missing required data field for chunking: {e}") + raise Exception(f"Missing required data field for chunking: {e}") from e except Exception as e: - raise Exception(f"Failed to chunk data: {e}") + raise Exception(f"Failed to chunk data: {e}") from e
16-21: Remove debug print and improve exception chaining.The debug print statement should be removed and exception chaining should be improved.
Apply this diff to address both issues:
if vectors: - print("embedding generated successfully!") except Exception as e: - raise Exception(f"failed to embed chunks: {e}") + raise Exception(f"failed to embed chunks: {e}") from e
🧹 Nitpick comments (3)
new-backend/app/modules/facts_check/llm_processing.py (2)
78-80: Fix missing space in system prompt.The system prompt has a missing space that makes it grammatically incorrect.
- "Your job is to determine whether the given" - " claim is True, False" - "based on the provided web search evidence." + "Your job is to determine whether the given " + "claim is True or False " + "based on the provided web search evidence."
107-109: Consider improving markdown stripping logic.The current regex pattern may not handle all markdown variations. Consider using a more robust pattern or a dedicated markdown parser.
-# Strip markdown code blocks if present - content = re.sub(r"^```json|```$", "", content).strip() + # Strip markdown code blocks if present + content = re.sub(r"^```(?:json)?\s*|```\s*$", "", content, flags=re.MULTILINE).strip()new-backend/app/modules/langgraph_nodes/store_and_send.py (1)
23-24: Complete implementation achieved - remove debug print.

Great work completing the vector storage implementation! The function now properly stores vectors in Pinecone as required by the PR objectives. However, the debug print should be removed.
Apply this diff to remove the debug print:
 store(vectors)
-print("Vectors saved to Pinecone!")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
new-backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
- new-backend/app/db/vector_store.py (1 hunks)
- new-backend/app/modules/facts_check/llm_processing.py (1 hunks)
- new-backend/app/modules/facts_check/web_search.py (1 hunks)
- new-backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
- new-backend/app/utils/fact_check_utils.py (1 hunks)
- new-backend/app/utils/store_vectors.py (1 hunks)
- new-backend/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- new-backend/app/modules/facts_check/web_search.py
- new-backend/pyproject.toml
- new-backend/app/utils/fact_check_utils.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
new-backend/app/modules/langgraph_nodes/store_and_send.py (3)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
chunk_rag_data(4-73)
new-backend/app/modules/vector_store/embed.py (1)
embed_chunks(7-30)
new-backend/app/utils/store_vectors.py (1)
store(10-32)
🪛 Ruff (0.11.9)
new-backend/app/utils/store_vectors.py
32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
new-backend/app/modules/langgraph_nodes/store_and_send.py
13-13: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
15-15: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
21-21: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
new-backend/app/db/vector_store.py
14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
40-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (8)
new-backend/app/modules/facts_check/llm_processing.py (1)
12-58: LGTM: Claim extraction function is well-implemented.

The function properly handles state input validation, error handling, and API interaction. The prompt structure is clear and the return format correctly augments the original state. A hedged sketch of this call pattern follows below.
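As context for readers, the call pattern being approved can be sketched roughly as follows; the model name, prompt wording, and the "claims" key are assumptions rather than the file's actual values:

```python
# Hedged sketch of the claim-extraction call pattern described above; model
# name, prompt wording, and the "claims" key are assumptions.
import json
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])


def claim_extraction(state: dict) -> dict:
    text = state.get("cleaned_text")
    if not text:
        raise ValueError("Missing or empty 'cleaned_text' in state")

    try:
        response = client.chat.completions.create(
            model="llama-3.1-8b-instant",  # assumed model
            messages=[
                {"role": "system",
                 "content": "Extract the factual claims from the article "
                            "as a JSON list of strings."},
                {"role": "user", "content": text},
            ],
        )
        claims = json.loads(response.choices[0].message.content)
    except Exception as e:
        print(f"Error in claim_extraction: {e}")
        claims = []

    # Augment the original state rather than replacing it.
    return {**state, "claims": claims}
```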
new-backend/app/db/vector_store.py (3)
5-7: LGTM - Proper environment variable validation.

The environment variable handling correctly validates the presence of the required API key and raises an appropriate error if missing.
16-19: LGTM - Appropriate constants for vector configuration.

The constants for index name, dimensions (384), and cosine metric are well-defined and align with the embedding model specifications mentioned in the codebase.
22-34: LGTM - Proper index creation with serverless specification.

The index creation logic correctly checks for existence before creating and uses appropriate serverless specifications for AWS US East 1. A sketch of this pattern follows below.
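For reference, the approved pattern looks roughly like this sketch using the pinecone v3+ client; the index name is an assumption, while the 384-dimension and cosine settings match the constants described above:

```python
# Sketch of the index-creation pattern being praised; INDEX_NAME is an
# assumed value, not necessarily the constant used in vector_store.py.
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

INDEX_NAME = "rag-index"  # assumed value
DIMENSIONS = 384          # matches all-MiniLM-L6-v2 output size
METRIC = "cosine"

# Check for existence before creating, as the review describes.
if INDEX_NAME not in pc.list_indexes().names():
    pc.create_index(
        name=INDEX_NAME,
        dimension=DIMENSIONS,
        metric=METRIC,
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
```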
new-backend/app/utils/store_vectors.py (2)
10-21: LGTM - Well-documented function with clear interface.

The function signature, documentation, and parameter descriptions are well-structured and clearly communicate the expected behavior and exceptions.
22-23: LGTM - Proper input validation.

The validation correctly checks for empty vectors list and raises an appropriate ValueError.
new-backend/app/modules/langgraph_nodes/store_and_send.py (2)
1-3: LGTM - Proper imports for the vector storage pipeline.

The imports correctly reference the chunking, embedding, and storage functions needed for the complete pipeline implementation.
6-37: Excellent implementation of the complete vector storage pipeline.

The function now successfully implements the full pipeline described in the PR objectives:

- ✅ Chunks the cleaned_text using chunk_rag_data
- ✅ Generates embeddings for each chunk using embed_chunks
- ✅ Stores embeddings in Pinecone using the store function
- ✅ Returns the updated state for frontend sync

This addresses the previously incomplete implementation and fulfills all the PR requirements. The error handling covers each step appropriately, and the function integrates well with the LangGraph pipeline.
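That pipeline shape, with the review's recommended fixes applied (no debug prints, chained exceptions), can be sketched as follows; details of the real store_and_send may differ:

```python
# Sketch of the completed pipeline summarized above, with the review's
# recommended fixes applied; the real store_and_send may differ in details.
from app.modules.vector_store.chunk_rag_data import chunk_rag_data
from app.modules.vector_store.embed import embed_chunks
from app.utils.store_vectors import store


def store_and_send(state):
    try:
        chunks = chunk_rag_data(state)
    except KeyError as e:
        raise Exception(f"Missing required data field for chunking: {e}") from e
    except Exception as e:
        raise Exception(f"Failed to chunk data: {e}") from e

    try:
        vectors = embed_chunks(chunks)
    except Exception as e:
        raise Exception(f"failed to embed chunks: {e}") from e

    try:
        store(vectors)
    except Exception as e:
        raise Exception(f"failed to store vectors: {e}") from e

    # Preserve existing fields and mark success for the frontend sync.
    return {**state, "status": "success"}
```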
try:
    parsed = json.loads(content)
except Exception as parse_err:
    print(f"❌ LLM JSON parse error: {parse_err}")

results_list.append(parsed)
Fix undefined variable error in JSON parsing.
When JSON parsing fails, the parsed variable is not defined, but line 117 still tries to append it to results_list, which will cause a NameError.
# Try parsing the JSON response
try:
parsed = json.loads(content)
except Exception as parse_err:
print(f"❌ LLM JSON parse error: {parse_err}")
+ parsed = {
+ "verdict": "Error",
+ "explanation": f"JSON parsing failed: {parse_err}",
+ "original_claim": claim,
+ "source_link": source
+ }
 results_list.append(parsed)

📝 Committable suggestion
# Try parsing the JSON response
try:
    parsed = json.loads(content)
except Exception as parse_err:
    print(f"❌ LLM JSON parse error: {parse_err}")
    parsed = {
        "verdict": "Error",
        "explanation": f"JSON parsing failed: {parse_err}",
        "original_claim": claim,
        "source_link": source
    }

results_list.append(parsed)
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 112 to
117, the variable 'parsed' is used outside the try-except block without
guaranteed initialization, causing a NameError if JSON parsing fails. To fix
this, initialize 'parsed' to a default value (e.g., None) before the try block
or only append 'parsed' to 'results_list' inside the try block after successful
parsing, ensuring it is defined before use.
try:

    # Initialize Pinecone client
    pc = Pinecone(api_key=PINECONE_API_KEY)

except Exception as e:
    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
Fix typo and improve exception chaining.
There's a typo in the error message and the exception handling should preserve the original error context.
Apply this diff to fix the issues:
- raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+ raise RuntimeError(f"Error occurred while initializing pinecone client: {e}") from e📝 Committable suggestion
try:
    # Initialize Pinecone client
    pc = Pinecone(api_key=PINECONE_API_KEY)
except Exception as e:
    raise RuntimeError(f"Error occurred while initializing pinecone client: {e}") from e
🧰 Tools
🪛 Ruff (0.11.9)
14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In new-backend/app/db/vector_store.py around lines 8 to 14, fix the typo in the
error message by changing "occured" to "occurred" and "intialising" to
"initialising". Also, improve exception handling by using "raise
RuntimeError(...) from e" to preserve the original exception context.
try:
    # Connect to the index
    index = pc.Index(INDEX_NAME)
except Exception as e:
    raise RuntimeError("Error occured while "
                       f"connecting to the index {INDEX_NAME}:{e}")
Fix typo and improve exception chaining.
There's a typo in the error message and the exception handling should preserve the original error context.
Apply this diff to fix the issues:
- raise RuntimeError("Error occured while "
+ raise RuntimeError("Error occurred while "
f"connecting to the index {INDEX_NAME}:{e}")
+ f"connecting to the index {INDEX_NAME}: {e}") from eCommittable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.11.9)
40-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In new-backend/app/db/vector_store.py around lines 36 to 41, fix the typo in the
error message by changing "occured" to "occurred" and improve exception handling
by using "raise ... from e" to preserve the original exception context when
re-raising the RuntimeError.
try:
    index.upsert(vectors, namespace=namespace)
    logger.info(f"Successfully stored {len(vectors)} "
                f"vectors in namespace '{namespace}'")
except Exception as e:
    logger.error("Failed to store "
                 f"vectors in namespace '{namespace}': {e}")
    raise RuntimeError(f"Vector storage failed: {e}")
🛠️ Refactor suggestion
Improve exception chaining for better error traceability.
The exception handling should preserve the original error context as recommended by the static analysis tool.
Apply this diff to improve exception chaining:
- raise RuntimeError(f"Vector storage failed: {e}")
+ raise RuntimeError(f"Vector storage failed: {e}") from e📝 Committable suggestion
try:
    index.upsert(vectors, namespace=namespace)
    logger.info(f"Successfully stored {len(vectors)} "
                f"vectors in namespace '{namespace}'")
except Exception as e:
    logger.error("Failed to store "
                 f"vectors in namespace '{namespace}': {e}")
    raise RuntimeError(f"Vector storage failed: {e}") from e
🧰 Tools
🪛 Ruff (0.11.9)
32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In new-backend/app/utils/store_vectors.py around lines 25 to 32, the exception
handling catches errors but does not preserve the original traceback. Modify the
raise statement to use "raise RuntimeError(...) from e" to chain the original
exception, which improves error traceability and debugging.
Actionable comments posted: 4
🧹 Nitpick comments (1)
frontend/app/analyze/loading/page.tsx (1)
95-102: Optimize progress animation logic.

The current progress animation increments every 100ms regardless of actual progress, which may not align with the step-based progress calculation.
Apply this diff to improve progress synchronization:
-const progressInterval = setInterval(() => {
-  setProgress((prev) => {
-    if (prev < 100) {
-      return prev + 1
-    }
-    return prev
-  })
-}, 100)
+const progressInterval = setInterval(() => {
+  setProgress((prev) => {
+    const targetProgress = Math.min((currentStep + 1) * 20, 100)
+    if (prev < targetProgress) {
+      return Math.min(prev + 2, targetProgress)
+    }
+    return prev
+  })
+}, 100)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
- frontend/app/analyze/loading/page.tsx (2 hunks)
- frontend/app/analyze/results/page.tsx (3 hunks)
- frontend/package.json (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- frontend/package.json
🔇 Additional comments (2)
frontend/app/analyze/results/page.tsx (1)
23-23: LGTM! State initialization is correct.

The new state variable for analysis data is properly initialized.
frontend/app/analyze/loading/page.tsx (1)
9-9: LGTM! Axios import is appropriate.

The axios library is properly imported for making HTTP requests.
| <div className="p-4"> | ||
| <h1 className="text-2xl font-bold mb-4">Analysis Results</h1> | ||
| <pre className="bg-black p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap"> | ||
| {JSON.stringify(analysisData, null, 2)} | ||
| </pre> | ||
| </div> |
🛠️ Refactor suggestion
Improve JSON display implementation and consider security implications.
The current implementation has several issues:
- The debug section is positioned above the header, breaking the layout
- Raw JSON display may expose sensitive information
- No handling for null/undefined data
- Poor user experience with raw JSON dump
Apply this diff to improve the implementation:
- <div className="p-4">
- <h1 className="text-2xl font-bold mb-4">Analysis Results</h1>
- <pre className="bg-black p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap">
- {JSON.stringify(analysisData, null, 2)}
- </pre>
- </div>
+ {process.env.NODE_ENV === 'development' && analysisData && (
+ <div className="p-4 mt-16">
+ <details className="mb-4">
+ <summary className="text-sm font-medium cursor-pointer">Debug: Raw Analysis Data</summary>
+ <pre className="bg-gray-100 dark:bg-gray-800 p-4 rounded text-xs overflow-x-auto whitespace-pre-wrap mt-2">
+ {JSON.stringify(analysisData, null, 2)}
+ </pre>
+ </details>
+ </div>
+ )}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 78 to 83, the JSON display
is currently placed above the header, shows raw JSON which may expose sensitive
data, lacks handling for null or undefined analysisData, and results in poor
user experience. To fix this, move the JSON display below the header, add
conditional rendering to handle null or undefined analysisData gracefully,
sanitize or selectively display only non-sensitive parts of analysisData, and
replace the raw JSON dump with a more user-friendly formatted view or summary.
const storedData = sessionStorage.getItem("analysisResult")
if (storedData) {
  setAnalysisData(JSON.parse(storedData))
} else {
  // fallback if user visits results page directly
  // maybe redirect or show error
  console.warn("No analysis result found")
}
Add error handling for JSON parsing and improve user experience.
The current implementation lacks error handling for JSON parsing, which could crash the component if the stored data is malformed.
Apply this diff to add proper error handling:
- const storedData = sessionStorage.getItem("analysisResult")
- if (storedData) {
- setAnalysisData(JSON.parse(storedData))
- } else {
- // fallback if user visits results page directly
- // maybe redirect or show error
- console.warn("No analysis result found")
- }
+ const storedData = sessionStorage.getItem("analysisResult")
+ if (storedData) {
+ try {
+ setAnalysisData(JSON.parse(storedData))
+ } catch (error) {
+ console.error("Failed to parse analysis data:", error)
+ // Handle corrupted data
+ sessionStorage.removeItem("analysisResult")
+ router.push("/analyze")
+ }
+ } else {
+ // fallback if user visits results page directly
+ console.warn("No analysis result found")
+ router.push("/analyze")
+ }

📝 Committable suggestion
const storedData = sessionStorage.getItem("analysisResult")
if (storedData) {
  try {
    setAnalysisData(JSON.parse(storedData))
  } catch (error) {
    console.error("Failed to parse analysis data:", error)
    // Handle corrupted data
    sessionStorage.removeItem("analysisResult")
    router.push("/analyze")
  }
} else {
  // fallback if user visits results page directly
  console.warn("No analysis result found")
  router.push("/analyze")
}
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 42 to 49, the JSON.parse
call on storedData lacks error handling, which can cause the component to crash
if the data is malformed. Wrap the JSON.parse call in a try-catch block to catch
parsing errors, log the error, and handle the failure gracefully by either
clearing the invalid data or showing an appropriate user message to improve user
experience.
```tsx
const res = await axios.post("http://localhost:8000/api/process", {
  url: storedUrl,
})
```
🛠️ Refactor suggestion
Add request timeout and better error handling.
The API call lacks timeout configuration and detailed error handling, which could lead to poor user experience.
Apply this diff to improve the API call:
```diff
-      const res = await axios.post("http://localhost:8000/api/process", {
-        url: storedUrl,
-      })
+      const res = await axios.post(`${apiUrl}/api/process`, {
+        url: storedUrl,
+      }, {
+        timeout: 60000, // 60 seconds timeout for analysis
+        headers: {
+          'Content-Type': 'application/json'
+        }
+      })
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
const res = await axios.post(`${apiUrl}/api/process`, {
  url: storedUrl,
}, {
  timeout: 60000, // 60 seconds timeout for analysis
  headers: {
    'Content-Type': 'application/json'
  }
})
```
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 64 to 66, the axios POST
request to the API lacks a timeout setting and detailed error handling. Add a
timeout option to the axios request configuration to prevent hanging requests,
and wrap the call in a try-catch block to handle errors gracefully. In the catch
block, provide meaningful error messages or fallback behavior to improve user
experience.
```tsx
const runAnalysis = async () => {
  const storedUrl = sessionStorage.getItem("articleUrl")
  if (storedUrl) {
    setArticleUrl(storedUrl)

    try {
      const res = await axios.post("http://localhost:8000/api/process", {
        url: storedUrl,
      })

      // Save response to sessionStorage
      sessionStorage.setItem("analysisResult", JSON.stringify(res.data))

      // optional logging
      console.log("Analysis result saved")
      console.log(res)
    } catch (err) {
      console.error("Failed to process article:", err)
      router.push("/analyze") // fallback in case of error
      return
    }

    // Progress and step simulation
    const stepInterval = setInterval(() => {
      setCurrentStep((prev) => {
        if (prev < steps.length - 1) {
          return prev + 1
        } else {
          clearInterval(stepInterval)
          setTimeout(() => {
            router.push("/analyze/results")
          }, 2000)
          return prev
        }
      })
    }, 2000)

    const progressInterval = setInterval(() => {
      setProgress((prev) => {
        if (prev < 100) {
          return prev + 1
        }
        return prev
      })
    }, 100)

    return () => {
      clearInterval(stepInterval)
      clearInterval(progressInterval)
    }
  } else {
    // Redirect back if no URL found
    router.push("/analyze")
    return
  }
}

runAnalysis()
}, [router])
```
Fix critical async function and cleanup issues.
The current implementation has several critical problems:
- The cleanup function is incorrectly placed inside the async function
- Intervals may not be properly cleaned up, causing memory leaks
- Hard-coded localhost URL won't work in production
Apply this diff to fix the async function structure and cleanup:
```diff
-useEffect(() => {
-  const runAnalysis = async () => {
+useEffect(() => {
+  let stepInterval: NodeJS.Timeout
+  let progressInterval: NodeJS.Timeout
+
+  const runAnalysis = async () => {
     const storedUrl = sessionStorage.getItem("articleUrl")
     if (storedUrl) {
       setArticleUrl(storedUrl)
       try {
-        const res = await axios.post("http://localhost:8000/api/process", {
+        const apiUrl = process.env.NEXT_PUBLIC_API_URL || "http://localhost:8000"
+        const res = await axios.post(`${apiUrl}/api/process`, {
           url: storedUrl,
         })
         // Save response to sessionStorage
         sessionStorage.setItem("analysisResult", JSON.stringify(res.data))
         // optional logging
         console.log("Analysis result saved")
-        console.log(res)
       } catch (err) {
         console.error("Failed to process article:", err)
         router.push("/analyze") // fallback in case of error
         return
       }
       // Progress and step simulation
-      const stepInterval = setInterval(() => {
+      stepInterval = setInterval(() => {
         setCurrentStep((prev) => {
           if (prev < steps.length - 1) {
             return prev + 1
           } else {
             clearInterval(stepInterval)
             setTimeout(() => {
               router.push("/analyze/results")
             }, 2000)
             return prev
           }
         })
       }, 2000)
-      const progressInterval = setInterval(() => {
+      progressInterval = setInterval(() => {
         setProgress((prev) => {
           if (prev < 100) {
             return prev + 1
           }
           return prev
         })
       }, 100)
-      return () => {
-        clearInterval(stepInterval)
-        clearInterval(progressInterval)
-      }
     } else {
       router.push("/analyze")
     }
-  }
+  }
-  runAnalysis()
-}, [router])
+  runAnalysis()
+
+  return () => {
+    if (stepInterval) clearInterval(stepInterval)
+    if (progressInterval) clearInterval(progressInterval)
+  }
+}, [router])
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
useEffect(() => {
  let stepInterval: NodeJS.Timeout
  let progressInterval: NodeJS.Timeout

  const runAnalysis = async () => {
    const storedUrl = sessionStorage.getItem("articleUrl")
    if (storedUrl) {
      setArticleUrl(storedUrl)
      try {
        const apiUrl = process.env.NEXT_PUBLIC_API_URL || "http://localhost:8000"
        const res = await axios.post(`${apiUrl}/api/process`, {
          url: storedUrl,
        })
        // Save response to sessionStorage
        sessionStorage.setItem("analysisResult", JSON.stringify(res.data))
        // optional logging
        console.log("Analysis result saved")
      } catch (err) {
        console.error("Failed to process article:", err)
        router.push("/analyze") // fallback in case of error
        return
      }
      // Progress and step simulation
      stepInterval = setInterval(() => {
        setCurrentStep((prev) => {
          if (prev < steps.length - 1) {
            return prev + 1
          } else {
            clearInterval(stepInterval)
            setTimeout(() => {
              router.push("/analyze/results")
            }, 2000)
            return prev
          }
        })
      }, 2000)
      progressInterval = setInterval(() => {
        setProgress((prev) => {
          if (prev < 100) {
            return prev + 1
          }
          return prev
        })
      }, 100)
    } else {
      router.push("/analyze")
    }
  }

  runAnalysis()

  return () => {
    if (stepInterval) clearInterval(stepInterval)
    if (progressInterval) clearInterval(progressInterval)
  }
}, [router])
```
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx lines 58 to 114, the async function
runAnalysis incorrectly includes the cleanup function inside it, causing
improper interval cleanup and potential memory leaks. Also, the API URL is
hard-coded to localhost, which is not suitable for production. To fix this, move
the interval setup and cleanup logic outside the async function into a useEffect
hook, ensure intervals are cleared in the cleanup function returned by
useEffect, and replace the hard-coded URL with a configurable environment
variable or relative path to support production deployment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- new-backend/app/modules/facts_check/web_search.py (1 hunks)
- new-backend/main.py (2 hunks)
- new-backend/start.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- new-backend/start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- new-backend/app/modules/facts_check/web_search.py
🔇 Additional comments (1)
new-backend/main.py (1)
3-3: LGTM - Clean import addition. The import follows FastAPI conventions and is necessary for the CORS middleware functionality.
```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
Security concern: Overly permissive CORS configuration.
The combination of `allow_origins=["*"]` with `allow_credentials=True` violates CORS security principles and creates potential security vulnerabilities. This configuration allows any origin to make credentialed requests to your API.
For production environments, consider this more secure configuration:
```diff
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
+    allow_origins=[
+        "http://localhost:3000",  # React dev server
+        "https://yourdomain.com", # Production frontend
+    ],
     allow_credentials=True,
-    allow_methods=["*"],
+    allow_methods=["GET", "POST", "PUT", "DELETE"],
-    allow_headers=["*"],
+    allow_headers=["Content-Type", "Authorization"],
 )
```
Alternatively, if you need wildcard origins during development, set `allow_credentials=False`:
```diff
 app.add_middleware(
     CORSMiddleware,
     allow_origins=["*"],
-    allow_credentials=True,
+    allow_credentials=False,
     allow_methods=["*"],
     allow_headers=["*"],
 )
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Option 1: explicit origin allowlist (production)
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",  # React dev server
        "https://yourdomain.com", # Production frontend
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Content-Type", "Authorization"],
)
```

```python
# Option 2: wildcard origins for development, without credentials
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=False,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
🤖 Prompt for AI Agents
In new-backend/main.py around lines 14 to 21, the CORS middleware is configured
with allow_origins=["*"] and allow_credentials=True, which is insecure. To fix
this, replace allow_origins=["*"] with a specific list of trusted origins for
production, or if wildcard origins are needed during development, set
allow_credentials=False to prevent credentialed requests from any origin.
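A common middle ground, sketched below under the assumption of a comma-separated `ALLOWED_ORIGINS` environment variable (a hypothetical name, not something this PR defines), is to drive the allowlist from configuration so dev and prod differ without code changes:

```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# ALLOWED_ORIGINS is a hypothetical env var, e.g.
# "http://localhost:3000,https://yourdomain.com"; the default suits local dev.
origins = [
    origin.strip()
    for origin in os.getenv("ALLOWED_ORIGINS", "http://localhost:3000").split(",")
    if origin.strip()
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,  # acceptable because origins is an explicit allowlist
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Content-Type", "Authorization"],
)
```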
Actionable comments posted: 1
🧹 Nitpick comments (1)
new-backend/start.sh (1)
2-2: Provide a default for `PORT` to avoid empty output
`$PORT` is unset on most local machines; printing an empty value is confusing, and FastAPI will still start on its default port. Apply the defaulting shown in the diff above.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- new-backend/main.py (2 hunks)
- new-backend/start.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- new-backend/main.py
🧰 Additional context used
🪛 Shellcheck (0.10.0)
new-backend/start.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
new-backend/start.sh (Outdated)
```bash
uv sync
echo "Render PORT is: $PORT"
uv run main.py
```
🛠️ Refactor suggestion
Add a shebang + defensive flags + exec to harden the script
Without an explicit shebang the runtime shell is unpredictable (SC2148).
Adding `set -euo pipefail` stops the build on any failure and prevents silent errors. Re-launching with `exec` ensures PID 1 receives signals correctly (e.g. in Docker).
```diff
+#!/usr/bin/env bash
+
+# Abort on error, undefined var, or failed pipeline
+set -euo pipefail
+
+# Sync Python deps
 uv sync
-echo "Render PORT is: $PORT"
-uv run main.py
+
+# Expose the port (falls back to 8000 locally)
+PORT=${PORT:-8000}
+echo "Render PORT is: $PORT"
+
+# Replace the shell with the app so signals are forwarded
+exec uv run main.py
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
#!/usr/bin/env bash

# Abort on error, undefined var, or failed pipeline
set -euo pipefail

# Sync Python deps
uv sync

# Expose the port (falls back to 8000 locally)
PORT=${PORT:-8000}
echo "Render PORT is: $PORT"

# Replace the shell with the app so signals are forwarded
exec uv run main.py
```
🧰 Tools
🪛 Shellcheck (0.10.0)
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
🤖 Prompt for AI Agents
In new-backend/start.sh at lines 1 to 3, the script lacks a shebang and
defensive flags, which can cause unpredictable behavior and silent errors. Add a
shebang line (e.g., #!/bin/bash) at the top, include `set -euo pipefail` to stop
on errors and undefined variables, and use `exec` before the command that runs
the main process to ensure proper signal handling.
Actionable comments posted: 0
♻️ Duplicate comments (2)
new-backend/start.sh (2)
2-2: Add the missing `-u` and `-o pipefail` defensive flags. Only `set -e` is present; without `-u` (undefined var) and `-o pipefail` (fail fast in pipelines), silent errors can slip through. This was pointed out in a previous review but not fully applied.

```diff
-set -e
+set -euo pipefail
```

8-9: Preface the final command with `exec` so PID 1 gets signals. `uv run` should replace the shell so the process receives SIGTERM/SIGINT directly (important in Docker/Render deploys). Same recommendation was given earlier but is still missing.

```diff
-uv run main.py
+exec uv run main.py
```
🧹 Nitpick comments (2)
new-backend/start.sh (2)
1-1: Prefer portable shebang. `#!/usr/bin/env bash` is more portable than hard-coding `/bin/bash`, which may not exist in minimal containers.

```diff
-#!/bin/bash
+#!/usr/bin/env bash
```

5-5: Avoid redundant package install on every start. Re-installing `uv` each launch slows cold-starts and may break offline images. Consider a conditional install or vendor it in the image layer.

```diff
-pip install uv
+command -v uv >/dev/null 2>&1 || pip install --no-cache-dir -q uv
```
This PR introduces the `store_and_send` node at the final stage of the LangGraph pipeline. It performs the following key tasks: it chunks `cleaned_text` for better embedding efficiency, embeds the chunks, stores them in Pinecone, and syncs the final results to the frontend.

🔧 How it Works

Input: the `state` dictionary, expected to include:

- `article_id`
- `cleaned_text`
- `keywords`
- `sentiment`
- `fact_check`
- `perspective`, `counter_perspective`, etc.

Steps: the node chunks and embeds `cleaned_text`, upserts the resulting vectors and analysis metadata keyed by `article_id`, and returns the payload the frontend renders (see the sketch below).
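For reviewers without the diff handy, here is a minimal sketch of the node's shape. The names `index` (a Pinecone Index handle) and `embed` (a batch embedding callable) are illustrative assumptions, not the exact helpers this PR ships:

```python
def store_and_send(state: dict, index, embed) -> dict:
    """Minimal sketch of the store_and_send node (names are illustrative).

    `index` is assumed to be a Pinecone Index handle and `embed` a callable
    mapping list[str] -> list[list[float]]; neither name comes from this PR.
    """
    article_id = state["article_id"]
    text = state["cleaned_text"]

    # 1. Naive fixed-size chunking so each piece fits the embedding model.
    size = 1000
    chunks = [text[i:i + size] for i in range(0, len(text), size)]

    # 2. Embed all chunks in one batch call.
    vectors = embed(chunks)

    # 3. Upsert to Pinecone, keyed by article_id so an article's chunks stay grouped.
    index.upsert(vectors=[
        (f"{article_id}-{i}", vec, {"article_id": article_id, "text": chunk})
        for i, (vec, chunk) in enumerate(zip(vectors, chunks))
    ])

    # 4. Hand the analysis fields back for the frontend to render.
    return {
        "article_id": article_id,
        "keywords": state.get("keywords"),
        "sentiment": state.get("sentiment"),
        "fact_check": state.get("fact_check"),
        "perspective": state.get("perspective"),
        "counter_perspective": state.get("counter_perspective"),
    }
```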
✅ Checklist
Summary by CodeRabbit
New Features
Improvements
Chores