Walkthrough

This change removes the legacy backend and test files, replacing them with a new modular backend architecture. It introduces new FastAPI app setup, vector store integration, fact-checking and perspective generation modules, embedding utilities, and a Docker-based deployment workflow. The frontend is updated for analysis result handling, and dependency management is migrated to pyproject.toml.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Backend API
    participant FactCheck
    participant PerspectiveGen
    participant VectorStore
    User->>Frontend: Submit article URL
    Frontend->>Backend API: POST /api/analyze {url}
    Backend API->>FactCheck: Extract and verify claims
    FactCheck->>FactCheck: Web search & LLM verification
    FactCheck-->>Backend API: Verified facts
    Backend API->>PerspectiveGen: Generate counter-perspective
    PerspectiveGen-->>Backend API: Perspective result
    Backend API->>VectorStore: Store embeddings/chunks
    VectorStore-->>Backend API: Store confirmation
    Backend API-->>Frontend: Analysis results
    Frontend-->>User: Display results
```
Actionable comments posted: 20
🔭 Outside diff range comments (2)
new-backend/pyproject.toml (1)
6-6: Relax Python version requirement to 3.11+

Most of Python 3.13's new capabilities (enhanced REPL, experimental free-threaded mode, preliminary JIT, richer tracebacks, stdlib import optimizations, etc.) are either developer-convenience features or still experimental. All core AI/ML and web-search libraries you're using are fully compatible with Python 3.11+, so you can broaden your deployment options by targeting 3.11 instead:

```diff
-requires-python = ">=3.13"
+requires-python = ">=3.11"
```

new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)

49-55: Fix typo and improve error handling.

There's a typo in the error message and the exception handling could be more specific.

```diff
     except Exception as e:
-        print(f"some error occured in generate_perspective:{e}")
+        print(f"Error occurred in generate_perspective: {e}")
         return {
             "status": "error",
             "error_from": "generate_perspective",
-            "message": f"{e}",
+            "message": str(e),
         }
```

Consider catching specific exceptions (e.g., `ValueError`, `KeyError`) for better error handling.
🧹 Nitpick comments (20)
frontend/app/analyze/results/page.tsx (1)
23-23: Consider adding type safety for analysis data.

The `analysisData` state is typed as `null` but could benefit from proper TypeScript typing based on the expected API response structure.

```diff
-const [analysisData, setAnalysisData] = useState(null)
+const [analysisData, setAnalysisData] = useState<AnalysisResult | null>(null)
```

Consider defining an interface for the expected analysis result structure.
frontend/app/analyze/loading/page.tsx (1)
95-102: Optimize progress bar animation performance.

Updating progress every 100ms may cause unnecessary re-renders.

```diff
-const progressInterval = setInterval(() => {
-  setProgress((prev) => {
-    if (prev < 100) {
-      return prev + 1
-    }
-    return prev
-  })
-}, 100)
+const progressInterval = setInterval(() => {
+  setProgress((prev) => {
+    if (prev < 100) {
+      return Math.min(prev + 2, 100) // Increment by 2 every 200ms instead
+    }
+    return prev
+  })
+}, 200)
```

This reduces the update frequency while maintaining smooth animation.
new-backend/app/modules/scraper/cleaner.py (1)
2-10: Consider pre-downloading NLTK data during Docker build instead of runtime.

The current implementation downloads NLTK corpora at module import time, which can cause delays during application startup and potential network issues in production environments.

For containerized deployments, consider downloading NLTK data during the Docker build process instead:

```dockerfile
# In Dockerfile
RUN python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt_tab')"
```

Then simplify the code to:

```diff
-try:
-    nltk.data.find('corpora/stopwords')
-    nltk.data.find('corpora/punkt_tab')
-
-except LookupError:
-    nltk.download('stopwords')
-    nltk.download('punkt_tab')
```

new-backend/app/modules/facts_check/web_search.py (1)
6-8: Consider using a more secure method for API key handling.

Storing API keys in environment variables is a good practice, but consider additional security measures for production deployments.
For enhanced security, consider:
- Using a secrets management service
- Implementing API key rotation
- Adding logging for security monitoring (without exposing the key)
```diff
 def search_with_serpapi(query, max_results=1):
     api_key = os.getenv("SERPAPI_KEY")
     if not api_key:
-        raise ValueError("SERPAPI_KEY not set in environment")
+        raise ValueError("SERPAPI_KEY not set in environment")
+
+    # Log API usage for monitoring (without exposing key)
+    print(f"Performing search with query: {query[:50]}...")
```

new-backend/.dockerignore (1)

1-2: Consider adding more comprehensive exclusions for production deployment.

The current exclusions are good, but consider adding common development and build artifacts:

```diff
 /.venv
 */.env
+*.pyc
+__pycache__/
+.git/
+.pytest_cache/
+*.log
+.DS_Store
+node_modules/
+.coverage
+htmlcov/
```

new-backend/app/modules/langgraph_nodes/sentiment.py (1)
34-34: Consider reducing temperature for more deterministic sentiment analysis.

A temperature of 0.2 might introduce unnecessary randomness for sentiment analysis, which should be deterministic.

```diff
-    temperature=0.2,
+    temperature=0.0,
```

new-backend/start.sh (2)
4-5: Consider optimizing the uv installation check.

The script installs `uv` unconditionally, which may be inefficient if it's already present. Consider checking if `uv` is available before installation.

```diff
 # Install uv if not present
-pip install uv
+if ! command -v uv &> /dev/null; then
+    echo "Installing uv..."
+    pip install uv
+fi
```

8-9: Add error handling for critical operations.

Consider adding validation to ensure the sync operation succeeds before attempting to run the application.

```diff
 # Sync environment and run app
-uv sync
-uv run main.py
+echo "Syncing dependencies..."
+uv sync || { echo "Failed to sync dependencies"; exit 1; }
+echo "Starting application..."
+uv run main.py
```

new-backend/app/utils/generate_chunk_id.py (1)
4-8: Consider collision risk with truncated hash.

The function truncates the SHA-256 hash to 15 characters, which reduces the collision resistance. While this is likely acceptable for article IDs, consider documenting this limitation or using a longer hash if uniqueness is critical.
For better collision resistance, consider using a longer hash:
- return f"article-{hashed_text[:15]}" + return f"article-{hashed_text[:32]}" # Use 32 characters for better uniquenessAlternatively, add documentation about the collision risk:
def generate_id(text: str) -> str: + """Generate a unique ID for article text using SHA-256 hash. + + Note: Hash is truncated to 15 characters. While collision risk is low, + consider using full hash for critical applications. + """new-backend/app/utils/prompt_templates.py (1)
new-backend/app/utils/prompt_templates.py (1)

3-32: Well-structured prompt template with minor enhancement suggestions.

The prompt template is well-designed with clear sections and structured output format. Consider adding guidance for edge cases where facts might be contradictory or insufficient.
Consider adding instructions for handling edge cases:
```diff
 Generate a logical and respectful *opposite perspective* to the article.
+If the verified facts contradict the article's claims, acknowledge this in your reasoning.
+If insufficient facts are available, clearly state this limitation.
 Use *step-by-step reasoning* and return your output in this JSON format:
```

new-backend/main.py (2)
28-30: Good deployment configuration with minor improvement suggestion.

The dynamic port configuration and host binding to `0.0.0.0` are appropriate for container deployment. Consider adding validation for the port value.

```diff
-    port = int(os.environ.get("PORT", 7860))
+    port = int(os.environ.get("PORT", 7860))
+    if not 1 <= port <= 65535:
+        raise ValueError(f"Invalid port number: {port}")
```
26-27: Consider adding environment validation.

While the import placement is fine, consider validating required environment variables at startup to fail fast if configuration is missing.

```diff
 if __name__ == "__main__":
     import uvicorn
     import os
+
+    # Validate required environment variables
+    required_env_vars = ["GROQ_API_KEY", "PINECONE_API_KEY"]  # Adjust based on actual requirements
+    missing_vars = [var for var in required_env_vars if not os.getenv(var)]
+    if missing_vars:
+        raise EnvironmentError(f"Missing required environment variables: {missing_vars}")
```

new-backend/Dockerfile (1)
12-24: Consider cache configuration consistency.

The cache directory is set up but the `--no-cache` flag is used during installation. This might be redundant or conflicting.

Consider either using the cache or removing the cache directory setup:

```diff
 # Option 1: Use cache
-RUN uv sync --locked --no-cache
+RUN uv sync --locked

 # Option 2: Remove cache directory if not using it
-ENV UV_CACHE_DIR=/app/.uv-cache
-RUN mkdir -p /app/.uv-cache && \
-    adduser --disabled-password --gecos "" appuser && \
-    chown -R appuser:appuser /app
+RUN adduser --disabled-password --gecos "" appuser && \
+    chown -R appuser:appuser /app
```

new-backend/app/modules/langgraph_nodes/fact_check.py (1)
14-14: Fix spelling errors in error messages.

There are typos in the error logging statements: "occured" should be "occurred".
Apply this diff to fix the spelling:
- print(f"some error occured in fact_checking:{error_message}") + print(f"some error occurred in fact_checking:{error_message}")- print(f"some error occured in fact_checking:{e}") + print(f"some error occurred in fact_checking:{e}")Also applies to: 22-22
new-backend/app/utils/fact_check_utils.py (2)
46-47: Consider explicit error handling for the final verification step.

The function returns a tuple where the second element can be `None` on success. Consider making the return type more explicit or handle potential failures in the verification step.

```diff
-    final = run_fact_verifier_sdk(search_results)
-    return final.get("verifications", []), None
+    final = run_fact_verifier_sdk(search_results)
+    if final.get("status") != "success":
+        return [], "Fact verification failed."
+    return final.get("verifications", []), None
```

40-40: Consider making the rate limiting delay configurable.

The hardcoded 5-second delay works for avoiding rate limits but could be made configurable for different environments or API providers.

```diff
-        time.sleep(5)  # ⏱️ Gentle delay to avoid DuckDuckGo ratelimit
+        time.sleep(5)  # ⏱️ Gentle delay to avoid SerpAPI ratelimit
```

Note: The comment mentions DuckDuckGo but the code uses SerpAPI.
new-backend/app/modules/langgraph_nodes/judge.py (1)
6-10: Consider increasing max_tokens for more reliable scoring.

The `max_tokens=10` limit might be too restrictive for the LLM to provide consistent scoring responses, especially if the model occasionally includes explanatory text before the score.

```diff
 groq_llm = ChatGroq(
     model="gemma2-9b-it",
     temperature=0.0,
-    max_tokens=10,
+    max_tokens=50,
 )
```

new-backend/app/modules/vector_store/chunk_rag_data.py (1)
4-4: Add type hints for better code documentation.

The function lacks type hints which would improve code maintainability and IDE support. Consider adding them based on the expected input/output types.

```diff
+from typing import List, Dict, Any, Union
+
-def chunk_rag_data(data):
+def chunk_rag_data(data: Dict[str, Any]) -> List[Dict[str, Any]]:
```

new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)
14-19: Consider using environment variables for model configuration.

The model name and temperature are hardcoded. For better flexibility across different environments, consider loading these from environment variables.

```diff
+import os
+
-my_llm = "llama-3.3-70b-versatile"
+my_llm = os.getenv("GROQ_MODEL_NAME", "llama-3.3-70b-versatile")
 llm = ChatGroq(
     model=my_llm,
-    temperature=0.7
+    temperature=float(os.getenv("GROQ_TEMPERATURE", "0.7"))
 )
```

new-backend/app/modules/facts_check/llm_processing.py (1)
22-28: Fix spacing in prompt content.

There's a missing space in the prompt text that could affect the LLM's understanding.

```diff
                 "content": (
                     "You are an assistant that extracts "
                     "verifiable factual claims from articles. "
-                    "Each claim must be short, fact-based, and"
-                    " independently verifiable through internet search. "
+                    "Each claim must be short, fact-based, and "
+                    "independently verifiable through internet search. "
                     "Only return a list of 3 clear bullet-point claims."
                 ),
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `frontend/package-lock.json` is excluded by `!**/package-lock.json`
- `new-backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (39)
- `backend/app/main.py` (0 hunks)
- `backend/app/prompts/opposite_perspective.py` (0 hunks)
- `backend/app/prompts/related_topics.py` (0 hunks)
- `backend/app/routes.py` (0 hunks)
- `backend/app/scrapers/article_scraper.py` (0 hunks)
- `backend/app/scrapers/clean_data.py` (0 hunks)
- `backend/app/services/ai_service.py` (0 hunks)
- `backend/app/services/analysis_service.py` (0 hunks)
- `backend/app/services/counter_service.py` (0 hunks)
- `backend/app/services/related_topics.py` (0 hunks)
- `backend/app/services/summarization_service.py` (0 hunks)
- `backend/app/test_perspective.py` (0 hunks)
- `backend/requirements.txt` (0 hunks)
- `frontend/app/analyze/loading/page.tsx` (2 hunks)
- `frontend/app/analyze/results/page.tsx` (3 hunks)
- `frontend/components/bias-meter.tsx` (3 hunks)
- `frontend/package.json` (2 hunks)
- `new-backend/.dockerignore` (1 hunks)
- `new-backend/Dockerfile` (1 hunks)
- `new-backend/README.md` (1 hunks)
- `new-backend/app/db/vector_store.py` (1 hunks)
- `new-backend/app/modules/facts_check/llm_processing.py` (1 hunks)
- `new-backend/app/modules/facts_check/web_search.py` (1 hunks)
- `new-backend/app/modules/langgraph_builder.py` (5 hunks)
- `new-backend/app/modules/langgraph_nodes/fact_check.py` (2 hunks)
- `new-backend/app/modules/langgraph_nodes/generate_perspective.py` (2 hunks)
- `new-backend/app/modules/langgraph_nodes/judge.py` (1 hunks)
- `new-backend/app/modules/langgraph_nodes/sentiment.py` (1 hunks)
- `new-backend/app/modules/langgraph_nodes/store_and_send.py` (1 hunks)
- `new-backend/app/modules/scraper/cleaner.py` (1 hunks)
- `new-backend/app/modules/vector_store/chunk_rag_data.py` (1 hunks)
- `new-backend/app/modules/vector_store/embed.py` (1 hunks)
- `new-backend/app/utils/fact_check_utils.py` (1 hunks)
- `new-backend/app/utils/generate_chunk_id.py` (1 hunks)
- `new-backend/app/utils/prompt_templates.py` (1 hunks)
- `new-backend/app/utils/store_vectors.py` (1 hunks)
- `new-backend/main.py` (1 hunks)
- `new-backend/pyproject.toml` (1 hunks)
- `new-backend/start.sh` (1 hunks)
💤 Files with no reviewable changes (13)
- backend/requirements.txt
- backend/app/main.py
- backend/app/prompts/related_topics.py
- backend/app/prompts/opposite_perspective.py
- backend/app/scrapers/clean_data.py
- backend/app/services/analysis_service.py
- backend/app/services/summarization_service.py
- backend/app/scrapers/article_scraper.py
- backend/app/services/related_topics.py
- backend/app/test_perspective.py
- backend/app/services/counter_service.py
- backend/app/services/ai_service.py
- backend/app/routes.py
🧰 Additional context used
🧬 Code Graph Analysis (4)
new-backend/app/modules/langgraph_nodes/fact_check.py (1)
new-backend/app/utils/fact_check_utils.py (1)
run_fact_check_pipeline(10-47)
new-backend/app/modules/langgraph_nodes/store_and_send.py (3)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
chunk_rag_data(4-73)new-backend/app/modules/vector_store/embed.py (1)
embed_chunks(7-30)new-backend/app/utils/store_vectors.py (1)
store(10-32)
new-backend/app/modules/vector_store/chunk_rag_data.py (1)
new-backend/app/utils/generate_chunk_id.py (1)
generate_id(4-8)
new-backend/app/modules/langgraph_builder.py (2)
new-backend/app/modules/langgraph_nodes/sentiment.py (1)
run_sentiment_sdk(10-53)new-backend/app/modules/langgraph_nodes/error_handler.py (1)
error_handler(3-11)
🪛 Ruff (0.11.9)
new-backend/app/utils/store_vectors.py
32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
new-backend/app/modules/langgraph_nodes/store_and_send.py
13-13: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
15-15: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
21-21: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
new-backend/app/db/vector_store.py
14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
40-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (22)
frontend/package.json (1)
41-41: Axios dependency is up to date and secure

Version `^1.10.0` is the latest stable release (June 14, 2025) and has no known security advisories. No further action is needed.

frontend/components/bias-meter.tsx (1)

1-79: LGTM! Excellent formatting improvements.

The addition of semicolons and improved JSX formatting enhances code readability and aligns with TypeScript best practices.
new-backend/README.md (1)
1-10: LGTM! Proper Hugging Face Spaces configuration.

The YAML front matter is correctly configured for Hugging Face Spaces deployment with Docker SDK, which aligns with the PR objectives for backend deployment.
new-backend/app/modules/langgraph_nodes/sentiment.py (2)
35-35: The reduced token limit is appropriate for sentiment analysis.

Reducing max_tokens to 3 makes sense since the expected output is a single word (positive/negative/neutral), which helps ensure concise responses and reduces API costs.

39-39: Good practice to normalize sentiment output.

Converting sentiment to lowercase ensures consistent output format for downstream processing.
new-backend/app/utils/generate_chunk_id.py (1)
5-6: LGTM: Good input validation.

The input validation properly checks for both empty strings and correct type, which prevents common errors.
new-backend/app/utils/prompt_templates.py (1)
21-31: Good JSON structure specification.

The JSON format specification is clear and will help ensure consistent output parsing. The reasoning steps format encourages structured thinking.
new-backend/app/utils/store_vectors.py (1)
10-28: Well-structured function with good validation and logging.

The function properly validates input, handles the Pinecone upsert operation, and provides informative logging. The structure and error handling approach are solid.
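For illustration, here is a minimal hedged sketch of a validated Pinecone upsert helper; the function and field names are assumptions and may not match the repository's actual `store()` implementation.

```python
# Hypothetical sketch of a validated upsert helper (not the repo's exact code).
def store(index, vectors: list[dict]) -> int:
    if not vectors:
        raise ValueError("No vectors provided for upsert")
    for i, v in enumerate(vectors):
        if "id" not in v or "values" not in v:
            raise ValueError(f"Vector at index {i} is missing 'id' or 'values'")
    index.upsert(vectors=vectors)  # Pinecone Index.upsert accepts a list of dicts
    print(f"Upserted {len(vectors)} vectors")  # informative logging
    return len(vectors)
```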
new-backend/Dockerfile (1)
1-31: Good security practices with non-root user and proper structure.

The Dockerfile follows security best practices by using a non-root user and properly sets up the working directory. The port configuration for Hugging Face deployment is appropriate.
new-backend/app/modules/langgraph_nodes/fact_check.py (1)
11-20: Well-integrated pipeline with proper error handling.

The integration with the fact-checking pipeline is clean and maintains proper error handling. The function correctly handles both pipeline errors and exceptions while preserving the state structure.
new-backend/app/db/vector_store.py (2)
17-34: Well-designed index management with proper constants.

The index creation logic is sound with appropriate constants and conditional creation. The serverless specification for AWS US East 1 is properly configured.

5-7: Good practice for environment variable validation.

Proper validation of required environment variables with clear error messages.
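For reference, a condensed sketch of the create-if-missing pattern described above, using the Pinecone v3+ client; the index name, dimension, and metric here are illustrative assumptions rather than the repository's actual constants.

```python
import os
from pinecone import Pinecone, ServerlessSpec

INDEX_NAME = "articles"   # assumed name
DIMENSION = 384           # matches all-MiniLM-L6-v2 embeddings

api_key = os.getenv("PINECONE_API_KEY")
if not api_key:
    raise ValueError("PINECONE_API_KEY not set in environment")

pc = Pinecone(api_key=api_key)
if INDEX_NAME not in pc.list_indexes().names():
    pc.create_index(
        name=INDEX_NAME,
        dimension=DIMENSION,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(INDEX_NAME)
```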
new-backend/app/modules/vector_store/embed.py (3)
13-18: Excellent input validation with clear error messages.

The validation logic properly checks chunk structure and provides detailed error messages including the problematic index, which aids in debugging.

20-30: Well-structured embedding and vector creation process.

The function efficiently processes text embeddings and creates properly formatted vectors for Pinecone storage. The data structure aligns well with the expected format.

4-4: Appropriate model choice for general text embeddings.

The "all-MiniLM-L6-v2" model is a good choice for general text embeddings, providing a good balance between performance and accuracy. The 384-dimensional output aligns with the vector store configuration.
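To make the batch-embedding flow concrete, here is a minimal sketch assuming the `sentence-transformers` package; the chunk and vector field names are assumptions, not the module's exact schema.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional embeddings

def embed_chunks(chunks: list[dict]) -> list[dict]:
    if not chunks:
        return []
    texts = [c["text"] for c in chunks]  # assumed to be validated upstream
    embeddings = model.encode(texts)     # one batched pass over all texts
    return [
        {"id": c["id"], "values": emb.tolist(), "metadata": c.get("metadata", {})}
        for c, emb in zip(chunks, embeddings)
    ]
```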
new-backend/app/modules/langgraph_nodes/judge.py (1)
29-44: Excellent robust response parsing and score validation.

The implementation handles multiple response formats gracefully and includes proper bounds checking for the extracted score. The regex pattern effectively extracts integer values from the response.
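The general pattern being praised looks roughly like the following sketch (regex extraction plus clamping); the function name and score range are illustrative assumptions, not the judge node's exact code.

```python
import re

def parse_score(response_text: str, low: int = 0, high: int = 10) -> int:
    match = re.search(r"\d+", response_text)  # first integer anywhere in the reply
    if not match:
        raise ValueError(f"No score found in response: {response_text!r}")
    score = int(match.group())
    return max(low, min(high, score))         # clamp to the allowed range

# e.g. parse_score("Score: 8 out of 10") -> 8
```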
new-backend/app/modules/langgraph_builder.py (2)
14-22: Excellent addition of typed state management.

The `MyState` TypedDict provides clear type definitions for all state variables, improving code maintainability and IDE support. The type annotations are comprehensive and match the expected data flow.
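For readers unfamiliar with the pattern, a typed LangGraph state is simply a `TypedDict`; the field names below are hypothetical and may not match `MyState` exactly.

```python
from typing import Any, Dict, List, TypedDict

class MyState(TypedDict, total=False):
    url: str
    article: str
    sentiment: str
    facts: List[Dict[str, Any]]
    perspective: Dict[str, Any]
    status: str
    error_from: str
    message: str
```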
56-102: Verify LangGraph dependency for conditional edges and terminal marker

We couldn't locate `StateGraph` or its `add_conditional_edges` implementation in the repo, nor import `langgraph` in the sandbox. Please confirm that your installed LangGraph version's `StateGraph` API supports:

- The `add_conditional_edges(source: str, condition: Callable)` method
- The `"__end__"` terminal marker

Typical checks:

```bash
pip show langgraph
python - <<EOF
import inspect
from langgraph.graph import StateGraph
print(inspect.signature(StateGraph.add_conditional_edges))
EOF
```

If unsupported, either revert to `set_conditional_edges` or bump your LangGraph dependency accordingly.

[new-backend/app/modules/langgraph_builder.py:56–102]

new-backend/app/modules/vector_store/chunk_rag_data.py (2)
6-32: Excellent comprehensive validation of input data.

The validation logic properly checks for required fields, validates data types, and handles both dictionary and object-based perspective data. The safety checks for perspective object attributes are particularly well-implemented.

44-68: Robust fact validation and chunk generation.

The implementation properly validates all required fact fields and generates well-structured chunks with comprehensive metadata. The enumeration approach for fact indexing is clean and maintainable.
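A rough sketch of that fact-chunking pattern follows; the field names, metadata keys, and helper signature are assumptions for illustration and may differ from `chunk_rag_data()`.

```python
import hashlib

def generate_id(text: str, prefix: str = "article") -> str:
    # SHA-256 of the text, truncated, as a stable chunk ID
    return f"{prefix}-{hashlib.sha256(text.encode('utf-8')).hexdigest()[:15]}"

def chunk_facts(facts: list[dict], article_url: str) -> list[dict]:
    chunks = []
    for i, fact in enumerate(facts):
        required = ("original_claim", "verdict", "explanation")
        if not all(k in fact for k in required):
            raise ValueError(f"Fact at index {i} is missing required fields")
        text = f"{fact['original_claim']} ({fact['verdict']}): {fact['explanation']}"
        chunks.append({
            "id": generate_id(text),
            "text": text,
            "metadata": {"type": "fact", "index": i, "source_url": article_url},
        })
    return chunks
```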
new-backend/app/modules/langgraph_nodes/generate_perspective.py (1)
35-38: Good input validation!

The validation of required state fields is well implemented with clear error messages.
new-backend/app/modules/facts_check/llm_processing.py (1)
108-110: Good markdown stripping implementation!

The regex pattern correctly handles both opening and closing markdown code blocks.
```tsx
const storedData = sessionStorage.getItem("analysisResult")
if (storedData) {
  setAnalysisData(JSON.parse(storedData))
} else {
  // fallback if user visits results page directly
  // maybe redirect or show error
  console.warn("No analysis result found")
}
```
🛠️ Refactor suggestion
Add error handling for sessionStorage parsing.
The JSON.parse operation could throw an error if the stored data is invalid.
```diff
 const storedData = sessionStorage.getItem("analysisResult")
if (storedData) {
- setAnalysisData(JSON.parse(storedData))
+ try {
+ setAnalysisData(JSON.parse(storedData))
+ } catch (error) {
+ console.error("Failed to parse analysis data:", error)
+ // Consider redirecting to analyze page or showing error message
+ }
} else {
// fallback if user visits results page directly
// maybe redirect or show error
console.warn("No analysis result found")
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
const storedData = sessionStorage.getItem("analysisResult")
if (storedData) {
  try {
    setAnalysisData(JSON.parse(storedData))
  } catch (error) {
    console.error("Failed to parse analysis data:", error)
    // Consider redirecting to analyze page or showing an error message
  }
} else {
  // fallback if user visits results page directly
  // maybe redirect or show error
  console.warn("No analysis result found")
}
```
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 42 to 49, the JSON.parse
call on sessionStorage data can throw an error if the stored data is invalid.
Wrap the JSON.parse call in a try-catch block to handle any parsing errors
gracefully. In the catch block, log the error and handle the failure case, such
as clearing the invalid data or showing an error message to the user.
| <div className="p-4"> | ||
| <h1 className="text-2xl font-bold mb-4">Analysis Results</h1> | ||
| <pre className="bg-black p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap"> | ||
| {JSON.stringify(analysisData, null, 2)} | ||
| </pre> | ||
| </div> |
🛠️ Refactor suggestion
Improve JSON display styling and user experience.
The hardcoded black background doesn't respect theme preferences and raw JSON display might not be user-friendly.
-<div className="p-4">
- <h1 className="text-2xl font-bold mb-4">Analysis Results</h1>
- <pre className="bg-black p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap">
- {JSON.stringify(analysisData, null, 2)}
- </pre>
-</div>
+{analysisData && (
+ <div className="p-4">
+ <h1 className="text-2xl font-bold mb-4">Analysis Results</h1>
+ <pre className="bg-muted p-4 rounded text-sm overflow-x-auto whitespace-pre-wrap border">
+ {JSON.stringify(analysisData, null, 2)}
+ </pre>
+ </div>
+)}
```

Consider replacing the raw JSON display with a structured, user-friendly presentation of the analysis results.
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 78 to 83, the JSON display
uses a hardcoded black background and raw JSON stringifying, which ignores theme
preferences and is not user-friendly. Replace the raw JSON <pre> block with a
structured, styled component that formats the analysisData into readable
sections or tables, and use theme-aware styling instead of a fixed black
background to improve user experience and accessibility.
```tsx
const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
  url: storedUrl,
})
```
Extract hardcoded API URL to environment configuration.
The hardcoded API endpoint should be configurable and not embedded in the code.
```diff
-const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
+const res = await axios.post(process.env.NEXT_PUBLIC_API_URL || "https://Thunder1245-perspective1.hf.space/api/process", {
   url: storedUrl,
 })
```

Add the API URL to your environment variables in .env.local:

```
NEXT_PUBLIC_API_URL=https://Thunder1245-perspective1.hf.space/api/process
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
const res = await axios.post(
  process.env.NEXT_PUBLIC_API_URL || "https://Thunder1245-perspective1.hf.space/api/process",
  {
    url: storedUrl,
  }
)
```
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 64 to 66, the API URL is
hardcoded in the axios.post call. To fix this, move the URL to an environment
variable by adding
NEXT_PUBLIC_API_URL=https://Thunder1245-perspective1.hf.space/api/process in
.env.local, then replace the hardcoded string with
process.env.NEXT_PUBLIC_API_URL in the axios.post call to make the endpoint
configurable.
```tsx
try {
  const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
    url: storedUrl,
  })

  // Save response to sessionStorage
  sessionStorage.setItem("analysisResult", JSON.stringify(res.data))

  // optional logging
  console.log("Analysis result saved")
  console.log(res)
} catch (err) {
  console.error("Failed to process article:", err)
  router.push("/analyze") // fallback in case of error
  return
}
```
🛠️ Refactor suggestion
Add user feedback during API processing.
The API call happens silently, leaving users unaware of the actual processing status.
Consider adding a loading state and user feedback:
```diff
 const [currentStep, setCurrentStep] = useState(0)
const [progress, setProgress] = useState(0)
const [articleUrl, setArticleUrl] = useState("")
+const [isProcessing, setIsProcessing] = useState(false)
+const [apiError, setApiError] = useState<string | null>(null)
const router = useRouter()
// In the runAnalysis function:
try {
+ setIsProcessing(true)
const res = await axios.post(process.env.NEXT_PUBLIC_API_URL, {
url: storedUrl,
})
+ setIsProcessing(false)
// Save response to sessionStorage
sessionStorage.setItem("analysisResult", JSON.stringify(res.data))
} catch (err) {
+ setIsProcessing(false)
+ setApiError("Failed to process article. Please try again.")
console.error("Failed to process article:", err)
 }
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 63 to 78, the API call to
process the article happens without any user feedback, leaving users unaware of
the processing status. Introduce a loading state variable to track when the API
call is in progress. Before the try block, set the loading state to true, and in
both the success and catch blocks, set it back to false. Use this loading state
to conditionally render a loading indicator or message in the UI to inform users
that processing is underway.
```tsx
    const runAnalysis = async () => {
      const storedUrl = sessionStorage.getItem("articleUrl")
      if (storedUrl) {
        setArticleUrl(storedUrl)

        try {
          const res = await axios.post("https://Thunder1245-perspective1.hf.space/api/process", {
            url: storedUrl,
          })

          // Save response to sessionStorage
          sessionStorage.setItem("analysisResult", JSON.stringify(res.data))

          // optional logging
          console.log("Analysis result saved")
          console.log(res)
        } catch (err) {
          console.error("Failed to process article:", err)
          router.push("/analyze") // fallback in case of error
          return
        }

        // Progress and step simulation
        const stepInterval = setInterval(() => {
          setCurrentStep((prev) => {
            if (prev < steps.length - 1) {
              return prev + 1
            } else {
              clearInterval(stepInterval)
              setTimeout(() => {
                router.push("/analyze/results")
              }, 2000)
              return prev
            }
          })
        }, 2000)

        const progressInterval = setInterval(() => {
          setProgress((prev) => {
            if (prev < 100) {
              return prev + 1
            }
            return prev
          })
        }, 100)

        return () => {
          clearInterval(stepInterval)
          clearInterval(progressInterval)
        }
      } else {
        // Redirect back if no URL found
        router.push("/analyze")
        return
      }
    }

    runAnalysis()
  }, [router])
```
Fix memory leak and improve async handling.
The current implementation has several issues that could cause memory leaks and race conditions.
```diff
 useEffect(() => {
- const runAnalysis = async () => {
+ let mounted = true;
+ let stepInterval: NodeJS.Timeout;
+ let progressInterval: NodeJS.Timeout;
+
+ const runAnalysis = async () => {
const storedUrl = sessionStorage.getItem("articleUrl")
if (storedUrl) {
+ if (!mounted) return;
setArticleUrl(storedUrl)
try {
const res = await axios.post(process.env.NEXT_PUBLIC_API_URL || "https://Thunder1245-perspective1.hf.space/api/process", {
url: storedUrl,
})
+ if (!mounted) return;
// Save response to sessionStorage
sessionStorage.setItem("analysisResult", JSON.stringify(res.data))
// optional logging
console.log("Analysis result saved")
console.log(res)
} catch (err) {
console.error("Failed to process article:", err)
+ if (mounted) {
router.push("/analyze") // fallback in case of error
+ }
return
}
// Progress and step simulation
- const stepInterval = setInterval(() => {
+ stepInterval = setInterval(() => {
setCurrentStep((prev) => {
if (prev < steps.length - 1) {
return prev + 1
} else {
clearInterval(stepInterval)
setTimeout(() => {
+ if (mounted) {
router.push("/analyze/results")
+ }
}, 2000)
return prev
}
})
}, 2000)
- const progressInterval = setInterval(() => {
+ progressInterval = setInterval(() => {
setProgress((prev) => {
if (prev < 100) {
return prev + 1
}
return prev
})
}, 100)
- return () => {
- clearInterval(stepInterval)
- clearInterval(progressInterval)
- }
} else {
+ if (mounted) {
router.push("/analyze")
+ }
}
}
runAnalysis()
+
+ return () => {
+ mounted = false;
+ if (stepInterval) clearInterval(stepInterval);
+ if (progressInterval) clearInterval(progressInterval);
+ }
 }, [router])
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
  useEffect(() => {
    let mounted = true
    let stepInterval: NodeJS.Timeout
    let progressInterval: NodeJS.Timeout

    const runAnalysis = async () => {
      const storedUrl = sessionStorage.getItem("articleUrl")
      if (storedUrl) {
        if (!mounted) return
        setArticleUrl(storedUrl)

        try {
          const res = await axios.post(
            process.env.NEXT_PUBLIC_API_URL || "https://Thunder1245-perspective1.hf.space/api/process",
            { url: storedUrl },
          )
          if (!mounted) return

          // Save response to sessionStorage
          sessionStorage.setItem("analysisResult", JSON.stringify(res.data))

          // optional logging
          console.log("Analysis result saved")
          console.log(res)
        } catch (err) {
          console.error("Failed to process article:", err)
          if (mounted) {
            router.push("/analyze") // fallback in case of error
          }
          return
        }

        // Progress and step simulation
        stepInterval = setInterval(() => {
          setCurrentStep((prev) => {
            if (prev < steps.length - 1) {
              return prev + 1
            } else {
              clearInterval(stepInterval)
              setTimeout(() => {
                if (mounted) {
                  router.push("/analyze/results")
                }
              }, 2000)
              return prev
            }
          })
        }, 2000)

        progressInterval = setInterval(() => {
          setProgress((prev) => {
            if (prev < 100) {
              return prev + 1
            }
            return prev
          })
        }, 100)
      } else {
        if (mounted) {
          router.push("/analyze")
        }
      }
    }

    runAnalysis()

    return () => {
      mounted = false
      if (stepInterval) clearInterval(stepInterval)
      if (progressInterval) clearInterval(progressInterval)
    }
  }, [router])
```
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 58 to 114, the runAnalysis
function sets up intervals inside an async function without proper cleanup,
causing potential memory leaks and race conditions. Refactor by moving the
interval setup outside the async function or use a React effect cleanup function
to clear intervals when the component unmounts or dependencies change. Also,
ensure async calls are properly awaited and handle component unmount scenarios
to avoid state updates on unmounted components.
```python
    try:
        chunks = chunk_rag_data(state)
    except KeyError as e:
        raise Exception(f"Missing required data field for chunking: {e}")
    except Exception as e:
        raise Exception(f"Failed to chunk data: {e}")

    try:
        vectors = embed_chunks(chunks)
        if vectors:
            print("embedding generated successfully!")
    except Exception as e:
        raise Exception(f"failed to embed chunks: {e}")
```
🛠️ Refactor suggestion
Improve exception handling to follow Python best practices.
The exception handling should use raise ... from err or raise ... from None to properly chain exceptions and provide better debugging information.
```diff
 try:
chunks = chunk_rag_data(state)
except KeyError as e:
- raise Exception(f"Missing required data field for chunking: {e}")
+ raise Exception(f"Missing required data field for chunking: {e}") from e
except Exception as e:
- raise Exception(f"Failed to chunk data: {e}")
+ raise Exception(f"Failed to chunk data: {e}") from e
try:
vectors = embed_chunks(chunks)
if vectors:
print("embedding generated successfully!")
except Exception as e:
- raise Exception(f"failed to embed chunks: {e}")
+ raise Exception(f"failed to embed chunks: {e}") from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    try:
        chunks = chunk_rag_data(state)
    except KeyError as e:
        raise Exception(f"Missing required data field for chunking: {e}") from e
    except Exception as e:
        raise Exception(f"Failed to chunk data: {e}") from e

    try:
        vectors = embed_chunks(chunks)
        if vectors:
            print("embedding generated successfully!")
    except Exception as e:
        raise Exception(f"failed to embed chunks: {e}") from e
```
🧰 Tools
🪛 Ruff (0.11.9)
13-13: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
15-15: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
21-21: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/store_and_send.py around lines 10 to
22, the exception handling currently raises new exceptions without chaining the
original ones, which loses traceback context. Update the raise statements to use
"raise Exception(...) from e" to properly chain exceptions and preserve
debugging information.
```python
    facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                           "Verdict: {f['verdict']}\nExplanation: "
                           "{f['explanation']}" for f in state["facts"]])
```
Fix f-string formatting bug.
The f-string formatting is broken. Only the first line has the f prefix, causing the variables in subsequent lines to not be interpolated.
```diff
-    facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
-                           "Verdict: {f['verdict']}\nExplanation: "
-                           "{f['explanation']}" for f in state["facts"]])
+    facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
+                           f"Verdict: {f['verdict']}\nExplanation: "
+                           f"{f['explanation']}" for f in state["facts"]])
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                           f"Verdict: {f['verdict']}\nExplanation: "
                           f"{f['explanation']}" for f in state["facts"]])
```
🤖 Prompt for AI Agents
In new-backend/app/modules/langgraph_nodes/generate_perspective.py around lines
40 to 42, the f-string formatting is incorrect because only the first line has
the f prefix, so variables in the following lines are not interpolated. Fix this
by adding the f prefix to each line inside the join list comprehension to ensure
all variables are properly formatted.
```python
client = Groq(api_key=os.getenv("GROQ_API_KEY"))
```
🛠️ Refactor suggestion
Validate API key existence.
The API key is loaded without validation. Consider checking if it exists to provide better error messages.
```diff
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise ValueError("GROQ_API_KEY environment variable is not set")
+client = Groq(api_key=api_key)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable is not set")
client = Groq(api_key=api_key)
```
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 9 to 10,
the API key for Groq client is loaded without checking if it exists. Add a
validation step after retrieving the API key from the environment variable to
verify it is not None or empty. If the key is missing, raise a clear and
descriptive error or log a message indicating the missing API key before
initializing the Groq client.
| "You are a fact-checking assistant. " | ||
| "Your job is to determine whether the given" | ||
| " claim is True, False" | ||
| "based on the provided web search evidence." | ||
| " Keep it concise and structured." |
Fix incomplete prompt text.
The prompt has spacing issues and an incomplete sentence about claim determination.
"content": (
"You are a fact-checking assistant. "
- "Your job is to determine whether the given"
- " claim is True, False"
+ "Your job is to determine whether the given "
+ "claim is True or False "
"based on the provided web search evidence."
" Keep it concise and structured."
),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "You are a fact-checking assistant. " | |
| "Your job is to determine whether the given" | |
| " claim is True, False" | |
| "based on the provided web search evidence." | |
| " Keep it concise and structured." | |
| "content": ( | |
| "You are a fact-checking assistant. " | |
| "Your job is to determine whether the given " | |
| "claim is True or False " | |
| "based on the provided web search evidence." | |
| " Keep it concise and structured." | |
| ), |
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 78 to 82,
the prompt string is missing spaces between concatenated parts and ends with an
incomplete sentence about claim determination. Fix this by adding necessary
spaces between the string segments and completing the sentence so it clearly
instructs the assistant to determine if the claim is True or False based on the
evidence, ensuring the prompt reads as a coherent, complete instruction.
```python
            try:
                parsed = json.loads(content)
            except Exception as parse_err:
                print(f"❌ LLM JSON parse error: {parse_err}")

            results_list.append(parsed)
```
Critical: Handle JSON parse errors properly.
The code continues to use parsed even if JSON parsing fails, which will cause an UnboundLocalError.
```diff
 # Try parsing the JSON response
try:
parsed = json.loads(content)
+ results_list.append(parsed)
except Exception as parse_err:
print(f"❌ LLM JSON parse error: {parse_err}")
-
- results_list.append(parsed)
+ # Skip this result or add a default error result
+ results_list.append({
+ "verdict": "Error",
+ "explanation": f"Failed to parse LLM response: {parse_err}",
+ "original_claim": claim,
+ "source_link": source
+            })
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
            # Try parsing the JSON response
            try:
                parsed = json.loads(content)
                results_list.append(parsed)
            except Exception as parse_err:
                print(f"❌ LLM JSON parse error: {parse_err}")
                # Skip this result or add a default error result
                results_list.append({
                    "verdict": "Error",
                    "explanation": f"Failed to parse LLM response: {parse_err}",
                    "original_claim": claim,
                    "source_link": source
                })
```
🤖 Prompt for AI Agents
In new-backend/app/modules/facts_check/llm_processing.py around lines 113 to
119, the variable 'parsed' is used after a JSON parsing attempt without ensuring
it was successfully assigned, leading to a potential UnboundLocalError if
parsing fails. To fix this, modify the code to handle the exception properly by
either initializing 'parsed' to a safe default value before the try block or by
skipping appending to results_list when parsing fails, ensuring 'parsed' is only
used if JSON parsing succeeds.
Actionable comments posted: 15
🧹 Nitpick comments (8)
backend/app/modules/scraper/cleaner.py (1)
2-2: Remove unused NLTK import or add justification comment.

The `nltk` library is imported but not used anywhere in the current implementation of `clean_extracted_text`. This adds unnecessary overhead to the module.

If NLTK functionality is planned for future use, consider adding a comment explaining this. Otherwise, remove the import:

```diff
-import nltk
```

backend/start.sh (1)
1-2: Prefer `set -euo pipefail` for safer Bash scripting

Adding `u` and `o pipefail` prevents silent failures from unset vars and broken pipelines.

```diff
-set -e
+set -euo pipefail
```

backend/README.md (3)

80-80: Fix comment typo: .loc → .lock

```diff
-├── uv.lock                # .loc file like package-lock.json
+├── uv.lock                # .lock file similar to npm's package-lock.json
```

59-61: Specify a language for the fenced code block to appease markdownlint

````diff
-```
-http://localhost:8000/api/
-```
+```text
+http://localhost:8000/api/
+```
````

24-24: Minor grammar: add "the"

```diff
-### 1. Clone the repo & jump into backend folder
+### 1. Clone the repo & jump into the backend folder
```

backend/app/utils/generate_chunk_id.py (1)
4-8: Consider increasing hash length to reduce collision risk.

Using only 15 characters of SHA-256 provides ~60 bits of entropy, which may lead to collisions at scale. Consider increasing the length or using the full hash.

```diff
-    return f"article-{hashed_text[:15]}"
+    return f"article-{hashed_text[:32]}"  # 128 bits of entropy
```

Additionally, consider making the prefix configurable for better reusability:

```diff
-def generate_id(text: str) -> str:
+def generate_id(text: str, prefix: str = "article") -> str:
     if not text or not isinstance(text, str):
         raise ValueError("Text must be non-empty string")
     hashed_text = hashlib.sha256(text.encode("utf-8")).hexdigest()
-    return f"article-{hashed_text[:15]}"
+    return f"{prefix}-{hashed_text[:32]}"
```

.github/workflows/deploy-backend-to-hf.yml (1)
34-37: Remove unused rsync installation.

The workflow installs rsync but doesn't use it in the subsequent steps. The file synchronization is handled through git operations instead.
Remove the unused rsync installation:
```diff
-      - name: 📦 Install rsync
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y rsync
```

backend/app/modules/facts_check/llm_processing.py (1)
110-110: Replace print statement with proper logging.

Debug print statements should use a proper logging framework instead of print().

```diff
-    print(content)
+    # Consider using logging.debug(content) instead
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (24)
- `.github/workflows/deploy-backend-to-hf.yml` (1 hunks)
- `.gitignore` (1 hunks)
- `backend/.dockerignore` (1 hunks)
- `backend/Dockerfile` (1 hunks)
- `backend/README.md` (1 hunks)
- `backend/app/db/vector_store.py` (1 hunks)
- `backend/app/modules/facts_check/llm_processing.py` (1 hunks)
- `backend/app/modules/facts_check/web_search.py` (1 hunks)
- `backend/app/modules/langgraph_builder.py` (5 hunks)
- `backend/app/modules/langgraph_nodes/fact_check.py` (2 hunks)
- `backend/app/modules/langgraph_nodes/generate_perspective.py` (1 hunks)
- `backend/app/modules/langgraph_nodes/judge.py` (1 hunks)
- `backend/app/modules/langgraph_nodes/sentiment.py` (1 hunks)
- `backend/app/modules/langgraph_nodes/store_and_send.py` (1 hunks)
- `backend/app/modules/scraper/cleaner.py` (1 hunks)
- `backend/app/modules/vector_store/chunk_rag_data.py` (1 hunks)
- `backend/app/modules/vector_store/embed.py` (1 hunks)
- `backend/app/utils/fact_check_utils.py` (1 hunks)
- `backend/app/utils/generate_chunk_id.py` (1 hunks)
- `backend/app/utils/prompt_templates.py` (1 hunks)
- `backend/app/utils/store_vectors.py` (1 hunks)
- `backend/main.py` (1 hunks)
- `backend/pyproject.toml` (1 hunks)
- `backend/start.sh` (1 hunks)
✅ Files skipped from review due to trivial changes (5)
- backend/.dockerignore
- .gitignore
- backend/app/utils/prompt_templates.py
- backend/pyproject.toml
- backend/Dockerfile
🧰 Additional context used
🧬 Code Graph Analysis (5)
backend/app/utils/fact_check_utils.py (2)
backend/app/modules/facts_check/web_search.py (1)
`search_with_serpapi` (5-28)

backend/app/modules/facts_check/llm_processing.py (2)

`run_claim_extractor_sdk` (12-57), `run_fact_verifier_sdk` (60-132)
backend/app/modules/vector_store/chunk_rag_data.py (1)
backend/app/utils/generate_chunk_id.py (1)
generate_id(4-8)
backend/app/modules/langgraph_nodes/store_and_send.py (3)
backend/app/modules/vector_store/chunk_rag_data.py (1)
`chunk_rag_data` (4-73)

backend/app/modules/vector_store/embed.py (1)

`embed_chunks` (7-30)

backend/app/utils/store_vectors.py (1)
store(10-32)
backend/app/modules/langgraph_nodes/fact_check.py (1)
backend/app/utils/fact_check_utils.py (1)
run_fact_check_pipeline(10-47)
backend/app/modules/langgraph_builder.py (2)
backend/app/modules/langgraph_nodes/sentiment.py (1)
`run_sentiment_sdk` (10-53)

backend/app/modules/langgraph_nodes/error_handler.py (1)
error_handler(3-11)
🪛 Ruff (0.11.9)
backend/app/db/vector_store.py
14-14: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
40-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
backend/app/modules/langgraph_nodes/store_and_send.py
13-13: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
15-15: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
21-21: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
backend/app/utils/store_vectors.py
32-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🪛 actionlint (1.7.7)
.github/workflows/deploy-backend-to-hf.yml
30-30: shellcheck reported issue in this script: SC2086:info:2:31: Double quote to prevent globbing and word splitting
(shellcheck)
🪛 LanguageTool
backend/README.md
[uncategorized] ~24-~24: You might be missing the article “the” here.
Context: ...rted ### 1. Clone the repo & jump into backend folder ```bash git clone https://githu...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
🪛 markdownlint-cli2 (0.17.2)
backend/README.md
59-59: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (27)
backend/app/modules/scraper/cleaner.py (1)
13-86: LGTM! Text cleaning implementation is well-structured.

The `clean_extracted_text` function is well-implemented with comprehensive boilerplate removal patterns and proper text processing logic. The function handles edge cases appropriately and maintains good readability.
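To illustrate the kind of pattern-based cleaning being described, here is a deliberately small sketch; the regexes and function shape are assumptions, and the real `clean_extracted_text` covers far more cases.

```python
import re

# A few illustrative boilerplate patterns; the actual module uses a larger set.
BOILERPLATE_PATTERNS = [
    r"(?i)subscribe to our newsletter.*",
    r"(?i)^advertisement$",
    r"(?i)cookie policy.*",
]

def clean_extracted_text(text: str) -> str:
    for pattern in BOILERPLATE_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.MULTILINE)
    # collapse the extra whitespace left behind by the removals
    return re.sub(r"\s{2,}", " ", text).strip()
```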
backend/app/modules/langgraph_nodes/sentiment.py (2)

39-39: Good improvement for consistent output formatting.

Converting sentiment to lowercase ensures consistent output regardless of API response formatting.

35-35: Validate max_tokens adequacy for sentiment outputs

Reducing `max_tokens` to 3 is fine for single-word replies, but the Groq API may include punctuation or brief variations (e.g., "Positive." or "The sentiment is positive"), which could exceed that limit. Please test against these edge cases and consider increasing to 5 tokens if needed.

• File: backend/app/modules/langgraph_nodes/sentiment.py:35

```diff
-    max_tokens=3,
+    max_tokens=5,  # allow for punctuation or slight phrasing variations
```

backend/main.py (1)

28-30: Good port configuration with environment variable support.

Using environment variables for port configuration with a sensible default is a good practice for deployment flexibility.
backend/app/modules/facts_check/web_search.py (1)
5-28: Well-implemented search function with good error handling.
The function properly validates the API key, handles search parameters correctly, and processes results with graceful fallbacks for missing keys. The implementation is clean and follows good practices.
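For readers skimming the diff, the pattern being praised looks roughly like the sketch below; the endpoint, parameter names, and response shape are placeholders rather than the actual search provider used here.

```python
# Hypothetical illustration of API-key validation plus .get()-based fallbacks.
import os

import requests


def web_search(query: str, max_results: int = 5) -> list[dict]:
    api_key = os.getenv("SEARCH_API_KEY")  # assumed variable name
    if not api_key:
        raise ValueError("SEARCH_API_KEY is not set")

    resp = requests.get(
        "https://api.example.com/search",  # placeholder endpoint
        params={"q": query, "num": max_results},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()

    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in resp.json().get("results", [])  # graceful fallbacks for missing keys
    ]
```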
backend/app/modules/langgraph_nodes/fact_check.py (3)
1-1: Good integration of the new fact-checking pipeline.
The import of the comprehensive fact-checking pipeline replaces the previous placeholder implementation, improving functionality significantly.
11-20: Improved error handling with structured responses.
The updated logic properly handles errors from the pipeline and returns structured error responses, which is better than the previous placeholder approach.
30-30: Correct integration of verification results.
The function now properly returns the verifications from the pipeline as "facts" in the state, maintaining the expected output format.
backend/app/db/vector_store.py (2)
5-7: Good API key validation.
Proper validation of the required environment variable with a clear error message.
22-34: Good index management logic.
The index creation logic properly checks for existence and creates the index with appropriate serverless configuration for AWS US East 1.
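For readers new to the serverless Pinecone client, the check-then-create flow looks roughly like this; the index name is an assumption, and the 384 dimension matches the all-MiniLM-L6-v2 embeddings used elsewhere in this PR:

```python
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "rag-chunks"  # assumed name

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # all-MiniLM-L6-v2 output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(index_name)
```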
backend/app/modules/vector_store/chunk_rag_data.py (5)
4-13: Excellent field validation.
Comprehensive validation of required fields with clear error messages. The list type check for facts is particularly good.
15-18: Smart handling of perspective data normalization.
The check for the `.dict()` method allows for flexible input types (both a plain dict and an object with a dict method).
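In practice that normalization is a one-liner along these lines (variable names assumed):

```python
# Accept either a plain dict or a Pydantic-style object exposing .dict().
perspective_data = perspective.dict() if hasattr(perspective, "dict") else perspective
```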
28-32: Good safety validation for perspective object.
The validation ensures the perspective object has the required attributes before accessing them.
44-67: Thorough fact validation and processing.
The validation of each fact's required fields and the systematic chunk creation with unique IDs is well-implemented.
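A compressed sketch of that per-fact validation and chunk construction (field names are assumptions based on the surrounding review notes):

```python
import uuid


def build_fact_chunks(facts: list[dict]) -> list[dict]:
    chunks = []
    for i, fact in enumerate(facts):
        # Validate required fields before building the chunk.
        for field in ("original_claim", "verdict", "explanation"):
            if field not in fact:
                raise ValueError(f"Fact {i} is missing required field '{field}'")
        chunks.append({
            "id": f"fact-{uuid.uuid4()}",  # unique ID per chunk
            "text": f"{fact['original_claim']} ({fact['verdict']}): {fact['explanation']}",
            "metadata": {"type": "fact", "index": i},
        })
    return chunks
```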
71-73: Appropriate error handling.
The catch-all exception handling with logging and re-raising preserves the original error while providing debugging information.
backend/app/modules/vector_store/embed.py (4)
1-4: Good model choice and initialization.
The all-MiniLM-L6-v2 model is a solid choice for general text embeddings, providing good performance with reasonable computational requirements.
9-10: Proper handling of empty input.
Early return for empty chunks prevents unnecessary processing and potential errors.
12-18: Comprehensive chunk validation.
The validation ensures each chunk is a dictionary with the required 'text' field, providing clear error messages with indices for debugging.
20-30: Efficient embedding generation and vector construction.
The batch processing approach is efficient, and the vector construction properly maps each chunk to its embedding with preserved metadata.
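For context, batched encoding with sentence-transformers and mapping the results back onto the chunks looks roughly like this (the 'text', 'id', and 'metadata' keys are assumptions):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings


def embed_chunks(chunks: list[dict]) -> list[dict]:
    if not chunks:
        return []  # early return for empty input

    texts = [chunk["text"] for chunk in chunks]
    embeddings = model.encode(texts)  # one batched call instead of per-chunk requests

    return [
        {
            "id": chunk.get("id", str(i)),
            "values": embedding.tolist(),
            "metadata": chunk.get("metadata", {}),
        }
        for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
    ]
```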
.github/workflows/deploy-backend-to-hf.yml (1)
1-59: Well-structured CI/CD workflow for HF Space deployment.
The workflow correctly triggers on backend changes, handles authentication securely, and implements proper git operations for deployment.
backend/app/utils/fact_check_utils.py (1)
26-47: Excellent error handling and rate limiting implementation.
The search loop properly handles exceptions, logs outcomes, and includes appropriate delays to prevent rate limiting. The final verification step is well-integrated.
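The loop shape being described is roughly the following; the delay, result fields, and the injected search callable are assumptions:

```python
import time
from typing import Callable


def search_claims(claims: list[str], search_fn: Callable[[str], list[dict]]) -> list[dict]:
    results = []
    for claim in claims:
        try:
            hits = search_fn(claim)
            results.append({"claim": claim, "hits": hits, "status": "success"})
            print(f"✅ Search succeeded for: {claim}")
        except Exception as exc:
            results.append({"claim": claim, "hits": [], "status": "error", "error": str(exc)})
            print(f"❌ Search failed for: {claim} ({exc})")
        time.sleep(1)  # small delay between claims to avoid rate limiting
    return results
```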
backend/app/modules/langgraph_nodes/judge.py (2)
6-10: Appropriate configuration for scoring task.
The low max_tokens (10) is perfect for a simple scoring response, and zero temperature ensures consistent outputs.
31-43: Robust response parsing with proper error handling.
The code handles multiple response formats and includes proper score validation with clamping. The regex pattern correctly extracts numeric scores.
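A compact sketch of that extract-and-clamp pattern (the 0 to 10 scale is an assumption inferred from the clamping described):

```python
import re


def parse_score(raw: str, default: int = 0) -> int:
    """Pull the first integer out of an LLM reply and clamp it to the 0-10 range."""
    match = re.search(r"\d+", raw)
    if not match:
        return default
    return max(0, min(10, int(match.group())))


# parse_score("Score: 8/10") -> 8, parse_score("I'd rate this a 15") -> 10
```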
backend/app/modules/langgraph_nodes/generate_perspective.py (1)
9-24: Excellent use of structured output and proper LLM configuration.
The Pydantic model ensures type safety, and the temperature setting (0.7) is appropriate for creative perspective generation.
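Schematically, structured output against a Pydantic model looks like this; the field names, model name, and the `with_structured_output` wiring are illustrative assumptions, not the PR's exact code:

```python
from langchain_groq import ChatGroq
from pydantic import BaseModel, Field


class Perspective(BaseModel):
    title: str = Field(description="Short headline for the counter-perspective")
    body: str = Field(description="The counter-perspective argument itself")


llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0.7)
structured_llm = llm.with_structured_output(Perspective)

result = structured_llm.invoke("Offer a counter-perspective to: remote work harms productivity.")
print(result.title)
print(result.body)
```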
backend/app/modules/langgraph_builder.py (2)
14-22: Excellent addition of typed state definition.
The TypedDict provides clear type hints for the state structure, improving code maintainability and IDE support.
56-102: Well-structured conditional logic with comprehensive error handling.
The graph correctly handles error propagation and implements a retry mechanism with scoring thresholds. The updated method calls align with the LangGraph API.
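To make the wiring concrete, here is a stripped-down LangGraph sketch with a typed state, error routing, and a score-based retry; the node names, state fields, and threshold are assumptions:

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph


class GraphState(TypedDict, total=False):
    content: str
    facts: list[dict]
    perspective: dict
    score: int
    status: str


def route_after_judge(state: GraphState) -> str:
    if state.get("status") == "error":
        return "error_handler"
    # Retry perspective generation while the judge's score is below an assumed threshold.
    return "store_and_send" if state.get("score", 0) >= 7 else "generate_perspective"


graph = StateGraph(GraphState)
graph.add_node("generate_perspective", lambda state: state)  # placeholder nodes
graph.add_node("judge", lambda state: state)
graph.add_node("store_and_send", lambda state: state)
graph.add_node("error_handler", lambda state: state)

graph.set_entry_point("generate_perspective")
graph.add_edge("generate_perspective", "judge")
graph.add_conditional_edges(
    "judge",
    route_after_judge,
    {
        "generate_perspective": "generate_perspective",
        "store_and_send": "store_and_send",
        "error_handler": "error_handler",
    },
)
graph.add_edge("store_and_send", END)
graph.add_edge("error_handler", END)
app = graph.compile()
```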
backend/app/modules/facts_check/llm_processing.py (1)
12-58: Well-structured claim extraction function.
The function has proper error handling, input validation, and returns a consistent response structure. The implementation follows good practices.
```python
try:
    nltk.data.find('corpora/stopwords')
    nltk.data.find('corpora/punkt_tab')

except LookupError:
    nltk.download('stopwords')
    nltk.download('punkt_tab')
```
🛠️ Refactor suggestion
Avoid downloading NLTK data during module import.
Downloading NLTK data during module import can cause several issues:
- Slow import times
- Potential failures in deployment environments with limited network access
- Unnecessary downloads when NLTK functionality isn't used
Consider one of these alternatives:
Option 1: Remove the NLTK downloads entirely (if not needed):
```diff
-try:
-    nltk.data.find('corpora/stopwords')
-    nltk.data.find('corpora/punkt_tab')
-
-except LookupError:
-    nltk.download('stopwords')
-    nltk.download('punkt_tab')
```

Option 2: Lazy loading (if NLTK will be used):
```diff
-try:
-    nltk.data.find('corpora/stopwords')
-    nltk.data.find('corpora/punkt_tab')
-
-except LookupError:
-    nltk.download('stopwords')
-    nltk.download('punkt_tab')
+def _ensure_nltk_data():
+    """Ensure NLTK data is available when needed."""
+    try:
+        nltk.data.find('corpora/stopwords')
+        nltk.data.find('corpora/punkt_tab')
+    except LookupError:
+        nltk.download('stopwords')
+        nltk.download('punkt_tab')
```

Then call `_ensure_nltk_data()` only when NLTK functionality is actually needed.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Remove the top-level try/except block and instead define a helper:
def _ensure_nltk_data():
    """Ensure NLTK data is available when needed."""
    try:
        nltk.data.find('corpora/stopwords')
        nltk.data.find('corpora/punkt_tab')
    except LookupError:
        nltk.download('stopwords')
        nltk.download('punkt_tab')

# …later, in whatever function actually uses NLTK:
# _ensure_nltk_data()
# tokens = nltk.word_tokenize(text)
# filtered = [w for w in tokens if w not in nltk.corpus.stopwords.words('english')]
```
🤖 Prompt for AI Agents
In backend/app/modules/scraper/cleaner.py around lines 4 to 10, avoid
downloading NLTK data during module import as it slows imports and can fail in
restricted environments. Refactor by moving the NLTK data check and download
logic into a separate function, for example, _ensure_nltk_data(), and call this
function only when NLTK functionality is actually needed during runtime instead
of at import time.
```bash
uv sync
uv run main.py
```
🛠️ Refactor suggestion
Use exec so the app becomes PID 1 and receives signals
Without exec, the shell stays PID 1 in a container, preventing proper signal forwarding (e.g., SIGTERM on docker stop).
```diff
-uv run main.py
+exec uv run main.py
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
uv sync
exec uv run main.py
```
🤖 Prompt for AI Agents
In backend/start.sh at lines 8 to 9, the commands launching the app should use
`exec` to replace the shell process with the app process. Modify the lines to
prefix the commands with `exec` so that the app becomes PID 1 and can properly
receive and handle signals like SIGTERM, ensuring correct behavior in container
environments.
```bash
set -e

# Install uv if not present
pip install uv
```
🛠️ Refactor suggestion
Install uv only when absent and pin the version
Unconditionally invoking pip install uv adds unnecessary overhead on every start and may pull a newer, untested version. Guard the install and pin a version for reproducibility.
```diff
-# Install uv if not present
-pip install uv
+# Install uv if not present
+if ! command -v uv >/dev/null 2>&1; then
+  pip install --no-cache-dir --upgrade 'uv>=0.100.0'
+fi
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/start.sh at line 5, the script unconditionally runs 'pip install uv',
causing unnecessary overhead and potential version inconsistencies. Modify the
script to first check if 'uv' is already installed, and only run 'pip install
uv==<specific_version>' if it is absent. Replace <specific_version> with a fixed
version number to ensure reproducibility.
```bash
cd new-backend
```
Correct directory name in the quick-start step
The repo places this README under backend/, not new-backend/. The current command will fail.
```diff
-cd new-backend
+cd backend
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
cd backend
```
🤖 Prompt for AI Agents
In backend/README.md around lines 28 to 29, the quick-start step uses the
incorrect directory name 'new-backend'. Change the directory name in the command
from 'new-backend' to 'backend' to match the actual folder where the README and
backend code reside.
```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
CORS configuration is too permissive for production.
Allowing all origins (allow_origins=["*"]) with credentials enabled poses security risks. Consider restricting origins based on environment.
```diff
+import os
+
+# Configure CORS origins based on environment
+origins = ["*"] if os.getenv("ENVIRONMENT") == "development" else [
+    "https://yourdomain.com",
+    "https://www.yourdomain.com"
+]
+
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
+    allow_origins=origins,
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
import os

# Configure CORS origins based on environment
origins = ["*"] if os.getenv("ENVIRONMENT") == "development" else [
    "https://yourdomain.com",
    "https://www.yourdomain.com"
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
🤖 Prompt for AI Agents
In backend/main.py around lines 14 to 20, the CORS middleware configuration is
too permissive by allowing all origins with credentials enabled, which is a
security risk in production. Modify the allow_origins parameter to restrict it
to a specific list of trusted origins based on the environment (e.g.,
development vs production). Ensure that allow_credentials remains True only if
the origins are explicitly specified and trusted.
```python
if state.get("status") != "success":
    print("❌ Claim extraction failed.")
    return [], "Claim extraction failed."
```
Fix the status check logic.
The code checks state.get("status") but should check result.get("status") since result contains the response from the claim extraction SDK.
Apply this diff to fix the status check:
```diff
-    if state.get("status") != "success":
+    if result.get("status") != "success":
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if result.get("status") != "success":
    print("❌ Claim extraction failed.")
    return [], "Claim extraction failed."
```
🤖 Prompt for AI Agents
In backend/app/utils/fact_check_utils.py around lines 13 to 15, the status check
incorrectly uses state.get("status") instead of result.get("status"). Update the
condition to check result.get("status") to correctly verify the claim extraction
SDK response status and handle failures properly.
```python
facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                       "Verdict: {f['verdict']}\nExplanation: "
                       "{f['explanation']}" for f in state["facts"]])
```
Fix the string formatting syntax.
The string formatting is incorrect: only the first literal has the f prefix, so the placeholders in the continuation strings are emitted literally instead of being interpolated.
Apply this diff to fix the string formatting:
```diff
-    facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
-                           "Verdict: {f['verdict']}\nExplanation: "
-                           "{f['explanation']}" for f in state["facts"]])
+    facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
+                           f"Verdict: {f['verdict']}\nExplanation: "
+                           f"{f['explanation']}" for f in state["facts"]])
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
facts_str = "\n".join([f"Claim: {f['original_claim']}\n"
                       f"Verdict: {f['verdict']}\nExplanation: "
                       f"{f['explanation']}" for f in state["facts"]])
```
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/generate_perspective.py around lines 40
to 42, the string formatting uses f-strings but the variable references inside
the strings are not properly enclosed in curly braces. Fix this by ensuring the
entire string is an f-string and all variable references are wrapped in curly
braces within the string literals.
```python
load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))
```
Add API key validation before initializing Groq client.
The Groq client is initialized without checking if the API key exists. This could lead to runtime errors if the environment variable is not set.
Consider adding validation:
```diff
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise ValueError("GROQ_API_KEY environment variable is not set")
+client = Groq(api_key=api_key)
```

🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py at line 9, the Groq client
is initialized directly with the API key from the environment without checking
if the key exists. Add a validation step before initializing the client to check
if the GROQ_API_KEY environment variable is set; if it is missing, raise an
appropriate error or handle it gracefully to prevent runtime failures.
```python
try:
    parsed = json.loads(content)
except Exception as parse_err:
    print(f"❌ LLM JSON parse error: {parse_err}")

results_list.append(parsed)
```
Fix critical error: undefined variable usage after JSON parsing failure.
If JSON parsing fails, parsed remains undefined but is still appended to results_list on line 118, which will raise a NameError.
Apply this fix to handle parsing errors properly:
```diff
 # Try parsing the JSON response
 try:
     parsed = json.loads(content)
+    results_list.append(parsed)
 except Exception as parse_err:
     print(f"❌ LLM JSON parse error: {parse_err}")
-
-results_list.append(parsed)
+    # Skip this result or add error placeholder
+    results_list.append({
+        "verdict": "Error",
+        "explanation": f"Failed to parse LLM response: {parse_err}",
+        "original_claim": claim,
+        "source_link": source
+    })
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Try parsing the JSON response
try:
    parsed = json.loads(content)
    results_list.append(parsed)
except Exception as parse_err:
    print(f"❌ LLM JSON parse error: {parse_err}")
    # Skip this result or add error placeholder
    results_list.append({
        "verdict": "Error",
        "explanation": f"Failed to parse LLM response: {parse_err}",
        "original_claim": claim,
        "source_link": source
    })
```
🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py around lines 113 to 119,
the variable 'parsed' is appended to 'results_list' even if JSON parsing fails,
causing a NameError. To fix this, ensure that 'parsed' is only appended if JSON
parsing succeeds by moving the append statement inside the try block or adding a
conditional check that 'parsed' is defined before appending.
```python
return {
    "claim": claim,
    "verifications": results_list,
    "status": "success",
}
```
Fix variable scope issue: claim is undefined in return statement.
The claim variable on line 121 references the last value from the loop iteration, not a function-level claim. This appears to be incorrect logic.
The return structure should probably not include a single claim when processing multiple claims. Consider:
```diff
 return {
-    "claim": claim,
     "verifications": results_list,
     "status": "success",
 }
```

Or if you need to return all claims, extract them properly:
```diff
 return {
-    "claim": claim,
+    "claims": [result.get("claim") for result in search_results],
     "verifications": results_list,
     "status": "success",
 }
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/app/modules/facts_check/llm_processing.py around lines 120 to 124,
the variable 'claim' used in the return statement is undefined or incorrectly
scoped as it refers to the last loop iteration variable rather than a
function-level claim. To fix this, remove the single 'claim' from the return
dictionary or replace it with a properly collected list of all claims processed.
Ensure the return structure accurately reflects the data processed, either by
returning all claims as a list or omitting the claim field if not applicable.
to resolve merge conflicts
Tasks Done:
- Added Dockerfile and .dockerignore for backend deployment.
- Created a Hugging Face Space for backend deployment and configured it.
- Deployed the backend (URL: https://thunder1245-perspective-backend.hf.space/api/).
- GitHub Actions workflow to deploy the backend to the Hugging Face Space on each push to the main branch.
- Tested the GitHub Actions workflow locally using act.

Summary by CodeRabbit

New Features

Bug Fixes

Style

Chores