# Feat/ RAG chat endpoint + Pinecone metadata fix #113
Conversation
**Walkthrough**: This update introduces new backend modules for bias detection, text embedding, retrieval-augmented generation (RAG) data retrieval, and LLM-based processing. It adds two new API endpoints (`/bias` and `/chat`).
**Sequence Diagram(s)**

Bias Detection Endpoint Flow

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Backend as Backend (/bias endpoint)
    participant Scraper as Scraper Pipeline
    participant Bias as Bias Detection (check_bias.py)
    User->>Frontend: Submit URL for bias analysis
    Frontend->>Backend: POST /bias { url }
    Backend->>Scraper: Scrape content from URL
    Scraper-->>Backend: Return article text
    Backend->>Bias: check_bias(article text)
    Bias-->>Backend: Return bias score
    Backend-->>Frontend: Respond with bias score
    Frontend-->>User: Display bias score
```
Chat Endpoint Flow

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Backend as Backend (/chat endpoint)
    participant RAG as RAG (get_rag_data.py)
    participant LLM as LLM (llm_processing.py)
    User->>Frontend: Submit chat message
    Frontend->>Backend: POST /chat { message }
    Backend->>RAG: search_pinecone(message)
    RAG-->>Backend: Return relevant docs
    Backend->>LLM: ask_llm(message, docs)
    LLM-->>Backend: Return answer
    Backend-->>Frontend: Respond with answer
    Frontend-->>User: Display answer
```
**Estimated code review effort**: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 13
🔭 Outside diff range comments (1)
backend/app/modules/bias_detection/check_bias.py (1)
**11-58: Add type hints and a docstring to `check_bias`**

The `/bias` route already uses `asyncio.to_thread` to offload the synchronous call (see line 30 in `backend/app/routes/routes.py`). To improve readability and maintainability, update the signature in `backend/app/modules/bias_detection/check_bias.py` to:

```python
def check_bias(text: str) -> dict:
    """
    Calculate a bias score for the provided text using the language model.

    Args:
        text (str): Cleaned article text to be scored.

    Returns:
        dict: {
            "bias_score": str,   # number between "0" and "100"
            "status": "success" | "error",
            "error_from": str,   # only present on error
        }
    """
    ...
```

Ensure the docstring clearly explains parameters, return value structure, and error cases.
🧹 Nitpick comments (7)
backend/app/modules/vector_store/embed.py (1)
**31-31: No-op change; consider centralizing the embedder to avoid duplicate model loads.**

Functionality unchanged. However, this project now instantiates SentenceTransformer in both `vector_store/embed.py` and `chat/embed_query.py`. Loading the same model twice wastes memory and startup time.

Refactor suggestion:

- Export the singleton `embedder` from one module (e.g., `app.modules.vector_store.embed`) and import it where needed (e.g., `app.modules.chat.embed_query`) to ensure a single model instance.

backend/app/modules/bias_detection/check_bias.py (2)
**16-17: Fix validation message and tighten the guard.**

The error mentions 'cleaned_text' but the function parameter is 'text'. Align the message.

```diff
-    if not text:
-        raise ValueError("Missing or empty 'cleaned_text'")
+    if not text or not str(text).strip():
+        raise ValueError("Missing or empty 'text'")
```
**19-42: Make the prompt deterministic and concise for numeric-only output.**

Lower max_tokens and set temperature=0 for consistent numeric output. Clarify instructions to return digits only.

```diff
 chat_completion = client.chat.completions.create(
     messages=[
         {
             "role": "system",
             "content": (
-                "You are an assistant that checks "
-                "if given article is biased and give"
-                "score to each based on biasness where 0 is lowest bias and 100 is highest bias"
-                "Only return a number between 0 to 100 base on bias."
-                "only return Number No Text"
+                "You are an assistant that scores article bias from 0 (lowest) to 100 (highest). "
+                "Respond with digits only: a single integer 0-100. No text, no symbols."
             ),
         },
         {
             "role": "user",
             "content": (
-                "Give bias score to the following article "
+                "Give a bias score (0-100) for this article:\n\n"
                 f"\n\n{text}"
             ),
         },
     ],
     model="gemma2-9b-it",
-    temperature=0.3,
-    max_tokens=512,
+    temperature=0,
+    max_tokens=8,
 )
```

frontend/app/analyze/page.tsx (1)
**224-226: Use the same validator for example URLs.**

Instead of forcing validity to true, reuse `validateUrl` to keep behavior consistent if examples change.

```diff
-    setUrl(exampleUrl);
-    setIsValidUrl(true);
+    setUrl(exampleUrl);
+    validateUrl(exampleUrl);
```

backend/app/modules/chat/llm_processing.py (1)
**10-13: Filter out empty metadata to keep the prompt concise**

`build_context` joins even empty strings, producing long runs of blank lines and wasting tokens. Skip items that lack both `explanation` and `reasoning`.

```diff
-return "\n".join(
-    f"{m['metadata'].get('explanation') or m['metadata'].get('reasoning', '')}"
-    for m in docs
-)
+return "\n".join(
+    val for m in docs
+    if (val := (m["metadata"].get("explanation") or m["metadata"].get("reasoning")))
+)
```

backend/app/routes/routes.py (1)
**44-49: Blocking I/O inside async endpoint**

`search_pinecone` and `ask_llm` are synchronous & network-bound; calling them directly blocks the event loop. Wrap them in `asyncio.to_thread` or migrate to async clients.
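A minimal sketch of the offloading pattern, assuming the FastAPI route shape used elsewhere in this PR; the `ChatQuery` model and module paths follow names referenced in this review, not verified source:

```python
import asyncio

from fastapi import APIRouter
from pydantic import BaseModel

from app.modules.chat.get_rag_data import search_pinecone
from app.modules.chat.llm_processing import ask_llm

router = APIRouter()


class ChatQuery(BaseModel):
    message: str


@router.post("/chat")
async def chat(query: ChatQuery):
    # Offload the synchronous, network-bound helpers to worker threads so
    # the event loop stays free to serve other requests.
    docs = await asyncio.to_thread(search_pinecone, query.message)
    answer = await asyncio.to_thread(ask_llm, query.message, docs)
    return {"answer": answer}
```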
frontend/app/analyze/results/page.tsx (1)

**54-57: Convert bias score to number**

`bias_score` arrives as a string; `BiasMeter` expects a number.

```tsx
setBiasScore(Number(JSON.parse(storedBiasScore).bias_score))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- backend/app/modules/bias_detection/check_bias.py (1 hunks)
- backend/app/modules/chat/embed_query.py (1 hunks)
- backend/app/modules/chat/get_rag_data.py (1 hunks)
- backend/app/modules/chat/llm_processing.py (1 hunks)
- backend/app/modules/vector_store/embed.py (1 hunks)
- backend/app/routes/routes.py (2 hunks)
- frontend/app/analyze/loading/page.tsx (8 hunks)
- frontend/app/analyze/page.tsx (7 hunks)
- frontend/app/analyze/results/page.tsx (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
frontend/app/analyze/page.tsx (1)
- frontend/app/analyze/results/page.tsx (1)
  - AnalyzePage (29-269)

frontend/app/analyze/results/page.tsx (1)
- frontend/components/bias-meter.tsx (1)
  - BiasMeter (14-78)

backend/app/routes/routes.py (4)
- backend/app/modules/bias_detection/check_bias.py (1)
  - check_bias (11-57)
- backend/app/modules/chat/get_rag_data.py (1)
  - search_pinecone (12-31)
- backend/app/modules/chat/llm_processing.py (1)
  - ask_llm (15-35)
- backend/app/modules/pipeline.py (2)
  - run_scraper_pipeline (12-28)
  - run_langgraph_workflow (31-34)

backend/app/modules/chat/get_rag_data.py (1)
- backend/app/modules/chat/embed_query.py (1)
  - embed_query (6-10)
🔇 Additional comments (2)
backend/app/modules/chat/get_rag_data.py (1)
**1-31: Verify Pinecone client version & index dimension**

- Pin an explicit Pinecone SDK version in your dependency file (`requirements.txt`, `pyproject.toml`, etc.) that matches the API usage (v3 if you're using the new object-based responses).
- Confirm your Pinecone index `"perspective"` is created with `dimension=384` to match the `all-MiniLM-L6-v2` embedding output; a mismatch will cause `index.query()` to fail.
backend/app/routes/routes.py (1)

**27-33: Incorrect argument passing breaks bias_detection**

`asyncio.to_thread(run_scraper_pipeline, (request.url))` sends a tuple instead of a string, so the scraper receives `("https://…",)` and fails. After that `check_bias` gets the whole dict, whereas it expects the cleaned text.

```diff
-content = await asyncio.to_thread(run_scraper_pipeline, (request.url))
-bias_score = await asyncio.to_thread(check_bias, (content))
+scraped = await asyncio.to_thread(run_scraper_pipeline, request.url)
+bias_score = await asyncio.to_thread(
+    check_bias, scraped["cleaned_text"]
+)
```

Likely an incorrect or invalid review comment: `(request.url)` without a trailing comma is an ordinary parenthesized expression, not a tuple, so the original call already passes the bare string.
```python
load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))
```
🛠️ Refactor suggestion
Fail fast if GROQ_API_KEY is missing and use a named parameter.
Avoid constructing the client with a missing/None key. Use a named argument and validate the env var.
```diff
-load_dotenv()
-
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+load_dotenv()
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise RuntimeError("GROQ_API_KEY is not set")
+client = Groq(api_key=api_key)
```
🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py at line 8, the Groq client
is created using an environment variable without validation and without using a
named parameter. First, check if the GROQ_API_KEY environment variable is set
and raise an error or exit immediately if it is missing. Then, instantiate the
Groq client using the api_key named parameter explicitly with the validated key.
```python
print(text)
print(json.dumps(text))
```
Remove PII logging of full article text.
Printing raw article content (and its JSON) to stdout is a PII/data-leak risk and noisy in production logs. Gate behind a debug logger or remove.
```diff
-    print(text)
-    print(json.dumps(text))
+    # Consider using a structured logger at DEBUG level if needed:
+    # logger.debug("check_bias called with text length=%d", len(text or ""))
```
🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py around lines 13 to 15, the
code prints the full article text and its JSON representation directly, which
risks exposing PII and cluttering production logs. Remove these print statements
or replace them with debug-level logging that can be enabled or disabled via
configuration to avoid leaking sensitive data in production environments.
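A minimal sketch of the logging alternative, using the standard library `logging` module; the logger name here is illustrative:

```python
import logging

logger = logging.getLogger("app.bias_detection")


def check_bias(text: str) -> dict:
    # Log only derived, non-sensitive data (e.g. length), never the raw article.
    logger.debug("check_bias called with text length=%d", len(text or ""))
    ...
```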
```python
bias_score = chat_completion.choices[0].message.content.strip()

return {
    "bias_score": bias_score,
    "status": "success",
}
```
🛠️ Refactor suggestion
Parse and validate numeric output; return a number type.
Currently the function returns a string and trusts the model to output only a number. Parse robustly, clamp to [0,100], and return an int/float to simplify frontend usage.
```diff
-    bias_score = chat_completion.choices[0].message.content.strip()
-
-    return {
-        "bias_score": bias_score,
-        "status": "success",
-    }
+    raw = chat_completion.choices[0].message.content.strip()
+    # Extract first integer/float from the response
+    import re
+    m = re.search(r"(\d{1,3})(?:\.\d+)?", raw)
+    if not m:
+        raise ValueError(f"Model did not return a numeric score: {raw!r}")
+    score = int(m.group(1))
+    # Enforce bounds
+    score = max(0, min(100, score))
+    return {
+        "bias_score": score,
+        "status": "success",
+    }
```

Note: Add `import re` at the top of the file.
🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py around lines 44 to 49, the
function currently returns the bias_score as a string directly from the model
output. To fix this, import the 're' module at the top of the file, then parse
the bias_score string to extract a numeric value using a regular expression,
convert it to a float or int, clamp it within the range 0 to 100, and return
this numeric value instead of the raw string. This ensures robust validation and
simplifies frontend handling.
```python
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")


def embed_query(query: str):

    embeddings = embedder.encode(query).tolist()

    return embeddings
```
🛠️ Refactor suggestion
Reuse the existing embedder; add types and input validation.
Avoid loading the same model twice. Import the singleton embedder and ensure we return a List[float].
```diff
-from sentence_transformers import SentenceTransformer
-
-embedder = SentenceTransformer("all-MiniLM-L6-v2")
-
-
-def embed_query(query: str):
-
-    embeddings = embedder.encode(query).tolist()
-
-    return embeddings
+from typing import List
+from app.modules.vector_store.embed import embedder
+
+def embed_query(query: str) -> List[float]:
+    if not query or not query.strip():
+        raise ValueError("query must be a non-empty string")
+    embedding = embedder.encode(query).tolist()
+    # Optionally: normalize if index uses cosine similarity without normalized vectors
+    return embedding
```
🤖 Prompt for AI Agents
In backend/app/modules/chat/embed_query.py lines 1 to 10, avoid loading the
SentenceTransformer model again by importing the existing singleton embedder
instead of creating a new one. Add type annotations to the embed_query function
to specify it returns a List[float]. Also, add input validation to ensure the
query parameter is a non-empty string before encoding.
```python
pc = Pinecone(os.getenv("PINECONE_API_KEY"))
index = pc.Index("perspective")
```
🛠️ Refactor suggestion
Harden Pinecone client/index initialization and make names configurable.
Validate the API key, use named arg, and allow index/namespace via env for portability.
```diff
-load_dotenv()
-
-pc = Pinecone(os.getenv("PINECONE_API_KEY"))
-index = pc.Index("perspective")
+load_dotenv()
+api_key = os.getenv("PINECONE_API_KEY")
+if not api_key:
+    raise RuntimeError("PINECONE_API_KEY is not set")
+pc = Pinecone(api_key=api_key)
+index_name = os.getenv("PINECONE_INDEX_NAME", "perspective")
+namespace = os.getenv("PINECONE_NAMESPACE", "default")
+index = pc.Index(index_name)
```
+index = pc.Index(index_name)🤖 Prompt for AI Agents
In backend/app/modules/chat/get_rag_data.py around lines 8 to 10, the Pinecone
client and index initialization lacks validation and configurability. Fix this
by first validating that the PINECONE_API_KEY environment variable is set and
raise an error if missing. Use named arguments when initializing the Pinecone
client. Also, make the index name and namespace configurable by reading them
from environment variables with sensible defaults to improve portability.
```diff
 article_text = await asyncio.to_thread(run_scraper_pipeline, (request.url))
 print(json.dumps(article_text, indent=2))
-data = run_langgraph_workflow(article_text)
+data = await asyncio.to_thread(run_langgraph_workflow, (article_text))
 return data
```
🛠️ Refactor suggestion
Same tuple issue in /process endpoint
`run_scraper_pipeline` and `run_langgraph_workflow` receive tuples instead of their expected args. Fix as above.
🤖 Prompt for AI Agents
In backend/app/routes/routes.py around lines 37 to 40, the functions
run_scraper_pipeline and run_langgraph_workflow are incorrectly called with
single-element tuples due to extra parentheses around their arguments. Remove
the parentheses around the arguments so that the functions receive the expected
single argument instead of a tuple. For example, change calls from
asyncio.to_thread(run_scraper_pipeline, (request.url)) to
asyncio.to_thread(run_scraper_pipeline, request.url).
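Applied to `/process`, the corrected calls would look roughly like the sketch below; the `ProcessRequest` model is hypothetical, while the helper imports follow the module paths listed in this review:

```python
import asyncio

from fastapi import APIRouter
from pydantic import BaseModel

from app.modules.pipeline import run_langgraph_workflow, run_scraper_pipeline

router = APIRouter()


class ProcessRequest(BaseModel):  # hypothetical request model
    url: str


@router.post("/process")
async def process(request: ProcessRequest):
    # Pass the bare URL string and article payload, no wrapping parentheses.
    article_text = await asyncio.to_thread(run_scraper_pipeline, request.url)
    data = await asyncio.to_thread(run_langgraph_workflow, article_text)
    return data
```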
```tsx
const [processRes, biasRes] = await Promise.all([
  axios.post(
    "https://Thunder1245-perspective-backend.hf.space/api/process",
    {
      url: storedUrl,
    }
  ),
  axios.post(
    "http://Thunder1245-perspective-backend.hf.space/api/bias",
    {
      url: storedUrl,
    }
  ),
]);
```
🛠️ Refactor suggestion
Avoid mixed-protocol & hard-coded backend URLs
`axios.post` calls use both `https://` and `http://` for the same host. When this page is served over HTTPS the plain-HTTP request will be blocked by the browser (mixed content). Expose the base URL via an env variable (e.g. `process.env.NEXT_PUBLIC_API_BASE`) and always use HTTPS.
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 72 to 85, the axios.post
calls use mixed protocols (https and http) for the same backend host, causing
potential mixed-content browser blocking. To fix this, replace the hard-coded
URLs with a single base URL stored in an environment variable like
process.env.NEXT_PUBLIC_API_BASE, ensure it uses HTTPS, and prepend this base
URL to the API endpoints in both axios.post calls.
```tsx
// Progress and step simulation
const stepInterval = setInterval(() => {
  setCurrentStep((prev) => {
    if (prev < steps.length - 1) {
      return prev + 1;
    } else {
      clearInterval(stepInterval);
      setTimeout(() => {
        router.push("/analyze/results");
      }, 2000);
      return prev;
    }
  });
}, 2000);

runAnalysis()
}, [router])
const progressInterval = setInterval(() => {
  setProgress((prev) => {
    if (prev < 100) {
      return prev + 1;
    }
    return prev;
  });
}, 100);

return () => {
  clearInterval(stepInterval);
  clearInterval(progressInterval);
};
} else {
```
Cleanup function is never registered
The return () => { … } block is inside runAnalysis, not the useEffect body, so React never receives the cleanup callback.
Both intervals will therefore keep running after unmount -> memory leak & state updates on an unmounted component.
Move the cleanup to the top-level of useEffect:
```diff
-useEffect(() => {
-  const runAnalysis = async () => { … }
-  runAnalysis();
-}, [router]);
+useEffect(() => {
+  const runAnalysis = async () => { … }
+  runAnalysis();
+  return () => { // <- proper cleanup
+    clearInterval(stepInterval);
+    clearInterval(progressInterval);
+  };
+}, [router]);
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 110 to 138, the cleanup
function that clears the intervals is currently returned inside the runAnalysis
function instead of the top-level useEffect body, causing the cleanup not to be
registered by React. To fix this, move the return cleanup function that clears
stepInterval and progressInterval outside of runAnalysis and place it directly
in the useEffect hook so React can properly call it on unmount and prevent
memory leaks.
```diff
 const validateUrl = (inputUrl: string) => {
   try {
-    new URL(inputUrl)
-    setIsValidUrl(true)
+    new URL(inputUrl);
+    setIsValidUrl(true);
   } catch {
-    setIsValidUrl(false)
+    setIsValidUrl(false);
   }
-}
+};
```
🛠️ Refactor suggestion
Restrict URL schemes to http/https in validation.
new URL(inputUrl) will accept non-web schemes (e.g., javascript:). Only allow http and https to prevent passing unsafe schemes downstream.
```diff
-  const validateUrl = (inputUrl: string) => {
-    try {
-      new URL(inputUrl);
-      setIsValidUrl(true);
-    } catch {
-      setIsValidUrl(false);
-    }
-  };
+  const validateUrl = (inputUrl: string) => {
+    try {
+      const parsed = new URL(inputUrl);
+      const isHttp = parsed.protocol === "http:" || parsed.protocol === "https:";
+      setIsValidUrl(isHttp);
+    } catch {
+      setIsValidUrl(false);
+    }
+  };
```

🤖 Prompt for AI Agents
In frontend/app/analyze/page.tsx around lines 38 to 46, the URL validation
currently accepts any scheme, including unsafe ones like "javascript:". Modify
the validateUrl function to parse the URL and explicitly check that the scheme
(protocol) is either "http:" or "https:". If the scheme is not one of these, set
isValidUrl to false; otherwise, set it to true.
```tsx
const res = await axios.post("http://Thunder1245-perspective-backend.hf.space/api/chat", {
  message: message
});
const data = res.data;
```
🛠️ Refactor suggestion
Hard-coded HTTP endpoint & missing error handling
Same mixed-content risk as loading page and no try/catch.
Wrap the request:
```tsx
try {
  const { data } = await axios.post(`${process.env.NEXT_PUBLIC_API_BASE}/chat`, { message });
  setMessages([...newMessages, { role: "assistant", content: data.answer }]);
} catch (e) {
  console.error(e);
}
```
In frontend/app/analyze/results/page.tsx around lines 90 to 93, the axios POST
request uses a hard-coded HTTP endpoint and lacks error handling. Replace the
hard-coded URL with a dynamic one using process.env.NEXT_PUBLIC_API_BASE, and
wrap the axios call in a try/catch block to handle errors gracefully by logging
them. Also, update the state with the assistant's response inside the try block.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
.github/workflows/deploy-backend-to-hf.yml (1)
**67-69: Fix YAML syntax error in run step (colon in plain scalar).**

The actionlint/YAMLlint error at line 68 is due to a colon in an unquoted plain scalar. Use a block scalar for the run command.

```diff
-      - name: ✅ Done
-        run: echo "Backend deployed to Hugging Face Space: https://huggingface.co/spaces/${HF_USER}/${HF_REPO}"
+      - name: ✅ Done
+        run: |
+          echo "Backend deployed to Hugging Face Space: https://huggingface.co/spaces/${HF_USER}/${HF_REPO}"
```
🧹 Nitpick comments (2)
.github/workflows/deploy-backend-to-hf.yml (2)
**13-17: Avoid hardcoding repo identity; prefer centrally managed config.**

HF_USER and HF_REPO are set here but cloning later hardcodes the username/repo. Use these env vars everywhere (and consider using repository/organization Variables or Secrets) to prevent drift.

**44-49: Stronger shell safety flags.**

Use `-euo pipefail` for better robustness and early failure on unset vars and pipeline errors.

```diff
-set -e
+set -euo pipefail
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/workflows/deploy-backend-to-hf.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/deploy-backend-to-hf.yml
68-68: could not parse as YAML: yaml: line 68: mapping values are not allowed in this context
(syntax-check)
🪛 YAMLlint (1.37.1)
.github/workflows/deploy-backend-to-hf.yml
[error] 68-68: syntax error: mapping values are not allowed here
(syntax)
🔇 Additional comments (3)
.github/workflows/deploy-backend-to-hf.yml (3)

**6-8: Double-check trigger paths to avoid missed deploys.**

Currently only changes under backend/** trigger this workflow. If deploy-affecting files live elsewhere (e.g., Space config, Dockerfile, requirements in a different dir, or this workflow), those changes won't deploy unless there's also a backend change. Consider broadening paths.

**24-29: Good guard for missing secrets.**

Early exit on missing HF_TOKEN is clear and helpful. LGTM.

**59-66: Commit/push flow looks fine.**

Configuring bot identity and handling "no changes" is good. LGTM.
```diff
 - name: 📂 Prepare Space repo (clone)
   env:
     HF_TOKEN: ${{ secrets.HF_TOKEN }}
   run: |
-    rm -rf space-backend
+    rm -rf space-backend || true
+    # clone using token in URL (this authenticates the clone)
     git clone https://Thunder1245:${HF_TOKEN}@huggingface.co/spaces/Thunder1245/perspective-backend.git space-backend
```
🛠️ Refactor suggestion
Use env vars in clone URL and avoid duplicating username; quote the URL.
The clone command hardcodes the username/repo and embeds the token in the command. Prefer env vars to prevent drift and quote the URL to avoid parsing issues.
Apply this diff:
```diff
-      - name: 📂 Prepare Space repo (clone)
+      - name: 📂 Prepare Space repo (clone)
         env:
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
-          rm -rf space-backend || true
-          # clone using token in URL (this authenticates the clone)
-          git clone https://Thunder1245:${HF_TOKEN}@huggingface.co/spaces/Thunder1245/perspective-backend.git space-backend
+          rm -rf space-backend || true
+          # clone using token in URL (this authenticates the clone)
+          git clone "https://${HF_USER}:${HF_TOKEN}@huggingface.co/spaces/${HF_USER}/${HF_REPO}.git" space-backend
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In .github/workflows/deploy-backend-to-hf.yml around lines 31 to 38, the git
clone URL hardcodes the username and embeds the token directly, which risks
drift and parsing errors. Refactor the clone command to use environment
variables for both username and token, and quote the entire URL string to
prevent shell parsing issues. This ensures credentials are managed securely and
the command is more maintainable.
```diff
 # Remove tracked files while preserving .git and config (ignore failure)
 git rm -r . || true
 cd ..

-# Copy new backend files in
+# Copy backend files into the cloned space directory
 cp -R backend/. space-backend/
```
🛠️ Refactor suggestion
Ensure repo is fully synchronized (handle untracked files); simplify with rsync.
git rm -r . won’t remove untracked files and the cp step doesn’t delete removed files. Use rsync --delete to mirror backend/ into the Space repo (excluding .git).
```diff
-  # Remove tracked files while preserving .git and config (ignore failure)
-  git rm -r . || true
-  cd ..
-
-  # Copy backend files into the cloned space directory
-  cp -R backend/. space-backend/
+  # Mirror backend files into the cloned space directory (preserve .git)
+  rsync -av --delete --exclude '.git' backend/ space-backend/
```
+ rsync -av --delete --exclude '.git' backend/ space-backend/🤖 Prompt for AI Agents
In .github/workflows/deploy-backend-to-hf.yml around lines 52 to 58, the current
approach using 'git rm -r .' and 'cp -R backend/. space-backend/' does not
handle untracked files and does not fully synchronize the backend directory with
the space-backend directory. Replace these commands with a single 'rsync'
command using the '--delete' option to mirror the backend directory into
space-backend, excluding the .git directory, ensuring all files are synchronized
and removed files are deleted.
Actionable comments posted: 2
🧹 Nitpick comments (1)
README.md (1)
**165-165: Path consistency: earlier bullet still references `/new-backend`**

This step correctly uses `cd backend`, but the earlier setup instructions say "add .env file in `/new-backend` directory." Update that path to `/backend` to avoid confusion.

- Please confirm the actual backend root where `main.py` resides and where `.env` is read from (e.g., via python-dotenv/Starlette settings) so the README points to the correct location.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- README.md (2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
README.md
6-6: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
7-7: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
9-9: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
14-14: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
18-18: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
24-24: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
25-25: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
28-28: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
29-29: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
30-30: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 2
(MD005, list-indent)
```markdown
- [Perspective-AI](#perspective-ai)
- [Table of Contents](#table-of-contents)
- [System Overview](#system-overview)
- [High-Level Concept](#high-level-concept)
- [Architecture Components](#architecture-components)
- [1. Frontend Layer](#1-frontend-layer)
- [3. Core Backend](#3-core-backend)
- [4. AI \& NLP Integration](#4-ai--nlp-integration)
- [5. Data Storage](#5-data-storage)
- [Technical Stack](#technical-stack)
- [Frontend Technologies](#frontend-technologies)
- [Backend Technologies](#backend-technologies)
- [I Integration](#i-integration)
- [Core Features](#core-features)
- [1. Counter-Perspective Generation](#1-counter-perspective-generation)
- [2. Reasoned Thinking](#2-reasoned-thinking)
- [3. Updated Facts](#3-updated-facts)
- [4. Seamless Integration](#4-seamless-integration)
- [5. Real-Time Analysis](#5-real-time-analysis)
- [Data Flow \& Security](#data-flow--security)
- [Setup \& Deployment](#setup--deployment)
- [Frontend Setup](#frontend-setup)
- [Backend Setup](#backend-setup)
- [Architecture Diagram](#architecture-diagram)
- [Expected Outcomes](#expected-outcomes)
- [Required Skills](#required-skills)
```
💡 Verification agent
🧩 Analysis chain
Fix TOC indentation and remove the stray "I Integration" item to satisfy markdownlint and avoid broken anchors
Current list indentation is inconsistent (MD005/MD007), and the "I Integration" entry looks like a typo/duplicate of "AI & NLP Integration". Apply the following TOC cleanup:
```diff
-  - [Perspective-AI](#perspective-ai)
-  - [Table of Contents](#table-of-contents)
-  - [System Overview](#system-overview)
-  - [High-Level Concept](#high-level-concept)
-  - [Architecture Components](#architecture-components)
-  - [1. Frontend Layer](#1-frontend-layer)
-  - [3. Core Backend](#3-core-backend)
-  - [4. AI \& NLP Integration](#4-ai--nlp-integration)
-  - [5. Data Storage](#5-data-storage)
-  - [Technical Stack](#technical-stack)
-  - [Frontend Technologies](#frontend-technologies)
-  - [Backend Technologies](#backend-technologies)
-  - [I Integration](#i-integration)
-  - [Core Features](#core-features)
-  - [1. Counter-Perspective Generation](#1-counter-perspective-generation)
-  - [2. Reasoned Thinking](#2-reasoned-thinking)
-  - [3. Updated Facts](#3-updated-facts)
-  - [4. Seamless Integration](#4-seamless-integration)
-  - [5. Real-Time Analysis](#5-real-time-analysis)
-  - [Data Flow \& Security](#data-flow--security)
-  - [Setup \& Deployment](#setup--deployment)
-  - [Frontend Setup](#frontend-setup)
-  - [Backend Setup](#backend-setup)
-  - [Architecture Diagram](#architecture-diagram)
-  - [Expected Outcomes](#expected-outcomes)
-  - [Required Skills](#required-skills)
+  - [Perspective-AI](#perspective-ai)
+  - [Table of Contents](#table-of-contents)
+  - [System Overview](#system-overview)
+  - [High-Level Concept](#high-level-concept)
+  - [Architecture Components](#architecture-components)
+  - [1. Frontend Layer](#1-frontend-layer)
+  - [3. Core Backend](#3-core-backend)
+  - [4. AI & NLP Integration](#4-ai--nlp-integration)
+  - [5. Data Storage](#5-data-storage)
+  - [Technical Stack](#technical-stack)
+  - [Frontend Technologies](#frontend-technologies)
+  - [Backend Technologies](#backend-technologies)
+  - [Core Features](#core-features)
+  - [1. Counter-Perspective Generation](#1-counter-perspective-generation)
+  - [2. Reasoned Thinking](#2-reasoned-thinking)
+  - [3. Updated Facts](#3-updated-facts)
+  - [4. Seamless Integration](#4-seamless-integration)
+  - [5. Real-Time Analysis](#5-real-time-analysis)
+  - [Data Flow & Security](#data-flow--security)
+  - [Setup & Deployment](#setup--deployment)
+  - [Frontend Setup](#frontend-setup)
+  - [Backend Setup](#backend-setup)
+  - [Architecture Diagram](#architecture-diagram)
+  - [Expected Outcomes](#expected-outcomes)
+  - [Required Skills](#required-skills)
```

Notes:

- If you intend to keep numbered subsection labels ("1.", "3.", "4.", "5."), consider renumbering the corresponding headings to avoid the missing "2." gap across the document. Otherwise, drop numbers from both headings and TOC for simplicity.

Please confirm whether "I Integration" was intended to be "AI Integration" (it appears redundant with "AI & NLP Integration"). If you want it retained, update both the TOC entry and the corresponding section heading for consistency.
🏁 Script executed:

```bash
#!/bin/bash
# Check for stray "I Integration" entry
grep -n "I Integration" README.md
# Print first 30 lines of README.md to inspect TOC indentation
sed -n '1,30p' README.md
```

Length of output: 1443
**Remove stray "I Integration" entry and normalize TOC indentation**

- Delete the duplicate TOC item and its section:
  - Line 17: remove `- [I Integration](#i-integration)`
  - Line 75: remove the `### I Integration` heading
- Normalize the list indentation in the Table of Contents:
  - Top-level items start at col 1 (`- […]`)
  - Second-level items are indented two spaces (`  - […]`)
  - Third-level items are indented four spaces (`    - […]`)
- (Optional) Renumber or drop the "1. …", "3. …", "4. …", "5. …" labels under Architecture Components to avoid the missing "2." gap.

Suggested diff:

```diff
--- a/README.md
+++ b/README.md
@@ -17,1 +17,0 @@
-  - [I Integration](#i-integration)
@@ -75,1 +74,0 @@
-### I Integration
```
🤖 Prompt for AI Agents
In README.md from lines 5 to 30, remove the stray "I Integration" entry at line 17 and also delete the corresponding "### I Integration" heading at line 75. Normalize the Table of Contents indentation so that top-level items start at column 1 with "- ", second-level items are indented by two spaces, and third-level items by four spaces. Optionally, renumber or remove the numbered labels under Architecture Components to fix the missing "2." gap for consistent numbering.
```
GROQ_API_KEY= <groq_api_key>
PINECONE_API_KEY = <your_pinecone_API_KEY>
PORT = 8000
SEARCH_KEY = <your_Google_custom_search_engine_API_key>
```
💡 Verification agent
🧩 Analysis chain
Standardize .env variable formatting and naming for clarity
Make the keys consistent (no spaces around =, consistent placeholders). This also prevents copy-paste pitfalls.
```diff
-GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
+GROQ_API_KEY=<GROQ_API_KEY>
+PINECONE_API_KEY=<PINECONE_API_KEY>
+PORT=8000
+SEARCH_KEY=<GOOGLE_CSE_API_KEY>
```

- If HuggingFace is no longer required for this PR's RAG/chat flow, remove the earlier "Get HuggingFace Access Token" step or clarify when it's needed.
- If additional vars are required for Pinecone (e.g., PINECONE_INDEX, PINECONE_ENVIRONMENT) or GROQ model names, add them here for completeness.
🏁 Script executed:

```bash
#!/bin/bash
# Search for Pinecone usage in codebase
grep -R "pinecone" -n .
# Search for HuggingFace related usage in codebase and README
grep -R "HuggingFace" -n .
grep -R "get_huggingface_access_token" -n .
```

Length of output: 3242

🏁 Script executed:

```bash
#!/bin/bash
# Check which environment variables are used for Pinecone initialization
grep -R "os.getenv" -n backend/app/db/vector_store.py
# Preview the beginning of the file to see how Pinecone is configured
sed -n '1,200p' backend/app/db/vector_store.py
```

Length of output: 1289

🏁 Script executed:

```bash
#!/bin/bash
# Search for GROQ usage in the codebase
grep -R "GROQ" -n .
```

Length of output: 469

🏁 Script executed:

```bash
#!/bin/bash
# List environment variables used in get_rag_data.py
grep -R "os.getenv" -n backend/app/modules/chat/get_rag_data.py
# Show the beginning of the file for context
sed -n '1,200p' backend/app/modules/chat/get_rag_data.py
```

Length of output: 834

🏁 Script executed:

```bash
#!/bin/bash
# Check if SEARCH_KEY or related env vars are used in the codebase
grep -R "SEARCH_KEY" -n .
grep -R "GOOGLE_CSE" -n .
```

Length of output: 213
Standardize .env entries in README.md for consistency and accuracy
Please update the README’s environment-variable block (lines 157–161) to remove extra spaces around “=”, align placeholder names with the actual keys used in code, and drop the unused HuggingFace step. No additional Pinecone or GROQ variables are required beyond PINECONE_API_KEY and GROQ_API_KEY, and the Google CSE key is read from SEARCH_KEY in web_search.py.
File: README.md, lines 157–161
```diff
-GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
+GROQ_API_KEY=<GROQ_API_KEY>
+PINECONE_API_KEY=<PINECONE_API_KEY>
+PORT=8000
+SEARCH_KEY=<GOOGLE_CSE_API_KEY>
```

• Remove the Get HuggingFace Access Token step (no HuggingFace calls remain in this PR).
• Confirm that only PINECONE_API_KEY, GROQ_API_KEY, and SEARCH_KEY are required by the code.
🤖 Prompt for AI Agents
In README.md lines 157 to 161, remove the extra spaces around the equal signs in
the environment variable assignments to standardize formatting, ensure the
placeholder names exactly match the keys used in the code (PINECONE_API_KEY,
GROQ_API_KEY, SEARCH_KEY), and delete the entire HuggingFace access token step
since it is no longer used. Confirm that only these three environment variables
are listed and no additional Pinecone or GROQ variables are included.
Actionable comments posted: 0
♻️ Duplicate comments (3)
README.md (3)
**75-79: Remove the duplicate "I Integration" section**

This section is redundant with "AI & NLP Integration" and referenced by the stray TOC entry.

```diff
-### I Integration
-
-- **LLM**: OpenAI, Other NLP Models
-- **processing**: Context-Aware
-
```
**5-30: Fix TOC indentation, remove stray "I Integration", and unescape ampersand**

- Normalize list indentation to satisfy markdownlint (MD005/MD007).
- Remove duplicate/typo entry "I Integration".
- Use "AI & NLP Integration" (no backslash escape in link text).

Apply the TOC cleanup from the earlier review comment above.
**157-160: Standardize .env entries (formatting and placeholders)**

Remove spaces around "=", align placeholder names, and match keys used in code (see the diff in the earlier review comment above).
🧹 Nitpick comments (2)
README.md (2)
**154-156: Align backend directory naming**

Instructions say to add .env under `/new-backend` but run steps cd into `backend`. Make them consistent.

Proposed fix:

```diff
-  - add .env file in `/new-backend`directory.
+  - add a .env file in the `/backend` directory.
```

Also applies to: 165-165
**238-238: Minor grammar/spacing nit**

Double space after colon.

```diff
-- **Frontend Development**:  Experience with Next.js and modern UI frameworks.
+- **Frontend Development**: Experience with Next.js and modern UI frameworks.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- README.md (4 hunks)
🔇 Additional comments (1)
README.md (1)
**145-148: HuggingFace step appears obsolete for this PR**

README still instructs to get a HuggingFace token, but the PR's RAG/chat flow uses Groq + Pinecone and no HF calls.

```diff
-*Get HuggingFace Access Token:*
-- Go to HuggingFace website and create new access token.
-- copy that token
-
```

Likely an incorrect or invalid review comment.
Summary
Fixes the Pinecone search return shape and implements the `/api/chat` RAG flow: frontend → backend (embed → Pinecone → build context → LLM) → frontend. Prevents `KeyError: 'metadata'`, returns usable context to the LLM, and wires the Next.js frontend chat to call the FastAPI endpoint.

Files changed (high level)

- `app/modules/chat/pinecone_search.py` — preserve `metadata` in results
- `app/modules/chat/llm_processing.py` — robust `build_context()` + `ask_llm`
- `app/routes/routes.py` — `/chat` POST endpoint (uses Pydantic model)
- `frontend/(AnalyzePage).tsx` — `handleSendMessage` using `axios.post` and `res.data`

What I changed
- `search_pinecone()` now returns full `metadata` instead of only `text`.
- `build_context()` safely extracts `explanation` or `reasoning` from `metadata` and falls back to other fields.
- The `/chat` endpoint validates the request body with a Pydantic model (`ChatQuery`) and returns `{"answer": ...}`.
- `handleSendMessage` uses `axios.post("/api/chat", { message })` and reads `res.data`. Removed fetch-style options.
Edge cases handled

- `build_context()` ignores empty entries.
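A minimal sketch of the metadata-preserving search and context building described above; the result shapes and field names are assumed from this summary, not verified against the Pinecone client version in use:

```python
import os

from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("perspective")
embedder = SentenceTransformer("all-MiniLM-L6-v2")


def search_pinecone(message: str, top_k: int = 5) -> list[dict]:
    # Keep the full metadata on each match; dropping it is what caused the
    # KeyError: 'metadata' downstream.
    vector = embedder.encode(message).tolist()
    res = index.query(vector=vector, top_k=top_k, include_metadata=True)
    return [
        {"score": match.score, "metadata": dict(match.metadata or {})}
        for match in res.matches
    ]


def build_context(docs: list[dict]) -> str:
    # Prefer 'explanation', fall back to 'reasoning'; skip empty entries.
    parts = []
    for doc in docs:
        meta = doc.get("metadata") or {}
        value = meta.get("explanation") or meta.get("reasoning")
        if value:
            parts.append(value)
    return "\n".join(parts)
```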
Diagram

```mermaid
flowchart LR
    subgraph Frontend
        U[User enters question] --> F[AnalyzePage.handleSendMessage]
        F --> BackendRequest[POST /api/chat - message]
    end
    subgraph Backend
        BackendRequest --> E[embed_query message]
        E --> P[Pinecone index query]
        P --> M[results with metadata]
        M --> C[build_context results]
        C --> LLM[LLM - OpenAI]
        LLM --> A[answer JSON]
        A --> BackendResponse[prepare response]
    end
    BackendResponse -->|200 JSON| Frontend
    Frontend -->|display assistant| ChatWindow[Chat window]
```
New Features
Improvements
Style
Documentation