42 changes: 26 additions & 16 deletions .github/workflows/deploy-backend-to-hf.yml
@@ -3,57 +3,67 @@ name: 🚀 Deploy Backend to HF Space
on:
  push:
    branches:
      - main # or your primary branch
      - main
    paths:
      - "backend/**" # only trigger when anything under backend/ changes
      - "backend/**"

jobs:
  deploy:
    runs-on: ubuntu-latest
    # set your HF username here (or replace with a secret if you prefer)
    env:
      HF_USER: Thunder1245
      HF_REPO: perspective-backend

    steps:
      - name: 👉 Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: 🔒 Install HF CLI
        run: pip install huggingface_hub

      - name: 🔑 HF login
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: huggingface-cli login --token "$HF_TOKEN"
      - name: 🔍 Ensure HF_TOKEN is set
        run: |
          if [ -z "${{ secrets.HF_TOKEN }}" ]; then
            echo "ERROR: HF_TOKEN secret is not set. Add it in repository secrets: Settings → Secrets & variables → Actions."
            exit 1
          fi

      - name: 📂 Prepare Space repo
      - name: 📂 Prepare Space repo (clone)
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          rm -rf space-backend
          rm -rf space-backend || true
          # clone using token in URL (this authenticates the clone)
          git clone https://Thunder1245:${HF_TOKEN}@huggingface.co/spaces/Thunder1245/perspective-backend.git space-backend

Comment on lines +31 to 38

🛠️ Refactor suggestion

Use env vars in clone URL and avoid duplicating username; quote the URL.

The clone command hardcodes the username/repo and embeds the token in the command. Prefer env vars to prevent drift and quote the URL to avoid parsing issues.

Apply this diff:

-      - name: 📂 Prepare Space repo (clone)
+      - name: 📂 Prepare Space repo (clone)
         env:
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
-          rm -rf space-backend || true
-          # clone using token in URL (this authenticates the clone)
-          git clone https://Thunder1245:${HF_TOKEN}@huggingface.co/spaces/Thunder1245/perspective-backend.git space-backend
+          rm -rf space-backend || true
+          # clone using token in URL (this authenticates the clone)
+          git clone "https://${HF_USER}:${HF_TOKEN}@huggingface.co/spaces/${HF_USER}/${HF_REPO}.git" space-backend

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In .github/workflows/deploy-backend-to-hf.yml around lines 31 to 38, the git
clone URL hardcodes the username and embeds the token directly, which risks
drift and parsing errors. Refactor the clone command to use environment
variables for both username and token, and quote the entire URL string to
prevent shell parsing issues. This ensures credentials are managed securely and
the command is more maintainable.

      - name: 📦 Install rsync
        run: |
          sudo apt-get update
          sudo apt-get install -y rsync

      - name: 📤 Sync backend code
      - name: 📤 Sync backend code to Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          set -e

          cd space-backend

          # Only remove tracked files (preserve .git and config)
          # Remove tracked files while preserving .git and config (ignore failure)
          git rm -r . || true
          cd ..

          # Copy new backend files in
          # Copy backend files into the cloned space directory
          cp -R backend/. space-backend/

Comment on lines +52 to 58

🛠️ Refactor suggestion

Ensure repo is fully synchronized (handle untracked files); simplify with rsync.

git rm -r . won’t remove untracked files and the cp step doesn’t delete removed files. Use rsync --delete to mirror backend/ into the Space repo (excluding .git).

-          # Remove tracked files while preserving .git and config (ignore failure)
-          git rm -r . || true
-          cd ..
-
-          # Copy backend files into the cloned space directory
-          cp -R backend/. space-backend/
+          # Mirror backend files into the cloned space directory (preserve .git)
+          rsync -av --delete --exclude '.git' backend/ space-backend/
🤖 Prompt for AI Agents
In .github/workflows/deploy-backend-to-hf.yml around lines 52 to 58, the current
approach using 'git rm -r .' and 'cp -R backend/. space-backend/' does not
handle untracked files and does not fully synchronize the backend directory with
the space-backend directory. Replace these commands with a single 'rsync'
command using the '--delete' option to mirror the backend directory into
space-backend, excluding the .git directory, ensuring all files are synchronized
and removed files are deleted.

          # Push new code to HF Space
          # Commit & push
          cd space-backend
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add --all
          git commit -m "Autodeploy backend: ${{ github.sha }}" || echo "No changes to commit"
          git commit -m "Auto-deploy backend: ${{ github.sha }}" || echo "No changes to commit"
          git push origin main

      - name: ✅ Done
        run: |
          echo "Backend deployed to Hugging Face Space: https://huggingface.co/spaces/${HF_USER}/${HF_REPO}"
47 changes: 35 additions & 12 deletions README.md
@@ -2,15 +2,32 @@
![Perspective banner](frontend/public/perspective_banner.jpg)

### Table of Contents
- [System Overview](#system-overview)
- [Architecture Components](#architecture-components)
- [Technical Stack](#technical-stack)
- [Core Features](#core-features)
- [Data Flow & Security](#data-flow--security)
- [Setup & Deployment](#setup--deployment)
- [Detailed Architecture Diagram](#detailed-architecture-diagram)
- [Expected Outcomes](#expected-outcomes)
- [Required Skills](#required-skills)
- [Perspective-AI](#perspective-ai)
    - [Table of Contents](#table-of-contents)
  - [System Overview](#system-overview)
    - [High-Level Concept](#high-level-concept)
  - [Architecture Components](#architecture-components)
    - [1. Frontend Layer](#1-frontend-layer)
    - [3. Core Backend](#3-core-backend)
    - [4. AI \& NLP Integration](#4-ai--nlp-integration)
    - [5. Data Storage](#5-data-storage)
  - [Technical Stack](#technical-stack)
    - [Frontend Technologies](#frontend-technologies)
    - [Backend Technologies](#backend-technologies)
    - [I Integration](#i-integration)
  - [Core Features](#core-features)
    - [1. Counter-Perspective Generation](#1-counter-perspective-generation)
    - [2. Reasoned Thinking](#2-reasoned-thinking)
    - [3. Updated Facts](#3-updated-facts)
    - [4. Seamless Integration](#4-seamless-integration)
    - [5. Real-Time Analysis](#5-real-time-analysis)
  - [Data Flow \& Security](#data-flow--security)
  - [Setup \& Deployment](#setup--deployment)
    - [Frontend Setup](#frontend-setup)
    - [Backend Setup](#backend-setup)
  - [Architecture Diagram](#architecture-diagram)
  - [Expected Outcomes](#expected-outcomes)
  - [Required Skills](#required-skills)
Comment on lines +5 to +30

💡 Verification agent

🧩 Analysis chain

Fix TOC indentation and remove the stray "I Integration" item to satisfy markdownlint and avoid broken anchors

Current list indentation is inconsistent (MD005/MD007), and the "I Integration" entry looks like a typo/duplicate of "AI & NLP Integration". Apply the following TOC cleanup:

- - [Perspective-AI](#perspective-ai)
-     - [Table of Contents](#table-of-contents)
-   - [System Overview](#system-overview)
-     - [High-Level Concept](#high-level-concept)
-   - [Architecture Components](#architecture-components)
-     - [1. Frontend Layer](#1-frontend-layer)
-     - [3. Core Backend](#3-core-backend)
-     - [4. AI \& NLP Integration](#4-ai--nlp-integration)
-     - [5. Data Storage](#5-data-storage)
-   - [Technical Stack](#technical-stack)
-     - [Frontend Technologies](#frontend-technologies)
-     - [Backend Technologies](#backend-technologies)
-     - [I Integration](#i-integration)
-   - [Core Features](#core-features)
-     - [1. Counter-Perspective Generation](#1-counter-perspective-generation)
-     - [2. Reasoned Thinking](#2-reasoned-thinking)
-     - [3. Updated Facts](#3-updated-facts)
-     - [4. Seamless Integration](#4-seamless-integration)
-     - [5. Real-Time Analysis](#5-real-time-analysis)
-   - [Data Flow \& Security](#data-flow--security)
-   - [Setup \& Deployment](#setup--deployment)
-     - [Frontend Setup](#frontend-setup)
-     - [Backend Setup](#backend-setup)
-   - [Architecture Diagram](#architecture-diagram)
-   - [Expected Outcomes](#expected-outcomes)
-   - [Required Skills](#required-skills)
+ - [Perspective-AI](#perspective-ai)
+   - [Table of Contents](#table-of-contents)
+   - [System Overview](#system-overview)
+     - [High-Level Concept](#high-level-concept)
+   - [Architecture Components](#architecture-components)
+     - [1. Frontend Layer](#1-frontend-layer)
+     - [3. Core Backend](#3-core-backend)
+     - [4. AI & NLP Integration](#4-ai--nlp-integration)
+     - [5. Data Storage](#5-data-storage)
+   - [Technical Stack](#technical-stack)
+     - [Frontend Technologies](#frontend-technologies)
+     - [Backend Technologies](#backend-technologies)
+   - [Core Features](#core-features)
+     - [1. Counter-Perspective Generation](#1-counter-perspective-generation)
+     - [2. Reasoned Thinking](#2-reasoned-thinking)
+     - [3. Updated Facts](#3-updated-facts)
+     - [4. Seamless Integration](#4-seamless-integration)
+     - [5. Real-Time Analysis](#5-real-time-analysis)
+   - [Data Flow & Security](#data-flow--security)
+   - [Setup & Deployment](#setup--deployment)
+     - [Frontend Setup](#frontend-setup)
+     - [Backend Setup](#backend-setup)
+   - [Architecture Diagram](#architecture-diagram)
+   - [Expected Outcomes](#expected-outcomes)
+   - [Required Skills](#required-skills)

Notes:

  • If you intend to keep numbered subsection labels (“1.”, “3.”, “4.”, “5.”), consider renumbering the corresponding headings to avoid the missing “2.” gap across the document. Otherwise, drop numbers from both headings and TOC for simplicity.

Please confirm whether “I Integration” was intended to be “AI Integration” (it appears redundant with “AI & NLP Integration”). If you want it retained, update both the TOC entry and the corresponding section heading for consistency.


🏁 Script executed:

#!/bin/bash
# Check for stray "I Integration" entry
grep -n "I Integration" README.md

# Print first 30 lines of README.md to inspect TOC indentation
sed -n '1,30p' README.md

Length of output: 1443


Remove stray “I Integration” entry and normalize TOC indentation

  • Delete the duplicate TOC item and its section:
    • Line 17: remove - [I Integration](#i-integration)
    • Line 75: remove the ### I Integration heading
  • Normalize the list indentation in the Table of Contents:
    • Top-level items start at col 1 (`- […]`)
    • Second-level items are indented two spaces (`  - […]`)
    • Third-level items are indented four spaces (`    - […]`)
  • (Optional) Renumber or drop the “1. …”, “3. …”, “4. …”, “5. …” labels under Architecture Components to avoid the missing “2.” gap.

Suggested diff:

--- a/README.md
+++ b/README.md
@@ -17,1 +17,0 @@
-    - [I Integration](#i-integration)
@@ -75,1 +74,0 @@
-### I Integration

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

6-6: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
7-7: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
9-9: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
14-14: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
18-18: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
24-24: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
25-25: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
28-28: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
29-29: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
30-30: Inconsistent indentation for list items at the same level. Expected: 4; Actual: 2 (MD005, list-indent)
🤖 Prompt for AI Agents
In README.md from lines 5 to 30, remove the stray "I Integration" entry at line
17 and also delete the corresponding "### I Integration" heading at line 75.
Normalize the Table of Contents indentation so that top-level items start at
column 1 with "- ", second-level items are indented by two spaces with "  - ",
and third-level items by four spaces with "    - ". Optionally, renumber or
remove the numbered labels under Architecture Components to fix the missing "2."
gap for consistent numbering.


---

@@ -137,20 +154,25 @@ npm run dev
- add .env file in `/new-backend`directory.
- add following environment variable in your .env file.
```
HF_TOKEN = <Your_hugging_face_access_token>
GROQ_API_KEY= <groq_api_key>
PINECONE_API_KEY = <your_pinecone_API_KEY>
PORT = 8000
SEARCH_KEY = <your_Google_custom_search_engine_API_key>
```
Comment on lines +157 to 161

💡 Verification agent

🧩 Analysis chain

Standardize .env variable formatting and naming for clarity

Make the keys consistent (no spaces around =, consistent placeholders). This also prevents copy-paste pitfalls.

-  GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
+GROQ_API_KEY=<GROQ_API_KEY>
+PINECONE_API_KEY=<PINECONE_API_KEY>
+PORT=8000
+SEARCH_KEY=<GOOGLE_CSE_API_KEY>
  • If HuggingFace is no longer required for this PR’s RAG/chat flow, remove the earlier “Get HuggingFace Access Token” step or clarify when it’s needed.
  • If additional vars are required for Pinecone (e.g., PINECONE_INDEX, PINECONE_ENVIRONMENT) or GROQ model names, add them here for completeness.

🏁 Script executed:

#!/bin/bash
# Search for Pinecone usage in codebase
grep -R "pinecone" -n .
# Search for HuggingFace related usage in codebase and README
grep -R "HuggingFace" -n .
grep -R "get_huggingface_access_token" -n .

Length of output: 3242


🏁 Script executed:

#!/bin/bash
# Check which environment variables are used for Pinecone initialization
grep -R "os.getenv" -n backend/app/db/vector_store.py

# Preview the beginning of the file to see how Pinecone is configured
sed -n '1,200p' backend/app/db/vector_store.py

Length of output: 1289


🏁 Script executed:

#!/bin/bash
# Search for GROQ usage in the codebase
grep -R "GROQ" -n .

Length of output: 469


#!/bin/bash
# List all environment variables used via os.getenv across the codebase
grep -R "os.getenv" -n .

# Preview the usage of SEARCH_KEY or equivalent in get_rag_data.py
sed -n '1,200p' backend/app/modules/chat/get_rag_data.py

🏁 Script executed:

#!/bin/bash
# List environment variables used in get_rag_data.py
grep -R "os.getenv" -n backend/app/modules/chat/get_rag_data.py

# Show the beginning of the file for context
sed -n '1,200p' backend/app/modules/chat/get_rag_data.py

Length of output: 834


🏁 Script executed:

#!/bin/bash
# Check if SEARCH_KEY or related env vars are used in the codebase
grep -R "SEARCH_KEY" -n .
grep -R "GOOGLE_CSE" -n .

Length of output: 213


Standardize .env entries in README.md for consistency and accuracy

Please update the README’s environment-variable block (lines 157–161) to remove extra spaces around “=”, align placeholder names with the actual keys used in code, and drop the unused HuggingFace step. No additional Pinecone or GROQ variables are required beyond PINECONE_API_KEY and GROQ_API_KEY, and the Google CSE key is read from SEARCH_KEY in web_search.py.

File: README.md, lines 157–161

-  GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
+GROQ_API_KEY=<GROQ_API_KEY>
+PINECONE_API_KEY=<PINECONE_API_KEY>
+PORT=8000
+SEARCH_KEY=<GOOGLE_CSE_API_KEY>

• Remove the Get HuggingFace Access Token step (no HuggingFace calls remain in this PR).
• Confirm that only PINECONE_API_KEY, GROQ_API_KEY, and SEARCH_KEY are required by the code.

📝 Committable suggestion


Suggested change
GROQ_API_KEY= <groq_api_key>
PINECONE_API_KEY = <your_pinecone_API_KEY>
PORT = 8000
SEARCH_KEY = <your_Google_custom_search_engine_API_key>
```
GROQ_API_KEY=<GROQ_API_KEY>
PINECONE_API_KEY=<PINECONE_API_KEY>
PORT=8000
SEARCH_KEY=<GOOGLE_CSE_API_KEY>
🤖 Prompt for AI Agents
In README.md lines 157 to 161, remove the extra spaces around the equal signs in
the environment variable assignments to standardize formatting, ensure the
placeholder names exactly match the keys used in the code (PINECONE_API_KEY,
GROQ_API_KEY, SEARCH_KEY), and delete the entire HuggingFace access token step
since it is no longer used. Confirm that only these three environment variables
are listed and no additional Pinecone or GROQ variables are included.


*Run backend:*
```bash
cd new-backend
cd backend
uv sync # Creating virtual environment at: .venv
uv run main.py #Runs the backend server
```

---


## Architecture Diagram


```mermaid
graph TB
%% Define Subgraphs with Colors and Text Styles
@@ -168,6 +190,7 @@ graph TB
Analyzer[Content Analyzer]
CNEngine[Counter-Narrative Engine]
Context[Context Manager]

end

subgraph AI & NLP Layer
@@ -212,7 +235,7 @@ graph TB

## Required Skills

- **Frontend Development**: Experience with Next.js and modern UI frameworks.
- **Frontend Development**: Experience with Next.js and modern UI frameworks.
- **Backend Development**: Proficiency in Python and FastAPI.
- **AI & NLP**: Familiarity with LangChain, Langgraph, and prompt engineering techniques.
- **Database Management**: Knowledge of vector databases system.
Empty file.
57 changes: 57 additions & 0 deletions backend/app/modules/bias_detection/check_bias.py
@@ -0,0 +1,57 @@
import os
from groq import Groq
from dotenv import load_dotenv
import json

load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

🛠️ Refactor suggestion

Fail fast if GROQ_API_KEY is missing and use a named parameter.

Avoid constructing the client with a missing/None key. Use a named argument and validate the env var.

-load_dotenv()
-
-client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+load_dotenv()
+api_key = os.getenv("GROQ_API_KEY")
+if not api_key:
+    raise RuntimeError("GROQ_API_KEY is not set")
+client = Groq(api_key=api_key)
📝 Committable suggestion


Suggested change
client = Groq(api_key=os.getenv("GROQ_API_KEY"))
load_dotenv()
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY is not set")
client = Groq(api_key=api_key)
🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py at line 8, the Groq client
is created using an environment variable without validation and without using a
named parameter. First, check if the GROQ_API_KEY environment variable is set
and raise an error or exit immediately if it is missing. Then, instantiate the
Groq client using the api_key named parameter explicitly with the validated key.



def check_bias(text):
    try:
        print(text)
        print(json.dumps(text))

Comment on lines +13 to +15

⚠️ Potential issue

Remove PII logging of full article text.

Printing raw article content (and its JSON) to stdout is a PII/data-leak risk and noisy in production logs. Gate behind a debug logger or remove.

-        print(text)
-        print(json.dumps(text))
+        # Consider using a structured logger at DEBUG level if needed:
+        # logger.debug("check_bias called with text length=%d", len(text or ""))
📝 Committable suggestion


Suggested change
        print(text)
        print(json.dumps(text))
        # Consider using a structured logger at DEBUG level if needed:
        # logger.debug("check_bias called with text length=%d", len(text or ""))
🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py around lines 13 to 15, the
code prints the full article text and its JSON representation directly, which
risks exposing PII and cluttering production logs. Remove these print statements
or replace them with debug-level logging that can be enabled or disabled via
configuration to avoid leaking sensitive data in production environments.

        if not text:
            raise ValueError("Missing or empty 'cleaned_text'")

        chat_completion = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are an assistant that checks "
                        "if given article is biased and give"
                        "score to each based on biasness where 0 is lowest bias and 100 is highest bias"
                        "Only return a number between 0 to 100 base on bias."
                        "only return Number No Text"
                    ),
                },
                {
                    "role": "user",
                    "content": (
                        "Give bias score to the following article "
                        f"\n\n{text}"
                    ),
                },
            ],
            model="gemma2-9b-it",
            temperature=0.3,
            max_tokens=512,
        )

        bias_score = chat_completion.choices[0].message.content.strip()

        return {
            "bias_score": bias_score,
            "status": "success",
        }
Comment on lines +44 to +49

🛠️ Refactor suggestion

Parse and validate numeric output; return a number type.

Currently the function returns a string and trusts the model to output only a number. Parse robustly, clamp to [0,100], and return an int/float to simplify frontend usage.

-        bias_score = chat_completion.choices[0].message.content.strip()
-
-        return {
-            "bias_score": bias_score,
-            "status": "success",
-        }
+        raw = chat_completion.choices[0].message.content.strip()
+        # Extract first integer/float from the response
+        import re
+        m = re.search(r"(\d{1,3})(?:\.\d+)?", raw)
+        if not m:
+            raise ValueError(f"Model did not return a numeric score: {raw!r}")
+        score = int(m.group(1))
+        # Enforce bounds
+        score = max(0, min(100, score))
+        return {
+            "bias_score": score,
+            "status": "success",
+        }

Note: Add import re at the top of the file.

📝 Committable suggestion


Suggested change
        bias_score = chat_completion.choices[0].message.content.strip()
        return {
            "bias_score": bias_score,
            "status": "success",
        }
        raw = chat_completion.choices[0].message.content.strip()
        # Extract first integer/float from the response
        import re
        m = re.search(r"(\d{1,3})(?:\.\d+)?", raw)
        if not m:
            raise ValueError(f"Model did not return a numeric score: {raw!r}")
        score = int(m.group(1))
        # Enforce bounds
        score = max(0, min(100, score))
        return {
            "bias_score": score,
            "status": "success",
        }
🤖 Prompt for AI Agents
In backend/app/modules/bias_detection/check_bias.py around lines 44 to 49, the
function currently returns the bias_score as a string directly from the model
output. To fix this, import the 're' module at the top of the file, then parse
the bias_score string to extract a numeric value using a regular expression,
convert it to a float or int, clamp it within the range 0 to 100, and return
this numeric value instead of the raw string. This ensures robust validation and
simplifies frontend handling.


    except Exception as e:
        print(f"Error in bias_detection: {e}")
        return {
            "status": "error",
            "error_from": "bias_detection",
            "message": str(e),
        }
Empty file.
10 changes: 10 additions & 0 deletions backend/app/modules/chat/embed_query.py
@@ -0,0 +1,10 @@
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")


def embed_query(query: str):

    embeddings = embedder.encode(query).tolist()

    return embeddings
Comment on lines +1 to +10

🛠️ Refactor suggestion

Reuse the existing embedder; add types and input validation.

Avoid loading the same model twice. Import the singleton embedder and ensure we return a List[float].

-from sentence_transformers import SentenceTransformer
-
-embedder = SentenceTransformer("all-MiniLM-L6-v2")
-
-
-def embed_query(query: str):
-
-    embeddings = embedder.encode(query).tolist()
-
-    return embeddings
+from typing import List
+from app.modules.vector_store.embed import embedder
+
+def embed_query(query: str) -> List[float]:
+    if not query or not query.strip():
+        raise ValueError("query must be a non-empty string")
+    embedding = embedder.encode(query).tolist()
+    # Optionally: normalize if index uses cosine similarity without normalized vectors
+    return embedding
📝 Committable suggestion


Suggested change
from sentence_transformers import SentenceTransformer
embedder = SentenceTransformer("all-MiniLM-L6-v2")
def embed_query(query: str):
    embeddings = embedder.encode(query).tolist()
    return embeddings
from typing import List
from app.modules.vector_store.embed import embedder
def embed_query(query: str) -> List[float]:
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    embedding = embedder.encode(query).tolist()
    # Optionally: normalize if index uses cosine similarity without normalized vectors
    return embedding
🤖 Prompt for AI Agents
In backend/app/modules/chat/embed_query.py lines 1 to 10, avoid loading the
SentenceTransformer model again by importing the existing singleton embedder
instead of creating a new one. Add type annotations to the embed_query function
to specify it returns a List[float]. Also, add input validation to ensure the
query parameter is a non-empty string before encoding.

31 changes: 31 additions & 0 deletions backend/app/modules/chat/get_rag_data.py
@@ -0,0 +1,31 @@
from pinecone import Pinecone
from dotenv import load_dotenv
from app.modules.chat.embed_query import embed_query
import os

load_dotenv()

pc = Pinecone(os.getenv("PINECONE_API_KEY"))
index = pc.Index("perspective")

Comment on lines +8 to +10

🛠️ Refactor suggestion

Harden Pinecone client/index initialization and make names configurable.

Validate the API key, use named arg, and allow index/namespace via env for portability.

-load_dotenv()
-
-pc = Pinecone(os.getenv("PINECONE_API_KEY"))
-index = pc.Index("perspective")
+load_dotenv()
+api_key = os.getenv("PINECONE_API_KEY")
+if not api_key:
+    raise RuntimeError("PINECONE_API_KEY is not set")
+pc = Pinecone(api_key=api_key)
+index_name = os.getenv("PINECONE_INDEX_NAME", "perspective")
+namespace = os.getenv("PINECONE_NAMESPACE", "default")
+index = pc.Index(index_name)
🤖 Prompt for AI Agents
In backend/app/modules/chat/get_rag_data.py around lines 8 to 10, the Pinecone
client and index initialization lacks validation and configurability. Fix this
by first validating that the PINECONE_API_KEY environment variable is set and
raise an error if missing. Use named arguments when initializing the Pinecone
client. Also, make the index name and namespace configurable by reading them
from environment variables with sensible defaults to improve portability.


def search_pinecone(query: str, top_k: int = 5):

    embeddings = embed_query(query)

    results = index.query(
        vector=embeddings,
        top_k=top_k,
        include_metadata=True,
        namespace="default"

    )

    matches = []
    for match in results["matches"]:
        matches.append({
            "id": match["id"],
            "score": match["score"],
            "metadata": match["metadata"]
        })
    return matches
Comment on lines +12 to +31

🛠️ Refactor suggestion

⚠️ Potential issue

Fix result parsing for Pinecone v3 responses; add validation and error handling.

index.query(...) often returns an object with a matches attribute (not subscriptable). Also validate input and handle empty/no-match cases.

-def search_pinecone(query: str, top_k: int = 5):
-
-    embeddings = embed_query(query)
-
-    results = index.query(
-        vector=embeddings,
-        top_k=top_k,
-        include_metadata=True,
-        namespace="default"
-
-    )
-
-    matches = []
-    for match in results["matches"]:
-        matches.append({
-            "id": match["id"],
-            "score": match["score"],
-            "metadata": match["metadata"]
-        })
-    return matches
+def search_pinecone(query: str, top_k: int = 5):
+    if not query or not query.strip():
+        return []
+    embedding = embed_query(query)
+    try:
+        res = index.query(
+            vector=embedding,
+            top_k=top_k,
+            include_metadata=True,
+            namespace=namespace,
+        )
+    except Exception as e:
+        # Consider logging and surfacing a structured error upstream
+        # logger.exception("Pinecone query failed")
+        return []
+
+    # Support both dict-like and object-like responses
+    raw_matches = []
+    if hasattr(res, "matches"):
+        raw_matches = res.matches or []
+    elif isinstance(res, dict):
+        raw_matches = res.get("matches", []) or []
+
+    normalized = []
+    for m in raw_matches:
+        # Support both dict items and attribute access
+        mid = m.get("id") if isinstance(m, dict) else getattr(m, "id", None)
+        mscore = m.get("score") if isinstance(m, dict) else getattr(m, "score", None)
+        mmeta = m.get("metadata") if isinstance(m, dict) else getattr(m, "metadata", None)
+        if mid is None:
+            continue
+        normalized.append({"id": mid, "score": mscore, "metadata": mmeta})
+    return normalized

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/app/modules/chat/get_rag_data.py around lines 12 to 31, the code
incorrectly accesses Pinecone query results as a dictionary, but Pinecone v3
returns an object with a matches attribute. Update the code to access
results.matches instead of results["matches"]. Add input validation for the
query parameter to ensure it is a non-empty string. Include error handling for
cases where results.matches is empty or None, returning an empty list or
appropriate response to avoid runtime errors.
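
For reference, a quick usage sketch of the hardened helper described above (the query string and metadata field shown here are made up for illustration):

```python
# Hypothetical usage of a search_pinecone helper that returns normalized
# {"id", "score", "metadata"} dicts, as sketched in the suggestion above.
matches = search_pinecone("renewable energy subsidies", top_k=3)
for m in matches:
    meta = m.get("metadata") or {}
    print(m["id"], round(m["score"] or 0.0, 3), meta.get("explanation", ""))
```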

35 changes: 35 additions & 0 deletions backend/app/modules/chat/llm_processing.py
@@ -0,0 +1,35 @@
import os
from groq import Groq
from dotenv import load_dotenv

load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

Comment on lines +7 to +8

🛠️ Refactor suggestion

Fail fast when the API key is missing

If GROQ_API_KEY is undefined, Groq() still instantiates and later requests fail with 401.
Add an explicit check and raise a clear error during startup.

🤖 Prompt for AI Agents
In backend/app/modules/chat/llm_processing.py around lines 7 to 8, the code
instantiates the Groq client without verifying if the GROQ_API_KEY environment
variable is set, which can lead to 401 errors later. Add an explicit check right
before creating the Groq client to verify if the API key is present; if it is
missing, raise a clear and descriptive error to fail fast during startup.
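
A minimal sketch of that fail-fast check, mirroring the pattern already suggested for check_bias.py (the error message wording is illustrative):

```python
import os

from dotenv import load_dotenv
from groq import Groq

load_dotenv()

# Fail fast at import time: a missing key should surface as a clear
# configuration error, not as a 401 on the first chat completion call.
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY is not set; add it to backend/.env")
client = Groq(api_key=api_key)
```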


def build_context(docs):

    return "\n".join(f"{m['metadata'].get('explanation') or m['metadata'].get('reasoning', '')}" for m in docs)


def ask_llm(question, docs):
    context = build_context(docs)
    print(context)
    prompt = f"""You are an assistant that answers based on context.
Comment on lines +17 to +18

⚠️ Potential issue

Remove plaintext context logging in production

print(context) risks leaking user content and rapidly floods logs on long prompts.
Guard behind a debug flag or drop entirely.

🤖 Prompt for AI Agents
In backend/app/modules/chat/llm_processing.py around lines 17 to 18, the print
statement outputs the context in plaintext, which risks leaking sensitive user
data and flooding logs. Remove the print(context) statement or wrap it in a
conditional debug flag so it only logs when debugging is enabled, ensuring no
sensitive information is logged in production.
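
One possible shape for the gated logging (the DEBUG_LOG_CONTEXT flag and logger setup are illustrative, not part of this PR):

```python
import logging
import os

logger = logging.getLogger(__name__)

# Illustrative switch: verbose context logging only when explicitly enabled.
DEBUG_LOG_CONTEXT = os.getenv("DEBUG_LOG_CONTEXT", "0") == "1"


def log_context(context: str) -> None:
    # Never print raw context in production; log its size instead,
    # and emit the full text only under an explicit debug flag.
    if DEBUG_LOG_CONTEXT:
        logger.debug("LLM context (%d chars): %s", len(context), context)
    else:
        logger.info("LLM context prepared (%d chars)", len(context))
```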


Context:
{context}

Question:
{question}
"""

    response = client.chat.completions.create(
        model="gemma2-9b-it",
        messages=[
            {"role": "system", "content": "Use only the context to answer."},
            {"role": "user", "content": prompt}
        ]
    )

    return response.choices[0].message.content
1 change: 1 addition & 0 deletions backend/app/modules/vector_store/embed.py
@@ -28,3 +28,4 @@ def embed_chunks(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
            "metadata": chunk["metadata"]
        })
    return vectors

31 changes: 29 additions & 2 deletions backend/app/routes/routes.py
@@ -2,6 +2,10 @@
from pydantic import BaseModel
from app.modules.pipeline import run_scraper_pipeline
from app.modules.pipeline import run_langgraph_workflow
from app.modules.bias_detection.check_bias import check_bias
from app.modules.chat.get_rag_data import search_pinecone
from app.modules.chat.llm_processing import ask_llm
import asyncio
import json

router = APIRouter()
@@ -11,14 +15,37 @@ class URlRequest(BaseModel):
    url: str


class ChatQuery(BaseModel):
    message: str


@router.get("/")
async def home():
    return {"message": "Perspective API is live!"}


@router.post("/bias")
async def bias_detection(request: URlRequest):
    content = await asyncio.to_thread(run_scraper_pipeline, (request.url))
    bias_score = await asyncio.to_thread(check_bias, (content))
    print(bias_score)
    return bias_score


@router.post("/process")
async def run_pipelines(request: URlRequest):
    article_text = run_scraper_pipeline(request.url)
    article_text = await asyncio.to_thread(run_scraper_pipeline, (request.url))
    print(json.dumps(article_text, indent=2))
    data = run_langgraph_workflow(article_text)
    data = await asyncio.to_thread(run_langgraph_workflow, (article_text))
    return data
Comment on lines +37 to 40

🛠️ Refactor suggestion

Same tuple issue in /process endpoint

run_scraper_pipeline and run_langgraph_workflow receive tuples instead of their expected args. Fix as above.

🤖 Prompt for AI Agents
In backend/app/routes/routes.py around lines 37 to 40, the functions
run_scraper_pipeline and run_langgraph_workflow are incorrectly called with
single-element tuples due to extra parentheses around their arguments. Remove
the parentheses around the arguments so that the functions receive the expected
single argument instead of a tuple. For example, change calls from
asyncio.to_thread(run_scraper_pipeline, (request.url)) to
asyncio.to_thread(run_scraper_pipeline, request.url).
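
A sketch of the cleaned-up handler. Note that in Python `(request.url)` is already just `request.url` (a one-element tuple needs a trailing comma), so the change is about readability and avoiding the `(x,)` pitfall rather than fixing a crash:

```python
@router.post("/process")
async def run_pipelines(request: URlRequest):
    # Pass arguments directly; the extra parentheses add nothing and
    # invite the accidental one-element tuple (request.url,).
    article_text = await asyncio.to_thread(run_scraper_pipeline, request.url)
    data = await asyncio.to_thread(run_langgraph_workflow, article_text)
    return data
```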



@router.post("/chat")
async def answer_query(request: ChatQuery):

query = request.message
results = search_pinecone(query)
answer = ask_llm(query, results)
print(answer)

return {"answer": answer}