
Conversation

Collaborator

@ahmedxgouda ahmedxgouda commented Nov 24, 2025

Proposed change

Resolves #2709

  • Added DB environment variables to the Docker Compose backend service.
  • Added csrf_decorate, which skips CSRF protection when the environment is e2e.
  • Fixed a REST API internal error.
  • Updated the docs to explain how to update the e2e DB data.
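The csrf_decorate helper described above might look roughly like the following sketch. To keep it runnable without Django, django.conf.settings and the CSRF decorators are replaced with stand-ins; the real implementation lives in the backend utilities:

```python
# Stand-ins so the sketch runs without Django installed; the real code
# imports settings from django.conf and the decorators from
# django.views.decorators.csrf.
class _Settings:
    IS_E2E_ENVIRONMENT = True  # assumed flag value for illustration


settings = _Settings()


def csrf_exempt(view):
    view.csrf_exempt = True
    return view


def csrf_protect(view):
    view.csrf_exempt = False
    return view


def csrf_decorate(view):
    """Skip CSRF protection only when the e2e environment flag is set."""
    if settings.IS_E2E_ENVIRONMENT:
        return csrf_exempt(view)
    return csrf_protect(view)
```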

Checklist

  • I've read and followed the contributing guidelines.
  • I've run make check-test locally; all checks and tests passed.

@coderabbitai
Contributor

coderabbitai bot commented Nov 24, 2025

Summary by CodeRabbit

  • New Features

    • Added automated data masking for test database initialization.
  • Documentation

    • Updated setup prerequisites in contributing guide.
  • Chores

    • Improved end-to-end testing environment configuration.
    • Updated data loading process for test environments.
    • Enhanced environment variable handling for test setup.


Walkthrough

Adds E2E configuration and SITE_URL, inserts an IS_E2E_ENVIRONMENT branch in API settings customization, introduces a dump_data Django management command to produce masked SQL dumps, updates Makefile and CI workflows to load SQL after backend start with timeouts, and adjusts env examples, lint rules, and wordlist entries.
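The IS_E2E_ENVIRONMENT branch in the API settings customization mentioned here presumably resembles the following sketch; the flag and SITE_URL values are illustrative stand-ins for the actual Django settings:

```python
IS_E2E_ENVIRONMENT = True  # assumed; normally read from Django settings
SITE_URL = "http://localhost:9000"  # assumed e2e backend URL

api_settings_customization = {}
if IS_E2E_ENVIRONMENT:
    api_settings_customization = {
        "auth": None,  # disable API auth for e2e runs
        "servers": [
            {
                "description": "E2E",
                "url": SITE_URL,  # single e2e server entry
            }
        ],
        "throttle": [],  # disable rate limiting
    }
```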

Changes

Cohort / File(s) Change Summary
E2E settings & API customization
backend/settings/e2e.py, backend/apps/api/rest/v0/__init__.py, backend/.env.e2e.example
Added SITE_URL and CACHES in E2E settings; removed explicit DEBUG; added IS_E2E_ENVIRONMENT branch in API settings customization to set auth=None, a single E2E server using SITE_URL, and empty throttle; updated DJANGO_DB_HOST in example env.
E2E workflow / CI
.github/workflows/setup-e2e-environment/action.yaml, .github/workflows/run-ci-cd.yaml
Added 5-minute timeouts around DB/backend readiness waits, moved initial Postgres data load to after backend startup and switched to nest.sql.gz, added DJANGO_DB_HOST=localhost to backend step, and changed frontend env file to frontend/.env.e2e.example.
Data dump & Makefile
backend/apps/common/management/commands/dump_data.py, backend/Makefile
New dump_data management command generating masked, gzipped SQL dumps (creates temp DB, masks emails, uses pg_dump); Makefile targets updated to use this dump and to load SQL via piping into DB container; removed old e2e dump target.
Lint, docs, dictionary
backend/pyproject.toml, CONTRIBUTING.md, cspell/custom-dict.txt
Added per-file Ruff ignores for the new command (S603, S607); documented gzip prerequisite in CONTRIBUTING; added dictionary entries (Atqc, PGPASSWORD, attisdropped, nspname).
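For illustration, the email-masking step of such a dump command could generate per-table UPDATE statements along these lines (a hypothetical helper, not the actual dump_data code; table and column names are assumptions):

```python
def mask_email_sql(email_columns):
    """Build UPDATE statements that replace real emails with placeholders.

    email_columns is a list of (table, column) pairs discovered from the
    schema; each non-empty email is replaced with a per-row fake address.
    """
    statements = []
    for table, column in email_columns:
        statements.append(
            f"UPDATE {table} SET {column} = CONCAT('user', id, '@example.com') "
            f"WHERE {column} IS NOT NULL AND {column} <> '';"
        )
    return statements
```

Because the command copies data into a temporary database first, statements like these would run against the copy, leaving the source database untouched before pg_dump is invoked.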

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Focus review on:
    • backend/apps/common/management/commands/dump_data.py — subprocess calls, SQL generation for masking, credential handling (PGPASSWORD), and cleanup/error paths.
    • .github/workflows/setup-e2e-environment/action.yaml — ordering and timeouts to avoid race conditions between DB and backend readiness.
    • Consistency of DJANGO_DB_HOST and SITE_URL usage across CI, Makefile, and env examples.

Possibly related PRs

Suggested labels

docker

Suggested reviewers

  • kasya
  • arkid15r

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix running e2e backend' directly corresponds to the PR's main objective of resolving #2709 (Fix the running e2e backend instance).
Description check ✅ Passed The PR description is directly related to the changeset, detailing specific changes including database environment variables, csrf_decorate, REST API fixes, and documentation updates.
Linked Issues check ✅ Passed The PR addresses issue #2709 by implementing backend fixes: E2E environment configuration [backend/settings/e2e.py], API customization for E2E [backend/apps/api/rest/v0/__init__.py], workflow updates, data management tools [dump_data.py, Makefile], and documentation [CONTRIBUTING.md].
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the e2e backend instance: configuration updates, API fixes, CI/CD workflow adjustments, data dumping utilities, and documentation—all supporting the core objective of resolving #2709.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd2219d and 3ea8fc7.

📒 Files selected for processing (1)
  • backend/settings/e2e.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run Code Scan
  • GitHub Check: Run CI Dependencies Scan
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
backend/settings/e2e.py (2)

12-12: LGTM! SITE_URL correctly configured for E2E environment.

The SITE_URL is appropriately set to the localhost backend port for E2E testing.


19-23: LGTM! LocMemCache is well-suited for E2E testing.

The in-memory cache backend provides fast, process-isolated caching that ensures test independence during E2E runs.
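For reference, a local-memory cache block of the kind praised here typically looks like this (a sketch; the exact LOCATION value in backend/settings/e2e.py is an assumption):

```python
# Django CACHES setting using the in-process LocMemCache backend.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "nest-e2e",  # assumed cache name; keeps this cache distinct
    }
}
```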



@github-actions

The PR must be linked to an issue assigned to the PR author.

@github-actions github-actions bot closed this Nov 24, 2025
@ahmedxgouda
Collaborator Author

I think I am assigned to the issue :)
(screenshot showing the issue assignment)

@ahmedxgouda ahmedxgouda reopened this Nov 24, 2025
@github-actions

The PR must be linked to an issue assigned to the PR author.

@github-actions github-actions bot closed this Nov 24, 2025
@ahmedxgouda ahmedxgouda linked an issue Nov 24, 2025 that may be closed by this pull request
@ahmedxgouda ahmedxgouda reopened this Nov 24, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/apps/common/utils.py (1)

71-76: Consider adding type hints for consistency.

Other functions in this file include type hints (e.g., convert_to_camel_case(text: str) -> str). Consider adding type hints to match the existing code style.

-def csrf_decorate(view):
+from typing import Callable
+from django.http import HttpRequest, HttpResponse
+
+def csrf_decorate(view: Callable[[HttpRequest], HttpResponse]) -> Callable[[HttpRequest], HttpResponse]:
     """Apply CSRF protection based on settings."""
     if settings.IS_E2E_ENVIRONMENT:
         return csrf_exempt(view)  # NOSONAR
     return csrf_protect(view)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 133c89f and fd0aa86.

📒 Files selected for processing (3)
  • backend/apps/common/utils.py (2 hunks)
  • backend/settings/urls.py (2 hunks)
  • docker-compose/e2e.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-31T13:48:09.830Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2155
File: frontend/graphql-codegen.ts:0-0
Timestamp: 2025-08-31T13:48:09.830Z
Learning: In the OWASP/Nest project, Django's GraphQL endpoint accepts CSRF tokens via 'x-csrftoken' header (lowercase) without requiring a Referer header, working fine in their configuration for GraphQL codegen introspection.

Applied to files:

  • backend/settings/urls.py
🧬 Code graph analysis (1)
backend/settings/urls.py (1)
backend/apps/common/utils.py (1)
  • csrf_decorate (71-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (4)
docker-compose/e2e.yaml (1)

15-20: LGTM! DB environment variables properly configured.

Adding explicit database connection environment variables ensures the backend service can connect to the PostgreSQL database. The values correctly reference the db service name for the host and align with the database service configuration (lines 38-40). This explicit configuration takes precedence over the env_file and likely resolves the internal server error mentioned in the linked issue.

backend/settings/urls.py (2)

14-14: LGTM! Import correctly references the new utility.

The import is correctly placed and the csrf_decorate utility is appropriately used for both the Algolia search and GraphQL endpoints.


24-25: Implementation correctly applies conditional CSRF protection.

The routes now use csrf_decorate which conditionally applies CSRF protection based on the environment. In production, these endpoints remain protected. Security implications of the E2E exemption are addressed in the review of backend/apps/common/utils.py.

Based on learnings, the GraphQL endpoint accepts CSRF tokens via the 'x-csrftoken' header, which continues to work with this implementation in non-E2E environments.

backend/apps/common/utils.py (1)

71-76: Verification confirms safeguards already in place—concern is substantially mitigated.

The architecture provides multiple layers of protection:

  1. Defensive default: IS_E2E_ENVIRONMENT = False in backend/settings/base.py
  2. Explicit opt-in: E2E config requires DJANGO_CONFIGURATION=E2E environment variable (not set in production deployment)
  3. Separate deployment: E2E environments use distinct .env.e2e.example and separate docker-compose files; production uses production_nest-app-network

The code pattern itself is sound—settings default to False and require explicit configuration to enable E2E behavior. The residual risk (human misconfiguration of DJANGO_CONFIGURATION) is addressed through deployment infrastructure and separate environment configurations.

@github-actions github-actions bot added the ci label Nov 24, 2025
@github-actions github-actions bot added the docs Improvements or additions to documentation label Nov 24, 2025
@ahmedxgouda ahmedxgouda marked this pull request as ready for review November 24, 2025 12:40
CONTRIBUTING.md Outdated

**Please note that you only need to do these steps once.**

#### Updating e2e Test Data
Collaborator


My data file update comment was not about the instructions for data loading. It was about how to maintain the nest-e2e.sql.gz file.

return url.strip().rstrip(".,;:!?") or None


def csrf_decorate(view):
Collaborator


Is there a way to avoid settings based on the running environment? We want it to be as close to production as possible.

Collaborator Author


The last time I tested sending a fuzzing request in the fuzzing PR, it didn't work even with the CSRF header set, but that was before we had a dedicated backend instance for e2e. I will remove this for now and try again in the fuzzing PR.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/.env.e2e.example (1)

7-13: Consider ordering DJANGO_DB_* variables alphabetically.

The change is functionally correct and aligns with the PR objectives. However, DJANGO_DB_HOST at line 9 is not in alphabetical order relative to other keys. Reordering these variables alphabetically would improve consistency and satisfy dotenv-linter's expectations.

Apply this diff to reorder the DB-related variables alphabetically:

 DJANGO_SETTINGS_MODULE=settings.e2e
 DJANGO_CONFIGURATION=E2E
-DJANGO_DB_HOST=db
 DJANGO_DB_NAME=nest_db_e2e
+DJANGO_DB_HOST=db
 DJANGO_DB_USER=nest_user_e2e
 DJANGO_DB_PASSWORD=nest_user_e2e_password
 DJANGO_DB_PORT=5432

Note: If your project uses dotenv-linter or similar tools in CI, you may want to apply this reorganization to avoid future warnings.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1556873 and c35e43f.

📒 Files selected for processing (3)
  • .github/workflows/setup-e2e-environment/action.yaml (3 hunks)
  • CONTRIBUTING.md (1 hunks)
  • backend/.env.e2e.example (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/setup-e2e-environment/action.yaml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-17T15:25:55.624Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/providers.tf:1-3
Timestamp: 2025-10-17T15:25:55.624Z
Learning: The infrastructure code in the OWASP/Nest repository (infrastructure/ directory) is intended for quick testing purposes only, not for production deployment.

Applied to files:

  • CONTRIBUTING.md
📚 Learning: 2025-11-23T11:52:15.463Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2699
File: backend/wsgi.py:13-13
Timestamp: 2025-11-23T11:52:15.463Z
Learning: In the OWASP Nest project, the SSM parameter store setup in backend/wsgi.py (using boto3 to fetch parameters from AWS Systems Manager) is designed for staging and production environments, not just for testing purposes.

Applied to files:

  • backend/.env.e2e.example
🪛 dotenv-linter (4.0.0)
backend/.env.e2e.example

[warning] 9-9: [UnorderedKey] The DJANGO_DB_HOST key should go before the DJANGO_SETTINGS_MODULE key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
CONTRIBUTING.md (2)

444-470: Ensure the e2e data loading procedure aligns with the linked issue objectives.

The new "Updating e2e Test Data" section documents the procedure for refreshing e2e test data. Per the PR objective to "resolve the e2e backend instance error," verify that these steps fully address the data setup requirements and are consistent with any Docker Compose or backend setup changes made elsewhere in this PR.

A past reviewer flagged concerns about maintaining the nest-e2e.sql.gz data file itself. Confirm that the current documentation provides sufficient guidance on how to update the source data file (if applicable) before reloading into the container, rather than just the container lifecycle steps.


444-470: Let me retry the shell script execution to verify these issues:

#!/bin/bash
# Check the exact indentation on line 469 in CONTRIBUTING.md
echo "=== Checking line 469 indentation in CONTRIBUTING.md ==="
sed -n '465,475p' CONTRIBUTING.md | cat -A

echo ""
echo "=== Finding docker-compose files ==="
find . -maxdepth 3 -type f \( -name "docker-compose.yml" -o -name "docker-compose.yaml" -o -name ".env*" \) 2>/dev/null | head -20

#!/bin/bash
# Search for e2e database volume definitions
echo "=== Searching for e2e-db volume definitions ==="
rg -n "e2e.*db.*volume|nest-e2e" --type yaml -C 3


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/Makefile (1)

69-71: Clarify the relationship between dump-data-local and dump-data-e2e targets.

Both dump-data-local and dump-data-e2e write to the same output file (backend/data/nest-e2e.sql.gz), which could be confusing for developers. The naming suggests these serve different purposes, but they're actually part of the same workflow (local data → e2e seed data).

Consider renaming either the target or output file to make the intent clearer:

  • Option 1: Rename the output to nest-e2e-local-seed.sql.gz to indicate it's local data prepared for e2e
  • Option 2: Rename the target to prepare-e2e-data-from-local to clarify its purpose
  • Option 3: Add a brief comment above the target explaining the workflow

The PR description mentions updated documentation on updating e2e DB data—ensure the docs clarify this workflow.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c35e43f and d2d8a58.

📒 Files selected for processing (1)
  • backend/Makefile (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • backend/Makefile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)

backend/Makefile Outdated
@echo "Dumping Nest e2e data"
@CMD="pg_dumpall -U nest_user_e2e --clean | gzip -9 > backend/data/nest-e2e.sql.gz" $(MAKE) exec-db-command-e2e

dump-data-local:
Collaborator


Let's look at this one more time. Are we going to use our full data snapshot for e2e testing (e.g. the entire data from nest.json.gz)? If so I'd prefer to keep it simple and have just a single data file -- without nest-e2e.sql.gz introduction.

Collaborator Author

@ahmedxgouda ahmedxgouda Nov 27, 2025


Maybe we can use nest.sql.gz instead of nest.json.gz, as it is much faster to load.

def decorator(view_func):
    @wraps(view_func)
    def _wrapper(request, *args, **kwargs):
        if settings.IS_E2E_ENVIRONMENT:
Collaborator


Why does e2e need this change?

Collaborator Author


We didn't set up a cache for e2e, so when the backend tries to access the cache it returns a 500 internal error. I think there is an option to set up a Redis cache in CI/CD. Maybe we can do that, or keep it simple.

Collaborator


Yes, let's add cache service for e2e instead. The closer to production architecture the better -- for both local and CI/CD cases.

Collaborator Author


Ok I will add the cache in another PR after this one.

Collaborator


Let's configure the cache backend for e2e via Django settings (locmem for now).

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
backend/apps/api/rest/v0/__init__.py (1)

61-71: Consider combining duplicate environment configurations.

The E2E configuration (lines 61-71) is nearly identical to the local configuration (lines 49-59), differing only in the server description. This duplication can be reduced for better maintainability.

Apply this diff to combine the configurations:

-if settings.IS_LOCAL_ENVIRONMENT:
+if settings.IS_LOCAL_ENVIRONMENT or settings.IS_E2E_ENVIRONMENT:
+    description = "E2E" if settings.IS_E2E_ENVIRONMENT else "Local"
     api_settings_customization = {
         "auth": None,
         "servers": [
             {
-                "description": "Local",
+                "description": description,
                 "url": settings.SITE_URL,
             }
         ],
         "throttle": [],
     }
-
-elif settings.IS_E2E_ENVIRONMENT:
-    api_settings_customization = {
-        "auth": None,
-        "servers": [
-            {
-                "description": "E2E",
-                "url": settings.SITE_URL,
-            }
-        ],
-        "throttle": [],
-    }
-
-elif settings.IS_STAGING_ENVIRONMENT:
+
+elif settings.IS_STAGING_ENVIRONMENT:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2d8a58 and 74bdb6c.

⛔ Files ignored due to path filters (2)
  • backend/data/nest-e2e.sql.gz is excluded by !**/*.gz
  • backend/data/nest.sql.gz is excluded by !**/*.gz
📒 Files selected for processing (5)
  • .github/workflows/setup-e2e-environment/action.yaml (4 hunks)
  • CONTRIBUTING.md (1 hunks)
  • backend/Makefile (2 hunks)
  • backend/apps/api/rest/v0/__init__.py (1 hunks)
  • backend/data/dump.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CONTRIBUTING.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • .github/workflows/setup-e2e-environment/action.yaml
  • backend/Makefile
🪛 GitHub Actions: Run CI/CD
backend/data/dump.sh

[error] 3-3: CSpell: Unknown word (PGPASSWORD)


[error] 13-13: CSpell: Unknown word (Atqc)


[error] 15-15: CSpell: Unknown word (nspname)


[error] 23-23: CSpell: Unknown word (attisdropped)


[error] 24-24: CSpell: Unknown word (nspname)


[error] CSpell: 5 unknown words found in 1 file. CSpell check failed; verify spelling or update dictionary.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (4)
backend/Makefile (1)

50-52: LGTM! Simplified dump workflow.

The refactored dump-data target now delegates to the dedicated dump.sh script, which is cleaner and more maintainable than the previous inline Django command sequence.

.github/workflows/setup-e2e-environment/action.yaml (3)

10-15: Good addition of timeout safeguards.

Adding 1-minute timeouts to the readiness loops prevents the workflow from hanging indefinitely if services fail to start. This improves CI reliability and provides faster feedback on failures.

Also applies to: 57-62


10-15: Verify 1-minute timeout is adequate for CI environments.

The 1-minute timeout for both database and backend readiness might be too short in resource-constrained or slow CI runners, potentially causing false failures. Consider whether this duration provides enough margin for typical startup times plus occasional CI slowness.

Monitor the first few CI runs after merging to ensure:

  1. Services consistently start within the timeout
  2. No spurious timeout failures occur during slow CI periods

If timeout failures occur, consider increasing to 2-3 minutes or making it configurable via environment variable.

Also applies to: 57-62
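The bounded readiness waits discussed above can be sketched in Python (the workflow itself uses shell until loops wrapped in timeout; this is the same idea expressed as a reusable helper with assumed defaults):

```python
import time
import urllib.request


def wait_for(url, timeout_s=60.0, interval_s=2.0):
    """Poll url until it responds or the deadline passes; return success."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # service answered; it is ready
        except OSError:
            time.sleep(interval_s)  # not up yet; retry until deadline
    return False
```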


26-26: Unable to verify file existence due to repository access restrictions.

The review comment requests verification that backend/data/nest.sql.gz exists, but I cannot access the OWASP Nest repository to confirm this. The PostgreSQL import command syntax is correct (gunzip -c ... | psql), and the filename nest.sql.gz follows standard PostgreSQL dump conventions.

The verification should be completed by:

  1. Running the proposed script locally to confirm the file exists and is valid
  2. Ensuring the file path is updated consistently across any related Makefile or documentation changes
  3. Verifying the data file contains appropriate E2E test fixtures

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/data/dump.sh (2)

14-15: Add environment variable validation at script entry.

The script doesn't validate that required environment variables are set before use. If any variable is empty or unset, PostgreSQL commands fail with cryptic errors downstream. Building on prior feedback, add explicit validation.

Apply this diff to validate required variables early:

 #!/bin/sh
+# cspell:ignore PGPASSWORD Atqc nspname attisdropped
 
 set -e
 
+# Validate required environment variables
+: "${DJANGO_DB_PASSWORD:?DJANGO_DB_PASSWORD is required}"
+: "${DJANGO_DB_HOST:?DJANGO_DB_HOST is required}"
+: "${DJANGO_DB_USER:?DJANGO_DB_USER is required}"
+: "${DJANGO_DB_NAME:?DJANGO_DB_NAME is required}"
+
 # Cleanup function to ensure temp DB is dropped even on error
 cleanup() {
     if [ -n "$TEMP_DB" ]; then
         echo "Cleaning up temporary database $TEMP_DB..."
         psql -h "$DJANGO_DB_HOST" -U "$DJANGO_DB_USER" -d postgres -c "DROP DATABASE IF EXISTS $TEMP_DB;" 2>/dev/null || true
     fi
 }
 trap cleanup EXIT
 
 export PGPASSWORD="$DJANGO_DB_PASSWORD"

This adds the CSpell configuration (which was flagged in prior CI runs) and validates all critical variables before any database operations.


6-11: Add explicit return statement to cleanup function for SonarCloud compliance.

SonarCloud flags shell functions without explicit return statements. While this is often a false positive for shell cleanup functions, adding an explicit return improves linter compliance.

Apply this diff:

 cleanup() {
     if [ -n "$TEMP_DB" ]; then
         echo "Cleaning up temporary database $TEMP_DB..."
         psql -h "$DJANGO_DB_HOST" -U "$DJANGO_DB_USER" -d postgres -c "DROP DATABASE IF EXISTS $TEMP_DB;" 2>/dev/null || true
     fi
+    return 0
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b4c9e2 and f747481.

📒 Files selected for processing (1)
  • backend/data/dump.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.
🪛 GitHub Check: SonarCloud Code Analysis
backend/data/dump.sh

[warning] 6-6: Add an explicit return statement at the end of the function.

See more on https://sonarcloud.io/project/issues?id=OWASP_Nest&issues=AZrF2NveBsDskKfEapS9&open=AZrF2NveBsDskKfEapS9&pullRequest=2710

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run CI Dependencies Scan
  • GitHub Check: Run Code Scan
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (1)
backend/data/dump.sh (1)

38-43: LGTM.

The conditional check before applying UPDATE statements is sound—it gracefully handles cases where no email columns exist. Combined with the error handling (set -e, trap cleanup), the script is robust and properly addresses the prior feedback on error handling and cleanup safeguards.

@ahmedxgouda ahmedxgouda requested a review from arkid15r November 27, 2025 15:32
def decorator(view_func):
    @wraps(view_func)
    def _wrapper(request, *args, **kwargs):
        if settings.IS_E2E_ENVIRONMENT:
Collaborator


Yes, let's add cache service for e2e instead. The closer to production architecture the better -- for both local and CI/CD cases.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
.github/workflows/setup-e2e-environment/action.yaml (2)

51-56: Use a known health endpoint for backend readiness check instead of /a.

The readiness check probes /a, which is a non-standard, arbitrary path. If this endpoint doesn't exist or is removed, the check will silently pass but the API may not be fully ready. Use a dedicated health or status endpoint (e.g., /api/health, /healthz, or a known public endpoint) to ensure the backend is genuinely operational.

-          until wget --spider http://localhost:9000/a; do
+          until wget --spider http://localhost:9000/api/health; do

Verify that the chosen endpoint exists, is publicly accessible (no auth required), and reflects backend readiness.


60-64: Add error handling for the data load step.

If the psql command fails (e.g., due to connection issues, SQL errors, or permission problems), the workflow silently continues. This can mask data load failures and cause misleading E2E test results.

       - name: Load Postgres data
         env:
           PGPASSWORD: nest_user_e2e_password
         run: |
           gunzip -c backend/data/nest.sql.gz | psql -h localhost -U nest_user_e2e -d nest_db_e2e
+          if [ $? -ne 0 ]; then
+            echo "Failed to load Postgres data"
+            exit 1
+          fi
         shell: bash

Or, use set -e to fail fast on any command error.
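Note that with a plain `$?` check after a pipeline, only `psql`'s exit status is inspected; `set -o pipefail` is what surfaces a failing `gunzip`. A minimal demonstration of the difference, using `false | cat` as a stand-in for the real pipeline:

```shell
#!/bin/bash
# Without pipefail, a pipeline's status is the last command's, so an early
# failure (false, standing in for a broken gunzip) is masked by a succeeding
# cat (standing in for psql). With pipefail, the failure propagates.
pipeline_status_without_pipefail() { (set +o pipefail; false | cat); echo $?; }
pipeline_status_with_pipefail()    { (set -o pipefail; false | cat); echo $?; }

echo "without pipefail: $(pipeline_status_without_pipefail)"
echo "with pipefail:    $(pipeline_status_with_pipefail)"
```

So `set -euo pipefail` at the top of the step is the simplest way to make the data load fail fast on either side of the pipe.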

backend/pyproject.toml (1)

97-100: Scoped Ruff ignore looks fine; consider narrowing if more subprocess usage is added later.

Limiting S603/S607 ignores to just dump_data.py is a reasonable compromise given the trusted, ops-only nature of that command. If this file grows more subprocess logic over time, consider moving to line-level ignores (# noqa: S603,S607) around the specific psql/pg_dump invocations so future additions in the same file still benefit from those checks.
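A sketch of what the line-level alternative looks like, using a harmless `echo` invocation as a stand-in for the real `psql`/`pg_dump` calls:

```python
import subprocess

# Line-level ignores keep Ruff's S603/S607 checks active for the rest of the
# file while allowing this specific trusted invocation (illustrative only).
out = subprocess.run(  # noqa: S603
    ["echo", "scoped-ignore"],  # noqa: S607 (partial executable path)
    capture_output=True,
    text=True,
    check=True,
).stdout
print(out.strip())
```

This way, any new subprocess usage added to the same file later still triggers the checks by default.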

backend/Makefile (1)

50-76: Make SQL‑based load robust against missing/corrupt dump files.

Switching load-data / load-data-e2e to pipe nest.sql.gz directly into psql is a good fit for the new dump command and avoids the previous schema‑drop behavior. To harden this a bit, consider explicitly checking that the dump file exists (and is non‑empty) before piping it, so a missing/corrupt dump can fail the make target instead of just emitting a gunzip warning:

load-data:
	@echo "Loading Nest data"
	@test -s backend/data/nest.sql.gz || { echo "Missing or empty backend/data/nest.sql.gz"; exit 1; }
	@gunzip -c backend/data/nest.sql.gz | docker exec -i nest-db psql -U nest_user_dev -d nest_db_dev

load-data-e2e:
	@echo "Loading Nest e2e data"
	@test -s backend/data/nest.sql.gz || { echo "Missing or empty backend/data/nest.sql.gz"; exit 1; }
	@gunzip -c backend/data/nest.sql.gz | docker exec -i e2e-nest-db psql -U nest_user_e2e -d nest_db_e2e

This keeps behavior the same on the happy path while surfacing issues earlier when the dump file is unavailable.

backend/apps/common/management/commands/dump_data.py (3)

20-30: CLI --table arguments are additive to the defaults (not replacing them).

Because tables uses action="append" with a non‑empty default list, passing --table will append to the default patterns instead of replacing them:

  • No --table → ["public.owasp_*", "public.github_*", "public.slack_*"]
  • --table public.foo → ["public.owasp_*", "public.github_*", "public.slack_*", "public.foo"]

If that additive behavior is intentional, it might be worth clarifying it in the help text (e.g., “may be specified multiple times to add tables; defaults are always included”). If you intended CLI tables to override the defaults, you can instead:

  • Use default=None in add_arguments, and

  • In handle, do something like:

    tables = options["tables"] or ["public.owasp_*", "public.github_*", "public.slack_*"]

Also applies to: 40-41
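The append-to-default behavior is plain argparse (which Django management commands build on), so it can be shown in isolation, along with the `default=None` override pattern:

```python
import argparse

DEFAULT_TABLES = ["public.owasp_*", "public.github_*", "public.slack_*"]

# argparse's append action adds CLI values on top of a non-empty default
# rather than replacing it.
parser = argparse.ArgumentParser()
parser.add_argument("--table", action="append", dest="tables",
                    default=list(DEFAULT_TABLES))
appended = parser.parse_args(["--table", "public.foo"]).tables

# The override pattern: default to None, then fall back to the defaults.
parser2 = argparse.ArgumentParser()
parser2.add_argument("--table", action="append", dest="tables", default=None)
overridden = parser2.parse_args(["--table", "public.foo"]).tables or DEFAULT_TABLES

print(appended)    # defaults plus public.foo
print(overridden)  # only public.foo
```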


44-60: Temp database name can cause collisions; consider making it unique per run.

temp_db = f"temp_{name}" combined with an unconditional DROP DATABASE IF EXISTS {temp_db} in finally means:

  • Two concurrent runs against the same source DB will race on the same temp DB name; one run can drop the other’s temp DB in the middle of its dump.
  • If the main DB name ever contains characters invalid in unquoted identifiers, CREATE DATABASE {temp_db} TEMPLATE {name}; will fail.

For this ops‑only command it’s not a blocker, but you can make it more robust by incorporating a unique suffix (PID, timestamp, or both) and quoting via psql:

import os
import time

temp_db = f"temp_{name}_{os.getpid()}_{int(time.time())}"

and then using a small helper to generate safe SQL (e.g. via format('CREATE DATABASE %I TEMPLATE %I', ...) in a DO block executed with \gexec), if you want to fully guard against unusual DB names.

Also applies to: 88-104


134-158: Subprocess wrapper is consistent; consider minor enhancements if you extend it later.

The _psql helper centralizes all psql calls, uses check=True, and passes credentials via environment (PGPASSWORD), which is a solid, simple pattern for this internal command. If this grows, a couple of optional tweaks you might consider:

  • Add text=True and pass strings instead of bytes to avoid manual .encode() for stdin cases.
  • Capture stderr on failures to include more context in the CommandError message (e.g., via run(..., capture_output=True) and surfacing e.stderr).

Not required for this PR, but these can make debugging dump failures easier when operating this command.
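The two tweaks combined look roughly like this (a generic sketch, not the project's actual `_psql` helper; the Python interpreter stands in for `psql` so the demo is self-contained):

```python
import subprocess
import sys


def run_checked(cmd: list[str], stdin: str = "") -> str:
    """Run a command with text I/O, surfacing stderr in the error on failure."""
    try:
        result = subprocess.run(
            cmd,
            input=stdin,          # text=True means plain str, no .encode() needed
            text=True,
            capture_output=True,  # stderr is captured for the error message
            check=True,
        )
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"{cmd[0]} failed: {exc.stderr.strip()}") from exc
    return result.stdout


# Demo: uppercase stdin via a subprocess, as psql would consume piped SQL.
print(run_checked([sys.executable, "-c",
                   "import sys; print(sys.stdin.read().upper())"], "ok"))
```

On failure, the raised error carries the subprocess's stderr, which is the main debugging win over a bare `CalledProcessError`.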

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f747481 and ddae26b.

⛔ Files ignored due to path filters (1)
  • backend/data/nest.sql.gz is excluded by !**/*.gz
📒 Files selected for processing (4)
  • .github/workflows/setup-e2e-environment/action.yaml (2 hunks)
  • backend/Makefile (2 hunks)
  • backend/apps/common/management/commands/dump_data.py (1 hunks)
  • backend/pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • backend/Makefile
  • .github/workflows/setup-e2e-environment/action.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
🔇 Additional comments (2)
.github/workflows/setup-e2e-environment/action.yaml (1)

44-45: I'll verify the review comment by examining the actual workflow file and investigating the database initialization sequence. Let me start by inspecting the action.yaml file.
#!/bin/bash
# Read the full action.yaml file
cat -n .github/workflows/setup-e2e-environment/action.yaml
backend/apps/common/management/commands/dump_data.py (1)

106-132: Email masking strategy is broad but appropriate for e2e; just be aware of side‑effects.

The DO block walks all non‑system schemas and unconditionally runs:

UPDATE <schema>.<table> SET email = '';

for every column named email. This is a simple and effective way to scrub PII for e2e, but keep in mind:

  • Any “email” column used for non‑PII purposes will also be blanked.
  • Very large tables will get full‑table updates, which may be slow on big dumps.

Given this is an explicit data‑masking step for generating test/e2e dumps, this trade‑off is generally fine; just something to keep in mind if performance or more granular masking ever becomes a concern.

def decorator(view_func):
    @wraps(view_func)
    def _wrapper(request, *args, **kwargs):
        if settings.IS_E2E_ENVIRONMENT:
Collaborator

Let's configure the cache backend for e2e via Django settings (locmem for now).
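A minimal sketch of what that could look like in the e2e settings module (assuming locmem as suggested; the `LOCATION` name is illustrative):

```python
# e2e settings sketch: an in-memory cache keeps the e2e stack self-contained
# while preserving the same cache API the production code paths use.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "nest-e2e",
    }
}
```

Swapping this for a real cache service later only changes the settings entry, not the call sites.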

"--table",
action="append",
dest="tables",
default=["public.owasp_*", "public.github_*", "public.slack_*"],
Collaborator
We don't dump the entire slack app data atm.

Collaborator
This AI-generated code is still difficult to maintain. Also, there are no tests for this file.

def _hide_emails(self) -> str:
    # Uses a DO block to UPDATE every column named 'email' in non-system schemas
    return """
    DO $$
Collaborator
Is there a more readable/maintainable way to get rid of email addresses?
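One candidate answer (a hypothetical sketch, not the project's code): do the catalog introspection in plain SQL against `information_schema`, and build the trivial per-table UPDATEs in Python, so no PL/pgSQL block is needed at all:

```python
# Query to find every (schema, table) that has an 'email' column; each row
# would be fed to mask_email_statements below via the existing psql wrapper.
MASK_QUERY = """
SELECT table_schema, table_name
FROM information_schema.columns
WHERE column_name = 'email'
  AND table_schema NOT IN ('pg_catalog', 'information_schema');
"""


def mask_email_statements(columns: list[tuple[str, str]]) -> list[str]:
    """Build one simple UPDATE per (schema, table) pair with an email column."""
    return [
        f"UPDATE \"{schema}\".\"{table}\" SET email = '';"
        for schema, table in columns
    ]


for statement in mask_email_statements([("public", "github_user")]):
    print(statement)
```

Each generated statement is plain SQL that can be reviewed and logged individually, which trades a little performance for readability.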

    WHERE a.attname = 'email'
      AND a.attnum > 0
      AND NOT a.attisdropped
      AND n.nspname NOT IN ('pg_catalog','information_schema','pg_toast')
Collaborator
These magic names and pretty much the entire PL/pgSQL spell don't fit our clean and readable code approach. Yes, it probably performs best, but performance isn't the only goal we have.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/apps/api/rest/v0/__init__.py (1)

60-70: E2E env behavior looks correct; consider DRYing up shared config.

The E2E branch cleanly mirrors the local environment (no auth, no throttling, same SITE_URL), which matches the intent for a test-only backend and keeps staging/prod untouched. To avoid duplication between the local and E2E branches, you could extract a small helper:

@@
-api_settings = {
-    "auth": ApiKey(),  # The `api_key` param name is based on the ApiKey class name.
-    "default_router": RouterPaginated(),
-    "description": "Open Worldwide Application Security Project API",
-    "docs": Swagger(settings={"persistAuthorization": True}),
-    "throttle": [AuthRateThrottle("10/s")],
-    "title": "OWASP Nest",
-    "version": "0.3.1",
-}
-
-api_settings_customization = {}
-if settings.IS_LOCAL_ENVIRONMENT:
-    api_settings_customization = {
-        "auth": None,
-        "servers": [
-            {
-                "description": "Local",
-                "url": settings.SITE_URL,
-            }
-        ],
-        "throttle": [],
-    }
-elif settings.IS_E2E_ENVIRONMENT:
-    api_settings_customization = {
-        "auth": None,
-        "servers": [
-            {
-                "description": "E2E",
-                "url": settings.SITE_URL,
-            }
-        ],
-        "throttle": [],
-    }
+api_settings = {
+    "auth": ApiKey(),  # The `api_key` param name is based on the ApiKey class name.
+    "default_router": RouterPaginated(),
+    "description": "Open Worldwide Application Security Project API",
+    "docs": Swagger(settings={"persistAuthorization": True}),
+    "throttle": [AuthRateThrottle("10/s")],
+    "title": "OWASP Nest",
+    "version": "0.3.1",
+}
+
+
+def _no_auth_env_server(description: str) -> dict:
+    return {
+        "auth": None,
+        "servers": [
+            {
+                "description": description,
+                "url": settings.SITE_URL,
+            }
+        ],
+        "throttle": [],
+    }
+
+
+api_settings_customization = {}
+if settings.IS_LOCAL_ENVIRONMENT:
+    api_settings_customization = _no_auth_env_server("Local")
+elif settings.IS_E2E_ENVIRONMENT:
+    api_settings_customization = _no_auth_env_server("E2E")
@@
 elif settings.IS_STAGING_ENVIRONMENT:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95b1391 and 4d2e344.

📒 Files selected for processing (2)
  • .github/workflows/setup-e2e-environment/action.yaml (2 hunks)
  • backend/apps/api/rest/v0/__init__.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • .github/workflows/setup-e2e-environment/action.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (4)
.github/workflows/setup-e2e-environment/action.yaml (4)

50-58: Verify the health check endpoint /a exists and is accessible.

The workflow checks for backend readiness by hitting http://localhost:9000/a, but this endpoint path appears arbitrary. If this endpoint doesn't exist, returns an error, or requires authentication, the workflow could timeout or fail even when the backend is functionally ready. Verify that /a is intentional or consider using a standard health check endpoint (e.g., /health, /, or /api/).


60-64: Verify that loading data after backend startup is correct.

Data loading has been moved to occur after backend startup and migrations (line 60), whereas previously it ran before backend startup. Since migrations execute before gunicorn starts (line 44), the database schema is initialized first, but confirm this sequence doesn't break expected behavior or require data to be present during any initialization steps.


35-47: Security: Verify --network host is necessary.

Using --network host exposes the container to all host network interfaces, which is a permissive configuration. Confirm this is required for your E2E testing or consider using an explicit Docker network bridge for better isolation.


8-16: Timeout pattern is well-structured.

The timeout 5m wrapper around the pg_isready retry loop is a solid defensive measure to prevent indefinite hangs. Similarly applied to the backend readiness check (lines 51–56).
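A generic version of that pattern, using a marker file in place of `pg_isready` so the bounded wait is easy to see in isolation:

```shell
#!/bin/bash
# `timeout` bounds the readiness loop, so a service that never comes up fails
# the step (exit status 124) instead of hanging the job forever.
wait_for_file() {
  timeout "$1" bash -c 'until [ -e "$0" ]; do sleep 0.2; done' "$2"
}

if wait_for_file 1 "/tmp/ready-marker-$$"; then
  echo "ready"
else
  echo "timed out"   # prints this: the marker never appears
fi
```

In the workflow, the same shape wraps the `pg_isready` and `wget --spider` loops with `timeout 5m`.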

@sonarqubecloud

sonarqubecloud bot commented Dec 2, 2025


Labels

backend · ci · docs (Improvements or additions to documentation) · makefile

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix the running e2e backend instance

2 participants