feat: fix ci and tests #174

Merged: 14 commits merged into main on Dec 16, 2024

Conversation


@smuu smuu commented Dec 13, 2024

Summary by CodeRabbit

  • Documentation

    • Updated the README for improved clarity on project description and installation instructions.
  • New Features

    • Enhanced Docker configuration for bridge and light node services with new environment variables.
    • Improved scripts for managing trusted peers in bridge and light node setups.
  • Bug Fixes

    • Adjusted connection strings in tests to connect to localhost instead of an IP address.
  • Chores

    • Added error handling and installation checks in the justfile commands.
    • Updated CI workflow to include wait steps for bridge and light nodes during integration tests.
    • Modified Dockerfiles to use a newer base image version.

Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>

coderabbitai bot commented Dec 13, 2024

Walkthrough

The pull request includes updates to various documentation and configuration files. The README.md was revised for clarity and detail about the decentralized key transparency protocol, including installation instructions. Dockerfiles for the Celestia bridge, light node, and validator were updated to use a newer base image. The docker-compose.yml file was modified to enhance service configurations, and scripts for managing bridge and light nodes were improved. Additionally, the justfile and CI workflow were updated for better error handling and process management.

Changes

  • README.md: Updated project description for clarity; added hyperlink to just documentation; clarified commands for starting prover and light-client.
  • ci/Dockerfile.bridge: Updated base image from alpine:3.19.1 to alpine:3.21.0.
  • ci/Dockerfile.lightnode: Updated base image from alpine:3.19.1 to alpine:3.21.0; no other changes made.
  • ci/Dockerfile.validator: Updated base image from alpine:3.19.1 to alpine:3.21.0; no other changes made.
  • ci/docker-compose.yml: Added environment variables SKIP_AUTH and CONTAINER_NAME for bridge services; updated comments and added BRIDGE_COUNT for the light node service.
  • ci/run-bridge.sh: Introduced CONTAINER_NAME and TRUSTED_PEERS_FILE variables; added an append_trusted_peers function for managing trusted peers.
  • ci/run-lightnode.sh: Added BRIDGE_COUNT and TRUSTED_PEERS_FILE variables; updated functions to manage bridge nodes' readiness.
  • crates/tests/src/lib.rs: Updated WebSocket connection strings in CelestiaConfig for local connections.
  • justfile: Enhanced commands with error handling and tool installation checks; added shebang lines for bash execution.
  • .github/workflows/ci.yml: Added a step in the integration-test job to wait for the light node to start, filtering logs for readiness.

Possibly related PRs

Suggested reviewers

  • distractedm1nd

Poem

🐇 In the meadow where changes bloom,
The README shines with clearer room.
Dockerfiles fresh, like morning dew,
Light nodes ready, all brand new!
With trusted peers and bridges wide,
Together we hop, in code we glide! 🌼




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
justfile (2)

71-73: Enhance error handling in integration tests

While the error message is good, we should explicitly exit with a non-zero status on test failure to ensure proper CI pipeline behavior.

  if ! cargo test -p prism-tests --lib --release --features mock_prover; then
    echo "Integration tests failed."
+   exit 1
  fi
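
A related option, consistent with the shebang lines this PR adds elsewhere in the justfile, is to run the recipe body under bash strict mode so a failing command aborts the recipe on its own. A minimal sketch of that idiom (recipe body only; the recipe name and wording are illustrative, not the repository's actual justfile):

#!/usr/bin/env bash
# Hypothetical body of a shebang-style justfile recipe.
# With strict mode, a failing `cargo test` stops the recipe and propagates
# a non-zero exit status to CI without an explicit `exit 1`.
set -euo pipefail

cargo test -p prism-tests --lib --release --features mock_prover
echo "Integration tests passed."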

174-182: Consider version pinning for cargo tools

The tool installation loop is a good addition, but consider pinning specific versions to ensure consistent behavior across different environments.

  for tool in cargo-udeps cargo-llvm-cov cargo-nextest; do \
    if ! command -v $tool > /dev/null; then \
      echo "Installing $tool..."; \
-     cargo install $tool; \
+     case $tool in \
+       cargo-udeps)    cargo install cargo-udeps --version "0.1.35" ;; \
+       cargo-llvm-cov) cargo install cargo-llvm-cov --version "0.5.31" ;; \
+       cargo-nextest)  cargo install cargo-nextest --version "0.9.63" ;; \
+     esac; \
    else \
      echo "$tool is already installed."; \
    fi; \
  done
ci/run-lightnode.sh (1)

9-9: Redundant assignment of BRIDGE_COUNT variable

The line BRIDGE_COUNT="${BRIDGE_COUNT}" assigns the variable to itself without any modification. If BRIDGE_COUNT is already set in the environment, this line is unnecessary and can be removed.

Apply this diff to remove the redundant assignment:

-BRIDGE_COUNT="${BRIDGE_COUNT}"
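
If the assignment was meant to guard against an unset value rather than being a no-op, two common idioms express that explicitly. A sketch (the default of 1 is an assumption, not taken from the script):

# Fall back to a default when BRIDGE_COUNT is unset or empty (default assumed here).
BRIDGE_COUNT="${BRIDGE_COUNT:-1}"

# Or fail fast with a clear message when the variable must come from the environment.
: "${BRIDGE_COUNT:?BRIDGE_COUNT must be set}"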
README.md (1)

42-42: Correct the hyperlink to the 'just' task runner

The hyperlink to just in the installation instructions appears to be incorrect. The URL may not direct users to the intended page.

Update the link to:

-We use [`just`](https://github.com/casey/just?tab=readme-ov-file#packages) as a task runner. Once installed, you can install the rest of the dependencies with:
+We use [`just`](https://github.com/casey/just#packages) as a task runner. Once installed, you can install the rest of the dependencies with:
crates/tests/src/lib.rs (1)

Line range hint 63-127: Consider improving test reliability and cleanup.

The test spawns multiple async tasks including an infinite transaction generation loop. Consider these improvements:

  1. Add a timeout for the entire test to prevent hanging
  2. Implement proper cleanup of spawned tasks
  3. Document or make configurable the 100-block test duration

Here's a suggested improvement:

 #[tokio::test]
+#[timeout(300000)] // 5 minute timeout
 async fn test_light_client_prover_talking() -> Result<()> {
     // ... existing setup ...
 
+    let (tx, rx) = tokio::sync::oneshot::channel();
     let prover_clone = prover.clone();
-    spawn(async move {
+    let prover_handle = spawn(async move {
         debug!("starting prover");
         prover_clone.run().await.unwrap();
     });
 
     let lc_clone = lightclient.clone();
-    spawn(async move {
+    let lc_handle = spawn(async move {
         debug!("starting light client");
         lc_clone.run().await.unwrap();
     });
 
-    spawn(async move {
+    let tx_handle = spawn(async move {
         let mut rng = StdRng::from_entropy();
+        let mut rx = rx;
 
         // ... transaction generation code ...
 
         loop {
+            if rx.try_recv().is_ok() {
+                break;
+            }
             // ... existing loop code ...
         }
     });
 
     let mut rx = lc_da_layer.clone().subscribe_to_heights();
     let initial_height = rx.recv().await.unwrap();
     while let Ok(height) = rx.recv().await {
         debug!("received height {}", height);
         if height >= initial_height + 100 {
             break;
         }
     }
 
+    // Cleanup
+    tx.send(()).unwrap();
+    tx_handle.abort();
+    prover_handle.abort();
+    lc_handle.abort();
+
     Ok(())
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f2f977 and c2323e1.

📒 Files selected for processing (9)
  • README.md (1 hunks)
  • ci/Dockerfile.bridge (1 hunks)
  • ci/Dockerfile.lightnode (1 hunks)
  • ci/Dockerfile.validator (1 hunks)
  • ci/docker-compose.yml (3 hunks)
  • ci/run-bridge.sh (4 hunks)
  • ci/run-lightnode.sh (4 hunks)
  • crates/tests/src/lib.rs (1 hunks)
  • justfile (4 hunks)
✅ Files skipped from review due to trivial changes (3)
  • ci/Dockerfile.bridge
  • ci/Dockerfile.validator
  • ci/Dockerfile.lightnode
🧰 Additional context used
🪛 Shellcheck (0.10.0)
ci/run-lightnode.sh

[warning] 82-82: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 83-83: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (6)
justfile (2)

11-13: LGTM: Docker build step addition improves reliability

The addition of the build step before spinning up the stack ensures that all images are properly built, reducing potential runtime failures.


94-106: Verify coverage exclusions and enhance error handling

The coverage command setup looks good, but has two areas for improvement:

  1. Error handling should be consistent with other commands
  2. We should verify the excluded packages are intentional
  if ! cargo llvm-cov nextest --html --output-dir coverage_report --lib --features "mock_prover" --release --workspace --exclude prism-cli --exclude-from-report prism-sp1 --ignore-filename-regex sp1; then
    echo "Coverage report generation failed."
+   exit 1
  else
    echo "Coverage report generated in 'coverage_report' directory"
  fi
✅ Verification successful

Let me gather more information about the test coverage setup to make a more informed decision.


Let me check one more thing to understand the SP1-related code structure and test coverage requirements.


Coverage exclusions are appropriate, but error handling needs improvement

The exclusions in the coverage command are appropriate:

  • prism-cli is excluded as it's primarily an executable
  • prism-sp1 (under crates/zk/sp1) is excluded as it's SP1-specific code
  • The sp1 filename regex exclusion aligns with the SP1-related implementation files

The error handling should be enhanced as suggested:

  if ! cargo llvm-cov nextest --html --output-dir coverage_report --lib --features "mock_prover" --release --workspace --exclude prism-cli --exclude-from-report prism-sp1 --ignore-filename-regex sp1; then
    echo "Coverage report generation failed."
+   exit 1
  else
    echo "Coverage report generated in 'coverage_report' directory"
  fi
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the excluded packages exist and are intentional
fd -t d "prism-(cli|sp1)" .

# Check for any other test files that might need exclusion
rg -l "^#\[cfg\(test\)\]" . --type rust

Length of output: 234


Script:

#!/bin/bash
# Check if prism-cli and prism-sp1 have test-specific code
rg -l "test" ./prism-cli ./prism-sp1 --type rust || true

# Check Cargo.toml to understand the workspace structure
cat Cargo.toml

# Look for any mock or test utilities in sp1-related files
rg -l "mock|test" . --type rust | grep -i "sp1" || true

Length of output: 3580


Script:

#!/bin/bash
# Check the structure and content of the SP1-related code
fd -t f . "crates/zk/sp1"

# Look for any test configurations in workspace
rg -l "mock_prover|test_utils" . --type toml

Length of output: 371

ci/docker-compose.yml (3)

26-30: Environment variables added correctly for bridge nodes

The addition of SKIP_AUTH and CONTAINER_NAME environment variables for bridge-0 enhances configurability and clarity in the service setup. The variables are appropriately documented.


49-51: Environment variables added correctly for bridge nodes

Similarly, the environment variables for bridge-1 are correctly added, ensuring consistency across bridge services.


65-72: Environment variables added correctly for light node

The inclusion of SKIP_AUTH and BRIDGE_COUNT for light-0 improves the light node's configuration, aligning it with the new peer management features.

crates/tests/src/lib.rs (1)

37-37: LGTM! Connection string changes improve security and resolve port conflicts.

The changes from 0.0.0.0 to localhost improve security by restricting connections to the local machine. The light client port change to 46658 helps avoid potential port conflicts.

Let's verify these ports are correctly configured in the docker-compose file:

Also applies to: 41-41

✅ Verification successful

Port mappings in docker-compose.yml correctly align with test configuration

The verification confirms that:

  • Port 36658 is mapped to 26658 for bridge connection
  • Port 46658 is mapped to 26658 for light client connection

These mappings match exactly with the connection strings in the test file:

  • Bridge: "ws://localhost:36658"
  • Light client: "ws://localhost:46658"
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the WebSocket ports in docker-compose match the test configuration
# Expected: Port mappings for 36658 and 46658 should be present

rg -A 2 "ports:" ci/docker-compose.yml | rg "366[0-9][0-9]|466[0-9][0-9]"

Length of output: 113


@distractedm1nd distractedm1nd left a comment


beautiful

Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
ci/run-bridge.sh (2)

63-76: Add error message details in retry loop

The retry loop could be more informative about why the peer ID retrieval failed.

-    if [[ -z "$peer_id" ]]; then
-      echo "Node is not running yet. Retrying..."
+    if [[ -z "$peer_id" ]]; then
+      echo "Node is not running yet. Failed to get peer ID. Retrying... (${elapsed}s/${timeout}s)"

78-80: Consider making the port configurable

The RPC port (2121) is hardcoded in the multiaddr construction. Consider making this configurable via an environment variable for flexibility.

+RPC_PORT="${RPC_PORT:-2121}"
-  multiaddr="/dns/$CONTAINER_NAME/tcp/2121/p2p/$peer_id"
+  multiaddr="/dns/$CONTAINER_NAME/tcp/$RPC_PORT/p2p/$peer_id"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2323e1 and ed8fff4.

📒 Files selected for processing (1)
  • ci/run-bridge.sh (4 hunks)
🔇 Additional comments (3)
ci/run-bridge.sh (3)

9-9: LGTM: Well-structured variable declarations

The new variables are properly defined with sensible defaults and follow the existing pattern of using shared directories.

Also applies to: 23-23


82-101: LGTM: Proper implementation of file locking

The implementation correctly addresses the race condition concerns raised in the previous review by using proper file locking mechanisms.
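
For readers unfamiliar with the pattern being referenced, a minimal sketch of a flock-based append looks like the following; the actual function in ci/run-bridge.sh may differ in details such as variable names and lock file location:

append_trusted_peers() {
  local multiaddr="$1"
  (
    # Take an exclusive lock so concurrent bridge containers cannot interleave writes.
    flock -x 200
    echo "$multiaddr" >> "$TRUSTED_PEERS_FILE"
  ) 200>"${TRUSTED_PEERS_FILE}.lock"
}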


Line range hint 1-124: Verify related CI script updates

Since this script is part of the CI infrastructure and handles critical peer configuration, we should verify if similar changes are needed in related scripts (e.g., light node scripts).

✅ Verification successful

Let me check the content of the related scripts to verify if they need similar updates.


No additional updates needed in related scripts

The verification shows that the peer handling implementation in run-bridge.sh and run-lightnode.sh are complementary and working as intended:

  • run-bridge.sh appends its peer information to the shared TRUSTED_PEERS_FILE
  • run-lightnode.sh waits for all bridge nodes to register their peer information and properly configures them using add_trusted_peers()
  • The docker-compose.yml correctly sets up the container names and bridge count used by this peer configuration system
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for related CI scripts that might need similar updates
echo "Searching for related CI scripts..."
fd -t f . ci/ -x grep -l "celestia" {} \;

# Check for similar peer handling in other scripts
echo -e "\nChecking for similar peer handling..."
rg -l "trusted.*peer" ci/

Length of output: 429


Script:

#!/bin/bash
# Check content of run-lightnode.sh for peer handling
echo "=== Content of run-lightnode.sh ==="
cat ci/run-lightnode.sh

echo -e "\n=== Checking trusted peer handling in docker-compose.yml ==="
cat ci/docker-compose.yml | grep -A 5 -B 5 "trusted.*peer"

Length of output: 5061

@distractedm1nd
Contributor

in the CI the integration test still can't connect to the light node, hmmm

@smuu
Contributor Author

smuu commented Dec 13, 2024

> in the CI the integration test still can't connect to the light node, hmmm

I needed to change 0.0.0.0 to localhost to get it working. Can you test with 0.0.0.0 to see if this is the issue?

Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed8fff4 and 8cd0a40.

📒 Files selected for processing (2)
  • crates/da/src/celestia.rs (1 hunks)
  • crates/tests/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/tests/src/lib.rs
🔇 Additional comments (1)
crates/da/src/celestia.rs (1)

46-46: Security consideration for exposed WebSocket endpoint

The change to 0.0.0.0 exposes the WebSocket endpoint on all network interfaces, which could present security risks if not properly secured. Please ensure:

  1. This exposure is intentional and required for the architecture
  2. The WebSocket endpoint has proper authentication (note: the code shows optional auth token support)
  3. Network-level security (firewalls, security groups) is configured appropriately
✅ Verification successful

Let me gather more context about the Celestia client implementation and configuration.


Based on the gathered context, I can now provide a final response:

WebSocket binding to 0.0.0.0 is intended for development and testing

The binding to 0.0.0.0:26658 is part of the default configuration in a development context, as evidenced by:

  • It's used alongside test configurations (36658, 46658 ports in tests)
  • The codebase is explicitly marked as "not yet suitable for production environments" in README.md
  • Authentication support is properly implemented via optional auth tokens in the client
  • This is a local development setup that requires running a local Celestia network ("just celestia-up")

For production deployments, the connection string should be configured to a more restrictive binding through CLI arguments or configuration, which is already supported via the celestia_client parameter.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for security-related configurations

# Search for authentication token usage
echo "Checking auth token usage..."
rg -A 5 "auth_token"

# Search for security-related configurations
echo "Checking security configurations..."
fd -e toml -e yaml -e json -x rg "(security|auth|token|credential)"

Length of output: 944


Script:

#!/bin/bash
# Check the Celestia client implementation and configuration details
echo "Checking Celestia client implementation..."
rg -A 10 "Client::new" 

echo "Checking WebSocket connection configuration..."
rg -A 10 "connection_string"

echo "Checking if there are any security-related constants or configurations..."
rg -B 2 -A 5 "0.0.0.0:26658"

echo "Looking for any documentation about security considerations..."
fd README.md -x cat {}

Length of output: 17561

smuu added 2 commits December 13, 2024 15:21
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
.github/workflows/ci.yml (1)

90-94: LGTM! The wait step for light node is a good addition.

This addition properly ensures the light node is ready before running integration tests, which should help resolve the connection issues mentioned in the PR discussion.

Consider adding a timeout to prevent the workflow from hanging if the light node fails to start:

      - name: Wait for light node to start
        run: |
          docker compose -f ci/docker-compose.yml logs -f |
-            awk '/Configuration finished. Running a light node/ {print;exit}'
+            timeout 60 awk '/Configuration finished. Running a light node/ {print;exit}'
+          if [ $? -eq 124 ]; then
+            echo "Error: Light node failed to start within 60 seconds"
+            docker compose -f ci/docker-compose.yml logs
+            exit 1
+          fi
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cd0a40 and 3ebe2b5.

📒 Files selected for processing (2)
  • .github/workflows/ci.yml (1 hunks)
  • crates/tests/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/tests/src/lib.rs
🔇 Additional comments (1)
.github/workflows/ci.yml (1)

90-94: Verify Docker service connectivity configuration.

Given the discussion about changing addresses from 0.0.0.0 to localhost, we should verify the Docker service connectivity configuration.
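
A manual way to check this once the stack is up is to confirm that the published ports are reachable from the host. The commands below are illustrative, not part of the repository; the port numbers are the ones mapped in ci/docker-compose.yml:

docker compose -f ci/docker-compose.yml ps
nc -z localhost 36658 && echo "bridge RPC reachable on 36658"
nc -z localhost 46658 && echo "light node RPC reachable on 46658"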

smuu added 6 commits December 13, 2024 15:36
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
ci/docker-compose.yml (2)

37-56: Clean up commented code

Instead of commenting out the entire bridge-1 service configuration, consider:

  1. Removing it if it's no longer needed
  2. Using source control to track the changes
  3. If this is temporary for testing, add a TODO comment explaining the plan
-  # bridge-1:
-  #   image: bridge
-  #   platform: "linux/amd64"
-  #   build:
-  #     context: .
-  #     dockerfile: Dockerfile.bridge
-  #   environment:
-  #     # provide an id for the bridge node (default: 0)
-  #     # each node should have a next natural number starting from 0
-  #     - NODE_ID=1
-  #     # setting SKIP_AUTH to true disables the use of JWT for authentication
-  #     - SKIP_AUTH=true
-  #     # this must match the service name in the docker-compose.yml file
-  #     # used for the trusted peers string for the light node
-  #     - CONTAINER_NAME=bridge-1
-  #   ports:
-  #     - 36658:26658
-  #   volumes:
-  #     - credentials:/credentials
-  #     - genesis:/genesis

Line range hint 1-89: Document testing strategy

The changes suggest a simplified testing setup (single bridge node, disabled auth) but lack documentation about:

  1. The testing strategy for different configurations
  2. Validation steps for the CI pipeline
  3. How this setup differs from production

Consider adding a ci/README.md documenting:

  1. The CI environment setup and configuration
  2. Test scenarios and expected outcomes
  3. Troubleshooting guidelines
  4. Production vs CI differences
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cf1cd6 and b6ad595.

📒 Files selected for processing (3)
  • .github/workflows/ci.yml (1 hunks)
  • ci/docker-compose.yml (3 hunks)
  • crates/tests/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/ci.yml
  • crates/tests/src/lib.rs
🔇 Additional comments (2)
ci/docker-compose.yml (2)

10-10: Verify impact of reducing bridge nodes

Reducing the bridge count from 2 to 1 removes redundancy in the system. While this might simplify testing, it could affect system reliability in production.

Consider documenting the rationale for this change and its implications for production deployments.

✅ Verification successful

Reducing bridge count is intentional for CI testing

The reduction from 2 to 1 bridge nodes is a valid change for CI testing purposes. The scripts show that:

  • The bridge count is configurable via BRIDGE_COUNT environment variable
  • Light nodes wait for the configured number of bridge nodes before starting
  • The setup is specifically for CI testing with proper coordination between validator, bridge, and light nodes
  • The scripts handle both single and multiple bridge node scenarios correctly

This change simplifies the CI testing environment without compromising the test functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any references to multiple bridge nodes in configuration or documentation
rg -l "bridge.*nodes?" . --type md --type yaml --type sh

Length of output: 145


Script:

#!/bin/bash
# Check the content of CI scripts to understand the bridge node setup and its implications
echo "=== run-bridge.sh ==="
cat ./ci/run-bridge.sh
echo -e "\n=== run-lightnode.sh ==="
cat ./ci/run-lightnode.sh
echo -e "\n=== run-validator.sh ==="
cat ./ci/run-validator.sh

Length of output: 14976


65-72: ⚠️ Potential issue

Address light node connection issues

The configuration changes attempt to fix the CI connection issues by:

  1. Adding SKIP_AUTH=true to match bridge configuration
  2. Setting explicit BRIDGE_COUNT=1

However, based on the PR comments about 0.0.0.0 vs localhost, there might be additional networking configuration needed.

Consider:

  1. Adding explicit host binding configuration
  2. Documenting the networking requirements
  3. Adding connection validation in the CI pipeline

smuu added 2 commits December 13, 2024 16:21
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
ci/docker-compose.yml (2)

Line range hint 8-14: Inconsistent BRIDGE_COUNT configuration

The BRIDGE_COUNT=2 in validator service appears incorrect since:

  1. The PR is addressing light node connection issues
  2. Comments indicate troubleshooting with a single bridge node
  3. This might be contributing to the connection problems mentioned in PR comments
- BRIDGE_COUNT=2
+ BRIDGE_COUNT=1

Line range hint 37-56: Remove unused bridge-1 service

Since the PR is focused on fixing light node connection issues with a single bridge, maintaining the bridge-1 service configuration:

  1. Adds unnecessary complexity
  2. May interfere with connection troubleshooting
  3. Conflicts with the suggested BRIDGE_COUNT=1

Consider removing the entire bridge-1 service block while debugging the connection issues.

ci/run-lightnode.sh (1)

32-64: Improve timeout handling in wait_for_provision

The function has been enhanced to wait for bridge nodes, but the timeout handling could be improved.

Consider these improvements:

  1. Make timeout configurable via environment variable
  2. Add more detailed error message on timeout
  3. Add progress indicator
+PROVISION_TIMEOUT="${PROVISION_TIMEOUT:-30}"  # Allow configurable timeout
 echo "Waiting for $BRIDGE_COUNT bridge nodes to start"
 start_time=$(date +%s)
-timeout=30
+timeout=$PROVISION_TIMEOUT
+echo "Will wait up to $timeout seconds..."
ci/run-validator.sh (2)

Line range hint 191-193: Fix potential race condition in genesis hash file writing

Both provision_bridge_nodes and provision_light_nodes are run in parallel and write to the same GENESIS_HASH_FILE. This could lead to a race condition. Consider:

  1. Running them sequentially instead of parallel
  2. Or having only one function write the genesis hash
- provision_bridge_nodes &
- provision_light_nodes &
+ # Run sequentially to avoid race condition
+ provision_bridge_nodes
+ provision_light_nodes

Line range hint 3-3: Enhance error handling for CI reliability

While set -euo pipefail provides basic error handling, consider adding:

  1. Timeout handling for wait_for_block to prevent infinite loops
  2. Explicit checks for file operations in shared directory
  3. Logging for better CI debugging
 set -euo pipefail
+
+# Add timeout handling
+TIMEOUT=300  # 5 minutes
+
+# Add error handling function
+handle_error() {
+  echo "Error occurred in script at line $1"
+  exit 1
+}
+
+trap 'handle_error $LINENO' ERR
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6ad595 and 4364107.

📒 Files selected for processing (5)
  • .github/workflows/ci.yml (1 hunks)
  • ci/docker-compose.yml (5 hunks)
  • ci/run-bridge.sh (4 hunks)
  • ci/run-lightnode.sh (4 hunks)
  • ci/run-validator.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/ci.yml
🧰 Additional context used
🪛 Shellcheck (0.10.0)
ci/run-lightnode.sh

[warning] 87-87: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 88-88: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (8)
ci/docker-compose.yml (3)

26-30: Authentication still disabled in CI environment

While the added comments explain the purpose of SKIP_AUTH and CONTAINER_NAME, the security concern about disabled authentication remains unaddressed.


87-91: LGTM: Good volume configuration for CI

The tmpfs configuration for the shared volume is well-suited for CI:

  1. Provides fast I/O performance
  2. Ensures clean state between test runs
  3. Properly configured with appropriate driver options

Line range hint 1-91: Verify configuration consistency

Let's verify the overall configuration consistency to help resolve the connection issues.

✅ Verification successful

Let me verify the configuration consistency by generating additional scripts to check the environment variables and port mappings.


Configuration is consistent and properly set up

The verification shows that:

  • Bridge count is consistently set to 2 across services
  • Container names are properly referenced (bridge-0, bridge-1)
  • Port mappings are correctly configured with unique host ports (26658, 36658, 46658)
  • Authentication settings (SKIP_AUTH) are consistently set to true across all services
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check configuration consistency across services

# Check for bridge count references
echo "Checking bridge count consistency:"
rg "BRIDGE_COUNT" -A 1

# Check for container name references
echo "\nChecking container name references:"
rg "CONTAINER_NAME" -A 1

# Check for network-related configurations
echo "\nChecking network configurations:"
rg "0.0.0.0|localhost" -A 1

Length of output: 67959


Script:

#!/bin/bash
# Check port mappings and environment variables

# Check all port mappings
echo "Checking port mappings:"
rg "ports:" -A 2 ci/docker-compose.yml

# Check environment variables for bridge count
echo "\nChecking bridge count environment variables:"
rg "BRIDGE_COUNT" ci/docker-compose.yml

# Check container names and their references
echo "\nChecking container name references:"
rg "CONTAINER_NAME|container_name" ci/docker-compose.yml

# Check skip auth settings
echo "\nChecking authentication settings:"
rg "SKIP_AUTH" ci/docker-compose.yml

Length of output: 1048

ci/run-lightnode.sh (1)

86-91: Fix potential return value masking in add_trusted_peers

The function has shellcheck warnings about masking return values.

Apply this diff as suggested in previous reviews:

add_trusted_peers() {
-  local trusted_peers="$(cat "$TRUSTED_PEERS_FILE")"
-  local formatted_peers=$(echo "$trusted_peers" | sed 's/\([^,]*\)/"\1"/g')
+  local trusted_peers
+  trusted_peers="$(cat "$TRUSTED_PEERS_FILE")"
+  local formatted_peers
+  formatted_peers=$(echo "$trusted_peers" | sed 's/\([^,]*\)/"\1"/g')
   echo "Trusting peers: $formatted_peers"
   sed -i'.bak' "s|TrustedPeers = .*|TrustedPeers = [$formatted_peers]|" "$CONFIG_DIR/config.toml"
}
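
For context on why SC2155 flags the combined form: `local` itself always returns success, so the exit status of the command substitution is thrown away. A standalone illustration (not code from this repository):

#!/usr/bin/env bash
set -e

masked() {
  local out="$(false)"   # `local` returns 0, so the failure of `false` is hidden
  echo "masked: still running"
}

unmasked() {
  local out
  out="$(false)"         # the assignment now carries the failing status; with set -e the script stops here
  echo "unmasked: never printed"
}

masked
unmasked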
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 87-87: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 88-88: Declare and assign separately to avoid masking return values.

(SC2155)

ci/run-bridge.sh (2)

115-116: Improve background process handling

The current implementation of background process could be improved.

As suggested in previous reviews, implement proper background process tracking and cleanup:

+# Track background processes
+declare -a bg_pids
+
+# Cleanup function
+cleanup() {
+    for pid in "${bg_pids[@]}"; do
+        kill "$pid" 2>/dev/null || true
+    done
+}
+
+# Set trap
+trap cleanup EXIT
+
 main() {
     # ... previous code ...
     
-    append_trusted_peers &
-    sleep 4
+    append_trusted_peers & 
+    bg_pids+=($!)
+    
+    # Wait for trusted peers file to be created or timeout
+    timeout=30
+    start_time=$(date +%s)
+    while [[ ! -f "$TRUSTED_PEERS_FILE" ]]; do
+        current_time=$(date +%s)
+        if (( current_time - start_time >= timeout )); then
+            echo "Timeout waiting for trusted peers"
+            exit 1
+        fi
+        sleep 1
+    done

Line range hint 1-1: Verify the impact on CI pipeline

The changes look good and address the light node connection issues. Let's verify the impact on the CI pipeline.

✅ Verification successful

Let me gather more information about the CI configuration and pipeline changes.


Based on the gathered information, I can now provide the final response:

CI pipeline changes have been properly implemented with robust error handling

The changes in ci/run-bridge.sh show significant improvements:

  • Proper file locking mechanism implemented for TRUSTED_PEERS_FILE to prevent race conditions
  • Timeout handling added for peer ID retrieval (30 seconds limit)
  • Robust error handling with proper exit codes
  • Clear logging for debugging purposes
  • Sequential initialization steps with proper waiting mechanisms
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check CI workflow files for light node configuration
echo "Checking CI workflow configuration..."
rg -l "BRIDGE_COUNT|TRUSTED_PEERS_FILE" .github/workflows/

# Check for any remaining connection issues in CI logs
echo "Checking recent CI logs for connection issues..."
gh run list --limit 5 --json conclusion,databaseId,url \
  | jq -r '.[] | select(.conclusion=="failure") | .url' \
  | while read -r url; do
    gh run view "$url" --log | rg -i "connection|light.*node"
  done

Length of output: 377


Script:

#!/bin/bash
# Check for CI configuration changes in the PR
echo "Checking CI-related changes..."
rg -l "ci/" --type sh

# Look for bridge-related configuration
echo "Checking bridge configuration..."
fd "bridge" --type f --exec cat {}

# Check workflow files for CI configuration
echo "Checking workflow files..."
fd "workflow" -e yml -e yaml --exec cat {}

Length of output: 4513

ci/run-validator.sh (2)

21-23: Verify shared directory permissions and synchronization

The change from GENESIS_DIR to SHARED_DIR makes sense as it's now used for sharing both genesis hash and peer addresses. However, ensure that:

  1. The /shared directory is properly mounted and accessible by all nodes
  2. File permissions are correctly set for both reading and writing
✅ Verification successful

Directory sharing configuration is properly implemented

The change from GENESIS_DIR to SHARED_DIR is correctly implemented with proper infrastructure support:

  • The shared volume is configured as a tmpfs mount in docker-compose.yml, ensuring fast and secure inter-container communication
  • All nodes (validator, bridge, light) consistently use the same /shared directory path
  • The shared directory is properly mounted across all containers using Docker volumes
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the shared directory is referenced in docker-compose.yml
rg -l "shared:" 
# Check if other scripts are using the same shared directory
rg "/shared" --type sh

Length of output: 223


Script:

#!/bin/bash
# Check how the shared directory is configured in docker-compose.yml
rg -A 5 "shared:" ci/docker-compose.yml

# Check how the shared directory is used in other scripts
rg -B 2 -A 2 "/shared" ci/run-{validator,lightnode,bridge}.sh

Length of output: 1598


Line range hint 176-177: Review connection configuration for light node issues

The script sets the validator to listen on 0.0.0.0:26657, but PR comments mention connection issues between the light node and validator. Consider:

  1. Verifying if this binding address is appropriate for your CI environment
  2. Testing with localhost as mentioned in PR comments

@distractedm1nd distractedm1nd merged commit 93a85ec into main Dec 16, 2024
5 checks passed
@smuu smuu deleted the smuu/20241213-fix-ci-and-test branch December 16, 2024 10:38