feat: fix ci and tests #174
Conversation
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Walkthrough: The pull request includes updates to various documentation and configuration files.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
justfile (2)
Lines 71-73: Enhance error handling in integration tests

While the error message is good, we should explicitly exit with a non-zero status on test failure to ensure proper CI pipeline behavior.

```diff
 if ! cargo test -p prism-tests --lib --release --features mock_prover; then
     echo "Integration tests failed."
+    exit 1
 fi
```
Lines 174-182: Consider version pinning for cargo tools

The tool installation loop is a good addition, but consider pinning specific versions to ensure consistent behavior across different environments.

```diff
 for tool in cargo-udeps cargo-llvm-cov cargo-nextest; do \
     if ! command -v $tool > /dev/null; then \
         echo "Installing $tool..."; \
-        cargo install $tool; \
+        case $tool in \
+            cargo-udeps) cargo install cargo-udeps --version "0.1.35" ;; \
+            cargo-llvm-cov) cargo install cargo-llvm-cov --version "0.5.31" ;; \
+            cargo-nextest) cargo install cargo-nextest --version "0.9.63" ;; \
+        esac; \
     else \
         echo "$tool is already installed."; \
     fi; \
 done
```

ci/run-lightnode.sh (1)
Line 9: Redundant assignment of BRIDGE_COUNT variable

The line `BRIDGE_COUNT="${BRIDGE_COUNT}"` assigns the variable to itself without any modification. If `BRIDGE_COUNT` is already set in the environment, this line is unnecessary and can be removed. Apply this diff to remove the redundant assignment:

```diff
-BRIDGE_COUNT="${BRIDGE_COUNT}"
```
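For context, a small sketch of why the self-assignment is a no-op, contrasted with the default-expansion form that would actually be useful (the fallback value `1` here is purely illustrative, not taken from the script):

```shell
#!/bin/sh
# Self-assignment: if BRIDGE_COUNT is unset or empty, it stays empty.
unset BRIDGE_COUNT
BRIDGE_COUNT="${BRIDGE_COUNT}"
echo "after self-assignment: '${BRIDGE_COUNT}'"   # prints: after self-assignment: ''

# Default expansion: fall back to an illustrative default when unset or empty.
BRIDGE_COUNT="${BRIDGE_COUNT:-1}"
echo "after default expansion: '${BRIDGE_COUNT}'" # prints: after default expansion: '1'
```

If the intent of the original line was to supply a default, `"${BRIDGE_COUNT:-<default>}"` is the idiomatic replacement; otherwise deleting the line is correct.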
README.md (1)
Line 42: Correct the hyperlink to the 'just' task runner

The hyperlink to `just` in the installation instructions appears to be incorrect; the URL may not direct users to the intended page. Update the link:

```diff
-We use [`just`](https://github.com/casey/just?tab=readme-ov-file#packages) as a task runner. Once installed, you can install the rest of the dependencies with:
+We use [`just`](https://github.com/casey/just#packages) as a task runner. Once installed, you can install the rest of the dependencies with:
```

crates/tests/src/lib.rs (1)
Line range hint 63-127: Consider improving test reliability and cleanup.

The test spawns multiple async tasks, including an infinite transaction-generation loop. Consider these improvements:
- Add a timeout for the entire test to prevent hanging
- Implement proper cleanup of spawned tasks
- Document or make configurable the 100-block test duration
Here's a suggested improvement:
```diff
 #[tokio::test]
+#[timeout(300000)] // 5 minute timeout
 async fn test_light_client_prover_talking() -> Result<()> {
     // ... existing setup ...
+    let (tx, rx) = tokio::sync::oneshot::channel();
     let prover_clone = prover.clone();
-    spawn(async move {
+    let prover_handle = spawn(async move {
         debug!("starting prover");
         prover_clone.run().await.unwrap();
     });
     let lc_clone = lightclient.clone();
-    spawn(async move {
+    let lc_handle = spawn(async move {
         debug!("starting light client");
         lc_clone.run().await.unwrap();
     });
-    spawn(async move {
+    let tx_handle = spawn(async move {
         let mut rng = StdRng::from_entropy();
+        let mut rx = rx;
         // ... transaction generation code ...
         loop {
+            if rx.try_recv().is_ok() {
+                break;
+            }
             // ... existing loop code ...
         }
     });
     let mut rx = lc_da_layer.clone().subscribe_to_heights();
     let initial_height = rx.recv().await.unwrap();
     while let Ok(height) = rx.recv().await {
         debug!("received height {}", height);
         if height >= initial_height + 100 {
             break;
         }
     }
+    // Cleanup
+    tx.send(()).unwrap();
+    tx_handle.abort();
+    prover_handle.abort();
+    lc_handle.abort();
+
     Ok(())
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)

- README.md (1 hunks)
- ci/Dockerfile.bridge (1 hunks)
- ci/Dockerfile.lightnode (1 hunks)
- ci/Dockerfile.validator (1 hunks)
- ci/docker-compose.yml (3 hunks)
- ci/run-bridge.sh (4 hunks)
- ci/run-lightnode.sh (4 hunks)
- crates/tests/src/lib.rs (1 hunks)
- justfile (4 hunks)
✅ Files skipped from review due to trivial changes (3)
- ci/Dockerfile.bridge
- ci/Dockerfile.validator
- ci/Dockerfile.lightnode
🧰 Additional context used
🪛 Shellcheck (0.10.0)
ci/run-lightnode.sh
[warning] 82-82: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 83-83: Declare and assign separately to avoid masking return values.
(SC2155)
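For illustration, a minimal bash sketch of what SC2155 warns about: combining `local` with a command substitution makes `$?` report the exit status of `local` itself (always 0), hiding the command's failure, while declaring and assigning separately preserves the real status. The function and variable names below are hypothetical:

```shell
#!/bin/bash

masked() {
  local peers="$(false)"   # `local` succeeds, so $? is 0 even though `false` failed
  echo "masked exit status: $?"
}

unmasked() {
  local peers
  peers="$(false)"         # now $? reflects the command substitution's real exit status
  echo "real exit status: $?"
}

masked     # prints: masked exit status: 0
unmasked   # prints: real exit status: 1
```

This is why the shellcheck fix matters in scripts using `set -e`: the masked form silently swallows failures that should abort the script.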
🔇 Additional comments (6)
justfile (2)
Lines 11-13: LGTM: Docker build step addition improves reliability
The addition of the build step before spinning up the stack ensures that all images are properly built, reducing potential runtime failures.
Lines 94-106: Verify coverage exclusions and enhance error handling
The coverage command setup looks good, but has two areas for improvement:
- Error handling should be consistent with other commands
- We should verify the excluded packages are intentional
```diff
 if ! cargo llvm-cov nextest --html --output-dir coverage_report --lib --features "mock_prover" --release --workspace --exclude prism-cli --exclude-from-report prism-sp1 --ignore-filename-regex sp1; then
     echo "Coverage report generation failed."
+    exit 1
 else
     echo "Coverage report generated in 'coverage_report' directory"
 fi
```
✅ Verification successful
Let me gather more information about the test coverage setup to make a more informed decision.
Let me check one more thing to understand the SP1-related code structure and test coverage requirements.
Coverage exclusions are appropriate, but error handling needs improvement
The exclusions in the coverage command are appropriate:

- `prism-cli` is excluded as it's primarily an executable
- `prism-sp1` (under `crates/zk/sp1`) is excluded as it's SP1-specific code
- The `sp1` filename regex exclusion aligns with the SP1-related implementation files
The error handling should be enhanced as suggested:
```diff
 if ! cargo llvm-cov nextest --html --output-dir coverage_report --lib --features "mock_prover" --release --workspace --exclude prism-cli --exclude-from-report prism-sp1 --ignore-filename-regex sp1; then
     echo "Coverage report generation failed."
+    exit 1
 else
     echo "Coverage report generated in 'coverage_report' directory"
 fi
```
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Verify the excluded packages exist and are intentional
fd -t d "prism-(cli|sp1)" .
# Check for any other test files that might need exclusion
rg -l "^#\[cfg\(test\)\]" . --type rust
```
Length of output: 234
Script:
```shell
#!/bin/bash
# Check if prism-cli and prism-sp1 have test-specific code
rg -l "test" ./prism-cli ./prism-sp1 --type rust || true
# Check Cargo.toml to understand the workspace structure
cat Cargo.toml
# Look for any mock or test utilities in sp1-related files
rg -l "mock|test" . --type rust | grep -i "sp1" || true
```
Length of output: 3580
Script:
```shell
#!/bin/bash
# Check the structure and content of the SP1-related code
fd -t f . "crates/zk/sp1"
# Look for any test configurations in workspace
rg -l "mock_prover|test_utils" . --type toml
```
Length of output: 371
ci/docker-compose.yml (3)
Lines 26-30: Environment variables added correctly for bridge nodes

The addition of the `SKIP_AUTH` and `CONTAINER_NAME` environment variables for `bridge-0` enhances configurability and clarity in the service setup. The variables are appropriately documented.
Lines 49-51: Environment variables added correctly for bridge nodes

Similarly, the environment variables for `bridge-1` are correctly added, ensuring consistency across bridge services.
Lines 65-72: Environment variables added correctly for light node

The inclusion of `SKIP_AUTH` and `BRIDGE_COUNT` for `light-0` improves the light node's configuration, aligning it with the new peer management features.
crates/tests/src/lib.rs (1)
Line 37: LGTM! Connection string changes improve security and resolve port conflicts.

The changes from `0.0.0.0` to `localhost` improve security by restricting connections to the local machine. The light client port change to 46658 helps avoid potential port conflicts.
Let's verify these ports are correctly configured in the docker-compose file:
Also applies to: 41-41
✅ Verification successful
Port mappings in docker-compose.yml correctly align with test configuration
The verification confirms that:
- Port 36658 is mapped to 26658 for bridge connection
- Port 46658 is mapped to 26658 for light client connection
These mappings match exactly with the connection strings in the test file:
- Bridge: "ws://localhost:36658"
- Light client: "ws://localhost:46658"
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the WebSocket ports in docker-compose match the test configuration
# Expected: Port mappings for 36658 and 46658 should be present
rg -A 2 "ports:" ci/docker-compose.yml | rg "366[0-9][0-9]|466[0-9][0-9]"
```
Length of output: 113
beautiful
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
ci/run-bridge.sh (2)
Lines 63-76: Add error message details in retry loop

The retry loop could be more informative about why the peer ID retrieval failed.

```diff
 if [[ -z "$peer_id" ]]; then
-    echo "Node is not running yet. Retrying..."
+    echo "Node is not running yet. Failed to get peer ID. Retrying... (${elapsed}s/${timeout}s)"
```
Lines 78-80: Consider making the port configurable

The RPC port (2121) is hardcoded in the multiaddr construction. Consider making this configurable via an environment variable for flexibility.

```diff
+RPC_PORT="${RPC_PORT:-2121}"
-multiaddr="/dns/$CONTAINER_NAME/tcp/2121/p2p/$peer_id"
+multiaddr="/dns/$CONTAINER_NAME/tcp/$RPC_PORT/p2p/$peer_id"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- ci/run-bridge.sh (4 hunks)
🔇 Additional comments (3)
ci/run-bridge.sh (3)
Line 9: LGTM: Well-structured variable declarations
The new variables are properly defined with sensible defaults and follow the existing pattern of using shared directories.
Also applies to: 23-23
Lines 82-101: LGTM: Proper implementation of file locking
The implementation correctly addresses the race condition concerns raised in the previous review by using proper file locking mechanisms.
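As an illustrative sketch only (the file path and peer ID below are hypothetical, and `flock` from util-linux is assumed to be available in the container), the file-locking pattern referred to here looks roughly like:

```shell
#!/bin/bash
# Hypothetical shared file for illustration; the real script uses its own shared volume.
TRUSTED_PEERS_FILE="${TRUSTED_PEERS_FILE:-/tmp/trusted_peers.txt}"

append_peer() {
  local addr="$1"
  (
    # Hold an exclusive lock on fd 9 while appending, so concurrent
    # containers cannot interleave their writes to the shared file.
    flock -x 9
    echo "$addr" >>"$TRUSTED_PEERS_FILE"
  ) 9>"${TRUSTED_PEERS_FILE}.lock"
}

append_peer "/dns/bridge-0/tcp/2121/p2p/EXAMPLE_PEER_ID"
cat "$TRUSTED_PEERS_FILE"
```

The lock is released automatically when the subshell exits, so a crashed writer cannot leave the file permanently locked.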
Line range hint 1-124
: Verify related CI script updates
Since this script is part of the CI infrastructure and handles critical peer configuration, we should verify if similar changes are needed in related scripts (e.g., light node scripts).
✅ Verification successful
Let me check the content of the related scripts to verify if they need similar updates.
No additional updates needed in related scripts
The verification shows that the peer handling implementations in `run-bridge.sh` and `run-lightnode.sh` are complementary and working as intended:

- `run-bridge.sh` appends its peer information to the shared `TRUSTED_PEERS_FILE`
- `run-lightnode.sh` waits for all bridge nodes to register their peer information and properly configures them using `add_trusted_peers()`
- The `docker-compose.yml` correctly sets up the container names and bridge count used by this peer configuration system
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Search for related CI scripts that might need similar updates
echo "Searching for related CI scripts..."
fd -t f . ci/ -x grep -l "celestia" {} \;
# Check for similar peer handling in other scripts
echo -e "\nChecking for similar peer handling..."
rg -l "trusted.*peer" ci/
```
Length of output: 429
Script:
```shell
#!/bin/bash
# Check content of run-lightnode.sh for peer handling
echo "=== Content of run-lightnode.sh ==="
cat ci/run-lightnode.sh
echo -e "\n=== Checking trusted peer handling in docker-compose.yml ==="
cat ci/docker-compose.yml | grep -A 5 -B 5 "trusted.*peer"
```
Length of output: 5061
in the CI the integration test still can't connect to the light node, hmmm
I needed to change
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- crates/da/src/celestia.rs (1 hunks)
- crates/tests/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/tests/src/lib.rs
🔇 Additional comments (1)
crates/da/src/celestia.rs (1)
Line 46: Security consideration for exposed WebSocket endpoint

The change to `0.0.0.0` exposes the WebSocket endpoint on all network interfaces, which could present security risks if not properly secured. Please ensure:
- This exposure is intentional and required for the architecture
- The WebSocket endpoint has proper authentication (note: the code shows optional auth token support)
- Network-level security (firewalls, security groups) is configured appropriately
✅ Verification successful
Let me gather more context about the Celestia client implementation and configuration.
Based on the gathered context, I can now provide a final response:
WebSocket binding to 0.0.0.0 is intended for development and testing
The binding to 0.0.0.0:26658 is part of the default configuration in a development context, as evidenced by:
- It's used alongside test configurations (36658, 46658 ports in tests)
- The codebase is explicitly marked as "not yet suitable for production environments" in README.md
- Authentication support is properly implemented via optional auth tokens in the client
- This is a local development setup that requires running a local Celestia network ("just celestia-up")
For production deployments, the connection string should be configured to a more restrictive binding through CLI arguments or configuration, which is already supported via the `celestia_client` parameter.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check for security-related configurations
# Search for authentication token usage
echo "Checking auth token usage..."
rg -A 5 "auth_token"
# Search for security-related configurations
echo "Checking security configurations..."
fd -e toml -e yaml -e json -x rg "(security|auth|token|credential)"
```
Length of output: 944
Script:
```shell
#!/bin/bash
# Check the Celestia client implementation and configuration details
echo "Checking Celestia client implementation..."
rg -A 10 "Client::new"
echo "Checking WebSocket connection configuration..."
rg -A 10 "connection_string"
echo "Checking if there are any security-related constants or configurations..."
rg -B 2 -A 5 "0.0.0.0:26658"
echo "Looking for any documentation about security considerations..."
fd README.md -x cat {}
```
Length of output: 17561
This reverts commit 8cd0a40.
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
.github/workflows/ci.yml (1)
Lines 90-94: LGTM! The wait step for the light node is a good addition.

This addition properly ensures the light node is ready before running integration tests, which should help resolve the connection issues mentioned in the PR discussion.
Consider adding a timeout to prevent the workflow from hanging if the light node fails to start:
```diff
 - name: Wait for light node to start
   run: |
     docker compose -f ci/docker-compose.yml logs -f |
-    awk '/Configuration finished. Running a light node/ {print;exit}'
+    timeout 60 awk '/Configuration finished. Running a light node/ {print;exit}'
+    if [ $? -eq 124 ]; then
+      echo "Error: Light node failed to start within 60 seconds"
+      docker compose -f ci/docker-compose.yml logs
+      exit 1
+    fi
```
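A note on why the suggested `$? -eq 124` check works: GNU `timeout` exits with status 124 when it kills the command for running too long, so 124 distinguishes "never started" from an ordinary failure. A tiny standalone demonstration:

```shell
#!/bin/bash
# `sleep 5` is killed after 1 second; `timeout` then exits with status 124.
if timeout 1 sleep 5; then
  echo "command finished in time"
else
  status=$?
  echo "command timed out with status $status"   # prints: command timed out with status 124
fi
```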
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/ci.yml (1 hunks)
- crates/tests/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/tests/src/lib.rs
🔇 Additional comments (1)
.github/workflows/ci.yml (1)
Lines 90-94: Verify Docker service connectivity configuration.

Given the discussion about changing addresses from `0.0.0.0` to `localhost`, we should verify the Docker service connectivity configuration.
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
ci/docker-compose.yml (2)
Lines 37-56: Clean up commented code

Instead of commenting out the entire `bridge-1` service configuration, consider:
- Removing it if it's no longer needed
- Using source control to track the changes
- If this is temporary for testing, add a TODO comment explaining the plan
```diff
-  # bridge-1:
-  #   image: bridge
-  #   platform: "linux/amd64"
-  #   build:
-  #     context: .
-  #     dockerfile: Dockerfile.bridge
-  #   environment:
-  #     # provide an id for the bridge node (default: 0)
-  #     # each node should have a next natural number starting from 0
-  #     - NODE_ID=1
-  #     # setting SKIP_AUTH to true disables the use of JWT for authentication
-  #     - SKIP_AUTH=true
-  #     # this must match the service name in the docker-compose.yml file
-  #     # used for the trusted peers string for the light node
-  #     - CONTAINER_NAME=bridge-1
-  #   ports:
-  #     - 36658:26658
-  #   volumes:
-  #     - credentials:/credentials
-  #     - genesis:/genesis
```
Line range hint 1-89: Document testing strategy

The changes suggest a simplified testing setup (single bridge node, disabled auth) but lack documentation about:
- The testing strategy for different configurations
- Validation steps for the CI pipeline
- How this setup differs from production
Consider adding a `ci/README.md` documenting:
- The CI environment setup and configuration
- Test scenarios and expected outcomes
- Troubleshooting guidelines
- Production vs CI differences
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/ci.yml (1 hunks)
- ci/docker-compose.yml (3 hunks)
- crates/tests/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/ci.yml
- crates/tests/src/lib.rs
🔇 Additional comments (2)
ci/docker-compose.yml (2)
Line 10: Verify impact of reducing bridge nodes
Reducing the bridge count from 2 to 1 removes redundancy in the system. While this might simplify testing, it could affect system reliability in production.
Consider documenting the rationale for this change and its implications for production deployments.
✅ Verification successful
Reducing bridge count is intentional for CI testing
The reduction from 2 to 1 bridge nodes is a valid change for CI testing purposes. The scripts show that:
- The bridge count is configurable via the `BRIDGE_COUNT` environment variable
- Light nodes wait for the configured number of bridge nodes before starting
- The setup is specifically for CI testing with proper coordination between validator, bridge, and light nodes
- The scripts handle both single and multiple bridge node scenarios correctly
This change simplifies the CI testing environment without compromising the test functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Check if there are any references to multiple bridge nodes in configuration or documentation
rg -l "bridge.*nodes?" . --type md --type yaml --type sh
```
Length of output: 145
Script:
```shell
#!/bin/bash
# Check the content of CI scripts to understand the bridge node setup and its implications
echo "=== run-bridge.sh ==="
cat ./ci/run-bridge.sh
echo -e "\n=== run-lightnode.sh ==="
cat ./ci/run-lightnode.sh
echo -e "\n=== run-validator.sh ==="
cat ./ci/run-validator.sh
```
Length of output: 14976
Lines 65-72: Address light node connection issues

The configuration changes attempt to fix the CI connection issues by:

- Adding `SKIP_AUTH=true` to match the bridge configuration
- Setting an explicit `BRIDGE_COUNT=1`

However, based on the PR comments about `0.0.0.0` vs `localhost`, there might be additional networking configuration needed.
Consider:
- Adding explicit host binding configuration
- Documenting the networking requirements
- Adding connection validation in the CI pipeline
Signed-off-by: Smuu <18609909+Smuu@users.noreply.github.com>
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)
ci/docker-compose.yml (2)
Line range hint 8-14: Inconsistent BRIDGE_COUNT configuration

The `BRIDGE_COUNT=2` in the validator service appears incorrect since:
- The PR is addressing light node connection issues
- Comments indicate troubleshooting with a single bridge node
- This might be contributing to the connection problems mentioned in PR comments
```diff
-      - BRIDGE_COUNT=2
+      - BRIDGE_COUNT=1
```
Line range hint 37-56: Remove unused bridge-1 service

Since the PR is focused on fixing light node connection issues with a single bridge, maintaining the `bridge-1` service configuration:
- Adds unnecessary complexity
- May interfere with connection troubleshooting
- Conflicts with the suggested `BRIDGE_COUNT=1`

Consider removing the entire `bridge-1` service block while debugging the connection issues.

ci/run-lightnode.sh (1)
Lines 32-64: Improve timeout handling in wait_for_provision

The function has been enhanced to wait for bridge nodes, but the timeout handling could be improved.
Consider these improvements:
- Make timeout configurable via environment variable
- Add more detailed error message on timeout
- Add progress indicator
```diff
+PROVISION_TIMEOUT="${PROVISION_TIMEOUT:-30}" # Allow configurable timeout
 echo "Waiting for $BRIDGE_COUNT bridge nodes to start"
 start_time=$(date +%s)
-timeout=30
+timeout=$PROVISION_TIMEOUT
+echo "Will wait up to $timeout seconds..."
```

ci/run-validator.sh (2)
Line range hint 191-193: Fix potential race condition in genesis hash file writing

Both `provision_bridge_nodes` and `provision_light_nodes` are run in parallel and write to the same `GENESIS_HASH_FILE`. This could lead to a race condition. Consider:
- Running them sequentially instead of parallel
- Or having only one function write the genesis hash
```diff
-provision_bridge_nodes &
-provision_light_nodes &
+# Run sequentially to avoid race condition
+provision_bridge_nodes
+provision_light_nodes
```
Line range hint 3-3: Enhance error handling for CI reliability

While `set -euo pipefail` provides basic error handling, consider adding:
- Timeout handling for wait_for_block to prevent infinite loops
- Explicit checks for file operations in shared directory
- Logging for better CI debugging
```diff
 set -euo pipefail
+
+# Add timeout handling
+TIMEOUT=300 # 5 minutes
+
+# Add error handling function
+handle_error() {
+    echo "Error occurred in script at line $1"
+    exit 1
+}
+
+trap 'handle_error $LINENO' ERR
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/workflows/ci.yml (1 hunks)
- ci/docker-compose.yml (5 hunks)
- ci/run-bridge.sh (4 hunks)
- ci/run-lightnode.sh (4 hunks)
- ci/run-validator.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/ci.yml
🧰 Additional context used
🪛 Shellcheck (0.10.0)
ci/run-lightnode.sh
[warning] 87-87: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 88-88: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (8)
ci/docker-compose.yml (3)
Lines 26-30: Authentication still disabled in CI environment

While the added comments explain the purpose of `SKIP_AUTH` and `CONTAINER_NAME`, the security concern about disabled authentication remains unaddressed.
Lines 87-91: LGTM: Good volume configuration for CI
The tmpfs configuration for the shared volume is well-suited for CI:
- Provides fast I/O performance
- Ensures clean state between test runs
- Properly configured with appropriate driver options
Line range hint 1-91: Verify configuration consistency
Let's verify the overall configuration consistency to help resolve the connection issues.
✅ Verification successful
Let me verify the configuration consistency by generating additional scripts to check the environment variables and port mappings.
Configuration is consistent and properly set up
The verification shows that:
- Bridge count is consistently set to 2 across services
- Container names are properly referenced (bridge-0, bridge-1)
- Port mappings are correctly configured with unique host ports (26658, 36658, 46658)
- Authentication settings (SKIP_AUTH) are consistently set to true across all services
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check configuration consistency across services
# Check for bridge count references
echo "Checking bridge count consistency:"
rg "BRIDGE_COUNT" -A 1
# Check for container name references
echo "\nChecking container name references:"
rg "CONTAINER_NAME" -A 1
# Check for network-related configurations
echo "\nChecking network configurations:"
rg "0.0.0.0|localhost" -A 1
```
Length of output: 67959
Script:
```shell
#!/bin/bash
# Check port mappings and environment variables
# Check all port mappings
echo "Checking port mappings:"
rg "ports:" -A 2 ci/docker-compose.yml
# Check environment variables for bridge count
echo "\nChecking bridge count environment variables:"
rg "BRIDGE_COUNT" ci/docker-compose.yml
# Check container names and their references
echo "\nChecking container name references:"
rg "CONTAINER_NAME|container_name" ci/docker-compose.yml
# Check skip auth settings
echo "\nChecking authentication settings:"
rg "SKIP_AUTH" ci/docker-compose.yml
```
Length of output: 1048
ci/run-lightnode.sh (1)
Lines 86-91: Fix potential return value masking in add_trusted_peers
The function has shellcheck warnings about masking return values.
Apply this diff as suggested in previous reviews:
```diff
 add_trusted_peers() {
-    local trusted_peers="$(cat "$TRUSTED_PEERS_FILE")"
-    local formatted_peers=$(echo "$trusted_peers" | sed 's/\([^,]*\)/"\1"/g')
+    local trusted_peers
+    trusted_peers="$(cat "$TRUSTED_PEERS_FILE")"
+    local formatted_peers
+    formatted_peers=$(echo "$trusted_peers" | sed 's/\([^,]*\)/"\1"/g')
     echo "Trusting peers: $formatted_peers"
     sed -i'.bak' "s|TrustedPeers = .*|TrustedPeers = [$formatted_peers]|" "$CONFIG_DIR/config.toml"
 }
```
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 87-87: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 88-88: Declare and assign separately to avoid masking return values.
(SC2155)
ci/run-bridge.sh (2)
Lines 115-116: Improve background process handling

The current implementation of the background process could be improved.
As suggested in previous reviews, implement proper background process tracking and cleanup:
```diff
+# Track background processes
+declare -a bg_pids
+
+# Cleanup function
+cleanup() {
+    for pid in "${bg_pids[@]}"; do
+        kill "$pid" 2>/dev/null || true
+    done
+}
+
+# Set trap
+trap cleanup EXIT
+
 main() {
     # ... previous code ...
-    append_trusted_peers &
-    sleep 4
+    append_trusted_peers &
+    bg_pids+=($!)
+
+    # Wait for trusted peers file to be created or timeout
+    timeout=30
+    start_time=$(date +%s)
+    while [[ ! -f "$TRUSTED_PEERS_FILE" ]]; do
+        current_time=$(date +%s)
+        if (( current_time - start_time >= timeout )); then
+            echo "Timeout waiting for trusted peers"
+            exit 1
+        fi
+        sleep 1
+    done
```
Line range hint 1-1: Verify the impact on CI pipeline
The changes look good and address the light node connection issues. Let's verify the impact on the CI pipeline.
✅ Verification successful
Let me gather more information about the CI configuration and pipeline changes.
Based on the gathered information, I can now provide the final response:
CI pipeline changes have been properly implemented with robust error handling
The changes in `ci/run-bridge.sh` show significant improvements:

- Proper file locking mechanism implemented for `TRUSTED_PEERS_FILE` to prevent race conditions
- Timeout handling added for peer ID retrieval (30 seconds limit)
- Robust error handling with proper exit codes
- Clear logging for debugging purposes
- Sequential initialization steps with proper waiting mechanisms
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check CI workflow files for light node configuration
echo "Checking CI workflow configuration..."
rg -l "BRIDGE_COUNT|TRUSTED_PEERS_FILE" .github/workflows/
# Check for any remaining connection issues in CI logs
echo "Checking recent CI logs for connection issues..."
gh run list --limit 5 --json conclusion,databaseId,url \
| jq -r '.[] | select(.conclusion=="failure") | .url' \
| while read -r url; do
gh run view "$url" --log | rg -i "connection|light.*node"
done
Length of output: 377
Script:
#!/bin/bash
# Check for CI configuration changes in the PR
echo "Checking CI-related changes..."
rg -l "ci/" --type sh
# Look for bridge-related configuration
echo "Checking bridge configuration..."
fd "bridge" --type f --exec cat {}
# Check workflow files for CI configuration
echo "Checking workflow files..."
fd "workflow" -e yml -e yaml --exec cat {}
Length of output: 4513
ci/run-validator.sh (2)
Lines 21-23: Verify shared directory permissions and synchronization

The change from `GENESIS_DIR` to `SHARED_DIR` makes sense as it's now used for sharing both the genesis hash and peer addresses. However, ensure that:

- The `/shared` directory is properly mounted and accessible by all nodes
- File permissions are correctly set for both reading and writing
✅ Verification successful
Directory sharing configuration is properly implemented
The change from `GENESIS_DIR` to `SHARED_DIR` is correctly implemented with proper infrastructure support:

- The shared volume is configured as a tmpfs mount in docker-compose.yml, ensuring fast and secure inter-container communication
- All nodes (validator, bridge, light) consistently use the same `/shared` directory path
- The shared directory is properly mounted across all containers using Docker volumes
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Check if the shared directory is referenced in docker-compose.yml
rg -l "shared:"
# Check if other scripts are using the same shared directory
rg "/shared" --type sh
```
Length of output: 223
Script:
```shell
#!/bin/bash
# Check how the shared directory is configured in docker-compose.yml
rg -A 5 "shared:" ci/docker-compose.yml
# Check how the shared directory is used in other scripts
rg -B 2 -A 2 "/shared" ci/run-{validator,lightnode,bridge}.sh
```
Length of output: 1598
Line range hint 176-177: Review connection configuration for light node issues

The script sets the validator to listen on `0.0.0.0:26657`, but PR comments mention connection issues between the light node and validator. Consider:

- Verifying if this binding address is appropriate for your CI environment
- Testing with `localhost` as mentioned in PR comments
Summary by CodeRabbit
Documentation
New Features
Bug Fixes
Chores