K8s: Improve Node checks for liveness probe and preStop hook #2661

VietND96 · 2025-02-18T08:58:13Z

User description

Thanks for contributing to the Docker-Selenium project!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines, applied for this repository.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Description

Motivation and Context

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

I have read the contributing document.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed.

PR Type

Enhancement, Bug fix

Description

Improved handling of Node liveness probe and preStop hook in Kubernetes.
Enhanced error handling and logging for Node and Hub/Router interactions.
Added session count checks and retry mechanisms for Node termination readiness.
Refined environment variable usage and HTTP response handling in scripts.

Changes walkthrough 📝

Relevant files

Enhancement

nodePreStop.sh `Enhance Node preStop script for better error handling` charts/selenium-grid/configs/node/nodePreStop.sh Improved logging for Hub/Router authentication and endpoint checks. Added handling for HTTP 200 responses and other statuses from Hub/Router. Enhanced session count checks and retry logic for Node termination readiness. Refined usage of environment variables and error handling in Node drain logic.	+22/-9
nodeProbe.sh `Improve Node probe script for robustness` charts/selenium-grid/configs/node/nodeProbe.sh Improved logging for Node status and session count. Enhanced handling of HTTP responses from Hub/Router. Added fallback logic for Node readiness when Hub/Router is unreachable. Refined JSON parsing and error handling in Node probe logic.	+12/-6

Need help?
Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
Check out the documentation for more information.

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>

qodo-merge-pro · 2025-02-18T08:58:39Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Error Handling The script assumes jq commands will succeed when parsing JSON responses. While there are some error fallbacks with \|\| echo "", more robust error handling for malformed JSON responses would be beneficial. SLOT_HAS_SESSION=$(jq -r '[.value.node.slots[]? \| select(.session != null)] \| length' "${tmp_node_file}" \|\| echo 0) if [ "${SLOT_HAS_SESSION}" -eq 0 ] && [ "${endpoint_http_code}" = "200" ]; then Edge Case The script assumes Node is ready when Hub is unreachable (HTTP non-200). This could mask actual Node registration issues and lead to false positives in node readiness detection. elif [ -n "${NODE_ID}" ] && [ "${endpoint_http_code}" != "200" ]; then echo "$(date -u +"${ts_format}") [${probe_name}] - Node ID: ${NODE_ID} report its status, but could not double check ID in Hub. Assume that Node is ready." exit 0

qodo-merge-pro · 2025-02-18T08:59:00Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Verify drain request success The script should validate the HTTP response code from the drain request to ensure the node drain signal was successfully processed. Currently, there's no check if the drain request succeeded. charts/selenium-grid/configs/node/nodePreStop.sh [54] -curl --noproxy "" -m ${max_time} -k -X POST -H "Authorization: Basic ${BASIC_AUTH}" ${grid_url}/se/grid/distributor/node/${NODE_ID}/drain --header "${HEADERS}" +if ! curl --noproxy "" -m ${max_time} -k -X POST -H "Authorization: Basic ${BASIC_AUTH}" ${grid_url}/se/grid/distributor/node/${NODE_ID}/drain --header "${HEADERS}" -w "%{http_code}" \| grep -q "^2"; then + echo "$(date -u +"${ts_format}") [${probe_name}] - Failed to signal node drain to Hub/Router" + exit 1 +fi Apply this suggestion Suggestion importance[1-10]: 8 __ Why: The suggestion adds crucial error handling for the drain request, which could prevent silent failures and ensure proper node termination. This is important for maintaining grid stability and preventing orphaned sessions.	Medium
Possible issue	Handle JSON parsing errors The script should handle potential JSON parsing errors when checking for active sessions, as malformed JSON could cause incorrect termination decisions. charts/selenium-grid/configs/node/nodePreStop.sh [84-85] -SLOT_HAS_SESSION=$(jq -r '[.value.node.slots[]? \| select(.session != null)] \| length' "${tmp_node_file}" \|\| echo 0) +if ! SLOT_HAS_SESSION=$(jq -r '[.value.node.slots[]? \| select(.session != null)] \| length' "${tmp_node_file}" 2>/dev/null); then + echo "$(date -u +"${ts_format}") [${probe_name}] - Failed to parse node status JSON" + exit 1 +fi if [ "${SLOT_HAS_SESSION}" -eq 0 ] && [ "${endpoint_http_code}" = "200" ]; then Apply this suggestion Suggestion importance[1-10]: 7 __ Why: The suggestion improves error handling for JSON parsing failures, which could prevent incorrect termination decisions based on malformed data. This enhances the reliability of the node shutdown process.	Medium
More

qodo-merge-pro · 2025-02-18T09:47:32Z

CI Feedback 🧐

(Feedback updated until commit `8e454ba`)

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: Rerun workflow when failure
Failed stage: Authenticate GitHub CLI for PR [❌]
Failure summary: The action failed because the GitHub authentication token provided (`GH_CLI_TOKEN_PR`) lacks the required 'read:org' permission scope. The GitHub CLI (`gh`) command attempted to authenticate but couldn't proceed due to insufficient permissions.
Relevant error logs: 1: ##[group]Operating System 2: Ubuntu ... 28: SecurityEvents: write 29: Statuses: write 30: ##[endgroup] 31: Secret source: Actions 32: Prepare workflow directory 33: Prepare all required actions 34: Getting action download info 35: Download action repository 'actions/checkout@main' (SHA:85e6279cec87321a52edac9c87bce653a07cf6c2) 36: Complete job name: Rerun workflow when failure ... 48: show-progress: true 49: lfs: false 50: submodules: false 51: set-safe-directory: true 52: env: 53: GH_CLI_TOKEN: * 54: GH_CLI_TOKEN_PR: * 55: RUN_ID: 13386789165 56: RERUN_FAILED_ONLY: true ... 119: ##[group]Run sudo apt update 120: �[36;1msudo apt update�[0m 121: �[36;1msudo apt install gh�[0m 122: shell: /usr/bin/bash -e {0} 123: env: 124: GH_CLI_TOKEN: * 125: GH_CLI_TOKEN_PR: * 126: RUN_ID: 13386789165 127: RERUN_FAILED_ONLY: true ... 167: 0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded. 168: ##[group]Run echo "$GH_CLI_TOKEN_PR" \| gh auth login --with-token 169: �[36;1mecho "$GH_CLI_TOKEN_PR" \| gh auth login --with-token�[0m 170: shell: /usr/bin/bash -e {0} 171: env: 172: GH_CLI_TOKEN: * 173: GH_CLI_TOKEN_PR: * 174: RUN_ID: 13386789165 175: RERUN_FAILED_ONLY: true 176: RUN_ATTEMPT: 1 177: ##[endgroup] 178: error validating token: missing required scope 'read:org' 179: ##[error]Process completed with exit code 1.

K8s: Improve Node checks for liveness probe and preStop hook

8e454ba

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>

qodo-merge-pro bot added the Review effort 3/5 label Feb 18, 2025

VietND96 merged commit 0a5b6ec into trunk Feb 18, 2025
25 of 27 checks passed

VietND96 deleted the node-probe branch February 18, 2025 10:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

K8s: Improve Node checks for liveness probe and preStop hook #2661

K8s: Improve Node checks for liveness probe and preStop hook #2661

VietND96 commented Feb 18, 2025 •

edited by qodo-merge-pro bot

Loading

qodo-merge-pro bot commented Feb 18, 2025

qodo-merge-pro bot commented Feb 18, 2025

qodo-merge-pro bot commented Feb 18, 2025 •

edited

Loading

K8s: Improve Node checks for liveness probe and preStop hook #2661

K8s: Improve Node checks for liveness probe and preStop hook #2661

Conversation

VietND96 commented Feb 18, 2025 • edited by qodo-merge-pro bot Loading

User description

Description

Motivation and Context

Types of changes

Checklist

PR Type

Description

Changes walkthrough 📝

qodo-merge-pro bot commented Feb 18, 2025

PR Reviewer Guide 🔍

qodo-merge-pro bot commented Feb 18, 2025

PR Code Suggestions ✨

qodo-merge-pro bot commented Feb 18, 2025 • edited Loading

CI Feedback 🧐

(Feedback updated until commit 8e454ba)

VietND96 commented Feb 18, 2025 •

edited by qodo-merge-pro bot

Loading

qodo-merge-pro bot commented Feb 18, 2025 •

edited

Loading

(Feedback updated until commit `8e454ba`)