
K8s: Update ScaledJob scaling strategy to eager as default #2466

Merged
merged 1 commit into trunk from scale-job on Nov 14, 2024

Conversation

@VietND96 (Member) commented Nov 14, 2024

User description

Thanks for contributing to the Docker-Selenium project!
A well-described PR helps maintainers review and merge it quickly.

Before submitting your PR, please check our contributing guidelines, which apply to this repository.
Avoid large PRs; help reviewers by keeping them as simple and short as possible.

Description

This follows the KEDA docs: https://keda.sh/docs/2.16/reference/scaledjob-spec/#scalingstrategy

The eager strategy comes to the rescue. It addresses this issue by utilizing all available slots up to the maxReplicaCount, ensuring that waiting messages are processed as quickly as possible.

Switching to eager is expected to fix cases where the default strategy misses the required number of replicas when new requests arrive in between scaling cycles, or scales inaccurately.
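
For context, a minimal sketch of where this strategy is set on a KEDA ScaledJob (field names follow the KEDA ScaledJob spec linked above; the resource name, trigger metadata, and counts are illustrative assumptions, not the chart's actual output):

    # Illustrative ScaledJob; only scalingStrategy.strategy reflects this PR's change,
    # everything else (names, URL, counts) is a hypothetical placeholder.
    apiVersion: keda.sh/v1alpha1
    kind: ScaledJob
    metadata:
      name: selenium-node-chrome                    # hypothetical name
    spec:
      maxReplicaCount: 8                            # illustrative cap
      scalingStrategy:
        strategy: eager                             # was "default"; eager fills all free slots up to maxReplicaCount
      triggers:
        - type: selenium-grid
          metadata:
            url: http://selenium-hub:4444/graphql   # hypothetical Grid GraphQL endpoint
            browserName: chrome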

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

enhancement, configuration changes


Description

  • Updated the default scaling strategy for KEDA ScaledJob to eager to ensure efficient processing of waiting messages.
  • Modified RBAC role settings to correctly reference triggerauthentications.
  • Adjusted the default configuration for useCachedMetrics to false.
  • Enhanced the patch-keda job by optimizing kubectl delete commands with --wait false.

Changes walkthrough 📝

Relevant files

Enhancement

_helpers.tpl: Add conditional logic for metricType in helpers template
charts/selenium-grid/templates/_helpers.tpl
  • Added conditional logic for metricType based on scalingType.
  +2/-0

patch-keda-objects-job.yaml: Optimize kubectl delete commands in patch-keda job
charts/selenium-grid/templates/patch-keda/patch-keda-objects-job.yaml
  • Modified kubectl delete commands to use --wait false.
  +2/-2

Documentation

CONFIGURATION.md: Update configuration documentation for RBAC and scaling strategy
charts/selenium-grid/CONFIGURATION.md
  • Updated RBAC role to include triggerauthentications.
  • Changed default scaling strategy to eager.
  • Modified default settings for useCachedMetrics.
  +4/-4

Configuration changes

values.yaml: Update values for RBAC, caching, and scaling strategy (see the sketch after this list)
charts/selenium-grid/values.yaml
  • Updated RBAC resources to triggerauthentications.
  • Changed useCachedMetrics default to false.
  • Set scaling strategy to eager.
  +9/-6
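
As referenced in the values.yaml entry above, here is a hedged sketch of the kind of defaults this PR changes. The key paths are approximations and may not match the chart's real structure; only the changed values (the eager strategy, triggerauthentications in the RBAC rule, useCachedMetrics off) come from the PR itself.

    # Illustrative values.yaml excerpt; key paths are assumptions, not a verbatim copy of the chart.
    autoscaling:
      scalingType: job                              # ScaledJob-based autoscaling
      scaledOptions:
        maxReplicaCount: 8                          # illustrative
      scaledJobOptions:
        scalingStrategy:
          strategy: eager                           # default changed to "eager"
      useCachedMetrics: false                       # default changed from true to false
    rbacRole:
      rules:
        - apiGroups: ["keda.sh"]
          resources: ["scaledjobs", "scaledobjects", "triggerauthentications"]   # now references triggerauthentications
          verbs: ["get", "list", "patch", "delete"]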

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Potential Race Condition
Disabling the wait on the kubectl delete commands (--wait false) could lead to race conditions if subsequent operations depend on these resources being fully deleted.

Configuration Change
Changing useCachedMetrics from true to false might impact performance and increase API calls to the metrics endpoint (see the sketch below).
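
For the useCachedMetrics observation above, a hedged sketch of where that flag sits on a KEDA trigger (the trigger type and metadata here are illustrative assumptions; only the false default comes from this PR):

    # Illustrative trigger excerpt; with useCachedMetrics disabled, each metric
    # evaluation queries the scaler directly instead of reusing a cached value.
    triggers:
      - type: selenium-grid                         # hypothetical trigger for this chart
        useCachedMetrics: false                     # chart default changed from true to false
        metadata:
          url: http://selenium-hub:4444/graphql     # hypothetical endpoint
          browserName: chrome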


PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Best practice
Suggestion: Add proper error handling and logging for resource deletion operations

Add error handling and logging for the kubectl delete commands to ensure failures
are properly captured and reported.

charts/selenium-grid/templates/patch-keda/patch-keda-objects-job.yaml [38-39]

    -kubectl delete ScaledObjects,ScaledJobs,TriggerAuthentication -n {{ .Release.Namespace }} -l component.autoscaling={{ .Release.Name }} --wait false || true ;
    -kubectl delete hpa -n {{ .Release.Namespace }} -l component.autoscaling={{ .Release.Name }} --wait false || true ;
    +if ! kubectl delete ScaledObjects,ScaledJobs,TriggerAuthentication -n {{ .Release.Namespace }} -l component.autoscaling={{ .Release.Name }} --wait false; then
    +  echo "Warning: Failed to delete some KEDA resources"
    +fi
    +if ! kubectl delete hpa -n {{ .Release.Namespace }} -l component.autoscaling={{ .Release.Name }} --wait false; then
    +  echo "Warning: Failed to delete some HPA resources"
    +fi

Suggestion importance[1-10]: 6

Why: The suggestion improves error visibility and debugging by adding explicit error messages when resource deletion fails, replacing the silent failure with '|| true'. This enhances maintainability and troubleshooting capabilities.

💡 Need additional feedback? Start a PR chat.


qodo-merge-pro bot commented Nov 14, 2024

CI Failure Feedback 🧐

(Checks updated until commit fcf268f)

Action: Rerun workflow when failure

Failed stage: Authenticate GitHub CLI for PR [❌]

Failure summary:

The action failed due to an error in the authentication process with GitHub CLI:

  • The provided token is missing the required scope read:org.
  • This resulted in a validation error when attempting to authenticate using the GitHub CLI.

Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    28:  SecurityEvents: write
    29:  Statuses: write
    30:  ##[endgroup]
    31:  Secret source: Actions
    32:  Prepare workflow directory
    33:  Prepare all required actions
    34:  Getting action download info
    35:  Download action repository 'actions/checkout@main' (SHA:3b9b8c884f6b4bb4d5be2779c26374abadae0871)
    36:  Complete job name: Rerun workflow when failure
    ...
    
    48:  show-progress: true
    49:  lfs: false
    50:  submodules: false
    51:  set-safe-directory: true
    52:  env:
    53:  GH_CLI_TOKEN: ***
    54:  GH_CLI_TOKEN_PR: ***
    55:  RUN_ID: 11835906833
    56:  RERUN_FAILED_ONLY: true
    ...
    
    119:  ##[group]Run sudo apt update
    120:  sudo apt update
    121:  sudo apt install gh
    122:  shell: /usr/bin/bash -e {0}
    123:  env:
    124:  GH_CLI_TOKEN: ***
    125:  GH_CLI_TOKEN_PR: ***
    126:  RUN_ID: 11835906833
    127:  RERUN_FAILED_ONLY: true
    ...
    
    165:  0 upgraded, 0 newly installed, 0 to remove and 55 not upgraded.
    166:  ##[group]Run echo "$GH_CLI_TOKEN_PR" | gh auth login --with-token
    167:  echo "$GH_CLI_TOKEN_PR" | gh auth login --with-token
    168:  shell: /usr/bin/bash -e {0}
    169:  env:
    170:  GH_CLI_TOKEN: ***
    171:  GH_CLI_TOKEN_PR: ***
    172:  RUN_ID: 11835906833
    173:  RERUN_FAILED_ONLY: true
    174:  RUN_ATTEMPT: 1
    175:  ##[endgroup]
    176:  error validating token: missing required scope 'read:org'
    177:  ##[error]Process completed with exit code 1.
    

✨ CI feedback usage guide:

The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
The tool analyzes the failed checks and provides several kinds of feedback:

  • Failed stage
  • Failed test name
  • Failure summary
  • Relevant error logs

In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"

where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

Configuration options

  • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check fails. Default is true.
  • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
  • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
  • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
  • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

See more information about the checks tool in the docs.

@VietND96 merged commit 3dddb7d into trunk on Nov 14, 2024
51 of 53 checks passed
@VietND96 deleted the scale-job branch on November 14, 2024 at 13:40