feat(defaults): Add default resources to engine/executor container #2514

Merged
merged 1 commit into from
Oct 15, 2020

Conversation

groszewn
Contributor

@groszewn groszewn commented Oct 1, 2020

Defines default requests and limits on the engine/executor container.

Contributes to #2475

Signed-off-by: Nick Groszewski <groszewn@gmail.com>

What this PR does / why we need it:

Addition of default engine/executor container resources, which helps in clusters with OPA policies defined to require requests and limits be set.
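As a rough illustration of the kind of check such a policy performs, here is a minimal Python sketch. It is not the actual OPA policy (those are written in Rego) and not the operator's code; the container dicts, the container name, and the `500m`/`512Mi` values are hypothetical.

```python
from typing import Any, Dict

REQUIRED_RESOURCES = ("cpu", "memory")


def has_requests_and_limits(container: Dict[str, Any]) -> bool:
    """Return True if the container sets cpu and memory in both requests and limits."""
    resources = container.get("resources") or {}
    for section in ("requests", "limits"):
        values = resources.get(section) or {}
        if any(not values.get(name) for name in REQUIRED_RESOURCES):
            return False
    return True


# Hypothetical container specs; names and values are illustrative only.
without_defaults = {"name": "engine"}
with_defaults = {
    "name": "engine",
    "resources": {
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits": {"cpu": "500m", "memory": "512Mi"},
    },
}

print(has_requests_and_limits(without_defaults))  # False
print(has_requests_and_limits(with_defaults))     # True
```

A container injected without any `resources` block would fail a check like this, which is why defaults on the engine/executor container matter in such clusters.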

Which issue(s) this PR fixes:

Fixes #2475

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Added default resources to engine/executor container

@seldondev
Collaborator

Hi @groszewn. Thanks for your PR.

I'm waiting for a SeldonIO member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@ukclivecox
Contributor

@groszewn I'm just wondering if we should be more conservative on the CPU, as a value of 1 implies continuous usage in a cluster. Given that we are not serverless, I am wondering whether this could cause issues for existing users who over-commit their clusters.
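To make the concern concrete: the Kubernetes scheduler places pods by summing CPU requests against a node's allocatable CPU, so a larger default request reduces how many containers fit even when they sit idle. The arithmetic below is a simplified sketch; the 4-CPU node and the candidate request sizes are hypothetical, and it ignores memory and other pods.

```python
def max_pods_by_cpu_request(allocatable_millicores: int, request_millicores: int) -> int:
    """Count identical containers that fit on a node when only CPU requests are considered."""
    if request_millicores <= 0:
        raise ValueError("request must be positive")
    return allocatable_millicores // request_millicores


NODE_ALLOCATABLE = 4000  # hypothetical node with 4 allocatable CPUs, in millicores

print(max_pods_by_cpu_request(NODE_ALLOCATABLE, 1000))  # 4
print(max_pods_by_cpu_request(NODE_ALLOCATABLE, 500))   # 8
print(max_pods_by_cpu_request(NODE_ALLOCATABLE, 100))   # 40
```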

@groszewn
Contributor Author

groszewn commented Oct 2, 2020

@cliveseldon I think that's a very valid concern. Would it make sense for this value to be configurable via the helm chart install so that cluster admins can specify what meets their needs best and have a lower default (maybe 0.5 or lower)?

@ukclivecox
Contributor

@groszewn Yes, I think that makes sense.
Adding things to the helm chart values that appear as env vars is a bit fiddly at present.
You need to:

  1. add the envvar to manager kustomize yaml
  2. modify
    HELM_ENV_SUBST = {
    "AMBASSADOR_ENABLED": "ambassador.enabled",
    "AMBASSADOR_SINGLE_NAMESPACE": "ambassador.singleNamespace",
    "ENGINE_SERVER_GRPC_PORT": "engine.grpc.port",
    "ENGINE_CONTAINER_IMAGE_PULL_POLICY": "engine.image.pullPolicy",
    "ENGINE_LOG_MESSAGES_EXTERNALLY": "engine.logMessagesExternally",
    "ENGINE_SERVER_PORT": "engine.port",
    "ENGINE_PROMETHEUS_PATH": "engine.prometheus.path",
    "ENGINE_CONTAINER_USER": "engine.user",
    "ENGINE_CONTAINER_SERVICE_ACCOUNT_NAME": "engine.serviceAccount.name",
    "ISTIO_ENABLED": "istio.enabled",
    "ISTIO_GATEWAY": "istio.gateway",
    "ISTIO_TLS_MODE": "istio.tlsMode",
    "PREDICTIVE_UNIT_SERVICE_PORT": "predictiveUnit.port",
    "PREDICTIVE_UNIT_DEFAULT_ENV_SECRET_REF_NAME": "predictiveUnit.defaultEnvSecretRefName",
    "PREDICTIVE_UNIT_METRICS_PORT_NAME": "predictiveUnit.metricsPortName",
    "USE_EXECUTOR": "executor.enabled",
    "EXECUTOR_CONTAINER_IMAGE_PULL_POLICY": "executor.image.pullPolicy",
    "EXECUTOR_SERVER_PORT": "executor.port",
    "EXECUTOR_SERVER_METRICS_PORT_NAME": "executor.metricsPortName",
    "EXECUTOR_PROMETHEUS_PATH": "executor.prometheus.path",
    "EXECUTOR_CONTAINER_USER": "executor.user",
    "EXECUTOR_CONTAINER_SERVICE_ACCOUNT_NAME": "executor.serviceAccount.name",
    "MANAGER_CREATE_RESOURCES": "managerCreateResources",
    "EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT": "executor.requestLogger.defaultEndpoint",
    "DEFAULT_USER_ID": "defaultUserID",
    }
    to add new mapping
  3. regenerate the helm chart templates with
  4. add new var to values.yaml for operator helm chart
  5. modify operator code to use envvar
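Step 2 above can be sketched roughly as follows. This is an assumption about how a mapping like `HELM_ENV_SUBST` could be applied, not the project's actual generation script: the helper functions `helm_value_expr` and `substitute_env`, the env var `EXECUTOR_DEFAULT_CPU_REQUEST`, and the values path `executor.resources.cpuRequest` are all hypothetical names chosen for illustration.

```python
from typing import Dict

# Trimmed copy of the mapping quoted above, plus one HYPOTHETICAL new entry.
HELM_ENV_SUBST: Dict[str, str] = {
    "ENGINE_SERVER_PORT": "engine.port",
    "EXECUTOR_SERVER_PORT": "executor.port",
    # Hypothetical env var name and values path, for illustration only.
    "EXECUTOR_DEFAULT_CPU_REQUEST": "executor.resources.cpuRequest",
}


def helm_value_expr(values_path: str) -> str:
    """Build a Helm template expression that reads the given values.yaml path."""
    return "{{ .Values." + values_path + " }}"


def substitute_env(env: Dict[str, str]) -> Dict[str, str]:
    """Replace hard-coded env var values with Helm expressions where a mapping exists."""
    return {
        name: helm_value_expr(HELM_ENV_SUBST[name]) if name in HELM_ENV_SUBST else value
        for name, value in env.items()
    }


# Hypothetical env vars as they might appear in the manager kustomize yaml (step 1).
manager_env = {"EXECUTOR_DEFAULT_CPU_REQUEST": "500m", "OTHER_VAR": "unchanged"}
templated = substitute_env(manager_env)
print(templated["EXECUTOR_DEFAULT_CPU_REQUEST"])  # {{ .Values.executor.resources.cpuRequest }}
print(templated["OTHER_VAR"])                     # unchanged
```

Under this assumption, the new key would also need a matching default in the operator chart's values.yaml (step 4) so that the templated expression resolves at install time.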

@groszewn
Contributor Author

groszewn commented Oct 5, 2020

@cliveseldon updated to make the default resources configurable via the helm chart install.
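The general pattern of a configurable default with a built-in fallback might look like the sketch below. The operator itself is written in Go, so this Python version only illustrates the idea; the env var name `EXECUTOR_DEFAULT_CPU_REQUEST` and the fallback value `500m` are hypothetical and not taken from the PR.

```python
import os
from typing import Mapping, Optional

FALLBACK_CPU_REQUEST = "500m"  # hypothetical built-in default


def default_cpu_request(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured CPU request, or the fallback when unset or blank."""
    env = os.environ if env is None else env
    # Hypothetical env var name; a Helm value would populate it at install time.
    value = env.get("EXECUTOR_DEFAULT_CPU_REQUEST", "").strip()
    return value or FALLBACK_CPU_REQUEST


print(default_cpu_request({}))                                        # 500m
print(default_cpu_request({"EXECUTOR_DEFAULT_CPU_REQUEST": "100m"}))  # 100m
```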

@ukclivecox
Contributor

/ok-to-test

@seldondev
Collaborator

Mon Oct 5 12:49:33 UTC 2020
The logs for [pr-build] [1] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/1.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=1

@seldondev
Collaborator

Mon Oct 5 12:49:35 UTC 2020
The logs for [lint] [2] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/2.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=2

@groszewn
Contributor Author

groszewn commented Oct 6, 2020

@cliveseldon do the tests need to be re-kicked?

@ukclivecox
Contributor

/test integration

@seldondev
Collaborator

Tue Oct 6 13:09:43 UTC 2020
The logs for [integration] [3] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/3.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=3

@groszewn
Contributor Author

groszewn commented Oct 7, 2020

/retest

@seldondev
Collaborator

Wed Oct 7 12:34:13 UTC 2020
The logs for [integration] [4] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/4.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=4

@seldondev
Collaborator

@groszewn: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| integration | 98c4d11 | link | /test integration |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository. I understand the commands that are listed here.

@seldondev
Collaborator

Fri Oct 9 13:06:16 UTC 2020
The logs for [pr-build] [5] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/5.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=5

@seldondev
Collaborator

Fri Oct 9 13:06:50 UTC 2020
The logs for [lint] [6] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/6.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=6

@groszewn
Contributor Author

@cliveseldon seems like the lint stage is hanging. Does this need to be kicked in some way?

@ukclivecox
Contributor

/retest

@axsaucedo
Contributor

/test lint

@seldondev
Collaborator

Tue Oct 13 16:53:56 UTC 2020
The logs for [lint] [7] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/7.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=7

@groszewn
Contributor Author

@axsaucedo looks like the PR checks have passed. Do you need to kick off the longer-running tests? (I'm not sure if I have this capability, and I don't want to overstep and strain the build system.)

Defines default requests and limits on the engine/executor container.

Contributes to 2475

Signed-off-by: Nick Groszewski <groszewn@gmail.com>
@seldondev
Collaborator

Thu Oct 15 14:26:10 UTC 2020
The logs for [lint] [9] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/9.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=9

@seldondev
Collaborator

Thu Oct 15 14:26:19 UTC 2020
The logs for [pr-build] [8] will show after the pipeline context has finished.
https://github.com/SeldonIO/seldon-core/blob/gh-pages/jenkins-x/logs/SeldonIO/seldon-core/PR-2514/8.log

impatient try
jx get build logs SeldonIO/seldon-core/PR-2514 --build=8

@axsaucedo
Contributor

/approve

@axsaucedo axsaucedo merged commit 3b042a4 into SeldonIO:master Oct 15, 2020
@axsaucedo
Contributor

Thank you @groszewn !

@seldondev
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: axsaucedo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Development

Successfully merging this pull request may close these issues.

Explicitly define default requests and limits for engine container
4 participants