sparkoperator.k8s.io/SparkApplication health check does not support dynamicAllocation #7557
Comments
We are also facing this issue: SparkApplications are stuck in the Pending state when dynamic allocation is enabled.
We have a workaround for this case: override the health check script as shown below.

resource.customizations.health.sparkoperator.k8s.io_SparkApplication: |
  health_status = {}
  if obj.status ~= nil then
    if obj.status.applicationState.state ~= nil then
      if obj.status.applicationState.state == "" then
        health_status.status = "Progressing"
        health_status.message = "SparkApplication was added, enqueuing it for submission"
        return health_status
      end
      -- When dynamic allocation is enabled, spec.executor.instances is not set,
      -- so treat a RUNNING application as Healthy without counting executors.
      -- (The nil check guards applications that do not define spec.dynamicAllocation.)
      if obj.spec.dynamicAllocation ~= nil and obj.spec.dynamicAllocation.enabled == true then
        if obj.status.applicationState.state == "RUNNING" then
          health_status.status = "Healthy"
          health_status.message = "SparkApplication is Running"
          return health_status
        end
      end
      if obj.status.applicationState.state == "RUNNING" then
        if obj.status.executorState ~= nil then
          count = 0
          executor_instances = obj.spec.executor.instances
          for i, executorState in pairs(obj.status.executorState) do
            if executorState == "RUNNING" then
              count = count + 1
            end
          end
          if executor_instances == count then
            health_status.status = "Healthy"
            health_status.message = "SparkApplication is Running"
            return health_status
          end
        end
      end
      if obj.status.applicationState.state == "SUBMITTED" then
        health_status.status = "Progressing"
        health_status.message = "SparkApplication was submitted successfully"
        return health_status
      end
      if obj.status.applicationState.state == "COMPLETED" then
        health_status.status = "Healthy"
        health_status.message = "SparkApplication was Completed"
        return health_status
      end
      if obj.status.applicationState.state == "FAILED" then
        health_status.status = "Degraded"
        health_status.message = obj.status.applicationState.errorMessage
        return health_status
      end
      if obj.status.applicationState.state == "SUBMISSION_FAILED" then
        health_status.status = "Degraded"
        health_status.message = obj.status.applicationState.errorMessage
        return health_status
      end
      if obj.status.applicationState.state == "PENDING_RERUN" then
        health_status.status = "Progressing"
        health_status.message = "SparkApplication is Pending Rerun"
        return health_status
      end
      if obj.status.applicationState.state == "INVALIDATING" then
        health_status.status = "Missing"
        health_status.message = "SparkApplication is in InvalidatingState"
        return health_status
      end
      if obj.status.applicationState.state == "SUCCEEDING" then
        health_status.status = "Progressing"
        health_status.message = [[The driver pod has been completed successfully. The executor pods terminate and are cleaned up.
        Under these circumstances, we assume the executor pods are completed.]]
        return health_status
      end
      if obj.status.applicationState.state == "FAILING" then
        health_status.status = "Degraded"
        health_status.message = obj.status.applicationState.errorMessage
        return health_status
      end
      if obj.status.applicationState.state == "UNKNOWN" then
        health_status.status = "Progressing"
        health_status.message = "SparkApplication is in UnknownState because either the driver pod or one or all executor pods are in unknown state"
        return health_status
      end
    end
  end
  health_status.status = "Progressing"
  health_status.message = "Waiting for Executor pods"
  return health_status

In health.lua, we just check whether dynamicAllocation is enabled:

  if obj.spec.dynamicAllocation ~= nil and obj.spec.dynamicAllocation.enabled == true then
    if obj.status.applicationState.state == "RUNNING" then
      health_status.status = "Healthy"
      health_status.message = "SparkApplication is Running"
      return health_status
    end
  end

And it works for us.
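For anyone wondering where this override goes: a minimal sketch, assuming a default installation where resource health customizations are configured in the argocd-cm ConfigMap in the argocd namespace (the Lua body is the full script from the comment above, elided here):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations.health.sparkoperator.k8s.io_SparkApplication: |
    health_status = {}
    -- ... full Lua script from the comment above ...
    return health_status
```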
Hi there, attached a PR with checks for both the Spark Operator API and the plain Spark properties ways of configuring dynamic allocation, also taking into account DStreams applications, plus unit tests for all cases.
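For illustration only, a sketch of what checking both configuration paths could look like (this is not the code from the PR; the sparkConf keys and the DStreams property name are assumptions based on standard Spark settings):

```lua
-- Hypothetical helper, not the merged implementation.
local function dynamic_allocation_enabled(obj)
  -- Spark Operator API way: spec.dynamicAllocation.enabled: true
  if obj.spec.dynamicAllocation ~= nil and obj.spec.dynamicAllocation.enabled == true then
    return true
  end
  -- Plain Spark properties way (values in spec.sparkConf are strings)
  if obj.spec.sparkConf ~= nil then
    if obj.spec.sparkConf["spark.dynamicAllocation.enabled"] == "true" then
      return true
    end
    -- DStreams applications use the streaming variant of the property
    if obj.spec.sparkConf["spark.streaming.dynamicAllocation.enabled"] == "true" then
      return true
    end
  end
  return false
end
```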
I plan to merge and release the fix some time next week. @pdrastil has tested and verified this on their installation. If anyone would like to test the new health check for their use case (or propose additional tests for the PR), it would be appreciated!
…llocation is enabled (argoproj#7557) (argoproj#11522) Signed-off-by: Yevgeniy Fridland <yevg.mord@gmail.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: emirot <emirot.nolan@gmail.com>
…llocation is enabled (argoproj#7557) (argoproj#11522) Signed-off-by: Yevgeniy Fridland <yevg.mord@gmail.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: schakrad <chakradari.sindhu@gmail.com>
Checklist:
- I've pasted the output of argocd version.

Describe the bug

sparkoperator.k8s.io/SparkApplication has a dynamic allocation mode (spec.dynamicAllocation.enabled: true), in which the key spec.executor.instances is not set. However, spec.executor.instances is used by the current health check script for sparkoperator.k8s.io/SparkApplication. So in dynamic allocation mode, a sparkoperator.k8s.io/SparkApplication will never reach Healthy status.
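Concretely, the executor-count comparison in the stock health check (quoted in full in the workaround above) is the part that can never succeed, because spec.executor.instances is nil when dynamic allocation is enabled:

```lua
-- Fragment of the current health check, for reference (see the full script above).
executor_instances = obj.spec.executor.instances  -- nil when spec.dynamicAllocation.enabled is true
count = 0
for i, executorState in pairs(obj.status.executorState) do
  if executorState == "RUNNING" then
    count = count + 1
  end
end
if executor_instances == count then  -- never true while executor_instances is nil
  health_status.status = "Healthy"
  health_status.message = "SparkApplication is Running"
  return health_status
end
```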
To Reproduce
Deploy a sparkoperator.k8s.io/SparkApplication with spec.dynamicAllocation.enabled: true via ArgoCD. The sparkoperator.k8s.io/SparkApplication will never reach Healthy status; a minimal example manifest is sketched below.
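A minimal sketch of such a manifest (the name, image, and application file are placeholders, not taken from this issue):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-dyn-alloc-example  # placeholder name
spec:
  type: Scala
  mode: cluster
  image: registry.example.com/spark:3.1.1                                   # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # placeholder path
  sparkVersion: "3.1.1"
  dynamicAllocation:
    enabled: true
    initialExecutors: 2
    minExecutors: 1
    maxExecutors: 5
  driver:
    cores: 1
    memory: 512m
  executor:
    cores: 1
    memory: 512m
    # note: no "instances" key here, which is what trips up the stock health check
```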
Expected behavior
sparkoperator.k8s.io/SparkApplication shall reach Healthy status when the Spark application is running.

Version