Robustness of operator_upgrade notebook #2119

Closed
RafalSkolasinski opened this issue Jul 13, 2020 · 0 comments · Fixed by #2145

RafalSkolasinski commented Jul 13, 2020

Describe the bug

There are multiple issues with the operator_upgrade notebook test.

  1. Even if the test fails, pytest reports it as PASSED because the except block is missing a raise to re-raise the caught exception.
    def test_upgrade(self):
        try:
            create_and_run_script("../../notebooks", "operator_upgrade")
        except:
            run("make install_seldon", shell=True, check=False)
  2. The following function:
def waitStatus(desired):
    for i in range(120):
        allAvailable = True
        failedGet = False
        state=!kubectl get sdep -o json
        state=json.loads("".join(state))
        for model in state["items"]:
            if "status" in model:
                print("model",model["metadata"]["name"],model["status"]["state"])
                if model["status"]["state"]!="Available":
                    allAvailable=False
                    break
            else:
                failedGet = True
        if allAvailable == desired and not failedGet:
            break
        time.sleep(1)
    return allAvailable

fails due to a timeout. Deployments become available only after this function has already given up (it effectively waits for only 120s).
This leads to a pure timeout failure, but that failure is masked by the first issue anyway.
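One way to make the wait less fragile is to poll against an explicit, configurable deadline instead of a fixed 120 iterations. The sketch below is only an illustration under stated assumptions: `wait_until_available`, `all_available`, the `get_items` callable and the 600s default are hypothetical names and values, not code from the notebook. It also drops the `desired` argument and only waits for everything to become Available.

```python
import time


def all_available(items):
    """Return True only if every item reports status.state == "Available".

    An item without a "status" field counts as not yet available, which
    matches the failedGet case in the notebook's waitStatus.
    """
    return all(
        item.get("status", {}).get("state") == "Available" for item in items
    )


def wait_until_available(get_items, timeout_s=600.0, poll_s=1.0):
    """Poll get_items() until all items are Available or the deadline passes.

    get_items is a hypothetical callable returning the "items" list, for
    example parsed from the output of `kubectl get sdep -o json`.
    timeout_s=600 is an assumed value, not one taken from the issue.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if all_available(get_items()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Returning `False` on timeout leaves it to the caller to decide how to fail, for instance with `assert wait_until_available(...)`, so a timeout shows up as an explicit failure rather than a silent fall-through.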

To reproduce

  1. Run the test locally or in CI.
  2. Observe in the logs:
kubectl create namespace seldon-system || echo "namespace seldon-system exists"
Error from server (AlreadyExists): namespaces "seldon-system" already exists
namespace seldon-system exists
helm install seldon \
        ../../helm-charts/seldon-core-operator \
        --namespace seldon-system \
        --set istio.enabled=true \
        --set istio.gateway=istio-system/seldon-gateway \
        --set certManager.enabled=false \
        --set executor.enabled="true" \
        --wait
Error: cannot re-use a name that is still in use
make[1]: *** [Makefile:76: install_seldon] Error 1
make[1]: Leaving directory '/workspace/source/testing/scripts'
PASSED

I observe this on current master as well as on commits from last week, from before the dependency upgrades started.

Expected behaviour

The test fails when something is really failing and passes without problems when things are OK.

RafalSkolasinski added the bug and triage labels Jul 13, 2020
ukclivecox added the priority/p0 label and removed the triage label Jul 16, 2020
ukclivecox added this to the 1.2 milestone Jul 16, 2020