samples.instanceadmin.test_instanceadmin: test_add_and_delete_cluster failed #353

Closed
flaky-bot bot opened this issue Jul 8, 2021 · 4 comments · Fixed by #356 or #362
Assignees
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. flakybot: flaky Tells the Flaky Bot not to close or comment on this issue. flakybot: issue An issue filed by the Flaky Bot. Should not be added manually. samples Issues that are directly related to samples. type: process A process-related concern. May include testing, release, or the like.

Comments

@flaky-bot

flaky-bot bot commented Jul 8, 2021

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: 1e51285
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/python-docs-samples-tests/instances/instanceadmin-636-1625735059"
cluster_id: "instanceadmin-920"
c...ocation: "projects/python-docs-samples-tests/locations/us-central1-a"
  serve_nodes: 1
  default_storage_type: SSD
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/python-docs-samples-tests/instances/instanceadmin-636-1625735059'), ('x-goog-api-client', 'gl-python/3.6.13 grpc/1.38.1 gax/1.30.0')]}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
>       return callable_(*args, **kwargs)

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7efeafef4fd0>
request = parent: "projects/python-docs-samples-tests/instances/instanceadmin-636-1625735059"
cluster_id: "instanceadmin-920"
cl... location: "projects/python-docs-samples-tests/locations/us-central1-a"
serve_nodes: 1
default_storage_type: SSD
}

timeout = None
metadata = [('x-goog-request-params', 'parent=projects/python-docs-samples-tests/instances/instanceadmin-636-1625735059'), ('x-goog-api-client', 'gl-python/3.6.13 grpc/1.38.1 gax/1.30.0')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
>   return _end_unary_response_blocking(state, call, False, None)

.nox/py-3-6/lib/python3.6/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7efeafe0f978>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7efeadd1f508>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
>       raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.UNAVAILABLE
E details = "The instance is currently being changed, please try again."
E debug_error_string = "{"created":"@1625735069.728455632","description":"Error received from peer ipv4:142.250.99.95:443","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"The instance is currently being changed, please try again.","grpc_status":14}"
E >

.nox/py-3-6/lib/python3.6/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

capsys = <_pytest.capture.CaptureFixture object at 0x7efeafe89240>
dispose_of = <function dispose_of.<locals>.disposal at 0x7efeb0eee2f0>

def test_add_and_delete_cluster(capsys, dispose_of):
    dispose_of(INSTANCE)

    # This won't work, because the instance isn't created yet
    instanceadmin.add_cluster(PROJECT, INSTANCE, CLUSTER2)
    out = capsys.readouterr().out
    assert f"Instance {INSTANCE} does not exist" in out

    # Get the instance created
    instanceadmin.run_instance_operations(PROJECT, INSTANCE, CLUSTER1)
    capsys.readouterr()  # throw away output

    # Add a cluster to that instance
>   instanceadmin.add_cluster(PROJECT, INSTANCE, CLUSTER2)

test_instanceadmin.py:131:


instanceadmin.py:158: in add_cluster
cluster.create()
../../google/cloud/bigtable/cluster.py:301: in create
"cluster": cluster_pb,
../../google/cloud/bigtable_admin_v2/services/bigtable_instance_admin/client.py:1001: in create_cluster
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py:145: in __call__
return wrapped_func(*args, **kwargs)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "The instance is currently...l.cc","file_line":1066,"grpc_message":"The instance is currently being changed, please try again.","grpc_status":14}"

???
E google.api_core.exceptions.ServiceUnavailable: 503 The instance is currently being changed, please try again.

<string>:3: ServiceUnavailable

@flaky-bot flaky-bot bot added flakybot: issue An issue filed by the Flaky Bot. Should not be added manually. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Jul 8, 2021
@product-auto-label product-auto-label bot added api: bigtable Issues related to the googleapis/python-bigtable API. samples Issues that are directly related to samples. labels Jul 8, 2021
@flaky-bot flaky-bot bot added the flakybot: flaky Tells the Flaky Bot not to close or comment on this issue. label Jul 8, 2021
@flaky-bot
Author

flaky-bot bot commented Jul 8, 2021

Looks like this issue is flaky. 😟

I'm going to leave this open and stop commenting.

A human should fix and close this.


When run at the same commit (1e51285), this test passed in one build (Build Status, Sponge) and failed in another build (Build Status, Sponge).

@tseaver tseaver added type: process A process-related concern. May include testing, release, or the like. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Jul 9, 2021
@tseaver tseaver self-assigned this Jul 9, 2021
tseaver added a commit that referenced this issue Jul 9, 2021
In addition to showing the better practice (using the operation returned
from 'Instance.create' / 'Cluster.create'), this change also hardens
the sample against eventual-consistency issues.

Closes #353.
tseaver added a commit that referenced this issue Jul 9, 2021
In addition to showing the better practice (using the operation returned
from 'Instance.create' / 'Cluster.create'), this change also hardens
the sample against eventual-consistency issues.

The timeouts used match those used in the system tests.

Closes #353.
tseaver added a commit that referenced this issue Jul 9, 2021
In addition to showing the better practice (using the operation returned
from 'Instance.create' / 'Cluster.create'), this change also hardens
the sample against eventual-consistency issues.

Closes #353.
@flaky-bot flaky-bot bot reopened this Jul 12, 2021
@flaky-bot flaky-bot bot added the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Jul 12, 2021
@flaky-bot
Author

flaky-bot bot commented Jul 12, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: aa5b606
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/python-docs-samples-tests/instances/instanceadmin-825-1626080660"
cluster_id: "instanceadmin-148"
c...ocation: "projects/python-docs-samples-tests/locations/us-central1-a"
  serve_nodes: 1
  default_storage_type: SSD
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/python-docs-samples-tests/instances/instanceadmin-825-1626080660'), ('x-goog-api-client', 'gl-python/3.6.13 grpc/1.38.1 gax/1.31.0')]}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
>       return callable_(*args, **kwargs)

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7fd07d3026d8>
request = parent: "projects/python-docs-samples-tests/instances/instanceadmin-825-1626080660"
cluster_id: "instanceadmin-148"
cl... location: "projects/python-docs-samples-tests/locations/us-central1-a"
serve_nodes: 1
default_storage_type: SSD
}

timeout = None
metadata = [('x-goog-request-params', 'parent=projects/python-docs-samples-tests/instances/instanceadmin-825-1626080660'), ('x-goog-api-client', 'gl-python/3.6.13 grpc/1.38.1 gax/1.31.0')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
>   return _end_unary_response_blocking(state, call, False, None)

.nox/py-3-6/lib/python3.6/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7fd07d2fc550>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7fd07c19a348>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
>       raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.UNAVAILABLE
E details = "The instance is currently being changed, please try again."
E debug_error_string = "{"created":"@1626080671.204595768","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"The instance is currently being changed, please try again.","grpc_status":14}"
E >

.nox/py-3-6/lib/python3.6/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

capsys = <_pytest.capture.CaptureFixture object at 0x7fd07d2cf6a0>
dispose_of = <function dispose_of.<locals>.disposal at 0x7fd07e3652f0>

def test_add_and_delete_cluster(capsys, dispose_of):
    dispose_of(INSTANCE)

    # This won't work, because the instance isn't created yet
    instanceadmin.add_cluster(PROJECT, INSTANCE, CLUSTER2)
    out = capsys.readouterr().out
    assert f"Instance {INSTANCE} does not exist" in out

    # Get the instance created
    instanceadmin.run_instance_operations(PROJECT, INSTANCE, CLUSTER1)
    capsys.readouterr()  # throw away output

    # Add a cluster to that instance
>   instanceadmin.add_cluster(PROJECT, INSTANCE, CLUSTER2)

test_instanceadmin.py:131:


instanceadmin.py:158: in add_cluster
cluster.create()
../../google/cloud/bigtable/cluster.py:301: in create
"cluster": cluster_pb,
../../google/cloud/bigtable_admin_v2/services/bigtable_instance_admin/client.py:1001: in create_cluster
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py:145: in __call__
return wrapped_func(*args, **kwargs)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "The instance is currently...l.cc","file_line":1066,"grpc_message":"The instance is currently being changed, please try again.","grpc_status":14}"

???
E google.api_core.exceptions.ServiceUnavailable: 503 The instance is currently being changed, please try again.

<string>:3: ServiceUnavailable

@tseaver
Contributor

tseaver commented Jul 12, 2021

This failure is not (as I had earlier assumed) fixable by polling the LRO returned from Cluster.create with a long-enough timeout. Rather, it is being raised directly from the CreateCluster API, before the LRO is constructed. That API has no retry semantics configured in the proto, which means we can't expect it to be retried automagically.

instanceadmin.py:158: in add_cluster
    cluster.create()
../../google/cloud/bigtable/cluster.py:301: in create
    "cluster": cluster_pb,
../../google/cloud/bigtable_admin_v2/services/bigtable_instance_admin/client.py:1001: in create_cluster
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py:145: in __call__
    return wrapped_func(*args, **kwargs)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

value = None
from_value = <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "The instance is currently...l.cc","file_line":1066,"grpc_message":"The instance is currently being changed, please try again.","grpc_status":14}"

None of the API-invoking Cluster and Instance CRUD methods expose retry or timeout parameters, which means that we can't even add a custom retry in the sample to fix this flake. @kolea2 should we update them to allow such usage?
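
To make the distinction above concrete, here is a minimal sketch of why a longer timeout on the operation cannot help. It uses only the standard library; `FakeCluster`, `FakeOperation`, `create_and_wait`, and the local `ServiceUnavailable` class are hypothetical stand-ins invented for illustration, not the real `google-cloud-bigtable` or `google.api_core` objects.

```python
class ServiceUnavailable(Exception):
    """Stand-in for google.api_core.exceptions.ServiceUnavailable (assumption)."""


class FakeOperation:
    """Hypothetical stand-in for the long-running operation returned by create()."""

    def result(self, timeout=None):
        return "cluster ready"


class FakeCluster:
    """Hypothetical cluster whose create() is rejected while the instance is busy."""

    def __init__(self, busy_calls):
        self.busy_calls = busy_calls  # number of create() calls rejected up front

    def create(self):
        if self.busy_calls > 0:
            self.busy_calls -= 1
            # Raised by the RPC itself: no operation object exists yet.
            raise ServiceUnavailable(
                "503 The instance is currently being changed, please try again."
            )
        return FakeOperation()


def create_and_wait(cluster, timeout=30):
    """Return (stage, outcome), where stage names the call that finished or failed."""
    try:
        operation = cluster.create()
    except ServiceUnavailable as exc:
        # The timeout passed to result() is never consulted on this path.
        return ("create", str(exc))
    return ("result", operation.result(timeout=timeout))


print(create_and_wait(FakeCluster(busy_calls=1), timeout=3600))
print(create_and_wait(FakeCluster(busy_calls=0)))
```

In the first call the failure is reported at the `create` stage regardless of how large the timeout is, which mirrors the traceback above: the 503 surfaces from `create_cluster`, so any fix has to retry that call rather than the wait.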

@tseaver tseaver removed the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Jul 12, 2021
@tseaver
Contributor

tseaver commented Jul 12, 2021

@kolea2 Seeing that samples for other API clients (automl, datalabeling, translate) are using the backoff library, I guess we could update the samples which flake here (particularly adding a cluster to a newly-created instance) to use it, e.g.:

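import backoff
from google.api_core import exceptions

...

    # Retry the CreateCluster call with exponential backoff while the
    # newly-created instance still reports "currently being changed".
    @backoff.on_exception(backoff.expo, exceptions.ServiceUnavailable)
    def do_create_cluster(cluster):
        return cluster.create()

    operation = do_create_cluster(cluster)
    # Then wait for the long-running operation itself to complete.
    operation.result(timeout=30)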
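
For readers without the backoff library at hand, the retry pattern suggested above can be approximated with the standard library alone. This is only a sketch under stated assumptions: `retry_on`, `FlakyCluster`, and the local `ServiceUnavailable` class are hypothetical names made up for the example, and the delays double deterministically, whereas the backoff library (as far as I know) also applies jitter by default.

```python
import functools
import time


class ServiceUnavailable(Exception):
    """Stand-in for google.api_core.exceptions.ServiceUnavailable (assumption)."""


def retry_on(exc_type, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call on exc_type, doubling the delay after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == max_tries:
                        raise  # out of attempts: surface the last error
                    sleep(delay)
                    delay *= 2

        return wrapper

    return decorator


class FlakyCluster:
    """Hypothetical cluster that rejects the first `failures` create() calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def create(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ServiceUnavailable("The instance is currently being changed")
        return "operation"


delays = []  # record the requested sleeps instead of actually waiting


@retry_on(ServiceUnavailable, max_tries=4, base_delay=1.0, sleep=delays.append)
def do_create_cluster(cluster):
    return cluster.create()


cluster = FlakyCluster(failures=2)
result = do_create_cluster(cluster)
print(result, cluster.calls, delays)
```

Capping the number of attempts is a deliberate choice in this sketch: an unbounded retry would hang the test run if the instance never settles, whereas a bounded one fails with the original 503 after the last attempt.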
