Bigtable: 'test_create_instance_w_two_clusters' flakes with '504 Deadline Exceeded' #5928

Closed · tseaver opened this issue Sep 11, 2018 · 9 comments · Fixed by #6579 or #8450
Labels: api: bigtable, flaky, testing, type: process

tseaver commented Sep 11, 2018

/cc @sduskis, @vikas-jamdar

From a failing system-test run:

___________ TestInstanceAdminAPI.test_create_instance_w_two_clusters ___________

self = <tests.system.TestInstanceAdminAPI testMethod=test_create_instance_w_two_clusters>

    def test_create_instance_w_two_clusters(self):
        from google.cloud.bigtable import enums
        from google.cloud.bigtable.table import ClusterState
        _PRODUCTION = enums.Instance.Type.PRODUCTION
        ALT_INSTANCE_ID = 'dif' + unique_resource_id('-')
        instance = Config.CLIENT.instance(ALT_INSTANCE_ID,
                                          instance_type=_PRODUCTION,
                                          labels=LABELS)
    
        ALT_CLUSTER_ID_1 = ALT_INSTANCE_ID + '-c1'
        ALT_CLUSTER_ID_2 = ALT_INSTANCE_ID + '-c2'
        LOCATION_ID_2 = 'us-central1-f'
        STORAGE_TYPE = enums.StorageType.HDD
        cluster_1 = instance.cluster(
            ALT_CLUSTER_ID_1, location_id=LOCATION_ID, serve_nodes=SERVE_NODES,
            default_storage_type=STORAGE_TYPE)
        cluster_2 = instance.cluster(
            ALT_CLUSTER_ID_2, location_id=LOCATION_ID_2,
            serve_nodes=SERVE_NODES, default_storage_type=STORAGE_TYPE)
        operation = instance.create(clusters=[cluster_1, cluster_2])
        # We want to make sure the operation completes.
        operation.result(timeout=10)
    
        # Make sure this instance gets deleted after the test case.
        self.instances_to_delete.append(instance)
    
        # Create a new instance and make sure it is the same.
        instance_alt = Config.CLIENT.instance(ALT_INSTANCE_ID)
        instance_alt.reload()
    
        self.assertEqual(instance, instance_alt)
        self.assertEqual(instance.display_name, instance_alt.display_name)
        self.assertEqual(instance.type_, instance_alt.type_)
    
        clusters, failed_locations = instance_alt.list_clusters()
        self.assertEqual(failed_locations, [])
    
        clusters.sort(key=lambda x: x.name)
        alt_cluster_1, alt_cluster_2 = clusters
    
        self.assertEqual(cluster_1.location_id, alt_cluster_1.location_id)
        self.assertEqual(alt_cluster_1.state, enums.Cluster.State.READY)
        self.assertEqual(cluster_1.serve_nodes, alt_cluster_1.serve_nodes)
        self.assertEqual(cluster_1.default_storage_type,
                         alt_cluster_1.default_storage_type)
        self.assertEqual(cluster_2.location_id, alt_cluster_2.location_id)
        self.assertEqual(alt_cluster_2.state, enums.Cluster.State.READY)
        self.assertEqual(cluster_2.serve_nodes, alt_cluster_2.serve_nodes)
        self.assertEqual(cluster_2.default_storage_type,
                         alt_cluster_2.default_storage_type)
    
        # Test list clusters in project via 'client.list_clusters'
        clusters, failed_locations = Config.CLIENT.list_clusters()
        self.assertFalse(failed_locations)
        found = set([cluster.name for cluster in clusters])
        self.assertTrue({alt_cluster_1.name,
                         alt_cluster_2.name,
                         Config.CLUSTER.name}.issubset(found))
    
        temp_table_id = 'test-get-cluster-states'
        temp_table = instance.table(temp_table_id)
>       temp_table.create()

tests/system.py:280: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
google/cloud/bigtable/table.py:218: in create
    table=table, initial_splits=splits)
google/cloud/bigtable_admin_v2/gapic/bigtable_table_admin_client.py:327: in create_table
    request, retry=retry, timeout=timeout, metadata=metadata)
../api_core/google/api_core/gapic_v1/method.py:139: in __call__
    return wrapped_func(*args, **kwargs)
../api_core/google/api_core/retry.py:260: in retry_wrapped_func
    on_error=on_error,
../api_core/google/api_core/retry.py:177: in retry_target
    return target()
../api_core/google/api_core/timeout.py:206: in func_with_timeout
    return func(*args, **kwargs)
../api_core/google/api_core/grpc_helpers.py:61: in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = DeadlineExceeded('Deadline Exceeded',)
from_value = <_Rendezvous of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEED...all.cc","file_line":1099,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>

    def raise_from(value, from_value):
>       raise value
E       DeadlineExceeded: 504 Deadline Exceeded

../.nox/sys-2-7/lib/python2.7/site-packages/six.py:737: DeadlineExceeded
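
For context, the "504" here is api_core's HTTP-style remapping of the gRPC status code; a minimal illustration (assuming google-api-core and grpcio are installed):

    import grpc
    from google.api_core import exceptions

    # api_core maps grpc.StatusCode.DEADLINE_EXCEEDED onto an
    # HTTP-flavored exception whose code is 504, which is exactly
    # what surfaces at the bottom of the traceback above.
    exc = exceptions.DeadlineExceeded('Deadline Exceeded')
    assert exc.code == 504
    assert exc.grpc_status_code == grpc.StatusCode.DEADLINE_EXCEEDED
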
tseaver added the testing, api: bigtable, type: process, and flaky labels on Sep 11, 2018
sduskis commented Sep 13, 2018

We probably need to increase the timeout from the current 130 seconds to something more like 15 minutes. CreateTable can take longer than two minutes, depending on conditions.
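
In the meantime, a caller can override the per-call deadline by invoking the generated admin client directly. A minimal sketch (the project, instance, and table names are placeholders, and this bypasses the higher-level Table wrapper):

    from google.cloud import bigtable_admin_v2

    admin_client = bigtable_admin_v2.BigtableTableAdminClient()
    parent = admin_client.instance_path('my-project', 'my-instance')

    # Pass an explicit 15-minute deadline instead of the generated
    # default from bigtable_table_admin_client_config.py.
    admin_client.create_table(
        parent,
        'my-table',
        bigtable_admin_v2.types.Table(),
        timeout=900,
    )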

tseaver commented Sep 13, 2018

@sduskis The deadline in bigtable_table_admin_client_config.py is set via autosynth, so we need the upstream configuration fixed rather than changing the generated file here by hand.
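
For reference, the generated config is a plain dict of per-method deadlines. An illustrative excerpt of its shape (the values here are assumptions consistent with the ~130-second default discussed above, not the literal file contents):

    config = {
        "interfaces": {
            "google.bigtable.admin.v2.BigtableTableAdmin": {
                "retry_params": {
                    "default": {
                        "initial_rpc_timeout_millis": 60000,
                        "total_timeout_millis": 130000,
                    },
                },
                "methods": {
                    # Autosynth regenerates this file, so manual edits
                    # here would be clobbered on the next synth run.
                    "CreateTable": {
                        "timeout_millis": 130000,
                        "retry_codes_name": "non_idempotent",
                        "retry_params_name": "default",
                    },
                },
            },
        },
    }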

sduskis commented Sep 13, 2018

Understood. I'm in the process of making those upstream changes now.

tseaver commented Sep 17, 2018

sduskis commented Sep 17, 2018

FYI, my internal changes touched more than just CreateTable. I wanted to fix other RPCs as well. Hopefully, we'll get a fix out this week.

sduskis commented Sep 18, 2018

@tseaver, I got the changes out. See this googleapis commit for details.

A synth regen should hopefully fix this flake.

tseaver reopened this on Sep 28, 2018
tseaver commented Oct 17, 2018

Another failure today.

tseaver commented Jun 19, 2019

@sduskis I just saw temp_table.create() fail again with a 504. It looks to me like the googleapis commit you linked did not raise the timeout for CreateInstance.
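
A possible test-side mitigation while the generated deadlines get sorted out (a sketch only, not necessarily what the linked fixes do; `operation` and `temp_table` are the objects from the test above): give the long-running create real headroom and retry the admin call on 504s.

    from google.api_core import exceptions
    from google.api_core.retry import Retry, if_exception_type

    # Allow the two-cluster CreateInstance operation up to ten minutes
    # instead of the ten seconds the test uses today.
    operation.result(timeout=600)

    # Retry the CreateTable call on DEADLINE_EXCEEDED with exponential
    # backoff, giving up after 15 minutes overall.
    retry_504 = Retry(
        predicate=if_exception_type(exceptions.DeadlineExceeded),
        deadline=900,
    )
    retry_504(temp_table.create)()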
