
✨ cache: add a synthetic delay to the cache server #2742

Conversation

stevekuznetsov
Contributor

Signed-off-by: Steve Kuznetsov skuznets@redhat.com

/cc @ncdc
/assign @p0lyn0mial
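
For context on what the synthetic delay means here: one straightforward way to add it is to wrap the cache server's HTTP handler in middleware that sleeps before serving each request. The sketch below is a minimal, hypothetical illustration in Go; the flag name, handler wiring, and jitter behaviour are assumptions for illustration, not the actual diff in this PR.

```go
package main

import (
	"flag"
	"math/rand"
	"net/http"
	"time"
)

// syntheticDelay is a hypothetical flag controlling how long the cache
// server waits before answering each request; it is not necessarily the
// flag name used by this PR.
var syntheticDelay = flag.Duration("synthetic-delay", 0,
	"artificial latency added to every cache server response")

// withSyntheticDelay wraps an http.Handler and sleeps for the configured
// duration before delegating, simulating a slow cache server.
func withSyntheticDelay(delay time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if delay > 0 {
			// Add up to ~10% jitter so requests do not complete in lockstep.
			jitter := time.Duration(rand.Int63n(int64(delay)/10 + 1))
			time.Sleep(delay + jitter)
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	flag.Parse()
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", withSyntheticDelay(*syntheticDelay, mux))
}
```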

@openshift-ci openshift-ci bot requested a review from ncdc February 2, 2023 18:42
@stevekuznetsov
Contributor Author

In the sharded env, this seems to cause the VWs to never come up. They fail to get data from a live read ... I need to improve the logging situation here, this is hard to parse.

@p0lyn0mial
Contributor

lgtm, but it looks like you will be changing this PR; I will have another look when it is ready.

@p0lyn0mial
Contributor

e2e-multiple-runs failed on TestReplicateShard.

Normally this test creates a workspace, adds a new shard resource, and verifies that the shard was replicated.

In this run the test failed because a workspace wasn't scheduled. The scheduling controller was trying to put the workspace on a fake shard!

In general, TestReplicationDisruptive runs each scenario on a separate private server, and each scenario gets its own set of directories. However, in the faulty run I found a few requests issued by UserAgent=TestReplicateShardNegative, which might indicate that the private instance was shared among the tests. That is very suspicious, because ports for private servers appear to be assigned randomly.

I0202 18:55:46.498784   55516 resource_controller.go:216] "queueing resource" reconciler="kcp-workload-resource-scheduler" key="workspaces.v1alpha1.tenancy.kcp.io::root|e2e-workspace-ntgqm"

E0202 18:55:46.557341   55516 workspace_controller.go:237] "kcp-workspace" controller failed to sync "root|e2e-workspace-ntgqm", err: Post "https://base.kcp.test.dev/clusters/1yo677m4ish61htq/apis/core.kcp.io/v1alpha1/logicalclusters": dial tcp: lookup base.kcp.test.dev on 172.30.0.10:53: no such host

And it looks like the fake shard was added by a different test (TestReplicateShardNegative ?!):

I0202 18:55:46.402018   55516 httplog.go:131] "HTTP" verb="LIST" URI="/clusters/root/apis/core.kcp.io/v1alpha1/shards" latency="1.545268ms" userAgent="cache.test/v0.0.0 (linux/amd64) kubernetes/$Format/TestReplicationDisruptive/TestReplicateShardNegative" audit-ID="9cdc7e0b-5f4e-438c-bb49-bc9c421d274d" srcIP="10.130.28.235:44072" resp=200
I0202 18:55:46.405126   55516 httplog.go:131] "HTTP" verb="POST" URI="/clusters/root/apis/core.kcp.io/v1alpha1/shards" latency="1.62927ms" userAgent="cache.test/v0.0.0 (linux/amd64) kubernetes/$Format/TestReplicationDisruptive" audit-ID="2780431d-ecc0-4ae7-a256-02d76efb0ddc" srcIP="10.130.28.235:44072" resp=201
I0202 18:55:46.405148   55516 shard_controller.go:90] "queueing Shard" reconciler="kcp-shard" key="root|test-shard-7397489778125985083"

Also, it looks like the CI is missing some logs. Since TestReplicationDisruptive runs two scenarios, each with its own server, I would expect to find two separate log directories corresponding to the subtests, but I found only one.

@ncdc
Member

ncdc commented Feb 13, 2023

@p0lyn0mial I only recently fixed it so each disruptive test runs with its own private server. The test failure might have been from before that fix.
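
For readers unfamiliar with the pattern: per-test isolation of this kind usually means each subtest starts (or is handed) its own server instance rather than sharing one, so state created by one scenario cannot leak into another. A generic Go sketch of the idea, using net/http/httptest rather than kcp's actual test fixtures (which start full private kcp servers):

```go
package e2e_test

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
)

// newPrivateServer stands in for spinning up a dedicated server per test;
// the real kcp fixture starts a private kcp process, not an httptest server.
func newPrivateServer(t *testing.T) *httptest.Server {
	t.Helper()
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "served %s for %s", r.URL.Path, r.UserAgent())
	}))
	t.Cleanup(srv.Close)
	return srv
}

func TestReplicationDisruptive(t *testing.T) {
	scenarios := []string{"TestReplicateShard", "TestReplicateShardNegative"}
	for _, name := range scenarios {
		name := name
		t.Run(name, func(t *testing.T) {
			// Each scenario gets its own private server, so a fake shard
			// created by one scenario cannot confuse the other.
			srv := newPrivateServer(t)
			resp, err := srv.Client().Get(srv.URL + "/clusters/root/apis/core.kcp.io/v1alpha1/shards")
			if err != nil {
				t.Fatal(err)
			}
			resp.Body.Close()
		})
	}
}
```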

@p0lyn0mial
Contributor

@p0lyn0mial I only recently fixed it so each disruptive test runs with its own private server. The test failure might have been from before that fix.

thanks for the info, let me re-run the test then.

/test e2e-multiple-runs

@p0lyn0mial
Contributor

e2e-multiple-runs is green! It looks like we are now storing the artifacts of each private server in a separate dir 👍 - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/kcp-dev_kcp/27[…]tiple-runs/artifacts/TestReplicationDisruptive/

@stevekuznetsov
Contributor Author

/test all

@stevekuznetsov
Contributor Author

Huh, something else must have landed in the interim; when I left, this was 100% broken :)

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@stevekuznetsov stevekuznetsov force-pushed the skuznets/add-sythethic-delay branch from 8ce17df to 0d6a2ae Compare February 20, 2023 16:12
@stevekuznetsov stevekuznetsov added approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. labels Feb 20, 2023
@openshift-ci
Contributor

openshift-ci bot commented Feb 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 4f71d22 into kcp-dev:main Feb 20, 2023