Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APIExport consumer can't wildcard list/watch its schemas until the first APIBinding is created #1183

Closed
ncdc opened this issue Jun 1, 2022 · 25 comments · Fixed by #2135
Closed
Assignees
Labels
area/apiexports kind/bug Categorizes issue or PR as related to a bug.

Comments

@ncdc
Copy link
Member

ncdc commented Jun 1, 2022

Describe the bug
A controller using APIExports can't successfully start up and wildcard list/watch the APIs it's exporting until the first APIBinding is created.

To Reproduce
Steps to reproduce the behavior:

  1. Create an APIResourceSchema
  2. Create an APIExport that exports the APIResourceSchema
  3. Try to do a wildcard list/watch through the APIExport virtual workspace, such as
https://$server/services/apiexport/root:default:andy/controller-runtime-example-data.my.domain/clusters/*/apis/data.my.domain/v1alpha1/widgets
  1. Get a 404

In a controller-runtime app, you'd see something like this:

1.6541008964721239e+09  ERROR   controller-runtime.source       if kind is a CRD, it should be installed before calling Start   {"kind": "Widget.data.my.domain", "error": "no matches for kind \"Widget\" in version \"data.my.domain/v1alpha1\""}

Expected behavior
The controller should be able to start listing/watching its APIs even without any APIBindings for it

Additional context
None

@ncdc ncdc added kind/bug Categorizes issue or PR as related to a bug. area/apiexports labels Jun 1, 2022
@ncdc ncdc self-assigned this Jun 1, 2022
@sttts sttts added this to kcp Jun 1, 2022
@sttts sttts moved this to New in kcp Jun 1, 2022
@sttts
Copy link
Member

sttts commented Jun 1, 2022

workaround: create an APIBinding in the same workspace to itself (we do the same in compute workspaces).

@sttts
Copy link
Member

sttts commented Jun 1, 2022

cc @davidfestal

@ncdc
Copy link
Member Author

ncdc commented Jun 1, 2022

I'm working on a fix

@ncdc
Copy link
Member Author

ncdc commented Jun 1, 2022

Would it make sense to create a new controller for apiresourceschema to create bound CRDs?

@ncdc
Copy link
Member Author

ncdc commented Jun 2, 2022

@sttts ^?

@ncdc
Copy link
Member Author

ncdc commented Jun 2, 2022

Actually I want to go back and look at the vw and see if I can fix it there

@sttts
Copy link
Member

sttts commented Jun 2, 2022

I don't think we want a copy of every schema in every cluster.

@sttts
Copy link
Member

sttts commented Jun 2, 2022

Actually I want to go back and look at the vw and see if I can fix it there

Also my intuition.

@sttts
Copy link
Member

sttts commented Jun 2, 2022

there is a problem though with the vw solution: what about watches? They have to start and then eventually show objects when the first APIBinding is created on that shard. That's tricky. We could have a fake storage for identity based requests always returning empty, or a watch without events. When an APIBinding is added, we would have to terminate the old fake storage (including the fake watch) and swich over to the bound CRD based storage. Does that make sense?

@ncdc
Copy link
Member Author

ncdc commented Jun 2, 2022

Yeah, I'll look into it

@ncdc ncdc added this to the v0.6.0 milestone Jun 2, 2022
@ncdc
Copy link
Member Author

ncdc commented Jun 2, 2022

Our custom apiextensions logic to get CRDs looks for a CRD matching <identity, group, resource>, where the index data is populated from APIBindings. This is why you get a 404 when doing a wildcard list/watch (when no APIBindings exist that point to the APIExport).

@ncdc
Copy link
Member Author

ncdc commented Aug 30, 2022

Chatted with @sttts and here's what we want to do:

  1. Because APIExports will have to be replicated from their source shard to all other shards that have APIBindings for them, we don't want to automatically create a bound CRD as soon as an APIExport is created
  2. Instead, we only want to preserve the current behavior of only creating the bound CRD when the first APIBinding is created, AND
  3. We only populate APIExport.status.virtualWorkspaceURLs if there is at least 1 APIBinding for an APIExport. This means moving the URL management into the APIBinding controller
  4. Controllers using the APIExport virtual workspace must wait until there is at least 1 virtualWorkspaceURL entry before proceeding to list/watch exported resource (this means there is at least 1 APIBinding)
    1. They should probably (must?) terminate when the list is empty (restarting would being the waiting process again)

@sttts
Copy link
Member

sttts commented Aug 30, 2022

cc @p0lyn0mial

@stevekuznetsov
Copy link
Contributor

DX of a crash-looping controller until an APIBinding exists is ... not ideal? :)

@stevekuznetsov
Copy link
Contributor

Or even ops-X - that will likely trigger alerts on any/all clusters

@sttts
Copy link
Member

sttts commented Aug 30, 2022

They shouldn't be crash-looping, but just notice the URL list is empty and keep watching it until some shard (in their topology partition) shows up.

@stevekuznetsov
Copy link
Contributor

I was responding to:

They should probably (must?) terminate when the list is empty (restarting would being the waiting process again)

We likely want some user-facing way to expose this waiting state, ultimately.

@ncdc
Copy link
Member Author

ncdc commented Aug 30, 2022

We can adjust that bit to not do the termination

@sttts
Copy link
Member

sttts commented Aug 30, 2022

Just sit there and wait until a shard appears, like a controller not seeing any object in a watch.

@p0lyn0mial
Copy link
Contributor

I think the same issue applies to the kcp server, not only to the vw server.

Because APIExports will have to be replicated from their source shard to all other shards that have APIBindings for them, we don't want to automatically create a bound CRD as soon as an APIExport is created

is this due to some performance issue? having a single CRD object per shard doesn't sound too heavy.

btw: do we want to change the reply from the server to something like 404: you need to create an APIBinding first to use this api (if the export exists, otherwise we would return plain 404) ?

@ncdc
Copy link
Member Author

ncdc commented Aug 30, 2022

I think the same issue applies to the kcp server, not only to the vw server.

Yes, this is for APIExports in general, so it's not specific to the vw server.

is this due to some performance issue? having a single CRD object per shard doesn't sound too heavy.

It's 1 CRD per APIResourceSchema. An APIExport can export 1..n APIResourceSchemas.

btw: do we want to change the reply from the server to something like 404: you need to create an APIBinding first to use this api (if the export exists, otherwise we would return plain 404) ?

Maybe?

@sttts
Copy link
Member

sttts commented Aug 30, 2022

is this due to some performance issue? having a single CRD object per shard doesn't sound too heavy.

There can be many. And there is just no need to tell controllers to list/watch shards without bindings.

@sttts
Copy link
Member

sttts commented Aug 30, 2022

also creating APIExports could create a storm of storage creations and list/watches. Doing that on demand when a binding shows up is much much better.

@sttts
Copy link
Member

sttts commented Aug 30, 2022

And we can safely assume that only very very few APIExports will actually be bound to more than one workspace.

@ncdc
Copy link
Member Author

ncdc commented Sep 15, 2022

@sttts & I chatted a couple of weeks ago about what to do about this. We decided that the APIExport status should only get virtual workspace URL(s) set if and only if there is at least 1 APIBinding referencing the APIExport.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiexports kind/bug Categorizes issue or PR as related to a bug.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants