Federation Extension #419

Merged
merged 8 commits into from
Feb 8, 2022

m-mohr
Member

@m-mohr m-mohr commented Oct 19, 2021

A first draft for the federation extension. It's all up for discussion; it's really just capturing ideas right now.

Rendered version: https://github.com/Open-EO/openeo-api/blob/federation-extension/extensions/federation/README.md

Related issue: https://github.com/openEOPlatform/architecture-docs/issues/188

@soxofaan
Member

Just FYI: the aggregator now includes this under the root endpoint (/) as an initial implementation (with versioned back-end URLs):

"federation": {
  "eodc": {
    "url": "https://openeo.eodc.eu/v1.0/"
  },
  "vito": {
    "url": "https://openeo.vito.be/openeo/1.0/"
  }
},
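
As a rough illustration of how a client might read this field, here is a minimal Python sketch. It assumes the `federation` object sits at the top level of the root response, as in the snippet above; the helper name `federated_backends` is made up for this example and is not part of any openEO client.

```python
import json

# Trimmed root response containing only the "federation" field quoted above.
CAPABILITIES = json.loads("""
{
  "federation": {
    "eodc": {"url": "https://openeo.eodc.eu/v1.0/"},
    "vito": {"url": "https://openeo.vito.be/openeo/1.0/"}
  }
}
""")


def federated_backends(capabilities: dict) -> dict:
    """Map back-end id -> versioned URL; empty dict if the field is absent."""
    federation = capabilities.get("federation", {})
    return {backend_id: info["url"] for backend_id, info in federation.items()}


backends = federated_backends(CAPABILITIES)
for backend_id, url in sorted(backends.items()):
    print(f"{backend_id}: {url}")
```

Returning an empty dict when the field is missing lets the same code run against a non-federated back-end.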

@soxofaan
Member

Another thing to consider: the aggregator currently assigns different priorities to the back-ends (VITO and EODC at the moment). The priority is used when there is a (metadata) "conflict" (e.g. the title of a "merged" collection) or when there is no obvious back-end for a query (e.g. https://twitter.com/matthmohr/status/1451589229721661444). At the moment, the VITO back-end has the highest priority.

It might be relevant to formalize this priority better here, e.g. with an explicit back-end order, or by defining a main/reference back-end.
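
To make the "explicit back-end order" idea concrete, here is a minimal Python sketch of priority-based merging. It is not the aggregator's actual code: the function name, the flat metadata dicts, and the sample titles are all invented for illustration. Only the VITO-before-EODC order comes from the comment above.

```python
# Explicit back-end order, highest priority first (VITO first, as described above).
BACKEND_PRIORITY = ["vito", "eodc"]


def merge_metadata(per_backend: dict, priority: list) -> dict:
    """Merge flat metadata dicts; on conflicting values the higher-priority back-end wins."""
    merged = {}
    # Apply the lowest priority first so that higher-priority values overwrite it.
    for backend_id in reversed(priority):
        merged.update(per_backend.get(backend_id, {}))
    return merged


# Made-up sample metadata for one "merged" collection.
collection = merge_metadata(
    {
        "eodc": {"title": "Sentinel-2 L2A (EODC)", "license": "proprietary"},
        "vito": {"title": "Sentinel-2 L2A (VITO)"},
    },
    BACKEND_PRIORITY,
)
print(collection)
```

Fields that only one back-end provides are kept as-is; the order only matters for fields where the values differ.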

@m-mohr
Member Author

m-mohr commented Oct 28, 2021

Long-term, I think there should be no order. Process differences could partly be solved by checking against the official specification. All other cases would probably work best if they are not solved automatically but are instead logged somehow, so that we can check what the issue is and solve it. To some degree, we may also need to specify processes and collections "manually" if an automatic merge doesn't work well (e.g. different descriptions with warnings/notes per back-end).
Jobs should be forwarded somewhat evenly if they can run on multiple back-ends.
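
The two ideas in this comment (log conflicts instead of resolving them, and spread jobs evenly) could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the function names, the logger name, and the use of simple round-robin as one possible reading of "somewhat evenly".

```python
import itertools
import logging

logger = logging.getLogger("federation")


def find_conflicts(per_backend: dict) -> dict:
    """Return {field: {backend_id: value}} for fields whose values differ between back-ends."""
    values = {}
    for backend_id, metadata in per_backend.items():
        for field, value in metadata.items():
            values.setdefault(field, {})[backend_id] = value
    # Compare via repr() so that unhashable values (lists, dicts) can be compared too.
    conflicts = {
        field: by_backend
        for field, by_backend in values.items()
        if len({repr(v) for v in by_backend.values()}) > 1
    }
    for field, by_backend in conflicts.items():
        # Log for later manual inspection instead of picking a winner.
        logger.warning("Metadata conflict on %r: %r", field, by_backend)
    return conflicts


def round_robin(backend_ids):
    """Endless iterator that cycles through the capable back-ends in sorted order."""
    return itertools.cycle(sorted(backend_ids))


conflicts = find_conflicts({"eodc": {"title": "A", "gsd": 10}, "vito": {"title": "B", "gsd": 10}})
picker = round_robin(["vito", "eodc"])
first_four = [next(picker) for _ in range(4)]
```

A real scheduler would probably also weigh load or cost; round-robin is only the simplest even distribution.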

@m-mohr m-mohr requested a review from soxofaan October 28, 2021 17:10
@m-mohr
Member Author

m-mohr commented Nov 25, 2021

From the meeting today: it seems we also need a flag that users can set for batch jobs, synchronous processing, and web services so that they can explicitly choose which back-end to send a process graph to (e.g. similar to the billing plan option). Or should this be done via load_collection properties? The latter seems a bit weird.

@soxofaan
Member

> Or should this be done via load_collection properties? The latter seems a bit weird.

Indeed, that feature is focused on specifying the back-end that provides a particular collection, which currently directly implies the back-end that executes the whole graph.
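
For readers unfamiliar with the feature, a property filter on load_collection is a small child process graph per property name. The Python sketch below builds such a node as a plain dict. The property name `backend_id` and the collection id `SENTINEL2_L2A` are placeholders chosen for this example; the thread does not say which property the aggregator actually filters on.

```python
import json


def load_collection_on_backend(collection_id: str, backend_id: str, property_name: str) -> dict:
    """Build a load_collection node whose property filter matches a single back-end."""
    return {
        "process_id": "load_collection",
        "arguments": {
            "id": collection_id,
            "spatial_extent": None,
            "temporal_extent": None,
            "properties": {
                # Placeholder property name; the filter is a child process graph
                # that receives the property value as the parameter "value".
                property_name: {
                    "process_graph": {
                        "match": {
                            "process_id": "eq",
                            "arguments": {"x": {"from_parameter": "value"}, "y": backend_id},
                            "result": True,
                        }
                    }
                }
            },
        },
    }


node = load_collection_on_backend("SENTINEL2_L2A", "vito", property_name="backend_id")
print(json.dumps(node, indent=2))
```

Because the filter lives on the collection, it selects a data provider rather than an execution target, which is the mismatch pointed out above.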

> It seems we also need a flag that the user can set for batch jobs/sync proc/web services so that they can choose explicitly which back-end to send a process graph

Another idea that crossed my mind is having back-end specific connection urls in addition to the general one, e.g.

so you can interact with an explicitly chosen backend, but still use the general platform features (auth, billing, ...).
Putting a back-end identifier in the base url avoids having to define some kind of selection field in multiple end-points. And you can easily provide collection/process discovery without all the tricky metadata merging.
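
A minimal Python sketch of how a federated service might route on such a base URL follows. The path layout `/backends/<id>/...` is invented here purely for illustration; the thread does not specify one, and `split_backend_prefix` is a hypothetical helper.

```python
# Back-end ids taken from the federation listing earlier in this thread.
KNOWN_BACKENDS = {"eodc", "vito"}


def split_backend_prefix(path: str):
    """Return (backend_id or None, remaining path) for a request path.

    Hypothetical layout: /backends/<id>/<endpoint> targets one back-end;
    any other path is handled by the federated (merged) API.
    """
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "backends" and parts[1] in KNOWN_BACKENDS:
        return parts[1], "/" + "/".join(parts[2:])
    return None, "/" + path.strip("/")


print(split_backend_prefix("/backends/vito/collections"))
print(split_backend_prefix("/collections"))
```

With this kind of routing, endpoints such as /collections need no extra selection field: the prefix alone decides whether the merged or the back-end-specific metadata is served.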

@m-mohr
Member Author

m-mohr commented Nov 25, 2021

Another note from the meeting today:
Do we need a way to expose, for collections and processes, the execution types (batch, sync, web services) for which they are (not) available?

I'm not sure whether we need it here or in the core API.

@soxofaan
Member

> Do we need a way to expose for collections and processes for which execution type (batch, sync, web services) they are (not) available?
> I'm not sure we need it here or in the core API?

sounds like something that is not (only) related to federation

@m-mohr
Member Author

m-mohr commented Nov 25, 2021

Added a new issue: #429

@m-mohr
Member Author

m-mohr commented Nov 25, 2021

> Another idea that crossed my mind is having back-end specific connection urls in addition to the general one [...]
> so you can interact with an explicitly chosen backend, but still use the general platform features (auth, billing, ...). Putting a back-end identifier in the base url avoids having to define some kind of selection field in multiple end-points. And you can easily provide collection/process discovery without all the tricky metadata merging.

Yeah, I see pros and cons for both approaches. I think we need to discuss this in detail and check what is the better way forward.

Some that come to mind right now:

Back-end URLs:

* Pros: Avoids merging metadata (could be mitigated by federation extension and some hand-drafted metadata)
* Cons: You need to reconnect each time

Flag in requests:

* Pros: You need to connect once (think web editor) and then can select each time easily in the UI (but this is less easy in programming libraries)
* Cons: Needs changes in clients
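
For comparison, the "flag in requests" option could look roughly like the Python sketch below for a job creation request. The `plan` field is the existing billing plan option mentioned earlier in the thread; the `backend` field is hypothetical and not part of the openEO API, and `job_request` is a made-up helper.

```python
def job_request(process_graph: dict, plan=None, backend=None) -> dict:
    """Build a request body for creating a batch job on a federated service."""
    body = {"process": {"process_graph": process_graph}}
    if plan is not None:
        body["plan"] = plan  # existing billing plan option
    if backend is not None:
        body["backend"] = backend  # HYPOTHETICAL selection flag, not in the openEO API
    return body


graph = {"load": {"process_id": "load_collection", "arguments": {"id": "X"}, "result": True}}
print(job_request(graph, backend="vito"))
```

Leaving the flag out would keep today's behaviour, where the federated service chooses the back-end itself. The client-side cost is that every request-building code path has to learn about the new field.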

@soxofaan
Member

> Cons: You need to reconnect each time

I don't think this will be a big issue in practice: the choice to work on a specific back-end instead of the federated one will be a very conscious decision, so it's a good thing to make that clear.
Moreover, a lot of things are affected by switching between the federated back-end and the underlying back-ends: collections, processes, file formats, UDF runtimes, ... (as discussed elsewhere), so I think it's a good thing that this is reflected "UI-wise" by having to create a separate connection.

@m-mohr
Member Author

m-mohr commented Nov 26, 2021

Indeed, my view might be biased by the Web Editor standpoint, although as a user I'd personally still prefer to have a simple switch. This could be something to leave out for now and then ask our users what they would prefer.

@m-mohr m-mohr marked this pull request as ready for review February 7, 2022 15:38
@m-mohr m-mohr force-pushed the federation-extension branch from 6f87404 to 818fb9f Compare February 8, 2022 15:26
@m-mohr m-mohr merged commit 0533f9e into draft Feb 8, 2022
@m-mohr m-mohr deleted the federation-extension branch February 8, 2022 17:09
@m-mohr
Member Author

m-mohr commented Feb 8, 2022

Merged for now, let's discuss further issues in separate PRs.

soxofaan added a commit to openEOPlatform/documentation that referenced this pull request Feb 16, 2023
m-mohr pushed a commit to openEOPlatform/documentation that referenced this pull request Feb 27, 2023
…filtering (#58)

* Add documentation on backend selection with load_collection property filtering

ref: openEOPlatform/architecture-docs#268, Open-EO/openeo-api#419

* PR #58 add R code example

* fixup! PR #58 add R code example
Development

Successfully merging this pull request may close these issues.

Generic way for warnings/deprecations on API response
2 participants