Scalability issue with missing feature to request "all offers for a known asset ID" #10
This can be handled by a filter, so I don't think it is necessary to define any additional mechanism. If the client already knows the dataset id, why does it need to make a further catalog request? In DCAT, offers are referenced from the Dataset (and are generally contained, although with RDF this can be expanded), so a client will always know which offers are available for a data set once it has access to the latter. All subsequent operations can be performed with just the asset id and offer id. Also, the client presumably would know about a data set from a previous request and can cache it if needed.
Hi @jimmarino
And without this implementation-specific filter, I could never get the offer id for a specific asset WITHOUT fetching the catalog (and parsing it on the consumer side...), right? And as you also say, the offer id is required for further operations (negotiation).
That's up to the specific implementation to support. Given the wide variety of query languages, filter expressions, etc., the testing burden this would entail, and higher priorities, we decided to refrain from standardizing the contents of the filter expression. Implementations can provide something similar to what you are describing and be fully spec compliant. Note also that at some point, the client will need to obtain the catalog from the provider, either explicitly via a request or through some form of "implicit" context. Do you have a concrete use case that details the scenario you are alluding to?
The catalog request provides a filter property:
{
  "@type": "ids:CatalogRequestMessage",
  "ids:filter": {}
}
The spec does not pre-define what filters are allowed. If you have a catalog implemented that allows searching for assets by ID, the request could look like this:
Or you may pre-define that an SQL statement is allowed. Then the response would not have to be a catalog with n objects but would contain only one asset:
{
  "@type": "dcat:Catalog",
  [...]
  "dcat:dataset": [
    {
      "@id": "ASSET_ID",
      [...]
It is just not part of the basic spec, but is allowed by the
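To make the idea concrete, here is a minimal Python sketch of such a request. The `assetId` key inside `ids:filter` is purely an assumption for illustration; the spec deliberately leaves the filter contents open, so a real provider may expect a different shape. The function name is hypothetical as well.

```python
import json


def build_catalog_request(asset_id=None):
    """Return an ids:CatalogRequestMessage as a dict.

    With no asset_id the filter stays empty, as in the message shown above.
    """
    message = {"@type": "ids:CatalogRequestMessage", "ids:filter": {}}
    if asset_id is not None:
        # Hypothetical filter shape: "assetId" is NOT defined by the spec.
        message["ids:filter"] = {"assetId": asset_id}
    return message


if __name__ == "__main__":
    print(json.dumps(build_catalog_request("ASSET_ID"), indent=2))
```

Whether a provider honours such a filter is exactly the implementation-specific part under discussion, so a client relying on it would need an agreement outside the basic spec.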
I think it could also be an option to NOT make this a 'filter' - because of the variety of filter expressions you mentioned. I would agree here. But then we need to find another solution to get all offers for a specific asset id. The use case is how we currently do it in bigger parts of Catena-X. We have digital twins, registered in the twin registry (all according to the Platform I40 'Asset Administration Shell' (AAS) specification...). Now, those twins get an EDC in front of them. That means a big part of what the 'catalog' would do in a pure EDC world is done in the AAS Registry, namely finding the asset id. That means we always know the asset id, but of course not the offer id, since this is dynamically created. Fetching the whole catalog - as we do it right now - is an overhead that doesn't make sense. Doing this in pages and even caching doesn't solve the root cause of the problem. Now, we have ways to filter for the asset id, BUT this is EDC implementation specific - and this is what I think is not good. We would heavily depend on IMPLEMENTATION decisions instead of PROTOCOL decisions for a major part of our solution.
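For reference, the consumer-side workaround described here (fetch the whole catalog, then search it for the known asset id) can be sketched roughly as follows. It assumes that offers hang off each dataset under an `odrl:hasPolicy` key, which is an assumption about the serialization and not something this thread confirms; the function names are hypothetical too.

```python
def _as_list(value):
    """Normalize a JSON-LD value that may be missing, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def offer_ids_for_asset(catalog, asset_id, offer_key="odrl:hasPolicy"):
    """Scan a full dcat:Catalog dict and return the offer ids of one dataset.

    offer_key is an assumed property name; adjust it to the actual payload.
    Returns an empty list if the asset is not in the catalog.
    """
    for dataset in _as_list(catalog.get("dcat:dataset")):
        if dataset.get("@id") == asset_id:
            return [offer.get("@id") for offer in _as_list(dataset.get(offer_key))]
    return []
```

The scan is linear in the number of datasets, and the whole catalog has to be transferred first, which is the overhead the comment objects to.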
IMO it's not implementation specific, because C-X could define what a catalog request filtering for an assetId has to look like - not on implementation level, but on protocol level. And it could be defined e.g. that 4 filters are pre-defined, 1 is mandatory to support (filtering by assetId), and the connectors (resp. catalog services) are allowed to implement x use-case-/system-specific filters. So any connector in your project could be implemented and used, following the "C-X-flavored" message schemes. This is the idea of IDS only defining the core and allowing for adaptations. Still, I can follow that a filter by ID may be the most basic filter that should be pre-defined and set as mandatory by default...
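As a rough illustration of that "mandatory core plus extensions" idea, a data-space profile check could look like the Python sketch below. Only the `assetId` filter is named in this discussion; the constant and function names are hypothetical, and the other pre-defined filters are left out because the thread does not name them.

```python
# Hypothetical data-space profile: only "assetId" is named in the discussion.
MANDATORY_FILTERS = frozenset({"assetId"})


def missing_mandatory_filters(supported_filters):
    """Return the mandatory filters a connector does not support, sorted."""
    return sorted(MANDATORY_FILTERS - set(supported_filters))


def is_profile_compliant(supported_filters):
    """A connector complies if it supports every mandatory filter.

    Any additional filters count as allowed extensions.
    """
    return not missing_mandatory_filters(supported_filters)
```

A connector advertising `["assetId", "customFilter"]` would pass this check, while one advertising only `["customFilter"]` would not.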
You will have the same need for a C-X flavored vocabulary, since an IDS dataset allows any attribute, but maybe in C-X your assets are domain-specific and systems on both sides need to be able to process the information they receive. Same goes for the policies. That is a question of how to design the interfaces/levels of adaptation. And we agreed at the very beginning that the first step of the spec specifies only those things that are absolutely essential for proper communication.
The issue with defining what is inside the filter attribute is that it is a lot of work and difficult to get right, particularly given all of the other issues that need to be solved. We can't just define a "simple" mechanism. Consider the following questions that would arise:
By leaving this implementation specific, we allow future work to define filter expressions. I think it's also important to take a very conservative view of what to standardize to avoid premature standardization without concrete implementation experience. This approach also provides for implementation-specific innovation, which is important for standards to get uptake.
This seems to be a data-space-specific extension point. The document should highlight that this extension point (and maybe others) exists. For the time being, we might want to add examples to the document that guide the reader.
This discussion might take a while; @ssteinbuss will add an infobox for the time being.
Catalog Protocol will be extended by a new message type:
Catalog Protocol Https Binding will map this message type to
with a response object of type dcat:Dataset
@sebbader-sap requests an additional UML picture and will provide it ;-)
Closing this issue, since the changes have been merged into the 0.8 release already.
After receiving access to the repo last week, I'm now trying to dig deeper into the protocol spec. One question I have is regarding scalability of requests when the AssetId is already known. In the 'old' specification, I understood that it was part of the IDS protocol to request 'all offers for 1 asset' by adding the 'requestedElement' to the request to the catalog service.
Ref:
https://github.com/International-Data-Spaces-Association/IDS-G/tree/main/Communication/sequence-diagrams/data-connector-to-data-connector
With the 'new' spec, it looks to me like this part is 'outsourced' to the implementation part with the very general 'ids:filter' expression
https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/catalog/message/catalog.request.message.json
I would see this as a very crucial part for scaling. Catalogs can become VERY big. Transferring the whole content even if the requester already knows the asset id is a problem. And outsourcing this scalability critical part should not be desired from my perspective.
My proposal is NOT to define all filters in IDS, but specify a way to 'filter' for one specific assetId only.
Any thoughts on this? Unfortunately I cannot join the Thursday meeting this week because of a Catena-X workshop. Maybe you can comment here if you have thoughts on this.
Thanks in advance,
Matthias Binzer