Should we specialize dcterms:hasPart for dcat:DatasetSeries? #1307

Closed
riccardoAlbertoni opened this issue Feb 28, 2021 · 32 comments · Fixed by #1292
Labels
dataset-series dcat due for closing Issue that is going to be closed if there are no objection within 6 days

@riccardoAlbertoni
Contributor

riccardoAlbertoni commented Feb 28, 2021

For data catalogs, dcterms:hasPart is specialized into dcat:dataset, dcat:service, etc. Should we also have some sort of specialization for dcterms:hasPart when it is used with dcat:DatasetSeries?

In particular, this would help to distinguish datasets that are 'just a bag of files' (see the examples in Loosely structured catalog) from actual datasets/distributions that are part of a dataset series.

@riannella

Can we just use dcat:dataset as property from dcat:DatasetSeries and dcat:Dataset ?

(of course, I would like to see the inverse property as well :-)

@dr-shorthair
Contributor

My suggestion:

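@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# From a series to each of its member datasets
dcat:hasSeriesMember a owl:ObjectProperty ;
    rdfs:subPropertyOf dcterms:hasPart ;
    rdfs:domain dcat:DatasetSeries ;
    rdfs:range dcat:Dataset ;
.

# From a member dataset to the series it belongs to
dcat:inSeries a owl:ObjectProperty ;
    rdfs:subPropertyOf dcterms:isPartOf ;
    rdfs:domain dcat:Dataset ;
    rdfs:range dcat:DatasetSeries ;
.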

@jakubklimek
Contributor

@dr-shorthair @riannella From the point of view of an application developer, let me again state that I would strongly prefer to standardize only one of the two mutually inverse properties.

Otherwise, compliant applications and queries need to always account for the possibility that in one data catalog, the datasets are connected to the series using one property, in another using the second, which makes them more complex with each inverse property pair.
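
A minimal sketch of the concern above, assuming triples are modelled as plain Python tuples and using the two property names proposed in this thread. The `ex:` identifiers and the `series_members` helper are hypothetical and only serve the illustration. If both directions are standardized, a consumer has to check both to be sure it has found every member:

```python
# Property names as proposed earlier in this thread; the data is made up.
IN_SERIES = "dcat:inSeries"
HAS_SERIES_MEMBER = "dcat:hasSeriesMember"


def series_members(triples, series):
    """Collect members of `series`, whichever direction the catalog asserts."""
    members = set()
    for subject, predicate, obj in triples:
        if predicate == IN_SERIES and obj == series:
            members.add(subject)  # dataset --inSeries--> series
        elif predicate == HAS_SERIES_MEMBER and subject == series:
            members.add(obj)  # series --hasSeriesMember--> dataset
    return members


# Two hypothetical catalogs describing the same series in opposite directions.
catalog_a = [
    ("ex:dataset1", IN_SERIES, "ex:series"),
    ("ex:dataset2", IN_SERIES, "ex:series"),
]
catalog_b = [
    ("ex:series", HAS_SERIES_MEMBER, "ex:dataset1"),
    ("ex:series", HAS_SERIES_MEMBER, "ex:dataset2"),
]

same = series_members(catalog_a, "ex:series") == series_members(catalog_b, "ex:series")
print(same)
```

With a single standardized property, one of the two branches in the loop would be unnecessary.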

@agreiner
Contributor

agreiner commented Mar 9, 2021

I would suggest keeping only the inSeries property, so that one needn't worry about updating metadata for the series every time a new dataset is added to that series.

@dr-shorthair
Contributor

Agree with @jakubklimek and @agreiner . Only one needed and dcat:inSeries is more useful.

@riannella

...but they are inverse properties...so you only need to define one, the other can be asserted or inferred.

@dr-shorthair
Contributor

Yes. Agree.

@andrea-perego
Contributor

I'm fine with defining only dcat:inSeries for the moment.

@riannella

Sorry @andrea-perego - when I said "you only need to define one" - I really meant you only need to "assert" one.
Both properties should be defined (in the ontology).
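
As a rough illustration of "assert one, infer the other", here is a sketch that assumes both properties are defined as mutual inverses and that triples are plain Python tuples. The `INVERSE_OF` table and the `materialize_inverses` helper are hypothetical, not part of any DCAT tooling:

```python
# Pairing of mutually inverse properties, using the names proposed in this thread.
INVERSE_OF = {
    "dcat:inSeries": "dcat:hasSeriesMember",
    "dcat:hasSeriesMember": "dcat:inSeries",
}


def materialize_inverses(triples):
    """Return the input triples plus the inverse of every triple that has one."""
    closed = set(triples)
    for subject, predicate, obj in triples:
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            closed.add((obj, inverse, subject))
    return closed


# Only one direction is asserted by the (made-up) publisher.
asserted = {("ex:dataset1", "dcat:inSeries", "ex:series")}
inferred = materialize_inverses(asserted) - asserted
print(inferred)
```

This only works for consumers that actually run such an inference step; a consumer that queries the asserted triples directly still sees one direction only.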

@dr-shorthair changed the title from "Should we specialize dct:hasPart for dcat:DatasetSeries?" to "Should we specialize dcterms:hasPart for dcat:DatasetSeries?" on Mar 10, 2021
@init-dcat-ap-de

dcat:inSeries would mean

(catalog) --dcat:dataset--> (dataset1)
(catalog) --dcat:dataset--> (dataset2)
(catalog) --dcat:dataset--> (dataset3)
(catalog) --dcat:dataset--> (dataseries)
(dataset1) --dcat:inSeries--> (dataseries)
(dataset2) --dcat:inSeries--> (dataseries)
(dataset3) --dcat:inSeries--> (dataseries)

instead of

(catalog) --dcat:dataset--> (dataset1)
(catalog) --dcat:dataset--> (dataset2)
(catalog) --dcat:dataset--> (dataset3)
(catalog) --dcat:dataset--> (dataseries)
(dataseries) --dcat:hasSeriesMember--> (dataset1)
(dataseries) --dcat:hasSeriesMember--> (dataset2)
(dataseries) --dcat:hasSeriesMember--> (dataset3)

?

I don't see why we would switch the logic for dataseries. Until now, the "bigger" class states which "smaller" objects are part of it: Catalogs include Datasets, Datasets include Distributions.

@smrgeoinfo
Contributor

smrgeoinfo commented Mar 13, 2021

The rationale for the dcat:inSeries approach is that for series derived from, e.g., continuous sensor feeds, new members are added to the series at periodic intervals. The dcat:inSeries approach allows the metadata for new members of the series to declare their membership without having to update the series metadata.
Hmm, I suppose it's easy enough to insert triples like (dataseries) --dcat:hasSeriesMember--> (dataset3)...
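
The difference can be sketched as follows, assuming a hypothetical metadata store that keeps one record (a dict) per resource. Both helper functions and all `ex:` identifiers are invented for illustration; each function returns the set of records it had to write:

```python
def publish_with_in_series(records, dataset, series):
    """Add `dataset`; membership is stated on the new dataset's own record."""
    records[dataset] = {"dcat:inSeries": series}
    return {dataset}


def publish_with_has_series_member(records, dataset, series):
    """Add `dataset`; membership is stated on the existing series record."""
    records[dataset] = {}
    records[series].setdefault("dcat:hasSeriesMember", []).append(dataset)
    return {dataset, series}


store_a = {"ex:series": {}}
store_b = {"ex:series": {}}
touched_a = publish_with_in_series(store_a, "ex:dataset3", "ex:series")
touched_b = publish_with_has_series_member(store_b, "ex:dataset3", "ex:series")
print(sorted(touched_a), sorted(touched_b))
```

In the first case only the new record is written; in the second the series record changes on every addition, which is cheap in a triple store but may matter when the series metadata is a separately published document.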

@dr-shorthair
Contributor

I agree with @smrgeoinfo that the more important relationship is from the part to the series - either dcterms:isPartOf or dcat:inSeries. And this case should be added to the description of dcat:Dataset.

This also impacts #1272.

@riannella

In our sector, for example, a Regulator would say the more important relationship was from the series to the parts.
Let's have both.

@jakubklimek
Contributor

Regarding the hasSeriesMember and inSeries, the way it is defined now it always connects an instance of dataset series to an instance of the dataset. However, the dataset series may have a hierarchical nature, with a different kind of series on each level, e.g. the top-level series is split according to topical facets, the next level according to geographical division, and the lowest level according to temporal division.

The bottom line is that I think that the range of hasSeriesMember and the domain of inSeries should allow the users to use both Datasets and Dataset Series.

We can already see this in the Czech catalog:

@dr-shorthair
Contributor

@jakubklimek because of the subclass axiom

dcat:DatasetSeries rdfs:subClassOf dcat:Dataset .

isn't the issue you describe already taken care of implicitly?
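
To make the reasoning explicit, here is a small sketch that walks a subclass table containing only the axiom quoted above. The `is_a` helper is hypothetical and stands in for what an RDFS reasoner would do:

```python
# Only the axiom quoted above; other class relationships are left out.
SUBCLASS_OF = {"dcat:DatasetSeries": "dcat:Dataset"}


def is_a(declared_class, target_class, subclass_of=SUBCLASS_OF):
    """True if `declared_class` is `target_class` or a (transitive) subclass of it."""
    seen = set()
    current = declared_class
    while current is not None and current not in seen:
        if current == target_class:
            return True
        seen.add(current)  # guards against accidental cycles in the table
        current = subclass_of.get(current)
    return False


# A nested series is typed dcat:DatasetSeries, so it also counts as a
# dcat:Dataset and fits a domain or range of dcat:Dataset.
nested_series_ok = is_a("dcat:DatasetSeries", "dcat:Dataset")
# The reverse does not hold: a plain dataset is not a series.
plain_dataset_is_series = is_a("dcat:Dataset", "dcat:DatasetSeries")
print(nested_series_ok, plain_dataset_is_series)
```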

@jakubklimek
Contributor

@dr-shorthair ahh, ok, of course. I lost track of the state of the discussion about whether DatasetSeries should be a subclass of dataset/catalog/resource. I guess my use case is an argument for it being a subclass of dataset.

@riccardoAlbertoni
Contributor Author

Some discussions about the names of these properties have taken place in PR #1328.
I am going to report the gist of the discussions below, as this issue seems to be a better place to have them.

Originally posted by @andrea-perego in #1328 (comment)

For the inverse of dcat:inSeries I suggest we use something shorter: dcat:hasMember.

Another option for the names of these properties is dcat:hasParent / dcat:hasChild , following the terminology used in ISO 19115 for dataset series.

Originally posted by @riccardoAlbertoni in #1328 (comment)

Yes, the names are long. However, I wonder if mentioning 'series' in the property names can help people recall and use these properties more coherently, also considering the possible confusion with dcterms:hasPart and other part-related properties. What do you think?

Originally posted by @andrea-perego in #1328 (comment)

I don't think it's necessary to recall the scope of this property, as it has its own domain (dcat:DatasetSeries). For the same reason, including "series" in the property name looks redundant, and I am not very worried about being confused with dcterms:hasPart.

However, if the close semantics of "part" and "series" raises concerns on improper use, we should consider other options.

dcat:hasChild / dcat:hasParent are a good alternative, IMO, and they reflect the hierarchical structure of dataset series, as already mentioned in PR #1292 and in #1307 (comment)

Originally posted by @riccardoAlbertoni in #1328 (comment)

Andrea, if you think dcat:hasSeriesMember is too long and not acceptable, then dcat:hasMember will be my choice.

Using dcat:hasChild and dcat:hasParent helps those familiar with ISO dataset series. However, we target a larger audience, and within that audience I suspect people might have different views of what counts as a child of a dataset (potentially, anything originating from a dataset could be considered a 'child', which might be confusing).

@riannella

I find parts and parent/child to be OK for generic relationships. I think we can be more specific and use property names that are aligned to the domain of data catalog publishing.

@dr-shorthair's proposal above with dcat:hasSeriesMember and dcat:inSeries is good for the relationships between the dcat:DatasetSeries and the dcat:Dataset.

I would then go further and propose (in the same way) having dcat:hasCatalogMember and dcat:inCatalog for the relationships between the dcat:Catalog and the dcat:DatasetSeries.

@riccardoAlbertoni
Contributor Author

> I would then go further and propose (in the same way) having dcat:hasCatalogMember and dcat:inCatalog for the relationships between the dcat:Catalog and the dcat:DatasetSeries.

Thanks for the comment, @riannella:

dcat:catalog is already used in existing implementations, as it was defined in DCAT 2. Moreover, it follows the same naming convention as dcat:dataset, dcat:distribution, dcat:record defined in DCAT 1, and dcat:service defined in DCAT 2.

I am afraid that changing the name of dcat:catalog has more disadvantages than advantages. In particular, it would limit backward compatibility with previous DCAT recommendations and undesirably impact the existing implementations.

@agreiner
Contributor

Hm, I'm getting a little concerned about the direction here. This thread shows at least three people who agreed that it would be beneficial to only define the inSeries property and leave out hasSeriesMember or whatever, because it makes it difficult to determine membership in a series programmatically, since one doesn't know in advance which approach is taken in the metadata. Yet we have people saying they see no reason not to include both, the draft includes both, and there is ongoing discussion of what to call the second term. Next, I expect someone will say that we should just go with it this way and see if anyone outside the group comments, nobody will comment, and we will be stuck with something that doesn't reflect what most of us want, or at least a process that doesn't address the issue raised.

@riccardoAlbertoni
Contributor Author

riccardoAlbertoni commented Mar 17, 2021

In tonight's call, dcat:catalogMember has been proposed instead of dcat:hasCatalogMember to maintain consistency with dcat:catalog, dcat:dataset, dcat:distribution, dcat:record.

@andrea-perego, @riannella, @dr-shorthair: How does dcat:catalogMember sound to you?

@agreiner
Contributor

I think that's okay if people really need to mark that up in metadata, but honestly, I don't see why anyone would use that if they can just use the catalog itself. If it's intended to be used for a series, I don't think most people will even think that it has anything to do with a series.

@riccardoAlbertoni
Contributor Author

> Hm, I'm getting a little concerned about the direction here. This thread shows at least three people who agreed that it would be beneficial to only define the inSeries property and leave out hasSeriesMember or whatever, because it makes it difficult to determine membership in a series programmatically, since one doesn't know in advance which approach is taken in the metadata. Yet we have people saying they see no reason not to include both, the draft includes both, and there is ongoing discussion of what to call the second term. Next, I expect someone will say that we should just go with it this way and see if anyone outside the group comments, nobody will comment, and we will be stuck with something that doesn't reflect what most of us want, or at least a process that doesn't address the issue raised.

Thanks for your comment, @agreiner. I understand your concern.

As for your previous specific comment,

> I would suggest keeping only the inSeries property so that one needn't worry about updating metadata for the series every time a new dataset is added to that series.

I suspect the metadata of the dataset series needs to be updated anyway. Especially if we keep the upstream inheritance explained in https://raw.githack.com/w3c/dxwg/dcat-dataseries-issue1272/dcat/index.html#dataset-series where it is said: "The update date (dcterms:modified) of the dataset series should correspond to the latest publication or update date of the child datasets."
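
A small sketch of the quoted rule, assuming each child dataset is represented as a dict with optional dcterms:issued and dcterms:modified dates. The `latest_child_date` helper and the sample dates are made up for illustration:

```python
from datetime import date


def latest_child_date(children):
    """Latest dcterms:issued or dcterms:modified date across child datasets."""
    dates = [
        child[key]
        for child in children
        for key in ("dcterms:issued", "dcterms:modified")
        if child.get(key) is not None
    ]
    return max(dates) if dates else None


# Hypothetical child datasets of one series.
children = [
    {"dcterms:issued": date(2021, 1, 10)},
    {"dcterms:issued": date(2021, 2, 1), "dcterms:modified": date(2021, 3, 5)},
]
series = {"dcterms:modified": latest_child_date(children)}
print(series["dcterms:modified"])
```

Under this rule the series record changes whenever a child is published or updated, regardless of which property states the membership.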

Also, we are having a parallel discussion on whether to support inverse properties with the "lightweight approach" adopted by PROV-O (https://www.w3.org/TR/prov-o/#inverse-names). We discussed this in tonight's call; see the meeting minutes.

I think the difficulty here is that there are several intertwined aspects to deciding whether to have both inSeries and hasSeriesMember and, if only one, which of the two.
In the PR I am co-drafting, I have kept both directions, as I sensed the discussion was still open and there were contrasting views. I might have "miscounted" the commenters' positions, as the discussion is quite intricate and people seem to have changed their minds along the way. Also, some editors and contributors haven't expressed their views. For these reasons, I attempted to be "inclusive" and have a draft on which we can reconsider this point starting from a more stable description.

@riannella

> In tonight's call, dcat:catalogMember has been proposed instead of dcat:hasCatalogMember to maintain consistency with dcat:catalog, dcat:dataset, dcat:distribution, dcat:record.
>
> @andrea-perego, @riannella, @dr-shorthair: How does dcat:catalogMember sound to you?

OK

@andrea-perego
Contributor

> In tonight's call, dcat:catalogMember has been proposed instead of dcat:hasCatalogMember to maintain consistency with dcat:catalog, dcat:dataset, dcat:distribution, dcat:record.
>
> @andrea-perego, @riannella, @dr-shorthair: How does dcat:catalogMember sound to you?

@riccardoAlbertoni , if I correctly recall the last meeting, the discussion was about the relationship between a series and its members, with a proposal of calling it dcat:seriesMember.

To be consistent, the relationship between a dcat:Catalog and a dcat:Resource should rather be dcat:resource. I would also avoid relating the two semantically, as the notion of "membership" in a catalogue is not exactly the same as that of a dataset series.

@agreiner
Contributor

My understanding of W3C's process is that we take the best guess of the working group and publish that for input from the community, not that we take what feels the most inclusive and put that forth. By typing +1 in a meeting to publish, one says in effect that they are supportive of the text as it stands and okay with the assumption by the rest of the world that you stand by it. Notes can be helpful for expressing alternatives, so I would expect additional options be shown in notes, thereby remaining inclusive of a minority opinion if there isn’t agreement in the group. The current note, to me, doesn’t clearly state the concern raised at all. I’d prefer to have a note that says some members of the group feel there should be an inverse to inSeries and ask if anyone feels it is important to do so. Given the strong reason not to do so, I think this approach is justified.

Re update dates, I think any property that assumes people will make updates to a published dataset’s metadata is flawed. Even the most diligent publishers of data will never be able to update copies of datasets that find their way to secondary sites and to users. Update dates only make sense to me when they reflect the state when the dataset (or series) was published.

@andrea-perego
Contributor

Looking at this thread, I think there's general agreement on dcat:inSeries (which is what this issue was specifically about), but still some concerns about its inverse and other design issues (including those raised by @agreiner).

Therefore, I propose we revise PR #1292 by provisionally removing the inverse of dcat:inSeries, and then merge it (it has been around for quite a while, and it also addresses other issues).

For the inverse of dcat:inSeries and other discussion topics, we will instead create separate issues, and move the discussion there. This would also help better track the decisions that will be taken on them.

@riccardoAlbertoni
Contributor Author

> Looking at this thread, I think there's general agreement on dcat:inSeries (which is what this issue was specifically about), but still some concerns about its inverse and other design issues (including those raised by @agreiner).
>
> Therefore, I propose we revise PR #1292 by provisionally removing the inverse of dcat:inSeries, and then merge it (it has been around for quite a while, and it also addresses other issues).
>
> For the inverse of dcat:inSeries and other discussion topics, we will instead create separate issues, and move the discussion there. This would also help better track the decisions that will be taken on them.

Yes, I agree with the direction you have suggested. I am going to update the PR and open a new issue accordingly.

@riccardoAlbertoni
Contributor Author

riccardoAlbertoni commented Mar 22, 2021

> > In tonight's call, dcat:catalogMember has been proposed instead of dcat:hasCatalogMember to maintain consistency with dcat:catalog, dcat:dataset, dcat:distribution, dcat:record.
> > @andrea-perego, @riannella, @dr-shorthair: How does dcat:catalogMember sound to you?
>
> @riccardoAlbertoni, if I correctly recall the last meeting, the discussion was about the relationship between a series and its members, with a proposal of calling it dcat:seriesMember.
>
> To be consistent, the relationship between a dcat:Catalog and a dcat:Resource should rather be dcat:resource. I would also avoid relating the two semantically, as the notion of "membership" in a catalogue is not exactly the same as that of a dataset series.

Thanks, @andrea-perego , you are right. I switched series with catalog for some mysterious reasons, but series was the one I meant. My apologies ... .

@riccardoAlbertoni
Contributor Author

> Looking at this thread, I think there's general agreement on dcat:inSeries (which is what this issue was specifically about), but still some concerns about its inverse and other design issues (including those raised by @agreiner).
> Therefore, I propose we revise PR #1292 by provisionally removing the inverse of dcat:inSeries, and then merge it (it has been around for quite a while, and it also addresses other issues).
> For the inverse of dcat:inSeries and other discussion topics, we will instead create separate issues, and move the discussion there. This would also help better track the decisions that will be taken on them.

Yes, I agree with the direction you have suggested. I am going to update the PR and open a new issue accordingly.

If we do not need to open other issues besides #1335, this issue can be closed as soon as we merge the changes in PR #1292.

@riccardoAlbertoni riccardoAlbertoni added the due for closing Issue that is going to be closed if there are no objection within 6 days label Mar 22, 2021
@andrea-perego
Contributor

> If we do not need to open other issues besides #1335, this issue can be closed as soon as we merge the changes in PR #1292.

Thanks, @riccardoAlbertoni .

I wonder whether we should also open a separate issue about the general approach to be taken to deal with inverse properties.

BTW, part of the discussion in this thread fits into #1273 . I posted there the relevant comments.

@riccardoAlbertoni
Contributor Author

riccardoAlbertoni commented Mar 23, 2021

> I wonder whether we should also open a separate issue about the general approach to be taken to deal with inverse properties.

opened a new issue #1336


8 participants