Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated dataset module #272

Merged
merged 4 commits into from
Dec 16, 2024
Merged

Updated dataset module #272

merged 4 commits into from
Dec 16, 2024

Conversation

jesper-friis
Copy link
Contributor

@jesper-friis jesper-friis commented Dec 15, 2024

Description

Updated dataset module including the following changes:

  • Allow to add other types of entries to the triplestore that are not datasets. Ex: samples, models, instruments, people, projects...
  • Renamed list_data_iris() to search_iris(). It can now be use to search for all types of entries. Question: Would search() be an even better name?
  • Renamed prepare() to as_jsonld() and made it part of the public API

Type of change

  • Bug fix and code cleanup
  • New feature
  • Documentation update
  • Testing

Checklist for the reviewer

This checklist should be used as a help for the reviewer.

  • Is the change limited to one issue?
  • Does this PR close the issue?
  • Is the code easy to read and understand?
  • Do all new feature have an accompanying new test?
  • Has the documentation been updated as necessary?
  • Is the code properly tested?

  - Allow to add other types of entries to the triplestore that are not
    datasets. Ex: samples, models, instruments, people, projects...
  - Renamed list_data_iris() to search_iris(). It can now be use to search
    for all types of entries.
  - Renamed prepare() to as_jsonld() and made it part of the public API
Copy link

codecov bot commented Dec 15, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 78.21%. Comparing base (f947497) to head (c656536).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
tripper/dataset/dataset.py 83.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #272      +/-   ##
==========================================
- Coverage   78.22%   78.21%   -0.02%     
==========================================
  Files          20       20              
  Lines        2145     2153       +8     
==========================================
+ Hits         1678     1684       +6     
- Misses        467      469       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +84 to +87
other_entries:
- "@id": sample:SEM_cement_batch2/77600-23-001
"@type": chameo:Sample
title: Series for SEM images for sample 77600-23-001.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit strange that crucial information like @id and @type end up under other_entries. What is the rationale for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, other_entries is not a good label. My simpleminded initial idea was that we are only documenting datasets. But that is not true. We also want to represent samples, models, people, instruments, projects, etc in the knowledge base. These are not datasets but other types of entries.

A better label would be "samples". We could easily add support for that, but we would probably need a more general framework where the user can extend the categories. Ideally such extensions should be done with user-supplied JSON-LD context. A good solution should probably go into a separate PR.

Comment on lines 129 to 134
[JSON-LD context] or one of the following special keywords:
- "@id": Dataset IRI. Must always be given.
- "@type": IRI of the ontology class for this type of data.
For datasets, it is typically used to refer to a specific subclass
of `emmo:DataSet` that provides a semantic description of this
dataset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not render well in the documentation. Check out howt o make the lists inthe notes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done

Copy link
Contributor

@torhaugl torhaugl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this PR looks reasonable. It is mostly renaming prepare to as_jsonld, and adding a possibility to add other_entries such as samples. I am still of the opinion that context should be put into the yaml file, and that we should use JSON-LD or YAML-LD directly, which would immeadiately add the other_entries functionality in a more "intuitive" way. That discussion is better in a different PR I think. If you @francescalb dont have any more comments, feel free to merge

@jesper-friis
Copy link
Contributor Author

For me this PR looks reasonable. It is mostly renaming prepare to as_jsonld, and adding a possibility to add other_entries such as samples. I am still of the opinion that context should be put into the yaml file, and that we should use JSON-LD or YAML-LD directly, which would immeadiately add the other_entries functionality in a more "intuitive" way. That discussion is better in a different PR I think. If you @francescalb dont have any more comments, feel free to merge

I think that it is a good idea to be able to provide user-defined JSON-LD context. But it is for the advanced users. For normal users, we should make it as simple as possible. I think we should stick with JSON-LD (semantically), but represent it in YAML. YAML-LD is an extension of JSON-LD that I don't think we urgently need. It is still in a draft phase with no implementations that I am aware of.

@jesper-friis jesper-friis merged commit 78ff528 into master Dec 16, 2024
21 checks passed
@jesper-friis jesper-friis deleted the update-dataset branch December 16, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants