Updated dataset module #272

jesper-friis · 2024-12-15T18:20:16Z

Description

Updated dataset module including the following changes:

Allow adding entries that are not datasets to the triplestore, e.g. samples, models, instruments, people, projects...
Renamed list_data_iris() to search_iris(). It can now be used to search for all types of entries. Question: Would search() be an even better name?
Renamed prepare() to as_jsonld() and made it part of the public API

Type of change

Bug fix and code cleanup
New feature
Documentation update
Testing

Checklist for the reviewer

This checklist is meant as an aid for the reviewer.

Is the change limited to one issue?
Does this PR close the issue?
Is the code easy to read and understand?
Do all new features have an accompanying new test?
Has the documentation been updated as necessary?
Is the code properly tested?


codecov · 2024-12-15T18:21:59Z

Codecov Report

Attention: Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 78.21%. Comparing base (f947497) to head (c656536).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
tripper/dataset/dataset.py	83.33%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #272      +/-   ##
==========================================
- Coverage   78.22%   78.21%   -0.02%     
==========================================
  Files          20       20              
  Lines        2145     2153       +8     
==========================================
+ Hits         1678     1684       +6     
- Misses        467      469       +2


tripper/dataset/dataset.py

francescalb · 2024-12-16T09:28:00Z

tests/input/semdata.yaml

+other_entries:
+  - "@id": sample:SEM_cement_batch2/77600-23-001
+    "@type": chameo:Sample
+    title: Series for SEM images for sample 77600-23-001.


I find it a bit strange that crucial information like @id and @type end up under other_entries. What is the rationale for this?

Yes, other_entries is not a good label. My simple-minded initial idea was that we are only documenting datasets. But that is not true. We also want to represent samples, models, people, instruments, projects, etc. in the knowledge base. These are not datasets but other types of entries.

A better label would be "samples". We could easily add support for that, but we would probably need a more general framework where the user can extend the categories. Ideally such extensions should be done with user-supplied JSON-LD context. A good solution should probably go into a separate PR.
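To make the idea of user-extensible categories concrete, here is a minimal sketch in plain Python. It is not tripper's API: `DEFAULT_TYPES`, `collect_entries` and the default classes assigned to each label are hypothetical names and values chosen for illustration only.

```python
# Hypothetical sketch: none of these names are part of tripper's API.
from typing import Any

# Assumed default ontology class for each top-level category label.
DEFAULT_TYPES: dict[str, str] = {
    "datasets": "dcat:Dataset",
    "samples": "chameo:Sample",
}


def collect_entries(
    doc: dict[str, list[dict[str, Any]]],
    categories: dict[str, str] = DEFAULT_TYPES,
) -> list[dict[str, Any]]:
    """Flatten a parsed YAML document into a list of typed entries."""
    entries = []
    for label, items in doc.items():
        if label not in categories:
            raise KeyError(f"unknown category: {label!r}")
        for item in items:
            if "@id" not in item:
                raise ValueError(f"entry under {label!r} lacks '@id'")
            # The category default applies only when the entry
            # does not state an explicit "@type" itself.
            entries.append({"@type": categories[label], **item})
    return entries


# The sample from tests/input/semdata.yaml, placed under a "samples" label.
doc = {
    "samples": [
        {
            "@id": "sample:SEM_cement_batch2/77600-23-001",
            "title": "Series for SEM images for sample 77600-23-001.",
        }
    ],
}
entries = collect_entries(doc)
```

A user could extend the categories by passing their own mapping, for example `collect_entries(doc, {**DEFAULT_TYPES, "instruments": "ex:Instrument"})`. Doing the same through a user-supplied JSON-LD context would be the more general solution mentioned above.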

francescalb · 2024-12-16T09:37:12Z

tripper/dataset/dataset.py

+        [JSON-LD context] or one of the following special keywords:
          - "@id": Dataset IRI.  Must always be given.
          - "@type": IRI of the ontology class for this type of data.
            For datasets, it is typically used to refer to a specific subclass
            of `emmo:DataSet` that provides a semantic description of this
            dataset.


This does not render well in the documentation. Check out how to make the lists in the notes.

Thanks. Done
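For reference, a minimal sketch of the docstring layout this fix refers to, assuming the documentation is built with a Markdown-based renderer, which typically needs a blank line between a paragraph and the list that follows it. The function name `describe_keywords` is hypothetical and the keyword descriptions are shortened.

```python
import inspect


def describe_keywords() -> None:
    """Illustrative function whose docstring contains a list.

    Special keywords:

    - "@id": IRI of the entry.  Must always be given.
    - "@type": IRI of the ontology class for this type of data.
    """


# inspect.getdoc() strips the common indentation, which is roughly the
# text a documentation generator passes on to the renderer.
rendered = inspect.getdoc(describe_keywords)
```

Without the empty line after "Special keywords:", such renderers tend to fold the list items into the preceding paragraph.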


torhaugl

To me this PR looks reasonable. It mostly renames prepare to as_jsonld and adds the possibility to add other_entries such as samples. I am still of the opinion that the context should be put into the YAML file, and that we should use JSON-LD or YAML-LD directly, which would immediately add the other_entries functionality in a more "intuitive" way. I think that discussion is better held in a different PR. If you, @francescalb, don't have any more comments, feel free to merge.

jesper-friis · 2024-12-16T14:40:07Z

To me this PR looks reasonable. It mostly renames prepare to as_jsonld and adds the possibility to add other_entries such as samples. I am still of the opinion that the context should be put into the YAML file, and that we should use JSON-LD or YAML-LD directly, which would immediately add the other_entries functionality in a more "intuitive" way. I think that discussion is better held in a different PR. If you, @francescalb, don't have any more comments, feel free to merge.

I think that it is a good idea to be able to provide user-defined JSON-LD context. But it is for the advanced users. For normal users, we should make it as simple as possible. I think we should stick with JSON-LD (semantically), but represent it in YAML. YAML-LD is an extension of JSON-LD that I don't think we urgently need. It is still in a draft phase with no implementations that I am aware of.
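As a rough illustration of "JSON-LD semantically, represented in YAML", here is a sketch using only the standard library. The dict stands in for what a YAML parser would return. The document content, including the example.com IRI, is made up for illustration and is not taken from tripper.

```python
import json

# What a YAML parser would return for a document like:
#
#   "@context":
#     title: http://purl.org/dc/terms/title
#   "@id": http://example.com/sample1
#   title: Series of SEM images
#
document = {
    "@context": {"title": "http://purl.org/dc/terms/title"},
    "@id": "http://example.com/sample1",
    "title": "Series of SEM images",
}

# YAML is used here only as an alternative syntax for the same data
# model, so serialising the parsed data as JSON gives a JSON-LD document.
jsonld_text = json.dumps(document, indent=2)
roundtrip = json.loads(jsonld_text)
```

The point of the sketch is that no YAML-LD-specific features are involved: the YAML file only carries the keys and values that JSON-LD already defines.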

jesper-friis mentioned this pull request Dec 15, 2024

New TableDoc class providing a table interface for data documentation #273

Merged

10 tasks

francescalb reviewed Dec 16, 2024


francescalb reviewed Dec 16, 2024


jesper-friis and others added 3 commits December 16, 2024 11:33

Update tripper/dataset/dataset.py

a71fc3f

Co-authored-by: Francesca L. Bleken <48128015+francescalb@users.noreply.github.com>

Added blank lines before lists in docstrings as suggested by flb

61a8ba8

Merge branch 'update-dataset' of github.com:EMMC-ASBL/tripper into up…

c656536

…date-dataset

torhaugl approved these changes Dec 16, 2024


jesper-friis merged commit 78ff528 into master Dec 16, 2024
21 checks passed

jesper-friis deleted the update-dataset branch December 16, 2024 14:40
