[Feature]: Support annotation of substrings with HERD or another system #1092

rly · 2024-04-05T15:04:46Z

What would you like to see added to HDMF?

Use case 1: HED tags are strings that can contain multiple keys, separated by commas, in any order. A DynamicTable may have a column of HED tags. We want to associate these keys with persistent identifiers in the HED schema, but I'm not 100% sure that is necessary. HED already provides tools for processing the HED tags and linking them to the HED schema.
Use case 2: HDMF-ML permits the storage of a PyTorch model output as a long text field. We want to be able to annotate terms within this output with the AI Ontology. A similar hypothetical use case is if a user wants to store text from a scientific paper, device configuration file, or software output in HDMF and associate terms within these strings with external resources.

A single string may not be the ideal representation for these data, but sometimes that is what we have to work with.

In use case 1, the key can be anywhere in any string in the one-dimensional VectorData.
In use case 2, we want to annotate a particular substring of a scalar text field. The same substring may appear multiple times with different meanings (rare), so it would be important to store the starting index of the substring.
These probably require different solutions.
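To illustrate use case 1, here is a minimal sketch in plain Python. It is not HDMF or HED API: `split_tags` and `find_key_rows` are hypothetical helpers, the column values are illustrative, and the naive comma split deliberately ignores HED's parenthesized tag groups, which the HED tools handle properly.

```python
def split_tags(hed_string):
    """Naively split a comma-separated tag string into stripped keys.

    Assumption: no parenthesized groups; real HED strings may contain them.
    """
    return [part.strip() for part in hed_string.split(",") if part.strip()]


def find_key_rows(column, key):
    """Return the row indices whose tag string contains `key` as a whole tag."""
    return [i for i, value in enumerate(column) if key in split_tags(value)]


# Illustrative values standing in for a one-dimensional VectorData column.
column = [
    "Sensory-event, Visual-presentation",
    "Agent-action, Press",
    "Visual-presentation, Sensory-event",
]

rows = find_key_rows(column, "Sensory-event")
print(rows)  # [0, 2]
```

Because the key is matched as a whole tag regardless of its position, the annotation only needs to record the key and the column, not a character offset.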

It may also be useful to have a way to refer to substrings in general for annotation, like DynamicTableRegion for row slicing of tables and TimeIntervals for annotating time series in time.
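One possible shape for such a substring reference, sketched in plain Python under stated assumptions: `SubstringAnnotation`, its fields, and `find_all` are hypothetical names, not part of HDMF or HERD, and the URI is a placeholder. The sketch stores a starting index and a length so that repeated occurrences of the same term can be annotated separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstringAnnotation:
    """Hypothetical record linking a character span to an external term."""
    start: int        # index of the first character of the span
    length: int       # number of characters in the span
    entity_uri: str   # placeholder identifier for the external term

    def resolve(self, text):
        """Return the annotated substring, checking that the span fits."""
        stop = self.start + self.length
        if self.start < 0 or self.length <= 0 or stop > len(text):
            raise ValueError("span lies outside the text")
        return text[self.start:stop]


def find_all(text, term):
    """Return the starting index of every occurrence of `term` in `text`."""
    starts, pos = [], text.find(term)
    while pos != -1:
        starts.append(pos)
        pos = text.find(term, pos + 1)
    return starts


text = "loss decreased; validation loss increased"
starts = find_all(text, "loss")
print(starts)  # [0, 27]

# Annotate only the second occurrence; the URI is a made-up placeholder.
annotation = SubstringAnnotation(
    start=starts[1], length=len("loss"), entity_uri="https://example.org/term"
)
print(annotation.resolve(text))  # loss
```

Storing offsets rather than the substring itself mirrors how a row region points into a table, but it also means the annotation becomes stale if the text is edited, so a real design would need to decide how to detect that.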

I'm open to ideas. Just wanted to start a discussion.

What solution would you like?

^

Do you have any interest in helping implement the feature?

Yes.

mavaylon1 · 2024-05-13T13:20:46Z

Focusing on case 2, what does it mean to store a PyTorch model output as a long text field? If I had a model that does semantic segmentation and I predicted a segmented image, would the matrix be stored as a string?

VisLab · 2024-07-08T20:58:19Z

@rly with the release of HED version 8.3.0, HED now has persistent identifiers for each HED tag (and auxiliary items such as unit classes etc.). HED now has an associated Ontology (see https://bioportal.bioontology.org/ontologies/HED).

Is there any more documentation on the roadmap for HERD and the needed support?

mavaylon1 · 2024-07-08T23:19:07Z

@VisLab Hi there. As the main developer of HERD, I can say the next planned stage is to continue building user-facing tools that make it easier to automate term validation and HERD population when writing the file.

We do have some ideas that go beyond user-facing tools, but they have not been formalized in a community-facing roadmap.

That being said, the team and I are more than happy to discuss expanding HERD. I can talk with the team next week, and then get back to you.

rly added the "category: proposal" and "priority: low" labels Apr 5, 2024

mavaylon1 assigned rly and mavaylon1 May 13, 2024

mavaylon1 added this to the Future milestone May 13, 2024
