[Feature]: Support annotation of substrings with HERD or another system #1092
Labels:
category: proposal (proposed enhancements or new features)
priority: low (alternative solution already working and/or relevant to only specific user(s))
What would you like to see added to HDMF?
Use case 1: HED tags are strings that can contain multiple keys, separated by commas, in any order. A DynamicTable may have a column of HED tags. We want to associate these keys with persistent identifiers in the HED schema, but I'm not 100% sure that is necessary. HED already provides tools for processing the HED tags and linking them to the HED schema.
Use case 2: HDMF-ML permits the storage of a PyTorch model output as a long text field. We want to be able to annotate terms within this output with the AI Ontology. A similar hypothetical use case is if a user wants to store text from a scientific paper, device configuration file, or software output in HDMF and associate terms within these strings to external resources.
A single string may not be the ideal representation for these data, but sometimes that is what we have to work with.
In use case 1, a given key can appear anywhere in any string of the one-dimensional VectorData.
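For use case 1, a minimal sketch of locating keys in a column of comma-separated strings might look like the following. It assumes flat tag strings with no parenthesized groups (real HED strings can nest groups, so HED's own tools would be the right parser in practice), and `index_keys` is a hypothetical helper, not an HDMF or HED API. It records both the row and the character offset of each key, since either could be the unit an annotation refers to.

```python
from collections import defaultdict


def index_keys(column):
    """Map each comma-separated key to the (row, character offset) pairs where it occurs.

    Hypothetical helper; assumes flat strings without parenthesized tag groups.
    """
    index = defaultdict(list)
    for row, text in enumerate(column):
        offset = 0  # character offset of the current comma-separated part
        for part in text.split(","):
            key = part.strip()
            # Skip leading whitespace so the offset points at the key itself.
            start = offset + (len(part) - len(part.lstrip()))
            if key:
                index[key].append((row, start))
            offset += len(part) + 1  # +1 for the comma that split() removed
    return dict(index)


column = ["Sensory-event, Visual-presentation", "Visual-presentation, Sensory-event"]
print(index_keys(column))
# {'Sensory-event': [(0, 0), (1, 21)], 'Visual-presentation': [(0, 15), (1, 0)]}
```

Because keys can appear in any order, the same key lands at different offsets in different rows, which is one reason a per-key mapping (rather than a fixed position) fits this use case.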
In use case 2, we want to annotate a particular occurrence of a substring in a scalar text field. Because the same substring may (rarely) appear multiple times with different meanings, it would be important to store the starting index of the substring.
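For use case 2, one possible shape for an annotation that pins down a specific occurrence is a (start, stop, term) record. The sketch below is purely illustrative: `SubstringAnnotation`, `annotate_occurrence`, and the identifier `EXAMPLE:0001` are hypothetical placeholders, not part of HDMF, HERD, or the AI Ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstringAnnotation:
    """Hypothetical record linking one occurrence of a substring to an external term."""

    start: int      # 0-based character offset where the occurrence begins
    stop: int       # exclusive end offset, following Python slice semantics
    entity_id: str  # placeholder identifier for an external term (not a real ID)

    def resolve(self, text: str) -> str:
        """Return the annotated substring, checking that the span fits the text."""
        if not 0 <= self.start < self.stop <= len(text):
            raise ValueError(f"span [{self.start}, {self.stop}) is outside the text")
        return text[self.start:self.stop]


def annotate_occurrence(text: str, term: str, occurrence: int, entity_id: str) -> SubstringAnnotation:
    """Annotate the n-th (0-based) occurrence of `term` in `text`."""
    start = -1
    for _ in range(occurrence + 1):
        start = text.find(term, start + 1)
        if start == -1:
            raise ValueError(f"{term!r} occurs fewer than {occurrence + 1} times")
    return SubstringAnnotation(start, start + len(term), entity_id)


output = "Linear -> ReLU -> Linear"
second = annotate_occurrence(output, "Linear", 1, "EXAMPLE:0001")
print(second)                  # SubstringAnnotation(start=18, stop=24, entity_id='EXAMPLE:0001')
print(second.resolve(output))  # Linear
```

One design question this raises: Python offsets count code points, while HDF5 stores variable-length strings as encoded bytes, and in UTF-8 a character such as "é" is two bytes. Any stored starting index would need to specify which unit it uses.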
These probably require different solutions.
It may also be useful to have a general way to refer to substrings for annotation, analogous to how DynamicTableRegion refers to a subset of rows of a table and TimeIntervals annotates periods of time in a time series.
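As a rough illustration of what a general substring reference could look like, here is a column-oriented sketch that stores parallel lists of row indices, start offsets, and stop offsets into a one-dimensional text column, loosely mirroring how DynamicTableRegion stores integer row indices into a table. `SubstringRegion` and its methods are hypothetical names; nothing here reflects an existing HDMF type.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class SubstringRegion:
    """Hypothetical column-oriented references to substrings of a 1-D text column."""

    target: Sequence[str]                            # the referenced text column
    rows: List[int] = field(default_factory=list)    # row index into `target`
    starts: List[int] = field(default_factory=list)  # 0-based start offset per entry
    stops: List[int] = field(default_factory=list)   # exclusive stop offset per entry

    def add(self, row: int, start: int, stop: int) -> int:
        """Validate and append a reference; return its index in this region."""
        text = self.target[row]
        if not 0 <= start < stop <= len(text):
            raise ValueError(f"invalid span [{start}, {stop}) for row {row}")
        self.rows.append(row)
        self.starts.append(start)
        self.stops.append(stop)
        return len(self.rows) - 1

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, i: int) -> str:
        """Resolve entry i back to the referenced substring."""
        return self.target[self.rows[i]][self.starts[i]:self.stops[i]]


column = ["Sensory-event, Visual-presentation", "Agent-action"]
region = SubstringRegion(column)
i = region.add(0, 15, 34)
print(region[i])  # Visual-presentation
```

Keeping the references as plain integer columns means each entry has a stable index, which an annotation system such as HERD could in principle point at in the same way it points at other objects; how that linkage would actually be expressed is exactly the open question of this issue.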
I'm open to ideas. Just wanted to start a discussion.
What solution would you like?
See above.
Do you have any interest in helping implement the feature?
Yes.