How to actually handle the implementation of library schema #156

Closed
VisLab opened this issue Jan 22, 2021 · 20 comments
Labels
enhancement New feature or request

Comments

@VisLab
Member

VisLab commented Jan 22, 2021

As we get further along in the specification and implementation with respect to including library schema, I have done some serious rethinking about how we should do it going forward and propose to change both the HED spec and the proposed validator implementation. Since we have just started to implement the validator support of library schema, it isn't a problem now, but it will be going forward. I will outline the issue and my proposal for changing here.

In the current implementation we declare library namespaces using the HED Definition mechanism. This allows namespace declarations to appear anywhere in sidecars or in the event files. As a result, validation requires extra passes at the beginning to gather the definitions. We can do this, though it is inconvenient. The real difficulty is that tool developers doing analysis will have to do the same, which is going to be a nightmare for them.

I am proposing that the base HED schema and library schemas are passed in from the outside and so are known at the beginning of validation or other analysis. This makes sure that the library schema versions are consistent across the study and eliminates a lot of extra processing passes during validation and analysis.

How to accomplish this in BIDS:

In BIDS, we can put in a proposal to update the specification so that the HEDVersion field in dataset_description.json can be either a single version number or a dictionary with entries:

    libraryName:
    version #:
    shortName:

The shortName would be the nickname prefix used before the library tags in the HED strings themselves.

How to accomplish this for EEGLAB:
This information could be stored in the .etc field of the EEG structure.

@VisLab VisLab changed the title How to actually handle the implementation of library scheam How to actually handle the implementation of library schema Jan 22, 2021
@happy5214
Member

happy5214 commented Jan 22, 2021

This is my suggestion for the BIDS implementation:

"HEDVersion": "...",
"HEDLibraries": {
    "lib1": {
        "libraryName": "library1",
        "version": "0.0.1"
    },
    "lib2": {
        "libraryName": "library2",
        "version": "HED_library2_0.5.3.xml"
    }
}

In this example, the keys are the short names.

@sappelhoff
Member

> This is my suggestion for the BIDS implementation:

@happy5214 - in your example, isn't the HEDVersion field superfluous? If you specify a library at a certain version, then the HEDVersion is probably implied by that, right?

I suggest that the HEDVersion field (see here) gets extended to accept either a string OR an object of strings (or URIs).

@VisLab
Member Author

VisLab commented Jan 23, 2021

Agreed....

@happy5214
Member

> > This is my suggestion for the BIDS implementation:
>
> @happy5214 - in your example, isn't the HEDVersion field superfluous? If you specify a library at a certain version, then the HEDVersion is probably implied by that, right?
>
> I suggest that the HEDVersion field (see here) gets extended to accept either a string OR an object of strings (or URIs).

@sappelhoff @VisLab I'm not sure what you mean by it being superfluous. HEDVersion specifies the version of the base HED schema (in this case, a HED 3 schema from this repository), while HEDLibraries specifies used library schemas (from https://github.com/hed-standard/hed-schema-library). I'm not aware of a library schema version imposing a specific base schema version, which is how I read your question. If we merge the fields, how would the base schema be specified in the merged object? The keys are the prefixes used in the tags, and the base schema has no prefix. Is it the empty string?

@sappelhoff
Member

> I'm not aware of a library schema version imposing a specific base schema version, which is how I read your question.

That was my assumption, but apparently that's not the case. I am just wondering why it's not the case.

When I develop a HED library, shouldn't I use a particular base schema for that? And shouldn't I have certain "minimum" and "maximum" version requirements for my HED library? For example, my library "stefan-test" needs features from HED base version 2, ... so it's incompatible with HED base version 1.

@VisLab
Member Author

VisLab commented Jan 25, 2021

@sappelhoff is correct: users always have to specify a single base version of the HED schema. Library schemas are always in addition to the base schema and will be used by a relatively small number of users.

A smaller footprint in BIDS is far preferred to a larger one. If we can implement this with no top-level change to the BIDS specification, it will be a lot cleaner for us for integration with BIDS and better for the community.

As I understand it now, there would be several allowed possibilities for HEDVersion.

A standard base HED version is specified, no library schema. (This is the recommended use case for 99.9% of users. The tools get the version from hedxml on GitHub.)

"HEDVersion": "8.0.0"

A locally developed base HED version is specified, no library schema. (This is used when people have extended the base schema and store the schema in the source directory of the BIDS dataset.)

"HEDVersion": "mylocalhed.xml"

User wants some combination of a base schema and library schemas. (Not used that often until there is a real community of library schema developers. However, we need to get it in so that this process can start.)

"HEDVersion": {
    "version": "8.0.0",
    "lib1": {
        "libraryName": "library1",
        "version": "0.0.1"
    },
    "lib2": {
        "libraryName": "library2",
        "version": "HED_library2_0.5.3.xml"
    }
}

If HEDVersion has a dictionary value, then version is a required key.

This would involve some extra logic to get the HED versions for the BIDS validator, but it will be isolated to a small section of the code. Looking at it from the perspective of people using the data and tool builders downstream, it seems that this will be clear. It also ensures that all of the HED information is kept together under a single HEDVersion key in the dataset_description.json.
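
The three forms listed above could be handled in one place. The following is only a minimal Python sketch: `normalize_hed_version` is a hypothetical helper, not part of any HED or BIDS tool, and it assumes that every key other than `version` in the dictionary form is a library nickname.

```python
def normalize_hed_version(value):
    """Return (base, libraries) for a HEDVersion value.

    Hypothetical helper. `base` is a version string or a local file name,
    and `libraries` maps each nickname to (libraryName, version).
    """
    if isinstance(value, str):
        # Forms 1 and 2: a released version ("8.0.0") or a local file ("mylocalhed.xml").
        return value, {}
    if isinstance(value, dict):
        # Form 3: "version" is required when HEDVersion is a dictionary.
        if "version" not in value:
            raise ValueError('"version" is required when HEDVersion is a dictionary')
        libraries = {}
        for nickname, entry in value.items():
            if nickname == "version":
                continue  # assumption: all remaining keys are library nicknames
            libraries[nickname] = (entry["libraryName"], entry["version"])
        return value["version"], libraries
    raise TypeError("HEDVersion must be a string or a dictionary")


example = {
    "version": "8.0.0",
    "lib1": {"libraryName": "library1", "version": "0.0.1"},
    "lib2": {"libraryName": "library2", "version": "HED_library2_0.5.3.xml"},
}
base, libraries = normalize_hed_version(example)
```

Downstream tools would then only ever see the `(base, libraries)` pair, whichever form the dataset used.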

@sappelhoff
Member

> A smaller footprint in BIDS is far preferred to a larger one. If we can implement this with no top-level change to the BIDS specification, it will be a lot cleaner for us for integration with BIDS and better for the community.

Agreed 💯

> It also ensures that all of the HED information is kept together under a single HEDVersion key in the dataset_description.json.

That would be cleanest in my opinion as well

@VisLab
Member Author

VisLab commented Jan 25, 2021

This discussion has helped to clarify my thinking on this.

@happy5214
Member

> > I'm not aware of a library schema version imposing a specific base schema version, which is how I read your question.
>
> That was my assumption, but apparently that's not the case. I am just wondering why it's not the case.
>
> When I develop a HED library, shouldn't I use a particular base schema for that? And shouldn't I have certain "minimum" and "maximum" version requirements for my HED library? For example, my library "stefan-test" needs features from HED base version 2, ... so it's incompatible with HED base version 1.

> @sappelhoff is correct: users always have to specify a single base version of the HED schema. Library schemas are always in addition to the base schema and will be used by a relatively small number of users.
>
> A smaller footprint in BIDS is far preferred to a larger one. If we can implement this with no top-level change to the BIDS specification, it will be a lot cleaner for us for integration with BIDS and better for the community.
>
> As I understand it now, there would be several allowed possibilities for HEDVersion.
>
> A standard base HED version is specified, no library schema. (This is the recommended use case for 99.9% of users. The tools get the version from hedxml on GitHub.)
>
> "HEDVersion": "8.0.0"
>
> A locally developed base HED version is specified, no library schema. (This is used when people have extended the base schema and store the schema in the source directory of the BIDS dataset.)
>
> "HEDVersion": "mylocalhed.xml"
>
> User wants some combination of a base schema and library schemas. (Not used that often until there is a real community of library schema developers. However, we need to get it in so that this process can start.)
>
> "HEDVersion": {
>     "version": "8.0.0",
>     "lib1": {
>         "libraryName": "library1",
>         "version": "0.0.1"
>     },
>     "lib2": {
>         "libraryName": "library2",
>         "version": "HED_library2_0.5.3.xml"
>     }
> }
>
> If HEDVersion has a dictionary value, then version is a required key.
>
> This would involve some extra logic to get the HED versions for the BIDS validator, but it will be isolated to a small section of the code. Looking at it from the perspective of people using the data and tool builders downstream, it seems that this will be clear. It also ensures that all of the HED information is kept together under a single HEDVersion key in the dataset_description.json.

I still fail to see what @VisLab's point in the second quote (having a single base schema version specified by the BIDS dataset, with optional library schemas) has to do with what I understood @sappelhoff to be saying in the first quote (that the library schema itself, rather than the BIDS dataset, implies a base schema version). Even the proposed merged HEDVersion solution would require special-casing version in the HED 3 spec (basically banning version as a valid namespace, since its use in this proposal would conflict with such a use), which I see as undesirable. I am not opposed to a merged HEDVersion field, but we need a better way of specifying the base schema within the object.

@VisLab
Member Author

VisLab commented Jan 26, 2021

How about the following... I do like the idea of using the local library nicknames as keys rather than my original idea. Other proposals?

"HEDVersion": {
    "Version": "8.0.0",
    "Libraries": {
        "Lib1": {
            "LibraryName": "library1",
            "Version": "0.0.1"
        },
        "Lib2": {
            "LibraryName": "library2",
            "Version": "HED_library2_0.5.3.xml"
        }
    }
}

@happy5214
Member

If we're going to do that, we should probably name it BaseVersion (or something similar) instead of Version to make the purpose clearer, on the assumption that the string form of HEDVersion would be used when there are no library schemas, so Version/BaseVersion would never occur alone.

@happy5214
Member

After discussion with @VisLab, we're thinking of this idea:

"HEDVersion": {
    "BaseVersion": "8.0.0",
    "Libraries": {
        "Lib1": {
            "LibraryName": "library1",
            "LibraryVersion": "0.0.1"
        },
        "Lib2": {
            "LibraryName": "library2",
            "LibraryVersion": "HED_library2_0.5.3.xml"
        }
    }
}

@happy5214
Member

Here is my proposal for passing the schema versions into the actual JavaScript and Python validators (e.g. from inside BIDS):

"path": "",
"version": "",
"libraries": {
    "local": {
        "path": ""
    },
    "remote": {
        "name": "",
        "version": ""
    }
}

No more than one of path or version may be used. If neither is given, the latest base version is used. The options for local and remote are mutually exclusive. If we're going to have latest library version symlinks, we could leave version blank for those too.
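
The rules in the previous paragraph could be checked up front. The following is a rough Python sketch, not the validators' actual API: `check_schema_spec` and `uses_latest_base` are hypothetical names, empty strings are assumed to mean "not given", and each entry under `libraries` is assumed to be either the local form (`path`) or the remote form (`name` plus optional `version`).

```python
def check_schema_spec(spec):
    """Return a list of problems found in a schema spec (hypothetical helper)."""
    problems = []
    # Base schema: at most one of "path" or "version" may be given.
    if spec.get("path") and spec.get("version"):
        problems.append("base: give at most one of 'path' or 'version'")
    for key, lib in spec.get("libraries", {}).items():
        is_local = bool(lib.get("path"))
        is_remote = bool(lib.get("name") or lib.get("version"))
        if is_local and is_remote:
            # The local and remote forms are mutually exclusive.
            problems.append(f"library {key}: 'path' cannot be combined with 'name'/'version'")
        elif not is_local and not lib.get("name"):
            problems.append(f"library {key}: needs either 'path' or 'name'")
    return problems


def uses_latest_base(spec):
    """True when neither a base path nor a base version is given."""
    return not spec.get("path") and not spec.get("version")


spec = {
    "version": "8.0.0",
    "libraries": {
        "local": {"path": "HED_library2_0.5.3.xml"},
        "remote": {"name": "library1", "version": "0.0.1"},
    },
}
problems = check_schema_spec(spec)
```

A remote entry with a name but no version passes the check, which leaves room for the "latest library version" case mentioned above.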

@VisLab
Member Author

VisLab commented Feb 2, 2021

This sounds reasonable, but I would like to wait to finalize it until we have worked through some more of the details of the validation. I am assuming there would be something that takes the BIDS dataset HEDVersion value and returns an object structured as in this proposal?

@VisLab VisLab added the enhancement New feature or request label Feb 11, 2021
@happy5214
Member

I realized after talking with @VisLab that my key naming in the example above was confusing. Here is an updated example:

"path": "",
"version": "",
"libraries": {
    "lib1": {
        "path": ""
    },
    "lib2": {
        "name": "",
        "version": ""
    }
}
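
For illustration, the conversion asked about earlier in the thread (from the BIDS HEDVersion value to this structure) might look roughly like the Python sketch below. It takes the BaseVersion/Libraries form discussed above as input. `to_validator_spec` is a hypothetical name, and the rule that a value ending in `.xml` is a local path while anything else is a released version is an assumption drawn only from the examples in this thread.

```python
def to_validator_spec(hed_version):
    """Convert a BIDS HEDVersion value into the validator structure (sketch)."""

    def is_file(value):
        # Assumption: values ending in ".xml" name a local schema file.
        return value.lower().endswith(".xml")

    if isinstance(hed_version, str):
        # The plain string form carries only the base schema.
        hed_version = {"BaseVersion": hed_version}

    base = hed_version.get("BaseVersion", "")
    spec = {
        "path": base if is_file(base) else "",
        "version": "" if is_file(base) else base,
        "libraries": {},
    }
    for nickname, lib in hed_version.get("Libraries", {}).items():
        lib_version = lib["LibraryVersion"]
        if is_file(lib_version):
            spec["libraries"][nickname] = {"path": lib_version}
        else:
            spec["libraries"][nickname] = {"name": lib["LibraryName"], "version": lib_version}
    return spec


bids_value = {
    "BaseVersion": "8.0.0",
    "Libraries": {
        "Lib1": {"LibraryName": "library1", "LibraryVersion": "0.0.1"},
        "Lib2": {"LibraryName": "library2", "LibraryVersion": "HED_library2_0.5.3.xml"},
    },
}
spec = to_validator_spec(bids_value)
```

Keeping this translation in a single function would isolate the BIDS-specific logic from the validators themselves.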

@smakeig
Member

smakeig commented Jul 31, 2021

The library schema itself should include a variable specifying the version of the base schema the library was created to complement, and perhaps another for the latest version of the base schema known to be compatible with the library schema. Of course, appending the lib1: prefix to library terms will avoid code conflict between the base and library schemas, but if a new base schema version adds leaf-node 'term1' and this is also in the lib1 schema, then lib1 maintainers may want to remove it ... ???
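
To make the idea concrete, a compatibility check along these lines could be sketched in Python as follows. Nothing here exists in the HED schemas: `min_base` and `max_base` are hypothetical stand-ins for the two variables described above, and treating the "created to complement" version as a lower bound is an assumption.

```python
def parse_version(text):
    """Turn "8.1.0" into (8, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(base_version, min_base, max_base=None):
    """Check a base schema version against a library's declared range.

    min_base: base version the library was created to complement (assumed lower bound).
    max_base: latest base version known to be compatible, if declared.
    """
    base = parse_version(base_version)
    if base < parse_version(min_base):
        return False
    return max_base is None or base <= parse_version(max_base)


ok = is_compatible("8.1.0", min_base="8.0.0", max_base="8.2.0")
```

Comparing tuples of integers rather than raw strings keeps a version such as 8.10.0 ordered after 8.2.0.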

@VisLab
Member Author

VisLab commented May 24, 2022

Which term is used will depend on how the users tag. We would like users to tag with the latest versions of all schemas.

@VisLab
Member Author

VisLab commented May 24, 2022

Here is the version proposed in BIDS specification PR#1106:

{
  "Name": "A great experiment",
  "BIDSVersion": "1.7.0",
  "HEDVersion": {
      "base": "8.1.0",
      "libraries": {
          "sc": "score_0.0.1",
          "ts": "testlib_1.0.2"
      }
  }
}
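
A tool reading this format might unpack it roughly as in the Python sketch below. `read_hed_version` and `split_library_entry` are hypothetical helpers, and splitting each library entry at its last underscore into a name and a version is an assumption based only on the two entries shown.

```python
import json


def split_library_entry(entry):
    """Split "score_0.0.1" into ("score", "0.0.1") at the last underscore (assumed rule)."""
    name, sep, version = entry.rpartition("_")
    if not sep or not name or not version:
        raise ValueError(f"expected '<name>_<version>', got {entry!r}")
    return name, version


def read_hed_version(description):
    """Return (base, libraries) from a parsed dataset_description.json."""
    hed = description["HEDVersion"]
    if isinstance(hed, str):
        return hed, {}  # plain string form: base schema only
    libraries = {
        prefix: split_library_entry(entry)
        for prefix, entry in hed.get("libraries", {}).items()
    }
    return hed["base"], libraries


text = """
{
  "Name": "A great experiment",
  "BIDSVersion": "1.7.0",
  "HEDVersion": {
      "base": "8.1.0",
      "libraries": {"sc": "score_0.0.1", "ts": "testlib_1.0.2"}
  }
}
"""
base, libraries = read_hed_version(json.loads(text))
```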

@happy5214
Member

This hasn't been updated in a while, and I know there have been many changes made to the proposed structure. Can someone update progress and/or close this issue/link a successor issue?

@VisLab
Member Author

VisLab commented Jul 22, 2022

The version format has been finalized and is available at:
https://hed-specification.readthedocs.io/en/latest/07_Library_schema.html#library-schemas-in-bids.


4 participants