schema YAML files with slot_usages #1469

Closed
turbomam opened this issue Dec 4, 2023 · 8 comments

Comments

@turbomam
Contributor

turbomam commented Dec 4, 2023

grep -r -c slot_usage src/schema | grep -v ':0'
src/schema/prov.yaml:1
src/schema/annotation.yaml:3
src/schema/core.yaml:11
src/schema/nmdc.yaml:13
src/schema/workflow_execution_activity.yaml:13

slot attributes modified:

  • annotations
  • any_of (over ranges)
  • comments
  • description
  • maximum_cardinality
  • minimum_cardinality
  • notes
  • pattern (asserting a pattern on an object property leads to RDF with errors)
  • range
  • required
  • structured_pattern
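
A list like the one above could be produced mechanically rather than by eye. The following is a minimal sketch, assuming the `classes` section of a schema file has already been parsed into nested dictionaries (the `classes` literal and the `tally_slot_usage_attributes` helper are illustrative, not part of the schema tooling); only two classes from this issue are included.

```python
from collections import Counter

# Hypothetical stand-in for the parsed `classes:` section of a schema file.
# Only two classes from this issue are shown.
classes = {
    "Activity": {
        "slot_usage": {
            "id": {
                "required": True,
                "structured_pattern": {
                    "syntax": "{id_nmdc_prefix}:act-{id_shoulder}-{id_blade}{id_version}{id_locus}",
                    "interpolated": True,
                },
            }
        }
    },
    "Pathway": {
        "slot_usage": {
            "has_part": {
                "range": "Reaction",
                "required": True,
                "description": "A pathway can be broken down to a series of reaction step",
            }
        }
    },
}


def tally_slot_usage_attributes(classes):
    """Count how often each slot attribute is overridden via slot_usage."""
    counts = Counter()
    for class_def in classes.values():
        # A class may have no slot_usage at all, or an empty one.
        for overrides in (class_def.get("slot_usage") or {}).values():
            counts.update((overrides or {}).keys())
    return counts


attribute_counts = tally_slot_usage_attributes(classes)
for name, n in sorted(attribute_counts.items()):
    print(f"{name}: {n}")
```

Sorting the keys of the counter gives the same kind of alphabetical attribute list as above.
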
@turbomam
Contributor Author

turbomam commented Dec 4, 2023

src/schema/prov.yaml:1

Activity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:act-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
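
To make the effect of `interpolated: true` concrete, here is a rough sketch of how the `{...}` placeholders in `syntax` might be expanded into a regular expression. The values in `settings` are hypothetical (the schema's real settings are not shown in this issue), and both the `interpolate` helper and the `^...$` anchoring are assumptions made for illustration.

```python
import re

# Hypothetical settings: the real values are defined elsewhere in the schema
# and are not shown in this issue.
settings = {
    "id_nmdc_prefix": "nmdc",
    "id_shoulder": "[0-9][a-z]{0,6}[0-9]",
    "id_blade": "[A-Za-z0-9]+",
    "id_version": r"(\.[0-9]+)?",
    "id_locus": "(_[A-Za-z0-9_.-]+)?",
}

# Syntax string copied from the Activity slot_usage above.
syntax = "{id_nmdc_prefix}:act-{id_shoulder}-{id_blade}{id_version}{id_locus}"


def interpolate(syntax, settings):
    """Replace each {name} placeholder with the matching settings value."""
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


# Anchoring the whole string is an assumption of this sketch.
activity_id = re.compile("^" + interpolate(syntax, settings) + "$")

matches_plain = bool(activity_id.match("nmdc:act-11-abc123"))
matches_versioned = bool(activity_id.match("nmdc:act-11-abc123.1"))
matches_wrong_typecode = bool(activity_id.match("nmdc:wf-11-abc123"))
print(matches_plain, matches_versioned, matches_wrong_typecode)
```

Under these assumed settings, only the typecode (`act` here) differs between the per-class patterns listed in this issue.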

@turbomam
Contributor Author

turbomam commented Dec 4, 2023

src/schema/annotation.yaml

GenomeFeature:
  slot_usage:
    seqid:
      required: true
    type:
      range: OntologyClass
      description: A type from the sequence ontology
    start:
      required: true
    end:
      required: true
Pathway:
  slot_usage:
    has_part:
      range: Reaction
      required: true
      description: >-
        A pathway can be broken down to a series of reaction step
FunctionalAnnotation:
  slot_usage:
    has_function:
      notes:
        - this slot had been called id
        - "Still missing patterns for COG and RetroRules."
        - "These patterns aren't tied to the listed prefixes. A discussion about that possibility had been started, including the question of whether these lists are intended to be open examples or closed"
    type:
      range: OntologyClass
      description: TODO
    was_generated_by:
      description: provenance for the annotation.
      notes: To be consistent with the rest of the NMDC schema we use the PROV annotation model, rather than GPAD
      range: MetagenomeAnnotationActivity

@turbomam
Contributor Author

turbomam commented Dec 4, 2023

src/schema/core.yaml

ProcessedSample:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:procsm-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
AnalyticalSample:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:ansm-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
Site:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:site-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
PlannedProcess:
  slot_usage:
    designated_class:
      comments:
        - required on all instances in a polymorphic Database slot like planned_process_set
OntologyClass:
  slot_usage:
    id:
      pattern: '^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$'
AttributeValue:
  slot_usage:
    type:
      description: An optional string that specified the type of object.
QuantityValue:
  slot_usage:
    has_raw_value:
      description: Unnormalized atomic string representation, should in syntax {number} {unit}
    has_unit:
      description: The unit of the quantity
    has_numeric_value:
      description: The number part of the quantity
      range: double
PersonValue:
  slot_usage:
    orcid:
      annotations:
        display_hint: Open Researcher and Contributor ID for this person. See https://orcid.org
    email:
      annotations:
        display_hint: Email address for this person.
    has_raw_value:
      description: The full name of the Investigator in format FIRST LAST.
      notes:
        - May eventually be deprecated in favor of "name".
    name:
      description: >-
        The full name of the Investigator.
        It should follow the format FIRST [MIDDLE NAME| MIDDLE INITIAL] LAST, where MIDDLE NAME| MIDDLE INITIAL is optional.
      annotations:
        display_hint: First name, middle initial, and last name of this person.
ProteinQuantification:
  slot_usage:
    best_protein:
      description: the specific protein identifier most correctly grouped to its associated peptide sequences
    all_proteins:
      description: the grouped list of protein identifiers associated with the peptide sequences that were grouped to a best protein
ControlledIdentifiedTermValue:
  slot_usage:
    term:
      required: true
GeolocationValue:
  slot_usage:
    has_raw_value:
      description: The raw value for a geolocation should follow {latitude} {longitude}
    latitude:
      required: true
    longitude:
      required: true
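
The `pattern` on `OntologyClass.id` above can be tried directly against a few strings. This is a small sketch; the sample identifiers are illustrative CURIE-shaped strings chosen for this example, not values taken from the schema or its data.

```python
import re

# Pattern copied verbatim from the OntologyClass slot_usage above.
ONTOLOGY_CLASS_ID = re.compile(
    r'^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$'
)

# Illustrative inputs only.
samples = ["SO:0000704", "GO:0008150", "X:1", "no colon here", "ENVO:"]

results = {s: bool(ONTOLOGY_CLASS_ID.match(s)) for s in samples}
for sample, ok in results.items():
    print(f"{sample!r}: {'matches' if ok else 'does not match'}")
```

The pattern requires a prefix of at least two characters and at least one character after the colon, so `X:1` and `ENVO:` are both rejected.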

@turbomam
Contributor Author

turbomam commented Dec 4, 2023

src/schema/nmdc.yaml

The following slot_usages are currently commented out; everything else in this issue is active.

  • OmicsProcessing: only patterns on object properties
  • Study: almost all OK now
  • Biosample: mostly

Pooling:
  slot_usage:
    has_input:
      minimum_cardinality: 2
    has_output:
      minimum_cardinality: 1
      maximum_cardinality: 1
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:poolp-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
Extraction:
  slot_usage:
    has_input:
      required: true
    has_output:
      required: true
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:extrp-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
LibraryPreparation:
  slot_usage:
    has_input:
      required: true
    has_output:
      required: true
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:libprp-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
FieldResearchSite:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:frsite-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
CollectingBiosamplesFromSite:
  slot_usage:
    has_input:
      range: Site
      required: true
    has_output:
      range: Biosample
      required: true
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:clsite-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
DataObject:
  slot_usage:
    name:
      required: true
    description:
      required: true
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:dobj-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
BiosampleProcessing:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:bsmprc-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
    has_input:
      range: Biosample
SubSamplingProcess:
  slot_usage:
    volume:
      description: The output volume of the SubSampling Process.
    mass:
      description: The output mass of the SubSampling Process.
    has_input:
      any_of:
        - range: Biosample
        - range: ProcessedSample
    has_output:
      range: ProcessedSample
      description: The subsample.
MixingProcess:
  slot_usage:
    volume:
      description: The volume of sample filtered.
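
As a rough illustration of what the `minimum_cardinality`, `maximum_cardinality`, `required`, `range`, and `any_of` overrides above ask of a record, the sketch below checks one slot's values under simplified rules. The `check_slot` helper, the `class_of` lookup, and the sample identifiers are all hypothetical; class inheritance is ignored, and `any_of` is simply treated as a set of allowed ranges, which may not match the validator's exact semantics.

```python
def check_slot(slot_name, values, usage, class_of=None):
    """Return a list of problems for one multivalued slot (simplified rules)."""
    class_of = class_of or {}
    problems = []
    n = len(values)

    if usage.get("required") and n == 0:
        problems.append(f"{slot_name}: required but empty")

    lo = usage.get("minimum_cardinality")
    hi = usage.get("maximum_cardinality")
    if lo is not None and n < lo:
        problems.append(f"{slot_name}: {n} value(s), minimum is {lo}")
    if hi is not None and n > hi:
        problems.append(f"{slot_name}: {n} value(s), maximum is {hi}")

    # Use the any_of alternatives if present, otherwise the single range.
    if "any_of" in usage:
        allowed = {alt["range"] for alt in usage["any_of"]}
    elif "range" in usage:
        allowed = {usage["range"]}
    else:
        allowed = set()
    for value in values:
        if allowed and class_of.get(value) not in allowed:
            problems.append(f"{slot_name}: {value} is not one of {sorted(allowed)}")
    return problems


# Overrides copied from the Pooling and SubSamplingProcess blocks above.
pooling_usage = {
    "has_input": {"minimum_cardinality": 2},
    "has_output": {"minimum_cardinality": 1, "maximum_cardinality": 1},
}
subsampling_input = {"any_of": [{"range": "Biosample"}, {"range": "ProcessedSample"}]}

# Hypothetical record and type lookup, for illustration only.
pooling_record = {"has_input": ["sample-a"], "has_output": ["pooled-1", "pooled-2"]}
class_of = {"sample-a": "Biosample", "site-1": "Site"}

pooling_problems = []
for slot, usage in pooling_usage.items():
    pooling_problems += check_slot(slot, pooling_record[slot], usage)

subsampling_problems = check_slot(
    "has_input", ["sample-a", "site-1"], subsampling_input, class_of
)
print(pooling_problems)
print(subsampling_problems)
```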

@turbomam
Contributor Author

turbomam commented Dec 4, 2023

src/schema/workflow_execution_activity.yaml

WorkflowExecutionActivity:
  slot_usage:
    started_at_time:
      required: true
    ended_at_time:
      required: true
    git_url:
      required: true
    has_input:
      required: true
    has_output:
      required: true
    execution_resource:
      required: true
    type:
      required: true
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wf-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetagenomeAssembly:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetatranscriptomeAssembly:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmtas-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetagenomeAnnotationActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmgan-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetatranscriptomeAnnotationActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmtan-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetatranscriptomeActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmt-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MagsAnalysisActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetagenomeSequencingActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmsa-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
ReadQcAnalysisActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
ReadBasedTaxonomyAnalysisActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfrbt-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetabolomicsAnalysisActivity:
  slot_usage:
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmb-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
MetaproteomicsAnalysisActivity:
  slot_usage:
    used:
      description: The instrument used to collect the data used in the analysis
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfmp-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true
NomAnalysisActivity:
  slot_usage:
    used:
      range: string
      description: The instrument used to collect the data used in the analysis
    id:
      required: true
      structured_pattern:
        syntax: "{id_nmdc_prefix}:wfnom-{id_shoulder}-{id_blade}{id_version}{id_locus}"
        interpolated: true

@turbomam
Contributor Author

turbomam commented Jan 2, 2024

oops, this is for some other repo that I work in. will move soon.

@pbuttigieg
Member

Thanks - was confused

@turbomam
Contributor Author

shoot, I don't think I can move this issue out of this org. I will just copy and paste and then delete here.
