not a valid Avro schema with enum arrays #1285

@kmavrommatis

Hi,
I am using cwltool version 1.0.20191022103248.

I have a tool with an input called keep_types whose type is an array of enums.
Tool:

class: CommandLineTool
cwlVersion: v1.0
id: bcftools_view
baseCommand:
  - bcftools
  - view
inputs:
  - id: inputVCF
    type: File
    inputBinding:
      position: 4
      shellQuote: false
    label: Input VCF
    secondaryFiles:
      - .tbi
  - id: phased
    type: boolean?
    inputBinding:
      position: 0
      prefix: '--phased'
      shellQuote: false
    doc: >-
      print sites where all samples are phased. Haploid genotypes are considered
      phased. Missing genotypes considered unphased unless the phased bit is
      set.
  - id: exclude_phased
    type: boolean?
    inputBinding:
      position: 0
      prefix: '--exclude-phased'
      shellQuote: false
    doc: exclude sites where all samples are phased
  - id: keep_types
    type:
      - 'null'
      - type: array
        items:
          type: enum
          name: keep_types
          symbols: 
            - snps
            - indels
            - mnps
            - other
    inputBinding:
      position: 0
      prefix: '--types'
      shellQuote: false
      itemSeparator: ','
    label: Types to keep
    doc: >-
      comma-separated list of variant types to select. Site is selected if any
      of the ALT alleles is of the type requested. Types are determined by
      comparing the REF and ALT alleles in the VCF record not INFO tags like
      INFO/INDEL or INFO/VT. Use --include to select based on INFO tags.
  - id: exclude_types
    type:
      - 'null'
      - type: array
        items:
          type: enum
          name: exclude_types
          symbols:
            - snps
            - indels
            - mnps
            - ref
            - bnd
            - other
    inputBinding:
      position: 0
      prefix: '--exclude-types'
      separate: false
      itemSeparator: ','
      shellQuote: false
    doc: >-
      comma-separated list of variant types to exclude. Site is excluded if any
      of the ALT alleles is of the type requested. Types are determined by
      comparing the REF and ALT alleles in the VCF record not INFO tags like
      INFO/INDEL or INFO/VT. Use --exclude to exclude based on INFO tags.
  - id: bcftools_view_threads
    type: int?
    inputBinding:
      position: 0
      prefix: '--threads'
      shellQuote: false
outputs:
  - id: FilteredVCF
    type: File
    outputBinding:
      glob: '*.vcf.gz'
    secondaryFiles:
      - .tbi
label: bcftools view
arguments:
  - position: 0
    prefix: '-o'
    shellQuote: false
    valueFrom: $(inputs.inputVCF.basename)
  - position: 0
    prefix: '-O'
    shellQuote: false
    valueFrom: z
  - position: 100
    prefix: ''
    shellQuote: false
    valueFrom: '&& tabix -p vcf'
  - position: 101
    prefix: ''
    shellQuote: false
    valueFrom: $(inputs.inputVCF.basename)
requirements:
  - class: ShellCommandRequirement
  - class: ResourceRequirement
    ramMin: 4000
    coresMin: $(inputs.bcftools_view_threads)
  - class: DockerRequirement
    dockerPull: 'samtools:v1.6.1'
  - class: InlineJavascriptRequirement


And here is the workflow that uses it:

class: Workflow
cwlVersion: v1.0
id: wf_filtering_germline
doc: Apply hard filters to VCF files for germline calls
label: Variant Filtering Germline
inputs:
  - id: inputVCF
    type: File
    secondaryFiles:
      - .tbi
outputs:
  - id: FilteredVCF
    outputSource:
      - bcftools_view/FilteredVCF
    type: File
steps:
  - id: bcftools_view
    in:
      - id: inputVCF
        source: inputVCF
      - id: keep_types
        default:
          - snps
    out:
      - id: FilteredVCF
    run: ../../tools/bcftools-view.cwl
    label: bcftools view
requirements: []


When I run cwltool on this workflow, I get the following error:

Traceback (most recent call last):
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/schema_salad/avro/schema.py", line 402, in __init__
    items_schema = make_avsc_object(items, names)
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/schema_salad/avro/schema.py", line 578, in make_avsc_object
    return EnumSchema(name, namespace, symbols, names, doc, other_props)
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/schema_salad/avro/schema.py", line 373, in __init__
    NamedSchema.__init__(self, "enum", name, namespace, names, other_props)
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/schema_salad/avro/schema.py", line 249, in __init__
    new_name = names.add_name(name, namespace, self)
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/schema_salad/avro/schema.py", line 215, in add_name
    raise SchemaParseException(fail_msg)
schema_salad.avro.schema.SchemaParseException: The name "keep_types" is already in use.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/cwltool/workflow.py", line 809, in job
    runtimeContext):
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/cwltool/command_line_tool.py", line 430, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/cwltool/process.py", line 748, in _init_job
    discover_secondaryFiles=getdefault(runtime_context.toplevel, False)))
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/cwltool/builder.py", line 276, in bind_input
    bindings.extend(self.bind_input(f, datum[f["name"]], lead_pos=lead_pos, tail_pos=f["name"], discover_secondaryFiles=discover_secondaryFiles))
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/cwltool/builder.py", line 244, in bind_input
    avsc = make_avsc_object(convert_to_dict(t), self.names)
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/schema_salad/avro/schema.py", line 589, in make_avsc_object
    return ArraySchema(items, names, other_props)
  File "/home/kmavrommatis/venv3/lib/python3.6/site-packages/schema_salad/avro/schema.py", line 406, in __init__
    "names: %s)" % (items, err, list(names.names.keys()))
schema_salad.avro.schema.SchemaParseException: Items schema ({'type': 'enum', 'name': 'keep_types', 'symbols': ['snps', 'indels', 'mnps', 'other']}) not a valid Avro schema: The name "keep_types" is already in use. (known names: ['File', 'File_class', 'Directory', 'Directory_class', 'Any', 'input_record_schema', 'keep_types', 'exclude_types', 'outputs_record_schema'])
ERROR [step bcftools_view_1] Cannot make job: Items schema ({'type': 'enum', 'name': 'keep_types', 'symbols': ['snps', 'indels', 'mnps', 'other']}) not a valid Avro schema: The name "keep_types" is already in use. (known names: ['File', 'File_class', 'Directory', 'Directory_class', 'Any', 'input_record_schema', 'keep_types', 'exclude_types', 'outputs_record_schema'])
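
Reading the traceback, the second exception appears to be raised when `bind_input` rebuilds the array's items schema against a names registry that already contains `keep_types` (it is listed among the known names). The toy model below is only a sketch of that failure mode, not schema_salad's actual code: the `Names` class, the `make_enum` helper, and the point at which the first registration happens are simplified assumptions made for illustration.

```python
class SchemaParseException(Exception):
    """Stand-in for schema_salad.avro.schema.SchemaParseException."""


class Names:
    """Toy registry of named schemas (hypothetical, heavily simplified)."""

    def __init__(self):
        self.names = {}

    def add_name(self, name, schema):
        # A named schema may only be registered once per registry.
        if name in self.names:
            raise SchemaParseException(
                'The name "%s" is already in use.' % name
            )
        self.names[name] = schema
        return name


def make_enum(name, symbols, names):
    """Build an enum schema dict and register its name (hypothetical helper)."""
    schema = {"type": "enum", "name": name, "symbols": symbols}
    names.add_name(name, schema)
    return schema


names = Names()
symbols = ["snps", "indels", "mnps", "other"]

# First registration: assumed to happen when the tool's input schema is built.
make_enum("keep_types", symbols, names)

# Second registration: the array's items schema is built again against the
# same registry, so the enum name collides with itself.
try:
    make_enum("keep_types", symbols, names)
    error = None
except SchemaParseException as err:
    error = str(err)

print(error)
```

If this reading is right, the enum is not actually defined twice in the tool; the same definition is simply registered a second time under the same name.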



If I change the keep_types input in the CommandLineTool from an array of enums to a plain enum, the workflow runs without error.

Thanks in advance for your help.
