Fix meta.yml structure for input and output tuples #4983

Closed
Tracked by #5828
ewels opened this issue Feb 25, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@ewels
Member

ewels commented Feb 25, 2024

The current contents of the meta.yml files do not correspond to the actual processes in most modules. This limits how useful these metadata files can be.

For example, consider abacas. The meta.yml file shows 3 inputs:

 input:
   - meta:
       type: map
       description: |
         Groovy Map containing sample information
         e.g. [ id:'test', single_end:false ]
   - scaffold:
       type: file
       description: Fasta file containing scaffold
       pattern: "*.{fasta,fa}"
   - fasta:
       type: file
       description: FASTA reference file
       pattern: "*.{fasta,fa}"

However, there are only 2 inputs in main.nf (the first one being a tuple with two elements):

 input:
     tuple val(meta), path(scaffold)
     path fasta

The correct YAML structure to describe this should be:

 input: 
   -
       - meta: 
           type: map 
           description: | 
             Groovy Map containing sample information 
             e.g. [ id:'test', single_end:false ] 
       - scaffold: 
           type: file 
           description: Fasta file containing scaffold 
           pattern: "*.{fasta,fa}" 
   - fasta: 
       type: file 
       description: FASTA reference file 
       pattern: "*.{fasta,fa}" 

The difference is pretty difficult to see in YAML (*cough* see #11 *cough*), but now we have two inputs. The first is an array containing two elements.
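To make the difference concrete, here is a minimal Python sketch that walks the `input` list as a YAML loader would be expected to return it (each item is either a single-key dict for a plain channel or a list of single-key dicts for a tuple channel) and prints one line per channel. The two literals are hand-transcribed from the abacas examples above with descriptions trimmed; `describe_channels` is a hypothetical helper, not part of nf-core/tools.

```python
from typing import Any


def describe_channels(inputs: list[Any]) -> list[str]:
    """Return one human-readable line per input channel.

    Assumes each item is either a single-key dict (one plain channel)
    or a list of single-key dicts (one tuple channel).
    """
    lines = []
    for index, item in enumerate(inputs):
        if isinstance(item, list):
            # A nested list is one tuple channel; collect its element names.
            names = [next(iter(element)) for element in item]
            lines.append(f"input[{index}]: tuple({', '.join(names)})")
        elif isinstance(item, dict) and len(item) == 1:
            # A single mapping is one plain channel.
            lines.append(f"input[{index}]: {next(iter(item))}")
        else:
            raise ValueError(f"unexpected entry at input[{index}]: {item!r}")
    return lines


# Descriptions omitted; only the nesting matters here.
flat = [{"meta": {"type": "map"}}, {"scaffold": {"type": "file"}}, {"fasta": {"type": "file"}}]
nested = [[{"meta": {"type": "map"}}, {"scaffold": {"type": "file"}}], {"fasta": {"type": "file"}}]

print("\n".join(describe_channels(flat)))    # three channels: meta, scaffold, fasta
print("\n".join(describe_channels(nested)))  # two channels: tuple(meta, scaffold), fasta
```

The flat form reports three channels, while the nested form reports the two channels that main.nf actually declares.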

This error seems to be present in basically any module containing a tuple, as all meta.yml files seem to have a flat array of inputs.

We should be doing better automation of this file from nf-core/tools in order to solve this problem. I started a brief proof of concept on how we can do this in nf-core/tools#2789 and described a rough outline of what needs doing.
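As a rough illustration of the kind of check such automation could run, here is a minimal Python sketch that reads the `input:` block of a `main.nf`, records the element names of each channel, and compares them with the nesting in a parsed `meta.yml`. The regexes and the helpers `channel_shapes` and `meta_yml_shapes` are hypothetical and are not the approach taken in nf-core/tools#2789; they assume a simplified view of Nextflow syntax and handle only single-line `tuple`, `val`, `path` and `env` declarations.

```python
import re

# One element inside a tuple declaration, e.g. val(meta) or path(scaffold).
TUPLE_ELEMENT = re.compile(r"\b(?:val|path|env)\(\s*([A-Za-z_]\w*)")
# A plain declaration, e.g. `path fasta` or `path(fasta)`.
PLAIN_INPUT = re.compile(r"^(?:val|path|env)\s*\(?\s*([A-Za-z_]\w*)")
# Process section labels that start or end the input block.
SECTION_LABEL = re.compile(r"^(?:input|output|when|script|shell|exec|stub):\s*$")


def channel_shapes(main_nf: str) -> list[list[str]]:
    """Element names of each input channel, in declaration order."""
    shapes: list[list[str]] = []
    in_inputs = False
    for raw_line in main_nf.splitlines():
        line = raw_line.strip()
        if SECTION_LABEL.match(line):
            # Only lines between `input:` and the next label are inputs.
            in_inputs = line.startswith("input")
            continue
        if not in_inputs or not line or line.startswith("//"):
            continue
        if line.startswith("tuple"):
            shapes.append(TUPLE_ELEMENT.findall(line))
        else:
            match = PLAIN_INPUT.match(line)
            if match:
                shapes.append([match.group(1)])
    return shapes


def meta_yml_shapes(inputs: list) -> list[list[str]]:
    """Element names of each channel described in a parsed meta.yml `input`."""
    shapes = []
    for item in inputs:
        # A nested list is one tuple channel; a single mapping is one plain channel.
        entries = item if isinstance(item, list) else [item]
        shapes.append([next(iter(entry)) for entry in entries])
    return shapes


# Only the input block comes from the abacas example; the rest is a placeholder.
MAIN_NF = """
process ABACAS {
    input:
    tuple val(meta), path(scaffold)
    path fasta

    script:
    // script body omitted
}
"""

flat = [{"meta": {}}, {"scaffold": {}}, {"fasta": {}}]
nested = [[{"meta": {}}, {"scaffold": {}}], {"fasta": {}}]

expected = channel_shapes(MAIN_NF)
print(expected)                             # [['meta', 'scaffold'], ['fasta']]
print(meta_yml_shapes(flat) == expected)    # False: 3 channels instead of 2
print(meta_yml_shapes(nested) == expected)  # True
```

A real implementation would need a proper Nextflow parser rather than line-based regexes, but even this sketch flags the flat meta.yml as inconsistent with the process.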

@ewels ewels added the bug Something isn't working label Feb 25, 2024
@nvnieuwk
Contributor

I fully agree with this! Maybe using [] instead of - for arrays would make this a bit clearer to read?

@nvnieuwk
Contributor

Although I'm not sure it's possible to create multiline lists that way

@mashehu
Contributor

mashehu commented Feb 26, 2024

tbh, it's also quite easy to overlook what is inside the array in JSON:

{
  "input": [
    [
      {
        "meta": {
          "type": "map",
          "description": "Groovy Map containing sample information \ne.g. [ id:'test', single_end:false ] \n"
        }
      },
      {
        "scaffold": {
          "type": "file",
          "description": "Fasta file containing scaffold",
          "pattern": "*.{fasta,fa}"
        }
      }
    ],
    {
      "fasta": {
        "type": "file",
        "description": "FASTA reference file",
        "pattern": "*.{fasta,fa}"
      }
    }
  ]
}

How about adding a comment, so it is not just one lonely dash? That would help distinguish things:

 input: 
   - #input tuple
       - meta: 
           type: map 
           description: | 
             Groovy Map containing sample information 
             e.g. [ id:'test', single_end:false ] 
       - scaffold: 
           type: file 
           description: Fasta file containing scaffold 
           pattern: "*.{fasta,fa}" 
   - fasta: 
       type: file 
       description: FASTA reference file 
       pattern: "*.{fasta,fa}" 

@ewels
Member Author

ewels commented Feb 26, 2024

tbh: also quite easy to overlook what is inside the array in json:

Overlook yes, but harder to break it by accidentally messing with indentation at least. Like the comment idea 👍🏻

@nvnieuwk - remember that we will be running stuff through Prettier which has a habit of destroying any fancy formatting like that 😉 But yes - JSON is valid YAML, so makes sense that we could write it in a JSON-like way.

I'm not super fussed about the language we use - hopefully we can do more automation and linting so will pick up anything that looks wrong automatically. Main thing is to fix the structure.

@nvnieuwk
Contributor

I like @mashehu's idea a lot, maybe the best of both worlds (in YAML)

@jfy133
Member

jfy133 commented May 31, 2024

@pinin4fjords (maintainers team discussion) suggests not using a comment, but instead repeating the name of the tuple's first non-meta element as a label:

 input: 
   - scaffold:
       - meta: 
           type: map 
           description: | 
             Groovy Map containing sample information 
             e.g. [ id:'test', single_end:false ] 
       - scaffold: 
           type: file 
           description: Fasta file containing scaffold 
           pattern: "*.{fasta,fa}" 
   - fasta: 
       type: file 
       description: FASTA reference file 
       pattern: "*.{fasta,fa}" 

OR

input: 
  - bam:
      - meta: 
          type: map 
          description: | 
            Groovy Map containing sample information 
            e.g. [ id:'test', single_end:false ] 
      - bam: 
          type: file 
          description: BAM
          pattern: "*.{bam}"
      - bai: 
          type: file 
          description: BAM index file
          pattern: "*.{bai}" 
  - fasta: 
      type: file 
      description: FASTA reference file 
      pattern: "*.{fasta,fa}" 
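Here is a minimal Python sketch of how a tool might read this keyed form. It assumes the label line carries a trailing colon (`- bam:`), since a bare `- bam` followed by an indented list would not parse as intended. The data is what a YAML loader would be expected to return for the `bam` example, written out by hand; `parse_keyed_inputs` is a hypothetical helper.

```python
from typing import Any


def parse_keyed_inputs(inputs: list[dict[str, Any]]) -> list[tuple[str, list[str]]]:
    """Return (label, element names) for each input channel.

    Every item must be a single-key mapping. A list value marks a labelled
    tuple channel; a mapping value marks a plain single-element channel.
    """
    channels = []
    for item in inputs:
        if not (isinstance(item, dict) and len(item) == 1):
            raise ValueError(f"expected a single-key mapping, got {item!r}")
        label, value = next(iter(item.items()))
        if isinstance(value, list):
            # Tuple channel: the label is only a name, the list holds the elements.
            channels.append((label, [next(iter(element)) for element in value]))
        elif isinstance(value, dict):
            # Plain channel: the key is both the label and the element name.
            channels.append((label, [label]))
        else:
            raise ValueError(f"unexpected value for {label!r}: {value!r}")
    return channels


# Hand-transcribed from the `bam` example above, with descriptions trimmed.
keyed = [
    {"bam": [{"meta": {"type": "map"}}, {"bam": {"type": "file"}}, {"bai": {"type": "file"}}]},
    {"fasta": {"type": "file", "pattern": "*.{fasta,fa}"}},
]
print(parse_keyed_inputs(keyed))
# [('bam', ['meta', 'bam', 'bai']), ('fasta', ['fasta'])]
```

In this reading, the value's type (list or mapping) decides whether an entry is a tuple, so the label text itself never has to be interpreted.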

@JoseEspinosa
Member

I would be more inclined to adopt @mashehu's suggestion of using a comment; it seems more machine-readable to me. My concern with using the first element of the tuple is: what if a channel's structure encodes something different in its first element? Maybe the indentation is enough to parse it, though...
Anyhow, we should keep in mind that at some point we may want to use these files to automatically generate a pipeline or subworkflow, so it would be nice for the YAML file to be as machine-readable as possible.

@ewels
Member Author

ewels commented Jun 2, 2024

Agree, my vote is YAML with comment or switch to JSON.
