Allow control over index sample designation #434

Open
Nicolai-vKuegelgen opened this issue Aug 24, 2023 · 4 comments

Comments

@Nicolai-vKuegelgen
Contributor

Is your feature request related to a problem? Please describe.
If a family has more than one affected sample (with the same parents), snappy silently chooses the alphabetically first sample as the index. In some cases this is unwanted behavior, e.g. when some VCF files are only available for a different sample that is the pre-designated index.

Describe the solution you'd like
It should be possible to designate a sample in a given family as the index. To do so, a new column (e.g. is_index) needs to be introduced to biomedsheets and also be supported by cubi-tk. This column should be optional, so as not to disrupt existing projects.
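A minimal sketch of how such an optional column could be honored, purely for illustration. It works on plain dictionaries rather than on the real biomedsheets objects; the function name `pick_index`, the keys `patientName` and `isAffected`, and the `Y` flag value are assumptions and not confirmed biomedsheets API. When no row carries the flag, it falls back to the alphabetically first affected sample, as described above.

```python
def pick_index(rows):
    """Return the index sample name for one family, or None if nobody is affected.

    `rows` is a list of dicts, one per sample of a single family.
    The `is_index` key is the hypothetical optional column proposed here.
    """
    affected = [r for r in rows if r.get("isAffected") == "Y"]
    # Rows without the optional column simply count as "not flagged".
    flagged = [
        r for r in affected
        if (r.get("is_index") or "").strip().upper() == "Y"
    ]
    if len(flagged) > 1:
        raise ValueError("more than one sample flagged as index in this family")
    if flagged:
        return flagged[0]["patientName"]
    if not affected:
        return None
    # Fallback: alphabetically first affected sample.
    return min(r["patientName"] for r in affected)


family = [
    {"patientName": "B_sib", "isAffected": "Y", "is_index": "Y"},
    {"patientName": "A_sib", "isAffected": "Y"},
    {"patientName": "father", "isAffected": "N"},
]
print(pick_index(family))
```

Because the column is read with a default, sheets from existing projects that lack it would keep their current index.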

Describe alternatives you've considered
Without this feature one has to change external circumstances to make snappy work as intended. Biomedsheets currently provides the only input for sample information to snappy, so other solutions would require more work.

@Nicolai-vKuegelgen
Contributor Author

I also thought of an alternative solution that might work without having to change biomedsheets & cubi-tk:
The familyID field (or pedrigree_info in snappy) often contains an ID that matches the intended index sample of the given family. Snappy could be adapted to prefer such a matching sample as index over other potential index samples.
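As a rough sketch of that preference rule, assuming the family ID has to match a sample name exactly (the real matching rule might need to be looser), and using a hypothetical helper `pick_index_by_family_id` that is not part of snappy or biomedsheets:

```python
def pick_index_by_family_id(family_id, affected_names):
    """Prefer the affected sample whose name equals the family ID.

    `affected_names` holds the names of all affected samples in the family.
    Falls back to the alphabetically first affected sample otherwise.
    """
    if not affected_names:
        return None
    if family_id in affected_names:
        return family_id
    return min(affected_names)


print(pick_index_by_family_id("P002", ["P001", "P002"]))
print(pick_index_by_family_id("FAM_7", ["P001", "P002"]))
```

Families whose ID does not match any affected sample would keep today's alphabetical choice.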

@holtgrewe
Member

holtgrewe commented Aug 25, 2023

Let's implement it and call it "AI-driven sample sheet processing".

@Nicolai-vKuegelgen
Contributor Author

Okay, I looked through the code some more and I think snappy does not do index selection at all; it fully relies on biomedsheets for this.

The biomedsheets code controlling pedigree definitions seems to be located in biomedsheets/shortcuts/germline.py, which is then called by the io functions snappy uses to read the TSV sheets.

Do we need to move this ticket to biomedsheets? To solve this we probably either need to give biomedsheets (io?) a new option to set the index based on the 'custom pedigree field', or make this way of selecting the index the default if that field is used anyway.

@xiamaz
Member

xiamaz commented Sep 12, 2023

Do we have an Index column in biomedsheets? Seems to be the only sane option, otherwise just send everything to ChatGPT


3 participants