Allow control over index sample designation #434

Nicolai-vKuegelgen · 2023-08-24T14:32:49Z

Is your feature request related to a problem? Please describe.
If a family has more than one affected sample (with the same parents), snappy will silently choose the alphabetically first sample as index. In some cases this is unwanted behavior, e.g. when some vcf files are only available for a different sample that is the pre-designated index.

Describe the solution you'd like
It should be possible to set samples in a given family as index. To do so, a new column (e.g. is_index) needs to be introduced to biomedsheets and also be supported by cubi-tk. This column/information should be optional, so as not to disrupt existing projects.
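A minimal sketch of what "optional" could mean when reading a sheet. It is not the biomedsheets API: the function name `read_index_flags`, the simplified column names (`sample`, `family`, `affected`) and the accepted truthy values are all assumptions made for illustration. A sheet without the column simply yields no flagged sample, so existing projects keep working.

```python
import csv
import io


def read_index_flags(tsv_text, column="is_index"):
    """Return {sample_name: bool} from a tab-separated sheet.

    Hypothetical helper: if the column is absent, every sample maps to
    False, i.e. the caller falls back to its current behavior.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flags = {}
    for row in reader:
        # row.get() returns None when the optional column is missing.
        raw = (row.get(column) or "").strip().lower()
        flags[row["sample"]] = raw in {"y", "yes", "true", "1"}
    return flags


# Simplified, made-up sheets: one without and one with the new column.
OLD_SHEET = "sample\tfamily\taffected\nA\tFAM1\tY\nB\tFAM1\tY\n"
NEW_SHEET = (
    "sample\tfamily\taffected\tis_index\n"
    "A\tFAM1\tY\tN\n"
    "B\tFAM1\tY\tY\n"
)

old_flags = read_index_flags(OLD_SHEET)
new_flags = read_index_flags(NEW_SHEET)
```

With these inputs no sample is flagged in the old sheet, and only `B` is flagged in the new one.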

Describe alternatives you've considered
Without this feature one has to change external circumstances to make snappy work as intended. Biomedsheets currently provides the only input for sample information to snappy, so other solutions would require more work.

Nicolai-vKuegelgen · 2023-08-25T07:00:14Z

I also thought of an alternative solution that might work without having to change biomedsheets & cubi-tk:
The familyID field (or pedrigree_info in snappy) often contains an ID that matches the intended index sample of the given family. Snappy could be adapted to prefer such a matching sample as index over other potential index samples.
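A rough sketch of that preference rule, independent of where it would end up living. The function name `choose_index` and the matching rule (sample name contained in the family ID) are assumptions; real sheets may need a stricter comparison. If nothing matches, it falls back to the alphabetically first affected sample, which is the behavior described above.

```python
def choose_index(family_id, affected_samples):
    """Pick the index sample among the affected samples of one family.

    Hypothetical rule: prefer a sample whose name appears in the family
    ID, otherwise take the alphabetically first affected sample.
    """
    if not affected_samples:
        raise ValueError("family has no affected sample")
    candidates = sorted(affected_samples)
    for name in candidates:
        if name in family_id:
            return name
    return candidates[0]


# Made-up family: the family ID points at P002, not at the first sample.
preferred = choose_index("FAM_P002", ["P001", "P002"])
fallback = choose_index("FAM_X", ["P002", "P001"])
```

Here `preferred` is `P002` because the family ID names it, while `fallback` is `P001` because no sample matches.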

holtgrewe · 2023-08-25T07:03:08Z

Let's implement it and call it "AI-driven sample sheet processing".

Nicolai-vKuegelgen · 2023-09-01T12:41:28Z

Okay, I looked through the code some more and I think snappy does not do index selection at all; it fully relies on biomedsheets for this.

The biomedsheets code controlling pedigree definitions seems to be located in biomedsheets/shortcuts/germline.py, which is then called by the io functions snappy uses to read the tsv-sheets.

Do we need to move this ticket to biomedsheets? To solve this we probably either need to give biomedsheets (io?) a new option to set the index based on the 'custom pedigree field', or make this way of selecting the index the default if that field is used anyway.

xiamaz · 2023-09-12T15:09:25Z

Do we have an Index column in Biomedsheets? Seems to be the only sane option, otherwise just send everything to ChatGPT.
