First off, this has been an amazingly useful tool for my work, really appreciate it!
So I didn't realise at the time that samplenames are already fixed by the time you reach the somalier relate stage (i.e. renaming the .somalier files does not affect the output of relate; if I'm not wrong, it actually gets the name from within the VCF/BAM?). This causes problems if a sample was run multiple times across batches and you try to relate them across batches.
Is there any way of renaming the samplename within the .somalier binary files? I have run hundreds of these on a per-batch basis, so ideally I could write a script that recursively loops through my batch folders and appends the batch_id and date_processed to each samplename. I understand that the appropriate way is to have set output-prefix in the somalier extract stage, but I'd rather not have to recall all these bams/vcfs to rerun somalier if possible.
It would be great if somalier relate had some sort of --samplename-from-filename flag that would rely on the filename for the samplename (though admittedly, it feels a little hacky). Or a simple --samplesheet samplenames.csv that maps two columns, sample,somalier_path for renaming.
Hi, glad to hear it's useful.
You could look at this Python script: https://github.com/brentp/somalier/blob/master/scripts/ancestry-predict.py
to see the format of the .somalier files, specifically the read_somalier function. You could then write the file back out with a new name and name length, with all else mostly the same.
To reverse the operations you see there, use int.to_bytes and arr.tobytes().
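A rough sketch of what that could look like, in case it helps. It assumes the file starts with a 1-byte version, then a 1-byte sample name length, then the name itself, with everything after that left untouched. That layout is my reading of read_somalier and should be checked against the script before running this on real data. The function name rename_somalier_sample is made up for this example and is not part of somalier.

```python
from pathlib import Path


def rename_somalier_sample(src, dst, new_name):
    """Copy a .somalier file to dst with a different embedded sample name.

    ASSUMED layout (verify against read_somalier in ancestry-predict.py):
      byte 0      format version
      byte 1      length of the sample name in bytes
      bytes 2..   sample name, followed by the site counts and arrays
    Returns the old sample name.
    """
    data = Path(src).read_bytes()
    version = data[0]
    name_len = data[1]
    old_name = data[2:2 + name_len].decode()
    rest = data[2 + name_len:]  # counts and arrays, copied byte for byte

    encoded = new_name.encode()
    if len(encoded) > 255:
        # The length is assumed to be stored in a single byte.
        raise ValueError("sample name is too long for a 1-byte length field")

    out = bytes([version]) + len(encoded).to_bytes(1, "little") + encoded + rest
    Path(dst).write_bytes(out)
    return old_name
```

Writing to a separate dst keeps the originals intact until you have confirmed that somalier relate reads the renamed files correctly. To cover many batches you could loop over Path(batch_dir).rglob("*.somalier") and build new_name from the old name plus the batch_id and date_processed.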