Is there a way to rename samplename within .somalier files? #138

Open
SpikyClip opened this issue Jul 16, 2024 · 2 comments

@SpikyClip

Hi,

First off, this has been an amazingly useful tool for my work, really appreciate it!

So I didn't realise at the time that sample names are already hardcoded by the time you run somalier relate, i.e. renaming the .somalier files does not affect the output of relate (if I'm not wrong, it actually gets the name from within the VCF/BAM?). This causes issues if a sample was run multiple times across batches and you try to relate them across batches.

Is there any way of renaming the sample name within the .somalier binary files? I have run hundreds of these on a per-batch basis, so ideally I could write a script that recursively loops through my batch folders and appends the batch_id and date_processed to each sample name. I understand that the appropriate way would have been to set a sample-name prefix at the somalier extract stage, but I'd rather not have to recall all these BAMs/VCFs to rerun somalier if possible.

It would be great if somalier relate had some sort of --samplename-from-filename flag that would rely on the filename for the samplename (though admittedly, it feels a little hacky). Or a simple --samplesheet samplenames.csv that maps two columns, sample,somalier_path for renaming.

@brentp
Owner

brentp commented Jul 17, 2024

Hi, glad to hear it's useful.
You could use this Python script: https://github.com/brentp/somalier/blob/master/scripts/ancestry-predict.py
to see the format of the .somalier files, specifically the read_somalier function. You could then write the file back out with a new name and name length, with all else mostly the same.
To reverse the operations you see there, use int.to_bytes and arr.tobytes().
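
A minimal sketch of that idea follows. It rests on an assumption about the layout that should be checked against read_somalier before use: that a .somalier file starts with a 1-byte format version, then a 1-byte sample-name length, then the name itself, with the count data after that. The function names (rename_sample, read_sample_name, rename_file) are hypothetical and not part of somalier. Because only the name and its length change, the remaining bytes are copied through untouched and arr.tobytes() is not needed here.

```python
from pathlib import Path


def read_sample_name(data: bytes) -> str:
    """Return the sample name stored in the header (assumed layout)."""
    # Assumption: byte 0 is the format version, byte 1 is the name length.
    name_len = int.from_bytes(data[1:2], byteorder="little")
    return data[2:2 + name_len].decode()


def rename_sample(data: bytes, new_name: str) -> bytes:
    """Return a copy of `data` with the embedded sample name replaced."""
    version = data[:1]  # kept as-is
    old_len = int.from_bytes(data[1:2], byteorder="little")
    rest = data[2 + old_len:]  # everything after the old name, unchanged

    encoded = new_name.encode()
    if len(encoded) > 255:
        # The length is assumed to be stored in a single byte.
        raise ValueError("sample name must encode to at most 255 bytes")

    new_len = len(encoded).to_bytes(1, byteorder="little")
    return version + new_len + encoded + rest


def rename_file(src: str, dst: str, new_name: str) -> None:
    """Read `src`, swap the sample name, and write the result to `dst`."""
    Path(dst).write_bytes(rename_sample(Path(src).read_bytes(), new_name))
```

Writing to a new path rather than overwriting the original keeps the old file around until the renamed one has been confirmed to load in somalier relate. For the batch case, new_name could be built as something like the old name plus the batch_id and date_processed, read back first with read_sample_name.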

@SpikyClip
Author

Thanks for the response, I'll give it a shot when I have the time and let you know how I go.
