bcftools +split will fail if sample names contain the "/" character #1404

Closed
freeseek opened this issue Feb 6, 2021 · 4 comments

Comments

Contributor

freeseek commented Feb 6, 2021

If a sample name contains the / character, such as A/B, bcftools +split fails with the following error:

[E::hts_open_format] Failed to open file "./A/B.vcf" : No such file or directory
[init_data] Error: cannot write to "./A/B.vcf": No such file or directory

This happens because the VCF format does not restrict which characters sample names may contain, while UNIX filesystems do not allow the / character in file names.

One quick solution would be to change this line of code in split.c:

        for (k=l; k<str.l; k++) if ( isspace(str.s[k]) ) str.s[k] = '_';

to

        for (k=l; k<str.l; k++) if ( isspace(str.s[k]) || str.s[k] == '/' ) str.s[k] = '_';

Note, however, that this fix only covers UNIX filesystems. A similar issue would arise on Windows filesystems, which disallow a wider range of characters in file names.
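
To illustrate the broader case, here is a minimal sketch of a sanitizer that also replaces the characters Windows reserves in file names. The helper name sanitize_file_name and the exact reserved set are assumptions made for this example; it is not the code in split.c and not necessarily what the eventual fix does.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of bcftools: replace in place every
 * character that is whitespace, a control character, or reserved in
 * UNIX or Windows file names with '_'. */
static void sanitize_file_name(char *name)
{
    /* '/' is reserved on UNIX; the rest are additionally reserved on Windows */
    static const char reserved[] = "/\\:*?\"<>|";
    for (size_t k = 0; name[k] != '\0'; k++)
    {
        /* cast so the <ctype.h> functions never see a negative value */
        unsigned char c = (unsigned char) name[k];
        if ( isspace(c) || iscntrl(c) || strchr(reserved, c) != NULL )
            name[k] = '_';
    }
}
```

With this sketch the sample name A/B becomes A_B. Two caveats: distinct sample names such as A/B and A_B would then map to the same file name, and Windows device names such as CON are not handled.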

freeseek changed the title from [bcftools +split will fail if sample names contain the "/] to [bcftools +split will fail if sample names contain the "/" character] on Feb 6, 2021
pd3 closed this as completed in 200bbba on Feb 10, 2021
Member

pd3 commented Feb 10, 2021

Thank you for raising the issue. This is now addressed by 200bbba.

Contributor Author

freeseek commented Sep 7, 2021

There is a minor regression bug in the last fix. The code:

        char *suffix = NULL;
        if ( args->output_type & FT_BCF ) suffix = "bcf";
        else if ( args->output_type & FT_GZ ) suffix = ".vcf.gz";
        else suffix = ".vcf";

is missing a . before bcf, causing the output file names to be incorrect.
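
For reference, a self-contained sketch of the suffix selection with the dot restored. The DEMO_FT_* flags are stand-ins for the FT_* constants in the snippet above, with values assumed for this example, and output_suffix and build_file_name are hypothetical helpers rather than functions from split.c.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative stand-ins for the FT_* output-type flags in the quoted
 * code; the numeric values are assumptions made for this sketch. */
enum { DEMO_FT_GZ = 1 << 0, DEMO_FT_VCF = 1 << 1, DEMO_FT_BCF = 1 << 2 };

/* Hypothetical helper: pick the suffix, always including the leading dot. */
static const char *output_suffix(int output_type)
{
    if ( output_type & DEMO_FT_BCF ) return ".bcf";
    if ( output_type & DEMO_FT_GZ ) return ".vcf.gz";
    return ".vcf";
}

/* Hypothetical helper: write "<sample><suffix>" into buf.
 * Returns the snprintf result, so a value >= size signals truncation. */
static int build_file_name(char *buf, size_t size, const char *sample, int output_type)
{
    return snprintf(buf, size, "%s%s", sample, output_suffix(output_type));
}
```

Keeping the dot inside every suffix string means the caller can concatenate the sample name and suffix directly, which is the assumption the missing dot broke.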

Member

pd3 commented Sep 8, 2021

This is now fixed, thank you.

(Note that the current develop is waiting for samtools/htslib#1327 to be merged)

Contributor Author

freeseek commented Sep 8, 2021

No need for me to push a new release just for this issue. I have it sorted out on my end.
