Chr issue in BAM #15

Open
sybrohee opened this issue Nov 16, 2022 · 1 comment

Comments

@sybrohee

Hi all,

When running vargrouper on a VCF file on hg38 with the corresponding BAM and FASTA, I ran into the following issue.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celltics-_version_-py3.8.egg/celltics/tools/vargroup.py", line 667, in <module>
    cli()
  File "/usr/local/lib/python3.8/dist-packages/click-8.1.3-py3.8.egg/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click-8.1.3-py3.8.egg/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click-8.1.3-py3.8.egg/click/core.py", line 1635, in invoke
    rv = super().invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click-8.1.3-py3.8.egg/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click-8.1.3-py3.8.egg/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/celltics-_version_-py3.8.egg/celltics/tools/vargroup.py", line 661, in cli
    main(input_file=input_file, output_file=output_file, bam_file=bam_file, merge_distance=merge_distance,
  File "/usr/local/lib/python3.8/dist-packages/celltics-_version_-py3.8.egg/celltics/tools/vargroup.py", line 628, in main
    records, var_dict = bam_and_merge_multiprocess(bam_file, vars_to_group, fq_threshold, min_reads,
  File "/usr/local/lib/python3.8/dist-packages/celltics-_version_-py3.8.egg/celltics/tools/vargroup.py", line 608, in bam_and_merge_multiprocess
    recs, var_dict_part = r.get().get_fat() if not debug and nthreads > 1 else r.get_fat()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 768, in get
    raise self._value
ValueError: invalid contig `chrchr11`

This seems to happen because vargrouper prepends "chr" to the chromosome name when querying the BAM: it expects a VCF file without the "chr" prefix and a BAM that may have one (see the function check_for_chr). When the VCF already uses the prefix, the contig name becomes "chrchr11".

def check_for_chr(sam):
    """ Check sam file to see if 'chr' needs to be prepended to chromosome """
    if 'chr' in sam.references[0]:
        return True
    return False

It might be worth modifying check_for_chr so that it also takes the VCF file as an argument and checks whether "chr" is already used as a prefix in the VCF chromosome names.
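A minimal sketch of what I have in mind, not tested against vargrouper itself. The names needs_chr_prefix and bam_contig_name are hypothetical, and I assume the caller passes in sam.references and the chromosome string of the VCF record:

```python
def needs_chr_prefix(bam_references, vcf_chrom):
    """Return True only if the BAM uses a 'chr' prefix and the VCF contig does not.

    bam_references: sequence of contig names, e.g. sam.references (assumed input)
    vcf_chrom: chromosome name taken from a VCF record (assumed input)
    """
    bam_has_chr = bool(bam_references) and bam_references[0].startswith('chr')
    vcf_has_chr = vcf_chrom.startswith('chr')
    return bam_has_chr and not vcf_has_chr


def bam_contig_name(bam_references, vcf_chrom):
    """Map a VCF contig name to the name used when querying the BAM."""
    if needs_chr_prefix(bam_references, vcf_chrom):
        return 'chr' + vcf_chrom
    return vcf_chrom
```

With this, a VCF contig "chr11" stays "chr11" when the BAM also uses the prefix, and "11" is still turned into "chr11" as before. The opposite mismatch (VCF with "chr", BAM without) is not handled here.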

@sybrohee
Author

For now I worked around the issue by making check_for_chr always return False, which is fine as long as the FASTA, BAM and VCF all use the same contig naming.
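That workaround only holds if the contig names really match. A rough way to check this beforehand, assuming the VCF contig names and the BAM reference names have already been collected into plain lists (contigs_missing_from is a hypothetical helper, not part of vargrouper):

```python
def contigs_missing_from(bam_references, vcf_contigs):
    """Return the VCF contig names that do not occur in the BAM, sorted."""
    known = set(bam_references)
    return sorted(set(vcf_contigs) - known)
```

An empty result means every VCF contig can be looked up in the BAM as is. Anything returned (for example "11" against a BAM that only has "chr11") points to a naming mismatch.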
