failure to circularise: cannot use this pair because longer match was found #75

Closed
mhulin opened this issue Oct 5, 2016 · 10 comments

@mhulin

mhulin commented Oct 5, 2016

Hi,

I am trying to use circlator to circularise my PacBio bacterial genomes. The genome has 1 chromosome and 5 plasmids. Using the default command "circlator all", it successfully circularises the chromosome and 4 of the other contigs. However, one contig (tig00000009) fails to circularise, and I'm not sure why.

The 04.merge.circularise_details.log file says "cannot use this pair because longer match was found", which I assume is where it is going wrong. Could you explain what this means?

Here is the log file:

[merge circularise_details] tig00000009 Checking 3 nucmer hits
[merge circularise_details] tig00000009 potential pair of nucmer hits for circularization:
[merge circularise_details] tig00000009 1 13475 13476 1 13475 13476 99.99 39155 21213 1 tig00000009 NODE_6_length_21213_cov_79.7433_ID_37074
[merge circularise_details] tig00000009 33247 39155 21213 15305 5909 5909 100.00 39155 21213 1 tig00000009 NODE_6_length_21213_cov_79.7433_ID_37074
[merge circularise_details] tig00000009 cannot use this pair because longer match was found
[merge circularise_details] tig00000008 Checking 2 nucmer hits
[merge circularise_details] tig00000008 potential pair of nucmer hits for circularization:
[merge circularise_details] tig00000008 1 30669 12112 42783 30669 30672 99.99 62583 42783 1 tig00000008 NODE_5_length_42783_cov_88.7264_ID_37072
[merge circularise_details] tig00000008 30593 62583 1 31996 31991 31996 99.97 62583 42783 1 tig00000008 NODE_5_length_42783_cov_88.7264_ID_37072
[merge circularise_details] tig00000008 can use this pair of hits
[merge circularise_details] tig00000000 Checking 2 nucmer hits
[merge circularise_details] tig00000000 potential pair of nucmer hits for circularization:
[merge circularise_details] tig00000000 1 50000 16397 66396 50000 50000 100.00 6257505 66396 1 tig00000000 NODE_3_length_66396_cov_30.9833_ID_37068
[merge circularise_details] tig00000000 6226450 6257505 1 31056 31056 31056 100.00 6257505 66396 1 tig00000000 NODE_3_length_66396_cov_30.9833_ID_37068
[merge circularise_details] tig00000000 can use this pair of hits
[merge circularise_details] tig00000005 Checking 2 nucmer hits
[merge circularise_details] tig00000005 potential pair of nucmer hits for circularization:
[merge circularise_details] tig00000005 1 48690 20084 68774 48690 48691 99.99 124726 68774 1 tig00000005 NODE_2_length_68774_cov_97.0195_ID_37066
[merge circularise_details] tig00000005 82779 124726 1 41948 41948 41948 100.00 124726 68774 1 tig00000005 NODE_2_length_68774_cov_97.0195_ID_37066
[merge circularise_details] tig00000005 can use this pair of hits
[merge circularise_details] tig00000004 Checking 3 nucmer hits
[merge circularise_details] tig00000004 potential pair of nucmer hits for circularization:
[merge circularise_details] tig00000004 1 49612 16420 66030 49612 49611 99.99 117048 66030 1 tig00000004 NODE_4_length_66030_cov_80.4157_ID_37070
[merge circularise_details] tig00000004 81423 117048 1 35629 35626 35629 99.99 117048 66030 1 tig00000004 NODE_4_length_66030_cov_80.4157_ID_37070
[merge circularise_details] tig00000004 can use this pair of hits
[merge circularise_details] tig00000006 Checking 5 nucmer hits
[merge circularise_details] tig00000006 potential pair of nucmer hits for circularization:
[merge circularise_details] tig00000006 3332 63098 59767 1 59767 59767 100.00 91442 69519 1 tig00000006 NODE_1_length_69519_cov_84.964_ID_37064
[merge circularise_details] tig00000006 61023 91442 69519 39100 30420 30420 100.00 91442 69519 1 tig00000006 NODE_1_length_69519_cov_84.964_ID_37064
[merge circularise_details] tig00000006 can use this pair of hits
[merge circularise_details]
[merge circularise_details] SPAdes reassembly contigs that are circular: None
[merge circularise_details]
[merge circularise_details] tig00000009 Trying to circularize. Has nucmer hits to check...
[merge circularise_details] tig00000009 No matches to SPAdes circular contigs
[merge circularise_details] tig00000009 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details] tig00000009 Cannot circularize: no suitable nucmer hits
[merge circularise_details] tig00000009 Circularized: no
[merge circularise_details]
[merge circularise_details] tig00000008 Trying to circularize. Has nucmer hits to check...
[merge circularise_details] tig00000008 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details] tig00000008 Circularizing using this pair of nucmer matches to SPAdes contig:
[merge circularise_details] tig00000008 1 30669 12112 42783 30669 30672 99.99 62583 42783 1 tig00000008 NODE_5_length_42783_cov_88.7264_ID_37072
[merge circularise_details] tig00000008 30593 62583 1 31996 31991 31996 99.97 62583 42783 1 tig00000008 NODE_5_length_42783_cov_88.7264_ID_37072
[merge circularise_details] tig00000008 Circularized: yes
[merge circularise_details]
[merge circularise_details] tig00000000 Trying to circularize. Has nucmer hits to check...
[merge circularise_details] tig00000000 No matches to SPAdes circular contigs
[merge circularise_details] tig00000000 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details] tig00000000 Circularizing using this pair of nucmer matches to SPAdes contig:
[merge circularise_details] tig00000000 1 50000 16397 66396 50000 50000 100.00 6257505 66396 1 tig00000000 NODE_3_length_66396_cov_30.9833_ID_37068
[merge circularise_details] tig00000000 6226450 6257505 1 31056 31056 31056 100.00 6257505 66396 1 tig00000000 NODE_3_length_66396_cov_30.9833_ID_37068
[merge circularise_details] tig00000000 Circularized: yes
[merge circularise_details]
[merge circularise_details] tig00000005 Trying to circularize. Has nucmer hits to check...
[merge circularise_details] tig00000005 No matches to SPAdes circular contigs
[merge circularise_details] tig00000005 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details] tig00000005 Circularizing using this pair of nucmer matches to SPAdes contig:
[merge circularise_details] tig00000005 1 48690 20084 68774 48690 48691 99.99 124726 68774 1 tig00000005 NODE_2_length_68774_cov_97.0195_ID_37066
[merge circularise_details] tig00000005 82779 124726 1 41948 41948 41948 100.00 124726 68774 1 tig00000005 NODE_2_length_68774_cov_97.0195_ID_37066
[merge circularise_details] tig00000005 Circularized: yes
[merge circularise_details]
[merge circularise_details] tig00000004 Trying to circularize. Has nucmer hits to check...
[merge circularise_details] tig00000004 No matches to SPAdes circular contigs
[merge circularise_details] tig00000004 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details] tig00000004 Circularizing using this pair of nucmer matches to SPAdes contig:
[merge circularise_details] tig00000004 1 49612 16420 66030 49612 49611 99.99 117048 66030 1 tig00000004 NODE_4_length_66030_cov_80.4157_ID_37070
[merge circularise_details] tig00000004 81423 117048 1 35629 35626 35629 99.99 117048 66030 1 tig00000004 NODE_4_length_66030_cov_80.4157_ID_37070
[merge circularise_details] tig00000004 Circularized: yes
[merge circularise_details]
[merge circularise_details] tig00000006 Trying to circularize. Has nucmer hits to check...
[merge circularise_details] tig00000006 No matches to SPAdes circular contigs
[merge circularise_details] tig00000006 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details] tig00000006 Circularizing using this pair of nucmer matches to SPAdes contig:
[merge circularise_details] tig00000006 3332 63098 59767 1 59767 59767 100.00 91442 69519 1 tig00000006 NODE_1_length_69519_cov_84.964_ID_37064
[merge circularise_details] tig00000006 61023 91442 69519 39100 30420 30420 100.00 91442 69519 1 tig00000006 NODE_1_length_69519_cov_84.964_ID_37064
[merge circularise_details] tig00000006 Circularized: yes
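
For anyone reading the log above, here is a minimal Python sketch that pulls the nucmer hit lines out of it. The column meanings (reference start/end, query start/end, hit length on reference and query, percent identity, reference length, query length, one further column that is skipped, reference name, query name) are an assumption inferred from the values visible in the log, not taken from the circlator documentation. The names `Hit` and `parse_hit_line` are hypothetical.

```python
from typing import NamedTuple, Optional

PREFIX = "[merge circularise_details]"


class Hit(NamedTuple):
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    hit_len_ref: int
    hit_len_qry: int
    identity: float
    ref_len: int
    qry_len: int
    ref_name: str
    qry_name: str


def parse_hit_line(line: str) -> Optional[Hit]:
    """Return a Hit for a log line carrying nucmer coordinates, else None."""
    if not line.startswith(PREFIX):
        return None
    fields = line[len(PREFIX):].split()
    # Assumed layout: contig label, 10 numeric columns, ref name, query name.
    if len(fields) != 13:
        return None
    try:
        coords = [int(x) for x in fields[1:7]]
        identity = float(fields[7])
        ref_len, qry_len = int(fields[8]), int(fields[9])
    except ValueError:
        # A message line that happens to have 13 words is not a hit line.
        return None
    # fields[10] is skipped because its meaning is not clear from the log alone.
    return Hit(*coords, identity, ref_len, qry_len, fields[11], fields[12])


line = ("[merge circularise_details] tig00000009 1 13475 13476 1 13475 13476 "
        "99.99 39155 21213 1 tig00000009 NODE_6_length_21213_cov_79.7433_ID_37074")
hit = parse_hit_line(line)
print(hit.ref_name, hit.ref_start, hit.ref_end, hit.qry_name)
```

Message lines such as "Checking 3 nucmer hits" return `None`, so the function can be applied to every line of the file.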

Thank you very much in advance.

@martinghunt
Contributor

Hi,

Circlator is conservative and doesn't make a change if there is ambiguity. In your case, the SPAdes contig NODE_6_length_21213_cov_79.7433_ID_37074 spans the start/end of your input contig tig00000009. However, there is also a nucmer match between NODE_6_length_21213_cov_79.7433_ID_37074 and somewhere else in the input contigs that is longer than either of the two matches at the start/end of tig00000009. Because of this, circlator leaves tig00000009 unchanged: it can't be sure that NODE_6_length_21213_cov_79.7433_ID_37074 doesn't really belong somewhere else in the genome.
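
The rule described above can be illustrated with a short Python sketch. This is only an interpretation of the explanation in this comment, not circlator's actual code: the names `Hit` and `can_use_pair` are hypothetical, and "longer than either of the two matches" is assumed to mean longer than the larger of the two. The first two hits use coordinates from the log; the third hit is invented for illustration, since the log does not show where the longer match lies.

```python
from typing import NamedTuple, Sequence


class Hit(NamedTuple):
    ref_name: str   # input contig
    qry_name: str   # SPAdes reassembly contig
    ref_start: int
    ref_end: int

    @property
    def length(self) -> int:
        return abs(self.ref_end - self.ref_start) + 1


def can_use_pair(start_hit: Hit, end_hit: Hit, all_hits: Sequence[Hit]) -> bool:
    """False if the same SPAdes contig has a longer match anywhere else."""
    if start_hit.qry_name != end_hit.qry_name:
        raise ValueError("both hits must come from the same SPAdes contig")
    threshold = max(start_hit.length, end_hit.length)
    for hit in all_hits:
        if hit.qry_name != start_hit.qry_name or hit in (start_hit, end_hit):
            continue
        if hit.length > threshold:
            # Ambiguous: the SPAdes contig may belong elsewhere in the genome.
            return False
    return True


node6 = "NODE_6_length_21213_cov_79.7433_ID_37074"
start = Hit("tig00000009", node6, 1, 13475)       # from the log
end = Hit("tig00000009", node6, 33247, 39155)     # from the log
other = Hit("tig00000006", node6, 1000, 20000)    # hypothetical third hit

print(can_use_pair(start, end, [start, end]))         # no competing match
print(can_use_pair(start, end, [start, end, other]))  # longer match elsewhere
```

The conservative choice here is to refuse whenever a competing match is longer, rather than trying to decide which placement is correct.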

If you have ACT installed, having a look at the input contigs vs the spades reassembly contigs might make it clearer as to what happened. Try running 04.merge.circularise.start_act.sh, which should be in the output directory.

Martin

@mhulin
Author

mhulin commented Oct 5, 2016

Dear Martin,

Thank you for your reply. I’ve aligned the SPAdes contig (NODE_6) and the original contig in Geneious (I don’t have ACT installed), and I’ve attached a picture of the alignment. The two pairs of nucmer matches to the end of the original contig are annotated. However, as you said, the SPAdes contig has a longer match, which the alignment shows is to the middle of the contig?

I hope that makes sense, let me know what you think.

Many thanks,
Michelle


@martinghunt
Contributor

Hi Michelle,

It looks like GitHub doesn't attach files when you reply by email. Could you reply on GitHub and attach the file that way, please?

Thanks,
Martin

@mhulin
Author

mhulin commented Oct 5, 2016

Dear Martin,

Thank you for your reply. I’ve aligned the SPAdes contig (NODE_6) and the original contig in Geneious (I don’t have ACT installed), and I’ve attached a picture of the alignment. The two pairs of nucmer matches to the end of the original contig are annotated. However, as you said, the SPAdes contig has a longer match, which the alignment shows is to the middle of the contig?

I hope that makes sense, let me know what you think.

Many thanks,
Michelle
spades_original_alignment.pdf

@martinghunt
Contributor

Hi Michelle,

I'm not sure what's happened. It would be good if I could have all the 04.* files from the output directory. Can you share them with me?

Thanks,
Martin

@mhulin
Author

mhulin commented Oct 6, 2016

Hi,

I've attached all the 04 files.

Best wishes,
Michelle
04 files.zip

@martinghunt
Contributor

Hi Michelle,

It looks like the SPAdes contig has about 700 bp of sequence in common at its ends, which is odd. I would have expected SPAdes to call it as circular. What version of SPAdes did you use?

Thanks,
Martin

@mhulin
Author

mhulin commented Oct 6, 2016

Hi Martin,

I used SPAdes 3.7.0. Here's the output of circlator progcheck for more info:

External dependencies:
bwa 0.7.15 /home/hulinm/local/src/bwa-0.7.15/bwa
nucmer 3.1 /usr/bin/nucmer
prodigal 2.6.2 /home/hulinm/local/src/Prodigal-2.6.2/prodigal
samtools 1.3.1 /home/hulinm/local/src/samtools-1.3.1/samtools
spades 3.7.0 /home/hulinm/local/src/SPAdes-3.7.0-Linux/bin/spades.py

Python version:
3.3.5 (default, Sep 30 2016, 15:44:31)
[GCC 4.7.2]

Python dependencies:
openpyxl 2.4.0 /home/hulinm/local/src/python3/Python-3.3.5/lib/python3.3/site-packages/openpyxl-2.4.0-py3.3.egg/openpyxl/__init__.py
pyfastaq 3.14.0 /home/hulinm/local/src/python3/Python-3.3.5/lib/python3.3/site-packages/pyfastaq-3.14.0-py3.3.egg/pyfastaq/__init__.py
pymummer 0.8.1 /home/hulinm/local/src/python3/Python-3.3.5/lib/python3.3/site-packages/pymummer-0.8.1-py3.3.egg/pymummer/__init__.py
pysam 0.9.1.4 /home/hulinm/local/src/python3/Python-3.3.5/lib/python3.3/site-packages/pysam-0.9.1.4-py3.3-linux-x86_64.egg/pysam/__init__.py

Thanks,
Michelle

@martinghunt
Contributor

Hi Michelle,

Sorry, but in this case, circlator can't do any more because of how the spades contig looks. There is something strange going on with the ~700bp sequence in common, and you would need to investigate more as to why it is there. I would start with mapping the reads to the spades assembly and to the input assembly and see how they look on tig00000006 and NODE_6.

Martin

@mhulin
Author

mhulin commented Oct 12, 2016

Hi Martin,

Thank you for having a look. I shall investigate, and let you know how I get on.

Best wishes,
Michelle
