How to use a genetic map (markers.csv) file to scaffold a haplotype-phased assembly from contig level to pseudochromosome level. #604

Open
ap1438 opened this issue Oct 12, 2023 · 5 comments

Comments

@ap1438

ap1438 commented Oct 12, 2023

I have a genetic map (18000_markers.csv) in ABH format, and I want to use it to scaffold a contig-level assembly (I have two haplotype-phased genome assemblies).

I figured out that this can be achieved using ALLMAPS. But the input file for ALLMAPS looks different from what I have. I am new to using genetic maps and do not know how to convert my file so that it can be used to scaffold the haplotype-phased genomes.

Below is an example of the file I have. I do not know how to attach the .csv file itself.

`

id chr1_153600 chr1_202763 chr1_211008 chr1_211360 chr1_237927 chr1_294054 chr1_315629 chr1_221454 chr1_237913 chr1_214484 chr1_348473 chr1_317931 chr1_393124 chr1_410996 chr1_441148 chr1_444094 chr1_509974 chr1_515004 chr1_515498 chr1_516081 chr1_520420 chr1_520449 chr1_522093 chr1_522229 chr1_522520 chr1_522530 chr1_533422 chr1_537039 chr1_537048 chr1_537275 chr1_552942
  L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1 L.1
  0 0.000001 0.000002 0.000003 0.000004 0.000005 0.000006 0.000007 0.000008 0.000009 0.00001 0.000011 0.000012 0.000013 0.000014 0.000015 0.000016 0.000017 0.000018 0.000019 0.00002 0.000021 0.000022 0.000023 0.000024 0.000025 0.000026 0.000027 0.000028 0.000029 0.00003
1_a1 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
10_a10 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
100_a100 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
101_a101 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
102_a102 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
103_a103 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
104_a104 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
105_a105 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
106_a106 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
107_a107 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
108_a108 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
109_a109 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
11_a11 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
110_a110 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
111_a111 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
112_a112 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
113_a113 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
114_a114 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
115_a115 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
116_a116 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
117_a117 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
119_a119 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
12_a12 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
`

Please let me know how I can scaffold the genome. Thanks in advance for your valuable time.

@tanghaibao
Owner

@ap1438

I think you may need to construct the genetic map first. The format you have here is very similar to MSTMap input. http://alumni.cs.ucr.edu/~yonghui/mstmap.html

Once you have computed the genetic distance for all markers, then you can follow the steps in https://github.com/tanghaibao/jcvi/wiki/ALLMAPS#step-1-prepare-input-data to use ALLMAPS.
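Once real genetic distances are available, the reshaping step could look roughly like the sketch below. It is not part of jcvi. It assumes the file is comma-separated with the three header rows shown in the question (marker names, linkage group, genetic position), that each marker name has the form `<sequence id>_<physical position>` on the sequences being scaffolded, and that the target is the four-column CSV layout described on the ALLMAPS wiki page linked above. The function names are hypothetical. If the marker names refer to a different reference genome, the markers would first have to be placed on the contigs.

```python
import csv
import io


def wide_map_to_allmaps_rows(handle):
    """Read the three header rows of a wide ABH marker table and return
    (sequence id, physical position, linkage group, genetic position) tuples."""
    reader = csv.reader(handle)
    names = next(reader)[1:]      # row 1: "id", then one marker name per column
    groups = next(reader)[1:]     # row 2: empty cell, then linkage group per marker
    positions = next(reader)[1:]  # row 3: empty cell, then genetic position per marker
    rows = []
    for name, group, cm in zip(names, groups, positions):
        # ASSUMPTION: marker names look like "<sequence id>_<physical position>"
        seqid, _, pos = name.rpartition("_")
        rows.append((seqid, int(pos), group, float(cm)))
    return rows


def write_allmaps_csv(rows, handle):
    """Write the rows with the header the ALLMAPS wiki is assumed to expect."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Scaffold ID", "scaffold position", "LG", "genetic position"])
    writer.writerows(rows)


# Tiny made-up excerpt in the same shape as the file in the question.
example = io.StringIO(
    "id,chr1_153600,chr1_202763,chr1_211008\n"
    ",L.1,L.1,L.1\n"
    ",0,0.000001,0.000002\n"
    "1_a1,B,B,B\n"
)
rows = wide_map_to_allmaps_rows(example)
out = io.StringIO()
write_allmaps_csv(rows, out)
print(out.getvalue())
```

The genotype rows (A/B/H calls) are ignored here because they are the input to map construction, not to the scaffolding step.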

@ap1438
Author

ap1438 commented Oct 16, 2023

I got this result. I am not familiar with these types of plots or how to interpret them.

Does this just mean that Chr 3 has linkage to two linkage groups (that is, markers from linkage group 2, corresponding to chr. 2, also map to chr. 3)? Or is something more concerning going on?
- I can see that there are disagreements between the positions of the markers in the genetic map and the physical map.
- What does the grey colour signify in the bar of ChrL.3 (28 Mb)? What do the vertical lines on the graph mean?

  • If there are disagreements, should I question the quality of the linkage map and/or the genome assembly?
  • Can I locate and extract the exact regions of disagreement from any output file produced by ALLMAPS?
  • Should I check the assembly by aligning the contigs back to the assembly and checking the regions of disagreement for validation?

Any help would be much appreciated.

chrL.3.pdf

This one seems even more chaotic.

chrL.8.pdf

Thanks for your valuable time.

@tanghaibao
Owner

@ap1438

The assembly looks reasonable to me and the matching to multiple linkage groups does not seem too concerning.

The grey colors are showing contigs that get assembled (so grey, white, grey, white ... alternating colors so that you can see the boundaries of contigs). In your case, there are just 2 contigs. This is the best arrangement between the two contigs that agree with the linkage map.

If you look at the matches to the other linkage group (X-L.2), they are really minor and appear concentrated in a few places. You can increase --links to remove those minor matches.

Same story on the chr8, it doesn't seem bad at all.
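The idea of dropping minor matches can be illustrated with a small sketch. This is not ALLMAPS code and does not claim to reproduce how `--links` is implemented. It assumes that a "match" is a (contig, linkage group) pair and that a pair supported by fewer than a threshold number of markers is discarded. The function name, the `min_links` parameter, and the data are all made up for illustration.

```python
from collections import Counter


def split_by_link_support(markers, min_links=10):
    """Split markers into (kept, dropped) by how many markers support
    their (contig, linkage group) pair.

    markers: list of (contig, physical_pos, linkage_group, genetic_pos).
    ASSUMPTION: a pair is kept when it has at least min_links markers.
    """
    counts = Counter((contig, lg) for contig, _pos, lg, _cm in markers)
    kept, dropped = [], []
    for marker in markers:
        contig, _pos, lg, _cm = marker
        if counts[(contig, lg)] >= min_links:
            kept.append(marker)
        else:
            dropped.append(marker)
    return kept, dropped


# Made-up data: one contig with 12 markers on L.3 and 2 stray markers on L.2.
markers = [("ctgA", 1000 * i, "L.3", 0.5 * i) for i in range(12)]
markers += [("ctgA", 5500, "L.2", 40.0), ("ctgA", 5600, "L.2", 40.2)]

kept, dropped = split_by_link_support(markers, min_links=10)
print(len(kept), "kept;", len(dropped), "dropped")
for contig, pos, lg, cm in dropped:
    print(contig, pos, lg, cm)
```

Listing the dropped markers with their physical positions is also one way to see where on a contig the minority linkage-group matches are concentrated.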

@ap1438
Author

ap1438 commented Oct 30, 2023

Thank You

@ap1438 ap1438 closed this as completed Oct 30, 2023
@ap1438 ap1438 reopened this Nov 24, 2023
@ap1438
Author

ap1438 commented Nov 24, 2023

Referring to: "You can increase --links to remove those minor matches."

How does the --links option work, and what would be a reasonable value to increase it to? I increased it to 20, and it did remove some minor matches, but not on all the chromosomes. So I am curious to know what exactly this option does and what an optimal value would be.

Can you please explain this?
