rs.seg.gr$cnv from segment_hic_data.R and interpreting final output bed file #1

jjylee · 2019-01-24T17:04:12Z

Hello,

I am trying to use cancer-hic-norm to infer copy number information from the Hi-C data. For the time being, I am just trying to infer the copy number data and not normalize my Hi-C data according to it.

For the script cancer-hic-norm-master/cnv_from_hic/segment_hic_data.R, when it comes to the part for plotting p2, it generates the following data frame.

dat <- data.frame(chr=as.vector(seqnames(rs.seg.gr)), pos = xpos, counts.cor = rs.seg.gr$counts.cor, smt = rs.seg.gr$smt, cn=rs.seg.gr$cnv)

However, this results in an error for me because rs.seg.gr does not have a column called cnv. If I remove this part from the command (remove cn=rs.seg.gr$cnv), the script runs fine. The output file seems to use only the smt column, so I think this should be okay, but I wanted to make sure I'm not discarding critical information. At which step is the cnv column supposed to be added to the rs.seg.gr object?
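
For illustration, here is a minimal sketch of building the same plotting data frame when the cnv column may be absent. It is not the script's actual code: `seg` is a plain data frame standing in for the metadata columns of rs.seg.gr, `build_plot_df` is a hypothetical helper, and all values are made up.

```r
# Stand-in for the metadata columns of rs.seg.gr (hypothetical values, no cnv column).
seg <- data.frame(
  chr        = c("chr1", "chr1", "chr2"),
  counts.cor = c(0.12, 0.95, -0.40),
  smt        = c(0.10, 0.90, -0.35)
)
xpos <- seq_len(nrow(seg))  # hypothetical plotting positions

# Build the plotting data frame; add the cn column only if cnv is present.
build_plot_df <- function(seg, xpos) {
  dat <- data.frame(chr = seg$chr, pos = xpos,
                    counts.cor = seg$counts.cor, smt = seg$smt)
  if ("cnv" %in% names(seg)) {
    dat$cn <- seg$cnv
  }
  dat
}

dat <- build_plot_df(seg, xpos)
print(dat)
```

With a real GRanges object, the equivalent check would presumably be made on names(mcols(rs.seg.gr)) rather than names(seg).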

Also, I wasn't sure how to interpret the final output bed file, which has copy number values in the 4th column. Is this a log2 ratio relative to the average number of reads per bin across the genome? For example, if the 4th column is 2 for a bin of interest, does that mean the bin has twice as many reads as the genome average after normalization and smoothing?

Thank you!

csijcs · 2020-08-04T16:35:53Z

Hello, sorry to resurrect an old comment but I'm having the exact same issue. Was there ever a solution for this?

nservant · 2020-08-05T09:01:07Z

Hi,
Could you use segment_hic_data_sym.R instead? It should fix this issue.
Actually, all the sym scripts are more recent and have been corrected for a couple of bugs.
But I have not had time so far to correctly include them in the repo. Sorry for that.

To answer the first question, rs.seg.gr$cnv was a copy number status inferred from a microarray experiment, so it is not useful at all here; I just used it as a control.
And the log ratio is a normalized and centered signal from raw counts.
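
As a small worked example of reading such a value: the thread does not state the base of the logarithm, so the sketch below assumes a log2 ratio centered on 0, as suggested in the question. Under that assumption, a value of 2 would correspond to four times the reference signal, not twice. The input values are made up.

```r
# Assumption: the score is a log2 ratio centered on 0 (base not confirmed in the thread).
log_ratio <- c(-1, 0, 1, 2)

# Convert each log2 ratio back to a fold change relative to the centered reference.
fold_change <- 2^log_ratio

print(data.frame(log_ratio, fold_change))
```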
Note that for CNV normalization, the only thing that really matters is the breakpoint location, as each copy number block is treated independently. The copy number value (or smt) is used if you want to keep the CNV profile, but it does not really make sense (biologically) per se.
Best
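
Since breakpoint locations are what matter, one way to recover them from a segmented profile can be sketched as follows. This is an illustration, not the repository's method: it assumes bins are sorted in genomic order and that smt is constant within each segment. The `bins` table and the `find_breakpoints` helper are hypothetical.

```r
# Hypothetical segmented bins: chromosome, bin start, and smoothed value (smt).
bins <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  start = c(0, 1e6, 2e6, 3e6, 0, 1e6),
  smt   = c(0.1, 0.1, 0.8, 0.8, -0.3, -0.3)
)

# A breakpoint is a bin whose smt differs from the previous bin on the same chromosome.
# Changes that coincide with a chromosome boundary are not reported.
find_breakpoints <- function(bins) {
  n <- nrow(bins)
  same_chr <- c(FALSE, bins$chr[-1] == bins$chr[-n])
  changed  <- c(FALSE, bins$smt[-1] != bins$smt[-n])
  bins[same_chr & changed, c("chr", "start")]
}

bp <- find_breakpoints(bins)
print(bp)
```

Each reported row marks the first bin of a new copy number block; the blocks between consecutive breakpoints are the units that would then be handled independently.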

csijcs · 2020-08-05T10:08:26Z

Got it, thanks for the quick reply
