I am trying to use cancer-hic-norm to infer copy number information from the Hi-C data. For the time being, I am just trying to infer the copy number data and not normalize my Hi-C data according to it.
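In the script cancer-hic-norm-master/cnv_from_hic/segment_hic_data.R, the part that plots p2 builds the following data frame:
dat <- data.frame(chr=as.vector(seqnames(rs.seg.gr)), pos = xpos, counts.cor = rs.seg.gr$counts.cor, smt = rs.seg.gr$smt, cn=rs.seg.gr$cnv)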
However, this results in an error for me because rs.seg.gr does not have a column called cnv. If I remove this part from the command (remove cn=rs.seg.gr$cnv), the script runs fine. The output file seems to use only the smt column, so I assumed this was okay, but I wanted to make sure I'm not discarding any critical information. At which step is the cnv column supposed to be added to the rs.seg.gr object?
Also, I wasn't sure how to interpret the final output BED file with copy number values in the 4th column. Is this a log2 ratio relative to the average number of reads per bin across the genome? For example, if the 4th column is 2 for a bin of interest, does that mean the bin has twice as many reads as the genome average once the signal has been normalized and smoothed?
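One way to keep the plotting data frame working whether or not the control column exists is to add cn only when it is present. The sketch below uses a plain data frame with made-up values as a stand-in for the metadata of rs.seg.gr, so that it runs without Bioconductor; it is not the repository's code.

```r
# Stand-in for the per-bin metadata of rs.seg.gr (hypothetical values).
seg <- data.frame(
  chr        = c("chr1", "chr1", "chr2"),
  pos        = c(1, 2, 3),
  counts.cor = c(0.1, -0.2, 0.9),
  smt        = c(0, 0, 1)
)

# Always keep the columns the output relies on.
dat <- seg[, c("chr", "pos", "counts.cor", "smt")]

# Add the microarray control column only if it was actually provided.
has_cnv <- "cnv" %in% names(seg)
if (has_cnv) {
  dat$cn <- seg$cnv
}

print(names(dat))
```

With a real GRanges object, the analogous check would be `"cnv" %in% names(mcols(rs.seg.gr))`.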
Thank you!
Hi,
Could you use segment_hic_data_sym.R instead? It should fix this issue.
Actually, all the sym scripts are more recent and have been corrected for a couple of bugs.
But I have not had time so far to include them correctly in the repo. Sorry for that.
To answer the first question, rs.seg.gr$cnv was a copy number status inferred from a microarray experiment, so it is not useful at all here; I just used it as a control.
And the log ratio is a normalized and centered signal from raw counts.
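To illustrate how a centered log ratio reads, here is a minimal sketch. It assumes a log2 transform and centering on the genome-wide median; neither choice is confirmed in this thread, and the counts are made up.

```r
# Hypothetical raw counts per bin (not from any real dataset).
counts <- c(100, 100, 200, 400, 50, 100, 100)

# Assumption: centre on the genome-wide median, then take log2.
log_ratio <- log2(counts / median(counts))

# The fold change implied by a log2 ratio is 2^value.
fold <- 2^log_ratio

print(data.frame(counts = counts, log_ratio = log_ratio, fold = fold))
```

Under these assumptions, a value of 1 corresponds to twice the reference level and a value of 2 to four times, so a log2 value of 2 would not mean "twice as many reads".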
Note that for CNV normalization, the only thing that really matters is the breakpoint location, as each copy number block will be treated independently. The copy number value (or smt) is used if you want to keep the CNV profile, but it does not really make sense (biologically) per se.
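The point about breakpoints can be sketched as follows: blocks are defined by where the segmented signal changes, not by its level. The helper assign_blocks is hypothetical and the values are invented; this is not the normalization code from the repository.

```r
# Hypothetical helper: give each bin a block id. A new block starts at the
# first bin, at a chromosome change, or wherever the segmented value changes.
# Exact comparison is fine here because segmented values are piecewise constant.
assign_blocks <- function(chr, smt) {
  n <- length(smt)
  if (n == 0) return(integer(0))
  is_start <- c(TRUE, chr[-1] != chr[-n] | smt[-1] != smt[-n])
  cumsum(is_start)
}

# Made-up bins: two chromosomes, one breakpoint inside each.
chr <- c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2")
smt <- c(0, 0, 1, 1, 1, -0.5)

blocks <- assign_blocks(chr, smt)

# Rescaling the segment levels leaves the block structure unchanged.
blocks_rescaled <- assign_blocks(chr, smt * 3 + 1)

print(blocks)
print(identical(blocks, blocks_rescaled))
```

Any per-block processing would then be applied to the bins grouped by block id, which is why the breakpoint positions matter and the absolute smt value does not.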
Best