rs.seg.gr$cnv from segment_hic_data.R and interpreting final output bed file #1

Open
jjylee opened this issue Jan 24, 2019 · 3 comments

jjylee commented Jan 24, 2019

Hello,

I am trying to use cancer-hic-norm to infer copy number information from the Hi-C data. For the time being, I am just trying to infer the copy number data and not normalize my Hi-C data according to it.

For the script cancer-hic-norm-master/cnv_from_hic/segment_hic_data.R, when it comes to the part for plotting p2, it generates the following data frame.

dat <- data.frame(chr=as.vector(seqnames(rs.seg.gr)), pos = xpos, counts.cor = rs.seg.gr$counts.cor, smt = rs.seg.gr$smt, cn=rs.seg.gr$cnv)

However, this results in an error for me because rs.seg.gr does not have a column called cnv. If I remove that part from the command (cn=rs.seg.gr$cnv), the script runs fine. The output file seems to use only the smt column, so I assume this is okay, but I want to make sure I'm not discarding critical information. At which step is the cnv column supposed to be added to the rs.seg.gr object?
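A minimal sketch of the workaround described above, in base R. The `seg` data frame and its values are a hypothetical stand-in for the metadata columns of rs.seg.gr, not output from the real script; the point is only that the cn column is added when a cnv column exists and skipped otherwise.

```r
# Hypothetical stand-in for the metadata columns of rs.seg.gr.
# In the real script these come from a GRanges object; the values are made up.
seg <- data.frame(
  chr        = c("chr1", "chr1", "chr2"),
  counts.cor = c(0.10, 0.15, -0.40),
  smt        = c(0.12, 0.12, -0.40)
)
xpos <- seq_len(nrow(seg))

# Build the plotting data frame from the columns that are always present.
dat <- data.frame(chr = seg$chr, pos = xpos,
                  counts.cor = seg$counts.cor, smt = seg$smt)

# Add the optional control column only when it exists.
# With a GRanges object the equivalent check would be
# "cnv" %in% colnames(mcols(rs.seg.gr)).
if ("cnv" %in% colnames(seg)) {
  dat$cn <- seg$cnv
}

print(dat)
```

Any later plotting code that refers to dat$cn would need the same guard.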

Also, I wasn't sure how to interpret the final output bed file, which has copy number values in the 4th column. Is this a log2 ratio relative to the average number of reads per bin across the genome? For example, if the 4th column is 2 for a bin of interest, does that mean the bin has twice as many reads as the genome average after normalization and smoothing of the signal?

Thank you!

csijcs commented Aug 4, 2020

Hello, sorry to resurrect an old comment but I'm having the exact same issue. Was there ever a solution for this?

nservant (Owner) commented Aug 5, 2020

Hi,
Could you use segment_hic_data_sym.R? It should fix this issue.
Actually, all the sym scripts are more recent and have been corrected for a couple of bugs.
But I have not had time so far to correctly include them in the repo. Sorry for that.

To answer the first question, rs.seg.gr$cnv was a copy number status inferred from a microarray experiment ... so it is not useful at all here; I just used it as a control.
And the log ratio is a normalized and centered signal from raw counts.
Note that for CNV normalization, the only thing that is really important is the breakpoint location, as each copy number block will be treated independently. The copy number value (or smt) is used if you want to keep the CNV profile, but does not really make sense (biologically) per se.
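To illustrate the point about breakpoints, here is a minimal base R sketch that recovers copy number blocks and their breakpoints from a segmented signal. The `smt` vector is made-up data for a single chromosome, and run-length encoding is one simple way to find the blocks, not necessarily what the repository scripts do.

```r
# Hypothetical per-bin segmented values (the smt column) for one chromosome.
smt <- c(0.1, 0.1, 0.1, -0.4, -0.4, 0.6, 0.6, 0.6)

# Run-length encode the segmented signal: each run of identical values
# is treated as one copy number block.
runs <- rle(smt)
block.end   <- cumsum(runs$lengths)
block.start <- block.end - runs$lengths + 1

blocks <- data.frame(start.bin = block.start,
                     end.bin   = block.end,
                     smt       = runs$values)

# A breakpoint is the first bin of every block after the first one.
breakpoints <- block.start[-1]

print(blocks)
print(breakpoints)
```

With real data this would have to be applied per chromosome, and it relies on the segmentation assigning exactly the same value to every bin of a block.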
Best

csijcs commented Aug 5, 2020

Got it, thanks for the quick reply
