Possible data corruption? #3

concatenize · 2019-12-17T22:06:28Z

Hi Dr. Veres,
I'm working with some of your data, downloaded from GEO. When I read the file x1_S3c_b1.counts.tsv.gz (from Veres2019_analysis_data ▸ 01_Stages_3_to_6 ▸ data ▸ indrops_raw) into R, I get the classic R error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 14347 did not have 41748 elements

I'm reading it like this:

library(magrittr)  # provides %>% and divide_by(), used below
fname = "x1_S3c_b1.counts.tsv.gz"
expr = read.table(file.path("data/melton_veres/01_Stages_3_to_6", fname) %>% gzfile, skip = 14346, nrows = 3 )

I looked for the offending line, comparing it to the ones before it like this:

scan(file.path("data/melton_veres/01_Stages_3_to_6", fname) %>% gzfile, skip = 14342, nlines = 3, what = "character" ) %>% length %>% divide_by(3)
> 41748
scan(file.path("data/melton_veres/01_Stages_3_to_6", fname) %>% gzfile, skip = 14346, nlines = 3, what = "character" ) %>% length %>% divide_by(3)
> 82838290

It looks to me like I am reading a corrupted file with a "frameshift mutation" that wipes out all the newlines after line 14346. Could you check and see if the file looks OK on your end?
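In case it helps to reproduce the check, here is a minimal sketch of how the per-line field counts could be compared across a whole file. It assumes the file is tab-separated; `find_ragged_lines` is a hypothetical helper name of my own, not something from the repository, and the small temporary file only stands in for the real data.

```r
# Hypothetical helper (not part of the repository): report lines whose
# field count differs from the most common field count in the file.
# Assumes a tab-separated, gzip-compressed text file.
find_ragged_lines <- function(path, sep = "\t") {
  con <- gzfile(path, "rt")
  on.exit(close(con))
  # count.fields() (utils) returns the number of fields on each line.
  # Quoting and comments are disabled so every character is taken literally;
  # blank lines are kept so indices match line numbers in the file.
  n <- count.fields(con, sep = sep, quote = "", comment.char = "",
                    blank.lines.skip = FALSE)
  expected <- as.integer(names(which.max(table(n))))
  bad <- which(n != expected)
  data.frame(line = bad, fields = n[bad],
             expected = rep(expected, length(bad)))
}

# Toy stand-in for the real file: line 3 has lost its newlines and
# swallowed the line after it.
tmp <- tempfile(fileext = ".tsv.gz")
out <- gzfile(tmp, "w")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5\t6\t7\t8\t9", "1\t2\t3"), out)
close(out)

res <- find_ragged_lines(tmp)
print(res)
```

On the real file this would be called as find_ragged_lines(file.path("data/melton_veres/01_Stages_3_to_6", fname)). If only the line after 14346 is reported, with a very large field count, that would be consistent with the newlines having been lost from that point on.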

Thanks!
