Data provided do not have the raw counts in the 'counts layer' #22

genecell · 2023-01-26T05:50:02Z

Hi,

Thank you for the nice tool and resource! I downloaded the lung and the human immune data from the figshare website, but found that the adata objects do not contain raw counts. For example, in the data I downloaded from https://figshare.com/ndownloader/files/25717328:

adata.layers['counts']

array([[ 0.  ,  0.  ,  0.  , ...,  1.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       ...,
       [ 0.  ,  0.  ,  0.  , ..., 54.67,  0.  , 93.26],
       [ 0.  ,  0.  ,  0.  , ..., 14.62,  0.  , 84.9 ],
       [ 0.  ,  0.  ,  0.  , ...,  5.98,  0.  ,  0.  ]], dtype=float32)

The values are not integers. Btw, I got this warning when importing the data:

 OldFormatWarning: Element '/layers/counts' was written without encoding metadata.
  return {k: read_elem(v) for k, v in elem.items()}

The versions of scanpy and anndata are:

scanpy==1.9.1 anndata==0.8.0

Thank you so much!

Best,
Min

wconnell · 2023-05-11T15:40:20Z

I noticed this about the counts too...

lazappi · 2023-05-12T06:54:00Z

I am not sure exactly what was uploaded to FigShare. We would need to ask @LuckyMD about that but he is unavailable for the next few months.

 OldFormatWarning: Element '/layers/counts' was written without encoding metadata.
  return {k: read_elem(v) for k, v in elem.items()}

This warning appears because the files were written with an older version of anndata and you are using v0.8.0, which expects a different file format. It should be backward compatible though, so no need to worry about this.
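If the warning is noisy, one option is to hide just that message while loading. The following is a minimal sketch that uses only the standard `warnings` module; `read_quietly` is a hypothetical helper name, and the message pattern is taken from the warning text quoted above rather than from anndata's API.

```python
import warnings


def read_quietly(reader, *args, **kwargs):
    """Call reader(*args, **kwargs) while hiding only the old-format warning.

    `reader` is whatever loading function you use (for example sc.read);
    this helper itself is hypothetical and not part of scanpy or anndata.
    """
    with warnings.catch_warnings():
        # The pattern is matched against the start of the warning message,
        # so the leading ".*" lets it match anywhere in the text.
        warnings.filterwarnings(
            "ignore", message=r".*written without encoding metadata.*"
        )
        return reader(*args, **kwargs)


# Intended usage (assumes scanpy is installed and imported as sc):
# adata = read_quietly(sc.read, "data/lung_atlas.h5ad")
```

Other warnings raised during loading are still shown, because the filter only matches this one message.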

LuckyMD · 2023-05-16T08:15:10Z

Hi @wconnell and @genecell,

Sorry for the late reply here. The reason not all of these are integers is that we use TPMs as "raw counts" for full-length data without UMIs. I believe this is mentioned in the methods section of the paper as well. In the immune dataset, the Villani data were measured using Smart-seq2. We don't have raw read counts for this dataset, but instead use TPMs which are already gene length corrected after alignment. I hope that clarifies things.
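To see which batches carry TPMs rather than integer counts, the check can be done per batch. This is a rough sketch on a toy matrix, not the code used for the paper; the helper names, the tolerance, and the "10X" batch label are assumptions made for illustration.

```python
import numpy as np


def is_integer_valued(values, tol=1e-6):
    """Return True if every entry is within tol of a whole number."""
    values = np.asarray(values, dtype=np.float64)
    return bool(np.all(np.abs(values - np.round(values)) <= tol))


def integer_status_by_group(matrix, groups):
    """Map each group label to whether its rows (cells) hold only whole numbers."""
    matrix = np.asarray(matrix)
    groups = np.asarray(groups)
    return {
        str(g): is_integer_valued(matrix[groups == g]) for g in np.unique(groups)
    }


# Toy data: two cells with integer counts, two cells with TPM-like values.
counts = np.array(
    [[0, 1, 3], [2, 0, 0], [0.0, 54.67, 93.26], [14.62, 0, 84.9]],
    dtype=np.float32,
)
batch = np.array(["10X", "10X", "Villani", "Villani"])

status = integer_status_by_group(counts, batch)
print(status)  # {'10X': True, 'Villani': False}
```

On the real object this would be called with something like `adata.layers["counts"]` and `adata.obs["batch"]`, assuming the batch column has that name; a sparse layer would need `.toarray()` first.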

LuckyMD · 2023-05-16T08:16:41Z

Lung data should have integer counts though afaik, as there are no full length data in that task... did you find this issue also for the lung data?

wconnell · 2023-05-16T16:16:45Z

Thank you for clarifying @LuckyMD; I found the detail in the Sup Info that Villani was excluded from scran norm b/c only TPM was provided.

I'm not sure about the lung data.

genecell · 2023-07-12T23:11:50Z

Hi @lazappi @LuckyMD @wconnell, thank you for your responses! Yes, the lung data also do not have integer counts. I imported the data via:

import scanpy as sc

adata = sc.read(
    "data/lung_atlas.h5ad",
    backup_url="https://figshare.com/ndownloader/files/24539942",
)

Integer counts are important because some methods rely on raw count data.
Thank you very much!

Best regards,
Min
