Data provided do not have the raw counts in the 'counts layer' #22

Open
genecell opened this issue Jan 26, 2023 · 6 comments

@genecell

Hi,

Thank you for the nice tool and resource! I downloaded the lung and the human immune data from figshare, but found that the adata objects do not contain raw count data. For example, after downloading the data from https://figshare.com/ndownloader/files/25717328, I get:

adata.layers['counts']
array([[ 0.  ,  0.  ,  0.  , ...,  1.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       ...,
       [ 0.  ,  0.  ,  0.  , ..., 54.67,  0.  , 93.26],
       [ 0.  ,  0.  ,  0.  , ..., 14.62,  0.  , 84.9 ],
       [ 0.  ,  0.  ,  0.  , ...,  5.98,  0.  ,  0.  ]], dtype=float32)

The values are not all integers. By the way, I got this warning when importing the data:

 OldFormatWarning: Element '/layers/counts' was written without encoding metadata.
  return {k: read_elem(v) for k, v in elem.items()}

The versions of scanpy and anndata are:

scanpy==1.9.1 anndata==0.8.0

Thank you so much!

Best,
Min
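
For anyone who wants to reproduce the check reported above, here is a minimal sketch of a test for whether a matrix holds only whole numbers. The helper name `is_integer_valued` and the toy array are hypothetical; the toy values are copied from the output in this comment, and in practice the input would be adata.layers['counts'].

```python
import numpy as np


def is_integer_valued(matrix, atol=0.0):
    """Return True if every stored value in `matrix` is a whole number.

    Hypothetical helper for illustration; it is not part of scanpy or anndata.
    """
    # scipy.sparse matrices expose `nnz`; for those, only the stored
    # (non-zero) values need checking, since implicit zeros are integers.
    values = matrix.data if hasattr(matrix, "nnz") else np.asarray(matrix)
    values = np.asarray(values, dtype=np.float64)
    return bool(np.all(np.abs(values - np.round(values)) <= atol))


# Toy stand-in for adata.layers['counts'], using values from the output above.
toy = np.array([[0.0, 1.0, 0.0], [54.67, 0.0, 93.26]], dtype=np.float32)

print(is_integer_valued(toy))      # False: the second row holds fractional values
print(is_integer_valued(toy[:1]))  # True: the first row holds whole numbers only
```

A float32 dtype alone does not show that counts are missing, since integer counts are often stored as floats. That is why the sketch compares the values themselves against their rounded versions.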

@wconnell

I noticed this about the counts too...

@lazappi
Member

lazappi commented May 12, 2023

I am not sure exactly what was uploaded to FigShare. We would need to ask @LuckyMD about that but he is unavailable for the next few months.

 OldFormatWarning: Element '/layers/counts' was written without encoding metadata.
  return {k: read_elem(v) for k, v in elem.items()}

This warning appears because the files were written with an older version of anndata, while you are using v0.8.0, which expects a different file format. Reading should be backward compatible, though, so there is no need to worry about it.
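
If the warning is too noisy, one option is to silence just that message while reading. The sketch below uses only the standard-library `warnings` module and matches on the message text shown above; the wrapper name `read_quietly` is hypothetical, and matching by message rather than by warning class is an assumption made to keep the example independent of the installed anndata version.

```python
import warnings

# Assumption: filter on the message text quoted in this thread instead of
# importing a warning class from anndata.
OLD_FORMAT_PATTERN = r".*was written without encoding metadata.*"


def read_quietly(reader, *args, **kwargs):
    """Call `reader(*args, **kwargs)` while ignoring only the old-format warning.

    Hypothetical wrapper for illustration; all other warnings still surface.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=OLD_FORMAT_PATTERN)
        return reader(*args, **kwargs)


# Intended usage (not executed here):
# adata = read_quietly(sc.read, "data/lung_atlas.h5ad")
```

The filter is scoped to the `with` block, so it does not hide the same warning elsewhere in a session.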

@LuckyMD
Collaborator

LuckyMD commented May 16, 2023

Hi @wconnell and @genecell,

Sorry for the late reply here. The reason not all of these are integers is that we use TPMs as "raw counts" for full-length data without UMIs. I believe this is mentioned in the methods section of the paper as well. In the immune dataset, the Villani data were measured using Smart-seq2. We don't have raw read counts for this dataset, but instead use TPMs which are already gene length corrected after alignment. I hope that clarifies things.
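
To illustrate why gene-length-corrected TPM values are generally not integers, here is a minimal sketch of the standard TPM definition applied to made-up read counts. The function name, the toy counts, and the gene lengths are all hypothetical, and this is not necessarily how the Villani data were processed.

```python
import numpy as np


def tpm_from_counts(counts, gene_lengths_bp):
    """Convert a cells x genes read-count matrix to transcripts per million.

    Hypothetical helper following the textbook TPM definition.
    """
    counts = np.asarray(counts, dtype=np.float64)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=np.float64) / 1_000.0
    rate = counts / lengths_kb                         # reads per kilobase
    per_cell_total = rate.sum(axis=1, keepdims=True)   # one total per cell
    per_cell_total[per_cell_total == 0] = 1.0          # avoid 0/0 for empty cells
    return rate / per_cell_total * 1e6


# Made-up integer read counts (2 cells x 3 genes) and gene lengths in base pairs.
counts = np.array([[10, 0, 5], [3, 3, 3]])
lengths = np.array([2000, 1000, 500])

tpm = tpm_from_counts(counts, lengths)
print(tpm.sum(axis=1))  # each cell sums to one million
print(tpm.round(2))     # values are fractional even though the inputs were integers
```

Because each value is divided by a gene length and by a per-cell total, the original integer read counts cannot be recovered from TPMs alone.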

@LuckyMD
Collaborator

LuckyMD commented May 16, 2023

Lung data should have integer counts though afaik, as there are no full length data in that task... did you find this issue also for the lung data?

@wconnell

Thank you for clarifying @LuckyMD; I found the detail in the Sup Info that Villani was excluded from scran norm b/c only TPM was provided.

I'm not sure about the lung data.

@genecell
Author

Hi @lazappi @LuckyMD @wconnell, thank you for your responses! Yes, the lung data did not have integer counts either. I imported the data via:

adata = sc.read(
    "data/lung_atlas.h5ad",
    backup_url="https://figshare.com/ndownloader/files/24539942",
)

Integer counts are important because some methods rely on raw count data.
Thank you very much!

Best regards,
Min
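
To narrow down which parts of a dataset are affected, a per-batch check may help. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the matrix is a small dense toy array (a sparse layer would need `.toarray()` first), the batch name "10X" is invented, and the labels are assumed to come from a column such as adata.obs['batch'], which is not confirmed in this thread.

```python
import numpy as np


def integer_status_by_batch(matrix, batch_labels):
    """Map each batch label to True if all of its cells hold whole numbers only.

    Hypothetical helper for illustration; expects a dense cells x genes array.
    """
    matrix = np.asarray(matrix, dtype=np.float64)
    batch_labels = np.asarray(batch_labels)
    # A cell is integer-valued if every one of its entries is a whole number.
    cell_is_integer = np.all(matrix == np.round(matrix), axis=1)
    return {
        str(batch): bool(cell_is_integer[batch_labels == batch].all())
        for batch in np.unique(batch_labels)
    }


# Toy data: two cells with whole-number counts, two with TPM-like values.
toy_counts = np.array(
    [[0, 1, 0], [2, 0, 3], [54.67, 0, 93.26], [14.62, 0, 84.9]]
)
toy_batches = np.array(["10X", "10X", "Villani", "Villani"])  # "10X" is a made-up label

print(integer_status_by_batch(toy_counts, toy_batches))
# {'10X': True, 'Villani': False}
```

Running a check like this on the lung file would show whether the non-integer values are confined to particular batches or spread across the whole dataset.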
