Use merged object for Ewing sarcoma samples to assign cell types #1017

allyhawkins · 2025-01-31T22:24:15Z

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

Describe the goals of the changes to the analysis module.

Before we spend a lot of time manually validating annotations in each sample (via #1003), I think it would be useful to explore using the merged object to finalize annotations. Some samples are heterogeneous and contain both tumor cells and a set of normal cells, while most samples are almost exclusively tumor. The homogeneous samples are difficult to work with because their cells all show high expression of tumor markers. If we look at all cells together, we would expect to see a more obvious distinction between tumor and normal cell types. In particular, I would like to run AUCell on the merged object with the gene sets that we have identified from MSigDB and look at that output alongside both the consensus cell types and the cell types obtained by running SingleR with the tumor cell reference.

One caveat is that we don't want to integrate the samples; we will work with the merged, uncorrected data, so we may see some technical differences between samples. We should pay close attention to the normal cells in this case, since I expect those to be more similar to each other across samples than tumor cells are.

What will your pull request contain?

This is going to be broken into a few PRs.

The first PR will add a step to the script that currently runs AUCell on each individual object, so that it also runs AUCell on the merged object. The AUCell vignette notes that rankings can be combined after running independently, but that it's best to be sure the same genes are being considered and to proceed with caution. Because of this, I think we should just re-run on the merged object rather than use existing results.
The second PR will be a script to read in the merged object, all SingleR results, all consensus cell types, and all AUC results. The output will be a TSV file with UMAP embeddings, cell type annotations, and AUC scores for each cell across all datasets. This can then be used as input to a notebook rather than having to work with the full merged object.
The last PR will be an exploratory notebook that looks at the cell type assignments and AUC values in the merged object. I plan to base this notebook on the guide notebook that I've been developing as part of #993 (Write a template notebook to use for "finalizing" cell type annotations for Ewing samples). The goal is to validate the assignments and adjust any of them based on findings in this notebook.
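
As a rough illustration of the table described for the second PR, here is a minimal Python sketch that joins per-cell tables into a single TSV. It is not the module's script (the module works with R tools such as AUCell and SingleR). The column names (`library_id`, `barcodes`, `singler_annotation`, `auc_gene_set_1`), the helper functions, and all values are hypothetical placeholders. The sketch assumes cells are keyed on library plus barcode, since the same barcode can appear in more than one library once samples are merged.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_by_cell(base_rows, *other_tables, key=("library_id", "barcodes")):
    """Left-join annotation tables onto base_rows using a composite cell key."""
    merged = {tuple(row[k] for k in key): dict(row) for row in base_rows}
    for table in other_tables:
        for row in table:
            cell = tuple(row[k] for k in key)
            if cell in merged:  # ignore cells that are absent from the base table
                merged[cell].update(row)
    return list(merged.values())


def write_tsv(rows, columns):
    """Serialize rows to TSV text, writing NA for any missing value."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        delimiter="\t",
        restval="NA",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Toy placeholder inputs: the same barcode occurs in two different libraries.
umap = read_tsv(
    "library_id\tbarcodes\tUMAP1\tUMAP2\n"
    "libA\tAAAC\t0.1\t1.2\n"
    "libB\tAAAC\t-3.0\t0.4\n"
)
singler = read_tsv(
    "library_id\tbarcodes\tsingler_annotation\n"
    "libA\tAAAC\ttumor\n"
    "libB\tAAAC\tendothelial cell\n"
)
auc = read_tsv(
    "library_id\tbarcodes\tauc_gene_set_1\n"
    "libA\tAAAC\t0.42\n"
)

rows = join_by_cell(umap, singler, auc)
columns = [
    "library_id", "barcodes", "UMAP1", "UMAP2",
    "singler_annotation", "auc_gene_set_1",
]
table = write_tsv(rows, columns)
print(table)
```

Keying on the barcode alone would collapse the two toy cells into one row, which is why the sketch uses a composite key. Cells with no AUC score are kept and written as `NA` rather than dropped.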

Will you require additional software beyond what is already in the analysis module?

No, we should have everything, but we will need to use the merged object from OpenScPCA-nf.

Will you require different computational resources beyond what the analysis module already uses?

No

If known, when do you expect to file the pull request?

Hoping to have these items done by the middle of next sprint (2/14).

allyhawkins added the analysis label Jan 31, 2025

allyhawkins self-assigned this Feb 7, 2025

allyhawkins closed this as completed in #1027 Feb 11, 2025
