Refine tumor and normal cell annotations across all samples in SCPCP000015 #1027

allyhawkins · 2025-02-05T19:52:43Z

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

What is the goal of this pull request?

The goal of this PR is to compile the cell type annotations obtained from SingleR (using the aucell-singler-annotation.sh workflow), the consensus cell types, and the AUC values from running AUCell on a set of Ewing specific gene sets for all samples in SCPCP000015. As a result of this we are able to assign cell types across all cells and output both a rendered notebook summarizing the cell type annotations for the entire dataset and a TSV file with all assigned cell types.

Briefly describe the general approach you took to achieve this goal.

I started by copying over the template notebook that we had started creating (celltype-exploration.Rmd) as a new exploratory notebook. Much of the set up and first sections that just explore the results are copied from that template notebook. The main difference here is that we are now working with the merged object rather than a single object. I did adjust some of the setup to account for working with the merged object, including:
- Reading in the consensus cell type results (output by running assign-consensus-celltype.sh from the cell-type-consensus module) for all samples.
- Reading in the SingleR results for all samples
- Removing the clustering results since they are meaningless here
- Reading in the AUCell results from running AUCell on the merged object
I showed summaries for both sets of cell type annotations: those obtained with SingleR and the consensus cell types. For the AUCell results and custom gene sets, I looked at the expression across both sets of annotations. Generally the set of cell types is similar, except for the presence of chondrocytes in the SingleR annotations, which line up with EWS-FLI1 low cells.
Looking at those results helped me refine which group of cells should be tumor cells and how to divvy those up into ews-high, ews-low, and ews-high-proliferative. So using the cell type assignments from both methods and some AUC cutoffs for AUCell, I pull out each of those tumor cell states. Generally most cells are ews-high, which is exactly what I would expect!
The last section includes some validation plots to summarize these refined annotations. To do this, I created a new reference file, combined-validation-markers.tsv, that has a few genes for each of the expected cell types. The tumor genes are all pulled from previous marker genes we had in our lists. I added MRC1 for macrophages and CD3D and CD3E for T cells. I then looked at the mean expression of these genes across all cells in each cell type in a dot plot and a heatmap. And then finally a UMAP with our "final.final" annotations for all cells!
I exported a TSV file with the SingleR annotations, consensus annotations, and final annotations, including ontology assignments for all non-tumor cells.

Just a side note that I altered some plot colors while I was here and veered away from our usual viridis coloring scheme so that I could see some of the variation more easily. Let me know if you hate it and I can change it back.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Depending on reviews, I am anticipating this to be the last analysis PR for this module for now.
My hope is that this helps "finalize" our annotations across all samples and gets this module to a stopping point until we want to do any further cell state classification.

I'm envisioning the next PRs to be clean-up PRs to make sure all the code needed to actually produce these annotations is well-organized and all documentation is updated.

Results

What is the name of your results bucket on S3?

s3://researcher-211125375652-us-east-2/cell-type-ewings/results/final-annotations/SCPCP000015_celltype-annotations.tsv.gz

What types of results does your code produce (e.g., table, figure)?

TSV file with all annotations and a rendered notebook summarizing annotations:

08-merged-celltypes.nb.html.zip

What is your summary of the results?

I included a lot of commentary in the notebook about the results, but generally I think we are able to pull out tumor cells from normal cells. There is some detection of tumor cell markers across all cell types, but it is much lower in the normal cells. We also see pretty much no expression of the normal cell markers in the tumor cells.

Within tumor cells, we're able to pull out EWS-low, EWS-high, and EWS-proliferative. The EWS-low show pretty strong expression of the expected marker genes and the proliferative genes are exclusively expressed in the proliferative subset.

The only questionable finding is that the "mature T cells" show very low expression of CD3... But expression of that gene is generally pretty low, so not totally sure what to think about that. The mature T cell comes from the consensus cell type, so for now I'm good to leave it 🤷‍♀️

There are a handful of cell types that have 1-8 cells and I did not dive deeper into those to validate them. These are cell types assigned by the consensus labels, so I feel comfortable leaving them alone.

Provide directions for reviewers

What are the software and computational requirements needed to be able to run the code in this PR?

Originally I thought working with the merged object would be an issue, but I was able to run this notebook in like a minute on my laptop without any issues.

Is there anything that you want to discuss further?

Are there any decisions that I made that you need more clarification on? Or do you disagree?
Apologies for the larger PR here, but I felt it was more important to get the whole picture together rather than break it up for this particular scenario. I'm sure there are plots we could cut out, but I wanted to include them all at first just so you could see all the data as a reviewer.
One caveat is that we are looking at gene expression in non batch-corrected data, but I think that's okay.
I plan to make more thorough documentation updates on how we obtained these annotations in a later PR.

Author checklists

Analysis module and review

This analysis module uses the analysis template and has the expected directory structure.
The analysis module README.md has been updated to reflect code changes in this pull request.
The analytical code is documented and contains comments.
Any results and/or plots this code produces have been added to your S3 bucket for review.

Reproducibility checklist

Code in this pull request has been added to the GitHub Action workflow that runs this module.
The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.


sjspielman · 2025-02-05T20:13:34Z

> The only questionable finding is that the "mature T cells" show very low expression of CD3... But expression of that gene is generally pretty low, so not totally sure what to think about that. The mature T cell comes from the consensus cell type, so for now I'm good to leave it 🤷‍♀️

About to actually look at the notebook itself, but this does seem weird. Follow-up question: Is there a chance this may be expected to a degree? In mature T cells we expect concentrations of CD3 to be lower in the cytoplasm since it's all getting shuttled to the cell surface, and we are working with single-nuclei data so maybe we'd be biased against some of those? Of course it's the protein, not the transcript, that's getting transported, so this might be a real stretch and something else is going on.....

allyhawkins · 2025-02-05T20:20:08Z

> > The only questionable finding is that the "mature T cells" show very low expression of CD3... But expression of that gene is generally pretty low, so not totally sure what to think about that. The mature T cell comes from the consensus cell type, so for now I'm good to leave it 🤷‍♀️
>
> About to actually look at the notebook itself, but this does seem weird. Follow-up question: Is there a chance this may be expected to a degree? In mature T cells we expect concentrations of CD3 to be lower in the cytoplasm since it's all getting shuttled to the cell surface, and we are working with single-nuclei data so maybe we'd be biased against some of those? Of course it's the protein, not the transcript, that's getting transported, so this might be a real stretch and something else is going on.....

I looked at it again after posting this and I do see some expression of CD3E indicated in the heatmap, it's just really low. For the dotplot I set a cutoff of the genes being expressed in at least 10% of cells, so that's probably why there is no dot for that group of cells. It's only expressed in a low percentage of cells... Those same cells do express PTPRC, which should be in all immune cells, and do not express MRC1, so they are immune but not macrophages. I think we could dig deeper into the T cells specifically, but this notebook was already getting long, so maybe one more PR after this?
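To illustrate why a group can show expression in the heatmap but have no dot, here is a minimal sketch of a 10% "percent expressed" cutoff. The data frame, its column names, and all values are made up for illustration and are not taken from the notebook.

```r
# Toy logcounts for a single gene across two cell types (hypothetical values)
expr_df <- data.frame(
  cell_type = c(rep("mature T cell", 20), rep("macrophage", 10)),
  logcounts = c(rep(0, 19), 1.4, rep(0, 4), rep(1.1, 6))
)

# Percent of cells in each cell type with non-zero expression of the gene
percent_expressed <- tapply(expr_df$logcounts > 0, expr_df$cell_type, mean) * 100

# Only groups at or above the 10% cutoff would get a dot in the dot plot
dotplot_groups <- names(percent_expressed)[percent_expressed >= 10]
```

In this toy example the gene is detected in only 1 of 20 "mature T cell" cells, so that group drops out of the dot plot even though its mean expression is not zero.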

sjspielman · 2025-02-05T20:20:22Z

The Docker failure here is parallelDist so maybe this didn't get the job done? #1019

2025-02-05T20:16:16.2800159Z #16 1075.7 - Installing parallelDist ...                   FAILED
2025-02-05T20:16:16.4444954Z #16 1075.9 /usr/local/lib/R/bin/R --vanilla -s -f '/tmp/RtmpTlfV66/renv-install-73c78932c.R'
2025-02-05T20:16:16.4445835Z #16 1075.9 ================================================================================
2025-02-05T20:16:16.4446270Z #16 1075.9 
2025-02-05T20:16:16.4446674Z #16 1075.9 Error in dyn.load(file, DLLpath = DLLpath, ...) : 
2025-02-05T20:16:16.4447661Z #16 1075.9   unable to load shared object '/usr/local/lib/R/site-library/.renv/1/parallelDist/libs/parallelDist.so':
2025-02-05T20:16:16.4449440Z #16 1075.9   /usr/local/lib/R/site-library/.renv/1/parallelDist/libs/parallelDist.so: undefined symbol: _ZN12RcppParallel14tbbParallelForEmmRNS_6WorkerEmi
2025-02-05T20:16:16.4450457Z #16 1075.9 Calls: loadNamespace -> library.dynam -> dyn.load
2025-02-05T20:16:16.4450964Z #16 1075.9 Execution halted
2025-02-05T20:16:16.4451320Z #16 1075.9 
2025-02-05T20:16:16.4451739Z #16 1075.9 Error: error testing if 'parallelDist' can be loaded [error code 1]

This reverts commit 6697fa9.

sjspielman · 2025-02-05T20:21:11Z

> I think we could dig deeper into the T cells specifically, but this notebook was already getting long, so maybe one more PR after this?

Oh definitely separate PR if we want to dig into these more.

allyhawkins · 2025-02-05T20:22:13Z

> The Docker failure here is parallelDist so maybe this didn't get the job done? #1019


It did, I just didn't run renv::restore() before running renv::snapshot() and accidentally overwrote those changes. I just fixed it in the most recent commit.

sjspielman

I've left some initial comments on code and the notebook, but FYI I did not review the whole notebook yet since I got a bit lost navigating all the moving parts and wanted to make sure that I am interpreting the notebook correctly before I keep going. The context I'm missing may not be much though, maybe only 1-2 sentences! Overall though the notebook looks good, and you made a lovely dot plot (review of that code forthcoming)!!

analyses/cell-type-ewings/references/combined-validation-markers.tsv

analyses/cell-type-ewings/template_notebooks/utils/plotting-functions.R

sjspielman · 2025-02-05T20:32:16Z

analyses/cell-type-ewings/template_notebooks/utils/setup-functions.R

+    umap_df <- umap_df |> 
+      dplyr::left_join(cluster_df, by = join_columns)


Rather than joining in each if, you could also build up a list of data frames as you go and do something like this in the end -

purrr::reduce(list_of_dfs, dplyr::left_join, by = join_columns)

very small comment though, not a big deal

tbh not sure if this code will get made if that list is empty though...
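A minimal sketch of the `purrr::reduce()` idea with toy data frames. The names `singler_df`, `aucell_df`, and `join_all()` and all values are hypothetical, not the module's actual objects. Passing the starting data frame as `.init` means an empty list simply returns it unchanged; as far as I know, `purrr::reduce()` raises an error on an empty list when no `.init` is supplied.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the data frames built up in the setup function
join_columns <- c("barcodes", "library_id")
umap_df <- tibble::tibble(
  barcodes = c("AAA", "CCC", "GGG"),
  library_id = c("lib1", "lib1", "lib2"),
  UMAP1 = c(0.1, 0.5, -1.2)
)
singler_df <- tibble::tibble(
  barcodes = c("AAA", "CCC", "GGG"),
  library_id = c("lib1", "lib1", "lib2"),
  singler_annotation = c("tumor", "tumor", "macrophage")
)
aucell_df <- tibble::tibble(
  barcodes = c("AAA", "GGG"),
  library_id = c("lib1", "lib2"),
  auc_example = c(0.21, 0.02)
)

# Left join every data frame in the list onto base_df, one after another.
# With .init supplied, an empty list returns base_df untouched.
join_all <- function(base_df, list_of_dfs, by) {
  purrr::reduce(
    list_of_dfs,
    \(left, right) dplyr::left_join(left, right, by = by),
    .init = base_df
  )
}

combined_df <- join_all(umap_df, list(singler_df, aucell_df), join_columns)
unchanged_df <- join_all(umap_df, list(), join_columns)
```

Each `if` branch would then only need to append its data frame to the list, with a single join step at the end.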

sjspielman · 2025-02-05T20:35:10Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+# output file to save final annotations 
+results_dir <- file.path(module_base, "results", "final-annotations")
+fs::dir_create(results_dir)
+output_file <- file.path(results_dir, glue::glue("SCPCP000015_celltype-annotations.tsv.gz"))


nothing is being glued here? did you want to glue anything or is this left over from when we were going sample-by-sample?

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

sjspielman · 2025-02-05T20:49:41Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+```{r, fig.height=10}
+auc_columns <- colnames(all_info_df)[which(startsWith(colnames(all_info_df), "auc_"))]
+cluster_density_plot(all_info_df, auc_columns, "consensus_lumped", "AUC")
+```
+
+Here we look at the AUC values for each gene set across all `SingleR` cell types.  
+
+```{r, fig.height=10}
+cluster_density_plot(all_info_df, auc_columns, "singler_lumped", "AUC")
+```


These are quite hard to read because of size. I'd make the chunks taller for sure, and I'm not sure they need to be quite so wide. There's a chance you can get three on a row, but I'm not sure if you can tease out all patterns that way. Regardless, they definitely need to be taller, especially since they are juxtaposed with a nice big heatmap (which is good imo!).

sjspielman · 2025-02-05T20:52:11Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+The following input is used: 
+
+- Annotations obtained by running `SingleR` with tumor cells as a reference output by `aucell-singler-annotation.sh`. 
+- Consensus cell type annotations output from the `cell-type-consensus` module. 
+- AUC values as calculated by `AUCell` for a set of Ewing sarcoma specific gene sets in MSigDB output by `run-aucell-ews-signatures.sh`. 


Maybe I missed it somewhere here, but I'd also add to the introduction the specific way in which each of these contributes to the final annotations (and if my understanding here is wrong, that's another good reason to include those details!) -

SingleR --> tumor cells

gene sets --> tumor cell states

Consensus cell types --> normal cells

sjspielman · 2025-02-05T20:53:50Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+```{r, fig.height=7, warning=FALSE}
+mean_exp_columns <- colnames(all_info_df)[which(endsWith(colnames(all_info_df), "_mean"))]
+cluster_density_plot(all_info_df, mean_exp_columns, annotation_column = "consensus_lumped", "mean gene expression")
+```


same plot size comment - more taller more better

sjspielman · 2025-02-05T20:55:54Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+cluster_density_plot(all_info_df, auc_columns, "singler_lumped", "AUC")
+```
+
+The heatmap below shows the AUC values for all cells and all gene sets and the final refined cell type annotations. 


I'm a little confused by this wording - aren't you doing refining in the bottom section? These are the consensus cell types right?

Actually, let's make that a bit clearer throughout - for the two heatmaps with the cell_type label, can this say which cell type? There's a lot of moving parts so it helps to be reminded that they are consensus!

sjspielman · 2025-02-05T21:08:38Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+
+### Conclusions based on workflow results 
+
+Looking at these results, it looks like things labeled as "tumor" by `SingleR` and "Unknown" in the consensus cell types have high AUC values for the `EWS-FLI1` upregulated gene sets. 


I think I'm going to pause and return my first review here, since I am a bit lost in the notebook and want to make sure I'm absorbing this all correctly. Specifically, I realize I don't specifically see where the notebook shows the specific relationship between cells that SingleR labeled tumor and EWS-FLI1. Plots looking at expression generally focus on consensus cell types. You do have "tumor" in the final heatmap of the above section, but I'm forgetting how (if at all?) that label related to the SingleR labels themselves.

Punchline: there are many moving parts and a little more context for what comes after the first three UMAPs would be really helpful. In other words, I felt I had a good grasp up until the density plots landed and then I started to get a little unsure that I knew what exactly was what.

allyhawkins · 2025-02-06T14:48:30Z

@sjspielman thanks for all your comments! I have some ideas to kind of revamp the beginning parts of this notebook to make things more clear. I'll let you know when it's ready for another look!

allyhawkins · 2025-02-06T18:09:49Z

@sjspielman let's try this again with hopefully a clearer notebook walking through my thought process for refining the cell types!

First I directly compare the annotations from SingleR with a tumor reference and consensus cell types. I used this to create a combined classification where any cells that are tumor in SingleR and unknown in consensus are then classified as tumor and all other cell types are from the consensus annotation.
Then we look at AUCell results for all the gene sets and all cell types to do two things:
- find gene sets that are going to be informative in defining tumor cells and tumor cell states
- see if any normal cell types show expression of tumor gene sets
I picked 4 gene sets (2 up and 2 down) that are then used to refine the cells that are considered tumor cells and label cells as high and low.
Finally, we look at the expression of proliferative marker genes to identify any cells that are tumor cells and proliferative.
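The combined classification in the first step can be sketched roughly as below. This is an illustration under assumptions: the column names and example labels are hypothetical, and the notebook's actual wrangling may differ.

```r
library(dplyr)

# Toy annotations for four cells (hypothetical column names and labels)
annotations_df <- tibble::tibble(
  barcodes = c("c1", "c2", "c3", "c4"),
  singler_annotation = c("tumor", "tumor", "macrophage", "chondrocyte"),
  consensus_annotation = c("Unknown", "endothelial cell", "macrophage", "Unknown")
)

# Cells that are tumor in SingleR AND Unknown in consensus become tumor;
# every other cell keeps its consensus annotation
combined_df <- annotations_df |>
  dplyr::mutate(
    combined_annotation = dplyr::if_else(
      singler_annotation == "tumor" & consensus_annotation == "Unknown",
      "tumor",
      consensus_annotation
    )
  )
```

Note that under this rule a cell labeled tumor by SingleR but given a normal cell type by consensus keeps the consensus label, and an Unknown cell that SingleR did not call tumor stays Unknown.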

I also added CD3G to the list of genes for T cells and we now see expression in mature T cells as expected.
Let me know what you think and hopefully this version is a lot easier to follow.
08-merged-celltypes.nb.html.zip

sjspielman

Thanks for these revisions! The notebook is much easier to follow and walks you through the annotation process really well. For this review, I left various pieces of feedback that are mostly code-oriented; I'll be honest, there's a lot of wrangling here and I trust that you have done it right! But before the final.final approval I will look a bit more carefully at it.

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

sjspielman · 2025-02-06T20:00:01Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+The first thing we will do is compare cell type annotations obtained by each method, `SingleR` with tumor cells as a reference and consensus annotations. 
+
+Let's see how similar these annotations are and see if we can create a combined annotation. 
+Note that `SingleR` will label tumor cells but the consensus cell types only label normal cells. 
+Consensus cell types are observed when cell types from `SingleR` and `CellAssign` (using only normal tissue references) share a common ancestor. 
+If no consensus is found, the cells are labeled with "Unknown". 


I would add specific links to where these cell types were annotated for most inquiring minds out there:

- the singler directory in this analysis module
- probably the openscpca-nf module (not the one in this repo) for consensus since that actually generated the cell types

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

sjspielman · 2025-02-06T20:48:32Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+- EWS-high proliferative cells show strong expression of proliferation markers. 
+- Tumor EWS low cells show pretty strong expression of the `EWS-low` markers. 
+These markers are also present in fibroblasts, but to a lesser degree. 
+Additionally, `TNC` (which has been published to be important in the metastatic phenotype in Ewing cells) is specific to the low cells and not found in the fibroblasts, which makes me more confident that those are indeed tumor cells. 


doi maybe if it's handy?

sjspielman · 2025-02-06T20:48:57Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+These markers are also present in fibroblasts, but to a lesser degree. 
+Additionally, `TNC` (which has been published to be important in the metastatic phenotype in Ewing cells) is specific to the low cells and not found in the fibroblasts, which makes me more confident that those are indeed tumor cells. 
+- Endothelial cells show strong expression of `PECAM1` and `VWF`, while that is not seen in most of the other cells. 
+- All immune cell types show `PTPRC` as expected with macrophages also showing `MRC1` and T cells showing expression of `CD3G`


yay gamma subunit!

sjspielman · 2025-02-06T20:50:13Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+
+
+And finally we'll look at our annotations on a UMAP! 
+Because, why not. 


sjspielman · 2025-02-06T20:53:20Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+# create annotation for heatmap
+annotation <- ComplexHeatmap::columnAnnotation(
+  marker_gene_category = validation_markers_df$cell_type,
+  col = list(
+    marker_gene_category = category_colors
+  )
+)
+
+ComplexHeatmap::Heatmap(


can we get the legend ordered in the same way as the annotation bar?

Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>


allyhawkins · 2025-02-07T18:58:42Z

@sjspielman I believe I have addressed all your comments and updated plots/tables accordingly! Here's the updated notebook:

08-merged-celltypes.nb.html.zip

I'll address some of your inline comments below, but I also made the following changes:

- Updated the dotplot to include cell numbers in the axes label and used oob=squish, how fun! The legend is also in the correct order now.
- The final heatmap is also updated to have the legend in the proper order.
- I also added a few genes for T cells and macrophages based on the publication that came out this week using spatial transcriptomics on Ewing samples. They included a table with genes they used to classify different cell types so I took some that were expressed in our data.

> This heatmap is a really helpful addition, thanks!
> One thing I'm wondering that might be a bad idea is to only show rows (aka SingleR annotations) which are non-0 in the heatmap to make it easier to see where values line up. But, are any actually 0 or are they just too faint to see?

So I checked and none of them are actually 0 but they are very very low. I also don't think we get very much from these minor cell types, so I decided to only look at cell types in SingleR that have at least 50 cells. I also looked at different cutoffs between 5-50 and you don't really lose anything in the visualization since those other cell numbers are so small, everything is just white. I think with 50 you get the idea that there's agreement, which is what we want to see.
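As a rough illustration of the 50-cell filter described above, assuming a plain character vector of per-cell labels (the labels and counts here are invented):

```r
# Toy SingleR labels, one per cell (hypothetical counts)
singler_annotation <- c(
  rep("tumor", 120),
  rep("macrophage", 60),
  rep("chondrocyte", 8),
  rep("neuron", 2)
)

# Count cells per label and keep only labels with at least 50 cells
min_cells <- 50
cell_counts <- table(singler_annotation)
keep_types <- names(cell_counts)[cell_counts >= min_cells]

# Restrict to cells from the retained labels before building the heatmap
filtered_annotation <- singler_annotation[singler_annotation %in% keep_types]
```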

> I am wondering if there's something to the bimodal shape of the "Unknown" density plot here. We see roughly the same dip in the mature T cell plot too, both around 0.05. Could this be the cutoff? What do you think?

I think you're right so I adjusted the cutoffs. I played around with the numbers until I saw the lines line up as close as possible with the dips in Unknown. Overall numbers are still about the same for context.

> For the next plot, is it possible to keep the same order as the previous plot, just replace the single tumor row with the two tumor rows? I think that makes it a bit easier to compare these to the previous ones

Order for the density plots has been updated. Note that memory T cell is no longer in the top cell types with the separation of EWS high/low.

> This makes me think, at a higher level, I wonder if we want to record information like this along with the cell types (in the final TSV) as some kind of "confidence" indicator, something like how many lines of evidence supported characterizing a given cell. I'm not immediately sure what that would look like but wanted to put this out there as something to think about. At a certain point here, we can only get so far with computational labeling and we have to pick something, so having a sense of how much evidence supports that decision seems like a helpful complement.

I don't know that at this point we can include a true confidence indicator. I think this is just part of dealing with the biology of Ewing sarcoma. They are going to resemble tumor cells, so we can do our best to separate them based on EWS-FLI1 target expression, but we may miss some. I think if we document how things were classified and the gene sets we used to do that, then that should be sufficient.

> Can we also look at a table of how many libraries each cell type appears in? With the smaller cell types, we're not going to see them even close to uniformly distributed. I think this might be informative, like if a cell type is only present in 1 library, it could be more likely to be mislabeled...but it could also certainly be real 🤷‍♀️

I added a table with this and the cell types I show in the plots are found in at least 4 libraries. Perhaps the other cell types are mis-labeled but I think digging into that is outside the scope of this notebook, so I made a note of that.
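A table like the one described could be built along these lines. This is only a sketch: `cells_df`, its column names, and the labels are hypothetical, and the notebook's own table may be constructed differently.

```r
library(dplyr)

# Toy per-cell table with a library ID and a final annotation (hypothetical)
cells_df <- tibble::tibble(
  library_id = c("lib1", "lib1", "lib2", "lib3", "lib3", "lib4"),
  final_annotation = c("tumor", "macrophage", "tumor", "tumor", "rare type", "tumor")
)

# For each cell type: total number of cells and number of distinct libraries
library_counts_df <- cells_df |>
  dplyr::group_by(final_annotation) |>
  dplyr::summarize(
    total_cells = dplyr::n(),
    n_libraries = dplyr::n_distinct(library_id)
  ) |>
  dplyr::arrange(dplyr::desc(n_libraries))

# Cell types found in at least 4 libraries, matching the threshold mentioned above
widely_found_df <- library_counts_df |>
  dplyr::filter(n_libraries >= 4)
```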

sjspielman

Looking very good! I left a few more small comments but this is almost certainly the last round!

> I added a table with this and the cell types I show in the plots are found in at least 4 libraries. Perhaps the other cell types are mis-labeled but I think digging into that is outside the scope of this notebook, so I made a note of that.

Oh 💯 I agree about scope - this would certainly be separate if pursued at all.

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

sjspielman · 2025-02-10T16:18:14Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+
+Any cells that are unable to be labeled via consensus cell types are labeled as "Unknown" and I expect these will line up with cells labeled as tumor cells by `SingleR`.  
+
+```{r, fig.height=5}


I'd just add a sentence saying the heatmap shows only >= 50

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

sjspielman · 2025-02-10T16:33:57Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+
+# get individual gene counts for all marker genes 
+gene_cts <- logcounts(sce[genes, ]) |>
+  as.matrix() |>


do you need this?

sjspielman · 2025-02-10T16:36:22Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+# get total number of cells per final annotation group 
+total_cells_df <- genes_df |> 
+  dplyr::select(barcodes, final_lumped) |> 
+  unique() |> 
+  dplyr::count(final_lumped, name = "total_cells")


Again, I don't think you need this intermediate to join in but can just tack an add_count() into your piping below where you define gene_summary_df. Using add_count() in place of the left_join() on line 561 for total_cells_df I think will get you there
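A small sketch comparing the two approaches on toy data. It assumes `genes_df` is in long format with exactly one row per cell per gene; the column names `gene` and `logcounts` and all values are hypothetical. Under that assumption, counting by both the annotation and the gene gives the number of cells per group, whereas counting by `final_lumped` alone would count rows (cells times genes) rather than cells.

```r
library(dplyr)

# Toy long-format table: one row per cell per gene (hypothetical values)
genes_df <- tibble::tibble(
  barcodes = c("c1", "c1", "c2", "c2", "c3", "c3"),
  final_lumped = c("tumor", "tumor", "tumor", "tumor", "macrophage", "macrophage"),
  gene = c("PTPRC", "MRC1", "PTPRC", "MRC1", "PTPRC", "MRC1"),
  logcounts = c(0, 0, 0.3, 0, 1.2, 2.1)
)

# Approach in the diff: build a separate count table, then join it back in
total_cells_df <- genes_df |>
  dplyr::select(barcodes, final_lumped) |>
  unique() |>
  dplyr::count(final_lumped, name = "total_cells")
joined_df <- genes_df |>
  dplyr::left_join(total_cells_df, by = "final_lumped")

# Suggested alternative: add the count directly within the pipe
counted_df <- genes_df |>
  dplyr::add_count(final_lumped, gene, name = "total_cells")
```

Both data frames end up with the same `total_cells` column here, but only because each cell appears once per gene.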

sjspielman · 2025-02-10T16:37:16Z

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

+  patchwork::plot_layout(ncol = 1, heights = c(4, 0.1)) 
+```
+
+A few notes from this plot: 


I'd state that expression in the plot is capped at 2.5 but it might actually be higher
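For context on what the cap means, here is a minimal sketch of how `scales::squish()` handles out-of-bounds values. The expression values are invented; only the 2.5 upper limit comes from the comment above.

```r
library(scales)

# Hypothetical mean expression values, some above the plotting limit
mean_expression <- c(0.0, 0.8, 2.5, 3.7, 6.1)

# squish() clamps values outside the range to the nearest limit instead of
# dropping them, so anything above 2.5 is drawn at the top of the color scale
capped_expression <- scales::squish(mean_expression, range = c(0, 2.5))
```

A ggplot2 continuous color or fill scale given `limits = c(0, 2.5)` and `oob = scales::squish` behaves the same way, which is why a point shown at 2.5 may have a true value that is higher.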

analyses/cell-type-ewings/exploratory_analysis/08-merged-celltypes.Rmd

Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>

allyhawkins · 2025-02-10T20:50:14Z

@sjspielman I addressed the remaining small comments, so this should be ready for another look.

sjspielman

🐈 🐈‍⬛ 🚀

allyhawkins · 2025-02-11T16:17:44Z

The updated results from these annotations can be found at s3://researcher-211125375652-us-east-2/cell-type-ewings/results/final-annotations/SCPCP000015_celltype-annotations.tsv.gz.

I also canceled the workflow run since no changes here are in CI.

allyhawkins added 12 commits February 3, 2025 16:28

initial addition of notebook annotating merged object

0edf474

incorporate consensus cell types

a3358a6

temp new markers file

bf45f1f

dotplot and heatmap

9902e22

tsv with combined markers for use in validation

5ba3c68

function to make single annotation heatmap

f8e211d

add ggmap

6697fa9

document new validation marker genes ref

e0644a3

document function updates

a1dc5a6

make tables prettier

83077c9

document new notebook

3b620fb

Merge remote-tracking branch 'AlexsLemonade/main' into allyhawkins/me…

1e4251e

…rged-annotations-ewings

allyhawkins requested a review from jaclyn-taroni as a code owner February 5, 2025 19:52

allyhawkins requested review from sjspielman and removed request for jaclyn-taroni February 5, 2025 19:56

Revert "add ggmap"

80659a3

This reverts commit 6697fa9.

only update ggmap and not Rcpp

035f31f

sjspielman reviewed Feb 5, 2025


allyhawkins added 3 commits February 6, 2025 12:08

add CD3G

70fc6f1

fix all of error

cb8c981

reorganize

76e3241

allyhawkins requested a review from sjspielman February 6, 2025 18:10

sjspielman reviewed Feb 6, 2025


allyhawkins and others added 4 commits February 7, 2025 11:27

Apply suggestions from code review

4a9d5c7

Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>

add new immune markers

3007338

Merge remote-tracking branch 'AlexsLemonade/main' into allyhawkins/me…

96ec58f

…rged-annotations-ewings

refine plots based on review

be6d7d5

fix some legend titles

9944f85

sjspielman reviewed Feb 10, 2025


allyhawkins and others added 2 commits February 10, 2025 11:37

Apply suggestions from code review

342a73c

Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>

address remaining review comments

5fd1b0d

allyhawkins requested a review from sjspielman February 10, 2025 20:50

allyhawkins mentioned this pull request Feb 10, 2025

Clean up Ewing module #1032

Open

sjspielman approved these changes Feb 11, 2025


Merge branch 'main' into allyhawkins/merged-annotations-ewings

89c488c

allyhawkins merged commit dc54eb1 into AlexsLemonade:main Feb 11, 2025
3 of 5 checks passed

allyhawkins deleted the allyhawkins/merged-annotations-ewings branch February 11, 2025 16:21

allyhawkins mentioned this pull request Feb 11, 2025

Annotate tumor cells in remaining samples for SCPCP000015 #563

Closed
