Explore consensus cell types in osteosarcoma samples #1028

allyhawkins · 2025-02-06T23:46:17Z

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

What is the goal of this pull request?

The goal is to look at the consensus cell types assigned across all osteosarcoma samples found in three different projects and identify any variation across samples. In particular, we want to look for samples with high and low immune infiltrate (hot and cold tumors). What I'm adding here is pretty basic and just summarizes the consensus cell types and total immune percentage. My understanding is we want to be able to make an observation about samples with different immune populations here, but any biological insight is going to be additional work outside the scope of this PR.

Briefly describe the general approach you took to achieve this goal.

First I just looked at the top cell types across all samples in a single plot, similar to the stacked plots we had seen previously. The overall cell types present are pretty similar, but the total percentage of those cell types changes for each sample.
I also looked at the top cell types separated by project. One thing to note is that project 23 generally has far fewer classified cells than either 17 or 18. It also looks like 18 has more T cells, while 17 has more macrophages. Endothelial cells and smooth muscle cells are pretty consistent across samples, provided cells are classified.
I looked at total immune percentage vs. non-immune vs. unknown. Again, there are definitely variations across samples, but I think part of that is some samples having more cells that are classified to begin with.
I arbitrarily broke up "hot" and "cold" tumors based on having at least 5% immune cells. This is totally arbitrary, but just to show that we do have different groups of tumors.
Lastly I looked at immune composition alongside some of the clinical metadata including primary/mets, disease timing, tissue location and seq unit. There may be some difference in single-cell/nuclei but sample numbers are not comparable. I could definitely do the stats to see if this is the case, but I think if we want to do any real comparisons we might want to look into methods on comparing cell composition in single-cell.
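To make the immune vs. non-immune vs. unknown summary concrete, here is a minimal sketch of how the percentages could be computed per library. The data, the `immune_types` vector, and the `"Unknown"` label are assumptions for illustration only; the notebook uses the real consensus annotations and the immune cell type reference list added in this PR.

```r
library(dplyr)

# Toy data for illustration only: one row per cell
cells_df <- data.frame(
  library_id = c("lib1", "lib1", "lib1", "lib1", "lib2", "lib2"),
  consensus_annotation = c(
    "T cell", "macrophage", "endothelial cell", "Unknown",
    "Unknown", "smooth muscle cell"
  )
)

# Hypothetical stand-in for the immune cell type reference list
immune_types <- c("T cell", "macrophage")

immune_summary_df <- cells_df |>
  dplyr::mutate(
    # assumes unclassified cells are labeled "Unknown"
    immune_category = dplyr::case_when(
      consensus_annotation == "Unknown" ~ "unknown",
      consensus_annotation %in% immune_types ~ "immune",
      TRUE ~ "non-immune"
    )
  ) |>
  # one row per library and category with the number of cells
  dplyr::count(library_id, immune_category, name = "total_cells") |>
  # convert counts to percentages within each library
  dplyr::group_by(library_id) |>
  dplyr::mutate(percent_cells = total_cells / sum(total_cells) * 100) |>
  dplyr::ungroup()
```

Because the percentages are relative to all cells in a library, a library with many unclassified cells will show a lower immune percentage even if its classified cells are mostly immune, which is the caveat raised above.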

While I was working on this I also moved the setup function that we used in the first exploratory notebook to its own R script to source in and added a list of just immune cell types as a reference.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes

Results

What types of results does your code produce (e.g., table, figure)?

Rendered HTML report here:

02-explore-consensus-results.nb.html.zip

What is your summary of the results?

Different samples have different proportions of classified cells and immune cells, and this seems to be mostly driven by project. I think the most interesting thing here (besides showing that cell type composition is different in general) is that the cell types obtained are project-specific. I'm not sure we want to highlight that... but I think there's definitely some technical variation that can impact which cell types are identified in the first place.

I think we need to do more rigorous statistical analysis to make any other real conclusions beyond that at this point.

Provide directions for reviewers

Are there particular areas you'd like reviewers to have a close look at?

Are there things missing that you want to see at this stage?
I generally don't feel great about this, but I'm not sure what else to do before diving down a rabbit hole, so I thought I would stop and get some feedback.

Note that I did look at just T cell infiltrate since that's been shown to be found in osteo samples before and I didn't see anything worth noting, but I can add that section back in if you would like to see it.

Author checklists

Analysis module and review

This analysis module uses the analysis template and has the expected directory structure.
The analysis module README.md has been updated to reflect code changes in this pull request.
The analytical code is documented and contains comments.
Any results and/or plots this code produces have been added to your S3 bucket for review.

Reproducibility checklist

Code in this pull request has been added to the GitHub Action workflow that runs this module.
The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

jaclyn-taroni

I am glad to see @jashapiro was called in for review of the code here. I'll post my thoughts soon, but I did notice one thing that I am returning now.

jaclyn-taroni · 2025-02-07T15:10:52Z

analyses/cell-type-consensus/exploratory-notebooks/utils/setup-functions.R

+  total_cells_df <- df |> 
+    dplyr::group_by(library_id) |> 
+    dplyr::summarize(
+      total_cells_per_library = length(library_id),


Would dplyr::n() be more appropriate here?
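As a small illustration of this suggestion, the sketch below compares `length(library_id)` with `dplyr::n()` on made-up data. Both give the same counts; `dplyr::n()` simply states the intent (rows per group) without depending on any particular column. `dplyr::count()` is shown as a shorter equivalent.

```r
library(dplyr)

# Toy data for illustration only: one row per cell
df <- data.frame(
  library_id = c("lib1", "lib1", "lib1", "lib2", "lib2"),
  consensus_annotation = c("T cell", "T cell", "macrophage", "T cell", "Unknown")
)

# Form used in the diff: length of a column within each group
by_length <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(total_cells_per_library = length(library_id))

# Suggested form: number of rows in each group
by_n <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(total_cells_per_library = dplyr::n())

# Shorthand that groups, counts, and ungroups in one call
by_count <- df |>
  dplyr::count(library_id, name = "total_cells_per_library")
```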

jaclyn-taroni · 2025-02-07T15:11:27Z

analyses/cell-type-consensus/exploratory-notebooks/utils/setup-functions.R

+
+  summary_df <- df |> 
+    dplyr::group_by(library_id, sample_type, consensus_annotation, consensus_ontology) |> 
+    dplyr::summarize(total_cells_per_annotation = length(consensus_annotation)) |>


Same comment about dplyr::n() if you are counting things in a group

jaclyn-taroni

To give some general scientific feedback here:

I arbitrarily broke up "hot" and "cold" tumors based on having at least 5% immune cells. This is totally arbitrary, but just to show that we do have different groups of tumors.

I would take this out. I think the fact that there is heterogeneity is pretty apparent without arbitrarily picking a threshold.

Lastly I looked at immune composition alongside some of the clinical metadata including primary/mets, disease timing, tissue location and seq unit. There may be some difference in single-cell/nuclei but sample numbers are not comparable. I could definitely do the stats to see if this is the case, but I think if we want to do any real comparisons we might want to look into methods on comparing cell composition in single-cell.

What I think we should aim for – and we can discuss this in our normal planning process – is using the labels to generate a hypothesis someone could test.

jashapiro

Overall, I think this is a pretty good exploration, so I don't think I have much in the way of major comments. One note is that the hot vs. cold (generally) is pretty significantly intertwined with project, which makes it very hard to interpret the relative contribution of things like cell vs. nucleus, as those categories don't appear to be evenly distributed between projects.

I made some coding suggestions to try to remove some of the warnings you are seeing, and to reduce joins, though the latter changes are less important.
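The specific suggestions are not shown in this thread, so the following is only a sketch of one way to get per-library totals without a join, using made-up data. It assumes the original code computed library totals in a separate summary and joined them back; setting `.groups` explicitly also avoids the grouping message that `summarize()` prints when there are multiple grouping variables.

```r
library(dplyr)

# Toy data for illustration only: one row per cell
df <- data.frame(
  library_id = c("lib1", "lib1", "lib1", "lib2", "lib2"),
  consensus_annotation = c("T cell", "T cell", "macrophage", "T cell", "Unknown")
)

# Join-based version: two summaries combined with a join
total_cells_df <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(total_cells_per_library = dplyr::n())

joined_df <- df |>
  dplyr::group_by(library_id, consensus_annotation) |>
  dplyr::summarize(total_cells_per_annotation = dplyr::n(), .groups = "drop") |>
  dplyr::left_join(total_cells_df, by = "library_id")

# Join-free version: keep the library_id grouping after summarizing,
# then sum the per-annotation counts within each library
no_join_df <- df |>
  dplyr::group_by(library_id, consensus_annotation) |>
  dplyr::summarize(total_cells_per_annotation = dplyr::n(), .groups = "drop_last") |>
  dplyr::mutate(total_cells_per_library = sum(total_cells_per_annotation)) |>
  dplyr::ungroup()
```

Both versions produce the same table on this toy input; the second just does it in a single pipeline.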

jashapiro · 2025-02-10T18:52:59Z

analyses/cell-type-consensus/exploratory-notebooks/03-osteosarcoma-consensus-celltypes.Rmd

+stacked_barchart(total_order_df, fill_color = "top_celltypes", colors = all_celltype_colors)
+```
+
+It looks like there's definitely some variation between distributions of cell types within the osteo samples. 


Suggested change

It looks like there's definitely some variation between distributions of cell types within the osteo samples.

It looks like there's definitely some variation among distributions of cell types within the osteo samples.

jashapiro · 2025-02-10T19:57:30Z

analyses/cell-type-consensus/exploratory-notebooks/03-osteosarcoma-consensus-celltypes.Rmd

+When looking at all samples together we do see variation in immune cells classified and it appears that `SCPCP000017` and `SCPCP000018` have more cells classified in general and have more cells classified as immune. 
+It does appear that libraries that have more immune cell composition also have a higher percentage of non-immune cell types which could very well be a technical artifact and related to sample prep. 
+
+Just for visualization purposes, let's classify these as "hot" and "cold", where "hot" tumors have an immune composition > 5%.


Based on @jaclyn-taroni's comment I'm not going to review this section.

jashapiro · 2025-02-10T20:27:05Z

analyses/cell-type-consensus/exploratory-notebooks/03-osteosarcoma-consensus-celltypes.Rmd

+    primary_or_metastasis = dplyr::if_else(!is.na(primary_or_metastasis), primary_or_metastasis, disease_timing),
+    disease_timing_mod = dplyr::if_else(disease_timing %in% c("Initial diagnosis", "Recurrence"), disease_timing, "other"),


I think we want to add ordering here so the plots are in expected orders. I am also not sure why we seem to end up with only one "Recurrence" sample in primary_or_metastasis? (Since it is only one, I think it could be dropped, maybe?)
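One way to control the plot order is to convert the column to a factor with explicit levels, since ggplot2 orders discrete axes and facets by factor level. The sketch below uses made-up metadata (the "Progressive" value is only there to show the "other" fallback), and the level order is an assumption about what "expected" means here.

```r
library(dplyr)

# Toy metadata for illustration only
metadata_df <- data.frame(
  library_id = c("lib1", "lib2", "lib3", "lib4"),
  disease_timing = c("Recurrence", "Initial diagnosis", "Progressive", "Initial diagnosis")
)

ordered_df <- metadata_df |>
  dplyr::mutate(
    # same collapsing logic as in the diff above
    disease_timing_mod = dplyr::if_else(
      disease_timing %in% c("Initial diagnosis", "Recurrence"),
      disease_timing,
      "other"
    ),
    # explicit levels set the order used in plots (assumed order)
    disease_timing_mod = factor(
      disease_timing_mod,
      levels = c("Initial diagnosis", "Recurrence", "other")
    )
  )
```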

jashapiro · 2025-02-10T20:32:41Z

analyses/cell-type-consensus/exploratory-notebooks/03-osteosarcoma-consensus-celltypes.Rmd

+It looks like there might be higher immune infiltrate in "bone" samples, but again I think if we want to make any conclusions we need to look at software that helps correct for technical artifacts. 
+There may also be a difference in single-cell and single-nuclei, but there are much fewer single-cell samples so it's hard to say. 


If we really want to look at this, we might well want to make a little model with all factors, but I don't think now is the time to do that.
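For reference, a "little model with all factors" could look something like the sketch below. This is not something the reviewers specified: it assumes a quasibinomial GLM on immune vs. non-immune cell counts per library, uses entirely made-up numbers and hypothetical project names, and ignores the compositional nature of cell type data that dedicated methods try to handle.

```r
# Made-up per-library counts for illustration only; not real results
model_df <- data.frame(
  project_id = rep(c("project_A", "project_B", "project_C"), each = 4),
  seq_unit = rep(c("cell", "nucleus"), times = 6),
  immune_cells = c(120, 90, 150, 110, 200, 160, 180, 140, 20, 15, 30, 10),
  total_cells = rep(1000, 12)
)

# Model the immune fraction as a function of all factors at once,
# so project and seq_unit effects are estimated jointly.
# quasibinomial allows for overdispersion between libraries.
fit <- glm(
  cbind(immune_cells, total_cells - immune_cells) ~ project_id + seq_unit,
  family = quasibinomial(),
  data = model_df
)

# Coefficient table: estimates are on the log-odds scale
coef_table <- summary(fit)$coefficients
```

With real data where project and cell vs. nucleus are confounded, the model would struggle to separate those effects, which matches the concern raised in the review.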

analyses/cell-type-consensus/exploratory-notebooks/03-osteosarcoma-consensus-celltypes.Rmd

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

allyhawkins · 2025-02-10T23:19:34Z

@jaclyn-taroni and @jashapiro thanks for your feedback. I removed the hot/cold section as requested and then made some minor plot updates and code changes based on reviews.

I think we want to add ordering here so the plots are in expected orders. I am also not sure why we seem to end up with only one "Recurrence" sample in primary_or_metastasis? (Since it is only one, I think it could be dropped, maybe?)

I'm not totally sure what you mean by expected order, but I assume you meant primary first, initial diagnosis first, etc.? So I made that change. As for the "Recurrence" sample that was showing up, that happened because for one project the primary/mets information is actually in the disease_timing column instead of a separate column. For one of those samples, the value is Recurrence instead of primary or metastasis. I updated the code to make that sample NA and then remove it prior to making the plots.
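The fix described above could be sketched roughly as follows. The data and the `"Primary"`/`"Metastasis"` labels are assumptions for illustration; the actual values in the metadata may be spelled differently.

```r
library(dplyr)

# Toy metadata for illustration only: for lib3 and lib4 the
# primary/mets information lives in disease_timing instead
metadata_df <- data.frame(
  library_id = c("lib1", "lib2", "lib3", "lib4"),
  primary_or_metastasis = c("Primary", "Metastasis", NA, NA),
  disease_timing = c("Initial diagnosis", "Initial diagnosis", "Primary", "Recurrence")
)

plot_df <- metadata_df |>
  dplyr::mutate(
    # fill in missing values from disease_timing, as in the diff
    primary_or_metastasis = dplyr::if_else(
      !is.na(primary_or_metastasis),
      primary_or_metastasis,
      disease_timing
    ),
    # anything that is not primary or metastasis (e.g., "Recurrence") becomes NA
    primary_or_metastasis = dplyr::if_else(
      primary_or_metastasis %in% c("Primary", "Metastasis"),
      primary_or_metastasis,
      NA_character_
    )
  ) |>
  # drop samples without a usable label before plotting
  dplyr::filter(!is.na(primary_or_metastasis))
```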

Other than that I left everything pretty much the same since we are looking more for a qualitative overview. Here's the updated notebook:
03-osteosarcoma-consensus-celltypes.nb.html.zip

jaclyn-taroni · 2025-02-11T14:14:05Z

I don't think I need to take another look at this, so I'm going to remove my review request.

jashapiro

This looks good to me, with just a small suggestion about the labels for the plots.

analyses/cell-type-consensus/exploratory-notebooks/03-osteosarcoma-consensus-celltypes.Rmd

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

allyhawkins · 2025-02-11T16:12:10Z

Note that I cancelled the workflow to run the module here since these changes are only exploratory notebooks and not run in CI.

allyhawkins added 8 commits February 5, 2025 20:16

create setup functions

8ea88df

list of immune cell types

b52c9e3

initiate osteo notebook

41dc126

notebook for osteo samples

48a78e9

move notebook

abf883b

add consensus immune list

cdea7f3

document notebook and utils folder

fd2223a

Merge remote-tracking branch 'AlexsLemonade/main' into allyhawkins/osteo-consensus

438ee31

allyhawkins requested a review from jaclyn-taroni as a code owner February 6, 2025 23:46

re-render

435d554

allyhawkins requested a review from jashapiro February 7, 2025 17:05

jaclyn-taroni reviewed Feb 10, 2025

jashapiro reviewed Feb 10, 2025

allyhawkins and others added 2 commits February 10, 2025 16:27

Apply suggestions from code review

5e24527

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

clean up based on review comments and remove hot/cold

1ef6d21

allyhawkins requested review from jaclyn-taroni and jashapiro February 10, 2025 23:19

jaclyn-taroni removed their request for review February 11, 2025 14:14

jashapiro approved these changes Feb 11, 2025

allyhawkins and others added 2 commits February 11, 2025 10:02

Apply suggestions from code review

6c59841

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

re-render with updated plot titles

93db0bf

allyhawkins merged commit 259df70 into AlexsLemonade:main Feb 11, 2025
1 of 3 checks passed

allyhawkins deleted the allyhawkins/osteo-consensus branch February 11, 2025 16:14
