
Explore consensus cell types in osteosarcoma samples #1028

Merged


allyhawkins
Member

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Closes #1004

What is the goal of this pull request?

The goal is to look at the consensus cell types assigned across all osteosarcoma samples found in three different projects and identify any variation across samples. In particular, we want to look for samples with high and low immune infiltrate (hot and cold tumors). What I'm adding here is pretty basic and just summarizes the consensus cell types and total immune percentage. My understanding is we want to be able to make an observation about samples with different immune populations here, but any biological insight is going to be additional work outside the scope of this PR.

Briefly describe the general approach you took to achieve this goal.

  • First I just looked at the top cell types across all samples in a single plot, similar to the stacked plots we had seen previously. The overall cell types present are pretty similar, but the total percentage of those cell types changes for each sample.
  • I also looked at the top cell types separated by project. One thing to note is that project 23 generally has far fewer classified cells than both 17 and 18. It also looks like 18 has more T cells, while 17 has more macrophages. Endothelial cells and smooth muscle cells are fairly consistent across samples in which cells are classified.
  • I looked at total immune percentage vs. non-immune vs. unknown. Again, there are definitely variations across samples, but I think part of that is because some samples have more classified cells to begin with.
  • I arbitrarily broke up "hot" and "cold" tumors based on having at least 5% immune cells. This is totally arbitrary, but just to show that we do have different groups of tumors.
  • Lastly, I looked at immune composition alongside some of the clinical metadata, including primary/metastasis status, disease timing, tissue location, and sequencing unit (cell vs. nucleus). There may be some difference between single-cell and single-nuclei samples, but the numbers of samples in each group are not comparable. I could definitely do the stats to see if this is the case, but if we want to do any real comparisons, I think we should look into methods for comparing cell composition in single-cell data.
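The immune vs. non-immune vs. unknown summary described above could be sketched roughly as follows. This is a minimal illustration, not the module's code: it assumes a cell-level data frame with `library_id` and `consensus_annotation` columns, assumes unclassified cells are labeled `"Unknown"`, and uses a hypothetical `immune_celltypes` vector in place of the real immune reference list. The toy data are made up.

```r
library(dplyr)

# Hypothetical stand-in for the immune cell type reference list
immune_celltypes <- c("T cell", "macrophage", "B cell")

# Toy cell-level data: one row per cell (illustrative only)
cells_df <- data.frame(
  library_id = c(rep("lib1", 4), rep("lib2", 4)),
  consensus_annotation = c(
    "T cell", "macrophage", "endothelial cell", "Unknown",
    "Unknown", "Unknown", "smooth muscle cell", "T cell"
  )
)

immune_summary_df <- cells_df |>
  # Assign each cell to one of three broad categories
  dplyr::mutate(
    immune_category = dplyr::case_when(
      consensus_annotation == "Unknown" ~ "unknown",
      consensus_annotation %in% immune_celltypes ~ "immune",
      TRUE ~ "non-immune"
    )
  ) |>
  # Count cells in each library/category combination
  dplyr::count(library_id, immune_category, name = "n_cells") |>
  # Convert counts to percentages within each library
  dplyr::group_by(library_id) |>
  dplyr::mutate(percent = 100 * n_cells / sum(n_cells)) |>
  dplyr::ungroup()

immune_summary_df
```

Computing percentages within each library, rather than across all cells, keeps libraries with many more cells from dominating the comparison.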

While I was working on this, I also moved the setup function that we used in the first exploratory notebook to its own R script to source in, and added a list of just immune cell types as a reference.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes

Results

What types of results does your code produce (e.g., table, figure)?

Rendered HTML report here:

02-explore-consensus-results.nb.html.zip

What is your summary of the results?

Different samples have different proportions of classified cells and immune cells, and this seems to be mostly driven by project. I think the most interesting thing here (besides showing that cell type composition differs in general) is that the cell types obtained are project-specific. I'm not sure we want to highlight that, but I think there's definitely some technical variation that can affect which cell types are identified in the first place.

I think we need to do more rigorous statistical analysis to make any other real conclusions beyond that at this point.

Provide directions for reviewers

Are there particular areas you'd like reviewers to have a close look at?

Are there things missing that you want to see at this stage?
I generally don't feel great about this, but I'm not sure what else to do before diving down a rabbit hole, so I thought I would stop and get some feedback.

Note that I did look at just T cell infiltrate, since T cells have been found in osteo samples before, and I didn't see anything worth noting, but I can add that section back in if you would like to see it.

Author checklists

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

allyhawkins requested a review from jashapiro on February 7, 2025 17:05
Member

jaclyn-taroni left a comment


I am glad to see @jashapiro was called in for review of the code here. I'll post my thoughts soon, but I did notice one thing that I am returning now.

total_cells_df <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(
    total_cells_per_library = length(library_id),
Member


Would dplyr::n() be more appropriate here?
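As a sketch of that suggestion, assuming `df` holds one row per cell (the toy `df` below is made up), `dplyr::n()` counts the rows in each group directly instead of relying on `length()` of a particular column:

```r
library(dplyr)

# Toy cell-level data: one row per cell (illustrative only)
df <- data.frame(library_id = c("lib1", "lib1", "lib2"))

total_cells_df <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(
    # n() returns the number of rows in the current group,
    # so the count does not depend on which column is passed in
    total_cells_per_library = dplyr::n()
  )

total_cells_df
```

`dplyr::count(df, library_id, name = "total_cells_per_library")` is an equivalent one-liner.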


summary_df <- df |>
  dplyr::group_by(library_id, sample_type, consensus_annotation, consensus_ontology) |>
  dplyr::summarize(total_cells_per_annotation = length(consensus_annotation)) |>
Member


Same comment about dplyr::n() if you are counting things in a group
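The same idea applies to the multi-column grouping. A minimal sketch with made-up data (the `sample_type` value and ontology IDs are illustrative). The `.groups = "drop"` argument is an optional addition, not part of the original snippet: it returns an ungrouped result and avoids the "has grouped output" message `summarize()` prints when grouping by several columns. `dplyr::count()` is shown as an equivalent shorthand.

```r
library(dplyr)

# Toy cell-level data: one row per cell (illustrative only)
df <- data.frame(
  library_id = c("lib1", "lib1", "lib1", "lib2"),
  sample_type = "patient-derived",  # hypothetical value, recycled to every row
  consensus_annotation = c("T cell", "T cell", "Unknown", "T cell"),
  consensus_ontology = c("CL:0000084", "CL:0000084", NA, "CL:0000084")
)

summary_df <- df |>
  dplyr::group_by(library_id, sample_type, consensus_annotation, consensus_ontology) |>
  # n() counts the cells in each library/annotation group;
  # .groups = "drop" returns an ungrouped result
  dplyr::summarize(total_cells_per_annotation = dplyr::n(), .groups = "drop")

# Equivalent shorthand: count() groups, counts, and ungroups in one step
summary_count_df <- df |>
  dplyr::count(
    library_id, sample_type, consensus_annotation, consensus_ontology,
    name = "total_cells_per_annotation"
  )
```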

Member

jaclyn-taroni left a comment


To give some general scientific feedback here:

I arbitrarily broke up "hot" and "cold" tumors based on having at least 5% immune cells. This is totally arbitrary, but just to show that we do have different groups of tumors.

I would take this out. I think the fact that there is heterogeneity is pretty apparent without arbitrarily picking a threshold.

Lastly I looked at immune composition alongside some of the clinical metadata including primary/mets, disease timing, tissue location and seq unit. There may be some difference in single-cell/nuclei but sample numbers are not comparable. I could definitely do the stats to see if this is the case, but I think if we want to do any real comparisons we might want to look into methods on comparing cell composition in single-cell.

What I think we should aim for – and we can discuss this in our normal planning process – is using the labels to generate a hypothesis someone could test.

Member

jashapiro left a comment


Overall, I think this is a pretty good exploration, so I don't think I have much in the way of major comments. One note is that the hot vs. cold (generally) is pretty significantly intertwined with project, which makes it very hard to interpret the relative contribution of things like cell vs. nucleus, as those categories don't appear to be evenly distributed between projects.

I made some coding suggestions to try to remove some of the warnings you are seeing, and to reduce joins, though the latter changes are less important.

stacked_barchart(total_order_df, fill_color = "top_celltypes", colors = all_celltype_colors)

It looks like there's definitely some variation between distributions of cell types within the osteo samples.
Member


Suggested change:
Original: It looks like there's definitely some variation between distributions of cell types within the osteo samples.
Suggested: It looks like there's definitely some variation among distributions of cell types within the osteo samples.

When looking at all samples together we do see variation in immune cells classified and it appears that `SCPCP000017` and `SCPCP000018` have more cells classified in general and have more cells classified as immune.
It does appear that libraries that have more immune cell composition also have a higher percentage of non-immune cell types which could very well be a technical artifact and related to sample prep.

Just for visualization purposes, let's classify these as "hot" and "cold", where "hot" tumors have an immune composition > 5%.
Member


Based on @jaclyn-taroni's comment, I'm not going to review this section.

Comment on lines 349 to 350
primary_or_metastasis = dplyr::if_else(!is.na(primary_or_metastasis), primary_or_metastasis, disease_timing),
disease_timing_mod = dplyr::if_else(disease_timing %in% c("Initial diagnosis", "Recurrence"), disease_timing, "other"),
Member


I think we want to add ordering here so the plots are in the expected orders. I also am not sure why we seem to end up with only one "Recurrence" sample in the primary_or_metastasis column? (Since it is only one, I think it could be dropped, maybe?)

Comment on lines +415 to +416
It looks like there might be higher immune infiltrate in "bone" samples, but again, if we want to make any conclusions, I think we need to look at software that helps correct for technical artifacts.
There may also be a difference between single-cell and single-nuclei samples, but there are far fewer single-cell samples, so it's hard to say.
Member


If we really want to look at this, we might well want to make a little model with all factors, but I don't think now is the time to do that.

allyhawkins and others added 2 commits February 10, 2025 16:27
@allyhawkins
Member Author

@jaclyn-taroni and @jashapiro thanks for your feedback. I removed the hot/cold section as requested and then made some minor plot updates and code changes based on reviews.

I think we want to add ordering here so the plots are in expected orders. I also am not sure why we seem end up with only "Recurrence" sample in the primary_or_metastasis? (Since it is only one, I think it could be dropped, maybe?)

I'm not totally sure what you mean by expected order, but I assume you meant primary first, initial diagnosis first, etc., so I made that change. As for the "Recurrence" sample: for one project, the primary/metastasis information is actually in the disease_timing column rather than a separate column, and for one of those samples the value is Recurrence instead of primary or metastasis. I updated the code to set that sample to NA and remove it before making the plots.
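A minimal sketch of the reordering and `NA` handling described above, assuming the `primary_or_metastasis` and `disease_timing` columns from the snippet under review. The `"Primary"`/`"Metastasis"` labels, the `sample_id` column, and the toy rows are hypothetical, and the module's actual code may differ.

```r
library(dplyr)

# Toy sample metadata (hypothetical labels and rows)
metadata_df <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4"),
  primary_or_metastasis = c("Metastasis", NA, "Primary", NA),
  disease_timing = c("Initial diagnosis", "Primary", "Recurrence", "Recurrence")
)

plot_df <- metadata_df |>
  dplyr::mutate(
    # Some projects store primary/metastasis status in disease_timing instead
    primary_or_metastasis = dplyr::if_else(
      !is.na(primary_or_metastasis), primary_or_metastasis, disease_timing
    ),
    # "Recurrence" is not a primary/metastasis label, so treat it as missing
    primary_or_metastasis = dplyr::na_if(primary_or_metastasis, "Recurrence"),
    disease_timing_mod = dplyr::if_else(
      disease_timing %in% c("Initial diagnosis", "Recurrence"), disease_timing, "other"
    ),
    # Explicit factor levels set the order in which categories are plotted
    primary_or_metastasis = factor(primary_or_metastasis, levels = c("Primary", "Metastasis")),
    disease_timing_mod = factor(
      disease_timing_mod, levels = c("Initial diagnosis", "Recurrence", "other")
    )
  ) |>
  # Drop samples without a usable primary/metastasis label before plotting
  dplyr::filter(!is.na(primary_or_metastasis))

plot_df
```

Setting the factor levels explicitly is what lets ggplot2 draw these categories in that order rather than alphabetically.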

Other than that I left everything pretty much the same since we are looking more for a qualitative overview. Here's the updated notebook:
03-osteosarcoma-consensus-celltypes.nb.html.zip

@jaclyn-taroni
Member

I don't think I need to take another look at this, so I'm going to remove my review request.

jaclyn-taroni removed their request for review on February 11, 2025 14:14
Member

jashapiro left a comment


This looks good to me, with just a small suggestion about the labels for the plots.

allyhawkins and others added 2 commits February 11, 2025 10:02
Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
@allyhawkins
Member Author

Note that I cancelled the workflow to run the module here since these changes are only exploratory notebooks and not run in CI.

allyhawkins merged commit 259df70 into AlexsLemonade:main on Feb 11, 2025
1 of 3 checks passed
allyhawkins deleted the allyhawkins/osteo-consensus branch on February 11, 2025 16:14
Successfully merging this pull request may close these issues.

Explore consensus cell types in all osteo projects
3 participants