Explore consensus cell types in osteosarcoma samples #1028
I am glad to see @jashapiro was called in to review the code here. I'll post my thoughts soon, but I did notice one thing that I am flagging now.
total_cells_df <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(
    total_cells_per_library = length(library_id),
Would `dplyr::n()` be more appropriate here?
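For reference, a minimal sketch of the suggested change, run on a small hypothetical data frame (`df` here is toy data, not the project's). Inside `dplyr::summarize()`, `dplyr::n()` returns the number of rows in the current group, so it counts cells per library without depending on which column is passed to `length()`.

```r
library(dplyr)

# Hypothetical toy data: one row per cell, with the library each cell came from
df <- data.frame(
  library_id = c("lib_A", "lib_A", "lib_A", "lib_B", "lib_B"),
  consensus_annotation = c("T cell", "macrophage", "T cell", "osteoblast", "T cell"),
  stringsAsFactors = FALSE
)

total_cells_df <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(
    # n() is the number of rows (cells) in the current library_id group
    total_cells_per_library = dplyr::n()
  )
```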
summary_df <- df |>
  dplyr::group_by(library_id, sample_type, consensus_annotation, consensus_ontology) |>
  dplyr::summarize(total_cells_per_annotation = length(consensus_annotation)) |>
Same comment about `dplyr::n()` if you are counting things in a group.
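A minimal sketch of the same idea for per-annotation counts, again on hypothetical toy data that keeps only two of the columns from the snippet. Because `summarize(.groups = "drop_last")` leaves the result grouped by `library_id`, a following `mutate()` can compute each library's total and each annotation's percentage directly. This is one possible way to avoid building a separate totals table and joining it back; it is not necessarily the exact change proposed in review.

```r
library(dplyr)

# Hypothetical toy data: one row per cell
df <- data.frame(
  library_id = c("lib_A", "lib_A", "lib_A", "lib_B", "lib_B"),
  consensus_annotation = c("T cell", "macrophage", "T cell", "osteoblast", "T cell"),
  stringsAsFactors = FALSE
)

summary_df <- df |>
  dplyr::group_by(library_id, consensus_annotation) |>
  # n() counts the cells in each library/annotation group;
  # "drop_last" removes consensus_annotation from the grouping
  dplyr::summarize(total_cells_per_annotation = dplyr::n(), .groups = "drop_last") |>
  # still grouped by library_id, so sum() gives the library total (no join needed)
  dplyr::mutate(
    total_cells_per_library = sum(total_cells_per_annotation),
    percent_of_library = 100 * total_cells_per_annotation / total_cells_per_library
  ) |>
  dplyr::ungroup()
```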
To give some general scientific feedback here:
> I arbitrarily broke up "hot" and "cold" tumors based on having at least 5% immune cells. This is totally arbitrary, but just to show that we do have different groups of tumors.
I would take this out. I think the fact that there is heterogeneity is pretty apparent without arbitrarily picking a threshold.
> Lastly I looked at immune composition alongside some of the clinical metadata including primary/mets, disease timing, tissue location and seq unit. There may be some difference in single-cell/nuclei but sample numbers are not comparable. I could definitely do the stats to see if this is the case, but I think if we want to do any real comparisons we might want to look into methods on comparing cell composition in single-cell.
What I think we should aim for – and we can discuss this in our normal planning process – is using the labels to generate a hypothesis someone could test.
Overall, I think this is a pretty good exploration, so I don't have much in the way of major comments. One note is that hot vs. cold status is generally quite intertwined with project, which makes it very hard to interpret the relative contribution of factors like cell vs. nucleus, since those categories don't appear to be evenly distributed across projects.
I made some coding suggestions to try to remove some of the warnings you are seeing, and to reduce joins, though the latter changes are less important.
analyses/cell-type-consensus/exploratory-notebooks/03-osteosarcoma-consensus-celltypes.Rmd
analyses/cell-type-consensus/exploratory-notebooks/utils/setup-functions.R
stacked_barchart(total_order_df, fill_color = "top_celltypes", colors = all_celltype_colors)
It looks like there's definitely some variation between distributions of cell types within the osteo samples.
Suggested change: It looks like there's definitely some variation among distributions of cell types within the osteo samples.
When looking at all samples together we do see variation in immune cells classified and it appears that `SCPCP000017` and `SCPCP000018` have more cells classified in general and have more cells classified as immune.
It does appear that libraries that have more immune cell composition also have a higher percentage of non-immune cell types which could very well be a technical artifact and related to sample prep.
Just for visualization purposes, let's classify these as "hot" and "cold", where "hot" tumors have an immune composition > 5%.
Based on @jaclyn-taroni's comment, I'm not going to review this section.
primary_or_metastasis = dplyr::if_else(!is.na(primary_or_metastasis), primary_or_metastasis, disease_timing),
disease_timing_mod = dplyr::if_else(disease_timing %in% c("Initial diagnosis", "Recurrence"), disease_timing, "other"),
I think we want to add ordering here so the plots are in the expected order. I am also not sure why we seem to end up with only one "Recurrence" sample in `primary_or_metastasis`. (Since it is only one, I think it could be dropped, maybe?)
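A minimal sketch of one way to get the expected order, assuming the plots use ggplot2, which orders discrete axes and legends by factor level. The `"Primary"` and `"Metastasis"` labels and the toy `metadata_df` are hypothetical; the two `if_else()` calls mirror the snippet above, and the `filter()` step is the optional drop of the lone "Recurrence" sample floated in this comment.

```r
library(dplyr)

# Hypothetical metadata; "Primary" and "Metastasis" are assumed label values
metadata_df <- data.frame(
  library_id = c("lib_A", "lib_B", "lib_C", "lib_D"),
  primary_or_metastasis = c("Metastasis", NA, "Primary", "Primary"),
  disease_timing = c("Recurrence", "Recurrence", "Initial diagnosis", "Progressive"),
  stringsAsFactors = FALSE
)

plot_df <- metadata_df |>
  dplyr::mutate(
    # fall back to disease_timing when primary_or_metastasis is missing
    primary_or_metastasis = dplyr::if_else(!is.na(primary_or_metastasis), primary_or_metastasis, disease_timing),
    # collapse everything except the two main timepoints into "other"
    disease_timing_mod = dplyr::if_else(disease_timing %in% c("Initial diagnosis", "Recurrence"), disease_timing, "other"),
    # explicit levels fix the plotting order
    disease_timing_mod = factor(disease_timing_mod, levels = c("Initial diagnosis", "Recurrence", "other"))
  ) |>
  # optional: drop libraries whose value is neither primary nor metastasis
  dplyr::filter(primary_or_metastasis %in% c("Primary", "Metastasis")) |>
  dplyr::mutate(
    primary_or_metastasis = factor(primary_or_metastasis, levels = c("Primary", "Metastasis"))
  )
```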
It looks like there might be higher immune infiltrate in "bone" samples, but again I think if we want to make any conclusions we need to look at software that helps correct for technical artifacts.
There may also be a difference in single-cell and single-nuclei, but there are much fewer single-cell samples so it's hard to say.
If we really want to look at this, we might well want to make a little model with all factors, but I don't think now is the time to do that.
Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
@jaclyn-taroni and @jashapiro thanks for your feedback. I removed the hot/cold section as requested and then made some minor plot updates and code changes based on reviews.
I'm not totally sure what you mean by expected order, but I assume you meant primary first, initial diagnosis first, etc.? So I made that change. As for the "Recurrence" sample that was showing up, it was because for one project the primary/mets information is actually in the
Other than that, I left everything pretty much the same, since we are looking more for a qualitative overview. Here's the updated notebook:
I don't think I need to take another look at this, so I'm going to remove my review request.
This looks good to me, with just a small suggestion about the labels for the plots.
Note that I cancelled the workflow that runs the module here, since these changes only affect exploratory notebooks, which are not run in CI.
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Closes #1004
What is the goal of this pull request?
The goal is to look at the consensus cell types assigned across all osteosarcoma samples found in three different projects and identify any variation across samples. In particular, we want to look for samples with high and low immune infiltrate (hot and cold tumors). What I'm adding here is pretty basic and just summarizes the consensus cell types and total immune percentage. My understanding is we want to be able to make an observation about samples with different immune populations here, but any biological insight is going to be additional work outside the scope of this PR.
Briefly describe the general approach you took to achieve this goal.
While I was working on this, I also moved the setup function that we used in the first exploratory notebook to its own R script to source in, and added a list of just immune cell types as a reference.
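A minimal sketch of how a reference list of immune cell types can be used to compute the total immune percentage per library. `immune_celltypes`, the `"Unknown"` label, and the toy `df` below are hypothetical stand-ins, not the list or data added in this PR.

```r
library(dplyr)

# Hypothetical stand-in for the reference list of immune cell types
immune_celltypes <- c("T cell", "B cell", "macrophage", "natural killer cell")

# Hypothetical toy data: one row per cell
df <- data.frame(
  library_id = c(rep("lib_A", 4), rep("lib_B", 3)),
  consensus_annotation = c("T cell", "macrophage", "osteoblast", "Unknown",
                           "osteoblast", "osteoblast", "T cell"),
  stringsAsFactors = FALSE
)

immune_df <- df |>
  dplyr::group_by(library_id) |>
  dplyr::summarize(
    total_cells = dplyr::n(),
    # summing a logical vector counts the cells whose label is in the immune list
    immune_cells = sum(consensus_annotation %in% immune_celltypes),
    percent_immune = 100 * immune_cells / total_cells
  )
```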
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes
Results
What types of results does your code produce (e.g., table, figure)?
Rendered HTML report here:
02-explore-consensus-results.nb.html.zip
What is your summary of the results?
Different samples have different proportions of classified cells and immune cells, and this seems to be driven mostly by project. I think the most interesting thing here (besides showing that cell type composition differs in general) is that the cell types obtained are project-specific. I'm not sure we want to highlight that, but I think there's definitely some technical variation that can impact which cell types are identified in the first place.
I think we need to do more rigorous statistical analysis to make any other real conclusions beyond that at this point.
Provide directions for reviewers
Are there particularly areas you'd like reviewers to have a close look at?
Are there things missing that you want to see at this stage?
I generally don't feel great about this, but I'm not sure what else to do before diving down a rabbit hole, so I thought I would stop and get some feedback.
Note that I did look at just T cell infiltrate since that's been shown to be found in osteo samples before and I didn't see anything worth noting, but I can add that section back in if you would like to see it.
Author checklists
Analysis module and review
README.md has been updated to reflect code changes in this pull request.
Reproducibility checklist
Dockerfile
environment.yml file
renv.lock file