Wilms tumor 06- clustering exploration #750
Hi @maud-p, thanks for filing this next PR!
I'm going to start having a careful look for review, but first there are two quick things I see off the bat that you can start working on if you want! For one, I left you a separate inline comment. Second, it doesn't look like the `renv.lock` file is up to date with additional packages used in this notebook. Can you please snapshot to update the lockfile? Thanks!
for (sample_id in metadata$scpca_sample_id) {
  if (!running_ci) {
    # Cluster exploration
    rmarkdown::render(input = file.path(notebook_template_dir, "03_clustering_exploration.Rmd"),
                      params = list(scpca_project_id = project_id, sample_id = sample_id),
                      output_format = "html_document",
                      output_file = paste0("03_clustering_exploration", sample_id, ".html"),
                      output_dir = file.path(notebook_output_dir, sample_id))
  }
}
Since this for-loop is actually the same as the previous for-loop, `for (sample_id in metadata$scpca_sample_id) {` (which also contains this if-statement for the label transfer notebooks), you should be able to move this code up into the previous for-loop and get rid of this additional for-loop.
In other words, the cluster exploration step can just go right after the step where you render this notebook: `02b_label-transfer_fetal_kidney_reference_Stewart.Rmd`.
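A minimal sketch of what the merged loop could look like. `render_sample_notebooks()` and its `render_fn` argument are hypothetical helpers introduced only for illustration. The sketch assumes both notebooks accept the same `params`, and it uses an assumed `<notebook stem>_<sample id>.html` naming scheme rather than the module's actual file names.

```r
# Notebooks rendered for every sample, in order.
# Assumption: both notebooks take the same params.
sample_notebooks <- c(
  "02b_label-transfer_fetal_kidney_reference_Stewart.Rmd",
  "03_clustering_exploration.Rmd"
)

# Hypothetical helper: render all per-sample notebooks in a single pass.
# `render_fn` defaults to rmarkdown::render but can be replaced by a stub.
render_sample_notebooks <- function(sample_id, project_id, template_dir, output_dir,
                                    render_fn = rmarkdown::render) {
  for (notebook in sample_notebooks) {
    render_fn(
      input = file.path(template_dir, notebook),
      params = list(scpca_project_id = project_id, sample_id = sample_id),
      output_format = "html_document",
      # Assumed naming scheme: <notebook stem>_<sample id>.html
      output_file = paste0(tools::file_path_sans_ext(notebook), "_", sample_id, ".html"),
      output_dir = file.path(output_dir, sample_id)
    )
  }
  invisible(sample_id)
}

# Usage inside the workflow script (variables come from 00_run_workflow.R):
# for (sample_id in metadata$scpca_sample_id) {
#   if (!running_ci) {
#     render_sample_notebooks(sample_id, project_id,
#                             notebook_template_dir, notebook_output_dir)
#   }
# }
```

Keeping the notebook list in one vector means a later per-sample notebook only needs to be added in one place.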
For the first round of review here, I've looked things over for clarity and correctness. Overall it looks like it's in great shape!! After this first round, I'll do another round of review more focused on the science. Here are some comments in addition to the others I left inline:
- Can you update the `results/README.md` to include this notebook?
- The functions you added & docs about them look great, thanks for doing that! It really makes the notebook easier to read and work with :) Let's just do a bit more reorganization:
  - Can you scoot up the "Functions" section (which looks great, by the way!) to be above "Analysis" but after "Introduction"?
  - Can you order the functions in the same order that they are used in the actual notebook?
- I'm not sure the alluvial plots (while very cool!) are the easiest to read, because of how long the cell type labels are. Is it possible to make that font small enough to be able to read the labels clearly? If not (or alternatively), I wonder if a heatmap might be a clearer plot to make here that shows counts of cells in each combination of groups? There are more complicated statistics that one could show in a heatmap comparing these groupings, but for an exploratory notebook like this I think just counts are probably sufficient. One way to make this plot would be (vs using existing heatmap packages) to use `ggplot2::geom_rect()`. You can create a new data frame that counts the combinations of cluster/annotation (`dplyr::count()` can help for this!) and then plot cluster & annotation against each other, with a fill aesthetic of the actual counts. Let me know if this makes sense or how I can further explain!
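As a rough illustration of the counting-then-plotting idea, here is a sketch on a made-up data frame. The column names and values are hypothetical, and it uses `geom_tile()` instead of `geom_rect()`, since tiles on discrete axes do not need explicit corner coordinates.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for per-cell metadata; column names and values are made up
cell_metadata <- data.frame(
  cluster = c("0", "0", "0", "1", "1", "2"),
  annotation = c("immune", "immune", "stroma", "stroma", "stroma", "endothelial")
)

# One row per cluster/annotation combination, with the number of cells in each
combo_counts <- cell_metadata |>
  count(cluster, annotation, name = "n_cells")

# Plot cluster against annotation, filling each tile by the cell count
count_heatmap <- ggplot(combo_counts, aes(x = cluster, y = annotation, fill = n_cells)) +
  geom_tile() +
  geom_text(aes(label = n_cells), color = "white") +
  theme_bw()
```

Printing `count_heatmap` draws the plot. Combinations with zero cells are simply absent from `combo_counts`, so they show up as empty tiles.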
Thank you @sjspielman for looking into it :) I have added the renv.lock modified, sorry, I missed it in my Regarding the
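Hi @sjspielman,
I haven't updated the `results/README.md` for now because the notebook `03_clustering_exploration.Rmd` is not saving any output in `results`. I am just generating a report in `notebook`. So I updated the `README.md` file in the analysis module. Should I create one for the `notebook` directory?
I tried to go for both:
The aim of these two approaches is to show that whichever method we choose for labelling the cells (full or kidney fetal reference), they seem to converge on the identification of endothelial and immune cells. This is important as I would like to use it as the next step for running Would this make sense? Thank you!!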
@maud-p just a quick heads up that I'm out of the office now at the AACR Pediatric conference, so I will be back to review this and chat about |
Hi @sjspielman, thanks for letting me know, hope you enjoy the conference! I'll also be at a conference next week (16-18 September), FYI :)
Things are looking good!! Here's my next round of review, including some thoughts on next steps:
- > I haven't updated the `results/README.md` for now because the notebook `03_clustering_exploration.Rmd` is not saving any output in `results`. I am just generating a report in `notebook`. So I updated the `README.md` file in the analysis module. Should I create one for the `notebook` directory?

  Per your comment, yes you're right, my bad, but you make a great point! We should have a quick `README.md` in the notebook directory at this point. I see one now exists as an empty file, so let's fill it in a bit with some brief information. You can probably get a lot of the content you need here from `results/README.md`, too, for the other notebooks.
- In the cell annotation section, it would help with interpretation if you could add a brief indication of the differences among fetal references since there are a few.
- Thanks for trying out the different plotting strategies here instead of alluvial! The sankey plots do look much better than the alluvial label-wise, but for the plots with a ton of labels, it can be really hard to follow the paths left to right. The heatmaps you made are exactly what I had in mind, and I think because they are easier to interpret in the case of many labels, we should go with that (edit: and we can remove the sankey code and its function, too)! The documentation you have for the function is basically fine too (made 1 little suggestion).
  - I'll point out a cool tidyverse trick which you can use if you want! Instead of writing `sym`, you can use the "curly-curly" syntax. Then, when calling the function, do not put the column names in quotes. To show how to do this, I've rewritten your heatmap function to use this strategy, and then I show how you'd make a plot with this. If you like this approach, feel free to use it, but if not also that's totally fine!
do_Table_Heatmap <- function(data, first_group, last_group) {
  # Count cells in each combination of the two grouping columns
  df <- data@meta.data %>%
    mutate_if(sapply(data@meta.data, is.character), as.factor) %>%
    count({{ first_group }}, {{ last_group }}, name = "count")

  # Plot one tile per combination, filled by the cell count
  p <- ggplot(df, aes(x = {{ first_group }}, y = {{ last_group }}, fill = count)) +
    geom_tile() +
    viridis::scale_fill_viridis(discrete = FALSE) +
    theme_bw() +
    theme(text = element_text(size = 20))

  return(p)
}

# Use the function - no quotes around column names!
do_Table_Heatmap(
  data = srat,
  first_group = seurat_clusters,
  last_group = fetal_kidney_predicted.compartment
)
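To make the difference between the two calling styles concrete, here is a small sketch on a plain data frame. It assumes the original function used something like `!!sym(...)` on quoted column names. Both helper functions and the toy data are hypothetical.

```r
library(dplyr)

# String-based version: column names arrive as character strings
count_pairs_sym <- function(df, first_group, last_group) {
  df |>
    count(!!rlang::sym(first_group), !!rlang::sym(last_group), name = "count")
}

# Curly-curly version: column names are passed unquoted
count_pairs_embrace <- function(df, first_group, last_group) {
  df |>
    count({{ first_group }}, {{ last_group }}, name = "count")
}

# Toy data; values are made up
toy <- data.frame(
  seurat_clusters = c("0", "0", "1"),
  compartment = c("immune", "immune", "stroma")
)

by_string <- count_pairs_sym(toy, "seurat_clusters", "compartment")
by_name <- count_pairs_embrace(toy, seurat_clusters, compartment)
```

Both calls should return the same table of counts. The only change for the caller is whether the column names are quoted.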
A couple thoughts about inferCNV
- As I understand it based on #635, the goal of this step would be to begin identifying tumor vs normal cells, yes? If so... (and if not, please tell me if I'm confused!)
- I'm not sure how it will work out if we assume that all non-endothelial and non-immune cells are necessarily tumor. `inferCNV` works by taking a set of pre-defined tumor and normal cells, and then assessing the CNV in the cells known a priori to be tumor. So, using `inferCNV` to determine which cells are tumor cells might not be the right logic for this problem. Let me know what else you are thinking about using `inferCNV` for, maybe I am missing something!
- Because of this, I wonder if using `copyKAT` might therefore be an approach that better meets the goal of this next step in analysis. This approach can be used to help identify tumor/normal based on aneuploidy. It can either:
  - Directly infer aneuploidy from the data without additional information
  - Take a set of a priori defined normal cells (aka, endothelial and immune!) which are used as a baseline to help infer aneuploidy. This is recommended when there are not a lot of CNVs in the data, which may be the case for many of the samples here.
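A sketch of how the a priori normal cells could be collected from the annotation column. The barcodes are made up, and the label values `"immune"` and `"endothelium"` are assumptions about what `fetal_kidney_predicted.compartment` contains. The `copyKAT` call is shown only as a comment and should be checked against the package documentation.

```r
# Toy per-cell metadata; barcodes and labels are made up for illustration
cell_metadata <- data.frame(
  barcode = c("cell_1", "cell_2", "cell_3", "cell_4"),
  fetal_kidney_predicted.compartment = c("stroma", "immune", "fetal_nephron", "endothelium")
)

# Assumed labels for the cells treated as the normal baseline
normal_labels <- c("immune", "endothelium")

# Barcodes of cells whose predicted compartment is one of the normal labels
normal_cells <- cell_metadata$barcode[
  cell_metadata$fetal_kidney_predicted.compartment %in% normal_labels
]

# These barcodes could then be supplied as the normal baseline, e.g.
# (not run here; `raw_counts` is a placeholder, and argument names should be
# verified against the copyKAT documentation):
# copykat::copykat(rawmat = raw_counts, norm.cell.names = normal_cells)
```

If a sample has no cells with these labels, `normal_cells` is empty, which is the case where inference would have to run without a baseline.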
Something to be careful with for the next steps in analysis is the heterogeneity among the samples, in particular as it relates to the subdiagnosis - this ScPCA project is split between anaplastic and favorable histology WT samples, which may influence the success of the fetal reference as well as CNV inference methods. To help keep this in mind and guide interpretation, I think we should indicate this information somewhere in the report.
I suggest we do this - immediately after the introduction header, add a chunk like this to grab this information:
subdiagnosis <- readr::read_tsv(
  file.path("..", "..", "..", "data", "current", params$scpca_project_id, "single_cell_metadata.tsv")
) |>
  dplyr::filter(scpca_sample_id == params$sample_id) |>
  dplyr::pull(subdiagnosis)
Then, add a sentence after the sentence printing out the sample ID to indicate its subdiagnosis. I've directly suggested this inline to help. (fyi I can't suggest a new code chunk altogether, because of the syntax of how GitHub suggestions work with the three back-ticks...and chunks need them too! so, it becomes a mess...)
Either way, to get a baseline of normal cells to potentially use for next steps, I do think we need to add more comparisons (aka more heatmaps at the end!) to this notebook. Right now your heatmaps compare between clusters and annotations from different approaches, but we should probably add a couple heatmaps comparing annotations to each other. Combined with the other exploration in this notebook, it would be good to see if different automated annotations agree on which cells are endothelial and/or immune. This will give us a bit more confidence in proceeding, too. In particular, I think it would be helpful to compare the fetal references to each other directly.
Dear @sjspielman, I should have made the few changes and added the last For some reason, I got an error for one of the samples But I wanted to already share the notebooks with you. I had a look at a few of the reports, and it seems that the different annotation strategies converge in the identification of endothelial and immune cells. I especially like the fetal kidney reference, FYI, I will be away from tomorrow until next Thursday, I'll be at the SIOP RTSG. I hope to hear & learn new relevant insights for Wilms tumor! Thank you!
I'll have a look at this sample and see if I can track down the problem.
Thanks again for sharing all of these! I'll look through them all and see if we come to the same conclusions. |
I've tracked down the error for `SCPCS000197`, and it looks like it is coming from the Seurat function `AddModuleScore`. Some aspect of this dataset does not work with their default binning of 24 bins. The quickest fix for this (and I don't think we need anything more involved for this circumstance) is just to pass a different number of bins in for this dataset. It seems 23 works just fine for this dataset, so I've suggested code to just use 23 instead of 24 for that dataset.
Once you accept these suggestions, please test if this code indeed works since code suggestions can be notoriously finicky, and/or maybe I missed a spot! If it does work, then you can go ahead and generate that last notebook 🎉
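One possible way to express the per-sample bin count, sketched with a hypothetical helper `get_nbin()`. This is not necessarily the code that was suggested inline. The Seurat call is left as a comment because `srat` and `marker_genes` are placeholders.

```r
# Per-sample overrides for the number of expression bins.
# Only SCPCS000197 is known (from this thread) to need a non-default value.
nbin_overrides <- c(SCPCS000197 = 23)

# Hypothetical helper: return the override if one exists,
# otherwise the default of 24 bins mentioned above
get_nbin <- function(sample_id, overrides = nbin_overrides, default_nbin = 24) {
  if (sample_id %in% names(overrides)) overrides[[sample_id]] else default_nbin
}

# In the notebook (not run here; `srat` and `marker_genes` are placeholders):
# srat <- Seurat::AddModuleScore(srat, features = list(marker_genes),
#                                nbin = get_nbin(params$sample_id))
```

Keeping the overrides in a named vector makes it easy to add another sample later if the same binning error appears elsewhere.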
It worked, thank you so much @sjspielman ! |
I've taken some time to carefully go over all of the notebooks. I still want to think a little bit more about how to approach the next steps here, but wanted to share my thoughts so far.
I went through all samples to ask two questions, with my answers as sub-bullets. See below for a compiled table of what I observed. It's also worth noting that I did not observe any real patterns between favorable and anaplastic samples, so that's helpful to know a little about how heterogeneity plays out here.
- Does `fetal_full_predicted.organ` indeed show that the majority of cells are predicted kidney?
  - Yes, most samples do show that the majority of cells are kidney, but several samples did NOT show this. For some of those, kidney might be a plurality, but sometimes it's just kind of a mess...
  - Note: there is a sentence in the notebook where you write that we observe that kidney is always the majority; I suggested updating this sentence to better reflect the overall distribution across samples.
- Does `fetal_kidney_predicted.compartment` show immune and/or endothelial cells which could be used as a normal baseline?
  - Most do, but notably two samples did not identify any immune or endothelial cells: SCPCS000187 and SCPCS000190. We'll need to think more about how to approach these samples later.

One challenge in interpreting these results is that we have a sense of relative proportion of each cell type in each cluster, but not the overall percentage or count of cells across the entire dataset. The exploration in this notebook is definitely helpful, but it's challenging to get a definitive sense of whether indeed kidney was a predicted majority, and whether there are sufficient immune and/or endothelial cells to use as a normal baseline.
Therefore, rather than proceeding right into `inferCNV`, we'll probably want to run one additional analysis here in a notebook that looks at all samples at once. I'm thinking...
- What percentage of total cells does `fetal_full_predicted.organ` label as kidney?
- What percentage of total cells does `fetal_kidney_predicted.compartment` label as endothelial and/or immune?
- We might also want to see how expression of a very small set of marker genes specifically relates to these groupings (aka, 1) fetal or not, and 2) assuming fetal, endo/immune or not), rather than Seurat clusters, which don't seem to necessarily have strong relationships to these annotations.
One reason I think this should be in a new (not template - just a single one!) notebook is it will make it easier to compare across samples all at once. We can make heavy use of "faceting" plots or tables to look at all of this at once. But again, I'm still thinking a bit more about how we can develop confidence in these annotations for next steps, so stay tuned a little bit more!
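The per-sample percentages described above could be computed along these lines. This is a sketch on made-up data: the sample IDs, the label values (`"Kidney"`, `"immune"`, `"endothelium"`), and the idea of a single combined data frame `all_cells` are all assumptions.

```r
library(dplyr)

# Toy stand-in for cell metadata combined across samples; values are made up
all_cells <- data.frame(
  sample_id = c("sample_A", "sample_A", "sample_A", "sample_A", "sample_B", "sample_B"),
  fetal_full_predicted.organ = c("Kidney", "Kidney", "Kidney", "Liver", "Kidney", "Lung"),
  fetal_kidney_predicted.compartment = c(
    "stroma", "immune", "fetal_nephron", "stroma", "endothelium", "stroma"
  )
)

# For each sample: total cells, % predicted kidney, % predicted immune/endothelial
per_sample_summary <- all_cells |>
  group_by(sample_id) |>
  summarize(
    n_cells = n(),
    pct_kidney = 100 * mean(fetal_full_predicted.organ == "Kidney"),
    pct_normal = 100 * mean(
      fetal_kidney_predicted.compartment %in% c("immune", "endothelium")
    ),
    .groups = "drop"
  )
```

A table like `per_sample_summary` has one row per sample, so it can be shown directly or used for faceted plots across all samples.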
Also, at this point, I think this PR is about ready to go - if you can accept my suggestion and re-generate the notebooks to reflect the change, I will go ahead and approve and take it from here to merge in this PR!
sample | diagnosis | fetal_full_predicted: clearly has majority predicted kidney? | fetal_kidney_predicted: clearly has endo and/or immune? |
---|---|---|---|
SCPCS000168 | Anaplastic | yes | yes |
SCPCS000169 | Favorable | yes | yes |
SCPCS000170 | Favorable | yes | yes |
SCPCS000171 | Favorable | yes | yes |
SCPCS000172 | Favorable | NO | yes |
SCPCS000173 | Anaplastic | yes | yes |
SCPCS000175 | Favorable | NO | yes |
SCPCS000176 | Favorable | yes | yes |
SCPCS000177 | Favorable | yes | NO |
SCPCS000178 | Anaplastic | yes | yes |
SCPCS000179 | Anaplastic | yes | yes |
SCPCS000180 | Anaplastic | yes | yes |
SCPCS000181 | Favorable | yes | yes |
SCPCS000182 | Anaplastic | yes | yes |
SCPCS000183 | Favorable | NO | yes |
SCPCS000184 | Anaplastic | yes | yes |
SCPCS000185 | Favorable | maybe | yes |
SCPCS000186 | Anaplastic | yes | yes |
SCPCS000187 | Favorable | yes | NO |
SCPCS000188 | Favorable | yes | yes |
SCPCS000189 | Anaplastic | yes | yes |
SCPCS000190 | Anaplastic | yes | NO |
SCPCS000191 | Anaplastic | NO | yes |
SCPCS000192 | Anaplastic | NO | yes (immune only) |
SCPCS000193 | Anaplastic | NO | yes |
SCPCS000194 | Anaplastic | yes | yes |
SCPCS000195 | Anaplastic | maybe | yes |
SCPCS000196 | Favorable | yes | yes (endo only) |
SCPCS000197 | Favorable | yes | yes (endo only) |
SCPCS000198 | Favorable | yes | yes |
SCPCS000199 | Anaplastic | yes | yes (but low counts maybe) |
SCPCS000200 | Favorable | yes | yes |
SCPCS000201 | Favorable | maybe | yes |
SCPCS000202 | Favorable | maybe | yes |
SCPCS000203 | Favorable | yes | yes |
SCPCS000204 | Favorable | yes | yes |
SCPCS000205 | Favorable | yes | yes |
SCPCS000206 | Anaplastic | yes | yes |
SCPCS000208 | Anaplastic | yes | yes |
Dear @sjspielman, thank you very much, this all makes a lot of sense. I will re-run the analysis and should upload the notebooks by Thursday (I will be travelling tomorrow and am not sure whether I'll have access to our server). I like the idea of looking at all samples, I'll work on a notebook and start a new PR :) Thank you!
Thanks for re-running the notebooks @maud-p, looks great! I'll approve and take it from here to get it merged into `main`. Congrats on another successful PR! 🎉
(FYI, we're working on our end to implement an approach to be able to fully test your module with test data, so stay tuned a little longer for this :) )
Dear @sjspielman, thank you very much !!! I am working on the next PR, I'll get back to you soon! Thank you very much for your help and effort to make it work!!! |
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
This follows on from the work in PR #704, taking into account the changes in PR #737.
What is the goal of this pull request?
The aim here is to explore the clustering and label transfer from the 2 fetal references for each sample.
Briefly describe the general approach you took to achieve this goal.
Here I started from the output of the notebook `02b_label-transfer_fetal_kidney_reference_Stewart.Rmd` that contains: `SCTransform`, `PCA`, `UMAP`, and explored the results looking at:
I compared the labels obtained from SingleR, CellAssign and the label transfer from the two fetal references (PR #737).
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes! More than one. I think that from this analysis, I can find a way to annotate healthy cells such as "immune" and "endothelial cells". From here, I will be able to file a new PR to add inferCNV and/or copyKAT to the template.
Results
The notebook template produces one notebook per sample in the `notebook/{sample_id}` folder. I have now uploaded the notebooks for the first 2 samples. Once we have discussed the analysis, I'll run it for the 40 samples and add the notebooks!
What is the name of your results bucket on S3?
What types of results does your code produce (e.g., table, figure)?
notebook
What is your summary of the results?
Provide directions for reviewers
What do you think?
What are the software and computational requirements needed to be able to run the code in this PR?
I render the notebook from the `00_run_workflow.R` script. I opened a new loop on purpose so as not to run everything from PR #737 again, but I guess in a final step all the notebooks will be run in the same loop!
Are there particularly areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
I would like to have your opinion on the best way to select normal cells as input for inferCNV. I am quite satisfied with the labels from the fetal kidney reference, `fetal_kidney_predicted.compartment`, divided into:
I think that we can safely take the immune and endothelial cells as a healthy reference and run inferCNV from here.
Then, with the result of inferCNV, I hope to be able to further split the fetal kidney and stroma compartments into normal and cancer blastema, epithelial and stroma cells.
Author checklists
Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
`README.md` has been updated to reflect code changes in this pull request.
Reproducibility checklist
`Dockerfile`.
`environment.yml` file.
`renv.lock` file.