Adding SCTransform as an alternative to anchor transfer-Wilms tumor annotation (SCPCP000014) #836
Regarding the error in CI, it could be due to missing FYI, I was able to finish the computation on a Lightsail 4XL machine only after I put the
Hi @JingxuanChen7: I wanted to let you know that I have started to look at this submission. I am starting with a bit of debugging to see why it might have failed in CI, so you may see a few changes pushed to the branch. I will then be looking more directly at the changes you made; it does look like SCTransform may be performing better for this application.
Overall, I think this looks pretty good. I made some suggestions in a few places, including using `purrr::pwalk` to allow you to remove some duplicated code. I haven't fully tested that code though, so there may well be a couple of typos in there!
I also pushed in some changes to actually skip the SCT transformation of the reference during CI, as I suspect that was what was causing the CI to fail. I have not yet finished testing that, but hopefully it will work out! I am also working on a separate branch to add a docker image for this module, which will hopefully speed up testing as well.
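As a rough illustration of the `purrr::pwalk` suggestion above, here is a minimal sketch. It assumes the duplicated code amounts to calling one function per combination of library and annotation level; `run_transfer()` and the parameter table are hypothetical stand-ins, not the code in this PR.

```r
library(purrr)

# Hypothetical parameter grid: each library is processed at both annotation
# levels, so repeated near-identical calls become rows of one table.
params <- expand.grid(
  sample_id = c("SCPCL000850", "SCPCL001109"),
  level = c("celltype", "compartment"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the per-library step. It only records the name of
# the file it would write, following the [sample_id]_[level].csv pattern.
outputs <- character(0)
run_transfer <- function(sample_id, level) {
  out_file <- paste0(sample_id, "_", level, ".csv")
  outputs <<- c(outputs, out_file)
  invisible(out_file)
}

# pwalk() calls run_transfer() once per row, matching columns to arguments by
# name, and is used purely for side effects.
pwalk(params, run_transfer)
```

Because `pwalk()` matches data frame columns to function arguments by name, adding another parameter (for example a normalization method) would only need a new column and a new argument.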
analyses/cell-type-wilms-tumor-14/exploratory_analysis/03_cnv/03_runCopyKat.R
analyses/cell-type-wilms-tumor-14/exploratory_analysis/03_cnv/03_runInferCNV.R
...ll-type-wilms-tumor-14/plots/01_anchor_transfer_seurat/archive/SCPCL000846_celltype_core.png
analyses/cell-type-wilms-tumor-14/scripts/01_anchor_transfer_seurat.R
analyses/cell-type-wilms-tumor-14/scripts/utils/01_anchor_transfer_seurat_functions.R
mutate(annot = celltype) %>%
mutate(annot = case_when(compartment == "stroma" ~ "stroma",
                         compartment == "immune" ~ "immune",
                         compartment == "endothelium" ~ "endothelium",
                         TRUE ~ annot))
I'm not sure I understand the logic here. While I can understand that at the end you might not want to retain the fine annotations, I feel like you would want to keep them during the anchor transfer step, and maybe combine the two labels afterward?
Thanks for the comment.
My goal here is to improve the prediction scores for `stroma`, `immune`, and `endothelium` and make the annotation cleaner in visualization. If detailed cell types are included here, we may end up getting no cell types related to `stroma`, `immune`, or `endothelium` other than `Unknown`, for two reasons: (i) in general, too few `stroma`, `immune`, and `endothelium` cells are present in the dataset; (ii) prediction scores for detailed cell types get lower, which creates a lot of `Unknown` labels.
However, I'm not sure if this is the correct thing to do. As you mentioned, I could also try to keep the cell types during the anchor transfer step and combine the labels later on (or even keep all `Unknown`s as is?). @jashapiro Any suggestions?
Thanks for helping troubleshoot the CI. It makes sense to skip SCT when processing reference datasets. I agree that it may cause CI failure.
Actually,
Thanks for the suggestion! I applied it and finished re-running the step! @jashapiro I think I have finished dealing with all review suggestions so far (07fa1a5). Please let me know how you would like to proceed with the last conversation (#836 (comment)). Thanks again for all the help!
@JingxuanChen7 I think what I meant by this was that SCTransform is giving results that make more sense with respect to the immune cells scattered within the fetal_nephron. But when I was just looking into this a bit more, I wonder if there might be an error in the code? Specifically, I see that when you call
I do not know if this would make a difference in the results, but it seems worth looking at?
As to #836 (comment), I think that what you are doing by combining the compartment and finer level detail might be useful, but I think I would probably call that a fully separate method: so you might want to test the original
Without testing and comparison, I don't think we can say which of these would be best: whether doing the combination in the reference annotation works better or worse than combining the results after transfer.
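To make the second option concrete, the following is a minimal sketch of combining labels after transfer rather than in the reference annotation. It assumes the transfer step yields one row per cell with both a predicted compartment and a predicted cell type; the data frame, its column names, and the `*_subtype_a` labels are hypothetical placeholders, not results from this PR.

```r
library(dplyr)

# Hypothetical post-transfer predictions: one row per cell, with both levels
# predicted separately from an unmodified reference.
predictions <- data.frame(
  cell_id = c("cell_1", "cell_2", "cell_3", "cell_4"),
  compartment = c("fetal_nephron", "stroma", "immune", "endothelium"),
  celltype = c("nephron_subtype_a", "stroma_subtype_a",
               "immune_subtype_a", "endothelium_subtype_a"),
  stringsAsFactors = FALSE
)

# Collapse the fine labels to the compartment label only for the minor
# compartments; all other cells keep their fine cell type.
minor_compartments <- c("stroma", "immune", "endothelium")
combined <- predictions %>%
  mutate(annot = case_when(
    compartment %in% minor_compartments ~ compartment,
    TRUE ~ celltype
  ))
```

With this arrangement the fine-level predictions stay in the table, so the two strategies can be compared on the same cells before deciding which label to report.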
Hi @jashapiro , thank you so much for the comments!
When I tested my codes, it looked like as long as the default assay is set to "SCT",
In my latest commit (9e334c0), I specified
Thank you so much for the suggestion. I agree that the cell types can be merged later on as needed. In my latest commit, I changed back to original
In addition, I realized that since the submission due date is approaching very soon, I'm not sure if I have time to merge another PR. Therefore, I also added a script to create a summary table for the results in this PR. Besides, I have some other code (notebooks) aiming to explore feature selection and clustering methods, which wouldn't affect the cell type results and is not part of the final table. Would it be fine to file them after the due date (probably as exploratory analysis as well)? Thank you again for all the efforts!
I saw this come in and I had a small suggestion for the concatenation function.
Thank you for adding the explicit setting; I saw that you were changing the default assay, but I was unsure if there might be some internal differences in how the function is applied that this setting enables. It does seem like the results may have changed a bit, but I am not sure. The main thing I was curious about was if the prediction scores might have changed with the explicit setting.
Yes, thank you for doing this! I would just ask that you include this script as well in the main analysis shell script so it can be tested (though I realize that this may require changes to skip the SCT files which won't exist in testing) Can you also please include this table in the PR? To do this, you may want to modify the
Absolutely! We would be very happy for you to continue to work to improve your annotations after the deadline. We hope that you can stay involved as much as possible as we continue the project and add more analyses!
Hi @jashapiro , I tried to commit the summary table. However, the original file size is ~7MB, even the gzipped version is 600KB+. In this case,
I guess I was forgetting that this table has a row for every cell in the whole project, and barcodes don't compress very well! Please ignore my suggestion of including the table! Instead, maybe just put a note in the readme with the path to the final table on S3, for convenience in retrieving that file.
@JingxuanChen7 I think this all looks good for this stage.
For the future, I would be interested to hear how you might want to proceed toward refining and improving these annotations, especially if you have more thoughts on identifying tumor cells in particular. It would be great to have some more discussion posts on that front, if you have anything you might want to share.
We might also want to do some comparisons between your annotations and the annotations that were provided as part of our original pipeline using SingleR and CellAssign with our default references.
Finally, I do want to re-emphasize that as I said earlier, we would love it if you are able to continue to contribute to the project beyond the deadline for these annotations. We obviously have a good deal more work to do, and your efforts and expertise will be welcome as we continue!
Hi @JingxuanChen7, I wanted to just check in with you about whether you were expecting to submit any more PRs with a final set of cell type assignments? In particular, I want to note that the submission guidelines require specific table headings, and in particular require a
Good morning @jashapiro, thank you so much for asking! Since the due date is tomorrow, I don't think I have sufficient time to complete another PR. Regarding
I think a rough classification would still be useful, so I would encourage you to add it. Having a complete table, even if the results are subject to refinement, is going to be required based on the submission guidelines. If you are able to also run copyKAT, rough as it is, that might be worth doing as well; using immune cells as reference when possible is something that seems to have worked in some other cases.
@jashapiro Thank you so much for the prompt reply! Should I open another short PR to make this modification? Regarding CopyKat, in my case the result does not seem as good as inferCNV in the sample I used. I decided not to go ahead with this software since it could be misleading.
Yes, please do make a PR with this addition.
Hello @jashapiro, I have opened a PR for adding a tumor classification to the result table. Thanks!
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Proposal discussion: #628
Issue: #782
What is the goal of this pull request?
The normalization method can make a difference in the anchor transfer strategy. In my previous PR, I filed results generated by the standard Seurat workflow, i.e., `NormalizeData`, `FindVariableFeatures`, followed by `ScaleData`. Here, I tried out an alternative normalization method called `SCTransform` for anchor transfer. In addition, the default UMAP for initial visualization was generated with `SCTransform`.
Briefly describe the general approach you took to achieve this goal.
I modified my previous code in the following aspects:
- `SCTransform`.
- `RNA` for standard Seurat workflow and `SCT` for `SCTransform`.
- `SCTransform`-based instead of standard Seurat workflow.
- `fetal_nephron` in my samples, I removed subtype annotation for `stroma`, `immune` and `endothelium` for cleaner annotation. Please let me know if you still want to keep these sub-levels for minor compartments.
If known, do you anticipate filing additional pull requests to complete this analysis module?
NA
Results
What is the name of your results bucket on S3?
researcher-009160072044-us-east-2
What types of results does your code produce (e.g., table, figure)?
s3://researcher-009160072044-us-east-2/cell-type-wilms-tumor-14/results/01_anchor_transfer_seurat
- `celltype` and `compartment`.
- `[sample_id]_[level].csv` label transfer result table including cell ID, predicted cell type, along with predicted scores.
- `results/01_anchor_transfer_seurat/RNA`: Results generated by normalization method `LogNormalize`.
- `results/01_anchor_transfer_seurat/SCT`: Results generated by normalization method `SCTransform`.
What is your summary of the results?
Overall, there are only small differences in the anchor transfer results.
E.g., in library `SCPCL000850`:
[Figure: `LogNormalize` vs. `SCTransform`]
It looks like the `LogNormalize` method shows overall higher scores, with more anchors identified.
[Figure: `LogNormalize` vs. `SCTransform`]
However, in the UMAP visualization made from `SCTransform`, I was able to "fix" the strange visualization for library `SCPCL001109` (old version here). The anchor transfer results with `SCTransform` for this library also make more sense to me, since I think it is problematic to observe scattered `immune` within `fetal_nephron`.
[Figure: Anchor transfer with `LogNormalize`]
[Figure: Anchor transfer with `SCTransform`]
In this case, I think it would be useful to keep the code and results for `SCTransform`.
Provide directions for reviewers
What are the software and computational requirements needed to be able to run the code in this PR?
Are there particularly areas you'd like reviewers to have a close look at?
NA
Is there anything that you want to discuss further?
Author checklists
Analysis module and review
- `README.md` has been updated to reflect code changes in this pull request.
Reproducibility checklist
- `Dockerfile`.
- `environment.yml` file.
- `renv.lock` file.