Add marker gene lists for tumor cell states #971

allyhawkins
Member

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Closes #939

What is the goal of this pull request?

Here I am adding a marker gene list that can be used to define the different tumor cell states that we expect to find in the Ewing sarcoma samples. Right now the goal is to label cell states that have been shown to be consistently present across Ewing sarcoma studies; we are not looking to find novel cell states. Generally, it's been shown that EWS cells lie on a continuum of EWS-FLI1 expression, and publications have categorized cells into EWS-FLI1 high and EWS-FLI1 low cells. There's also a population of cells that has been labeled as "proliferative" across multiple publications (although there is an older argument that EWS-FLI1 high cells are also proliferative). Because of this, I made a table with a column for the name of the cell state (proliferative, EWS-low, or EWS-high), where each row is an individual marker gene. I also included custom gene lists published by others that we may consider using to identify these cell states.

Briefly describe the general approach you took to achieve this goal.

There is no clear consensus on exactly which set of genes describes EWS-FLI1 high or low cells, but some key marker genes have been identified and validated in several of the publications. These are the ones that I chose to include in the main marker gene table, tumor-cell-state-markers.tsv.

  • Targets for EWS-FLI1 high and low cells from Aynaud et al came from this sentence:

We also checked the single-cell expression dynamics of eight genes known to be directly modulated by EWSR1-FLI1 (upregulated genes: PRKCB, LIPI, CCND1, and NR0B1; downregulated genes: IGFBP3, IL8, LOX, and VIM) (Figures 1C and S1).

  • Wrenn et al identified NT5E as a marker for EWS-FLI1 low cells, along with a group of ECM- and EMT-related genes. In this table I chose to include the subset of those genes that they validated as having high expression in EWS-FLI1 low cells. See Figure S2E.

  • Goodspeed et al used MKI67 and PCNA to identify the "proliferative" population:

Among the Ewing sarcoma cells, a proliferating population was identified by the expression of MKI67 and PCNA (Fig. 1C-D).

  • MKI67 was also used to differentiate proliferating cells from mesenchymal like cells (EWS-FLI1 low cells) in spatial profiling performed by Wrenn et al.

Ki67 and cytoskeletal protein vimentin were used as morphology markers to identify differentially proliferative and mesenchymal regions, respectively, and CD31 was used to localize vasculature (Fig. 4B)
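The table layout described above (one row per marker gene, with a column naming the cell state) can be sketched as follows. This is only an illustration: the column names `cell_state` and `gene_symbol` are assumptions rather than the actual headers of tumor-cell-state-markers.tsv, and the gene lists contain only the symbols quoted in this description, so the Wrenn et al subset is represented by NT5E alone.

```python
import csv
import io

# Gene symbols quoted in this PR description, grouped by cell state.
# The column names used below are hypothetical, not taken from the real file.
MARKERS = {
    "EWS-high": ["PRKCB", "LIPI", "CCND1", "NR0B1"],
    "EWS-low": ["IGFBP3", "IL8", "LOX", "VIM", "NT5E"],
    "proliferative": ["MKI67", "PCNA"],
}


def to_long_rows(markers):
    """Flatten {state: [genes]} into one row per marker gene."""
    return [
        {"cell_state": state, "gene_symbol": gene}
        for state, genes in markers.items()
        for gene in genes
    ]


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["cell_state", "gene_symbol"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_long_rows(MARKERS)
tsv_text = to_tsv(rows)
print(tsv_text)
```

Keeping the table in this long format means a new marker can be added as a single row without changing the columns.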

I also added two custom marker gene lists that I think we may want to use:

  • aynaud-ews-targets.tsv contains the targets of EWS-FLI1 identified in Aynaud et al Fig. 4. These genes were shown to be on a continuum with EWS-FLI1 expression at the single-cell level. Note that a few genes in this list did not return any Ensembl ID mappings, so I left those IDs as NA.
  • wrenn-nt5e-genes.tsv contains the intersection between the top genes correlated with NT5E expression in patient tumors and the top genes that were markers of NT5E+ Ewing sarcoma cells. These genes were listed in Fig. 5D and 5E.
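Because some genes in these lists have NA in place of an Ensembl ID, anything that matches on Ensembl IDs needs to set those rows aside first. Here is a minimal sketch of that step, assuming hypothetical column names (`gene_symbol`, `ensembl_gene_id`) and using placeholder values in place of real genes and IDs:

```python
import csv
import io

# Placeholder rows only: these are not real gene symbols or Ensembl IDs,
# and the column names are assumptions about the file layout.
EXAMPLE_TSV = (
    "gene_symbol\tensembl_gene_id\n"
    "GENE_A\tENSG_PLACEHOLDER_1\n"
    "GENE_B\tNA\n"
    "GENE_C\tENSG_PLACEHOLDER_3\n"
)


def split_by_mapping(tsv_text):
    """Return (rows with an Ensembl ID, symbols whose ID is NA or empty)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapped, unmapped = [], []
    for row in reader:
        if row["ensembl_gene_id"] in ("", "NA"):
            unmapped.append(row["gene_symbol"])
        else:
            mapped.append(row)
    return mapped, unmapped


mapped, unmapped = split_by_mapping(EXAMPLE_TSV)
print(f"{len(mapped)} mapped, unmapped symbols: {unmapped}")
```

Returning the unmapped symbols, instead of dropping them silently, makes it easy to report which genes were excluded from an ID-based lookup.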

The last thing I did here was update the README in the references folder to document all of these gene lists. I also included a section linking to potentially useful marker gene lists in MSigDB that were mentioned in the various publications I saw. I think these may be helpful in identifying EWS-FLI1 high/low cells, or at least in validating our assignments, so I added them for future reference.

If known, do you anticipate filing additional pull requests to complete this analysis module?

The next thing I plan on doing is creating an exploratory notebook where I look at 2-3 samples and see if I can identify EWS-FLI1 high/low clusters of cells. To do this I am going to start with the genes in tumor-cell-state-markers.tsv.
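One possible starting point for that exploration is to score each cell by the mean expression of each state's markers. The sketch below is an assumption about how this could look, not the planned notebook: the expression values are invented toy numbers, only a few of the markers are used, and picking the top-scoring state is a simplification given that cells lie on a continuum of EWS-FLI1 expression.

```python
from statistics import mean

# Toy log-normalized expression per cell: {cell_id: {gene_symbol: value}}.
# All values are invented for illustration.
EXPRESSION = {
    "cell_1": {"PRKCB": 2.1, "NR0B1": 1.8, "VIM": 0.2, "LOX": 0.0, "MKI67": 0.1, "PCNA": 0.3},
    "cell_2": {"PRKCB": 0.1, "NR0B1": 0.0, "VIM": 2.5, "LOX": 1.9, "MKI67": 0.0, "PCNA": 0.2},
    "cell_3": {"PRKCB": 1.0, "NR0B1": 0.9, "VIM": 0.3, "LOX": 0.1, "MKI67": 2.2, "PCNA": 2.4},
}

# A small subset of the markers named in this PR, two per state.
MARKERS = {
    "EWS-high": ["PRKCB", "NR0B1"],
    "EWS-low": ["VIM", "LOX"],
    "proliferative": ["MKI67", "PCNA"],
}


def score_cell(expr, markers):
    """Mean expression of each state's markers; genes absent from the cell count as 0."""
    return {
        state: mean(expr.get(gene, 0.0) for gene in genes)
        for state, genes in markers.items()
    }


def top_state(scores):
    """Name of the state with the highest score."""
    return max(scores, key=scores.get)


assignments = {
    cell: top_state(score_cell(expr, MARKERS)) for cell, expr in EXPRESSION.items()
}
print(assignments)
```

In practice the per-state scores themselves, plotted across clusters, are likely more informative than a hard label per cell.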

Author checklists

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

@allyhawkins allyhawkins requested review from sjspielman and removed request for jaclyn-taroni January 3, 2025 18:10
Member

@sjspielman sjspielman left a comment


Overall this looks pretty much in place, but I have a couple questions -

  • While looking (a bit) over the papers, I wondered whether we want to consider CD44 here, which is mentioned in Goodspeed and seems to be otherwise recognized as an EWS-FLI1 low marker?
  • How do you envision the gene_signatures tsvs being used in the future, and is there any information that indicates high/low for these genes? My sense here is the marker genes file is the main file of interest to explore cell states, but it might be supplemented with the other signature files? This comment might be me asking for a smidge more docs, but that also depends on your answer here :)
    • I'll also note that I did confirm their gene symbol id <-> ensembl mappings are correct as part of review

@sjspielman
Member

For fun (for science?!) I may have also asked ChatGPT what it thinks about marker genes here (with, happily, some overlap with what you found so at least ChatGPT has some degree of accuracy here...) - https://chatgpt.com/share/677bdf60-1a1c-8003-9708-a7e91873976a

Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>
@allyhawkins
Member Author

For fun (for science?!) I may have also asked ChatGPT what it thinks about marker genes here (with, happily, some overlap with what you found so at least ChatGPT has some degree of accuracy here...) - https://chatgpt.com/share/677bdf60-1a1c-8003-9708-a7e91873976a

I was actually thinking I should do this, so thank you! Although some of the genes overlap, their descriptions are somewhat off. For example, PRKCB is a direct target of EWS-FLI1 and is upregulated by the fusion, not repressed!

PRKCB (Protein Kinase C Beta): EWS-FLI1 has been shown to repress PRKCB expression, impacting downstream signaling pathways.

Don't trust the robots 🤖

@sjspielman
Member

Although some of the genes overlap, their descriptions are somewhat off.

Nobody is even a little shocked 🥴

@allyhawkins
Member Author

  • While looking (a bit) over the papers, I wondered whether we want to consider CD44 here, which is mentioned in Goodspeed and seems to be otherwise recognized as an EWS-FLI1 low marker?

I added this to the list! Because Goodspeed shows transcriptional heterogeneity at the single-cell level, I'm good with adding it. I was trying to avoid making the list too long, since there are a lot of EWS-FLI1 targets and multiple existing lists of those targets. I wanted to use markers that have been experimentally validated as heterogeneous within samples and that can actually serve as markers of the EWS-FLI1 low/high phenotype within a single sample.

  • How do you envision the gene_signatures tsvs being used in the future, and is there any information that indicates high/low for these genes? My sense here is the marker genes file is the main file of interest to explore cell states, but it might be supplemented with the other signature files? This comment might be me asking for a smidge more docs, but that also depends on your answer here :)

I added a line to the documentation for each of these files with the expectation for expression in EWS-FLI1 high/low cells. My initial thought is to use the main marker gene list to try and define cell states and then use these published gene signatures to help validate the assignments. I also wonder if these lists will be more helpful than my custom list in assigning cell states. I basically wanted to include anything that could be helpful in making these assignments now, even if they don't get fully used. But you are correct that the marker genes file is the main file of interest.
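As a rough illustration of that validation idea, the sketch below compares a per-cell signature score between cells already assigned to each state. Everything here is hypothetical: the cell IDs, the assignments, and the scores are toy values, and the expectation that the signature should be higher in EWS-FLI1 high cells is just the example case.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs: cell-state labels from the main marker list and a
# per-cell score for a published signature expected to be high in EWS-high cells.
ASSIGNED_STATE = {
    "cell_1": "EWS-high",
    "cell_2": "EWS-low",
    "cell_3": "EWS-high",
    "cell_4": "EWS-low",
}
SIGNATURE_SCORE = {"cell_1": 0.8, "cell_2": 0.1, "cell_3": 0.6, "cell_4": 0.3}


def mean_score_by_state(states, scores):
    """Average the signature score over the cells assigned to each state."""
    grouped = defaultdict(list)
    for cell, state in states.items():
        if cell in scores:  # skip cells without a signature score
            grouped[state].append(scores[cell])
    return {state: mean(values) for state, values in grouped.items()}


summary = mean_score_by_state(ASSIGNED_STATE, SIGNATURE_SCORE)
# The assignments agree with the signature if the expected direction holds.
consistent = summary["EWS-high"] > summary["EWS-low"]
print(summary, consistent)
```

A disagreement in direction would suggest revisiting either the marker-based assignments or the documented expectation for that signature.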

This should be ready for another look!

@allyhawkins allyhawkins requested a review from sjspielman January 6, 2025 18:14
Member

@sjspielman sjspielman left a comment


Changes look good to me, thanks!

@allyhawkins
Member Author

Canceling the workflow run here since no changes were made to the code.

@allyhawkins allyhawkins merged commit 0e8187c into AlexsLemonade:main Jan 6, 2025
2 of 3 checks passed
@allyhawkins allyhawkins deleted the allyhawkins/ewing-cell-state-gene-lists branch January 6, 2025 18:25
Development

Successfully merging this pull request may close these issues.

Create marker gene lists for tumor cell states in Ewing sarcoma
2 participants