Dataset details of TCGA-BRCA #5

Open
naivete5656 opened this issue Nov 6, 2024 · 0 comments

Hi, thank you for sharing the implementation of this excellent work!

I have three questions about TCGA-BRCA.

  1. How did you determine which cases to use for training? There are 1,062 diagnostic slides, but only 1,041 cases were used to train your model. Could you explain how you selected these images?
  2. Could you let me know why you used only diagnostic slides? The TCGA dataset also includes tissue slides, which appear to be linked to the same sample IDs as the RNA data. Since tissue slides visually resemble diagnostic slides, I wonder if they could also be suitable for training.
  3. How did you obtain the OncoTreeCode? I reviewed the shared csv, which contains OncoTreeCode data for each case. According to TCGA-BRCA, there are nine disease types, but these types don’t seem to align with the OncoTreeCodes in your CSV. I assume that Disease_type or diagnoses.0.primary_diagnosis might be the key information to identify the OncoTreeCode, but these labels don’t appear to match those in your CSV.
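
To make questions 1 and 2 concrete, here is a minimal sketch of how slide counts and case counts can be compared. It assumes the standard TCGA barcode layout, where the first three dash-separated fields identify the case and the last field is `DX…` for diagnostic slides or `TS…`/`BS…` for tissue slides. The file names and the helper names (`case_id`, `is_diagnostic`, `group_diagnostic_slides`) are made up for illustration and are not taken from this repository.

```python
from collections import defaultdict


def case_id(slide_name: str) -> str:
    """Return the case barcode (first three dash-separated fields)."""
    return "-".join(slide_name.split("-")[:3])


def is_diagnostic(slide_name: str) -> bool:
    """True if the slide code (last barcode field) starts with 'DX'.

    Assumption: tissue slides use 'TS' or 'BS' in this position instead.
    """
    barcode = slide_name.split(".")[0]  # drop UUID and extension, if present
    return barcode.split("-")[-1].startswith("DX")


def group_diagnostic_slides(slide_names):
    """Map each case barcode to its list of diagnostic slides."""
    by_case = defaultdict(list)
    for name in slide_names:
        if is_diagnostic(name):
            by_case[case_id(name)].append(name)
    return dict(by_case)


# Hypothetical file names, for illustration only.
slides = [
    "TCGA-XX-0001-01Z-00-DX1.svs",
    "TCGA-XX-0001-01Z-00-DX2.svs",
    "TCGA-XX-0001-01A-01-TS1.svs",
    "TCGA-XX-0002-01Z-00-DX1.svs",
]
groups = group_diagnostic_slides(slides)
n_diagnostic = sum(len(v) for v in groups.values())
print(f"{n_diagnostic} diagnostic slides across {len(groups)} cases")
```

A case with more than one diagnostic slide would make the slide count exceed the case count, but that alone does not say which cases were actually selected for training.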

Best.
