
Consider not providing the names for top_markers to the model #2

Open
maxim-h opened this issue Jul 29, 2024 · 3 comments

@maxim-h

maxim-h commented Jul 29, 2024

Hello!

I was curious how your framework would perform on our data, so the first thing I did was retrieve the markers of cell populations we had already annotated manually. I was very happy with the performance until I read the Reasons and realized that the model seemed to be drawing inferences that the provided markers did not fully explain.

This led me to realize that the prompt is built using the names of `top_genes`. So if the markers were identified from a pre-existing annotation, the model sees those labels in the prompt.

ceLLama/R/ceLLama.R, lines 25 to 29 in 2af357a:

```r
annotation_data <- lapply(names(top_genes), function(cluster) {
  up_genes <- paste(top_genes[[cluster]]$up, collapse = ", ")
  down_genes <- paste(top_genes[[cluster]]$down, collapse = ", ")
  prompt <- paste(
    "This cell cluster (", cluster, ") has up-regulated genes:", up_genes,
```
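To make the leak concrete, here is a small self-contained sketch with toy data. The `build_prompt()` helper is hypothetical and only mirrors the visible part of the excerpt above (the real `paste()` call continues past line 29); the gene lists and cluster labels are made up for illustration.

```r
# Toy marker genes keyed by (made-up) manual annotation labels, shaped like
# the `top_genes` object in the excerpt: one list(up = ..., down = ...) per cluster.
top_genes <- list(
  "CD8 T cells" = list(up = c("CD8A", "CD8B", "GZMK"), down = c("MS4A1", "LYZ")),
  "B cells"     = list(up = c("MS4A1", "CD79A", "CD19"), down = c("CD3E", "LYZ"))
)

# Hypothetical, simplified prompt builder: reproduces only the visible part
# of the excerpt; the real prompt text continues beyond it.
build_prompt <- function(cluster, genes) {
  up_genes <- paste(genes$up, collapse = ", ")
  paste("This cell cluster (", cluster, ") has up-regulated genes:", up_genes)
}

prompts <- lapply(names(top_genes), function(cluster) {
  build_prompt(cluster, top_genes[[cluster]])
})
names(prompts) <- names(top_genes)

# TRUE wherever the cluster label itself appears verbatim in its prompt
leaks <- vapply(names(prompts), function(label) {
  grepl(label, prompts[[label]], fixed = TRUE)
}, logical(1))

print(prompts[["CD8 T cells"]])
print(leaks)  # both TRUE
```

The first prompt reads `This cell cluster ( CD8 T cells ) has up-regulated genes: CD8A, CD8B, GZMK`, so the model is handed the answer alongside the evidence.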

Perhaps it would be better to anonymize the cluster names within the function, or to point out in the tutorial that the marker list must have anonymized names.
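Until then, anonymization can be done on the user side. The sketch below assumes that ceLLama takes its cluster labels from the names of the input list, as the excerpt suggests. `anonymize_markers()` is a hypothetical helper, not part of ceLLama: it replaces the names with neutral IDs, also overwrites the `cluster` column that `FindAllMarkers()` tables carry, and keeps a key for mapping the IDs back afterwards.

```r
# Hypothetical helper (not part of ceLLama): swap informative cluster names
# for neutral IDs before prompting, and keep a key to map them back.
anonymize_markers <- function(marker_list, prefix = "cluster_") {
  original <- names(marker_list)
  neutral <- paste0(prefix, seq_along(marker_list))
  key <- setNames(original, neutral)
  names(marker_list) <- neutral
  # FindAllMarkers() output also stores the label in a `cluster` column;
  # overwrite it too, in case anything downstream reads it.
  marker_list <- mapply(function(df, id) {
    if (is.data.frame(df) && "cluster" %in% colnames(df)) {
      df$cluster <- rep(id, nrow(df))
    }
    df
  }, marker_list, neutral, SIMPLIFY = FALSE)
  list(markers = marker_list, key = key)
}

# Toy input shaped like split(FindAllMarkers(...), ~cluster) output
toy <- list(
  "CD8 T cells" = data.frame(gene = c("CD8A", "GZMK"), cluster = "CD8 T cells"),
  "B cells"     = data.frame(gene = c("MS4A1", "CD79A"), cluster = "B cells")
)
anon <- anonymize_markers(toy)
print(names(anon$markers))
print(anon$key)
```

For a list built as in the workflow later in this thread, this would amount to `anon <- anonymize_markers(markers.list)` and passing `anon$markers` to `ceLLama()` with the same arguments; `anon$key` then translates `cluster_1`, `cluster_2`, ... back to the manual labels for comparison with the model's answers.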

@eonurk (Collaborator)

eonurk commented Jul 29, 2024

I am not sure I follow, but that might also be the insomnia.

@maxim-h (Author)

maxim-h commented Jul 29, 2024

Hehe, no worries.

We have a Seurat object with manually annotated cell types in the metadata column `Manual_annotation`.
Here's what I did:

```r
library(Seurat)
library(ceLLama)  # assumes ceLLama is installed from the project repository

# Use the manual annotation as the cell identities
Idents(seurat) <- "Manual_annotation"

markers <- FindAllMarkers(seurat, min.pct = .5)
markers.list <- split(markers, markers$cluster)
## At this point `names(markers.list)` holds the manually annotated cluster names,
## which are inserted into the prompt by the lines linked in my previous comment.

res <- ceLLama(markers.list, temperature = 0, seed = 101, n_genes = 30)
```

As a result, the prompt contains our manual annotation, so the model just comes up with a "random" justification for whatever labels we already provided.
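A cheap safeguard is to check the list names before prompting. The sketch below is hypothetical (not a ceLLama feature) and assumes that neutral IDs are either purely numeric, as with Seurat's default cluster identities (`"0"`, `"1"`, ...), or of the form `cluster_<n>`; any other name triggers a warning.

```r
# Hypothetical pre-flight check (not part of ceLLama): flag marker lists
# whose names look like biological labels rather than neutral IDs.
# Assumption: neutral IDs are purely numeric or of the form "cluster_<n>".
has_informative_names <- function(marker_list) {
  ids <- names(marker_list)
  if (is.null(ids)) return(FALSE)
  any(!grepl("^([0-9]+|cluster_[0-9]+)$", ids))
}

check_before_prompting <- function(marker_list) {
  if (has_informative_names(marker_list)) {
    warning("Cluster names may leak an existing annotation into the prompt: ",
            paste(names(marker_list), collapse = ", "))
  }
  invisible(marker_list)
}

# Example: default Seurat cluster IDs pass, manual labels do not
print(has_informative_names(list("0" = NULL, "1" = NULL)))
print(has_informative_names(list("CD8 T cells" = NULL, "B cells" = NULL)))
```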

@eonurk (Collaborator)

eonurk commented Jul 29, 2024

Oh, I see. That's cheating! 😄 I will think about this; maybe overriding the cluster names could be an option.
