
Batch processing #3

Open
agemagician opened this issue Feb 15, 2024 · 2 comments

Comments

@agemagician

Hello,

Thanks for your great work.

I have noticed that you don't use batch processing in your `get_cell_types_for_adata` function, which makes feature extraction very slow.

scGPT, which processes inputs in batches, is about 24 times faster by comparison.

Do you have any plans to support batch processing?

@SuperBianC
Owner

SuperBianC commented Feb 25, 2024

@agemagician Thanks a lot for your suggestion. I have been working on batch processing for scMulan. However, I found batch processing difficult: when the input cells have different lengths (different numbers of expressed genes), the decoder-only architecture cannot process them as a single batch.
I see two possible solutions. The first is to sample cells of the same length as a batch from the dataloader, but it then takes a long time to return the cell-type results in the order of the original adata file's indexes. The second is to pad the cells in a batch to the same length, but the trade-off is that the number of generation steps is determined by the longest cell, so the shorter cells in the batch waste extra computation.
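For concreteness, here is a minimal sketch of the two approaches, using plain Python lists of gene-token IDs. The function names, the `pad_id` value, and the batch layout are assumptions for illustration, not scMulan's actual API.

```python
from collections import defaultdict

def bucket_by_length(cells):
    """Solution 1 (hypothetical): group cell indices by length so each
    batch contains only cells with the same number of expressed genes.
    Returns {length: [original indices]}, so results can later be
    reordered back to the original adata index order."""
    buckets = defaultdict(list)
    for i, cell in enumerate(cells):
        buckets[len(cell)].append(i)
    return dict(buckets)

def pad_batch(cells, pad_id=0):
    """Solution 2 (hypothetical): pad every cell in the batch to the
    length of the longest cell, and build an attention mask (1 = real
    token, 0 = padding) so the model can ignore the padded positions."""
    max_len = max(len(cell) for cell in cells)
    padded, masks = [], []
    for cell in cells:
        n_pad = max_len - len(cell)
        padded.append(list(cell) + [pad_id] * n_pad)
        masks.append([1] * len(cell) + [0] * n_pad)
    return padded, masks

# Example: three cells with 2, 4, and 1 expressed genes.
cells = [[5, 3], [7, 1, 9, 2], [4]]
print(bucket_by_length(cells))   # indices grouped by cell length
print(pad_batch(cells))          # rectangular batch + attention masks
```

The padding variant makes the trade-off visible: the one-gene cell is driven through as many decoding steps as the four-gene cell, which is exactly the wasted computation described above.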

I have tried the first solution, but it doesn't show any acceleration.

Do you have any ideas for this?

Thanks again.

@xhl-xhl

xhl-xhl commented Oct 4, 2024

I don't think this will work; GPT itself predicts one token at a time.
