
Batch processing #3

Open
agemagician opened this issue Feb 15, 2024 · 2 comments

Comments

@agemagician

Hello,

Thanks for your great work.

I have noticed that you don't use batch processing in your `get_cell_types_for_adata` function, which makes feature extraction very slow.

scGPT, which processes inputs in batches, is about 24 times faster by comparison.

Do you have any plans to support batch processing?

@SuperBianC
Owner

SuperBianC commented Feb 25, 2024

@agemagician Thanks a lot for your suggestion. I have been working on batch processing for scMulan. However, I found batch processing difficult: when the input cells have different lengths (different numbers of expressed genes), the decoder-only architecture cannot process them as a single batch.
I see two possible solutions. The first is to sample cells of the same length as a batch from the dataloader, but it then takes a long time to return the cell-type results in the order of the original adata file's indexes. The second is to pad the cells in a batch to the same length, but the trade-off is that the number of generation steps is determined by the longest cell, so the shorter cells in the batch waste extra computation.
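For concreteness, here is a minimal sketch of the two approaches, using plain Python lists of gene-token IDs. The function names, the `pad_id` value, and the batch layout are assumptions for illustration, not scMulan's actual API.

```python
from collections import defaultdict

def bucket_by_length(cells):
    """Solution 1 (hypothetical): group cell indices by length so each
    batch contains only cells with the same number of expressed genes.
    Returns {length: [original indices]}, so results can later be
    reordered back to the original adata index order."""
    buckets = defaultdict(list)
    for i, cell in enumerate(cells):
        buckets[len(cell)].append(i)
    return dict(buckets)

def pad_batch(cells, pad_id=0):
    """Solution 2 (hypothetical): pad every cell in the batch to the
    length of the longest cell, and build an attention mask (1 = real
    token, 0 = padding) so the model can ignore the padded positions."""
    max_len = max(len(cell) for cell in cells)
    padded, masks = [], []
    for cell in cells:
        n_pad = max_len - len(cell)
        padded.append(list(cell) + [pad_id] * n_pad)
        masks.append([1] * len(cell) + [0] * n_pad)
    return padded, masks

# Example: three cells with 2, 4, and 1 expressed genes.
cells = [[5, 3], [7, 1, 9, 2], [4]]
print(bucket_by_length(cells))   # indices grouped by cell length
print(pad_batch(cells))          # rectangular batch + attention masks
```

The padding variant makes the trade-off visible: the one-gene cell is driven through as many decoding steps as the four-gene cell, which is exactly the wasted computation described above.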

I have tried the first solution, but it doesn't show any acceleration.

Do you have any ideas for this?

Thanks again.

@xhl-xhl

xhl-xhl commented Oct 4, 2024

I don't think this will work; GPT itself predicts one token at a time.
