
Slow running speed #767

Open
Tingchen-G opened this issue Aug 26, 2024 · 13 comments

Comments

@Tingchen-G

Tingchen-G commented Aug 26, 2024

Hi!

We are using kilosort for 32-channel recordings that are 10-15 hours long, and processing is taking a really long time, so I'm hoping to ask for some advice on this issue.

  1. We have 16 shanks, each with 32 channels. Currently I'm using a loop to run kilosort on each shank separately. Some shanks took 3-4 hours, but a few shanks took 9-10 hours. I noticed that kilosort takes longer and longer to run as it is looped. Any idea why this might be the case?

  2. We are planning to upgrade our GPU. I read on the Kilosort Hardware Recommendation page that for longer recordings, "this situation typically requires more RAM, like 32 or 64 GB". May I check if this is referring to GPU or system memory? Also, since our current memory is sufficient to handle our data, do you think increasing memory, either in the system or GPU, would reduce runtime?

Thank you!

@RobertoDF
Contributor

RobertoDF commented Aug 26, 2024

Interesting, this might be related to SpikeInterface/spikeinterface#3332. I have also noticed that running kilosort in a loop sometimes causes odd behavior.

@jacobpennington
Collaborator

As for the loop question, are you noticing that it takes longer on the third and fourth loops as well, or just longer on the second loop like the issue linked in @RobertoDF's comment? If you're assigning the sorting results like:

for i in some_list:
    results = run_kilosort(...)

Then the variables in results will be kept in memory until the next loop iteration completes (or longer, if you're storing the results in a list, for example), which will slow down sorting somewhat since that memory won't be available in the meantime. Most of those variables aren't too big, but the memory for tF can add up quickly for recordings with a lot of spikes. One way to release them between iterations is sketched below.
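
A minimal sketch of that cleanup, assuming your per-shank loop looks roughly like the one above (shank_files and save_results are hypothetical placeholders; gc.collect and torch.cuda.empty_cache are standard Python/PyTorch calls, not a documented Kilosort workflow):

import gc
import torch
from kilosort import run_kilosort

for shank_file in shank_files:  # hypothetical list of per-shank binary files
    results = run_kilosort(settings=settings, probe=probe, filename=shank_file)
    save_results(results)       # hypothetical: persist outputs to disk before discarding
    del results                 # drop the reference so Python can reclaim the memory
    gc.collect()                # collect the large arrays immediately
    torch.cuda.empty_cache()    # return cached GPU memory to the driver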

For the "taking a long time" part, I can't really say much without some information about what hardware you're using. For reference, a Neuropixels recording 2-3 hours long on SSD is expected to take 2-3 hours to sort with a 8-12GB GeForce 3000 or 4000 series card, an i7 or better processor from the last few generations, and at least 32GB of system memory. A 32-channel recording should take less time; however, differences in hardware or spike counts could account for some of the gap.

Is there a reason you're sorting the shanks separately instead of all at once?

@Tingchen-G
Author

Thank you for your response! Yes, the sorting takes longer on the second loop iteration, just like in the issue linked in @RobertoDF's comment. But at the end of every iteration I have del ops, st, clu, tF, Wall, similar_templates, is_ref, est_contam_rate, kept_spikes, which I thought would clear the memory?

I am sorting the shanks separately because our recordings are very long, so I am worried that sorting all shanks together would lead to a "CUDA out of memory" error.

And finally, just to clarify: on the Kilosort Hardware Recommendation page, does "this situation typically requires more RAM, like 32 or 64 GB" refer to system memory?

Thank you!

@jacobpennington
Collaborator

Yes, that is referring to system memory. I'll look into the looping issue. I would also recommend trying to sort everything together, and only sorting separately if you run into errors, since sorting all at once should speed things up quite a bit; a rough sketch is below.
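
An untested sketch of what that could look like, assuming kcoords is used to label which shank each channel belongs to; the per-shank geometry below is a placeholder, so substitute your real coordinates:

import numpy as np
from kilosort import run_kilosort

n_shanks, chans_per_shank = 16, 32
n_chan = n_shanks * chans_per_shank  # 512 channels total

# Placeholder per-shank layout: two columns 30 um apart; swap in your real coords.
shank_xc = np.tile([0.0, 30.0], chans_per_shank // 2)
shank_yc = np.repeat(np.arange(chans_per_shank // 2) * 30.0, 2)

probe = {
    'chanMap': np.arange(n_chan),
    # offset each shank in x so channels on different shanks never overlap
    'xc': np.concatenate([shank_xc + s * 250.0 for s in range(n_shanks)]),
    'yc': np.tile(shank_yc, n_shanks),
    'kcoords': np.repeat(np.arange(n_shanks), chans_per_shank),  # shank label per channel
    'n_chan': n_chan,
}

settings = {'n_chan_bin': n_chan}  # plus whatever settings you normally use
results = run_kilosort(settings=settings, probe=probe,
                       filename='all_shanks.bin')  # hypothetical combined binary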

As for the sorting taking too long: can you please share some information about the hardware you're using? Specifically: graphics card, processor, amount of GPU and system memory, and whether you're sorting on an SSD or HDD?

@Tingchen-G
Author

Tingchen-G commented Aug 30, 2024

I see, I'll try sorting them all together. Regarding hardware, we're using a GeForce GTX 1080 Ti GPU with 11 GB of memory and an Intel i7-9700 processor with 48 GB of system memory, and we are sorting on an SSD.

@Tingchen-G
Author

Also, I noticed that the final clustering step takes the longest. For a shank that took 11.5 hours to run, 13,844,472 spikes were extracted for first clustering, but 43,478,695 spikes were extracted for final clustering. Could it be that too many spikes are extracted for final clustering? I'm using the defaults of 9 and 8 for Th_universal and Th_learned, set as shown below.
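
(For reference, this is how I'm passing them, i.e. just the defaults through the settings dict:)

settings = {
    'n_chan_bin': 32,      # channels in the binary file for one shank
    'Th_universal': 9,     # default detection threshold, universal templates
    'Th_learned': 8,       # default detection threshold, learned templates
}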

@jacobpennington
Collaborator

One other thing to check: can you make note of how many spikes were detected for each shank? I just want to make sure it's not a case where you happened to sort the shanks with more spikes later in the loop, which would of course take longer.

Another thing you can try is increasing the cluster_downsampling parameter, which would speed up the clustering steps; with that many spikes, you don't need to use as many of them for some of the clustering operations. For example:
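
A sketch of what I mean (the value 50 is just an illustration, tune it to your data, and note the exact default may differ by version; the file path is hypothetical):

from kilosort import run_kilosort

settings = {
    'n_chan_bin': 32,
    'cluster_downsampling': 50,  # use fewer spikes for some clustering
                                 # operations (larger value = more downsampling)
}
ops, st, clu, tF, Wall, similar_templates, is_ref, est_contam_rate, kept_spikes = \
    run_kilosort(settings=settings, probe=probe, filename='shank_01.bin')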

@Tingchen-G
Author

Sorry for the late reply! Here are the spike counts for each shank:

Shank 1: 23,946,723
Shank 2: 26,824,833
Shank 3: 40,672,509
Shank 4: 43,187,385
Shank 5: 32,859,009
Shank 6: 30,946,386
Shank 7: 26,166,955
Shank 8: 17,119,952
Shank 9: 5,001,869
Shank 10: 8,773,221
Shank 11: 22,833,448
Shank 12: 20,865,463
Shank 13: 22,793,711
Shank 14: 30,212,405
Shank 15: 27,891,315
Shank 16: 19,776,232
The spike counts vary significantly between shanks. I suspect the loop is causing the slow runtime, because I've noticed that when a shank takes too long, stopping the loop, restarting the Anaconda Prompt and kilosort, and running a new loop from that same shank onward makes it run much faster. A way to automate that restart is sketched below.

I'll definitely try increasing the cluster_downsampling parameter! Thanks!
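
One workaround I'm considering is sorting each shank in its own Python process, so all memory is returned to the OS between shanks (untested sketch; sort_one_shank.py is a hypothetical script that sorts the shank index given on the command line, then exits):

import subprocess
import sys

for shank in range(16):
    # each sort runs in a fresh interpreter and releases everything on exit
    subprocess.run([sys.executable, 'sort_one_shank.py', str(shank)], check=True)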

@jacobpennington
Collaborator

Thanks, still looking into this. Would it be possible for you to share the binary file and probe information for one of the shanks so that I can benchmark the memory usage in a loop? Any of the shanks with 20 million or more spikes should work. We don't have long-duration datasets like that available, so it would help me debug this issue and some related ones.

@Tingchen-G
Author

Tingchen-G commented Oct 11, 2024

Hi!

Sorry for the delay. Sure, we can share the files. May I ask how to share the binary file? The compressed file is still too big to share on GitHub. Here is the probe information:

import numpy as np

chanMap = np.arange(32)
kcoords = np.zeros(32)
n_chan = 32

# Two columns of 16 contacts, offset by 30 um in x
xc_1_3 = np.ones(16) * 6.2
xc_2_4 = np.ones(16) * 6.2 + 30
xc = np.array([val for pair in zip(xc_1_3, xc_2_4) for val in pair])

# Columns staggered by 15 um vertically, with 30 um pitch within a column
yc_2_4 = np.array([15 + 6.2 + 30 * i for i in range(16)])
yc_1_3 = np.array([6.2 + 30 * k for k in range(16)])
yc = np.array([val for pair in zip(yc_1_3, yc_2_4) for val in pair])

probe = {
    'chanMap': chanMap,
    'xc': xc,
    'yc': yc,
    'kcoords': kcoords,
    'n_chan': n_chan
}
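
(Side note: if I'm not mistaken, this dict can be passed directly as the probe argument to run_kilosort, or saved for reuse with kilosort.io.save_probe, e.g.:)

from kilosort.io import save_probe
save_probe(probe, 'probe_32chan.json')  # hypothetical output path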

Thank you!

@jacobpennington
Collaborator

jacobpennington commented Oct 11, 2024 via email


Tingchen-G commented Oct 13, 2024

Hi,

I am now running kilosort on a new set of data of similar size, and the issue seems to be solved! Each shank now takes around 2 hours, which is quite reasonable given our data size. I am now using kilosort 4.0.18 and have added these lines to the end of the loop:

    # truncate the log file so it doesn't keep growing across iterations
    with open('kilosort.log', 'w') as f:
        pass

    # drop references to all sorting outputs so the memory can be reclaimed
    del ops, st, clu, tF, Wall, similar_templates, is_ref, est_contam_rate, kept_spikes
    del camps, contam_pct, templates, chan_best, amplitudes, firing_rates, dshift

Thank you for your help!
