Slow running speed #767
Interesting, this might be related to SpikeInterface/spikeinterface#3332. I've also noticed that running Kilosort in a loop sometimes causes weird behavior. |
As for the loop question, are you noticing that it takes longer on the third and fourth loops, or just longer on the second loop like the linked issue in @RobertoDF's comment? If you're assigning the sorting results to the same variables on each iteration, then the previous iteration's results stay in memory until they're overwritten.

For the "taking a long time" part, I can't really say much without some information about what hardware you're using. For reference, a 2-3 hour Neuropixels recording on an SSD is expected to take 2-3 hours to sort with an 8-12GB GeForce 3000- or 4000-series card, an i7 or better processor from the last few generations, and at least 32GB of system memory. A 32-channel recording should take less time; however, differences in hardware or spike counts could account for some of the gap. Is there a reason you're sorting the shanks separately instead of all at once? |
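To illustrate the reference-lifetime point above: in plain Python (no Kilosort involved), an object from the previous loop iteration stays alive until its name is rebound, so two iterations' worth of results can briefly coexist in memory. A toy sketch, not the poster's actual code:

```python
# Count how many "results" objects are alive at once across loop iterations.
class BigResult:
    live = 0  # number of instances currently alive

    def __init__(self):
        BigResult.live += 1

    def __del__(self):
        BigResult.live -= 1

peak = 0
results = None
for shank in range(3):
    new = BigResult()                  # previous iteration's object is still alive here
    peak = max(peak, BigResult.live)
    results = new                      # old object is only freed once the name is rebound
del new, results

print(peak)  # 2: two results briefly coexist on every iteration after the first
```

With large sorting outputs (and GPU tensors), this transient doubling is one reason explicitly deleting results at the end of each iteration can help.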
Thank you for your response! Yes, the sorting takes longer on the second loop, just like in @RobertoDF's comment. I am sorting the shanks separately because our recordings are very long, so I am worried that sorting the shanks together would lead to a "CUDA out of memory" error. And finally, just to clarify: when the Kilosort Hardware Recommendation page says "this situation typically requires more RAM, like 32 or 64 GB", is that referring to system memory? Thank you! |
Yes, that is referring to system memory. I'll look into the looping issue. I would also recommend trying to sort everything together, and only sorting separately if you run into errors, since sorting all at once should speed things up quite a bit. As for the sorting taking too long, can you please share some information about your hardware? Specifically: graphics card, processor, amount of GPU and system memory, and whether you are sorting on an SSD or HDD? |
I see, I'll try sorting them all together. Regarding hardware: GPU: GeForce GTX 1080 Ti with 11GB of memory; processor: Intel i7-9700; 48GB of system memory; and we are sorting on an SSD. |
Also, I noticed that the final clustering step takes the longest. For a shank that took 11.5 hours to run, 13,844,472 spikes were extracted for first clustering, but 43,478,695 spikes were extracted for final clustering. Is it because too many spikes are extracted for final clustering? I'm using the defaults of 9 and 8 for Th_universal and Th_learned. |
One other thing to check: can you make note of how many spikes were detected for each shank? I just want to make sure it's not a case where you happened to sort the shanks with more spikes later in the loop, which would of course take longer. Another thing you can try is increasing the `cluster_downsampling` parameter. |
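The parameter being suggested here, `cluster_downsampling` (named explicitly in the reply below), is passed through Kilosort4's settings dictionary. A hedged sketch of how it might be raised from its default (assumed to be 20 based on current Kilosort4 docs); the actual sorting call is commented out since it needs an installed `kilosort` and real data:

```python
# Sketch (assumed API usage, not taken from this thread): a larger
# cluster_downsampling makes final clustering consider fewer spikes,
# which should shorten that step at some cost in clustering detail.
settings = {
    'n_chan_bin': 32,             # channels per shank, per this thread
    'Th_universal': 9,            # detection thresholds the poster reported
    'Th_learned': 8,
    'cluster_downsampling': 40,   # double the assumed default of 20
}

# from kilosort import run_kilosort
# ops, st, clu, *rest = run_kilosort(settings=settings, probe=probe)

print(settings['cluster_downsampling'])
```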
Sorry for the late reply! Here are the spike counts for each shank: Shank 1: 23,946,723. I'll definitely try increasing the `cluster_downsampling` parameter! Thanks! |
Thanks, still looking into this. Would it be possible for you to share the binary file and probe information for one of the shanks so that I can benchmark the memory usage in a loop? Any of the shanks with 20 million or more spikes should work. We don't have datasets with durations that long available, so it would help me debug this issue and some related ones. |
Hi! Sorry for the delay. Sure, we could share the files. May I ask how to share the binary file? The compressed file is still too big to share on GitHub. Here is the probe information:

```python
import numpy as np

# Probe layout
chanMap = np.arange(32)
kcoords = np.zeros(32)
n_chan = 32

# X-coordinates
xc_1_3 = np.ones(16) * 6.2
xc_2_4 = np.ones(16) * 6.2 + 30
xc = np.array([val for pair in zip(xc_1_3, xc_2_4) for val in pair])

# Y-coordinates
yc_2_4 = np.array([15 + 6.2 + 30 * i for i in range(16)])
yc_1_3 = np.array([6.2 + 30 * k for k in range(16)])
yc = np.array([val for pair in zip(yc_1_3, yc_2_4) for val in pair])

# Set up probe
probe = {
    'chanMap': chanMap,
    'xc': xc,
    'yc': yc,
    'kcoords': kcoords,
    'n_chan': n_chan
}
```

Thank you! |
The easiest way is to upload the data to Google Drive or Dropbox, then paste a link to it here or email me the link at ***@***.*** |
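A quick NumPy sanity check of the probe geometry shared above (this re-derives the coordinates for verification only; it is not part of the original comment). The interleaving should place the 32 sites on two columns 30 µm apart, with 30 µm vertical pitch within each column:

```python
import numpy as np

# Rebuild the coordinates exactly as in the probe snippet above.
xc_1_3 = np.ones(16) * 6.2
xc_2_4 = np.ones(16) * 6.2 + 30
xc = np.array([v for pair in zip(xc_1_3, xc_2_4) for v in pair])

yc_1_3 = np.array([6.2 + 30 * k for k in range(16)])
yc_2_4 = np.array([15 + 6.2 + 30 * i for i in range(16)])
yc = np.array([v for pair in zip(yc_1_3, yc_2_4) for v in pair])

assert xc.shape == (32,) and yc.shape == (32,)
assert np.allclose(np.unique(xc), [6.2, 36.2])      # two columns, 30 um apart
assert np.allclose(np.diff(yc[::2]), 30)            # 30 um pitch within a column
assert np.allclose(np.diff(yc[1::2]), 30)
```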
I see, sure! Here is the dropbox link: https://www.dropbox.com/scl/fi/4j65b003lqp3c5umfbhf0/shank3.zip?rlkey=lqz3fuepdbv3hkl99vswuz1ha&st=h1uaym6n&dl=0 |
Hi, I am now running Kilosort on a new set of data of similar size, and this issue seems to be solved! Each shank now takes around 2 hours, which is quite reasonable considering our data size. I am now using Kilosort 4.0.18, and have added these lines to the end of the loop:
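The exact lines aren't preserved in the archived thread. A common cleanup pattern for this situation (an assumption, not necessarily what the poster used) is to drop references to the sorting outputs and clear the CUDA cache at the end of each iteration:

```python
import gc

# Hypothetical end-of-iteration cleanup; with real Kilosort outputs this
# would be something like: del ops, st, clu, tF, Wall
results = {"st": list(range(10_000))}  # placeholder standing in for large outputs

# ... save anything you need from `results` to disk here ...

del results      # drop the Python references so the objects become unreachable
gc.collect()     # collect anything now unreachable

try:
    import torch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # hand cached GPU blocks back to the driver
except ImportError:                # torch is present wherever Kilosort runs
    pass
```

Note that `torch.cuda.empty_cache()` releases PyTorch's cached allocations back to the GPU driver but does not free tensors that are still referenced, which is why the `del` comes first.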
Thank you for your help! |
Hi!
We are using Kilosort for 32-channel recordings that are 10-15 hours long, and it's taking a really long time to process, so I am hoping to ask for some advice on this issue.
We have 16 shanks, each with 32 channels. Currently I'm using a loop to run Kilosort on each shank separately. Some shanks took 3-4 hours, but a few took 9-10 hours. I also noticed that Kilosort takes longer and longer to run as it loops. Any idea why this might be the case?
We are planning to upgrade our GPU. I read on the Kilosort Hardware Recommendation page that for longer recordings, "this situation typically requires more RAM, like 32 or 64 GB". May I check if this is referring to GPU or system memory? Also, since our current memory is sufficient to handle our data, do you think increasing memory, either in the system or GPU, would reduce runtime?
Thank you!