When creating synapses, we currently check whether multiple synapses exist for the same `(pre, post)` neuron pair (here in `synapses_create_array.cu` and also in `synapses_create_generator.cu`). We need this to choose the correct parallelisation mode, but the check is very time-consuming. Currently, for `10^7` synapses with `syn.connect(p=sparseness)` (Brunel scalar delay example with `10^4` neurons), our `synapses_create_generator.cu` template takes `~30s`. With the `cpp_standalone` or `genn` device (which both use almost the same template), it takes only on the order of `~1s`. Our check for multiple pre-post synapses takes `~20s`, and we seem to lose another `~10s` in my random number buffer's `operator[]` overload. Looks like I'm doing something very inefficient here?
I would say it makes sense to add a user preference for skipping the check for multiple pre-post synapses (for when the user is sure they don't exist).
Or find a more efficient way of checking this, maybe on the GPU (instead of using a map from ID pairs to integer counters, which has to loop through all existing synapses).
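One possible direction for a cheaper check, as a minimal host-side sketch: pack each `(pre, post)` pair into a single 64-bit key, sort the keys, and look for two equal neighbours. The function name `has_multiple_pre_post_pairs` and the flat index vectors are hypothetical and not taken from the existing templates; the sketch also assumes neuron indices are non-negative 32-bit integers. It only answers whether any duplicate exists, not how many synapses each pair has.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the existing templates):
// returns true if any (pre, post) pair occurs more than once.
bool has_multiple_pre_post_pairs(const std::vector<int32_t>& pre,
                                 const std::vector<int32_t>& post)
{
    // Both vectors hold one entry per synapse.
    assert(pre.size() == post.size());

    std::vector<uint64_t> keys(pre.size());
    for (std::size_t i = 0; i < pre.size(); ++i) {
        // Assumption: indices are non-negative, so each fits in 32 bits.
        // Pre index goes in the upper half, post index in the lower half.
        keys[i] = (static_cast<uint64_t>(static_cast<uint32_t>(pre[i])) << 32)
                  | static_cast<uint32_t>(post[i]);
    }

    // After sorting, identical pairs end up next to each other.
    std::sort(keys.begin(), keys.end());
    return std::adjacent_find(keys.begin(), keys.end()) != keys.end();
}
```

This replaces the per-synapse map lookups with one sort over a contiguous array. The same sort-then-compare-neighbours pattern should carry over to the GPU (for example with Thrust's `sort` and `adjacent_find`), but that has not been tried here.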
For the `operator[]` performance, it would probably make sense to just use the `cpp_standalone` random number generation implementation for host code. Alternatively, we could precompute how many random numbers are needed and drop the buffer class entirely in favour of plain pointer arithmetic. Or just find out why my implementation is inefficient, because generating the random numbers on the device and copying them to the host seems to be quite fast.
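The "precompute and use pointer arithmetic" option could look roughly like the sketch below. It assumes one uniform random number per candidate `(pre, post)` pair, which may not match what the generator template actually draws, and it uses `std::mt19937` as a host-side stand-in for numbers that would really be generated on the device and copied over. `Synapses` and `connect_with_probability` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical container for the created synapses.
struct Synapses {
    std::vector<int32_t> pre;
    std::vector<int32_t> post;
};

// Hypothetical sketch: fill one flat buffer up front, then consume it
// through a raw pointer instead of a buffer class with operator[].
Synapses connect_with_probability(int32_t n_pre, int32_t n_post,
                                  double p, unsigned seed)
{
    assert(n_pre >= 0 && n_post >= 0);

    // Assumption: exactly one random number per candidate pair.
    const std::size_t n_rand =
        static_cast<std::size_t>(n_pre) * static_cast<std::size_t>(n_post);

    // Stand-in for "generate on device, copy to host in one go".
    std::vector<double> rand_buffer(n_rand);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    for (double& r : rand_buffer) {
        r = uniform(gen);
    }

    Synapses syn;
    const double* next = rand_buffer.data();  // plain pointer, no refill checks
    for (int32_t i = 0; i < n_pre; ++i) {
        for (int32_t j = 0; j < n_post; ++j) {
            if (*next++ < p) {
                syn.pre.push_back(i);
                syn.post.push_back(j);
            }
        }
    }
    return syn;
}
```

Because the buffer size is known before the loop starts, the inner loop never has to check whether the buffer needs refilling, which is one plausible source of overhead in an `operator[]` overload. The trade-off is memory: the whole buffer has to fit on the host at once.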