Scalability questions #190

Open
Helveg opened this issue Feb 21, 2022 · 2 comments

Helveg commented Feb 21, 2022

Hi there! I have some questions about the scalability of MDF. Connectomes between specific cells can usually be stored as some sort of sparse matrix, holding the identifiers of the pre- and postsynaptic cells together with the source and target locations on those cells, but this leads to scalability issues when transferring that data to the simulator:

  1. Cells are distributed across compute nodes, and most formats require you to iterate the entire dataset to filter out the data about the cells on your node. That is O(N_syn) iteration time just to extract 1/Nth of the dataset, and it already assumes you can put the filtered data into a structure with O(1) lookups. Without O(1) lookups, since you need to query the connections of every cell on your node, you are looking at O(N_syn * N_cells / N_nodes) runtimes. Synapses are by far the most numerous element of a biophysical neural network. On top of that, most classical O(1) lookup data structures, like hash tables, have large memory requirements: storing all your data in memory like that on each node limits your scale by memory; NEURON already hits memory limits on HPC for ring networks of 16k cells on 64 GB compute nodes (see page 6 of https://arxiv.org/pdf/1901.07454.pdf). Imagine having to construct networks with the whole connectome held in memory, or facing prohibitive runtimes for network construction. (A rough sketch of the per-node filtering I mean is given after this list.)
  2. Both NEURON and Arbor (and probably other simulators) need two different views of the connectivity information: which connections arrive on this cell, and, inversely, which connections emanate from it? How is this addressed, given that a layout optimized for one lookup often hinders the other? Again, processing all the edges in a network seems suboptimal; will there be indices/solutions for both sides of the problem, restrictable to the cells on the node?
  3. Another scaling issue is that the tools that want to parse your format are often single-threaded, and the people using them are typically on desktop machines with strict memory limitations. Is MDF going to provide streaming, partial lookup of smaller chunks, or data structures optimized for such lookups? (See the chunked-read sketch after this list.)
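
For concreteness, here is a rough sketch of the filter-and-index pattern I have in mind for points 1 and 2. The file name, the (pre_gid, post_gid, pre_loc, post_loc) edge layout, and the round-robin cell distribution are all hypothetical (nothing MDF defines today); the point is just that each rank does a single O(N_syn) pass and afterwards has O(1) lookups in both directions for its own cells:

```python
from collections import defaultdict

import numpy as np

# Hypothetical flat edge list: one row per synapse,
# columns = (pre_gid, post_gid, pre_loc, post_loc).
edges = np.load("connectome_edges.npy")

# Cells owned by this rank (simple round-robin distribution as an example).
rank, n_ranks, n_cells = 0, 64, 1_000_000
local_gids = set(range(rank, n_cells, n_ranks))  # gives O(1) membership tests

# Single O(N_syn) pass over the global edge list; without the O(1) set
# lookup this degrades towards O(N_syn * N_cells / N_nodes) as argued above.
incoming = defaultdict(list)  # post_gid -> connections arriving on this cell
outgoing = defaultdict(list)  # pre_gid  -> connections leaving this cell
for pre, post, pre_loc, post_loc in edges:
    if post in local_gids:
        incoming[post].append((pre, pre_loc, post_loc))
    if pre in local_gids:
        outgoing[pre].append((post, pre_loc, post_loc))
```

Even then, every rank still pays for the full scan and keeps 1/Nth of the connectome in RAM, which is exactly why indices restrictable to the local cells at the format level would help.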
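
For point 3, a chunked on-disk layout would already go a long way. Here is a minimal sketch, assuming an HDF5 container with a hypothetical /edges dataset (again, not something MDF specifies), in which a single-threaded desktop tool never holds more than one slice in memory:

```python
import h5py

def iter_edge_chunks(path, chunk_rows=1_000_000):
    """Yield the edge table in bounded-memory slices."""
    with h5py.File(path, "r") as f:
        edges = f["/edges"]  # hypothetical (N_syn, 4) dataset
        for start in range(0, edges.shape[0], chunk_rows):
            yield edges[start:start + chunk_rows]  # one slice read from disk

# Example: count the synapses targeting one cell without loading the file.
target_gid = 42
n_incoming = sum(int((chunk[:, 1] == target_gid).sum())
                 for chunk in iter_edge_chunks("connectome.h5"))
```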

Helveg commented Feb 21, 2022

Additionally: can we expect a publication on MDF that, among other things, properly investigates the scaling and runtime complexity of code that has to read, write, and otherwise deal with it?

pgleeson (Member) commented

@Helveg As mentioned in #191, the types of network you are referring to here are more in the domain of NeuroML. Eventually there will be full compatibility "under the hood" between MDF and NML, but for now the issue of standardising formats for large scale spiking models is more relevant for NML, and getting NeuroMLlite working well with Arbor, Neuron, Nest, etc. is a more near term goal. Hope that helps.

@jdcpni added the question (Further information is requested) and NeuroML labels on Apr 19, 2023