Evaluate ancpbids as a successor to bids.layout #831
Comments
ancpBIDS:
Originally posted by @erdalkaraca in #818 (comment) |
The main current limitations of
|
An informative exercise may be to use |
I initially analyzed the runtime performance of pybids: most of the time was spent initializing the SQL database through SQLAlchemy. Note that SQLAlchemy, as an object-relational mapper, is typically used in business applications, where it provides significant convenience for communicating with a database when the business domain is rather complex. For pybids, it would have been better not to use SQLAlchemy as a mediation layer, as there are only a handful of domain entities (BIDSFile, Entity, FileAssociation, etc.). That is, communicating directly with the underlying SQLite database without SQLAlchemy (using Python's DB-API, e.g. the sqlite3 module) would already give significant performance benefits. |
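A minimal sketch of the direct DB-API route described above, using only the standard-library `sqlite3` module. The table names (`files`, `tags`), the helper names (`build_index`, `query`), and the input shape are assumptions made for illustration; this is not the pybids schema or API.

```python
import sqlite3


def build_index(files):
    """Build an in-memory SQLite index.

    `files` is an iterable of (path, {entity: value}) pairs.
    This input shape is hypothetical, chosen only for the sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE files (path TEXT PRIMARY KEY);
        CREATE TABLE tags (
            path TEXT REFERENCES files(path),
            entity TEXT,
            value TEXT
        );
        CREATE INDEX idx_tags ON tags (entity, value);
        """
    )
    files = list(files)
    # Bulk inserts through executemany avoid per-object ORM overhead.
    conn.executemany("INSERT INTO files (path) VALUES (?)", [(p,) for p, _ in files])
    conn.executemany(
        "INSERT INTO tags (path, entity, value) VALUES (?, ?, ?)",
        [(p, e, v) for p, ents in files for e, v in ents.items()],
    )
    conn.commit()
    return conn


def query(conn, **entities):
    """Return sorted paths whose tags match every given entity=value pair."""
    if not entities:
        return [row[0] for row in conn.execute("SELECT path FROM files ORDER BY path")]
    clauses = " OR ".join(["(entity = ? AND value = ?)"] * len(entities))
    params = [x for pair in entities.items() for x in pair]
    sql = (
        f"SELECT path FROM tags WHERE {clauses} "
        "GROUP BY path HAVING COUNT(DISTINCT entity) = ? ORDER BY path"
    )
    return [row[0] for row in conn.execute(sql, params + [len(entities)])]
```

Whether this is actually faster than the SQLAlchemy path on a given dataset would still need to be measured; the sketch only shows that a handful of entities can be stored and queried without an ORM layer.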
I think that was the initial intention behind a "pybids lite" which mutated into ancpBIDS. There is already a very simple benchmark (as unit tests) that we could further elaborate on:
|
That's great. Question: is ancp bids indexing meta-data by default? I think it would be useful to try pybids in those unit tests with |
ancpbids scans the file system once and builds up the graph in memory. As the amount of meta-data is very low compared to the imaging data, meta-data is part of the graph as additional leaf nodes. As memory access is fast, there is no need to index meta-data. If you have a more specific test case, let me know; I am happy to try it out. |
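To make the idea above concrete, here is a rough sketch of a single-pass scan that builds an in-memory tree and attaches JSON sidecar contents directly to the file leaves. The node layout and the `scan` and `find` helpers are hypothetical and do not reflect the actual ancpbids data model or API.

```python
import json
import os


def scan(root):
    """Walk `root` once and return a nested dict tree (hypothetical layout)."""
    tree = {"name": os.path.basename(root), "folders": {}, "files": {}}
    nodes = {root: tree}  # maps each directory path to its node
    for dirpath, dirnames, filenames in os.walk(root):
        node = nodes[dirpath]
        for d in dirnames:
            child = {"name": d, "folders": {}, "files": {}}
            node["folders"][d] = child
            nodes[os.path.join(dirpath, d)] = child
        for f in filenames:
            leaf = {"name": f, "metadata": None}
            # Sidecar JSON is small, so it is loaded eagerly into the leaf.
            if f.endswith(".json"):
                with open(os.path.join(dirpath, f), encoding="utf-8") as fh:
                    leaf["metadata"] = json.load(fh)
            node["files"][f] = leaf
    return tree


def find(tree, predicate):
    """Yield every file leaf in the tree for which predicate(leaf) is true."""
    for leaf in tree["files"].values():
        if predicate(leaf):
            yield leaf
    for child in tree["folders"].values():
        yield from find(child, predicate)
```

Queries then become plain traversals over Python objects, for example `find(tree, lambda leaf: leaf["name"].endswith("_bold.json"))`, with no separate indexing step.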
Another idea is to try to run the main unit tests in pybids w/ ancp bids. Obviously many will fail simply because the APIs are not identical, but it would be useful to compare differences. |
There are several unit tests which use BIDSLayout as their querying interface, for example: I may refactor/extract some use cases into the benchmark to make it more exhaustive. |
@erdalkaraca I'm trying to reconcile your performance numbers with a previous investigation into performance: see: Basically, we concluded that most time was actually spent in cc: @gkiar do you remember where your profiling results were? Note: some of this profiling is pretty old, so it's possible pybids pre-SQL was EVEN slower. I also think it's interesting to note that @tyarkoni noted he moved to SQL for maintainability, which goes against @tsalo's experience. Not sure I have the right answer here, but just trying to pin down the difference in approaches. |
My profiling results are in #285 ; good luck! |
@erdalkaraca what's the best way to contact you? we're having a bids event soon that i'd like to invite you to participate in to discuss this further. if you want, you can reach me at the email listed on my profile. |
On our lab server (using network storage) the dataset used in the benchmark took 6.5 mins to load/index using pybids. Of the 6.5 mins I could pinpoint 50% execution time to sqlalchemy interaction, the rest mostly being file system operations like os.walk as you mentioned.
|
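One rough way to attribute run time as in the comment above is to profile the indexing call with the standard-library `cProfile` and sum the own time of functions whose file or function name contains a given string. The `time_share` helper is an illustration only, not the method used for the numbers in this thread. Because it counts own time (`tt`) and not time spent in callees, the share it reports for a package such as `sqlalchemy` should be read as a lower bound.

```python
import cProfile
import pstats


def time_share(func, needle):
    """Run func() under cProfile.

    Returns (share, total): `share` is the fraction of total own time spent
    in functions whose filename or name contains `needle`; `total` is the
    total profiled time in seconds.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    matched = sum(
        tt
        for (filename, _line, name), (_cc, _nc, tt, _ct, _callers) in stats.stats.items()
        if needle in filename or needle in name
    )
    total = stats.total_tt
    return (matched / total if total else 0.0), total
```

For example, `time_share(lambda: BIDSLayout(path), "sqlalchemy")` would estimate the share of Python-level time spent inside SQLAlchemy modules, and a needle such as `"scandir"` would pick up the built-in directory listing calls that `os.walk` relies on.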
Happily! I'm a bit swamped and don't have the reference environment anymore, so it'll have to wait a bit, but consider it added to my plate 🙂 If you attend the BIDS meeting @adelavega mentioned, we can also sync up more about this, then. |
Updated profiling... Tests
Results
Next Step
@erdalkaraca this is looking great! I'll keep poking at it this week. Will you be online? |
Thanks a lot for posting the results, Greg @gkiar Yes, I will be online for the zoom/discord call this week to talk about the implementation. |
The ancpbids project has made formidable progress in implementing a BIDSLayout-like API, with marked performance improvements, using a different underlying implementation with a custom query language.
Given limited resources to maintain key community-led BIDS infrastructure, it is important to compare pybids to ancp-bids, and evaluate the possibility of combining efforts in order to prevent a fragmentation of the ecosystem.