pybids 0.9.4 issue with slowness #521
Comments
There's definitely plenty of optimization that could be done on the indexing side of things, but the main reason for moving to a DB was actually robustness and code maintainability: adding features that involve merging/joining data was becoming a nightmare, and is now much more straightforward. Performance is secondary. For what it's worth, it's unlikely that the DB-backed version will ever be as fast as 0.7... there's considerable overhead associated with the SQLAlchemy ORM layer (in addition to the SQLite DB itself). One thing to note is that the 0.7 series didn't index metadata by default, whereas the present version does. So it's possible that a lot of the difference is coming from that. If you don't need to, e.g., search files by metadata keys, you can initialize the Note also that you can
Closing this for now, as a major refactor of the indexing code is unlikely to happen any time soon, and the tips above plus #523 should help some.
Hello!
I'm trying to use pybids to work with a large-ish dataset. I installed what seems to be the newest version (0.9.4) and found it to be really, really slow: it took ~40 minutes just to construct the layout. A coworker who uses pybids insisted it shouldn't be as bad as what I was experiencing, and after a bit of investigation we discovered that she was using a much older version (pybids 0.7.0). I installed that version, and the layout took less than two minutes to construct for the same dataset.
I decided to profile both versions and found that most of the time in the new version is spent on sqlite3-related calls, which the older version doesn't seem to use (see the attached 'layout_x.x.x.txt' files). Do you know why there would be such an extreme difference in performance between the two versions? I would imagine the sqlite3 indexing is meant to speed things up, but while layout creation is nearly instant once the database has been made, retrieving any data is still about ten times slower than in version 0.7.0 (see the two 'subject_x.x.x.txt' cProfile files I've attached for layout.get(subject='XXXXX')). I'd like to use the newest version if possible. Is there anything I can do to turn off the sqlite3 usage in the new version, or anything else I can do to speed it up?
More details, just in case it helps:
layout_0.9.0.txt
layout_0.7.0.txt
subject_0.9.0.txt
subject_0.7.0.txt
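For anyone who wants to produce comparable profiles, here is a minimal sketch of how reports like the attached ones could be generated with the standard-library cProfile and pstats modules. The helper name profile_call and the stand-in workload slow_sum are hypothetical and are not taken from the original attachments; the exact commands used to create those files are not stated in the issue.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    Hypothetical helper for illustration; the report is sorted by
    cumulative time so the most expensive call chains appear first.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(15)  # keep the top 15 entries
    return result, buffer.getvalue()


def slow_sum(n):
    # Stand-in workload (an assumption); replace with the call being measured.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 10000)
print(report)
```

With pybids installed and a layout already constructed, the same helper could presumably wrap the query from the issue, e.g. profile_call(layout.get, subject='XXXXX'). Running it under each installed version and comparing the top entries should show whether the time is going to sqlite3 and ORM calls.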