That's a good point! I was mostly concerned about the potential for build problems (mostly as cpdb is my first time working with Cython). I'll make a PR tonight and push a dev release so we can collect some feedback.
One difference in the comparison is that your Cython implementation only reads ATOM, HETATM, and ENDMDL lines, while BioPandas reads all of them. It would be interesting to compare the performance when all lines are read (perhaps without parsing them the way BioPandas does?).
@Ruibin-Liu Hmm, that's a really great point. I could add a read_header arg to cpdb. In any case, I wouldn't have thought it would make a huge difference to speed; in terms of line count, PDB files are mostly coordinates.
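To put a rough number on the "mostly coordinates" claim for a given file, something like the following could be used. This is only a sketch: the helper names (`record_counts`, `coordinate_fraction`) and the toy `SAMPLE` text are made up for illustration and are not part of cpdb or BioPandas. It relies only on the fixed-column PDB convention that the record name sits in columns 1-6.

```python
from collections import Counter

# Record types that the Cython parser is described as reading in this thread.
COORDINATE_RECORDS = {"ATOM", "HETATM", "ENDMDL"}


def record_counts(pdb_text):
    """Count records by the record name stored in columns 1-6 of each line."""
    return Counter(
        line[:6].strip() for line in pdb_text.splitlines() if line.strip()
    )


def coordinate_fraction(pdb_text):
    """Return the share of non-blank lines that are coordinate records."""
    counts = record_counts(pdb_text)
    total = sum(counts.values())
    coord = sum(n for name, n in counts.items() if name in COORDINATE_RECORDS)
    return coord / total if total else 0.0


# Toy input for illustration only; real files have far more ATOM lines.
SAMPLE = """\
HEADER    EXAMPLE
TITLE     TOY STRUCTURE
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
HETATM    3  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O
END
"""

if __name__ == "__main__":
    print(record_counts(SAMPLE))
    print(coordinate_fraction(SAMPLE))
```

Running it over a handful of real structures would show how much work a header-reading option would actually add per file.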
Describe the workflow you want to enable
Currently, the pure-Python implementation of PDB parsing in BioPandas is quite slow, certainly too slow for high-throughput structural bioinformatics or ML.
Describe your proposed solution
I have written a Cython-based implementation (CPDB) which is considerably faster, and I would like to set it as the default parsing backend. As it stands, I believe this to be one of the fastest (if not the fastest) PDB parsers available for Python.
Performance comparison
However, given BioPandas' widespread usage, I am unsure whether distributing it with a Cython component will lead to dependency problems for users.
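One way to limit that risk would be to treat the compiled parser as optional and fall back to the existing pure-Python path when it is not importable. The sketch below only illustrates the selection step; `choose_backend` is a hypothetical helper, and the module name `cpdb` and the backend label `"python"` are assumptions, not existing BioPandas options.

```python
import importlib.util


def choose_backend(preferred="cpdb", fallback="python"):
    """Return the preferred backend name if its module can be found.

    `preferred` is assumed to be a top-level module name; if it is not
    installed (for example, because no compiled wheel exists for the
    user's platform), the fallback label is returned instead.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


if __name__ == "__main__":
    print(choose_backend())
```

With this pattern, users without a working compiled build would keep the current behaviour, and only the speed-up would be lost.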
Describe alternatives you've considered, if relevant
Speeding up the passage of time
Additional context