Issue 4039 large dcd #4048

Merged
merged 7 commits on Mar 29, 2023
Changes from 3 commits
8 changes: 8 additions & 0 deletions package/CHANGELOG

@@ -46,6 +46,14 @@ Changes
 Deprecations


+??/??/?? richardjgowers
+
+* 2.4.3
+
+Fixes
+* Fixed DCD reading for large (>2Gb) files (Issue #4039). This was broken
+  for versions 2.4.0, 2.4.1 and 2.4.2
+
 01/04/23 IAlibay

 * 2.4.2
6 changes: 3 additions & 3 deletions package/MDAnalysis/lib/formats/libdcd.pxd

@@ -115,11 +115,11 @@ cdef class DCDFile:
     # The current DCD frame
     cdef int current_frame
     # size of the first DCD frame
-    cdef readonly int _firstframesize
+    cdef readonly fio_size_t _firstframesize
     # Size of a DCD frame
-    cdef readonly int _framesize
+    cdef readonly fio_size_t _framesize
     # Size of the DCD header
-    cdef readonly int _header_size
+    cdef readonly fio_size_t _header_size
     # Is the file open?
     cdef int is_open
     # Have we reached the end of the file
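Why the promotion from `int` to `fio_size_t` matters: a 32-bit signed int can only hold byte offsets up to 2**31 - 1 (about 2 GiB), so any frame that starts beyond that point ends up with a wrapped, negative offset. Below is a minimal Python sketch of that failure mode, using `ctypes` to stand in for the C-level 32-bit arithmetic; the header and frame sizes are made-up numbers, not values taken from this PR.

```python
# Illustration only: why a 32-bit signed int cannot hold a DCD byte offset
# past ~2 GiB. The sizes below are hypothetical, not taken from this PR.
import ctypes

header_size = 276          # hypothetical header size in bytes
firstframesize = 40_364    # hypothetical size of the first frame in bytes
framesize = 40_364         # hypothetical size of subsequent frames in bytes
frame = 60_000             # a frame index far enough in to pass 2 GiB

offset = header_size + firstframesize + framesize * (frame - 1)
print(offset > 2**31 - 1)            # True: larger than INT_MAX

# Squeezing the same value into a 32-bit signed int silently wraps around,
# which is what a seek through a too-narrow offset type ends up doing:
print(ctypes.c_int32(offset).value)  # a negative number, i.e. a bogus offset
```

The PR's fix is to declare the cached sizes with the same `fio_size_t` type that the fio layer uses for file offsets, so the seek arithmetic is carried out in a type wide enough for large files.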
4 changes: 3 additions & 1 deletion package/MDAnalysis/lib/formats/libdcd.pyx

@@ -391,7 +391,9 @@ cdef class DCDFile:
         if frame == 0:
             offset = self._header_size
         else:
-            offset = self._header_size + self._firstframesize + self._framesize * (frame - 1)
+            offset = self._header_size
+            offset += self._firstframesize
+            offset += self._framesize * (frame - 1)

         cdef int ok = fio_fseek(self.fp, offset, _whence_vals['FIO_SEEK_SET'])
         if ok != 0:

Review thread on the new offset calculation:

Member: Does this ensure that the overflow cannot happen? By the way, `frame` was declared as `int` in the method's signature. Should that be changed too, or is that a Python int with arbitrary size?

Member Author: I've reproduced the exact bug (with the contentious test) and this fixes it. I've not looked at the raw C and followed all the types, but by eye, promoting some variables to the correct data type seemed to jiggle it into place.

Member: @orbeckst the `def(blah: int)` syntax in Cython allows it to switch between a C int and a PyInt depending on how much it knows about types. I think @richardjgowers' approach of changing the size of the directly declared C types is the correct one.
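A side note on the overflow question above: at the Python level the sum itself cannot overflow, since Python ints are arbitrary precision; what matters is the width of the C variables the result is stored in and handed to `fio_fseek`. Purely as an illustration, and not what this PR implements, here is a plain-Python version of the same offset logic with an explicit bounds check against the file size; the helper names are made up for this sketch.

```python
# Hypothetical helpers, for illustration only; not MDAnalysis API.
import os

def dcd_frame_offset(header_size, firstframesize, framesize, frame):
    """Byte offset of `frame`, mirroring the offset logic in the hunk above."""
    if frame == 0:
        return header_size
    return header_size + firstframesize + framesize * (frame - 1)

def checked_offset(path, header_size, firstframesize, framesize, frame):
    # Python ints never overflow, so the arithmetic above is exact; the check
    # below just guards against seeking past the end of the file.
    offset = dcd_frame_offset(header_size, firstframesize, framesize, frame)
    if not 0 <= offset < os.path.getsize(path):
        raise EOFError(f"frame {frame} would start at byte {offset}, "
                       f"beyond the end of {path!r}")
    return offset
```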
37 changes: 37 additions & 0 deletions testsuite/MDAnalysisTests/coordinates/test_dcd.py

@@ -20,6 +20,7 @@
 # MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
 # J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
 #
+import os
 from pathlib import Path
 import numpy as np

@@ -436,3 +437,39 @@ def test_pathlib():
     # we really only care that pathlib
     # object handling worked
     assert u.atoms.n_atoms == 3341
+
+
+@pytest.fixture
+def large_dcdfile(tmpdir):

Member (on the `@pytest.fixture` decorator): Make this a module-level fixture so that it really only runs once? Unfortunately it will then need to use the tmpdir factory.

+    # creates a >2Gb DCD file
+    fsize = 3.8  # mb
+    nreps_reqs = int(2100 // fsize)  # times to duplicate traj to hit 2.1Gb

Member (on the `fsize = 3.8` line): To be super-flexible, get the size from DCD itself. Totally optional.

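Picking up the two suggestions above, here is a rough sketch of what a module-scoped variant could look like, building the file once per module via pytest's `tmpdir_factory` and reading the reference file size from disk instead of hard-coding 3.8 MB. This is only a sketch of the reviewers' ideas, not the code the PR merges.

```python
# Sketch only: a module-scoped variant of the fixture, as suggested in review.
import os

import pytest
import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF, DCD


@pytest.fixture(scope="module")
def large_dcdfile(tmpdir_factory):
    # build the >2 GB file once per module instead of once per test
    fsize_mb = os.path.getsize(DCD) / 2**20   # size of the stock DCD test file
    nreps_req = int(2100 // fsize_mb)         # repeats needed to reach ~2.1 GB

    newf = str(tmpdir_factory.mktemp("large_dcd") / "jabba.dcd")

    u = mda.Universe(PSF, DCD)
    with mda.Writer(newf, n_atoms=len(u.atoms)) as w:
        for _ in range(nreps_req):
            for ts in u.trajectory:
                w.write(u.atoms)

    yield newf, nreps_req
```

Module scope means the roughly 2 GB file is written once and shared by every test in the module that requests the fixture.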

+    newf = str(tmpdir / 'jabba.dcd')
+
+    u = mda.Universe(PSF, DCD)
+
+    print(nreps_reqs)
+
+    with mda.Writer(newf, n_atoms=len(u.atoms)) as w:
+        for _ in range(nreps_reqs):
+            for ts in u.trajectory:
+                w.write(u.atoms)
+
+    yield newf, nreps_reqs
+
+
+@pytest.mark.skipif(not os.environ.get('LARGEDCD', False),
+                    reason="Skipping large file test")

Member (on the `skipif` decorator): That's kinda confusing logic, and it looks undocumented; are we really expecting to use it?

Member Author: I can just remove the test if you like; it was handy while I was fixing the bug.

IAlibay (Member, Mar 1, 2023): The test would be good to keep, I just don't really know why you'd need a skipif that isn't really documented. Did we not already have a high-memory flag from the EDR tests? Can we just use that instead?

Member: We should run the test in at least one runner every time. And, as I said in the original issue, eventually every reader should be tested with a large trajectory so that we have a better chance of catching these kinds of issues.

+def test_large_dcdfile(large_dcdfile):
+    DCD_large, nreps = large_dcdfile
+
+    u_small = mda.Universe(PSF, DCD)
+    u = mda.Universe(PSF, DCD_large)
+
+    assert len(u.trajectory) == len(u_small.trajectory) * nreps
+
+    u_small.trajectory[-1]
+    u.trajectory[-1]
+
+    assert_array_almost_equal(u.atoms.positions, u_small.atoms.positions)
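On the `skipif` discussion above: one way to keep the test opt-in while making the gate self-documenting is to normalise the environment variable into a named, reusable marker. A sketch only; `LARGEDCD` comes from the diff above, while the marker name and the truthy-value handling are illustrative rather than existing MDAnalysisTests infrastructure.

```python
# Sketch: a documented, reusable opt-in marker for large-file tests.
import os

import pytest

# Building the >2 GB DCD takes time and disk space, so the test stays opt-in;
# setting LARGEDCD=1 on at least one CI runner keeps the Issue #4039
# regression covered somewhere.
RUN_LARGE_DCD = os.environ.get("LARGEDCD", "").lower() in {"1", "true", "yes"}

requires_large_dcd = pytest.mark.skipif(
    not RUN_LARGE_DCD,
    reason="large (>2 GB) DCD test skipped; set LARGEDCD=1 to enable it",
)

# usage: decorate the test with @requires_large_dcd instead of an inline skipif
```

The same marker could then be reused by other large-trajectory tests as they are added, in line with the reviewer's point that eventually every reader should be exercised against a large file.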