Issue 4039 large dcd #4048
skipped by pytest by default unless LARGEDCD env var set
use fio_size_t for all variables related to filesizes
fixes for #4039
yup this works re: bugfix, let's just make sure we squash merge
Do we actually care about that skipif so much?
    yield newf, nreps_reqs


@pytest.mark.skipif(not os.environ.get('LARGEDCD', False),
That's kinda confusing logic, and looks undocumented, are we really expecting to use it?
I can just remove the test if you like, it was handy while I was fixing the bug
test would be good to keep, I just don't really know why you'd need a skipif that isn't really documented. Did we not already have a high memory flag from the EDR tests? Can we just use that instead?
We should run the test in at least one runner every time.
And as I said in the original issue, eventually every reader should be tested with a large trajectory so that we have a better chance of catching these kinds of issues.
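One possible reading of the "confusing logic" concern: `os.environ.get('LARGEDCD', False)` returns the raw string, so any non-empty value, including `LARGEDCD=0`, enables the test. Below is a minimal sketch of a more explicit gate. The helper name `env_flag` and the accepted spellings are assumptions made for illustration; neither is part of the PR or of MDAnalysis.

```python
import os

# Hypothetical helper (not part of MDAnalysis): interpret an environment
# variable as an explicit on/off switch instead of relying on truthiness.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, environ=None):
    """Return True only if the variable is set to a recognised 'on' value."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() in _TRUE_VALUES


# The pattern from the diff treats every non-empty string as "enabled":
truthy_gate = bool({"LARGEDCD": "0"}.get("LARGEDCD", False))
# The explicit parse only accepts the listed spellings:
explicit_gate = env_flag("LARGEDCD", {"LARGEDCD": "0"})
print(truthy_gate, explicit_gate)
```

A helper like this could be passed to the marker as `pytest.mark.skipif(not env_flag("LARGEDCD"), reason=...)`, with the variable documented next to the test.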
FYI lint failure is optional, I might make it print comments instead when I get time to play with the GH API again
Codecov Report
Patch coverage:
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4048   +/-   ##
========================================
  Coverage    93.57%   93.57%
========================================
  Files          192      192
  Lines        25133    25135    +2
  Branches      4056     4056
========================================
+ Hits         23517    23521    +4
+ Misses        1095     1094    -1
+ Partials       521      520    -1
... and 1 file with indirect coverage changes
Minor comments
@@ -436,3 +437,38 @@ def test_pathlib():
    # we really only care that pathlib
    # object handling worked
    assert u.atoms.n_atoms == 3341


@pytest.fixture
Make a module level fixture so that it really only runs once? Unfortunately will need to use the tmpdir factory
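A rough sketch of what a module-scoped version could look like. It uses pytest's session-scoped `tmp_path_factory` (the `pathlib` counterpart of the `tmpdir_factory` mentioned above). The helper `write_repeated`, the fixture name `large_file`, and the tiny placeholder payload are assumptions for illustration; the real fixture would write DCD frames instead of raw bytes.

```python
from pathlib import Path

import pytest


def write_repeated(dest: Path, chunk: bytes, nreps: int) -> Path:
    """Write `chunk` to `dest` nreps times (stand-in for the real DCD writer)."""
    with open(dest, "wb") as fh:
        for _ in range(nreps):
            fh.write(chunk)
    return dest


@pytest.fixture(scope="module")
def large_file(tmp_path_factory):
    # tmp_path_factory is session-scoped, so a module-scoped fixture may
    # request it; the function-scoped tmpdir/tmp_path fixtures cannot be used.
    path = tmp_path_factory.mktemp("large") / "large.bin"
    write_repeated(path, b"\0" * 1024, 4)  # tiny placeholder, not 2 GB
    yield path
    # Delete the file once the module's tests are done to free disk space.
    path.unlink()
```

With `scope="module"` the file is built once per test module rather than once per test, and the teardown after `yield` removes it as soon as the module finishes.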
@@ -391,7 +391,9 @@ cdef class DCDFile:
         if frame == 0:
             offset = self._header_size
         else:
-            offset = self._header_size + self._firstframesize + self._framesize * (frame - 1)
+            offset = self._header_size
+            offset += self._firstframesize
Does this ensure that the overflow cannot happen?
Btw, frame was declared as int in the method's signature. Should that be changed, too, or is that a Python int with infinite size?
I've repro'd the exact bug (with the contentious test) and this fixes it. I've not looked at the raw C or followed all the types, but by eye, promoting some variables to the correct datatype seemed to jiggle it into place.
@orbeckst the def(blah: int) syntax in Cython allows it to switch between int or PyInt depending on how much it knows about types. I think @richardjgowers' approach of changing the size of the directly declared C types is the correct one.
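To make the overflow question concrete, here is a small Python sketch that mimics the offset arithmetic with every intermediate wrapped to a signed 32-bit integer and compares it with Python's arbitrary-precision `int`. The header and frame sizes are invented for illustration, and the sketch assumes the original C variables were 32 bits wide; it is not the Cython code from the PR.

```python
import ctypes


def wrap_int32(value):
    """Wrap a Python int to the range of a signed 32-bit C integer."""
    return ctypes.c_int32(value).value


def seek_offset_int32(header, first_frame, frame_size, frame):
    """Offset arithmetic with all intermediates held in 32-bit integers."""
    if frame == 0:
        return wrap_int32(header)
    product = wrap_int32(frame_size * (frame - 1))
    return wrap_int32(wrap_int32(header + first_frame) + product)


def seek_offset_wide(header, first_frame, frame_size, frame):
    """Same arithmetic with Python ints, which never overflow."""
    if frame == 0:
        return header
    return header + first_frame + frame_size * (frame - 1)


# Made-up sizes, chosen only so that the target frame lies beyond 2 GiB.
HEADER, FIRST_FRAME, FRAME_SIZE = 276, 40_112, 40_000
frame = 60_000

narrow = seek_offset_int32(HEADER, FIRST_FRAME, FRAME_SIZE, frame)
wide = seek_offset_wide(HEADER, FIRST_FRAME, FRAME_SIZE, frame)
print("32-bit result:", narrow)
print("wide result:  ", wide)
```

Python-level ints never overflow, so the risk sits in the C-typed variables. Widening them (the PR uses `fio_size_t`, assumed here to be 64 bits on the relevant platforms) keeps the product in range.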
    fsize = 3.8  # mb
    nreps_reqs = int(2100 // fsize)  # times to duplicate traj to hit 2.1Gb

    newf = str(tmpdir / "jabba.dcd")
Name approved!
@pytest.fixture
def large_dcdfile(tmpdir):
    # creates a >2Gb DCD file
    fsize = 3.8  # mb
To be super-flexible, get the size from DCD itself. Totally optional
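A minimal sketch of that suggestion: read the size of the source file from disk instead of hard-coding 3.8 MB, and derive the repetition count from it. The function name `reps_to_exceed` and the 1000-byte stand-in file are hypothetical. The estimate also ignores that a trajectory header is written only once, so it should be treated as approximate.

```python
import os
import tempfile


def reps_to_exceed(src_path, target_bytes):
    """Number of copies of src_path whose combined size exceeds target_bytes."""
    size = os.path.getsize(src_path)
    if size <= 0:
        raise ValueError("source file is empty")
    # floor division + 1 guarantees nreps * size > target_bytes
    return target_bytes // size + 1


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "small.bin")
    with open(src, "wb") as fh:
        fh.write(b"\0" * 1000)  # 1000-byte stand-in for the small trajectory
    nreps = reps_to_exceed(src, 2**31)
    print(nreps, nreps * 1000 > 2**31)
```

Rounding up matters here: a floor division such as `int(2100 // fsize)` can land just below the intended total, whereas the count above always crosses the target.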


@pytest.mark.skipif(
    not os.environ.get("LARGEDCD", False), reason="Skipping large file test"
If the env bar is supposed to skip the test then better call it SKIPLARGEFILETESTS or something like that. In any case, update CI so that it runs somewhere.
Env bar = env var… sorry typing from mobile
@IAlibay if you don't want to be in charge please assign someone else, but given that this is related to releases etc. I thought you'd be the best person.
no worries, I'm happy to be in charge of merging, will make sure I don't forget to release
So one thing to be aware of here: there are only 14 GB of disk space available on GitHub runners. We'll need to be 100% sure that we clear up space, especially when dealing with pytest-xdist.
I'm not sure it's a good idea to run a large file test for every format for every run. They're slow to create for one (2Gb of I/O) and it's probably not necessary. I'm (we're) not going to have time to solve the entire issue of testing large-file I/O in this PR, but I might suggest that we take this patch and fix a popular format in a bugfix release
fair, do you want to just raise an issue with the current state of things?
I don’t suggest that this PR should solve testing big files for every format. But I think the PR should make some changes to the CI files that ensure that this test is run, either in at least one runner or at an absolute minimum in the cron. @IAlibay might have a better idea of when we should run it. But I’d want to avoid seeing such a bad regression again.
May I counter @orbeckst and ask that we don't deal with CI here? This is a good cherry pickable PR. Add CI and it's going to be a pain (changes a decent chunk between releases).
I'm happy to take on the responsibility of fixing up a CI entry for this if @richardjgowers would prefer not to open a second PR.
Also note that I approve but leave red so that I can just fixup stuff here directly so we don't need a second pre-2.4.3 PR.
I agree, I was more thinking that this is a potential class of bug we should investigate, especially with the XDR reader that I cythonised in #3892. We (I) can raise an issue and we can work from there?
Thanks @richardjgowers!
Given that @IAlibay prefers the PR in this form for easier handling and given that he also committed to getting the test to run on CI somehow, I have no further objections.
Progress?
Sorry I've been swamped lately and I mostly need a good empty half day to do the release, depending on how metrics generation & fixing darker lint goes I'll try to do so today or tomorrow.
I understand that the release requires a solid chunk of time. My (poorly worded) question was more along the lines of what needs to be done to be able to merge the PR into develop, when you said
Once it's merged then we can offer at least a working development version and you can cherry-pick from develop when you can fit it in. At least that's how I understood your comment for the process.
Yeah sorry, there's a significant element of "I don't remember fully what I need fixed to cherry pick easily" (I have the medium term memory of a goldfish lately...), so I was trying to get a bit of time to review what I needed before having to make a mess out of this. I've booked off the evening for this, so let's try to get this done now.
Fixes #4039 * Fixes DCD seeking for large (2Gb+) files.
Fixes #4039
Changes made in this Pull Request:
fio_size_t for variables related to filesize
PR Checklist