Skip to content

Conversation

@tylerjereddy
Copy link
Member

Related to #929 -- I'm working in a separate branch descending from the original but rebased against develop. We can deal with merging stuff etc. later.

The changes introduced so far finally allow the DCD unit tests to run again on my machine, although there is a new failure for one of them for some reason. Trying to figure this out & slowly chip away at writing DCDs. Focusing on py2 at the moment.

@tylerjereddy
Copy link
Member Author

tylerjereddy commented Jan 11, 2017

  • I've added very early draft code for writing the DCD header data & an extremely crude test that simply calls the method without generating an exception at the moment -- this doesn't mean it is working properly though -- it simply compiles & doesn't complain
  • at the moment this is just calling _write_header() from an already-open DCDFile object -- presumably a write method will have to be written to actually flush to a file by calling _write_header() (and eventually other stuff as well)?

class DCDWriteHeaderTest(TestCase):

def setUp(self):
self.dcdfile = DCDFile(DCD)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe open a new file here instead of writing to the test file.

Copy link
Member

@kain88-de kain88-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as a first draft. We need some error checks though to not accidentally overwrite files.


@run_in_tempdir()
def test_write_header(self):
self.dcdfile._write_header()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should raise an error since the dcd isn't opened in write mode

self.current_frame = frame

def _write_header(self):
if not self.is_open:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also check that the dcd is opened in write mode.

@kain88-de
Copy link
Member

@tylerjereddy thanks for working on this. I'm currently more focused on getting 0.16.0 out and preparing things for GSoC

@tylerjereddy
Copy link
Member Author

Yeah, the idea is that I'm probably going to do most stuff incorrectly on the first try here, but I can at least catalyze the process / discussion/ get feedback & edge us closer to use of MDA with python 3.

@tylerjereddy
Copy link
Member Author

  • added code to prevent file overwrites & ensure that only w mode files are used by DCD header writing code
  • using the _write_header() method from a DCDFile object initialized from a blank file in w mode produces a header that looks like this in the binary file: TCORDT?REMARKS Created 11 January, 2017 at 16:19??UY?P?????
  • Up next -- suggestions for getting actual DCD file header data piped into the _write_header() method for testing? Will that method have to accept the data fields as arguments, or should the data be set to various attributes within DCDFile that are then accessed by _write_header()?

assert_equal(self.dcdfile.n_atoms, 3341)

@raises(IOError)
@raises(RuntimeError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've think in the xdr wrapper I changed everything to raise an IO error. I think that would be a good idea here to.

from . import memory
from . import MMTF
from . import null

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add these imports back in, more tests should start passing (or failing with better messages)

@tylerjereddy
Copy link
Member Author

tylerjereddy commented Jan 14, 2017

  • rebased to gain py3 compatibility improvements from @jbarnoud
  • replaced all usage of RuntimeError with IOError in the DCD code / test suite as requested
  • restored the original DCD import checks used by the package for py2/3 compatibility, now that the upstream py3 changed are available -- this was the suggestion from @richardjgowers
    • although I've done this, it seems that this code might be trying to import the old DCD machinery (?) in python 2 (which is part of why I removed it in the first place) -- so it produces ImportError: cannot import name _dcdmodule
      • maybe suggest a fix for this if you really want me to keep those imports in that form -- eventually the try/except block has to go though right? since py2/3 DCD handling should match (the entire point of the Cythonization process here)
  • block_import now works with python 3 (this can't be achieved with the six library because mock was never in the python 2 stdlib)
  • there's a single failure in the python 3.5 unit tests for libdcd: FAIL: test_read_unit_cell -- this appears to be a regression introduced after rebasing (an older rebase..) since it is related to reading DCDs. That wouldn't be surprising since the unit tests in this branch don't run automatically on develop changes.
    • can probably track the issue down fairly swiftly with git bisect

Will be good to have things in a state where python 2 & 3 produce matching results for nosetests test_libdcd.py, which I suspect will be the case once the above DCD import block thing is remedied.

@jbarnoud
Copy link
Contributor

block_import now works with python 3 (this can't be achieved with the six library because mock was never in the python 2 stdlib)

Are there tests for block_import? I thought I fixed its python 3 compatibility because I did not break anything by touching it.

@kain88-de
Copy link
Member

although I've done this, it seems that this code might be trying to import the old DCD machinery (?) in python 2 (which is part of why I removed it in the first place) -- so it produces ImportError: cannot import name _dcdmodule

Yes it does still use the old dcd module for the DCDReader. This is because the new reader is currently a stand alone module. This style has advantages for the development. Rebasing against develop is easy since there can be almost no merge-conflicts. When I test the new libdcd functions it's easy to compare against the current DCDReader using the same python2 session. I've done that because developing libdcd is taking much longer then actually hooking it up to the reader in mdanalysis later, that should then be simple.

there's a single failure in the python 3.5 unit tests for libdcd: FAIL: test_read_unit_cell -- this appears to be a regression introduced after rebasing (an older rebase..) since it is related to reading DCDs. That wouldn't be surprising since the unit tests in this branch don't run automatically on develop changes.

I'm surprised by that. You shouldn't have had any merge conflicts since this branch contains almost exclusively new files.

Will be good to have things in a state where python 2 & 3 produce matching results for nosetests test_libdcd.py, which I suspect will be the case once the above DCD import block thing is remedied.

Yes that should be the first goal before we replace _dcdmodule in this branch.

@tylerjereddy
Copy link
Member Author

Are there tests for block_import? I thought I fixed its python 3 compatibility because I did not break anything by touching it.

@jbarnoud Are you saying you previously fixed the way that the mock module was imported for block_import to work with both python 2 & 3?

Bisecting in python 3.5 is proving harder than I thought... other confounding issues keep cropping up with the setup / build process.

@jbarnoud
Copy link
Contributor

jbarnoud commented Jan 16, 2017 via email

@richardjgowers
Copy link
Member

There's no tests for block_import, maybe we should have a few tests for our test fixtures?

Anyway, I ran the following in a python 3 environment and it worked (in that it blocked the import) so I think block_import has been fixed for py3

from MDAnalysisTests import block_import


@block_import('numpy')
def try_this():
    import numpy as np

    return np.zeros(3)

WRT mock, we're still installing this even in the py3 install, which isn't a real problem as I think it's identical, but it does mean that we can do either unittest.mock or mock imports.

@tylerjereddy I didn't realise that the DCDFile wasn't coupled to the DCDReader yet, so you can not import it if it makes things easier.

@kain88-de
Copy link
Member

I activated the _dcdmodule build again. This should help with travis

@tylerjereddy
Copy link
Member Author

With respect to the mock behaviour, I mean for testing with python 3 on my machine not travis (i.e., I'm assuming that the pip installed "mock" is extraneous in a python 3 context and the standard library version from unittest should be used). We can cut those changes out before any final merge anyway, if you want to continue using the third-party mock in both pythons for CI purposes.

@tylerjereddy
Copy link
Member Author

Also, there's no regression -- the git log shows that I noted a single test failure in October of last year, so that is just being carried over rather than being introduced by some other change. So, just need to fix it rather than bisect / hunt.

@tylerjereddy
Copy link
Member Author

Also, likely thanks to the changes from @kain88-de, I'm now seeing the exact same unit test results for nosetests test_libdcd.py in both pythons, which is helpful as they both now have the single same failing test for me to patch.

@tylerjereddy
Copy link
Member Author

How confident are we that readdcd.h can read unit cell information properly without a topology (PSF) file? I've been debugging the C code with the DCD test file that MDAnalysis uses and if I iterate through the elements of unitcell in read_dcdstep with:

596   printf("Starting to print unit cell in C\n");                                                   
597   int i;                                                                                          
598   for (i=0;i < (sizeof (unitcell));i++) {                                                         
599               printf("unitcell element from C code: %lf\n",unitcell[i]);                          
600   }                                                                                               
601   printf("Finished printing unit cell in C\n");                                                   
602   return DCD_SUCCESS;                                                                             
603 }   

I get:

Starting to print unit cell in C
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000056
unitcell element from C code: -0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
Finished printing unit cell in C

(two extra elements -- so I'm probably over-iterating in C for whatever reason?)

And the Cython-captured result looks like this:

[  0.00000000e+00   1.08420217e-19   5.57586063e-05  -2.86145146e-42
   2.89659934e-31   1.40129846e-45]

There appears to be some stochastic component to the result having values that hover around 0 / floating point stuff going on? But overall, it looks like the values are basically just zeroes and that never changes. The unit test expected result from test_read_unit_cell is obtained by using the old MDA DCD reading code, but on a universe that is created using both a PSF & a DCD, so I just want to make sure it is reasonable to try to extract the unit cell information directly from the new DCDFile object, without any topology input from i.e., a PSF.

@kain88-de
Copy link
Member

Look into the old python code. There was some unit-cell handling in there as well. I wanted to keep it at the python level since unit-cell handling seems to be different for some dcd formats.

@orbeckst
Copy link
Member

DCD unit cells are different in NAMD DCD vs newer CHARMM DCD. I patched the CHARMM DCD unit cell recognition into the Python code but ideally that gets handled at the C/Cython level. See #187 .

@tylerjereddy
Copy link
Member Author

  • all tests in test_libdcd.py now pass in python 2.7 & 3.5
  • unit cell handling behaviour has been drafted in to allow for the above, mostly by transplanting what Oli has done to various locations in the new workflow; hacking some of the stuff directly into the C header file might be considered a bit ugly, but it works well enough so far (and it is easier for me since I can use the C stuff Oli wrote directly)
  • if there are additional DCD files / formats in the test suite, please do direct me to those as we currently just have 1 unit test for the unit cell handling behaviour with the new DCD handling
  • on a more technical note, the switch to read_dcdsubset from read_dcdstep probably wasn't needed, so we could reverse that if not desirable at some stage, but at least the simple unit cell test case is now handled

@tylerjereddy
Copy link
Member Author

  • added an early draft of the DCDfile write method -- currently just checks that file object is opened in the correct mode
  • added an associated unit test for the the general write() method file mode
  • this introduces a bit of duplication in the testing framework -- we can easily abstract that later though; I'll focus for now on getting the actual writing to work -- any suggestions on how to 'feed in' coordinate data and so on for the writing / testing of writing? It seems that assigning to attributes of the DCDFile is often a bit tricky / not really intended.

alpha = unitcell[4];
beta = unitcell[3];
gamma = unitcell[1];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my original idea was that we leave out these things out of this library. If we have this in here it will make it difficult to write DCD files for CHARM or NAMD 2.5 separately. This is because that code standardizes the unitcell we receive from a DCD file so in writing the same standard should be accepted. Then when writing a DCD we need to add a flag what unit-cell format should be written (complicating this code) and adopting to changes in the unitcell handling will mean we have to change this file and add more options. This goes against my original goal of having this wrapper as simple as possible and dealing with program specific DCD file logic in the python mda layer.

But I'm not to familiar with the DCD file landscape or CHARM/NAMD if NAMD=2.5 isn't used anymore and other programs using DCD use the same standard for unitcell's I'm OK leaving this in the cython layer. We then have to document that NAMD 2.5 is supported for only for reading.

If we decide to leave this in libdcd I would prefer the code to be in the pyx file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could simply read out the 6 numbers and then have the reader figure it out in the beginning and then use one conversion function in the layer that converts to/from dimensions. This would make the C/pyx code fairly clean: we really just read what's there.

I don't know how to autodetect without actually reading the first frame, though.

I also don't know if there's a significant speed penalty for moving the asin() etc out of the C layer into Python – probably not, compared to reading the rest, but one potential consideration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code shouldn't be here and it doesn't have any affect either. read_dcdsubstep doesn't update the unitcell array at all. Only read_dcdstep does.

@kain88-de
Copy link
Member

any suggestions on how to 'feed in' coordinate data and so on for the writing / testing of writing?

See libmdaxdr the required arguments should be added to the write function

@kain88-de
Copy link
Member

@tylerjereddy box dimension handling still seems to be broken. From our original DCD tests the dimension checks are failing.

kain88-de added 2 commits May 29, 2017 21:06
Unitcells are not handled uniformly. We take a [A, B, C, alpha, beta, gamma]
and return the same as well.
@kain88-de
Copy link
Member

@tylerjereddy I finally fixed all unit cell handling. It is now consistent between read and write. I also removed all special handling from the C files. Now it's just hooking the new libdcd up to our readers / writers and some clean up.

@kain88-de
Copy link
Member

OK I did the hookup of the reader and writer. Our normal unit tests pass.

@tylerjereddy
Copy link
Member Author

@kain88-de Ok, thanks. Presumably the old / new tests can be combined in a sensible way if you've not done that already.

I might try to get the one property-based test I added to pass as well, since that presumably doesn't pass yet (it will have to check for a truncated return string, etc.). We'll have to see how long that takes, but my experience is that these are very fast too.

@kain88-de
Copy link
Member

@kain88-de Ok, thanks. Presumably the old / new tests can be combined in a sensible way if you've not done that already.

I first only checked our old tests. The new style tests of trajectory readers I did in the morning.

I might try to get the one property-based test I added to pass as well, since that presumably doesn't pass yet (it will have to check for a truncated return string, etc.). We'll have to see how long that takes, but my experience is that these are very fast too.

Before you start. Please write down the encoding and symbols that we should support in the Remarks. Right now you restricted the code to printable ascii. As I see in hypothesis it will try though to generate a large portion of the unicode including non printable control characters. Personally I would stick with ASCII seeing how old DCD files are. I'm not super set on control characters. I think some of our own files actually have them and we currently just ignore them but that needs to be checked.

@tylerjereddy
Copy link
Member Author

@kain88-de Currently, when I try to build off your latest commit I get:

building 'coordinates._dcdmodule' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/treddy/miniconda2/envs/mda_dev_py2/lib/python2.7/site-packages/numpy/core/include -IMDAnalysis/coordinates/include -I/home/treddy/miniconda2/envs/mda_dev_py2/include/python2.7 -c MDAnalysis/coordinates/src/dcd.c -o build/temp.linux-x86_64-2.7/MDAnalysis/coordinates/src/dcd.o
gcc: error: MDAnalysis/coordinates/src/dcd.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 1

Presumably setup.py and possibly other things need to be updated to reflect the file removal.

@tylerjereddy
Copy link
Member Author

Ok, patched setup.py such that build succeeds in py2 & 3 on my machine.

@tylerjereddy
Copy link
Member Author

  • I've restored / adjusted test_written_remarks_property() and the hypothesis test now passes for printable ASCII strings in the 0 to 500 length range, with appropriate truncation; it seems that the empty '' is now passing too
  • there is a substantial performance cost associated with increasing the tested string length up to 500 -- in both python 2 & 3 we're looking at 8.5 seconds vs 0.7 seconds for the 115 (including hypothesis) tests vs. 112 (excluding hypothesis) tests; there is perhaps a middle ground where we cap at say 300 and put one explicit @example() case that's really big to see if we can improve performance that way--I suspect there will always be a tradeoff between testing thoroughness and time usage; further inflating the max string length to 900 increased the test suite time to around 10 seconds

@tylerjereddy
Copy link
Member Author

I'll aim to rebase against develop in a few hours unless someone has an objection to that--I think it makes sense to have the CI running automatically again now that we're getting closer?

@richardjgowers
Copy link
Member

@tylerjereddy yep. I can find some time this weekend to give this a review with some fresh eyes.

@richardjgowers richardjgowers self-assigned this May 31, 2017
@kain88-de
Copy link
Member

@richardjgowers I'm not quite done finalizing the API of libdcd. I essentially would like to move all unitcell post processing into MDAnalysis and have libdcd read the vanilla data. Also writing the header will be made explicit.

@kain88-de
Copy link
Member

@tylerjereddy can you upload your rebase?

@tylerjereddy
Copy link
Member Author

@kain88-de I'm actually having some strange issues with the rebase command:
git rebase origin/develop

...
Applying: current_frame variable is now updated in write method of DCDFile.
Applying: Removed spurious header writing; fixed an issue in write() method of DCDFile such that single-frame NAMD DCD trajectory writing exactly reproduces the coordinates from the input file.
Applying: Patched issue with frame iteration in test_coord_match -- all unit tests now pass.
fatal: empty ident name (for <tyler.je.reddy@gmail.com>) not allowed

This happens on two computers where I've already set my git config for user name and email. I suspect this relates to me working on a few new / different machines after my relocation, with various credential settings. Maybe another core dev can see if they reproduce this issue when rebasing against the latest develop on their machine?

@kain88-de
Copy link
Member

@tylerjereddy yeah your wrong identity setting is a problem for the rebase. But there is another problem as well. A month ago or so you merged develop into this branch. This makes rebasing hard. I can try to remove it but that has to wait until later today.

@kain88-de kain88-de mentioned this pull request Jun 1, 2017
4 tasks
@kain88-de
Copy link
Member

rebase done in #1372 I'll leave this branch here in case I made a mistake. I don't think so though.

@kain88-de kain88-de closed this Jun 1, 2017
@kain88-de kain88-de deleted the dcd-cython-tyler branch January 24, 2018 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants