[WIP] DCD Cython Changes #1155

tylerjereddy · 2017-01-10T23:15:50Z

Related to #929 -- I'm working in a separate branch descending from the original but rebased against develop. We can deal with merging stuff etc. later.

The changes introduced so far finally allow the DCD unit tests to run again on my machine, although there is a new failure for one of them for some reason. Trying to figure this out & slowly chip away at writing DCDs. Focusing on py2 at the moment.

tylerjereddy · 2017-01-11T00:56:34Z

I've added very early draft code for writing the DCD header data & an extremely crude test that simply calls the method without generating an exception at the moment -- this doesn't mean it is working properly though -- it simply compiles & doesn't complain
at the moment this is just calling _write_header() from an already-open DCDFile object -- presumably a write method will have to be written to actually flush to a file by calling _write_header() (and eventually other stuff as well)?

kain88-de · 2017-01-11T10:23:09Z

testsuite/MDAnalysisTests/formats/test_libdcd.py

+class DCDWriteHeaderTest(TestCase):
+
+    def setUp(self):
+        self.dcdfile = DCDFile(DCD)


Maybe open a new file here instead of writing to the test file.

kain88-de

Looks good as a first draft. We need some error checks though to not accidentally overwrite files.

kain88-de · 2017-01-11T10:23:32Z

testsuite/MDAnalysisTests/formats/test_libdcd.py

+
+    @run_in_tempdir()
+    def test_write_header(self):
+        self.dcdfile._write_header()


This should raise an error since the dcd isn't opened in write mode

kain88-de · 2017-01-11T10:24:10Z

package/MDAnalysis/lib/formats/libdcd.pyx

        self.current_frame = frame
+
+    def _write_header(self):
+        if not self.is_open:


This should also check that the dcd is opened in write mode.

kain88-de · 2017-01-11T10:26:02Z

@tylerjereddy thanks for working on this. I'm currently more focused on getting 0.16.0 out and preparing things for GSoC

tylerjereddy · 2017-01-11T16:46:28Z

Yeah, the idea is that I'm probably going to do most stuff incorrectly on the first try here, but I can at least catalyze the process / discussion/ get feedback & edge us closer to use of MDA with python 3.

tylerjereddy · 2017-01-11T23:27:45Z

added code to prevent file overwrites & ensure that only w mode files are used by DCD header writing code
using the _write_header() method from a DCDFile object initialized from a blank file in w mode produces a header that looks like this in the binary file: TCORDT?REMARKS Created 11 January, 2017 at 16:19??UY?P?????
Up next -- suggestions for getting actual DCD file header data piped into the _write_header() method for testing? Will that method have to accept the data fields as arguments, or should the data be set to various attributes within DCDFile that are then accessed by _write_header()?

kain88-de · 2017-01-12T08:48:33Z

testsuite/MDAnalysisTests/formats/test_libdcd.py

        assert_equal(self.dcdfile.n_atoms, 3341)

-    @raises(IOError)
+    @raises(RuntimeError)


I've think in the xdr wrapper I changed everything to raise an IO error. I think that would be a good idea here to.

richardjgowers · 2017-01-12T11:32:50Z

package/MDAnalysis/coordinates/__init__.py

 from . import memory
 from . import MMTF
 from . import null
-


If you add these imports back in, more tests should start passing (or failing with better messages)

tylerjereddy · 2017-01-14T23:34:05Z

rebased to gain py3 compatibility improvements from @jbarnoud
replaced all usage of RuntimeError with IOError in the DCD code / test suite as requested
restored the original DCD import checks used by the package for py2/3 compatibility, now that the upstream py3 changed are available -- this was the suggestion from @richardjgowers
- although I've done this, it seems that this code might be trying to import the old DCD machinery (?) in python 2 (which is part of why I removed it in the first place) -- so it produces ImportError: cannot import name _dcdmodule
  - maybe suggest a fix for this if you really want me to keep those imports in that form -- eventually the try/except block has to go though right? since py2/3 DCD handling should match (the entire point of the Cythonization process here)
block_import now works with python 3 (this can't be achieved with the six library because mock was never in the python 2 stdlib)
there's a single failure in the python 3.5 unit tests for libdcd: FAIL: test_read_unit_cell -- this appears to be a regression introduced after rebasing (an older rebase..) since it is related to reading DCDs. That wouldn't be surprising since the unit tests in this branch don't run automatically on develop changes.
- can probably track the issue down fairly swiftly with git bisect

Will be good to have things in a state where python 2 & 3 produce matching results for nosetests test_libdcd.py, which I suspect will be the case once the above DCD import block thing is remedied.

jbarnoud · 2017-01-15T09:05:25Z

block_import now works with python 3 (this can't be achieved with the six library because mock was never in the python 2 stdlib)

Are there tests for block_import? I thought I fixed its python 3 compatibility because I did not break anything by touching it.

kain88-de · 2017-01-15T09:40:08Z

although I've done this, it seems that this code might be trying to import the old DCD machinery (?) in python 2 (which is part of why I removed it in the first place) -- so it produces ImportError: cannot import name _dcdmodule

Yes it does still use the old dcd module for the DCDReader. This is because the new reader is currently a stand alone module. This style has advantages for the development. Rebasing against develop is easy since there can be almost no merge-conflicts. When I test the new libdcd functions it's easy to compare against the current DCDReader using the same python2 session. I've done that because developing libdcd is taking much longer then actually hooking it up to the reader in mdanalysis later, that should then be simple.

there's a single failure in the python 3.5 unit tests for libdcd: FAIL: test_read_unit_cell -- this appears to be a regression introduced after rebasing (an older rebase..) since it is related to reading DCDs. That wouldn't be surprising since the unit tests in this branch don't run automatically on develop changes.

I'm surprised by that. You shouldn't have had any merge conflicts since this branch contains almost exclusively new files.

Will be good to have things in a state where python 2 & 3 produce matching results for nosetests test_libdcd.py, which I suspect will be the case once the above DCD import block thing is remedied.

Yes that should be the first goal before we replace _dcdmodule in this branch.

tylerjereddy · 2017-01-16T03:26:27Z

Are there tests for block_import? I thought I fixed its python 3 compatibility because I did not break anything by touching it.

@jbarnoud Are you saying you previously fixed the way that the mock module was imported for block_import to work with both python 2 & 3?

Bisecting in python 3.5 is proving harder than I thought... other confounding issues keep cropping up with the setup / build process.

jbarnoud · 2017-01-16T07:11:03Z

MDAnalysisTests.util could not import on python 3, not because of mock but because of __builtin__. I fixed that without breaking the tests. But now I realize that there is no test for block_import so I am not sure I fixed it on python 3. If I actually fixed it, it is in 2f57649. Le 16 janv. 2017 4:26 AM, Tyler Reddy <notifications@github.com> a écrit : Are there tests for block_import? I thought I fixed its python 3 compatibility because I did not break anything by touching it. @jbarnoud Are you saying you previously fixed the way that the mock module was imported for block_import to work with both python 2 & 3? Bisecting in python 3.5 is proving harder than I thought... other confounding issues keep cropping up with the setup / build process. —You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or mute the thread.

richardjgowers · 2017-01-16T11:31:27Z

There's no tests for block_import, maybe we should have a few tests for our test fixtures?

Anyway, I ran the following in a python 3 environment and it worked (in that it blocked the import) so I think block_import has been fixed for py3

from MDAnalysisTests import block_import


@block_import('numpy')
def try_this():
    import numpy as np

    return np.zeros(3)

WRT mock, we're still installing this even in the py3 install, which isn't a real problem as I think it's identical, but it does mean that we can do either unittest.mock or mock imports.

@tylerjereddy I didn't realise that the DCDFile wasn't coupled to the DCDReader yet, so you can not import it if it makes things easier.

kain88-de · 2017-01-16T12:04:52Z

I activated the _dcdmodule build again. This should help with travis

tylerjereddy · 2017-01-17T02:19:32Z

With respect to the mock behaviour, I mean for testing with python 3 on my machine not travis (i.e., I'm assuming that the pip installed "mock" is extraneous in a python 3 context and the standard library version from unittest should be used). We can cut those changes out before any final merge anyway, if you want to continue using the third-party mock in both pythons for CI purposes.

tylerjereddy · 2017-01-17T02:54:35Z

Also, there's no regression -- the git log shows that I noted a single test failure in October of last year, so that is just being carried over rather than being introduced by some other change. So, just need to fix it rather than bisect / hunt.

tylerjereddy · 2017-01-17T03:09:22Z

Also, likely thanks to the changes from @kain88-de, I'm now seeing the exact same unit test results for nosetests test_libdcd.py in both pythons, which is helpful as they both now have the single same failing test for me to patch.

tylerjereddy · 2017-01-23T19:30:55Z

How confident are we that readdcd.h can read unit cell information properly without a topology (PSF) file? I've been debugging the C code with the DCD test file that MDAnalysis uses and if I iterate through the elements of unitcell in read_dcdstep with:

596   printf("Starting to print unit cell in C\n");                                                   
597   int i;                                                                                          
598   for (i=0;i < (sizeof (unitcell));i++) {                                                         
599               printf("unitcell element from C code: %lf\n",unitcell[i]);                          
600   }                                                                                               
601   printf("Finished printing unit cell in C\n");                                                   
602   return DCD_SUCCESS;                                                                             
603 }

I get:

Starting to print unit cell in C
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000056
unitcell element from C code: -0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
unitcell element from C code: 0.000000
Finished printing unit cell in C

(two extra elements -- so I'm probably over-iterating in C for whatever reason?)

And the Cython-captured result looks like this:

[  0.00000000e+00   1.08420217e-19   5.57586063e-05  -2.86145146e-42
   2.89659934e-31   1.40129846e-45]

There appears to be some stochastic component to the result having values that hover around 0 / floating point stuff going on? But overall, it looks like the values are basically just zeroes and that never changes. The unit test expected result from test_read_unit_cell is obtained by using the old MDA DCD reading code, but on a universe that is created using both a PSF & a DCD, so I just want to make sure it is reasonable to try to extract the unit cell information directly from the new DCDFile object, without any topology input from i.e., a PSF.

kain88-de · 2017-01-23T19:35:17Z

Look into the old python code. There was some unit-cell handling in there as well. I wanted to keep it at the python level since unit-cell handling seems to be different for some dcd formats.

orbeckst · 2017-01-24T01:15:54Z

DCD unit cells are different in NAMD DCD vs newer CHARMM DCD. I patched the CHARMM DCD unit cell recognition into the Python code but ideally that gets handled at the C/Cython level. See #187 .

tylerjereddy · 2017-01-29T02:38:33Z

all tests in test_libdcd.py now pass in python 2.7 & 3.5
unit cell handling behaviour has been drafted in to allow for the above, mostly by transplanting what Oli has done to various locations in the new workflow; hacking some of the stuff directly into the C header file might be considered a bit ugly, but it works well enough so far (and it is easier for me since I can use the C stuff Oli wrote directly)
if there are additional DCD files / formats in the test suite, please do direct me to those as we currently just have 1 unit test for the unit cell handling behaviour with the new DCD handling
on a more technical note, the switch to read_dcdsubset from read_dcdstep probably wasn't needed, so we could reverse that if not desirable at some stage, but at least the simple unit cell test case is now handled

tylerjereddy · 2017-01-30T02:54:58Z

added an early draft of the DCDfile write method -- currently just checks that file object is opened in the correct mode
added an associated unit test for the the general write() method file mode
this introduces a bit of duplication in the testing framework -- we can easily abstract that later though; I'll focus for now on getting the actual writing to work -- any suggestions on how to 'feed in' coordinate data and so on for the writing / testing of writing? It seems that assigning to attributes of the DCDFile is often a bit tricky / not really intended.

kain88-de · 2017-01-30T09:37:15Z

package/MDAnalysis/lib/formats/include/readdcd.h

+    alpha = unitcell[4];
+    beta  = unitcell[3];
+    gamma = unitcell[1];
+  }


my original idea was that we leave out these things out of this library. If we have this in here it will make it difficult to write DCD files for CHARM or NAMD 2.5 separately. This is because that code standardizes the unitcell we receive from a DCD file so in writing the same standard should be accepted. Then when writing a DCD we need to add a flag what unit-cell format should be written (complicating this code) and adopting to changes in the unitcell handling will mean we have to change this file and add more options. This goes against my original goal of having this wrapper as simple as possible and dealing with program specific DCD file logic in the python mda layer.

But I'm not to familiar with the DCD file landscape or CHARM/NAMD if NAMD=2.5 isn't used anymore and other programs using DCD use the same standard for unitcell's I'm OK leaving this in the cython layer. We then have to document that NAMD 2.5 is supported for only for reading.

If we decide to leave this in libdcd I would prefer the code to be in the pyx file.

We could simply read out the 6 numbers and then have the reader figure it out in the beginning and then use one conversion function in the layer that converts to/from dimensions. This would make the C/pyx code fairly clean: we really just read what's there.

I don't know how to autodetect without actually reading the first frame, though.

I also don't know if there's a significant speed penalty for moving the asin() etc out of the C layer into Python – probably not, compared to reading the rest, but one potential consideration.

This code shouldn't be here and it doesn't have any affect either. read_dcdsubstep doesn't update the unitcell array at all. Only read_dcdstep does.

kain88-de · 2017-01-30T09:41:44Z

any suggestions on how to 'feed in' coordinate data and so on for the writing / testing of writing?

See libmdaxdr the required arguments should be added to the write function

kain88-de · 2017-05-28T15:58:03Z

@tylerjereddy box dimension handling still seems to be broken. From our original DCD tests the dimension checks are failing.

Unitcells are not handled uniformly. We take a [A, B, C, alpha, beta, gamma] and return the same as well.

kain88-de · 2017-05-29T19:11:00Z

@tylerjereddy I finally fixed all unit cell handling. It is now consistent between read and write. I also removed all special handling from the C files. Now it's just hooking the new libdcd up to our readers / writers and some clean up.

kain88-de · 2017-05-29T19:29:37Z

OK I did the hookup of the reader and writer. Our normal unit tests pass.

tylerjereddy · 2017-05-30T01:00:09Z

@kain88-de Ok, thanks. Presumably the old / new tests can be combined in a sensible way if you've not done that already.

I might try to get the one property-based test I added to pass as well, since that presumably doesn't pass yet (it will have to check for a truncated return string, etc.). We'll have to see how long that takes, but my experience is that these are very fast too.

kain88-de · 2017-05-30T07:20:54Z

@kain88-de Ok, thanks. Presumably the old / new tests can be combined in a sensible way if you've not done that already.

I first only checked our old tests. The new style tests of trajectory readers I did in the morning.

I might try to get the one property-based test I added to pass as well, since that presumably doesn't pass yet (it will have to check for a truncated return string, etc.). We'll have to see how long that takes, but my experience is that these are very fast too.

Before you start. Please write down the encoding and symbols that we should support in the Remarks. Right now you restricted the code to printable ascii. As I see in hypothesis it will try though to generate a large portion of the unicode including non printable control characters. Personally I would stick with ASCII seeing how old DCD files are. I'm not super set on control characters. I think some of our own files actually have them and we currently just ignore them but that needs to be checked.

tylerjereddy · 2017-05-30T21:40:46Z

@kain88-de Currently, when I try to build off your latest commit I get:

building 'coordinates._dcdmodule' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/treddy/miniconda2/envs/mda_dev_py2/lib/python2.7/site-packages/numpy/core/include -IMDAnalysis/coordinates/include -I/home/treddy/miniconda2/envs/mda_dev_py2/include/python2.7 -c MDAnalysis/coordinates/src/dcd.c -o build/temp.linux-x86_64-2.7/MDAnalysis/coordinates/src/dcd.o
gcc: error: MDAnalysis/coordinates/src/dcd.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 1

Presumably setup.py and possibly other things need to be updated to reflect the file removal.

tylerjereddy · 2017-05-30T21:46:23Z

Ok, patched setup.py such that build succeeds in py2 & 3 on my machine.

… ASCII REMARKS section strings.

tylerjereddy · 2017-05-30T22:54:50Z

I've restored / adjusted test_written_remarks_property() and the hypothesis test now passes for printable ASCII strings in the 0 to 500 length range, with appropriate truncation; it seems that the empty '' is now passing too
there is a substantial performance cost associated with increasing the tested string length up to 500 -- in both python 2 & 3 we're looking at 8.5 seconds vs 0.7 seconds for the 115 (including hypothesis) tests vs. 112 (excluding hypothesis) tests; there is perhaps a middle ground where we cap at say 300 and put one explicit @example() case that's really big to see if we can improve performance that way--I suspect there will always be a tradeoff between testing thoroughness and time usage; further inflating the max string length to 900 increased the test suite time to around 10 seconds

tylerjereddy · 2017-05-30T23:03:48Z

I'll aim to rebase against develop in a few hours unless someone has an objection to that--I think it makes sense to have the CI running automatically again now that we're getting closer?

richardjgowers · 2017-05-31T10:00:28Z

@tylerjereddy yep. I can find some time this weekend to give this a review with some fresh eyes.

kain88-de · 2017-05-31T10:04:40Z

@richardjgowers I'm not quite done finalizing the API of libdcd. I essentially would like to move all unitcell post processing into MDAnalysis and have libdcd read the vanilla data. Also writing the header will be made explicit.

kain88-de · 2017-05-31T19:03:15Z

@tylerjereddy can you upload your rebase?

tylerjereddy · 2017-05-31T23:24:30Z

@kain88-de I'm actually having some strange issues with the rebase command:
git rebase origin/develop

...
Applying: current_frame variable is now updated in write method of DCDFile.
Applying: Removed spurious header writing; fixed an issue in write() method of DCDFile such that single-frame NAMD DCD trajectory writing exactly reproduces the coordinates from the input file.
Applying: Patched issue with frame iteration in test_coord_match -- all unit tests now pass.
fatal: empty ident name (for <tyler.je.reddy@gmail.com>) not allowed

This happens on two computers where I've already set my git config for user name and email. I suspect this relates to me working on a few new / different machines after my relocation, with various credential settings. Maybe another core dev can see if they reproduce this issue when rebasing against the latest develop on their machine?

kain88-de · 2017-06-01T08:56:08Z

@tylerjereddy yeah your wrong identity setting is a problem for the rebase. But there is another problem as well. A month ago or so you merged develop into this branch. This makes rebasing hard. I can try to remove it but that has to wait until later today.

kain88-de · 2017-06-01T13:55:14Z

rebase done in #1372 I'll leave this branch here in case I made a mistake. I don't think so though.

tylerjereddy added Format-DCD Python3 labels Jan 10, 2017

kain88-de reviewed Jan 11, 2017

View reviewed changes

kain88-de reviewed Jan 12, 2017

View reviewed changes

richardjgowers reviewed Jan 12, 2017

View reviewed changes

richardjgowers mentioned this pull request Jan 12, 2017

Cannot import on python3: missing c_distances_openmp #1157

Closed

tylerjereddy force-pushed the dcd-cython-tyler branch from a1b78e1 to 077527d Compare January 14, 2017 23:08

maxscheurer mentioned this pull request Jan 21, 2017

DCD reading maxscheurer/pycontact#15

Closed

kain88-de reviewed Jan 30, 2017

View reviewed changes

kain88-de added 2 commits May 29, 2017 21:06

improve dt unit handling

a2d0791

Fix unitcell handling

0a655ac

Unitcells are not handled uniformly. We take a [A, B, C, alpha, beta, gamma] and return the same as well.

kain88-de force-pushed the dcd-cython-tyler branch from 4f8d216 to 0a655ac Compare May 29, 2017 19:07

fix failing tests

1ff37c8

kain88-de added 5 commits May 30, 2017 07:58

enable new style reader/writer tests

90d4278

fix up docs

d2c2476

remove old C dcd reader

c65ae72

remove duplicate tests

1ccf270

add header property

dc08bf1

Removed setup.py refs to old dcd framework.

308496d

test_written_remarks_property() test now works properly for printable…

1b04abe

… ASCII REMARKS section strings.

richardjgowers self-assigned this May 31, 2017

kain88-de mentioned this pull request Jun 1, 2017

DCD-Cython #1372

Merged

4 tasks

kain88-de closed this Jun 1, 2017

kain88-de deleted the dcd-cython-tyler branch January 24, 2018 19:18

[WIP] DCD Cython Changes #1155

[WIP] DCD Cython Changes #1155

Uh oh!

Conversation

tylerjereddy commented Jan 10, 2017

Uh oh!

tylerjereddy commented Jan 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kain88-de left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kain88-de commented Jan 11, 2017

Uh oh!

tylerjereddy commented Jan 11, 2017

Uh oh!

tylerjereddy commented Jan 11, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tylerjereddy commented Jan 14, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbarnoud commented Jan 15, 2017

Uh oh!

kain88-de commented Jan 15, 2017

Uh oh!

tylerjereddy commented Jan 16, 2017

Uh oh!

jbarnoud commented Jan 16, 2017 via email

Uh oh!

richardjgowers commented Jan 16, 2017

Uh oh!

kain88-de commented Jan 16, 2017

Uh oh!

tylerjereddy commented Jan 17, 2017

Uh oh!

tylerjereddy commented Jan 17, 2017

Uh oh!

tylerjereddy commented Jan 17, 2017

Uh oh!

tylerjereddy commented Jan 23, 2017

Uh oh!

kain88-de commented Jan 23, 2017

Uh oh!

orbeckst commented Jan 24, 2017

Uh oh!

tylerjereddy commented Jan 29, 2017

Uh oh!

tylerjereddy commented Jan 30, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kain88-de commented Jan 30, 2017

Uh oh!

kain88-de commented May 28, 2017

Uh oh!

kain88-de commented May 29, 2017

Uh oh!

kain88-de commented May 29, 2017

Uh oh!

tylerjereddy commented May 30, 2017

Uh oh!

kain88-de commented May 30, 2017

Uh oh!

tylerjereddy commented May 30, 2017

Uh oh!

tylerjereddy commented May 30, 2017

tylerjereddy commented Jan 11, 2017 •

edited

Loading

tylerjereddy commented Jan 14, 2017 •

edited

Loading