Add routines for saving/loading arrays in npy format #581

Merged
merged 10 commits into from
Dec 6, 2021

awvwgk (Member) commented Nov 27, 2021

  • allow saving arrays in npy format
  • allow loading of arrays from npy files
  • tests for round-tripping
  • write specifications

Related:

Hexdumps of npy files for reference:

# Produced by https://github.com/MRedies/NPY-for-Fortran
00000000  93 4e 55 4d 50 59 02 00  46 00 00 00 7b 27 64 65  |.NUMPY..F...{'de|
00000010  73 63 72 27 3a 20 27 3c  66 38 27 2c 20 27 66 6f  |scr': '<f8', 'fo|
00000020  72 74 72 61 6e 5f 6f 72  64 65 72 27 3a 20 54 72  |rtran_order': Tr|
00000030  75 65 2c 20 27 73 68 61  70 65 27 3a 20 28 31 30  |ue, 'shape': (10|
00000040  2c 34 2c 29 2c 20 7d 20  20 20 20 20 20 20 20 20  |,4,), }         |
00000050  20 0a 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ...............|  # <- misaligned data?
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000190  00 00                                             |..|
00000192
# Produced by `import numpy as np; np.save("zeros-10x4.npy", np.zeros((10, 4)))`
00000000  93 4e 55 4d 50 59 01 00  76 00 7b 27 64 65 73 63  |.NUMPY..v.{'desc|
00000010  72 27 3a 20 27 3c 66 38  27 2c 20 27 66 6f 72 74  |r': '<f8', 'fort|
00000020  72 61 6e 5f 6f 72 64 65  72 27 3a 20 46 61 6c 73  |ran_order': Fals|
00000030  65 2c 20 27 73 68 61 70  65 27 3a 20 28 31 30 2c  |e, 'shape': (10,|
00000040  20 34 29 2c 20 7d 20 20  20 20 20 20 20 20 20 20  | 4), }          |
00000050  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000070  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 0a  |               .|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001c0
# Produced by `stdlib_io_npy`
00000000  93 4e 55 4d 50 59 01 00  76 00 7b 27 64 65 73 63  |.NUMPY..v.{'desc|
00000010  72 27 3a 20 27 3c 66 38  27 2c 20 27 66 6f 72 74  |r': '<f8', 'fort|
00000020  72 61 6e 5f 6f 72 64 65  72 27 3a 20 54 72 75 65  |ran_order': True|
00000030  2c 20 27 73 68 61 70 65  27 3a 20 28 31 30 2c 20  |, 'shape': (10, |
00000040  34 2c 20 29 2c 20 7d 20  20 20 20 20 20 20 20 20  |4, ), }         |
00000050  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000070  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 0a  |               .|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001c0
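As a cross-check of the dumps above, here is a minimal Python sketch (standard library only) of how such a header can be read. The names `parse_npy_header` and `build_npy` are hypothetical and belong to neither stdlib nor NumPy. The layout assumed is the one the dumps show: a 6-byte magic string, two version bytes, a little-endian header length (2 bytes for version 1.0, 4 bytes for version 2.0), then a Python dict literal padded with spaces and terminated by a newline.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def parse_npy_header(buf):
    """Return ((major, minor), header_dict, data_offset) for npy bytes."""
    if buf[:6] != MAGIC:
        raise ValueError("missing npy magic string")
    major, minor = buf[6], buf[7]
    if major == 1:
        (hlen,) = struct.unpack_from("<H", buf, 8)  # 2-byte header length
        start = 10
    elif major == 2:
        (hlen,) = struct.unpack_from("<I", buf, 8)  # 4-byte header length
        start = 12
    else:
        raise ValueError(f"unsupported major version {major}")
    text = buf[start:start + hlen].decode("latin-1")
    # The header is a Python dict literal, so literal_eval can read it safely.
    return (major, minor), ast.literal_eval(text), start + hlen


def build_npy(major, dict_text, pad):
    """Hypothetical helper: magic, version, length, dict text, `pad` spaces, newline."""
    header = dict_text.encode("latin-1") + b" " * pad + b"\n"
    fmt = "<H" if major == 1 else "<I"
    return MAGIC + bytes([major, 0]) + struct.pack(fmt, len(header)) + header


# Header bytes reconstructed from the stdlib_io_npy dump (56 padding spaces).
stdlib_hdr = build_npy(
    1, "{'descr': '<f8', 'fortran_order': True, 'shape': (10, 4, ), }", 56)
# Header bytes reconstructed from the NPY-for-Fortran dump (10 padding spaces).
other_hdr = build_npy(
    2, "{'descr': '<f8', 'fortran_order': True, 'shape': (10,4,), }", 10)

for name, raw in (("stdlib_io_npy", stdlib_hdr), ("NPY-for-Fortran", other_hdr)):
    version, header, offset = parse_npy_header(raw)
    print(name, version, header["shape"], offset, offset % 16)
```

With the bytes reconstructed from the dumps, the data offset comes out as 128 for the stdlib_io_npy header and 82 for the NPY-for-Fortran header. 82 is two bytes past a multiple of 16, which is consistent with the "misaligned data?" annotation in the first dump.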

Tests and error outputs:

❯ ./_build/src/tests/io/test_npy
# Testing: npy
  Starting read-rdp-r2 ... (1/20)
       ... read-rdp-r2 [PASSED]
  Starting read-rdp-r3 ... (2/20)
       ... read-rdp-r3 [PASSED]
  Starting read-rsp-r1 ... (3/20)
       ... read-rsp-r1 [PASSED]
  Starting read-rsp-r2 ... (4/20)
       ... read-rsp-r2 [PASSED]
  Starting write-rdp-r2 ... (5/20)
       ... write-rdp-r2 [PASSED]
  Starting write-rsp-r2 ... (6/20)
       ... write-rsp-r2 [PASSED]
  Starting write-i2-r4 ... (7/20)
       ... write-i2-r4 [PASSED]
  Starting invalid-magic-number ... (8/20)
       ... invalid-magic-number [EXPECTED FAIL]
  Message: Expected z'93' but got z'50' as first byte
  Starting invalid-magic-string ... (9/20)
       ... invalid-magic-string [EXPECTED FAIL]
  Message: Expected identifier 'NUMPY'
  Starting invalid-major-version ... (10/20)
       ... invalid-major-version [EXPECTED FAIL]
  Message: Unsupported format major version number '0'
  Starting invalid-minor-version ... (11/20)
       ... invalid-minor-version [EXPECTED FAIL]
  Message: Unsupported format version '1.9'
  Starting invalid-header-len ... (12/20)
       ... invalid-header-len [EXPECTED FAIL]
  Message: Descriptor length does not match
  Starting invalid-nul-byte ... (13/20)
       ... invalid-nul-byte [EXPECTED FAIL]
  Message: Nul byte not allowed in descriptor string
  Starting invalid-key ... (14/20)
       ... invalid-key [EXPECTED FAIL]
  Message: Invalid entry 'x' in dictionary encountered
 --> .test-invalid-key.npy:1:61-63
  |
1 | {'fortran_order': True, 'shape': (10, 4, ), 'descr': '<f8', 'x': 1, }
  |                                                             ^^^
  |
  Starting invalid-comma ... (15/20)
       ... invalid-comma [EXPECTED FAIL]
  Message: Comma cannot appear at this point
 --> .test-invalid-comma.npy:1:24-24
  |
1 | {'fortran_order': True,, 'shape': (10, 4, ), 'descr': '<f8', }
  |                        ^
  |
  Starting invalid-string ... (16/20)
       ... invalid-string [EXPECTED FAIL]
  Message: String cannot appear at this point
 --> .test-invalid-string.npy:1:60-64
  |
1 | {'fortran_order': True, 'shape': (10, 4, ), 'descr': '<f8' '<f4', }
  |                                                            ^^^^^
  |
  Starting duplicate-descr ... (17/20)
       ... duplicate-descr [EXPECTED FAIL]
  Message: Duplicate entry for 'descr' found
 --> .test-invalid-descr.npy:1:18-24
  |
1 | {'descr': '<f8', 'descr': '<f8', 'fortran_order': True, 'shape': (40, ), }
  |                  ^^^^^^^
  |
  Starting missing-descr ... (18/20)
       ... missing-descr [EXPECTED FAIL]
  Message: Dictionary does not contain required entry 'descr'
 --> .test-missing-descr.npy:1:1-45
  |
1 | {'fortran_order': True, 'shape': (10, 4, ), }
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  Starting missing-fortran_order ... (19/20)
       ... missing-fortran_order [EXPECTED FAIL]
  Message: Dictionary does not contain required entry 'fortran_order'
 --> .test-missing-fortran_order.npy:1:1-38
  |
1 | {'descr': '<f8', 'shape': (10, 4, ), }
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  Starting missing-shape ... (20/20)
       ... missing-shape [EXPECTED FAIL]
  Message: Dictionary does not contain required entry 'shape'
 --> .test-missing-shape.npy:1:1-39
  |
1 | {'fortran_order': True, 'descr': '<f8'}
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
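The diagnostics above point at a column range in the header string. The following is a minimal Python sketch of how such a report could be rendered. The function `render_diagnostic` and its parameters are hypothetical, columns are assumed to be 1-based, and this is not how the Fortran implementation in this PR is written.

```python
def render_diagnostic(message, filename, line, first, last):
    """Render a one-line excerpt with carets under columns first..last (1-based)."""
    width = last - first + 1
    return "\n".join([
        f"  Message: {message}",
        f" --> {filename}:1:{first}-{last}",
        "  |",
        f"1 | {line}",
        # Shift the carets by (first - 1) so they sit under the offending span.
        "  | " + " " * (first - 1) + "^" * width,
        "  |",
    ])


header = "{'fortran_order': True,, 'shape': (10, 4, ), 'descr': '<f8', }"
print(render_diagnostic(
    "Comma cannot appear at this point", ".test-invalid-comma.npy", header, 24, 24))
```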

awvwgk added the "topic: IO" (Common input/output related features) label Nov 27, 2021
awvwgk marked this pull request as ready for review November 28, 2021 23:12
awvwgk added the "reviewers needed" (This patch requires extra eyes) label Nov 29, 2021
awvwgk requested review from milancurcic and certik November 29, 2021 18:12

awvwgk (Member, Author) commented Nov 29, 2021

@certik @milancurcic @MRedies Please have a look at this PR for supporting the npy format. The saving routines are close to Matthias' implementation in https://github.com/MRedies/NPY-for-Fortran. The main difference is that version 1.0 of the format is used by default rather than always version 2.0. Also, the padding of the header there seemed to be off by two bytes in some cases. The loading routines are new; they need some stress testing and maybe some polishing, but the basic functionality is there and we can now do a full round trip via the npy format.

We should maybe discuss how we handle the fortran_order key. Currently I'm just converting the right (row-major) layout to a left (column-major) layout by reversing the shape array; I'm not sure if there is a better way to do this.
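To illustrate why reversing the shape can be enough, here is a small Python sketch using plain index arithmetic (no NumPy). The function names are hypothetical. It checks that element (i, j) of a row-major array of shape (10, 4) sits at the same linear offset as element (j, i) of a column-major array of shape (4, 10), so the raw data can be reinterpreted without reordering. This is a sketch of the idea only, not of the stdlib implementation.

```python
from itertools import product


def offset_column_major(index, shape):
    """Linear offset with the first index varying fastest (Fortran order)."""
    off, stride = 0, 1
    for i, n in zip(index, shape):
        off += i * stride
        stride *= n
    return off


def offset_row_major(index, shape):
    """Linear offset with the last index varying fastest (C order)."""
    off = 0
    for i, n in zip(index, shape):
        off = off * n + i
    return off


shape = (10, 4)
# Compare every row-major index with its reversed, column-major counterpart.
same = all(
    offset_row_major(idx, shape) == offset_column_major(idx[::-1], shape[::-1])
    for idx in product(*(range(n) for n in shape))
)
print(same)
```

The consequence of this approach is that an array saved from Python with shape (10, 4) and fortran_order False would be seen with the transposed shape (4, 10) on the Fortran side.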

Also, for error reporting in binary chunks, a hexdump-like display function might be useful for detailed error messages, or at least for debugging ;). But this might be material for another project.
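As a rough idea of what such a display function could look like, here is a minimal Python sketch that formats bytes in the same style as the dumps in the PR description. The function `hexdump` is hypothetical, and unlike `hexdump -C` it does not collapse repeated lines into `*`.

```python
def hexdump(data, start=0):
    """Format bytes as offset, two groups of eight hex bytes, and printable ASCII."""
    lines = []
    for pos in range(0, len(data), 16):
        chunk = data[pos:pos + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Non-printable bytes are shown as '.' in the text column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start + pos:08x}  {left:<23}  {right:<23}  |{text}|")
    return "\n".join(lines)


print(hexdump(b"\x93NUMPY\x01\x00v\x00{'descr': '<f8', "))
```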

MRedies commented Nov 29, 2021

I didn't have a chance to test it, but it looks like a sensible and concise version of what I did before.

TejasAvinashShetty left a comment

Seems ok

jvdp1 (Member) left a comment

LGTM. Only some minor comments.
Thank you @awvwgk for this. I am not a Python user, but I am sure it will be useful to many people.

@TejasAvinashShetty

libnpy seems to be a library that provides simple routines for saving a C or Fortran array to a data file using NumPy's own binary format.
Please see https://scipy-cookbook.readthedocs.io/items/InputOutput.html

Not my idea; see CAZT's first comment on CAZT's Stack Overflow answer.

awvwgk (Member, Author) commented Dec 2, 2021

libnpy only allows saving to npy, no loading routines unfortunately.

MRedies commented Dec 2, 2021

Additionally, if you attempt to save an array slice with libnpy, e.g. call save_npy(a(:,1,:)), you are not going to be happy.

TejasAvinashShetty commented Dec 2, 2021

Agreed @awvwgk
Just for info, I stumbled upon a newer version at https://github.com/kovalp/libnpy.
Also, thank you so much, @awvwgk, for this pull request. Sorry I did not say this earlier.

jvdp1 (Member) left a comment

I am happy with the changes. Thank you.

milancurcic (Member) left a comment

Looks good, thank you!

awvwgk removed the "reviewers needed" (This patch requires extra eyes) label Dec 6, 2021

awvwgk (Member, Author) commented Dec 6, 2021

Thanks everybody for the feedback and the reviews. I'll go ahead and merge.

awvwgk merged commit ce9c234 into fortran-lang:master Dec 6, 2021
awvwgk deleted the npy branch December 6, 2021 20:11

ivan-pi (Member) commented Dec 7, 2021

Nice work here. Sorry to join the thread late, but shouldn't iomsg:

if (present(iomsg)) then

only be allocated when stat /= 0? (similar to iostat)

awvwgk (Member, Author) commented Dec 10, 2021

Good catch. Before this is lost at the end of this thread, let's open a new issue or set up a quick patch to fix it.
