Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace bytes of compressed stream with uint8_t #106

Merged
merged 21 commits into from
Oct 1, 2020

Conversation

halehawk
Copy link
Contributor

I replaced type bytes with const uint8_t[::1], this way the compressed stream can be checked by ensure_ndarray of numcodecs, and also it prevents numpy array tobytes procedures. I tested with ensure_ndarray, and the default zfp tests.

Copy link
Contributor

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @halehawk! Have a couple minor suggestions below.

python/zfpy.pyx Outdated Show resolved Hide resolved
python/zfpy.pyx Outdated Show resolved Hide resolved
@jakirkham
Copy link
Contributor

It seems stream_open takes a void* that is non-const. Is there a reason for that @lindstro? Are there cases where it writes back to the data provided? If so, that would be tricky with bytes objects even before this change (as they are immutable in Python). How should we go about handling read-only data then?

@lindstro
Copy link
Member

It seems stream_open takes a void* that is non-const. Is there a reason for that @lindstro? Are there cases where it writes back to the data provided? If so, that would be tricky with bytes objects even before this change (as they are immutable in Python). How should we go about handling read-only data then?

@jakirkham The reason is that the bit stream is used both for compression and decompression, and it needs to be mutable to support compression.

@jakirkham
Copy link
Contributor

Does that mean user provided data is overwritten as part of compression?

@lindstro
Copy link
Member

Does that mean user provided data is overwritten as part of compression?

No, user data is not being overwritten. The compressed data is copied and returned here:

# copy the compressed data into a perfectly sized bytes object
. I'm unsure how this all relates to stream_open, which is a very low-level C function that is called only once during compression to set up the memory buffer where the compressed data is written:
bstream = stream_open(data, maxsize)
.

I'm not entirely sure I understand the issue being discussed here. Can you please elaborate?

@jakirkham
Copy link
Contributor

A bytes object is read-only. So the pointer would need to be const. It may have worked before with a C compiler warning. Though this change is highlighting that is a problem as Cython is making this a compilation error now. Would it be possible to provide API functions that are const at least in some cases to respect this constraint? I'm guessing this means other changes under-the-hood as well.

@halehawk
Copy link
Contributor Author

halehawk commented Sep 18, 2020 via email

@lindstro
Copy link
Member

Supporting both const and nonconst pointers in the bit stream API would require substantial surgery on all of zfp. Moreover, it would preclude you from compressing to a stream and later decompressing from the same stream (something needed by zfp's compressed-array classes)--two different objects would be needed: one that holds a const pointer, one that holds a mutable pointer. From the C/C++ programmer's perspective, the expectation is that stream_open is initialized with a pointer resulting from malloc, new, or similar, and that this memory buffer must be populated with compressed data at some point, which requires write access.

Now, we should be able to safely cast a bytes or memoryview object to const void*, no? We can then use another cast in the call to stream_open to do away with the const qualifier.

@jakirkham
Copy link
Contributor

Good to know.

Sure we could do that. Just noting that we may want to fix this in the long term.

@halehawk, would you be able to give Peter's suggestion a try? Should be a small change to make.

@halehawk
Copy link
Contributor Author

halehawk commented Sep 18, 2020 via email

@lindstro
Copy link
Member

For some reason, build #992 is not showing up on CDash (you may be able to check it yourself: https://open.cdash.org/index.php?project=zfp), even though the Travis logs suggest that they were sent there. If they don't show up on CDash soon, I may restart the build.

Your changes look right, although I would caution against using char* in place of unsigned char* or, preferably, void*.

Although mostly orthogonal to this issue, the compressed stream needs to be word aligned. I'm not sure if Python makes any guarantees about alignment. I have no reason to believe that alignment has anything to do with these errors; this is just something that occurred to me when discussing pointer types.

@lindstro
Copy link
Member

Looks like some tests but not all are now passing:

test_advanced_decompression_checksum (__main__.TestNumpy) ... ok
test_advanced_decompression_nonsquare (__main__.TestNumpy) ... ok
test_different_dimensions (__main__.TestNumpy) ... ERROR
test_different_dtypes (__main__.TestNumpy) ... ERROR
test_utils (__main__.TestNumpy) ... ERROR

======================================================================
ERROR: test_different_dimensions (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 24, in test_different_dimensions
    self.lossless_round_trip(c_array)
  File "test_numpy.py", line 17, in lossless_round_trip
    decompressed_array = zfpy.decompress_numpy(compressed_array)
  File "zfpy.pyx", line 333, in zfpy.decompress_numpy
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
BufferError: Object is not writable.

======================================================================
ERROR: test_different_dtypes (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 38, in test_different_dtypes
    self.lossless_round_trip(array)
  File "test_numpy.py", line 17, in lossless_round_trip
    decompressed_array = zfpy.decompress_numpy(compressed_array)
  File "zfpy.pyx", line 333, in zfpy.decompress_numpy
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
BufferError: Object is not writable.

======================================================================
ERROR: test_utils (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 186, in test_utils
    compressed_array,
  File "zfpy.pyx", line 333, in zfpy.decompress_numpy
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
BufferError: Object is not writable.

----------------------------------------------------------------------
Ran 5 tests in 17.482s

FAILED (errors=3)

python/zfpy.pyx Outdated Show resolved Hide resolved
python/zfpy.pyx Outdated Show resolved Hide resolved
python/zfpy.pyx Outdated Show resolved Hide resolved
@halehawk
Copy link
Contributor Author

halehawk commented Sep 18, 2020 via email

@lindstro
Copy link
Member

Sorry, I copied the wrong log. I'm guessing this is a Cython bug (see, e.g., cython/cython#1605). I don't know what to suggest at this point.

@jakirkham Do you have other numcodecs examples that use Cython with memoryviews successfully?

@jakirkham
Copy link
Contributor

Though that has been fixed since 0.28. Cython is now deep in the 0.29.x releases.

python/zfpy.pyx Outdated
@@ -329,15 +330,15 @@ cpdef np.ndarray _decompress(
return output

cpdef np.ndarray decompress_numpy(
bytes compressed_data,
const uint8_t[::1] compressed_data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry there may have been an extra space in the suggestion before. Maybe that helps?

Suggested change
const uint8_t[::1] compressed_data,
const uint8_t[::1] compressed_data,

@jakirkham
Copy link
Contributor

Where do we look to see the build failures? Went to the CDash page, but am not familiar with this tool. So wasn't sure where to go from there to see this PR.

@lindstro
Copy link
Member

The CDash user interface leaves much to be desired. You first need to find the correct date of the build (in whatever timezone CDash defaults to), which you can do via current/prev/next (e.g., https://open.cdash.org/index.php?project=zfp&date=2020-09-18). Then find the Travis build name you're interested in (e.g., develop-#993.1). Click the entry in the red Fail column (e.g., https://open.cdash.org/viewTest.php?onlyfailed&buildid=6770400), then the "Failed" link (e.g., https://open.cdash.org/test/250854826). This should bring up the error log for that build.

@lindstro
Copy link
Member

Is it possible you release a minor release such as 0.5.5.1 (-:? If not, please merge my PR as soon as possible. I will be glad to help on this dylib issue. But can you point to me how you can run the release script?

@halehawk Thanks for the offer. By the release script, do you mean generating and uploading the wheels to PyPI? That's done through a separate repo: https://github.com/LLNL/zfpy-wheels. When experimenting with this, we should use TestPyPI; see https://packaging.python.org/guides/using-testpypi/. If this is what you meant, then let's discuss offline the steps involved in doing this.

@lindstro
Copy link
Member

I did a test on my machine with ensure_ndarray, it worked. But do I need to do it officially on test_numpy, ithen test_numpy.py needs import numcodecs.

I'm not sure I follow. Why is numcodecs needed to test that ndarrays can be passed? I'm reluctant to add unnecessary dependencies to zfp.

@halehawk
Copy link
Contributor Author

halehawk commented Sep 23, 2020 via email

@halehawk
Copy link
Contributor Author

halehawk commented Sep 23, 2020 via email

@halehawk
Copy link
Contributor Author

halehawk commented Sep 23, 2020 via email

@halehawk
Copy link
Contributor Author

I added tests and passed all checks. @lindstro

random_array,
write_header=False,
**compression_kwargs
)

if isinstance(compressed_array_t, np.ndarray):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the intent here. compressed_array_t is by definition a bytes object, so this conditional should always be false.

Wouldn't the appropriate test be to ensure that we can pass both a bytes object and an ndarray to zfpy.decompress_numpy? What other array-like objects are we trying to support here?

Minor nit: As a C/C++ programmer, I think compressed_array_t is a confusing name as the _t suffix makes it look like a type.

@halehawk
Copy link
Contributor Author

halehawk commented Sep 30, 2020 via email

@lindstro
Copy link
Member

No, compressed_array_t could be array objects from numcodecs.zfpy, and could be a bytes object from your other upstream application tools.

Maybe it's my lack of Python skills, but unless I'm missing something obvious, in this test compressed_array_t is always a bytes object returned by zfpy.compress_numpy. How could it be any other type the way this particular test is written?

Wouldn't the appropriate test be to ensure that we can pass both a bytes object and an ndarray to zfpy.decompress_numpy? What other array-like objects are we trying to support here?

Yes, in your test_numpy.py, there are two places calling _decompress (line 75 and line 104) or decompress_numpy, I only added this test on one place of _decompress or decompress_numpy.

My concern is that the conditional is always false (so what purpose does it serve?), and therefore only one branch is taken and tested. That is, only the memoryview path is exercised.

@halehawk
Copy link
Contributor Author

halehawk commented Sep 30, 2020 via email

@lindstro lindstro merged commit 50aa4f1 into LLNL:develop Oct 1, 2020
@jakirkham
Copy link
Contributor

Thanks Haiying! Also thanks Peter for the review! 😄

@lindstro
Copy link
Member

lindstro commented Oct 1, 2020

Yes, big thanks to @halehawk for this PR and for putting up with my picky comments. 😉 Some of the tests are failing on Travis and AppVeyor, but this appears to be caused by some issues with cmocka. I've restarted the failed jobs, and they seem to be passing now.

@halehawk
Copy link
Contributor Author

halehawk commented Oct 1, 2020 via email

@jakirkham
Copy link
Contributor

Thanks again for your help, Peter! Do you have an idea of when the next release (with this change) might be? 🙂

@lindstro
Copy link
Member

We're woefully behind schedule on this release, and I don't want to make any promises, but we're working hard to get it done by the end of the year.

@halehawk
Copy link
Contributor Author

halehawk commented Nov 17, 2020 via email

@lindstro
Copy link
Member

No, no changes there. The majority of new features are related to the C++ compressed-array classes and their C wrappers.

@halehawk
Copy link
Contributor Author

halehawk commented Jan 12, 2021 via email

@lindstro
Copy link
Member

We're slowly making progress, but we're not as close as I would have wished. I expect we're still a few weeks away.

@jakirkham
Copy link
Contributor

Just checking in here, how are things looking?

@halehawk
Copy link
Contributor Author

halehawk commented Feb 2, 2021 via email

@lindstro
Copy link
Member

lindstro commented Feb 2, 2021

Just checking in here, how are things looking?

Slowly making progress on 0.5.6. Sorry to be holding everyone up, but there's just a lot of stuff going into this release.

Regarding Python 3.9, I can add that to zfpy-wheels but am unsure whether it needs MB_ML_VER=2010, like 3.8.

@da-wad Do you know?

@jakirkham
Copy link
Contributor

No worries 🙂

If it is easy to add, it would be great, but if not wouldn't worry about it. I don't think the fact that Numcodecs uses Python 3.9 should constrain zfpy.

Ah guessing you are referring to this? Yeah 2010 makes sense sense as that is CentOS 6 based. I think the default is 1, which is CentOS 5 based (no one should be using that any more as it no longer receives any updates). They've also added 2014, which is CentOS 7 based. Not sure what GLIBC your users typically need, but continuing to target CentOS 6 makes sense and should give GLIBC 2.12+ support.

@halehawk
Copy link
Contributor Author

halehawk commented Mar 12, 2021 via email

@jakirkham
Copy link
Contributor

We need a tag is the issue

@halehawk
Copy link
Contributor Author

halehawk commented Mar 12, 2021 via email

@jakirkham
Copy link
Contributor

There is no git tag here including these changes from which to build. We would need that to proceed (even with Conda)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants