Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite logic in vlen to drop NumPy build dependency #555

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions numcodecs/vlen.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@

import cython
cimport cython
from numpy cimport ndarray
import numpy as np
from .abc import Codec
from .compat_ext cimport Buffer
Expand All @@ -16,7 +15,7 @@ from .compat import ensure_contiguous_ndarray
from cpython cimport (PyBytes_GET_SIZE, PyBytes_AS_STRING, PyBytes_Check,
PyBytes_FromStringAndSize, PyUnicode_AsUTF8String)
from cpython.buffer cimport PyBUF_ANY_CONTIGUOUS
from libc.stdint cimport uint8_t
from libc.stdint cimport uint8_t, uintptr_t
from libc.string cimport memcpy
from ._utils cimport store_le32, load_le32

Expand All @@ -28,6 +27,14 @@ cdef extern from "Python.h":
int PyUnicode_Check(object text)


# Hacky workaround for lack of `const object[:]`.
# Appears hiding `object` in a fused type works.
# xref: https://github.com/cython/cython/issues/2485
# xref: https://github.com/scipy/scipy/pull/18192#pullrequestreview-1359581611
ctypedef fused object_fused_t:
object
uintptr_t
Comment on lines +30 to +36
Copy link
Member Author

@jakirkham jakirkham Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of the workaround

For fusing, used a uintptr_t. As object is effectively PyObject*, fusing with a generic pointer/address type seemed most appropriate

Note this differs from the SciPy approach where they used double. Though it shouldn't matter much what the other type is as it is unused

Whenever Cython fixes this issue, we can drop this and just use const object[:] where needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well as evidenced by CI, this is not enough. Think what we need to do is get the fused types inside some kind of function signature. Then it might work

However looking at the code here. There are multiple places this would need to go

Wonder if the better path would be...

  1. Move most of this logic into pure Python (Cython is only needed for a small portion of this code)
  2. Refactor the code to reuse common bits across cases
  3. Have a small Cython function that deals with the bits that need to be lower-level


# 4 bytes to store number of items
cdef Py_ssize_t HEADER_LENGTH = 4

Expand Down Expand Up @@ -75,7 +82,7 @@ class VLenUTF8(Codec):
def encode(self, buf):
cdef:
Py_ssize_t i, l, n_items, data_length, total_length
ndarray[object, ndim=1] input_values
const object_fused_t[:] input_values
object[:] encoded_values
int[:] encoded_lengths
char* encv
Expand Down
3 changes: 1 addition & 2 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -196,13 +196,12 @@ def lz4_extension():

def vlen_extension():
info('setting up vlen extension')
import numpy

extra_compile_args = base_compile_args.copy()
define_macros = []

# setup sources
include_dirs = ['numcodecs', numpy.get_include()]
include_dirs = ['numcodecs']
# define_macros += [('CYTHON_TRACE', '1')]

sources = ['numcodecs/vlen.pyx']
Expand Down
Loading