Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: typed array conveniences #47

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
312 changes: 312 additions & 0 deletions text/0000-typed-array-conveniences.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,312 @@
- Feature Name: typed_array_conveniences
- Start Date: 2022-07-23
- RFC PR: (leave this empty)
- Neon Issue: (leave this empty)

# Summary
[summary]: #summary

Core support for reading and writing typed arrays was added in [RFC 39](https://github.com/neon-bindings/rfcs/blob/main/text/0039-node-api-borrow.md). This RFC adds APIs for conveniently **constructing and accessing metadata from** typed arrays.

# Motivation
[motivation]: #motivation

Constructing typed arrays can be done in JavaScript or by accessing the constructors from `cx.global()`, but this requires a lot of boilerplate and is slower than calling the direct Node-API functions for doing the same.

Reading metadata such as the byte length can be performed by borrowing a typed array's contents and doing arithmetic on the slice size. But this is error-prone and not particularly discoverable.

Having a richer API for working with typed arrays also makes for a better learning experience in the API docs.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

This example creates a typed array of unsigned 32-bit integers with a user-specified length:

```rust
fn create_int_array(mut cx: FunctionContext) -> JsResult<JsTypedArray<u32>> {
let len = cx.argument::<JsNumber>(0)?.value(&mut cx) as usize;
JsTypedArray::new(&mut cx, len)
}
```

This example creates a typed array as a view over a region of an existing buffer:

```rust
// Allocate a 16-byte ArrayBuffer and a uint32 array of length 2 (i.e., 8 bytes)
// starting at byte offset 4 of the buffer:
//
// 0 4 8 12 16
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
// buf: | | | | | | | | | | | | | | | | |
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
// ^ ^
// | |
// +-------+-------+
// arr: | | |
// +-------+-------+
// 0 1 2
let buf = cx.array_buffer(16);
let arr = JsUint32Array::from_region(&mut cx, buf.region(4, 2))?;
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## Convenience types

The `neon::types` module contains the following convenient type aliases:

| Rust Type | Convenience Type | JavaScript Type |
| --------------------| ------------------ | ---------------------------------- |
| `JsTypedArray<u8>` | `JsUint8Array` | [`Uint8Array`][Uint8Array] |
| `JsTypedArray<i8>` | `JsInt8Array` | [`Int8Array`][Int8Array] |
| `JsTypedArray<u16>` | `JsUint16Array` | [`Uint16Array`][Uint16Array] |
| `JsTypedArray<i16>` | `JsInt16Array` | [`Int16Array`][Int16Array] |
| `JsTypedArray<u32>` | `JsUint32Array` | [`Uint32Array`][Uint32Array] |
| `JsTypedArray<i32>` | `JsInt32Array` | [`Int32Array`][Int32Array] |
| `JsTypedArray<u64>` | `JsBigUint64Array` | [`BigUint64Array`][BigUint64Array] |
| `JsTypedArray<i64>` | `JsBigInt64Array` | [`BigInt64Array`][BigInt64Array] |
| `JsTypedArray<f32>` | `JsFloat32Array` | [`Float32Array`][Float32Array] |
| `JsTypedArray<f64>` | `JsFloat64Array` | [`Float64Array`][Float64Array] |

These can be convenient to type, and also create a discoverable place for documentation and examples in the API docs.

## Constructing typed arrays

This RFC defines a number of APIs for constructing typed arrays.

### `JsTypedArray`

The `JsTypedArray` type gets three new constructor methods for creating a new typed array.

```rust
impl<T: Binary> JsTypedArray<T> {
// Constructs a new typed array from a length, allocating a new backing buffer.
pub fn new<'cx, C>(
cx: &mut C,
len: usize,
) -> JsResult<'cx, Self>
where
C: Context<'cx>;

// Constructs a new typed array from an existing buffer.
pub fn from_buffer<'cx, 'b: 'cx, C>(
cx: &mut C,
buffer: Handle<'b, JsArrayBuffer>,
) -> JsResult<'cx, Self>
where
C: Context<'cx>;

// Constructs a new typed array from a region of an existing buffer.
pub fn from_region<'c, 'r, C>(
cx: &mut C,
region: Region<'r, T>,
) -> JsResult<'c, Self>
where
C: Context<'c>;
}
```

### `JsArrayBuffer`

The `JsArrayBuffer` type gets a new method for constructing a `Region` struct, which describes a region of a buffer (see below).

```rust
impl Handle<'cx, JsArrayBuffer> {
pub fn region<T: Binary>(self, byte_offset: usize, len: usize) -> Region<'cx, T>;
kjvalencik marked this conversation as resolved.
Show resolved Hide resolved
}
```

### Helper types

#### `neon::types::binary::Binary`
kjvalencik marked this conversation as resolved.
Show resolved Hide resolved

The constructor functions require passing a type tag to Node-API functions, so this RFC defines a sealed `Binary` trait to define on all the primitive element types of typed arrays (`i8`, `u8`, `i16`, `u16`, etc) in order to define the type tag as an associated constant. This is exposed for documentation and to allow userland generic helper functions that work on all typed array types.

```rust
trait Binary: Sealed + Copy + Debug {
const TYPE_TAG: /* hidden */;
}
```

#### `neon::types::binary::Region`

The `Region` struct defines a region of an underlying buffer. For performance, it's not actually checked to be a valid region until actually constructing a typed array object. (Otherwise, every method on the type would have to re-validate every time it's called, because of the possibility of the buffer getting detached.)

```rust
#[derive(Clone, Copy)]
struct Region<'cx, T: Binary> { /* hidden */ }

impl<'cx, T: Binary> Region<'cx, T> {
// Constructs a new typed array for this region.
pub fn to_typed_array<'c, C>(self, cx: &mut C) -> JsResult<'c, JsTypedArray<T>>
kjvalencik marked this conversation as resolved.
Show resolved Hide resolved
where C: Context<'c>;
}
kjvalencik marked this conversation as resolved.
Show resolved Hide resolved
```

## Convenient queries

This RFC also defines a number of methods that query typed arrays for metadata.

### `JsTypedArray`

```rust
impl<T: Binary> JsTypedArray<T> {
// Returns the region of the backing buffer for this typed array.
pub fn to_region<'cx, C>(
&self,
cx: &mut C,
) -> Region<'cx, T>
where
C: Context<'cx>;

// Returns the backing buffer.
pub fn buffer<'cx, C>(
&self,
cx: &mut C,
) -> Handle<'cx, JsArrayBuffer>
where
C: Context<'cx>;

// Returns the byte offset into the backing buffer.
pub fn byte_offset<'cx, C>(
&self,
cx: &mut C,
) -> usize
where
C: Context<'cx>;

// Returns the number of elements in the typed array.
pub fn len<'cx, C>(
&self,
cx: &mut C,
) -> usize
where
C: Context<'cx>;
}
```

### `TypedArray`

This RFC adds a convenience method to the `TypedArray` trait:

```rust
pub trait TypedArray: Sealed {
/* as before... */

// Returns the number of bytes in this typed array.
fn byte_length<'cx, C>(&self, cx: &mut C) -> usize
where
C: Context<'cx>;
}
```

### `Region`

The `Region` struct also defines a number of convenient queries.

```rust
impl<'cx, T: Binary> Region<'cx, T> {

pub fn buffer(self) -> Handle<'cx, JsArrayBuffer>;

pub fn byte_offset(self) -> usize;

pub fn len(self) -> usize;

pub fn byte_length(self) -> usize;
}
```


# Drawbacks
[drawbacks]: #drawbacks

There aren't any real drawbacks to exposing this functionality, since it's provided by Node-API. There are, however, a number of tradeoffs in the particulars of the design.

The main drawbacks of this RFC's proposed design are:

## Naming inconsistency: `len()` vs `byte_length()`

This RFC goes with `len()` to match a strong Rust convention, and `byte_length()` to match the typed array naming convention (using snake case, of course). These does lead to an inconsistency between the two. A couple alternatives would be `byte_len()` or `size()`. But then `byte_len()` ends up feeling a bit inconsistent with `byte_offset()`. We could then choose `byte_off()` but that's a bit opaque.
kjvalencik marked this conversation as resolved.
Show resolved Hide resolved

## Unfortunate docs placement of `region()`

In order to make the lifetimes work out, `region()` is a method of `Handle<JsArrayBuffer>` instead of `JsArrayBuffer`. This has the unfortunate effect of placing the method in the API docs for [`neon::handle::Handle`][neon::handle::Handle] instead of [`neon::types::JsArrayBuffer`][neon::types::JsArrayBuffer], which makes them less discoverable. The other docs should make an effort to mention the method prominently, use it in examples, and link to its docs in order to mitigate this issue.
dherman marked this conversation as resolved.
Show resolved Hide resolved

We could eliminate the underlying buffer from the `Region` data structure. Conversion to a typed array would require passing the buffer in as an additional parameter, which ends up more verbose and redundant:
```rust
buffer.region(4, 2).to_typed_array(&mut cx, buffer)
```

## Cost of multiple FFI calls

The `.buffer()`, `.byte_offset()`, `.len()`, and `.byte_length()` methods all make FFI calls to the same underlying Node-API function ([`napi_get_typedarray_info`][napi_get_typedarray_info]), which incurs unnecessary FFI overhead. Unfortunately, the semantics of JavaScript buffer detachment means that `.byte_offset()`, `.len()`, and `.byte_length()` can all change their value over time, which means it's not safe to cache the results of this FFI call.

The `.to_region()` method mitigates this issue by allowing performance-conscious Neon users to incur a single FFI call to retrieve all these values at once.

# Rationale and alternatives
[alternatives]: #alternatives

## Conversion via `From`/`Into`

Making use of the standard conversion APIs of Rust to convert between buffers and typed arrays, or between regions and typed arrays, isn't really an option since all conversions need a context.

## Range syntax

It's appealing to imagine using Rust's [range syntax][std::ops::Range] for regions, and even to use [index overloading][std::ops::Index] to enable syntax like:

```rust
JsUint32Array::from_range(&mut cx, buffer[4..12])
```

This runs into two problems:

1. **Impedance mismatch:** The JavaScript range APIs all take a byte offset and a typed length, whereas the Rust range syntax takes a start offset and end offset.
2. **`Index` incompatibility:** The signature of [`Index::index()`][std::ops::Index::index] returns a reference, so it's not allowed to allocate a custom struct. This makes it [effectively incompatible](https://users.rust-lang.org/t/how-to-implement-index-range-usize-for-a-custom-slice-type-with-additional-data/66201) with custom slice-like types such as `Region`.

## Untyped regions

It might seem nonobvious why `Region` is typed. An untyped alternative would lose track of the meaning of the `len()` value. Another alternative would be to use byte length instead, but this would be less consistent with the JavaScript typed array APIs.

It's also possible to leverage type inference to avoid having to actually state the type explicitly:
```rust
let arr = JsUint32Array::from_region(&mut cx, buf.region(4, 2))?;
```

## Flat constructor for `from_region`

This RFC's design makes `from_region` and `to_region` symmetric signatures with `Region` as an input type vs. return type, respectively. An alternative approach would be to make `from_region` take the `byte_offset` and `len` as immediate arguments of `from_region`, and have `to_region` return a tuple. However, this would risk being more error-prone: users would have to remember what the two different `usize` elements of the tuple represent, and it would be easy to confuse the `len` integer for a byte length. It would also be untyped (see above).

## Builder pattern for `Region`

Instead of `buffer.region(offset, len)`, we could expose builder methods for constructing regions, like `buffer.byte_offset(offset).len(len)`. This would have the benefit of making the semantics of the integers more unambiguous, but doesn't seem worth the extra verbosity.

## Alternative names for `Region`
dherman marked this conversation as resolved.
Show resolved Hide resolved

One option would be to use "slice" terminology, since a `Region` is similar to a Rust slice. But a slice is a particular datatype in Rust, whereas this is a custom type. Also, a slice is a live view, and a region is more like a _description_ of a slice but it's not a live view until it's converted to a typed array.

Another option would be "range" terminology, but again, a [range][std::ops::Range] is a specific thing in Rust.


# Unresolved questions
[unresolved]: #unresolved-questions

None.

[std::ops::Index]: https://doc.rust-lang.org/stable/std/ops/trait.Index.html
[std::ops::Range]: https://doc.rust-lang.org/stable/std/ops/struct.Range.html
[std::ops::Index::index]: https://doc.rust-lang.org/stable/std/ops/trait.Index.html#tymethod.index
[neon::handle::Handle]: https://docs.rs/neon/latest/neon/handle/struct.Handle.html
[neon::types::JsArrayBuffer]: https://docs.rs/neon/latest/neon/types/struct.JsArrayBuffer.html
[napi_get_typedarray_info]: https://nodejs.org/api/n-api.html#napi_get_typedarray_info
[Uint8Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array
[Int8Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Int8Array
[Uint16Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint16Array
[Int16Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Int16Array
[Uint32Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint32Array
[Int32Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Int32Array
[BigUint64Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigUint64Array
[BigInt64Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt64Array
[Float32Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Float32Array
[Float64Array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Float64Array