Merge pull request #148 from chmp/feature/remove-bytecode-deserializer
Start to implement basic bytecode-less deserializer
chmp authored Mar 19, 2024
2 parents 16e0887 + 231d92c commit fea50ef
Showing 65 changed files with 3,654 additions and 2,947 deletions.
38 changes: 37 additions & 1 deletion Changes.md
@@ -1,18 +1,47 @@
# Change log

## 0.10.1
## 0.11.0

`0.11.0` does not contain any anticipated breaking changes. However, it is a
major refactoring and may change some untested behavior.

- Remove the bytecode deserializer and use the serde API directly
  - Easier to understand and extend
  - The `Deserialization` implementation can ask for its expected type, e.g.,
    `chrono::DateTime<Utc>` can now be used with `serde_arrow` without
    explicitly specifying the strategy
- Add `Date32` and `Time64` support
- Allow using `arrow` schemas in `SchemaLike::from_value()`, e.g.,
  `let fields = Vec::<Field>::from_value(&batch.schema())`
- Fix bug in `SchemaLike::from_type()` for nested unions

### Thanks

The following people contributed to this release:

- [@gz](https://github.com/gz) added `Date32` and `Time64` support
([PR](https://github.com/chmp/serde_arrow/pull/147))
- [@progval](https://github.com/progval) added additional error messages
([PR](https://github.com/chmp/serde_arrow/pull/142))

## 0.10.0

- Remove deprecated APIs
- Use the serde serialization APIs directly, instead of using the bytecode
serializer. Serialization will be about `2x` faster
- Fix bug in `SchemaLike::from_value` with incorrect strategy deserialization

### Thanks

The following people contributed to this release:

- [@Ten0](https://github.com/Ten0) motivated the rewrite to use the serde API
directly and contributed additional benchmarks for JSON transcoding
([PR](https://github.com/chmp/serde_arrow/pull/130))
- [@alamb](https://github.com/alamb) added improved documentation on how to use
`serde_arrow` with the `arrow` crate
([PR](https://github.com/chmp/serde_arrow/pull/131))

## 0.9.1

- `Decimal128` support: serialize / deserialize
@@ -132,6 +161,13 @@ Bug fixes:
- nested options (`Option<Option<T>>`)
- creating `float16` arrays

### Thanks

The following people contributed to this release:

- [@elbaro](https://github.com/elbaro) updated the readme example
([PR](https://github.com/chmp/serde_arrow/pull/33))

## 0.6.1

- Add support for `arrow=37` with the `arrow-37` feature
5 changes: 3 additions & 2 deletions serde_arrow/src/_impl/docs/quickstart.rs
@@ -66,8 +66,9 @@
//! # #[cfg(not(has_arrow))] fn main() { }
//! ```
//!
//! Integer fields containing timestamps in milliseconds since the epoch can be
//! directly stored as `Date64` without any configuration:
//! Integer fields containing timestamps in milliseconds since the epoch or
//! `DateTime<Utc>` objects can be directly stored as `Date64` without any
//! configuration:
//!
//! ```rust
//! # #[cfg(has_arrow)]
30 changes: 10 additions & 20 deletions serde_arrow/src/arrow2_impl/api.rs
@@ -7,13 +7,15 @@ use serde::{Deserialize, Serialize};
use crate::{
_impl::arrow2::{array::Array, datatypes::Field},
internal::{
common::Mut,
error::Result,
schema::{GenericField, SerdeArrowSchema},
serialization_ng::OuterSequenceBuilder,
source::deserialize_from_source,
serialization::OuterSequenceBuilder,
},
};

use super::deserialization::build_deserializer;

/// Build arrow2 arrays record by record (*requires one of the `arrow2-*`
/// features*)
///
@@ -176,28 +178,16 @@ where
T: Deserialize<'de>,
A: AsRef<dyn Array>,
{
use crate::internal::{
common::{BufferExtract, Buffers},
deserialization,
};

let fields = fields
.iter()
.map(GenericField::try_from)
.collect::<Result<Vec<_>>>()?;

let num_items = arrays
let arrays = arrays
.iter()
.map(|a| a.as_ref().len())
.min()
.unwrap_or_default();

let mut buffers = Buffers::new();
let mut mappings = Vec::new();
for (field, array) in fields.iter().zip(arrays.iter()) {
mappings.push(array.as_ref().extract_buffers(field, &mut buffers)?);
}
.map(|array| array.as_ref())
.collect::<Vec<_>>();

let interpreter = deserialization::compile_deserialization(num_items, &mappings, buffers)?;
deserialize_from_source(interpreter)
let mut deserializer = build_deserializer(&fields, &arrays)?;
let res = T::deserialize(Mut(&mut deserializer))?;
Ok(res)
}
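
The new call `T::deserialize(Mut(&mut deserializer))` wraps a mutable borrow in a newtype before handing it to serde. The definition of `Mut` is not shown in this diff. A plausible reading is that it exists because `serde::Deserializer` methods take `self` by value, so a thin wrapper around `&mut` lets one stateful deserializer be driven repeatedly. The sketch below illustrates that pattern with the standard library only. `ConsumingSource`, `ArraySource`, and `read_all` are hypothetical stand-ins, not part of `serde_arrow`, and this `Mut` is an assumed shape rather than the crate's actual definition.

```rust
/// Hypothetical stand-in for a trait like `serde::Deserializer`,
/// whose methods consume `self`.
trait ConsumingSource {
    fn next_value(self) -> Option<i64>;
}

/// Hypothetical array-backed source that keeps a cursor between calls.
struct ArraySource {
    values: Vec<i64>,
    pos: usize,
}

impl ArraySource {
    /// Return the value at the cursor and move the cursor forward.
    fn advance(&mut self) -> Option<i64> {
        let value = self.values.get(self.pos).copied();
        if value.is_some() {
            self.pos += 1;
        }
        value
    }
}

/// Assumed shape of the wrapper: a newtype around a mutable borrow.
struct Mut<'a, T>(pub &'a mut T);

/// The wrapper is consumed on every call, but the wrapped source survives,
/// so its cursor state carries over to the next call.
impl<'a> ConsumingSource for Mut<'a, ArraySource> {
    fn next_value(self) -> Option<i64> {
        self.0.advance()
    }
}

/// Drain the source by creating a fresh, short-lived wrapper per item.
fn read_all(source: &mut ArraySource) -> Vec<i64> {
    let mut out = Vec::new();
    while let Some(value) = Mut(&mut *source).next_value() {
        out.push(value);
    }
    out
}

fn main() {
    let mut source = ArraySource { values: vec![1, 2, 3], pos: 0 };
    let items = read_all(&mut source);
    assert_eq!(items, vec![1, 2, 3]);
    // The source is still owned by the caller and remembers its position.
    assert_eq!(source.pos, 3);
}
```

Implementing the consuming trait on the wrapper rather than on `&mut ArraySource` directly keeps the trait implementation tied to a type the crate owns, which is one common motivation for this kind of newtype.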