Rework ipc_compression feature flags #1

Draft: wants to merge 129 commits into base: flight_data_compression

@alamb alamb commented Aug 7, 2022

Currently a draft as I haven't completed this work yet.

Note this targets the same branch as apache#1855.

Rationale

Feature flags are somewhat of a pain to deal with in arrow. This PR attempts to clean up the feature flag handling in @liukun4515's PR to add IPC compression, apache#1855.

Changes

  • Make two implementations of CompressionCodecType with the same interface.
  • Make code fallible (return Result)
  • Pick between the two implementations based on feature flags
  • Update the rest of the code to use the same interface
  • Test variants based on feature flags
  • Update documentation for new features / feature flags
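
The list above can be read as the following shape. This is only a minimal sketch, not the code in this PR: the feature name `ipc_compression` is taken from the PR title, while `CodecError`, the `imp` module, the `compress` signature and the placeholder body of the enabled variant are assumptions made for illustration (a real build would call into the lz4 / zstd crates there).

```rust
use std::fmt;

/// Hypothetical error type; the real crate would use its own error enum.
#[derive(Debug, PartialEq)]
pub enum CodecError {
    /// Returned when a codec is requested but support was not compiled in.
    FeatureDisabled(&'static str),
}

impl fmt::Display for CodecError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CodecError::FeatureDisabled(name) => {
                write!(f, "codec requires the `{name}` feature")
            }
        }
    }
}

/// Variant compiled when the feature is enabled.
#[cfg(feature = "ipc_compression")]
mod imp {
    use super::CodecError;

    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum CompressionCodecType {
        Lz4Frame,
        Zstd,
    }

    impl CompressionCodecType {
        pub fn compress(&self, input: &[u8], output: &mut Vec<u8>) -> Result<(), CodecError> {
            // Placeholder only: copies the bytes instead of compressing them.
            output.extend_from_slice(input);
            Ok(())
        }
    }
}

/// Variant compiled when the feature is disabled: same type, same methods,
/// but every operation reports an error instead of failing to compile.
#[cfg(not(feature = "ipc_compression"))]
mod imp {
    use super::CodecError;

    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum CompressionCodecType {
        Lz4Frame,
        Zstd,
    }

    impl CompressionCodecType {
        pub fn compress(&self, _input: &[u8], _output: &mut Vec<u8>) -> Result<(), CodecError> {
            Err(CodecError::FeatureDisabled("ipc_compression"))
        }
    }
}

// Callers see a single name regardless of which variant was compiled.
pub use imp::CompressionCodecType;

fn main() {
    for codec in [CompressionCodecType::Lz4Frame, CompressionCodecType::Zstd] {
        let mut out = Vec::new();
        match codec.compress(b"hello", &mut out) {
            Ok(()) => println!("{codec:?}: wrote {} bytes", out.len()),
            Err(e) => println!("{codec:?}: {e}"),
        }
    }
}
```

With this layout the calling code needs no `#[cfg]` attributes of its own; only the two `imp` modules differ, and tests can assert on `Ok` or `Err` depending on which feature set they were built with.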

tustvold and others added 30 commits July 29, 2022 12:05
* parquet: export json api with `serde_json` feature name

* chore: don't piggyback on optional feature name
* add instructions

Signed-off-by: remzi <13716567376yh@gmail.com>

* fmt

Signed-off-by: remzi <13716567376yh@gmail.com>

* update discord link

Signed-off-by: remzi <13716567376yh@gmail.com>
* Add AmazonS3Config, MicrosoftAzureBuilder, GoogleCloudStorageBuilder

* fix: improve docs

* review feedback: remove old code, make with_client test only
* add append_option support to decimal builders

* fix linting

* pr comments
* Rename DataType::Decimal to DataType::Decimal128

* Update doc
)

* split

Signed-off-by: remzi <13716567376yh@gmail.com>

* rename

Signed-off-by: remzi <13716567376yh@gmail.com>
* Add LimitStore (apache#2175)

* Review feedback

* Fix test
…che#2231)

* Automatically grow parquet BitWriter (apache#2226)

* Review feedback
Signed-off-by: remzi <13716567376yh@gmail.com>
…pache#2221)

* Optimized writing of byte array to parquet (apache#1764)

* Review feedback

* Fix logical conflict
…cations (apache#2235)

* Fix bug

* Add tests

* Update arrow/src/datatypes/types.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* Update arrow/src/datatypes/types.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* Update prost requirement from 0.10 to 0.11

Updates the requirements on [prost](https://github.com/tokio-rs/prost) to permit the latest version.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Commits](tokio-rs/prost@v0.10.0...v0.11.0)

---
updated-dependencies:
- dependency-name: prost
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update tonic-build requirement from 0.7 to 0.8

Updates the requirements on [tonic-build](https://github.com/hyperium/tonic) to permit the latest version.
- [Release notes](https://github.com/hyperium/tonic/releases)
- [Changelog](https://github.com/hyperium/tonic/blob/master/CHANGELOG.md)
- [Commits](hyperium/tonic@v0.7.0...v0.8.0)

---
updated-dependencies:
- dependency-name: tonic-build
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update tonic requirement from 0.7 to 0.8

Updates the requirements on [tonic](https://github.com/hyperium/tonic) to permit the latest version.
- [Release notes](https://github.com/hyperium/tonic/releases)
- [Changelog](https://github.com/hyperium/tonic/blob/master/CHANGELOG.md)
- [Commits](hyperium/tonic@v0.7.0...v0.8.0)

---
updated-dependencies:
- dependency-name: tonic
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update prost-derive requirement from 0.10 to 0.11

Updates the requirements on [prost-derive](https://github.com/tokio-rs/prost) to permit the latest version.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Commits](tokio-rs/prost@v0.10.0...v0.11.0)

---
updated-dependencies:
- dependency-name: prost-derive
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update prost-types requirement from 0.10.0 to 0.11.0

Updates the requirements on [prost-types](https://github.com/tokio-rs/prost) to permit the latest version.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Commits](tokio-rs/prost@v0.10.0...v0.11.0)

---
updated-dependencies:
- dependency-name: prost-types
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update vendored tonic/prost generated code

* Install protoc in CI builds

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* update binary_from_list

Signed-off-by: remzi <13716567376yh@gmail.com>

* fix binary from list

Signed-off-by: remzi <13716567376yh@gmail.com>

* fix decimal from fixed list

Signed-off-by: remzi <13716567376yh@gmail.com>

* fix fixed binary from fixed list

Signed-off-by: remzi <13716567376yh@gmail.com>

* fix string from list

Signed-off-by: remzi <13716567376yh@gmail.com>

* add child length check

Signed-off-by: remzi <13716567376yh@gmail.com>

* clean the code

Signed-off-by: remzi <13716567376yh@gmail.com>
…he#2275)

* fix bug: decimal cmp

* optimize the error message

* address comment
…2251)

* feat: Implement string cast operations for Time32 and Time64

* chore: Remove unnecessary leap second handling

Remove the unnecessary conditionals to extract the leap second, as it is
already handled when converting to a time unit relative to midnight 🤦🏻‍♂️

* chore: Inline trivial functions
* Handle symlinks in LocalFileSystem (apache#2206)

* Update object_store/src/local.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Improve crates.io page

* Improve builder doc examples

* Add examples in main library docs

* Apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Fix coverage and mac jobs -- still need to fix windows

* try and fix coverage

* comment out coverage
…pache#2237)

* replace ArrayReader::next_batch with ArrayReader::read_records and ArrayReader::consume_batch.

* fix ut

* fix comment

* avoid clone.

* fix new ut

* fix comment

Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Markus Westerlind and others added 30 commits August 11, 2022 08:06
…dictionaries (apache#2391)

* fix: Don't instantiate the scalar composition code quadratically for dictionaries

Instead, reuse the normal (non-dictionary) functions. This significantly reduces how much code `datafusion-physical-expr` generates (since the functions are generic and not instantiated in `arrow` itself, the cost only shows up downstream).

https://github.com/apache/arrow-datafusion

There is technically an extra indirect call now, as the recursive call to `eq_dyn_scalar` etc. coerces to a `dyn Array` again, but that seems unlikely to matter.
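
A rough sketch of the idea described above, under heavy simplification: the `Array` trait, `Int32Array`, `DictionaryArray`, `Key` and `unpack_dict` here are hypothetical stand-ins rather than arrow's real types, nulls are ignored, and the scalar is fixed to `i32` for brevity. The point it illustrates is that the dictionary path is generic only over the key type and goes back through the `dyn` entry point for the values, so instantiations grow with the number of key types plus value types instead of their product.

```rust
use std::any::Any;

/// Hypothetical, simplified stand-in for a dynamically typed array.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical key trait for dictionary indices.
trait Key: Copy + 'static {
    fn to_usize(self) -> usize;
}
impl Key for u8 {
    fn to_usize(self) -> usize { self as usize }
}
impl Key for u16 {
    fn to_usize(self) -> usize { self as usize }
}

struct Int32Array(Vec<i32>);

struct DictionaryArray<K: Key> {
    keys: Vec<K>,
    values: Box<dyn Array>,
}

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any { self }
}
impl<K: Key> Array for DictionaryArray<K> {
    fn as_any(&self) -> &dyn Any { self }
}

/// Dynamic entry point: compares every element of `array` with `scalar`.
fn eq_dyn_scalar(array: &dyn Array, scalar: i32) -> Vec<bool> {
    let any = array.as_any();
    if let Some(a) = any.downcast_ref::<Int32Array>() {
        return a.0.iter().map(|v| *v == scalar).collect();
    }
    if let Some(d) = any.downcast_ref::<DictionaryArray<u8>>() {
        return unpack_dict(d, scalar);
    }
    if let Some(d) = any.downcast_ref::<DictionaryArray<u16>>() {
        return unpack_dict(d, scalar);
    }
    panic!("unsupported array type in this sketch");
}

/// Generic over the key type only. The values are compared by recursing
/// into `eq_dyn_scalar` through `&dyn Array` (one extra indirect call),
/// then the per-value results are looked up through the keys.
fn unpack_dict<K: Key>(dict: &DictionaryArray<K>, scalar: i32) -> Vec<bool> {
    let value_matches = eq_dyn_scalar(&*dict.values, scalar);
    dict.keys.iter().map(|k| value_matches[k.to_usize()]).collect()
}

fn main() {
    let dict = DictionaryArray::<u8> {
        keys: vec![0, 1, 1, 0],
        values: Box::new(Int32Array(vec![10, 20])),
    };
    println!("{:?}", eq_dyn_scalar(&dict, 20));
}
```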

## cargo llvm-lines -p datafusion-physical-expr

### Before

```
  Lines           Copies        Function name
  -----           ------        -------------
  2270242 (100%)  38377 (100%)  (TOTAL)
   245854 (10.8%)  5580 (14.5%) core::option::Option<T>::ok_or_else
    58690 (2.6%)     10 (0.0%)  arrow::compute::kernels::comparison::eq_dyn_scalar
    58690 (2.6%)     10 (0.0%)  arrow::compute::kernels::comparison::gt_dyn_scalar
    58690 (2.6%)     10 (0.0%)  arrow::compute::kernels::comparison::gt_eq_dyn_scalar
    58690 (2.6%)     10 (0.0%)  arrow::compute::kernels::comparison::lt_dyn_scalar
    58690 (2.6%)     10 (0.0%)  arrow::compute::kernels::comparison::lt_eq_dyn_scalar
    58690 (2.6%)     10 (0.0%)  arrow::compute::kernels::comparison::neq_dyn_scalar
    55800 (2.5%)    900 (2.3%)  arrow::compute::kernels::comparison::eq_dyn_scalar::{{closure}}
    55800 (2.5%)    900 (2.3%)  arrow::compute::kernels::comparison::gt_dyn_scalar::{{closure}}
    55800 (2.5%)    900 (2.3%)  arrow::compute::kernels::comparison::gt_eq_dyn_scalar::{{closure}}
    55800 (2.5%)    900 (2.3%)  arrow::compute::kernels::comparison::lt_dyn_scalar::{{closure}}
    55800 (2.5%)    900 (2.3%)  arrow::compute::kernels::comparison::lt_eq_dyn_scalar::{{closure}}
    55800 (2.5%)    900 (2.3%)  arrow::compute::kernels::comparison::neq_dyn_scalar::{{closure}}
    44929 (2.0%)    900 (2.3%)  core::option::Option<T>::map
    40986 (1.8%)    162 (0.4%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    37528 (1.7%)    508 (1.3%)  core::iter::traits::iterator::Iterator::fold
    30595 (1.3%)    245 (0.6%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    29272 (1.3%)     46 (0.1%)  <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::size_hint
    27815 (1.2%)    285 (0.7%)  core::iter::traits::iterator::Iterator::try_fold
    26014 (1.1%)      1 (0.0%)  datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_array_scalar
    25095 (1.1%)    441 (1.1%)  core::iter::adapters::map::map_fold::{{closure}}
    22849 (1.0%)    174 (0.5%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    21888 (1.0%)     96 (0.3%)  arrow::compute::kernels::comparison::compare_op_scalar
    21464 (0.9%)     56 (0.1%)  <arrow::array::array_string::GenericStringArray<OffsetSize> as core::iter::traits::collect::FromIterator<core::option::Option<Ptr>>>::from_iter
    21461 (0.9%)    441 (1.1%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    19918 (0.9%)    118 (0.3%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
    16916 (0.7%)    246 (0.6%)  <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
```

### After

```
  Lines           Copies        Function name
  -----           ------        -------------
  1475122 (100%)  28777 (100%)  (TOTAL)
    44929 (3.0%)    900 (3.1%)  core::option::Option<T>::map
    40986 (2.8%)    162 (0.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    37528 (2.5%)    508 (1.8%)  core::iter::traits::iterator::Iterator::fold
    34174 (2.3%)    780 (2.7%)  core::option::Option<T>::ok_or_else
    30595 (2.1%)    245 (0.9%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    29272 (2.0%)     46 (0.2%)  <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::size_hint
    27815 (1.9%)    285 (1.0%)  core::iter::traits::iterator::Iterator::try_fold
    26014 (1.8%)      1 (0.0%)  datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_array_scalar
    25095 (1.7%)    441 (1.5%)  core::iter::adapters::map::map_fold::{{closure}}
    22849 (1.5%)    174 (0.6%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    21888 (1.5%)     96 (0.3%)  arrow::compute::kernels::comparison::compare_op_scalar
    21464 (1.5%)     56 (0.2%)  <arrow::array::array_string::GenericStringArray<OffsetSize> as core::iter::traits::collect::FromIterator<core::option::Option<Ptr>>>::from_iter
    21461 (1.5%)    441 (1.5%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    19918 (1.4%)    118 (0.4%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
    16916 (1.1%)    246 (0.9%)  <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
    16146 (1.1%)    960 (3.3%)  core::iter::adapters::map::Map<I,F>::new
    15492 (1.1%)    427 (1.5%)  core::iter::traits::iterator::Iterator::for_each
    14921 (1.0%)    111 (0.4%)  alloc::vec::Vec<T,A>::extend_desugared
    14670 (1.0%)    126 (0.4%)  core::iter::adapters::try_process
    13918 (0.9%)      1 (0.0%)  datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_scalar_array
    13120 (0.9%)     64 (0.2%)  <arrow::array::array_primitive::PrimitiveArray<T> as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    12963 (0.9%)     52 (0.2%)  <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::try_fold
    12245 (0.8%)    180 (0.6%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
    12201 (0.8%)     81 (0.3%)  arrow::buffer::mutable::MutableBuffer::extend_from_iter
    11826 (0.8%)    162 (0.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
    11536 (0.8%)    960 (3.3%)  core::iter::traits::iterator::Iterator::map
    11200 (0.8%)     32 (0.1%)  alloc::raw_vec::RawVec<T,A>::grow_amortized

```

* refactor: Avoid instantiating a quadratic number of closures due to try_to_type in comparisons (-4%)

Reduces the number of llvm-lines in datafusion-physical-expr by another 4%
…apache#2401)

* Exclude tags when generating changelogs

* Fix release-tarball typo
* Add RowFilter API

* Review feedback

* Fix doc

* Fix handling of NULL boolean array

* Add tests, fix bugs

* Fix clippy

* Review feedback

* Fix doc
* Upgrade ahash to 0.8

* Use hash_one

* Use hash_one

* Use hash_one

* Use compile-time-rng for wasm

* Use compile-time-rng for wasm

* Use compile-time-rng for wasm

* Clippy

* Revert "Clippy"

This reverts commit 4c693cb.
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/labeler](https://github.com/actions/labeler) from 2.2.0 to 4.0.0.
- [Release notes](https://github.com/actions/labeler/releases)
- [Commits](actions/labeler@2.2.0...v4.0.0)

---
updated-dependencies:
- dependency-name: actions/labeler
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 1 to 4.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v1...v4)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 2 to 3.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@v2...v3)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Implement Skip for DeltaBitPackDecoder

* move check out of loop

* add bench

* change to use batch read.
…he#2407)

* Support peek_next_page and skip_next_page in InMemoryPageReader

* fix comment
…ew_bytes` and add length bound for `Decimal::raw_value` (apache#2405)

* add bound

Signed-off-by: remzi <13716567376yh@gmail.com>

* update doc

Signed-off-by: remzi <13716567376yh@gmail.com>

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>

Signed-off-by: remzi <13716567376yh@gmail.com>
Co-authored-by: Kun Liu <liukun@apache.org>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>