ARROW-16676: [C++] Wrong result of ReservationListenableMemoryPool::Impl::bytes_allocated() #110
… writing dataset ignores a single file Closes apache#12898 from jorisvandenbossche/ARROW-16204 Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Closes apache#12958 from kszucs/macos-git Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
This PR fixes the CI failures caused by the latest git release, which addresses CVE-2022-24765. I have been able to see the build passing the scm step with the change here: https://app.travis-ci.com/github/raulcd/arrow/builds/249688925 That build still fails due to some unrelated test failures, but no longer because of the installation. This was the failure before the change: https://app.travis-ci.com/github/raulcd/arrow/builds/249683847 I have also been able to reproduce the issue locally with `verify-conda-rc`. The failure:

```
$ docker-compose run conda-verify-rc
....
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
/arrow/python /arrow /
Traceback (most recent call last):
  File "/arrow/python/setup.py", line 607, in <module>
    setup(
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 109, in setup
    _setup_distribution = dist = klass(attrs)
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/dist.py", line 462, in __init__
    _Distribution.__init__(
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 293, in __init__
    self.finalize_options()
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/dist.py", line 886, in finalize_options
    ep(self)
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/dist.py", line 907, in _finalize_setup_keywords
    ep.load()(self, ep.name, value)
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools_scm/integration.py", line 75, in version_keyword
    _assign_version(dist, config)
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools_scm/integration.py", line 51, in _assign_version
    _version_missing(config)
  File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools_scm/__init__.py", line 106, in _version_missing
    raise LookupError(
LookupError: setuptools-scm was unable to detect version for /arrow.

Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
Failed to verify release candidate. See /tmp/arrow-HEAD.YUVPq for details.
```

Waiting to validate the verify-rc fix at the moment; will update once the local build finishes. Closes apache#12945 from raulcd/ARROW-16219 Authored-by: Raúl Cumplido <raulcumplido@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
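For context on this class of failure (not necessarily the exact change made in this PR): git releases containing the CVE-2022-24765 fix refuse to read repository metadata from a directory owned by a different user, so `setuptools_scm` can no longer discover the version. A minimal, hedged sketch of the usual workaround, marking the checkout as a safe directory before building; the `/arrow` path is an assumption taken from the traceback above:

```python
# Hypothetical workaround sketch: mark the checkout as a git "safe.directory"
# so setuptools_scm can read the repository metadata again.
import subprocess

def mark_repo_safe(repo_path: str = "/arrow") -> None:
    # Equivalent to running: git config --global --add safe.directory /arrow
    subprocess.run(
        ["git", "config", "--global", "--add", "safe.directory", repo_path],
        check=True,
    )

if __name__ == "__main__":
    mark_repo_safe()
```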
…ng files This is a PR to support arbitrary R "connection" objects as Input and Output streams. In particular, this adds support for sockets (ARROW-4512), URLs, and some other IO operations that are implemented as R connections (e.g., in the [archive](https://github.com/r-lib/archive#archive) package). The gist of it is that you should be able to do this:

``` r
# remotes::install_github("paleolimbot/arrow/r@r-connections")
library(arrow, warn.conflicts = FALSE)

addr <- "https://github.com/apache/arrow/raw/master/r/inst/v0.7.1.parquet"

stream <- arrow:::make_readable_file(addr)
rawToChar(as.raw(stream$Read(4)))
#> [1] "PAR1"
stream$close()

stream <- arrow:::make_readable_file(url(addr, open = "rb"))
rawToChar(as.raw(stream$Read(4)))
#> [1] "PAR1"
stream$close()
```

There are two serious issues that prevent this PR from being useful yet. First, it uses functions that R considers "non-API" functions from the C API.

> checking compiled code ... NOTE
> File ‘arrow/libs/arrow.so’:
>   Found non-API calls to R: ‘R_GetConnection’, ‘R_ReadConnection’, ‘R_WriteConnection’
> Compiled code should not call non-API entry points in R.

We can get around this by calling back into R (in the same way this PR implements `Tell()` and `Close()`). We could also go all out and implement the other half (exposing `InputStream`/`OutputStream`s as R connections) and ask for an exemption (at least one R package, curl, does this). The archive package seems to expose connections without a NOTE on the CRAN check page, so perhaps there is also a workaround.

Second, we get a crash when passing the input stream to most functions. I think this is because the `Read()` method is getting called from another thread but it also could be an error in my implementation. If the issue is threading, we would have to arrange a way to queue jobs for the R main thread (e.g., how the [later](https://github.com/r-lib/later#background-tasks) package does it) and a way to ping it occasionally to fetch the results. This is complicated but might be useful for other reasons (supporting evaluation of R functions in more places). It also might be more work than it's worth.

``` r
# remotes::install_github("paleolimbot/arrow/r@r-connections")
library(arrow, warn.conflicts = FALSE)
addr <- "https://github.com/apache/arrow/raw/master/r/inst/v0.7.1.parquet"
read_parquet(addr)
```

```
*** caught segfault ***
address 0x28, cause 'invalid permissions'

Traceback:
 1: parquet___arrow___FileReader__OpenFile(file, props)
```

Closes apache#12323 from paleolimbot/r-connections Lead-authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Co-authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…l2 wheels This approach is more robust to vcpkg and dependency changes, see the ticket's description for details. Closes apache#12959 from kszucs/delocate-fuse Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
## Summary of Changes

* Added `rbind` and `cbind` for Table
* Added `cbind` for RecordBatch. `rbind` just redirects the user to use `Table$create`
* Changed `c.Array()` implementation to use either `concat_array()` or `ChunkedArray$create()` depending on whether the user wants a single array or zero-copy.
* Implemented `c.ChunkedArray`

Closes apache#12751 from wjones127/ARROW-15989-rbind-table Lead-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…es of ParquetDataset This PR tries to:

- deprecate the `metadata`, `metadata_path` and `common_metadata_path` attributes in the legacy ParquetDataset;
- deprecate passing the `metadata` keyword in the ParquetDataset constructor.

The `common_metadata` attribute has already been deprecated. Closes apache#12952 from AlenkaF/ARROW-16121 Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
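A hedged sketch of the deprecation pattern described above; the class name and attribute here are illustrative stand-ins, not the actual pyarrow code:

```python
# Illustrative only: shows how an attribute and a constructor keyword are
# typically deprecated with a warning while continuing to work for now.
import warnings

class LegacyDatasetSketch:
    def __init__(self, paths, metadata=None):
        if metadata is not None:
            warnings.warn(
                "specifying the 'metadata' argument is deprecated",
                FutureWarning, stacklevel=2,
            )
        self._paths = paths
        self._metadata = metadata

    @property
    def metadata(self):
        warnings.warn(
            "the 'metadata' attribute is deprecated",
            FutureWarning, stacklevel=2,
        )
        return self._metadata

# Usage: accessing the attribute still works but emits a FutureWarning.
ds = LegacyDatasetSketch(["part-0.parquet"])
_ = ds.metadata
```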
@jonkeane there are still a bunch of skips, but the ones relying on the "helper-skip" functions should all be forced to run now. Closes apache#12940 from assignUser/ARROW-15015-all-tests Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de> Signed-off-by: Jonathan Keane <jkeane@gmail.com>
I see big improvements in the `ExecuteScalarExpressionOverhead` benchmarks, especially `ref_only_expression` with this, for example:

```
before:
ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:1     35.4 ns   35.4 ns   19577007 batches_per_second=28.2395M/s rows_per_second=28.2395T/s
ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:16    49.8 ns    788 ns   14280992 batches_per_second=20.0734M/s rows_per_second=20.0734T/s

after:
ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:1     27.6 ns   27.5 ns   25090317 batches_per_second=36.2832M/s rows_per_second=36.2832T/s
ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:16    4.26 ns   67.2 ns  184745728 batches_per_second=235.006M/s rows_per_second=235.006T/s
```

Also the overhead of small batch size/multithreaded benchmarks is reduced, for example in `complex_expression`:

```
before:
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:1    3723682 ns   3721326 ns   191 batches_per_second=268.551k/s rows_per_second=268.551M/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:16   1153070 ns  18365265 ns   624 batches_per_second=867.25k/s rows_per_second=867.25M/s

after:
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:1    3543745 ns   3541909 ns   197 batches_per_second=282.187k/s rows_per_second=282.187M/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:16    841776 ns  13395266 ns   944 batches_per_second=1.18796M/s rows_per_second=1.18796G/s
```

Closes apache#12957 from zagto/datatype-performace-expression-type Authored-by: Tobias Zagorni <tobias@zagorni.eu> Signed-off-by: David Li <li.davidm96@gmail.com>
Not sure if it's important, but this will break the link to this section, which is currently ``` https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-medata-files ``` Closes apache#12961 from kylebarron/patch-2 Authored-by: Kyle Barron <kylebarron2@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…es it impossible to maintain old behavior This PR tries to pass `existing_data_behavior` through to `write_to_dataset` when the new dataset implementation is used. Connected to apache#12811. Closes apache#12838 from AlenkaF/ARROW-15757 Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
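A short usage sketch of the behavior described above, assuming a recent pyarrow; the table contents and paths are made up:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"year": [2021, 2021, 2022], "value": [1.0, 2.0, 3.0]})

# With the new (non-legacy) dataset implementation, the caller can now state
# explicitly what should happen when data already exists at the destination.
pq.write_to_dataset(
    table,
    root_path="example_dataset",
    partition_cols=["year"],
    use_legacy_dataset=False,
    existing_data_behavior="overwrite_or_ignore",  # or "error" / "delete_matching"
)
```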
Closes apache#12953 from kou/glib-parquet-statistics Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
….write_to_dataset with use_legacy_dataset=False Closes apache#12955 from AlenkaF/ARROW-16240 Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
…de-test Closes apache#12944 from westonpace/bugfix/ARROW-16264--valgrind-timeout-hash-join-node-test Authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>
…n scanning parquet This PR changes a few things.

* The default file readahead is changed to 4. This doesn't seem to affect performance on HDD/SSD, and users should already be doing special tuning for S3. Besides, in many cases users are reading IPC/Parquet files that have many row groups, so we already have sufficient I/O parallelism. This is important for bringing down the overall memory usage, as can be seen in the formula below.
* The default batch readahead is changed to 16. Previously, when we were doing filtering and projection within the scanner, it made sense to read many batches ahead (generally we want at least 2 * # of CPUs in that case). Now that the exec plan is doing the computation, the exec plan buffering is instead handled by kDefaultBackpressureLowBytes and kDefaultBackpressureHighBytes.
* Moves around the parquet readahead a bit. The previous version would read ahead N row groups. Now we always read ahead exactly 1 row group but we read ahead N batches (this may mean that we read ahead more than 1 row group if the batch size is much larger than the row group size).
* Backpressure now utilizes the pause/resume producing signals in the execution plan. I've added a `counter` argument to the calls to help deal with the challenges that arise when we try to sequence backpressure signals. Partly this was to add support for monitoring backpressure (for tests). Partly it is because I have since become more aware of the reasons for these signals: they are needed to allow for backpressure from the aggregate & join nodes.
* Sink backpressure can now be monitored. This makes it easier to test and could be potentially useful to a user that wanted to know when they are consuming the plan too slowly.
* Changes the default scanner batch size to 128Ki rows. Now that we have more or less decoupled the scanning batch size from the row group size, we can pass smaller batches through the scanner. This makes it easier to get parallelism on small datasets.

Putting this all together, the scanner should now buffer in memory:

MAX(fragment_readahead * row_group_size_bytes * 2, fragment_readahead * batch_readahead * batch_size_bytes)

The exec plan sink node should buffer ~ kDefaultBackpressureHighBytes bytes. The exec plan itself can have some number of tasks in flight but, assuming there are no pipeline breakers, this will be limited to the number of threads in the CPU thread pool, so it should be parallelism * batch_size_bytes. Adding those together should give the total RAM usage of a plan being read via a sink node that doesn't have any pipeline breakers. When the sink is a write node there is a separate backpressure consideration based on # of rows (we can someday change this to be # of bytes, but it would be a bit tricky at the moment because we need to balance this with the other write parameters like min_rows_per_group).

So, given the parquet dataset mentioned in the JIRA (21 files, 10 million rows each, 10 row groups each), and knowing that 1 row group is ~140MB when decompressed into Arrow format, we should get the following default memory usage (a small arithmetic sketch of this estimate follows after this item):

Scanner readahead = MAX(4 * 140MB * 2, 4 * 16 * 17.5MB) = MAX(1120MB, 1120MB) = 1120MB
Sink readahead ~ 1GiB

Total RAM usage should then be ~2GiB.
- [x] Add tests to verify memory usage
- [ ] ~~Update docs to mention that S3 users may want to increase the fragment readahead but this will come at the cost of more RAM usage.~~
- [ ] ~~Update docs to give some of this "expected memory usage" information~~

Closes apache#12228 from westonpace/feature/ARROW-15410--improve-dataset-parquet-memory-usage Authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>
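The memory estimate above is just arithmetic over the new defaults; a small sketch that reproduces the numbers from the description (these are the description's figures, not fresh measurements):

```python
MB = 1024 ** 2

fragment_readahead = 4           # default file readahead described above
batch_readahead = 16             # default batch readahead described above
row_group_size_bytes = 140 * MB  # decompressed row group from the JIRA dataset
batch_size_bytes = 17.5 * MB     # ~128Ki rows of that data

scanner_readahead = max(
    fragment_readahead * row_group_size_bytes * 2,
    fragment_readahead * batch_readahead * batch_size_bytes,
)
sink_backpressure = 1024 * MB    # ~ kDefaultBackpressureHighBytes per the description

print(f"scanner readahead ~ {scanner_readahead / MB:.0f} MB")                        # ~1120 MB
print(f"total             ~ {(scanner_readahead + sink_backpressure) / MB:.0f} MB")  # ~2 GiB
```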
1. Increase the timeout from 40 to 60 minutes for macOS and Windows jobs, because 40 minutes is too short to build without a ccache cache.
2. Don't build C++ utilities (ARROW_BUILD_UTILITIES=OFF) because they aren't used.
3. Omit a test for gparquet_row_group_metadata_equal() because it seems that parquet::RowGroupMetaData::Equals() is unstable. We can look into this and fix it later.

Closes apache#12964 from kou/ci-glib-stable Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
This fixes the following warning:
```
GLib-GObject-WARNING **: value "((GArrowRoundMode) -765521144)"
of type 'GArrowRoundMode' is invalid or out of range for property
'mode' of type 'GArrowRoundMode'
```
Closes apache#12971 from kou/glib-compute-round-options-missing-cast
Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
This is a follow-up of apache#12958. Closes apache#12969 from kou/ci-java-jars-macos Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
This is a follow-up of apache#12958. Closes apache#12966 from kou/ci-verify-rc-macos Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#12893 from assignUser/ARROW-16198-bump-vcpkg Lead-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de> Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
This fixes a bug when coalescing is enabled. Also, it changes the conditional so that we *skip* coalescing for zero-copy files instead of enabling it. Finally, it slightly refactors the tests to ensure this case is hit. Closes apache#12937 from lidavidm/arrow-16238 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
…True Also improve errno propagation from HDFS-related errors. Closes apache#12943 from pitrou/ARROW-16261-hdfs-delete-dir-contents Lead-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Co-authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#12817 from paleolimbot/r-s3-generics Lead-authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Closes apache#12968 from lidavidm/minor-ruby Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: David Li <li.davidm96@gmail.com>
Quick follow up to apache#12891 Closes apache#12949 from lidavidm/arrow-12659 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Yibo Cai <yibo.cai@arm.com>
At the moment there is not enough non-Substrait functionality to justify a dedicated engine module. In the future there likely will be, so I have kept some of the general structure around in the C++ (e.g. `ARROW_ENGINE_EXPORT`) but removed any external-facing references. Closes apache#12915 from westonpace/feature/ARROW-16158--rename-engine-to-substrait Lead-authored-by: Weston Pace <weston.pace@gmail.com> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#12938 from domoritz/dom/versions Lead-authored-by: Dominik Moritz <domoritz@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Ubuntu 22.04 ships OpenSSL 3 and .NET 5.0 or earlier doesn't support OpenSSL 3 yet. Closes apache#12870 from kou/release-ubuntu-22.04-csharp Lead-authored-by: Sutou Kouhei <kou@clear-code.com> Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Because windows-2016 is deprecated: actions/runner-images#4312 Closes apache#12970 from kou/ci-verify-rc-windows Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Uses `nil` for `defLvls` and `repLvls` when skipping boolean values, since the scratch buffer allocated for n boolean values when skipping is not large enough to hold n def and rep levels, resulting in an [out of bounds panic](https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader.go#L407) when skipping too many rows. Closes apache#13221 from mdepero/go-boolskip Authored-by: Matt DePero <depero@neeva.co> Signed-off-by: Matthew Topol <mtopol@factset.com>
…sive amount of RAM if the producer uses large anchors This PR modifies the ExtensionSet in the C++ consumer to use an `unordered_map` instead of a `vector` as the lookup table for the `uri` anchors. It also changes the usage of the `impl` struct, so that the included functions are now defined directly in the ExtensionSet implementation. Closes apache#12852 from sanjibansg/substrait/uri_map Authored-by: Sanjiban Sengupta <sanjiban.sg@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>
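The change above is essentially a data-structure choice: a vector indexed by anchor id has to grow to the largest anchor a producer happens to pick, while a map only stores the anchors that actually occur. A small Python illustration of that trade-off (not the C++ code; the anchor values and URIs are made up):

```python
# A producer is free to pick sparse, large anchor ids for its extension URIs.
anchors = {
    10: "https://example.com/functions_a.yaml",
    1_000_000: "https://example.com/functions_b.yaml",
}

# unordered_map-style table: memory proportional to the number of anchors used.
by_map = dict(anchors)

# vector-style table: memory proportional to the largest anchor id, even though
# only two slots are meaningful -- this is the excessive-RAM case described
# above (with real 32-bit anchors it can be billions of slots).
by_vector = [None] * (max(anchors) + 1)
for anchor, uri in anchors.items():
    by_vector[anchor] = uri

print(len(by_map), len(by_vector))  # 2 vs 1000001
```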
Closes apache#13226 from davisusanibar/ARROW-16267 Authored-by: david dali susanibar arce <davi.sarces@gmail.com> Signed-off-by: Alessandro Molina <amol@turbogears.org>
…inks in the pkgdown site Closes apache#13213 from eitsupi/fix-changelog Authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com> Signed-off-by: Nic Crane <thisisnic@gmail.com>
…(--doctest-modules) A series of 3 PRs adds `doctest` functionality to ensure that docstring examples are actually correct (and keep being correct):

- [x] Add `--doctest-modules` (this PR)
- [x] Add `--doctest-cython` apache#13204
- [x] Create a CI job apache#13216

This PR can be tested with `pytest --doctest-modules python/pyarrow`. Closes apache#13199 from AlenkaF/ARROW-16018 Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
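A tiny illustration of what `--doctest-modules` collects; the module and function here are made up for demonstration, only the `>>>` docstring format and the pytest invocation matter:

```python
# doctest_demo.py -- run with: pytest --doctest-modules doctest_demo.py
def add_one(x):
    """Return x + 1.

    Examples
    --------
    >>> add_one(41)
    42
    """
    return x + 1
```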
This PR was created to implement binary functions on the Gandiva side, based on the [Hive implementation](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToBinary.java). This PR implements the following signatures: FunctionSignature{name=binary, return type=binary, param types=[string]} FunctionSignature{name=binary, return type=binary, param types=[binary]} Closes apache#13073 from Johnnathanalmeida/feature/add-binary-function Authored-by: Johnnathan <johnnathanalmeida@gmail.com> Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>
…ry_encode() with interval types Add support for unique(), value_counts(), dictionary_encode() with interval types. Closes apache#13231 from okadakk/add-support-unique-valuecounts-dictencode-with-interval-types Authored-by: okadakk <k.suke.jp1990@gmail.com> Signed-off-by: David Li <li.davidm96@gmail.com>
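A hedged usage sketch of the new support, assuming the pyarrow bindings route these calls through the compute kernels added here; the interval values are arbitrary:

```python
import pyarrow as pa
import pyarrow.compute as pc

# month_day_nano_interval values are (months, days, nanoseconds) triples.
arr = pa.array(
    [pa.MonthDayNano([1, 15, 0]), pa.MonthDayNano([1, 15, 0]), pa.MonthDayNano([0, 30, 0])],
    type=pa.month_day_nano_interval(),
)

print(pc.unique(arr))            # two distinct intervals
print(pc.value_counts(arr))      # counts per distinct interval
print(arr.dictionary_encode())   # dictionary-encoded interval array
```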
…#13228) Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Closes apache#13236 from save-buffer/sasha_scalars Authored-by: Sasha Krassovsky <krassovskysasha@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>
Closes apache#13235 from zagto/vectorkernel-constructor Authored-by: Tobias Zagorni <tobias@zagorni.eu> Signed-off-by: Weston Pace <weston.pace@gmail.com>
…3237) `message()` without a mode is not preceded with `--`, and may be printed out of order. E.g., `Using ld linker` is not aligned with the other messages.

```
-- Performing Test CXX_SUPPORTS_AVX512 - Success
-- Arrow build warning level: CHECKIN
Using ld linker
-- Configured for RELWITHDEBINFO build ...
```

Authored-by: Yibo Cai <yibo.cai@arm.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…(CI job) This PR adds a CI job to test Python docstrings with `doctest`. It can be tested with `archery docker run conda-python-docs`. Closes apache#13216 from AlenkaF/ARROW-16018-CI Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com> Signed-off-by: Alessandro Molina <amol@turbogears.org>
… update release comments to contain MINOR This is a minor fix to make the automated release comments consistent with our commit messages, and a fix to archery release so it can process tickets that are `MINOR`. Closes apache#13229 from raulcd/update-release-messages Authored-by: Raúl Cumplido <raulcumplido@gmail.com> Signed-off-by: Alessandro Molina <amol@turbogears.org>
…(--doctest-cython) Adding `--doctest-cython` functionality which will be run on the CI with a follow-up PR. This PR can be tested with `pytest --doctest-cython python/pyarrow`. Closes apache#13204 from AlenkaF/ARROW-16018-doctest-cython Authored-by: Alenka Frim <frim.alenka@gmail.com> Signed-off-by: Alessandro Molina <amol@turbogears.org>
Closes apache#13149 from assignUser/ARROW-16403-nightly-crossbow Lead-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Alessandro Molina <amol@turbogears.org>
* Pushes KVM handling into ExecPlan so that Run() preserves the R metadata we want.
* Also pushes special handling for a kind of collapsed query from collect() into Build().
* Better encapsulates KVM for the $metadata and $r_metadata so that, as a user/developer, you never have to touch the serialize/deserialize functions; you just have a list to work with. This is a slight API change, most noticeable if you were to `print(tab$metadata)`; better is to `print(str(tab$metadata))`.
* Factors out a common utility in r/src for taking cpp11::strings (named character vector) and producing arrow::KeyValueMetadata

The upshot of all of this is that we can push the ExecPlan evaluation into `as_record_batch_reader()`, and all that `collect()` does on top is read the RBR into a Table/data.frame. This means that we can plug dplyr queries into anything else that expects a RecordBatchReader, and it will be (to the maximum extent possible, given the limitations of ExecPlan) streaming, not requiring you to `compute()` and materialize things first. Closes apache#13210 from nealrichardson/kvm Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Authored-by: Raúl Cumplido <raulcumplido@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…itioning This PR fixes the issue of the partitioning field having null values when using FilenamePartitioning. For FilenamePartitioning we should only remove the prefix, and thus should not use `StripPrefixAndFilename()`, which removes the filename along with the prefix. Closes apache#12977 from sanjibansg/fix-FilenamePartitioning Authored-by: Sanjiban Sengupta <sanjiban.sg@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>
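A hedged reading-side sketch of filename-based partitioning from Python, assuming `pyarrow.dataset.FilenamePartitioning` is available and that the partition fields form a `_`-separated prefix of the file name (e.g. `2009_11_data.parquet`); the schema, paths, and values are made up:

```python
import pathlib
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

root = pathlib.Path("filename_partitioned")
root.mkdir(exist_ok=True)

# The file names, not directories, carry the partition fields as a prefix.
pq.write_table(pa.table({"value": [1, 2]}), root / "2009_11_data.parquet")
pq.write_table(pa.table({"value": [3, 4]}), root / "2010_12_data.parquet")

partitioning = ds.FilenamePartitioning(
    pa.schema([("year", pa.int16()), ("month", pa.int8())])
)
dataset = ds.dataset(root, format="parquet", partitioning=partitioning)
print(dataset.to_table())  # 'year' and 'month' columns come from the file names
```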
…che#13191) Currently a failed write (due to the server sending an error, disconnecting, etc.) will raise an uninformative error on the client. Prior to the refactoring done in Arrow 8.0.0, this was silently swallowed (so clients would not get any indication until they finished writing). In 8.0.0 the error is propagated instead, but this led to confusing, uninformative errors. Instead, tag this specific error so that the client implementation knows to finish the call and get the actual server error. (gRPC doesn't give us the actual error until we explicitly finish the call, so we can't get the actual error directly.) Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Yibo Cai <yibo.cai@arm.com>
…apache#13113) This PR adds new jobs to the nightly tests in order to exercise the existing Python minimal build examples. Authored-by: Raúl Cumplido <raulcumplido@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>