Merge branch 'branch-23.12' into main [skip ci] #10342
Merged
* Download Maven from apache.org archives (NVIDIA#10225)

  Fixes NVIDIA#10224

  Replace the broken apt-based install by downloading Maven from apache.org.

  Signed-off-by: Gera Shegalov <gera@apache.org>

* Fix a hang for Pandas UDFs on DB 13.3 [databricks] (NVIDIA#9833)

  fix NVIDIA#9493
  fix NVIDIA#9844

  The Python runner uses two separate threads to write data to and read data from the Python processes. On DB 13.3, however, it becomes single-threaded, so reading and writing run on the same thread and the first read always happens before the first write. The original BatchQueue blocks the first read until the first write is done, so it waits forever.

  Changes made:
  - Update the BatchQueue to support asking for a batch instead of waiting until one is inserted into the queue. This eliminates the ordering requirement between reading and writing.
  - Introduce a new class named BatchProducer that works with the new BatchQueue to support peeking at the row count on demand for reading.
  - Apply the new BatchQueue to the relevant plans.
  - Update the Python runners to support writing one batch at a time for the single-threaded model.
  - Found an issue with PythonUDAF and RunningWindowFunctionExec that may be a bug specific to DB 13.3, and added a test (test_window_aggregate_udf_on_cpu) for it.
  - Other small refactors.

  ---------

  Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Fix a potential data corruption for Pandas UDF (NVIDIA#9942)

  This PR moves the BatchQueue into the DataProducer so that it shares the same lock as the output iterator returned by asIterator, and makes moving a batch from the input iterator to the batch queue an atomic operation, which eliminates the race when appending batches to the queue.

* Refactor the Python UDF code to reduce duplicate code. (NVIDIA#9902)

  Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Fixed 330db Shims to Adopt the PythonRunner Changes [databricks] (NVIDIA#10232)

  This PR removes the old 330db shims in favor of the new shims, similar to the ones in 341db.

  **Tests:** Ran udf_test.py on Databricks 11.3 and all tests passed.

  fixes NVIDIA#10228

  ---------

  Signed-off-by: raza jafri <rjafri@nvidia.com>

---------

Signed-off-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: raza jafri <rjafri@nvidia.com>
Co-authored-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Liangcai Li <firestarmanllc@gmail.com>
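The queue change described in the commit messages above (NVIDIA#9833 and NVIDIA#9942) can be illustrated with a small sketch. This is not the plugin's code: the class below is a hypothetical Python stand-in, its method names (`next_for_writer`, `peek_num_rows`, `next_for_reader`) are assumptions made for illustration, and batches are modeled as plain lists. It shows the two ideas the messages name: the reader pulls a batch on demand instead of blocking until the writer has inserted one, and moving a batch from the input iterator into the queues happens under a single shared lock.

```python
import threading
from collections import deque
from typing import Deque, Iterable, List, Optional


class BatchProducer:
    """Hypothetical, simplified model of a pull-on-demand batch queue."""

    def __init__(self, source: Iterable[List[int]]) -> None:
        self._source = iter(source)
        # Batches already pulled by the reader that the writer has not sent yet.
        self._for_writer: Deque[List[int]] = deque()
        # Batches whose results the reader has not consumed yet.
        self._for_reader: Deque[List[int]] = deque()
        # One lock shared by the writer side and the reader side.
        self._lock = threading.Lock()

    def _pull(self) -> bool:
        # Caller must hold the lock, so fetching a batch and enqueueing it
        # on both sides is one atomic step.
        batch = next(self._source, None)
        if batch is None:
            return False
        self._for_writer.append(batch)
        self._for_reader.append(batch)
        return True

    def next_for_writer(self) -> Optional[List[int]]:
        """Writer side: next batch to send, or None when the input is exhausted."""
        with self._lock:
            if not self._for_writer and not self._pull():
                return None
            return self._for_writer.popleft()

    def peek_num_rows(self) -> Optional[int]:
        """Reader side: row count of the next batch, pulled on demand."""
        with self._lock:
            if not self._for_reader and not self._pull():
                return None
            return len(self._for_reader[0])

    def next_for_reader(self) -> Optional[List[int]]:
        """Reader side: consume the next batch, or None when exhausted."""
        with self._lock:
            if not self._for_reader and not self._pull():
                return None
            return self._for_reader.popleft()


if __name__ == "__main__":
    producer = BatchProducer([[1, 2, 3], [4, 5]])
    # Single-threaded order: the read side asks first and does not block.
    print(producer.peek_num_rows())
    print(producer.next_for_writer())
    print(producer.next_for_reader())
```

In this model a single thread can call `peek_num_rows` before any write has happened, which is the ordering the commit message says caused the hang with the original blocking queue.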
Update download page to v23.12.2 for the Databricks hotfix: NVIDIA#10274

Signed-off-by: Tim Liu <timl@nvidia.com>
Signed-off-by: Tim Liu <timl@nvidia.com>
Signed-off-by: Tim Liu <timl@nvidia.com>
Update changelog with

```
GITHUB_TOKEN=<> scripts/generate-changelog --releases=23.10,23.12
```

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu added the documentation (Improvements or additions to documentation) and build (Related to CI / CD or cleanly building) labels Jan 31, 2024
NvTimLiu requested review from jlowe, revans2, tgravescs and GaryShen2008 as code owners January 31, 2024 06:00
NvTimLiu requested review from sameerz, yinqingh, YanxuanLiu and gerashegalov January 31, 2024 06:01
Signed-off-by: Tim Liu <timl@nvidia.com>
build |
GaryShen2008 approved these changes Jan 31, 2024
build |
Labels
build: Related to CI / CD or cleanly building
documentation: Improvements or additions to documentation
Change version to v23.12.2
Note: merge this PR with "Create a new merge commit"
Signed-off-by: Tim Liu <timl@nvidia.com>