Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-31621: [JS] Fix Union null bitmaps #37122

Merged
merged 30 commits into from
Sep 7, 2023

Conversation

trxcllnt
Copy link
Contributor

@trxcllnt trxcllnt commented Aug 11, 2023

This PR fixes Union null handling, and re-enables the disabled SparseUnion test.

The format doc says:

Unlike other data types, unions do not have their own validity bitmap.

Therefore we need to remove null masks from union types and allow them to delegate validity to their children.

Also fixes #37063 for good measure.

@github-actions
Copy link

⚠️ GitHub issue #31621 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Aug 11, 2023
@pitrou
Copy link
Member

pitrou commented Aug 24, 2023

@trxcllnt Does this allow removing the skip here?

generate_unions_case()
.skip_category('C#')
.skip_category('JS'),

@trxcllnt
Copy link
Contributor Author

@pitrou yes, it should. I'll enable the integration tests now.

@trxcllnt
Copy link
Contributor Author

The integration test is failing because the unions test has duplicate field names. I have a fix for it locally, but it involves renaming the duplicated sparse and dense fields to sparse_{1,2} and dense_{1,2}.

@trxcllnt
Copy link
Contributor Author

From the logs, it seems like the C# implementation may not be aligning everything to 4-byte boundaries. Is there any way to get the files C# produces to confirm?

################# FAILURES #################
FAILED TEST: primitive C# producing,  JS consuming
FAILED TEST: null C# producing,  JS consuming
FAILED TEST: decimal C# producing,  JS consuming
FAILED TEST: datetime C# producing,  JS consuming
FAILED TEST: nested C# producing,  JS consuming
FAILED TEST: recursive_nested C# producing,  JS consuming

@kou
Copy link
Member

kou commented Aug 29, 2023

@eerhardt Could you take a look at this?

@eerhardt
Copy link
Contributor

@trxcllnt - can you run the integration tests locally? The arrow file should be written to the location that --arrow-file specifies when the test runs.

@trxcllnt
Copy link
Contributor Author

@eerhardt I've only ever compiled and run the JS, C++, and Java. Don't really wanna install mono, is there a way to do it via docker?

@eerhardt
Copy link
Contributor

Don't really wanna install mono

You don't need to install mono. .NET runs on Linux and Mac. https://dot.net/download

is there a way to do it via docker?

Yes, this is how the integration tests run in CI. You can execute the same commands as what CI does:

- name: Setup Archery
run: pip install -e dev/archery[docker]
- name: Execute Docker Build
env:
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}
run: >
archery docker run \
-e ARCHERY_DEFAULT_BRANCH=${{ github.event.repository.default_branch }} \
-e ARCHERY_INTEGRATION_WITH_RUST=1 \
conda-integration

You should be able to speed it up by commenting out the languages you don't need to build here:

arrow/docker-compose.yml

Lines 1709 to 1715 in 2981942

["/arrow/ci/scripts/rust_build.sh /arrow /build &&
/arrow/ci/scripts/cpp_build.sh /arrow /build &&
/arrow/ci/scripts/csharp_build.sh /arrow /build &&
/arrow/ci/scripts/go_build.sh /arrow &&
/arrow/ci/scripts/java_build.sh /arrow /build &&
/arrow/ci/scripts/js_build.sh /arrow /build &&
/arrow/ci/scripts/integration_arrow.sh /arrow /build"]

and here:

archery integration \
--run-flight \
--with-cpp=1 \
--with-csharp=1 \
--with-java=1 \
--with-js=1 \
--with-go=1 \
--gold-dirs=$gold_dir/0.14.1 \
--gold-dirs=$gold_dir/0.17.1 \
--gold-dirs=$gold_dir/1.0.0-bigendian \
--gold-dirs=$gold_dir/1.0.0-littleendian \
--gold-dirs=$gold_dir/2.0.0-compression \
--gold-dirs=$gold_dir/4.0.0-shareddict \

@domoritz domoritz merged commit b957847 into apache:main Sep 7, 2023
12 checks passed
@domoritz domoritz removed the awaiting merge Awaiting merge label Sep 7, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Sep 7, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit b957847.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
This PR fixes `Union` null handling, and re-enables the disabled `SparseUnion` test.

The format doc [says](https://arrow.apache.org/docs/format/Columnar.html#union-layout):
> Unlike other data types, unions do not have their own validity bitmap.

Therefore we need to remove null masks from union types and allow them to delegate validity to their children.

Also fixes apache#37063 for good measure.

* Closes: apache#31621
* Closes: apache#37063
* Closes apache#24123
* Closes apache#17168

Authored-by: ptaylor <paul.e.taylor@me.com>
Signed-off-by: Dominik Moritz <domoritz@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
This PR fixes `Union` null handling, and re-enables the disabled `SparseUnion` test.

The format doc [says](https://arrow.apache.org/docs/format/Columnar.html#union-layout):
> Unlike other data types, unions do not have their own validity bitmap.

Therefore we need to remove null masks from union types and allow them to delegate validity to their children.

Also fixes apache#37063 for good measure.

* Closes: apache#31621
* Closes: apache#37063
* Closes apache#24123
* Closes apache#17168

Authored-by: ptaylor <paul.e.taylor@me.com>
Signed-off-by: Dominik Moritz <domoritz@gmail.com>
@trxcllnt trxcllnt deleted the fix/remove-union-null-bitmap branch March 7, 2024 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants