@XiangpengHao (Contributor) commented Mar 9, 2025

Which issue does this PR close?

Rationale for this change

I encountered a compile error when trying to upgrade the parquet viewer to DataFusion v46.

I think the bug was accidentally introduced in #14464. cc @logan-keede

What changes are included in this PR?

I reverted the `wasm32` gate, since this code should work on wasm32. I also enabled the `parquet` feature in the `wasmtest` crate so that CI catches this kind of bug.

I also removed the `fs` feature from `tokio`: I don't think the parquet datasource relies on it, and it breaks the wasm build.
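A minimal sketch of the general idea behind dropping a filesystem dependency, not DataFusion's actual code: the helpers `has_parquet_magic` and `file_has_parquet_magic` are hypothetical. Logic that works on bytes already in memory compiles for every target, while filesystem access, if it is needed at all, sits behind a `cfg` gate. For context, tokio's documentation notes that on wasm only a handful of features (`sync`, `macros`, `io-util`, `rt`, `time`) are supported and that enabling others fails to compile, which is consistent with `fs` breaking the build.

```rust
/// Hypothetical helper: Parquet files start and end with the 4-byte magic
/// `PAR1`, and the smallest layout is header magic + 4-byte footer length +
/// footer magic (12 bytes). This is a cheap sanity check, not validation.
/// It only inspects bytes, so it compiles on every target, including
/// wasm32-unknown-unknown.
pub fn has_parquet_magic(bytes: &[u8]) -> bool {
    const MAGIC: &[u8; 4] = b"PAR1";
    bytes.len() >= 12 && bytes.starts_with(MAGIC) && bytes.ends_with(MAGIC)
}

/// Hypothetical helper: filesystem access is treated as native-only in this
/// sketch, so the definition is gated, and every caller carries the same gate.
#[cfg(not(target_arch = "wasm32"))]
pub fn file_has_parquet_magic(path: &std::path::Path) -> std::io::Result<bool> {
    let bytes = std::fs::read(path)?;
    Ok(has_parquet_magic(&bytes))
}

fn main() {
    // Works everywhere: the bytes are already in memory.
    let in_memory = b"PAR1\x00\x00\x00\x00PAR1";
    println!("in-memory buffer looks like parquet: {}", has_parquet_magic(in_memory));

    // Native-only path, gated exactly like the function it calls.
    #[cfg(not(target_arch = "wasm32"))]
    {
        let path = std::path::Path::new("example.parquet");
        match file_has_parquet_magic(path) {
            Ok(found) => println!("example.parquet looks like parquet: {found}"),
            Err(err) => println!("could not read example.parquet: {err}"),
        }
    }
}
```

Keeping the byte-level logic ungated means a wasm build only loses the parts that genuinely need an OS, rather than failing to compile.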

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the development-process label on Mar 9, 2025
- name: Install dependencies
  run: |
    apt-get update -qq
    apt-get install -y -qq clang
Contributor Author

Without this step, I got: https://github.com/XiangpengHao/datafusion/actions/runs/13744579987/job/38437951143#step:6:144

warning: zstd-sys@2.0.13+zstd.1.5.6: ToolExecError: Command LC_ALL="C" "sccache" "clang" "-O0" "-ffunction-sections" "-fdata-sections" "-fno-exceptions" "-g" "-fno-omit-frame-pointer" "--target=wasm32-unknown-unknown" "-I" "wasm-shim/" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DXXH_PRIVATE_API=" "-DZSTDLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-o" "/__w/datafusion/datafusion/target/wasm32-unknown-unknown/debug/build/zstd-sys-789e7626e12bcd14/out/44ff4c55aa9e5133-debug.o" "-c" "zstd/lib/common/debug.c" with args clang did not execute successfully (status code exit status: 2).cargo:warning=Compiler family detection failed due to error: ToolNotFound: Failed to find tool. Is `clang` installed?
warning: zstd-sys@2.0.13+zstd.1.5.6: Compiler family detection failed due to error: ToolNotFound: Failed to find tool. Is `clang` installed?
warning: zstd-sys@2.0.13+zstd.1.5.6: Compiler family detection failed due to error: ToolNotFound: Failed to find tool. Is `clang` installed?
warning: zstd-sys@2.0.13+zstd.1.5.6: Compiler family detection failed due to error: ToolNotFound: Failed to find tool. Is `clang` installed?
warning: zstd-sys@2.0.13+zstd.1.5.6: sccache: error: failed to execute compile
warning: zstd-sys@2.0.13+zstd.1.5.6: sccache: caused by: cannot find binary path

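So I added a step that installs `clang`, which `zstd-sys` needs to compile its C sources for the `wasm32-unknown-unknown` target.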

@logan-keede
Contributor

Thanks for cleaning up after me @XiangpengHao.

@alamb (Contributor) left a comment

Thank you @XiangpengHao

It seems this code is not covered by CI, so it is likely to break again during future refactoring.

We have a CI job that is supposed to check wasm like this:

linux-wasm-pack:
  name: build with wasm-pack
  runs-on: ubuntu-latest
  container:
    image: amd64/rust
  steps:
    - uses: actions/checkout@v4
    - name: Setup Rust toolchain
      uses: ./.github/actions/setup-builder
      with:
        rust-version: stable
    - name: Install wasm-pack
      run: curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh
    - name: Build with wasm-pack
      working-directory: ./datafusion/wasmtest

It uses a special crate that compiles some sort of test:
https://github.com/apache/datafusion/blob/main/datafusion/wasmtest

Can you help me understand what is different about the parquet viewer that isn't covered by the existing test?

 # code size when deploying.
 console_error_panic_hook = { version = "0.1.1", optional = true }
-datafusion = { workspace = true }
+datafusion = { workspace = true, features = ["parquet"] }
Contributor Author

> Can you help me understand what is different about the parquet viewer that isn't covered by the existing test?

Here's the secret, @alamb: I think the workspace `datafusion` dependency has all features disabled. If we enable the `parquet` feature here, we should see the compile error.
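A minimal sketch of why enabling the feature matters, with hypothetical module and path names: code behind `#[cfg(feature = "parquet")]` is stripped before name resolution when the feature is off, so even a broken import inside it compiles cleanly. Only a build with the feature enabled surfaces the error.

```rust
// Only compiled when the `parquet` feature is enabled. With the feature off,
// this module is removed before name resolution, so the bogus import below
// never produces an E0432 error.
#[cfg(feature = "parquet")]
mod parquet_support {
    #[allow(unused_imports)]
    use crate::this_module_does_not_exist::SomeItem;
}

/// Reports whether this build compiled the feature-gated code at all.
fn parquet_feature_enabled() -> bool {
    cfg!(feature = "parquet")
}

fn main() {
    // A default build reports `false`, which is why the broken code path
    // stayed invisible to CI until the feature was turned on.
    println!("parquet feature enabled: {}", parquet_feature_enabled());
}
```

Declaring a `parquet` feature in this sketch's Cargo.toml and building with `cargo build --features parquet` would instead fail on the import, which mirrors what enabling the feature in the test crate achieves.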

Contributor

Indeed, you are right: I verified that I get a compile error without the code changes in this PR:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion/datafusion/wasmtest$ wasm-pack build --dev

Compiling datafusion-datasource-csv v46.0.0 (/Users/andrewlamb/Software/datafusion/datafusion/datasource-csv)
error[E0432]: unresolved import `crate::file_format::coerce_file_schema_to_view_type`
   --> datafusion/datasource-parquet/src/opener.rs:23:40
    |
23  |     coerce_file_schema_to_string_type, coerce_file_schema_to_view_type,
    |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                                        |
    |                                        no `coerce_file_schema_to_view_type` in `file_format`
    |                                        help: a similar name exists in the module: `coerce_file_schema_to_string_type`
    |
note: found an item that was configured out
   --> datafusion/datasource-parquet/src/file_format.rs:470:8
    |
470 | pub fn coerce_file_schema_to_view_type(
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: the item is gated here
   --> datafusion/datasource-parquet/src/file_format.rs:469:1
    |
469 | #[cfg(not(target_arch = "wasm32"))]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0425]: cannot find function `coerce_file_schema_to_view_type` in this scope
   --> datafusion/datasource-parquet/src/file_format.rs:726:27
    |
519 | / pub fn coerce_file_schema_to_string_type(
520 | |     table_schema: &Schema,
521 | |     file_schema: &Schema,
522 | | ) -> Option<Schema> {
...   |
571 | | }
    | |_- similarly named function `coerce_file_schema_to_string_type` defined here
...
726 |       if let Some(merged) = coerce_file_schema_to_view_type(&table_schema, &file...
    |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `coerce_file_schema_to_string_type`
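The errors above come from a gate mismatch: the definition of `coerce_file_schema_to_view_type` is compiled out under `#[cfg(not(target_arch = "wasm32"))]`, but its uses in `opener.rs` and `file_format.rs` are not gated, so name resolution fails on wasm32. A minimal sketch of two consistent alternatives, using hypothetical functions (`view_type_name` is a toy mapping of Arrow type names, not a DataFusion API): leave target-independent code ungated, which is what this PR does, or, when behavior really differs per target, define the item in every `cfg` branch with the same signature so callers compile everywhere.

```rust
/// Option 1: pure logic has no reason to be target-specific, so it is not
/// gated at all and compiles for wasm32 as well as native targets.
/// Toy mapping from a string/binary type name to its "view" counterpart.
pub fn view_type_name(type_name: &str) -> &str {
    match type_name {
        "Utf8" | "LargeUtf8" => "Utf8View",
        "Binary" | "LargeBinary" => "BinaryView",
        other => other,
    }
}

/// Option 2: when behavior genuinely differs per target, provide a
/// definition for every cfg branch with an identical signature.
#[cfg(not(target_arch = "wasm32"))]
pub fn target_label() -> &'static str {
    "native"
}

#[cfg(target_arch = "wasm32")]
pub fn target_label() -> &'static str {
    "wasm32"
}

fn main() {
    // Neither call site needs a cfg gate, yet both compile on every target.
    println!("{} on {}", view_type_name("Utf8"), target_label());
}
```

The failure mode in the log is the third, inconsistent option: a gated definition with ungated callers, which compiles natively and breaks only on the gated-out target.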

@alamb (Contributor) left a comment

🤔 Maybe we should plan to make a DataFusion 46.0.1 release with this fix.

@alamb (Contributor) commented Mar 10, 2025

I have been thinking about this issue

I plan to:

  1. Make a ticket proposing a new patch release for datafusion 46
  2. Make a ticket to cover doing something with parquet via wasm (not just adding the feature flag as in this PR)

@XiangpengHao (Contributor Author)

> Make a ticket to cover doing something with parquet via wasm (not just adding the feature flag as in this PR)

Sounds great! I can also try upgrading DataFusion in the parquet viewer before every major release to help catch issues like this one.

@alamb (Contributor) commented Mar 11, 2025

BTW, I think we should consider including this fix in a 46.0.1 patch release.

@alamb (Contributor) commented Mar 11, 2025

Filed a ticket to track adding coverage

@alamb merged commit 6f285d6 into apache:main on Mar 11, 2025
27 checks passed
@alamb (Contributor) commented Mar 11, 2025

Thanks again @XiangpengHao


Labels

development-process Related to development process of DataFusion

Development

Successfully merging this pull request may close these issues.

regression: DataFusion 46 wasm compile error with parquet

3 participants