bug: Windows build fails to parse invalid URL in prepared file #794

Open
jordanrfrazier opened this issue Oct 5, 2023 · 4 comments

@jordanrfrazier
Collaborator

jordanrfrazier commented Oct 5, 2023

Summary

The Windows build fails to parse the prepared file's URL because its path contains a tilde (`~`).

Initial bug:

Below is the initial bug report:

The Windows build started failing after recent changes to read Parquet files directly rather than converting them to batches.

https://github.com/kaskada-ai/kaskada/actions/runs/6423963739/job/17443601949

   ---------------------------- Captured stderr call -----------------------------
  thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: failed to prepare batch
  ├╴at D:\a\kaskada\kaskada\crates\sparrow-session\src\table.rs:153:14
  │
  ├─▶ internal error
  │   ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\prepare\preparer.rs:133:10
  │
  ├─▶ failed to create Parquet file reader
  │   ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\prepare.rs:52:22
  │
  ├─▶ invalid parquet file metadata
  │   ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\read\parquet_file.rs:63:18
  │
  ╰─▶ Generic LocalFileSystem error: Unable to access metadata for D:/a/kaskada/kaskada/python/D:/a/kaskada/kaskada/testdata/purchases/purchases_part1.parquet: The filename, directory name, or volume label syntax is incorrect. (os error 123)
      ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\read\parquet_file.rs:62:18', src\table.rs:94:24
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
  _______________________ test_read_parquet_with_subsort ________________________
  
  golden = <conftest.GoldenFixture object at 0x0000018C117B2910>
  
      async def test_read_parquet_with_subsort(golden) -> None:
  >       source = await kd.sources.Parquet.create(
              "../testdata/purchases/purchases_part1.parquet",
              time_column="purchase_time",
              key_column="customer_id",
              subsort_column="subsort_id",
          )
  
  pytests\parquet_source_test.py:17: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  .venv\Lib\site-packages\kaskada\sources\arrow.py:582: in create
      await source.add_file(path)
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  
  self = <kaskada.sources.arrow.Parquet object at 0x0000018C117A42D0>
  path = 'D:\\a\\kaskada\\kaskada\\python/../testdata/purchases/purchases_part1.parquet'
  
      async def add_file(self, path: str) -> None:
          """Add data to the source."""
  >       await self._ffi_table.add_parquet(str(Source._get_absolute_path(path)))
  E       pyo3_asyncio.RustPanic: rust future panicked
  
  .venv\Lib\site-packages\kaskada\sources\arrow.py:587: RustPanic
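The doubled prefix in the `LocalFileSystem` error can be reproduced with plain string handling. Below is a minimal sketch in Python using `ntpath` (Windows path rules, importable on any OS), with the working directory and relative path taken from the traceback. The "buggy" step is an assumption about where the root gets appended a second time, not the actual code path in `sparrow-runtime`.

```python
import ntpath  # Windows path semantics; importable on any OS

# Values taken from the failing CI run above.
cwd = r"D:\a\kaskada\kaskada\python"
rel = "../testdata/purchases/purchases_part1.parquet"

# What the Python side passed to Rust: absolute, but with ".." unresolved
# and mixed separators (matches `path` in the traceback).
passed = cwd + "/" + rel

# Resolving ".." and unifying separators gives the intended file.
resolved = ntpath.normpath(passed)

# ASSUMED buggy step: treating the already-absolute path as relative and
# prefixing the working directory again by string concatenation.
doubled = cwd.replace("\\", "/") + "/" + resolved.replace("\\", "/")

# ntpath.join discards the base when the second argument is absolute,
# so the path comes back unchanged instead of doubled.
safe = ntpath.join(cwd, resolved)

print(doubled)
print(safe)
```

Running it prints the doubled path from the error message, then `D:\a\kaskada\kaskada\testdata\purchases\purchases_part1.parquet`. A join that respects absolute paths is safe to apply to input that is already absolute, whereas concatenation is not.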
jordanrfrazier self-assigned this Oct 5, 2023
@jordanrfrazier
Collaborator Author

Failing here:

This means the `Url` was created correctly and the path could be extracted from it. Could the problem come from how the path is stored in the `SourceData` proto? We do create the `ObjectStoreUrl` successfully right before calling into `ParquetFile`, though I'm not sure that indicates the URL is valid.

let file = ParquetFile::try_new(object_stores, url, None)
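On whether a successful URL parse proves anything: it mostly doesn't. In the sketch below (Python standard library only; `non_drive_colon_segments` is a hypothetical helper, and wrapping the doubled path in a `file:///` URL is an assumption), `urlparse` accepts the doubled path without complaint. Windows, however, reserves `:` in file names outside the drive letter, which is most likely what surfaces as os error 123.

```python
from urllib.parse import urlparse


def non_drive_colon_segments(url):
    """Return path segments of a file:// URL that contain ':' outside the
    leading drive segment; such names are invalid on Windows."""
    parsed = urlparse(url)  # succeeds: it does not check Windows naming rules
    if parsed.scheme != "file":
        raise ValueError(f"expected a file:// URL, got {url!r}")
    segments = parsed.path.lstrip("/").split("/")
    # segments[0] is the drive ("D:"); any later ':' is reserved on Windows.
    return [s for s in segments[1:] if ":" in s]


# ASSUMPTION: the doubled path from the first failure, wrapped in a file URL.
bad_url = ("file:///D:/a/kaskada/kaskada/python/"
           "D:/a/kaskada/kaskada/testdata/purchases/purchases_part1.parquet")
good_url = "file:///D:/a/kaskada/kaskada/testdata/purchases/purchases_part1.parquet"

print(non_drive_colon_segments(bad_url))   # ['D:']
print(non_drive_colon_segments(good_url))  # []
```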

@jordanrfrazier
Collaborator Author

@epinzur
Collaborator

epinzur commented Oct 9, 2023

Capture the current state of the issue, then move it back to the backlog.

@jordanrfrazier
Collaborator Author

jordanrfrazier commented Oct 9, 2023

Current status:

The Windows build fails when attempting to parse an invalid URL. Despite our efforts to use URLs and Paths (instead of manual string manipulation), on Windows the path to the prepared file contains a tilde:

Error parsing Path "/C:/Users/RUNNER~1/AppData/Local/Temp/.tmpzmwvaw/3390f96b-b364-4e95-aaaf-11a41153f3e8/part-0.parquet": Encountered illegal character sequence "~" whilst parsing path segment "RUNNER~1"
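`RUNNER~1` looks like a Windows 8.3 short name: the runner's temp directory appears to be reported in short form, so any path built under it inherits the tilde. The sketch below (Python; `short_name_segments` and `expand_short_names` are hypothetical helpers, and the regex is only a heuristic for the 8.3 format) flags such segments and, on Windows only, expands them with `os.path.realpath`, which should return the long-name form of an existing path on Python 3.8+.

```python
import os
import re

# Heuristic for 8.3 short names such as "RUNNER~1" or "PROGRA~1": a short
# base, "~", a numeric tail, and an optional extension of up to 3 chars.
SHORT_NAME = re.compile(r"^[^~\\/]{1,6}~\d+(\.[^.\\/]{1,3})?$")


def short_name_segments(path):
    """Return the segments of `path` that look like 8.3 short names."""
    return [seg for seg in re.split(r"[\\/]", path) if SHORT_NAME.match(seg)]


def expand_short_names(path):
    """On Windows, return the long-name form of an existing path;
    elsewhere, return the path unchanged."""
    if os.name == "nt":
        return os.path.realpath(path)
    return path


# The prepared-file path from the error above.
prepared = ("/C:/Users/RUNNER~1/AppData/Local/Temp/.tmpzmwvaw/"
            "3390f96b-b364-4e95-aaaf-11a41153f3e8/part-0.parquet")
print(short_name_segments(prepared))  # ['RUNNER~1']
```

On the Rust side, `std::fs::canonicalize` plays a similar role on Windows, but it returns a verbatim `\\?\C:\...` path, which would need its own handling before being turned into a URL.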

Action Items:

  • Fix the root being appended to the URL twice
  • Remove the remaining cases where `file://` is still prepended manually
  • Clean up the `SourceData` proto
  • Figure out why the URL includes a tilde on Windows builds
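For the first two action items, one option is to route every local path through a single conversion function, so `..` is resolved once and `file://` is never prepended by hand. Below is a Python sketch (`windows_path_to_file_url` is a hypothetical name, and the Rust side would need an equivalent) that normalises with `ntpath.normpath`, requires an absolute drive path, and rejects input that is already a URL rather than prefixing it again.

```python
import ntpath
from urllib.parse import quote


def windows_path_to_file_url(path):
    """Convert an absolute Windows drive path to a file:/// URL exactly once.

    Resolves "..", unifies separators, and refuses input that is relative,
    a UNC path, or already a URL, so the root cannot be prepended twice.
    """
    norm = ntpath.normpath(path)
    drive, rest = ntpath.splitdrive(norm)
    if len(drive) != 2 or drive[1] != ":" or not rest.startswith("\\"):
        raise ValueError(f"expected an absolute drive path, got {path!r}")
    # quote() keeps "/" and unreserved characters (letters, digits, "-._~").
    return "file:///" + drive + quote(rest.replace("\\", "/"))


print(windows_path_to_file_url(
    r"D:\a\kaskada\kaskada\python/../testdata/purchases/purchases_part1.parquet"))
```

The tilde survives this conversion (`~` is an unreserved URL character), so this does not address the last action item on its own.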

jordanrfrazier changed the title from "Fix windows build" to "bug: Windows build fails to parse invalid URL in prepared file" Oct 9, 2023