[BUG] Spark SQL Location Source e2e cannot pass #1055

Closed
1 of 4 tasks
Yuqing-cat opened this issue Feb 10, 2023 · 2 comments · Fixed by #1056
Labels
bug Something isn't working

Comments

@Yuqing-cat
Collaborator

Willingness to contribute

Yes. I can contribute a fix for this bug independently.

Feathr version

latest0.10.4-rc1

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0):
  • Python version:
  • Spark version, if reporting runtime issue:

Describe the problem

This test passes on 0.10.4-rc1 but fails on recent main.
image
This must be a regression introduced by a recent PR.

Tracking information

Investigation shows the failure is caused by the Spark SQL location being treated as a SimplePath.
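
To illustrate the failure mode, here is a minimal sketch of a dispatcher that separates SQL text from file paths before any loader is chosen. It is not Feathr's actual code: `SimplePath` and `SparkSqlLocation` are stand-in classes, `classify_location` is a hypothetical helper, and the "starts with SELECT or WITH" rule is an assumed heuristic used only for illustration.

```python
import re
from dataclasses import dataclass


# Stand-in location types for illustration; not Feathr's real classes.
@dataclass(frozen=True)
class SimplePath:
    path: str


@dataclass(frozen=True)
class SparkSqlLocation:
    sql: str


# Assumed heuristic: a location that starts with SELECT or WITH is a query.
_SQL_PREFIX = re.compile(r"^\s*(select|with)\b", re.IGNORECASE)


def classify_location(location: str):
    """Return a SparkSqlLocation for SQL text, otherwise a SimplePath."""
    if _SQL_PREFIX.match(location):
        return SparkSqlLocation(sql=location)
    return SimplePath(path=location)
```

If the SQL branch is skipped, the query string reaches a file-based loader as if it were a path, which matches the symptom described here.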

Code to reproduce bug

No response

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is written in Python (FastAPI).
  • Feature Registry Web UI: The Web UI for the feature registry. Written in React.
@Yuqing-cat Yuqing-cat added the bug Something isn't working label Feb 10, 2023
@Yuqing-cat
Collaborator Author

The source in the config file is confirmed to be correct:
image

windoze added a commit that referenced this issue Feb 10, 2023
@Yuqing-cat
Collaborator Author

The real error log is hidden by the retry message, which is not user friendly.

23/02/07 04:18:06 INFO BatchDataLoaderFactory: Creating spark data loader for path: SELECT * FROM green_tripdata_2020_04_with_index
23/02/07 04:18:06 INFO BatchDataLoader: Loading SimplePath(path=SELECT * FROM green_tripdata_2020_04_with_index) as DataFrame, using parameters Map(split.size -> )
23/02/07 04:18:06 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: SELECT * FROM green_tripdata_2020_04_with_index.
java.lang.IllegalArgumentException: Path must be absolute: SELECT * FROM green_tripdata_2020_04_with_index
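
One way to keep the root cause visible is to have the retry wrapper report each failure and chain the last exception when it gives up. The following is a rough sketch only: `retry` and its parameters are hypothetical, and it does not reflect how Feathr's client or tests actually implement retries.

```python
import time


def retry(fn, attempts=3, delay_seconds=0.0):
    """Call fn up to `attempts` times and surface the last underlying error."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            # Report the real error on every attempt, not only "retrying".
            print(f"attempt {attempt}/{attempts} failed: {exc!r}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Chain the original exception so its traceback stays attached.
    raise RuntimeError(
        f"giving up after {attempts} attempts: {last_error!r}"
    ) from last_error
```

With this shape, an error such as "Path must be absolute" shows up both in the per-attempt output and in the final exception message.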

@windoze windoze mentioned this issue Feb 10, 2023
2 tasks
windoze added a commit that referenced this issue Feb 10, 2023