[BUG] Databricks job launcher bugs when adding maven dependencies #1041

Closed
1 of 4 tasks
loomlike opened this issue Feb 8, 2023 · 1 comment · Fixed by #1051
Labels: bug (Something isn't working)

Comments

@loomlike
Collaborator

loomlike commented Feb 8, 2023

Willingness to contribute

No. I cannot contribute a bug fix at this time.

Feathr version

0.10.4-rc1

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0):
  • Python version: 3.8
  • Spark version, if reporting runtime issue:

Describe the problem

In the Databricks job launcher class, library dependencies should not be added to the Databricks job API config by list index:

submission_params['libraries'][0]['maven']
...
submission_params['libraries'][1]['maven']
...
submission_params["libraries"][0]["jar"]

since doing so will break the job API config.

For example, in our feature embedding example, we add the sentence-transformers PyPI package as a dependency as follows:

"libraries": [
        {"jar": "FEATHR_FILL_IN"},
        # sentence-transformers pip package
        {"pypi": {"package": "sentence-transformers"}},
    ],

The new code that adds maven dependencies to the config modifies it to be:

[
    {
        'jar': 'FEATHR_FILL_IN',
        'maven': {'coordinates': 'com.linkedin.feathr:feathr_2.12:0.10.4-rc1'}
    },
    {
        'pypi': {'package': 'sentence-transformers'},
        'maven': {'coordinates': 'com.github.everit-org.json-schema:org.everit.json.schema:1.9.1', 'repo': 'https://repository.mulesoft.org/nexus/content/repositories/public/'}
    }
]

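which has a broken format (the maven keys are merged into the existing jar and pypi entries instead of being added as separate entries) and will cause unexpected behavior when creating the Databricks job.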

Also, submission_params["libraries"][0]["jar"] is not safe to use. If the first library entry is not {"jar": "FEATHR_FILL_IN"} but something else, such as a pypi package or a custom maven dependency set by the user, then the ["libraries"][0]["jar"] logic will break the format too.
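One way to avoid both problems is to look up the jar placeholder by key and to append each maven dependency as its own entry. The following is only a minimal sketch of that idea, not the fix that was merged: the function name add_libraries, its parameters, and the jar path in the usage note are hypothetical, and only the config shape and the FEATHR_FILL_IN placeholder come from the example above.

```python
from copy import deepcopy

# Placeholder value taken from the example config in this issue.
FEATHR_PLACEHOLDER = "FEATHR_FILL_IN"


def add_libraries(submission_params, main_jar_path, maven_libs):
    """Hypothetical helper: return a copy of the job config with the jar
    placeholder filled in and each maven dependency added as a separate entry.
    """
    params = deepcopy(submission_params)
    libraries = params.setdefault("libraries", [])

    # Find the placeholder by key and value, wherever it sits in the list,
    # instead of assuming it is libraries[0].
    for lib in libraries:
        if lib.get("jar") == FEATHR_PLACEHOLDER:
            lib["jar"] = main_jar_path

    # Append one {"maven": {...}} entry per dependency rather than writing
    # a "maven" key into an existing jar or pypi entry.
    for maven in maven_libs:
        entry = {"maven": dict(maven)}
        if entry not in libraries:
            libraries.append(entry)

    return params
```

With the config from the feature embedding example and a made-up jar path such as "dbfs:/feathr.jar", the result keeps the jar and pypi entries untouched apart from the filled-in placeholder and gains one extra entry per maven dependency, so every entry still names exactly one library type.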

Tracking information

No response

Code to reproduce bug

No response

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that execute the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer supports SQL, Purview(Atlas) as storage. The API layer is in Python(FAST API)
  • Feature Registry Web UI: The Web UI for feature registry. Written in React
@loomlike loomlike added the bug Something isn't working label Feb 8, 2023
@loomlike loomlike mentioned this issue Feb 8, 2023
2 tasks
@blrchen
Collaborator

blrchen commented Feb 9, 2023

Specifying 'maven': {'coordinates': 'com.github.everit-org.json-schema:org.everit.json.schema:1.9.1', 'repo': 'https://repository.mulesoft.org/nexus/content/repositories/public/'} on the notebook side is no longer needed if the user is using a maven jar with a version greater than v0.10.4-rc5. everit.json.schema:1.9.1 is not available on Maven Central, and current cloud Spark services cannot pull packages from repos other than Maven Central. Thus PR #1043 removes the everit.json.schema:1.9.1 dependency.

By default, the Python client uses the same version of the maven jar, so to get the fix for this issue, either:

  1. Wait until the main branch Python client version is bumped to v0.10.4-rc5. This should happen soon if no new regressions are found on v0.10.4-rc5.
  2. Add a notebook cell that forces the Python client to use a newer version of the maven jar: os.environ['MAVEN_ARTIFACT_VERSION'] = "0.10.4-rc5".
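Option 2 could look like the cell below. This is a sketch under assumptions: the MAVEN_ARTIFACT_VERSION variable and the coordinate format are taken from this thread, while the feathr_maven_coordinate helper is hypothetical and only illustrates how the override would change the resolved coordinate. It is probably safest to run the cell before the Feathr client is created.

```python
import os

# Override the maven jar version, as suggested in the comment above.
os.environ["MAVEN_ARTIFACT_VERSION"] = "0.10.4-rc5"


def feathr_maven_coordinate(default_version):
    """Hypothetical helper: build the Feathr maven coordinate, preferring
    the MAVEN_ARTIFACT_VERSION override when it is set.
    """
    version = os.environ.get("MAVEN_ARTIFACT_VERSION", default_version)
    return f"com.linkedin.feathr:feathr_2.12:{version}"


# The default would be the client's own version, e.g. "0.10.4-rc1".
print(feathr_maven_coordinate("0.10.4-rc1"))
```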
