Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python SQL Registry #311

Merged
merged 9 commits into from
Jun 14, 2022
Merged

Python SQL Registry #311

merged 9 commits into from
Jun 14, 2022

Conversation

windoze
Copy link
Member

@windoze windoze commented Jun 1, 2022

This is the SQL-based Feathr implemented in Python.
The API spec is updated to include some previously missing pieces.
The corresponding UI changes is in another PR #303 submitted by @blrchen

There are still some TODOs:

  1. User still needs to manually create the database schema by running the script script/schema.sql.
  2. The docker file is included, but there is not ARM template to deploy by one click.

We can talk about these tasks later, right now, please focus on the implementation itself.

feathr_project/registry/requirements.txt Outdated Show resolved Hide resolved
feathr_project/registry/main.py Outdated Show resolved Hide resolved
feathr_project/registry/registry/interface.py Outdated Show resolved Hide resolved
@xiaoyongzhu
Copy link
Member

xiaoyongzhu commented Jun 2, 2022

I have a few high level comemnts:

  1. for folder names etc. we should reflect it's a "SQL" based registry
  2. We should add docs so users know how to use this. Should they build docker files? And how do they connect to the docker file? At least I'm not clear on those.
  3. Seems like many code are missing comments, which will be hard for other devs to maintain. Please do add comments/descriptions for all the functions.
  4. Seems additional clean ups are required.

@xiaoyongzhu
Copy link
Member

for those DerivedFeatureDef or AnchorFeatureDef classes, can we rename it so that we know it's specific to registry? Currently it's a bit confusing as there might be duplications on the existing feature definition that is defined here, vs the ones that are available in the "Feathr Core" code.

feathr_project/registry/registry/models.py Outdated Show resolved Hide resolved
feathr_project/registry/registry/models.py Outdated Show resolved Hide resolved
feathr_project/registry/registry/models.py Outdated Show resolved Hide resolved
feathr_project/registry/registry/models.py Outdated Show resolved Hide resolved
feathr_project/registry/registry/models.py Outdated Show resolved Hide resolved
@windoze
Copy link
Member Author

windoze commented Jun 2, 2022

The Dockerfile is not intent to be used by users directly, it is used by the building process, and we'll setup CI actions to publish ready-to-use docker images to some public repo like DockerHub, @jainr is working on this task.

@xiaoyongzhu
Copy link
Member

The Dockerfile is not intent to be used by users directly, it is used by the building process, and we'll setup CI actions to publish ready-to-use docker images to some public repo like DockerHub, @jainr is working on this task.

How would the end users use that pre-built container in this case?

@windoze
Copy link
Member Author

windoze commented Jun 2, 2022

That's why I suggested to split the registry from the feathr client repo.
They share a lot of concepts hence lots of similar but not exactly same classes and types, these classes look similar because they're fundamentally representing the same thing, not same because they're for different purposes, on the client side, they're for user composing and job submitting, on the server side, they're for displaying, storing and indexing.
If we put these to separated components together, the confusion is inevitable.
And I don't think it's not a good idea to re-use client side code, there are too many different logics and processes need to be applied to these entities differently on different side, merging them could lead to many potential bugs and even security issues.

@windoze
Copy link
Member Author

windoze commented Jun 2, 2022

The Dockerfile is not intent to be used by users directly, it is used by the building process, and we'll setup CI actions to publish ready-to-use docker images to some public repo like DockerHub, @jainr is working on this task.

How would the end users use that pre-built container in this case?

Again, this part is covered by the manual that @jainr is currently working on, so let's put the comments about the deploying/running questions in that upcoming user-manual PR.

@xiaoyongzhu
Copy link
Member

That's why I suggested to split the registry from the feathr client repo. They share a lot of concepts hence lots of similar but not exactly same classes and types, they're similar because they're representing the same thing, not same because they're for different purposes, on the client side, they're for user composing and job submitting, on the server side, they're for displaying, storing and indexing. If we put these to separated components together, the confusion is inevitable.

Correct me if I'm wrong here, but seems like the registry in this PR is only to be consumed by UI (and the REST API layer). I.e. it's not ready to be consumed by the Python client (say, users cannot call get_features_from_registry() for this registry.

For this specific PR, it should be self contained (that's why you have a dockerfile) and it's equivalent of building another smaller Purview service, I think we should move it under a folder NOT in a separate repo, but rather in the same repo but not in the python client folder. maybe create something under root folder and put it there. Or put it under the ui/ folder.

Also - this folder (https://github.com/linkedin/feathr/tree/main/feathr_project/feathr/api) should be also moved tin a similar way (i.e. not using registry service)

@windoze
Copy link
Member Author

windoze commented Jun 2, 2022

That's why I suggested to split the registry from the feathr client repo. They share a lot of concepts hence lots of similar but not exactly same classes and types, they're similar because they're representing the same thing, not same because they're for different purposes, on the client side, they're for user composing and job submitting, on the server side, they're for displaying, storing and indexing. If we put these to separated components together, the confusion is inevitable.

Correct me if I'm wrong here, but seems like the registry in this PR is only to be consumed by UI (and the REST API layer). I.e. it's not ready to be consumed by the Python client (say, users cannot call get_features_from_registry() for this registry.

For this specific PR, it should be self contained (that's why you have a dockerfile) and it's equivalent of building another smaller Purview service, I think we should move it under a folder NOT in a separate repo, but rather in the same repo but not in the python client folder. maybe create something under root folder and put it there. Or put it under the ui/ folder.

Also - this folder (https://github.com/linkedin/feathr/tree/main/feathr_project/feathr/api) should be also moved tin a similar way (i.e. not using registry service)

Agree.

Please raise an issue and let's make decision on directory structure and do a major restructuring soon, not only for this part, but also other components.

@xiaoyongzhu
Copy link
Member

Can we do a few things here?

  1. Move this implementation to the project root or somewhere else, rather than under /feathr_project
  2. Add a bit on the implementation notes for how this solution works in the README file (like we are only storing edges and nodes, and each nodes have different type, etc.)
  3. Add comments in the python files (especially the ones under registry folder) to make it easier to maintain

1 similar comment
@xiaoyongzhu
Copy link
Member

Can we do a few things here?

  1. Move this implementation to the project root or somewhere else, rather than under /feathr_project
  2. Add a bit on the implementation notes for how this solution works in the README file (like we are only storing edges and nodes, and each nodes have different type, etc.)
  3. Add comments in the python files (especially the ones under registry folder) to make it easier to maintain

@jainr jainr added the safe to test Tag to execute build pipeline for a PR from forked repo label Jun 10, 2022
@jainr jainr requested a review from Yuqing-cat June 10, 2022 20:56
raise ValueError("Too many nested levels")
if isinstance(d, str):
d = d[:100]
return re.sub(r'([A-Z]\w+$)', r'_\1', d).lower()

Check warning

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data

This [regular expression](1) that depends on [a user-provided value](2) may run slow on strings starting with 'A' and with many repetitions of 'A'. This [regular expression](1) that depends on [a user-provided value](3) may run slow on strings starting with 'A' and with many repetitions of 'A'. This [regular expression](1) that depends on [a user-provided value](4) may run slow on strings starting with 'A' and with many repetitions of 'A'. This [regular expression](1) that depends on [a user-provided value](5) may run slow on strings starting with 'A' and with many repetitions of 'A'. This [regular expression](1) that depends on [a user-provided value](6) may run slow on strings starting with 'A' and with many repetitions of 'A'.
@windoze windoze merged commit a2832d7 into main Jun 14, 2022
@windoze windoze deleted the windoze/registry branch June 14, 2022 15:23
bozhonghu pushed a commit that referenced this pull request Jun 15, 2022
* main:
  Fixing purview test issues and improve performance (#350)
  [feathr] Add product_recommendation advanced sample (#348)
  obejectId query cmd update (#360)
  add license, release, docs, python api ref badges with shields img (#357)
  quick fix the 404 not found in read me link (#355)
  Python SQL Registry (#311)
  enable JWT token param in frontend API calls (#337)
  Optimize environment variable behavior (#333)
  Adding better warning message to let user know that config file is missing and they need to set env parameters. (#347)
  Feature Monitoring (#330)
  Windoze/211 maven submission (#334)
  Windoze/211 maven submission (#334)
  Windoze/211 maven submission (#334)
  Fix Synapse quickstart link (#346)
  Show feature details when click feature in lineage graph (#339)
  Update pull_request_push_test.yml
  Update UI README for how to create overrides for local development (#335)
  Update databricks quick start experience (#217)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to test Tag to execute build pipeline for a PR from forked repo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants