Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pretty-print the features produced by buildFeatures #214

Merged
merged 10 commits into from
May 17, 2022

Conversation

ahlag
Copy link
Contributor

@ahlag ahlag commented May 4, 2022

Description

  • Added pretty print flag to build_features

Resolves #163

How was this patch tested?

Ran pytest -s feathr_project/test/test_feature_materialization.py -k "test_build_feature_verbose" to check the pretty-printed features

Does this PR introduce any user-facing changes?

Introduces verbose argument to pretty print features

ahlag added 2 commits May 4, 2022 11:12
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
@hangfei hangfei added the safe to test Tag to execute build pipeline for a PR from forked repo label May 4, 2022
@hangfei
Copy link
Collaborator

hangfei commented May 5, 2022

For pretty print, let's use a util so the logics can be unified and reused.

I am thinking about this format:

These are the features you are going to register/build/materialize:
feature1, feature2, and feature3.

Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
@ahlag
Copy link
Contributor Author

ahlag commented May 5, 2022

@hangfei
Okay sure. Where should I place the util code?

I am receiving the following errors from the CI/CD. Do I need to set Databricks Token on my laptop?

  1. Connection error to Databricks
Run # overwrite corresponding environment variables to utilize feathr to upload the files
  # overwrite corresponding environment variables to utilize feathr to upload the files
  # assuming there will be only one jar in the target folder
  databricks fs cp target/scala-[2](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:2).12/feathr-assembly-0.1.0.jar dbfs:/feathr_jar_github_action_06_42_58/feathr-assembly-0.1.0.jar --overwrite
  shell: /usr/bin/bash -e {0}
  env:
    JAVA_HOME: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/8.0.[3](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:3)22-6/x6[4](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:4)
    CI_SPARK_REMOTE_JAR_FOLDER: feathr_jar_github_action_06_42_[5](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:5)8
    FEATHR_LOCAL_JAR_NAME: feathr-assembly-0.1.0.jar
    FEATHR_LOCAL_JAR_FULL_NAME_PATH: target/scala-2.12/feathr-assembly-0.1.0.jar
    pythonLocation: /opt/hostedtoolcache/Python/3.8.12/x[6](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:6)4
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.[8](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:8).[12](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:12)/x64/lib
    DATABRICKS_HOST: 
    DATABRICKS_TOKEN: 
Error: IndexError: string index out of range
Error: Process completed with exit code 1.
  1. Connection error to Azure Blob Storage
Run fixpoint/azblob-upload-artifact@v4
Error: Unable to extract accountName with provided information.
Error: Unable to extract accountName with provided information.

@hangfei
Copy link
Collaborator

hangfei commented May 5, 2022

may not be critical. just re-ran the tests to see if it passes

@hangfei
Copy link
Collaborator

hangfei commented May 5, 2022

For util, let's create a new py file in the same direcotry.

feathr_project/test/test_feature_materialization.py Outdated Show resolved Hide resolved
feathr_project/feathr/client.py Outdated Show resolved Hide resolved
@hangfei
Copy link
Collaborator

hangfei commented May 6, 2022

As long as your local test that you added passes, it should be fine. There is a crendential issue on a forked PR(we are fixing it and hopefully it can be fixed in 1-2 weeks).

Since you are just adding some pretty-print, i don't think it will break the e2e tests.

I will ran the e2e tests after it's merged.

@ahlag
Copy link
Contributor Author

ahlag commented May 6, 2022

@hangfei
Ok sure!

@ahlag ahlag changed the title [WIP] Pretty-print the features produced by buildFeatures Pretty-print the features produced by buildFeatures May 8, 2022
@ahlag
Copy link
Contributor Author

ahlag commented May 8, 2022

@hangfei
This PR is up for review.

I tested the verbose flag with the following command:

pytest -s feathr_project/test/test_feature_materialization.py -k "test_build_feature_verbose"                                                                         ─╯
=========================================================================== test session starts ===========================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 5 items / 4 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-08 14:15:15.225 | INFO     | feathr._envvariableutil:get_environment_variable:64 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables.
2022-05-08 14:15:15.233 | INFO     | feathr._envvariableutil:get_environment_variable:64 - REDIS_PASSWORD is not set in the environment variables.
("request_features is the achor feature of ['trip_distance', "
 "'f_is_long_trip_distance', 'f_day_of_week']>")
"user_embemdding_derived is the derived feature of ['user_embedding']>"
.

@ahlag ahlag force-pushed the build-feature-pretty-print branch 4 times, most recently from 2236307 to 729013d Compare May 8, 2022 08:24
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
@ahlag ahlag force-pushed the build-feature-pretty-print branch from 729013d to 88b4549 Compare May 8, 2022 08:33
feathr_project/test/test_feature_materialization.py Outdated Show resolved Hide resolved
feathr_project/feathr/client.py Outdated Show resolved Hide resolved
ahlag added 2 commits May 11, 2022 14:00
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
@ahlag ahlag force-pushed the build-feature-pretty-print branch from 743a3e9 to cb6f14b Compare May 13, 2022 15:38
ahlag added 2 commits May 16, 2022 23:56
Fixed conflict
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
@ahlag ahlag force-pushed the build-feature-pretty-print branch from c42b064 to 95a6a96 Compare May 16, 2022 15:14
@ahlag
Copy link
Contributor Author

ahlag commented May 16, 2022

@hangfei
I have finished implementing the pretty-prints. Could you take a look? test_get_offline_features_verbose and test_materialize_features_verbose fail because it requires REDIS and KAFKA settings. The pretty-print works without a problem.

❯ pytest -s feathr_project/test/test_feature_materialization.py -k "test_build_feature_verbose"                                                               
====================================================================== test session starts =======================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 7 items / 6 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-17 00:00:06.106 | INFO     | feathr._envvariableutil:get_environment_variable:66 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables, fetching the value from Key Vault
2022-05-17 00:00:06.115 | INFO     | feathr._envvariableutil:get_environment_variable:66 - REDIS_PASSWORD is not set in the environment variables, fetching the value from Key Vault
("request_features is the achor of ['trip_distance', "
 "'f_is_long_trip_distance', 'f_day_of_week']")
❯ pytest -s feathr_project/test/test_feature_materialization.py -k "test_get_offline_features_verbose"                                                         
====================================================================== test session starts =======================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 7 items / 6 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-16 23:59:32.888 | INFO     | feathr._envvariableutil:get_environment_variable:66 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables, fetching the value from Key Vault
2022-05-16 23:59:32.896 | INFO     | feathr._envvariableutil:get_environment_variable:66 - REDIS_PASSWORD is not set in the environment variables, fetching the value from Key Vault
Features in feature_query: ['f_location_avg_fare']
❯ pytest -s feathr_project/test/test_feature_materialization.py -k "test_materialize_features_verbose"                                                                                           
======================================================================================= test session starts ========================================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 7 items / 6 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-17 00:13:00.655 | INFO     | feathr._envvariableutil:get_environment_variable:66 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables, fetching the value from Key Vault
2022-05-17 00:13:00.664 | INFO     | feathr._envvariableutil:get_environment_variable:66 - REDIS_PASSWORD is not set in the environment variables, fetching the value from Key Vault
Yes
Materialization features in settings: ['f_location_avg_fare', 'f_location_max_fare']
2022-05-17 00:13:00.746 | INFO     | feathr._envvariableutil:get_environment_variable:66 - KAFKA_SASL_JAAS_CONFIG is not set in the environment variables, fetching the value from Key Vault

ahlag added 2 commits May 17, 2022 00:18
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
@hangfei
Copy link
Collaborator

hangfei commented May 17, 2022

This looks great! Very impressive test coverage!

@xiaoyongzhu xiaoyongzhu merged commit 1ab7ea2 into feathr-ai:main May 17, 2022
@xiaoyongzhu
Copy link
Member

Thanks for the contribution @ahlag !

@chinmaytredence
Copy link
Contributor

@ahlag Should we put the utils.py inside the feathr directory

@ahlag ahlag deleted the build-feature-pretty-print branch May 18, 2022 03:13
@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence (cc: @hangfei )
Will the pretty-prints be used elsewhere? (Just confirming)

@chinmaytredence
Copy link
Contributor

@ahlag this is breaking the uvicorn run when we add this feathr project into requirements.txt
Screenshot 2022-05-18 at 9 16 17 AM
t

@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence
Could you share the steps you used to reproduce this? I will take a look into this.

@chinmaytredence
Copy link
Contributor

chinmaytredence commented May 18, 2022

@ahlag you can start a flask/fastapi project and add this "git+https://github.com/linkedin/feathr.git@main#subdirectory=feathr_project"
into the requirements.txt to use this as a library.
If the utils.py is not inside the "feathr" directory then it will not be downloaded into the dependencies. Thus it breaks the running of application.

@chinmaytredence
Copy link
Contributor

Screenshot 2022-05-18 at 9 22 17 AM

Screenshot 2022-05-18 at 9 22 50 AM

@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence
Ok. I will create a quick fix and then tag the maintainers and you for a review. Thank you!

@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence @hangfei
I have created a fix here. Could you review it? I have tested it locally.
#273

@chinmaytredence
Copy link
Contributor

@ahlag validated it. Working now. Thanks for the fix.

bozhonghu pushed a commit that referenced this pull request Jun 1, 2022
* main: (30 commits)
  Yihui/moderate registration conflict (#304)
  Update homepage (#310)
  Add extensible extractor APIs (#302)
  Remove Java and JS from Code Scanning
  Create codeql-analysis.yml
  [feathr] Add API to materialize features to offline store (#294)
  Improve error message when path is not supported (#257)
  Add tech talk slides for Feathr (#296)
  Update README.md
  Add milestone link (#286)
  Fix millisecond timestamp handling (#288)
  Consolidating CI pipelines (#280)
  Fixed dependecy problem of pretty print utils (#273)
  Fixing a broken link in README.md (#277)
  Fix test failure (#276)
  Added feature validation (#258)
  Feathr UI: Display feature key and transform expression in feature detail pages (#262)
  Feathr UI: enable multiple tenant auth (#266)
  Reduce feathr web api docker image build time (#261)
  Pretty-print the features produced by buildFeatures  (#214)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to test Tag to execute build pipeline for a PR from forked repo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pretty-print the features produced by buildFeatures and submit job
5 participants