Feature Monitoring #330

Merged
merged 15 commits into from
Jun 13, 2022

hangfei (Collaborator) commented Jun 8, 2022

Feature monitoring for scalar features.

SQL schema:

For numeric features:

feature_name | feature_type | mean | median | max | min | coverage
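A minimal pure-Python sketch of one row of the numeric schema. It assumes each feature column arrives as a list in which `None` marks a missing value, and that missing values are skipped when computing mean, median, max, and min. `numeric_stats` is a hypothetical helper for illustration, not the code this PR adds.

```python
import statistics
from typing import Optional, Sequence


def numeric_stats(values: Sequence[Optional[float]]) -> dict:
    """Summarize one numeric feature column (hypothetical helper; None = missing)."""
    total = len(values)
    present = [v for v in values if v is not None]
    # coverage = (total_count - missing_count) / total_count
    coverage = len(present) / total if total else 0.0
    if not present:
        # An all-missing column has no mean/median/max/min, only a coverage.
        return {"mean": None, "median": None, "max": None, "min": None,
                "coverage": coverage}
    return {
        "mean": sum(present) / len(present),
        "median": statistics.median(present),
        "max": max(present),
        "min": min(present),
        "coverage": coverage,
    }


print(numeric_stats([1, 0, 1, 0, 0, 0, 0, 0, 1, 1]))
print(numeric_stats([None, None, None]))
```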

For string/boolean features:

feature_name | feature_type | cardinality | coverage

coverage = (total_count - missing_count) / total_count
cardinality = count of distinct items. For example, ["apple", "apple", "orange"] => cardinality is 2 (apple and orange)
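The two definitions can be sketched in Python as below. Both helpers and the `count_missing` flag are hypothetical; the definition above does not say whether a missing value counts toward cardinality, so the flag makes that choice explicit (it defaults to ignoring missing values, which matches the apple/orange example).

```python
from typing import Hashable, Optional, Sequence


def coverage(values: Sequence[Optional[Hashable]]) -> float:
    """(total_count - missing_count) / total_count, treating None as missing."""
    total = len(values)
    missing = sum(1 for v in values if v is None)
    return (total - missing) / total if total else 0.0


def cardinality(values: Sequence[Optional[Hashable]], count_missing: bool = False) -> int:
    """Number of distinct items; optionally count None as one more distinct item."""
    distinct = set(values)
    if not count_missing:
        distinct.discard(None)
    return len(distinct)


print(cardinality(["apple", "apple", "orange"]))      # 2: apple and orange
print(coverage(["apple", None, "orange", "orange"]))  # 0.75
```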

For example, given this input feature table:

+----+------+--------+-----+--------------------+---------+
|key0|f_null|f_string|f_int|            f_double|f_boolean|
+----+------+--------+-----+--------------------+---------+
|   1|  null|   apple|    1|  0.7191070126381514|     true|
|   2|  null|   apple|    0|   0.631537113156793|    false|
|   5|  null|  orange|    1|  0.8990245049992188|    false|
|   3|  null|    null|    0|  0.5762385339376767|    false|
|   4|  null|  orange|    0|  0.4366523814297121|    false|
|   8|  null|  orange|    0| 0.14328502026059808|    false|
|  10|  null|  orange|    0|  0.9389259655580797|     true|
|   6|  null|  orange|    0|  0.7877756383579947|    false|
|   7|  null|  orange|    1|0.056606459160327804|    false|
|   9|  null|  orange|    1|  0.6003317209869276|    false|
+----+------+--------+-----+--------------------+---------+

Output:

+------------+------------+----------+----+----+----+----+--------+
|feature_name|feature_type|      date|mean| avg| min| max|coverage|
+------------+------------+----------+----+----+----+----+--------+
|      f_null|      double|2022-06-07|null|null|null|null|     0.0|
+------------+------------+----------+----+----+----+----+--------+

+------------+------------+----------+----+---+---+---+--------+
|feature_name|feature_type|      date|mean|avg|min|max|coverage|
+------------+------------+----------+----+---+---+---+--------+
|       f_int|     integer|2022-06-07| 0.4|0.4|  0|  1|     1.0|
+------------+------------+----------+----+---+---+---+--------+
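As a quick check of the f_null and f_int rows, the sketch below recomputes mean, min, max, and coverage in plain Python from the input table's columns (copied in row order, with `None` for null). The helper `summarize` is hypothetical. An all-null column yields null statistics with coverage 0.0, and f_int has four 1s in ten rows, hence mean 0.4.

```python
from typing import Optional, Sequence

# Columns copied from the input feature table above, in row order (None = null).
f_null = [None] * 10
f_int = [1, 0, 1, 0, 0, 0, 0, 0, 1, 1]


def summarize(values: Sequence[Optional[float]]) -> dict:
    """Hypothetical helper: mean/min/max over non-null values, plus coverage."""
    present = [v for v in values if v is not None]
    coverage = len(present) / len(values)
    if not present:
        return {"mean": None, "min": None, "max": None, "coverage": coverage}
    return {"mean": sum(present) / len(present), "min": min(present),
            "max": max(present), "coverage": coverage}


print("f_null", summarize(f_null))
print("f_int", summarize(f_int))
```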

+------------+------------+----------+------------------+------------------+--------------------+------------------+--------+
|feature_name|feature_type|      date|              mean|               avg|                 min|               max|coverage|
+------------+------------+----------+------------------+------------------+--------------------+------------------+--------+
|    f_double|      double|2022-06-07|0.5789484350485481|0.5789484350485481|0.056606459160327804|0.9389259655580797|     1.0|
+------------+------------+----------+------------------+------------------+--------------------+------------------+--------+

+------------+------------+----------+-----+------+--------+-----------+
|feature_name|feature_type|      date|  min|   max|coverage|cardinality|
+------------+------------+----------+-----+------+--------+-----------+
|    f_string|      string|2022-06-09|apple|orange|     0.9|          3|
+------------+------------+----------+-----+------+--------+-----------+

+------------+------------+----------+-----+----+--------+-----------+
|feature_name|feature_type|      date|  min| max|coverage|cardinality|
+------------+------------+----------+-----+----+--------+-----------+
|   f_boolean|     boolean|2022-06-09|false|true|     1.0|          2|
+------------+------------+----------+-----+----+--------+-----------+
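The string and boolean rows can be checked the same way. In this hypothetical sketch, min and max follow Python's ordering (lexicographic for strings, False < True for booleans), and cardinality counts null as its own distinct value. That is one reading consistent with the cardinality of 3 reported for f_string, whose values are apple, orange, and null; excluding null would give 2.

```python
from typing import Optional, Sequence

# f_string and f_boolean columns from the input feature table above (None = null).
f_string = ["apple", "apple", "orange", None, "orange",
            "orange", "orange", "orange", "orange", "orange"]
f_boolean = [True, False, False, False, False, False, True, False, False, False]


def categorical_stats(values: Sequence[Optional[object]]) -> dict:
    """Hypothetical helper for string/boolean features."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),  # lexicographic for strings, False < True for booleans
        "max": max(present),
        "coverage": len(present) / len(values),
        # Assumption: null counts as a distinct value, which reproduces 3 for f_string.
        "cardinality": len(set(values)),
    }


print("f_string", categorical_stats(f_string))
print("f_boolean", categorical_stats(f_boolean))
```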

hangfei changed the title from Monitor to Feature Monitoring on Jun 8, 2022
xiaoyongzhu previously approved these changes on Jun 12, 2022
hangfei merged commit d18294c into main on Jun 13, 2022
bozhonghu pushed a commit that referenced this pull request Jun 15, 2022
* main:
  Fixing purview test issues and improve performance (#350)
  [feathr] Add product_recommendation advanced sample (#348)
  obejectId query cmd update (#360)
  add license, release, docs, python api ref badges with shields img (#357)
  quick fix the 404 not found in read me link (#355)
  Python SQL Registry (#311)
  enable JWT token param in frontend API calls (#337)
  Optimize environment variable behavior (#333)
  Adding better warning message to let user know that config file is missing and they need to set env parameters. (#347)
  Feature Monitoring (#330)
  Windoze/211 maven submission (#334)
  Windoze/211 maven submission (#334)
  Windoze/211 maven submission (#334)
  Fix Synapse quickstart link (#346)
  Show feature details when click feature in lineage graph (#339)
  Update pull_request_push_test.yml
  Update UI README for how to create overrides for local development (#335)
  Update databricks quick start experience (#217)
hangfei deleted the monitor branch on July 29, 2022 18:34