Introduce inspect_level in inspector and metadata #113
Codecov Report
@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
+ Coverage   78.94%   79.13%   +0.19%
==========================================
  Files          64       64
  Lines        2764     2823      +59
==========================================
+ Hits         2182     2234      +52
- Misses        582      589       +7
I suggest limiting the priority to a range such as 1-100, where a larger priority means earlier execution. Our built-in inspectors should be spaced in tiers (10, 20, 30, ...) so that users can insert a custom inspector between them.
Nice suggestion!
I will plan it later.
On January 17, 2024 at 16:53, Zhongsheng Ji ***@***.***> wrote:
I suggest limiting the priority to a range such as (1-100), the larger the priority, the sooner the execution.
Our built-in inspector should be presented in a hierarchy (10,20,30...), so that users can insert a custom inspector from the middle.
Requesting some tests for inspect_level.
Co-authored-by: Zhongsheng Ji <9573586@qq.com>
It seems that all our package build tests are failing because of an SSL error from datahub.io, which breaks the unit tests that download the demo dataset.
Can a more stable data source be found? Or could hitz-ids provide a public download source, such as S3?
Kangda Wu will work on this
On January 19, 2024 at 23:42, Zhongsheng Ji ***@***.***> wrote:
Can a more stable data source be found? Or hitz-ids could provide a public download source similar to s3?
LGTM, just one more question: should we add some tests in test_metadata.py for column_inspect_level? We can leave that for the next PR.
Sure, currently some tests are in each inspector; we can add test cases in |
Description
Inspect level is a concept newly introduced in version 0.1.5.
A single column in a table may be marked by several inspectors at the same time, which can cause confusion.
Thus, an inspect_level is introduced and used when determining the specific type of a column.
Motivation and Context
A single column can be labeled by multiple inspectors.
For example, an email column may be recognized as an email column, as an id column, and as a discrete column at the same time, which causes confusion in subsequent processing.
You can also see this in our multi-table demo dataset: in the child table "train", the Date column is marked twice, as discrete type and as date type.
We will preset different inspect levels for different inspectors. Usually, more specific inspectors get higher levels, and general inspectors (like discrete) get a lower inspect_level. In the base class, inspect_level is set to 1.
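To make the idea concrete, here is a minimal sketch of how a level-based resolution could work. It is not the code from this PR: the function name resolve_column_types, the INSPECT_LEVELS table, and every level value other than the base default of 1 are assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical levels, chosen only for this example. The PR description
# states only that the base class uses inspect_level = 1 and that more
# specific inspectors get higher levels.
INSPECT_LEVELS: Dict[str, int] = {
    "discrete": 10,
    "datetime": 20,
    "id": 20,
    "email": 30,
}
DEFAULT_LEVEL = 1  # base-class default mentioned in the description


def resolve_column_types(
    marks: Dict[str, Set[str]], levels: Dict[str, int]
) -> Dict[str, str]:
    """Pick one type per column: the label with the highest inspect level."""
    resolved: Dict[str, str] = {}
    for column, labels in marks.items():
        # Sorting first makes ties deterministic: with equal levels,
        # the alphabetically first label wins.
        resolved[column] = max(
            sorted(labels), key=lambda label: levels.get(label, DEFAULT_LEVEL)
        )
    return resolved


# The "Date" column from the demo dataset is marked as both discrete and date.
marks = {
    "Date": {"discrete", "datetime"},
    "contact": {"email", "id", "discrete"},
}
print(resolve_column_types(marks, INSPECT_LEVELS))
# prints {'Date': 'datetime', 'contact': 'email'}
```

In this sketch, a label with no configured level falls back to the base value of 1, so a general-purpose inspector never overrides a more specific one.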
How has this been tested?
Types of changes
Checklist: