Restructured dev and test code #12

Merged (38 commits) on Dec 30, 2022
Commits
6f45085
Update README.md
PiyushGSlab Nov 28, 2022
3047cf4
Update README.md
PiyushGSlab Nov 29, 2022
8531a1d
updated pip package name
PiyushGSlab Nov 29, 2022
70a816b
Merge branch 'acryldata:main' into main
mardikark-gslab Dec 1, 2022
08984df
Removed unwanted CSV files
mardikark-gslab Dec 6, 2022
15d4043
Restructured unit testing file to load dataset from provided director…
mardikark-gslab Dec 6, 2022
52d3519
cosmetic changes
mardikark-gslab Dec 6, 2022
690f9bf
Refactored code to compute name, desc, dtype score into a single function
PiyushGSlab Dec 8, 2022
bc6e6e6
Added function annotations
PiyushGSlab Dec 8, 2022
2737897
added function annotations
PiyushGSlab Dec 8, 2022
c8fb546
added quick test functionality
PiyushGSlab Dec 8, 2022
1c046c5
Removed TODO comment
mardikark-gslab Dec 12, 2022
b9478bc
Removed restriction of loading only 1000 rows in test file
mardikark-gslab Dec 12, 2022
8819477
Renamed the test file
mardikark-gslab Dec 12, 2022
418a7f2
Merge branch 'main' into test_restructure
hsheth2 Dec 13, 2022
0694949
Updated function annotations (list and dict)
PiyushGSlab Dec 14, 2022
b2f0ceb
Merge branch 'test_restructure' of https://github.com/mardikark-gslab…
PiyushGSlab Dec 14, 2022
9b167bb
Updated function annotations and ran gradle sanity checks
PiyushGSlab Dec 14, 2022
062d3e5
Removed the quick test functionality. Separate script will be added l…
PiyushGSlab Dec 20, 2022
ac523d5
add Final qualifier to prevent mypy type checking errors
PiyushGSlab Dec 20, 2022
2f298e9
added a class DebugInfo
PiyushGSlab Dec 20, 2022
e4ab9b5
changed the debug_info from raw dict to TypedDict
PiyushGSlab Dec 20, 2022
3bc6f02
reduced the verbosity of logger messages (some logs moved to debug le…
PiyushGSlab Dec 20, 2022
e0c4866
added typing_extensions library to base requirements
PiyushGSlab Dec 20, 2022
76dcb44
removed the Final qualifier as it is not required any more for mypy t…
PiyushGSlab Dec 23, 2022
eb0125a
changed DebugInfo from TypedDict to dataclass
PiyushGSlab Dec 23, 2022
68c293d
some syntax changes as debug_info is now instance of dataclass and fi…
PiyushGSlab Dec 23, 2022
b19a365
fixed some incorrect function annotations
PiyushGSlab Dec 23, 2022
85736b5
fixed some incorrect function annotations
PiyushGSlab Dec 23, 2022
7228548
removed typing_extensions from base requirements as it is not require…
PiyushGSlab Dec 23, 2022
c22b8ab
class variables of DebugInfo assigned default value None
PiyushGSlab Dec 26, 2022
72cc38e
removed hasattr check
PiyushGSlab Dec 26, 2022
325d5d9
replaced debug_info NoneType check with prediction_factors_weights wi…
PiyushGSlab Dec 26, 2022
c619e6b
Modified the float comparison, also changed the DebugInfo instance va…
mardikark-gslab Dec 29, 2022
cc6d546
Removed unused import
mardikark-gslab Dec 29, 2022
d41a293
Removed cast operation
mardikark-gslab Dec 30, 2022
5b5c0cd
Update datahub-classify/src/datahub_classify/infotype_helper.py
hsheth2 Dec 30, 2022
0bcc6bd
Update datahub-classify/src/datahub_classify/infotype_helper.py
hsheth2 Dec 30, 2022
8 changes: 4 additions & 4 deletions datahub-classify/src/datahub_classify/helper_classes.py
@@ -33,7 +33,7 @@ class ColumnInfo:

 @dataclass
 class DebugInfo:
-    name: Union[str, float] = field(init=False)
-    description: Union[str, float] = field(init=False)
-    datatype: Union[str, float] = field(init=False)
-    values: Union[str, float] = field(init=False)
+    name: Optional[Union[str, float]] = None
+    description: Optional[Union[str, float]] = None
+    datatype: Optional[Union[str, float]] = None
+    values: Optional[Union[str, float]] = None
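
For context on why this change lets the `hasattr` checks be dropped elsewhere in the PR: a dataclass field declared with `field(init=False)` and no default is not set on new instances, whereas a `None` default always exists. The sketch below illustrates that behaviour with two hypothetical stand-in classes (`OldDebugInfo` and `NewDebugInfo` are not names from the repository, and only one field is shown).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OldDebugInfo:  # hypothetical stand-in for the pre-change DebugInfo
    # init=False with no default: the attribute is absent until assigned.
    name: float = field(init=False)


@dataclass
class NewDebugInfo:  # hypothetical stand-in for the post-change DebugInfo
    # A None default means every instance carries the attribute.
    name: Optional[float] = None


old, new = OldDebugInfo(), NewDebugInfo()
old_has_name = hasattr(old, "name")
new_has_name = hasattr(new, "name")
print(old_has_name, new_has_name, new.name)  # False True None
```

With the `None` default, callers can test `debug_info.name is not None` directly instead of guarding attribute access with `hasattr`.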
17 changes: 9 additions & 8 deletions datahub-classify/src/datahub_classify/infotype_helper.py
@@ -80,7 +80,7 @@ def compute_overall_confidence(debug_info: DebugInfo, config: Dict[str, Dict]) -
     }
     confidence_level = 0
     for key, value in vars(debug_info).items():
-        if type(value) != str:
+        if value and type(value) != str:
             confidence_level += prediction_factors_weights[key] * value
     confidence_level = np.round(confidence_level, 2)
     return confidence_level
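
The added `value and` guard matters because `DebugInfo` fields now default to `None`, and `None * weight` would raise a `TypeError`. A minimal, self-contained sketch of the weighted sum follows. The class `DebugInfoSketch`, the function `weighted_confidence`, and the weight values are all assumptions made for illustration, and the sketch uses the built-in `round` and an explicit `is not None` test rather than the PR's `np.round` and truthiness check.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DebugInfoSketch:  # hypothetical stand-in for DebugInfo
    name: Optional[float] = None
    description: Optional[float] = None
    datatype: Optional[float] = None
    values: Optional[float] = None


def weighted_confidence(info: DebugInfoSketch, weights: Dict[str, float]) -> float:
    """Sum weight * score over the factors that were actually computed."""
    total = 0.0
    for key, value in vars(info).items():
        # Skip factors that were never scored (None) and any string markers.
        if value is not None and not isinstance(value, str):
            total += weights[key] * value
    return round(total, 2)


# Assumed weights, chosen only for this example.
weights = {"name": 0.4, "description": 0.0, "datatype": 0.0, "values": 0.6}
score = weighted_confidence(DebugInfoSketch(name=1.0, values=0.5), weights)
print(score)
```

The truthiness check in the diff also skips a score of exactly `0`, which does not change the sum because that term would contribute nothing.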
@@ -185,10 +185,10 @@ def inspect_for_gender(

     try:
         if (
-            hasattr(debug_info, "name")
-            and int(debug_info.name) == 1
-            and hasattr(debug_info, "values")
-            and debug_info.values == 0
+            prediction_factors_weights.get(NAME, 0) > 0
+            and debug_info.name == 1.0

Review comment (Contributor): Comparing floats with == is unreliable.

Reply (Contributor): Modified the code so that it no longer uses "==" for float comparison. I have also changed the DebugInfo instance variables to type float instead of float and str, and added a couple of TODOs about adding warning/error messages to the error flag, which will be passed in the ColumnInfo object in a future PR.

+            and prediction_factors_weights.get(VALUES, 0) > 0
+            and debug_info.values == 0.0
         ):
             num_unique_values = len(np.unique(values))
             if num_unique_values < 5:
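
To illustrate the reviewer's point about `==` on floats: a score that is accumulated or averaged can land a hair away from `1.0`, so an exact comparison silently fails. The thread does not show which replacement the authors chose, so the following is only one common approach, using `math.isclose` from the standard library. The helper names `is_full_match` and `is_zero` and the tolerance are assumptions for this sketch.

```python
import math
from typing import Optional


def is_full_match(score: Optional[float], tol: float = 1e-9) -> bool:
    """True when the score is present and within tol of 1.0."""
    return score is not None and math.isclose(score, 1.0, abs_tol=tol)


def is_zero(score: Optional[float], tol: float = 1e-9) -> bool:
    """True when the score is present and within tol of 0.0."""
    # abs_tol is required here: a purely relative tolerance never matches 0.0.
    return score is not None and math.isclose(score, 0.0, abs_tol=tol)


accumulated = sum([0.1] * 10)  # floating-point sum, not exactly 1.0
exact_equal = accumulated == 1.0
tolerant_equal = is_full_match(accumulated)
print(exact_equal, tolerant_equal)  # False True
```

Both helpers also return `False` for `None`, which fits fields that default to `None` until a factor has been scored.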
@@ -348,11 +348,12 @@ def inspect_for_full_name(

     try:
         if (
-            hasattr(debug_info, "name")
-            and int(debug_info.name) == 1
-            and hasattr(debug_info, "values")
+            prediction_factors_weights.get(NAME, 0) > 0
+            and debug_info.name == 1.0
+            and prediction_factors_weights.get(VALUES, 0) > 0
             and 0.5 > cast(float, debug_info.values) > 0.1
         ):
+
             debug_info.values = 0.8
     except Exception as e:
         logger.error(f"Column {metadata.name} failed due to {e}")