Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Labeler Compiler Diff #336

Merged
merged 6 commits into from
Jul 16, 2021

Conversation

grant-eden
Copy link
Contributor

Pretty straightforward.
Might be strange that we have a for loop for one profile, but in the future that may change and it is consistent with the profile property.
Took a page out of Andrew's book for mocking.

'c': 0.84
}
},
'data_label': ['a', 'b']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this data_label different than the one in statistics? Also, in profile do we have data_label at both levels? Should this instead pop it from the statistics level and reinsert it at the top level?

Copy link
Contributor Author

@grant-eden grant-eden Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, diff in the datalabeler profile is not supposed to return data label

@@ -233,6 +233,7 @@ def profile(self):
Property for profile. Returns the profile of the column.
"""
profile = {
"data_label": self.data_label,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGSweets @AnhTruong I've made a small refactor to put the data_label in the profile instead of having it extracted from the compiler level. Was there a specific reason the data_label was left out of the profile? Is this refactor inappropriate?

Copy link
Contributor

@JGSweets JGSweets Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is appropriate.

The update above: profile["data_label"] = col_profile.pop("data_label") completes the refactor.

JGSweets
JGSweets previously approved these changes Jul 15, 2021
Copy link
Contributor

@AnhTruong AnhTruong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

# Test disabling both datalabeler profiles for compiler diff
compiler2 = col_pro_compilers.ColumnDataLabelerCompiler(data, options)
expected_diff = {}
self.assertDictEqual(expected_diff, compiler1.diff(compiler2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test with one profile is empty, then the result should be the other profile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result wont be the other profile, it would be empty, since you cant have a difference unless you have 2 profiles to compare

@JGSweets JGSweets merged commit 3aee648 into capitalone:main Jul 16, 2021
az85252 pushed a commit to az85252/DataProfiler that referenced this pull request Jul 19, 2021
JGSweets pushed a commit that referenced this pull request Jul 20, 2021
* implementing ability for null option

* Woah. NICE. (#336)

* revised unalikeability functionality

* added test cases for revised unalikeability functionality

* update to 0.6.1

* added ability for null values to be set by user choice

* fixed test cases for test_profiler_options.py and test_profiler_class_options.py relating to null_values

* fixed functionality for null_values

* fixed functionality and documentation for null_values; added test case to cover more types

* fixed syntax for null_values

* fixed syntax for profiler_options.py

* fixed syntax for profiler_options.py

* fixed syntax for test cases

* fixed syntax for test cases

* fixed syntax for test cases

* fixed syntax

* fixed syntax

Co-authored-by: Grant <56846128+gme5078@users.noreply.github.com>
stevensecreti pushed a commit to stevensecreti/DataProfiler that referenced this pull request Jun 15, 2022
stevensecreti pushed a commit to stevensecreti/DataProfiler that referenced this pull request Jun 15, 2022
* implementing ability for null option

* Woah. NICE. (capitalone#336)

* revised unalikeability functionality

* added test cases for revised unalikeability functionality

* update to 0.6.1

* added ability for null values to be set by user choice

* fixed test cases for test_profiler_options.py and test_profiler_class_options.py relating to null_values

* fixed functionality for null_values

* fixed functionality and documentation for null_values; added test case to cover more types

* fixed syntax for null_values

* fixed syntax for profiler_options.py

* fixed syntax for profiler_options.py

* fixed syntax for test cases

* fixed syntax for test cases

* fixed syntax for test cases

* fixed syntax

* fixed syntax

Co-authored-by: Grant <56846128+gme5078@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants