Variable types #10 #12

j-grover · 2019-10-18T00:56:24Z

Adding variable types as parameters to auto_entityset, make_entityset
Test in tests/test_normalize
Updated README
Resolves Variable types not preserved after call to normalize_entity() #10

CLAassistant · 2019-10-18T00:56:57Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

thehomebrewnerd · 2020-03-12T14:36:09Z

Generally I think this PR looks pretty good, but I have a question about how we want this to behave for the ID columns. If you look at the original entityset (entityset) generated in test_variable_types before normalizing, the customer_id is of type Numeric. However, if you look at this same column of the session_id entity in the normalized entityset, customer_id is now of type Id.

@rwedge Is this behavior acceptable? I know we will have some differences in variable types where a column in the original non-normalized entityset gets set as an index in a normalized entity, but I wasn't sure how we wanted non-index columns to be treated throughout.

We should probably also update the test to check all of the columns we expect to keep the same variable types throughout, instead of just the single zip_code column.

rwedge · 2020-03-13T16:45:15Z

@thehomebrewnerd changing the variable we normalized on to Id type is standard featuretools behavior. I think the logic is that we now know this variable can be treated as a foreign key, so we label it as a special categorical type, Id.

I agree the test should cover all of the columns, and I think we should also test that make_entityset and auto_entityset use variable_types as expected.

rwedge · 2020-03-13T18:57:36Z

@j-grover sorry for the delay on the review. Are you interested in updating the tests?

j-grover · 2020-03-15T05:52:59Z

@rwedge Updated test to check all columns

thehomebrewnerd · 2020-03-16T15:59:10Z

@j-grover Thanks for the quick response and updates to the test. Since we have now modified the parameters for the auto_entityset and make_entityset to add the optional variable_types parameter, it would also be good to add specific tests for these two functions to make sure they behave properly and return the expected result when using this optional parameter.

Is that something you would be able to do as well?

j-grover · 2020-03-22T04:11:33Z

What sort of things are we looking to test for these two methods? For example, one case with default values and one with custom args?

thehomebrewnerd · 2020-03-23T12:53:52Z

Yes, that is what I was thinking...tests very similar to the test you added for normalize_entity, just to make sure those methods return the expected results when passing a variable_types parameter. Since these methods are part of the public API, it would be good for test coverage to have specific tests that cover their possible use cases as well.

j-grover · 2020-04-13T04:57:29Z

@thehomebrewnerd
I've added the initial tests for make_entityset and auto_entityset. There is overlap between the tests as auto_entityset makes use of make_entityset internally. When testing auto_entityset, I've found that for entity 0 (refer to line 330 in test_normalize.py) the name changes between 'jersey_num_team' and 'team_jersey_num' for different runs. This causes the following to fail occasionally:
assert normalized_entityset.entities[0].variable_types['jersey_num_team'] == Index
How would you go about testing this?

thehomebrewnerd · 2020-04-13T13:40:54Z

@j-grover Thanks for creating these tests. If we expect this behavior, where the name is not deterministic, we could set a variable for the name that is actually returned and then use that variable in the tests. One way that comes to mind would be to check whether jersey_num_team is in the variable_types dictionary keys and, if it is not, use the other option, team_jersey_num. Something like this:

# Index name is not always the same - checking what was returned
index_vname = "jersey_num_team"
if index_vname not in normalized_entityset.entities[0].variable_types.keys():
    index_vname = "team_jersey_num"
assert normalized_entityset.entities[0].variable_types[index_vname] == Index

I think you would also need to rename one of the dataframe columns for the df.equals test to pass reliably as well.
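
A minimal sketch of that renaming step, assuming pandas and the two candidate names mentioned in this thread. The helper name `with_canonical_index_name` and the sample frames are hypothetical and are not part of autonormalize:

```python
import pandas as pd


# Hypothetical helper for illustration; not part of autonormalize.
def with_canonical_index_name(df, canonical="jersey_num_team",
                              alternate="team_jersey_num"):
    """Return df with the alternate index column renamed to the canonical name."""
    if alternate in df.columns:
        return df.rename(columns={alternate: canonical})
    return df


# Made-up data: the same values, but the index column came back under the other name.
expected = pd.DataFrame({"jersey_num_team": [0, 1, 2], "city": ["A", "B", "C"]})
returned = pd.DataFrame({"team_jersey_num": [0, 1, 2], "city": ["A", "B", "C"]})

# DataFrame.equals also compares column labels, so rename before comparing.
print(with_canonical_index_name(returned).equals(expected))
```

Renaming the returned frame, rather than the expected one, keeps the expected values in the test fixed regardless of which name comes back.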

I don't know enough about the details of this code to know whether we should expect to get the same name back every time, but I can look into that a bit more in the meantime to make sure this isn't highlighting some other issue.

rwedge · 2020-04-14T16:39:43Z

autonormalize/tests/test_normalize.py

+
+
+def test_auto_entityset_custom_args():
+    dic = {'team': ['Red', 'Red', 'Red', 'Orange', 'Orange', 'Yellow',


These dictionaries are reused in several tests, creating many duplicated lines of code. Could you turn them into pytest fixtures so they're only defined once?

thehomebrewnerd · 2020-04-17T18:37:25Z

@j-grover After reviewing the code with @rwedge, the non-deterministic nature of column names is to be expected as the new names are created by joining an unsorted list of column names. Issue #24 was created to fix this problem, so for this PR I suggest we go ahead and implement the tests as we have described above, and then we can update later after issue #24 is closed.
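
To illustrate why the join order matters, here is a small sketch. The function name `make_index_name` is hypothetical and this is not autonormalize's actual code; it only shows how sorting before joining makes the result independent of input order:

```python
def make_index_name(columns):
    """Build a new index column name that does not depend on input order."""
    return "_".join(sorted(columns))


# Joining without sorting follows whatever order the columns arrive in.
print("_".join(["team", "jersey_num"]))          # team_jersey_num
print("_".join(["jersey_num", "team"]))          # jersey_num_team

# Sorting first gives the same name for any ordering, including sets.
print(make_index_name(["team", "jersey_num"]))   # jersey_num_team
print(make_index_name({"jersey_num", "team"}))   # jersey_num_team
```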

j-grover · 2020-04-18T05:18:38Z

Sorting the list sounds good. I have changed the index names accordingly to jersey_num_team.
@rwedge I've added a pytest fixture for that particular example.

j-grover added 2 commits October 18, 2019 12:42

Adding variable types to autonormalize

561b33f

Update README.md

05ab955

j-grover force-pushed the variable-types-#10 branch from e3aff39 to 05ab955 Compare October 18, 2019 01:45

j-grover force-pushed the variable-types-#10 branch 2 times, most recently from 361a774 to 8cc177c Compare March 15, 2020 05:47

Testing variable types of all columns in normalized entityset

db8257a

j-grover force-pushed the variable-types-#10 branch from 8cc177c to db8257a Compare March 15, 2020 05:50

Initial tests for make_entityset and auto_entityset

46c2a19

j-grover force-pushed the variable-types-#10 branch from 7060663 to 46c2a19 Compare April 13, 2020 04:49

rwedge reviewed Apr 14, 2020

thehomebrewnerd mentioned this pull request Apr 17, 2020

New index columns are not deterministic #24

Closed

j-grover added 3 commits April 18, 2020 14:59

Adding pytest fixture for teams example

5a66f94

Changing new index names to be alphabetical for teams example

0d96a1b

Merge branch 'master' into variable-types-alteryx#10

72c82fd

j-grover force-pushed the variable-types-#10 branch from e0e8de0 to 72c82fd Compare April 18, 2020 05:15

Base automatically changed from master to main February 19, 2021 19:30
