[tests][dask] Create an informative categorical feature #4113
Conversation
… n_features for regression
overall, looks great! I totally support this change, and it's ok to not touch the ranker tests for now. I'm glad it allowed for some simplifications, and will enable us to go deeper on testing the statistical behavior of the Dask module (#3896 (comment)).
I looked at the logs to see if making the models more complex had a big impact on total runtime of the tests. It doesn't seem like it!
See the results below for the linux_regular job. A 20 second difference is not problematic given the run time of this project's CI jobs. Just comparing one job isn't a precise estimate, but it's enough to give me confidence that this change won't have a huge impact on the test runtime.
latest build on master (link)
===== 388 passed, 7 skipped, 2 xfailed, 357 warnings in 696.97s (0:11:36) ======
this PR (link)
===== 388 passed, 7 skipped, 2 xfailed, 357 warnings in 720.08s (0:12:00) ======
assert_eq(s1, s2, atol=.01)
assert_eq(s1, s1_local, atol=.003)
assert_eq(s1, s2, atol=.01)
assert_eq(s1, s1_local, atol=0.003)
when you tested, did you try running these tests multiple times in a row? I'd like to know how close these tolerances are to the maximum difference that you saw. It's good to push these tolerances down to catch big differences between distributed training and non-distributed training, but not so close to the edge that the tests might fail sometimes due to randomness in how the chunks are distributed across the workers.
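A minimal sketch of that "run it several times and look at the worst case" idea, with random values standing in for the distributed (s1) and local (s2) scores the real test computes:
import numpy as np

# Sketch only: the noisy values below stand in for the scores produced by
# distributed and non-distributed training; the loop records the worst-case
# difference, which is what a candidate atol needs headroom above.
rng = np.random.default_rng(0)
max_abs_diff = 0.0
for _ in range(20):
    s1 = 0.95 + rng.normal(scale=0.002)  # stand-in: distributed score
    s2 = 0.95 + rng.normal(scale=0.002)  # stand-in: local score
    max_abs_diff = max(max_abs_diff, abs(s1 - s2))

print(max_abs_diff)  # pick atol comfortably above this maximum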
For the test above, the maximum absolute difference I observed was 0.0025, so 0.01 should be fine. For this one, I believe we can even remove the atol, since this checks the score for the same model, doesn't it?
great, thank you. Yes, I think you're right. s1 and s1_local should be identical since they're from the same model.
I played around with the checks for the predictions and the target and found that for atol=50 we could go to rtol=0.5 (the maximum observed was 0.25). But maybe we could instead check p1 vs p2? They're very similar, so maybe that'd be a better check and could go very low in the tolerances as well. Also, I switched the order in the comparison for this one because assert_eq(a, b) checks abs(a - b) <= atol + rtol * abs(b), so I think it makes more sense that b = y in this case. What do you think?
ok I'm a little confused by your notation, sorry. Are you referring to this change specifically?
To be honest, I don't understand what you mean. Can you give a small reproducible example that shows assert_eq() passing or failing based only on the order that you pass things in?
To me, it looks like the checks in that function are symmetric: https://github.com/dask/dask/blob/913a192fd3b35696883bdd4c4eeba2418b946b3a/dask/dataframe/utils.py#L841-L847.
Haha, I'm sorry. Yes, I meant those checks exactly. The asymmetric part is in https://github.com/dask/dask/blob/913a192fd3b35696883bdd4c4eeba2418b946b3a/dask/dataframe/utils.py#L847, which calls np.allclose (and, through it, np.isclose). Here's an example:
import numpy as np
a = 100
b = 120
np.allclose(a, b, rtol=0.1, atol=9)  # |100 - 120| = 20 <= 9 + 0.1 * |120| = 21 -> True
np.allclose(b, a, rtol=0.1, atol=9)  # |120 - 100| = 20 >  9 + 0.1 * |100| = 19 -> False
Here's with assert_eq:
import numpy as np
from dask.array.utils import assert_eq
a = np.array([100])
b = np.array([120])
assert_eq(a, b, rtol=0.1, atol=9) # passes
assert_eq(b, a, rtol=0.1, atol=9) # fails
🤯 I had no idea that these were not symmetric! Thanks for the example. I see now that there's a note about this in the np.isclose() docs:
Unlike the built-in math.isclose, the above equation is not symmetric in a and b -- it assumes b is the reference value -- so that isclose(a, b) might be different from isclose(b, a).
Ok, since this is checking that we get a "good enough" model result, I agree with the change to treat y (the actual value of the target) as the "reference value". Thanks for the attention to detail.
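To make that docs note concrete, here is a small sketch reusing the numbers from the example above: math.isclose gives the same answer in either argument order, while np.isclose treats its second argument as the reference value.
import math
import numpy as np

a, b = 100.0, 120.0

# math.isclose compares |a - b| against max(rel_tol * max(|a|, |b|), abs_tol),
# so swapping the arguments cannot change the result
math.isclose(a, b, rel_tol=0.1, abs_tol=9)  # False: 20 > max(0.1 * 120, 9) = 12
math.isclose(b, a, rel_tol=0.1, abs_tol=9)  # False: same check, same result

# np.isclose uses atol + rtol * |b|, i.e. the second argument is the reference
np.isclose(a, b, rtol=0.1, atol=9)          # True:  20 <= 9 + 0.1 * 120 = 21
np.isclose(b, a, rtol=0.1, atol=9)          # False: 20 >  9 + 0.1 * 100 = 19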
also now I know @jmoralez and @jose-moralez are the same person 😛
haha, yes. This is the account I use at work. I usually contribute in my free time using my personal account, but feel free to ping me on this one during work hours.
if not output.startswith('dataframe'):
    assert_eq(s1, s2, atol=.01)
    assert_eq(s1, s1_local, atol=.003)
    assert_eq(s1, s2, atol=.01)
Suggested change:
assert_eq(s1, s2, atol=.01)
assert_eq(s1, s2, atol=0.01)
There's a failing test in linux regular. I've seen this happen a couple of times now: the environment finds conflicts and then gives up.
Looks great to me, thanks very much for the suggestions.
yeah, I've seen this a few times too :/ We try not to use pinned dependencies for Python packages in this project's CI, to be sure the project is tested against new releases. Sometimes this can result in failures like that where …
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
This modifies the dataframe-with-categorical output type to create only two categorical features and makes one of them informative for the task.
In order to have more consistent results I've modified the _create_data function to generate 1,000 samples instead of 100; the smaller size caused problems with categoricals and reduced the accuracy of distributed training. Also, for regression 100 features were being generated; I changed this to generate only 4, with 2 informative.
Adding more samples allows the tolerances in the array checks to be lower, but I had to increase the complexity of the models for test_regressor and test_classifier a bit.
I didn't change anything in _create_ranking_data because I haven't seen problems with those tests, but I could take a look as well.
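As a rough illustration only (this is not the actual _create_data helper from the LightGBM tests), a dataset along the lines described above could be built with scikit-learn's make_regression plus one binned informative categorical and one noise categorical:
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression

def make_example_data(n_samples=1_000, seed=42):
    # Hypothetical helper, mirroring the numbers described above:
    # 1,000 samples, 4 numeric features (2 informative), 2 categoricals.
    rng = np.random.default_rng(seed)
    # shuffle=False keeps the informative features in the first columns
    X, y = make_regression(
        n_samples=n_samples,
        n_features=4,
        n_informative=2,
        shuffle=False,
        random_state=seed,
    )
    df = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])
    # Informative categorical: quartile bins of a feature that drives y.
    df["cat_informative"] = pd.qcut(X[:, 0], q=4).codes
    # Uninformative categorical: pure noise.
    df["cat_noise"] = rng.integers(0, 4, size=n_samples)
    cat_cols = ["cat_informative", "cat_noise"]
    df[cat_cols] = df[cat_cols].astype("category")
    return df, y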