Improve test seeding and robustness in test_numpy_interoperablity.py #18762
Merged
Conversation
Hey @DickJC123, thanks for submitting the PR.
CI supported jobs: [centos-gpu, windows-cpu, unix-gpu, centos-cpu, unix-cpu, sanity, edge, windows-gpu, miscellaneous, clang, website]
Tagging @reminisce and @szha (who introduced/modified test_np_array_function_protocol) for comment.
szha approved these changes on Jul 20, 2020
DickJC123 added a commit to DickJC123/mxnet that referenced this pull request on Sep 15, 2020
szha pushed a commit that referenced this pull request on Sep 17, 2020
…so test seeding (#18762). (#19148)

* Add sm arch 80 to Makefile
* Unittest tolerance handling improvements (#18694)
* Add sm arch 80 to Makefile
* Add TF32 to cuBLAS GEMMs Signed-off-by: Serge Panev <spanev@nvidia.com>
* Add CUDA version guards Signed-off-by: Serge Panev <spanev@nvidia.com>
* Remove useless TF32 for double and old CUDA version Signed-off-by: Serge Panev <spanev@nvidia.com>
* Factorize VERSION_ADJUSTED_TF32_MATH Signed-off-by: Serge Panev <spanev@nvidia.com>
* Add TF32 considerations to test_util.py:check_consistency()
* Bypass test_gluon_gpu.py:test_large_models if gmem >32GB
* Default tols in assert_almost_equal() now a function of dtype and ctx
* Expand types listed by default_tols()
* Fix pylint
* All with_seed() tests to waitall in teardown
* Elevate MXNET_TEST_SEED logging to WARNING
* Revert test_gluon_gpu.py:test_rnn_layer to default tols
* Fix test_gluon_model_zoo_gpu.py::test_inference and test_operator_gpy.py::test_np_linalg_{solve,tensorinv}
* test_numpy_interoperability.py to not fix seed for rest of CI
* Further fix to test_np_linalg_tensorinv
* Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system.
* Fix test_operator_gpu.py::test_embedding_with_type
* Fix test_operator_gpu.py::{test_*convolution_large_c,test_np_linalg_tensorsolve}
* Remove unneeded print() from test_numpy_interoperability.py
* Unify tol handling of check_consistency() and assert_almost_equal(). Test tweeks.
* Add tol handling of assert_almost_equal() with number args
* Add tol handling of bool comparisons
* Fix test_numpy_op.py::test_np_random_rayleigh
* Fix test_operator_gpu.py::test_batchnorm_with_type
* Fix test_gluon.py::test_sync_batchnorm in cpu selftest
* Improve unittest failure reporting
* Add to robustness of test_operator_gpu.py::test_embedding_with_type
* Check_consistency() to use equal backward gradients for increased test robustness
* Fix test_operator_gpu.py::test_{fully_connected,gemm}. Add default_numeric_eps().
* test_utils.py fix for numeric gradient calc
* Reinstate rtol=1e-2 for test_operator.py::test_order
* Remove auto-cast of check_consistency() input data to least precise dtype (not needed)
* Fix test_operator.py::test_{reciprocol,cbrt,rcbrt}_op
* Expand default float64 numeric_eps for test_operator_gpu.py::test_sofmin
* Fix segfault-on-error of @Retry decorator. Add test isolation.
* assert_almost_equal() to handle a,b scalars
* Fix test_operator_gpu.py::test_gluon_{mvn,mvn_v1} race
* Fix test_operator_gpu.py::test_flatten_slice_after_conv via scale
* Remove test_utils.py:almost_equal_ignore_nan()
* Fix sample vs. pop variance issue with test_numpy_op.py::test_npx_batch_norm
* Expose test_utils.py:effective_dtype() and use to fix test_operator_gpu.py::test_np_linalg_svd
* Fix true_divide int_array / int_scalar -> float_array to honor np_default_dtype
* Try test_elemwise_binary_ops serial to avoid pytest worker crash
* Fix (log_)softmax backward on empty ndarray
* Temporarily log all CI seeds to troubleshoot seed non-determinism
* Revert "Temporarily log all CI seeds to troubleshoot seed non-determinism" This reverts commit f60eff2.
* Temp log all CI seeds to troubleshoot unwanted seed determinism
* Revert "Add sm arch 80 to Makefile" This reverts commit f9306ce.
* Same fix of sample vs. pop variance issue, now with test_operator_gpu.py::test_batchnorm
* Revert "Temp log all CI seeds to troubleshoot unwanted seed determinism" This reverts commit ff328ef.
* Marking test_sparse_dot_grad with garbage_expected after teardown error
* Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_gluon_kl{_v1,}
* Temp skip of test_aggregate_duplication on gpu
* Add seeding to test_{numpy,}_contrib_gluon_data_vision.py. Make created files unique.
* Add ndarray module isolation to help debug test_bbox_augmenters worker crash
* Marking test_sparse_square_sum serial after pytest worker crash
* Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_half_cauchy{_v1,}

Co-authored-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>

* Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system.
* Remove pytest decorators introduced in error
* Fix test_forward.py:test_consistency
* Fix test_numpy_op.py tests
* Improve test seeding in test_numpy_interoperablity.py (#18762)
* Fix test_numpy_op.py:test_np_random_{beta,chisquare}
* Reduce problem sizes with test_optimizer.py:test_multilamb
* Skip test_gluon_gpu.py:test_fused_{lstm,gpu}_layer, fix test_rnn_cells, for fp16 contexts
* Trigger CI

Co-authored-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
Description
I recently ran into a CI failure in test_numpy_interoperability.py::test_np_array_function_protocol: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/PR-18694/14/pipeline. I was not able to reproduce the failure using the reported seed. I investigated why and am supplying this PR as a fix: reported seeds can now be used to reproduce failures. I was then able to use the new facility to determine which tests needed loosened tolerances for increased robustness, and have supplied those changes as well; I estimate the failure rate is now around 1 in 10,000.
For review purposes, the robustness of a test can be explored with:
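(A minimal sketch of the usual recipe, assuming the MXNET_TEST_COUNT environment variable honored by MXNet's with_seed() test decorator; the repeat count and test path shown are illustrative.)

```bash
# Repeat the test many times, each run picking a fresh random seed, to estimate its failure rate.
# MXNET_TEST_COUNT is the repeat count consumed by the with_seed() decorator; -s shows the logged seeds.
MXNET_TEST_COUNT=10000 pytest -s \
    tests/python/unittest/test_numpy_interoperability.py::test_np_array_function_protocol
```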
The issue with test_numpy_interoperability.py was that it created its test workload at file import time using unseeded random values. The fix regenerates the workload for each test at test runtime, in a manner that depends on the seed of that test.
The two tests that required loosened tolerances were linalg.tensorinv and linalg.solve. With the settings as I left them, I saw 1 failure in 10K trials. Rather than loosening the tolerances further, I will leave it to the code owners to diagnose the situation and propose a fix if they see fit. The tolerances could be loosened further, but other approaches could involve changing the scale or other properties of the input data. The remaining failure can (after the PR is merged) be reproduced with:
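(A sketch of the repro command, assuming the MXNET_TEST_SEED environment variable honored by with_seed(); <reported_seed> is a placeholder for the seed logged in the failing run, and the test path is illustrative.)

```bash
# Replay a specific failure by pinning the seed that with_seed() reported for the failing run.
MXNET_TEST_SEED=<reported_seed> pytest -s \
    tests/python/unittest/test_numpy_interoperability.py::test_np_array_function_protocol
```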
A curious property of the remaining failure is that so many of the values are consistently smaller than the golden copy by 1.9%:
[This PR may include additional fixes to other tests if I can't otherwise get a clean CI run.]
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments