skip quantized conv flaky case #16866

xinyu-intel · 2019-11-20T06:57:55Z

Description

temporary skip this case.
@pengzhao-intel

#16830

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

ChaiBapchya · 2019-11-27T00:48:49Z

Are you trying to use this PR for testing the issue? If that's the case it doesn't make sense coz its a costly experiment in that case.

xinyu-intel · 2019-11-27T00:59:12Z

Some flaky tests always fail in CI but cannot reproduce easily from local even with docker env on EC2. So debug with the CI environment can be more efficiency to find the root cause.

access2rohit · 2019-11-27T18:48:35Z

@xinyu-intel this is not a proper use of public resources. If you have issues with reproducing your issues on local then you should ask on dev@ channel thats a good place to start. Using CI for debugging causes a fresh run of all 11 test suits on your PR each time you push new changes. A single run of all 11 tests costs approximately 200$. Please be mindful of that.

larroy · 2019-11-27T19:02:00Z

@xinyu-intel you can use the ci/build.py tool to easily reproduce what runs on CI. it's dockerized so it's going to produce the same result with very high probability: https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results#Reproducingtestresults-2.1.Build

xinyu-intel · 2019-11-28T00:37:51Z

okay, I'll ask in dev list next time.

xinyu-intel · 2019-11-28T00:45:09Z

@larroy I have already used this script and it is very easy for use. However, in contrast with "very high probability", this case also cannot be reproduced withci\docker.py scripts on c5.18xlarge:)
I'll skip this case to save the cost. But you should fix the https error first:(http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16866/11/pipeline

pengzhao-intel · 2019-11-28T02:01:53Z

Let's skip this test first :)

pengzhao-intel

LGTM

TaoLv · 2019-11-28T03:18:34Z

@ChaiBapchya @access2rohit @larroy Really appreciate your suggestions and frugal spirit for MXNet development. I think this PR is not the first time for contributors to use CI for debugging purpose, and most likely it will not be the last one. I just took two simple filters on the PR list:
https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+in%3Atitle+%22do+not+merge%22
https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+in%3Atitle+%22debug%22
@access2rohit Can you explain your PRs at the top of the filtered list? If you really worry about the usage of CI resources, could you please bring this topic to @dev to forbid this behavior and note it somewhere if you get a consensus there?

I believe @xinyu-intel has tried his best to reproduce this issue on both local machine and also on EC2 instances to better emulate the CI environment. But unfortunately, he cannot reproduce the error and I think it's not the first time that we cannot reproduce an error out of CI.

Another choice is maybe someone can contribute to the CI system so developers can only trigger just one test suits rather than all of the 11 tests suits. It will be very helpful especially to document contributors.

access2rohit · 2019-11-28T08:07:03Z

@TaoLv I was already pointed out by @szha about the consequences of opening such PRs already and I closed those PRs. Also, if I had done that before doesn't mean I cannot suggest other developers not to do the same mistake as I did. Yes, if such a behaviour becomes widespread I will start a thread on @dev list.

TaoLv · 2019-11-28T15:04:24Z

@access2rohit Good to learn that.
@xinyu-intel Please try it again to see if you can reproduce the problem out of CI. If not, seems that we have no other choice but to disable this UT.

wkcn · 2019-12-07T11:17:23Z

Hi @access2rohit and @larroy, I have several suggestions about MXNet CI.

Allow trusty contributors or committers to get access to the machines of MXNet CI by ssh
Contributors may could not reproduce issues on their local machines. They don't have the server, as the same as that on MXNet CI. When some unit-tests fail on CI, they can only read the log, but they couldn't look for the reason in detail. For example, we met the problem about CI driver, like PR Downgrade the cublas version for nightly gpu #16985. Since of no permission, other contributors couldn't address the issue. Contributors may need to access to the CI machine, in order to solve the problem.

It is better to provide some machines to contributors to debug : )

Allow contributors to do single run and stop the unit-test.
Because of network error or precision problem, unit-tests may fail. Contributors need to restart all tests recently. If CI supports single run, the cost of CI will be reduced. If contributors submit a new commit, the old CI test should be stopped. When contributors realize that there is a mistake in this PR, they should have the permission to stop the CI test.
For document update (such as fixing typo), it will be better to build the documents only.

Thanks.

marcoabreu · 2019-12-07T12:02:37Z

Option 1 is certainly not an option. The CI is a productive system and granting ssh access is a security concern. Amazon security was very explicit when we went through the security audit.

wkcn

LGTM. Thank you for the fix : )

* Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci [Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (apache#16804)

* Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci [Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (#16804)

ChaiBapchya · 2020-06-14T18:13:34Z

tests/python/quantization/test_quantization.py

- print('skipped testing quantized_conv for mkldnn cpu int8 since it is not supported yet')
+ elif is_test_for_mkldnn():
+ # (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
+ print('skipped testing quantized_conv for mkldnn cpu since it is a flaky case')


@xinyu-intel Any chance we switch this back on?

* Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci

…8560) * fix centos 7 url to unblock centos-cpu & gpu pipeline * [v1.7.x] update jetson dockerfile to support CUDA 10.0 (#18339) * update dockerfile for jetson * add toolchain files * update build_jetson function * update ubuntu_julia.sh * update FindCUDAToolkit.cmake * Update centos7_python.sh * revert changes on ubuntu_julia.sh * disable TVM for gpu build * Disable TVM_OP on GPU builds Co-authored-by: Wei Chu <weichu@amazon.com> Co-authored-by: Leonard Lausen <leonard@lausen.nl> * skip quantized conv flaky case (#16866) * Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com> Co-authored-by: Wei Chu <weichu@amazon.com> Co-authored-by: Leonard Lausen <leonard@lausen.nl> Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>

* Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci

* add the missing build_ubuntu_gpu_cuda101_cudnn7_mkldnn_cpp_test in runtime_functions.sh * Revert "add the missing build_ubuntu_gpu_cuda101_cudnn7_mkldnn_cpp_test in runtime_functions.sh" This reverts commit de173b0. * Revert "[CI][1.6.x] fix centos 7 url to unblock centos-cpu & gpu pipeline (#18560)" This reverts commit d271348. * fix centos 7 url to unblock centos-cpu & gpu pipeline * skip quantized conv flaky case (#16866) * Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci * Trigger empty commit * [v1.7.x] update jetson dockerfile to support CUDA 10.0 (#18339) * update dockerfile for jetson * add toolchain files * update build_jetson function * update ubuntu_julia.sh * update FindCUDAToolkit.cmake * Update centos7_python.sh * revert changes on ubuntu_julia.sh * disable TVM for gpu build * Disable TVM_OP on GPU builds Co-authored-by: Wei Chu <weichu@amazon.com> Co-authored-by: Leonard Lausen <leonard@lausen.nl> * add setuptools to ci/docker/install/requirements * add missing build_ubuntu_gpu_cuda101_cudnn7_mkldnn_cpp_test * add setuptool to docker & cpp-test build syntax error * remove erroneously added cpp tests in 1.6.x * py3 to p2 Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com> Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com> Co-authored-by: Wei Chu <weichu@amazon.com> Co-authored-by: Leonard Lausen <leonard@lausen.nl>

ZhennanQin and others added 15 commits July 24, 2019 21:53

Fix quantized concat when inputs are mixed int8 and uint8

b9c4957

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

Merge branch 'master' of https://github.com/apache/incubator-mxnet

207d72d

Merge branch 'master' of https://github.com/apache/incubator-mxnet

5b8410e

Merge branch 'master' of https://github.com/apache/incubator-mxnet

6d91299

Merge branch 'master' of https://github.com/apache/incubator-mxnet

329f00c

Merge branch 'master' of https://github.com/apache/incubator-mxnet

425f8a1

Merge branch 'master' of https://github.com/apache/incubator-mxnet

6194767

Merge branch 'master' of https://github.com/apache/incubator-mxnet

0dfd159

Merge branch 'master' of https://github.com/apache/incubator-mxnet

160f654

Merge branch 'master' of https://github.com/apache/incubator-mxnet

7b07927

Merge branch 'master' of https://github.com/apache/incubator-mxnet

180f26e

Merge branch 'master' of https://github.com/apache/incubator-mxnet

bdb7f39

Merge branch 'master' of https://github.com/apache/incubator-mxnet

967c514

Merge branch 'master' of https://github.com/apache/incubator-mxnet

075a63d

Merge branch 'master' of https://github.com/apache/incubator-mxnet

655daf5

xinyu-intel requested review from aaronmarkham and marcoabreu as code owners November 20, 2019 06:57

xinyu-intel force-pushed the test_ci branch from 8bcf272 to 0bd5d15 Compare November 25, 2019 02:06

xinyu-intel changed the title ~~[DO NOT MERGE] Test quantized_conv flaky case~~ Enhance quantized_conv case Nov 25, 2019

xinyu-intel changed the title ~~Enhance quantized_conv case~~ [WIP]Enhance quantized_conv case Nov 26, 2019

pengzhao-intel mentioned this pull request Nov 26, 2019

CI error in unix gpu test_quantization_gpu.test_quantized_conv #16830

Closed

skip flaky test

dff1f5a

xinyu-intel force-pushed the test_ci branch from 04fab8b to dff1f5a Compare November 27, 2019 02:49

xinyu-intel changed the title ~~[WIP]Enhance quantized_conv case~~ skip quantized conv flaky case Nov 27, 2019

trigger ci

f96ba20

pengzhao-intel approved these changes Nov 28, 2019

View reviewed changes

Merge branch 'master' of https://github.com/apache/incubator-mxnet

bc368cb

xinyu-intel added 2 commits December 2, 2019 10:52

Merge branch 'master' of https://github.com/apache/incubator-mxnet

17714ec

Merge branch 'master' into test_ci

0627737

Merge remote-tracking branch 'origin/master' into test_ci

af94948

wkcn approved these changes Dec 9, 2019

View reviewed changes

pengzhao-intel merged commit 3b8fdac into apache:master Dec 9, 2019

ChaiBapchya reviewed Jun 14, 2020

View reviewed changes

ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Jun 15, 2020

skip quantized conv flaky case (apache#16866)

b426623

* Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci

ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Jun 21, 2020

skip quantized conv flaky case (apache#16866)

143bbc3

* Fix quantized concat when inputs are mixed int8 and uint8 Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b * skip flaky test * trigger ci

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

skip quantized conv flaky case #16866

skip quantized conv flaky case #16866

xinyu-intel commented Nov 20, 2019 •

edited

Loading

ChaiBapchya commented Nov 27, 2019

xinyu-intel commented Nov 27, 2019

access2rohit commented Nov 27, 2019

larroy commented Nov 27, 2019

xinyu-intel commented Nov 28, 2019

xinyu-intel commented Nov 28, 2019

pengzhao-intel commented Nov 28, 2019

pengzhao-intel left a comment

TaoLv commented Nov 28, 2019

access2rohit commented Nov 28, 2019

TaoLv commented Nov 28, 2019

wkcn commented Dec 7, 2019 •

edited

Loading

marcoabreu commented Dec 7, 2019 •

edited

Loading

wkcn left a comment

ChaiBapchya Jun 14, 2020

skip quantized conv flaky case #16866

skip quantized conv flaky case #16866

Conversation

xinyu-intel commented Nov 20, 2019 • edited Loading

Description

Checklist

Essentials

Changes

Comments

ChaiBapchya commented Nov 27, 2019

xinyu-intel commented Nov 27, 2019

access2rohit commented Nov 27, 2019

larroy commented Nov 27, 2019

xinyu-intel commented Nov 28, 2019

xinyu-intel commented Nov 28, 2019

pengzhao-intel commented Nov 28, 2019

pengzhao-intel left a comment

Choose a reason for hiding this comment

TaoLv commented Nov 28, 2019

access2rohit commented Nov 28, 2019

TaoLv commented Nov 28, 2019

wkcn commented Dec 7, 2019 • edited Loading

marcoabreu commented Dec 7, 2019 • edited Loading

wkcn left a comment

Choose a reason for hiding this comment

ChaiBapchya Jun 14, 2020

Choose a reason for hiding this comment

xinyu-intel commented Nov 20, 2019 •

edited

Loading

wkcn commented Dec 7, 2019 •

edited

Loading

marcoabreu commented Dec 7, 2019 •

edited

Loading