Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

skip quantized conv flaky case #16866

Merged
merged 21 commits into from
Dec 9, 2019
Merged

Conversation

xinyu-intel
Copy link
Contributor

@xinyu-intel xinyu-intel commented Nov 20, 2019

Description

temporary skip this case.
@pengzhao-intel

#16830

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@xinyu-intel xinyu-intel changed the title [DO NOT MERGE] Test quantized_conv flaky case Enhance quantized_conv case Nov 25, 2019
@xinyu-intel xinyu-intel changed the title Enhance quantized_conv case [WIP]Enhance quantized_conv case Nov 26, 2019
@ChaiBapchya
Copy link
Contributor

Are you trying to use this PR for testing the issue? If that's the case it doesn't make sense coz its a costly experiment in that case.

@xinyu-intel
Copy link
Contributor Author

Some flaky tests always fail in CI but cannot reproduce easily from local even with docker env on EC2. So debug with the CI environment can be more efficiency to find the root cause.

@xinyu-intel xinyu-intel changed the title [WIP]Enhance quantized_conv case skip quantized conv flaky case Nov 27, 2019
@access2rohit
Copy link
Contributor

@xinyu-intel this is not a proper use of public resources. If you have issues with reproducing your issues on local then you should ask on dev@ channel thats a good place to start. Using CI for debugging causes a fresh run of all 11 test suits on your PR each time you push new changes. A single run of all 11 tests costs approximately 200$. Please be mindful of that.

@larroy
Copy link
Contributor

larroy commented Nov 27, 2019

@xinyu-intel you can use the ci/build.py tool to easily reproduce what runs on CI. it's dockerized so it's going to produce the same result with very high probability: https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results#Reproducingtestresults-2.1.Build

@xinyu-intel
Copy link
Contributor Author

okay, I'll ask in dev list next time.

@xinyu-intel
Copy link
Contributor Author

@larroy I have already used this script and it is very easy for use. However, in contrast with "very high probability", this case also cannot be reproduced withci\docker.py scripts on c5.18xlarge:)
I'll skip this case to save the cost. But you should fix the https error first:(http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16866/11/pipeline

@pengzhao-intel
Copy link
Contributor

Let's skip this test first :)

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@TaoLv
Copy link
Member

TaoLv commented Nov 28, 2019

@ChaiBapchya @access2rohit @larroy Really appreciate your suggestions and frugal spirit for MXNet development. I think this PR is not the first time for contributors to use CI for debugging purpose, and most likely it will not be the last one. I just took two simple filters on the PR list:
https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+in%3Atitle+%22do+not+merge%22
https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+in%3Atitle+%22debug%22
@access2rohit Can you explain your PRs at the top of the filtered list? If you really worry about the usage of CI resources, could you please bring this topic to @dev to forbid this behavior and note it somewhere if you get a consensus there?

I believe @xinyu-intel has tried his best to reproduce this issue on both local machine and also on EC2 instances to better emulate the CI environment. But unfortunately, he cannot reproduce the error and I think it's not the first time that we cannot reproduce an error out of CI.

Another choice is maybe someone can contribute to the CI system so developers can only trigger just one test suits rather than all of the 11 tests suits. It will be very helpful especially to document contributors.

@access2rohit
Copy link
Contributor

@TaoLv I was already pointed out by @szha about the consequences of opening such PRs already and I closed those PRs. Also, if I had done that before doesn't mean I cannot suggest other developers not to do the same mistake as I did. Yes, if such a behaviour becomes widespread I will start a thread on @dev list.

@TaoLv
Copy link
Member

TaoLv commented Nov 28, 2019

@access2rohit Good to learn that.
@xinyu-intel Please try it again to see if you can reproduce the problem out of CI. If not, seems that we have no other choice but to disable this UT.

@wkcn
Copy link
Member

wkcn commented Dec 7, 2019

Hi @access2rohit and @larroy, I have several suggestions about MXNet CI.

  1. Allow trusty contributors or committers to get access to the machines of MXNet CI by ssh
    Contributors may could not reproduce issues on their local machines. They don't have the server, as the same as that on MXNet CI. When some unit-tests fail on CI, they can only read the log, but they couldn't look for the reason in detail. For example, we met the problem about CI driver, like PR Downgrade the cublas version for nightly gpu #16985. Since of no permission, other contributors couldn't address the issue. Contributors may need to access to the CI machine, in order to solve the problem.

It is better to provide some machines to contributors to debug : )

  1. Allow contributors to do single run and stop the unit-test.
    Because of network error or precision problem, unit-tests may fail. Contributors need to restart all tests recently. If CI supports single run, the cost of CI will be reduced. If contributors submit a new commit, the old CI test should be stopped. When contributors realize that there is a mistake in this PR, they should have the permission to stop the CI test.
    For document update (such as fixing typo), it will be better to build the documents only.

Thanks.

@marcoabreu
Copy link
Contributor

marcoabreu commented Dec 7, 2019

Option 1 is certainly not an option. The CI is a productive system and granting ssh access is a security concern. Amazon security was very explicit when we went through the security audit.

Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for the fix : )

@pengzhao-intel pengzhao-intel merged commit 3b8fdac into apache:master Dec 9, 2019
Alicia1529 pushed a commit to Alicia1529/incubator-mxnet that referenced this pull request Dec 9, 2019
* Fix quantized concat when inputs are mixed int8 and uint8

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

* skip flaky test

* trigger ci

[Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (apache#16804)
haojin2 pushed a commit that referenced this pull request Dec 9, 2019
* Fix quantized concat when inputs are mixed int8 and uint8

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

* skip flaky test

* trigger ci

[Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (#16804)
print('skipped testing quantized_conv for mkldnn cpu int8 since it is not supported yet')
elif is_test_for_mkldnn():
# (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
print('skipped testing quantized_conv for mkldnn cpu since it is a flaky case')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xinyu-intel Any chance we switch this back on?

ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Jun 15, 2020
* Fix quantized concat when inputs are mixed int8 and uint8

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

* skip flaky test

* trigger ci
szha pushed a commit that referenced this pull request Jun 15, 2020
…8560)

* fix centos 7 url to unblock centos-cpu & gpu pipeline

* [v1.7.x] update jetson dockerfile to support CUDA 10.0 (#18339)

* update dockerfile for jetson

* add toolchain files

* update build_jetson function

* update ubuntu_julia.sh

* update FindCUDAToolkit.cmake

* Update centos7_python.sh

* revert changes on ubuntu_julia.sh

* disable TVM for gpu build

* Disable TVM_OP on GPU builds

Co-authored-by: Wei Chu <weichu@amazon.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>

* skip quantized conv flaky case (#16866)

* Fix quantized concat when inputs are mixed int8 and uint8

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

* skip flaky test

* trigger ci

Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com>
Co-authored-by: Wei Chu <weichu@amazon.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Jun 21, 2020
* Fix quantized concat when inputs are mixed int8 and uint8

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

* skip flaky test

* trigger ci
sandeep-krishnamurthy pushed a commit that referenced this pull request Jul 1, 2020
* add the missing build_ubuntu_gpu_cuda101_cudnn7_mkldnn_cpp_test in runtime_functions.sh

* Revert "add the missing build_ubuntu_gpu_cuda101_cudnn7_mkldnn_cpp_test in runtime_functions.sh"

This reverts commit de173b0.

* Revert "[CI][1.6.x] fix centos 7 url to unblock centos-cpu & gpu pipeline (#18560)"

This reverts commit d271348.

* fix centos 7 url to unblock centos-cpu & gpu pipeline

* skip quantized conv flaky case (#16866)

* Fix quantized concat when inputs are mixed int8 and uint8

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

* skip flaky test

* trigger ci

* Trigger empty commit

* [v1.7.x] update jetson dockerfile to support CUDA 10.0 (#18339)

* update dockerfile for jetson

* add toolchain files

* update build_jetson function

* update ubuntu_julia.sh

* update FindCUDAToolkit.cmake

* Update centos7_python.sh

* revert changes on ubuntu_julia.sh

* disable TVM for gpu build

* Disable TVM_OP on GPU builds

Co-authored-by: Wei Chu <weichu@amazon.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>

* add setuptools to ci/docker/install/requirements

* add missing build_ubuntu_gpu_cuda101_cudnn7_mkldnn_cpp_test

* add setuptool to docker & cpp-test build syntax error

* remove erroneously added cpp tests in 1.6.x

* py3 to p2

Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com>
Co-authored-by: Wei Chu <weichu@amazon.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants