[CI] Split Integration tests out of first phase of pipeline #9128
I took a look at the time taken by each stage in the Jenkins pipeline and what comprises the 6 hour CI build time. CPU Integration tests took `65` minutes of the `100` minutes of `Build: CPU`. By adding `python3: CPU` with just those Integration tests, it lines up with `python3: GPU` and `python3: i386` which both take a similar amount of time and takes roughly 60 minutes off the overall run time.

Numbers copied from sample successful run (final time approx: 358 minutes):

|Phase|ID|Job|Minutes|Start|
|-----|--|---|-------|-----|
|0|0|Sanity|3|0|
|1|0|BUILD: arm|2|3|
|1|1|BUILD: i386|33|3|
|1|2|BUILD: CPU|100|3|
|1|3|BUILD: GPU|25|3|
|1|4|BUILD: QEMU|6|3|
|1|5|BUILD: WASM|2|3|
|2|0|java: GPU|1|103|
|2|1|python3: GPU|66|103|
|2|2|python3: arm|22|103|
|2|3|python3: i386|70|103|
|3|0|docs: GPU|3|173|
|3|1|frontend: CPU|40|173|
|3|2|frontend: GPU|185|173|
|3|3|topi: GPU|110|173|

Numbers predicted after change (final time approx: 293 minutes):

|Phase|ID|Job|Minutes|Start|
|-----|--|---|-------|-----|
|0|0|Sanity|3|0|
|1|0|BUILD: arm|2|3|
|1|1|BUILD: i386|33|3|
|1|2|BUILD: CPU|35|3|
|1|3|BUILD: GPU|25|3|
|1|4|BUILD: QEMU|6|3|
|1|5|BUILD: WASM|2|3|
|2|0|java: GPU|1|38|
|2|1|python3: GPU|66|38|
|2|2|python3: arm|22|38|
|2|3|python3: i386|70|38|
|2|4|python3: CPU|60|38|
|3|0|docs: GPU|3|108|
|3|1|frontend: CPU|40|108|
|3|2|frontend: GPU|185|108|
|3|3|topi: GPU|110|108|
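The totals and `Start` values in the two tables can be reproduced with a small calculation. The sketch below is not part of the TVM CI code; it assumes (as the `Start` column suggests) that all jobs in a phase run in parallel and that the next phase only begins once the slowest job of the current phase has finished. The function name `pipeline_minutes` and the `BEFORE`/`AFTER` lists are hypothetical names introduced for illustration, with durations copied from the tables.

```python
from collections import defaultdict


def pipeline_minutes(jobs):
    """Return (total_minutes, start_by_phase) for a phased pipeline.

    Assumption: jobs within a phase run in parallel, and each phase
    starts only after the slowest job of the previous phase finishes.
    """
    longest = defaultdict(int)
    for phase, _name, minutes in jobs:
        longest[phase] = max(longest[phase], minutes)

    start_by_phase = {}
    elapsed = 0
    for phase in sorted(longest):
        start_by_phase[phase] = elapsed
        elapsed += longest[phase]
    return elapsed, start_by_phase


# (phase, job, minutes) copied from the "sample successful run" table.
BEFORE = [
    (0, "Sanity", 3),
    (1, "BUILD: arm", 2),
    (1, "BUILD: i386", 33),
    (1, "BUILD: CPU", 100),
    (1, "BUILD: GPU", 25),
    (1, "BUILD: QEMU", 6),
    (1, "BUILD: WASM", 2),
    (2, "java: GPU", 1),
    (2, "python3: GPU", 66),
    (2, "python3: arm", 22),
    (2, "python3: i386", 70),
    (3, "docs: GPU", 3),
    (3, "frontend: CPU", 40),
    (3, "frontend: GPU", 185),
    (3, "topi: GPU", 110),
]

# Predicted layout: BUILD: CPU shrinks to 35 minutes and the integration
# tests move to a new phase-2 job, python3: CPU (60 minutes).
AFTER = [(p, n, 35 if n == "BUILD: CPU" else m) for p, n, m in BEFORE]
AFTER.append((2, "python3: CPU", 60))

before_total, before_starts = pipeline_minutes(BEFORE)
after_total, after_starts = pipeline_minutes(AFTER)
print(before_total, before_starts)
print(after_total, after_starts)
print("saved:", before_total - after_total)
```

Under these assumptions the sketch gives 358 minutes before and 293 minutes after, a difference of 65 minutes, with phase starts of 0, 3, 103, 173 and 0, 3, 38, 108 respectively. It also shows why the new `python3: CPU` job costs nothing extra: at 60 minutes it is still shorter than `python3: i386` (70 minutes), which remains the longest job in phase 2.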
@tqchen can you comment on why we have integration tests in the first part? IIRC it was originally due to scarcity of GPU nodes, but now perhaps we don't need to worry so much. wdyt? i agree with @Mousius's assessment that the CPU is the long pole in the first phase, and switching to xdist will only make that more obvious.
i agree we can do that, the main thing is to be able to test on staging before merge
This is building at https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/160/pipeline; only merge after it has gone green.
Wait for CI staging run before merging.
e108fcf to 8cbf84b
Had to re-push due to a flaky unit test on the PR build; the docker-staging build is here now:
In bias for action, I am going to merge this one; feel free to follow up if other things need to happen.
* main:
  - Fix flaky NMS test by making sure scores are unique (apache#9140)
  - [Relay] Merge analysis/context_analysis.cc and transforms/device_annotation.cc (apache#9038)
  - [LLVM] Make changes needed for opaque pointers (apache#9138)
  - Arm(R) Ethos(TM)-U NPU codegen integration (apache#8849)
  - [CI] Split Integration tests out of first phase of pipeline (apache#9128)
  - [Meta Schedule][M3b] Runner (apache#9111)
  - Fix Google Mock differences between Ubuntu 18.04 and 16.04 (apache#9141)
  - [TIR] add loop partition hint pragma (apache#9121)
  - fix things (apache#9146)
  - [Meta Schedule][M3a] SearchStrategy (apache#9132)
  - [Frontend][PyTorch] support for quantized conv_transpose2d op (apache#9133)
  - [UnitTest] Parametrized test_conv2d_int8_intrinsics (apache#9143)
  - [OpenCL] Remove redundant visit statement in CodeGen. (apache#9144)
  - [BYOC] support arbitrary input dims for add/mul/relu of dnnl c_src codegen (apache#9127)
  - [Relay][ConvertLayout] Support for qnn.conv2d_transpose (apache#9139)
  - add nn.global_avgpool to fq2i (apache#9137)
  - [UnitTests] Enable minimum testing on Vulkan target in CI (apache#9093)
  - [Torch] Support returning quantized weights and bias for BYOC use cases (apache#9135)
  - [Relay] Prepare for new plan_devices.cc (part II) (apache#9130)
  - [microTVM][Zephyr] Add MIMXRT1050 board support (apache#9068)
* main: (80 commits)
  - Introduce centralised name transformation functions (apache#9088)
  - [OpenCL] Add vectorization to cuda conv2d_nhwc schedule (apache#8636)
  - [6/6] Arm(R) Ethos(TM)-U NPU codegen integration with `tvmc` (apache#8854)
  - [microTVM] Add wrapper for creating project using a MLF (apache#9090)
  - Fix typo (apache#9156)
  - [Hotfix][Testing] Wait for RPCServer to be established (apache#9150)
  - Update find cublas so it search default path if needed. (apache#9149)
  - [TIR][LowerMatchBuffer] Fix lowering strides when source region has higher dimension than the buffer (apache#9145)
  - Fix flaky NMS test by making sure scores are unique (apache#9140)
  - [Relay] Merge analysis/context_analysis.cc and transforms/device_annotation.cc (apache#9038)
  - [LLVM] Make changes needed for opaque pointers (apache#9138)
  - Arm(R) Ethos(TM)-U NPU codegen integration (apache#8849)
  - [CI] Split Integration tests out of first phase of pipeline (apache#9128)
  - [Meta Schedule][M3b] Runner (apache#9111)
  - Fix Google Mock differences between Ubuntu 18.04 and 16.04 (apache#9141)
  - [TIR] add loop partition hint pragma (apache#9121)
  - fix things (apache#9146)
  - [Meta Schedule][M3a] SearchStrategy (apache#9132)
  - [Frontend][PyTorch] support for quantized conv_transpose2d op (apache#9133)
  - [UnitTest] Parametrized test_conv2d_int8_intrinsics (apache#9143)
  - ...