-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[v1.7] Fix the monitor_callback invalid issue during calibration with variable input shapes #18703
Conversation
… variable input shapes (apache#18632) * Fix the monitor_callback invalid issue during calibration with variable input shapes * retrigger CI * Add UT for monitor check and disable codecov
Hey @ciyongch , Thanks for submitting the PR
CI supported jobs: [edge, centos-gpu, centos-cpu, windows-cpu, miscellaneous, website, clang, windows-gpu, unix-cpu, sanity, unix-gpu] Note: |
@mxnet-bot run ci [unix-gpu] |
Jenkins CI successfully triggered : [unix-gpu] |
@mxnet-bot run ci [unix-gpu] |
Jenkins CI successfully triggered : [unix-gpu] |
Hi @TaoLv @szha @leezu @ChaiBapchya , please help to take a review and merge. |
I just rebased your source branch. @ciyongch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cherry-pick!
@mxnet-bot run ci [unix-gpu] |
Jenkins CI successfully triggered : [unix-gpu] |
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
Hi @TaoLv , please help to merge. |
… variable input shapes (apache#18632) (apache#18703) * Fix the monitor_callback invalid issue during calibration with variable input shapes * retrigger CI * Add UT for monitor check and disable codecov Co-authored-by: Tao Lv <tao.a.lv@intel.com>
* * Fix einsum gradient (#18482) * [v1.7.x] Backport PRs of numpy features (#18653) * add zero grad for npi_unique (#18080) * fix np.clip scalar input case (#17788) * fix true_divide (#18393) Co-authored-by: Hao Jin <hjjn.amzn@gmail.com> Co-authored-by: Xi Wang <xidulu@gmail.com> * [v1.7.x] backport mixed type binary ops to v1.7.x (#18649) * Fix Windows GPU CI (#17962) Update Windows CI to use VS 2019 and enable x64 bit toolchain. Previously we are using an older 32 bit toolchain causing OOM errors during linking. Switching to x64 bit toolchain on the older VS version previously used by the CI was attempted in #17912 and did not work. Update to Cuda 10.2 as it is required by VS 2019. Switch to ninja-build on Windows to speed up build as ninja-build is now preinstalled. Remove logic to install cmake 3.16 on every PR as cmake 3.17 is now preinstalled. Add build retrials due to cuda thrust + VS2019 flakyness. Co-authored-by: vexilligera <vexilligera@gmail.com> * backport mixed type Co-authored-by: Leonard Lausen <lausen@amazon.com> Co-authored-by: vexilligera <vexilligera@gmail.com> * revise activations (#18700) * [v1.6] Fix the monitor_callback invalid issue during calibration with variable input shapes (#18632) (#18703) * Fix the monitor_callback invalid issue during calibration with variable input shapes * retrigger CI * Add UT for monitor check and disable codecov Co-authored-by: Tao Lv <tao.a.lv@intel.com> * Fail build_windows.py if all retries failed (#18177) * Update to thrust 1.9.8 on Windows (#18218) * Update to thrust 1.9.8 on Windows * Remove debug logic * Re-enable build retries on MSVC (#18230) Updating thrust alone did not help. Similar issues (though less often) still occur with updated thrust, and also with nvidia cub. Tracked upstream at NVIDIA/thrust#1090 Co-authored-by: Ke Han <38852697+hanke580@users.noreply.github.com> Co-authored-by: Xingjian Shi <xshiab@connect.ust.hk> Co-authored-by: Hao Jin <hjjn.amzn@gmail.com> Co-authored-by: Xi Wang <xidulu@gmail.com> Co-authored-by: Yijun Chen <chenyijun0902@gmail.com> Co-authored-by: vexilligera <vexilligera@gmail.com> Co-authored-by: ciyong <ciyong.chen@intel.com> Co-authored-by: Tao Lv <tao.a.lv@intel.com>
* * Fix einsum gradient (apache#18482) * [v1.7.x] Backport PRs of numpy features (apache#18653) * add zero grad for npi_unique (apache#18080) * fix np.clip scalar input case (apache#17788) * fix true_divide (apache#18393) Co-authored-by: Hao Jin <hjjn.amzn@gmail.com> Co-authored-by: Xi Wang <xidulu@gmail.com> * [v1.7.x] backport mixed type binary ops to v1.7.x (apache#18649) * Fix Windows GPU CI (apache#17962) Update Windows CI to use VS 2019 and enable x64 bit toolchain. Previously we are using an older 32 bit toolchain causing OOM errors during linking. Switching to x64 bit toolchain on the older VS version previously used by the CI was attempted in apache#17912 and did not work. Update to Cuda 10.2 as it is required by VS 2019. Switch to ninja-build on Windows to speed up build as ninja-build is now preinstalled. Remove logic to install cmake 3.16 on every PR as cmake 3.17 is now preinstalled. Add build retrials due to cuda thrust + VS2019 flakyness. Co-authored-by: vexilligera <vexilligera@gmail.com> * backport mixed type Co-authored-by: Leonard Lausen <lausen@amazon.com> Co-authored-by: vexilligera <vexilligera@gmail.com> * revise activations (apache#18700) * [v1.6] Fix the monitor_callback invalid issue during calibration with variable input shapes (apache#18632) (apache#18703) * Fix the monitor_callback invalid issue during calibration with variable input shapes * retrigger CI * Add UT for monitor check and disable codecov Co-authored-by: Tao Lv <tao.a.lv@intel.com> * Fail build_windows.py if all retries failed (apache#18177) * Update to thrust 1.9.8 on Windows (apache#18218) * Update to thrust 1.9.8 on Windows * Remove debug logic * Re-enable build retries on MSVC (apache#18230) Updating thrust alone did not help. Similar issues (though less often) still occur with updated thrust, and also with nvidia cub. Tracked upstream at NVIDIA/thrust#1090 Co-authored-by: Ke Han <38852697+hanke580@users.noreply.github.com> Co-authored-by: Xingjian Shi <xshiab@connect.ust.hk> Co-authored-by: Hao Jin <hjjn.amzn@gmail.com> Co-authored-by: Xi Wang <xidulu@gmail.com> Co-authored-by: Yijun Chen <chenyijun0902@gmail.com> Co-authored-by: vexilligera <vexilligera@gmail.com> Co-authored-by: ciyong <ciyong.chen@intel.com> Co-authored-by: Tao Lv <tao.a.lv@intel.com>
* * Fix einsum gradient (#18482) * [v1.7.x] Backport PRs of numpy features (#18653) * add zero grad for npi_unique (#18080) * fix np.clip scalar input case (#17788) * fix true_divide (#18393) Co-authored-by: Hao Jin <hjjn.amzn@gmail.com> Co-authored-by: Xi Wang <xidulu@gmail.com> * [v1.7.x] backport mixed type binary ops to v1.7.x (#18649) * Fix Windows GPU CI (#17962) Update Windows CI to use VS 2019 and enable x64 bit toolchain. Previously we are using an older 32 bit toolchain causing OOM errors during linking. Switching to x64 bit toolchain on the older VS version previously used by the CI was attempted in #17912 and did not work. Update to Cuda 10.2 as it is required by VS 2019. Switch to ninja-build on Windows to speed up build as ninja-build is now preinstalled. Remove logic to install cmake 3.16 on every PR as cmake 3.17 is now preinstalled. Add build retrials due to cuda thrust + VS2019 flakyness. Co-authored-by: vexilligera <vexilligera@gmail.com> * backport mixed type Co-authored-by: Leonard Lausen <lausen@amazon.com> Co-authored-by: vexilligera <vexilligera@gmail.com> * revise activations (#18700) * [v1.6] Fix the monitor_callback invalid issue during calibration with variable input shapes (#18632) (#18703) * Fix the monitor_callback invalid issue during calibration with variable input shapes * retrigger CI * Add UT for monitor check and disable codecov Co-authored-by: Tao Lv <tao.a.lv@intel.com> * Fail build_windows.py if all retries failed (#18177) * Update to thrust 1.9.8 on Windows (#18218) * Update to thrust 1.9.8 on Windows * Remove debug logic * Re-enable build retries on MSVC (#18230) Updating thrust alone did not help. Similar issues (though less often) still occur with updated thrust, and also with nvidia cub. Tracked upstream at NVIDIA/thrust#1090 Co-authored-by: Ke Han <38852697+hanke580@users.noreply.github.com> Co-authored-by: Xingjian Shi <xshiab@connect.ust.hk> Co-authored-by: Hao Jin <hjjn.amzn@gmail.com> Co-authored-by: Xi Wang <xidulu@gmail.com> Co-authored-by: Yijun Chen <chenyijun0902@gmail.com> Co-authored-by: vexilligera <vexilligera@gmail.com> Co-authored-by: ciyong <ciyong.chen@intel.com> Co-authored-by: Tao Lv <tao.a.lv@intel.com> Co-authored-by: Leonard Lausen <lausen@amazon.com> Co-authored-by: Ke Han <38852697+hanke580@users.noreply.github.com> Co-authored-by: Xingjian Shi <xshiab@connect.ust.hk> Co-authored-by: Hao Jin <hjjn.amzn@gmail.com> Co-authored-by: Xi Wang <xidulu@gmail.com> Co-authored-by: Yijun Chen <chenyijun0902@gmail.com> Co-authored-by: vexilligera <vexilligera@gmail.com> Co-authored-by: ciyong <ciyong.chen@intel.com> Co-authored-by: Tao Lv <tao.a.lv@intel.com>
Description
Backport #18632 to addressed the calibration invalid issue with variable input shapes.
@szha @TaoLv @ChaiBapchya