This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Windows builds running out of heap space in CI #13958

Open
KellenSunderland opened this issue Jan 22, 2019 · 14 comments
Labels
Bug, Build, CI, CMake (CMake related bugs/issues/improvements), Windows

Comments

@KellenSunderland
Contributor

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/PR-13917/9/pipeline

c:\program files (x86)\microsoft visual studio 14.0\vc\include\xlocnum(1144) : fatal error C1002: compiler is out of heap space in pass 2

jom: C:\jenkins_slave\workspace\build-cpu\build\CMakeFiles\mxnet.dir\build.make [CMakeFiles\mxnet.dir\src\operator\tensor\elemwise_unary_op_basic.cc.obj] Error 1

jom: C:\jenkins_slave\workspace\build-cpu\build\CMakeFiles\Makefile2 [CMakeFiles\mxnet.dir\all] Error 2

jom: C:\jenkins_slave\workspace\build-cpu\build\Makefile [all] Error 2
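For anyone triaging similar CI logs, here is a minimal sketch of a script that picks out the object file that failed right after a C1002 error. It is not part of the MXNet CI tooling. The function name `find_heap_failures` and the regular expressions are assumptions based only on the log lines above, and pairing a C1002 line with the next failing `.obj` target is a heuristic that can mismatch when jom interleaves output from parallel jobs.

```python
import re

# Matches the MSVC "out of heap space" diagnostic, e.g.
#   ...\xlocnum(1144) : fatal error C1002: compiler is out of heap space in pass 2
C1002_RE = re.compile(r"fatal error C1002\b")

# Matches the bracketed object-file target in a jom failure line, e.g.
#   jom: ...\build.make [CMakeFiles\mxnet.dir\src\...\foo.cc.obj] Error 1
JOM_TARGET_RE = re.compile(r"^jom:.*\[(?P<target>[^\]]+\.obj)\]\s+Error\s+\d+")


def find_heap_failures(log_text):
    """Return .obj targets whose jom failure follows a C1002 error (heuristic)."""
    failures = []
    saw_c1002 = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if C1002_RE.search(line):
            saw_c1002 = True
            continue
        match = JOM_TARGET_RE.match(line)
        if match and saw_c1002:
            failures.append(match.group("target"))
            saw_c1002 = False  # pair each C1002 with at most one target
    return failures


# Log excerpt copied from this issue.
SAMPLE_LOG = r"""
c:\program files (x86)\microsoft visual studio 14.0\vc\include\xlocnum(1144) : fatal error C1002: compiler is out of heap space in pass 2
jom: C:\jenkins_slave\workspace\build-cpu\build\CMakeFiles\mxnet.dir\build.make [CMakeFiles\mxnet.dir\src\operator\tensor\elemwise_unary_op_basic.cc.obj] Error 1
jom: C:\jenkins_slave\workspace\build-cpu\build\CMakeFiles\Makefile2 [CMakeFiles\mxnet.dir\all] Error 2
"""

if __name__ == "__main__":
    for target in find_heap_failures(SAMPLE_LOG):
        print(target)
```

Running this over several failed builds would show whether the same translation units keep exhausting the compiler heap.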

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels:

@KellenSunderland changed the title from "Windows builds running out of heap space" to "Windows builds running out of heap space in CI" on Jan 22, 2019
@frankfliu
Contributor

@mxnet-label-bot add [Windows, CI, build, CMake]

@marcoabreu added the Build, CI, CMake (CMake related bugs/issues/improvements), and Windows labels on Jan 22, 2019
@marcoabreu
Contributor

Interesting, I haven't seen that one before. Considering we have ~300 GB of swap enabled on the Windows instances, it surprises me that we are allegedly running out of memory.

Do you have an idea what could be causing this, or whether swap does not apply to this use case?

Also, I found something:

"Link Time Code Generation" should be set to "Profile Guided Optimization - Optimization (/LTCG:PGOptimize)" instead of being blank.

Source: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/d2c4bb60-e558-4dc6-a0ba-47611d45bc86/c1002-compiler-is-out-of-heap-space-in-pass-2?forum=vcgeneral

@stereomatchingkiss

> Also, I found something:

I found that too, but it did not solve the issue for me; see #14203.

@haojin2
Contributor

haojin2 commented Mar 14, 2019

I'm running into this issue for #14359 too. Any updates?

@stereomatchingkiss

stereomatchingkiss commented Mar 14, 2019

> I'm running into this issue for #14359 too. Any updates?

I would like to know too. Maybe you could try 1.3.1; at least it works on Windows.
You could check this blog if you want to compile 1.3.1 on Windows, but I have to warn you that it is not a pleasant journey.

@haojin2
Contributor

haojin2 commented Mar 14, 2019

@stereomatchingkiss I actually only want my PR to pass CI instead of getting a local build. Thanks for your info all the same! @marcoabreu any updates on this issue?

@haojin2
Contributor

haojin2 commented Apr 23, 2019

@marcoabreu are there any updates on this issue? I'm facing it again in one of my PRs: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/PR-14773/2/pipeline.

@marcoabreu
Contributor

@Chancebair has been working on Windows lately; maybe he has some info.

@haojin2
Contributor

haojin2 commented Apr 23, 2019

@Chancebair Could you please provide some updates? This has been open for more than 3 months and I've personally experienced this error 5+ times and always had to apply some workarounds.

@Chancebair
Contributor

I can dig into this a bit. For context, though: I worked on codifying the Windows workers' dependencies last year and made no other changes to the instances, so it might be good to find someone with more Windows development experience to weigh in.

@kshitij12345
Contributor

kshitij12345 commented Oct 20, 2019

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/PR-15909/6/

Probably because broadcast_reduce_op.h is too big.

@damNull
Contributor

damNull commented Dec 16, 2019

I ran into the same problem, and the following change solved it for me.
Open the property page of the generated MXNet Visual Studio solution and edit this property:

"Link Time Code Generation" should be set to "Profile Guided Optimization - Optimization (/LTCG:PGOptimize)" instead of being blank.


9 participants