Bugfix/test timeout #807

darbyjohnston · 2020-08-08T02:07:48Z

Hi,

Running the OpenEXR.IlmImf test on one of my dev machines takes about 38 minutes, which exceeds the default CTest timeout of 1500 seconds and causes the test to be marked as failing. This change just doubles the timeout to 3000 seconds. The machine is not terribly slow (it's a quad-core Xeon @ 3.7 GHz), so maybe the slower file access on Windows is partially at fault?

This is on Windows 10 with Visual Studio 2019, compiled with Debug/x64.

Test project C:/dev/openexr/openexr-build
Start 1: IlmBase.Half
1/5 Test #1: IlmBase.Half .....................   Passed    7.16 sec
Start 2: IlmBase.Iex
2/5 Test #2: IlmBase.Iex ......................   Passed    0.01 sec
Start 3: IlmBase.Imath
3/5 Test #3: IlmBase.Imath ....................   Passed   31.03 sec
Start 4: OpenEXR.IlmImf
4/5 Test #4: OpenEXR.IlmImf ...................   Passed  2343.49 sec
Start 5: OpenEXR.IlmImfUtil
5/5 Test #5: OpenEXR.IlmImfUtil ...............   Passed  100.89 sec

100% tests passed, 0 tests failed out of 5

Total Test time (real) = 2482.62 sec
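
For readers unfamiliar with how a CTest limit is raised, here is a minimal sketch of a per-test timeout. It is not the actual diff from this pull request: the project name and the placeholder test command are assumptions made for illustration, and only the test name and the 3000 second value come from the description above.

```cmake
# Minimal sketch, not the actual OpenEXR CMakeLists.txt.
cmake_minimum_required(VERSION 3.12)
project(TimeoutSketch NONE)   # hypothetical project name

enable_testing()

# Placeholder command standing in for the real test executable (assumption).
add_test(NAME OpenEXR.IlmImf COMMAND ${CMAKE_COMMAND} -E echo "placeholder")

# Per-test limit in seconds. This takes the place of CTest's default
# timeout (1500 seconds) for this one test only.
set_tests_properties(OpenEXR.IlmImf PROPERTIES TIMEOUT 3000)
```

Setting the `TIMEOUT` property on a single test leaves the limit for every other test unchanged, which keeps a genuinely hung test elsewhere from running for 50 minutes.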

Signed-off-by: Darby Johnston <darbyjohnston@yahoo.com>

meshula · 2020-08-08T23:37:11Z

I wonder if there's a smart way to cut the test4 time in half....

darbyjohnston · 2020-08-10T19:17:37Z

For comparison, here are the test times on the same hardware with CentOS 7 (dual boot machine):

Test project /home/darby/dev/openexr/openexr-build
Start  1: IlmBase.Half
1/11 Test  #1: IlmBase.Half .........................   Passed    1.03 sec
Start  2: IlmBase.Iex
2/11 Test  #2: IlmBase.Iex ..........................   Passed    0.00 sec
Start  3: IlmBase.Imath
3/11 Test  #3: IlmBase.Imath ........................   Passed    3.54 sec
Start  4: OpenEXR.IlmImf
4/11 Test  #4: OpenEXR.IlmImf .......................   Passed  338.07 sec
Start  5: OpenEXR.IlmImfUtil
5/11 Test  #5: OpenEXR.IlmImfUtil ...................   Passed   22.94 sec

I can't imagine there is that large of a difference between CPU usage, so I suspect it's the disk access. I double checked that disk compression was turned off on Windows 10:

> fsutil behavior query disablecompression
DisableCompression = 1  (Enabled)

Both tests were run on SATA SSDs.

cary-ilm · 2020-08-10T22:09:03Z

@darbyjohnston, your test times are definitely indicating there is something noteworthy about your machine, not just slower but slower in an unexpected way. Which you could argue is a good reason to have the test fail, or you wouldn't have noticed it. Is there a way to specify the timeout limit at execution time? Just because I was curious, I timed the individual steps in IlmImfTest, wondering if it would make sense to split the test into pieces.

160.57 testHuf
118.04 testOptimizedInterleavePatterns
63.93 testDeepTiledBasic
61.01 testDwaCompressorSimd
51.94 testTiledLineOrder
42.21 testMultiTiledPartThreading
37.33 testDeepScanLineBasic
36.06 testDwaLookups
34.10 testMultiPartFileMixingBasic
26.96 testSharedFrameBuffer
26.80 testCopyDeepScanLine
25.10 testLargeDataWindowOffsets
18.45 testScanLineApi
14.17 testOptimized
9.61 testRgbaThreading
9.36 testCopyDeepTiled
9.02 testYca
8.65 testTiledRgba
6.96 testTiledCopyPixels
6.96 testCompression
4.77 testCompositeDeepScanLine
4.00 testCopyMultiPartFile
3.52 testRgba
3.39 testRle
2.15 testMultiPartThreading
1.81 testConversion
1.59 testMultiPartApi
1.54 testInputPart
1.52 testNativeFormat
0.98 testTiledCompression
0.92 testWav
0.90 testTiledYa
0.77 testFutureProofing
0.65 testCopyPixels
0.56 testSampleImages
0.48 testStandardAttributes
0.42 testPreviewImage
0.15 testMultiScanlinePartThreading
0.07 testDeepScanLineMultipleRead
0.05 testIsComplete
0.05 testExistingStreams
0.03 testBadTypeAttributes
0.02 testLut
0.02 testLineOrder
0.01 testCustomAttributes
0.01 testBackwardCompatibility
0.01 testAttributes
0.00 testXdr
0.00 testPartHelper
0.00 testMultiView
0.00 testMultiPartSharedAttributes
0.00 testMagic
0.00 testChannels
0.00 testB44ExpLogTable


-- Cary Phillips | R&D Supervisor | ILM | San Francisco

darbyjohnston · 2020-08-11T00:38:38Z

Hi @cary-ilm,

I tried testing on another machine (Windows 10 Home, 8 core Ryzen, NVMe drive), which I thought would be faster, and actually got worse results:

Test project C:/dev/openexr/openexr-build
Start 1: IlmBase.Half
Test #1: IlmBase.Half .....................   Passed    9.45 sec
Start 2: IlmBase.Iex
Test #2: IlmBase.Iex ......................   Passed    0.06 sec
Start 3: IlmBase.Imath
Test #3: IlmBase.Imath ....................   Passed   39.19 sec
Start 4: OpenEXR.IlmImf
Test #4: OpenEXR.IlmImf ...................   Passed  2900.37 sec
Start 5: OpenEXR.IlmImfUtil
Test #5: OpenEXR.IlmImfUtil ...............   Passed  131.07 sec

Maybe this could also be from building the code in "Debug" mode, which seems to have a larger performance impact on Windows than Linux?

> Is there a way to specify the timeout limit at execution time?

I believe you can also set the timeout on the CTest command line.
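
As a rough illustration of the command-line route: `ctest` accepts `--timeout <seconds>`, which sets the default timeout for tests that do not carry their own `TIMEOUT` property. The sketch below wraps that invocation in a custom target so it can be run from the build tree. The project name, the `check_long` target name, and the placeholder test are all hypothetical.

```cmake
# Minimal sketch; names below are illustrative, not from the OpenEXR build.
cmake_minimum_required(VERSION 3.12)
project(CTestTimeoutSketch NONE)   # hypothetical project name

enable_testing()

# Placeholder test so the target has something to run (assumption).
add_test(NAME placeholder COMMAND ${CMAKE_COMMAND} -E echo "placeholder")

# Hypothetical convenience target: runs CTest with a 3000 second default
# timeout chosen at execution time rather than baked into the test itself.
add_custom_target(check_long
    COMMAND ${CMAKE_CTEST_COMMAND} --timeout 3000 --output-on-failure
    WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
    COMMENT "Running tests with a 3000 second default timeout")
```

With a multi-config generator such as Visual Studio, the `ctest` invocation would also need a `-C <config>` argument to select the configuration to test.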

kdt3rd · 2020-11-07T19:06:23Z

@darbyjohnston - I do not know what it is, but in trying to get the GitHub Actions workflows running for Windows, I also noticed that in debug mode the Windows tests are extremely slow (optimized, they seem fine). On the current master, I have split it so that each test runs as a separate test. Could I ask you to compile that in debug, narrow down which test is particularly slow, and maybe do a bit of profiling to see what makes it so much slower in debug? Thanks in advance...

The main branch will be in a high state of development flux for a little while as we're doing a full re-org after the Imath split, so it may look a bit different than what you have set up currently...

meshula · 2020-11-09T22:14:36Z

I don't have a reference available at the moment, but there is a large amount of heap-sanity checking that goes on in an MSVC debug build, hence the terrible performance. I believe that under VS2019 preprocessor macros were introduced that elide that facility and get run times comparable to release (at least the same order of magnitude, anyway). Given that release and debug builds under MSVC have vastly different runtime behavior (zero-initialized versus not, sentinel values for constructed objects versus destructed objects, 0xfeeefeee to indicate freed allocations, and so on), I wonder what we might gain from running the tests in an MSVC debug mode in general though?

peterhillman · 2020-11-09T23:49:47Z

> I wonder what we might gain from running the tests in an msvc debug mode in general though?

Presumably a debug build of OpenEXRTest needs to be available to debug any issues reported by running tests in release mode. That would likely be done by running the OpenEXRTest binary directly, rather than via the ctest mechanism.

Now @kdt3rd has made it run tests separately, perhaps ctest in debug mode could default to running one brief test (e.g. testMultiPartApi) just to make sure that the OpenEXRTest binary builds and runs. Release mode ctest should still run all the tests.
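
One possible way to express this split in CMake is the `CONFIGURATIONS` option of `add_test`, which restricts a test to the named configurations. The sketch below is only an illustration under assumptions: the project name, the `OpenEXR.smoke.` and `OpenEXR.` test name prefixes, the stand-in command, and the short list of tests are hypothetical, and the real registration on master may look quite different.

```cmake
# Minimal sketch of "one brief test in debug, everything in release".
cmake_minimum_required(VERSION 3.12)
project(ConfigTestSketch NONE)   # hypothetical project name

enable_testing()

# Stand-in for the OpenEXRTest binary (assumption): echoes the test name.
set(test_cmd ${CMAKE_COMMAND} -E echo)

# Smoke test with no CONFIGURATIONS restriction, so it runs for every
# configuration, including Debug.
add_test(NAME OpenEXR.smoke.testMultiPartApi
         COMMAND ${test_cmd} testMultiPartApi)

# Remaining tests are restricted to optimized configurations.
foreach(name testHuf testOptimizedInterleavePatterns testDeepTiledBasic)
    add_test(NAME OpenEXR.${name}
             COMMAND ${test_cmd} ${name}
             CONFIGURATIONS Release RelWithDebInfo)
endforeach()
```

As far as I understand it, tests registered with `CONFIGURATIONS` only run when `ctest` is invoked with a matching `-C <config>`, so a plain `ctest` on a single-config generator would skip them. That behavior would need checking before adopting this approach.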

meshula · 2020-11-10T00:12:20Z

@peterhillman I'm not arguing against having a debug build, clearly we benefit from that for debugging. I'm arguing that running (as opposed to building) the entirety of the full suite in debug mode isn't providing useful information in and of itself. To devil's advocate my own suggestion about running that build, it is true that the MSVC diagnostics detect double deletes and freed memory access, but I would argue that ASAN, TSAN, and UBSAN should be our focus instead on that front.

Anyway, as you note:

> Now @kdt3rd has made it run tests separately, perhaps ctest in debug mode could default to running one brief test (e.g. testMultiPartApi) just to make sure that the OpenEXRTest binary builds and runs. Release mode ctest should still run all the tests

I think that would be an excellent resolution to the issue.

darbyjohnston · 2020-11-13T16:49:22Z

Hi Nick, the message you posted to OpenEXR dev discussion yesterday, "slow tests on Windows", I assume is related to this? I tried syncing the latest from master and using:

set(CMAKE_C_FLAGS /GS-)
set(CMAKE_CXX_FLAGS /GS-)
add_definitions(-D_ITERATOR_DEBUG_LEVEL=0)
add_definitions(-D_HAS_ITERATOR_DEBUGGING=0)

This reduced the test run time in debug mode from 2877 seconds to 2266 seconds.
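
For anyone repeating this experiment, the settings above replace the whole of `CMAKE_C_FLAGS` and `CMAKE_CXX_FLAGS` and apply to every configuration. A more narrowly scoped variant might look like the sketch below, which limits the same switches to MSVC Debug builds. This is an untested assumption about how one could structure it, not something taken from the OpenEXR build, and the project name is hypothetical.

```cmake
# Minimal sketch: scope the experiment to MSVC Debug builds only.
cmake_minimum_required(VERSION 3.12)
project(DebugChecksSketch CXX)   # hypothetical project name

if(MSVC)
    # Disable buffer security checks in Debug only; other configurations
    # and the existing compiler flags are left untouched.
    add_compile_options($<$<CONFIG:Debug>:/GS->)

    # Turn off checked/debug iterators in Debug. Every library linked into
    # the same binary must be built with the same value, otherwise the
    # MSVC linker reports a mismatch.
    add_compile_definitions($<$<CONFIG:Debug>:_ITERATOR_DEBUG_LEVEL=0>)
endif()

# Targets defined after this point in the directory pick up the options.
```

Both commands only affect targets created after them in the same directory or its subdirectories, so they would need to sit near the top of the top-level CMakeLists.txt.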

When I initially opened this pull request I assumed that the tests should be run in debug mode, but if that's not the case then maybe this is a non issue? Instead of special casing the debug mode to only run a single test, maybe just don't run the tests at all for the GitHub actions Windows/debug run? As Peter mentioned if a test fails and needs to be debugged on Windows then a developer can run the OpenEXRTest binary directly.

meshula · 2020-11-13T19:42:49Z

That didn't make as big a difference as I expected!

Not running the GitHub Actions tests for Windows/debug specifically sounds like the right move to me.

darbyjohnston added 2 commits August 7, 2020 18:55

Increase test timeout

ddbb69e

Signed-off-by: Darby Johnston <darbyjohnston@yahoo.com>

Merge branch 'master' into bugfix/test_timeout

d1cb306

darbyjohnston closed this Nov 24, 2020

darbyjohnston deleted the bugfix/test_timeout branch September 2, 2022 19:20

henrirosten mentioned this pull request Jul 2, 2023

openexr_2: fix IlmImf test timeout NixOS/nixpkgs#240660

Closed

12 tasks
