[Flaky Test] tests/python/contrib/test_ethosu #10300

Closed
driazati opened this issue Feb 18, 2022 · 7 comments
Labels
needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it

Comments

@driazati
Member

driazati commented Feb 18, 2022

Test(s)

  • tests/python/contrib/test_ethosu/test_codegen.py::test_tflite_pack[ifm_shapes1-1-ethos-u65-256]
  • tests/python/contrib/test_ethosu/test_codegen.py::test_tflite_transpose_convolution[False-ifm_shape3-ofm_shape3-kernel_shape3-VALID-ethos-u55-128]
  • tests/python/contrib/test_ethosu/test_codegen.py::test_tflite_expand_dims[ifm_shape0-0-ethos-u55-64]
  • tests/python/contrib/test_ethosu/test_codegen.py::test_tflite_leaky_relu[0.2-ifm_shape1-ethos-u65-256]
  • tests/python/contrib/test_ethosu/test_lookup_table.py::test_tflite_lut_activations[ethos-u55-128]
  • tests/python/contrib/test_ethosu/test_codegen.py::test_tflite_resize2d_bilinear[ifm_shape0-size0-False-ethos-u65-256]
  • tests/python/contrib/test_ethosu/test_codegen.py::test_tflite_sigmoid[ethos-u65-256]

Jenkins Links

Errors are similar to failures from #10213

cc @lhutton1 @ekalda @manupa-arm

driazati added a commit to driazati/tvm that referenced this issue Feb 18, 2022
masahi pushed a commit that referenced this issue Feb 18, 2022
See #10300

Co-authored-by: driazati <driazati@users.noreply.github.com>
@driazati driazati changed the title [CI Problem] Flaky tests in tests/python/contrib/test_ethosu [Flaky Test] tests/python/contrib/test_ethosu Feb 18, 2022
@lhutton1
Contributor

Thanks for raising this @driazati, we'll take a look today

@manupak
Contributor

manupak commented Feb 25, 2022

We are unable to reproduce these as of today and we are not seeing these in our nightly downstream builds either.

I think landing #10214 would really help.

@manupak
Contributor

manupak commented Feb 26, 2022

@ekalda @lhutton1, it seems the flakiness comes from the FVP run itself. Until we know better, I would suggest adding a retry to the FVP run. WDYT?

@lhutton1
Contributor

Thanks @manupa-arm, I think that would be a good temporary solution until we can figure out the cause of the failures. I've attempted this in #10408.
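A minimal sketch of the "retry the FVP run" idea, assuming the run is a single subprocess whose non-zero exit status signals failure. The helper name `run_with_retries` and its parameters are hypothetical and illustrative only; this is not necessarily how #10408 implemented the workaround.

```python
import subprocess
import time
from typing import Sequence


def run_with_retries(
    cmd: Sequence[str], attempts: int = 3, delay_s: float = 1.0
) -> subprocess.CompletedProcess:
    """Run `cmd`, retrying while it exits non-zero (hypothetical helper).

    Returns the first successful CompletedProcess, or raises RuntimeError
    with the output of the last failed attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last = None
    for attempt in range(1, attempts + 1):
        last = subprocess.run(list(cmd), capture_output=True, text=True)
        if last.returncode == 0:
            return last
        if attempt < attempts:
            # Brief pause before retrying; a flaky simulator run may succeed next time.
            time.sleep(delay_s)
    raise RuntimeError(
        f"command failed after {attempts} attempts "
        f"(last exit status {last.returncode}):\n{last.stdout}{last.stderr}"
    )
```

A retry hides intermittent failures rather than explaining them, which is why it was framed as a temporary measure while the root cause was investigated.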

@lhutton1
Contributor

lhutton1 commented Mar 2, 2022

After #10214 we witnessed a flaky test failure in #10445:

E           RuntimeError: Subprocess failed: make -f /workspace/tests/python/relay/aot/corstone300.mk build_dir=/tmp/tmpel9x370d/test/build CFLAGS='-DTVM_RUNTIME_ALLOC_ALIGNMENT_BYTES=8 ' TVM_ROOT=/workspace/tests/python/relay/aot/../../../.. AOT_TEST_ROOT=/workspace/tests/python/relay/aot CODEGEN_ROOT=/tmp/tmpel9x370d/test/codegen STANDALONE_CRT_DIR=/workspace/build/standalone_crt FVP_DIR=/opt/arm/FVP_Corstone_SSE-300/models/Linux64_GCC-6.4/ run
[2022-03-02T11:44:54.304Z] E           stdout:
[2022-03-02T11:44:54.304Z] E           
[2022-03-02T11:44:54.304Z] E           --------------------------------------------------------------------------------2022-03-02 11:29:32: Execute (/tmp/tmpel9x370d/test/build): make -f /workspace/tests/python/relay/aot/corstone300.mk build_dir=/tmp/tmpel9x370d/test/build CFLAGS='-DTVM_RUNTIME_ALLOC_ALIGNMENT_BYTES=8 ' TVM_ROOT=/workspace/tests/python/relay/aot/../../../.. AOT_TEST_ROOT=/workspace/tests/python/relay/aot CODEGEN_ROOT=/tmp/tmpel9x370d/test/codegen STANDALONE_CRT_DIR=/workspace/build/standalone_crt FVP_DIR=/opt/arm/FVP_Corstone_SSE-300/models/Linux64_GCC-6.4/ run
[2022-03-02T11:44:54.304Z] E           --------------------------------------------------------------------------------
[2022-03-02T11:44:54.304Z] E           find: '/tmp/tmpel9x370d/test/codegen/host/src/*.cc': No such file or directory
[2022-03-02T11:44:54.304Z] E           /opt/arm/FVP_Corstone_SSE-300/models/Linux64_GCC-6.4//FVP_Corstone_SSE-300_Ethos-U55 -C cpu0.CFGDTCMSZ=15 \
[2022-03-02T11:44:54.304Z] E           -C cpu0.CFGITCMSZ=15 -C mps3_board.uart0.out_file=\"-\" -C mps3_board.uart0.shutdown_tag=\"EXITTHESIM\" \
[2022-03-02T11:44:54.304Z] E           -C mps3_board.visualisation.disable-visualisation=1 -C mps3_board.telnetterminal0.start_telnet=0 \
[2022-03-02T11:44:54.304Z] E           -C mps3_board.telnetterminal1.start_telnet=0 -C mps3_board.telnetterminal2.start_telnet=0 -C mps3_board.telnetterminal5.start_telnet=0 \
[2022-03-02T11:44:54.304Z] E           -C ethosu.extra_args="--fast" \
[2022-03-02T11:44:54.304Z] E           -C ethosu.num_macs=256 /tmp/tmpel9x370d/test/build/aot_test_runner
[2022-03-02T11:44:54.304Z] E           telnetterminal0: Listening for serial connection on port 5000
[2022-03-02T11:44:54.304Z] E           telnetterminal1: Listening for serial connection on port 5001
[2022-03-02T11:44:54.304Z] E           telnetterminal2: Listening for serial connection on port 5002
[2022-03-02T11:44:54.304Z] E           telnetterminal5: Listening for serial connection on port 5003
[2022-03-02T11:44:54.304Z] E           
[2022-03-02T11:44:54.304Z] E           Stopping simulation...
[2022-03-02T11:44:54.304Z] E           
[2022-03-02T11:44:54.304Z] E           
[2022-03-02T11:44:54.304Z] E           Info: /OSCI/SystemC: Simulation stopped by user.
[2022-03-02T11:44:54.304Z] E           Terminated
[2022-03-02T11:44:54.304Z] E           /workspace/tests/python/relay/aot/corstone300.mk:129: recipe for target 'run' failed
[2022-03-02T11:44:54.304Z] E           make: *** [run] Error 143

(https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10445/1/pipeline/)
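The `Error 143` reported by make is consistent with the `Terminated` line just before it: by shell convention, a command killed by signal N exits with status 128 + N, and 143 - 128 = 15 is SIGTERM. A small Python sketch for decoding such statuses (the helper name `describe_exit_status` is hypothetical):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    Shells report a command killed by signal N as exit status 128 + N.
    """
    if status > 128:
        try:
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            # Not a signal number known to this platform.
            pass
    return f"exit status {status}"


print(describe_exit_status(143))  # killed by SIGTERM
```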

@manupak
Contributor

manupak commented Mar 2, 2022

It seems the model library format extraction is missing (or the temporary directory itself goes missing), and this happens intermittently. Any thoughts on why this would happen in CI?
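One way to make this failure mode easier to diagnose is to check for the extracted sources before invoking make, so a missing directory fails with a clear message instead of a `find` error followed by an FVP timeout. A minimal sketch, assuming the `host/src/*.cc` layout seen in the `find` error in the log above; the helper name `require_codegen_sources` is hypothetical.

```python
import glob
import os
from typing import List


def require_codegen_sources(codegen_root: str) -> List[str]:
    """Fail fast if the extracted codegen sources are missing (hypothetical helper).

    Assumes the layout <codegen_root>/host/src/*.cc taken from the CI log.
    """
    if not os.path.isdir(codegen_root):
        raise FileNotFoundError(f"codegen directory is missing: {codegen_root}")
    sources = sorted(glob.glob(os.path.join(codegen_root, "host", "src", "*.cc")))
    if not sources:
        raise FileNotFoundError(f"no host/src/*.cc files under {codegen_root}")
    return sources
```

Distinguishing "directory missing" from "directory present but empty" helps separate the two hypotheses above (extraction not happening vs. the temporary directory disappearing).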

@areusch @leandron @driazati @grant-arm @Mousius

lhutton1 added a commit to lhutton1/tvm that referenced this issue Mar 7, 2022
Uses a python temporary directory with a context manager in an effort to
solve the flaky FVP tests raised in
apache#10300 and
apache#10314. Now that CI is becoming
more and more parallelized, the thinking is that the python temporary
directory implementation might be more stable than `utils.tempdir`.
Removing the XFail markings from the affected tests, but keeping the
workaround implemented in apache#10408
while we monitor with the above change.

Change-Id: Id07869b51cd2278ec4885ef964bc1b23892ba235
lhutton1 added a commit to lhutton1/tvm that referenced this issue Mar 8, 2022
manupak pushed a commit that referenced this issue Mar 10, 2022
* [AOT] Use python temporary directory for AOT tests

Uses a python temporary directory with a context manager in an effort to
solve the flaky FVP tests raised in
#10300 and
#10314. Now that CI is becoming
more and more parallelized, the thinking is that the python temporary
directory implementation might be more stable than `utils.tempdir`.
Removing the XFail markings from the affected tests, but keeping the
workaround implemented in #10408
while we monitor with the above change.

Change-Id: Id07869b51cd2278ec4885ef964bc1b23892ba235

* alter context manager to make more readable

Change-Id: Iba0644db14e50648f6dc99a4ed0f455641c31912
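The change above replaces `utils.tempdir` with Python's own temporary directory used as a context manager. A minimal sketch of that pattern, assuming the `test/codegen` and `test/build` subdirectories seen in the CI log; `run_aot_test_sketch` and the `build` callback are hypothetical stand-ins for the codegen, make and FVP steps, not the actual diff from the merged PR.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def run_aot_test_sketch(build: Callable[[str, str], T]) -> T:
    """Run `build(codegen_dir, build_dir)` inside a fresh temporary directory.

    The directory exists for the whole `with` block and is removed when the
    block exits, whether `build` returns normally or raises.
    """
    with tempfile.TemporaryDirectory() as base_path:
        codegen_dir = os.path.join(base_path, "test", "codegen")
        build_dir = os.path.join(base_path, "test", "build")
        os.makedirs(codegen_dir)
        os.makedirs(build_dir)
        # The result is computed before cleanup runs on leaving the block.
        return build(codegen_dir, build_dir)
```

The design point is that the directory's lifetime is tied explicitly to the `with` block, so it cannot be cleaned up while the make and FVP steps are still using it.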
pfk-beta pushed a commit to pfk-beta/tvm that referenced this issue Apr 11, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this issue Apr 11, 2022
@areusch areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022
@driazati
Member Author

None of these tests have been skipped or failing lately, so closing this issue.

lhutton1 added a commit to lhutton1/tvm that referenced this issue Jun 29, 2023
Now that apache#10300 and apache#10314 have been closed, we should be able to
remove a previous attempt to help resolve test flakiness.

Change-Id: I70c09d6ba5ffc5cb15d0b775732bd048b7ebfbb4
lhutton1 added a commit to lhutton1/tvm that referenced this issue Jun 29, 2023
lhutton1 added a commit that referenced this issue Jul 3, 2023
Now that #10300 and #10314 have been closed, we should be able to
remove a previous attempt to help resolve test flakiness.

4 participants