Some stressors of stress-ng will always fail #800

Closed
rickwu666666 opened this issue Oct 27, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@rickwu666666
Contributor

rickwu666666 commented Oct 27, 2023

Bug Description

We discovered that certain stress-ng stressors consistently fail because they are disabled by default: stress-ng exits with code 1, so the test job is reported as failed even though the stressor never ran. The latest version of stress-ng appears to have resolved this.
However, we've also noticed that stress-ng is no longer available as a snap: the stress-ng snap is no longer on the Snap Store and the snapcraft.yaml file has been removed from the git repo. This raises concerns about stress-ng's suitability for the UC environment.

For X86 platform:
https://certification.canonical.com/hardware/202310-32151/submission/339803/test-results/fail/
bad-ioctl
bind-mount
mlockmany
oom-pipe
sysinval
watchdog

For ARM64 platform:
https://certification.canonical.com/hardware/202304-31535/submission/326256/test-results/fail/?term=stressor
bind-mount
cpu-online
mlockmany
oom-pipe
smi
sysinval
watchdog

To Reproduce

  1. Run a stress-ng stressor in the checkbox shell
    $ sudo checkbox.shell
    $ stress-ng --watchdog 0 --timeout 30 --skip-silent --verbose

Environment

  • OS: UC20 and UC22
  • stress-ng version 0.15.07 and 0.15.09

Relevant log output

stress-ng: debug: [367553] invoked with 'stress-ng --bad-ioctl 0 --timeout 30 --skip-silent --verbose' by user 1000
stress-ng: debug: [367553] stress-ng 0.15.09 gefc98a49f14e
stress-ng: debug: [367553] system: Linux ubuntu 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64, glibc 2.35
stress-ng: debug: [367553] RAM total: 3.6G, RAM free: 1.6G, swap free: 0.0
stress-ng: debug: [367553] temporary file path: '/var/tmp', filesystem type: ext2 (13864259 blocks available)
stress-ng: debug: [367553] 4 processors online, 4 processors configured
stress-ng: info:  [367553] disabled 'bad-ioctl' as it may hang or reboot the machine (enable it with the --pathological option)
stress-ng: info:  [367553] setting to a 30 second run per stressor
stress-ng: error: [367553] No stress workers invoked
EXIT_CODE=1

Additional context

No response

@rickwu666666 rickwu666666 added the bug Something isn't working label Oct 27, 2023
@seankingyang
Contributor

The sockpair stressor takes a very long time. In the run below, started with --timeout 30, it completed only after 33512.44s (9 hours, 18 mins).

stress-ng: debug: [1349] invoked with 'stress-ng --sockpair 0 --timeout 30 --skip-silent --verbose' by user 1000
stress-ng: debug: [1349] stress-ng 0.15.09 gefc98a49f14e
stress-ng: debug: [1349] system: Linux ubuntu 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64, glibc 2.35
stress-ng: debug: [1349] RAM total: 3.6G, RAM free: 2.7G, swap free: 0.0
stress-ng: debug: [1349] temporary file path: '/var/tmp', filesystem type: ext2 (13756827 blocks available)
stress-ng: debug: [1349] 4 processors online, 4 processors configured
stress-ng: info:  [1349] setting to a 30 second run per stressor
stress-ng: info:  [1349] dispatching hogs: 4 sockpair
stress-ng: debug: [1349] cache allocate: using cache maximum level L2
stress-ng: debug: [1349] cache allocate: shared cache buffer size: 1024K
stress-ng: debug: [1349] starting stressors
stress-ng: debug: [1349] 4 stressors started
stress-ng: debug: [1350] sockpair: [1350] started (instance 0 on CPU 3)
stress-ng: debug: [1353] sockpair: [1353] started (instance 3 on CPU 2)
stress-ng: debug: [1352] sockpair: [1352] started (instance 2 on CPU 1)
stress-ng: debug: [1351] sockpair: [1351] started (instance 1 on CPU 2)
stress-ng: debug: [1351] sockpair: [1351] exited (instance 1 on CPU 2)
stress-ng: debug: [1353] sockpair: [1353] exited (instance 3 on CPU 3)
stress-ng: debug: [1350] sockpair: [1350] exited (instance 0 on CPU 3)
stress-ng: debug: [1352] sockpair: [1352] exited (instance 2 on CPU 0)
stress-ng: debug: [1349] sockpair: [1350] terminated (success)
stress-ng: debug: [1349] sockpair: [1351] terminated (success)
stress-ng: debug: [1349] sockpair: [1352] terminated (success)
stress-ng: debug: [1349] sockpair: [1353] terminated (success)
stress-ng: metrc: [1349] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1349]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1349] sockpair         948165  33512.10      6.10  22488.29        28.29          42.15        16.78          1496
stress-ng: metrc: [1349] miscellaneous metrics:
stress-ng: metrc: [1349] sockpair          102402.81 socketpair calls sec (geometric mean of 4 instances)
stress-ng: metrc: [1349] sockpair               0.06 MB written per sec (geometric mean of 4 instances)
stress-ng: debug: [1349] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [1349] passed: 4: sockpair (4)
stress-ng: info:  [1349] failed: 0
stress-ng: info:  [1349] skipped: 0
stress-ng: info:  [1349] successful run completed in 33512.44s (9 hours, 18 mins, 32.44 secs)
EXIT_CODE=0
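
One possible guard against a run like the one above, where `--timeout 30` was requested but the run took over 9 hours, is to wrap the command in an outer wall-clock limit. This is a minimal sketch, assuming the `timeout` utility from GNU coreutils is available. The helper name `run_with_deadline` and the 10-second grace period are hypothetical choices, not something Checkbox is known to do.

```shell
# Run a command under a hard wall-clock limit (hypothetical helper).
# $1 = limit in seconds, remaining arguments = the command to run.
run_with_deadline() {
    limit="$1"
    shift
    # timeout sends SIGTERM once the limit is reached and SIGKILL 10 seconds
    # later if the command is still alive. It exits with status 124 when the
    # limit was hit (137 if SIGKILL was needed).
    timeout -k 10 "$limit" "$@"
}

# Example (not executed here):
#   run_with_deadline 60 stress-ng --sockpair 0 --timeout 30 --skip-silent --verbose
```

Whether this would reliably clean up every stressor child process has not been verified here; it only bounds how long the job waits.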

@bladernr
Collaborator

commit a650e32db690a6c63b7e08e7fd3fbb96545018b7
Author: Colin Ian King colin.i.king@gmail.com
Date: Wed Aug 23 18:11:28 2023 +0100

Remove snapcraft yaml file

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>

Probably worth reaching out to Colin and asking why he removed that.

@rickwu666666
Contributor Author

I reached out to Colin; this is his feedback:

  1. stress-ng uses nearly every system call and kernel interface when performing a range of deep and systematic stress tests on the kernel.
    Historically, the snap interfaces could never provide the exact functionality required to get stress-ng to run in a snap the same way as a normally packaged .deb version, unless it was run as a snap in devmode. It ended up with stress-ng having to work around non-privileged error handling that is sometimes out of scope for how system calls normally work.
  2. There are not enough resources to figure out why snaps don't build on various architectures, or what to do when snapcraft deprecates features.
  3. There are not enough resources to debug snap-specific problems: when installed as a snap, the snap overhead on small ARM and RISC-V devices was skewing memory stressor tests, and these had to be debugged in a snap environment.
  4. The added overhead of stress-ng working through the extra protection layers of the snap containerization skewed the bogo-op benchmarks. stress-ng was really designed to be as close to the metal as possible, so this extra layering was not helpful in shaking out tight race conditions in the kernel. If you want stress-ng to find kernel and hardware / memory issues, it's best to run it outside a container for a lot of the tests it runs.
  5. stress-ng is backported to all supported Ubuntu releases, so the stress-ng snap is redundant.
    https://launchpad.net/~colin-king/+archive/ubuntu/stress-ng
  6. Snaps are fundamentally flawed for tools like stress-ng because of the extra layering and memory footprint required to use them. stress-ng uses shared libraries shared by other applications, and being able to exercise these shared pages is part of the mindset behind tripping obscure kernel virtual memory bugs. When it's snapped, these pages are less likely to be shared, and this totally changes the way stress-ng exercises the kernel.


Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1077.

This message was autogenerated

@pieqq
Collaborator

pieqq commented Apr 11, 2024

I've created #1173 to ignore the stressors that would be marked as "disabled" by stress-ng.
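
A minimal sketch of how a wrapper might recognise that case, based only on the log line shown earlier in this issue (`disabled '<name>' as it may hang or reboot the machine`). This is not the implementation in #1173, and the function name `stressor_was_disabled` is hypothetical.

```shell
# Succeeds (exit status 0) if the stress-ng output read from stdin reports
# that the stressor was disabled by default. The matched text is copied from
# the logs quoted in this issue; other stress-ng versions may word it
# differently.
stressor_was_disabled() {
    grep -q "disabled '[^']*' as it may hang or reboot the machine"
}

# Example (not executed here): capture output and exit code, then treat a
# non-zero exit as "skipped" rather than "failed" when the message is present.
#   out=$(stress-ng --bad-ioctl 0 --timeout 30 --skip-silent --verbose 2>&1); rc=$?
#   if [ "$rc" -ne 0 ] && printf '%s\n' "$out" | stressor_was_disabled; then
#       echo "stressor disabled by stress-ng, skipping"
#   fi
```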

@LiaoU3
Contributor

LiaoU3 commented Apr 11, 2024

@stanley31huang and I found that the latest version of stress-ng fixes this issue: disabled stressors are now reported as skipped and the return code is 0.

(base) vincent@vincent-XPS-9320:~$ stress-ng --version
stress-ng, version 0.16.05 (gcc 13.2.0, x86_64 Linux 6.5.0-27-generic) 💻🔥
(base) vincent@vincent-XPS-9320:~$ stress-ng --bad-ioctl 0 --timeout 30 --skip-silent --verbose
stress-ng: debug: [31713] invoked with 'stress-ng --bad-ioctl 0 --timeout 30 --skip-silent --verbose' by user 1000 'vincent'
stress-ng: debug: [31713] stress-ng 0.16.05
stress-ng: debug: [31713] system: Linux vincent-XPS-9320 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar  7 18:21:00 UTC 2024 x86_64, gcc 13.2.0, glibc 2.38
stress-ng: debug: [31713] RAM total: 15.2G, RAM free: 1.6G, swap free: 1.9G
stress-ng: debug: [31713] temporary file path: '/home/vincent', filesystem type: ext2 (93904806 blocks available)
stress-ng: debug: [31713] CPUs have 5 idle states: C10 C1E C6 C8 POLL
stress-ng: debug: [31713] 16 processors online, 16 processors configured
stress-ng: info:  [31713] disabled 'bad-ioctl' as it may hang or reboot the machine (enable it with the --pathological option)
stress-ng: info:  [31713] setting to a 30 secs run per stressor
stress-ng: debug: [31713] CPU data cache: L1: 48K, L2: 1280K, L3: 12288K
stress-ng: debug: [31713] cache allocate: shared cache buffer size: 12288K
stress-ng: info:  [31713] dispatching hogs:
stress-ng: debug: [31713] starting stressors
stress-ng: debug: [31713] 0 stressors started
stress-ng: warn:  [31713] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: debug: [31713] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [31713] skipped: 16: bad-ioctl (16)
stress-ng: info:  [31713] passed: 0
stress-ng: info:  [31713] failed: 0
stress-ng: info:  [31713] metrics untrustworthy: 0
stress-ng: info:  [31713] successful run completed in 0.00 secs
(base) vincent@vincent-XPS-9320:~$ echo $?
0
(base) vincent@vincent-XPS-9320:~$ stress-ng --bind-mount 0 --timeout 30 --skip-silent --verbose
stress-ng: debug: [31714] invoked with 'stress-ng --bind-mount 0 --timeout 30 --skip-silent --verbose' by user 1000 'vincent'
stress-ng: debug: [31714] stress-ng 0.16.05
stress-ng: debug: [31714] system: Linux vincent-XPS-9320 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar  7 18:21:00 UTC 2024 x86_64, gcc 13.2.0, glibc 2.38
stress-ng: debug: [31714] RAM total: 15.2G, RAM free: 1.6G, swap free: 1.9G
stress-ng: debug: [31714] temporary file path: '/home/vincent', filesystem type: ext2 (93904802 blocks available)
stress-ng: debug: [31714] CPUs have 5 idle states: C10 C1E C6 C8 POLL
stress-ng: debug: [31714] 16 processors online, 16 processors configured
stress-ng: info:  [31714] disabled 'bind-mount' as it may hang or reboot the machine (enable it with the --pathological option)
stress-ng: info:  [31714] setting to a 30 secs run per stressor
stress-ng: debug: [31714] CPU data cache: L1: 48K, L2: 1280K, L3: 12288K
stress-ng: debug: [31714] cache allocate: shared cache buffer size: 12288K
stress-ng: info:  [31714] dispatching hogs:
stress-ng: debug: [31714] starting stressors
stress-ng: debug: [31714] 0 stressors started
stress-ng: warn:  [31714] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: debug: [31714] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [31714] skipped: 16: bind-mount (16)
stress-ng: info:  [31714] passed: 0
stress-ng: info:  [31714] failed: 0
stress-ng: info:  [31714] metrics untrustworthy: 0
stress-ng: info:  [31714] successful run completed in 0.00 secs
(base) vincent@vincent-XPS-9320:~$ echo $?
0
(base) vincent@vincent-XPS-9320:~$ stress-ng --mlockmany 0 --timeout 30 --skip-silent --verbose
stress-ng: debug: [31730] invoked with 'stress-ng --mlockmany 0 --timeout 30 --skip-silent --verbose' by user 1000 'vincent'
stress-ng: debug: [31730] stress-ng 0.16.05
stress-ng: debug: [31730] system: Linux vincent-XPS-9320 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar  7 18:21:00 UTC 2024 x86_64, gcc 13.2.0, glibc 2.38
stress-ng: debug: [31730] RAM total: 15.2G, RAM free: 1.6G, swap free: 1.9G
stress-ng: debug: [31730] temporary file path: '/home/vincent', filesystem type: ext2 (93904795 blocks available)
stress-ng: debug: [31730] CPUs have 5 idle states: C10 C1E C6 C8 POLL
stress-ng: debug: [31730] 16 processors online, 16 processors configured
stress-ng: info:  [31730] disabled 'mlockmany' as it may hang or reboot the machine (enable it with the --pathological option)
stress-ng: info:  [31730] setting to a 30 secs run per stressor
stress-ng: debug: [31730] CPU data cache: L1: 48K, L2: 1280K, L3: 12288K
stress-ng: debug: [31730] cache allocate: shared cache buffer size: 12288K
stress-ng: info:  [31730] dispatching hogs:
stress-ng: debug: [31730] starting stressors
stress-ng: debug: [31730] 0 stressors started
stress-ng: warn:  [31730] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: debug: [31730] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [31730] skipped: 16: mlockmany (16)
stress-ng: info:  [31730] passed: 0
stress-ng: info:  [31730] failed: 0
stress-ng: info:  [31730] metrics untrustworthy: 0
stress-ng: info:  [31730] successful run completed in 0.00 secs
(base) vincent@vincent-XPS-9320:~$ echo $?
0
(base) vincent@vincent-XPS-9320:~$ stress-ng --oom-pipe 0 --timeout 30 --skip-silent --verbose
stress-ng: debug: [31731] invoked with 'stress-ng --oom-pipe 0 --timeout 30 --skip-silent --verbose' by user 1000 'vincent'
stress-ng: debug: [31731] stress-ng 0.16.05
stress-ng: debug: [31731] system: Linux vincent-XPS-9320 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar  7 18:21:00 UTC 2024 x86_64, gcc 13.2.0, glibc 2.38
stress-ng: debug: [31731] RAM total: 15.2G, RAM free: 1.5G, swap free: 1.9G
stress-ng: debug: [31731] temporary file path: '/home/vincent', filesystem type: ext2 (93904794 blocks available)
stress-ng: debug: [31731] CPUs have 5 idle states: C10 C1E C6 C8 POLL
stress-ng: debug: [31731] 16 processors online, 16 processors configured
stress-ng: info:  [31731] disabled 'oom-pipe' as it may hang or reboot the machine (enable it with the --pathological option)
stress-ng: info:  [31731] setting to a 30 secs run per stressor
stress-ng: debug: [31731] CPU data cache: L1: 48K, L2: 1280K, L3: 12288K
stress-ng: debug: [31731] cache allocate: shared cache buffer size: 12288K
stress-ng: info:  [31731] dispatching hogs:
stress-ng: debug: [31731] starting stressors
stress-ng: debug: [31731] 0 stressors started
stress-ng: warn:  [31731] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: debug: [31731] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [31731] skipped: 16: oom-pipe (16)
stress-ng: info:  [31731] passed: 0
stress-ng: info:  [31731] failed: 0
stress-ng: info:  [31731] metrics untrustworthy: 0
stress-ng: info:  [31731] successful run completed in 0.00 secs
(base) vincent@vincent-XPS-9320:~$ echo $?
0
(base) vincent@vincent-XPS-9320:~$ stress-ng --sysinval 0 --timeout 30 --skip-silent --verbose
stress-ng: debug: [31742] invoked with 'stress-ng --sysinval 0 --timeout 30 --skip-silent --verbose' by user 1000 'vincent'
stress-ng: debug: [31742] stress-ng 0.16.05
stress-ng: debug: [31742] system: Linux vincent-XPS-9320 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar  7 18:21:00 UTC 2024 x86_64, gcc 13.2.0, glibc 2.38
stress-ng: debug: [31742] RAM total: 15.2G, RAM free: 1.6G, swap free: 1.9G
stress-ng: debug: [31742] temporary file path: '/home/vincent', filesystem type: ext2 (93904791 blocks available)
stress-ng: debug: [31742] CPUs have 5 idle states: C10 C1E C6 C8 POLL
stress-ng: debug: [31742] 16 processors online, 16 processors configured
stress-ng: info:  [31742] disabled 'sysinval' as it may hang or reboot the machine (enable it with the --pathological option)
stress-ng: info:  [31742] setting to a 30 secs run per stressor
stress-ng: debug: [31742] CPU data cache: L1: 48K, L2: 1280K, L3: 12288K
stress-ng: debug: [31742] cache allocate: shared cache buffer size: 12288K
stress-ng: info:  [31742] dispatching hogs:
stress-ng: debug: [31742] starting stressors
stress-ng: debug: [31742] 0 stressors started
stress-ng: warn:  [31742] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: debug: [31742] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [31742] skipped: 16: sysinval (16)
stress-ng: info:  [31742] passed: 0
stress-ng: info:  [31742] failed: 0
stress-ng: info:  [31742] metrics untrustworthy: 0
stress-ng: info:  [31742] successful run completed in 0.00 secs
(base) vincent@vincent-XPS-9320:~$ echo $?
0
(base) vincent@vincent-XPS-9320:~$ stress-ng --watchdog 0 --timeout 30 --skip-silent --verbose
stress-ng: debug: [31761] invoked with 'stress-ng --watchdog 0 --timeout 30 --skip-silent --verbose' by user 1000 'vincent'
stress-ng: debug: [31761] stress-ng 0.16.05
stress-ng: debug: [31761] system: Linux vincent-XPS-9320 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar  7 18:21:00 UTC 2024 x86_64, gcc 13.2.0, glibc 2.38
stress-ng: debug: [31761] RAM total: 15.2G, RAM free: 1.6G, swap free: 1.9G
stress-ng: debug: [31761] temporary file path: '/home/vincent', filesystem type: ext2 (93904783 blocks available)
stress-ng: debug: [31761] CPUs have 5 idle states: C10 C1E C6 C8 POLL
stress-ng: debug: [31761] 16 processors online, 16 processors configured
stress-ng: info:  [31761] disabled 'watchdog' as it may hang or reboot the machine (enable it with the --pathological option)
stress-ng: info:  [31761] setting to a 30 secs run per stressor
stress-ng: debug: [31761] CPU data cache: L1: 48K, L2: 1280K, L3: 12288K
stress-ng: debug: [31761] cache allocate: shared cache buffer size: 12288K
stress-ng: info:  [31761] dispatching hogs:
stress-ng: debug: [31761] starting stressors
stress-ng: debug: [31761] 0 stressors started
stress-ng: warn:  [31761] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: debug: [31761] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [31761] skipped: 16: watchdog (16)
stress-ng: info:  [31761] passed: 0
stress-ng: info:  [31761] failed: 0
stress-ng: info:  [31761] metrics untrustworthy: 0
stress-ng: info:  [31761] successful run completed in 0.00 secs
(base) vincent@vincent-XPS-9320:~$ echo $?
0
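
The 0.16.05 output above ends with `skipped:`, `passed:` and `failed:` summary lines. A minimal sketch of reading those counts from a captured log, assuming the line format stays as shown in this thread; `summary_count` is a hypothetical helper name.

```shell
# Print the count from a stress-ng summary line such as
#   stress-ng: info:  [31713] skipped: 16: bad-ioctl (16)
#   stress-ng: info:  [31713] failed: 0
# $1 = label (skipped, passed or failed); the log is read from stdin.
# Prints nothing if no matching line is found.
summary_count() {
    sed -n "s/^stress-ng: info: *\[[0-9]*\] $1: \([0-9][0-9]*\).*/\1/p" | head -n 1
}
```

With these counts, a job could distinguish "exit code 0 because everything was skipped" (passed is 0, skipped is non-zero) from a run in which stressors actually executed.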

@zongminl
Collaborator

Fixed by #1425; stress-ng now comes from the maintainer's PPA (ppa:colin-king/stress-ng).
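
Since the exit-code behaviour differs between the versions reported in this thread (0.15.07 and 0.15.09 exit with 1 for disabled stressors, 0.16.05 exits with 0), a job could check which version it is running. A minimal sketch, assuming the `stress-ng --version` output format shown above; the helper names are hypothetical, and 0.16.05 is only the version demonstrated here to exit with 0, not necessarily the first release with that behaviour.

```shell
# Extract "0.16.05" from a line like
#   stress-ng, version 0.16.05 (gcc 13.2.0, x86_64 Linux 6.5.0-27-generic)
parse_stress_ng_version() {
    sed -n 's/^stress-ng, version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeeds if dotted version $1 is greater than or equal to $2,
# comparing each dot-separated field numerically.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Example (not executed here):
#   v=$(stress-ng --version | parse_stress_ng_version)
#   version_ge "$v" 0.16.05 && echo "disabled stressors are expected to exit 0"
```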


6 participants