RPI2: Offlining one CPU crashes Kernel #843

Closed
msperl opened this issue Feb 20, 2015 · 14 comments

@msperl
Contributor

msperl commented Feb 20, 2015

Offlining a CPU via:

echo "0" > /sys/devices/system/cpu/cpu3/online

results in the following crash:

[  203.201421] ---[ end trace 68f68af397b09efa ]---
[  203.201432] Kernel panic - not syncing: Attempted to kill the idle task!
[  203.201446] CPU0: stopping
[  203.220426] CPU: 0 PID: 2441 Comm: bash Tainted: G      D        3.18.7-v7+ 6
[  203.231035] [<80016d14>] (unwind_backtrace) from [<80012c40>] (show_stack+0x)
[  203.242031] [<80012c40>] (show_stack) from [<8052f164>] (dump_stack+0x98/0xd)
[  203.250976] [<8052f164>] (dump_stack) from [<8001509c>] (handle_IPI+0x234/0x)
[  203.261697] [<8001509c>] (handle_IPI) from [<80008618>] (do_IPI+0x18/0x1c)
[  203.270370] [<80008618>] (do_IPI) from [<80534b34>] (__irq_svc+0x34/0x14c)
[  203.279045] Exception stack(0xb85b9d58 to 0xb85b9da0)
[  203.285891] 9d40:                                                       80817
[  203.297650] 9d60: 00000003 80456bfc ffffffe0 80819ca8 00000000 00000003 00000
[  203.309552] 9d80: b856390c b85b9dc4 b85b9dc8 b85b9da0 800414a8 80456c2c 6000f
[  203.321703] [<80534b34>] (__irq_svc) from [<80456c2c>] (dev_cpu_callback+0x3)
[  203.333569] [<80456c2c>] (dev_cpu_callback) from [<800414a8>] (notifier_call)
[  203.346291] [<800414a8>] (notifier_call_chain) from [<800415dc>] (__raw_noti)
[  203.359904] [<800415dc>] (__raw_notifier_call_chain) from [<800253d8>] (cpu_)
[  203.372737] [<800253d8>] (cpu_notify) from [<80025540>] (cpu_notify_nofail+0)
[  203.384913] [<80025540>] (cpu_notify_nofail) from [<8052abbc>] (_cpu_down+0x)
[  203.397231] [<8052abbc>] (_cpu_down) from [<8052ad08>] (cpu_down+0x38/0x5c)
[  203.406488] [<8052ad08>] (cpu_down) from [<803534e0>] (cpu_subsys_offline+0x)
[  203.418721] [<803534e0>] (cpu_subsys_offline) from [<8034ebd4>] (device_offl)
[  203.431647] [<8034ebd4>] (device_offline) from [<8034ecf0>] (online_store+0x)
[  203.444042] [<8034ecf0>] (online_store) from [<8034c668>] (dev_attr_store+0x)
[  203.456432] [<8034c668>] (dev_attr_store) from [<801adeb8>] (sysfs_kf_write+)
[  203.468994] [<801adeb8>] (sysfs_kf_write) from [<801ad30c>] (kernfs_fop_writ)
[  203.481819] [<801ad30c>] (kernfs_fop_write) from [<80143938>] (vfs_write+0xb)
[  203.494202] [<80143938>] (vfs_write) from [<80143f28>] (SyS_write+0x4c/0xa0)
[  203.503629] [<80143f28>] (SyS_write) from [<8000ebc0>] (ret_fast_syscall+0x0)
[  203.515721] CPU2: stopping
[  203.520630] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D        3.18.7-v76
[  203.532537] [<80016d14>] (unwind_backtrace) from [<80012c40>] (show_stack+0x)
[  203.544656] [<80012c40>] (show_stack) from [<8052f164>] (dump_stack+0x98/0xd)
[  203.554129] [<8052f164>] (dump_stack) from [<8001509c>] (handle_IPI+0x234/0x)
[  203.565794] [<8001509c>] (handle_IPI) from [<80008618>] (do_IPI+0x18/0x1c)
[  203.574856] [<80008618>] (do_IPI) from [<80534b34>] (__irq_svc+0x34/0x14c)
[  203.583878] Exception stack(0xb98c7f58 to 0xb98c7fa0)
[  203.591032] 7f40:                                                       807e0
[  203.603241] 7f60: ffffffed 00000000 b98c6030 807e8dd4 00000000 00000000 b98c0
[  203.615388] 7f80: 808238fc b98c7fac b98c7fa0 b98c7fa0 8000f900 8000f904 6000f
[  203.627540] [<80534b34>] (__irq_svc) from [<8000f904>] (arch_cpu_idle+0x30/0)
[  203.638961] [<8000f904>] (arch_cpu_idle) from [<8005c5cc>] (cpu_startup_entr)
[  203.651375] [<8005c5cc>] (cpu_startup_entry) from [<80014bec>] (secondary_st)
[  203.664717] [<80014bec>] (secondary_start_kernel) from [<000086a4>] (0x86a4)
[  203.674101] ---[ end Kernel panic - not syncing: Attempted to kill the idle !
[  203.674105] 5f60: ffffffed 00000000 b98c4030 807e8dd4 00000000 00000000 b98c0
[  203.698487] 5f80: 808238fc b98c5fac b98c5fa0 b98c5fa0 8000f900 8000f904 6000f
[  203.711141] [<80534b34>] (__irq_svc) from [<8000f904>] (arch_cpu_idle+0x30/0)
[  203.723107] [<8000f904>] (arch_cpu_idle) from [<8005c5cc>] (cpu_startup_entr)
[  203.736124] [<8005c5cc>] (cpu_startup_entry) from [<80014bec>] (secondary_st)
[  203.749921] [<80014bec>] (secondary_start_kernel) from [<000086a4>] (0x86a4)

That is with: 3.18.7-v7+ #756 SMP PREEMPT Wed Feb 18 16:14:51 GMT 2015 armv7l GNU/Linux

The same happens with a self-built kernel based on commit "fe4a83540ec73dfc298f16f027277355470ea9a0"

@popcornmix
Collaborator

Yes, this has never been tested, so it is not supported.
I need to understand what the correct behaviour should be.
I was planning on disabling CONFIG_SUSPEND/CONFIG_CPU_IDLE which may just disable this code path.

From the backtrace it looks like there is a suspend IPI which we can possibly just wfi from. I assume there will be another IPI to wake back up.

Any reason why you need this? If it's a power saving exercise, then adding maxcpus=1 to cmdline.txt should work (but the power saving will be negligible).
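
To confirm that a boot parameter such as maxcpus=1 took effect, one option is to read /sys/devices/system/cpu/online, which lists the online CPUs in the kernel's range format (for example "0-3"). The following is a minimal sketch, not part of the original thread; the function names parse_cpu_list and online_cpus are illustrative only.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3" or "0,2-3" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return the CPUs the kernel currently reports as online."""
    return parse_cpu_list((Path(sysfs_root) / "online").read_text())
```

On a system booted with maxcpus=1, online_cpus() would be expected to return only CPU 0.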

@msperl
Contributor Author

msperl commented Feb 22, 2015

Primarily curiosity: I wanted to test something and was playing with some ideas when it crashed...
So I thought I should raise it...
Still, things like this are expected to work...

@popcornmix
Collaborator

Removing CONFIG_CPU_IDLE and CONFIG_HOTPLUG_CPU from the kernel config means:

$ ls /sys/devices/system/cpu/cpu3/
cpufreq  subsystem  topology  uevent

so the online field is removed. I'll add that to the next update.
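
Since the online node only exists when the kernel is built with CPU hotplug support, a script can check for it before attempting to offline a CPU. The following is a minimal sketch under that assumption; has_online_node and set_cpu_online are hypothetical helper names, and the sysfs_root parameter exists only so the logic can be exercised against a scratch directory.

```python
from pathlib import Path


def has_online_node(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return True if the kernel exposes an 'online' node for this CPU."""
    return (Path(sysfs_root) / f"cpu{cpu}" / "online").exists()


def set_cpu_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to the CPU's 'online' node; refuse if the node is absent.

    Warning: on the affected 3.18.7-v7+ kernel from this thread, writing 0
    is exactly what triggered the panic. Requires root on a real system.
    """
    node = Path(sysfs_root) / f"cpu{cpu}" / "online"
    if not node.exists():
        raise FileNotFoundError(f"{node}: CPU hotplug not exposed by this kernel")
    node.write_text("1" if online else "0")
```

With the updated kernel config, has_online_node(3) would be expected to return False, so the write is never attempted.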

@msperl
Contributor Author

msperl commented Feb 23, 2015

Well, OK in principle, but it just hides the bug, which remains...

@popcornmix
Collaborator

Well, it's the correct solution for the problem reported, which is that using an unsupported feature crashes.
A different issue would be a feature request to support offlining a cpu.
But I think we'd want some justification that it's useful before going to a lot of effort to implement it.

popcornmix added a commit to raspberrypi/firmware that referenced this issue Feb 23, 2015
See: raspberrypi/linux#843

kernel: serial: amba-pl011: Kickstart TX by explicit FIFO fill
See: raspberrypi/linux#148

kernel: config: enable TOUCHSCREEN_USB_COMPOSITE
See: raspberrypi/linux#718

firmware: ldconfig: Sort config options and use bsearch for lookups

firmware: video_decode: Require a small factor improvement for fifo timestamps
See: http://forum.kodi.tv/showthread.php?tid=215399

firmware: video codec: allow length-delineated input to have startcodes as well
See: popcornmix/omxplayer#272
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Feb 23, 2015
@popcornmix
Collaborator

The latest update doesn't expose the "online" node, so it doesn't crash.

@msperl
Contributor Author

msperl commented Feb 24, 2015

I found that for my purposes the "isolcpus=3" kernel parameter plus taskset is probably a cleaner solution than offlining a CPU from the kernel's view and then re-enabling it outside of the kernel context (this is for low-latency GPIO stuff).
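
For reference, the pinning half of that approach can also be done from inside a program rather than with taskset. The following is a minimal sketch, assuming a Linux system and Python's os.sched_setaffinity; pin_to_cpu is a hypothetical helper name, not something used in this thread.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict a process (pid 0 means the caller) to a single CPU.

    Returns the resulting affinity set. An OSError propagates if the CPU
    does not exist or is not available to this process.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

On a kernel booted with isolcpus=3, calling pin_to_cpu(3) moves the calling process onto the isolated core, which is roughly what `taskset -c 3 <command>` does for a newly launched command.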

@msperl msperl closed this as completed Feb 24, 2015
@Ferroin
Contributor

Ferroin commented Mar 4, 2015

Just hiding the issue by not exporting the 'online' node shouldn't be the long-term fix. There are other things in the kernel that use the same code path (kexec in particular), and they won't ever work until this is fixed (and having kexec support would be really nice, and also significant because the Pi would be one of the few ARM boards it actually works on).

@popcornmix
Collaborator

As I've said, suspend, hibernate, and CPU hotplugging have never been supported, so removing the config options is correct.

Sure, it would be nice if every possible kernel option could be enabled and do something useful, but we've got to prioritize. 99% of users will see no benefit from this. It may be looked into at some point in the future, but it's a low priority issue.

We'd be happy to consider a pull request if someone adds support.

@Ferroin
Contributor

Ferroin commented Mar 4, 2015

While I agree that kexec, hibernate, and CPU hotplug wouldn't benefit most users, I do think that a lot of people who have media-center type systems might really like the ability to have it automatically sleep when not in use. I would look into getting this working myself, but have nowhere near the degree of architecture-specific knowledge to tackle it in any reasonable amount of time. I would suggest putting somewhere in the documentation that it specifically doesn't work and isn't supported though.

@popcornmix
Collaborator

But there is no hardware support for suspend.

All 4 cores are clocked off the same clock and supplied with the same power. I'm pretty certain a suspend mode would draw exactly the same power as we currently do when not busy.

@Ferroin
Contributor

Ferroin commented Mar 4, 2015

Interesting, that actually makes me kinda curious about the startup and shutdown code for the Pi in Linux. Does the GPU load the IP on all 4 CPU cores during boot, or is bringup handled by Linux for all but the first core?

@popcornmix
Collaborator

As soon as the ARM is brought out of reset, all 4 cores start executing.
3 of the cores spin waiting for a signal from the first core.
Once the first core has initialised enough, smp_boot_secondary is called, which makes the other cores branch off to secondary_startup.
But there is really no way to stop any cores. All you can do is wfi (wait for interrupt), but that happens anyway when CPU load is low.

@Ferroin
Contributor

Ferroin commented Mar 4, 2015

Yeah, it sounds like suspend would provide no gains whatsoever (especially since it would probably need some extra hardware for wakeup support).

neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
pfpacket pushed a commit to pfpacket/linux-rpi-rust that referenced this issue Apr 7, 2023