-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vc4: HDMI power on when not connected at boot not supported. #11
Comments
Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 #1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 #2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 #3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 #4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 #5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 #6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 #7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 #8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 #9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664 #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539 #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390 #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tong Zhang <ztong@vt.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable@vger.kernel.org # v4.4+ Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Patches are headed upstream for this one: https://lists.freedesktop.org/archives/dri-devel/2016-February/100963.html |
On Sat, Mar 12, 2016 at 12:57 AM, Eric Anholt notifications@github.com
|
awesome! can't wait to test it! |
commit ec183d2 upstream. Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 #1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 #2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 #3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 #4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 #5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 #6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 #7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 #8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 #9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664 #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539 #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390 #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tong Zhang <ztong@vt.edu> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patches merged upstream, and the 4.4 and 4.5 pull requests downstream should cover it as well (just tested again on 4.5) |
When a seq-virmidi driver is initialized, it registers a rawmidi instance with its callback to create an associated seq kernel client. Currently it's done throughly in rawmidi's register_mutex context. Recently it was found that this may lead to a deadlock another rawmidi device that is being attached with the sequencer is accessed, as both open with the same register_mutex. This was actually triggered by syzkaller, as Dmitry Vyukov reported: ====================================================== [ INFO: possible circular locking dependency detected ] 4.8.0-rc1+ anholt#11 Not tainted ------------------------------------------------------- syz-executor/7154 is trying to acquire lock: (register_mutex#5){+.+.+.}, at: [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341 but task is already holding lock: (&grp->list_mutex){++++.+}, at: [<ffffffff850138bb>] check_and_subscribe_port+0x5b/0x5c0 sound/core/seq/seq_ports.c:495 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> anholt#1 (&grp->list_mutex){++++.+}: [<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746 [<ffffffff863f6199>] down_read+0x49/0xc0 kernel/locking/rwsem.c:22 [< inline >] deliver_to_subscribers sound/core/seq/seq_clientmgr.c:681 [<ffffffff85005c5e>] snd_seq_deliver_event+0x35e/0x890 sound/core/seq/seq_clientmgr.c:822 [<ffffffff85006e96>] > snd_seq_kernel_client_dispatch+0x126/0x170 sound/core/seq/seq_clientmgr.c:2418 [<ffffffff85012c52>] snd_seq_system_broadcast+0xb2/0xf0 sound/core/seq/seq_system.c:101 [<ffffffff84fff70a>] snd_seq_create_kernel_client+0x24a/0x330 sound/core/seq/seq_clientmgr.c:2297 [< inline >] snd_virmidi_dev_attach_seq sound/core/seq/seq_virmidi.c:383 [<ffffffff8502d29f>] snd_virmidi_dev_register+0x29f/0x750 sound/core/seq/seq_virmidi.c:450 [<ffffffff84fd208c>] snd_rawmidi_dev_register+0x30c/0xd40 sound/core/rawmidi.c:1645 [<ffffffff84f816d3>] __snd_device_register.part.0+0x63/0xc0 sound/core/device.c:164 [< inline >] __snd_device_register sound/core/device.c:162 [<ffffffff84f8235d>] snd_device_register_all+0xad/0x110 sound/core/device.c:212 [<ffffffff84f7546f>] snd_card_register+0xef/0x6c0 sound/core/init.c:749 [<ffffffff85040b7f>] snd_virmidi_probe+0x3ef/0x590 sound/drivers/virmidi.c:123 [<ffffffff833ebf7b>] platform_drv_probe+0x8b/0x170 drivers/base/platform.c:564 ...... -> #0 (register_mutex#5){+.+.+.}: [< inline >] check_prev_add kernel/locking/lockdep.c:1829 [< inline >] check_prevs_add kernel/locking/lockdep.c:1939 [< inline >] validate_chain kernel/locking/lockdep.c:2266 [<ffffffff814791f4>] __lock_acquire+0x4d44/0x4d80 kernel/locking/lockdep.c:3335 [<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746 [< inline >] __mutex_lock_common kernel/locking/mutex.c:521 [<ffffffff863f0ef1>] mutex_lock_nested+0xb1/0xa20 kernel/locking/mutex.c:621 [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341 [<ffffffff8502e7c7>] midisynth_subscribe+0xf7/0x350 sound/core/seq/seq_midi.c:188 [< inline >] subscribe_port sound/core/seq/seq_ports.c:427 [<ffffffff85013cc7>] check_and_subscribe_port+0x467/0x5c0 sound/core/seq/seq_ports.c:510 [<ffffffff85015da9>] snd_seq_port_connect+0x2c9/0x500 sound/core/seq/seq_ports.c:579 [<ffffffff850079b8>] snd_seq_ioctl_subscribe_port+0x1d8/0x2b0 sound/core/seq/seq_clientmgr.c:1480 [<ffffffff84ffe9e4>] snd_seq_do_ioctl+0x184/0x1e0 sound/core/seq/seq_clientmgr.c:2225 [<ffffffff84ffeae8>] snd_seq_kernel_client_ctl+0xa8/0x110 sound/core/seq/seq_clientmgr.c:2440 [<ffffffff85027664>] snd_seq_oss_midi_open+0x3b4/0x610 sound/core/seq/oss/seq_oss_midi.c:375 [<ffffffff85023d67>] snd_seq_oss_synth_setup_midi+0x107/0x4c0 sound/core/seq/oss/seq_oss_synth.c:281 [<ffffffff8501b0a8>] snd_seq_oss_open+0x748/0x8d0 sound/core/seq/oss/seq_oss_init.c:274 [<ffffffff85019d8a>] odev_open+0x6a/0x90 sound/core/seq/oss/seq_oss.c:138 [<ffffffff84f7040f>] soundcore_open+0x30f/0x640 sound/sound_core.c:639 ...... other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&grp->list_mutex); lock(register_mutex#5); lock(&grp->list_mutex); lock(register_mutex#5); *** DEADLOCK *** ====================================================== The fix is to simply move the registration parts in snd_rawmidi_dev_register() to the outside of the register_mutex lock. The lock is needed only to manage the linked list, and it's not necessarily to cover the whole initialization process. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
commit 816f318 upstream. When a seq-virmidi driver is initialized, it registers a rawmidi instance with its callback to create an associated seq kernel client. Currently it's done throughly in rawmidi's register_mutex context. Recently it was found that this may lead to a deadlock another rawmidi device that is being attached with the sequencer is accessed, as both open with the same register_mutex. This was actually triggered by syzkaller, as Dmitry Vyukov reported: ====================================================== [ INFO: possible circular locking dependency detected ] 4.8.0-rc1+ #11 Not tainted ------------------------------------------------------- syz-executor/7154 is trying to acquire lock: (register_mutex#5){+.+.+.}, at: [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341 but task is already holding lock: (&grp->list_mutex){++++.+}, at: [<ffffffff850138bb>] check_and_subscribe_port+0x5b/0x5c0 sound/core/seq/seq_ports.c:495 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&grp->list_mutex){++++.+}: [<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746 [<ffffffff863f6199>] down_read+0x49/0xc0 kernel/locking/rwsem.c:22 [< inline >] deliver_to_subscribers sound/core/seq/seq_clientmgr.c:681 [<ffffffff85005c5e>] snd_seq_deliver_event+0x35e/0x890 sound/core/seq/seq_clientmgr.c:822 [<ffffffff85006e96>] > snd_seq_kernel_client_dispatch+0x126/0x170 sound/core/seq/seq_clientmgr.c:2418 [<ffffffff85012c52>] snd_seq_system_broadcast+0xb2/0xf0 sound/core/seq/seq_system.c:101 [<ffffffff84fff70a>] snd_seq_create_kernel_client+0x24a/0x330 sound/core/seq/seq_clientmgr.c:2297 [< inline >] snd_virmidi_dev_attach_seq sound/core/seq/seq_virmidi.c:383 [<ffffffff8502d29f>] snd_virmidi_dev_register+0x29f/0x750 sound/core/seq/seq_virmidi.c:450 [<ffffffff84fd208c>] snd_rawmidi_dev_register+0x30c/0xd40 sound/core/rawmidi.c:1645 [<ffffffff84f816d3>] __snd_device_register.part.0+0x63/0xc0 sound/core/device.c:164 [< inline >] __snd_device_register sound/core/device.c:162 [<ffffffff84f8235d>] snd_device_register_all+0xad/0x110 sound/core/device.c:212 [<ffffffff84f7546f>] snd_card_register+0xef/0x6c0 sound/core/init.c:749 [<ffffffff85040b7f>] snd_virmidi_probe+0x3ef/0x590 sound/drivers/virmidi.c:123 [<ffffffff833ebf7b>] platform_drv_probe+0x8b/0x170 drivers/base/platform.c:564 ...... -> #0 (register_mutex#5){+.+.+.}: [< inline >] check_prev_add kernel/locking/lockdep.c:1829 [< inline >] check_prevs_add kernel/locking/lockdep.c:1939 [< inline >] validate_chain kernel/locking/lockdep.c:2266 [<ffffffff814791f4>] __lock_acquire+0x4d44/0x4d80 kernel/locking/lockdep.c:3335 [<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746 [< inline >] __mutex_lock_common kernel/locking/mutex.c:521 [<ffffffff863f0ef1>] mutex_lock_nested+0xb1/0xa20 kernel/locking/mutex.c:621 [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341 [<ffffffff8502e7c7>] midisynth_subscribe+0xf7/0x350 sound/core/seq/seq_midi.c:188 [< inline >] subscribe_port sound/core/seq/seq_ports.c:427 [<ffffffff85013cc7>] check_and_subscribe_port+0x467/0x5c0 sound/core/seq/seq_ports.c:510 [<ffffffff85015da9>] snd_seq_port_connect+0x2c9/0x500 sound/core/seq/seq_ports.c:579 [<ffffffff850079b8>] snd_seq_ioctl_subscribe_port+0x1d8/0x2b0 sound/core/seq/seq_clientmgr.c:1480 [<ffffffff84ffe9e4>] snd_seq_do_ioctl+0x184/0x1e0 sound/core/seq/seq_clientmgr.c:2225 [<ffffffff84ffeae8>] snd_seq_kernel_client_ctl+0xa8/0x110 sound/core/seq/seq_clientmgr.c:2440 [<ffffffff85027664>] snd_seq_oss_midi_open+0x3b4/0x610 sound/core/seq/oss/seq_oss_midi.c:375 [<ffffffff85023d67>] snd_seq_oss_synth_setup_midi+0x107/0x4c0 sound/core/seq/oss/seq_oss_synth.c:281 [<ffffffff8501b0a8>] snd_seq_oss_open+0x748/0x8d0 sound/core/seq/oss/seq_oss_init.c:274 [<ffffffff85019d8a>] odev_open+0x6a/0x90 sound/core/seq/oss/seq_oss.c:138 [<ffffffff84f7040f>] soundcore_open+0x30f/0x640 sound/sound_core.c:639 ...... other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&grp->list_mutex); lock(register_mutex#5); lock(&grp->list_mutex); lock(register_mutex#5); *** DEADLOCK *** ====================================================== The fix is to simply move the registration parts in snd_rawmidi_dev_register() to the outside of the register_mutex lock. The lock is needed only to manage the linked list, and it's not necessarily to cover the whole initialization process. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Function ib_create_qp() was failing to return an error when rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp() when trying to dereferece the qp pointer which was actually a negative errno. The crash: crash> log|grep BUG [ 136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 crash> bt PID: 3736 TASK: ffff8808543215c0 CPU: 2 COMMAND: "kworker/u64:2" #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0 #1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758 #2 [ffff88084d323480] crash_kexec at ffffffff8111682d #3 [ffff88084d3234b0] oops_end at ffffffff81032bd6 #4 [ffff88084d3234e0] no_context at ffffffff8106e431 #5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610 #6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4 #7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc #8 [ffff88084d323620] do_page_fault at ffffffff8106f057 #9 [ffff88084d323660] page_fault at ffffffff816e3148 [exception RIP: ib_create_qp+427] RIP: ffffffffa02554fb RSP: ffff88084d323718 RFLAGS: 00010246 RAX: 0000000000000004 RBX: fffffffffffffff4 RCX: 000000018020001f RDX: ffff880830997fc0 RSI: 0000000000000001 RDI: ffff88085f407200 RBP: ffff88084d323778 R8: 0000000000000001 R9: ffffea0020bae210 R10: ffffea0020bae218 R11: 0000000000000001 R12: ffff88084d3237c8 R13: 00000000fffffff4 R14: ffff880859fa5000 R15: ffff88082eb89800 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm] #11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma] #12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma] #13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma] #14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma] #15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm] #16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm] #17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm] #18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm] #19 [ffff88084d323cb0] process_one_work at ffffffff810a1483 #20 [ffff88084d323d90] worker_thread at ffffffff810a211d #21 [ffff88084d323ec0] kthread at ffffffff810a6c5c #22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf Fixes: 632bc3f ("IB/core, RDMA RW API: Do not exceed QP SGE send limit") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
commit 4dfce57 upstream. There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] #14 [ffff8800a57bbe00] evict at ffffffff811e1b67 #15 [ffff8800a57bbe28] iput at ffffffff811e23a5 #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 #17 [ffff8800a57bbe88] dput at ffffffff811dd06c #18 [ffff8800a57bbea8] __fput at ffffffff811c823b #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1f1c0e upstream. Starting with commit d94a461 ("ath9k: use ieee80211_tx_status_noskb where possible") the driver uses rcu_read_lock() && rcu_read_unlock(), yet on returning early in ath_tx_edma_tasklet() the unlock is missing leading to stalls and suspicious RCU usage: =============================== [ INFO: suspicious RCU usage. ] 4.9.0-rc8 #11 Not tainted ------------------------------- kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 1 lock held by swapper/7/0: #0: ( rcu_read_lock ){......} , at: [<ffffffffa06ed110>] ath_tx_edma_tasklet+0x0/0x450 [ath9k] stack backtrace: CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.0-rc8 #11 Hardware name: Acer Aspire V3-571G/VA50_HC_CR, BIOS V2.21 12/16/2013 ffff88025efc3f38 ffffffff8132b1e5 ffff88017ede4540 0000000000000001 ffff88025efc3f68 ffffffff810a25f7 ffff88025efcee60 ffff88017edebdd8 ffff88025eeb5400 0000000000000091 ffff88025efc3f88 ffffffff810c3cd4 Call Trace: <IRQ> [<ffffffff8132b1e5>] dump_stack+0x68/0x93 [<ffffffff810a25f7>] lockdep_rcu_suspicious+0xd7/0x110 [<ffffffff810c3cd4>] rcu_eqs_enter_common.constprop.85+0x154/0x200 [<ffffffff810c5a54>] rcu_irq_exit+0x44/0xa0 [<ffffffff81058631>] irq_exit+0x61/0xd0 [<ffffffff81018d25>] do_IRQ+0x65/0x110 [<ffffffff81672189>] common_interrupt+0x89/0x89 <EOI> [<ffffffff814ffe11>] ? cpuidle_enter_state+0x151/0x200 [<ffffffff814ffee2>] cpuidle_enter+0x12/0x20 [<ffffffff8109a6ae>] call_cpuidle+0x1e/0x40 [<ffffffff8109a8f6>] cpu_startup_entry+0x146/0x220 [<ffffffff810336f8>] start_secondary+0x148/0x170 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Fixes: d94a461 ("ath9k: use ieee80211_tx_status_noskb where possible") Acked-by: Felix Fietkau <nbd@nbd.name> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with commit d94a461 ("ath9k: use ieee80211_tx_status_noskb where possible") the driver uses rcu_read_lock() && rcu_read_unlock(), yet on returning early in ath_tx_edma_tasklet() the unlock is missing leading to stalls and suspicious RCU usage: =============================== [ INFO: suspicious RCU usage. ] 4.9.0-rc8 #11 Not tainted ------------------------------- kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 1 lock held by swapper/7/0: #0: ( rcu_read_lock ){......} , at: [<ffffffffa06ed110>] ath_tx_edma_tasklet+0x0/0x450 [ath9k] stack backtrace: CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.0-rc8 #11 Hardware name: Acer Aspire V3-571G/VA50_HC_CR, BIOS V2.21 12/16/2013 ffff88025efc3f38 ffffffff8132b1e5 ffff88017ede4540 0000000000000001 ffff88025efc3f68 ffffffff810a25f7 ffff88025efcee60 ffff88017edebdd8 ffff88025eeb5400 0000000000000091 ffff88025efc3f88 ffffffff810c3cd4 Call Trace: <IRQ> [<ffffffff8132b1e5>] dump_stack+0x68/0x93 [<ffffffff810a25f7>] lockdep_rcu_suspicious+0xd7/0x110 [<ffffffff810c3cd4>] rcu_eqs_enter_common.constprop.85+0x154/0x200 [<ffffffff810c5a54>] rcu_irq_exit+0x44/0xa0 [<ffffffff81058631>] irq_exit+0x61/0xd0 [<ffffffff81018d25>] do_IRQ+0x65/0x110 [<ffffffff81672189>] common_interrupt+0x89/0x89 <EOI> [<ffffffff814ffe11>] ? cpuidle_enter_state+0x151/0x200 [<ffffffff814ffee2>] cpuidle_enter+0x12/0x20 [<ffffffff8109a6ae>] call_cpuidle+0x1e/0x40 [<ffffffff8109a8f6>] cpu_startup_entry+0x146/0x220 [<ffffffff810336f8>] start_secondary+0x148/0x170 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Fixes: d94a461 ("ath9k: use ieee80211_tx_status_noskb where possible") Cc: <stable@vger.kernel.org> # v4.9 Acked-by: Felix Fietkau <nbd@nbd.name> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
nla_memdup_cookie was overwriting err value, declared at function scope and earlier initialized with result of ->init(). At success nla_memdup_cookie() returns 0, and thus module refcnt decremented, although the action was installed. $ sudo tc actions add action pass index 1 cookie 1234 $ sudo tc actions ls action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 0 $ $ lsmod Module Size Used by act_gact 16384 0 ... $ $ sudo rmmod act_gact [ 52.310283] ------------[ cut here ]------------ [ 52.312551] WARNING: CPU: 1 PID: 455 at kernel/module.c:1113 module_put+0x99/0xa0 [ 52.316278] Modules linked in: act_gact(-) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel psmouse pcbc evbug aesni_intel aes_x86_64 crypto_simd serio_raw glue_helper pcspkr cryptd [ 52.322285] CPU: 1 PID: 455 Comm: rmmod Not tainted 4.10.0+ #11 [ 52.324261] Call Trace: [ 52.325132] dump_stack+0x63/0x87 [ 52.326236] __warn+0xd1/0xf0 [ 52.326260] warn_slowpath_null+0x1d/0x20 [ 52.326260] module_put+0x99/0xa0 [ 52.326260] tcf_hashinfo_destroy+0x7f/0x90 [ 52.326260] gact_exit_net+0x27/0x40 [act_gact] [ 52.326260] ops_exit_list.isra.6+0x38/0x60 [ 52.326260] unregister_pernet_operations+0x90/0xe0 [ 52.326260] unregister_pernet_subsys+0x21/0x30 [ 52.326260] tcf_unregister_action+0x68/0xa0 [ 52.326260] gact_cleanup_module+0x17/0xa0f [act_gact] [ 52.326260] SyS_delete_module+0x1ba/0x220 [ 52.326260] entry_SYSCALL_64_fastpath+0x1e/0xad [ 52.326260] RIP: 0033:0x7f527ffae367 [ 52.326260] RSP: 002b:00007ffeb402a598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0 [ 52.326260] RAX: ffffffffffffffda RBX: 0000559b069912a0 RCX: 00007f527ffae367 [ 52.326260] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559b06991308 [ 52.326260] RBP: 0000000000000003 R08: 00007f5280264420 R09: 00007ffeb4029511 [ 52.326260] R10: 000000000000087b R11: 0000000000000202 R12: 00007ffeb4029580 [ 52.326260] R13: 0000000000000000 R14: 0000000000000000 R15: 0000559b069912a0 [ 52.354856] ---[ end trace 90d89401542b0db6 ]--- $ With the fix: $ sudo modprobe act_gact $ lsmod Module Size Used by act_gact 16384 0 ... $ sudo tc actions add action pass index 1 cookie 1234 $ sudo tc actions ls action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 0 $ $ lsmod Module Size Used by act_gact 16384 1 ... $ sudo rmmod act_gact rmmod: ERROR: Module act_gact is in use $ $ sudo /home/mrv/bin/tc actions del action gact index 1 $ sudo rmmod act_gact $ lsmod Module Size Used by $ Fixes: 1045ba7 ("net sched actions: Add support for user cookies") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[12493.693827] WARNING: CPU: 1 PID: 14860 at drivers/gpu/drm/i915/intel_uncore.c:795 __unclaimed_reg_debug+0x5d/0x80 [i915] [12493.693868] Unclaimed read from register 0x1f0024 [12493.693905] Modules linked in: vgem i915 drm_kms_helper drm intel_gtt i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_powerclamp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel cryptd lpc_ich i2c_i801 mfd_core video i2c_designware_platform i2c_designware_core i2c_core button autofs4 sd_mod ahci libahci libata scsi_mod [last unloaded: i915] [12493.694039] CPU: 1 PID: 14860 Comm: intel-gpu-overl Tainted: G U 4.10.0-rc7+ #11 [12493.694079] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F8 03/02/2016 [12493.694121] Call Trace: [12493.694169] dump_stack+0x67/0x9d [12493.694235] __warn+0x117/0x140 [12493.694288] warn_slowpath_fmt+0x4f/0x60 [12493.694344] ? do_raw_spin_lock+0x116/0x180 [12493.694533] ? check_for_unclaimed_mmio+0x98/0xe0 [i915] [12493.694727] __unclaimed_reg_debug+0x5d/0x80 [i915] [12493.694923] fwtable_read32+0x2c5/0x330 [i915] [12493.695108] i915_interrupt_info+0xd52/0xf80 [i915] [12493.695302] ? gen6_write16+0x310/0x310 [i915] [12493.695357] seq_read+0x187/0x710 [12493.695412] full_proxy_read+0x75/0xc0 [12493.695472] __vfs_read+0x5a/0x220 [12493.695524] ? kmem_cache_free+0x6c/0x260 [12493.695577] ? putname+0x97/0xa0 [12493.695629] ? putname+0x97/0xa0 [12493.695682] ? rcu_read_lock_sched_held+0xb8/0xd0 [12493.695735] ? rw_verify_area+0x65/0x140 [12493.695787] vfs_read+0xd1/0x1f0 [12493.695840] SyS_read+0x62/0xc0 [12493.695893] entry_SYSCALL_64_fastpath+0x1c/0xb1 [12493.695943] RIP: 0033:0x7f82dca99ba0 [12493.695985] RSP: 002b:00007ffc0bdfd4f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [12493.696031] RAX: ffffffffffffffda RBX: 00007ffc0be005a0 RCX: 00007f82dca99ba0 [12493.696073] RDX: 0000000000001fff RSI: 00007ffc0bdfd500 RDI: 000000000000001a [12493.696115] RBP: ffffffff810fb639 R08: 302f6972642f6775 R09: 00007f82dca0999a [12493.696157] R10: 00007f82dcd62760 R11: 0000000000000246 R12: ffff880069a17f98 [12493.696199] R13: 00007ffc0bdfd428 R14: 0000000000000003 R15: 00007ffc0bdfd428 [12493.696250] ? trace_hardirqs_off_caller+0xd9/0x130 [12493.696300] ---[ end trace 52ccf4d39793cc59 ]--- Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99761 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170210133632.16946-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL pointer deref when he ran it on a data with namespace events. It was because the buildid_id__mark_dso_hit_ops lacks the namespace event handler and perf_too__fill_default() didn't set it. Program received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x (gdb) where #0 0x0000000000000000 in ?? () #1 0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18, evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888, tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287 #2 0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>, sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340 #3 perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470, file_offset=file_offset@entry=1136) at util/session.c:1522 #4 0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>, data_offset=<optimized out>, session=0x0) at util/session.c:1899 #5 perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953 #6 0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>) at builtin-buildid-list.c:83 #7 cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115 #8 0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0) at perf.c:296 #9 0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348 #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392 #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536 (gdb) Fix it by adding a stub event handler for namespace event. Committer testing: Further clarifying, plain using 'perf buildid-list' will not end up in a SEGFAULT when processing a perf.data file with namespace info: # perf record -a --namespaces sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ] # perf buildid-list | wc -l 38 # perf buildid-list | head -5 e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux 874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko 5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so # It is only when one asks for checking what of those entries actually had samples, i.e. when we use either -H or --with-hits, that we will process all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c neither explicitely set a perf_tool.namespaces() callback nor the default stub was set that we end up, when processing a PERF_RECORD_NAMESPACE record, causing a SEGFAULT: # perf buildid-list -H Segmentation fault (core dumped) ^C # Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Fixes: f3b3614 ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Scenario: 1. Port down and do fail over 2. Ap do rds_bind syscall PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6" #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518 #3 [ffff898e35f15b60] no_context at ffffffff8104854c #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95 [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282 RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88 RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00 RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0 R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm] #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6 PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap" #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm] #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma] #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds] #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds] #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670 PID: 45659 PID: 47039 rds_ib_laddr_check /* create id_priv with a null event_handler */ rdma_create_id rdma_bind_addr cma_acquire_dev /* add id_priv to cma_dev->id_list */ cma_attach_to_dev cma_ndev_work_handler /* event_hanlder is null */ id_priv->id.event_handler Signed-off-by: Guanglei Li <guanglei.li@oracle.com> Signed-off-by: Honglei Wang <honglei.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 7ba7166 upstream. It was reported by Sergey Senozhatsky that if THP (Transparent Huge Page) and frontswap (via zswap) are both enabled, when memory goes low so that swap is triggered, segfault and memory corruption will occur in random user space applications as follow, kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x00007fc08889ae0d _int_malloc (libc.so.6) #1 0x00007fc08889c2f3 malloc (libc.so.6) #2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) #3 0x0000560e6005e75c n/a (urxvt) #4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) #5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) #6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) #7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt) #8 0x0000560e6005cb55 ev_run (urxvt) #9 0x0000560e6003b9b9 main (urxvt) #10 0x00007fc08883af4a __libc_start_main (libc.so.6) #11 0x0000560e6003f9da _start (urxvt) After bisection, it was found the first bad commit is bd4c82c ("mm, THP, swap: delay splitting THP after swapped out"). The root cause is as follows: When the pages are written to swap device during swapping out in swap_writepage(), zswap (fontswap) is tried to compress the pages to improve performance. But zswap (frontswap) will treat THP as a normal page, so only the head page is saved. After swapping in, tail pages will not be restored to their original contents, causing memory corruption in the applications. This is fixed by refusing to save page in the frontswap store functions if the page is a THP. So that the THP will be swapped out to swap device. Another choice is to split THP if frontswap is enabled. But it is found that the frontswap enabling isn't flexible. For example, if CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if zswap itself isn't enabled. Frontswap has multiple backends, to make it easy for one backend to enable THP support, the THP checking is put in backend frontswap store functions instead of the general interfaces. Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> [put THP checking in backend] Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Shaohua Li <shli@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Shakeel Butt <shakeelb@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> [4.14] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was reported by Sergey Senozhatsky that if THP (Transparent Huge Page) and frontswap (via zswap) are both enabled, when memory goes low so that swap is triggered, segfault and memory corruption will occur in random user space applications as follow, kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x00007fc08889ae0d _int_malloc (libc.so.6) anholt#1 0x00007fc08889c2f3 malloc (libc.so.6) anholt#2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) anholt#3 0x0000560e6005e75c n/a (urxvt) anholt#4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) anholt#5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) anholt#6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) anholt#7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt) anholt#8 0x0000560e6005cb55 ev_run (urxvt) anholt#9 0x0000560e6003b9b9 main (urxvt) anholt#10 0x00007fc08883af4a __libc_start_main (libc.so.6) anholt#11 0x0000560e6003f9da _start (urxvt) After bisection, it was found the first bad commit is bd4c82c ("mm, THP, swap: delay splitting THP after swapped out"). The root cause is as follows: When the pages are written to swap device during swapping out in swap_writepage(), zswap (fontswap) is tried to compress the pages to improve performance. But zswap (frontswap) will treat THP as a normal page, so only the head page is saved. After swapping in, tail pages will not be restored to their original contents, causing memory corruption in the applications. This is fixed by refusing to save page in the frontswap store functions if the page is a THP. So that the THP will be swapped out to swap device. Another choice is to split THP if frontswap is enabled. But it is found that the frontswap enabling isn't flexible. For example, if CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if zswap itself isn't enabled. Frontswap has multiple backends, to make it easy for one backend to enable THP support, the THP checking is put in backend frontswap store functions instead of the general interfaces. Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> [put THP checking in backend] Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Shaohua Li <shli@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Shakeel Butt <shakeelb@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> [4.14] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 3d942ee upstream. If System V shmget/shmat operations are used to create a hugetlbfs backed mapping, it is possible to munmap part of the mapping and split the underlying vma such that it is not huge page aligned. This will untimately result in the following BUG: kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310! Oops: Exception in kernel mode, sig: 5 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009 REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002222 XER: 20040000 CFAR: c00000000036ee44 SOFTE: 1 NIP __unmap_hugepage_range+0xa4/0x760 LR __unmap_hugepage_range_final+0x28/0x50 Call Trace: 0x7115e4e00000 (unreliable) __unmap_hugepage_range_final+0x28/0x50 unmap_single_vma+0x11c/0x190 unmap_vmas+0x94/0x140 exit_mmap+0x9c/0x1d0 mmput+0xa8/0x1d0 do_exit+0x360/0xc80 do_group_exit+0x60/0x100 SyS_exit_group+0x24/0x30 system_call+0x58/0x6c ---[ end trace ee88f958a1c62605 ]--- This bug was introduced by commit 31383c6 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct"). A split function was added to vm_operations_struct to determine if a mapping can be split. This was mostly for device-dax and hugetlbfs mappings which have specific alignment constraints. Mappings initiated via shmget/shmat have their original vm_ops overwritten with shm_vm_ops. shm_vm_ops functions will call back to the original vm_ops if needed. Add such a split function to shm_vm_ops. Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com Fixes: 31383c6 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch series "kexec_file: Clean up purgatory load", v2. Following the discussion with Dave and AKASHI, here are the common code patches extracted from my recent patch set (Add kexec_file_load support to s390) [1]. The patches were extracted to allow upstream integration together with AKASHI's common code patches before the arch code gets adjusted to the new base. The reason for this series is to prepare common code for adding kexec_file_load to s390 as well as cleaning up the mis-use of the sh_offset field during purgatory load. In detail this series contains: Patch #1&2: Minor cleanups/fixes. Patch #3-9: Clean up the purgatory load/relocation code. Especially remove the mis-use of the purgatory_info->sechdrs->sh_offset field, currently holding a pointer into either kexec_purgatory (ro) or purgatory_buf (rw) depending on the section. With these patches the section address will be calculated verbosely and sh_offset will contain the offset of the section in the stripped purgatory binary (purgatory_buf). Patch #10: Allows architectures to set the purgatory load address. This patch is important for s390 as the kernel and purgatory have to be loaded to fixed addresses. In current code this is impossible as the purgatory load is opaque to the architecture. Patch #11: Moves x86 purgatories sha implementation to common lib/ directory to allow reuse in other architectures. This patch (of 11) When building the kernel with CONFIG_KEXEC_FILE enabled gcc prints a compile warning multiple times. In file included from <path>/linux/init/initramfs.c:526:0: <path>/include/linux/kexec.h:120:9: warning: `struct kimage' declared inside parameter list [enabled by default] unsigned long cmdline_len); ^ This is because the typedefs for kexec_file_load uses struct kimage before it is declared. Fix this by simply forward declaring struct kimage. Link: http://lkml.kernel.org/r/20180321112751.22196-2-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ Upstream commit a9d572c ] When io_bits is set, GCing encrypted block may hit the following hungtask. Since io_bits requires aligned block address, f2fs_submit_page_write may return -EAGAIN if new_blkaddr does not satisify io_bits alignment. As a result, the encrypted page will never be writtenback. This patch makes move_data_block aware the EAGAIN error and cancel the writeback. [ 246.751371] INFO: task kworker/u4:4:797 blocked for more than 90 seconds. [ 246.752423] Not tainted 4.15.0-rc4+ #11 [ 246.754176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.755336] kworker/u4:4 D25448 797 2 0x80000000 [ 246.755597] Workqueue: writeback wb_workfn (flush-7:0) [ 246.755616] Call Trace: [ 246.755695] ? __schedule+0x322/0xa90 [ 246.755761] ? blk_init_request_from_bio+0x120/0x120 [ 246.755773] ? pci_mmcfg_check_reserved+0xb0/0xb0 [ 246.755801] ? __radix_tree_create+0x19e/0x200 [ 246.755813] ? delete_node+0x136/0x370 [ 246.755838] schedule+0x43/0xc0 [ 246.755904] io_schedule+0x17/0x40 [ 246.755939] wait_on_page_bit_common+0x17b/0x240 [ 246.755950] ? wake_page_function+0xa0/0xa0 [ 246.755961] ? add_to_page_cache_lru+0x160/0x160 [ 246.755972] ? page_cache_tree_insert+0x170/0x170 [ 246.755983] ? __lru_cache_add+0x96/0xb0 [ 246.756086] __filemap_fdatawait_range+0x14f/0x1c0 [ 246.756097] ? wait_on_page_bit_common+0x240/0x240 [ 246.756120] ? __wake_up_locked_key_bookmark+0x20/0x20 [ 246.756167] ? wait_on_all_pages_writeback+0xc9/0x100 [ 246.756179] ? __remove_ino_entry+0x120/0x120 [ 246.756192] ? wait_woken+0x100/0x100 [ 246.756204] filemap_fdatawait_range+0x9/0x20 [ 246.756216] write_checkpoint+0x18a1/0x1f00 [ 246.756254] ? blk_get_request+0x10/0x10 [ 246.756265] ? cpumask_next_and+0x43/0x60 [ 246.756279] ? f2fs_sync_inode_meta+0x160/0x160 [ 246.756289] ? remove_element.isra.4+0xa0/0xa0 [ 246.756300] ? __put_compound_page+0x40/0x40 [ 246.756310] ? f2fs_sync_fs+0xec/0x1c0 [ 246.756320] ? f2fs_sync_fs+0x120/0x1c0 [ 246.756329] f2fs_sync_fs+0x120/0x1c0 [ 246.756357] ? trace_event_raw_event_f2fs__page+0x260/0x260 [ 246.756393] ? ata_build_rw_tf+0x173/0x410 [ 246.756397] f2fs_balance_fs_bg+0x198/0x390 [ 246.756405] ? drop_inmem_page+0x230/0x230 [ 246.756415] ? ahci_qc_prep+0x1bb/0x2e0 [ 246.756418] ? ahci_qc_issue+0x1df/0x290 [ 246.756422] ? __accumulate_pelt_segments+0x42/0xd0 [ 246.756426] ? f2fs_write_node_pages+0xd1/0x380 [ 246.756429] f2fs_write_node_pages+0xd1/0x380 [ 246.756437] ? sync_node_pages+0x8f0/0x8f0 [ 246.756440] ? update_curr+0x53/0x220 [ 246.756444] ? __accumulate_pelt_segments+0xa2/0xd0 [ 246.756448] ? __update_load_avg_se.isra.39+0x349/0x360 [ 246.756452] ? do_writepages+0x2a/0xa0 [ 246.756456] do_writepages+0x2a/0xa0 [ 246.756460] __writeback_single_inode+0x70/0x490 [ 246.756463] ? check_preempt_wakeup+0x199/0x310 [ 246.756467] writeback_sb_inodes+0x2a2/0x660 [ 246.756471] ? is_empty_dir_inode+0x40/0x40 [ 246.756474] ? __writeback_single_inode+0x490/0x490 [ 246.756477] ? string+0xbf/0xf0 [ 246.756480] ? down_read_trylock+0x35/0x60 [ 246.756484] __writeback_inodes_wb+0x9f/0xf0 [ 246.756488] wb_writeback+0x41d/0x4b0 [ 246.756492] ? writeback_inodes_wb.constprop.55+0x150/0x150 [ 246.756498] ? set_worker_desc+0xf7/0x130 [ 246.756502] ? current_is_workqueue_rescuer+0x60/0x60 [ 246.756511] ? _find_next_bit+0x2c/0xa0 [ 246.756514] ? wb_workfn+0x400/0x5d0 [ 246.756518] wb_workfn+0x400/0x5d0 [ 246.756521] ? finish_task_switch+0xdf/0x2a0 [ 246.756525] ? inode_wait_for_writeback+0x30/0x30 [ 246.756529] process_one_work+0x3a7/0x6f0 [ 246.756533] worker_thread+0x82/0x750 [ 246.756537] kthread+0x16f/0x1c0 [ 246.756541] ? trace_event_raw_event_workqueue_work+0x110/0x110 [ 246.756544] ? kthread_create_worker_on_cpu+0xb0/0xb0 [ 246.756548] ret_from_fork+0x1f/0x30 Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c0aa08 ] Scenario: 1. Port down and do fail over 2. Ap do rds_bind syscall PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6" #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518 #3 [ffff898e35f15b60] no_context at ffffffff8104854c #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95 [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282 RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88 RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00 RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0 R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm] #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6 PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap" #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm] #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma] #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds] #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds] #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670 PID: 45659 PID: 47039 rds_ib_laddr_check /* create id_priv with a null event_handler */ rdma_create_id rdma_bind_addr cma_acquire_dev /* add id_priv to cma_dev->id_list */ cma_attach_to_dev cma_ndev_work_handler /* event_hanlder is null */ id_priv->id.event_handler Signed-off-by: Guanglei Li <guanglei.li@oracle.com> Signed-off-by: Honglei Wang <honglei.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It can happen that a commit message refers to an invalid commit id, because the referenced hash changed following a rebase, or simply by mistake. Add a check in checkpatch.pl which checks that an hash referenced by a Fixes tag, or just cited in the commit message, is a valid commit id. $ scripts/checkpatch.pl <<'EOF' Subject: [PATCH] test commit Sample test commit to test checkpatch.pl Commit 1da177e ("Linux-2.6.12-rc2") really exists, commit 0bba044c4ce7 ("tree") is valid but not a commit, while commit b4cc0b1c0cca ("unknown") is invalid. Fixes: f0cacc14cade ("unknown") Fixes: 1da177e ("Linux-2.6.12-rc2") EOF WARNING: Unknown commit id '0bba044c4ce7', maybe rebased or not pulled? #8: commit 0bba044c4ce7 ("tree") is valid but not a commit, WARNING: Unknown commit id 'b4cc0b1c0cca', maybe rebased or not pulled? #9: while commit b4cc0b1c0cca ("unknown") is invalid. WARNING: Unknown commit id 'f0cacc14cade', maybe rebased or not pulled? #11: Fixes: f0cacc14cade ("unknown") total: 0 errors, 3 warnings, 4 lines checked Link: http://lkml.kernel.org/r/20190711001640.13398-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Made progress on this on 2016-02-15, but not quite there yet.
The text was updated successfully, but these errors were encountered: