vc4: dmabuf import from camera driver not supported #12

anholt · 2016-02-15T20:35:51Z

We need this for zero copy display of camera contents.

Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 #1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 #2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 #3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 #4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 #5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 #6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 #7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 #8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 #9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664 #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539 #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390 #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tong Zhang <ztong@vt.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable@vger.kernel.org # v4.4+ Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

MarcoBodega · 2016-03-15T10:44:49Z

+1 on everything related to camera

Signed-off-by: popcornmix <popcornmix@gmail.com> vchiq: create_pagelist copes with vmalloc memory Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: fix the shim message release Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: export additional symbols Signed-off-by: Daniel Stone <daniels@collabora.com> VCHIQ: Make service closure fully synchronous (drv) This is one half of a two-part patch, the other half of which is to the vchiq_lib user library. With these patches, calls to vchiq_close_service and vchiq_remove_service won't return until any associated callbacks have been delivered to the callback thread. VCHIQ: Add per-service tracing The new service option VCHIQ_SERVICE_OPTION_TRACE is a boolean that toggles tracing for the specified service. This commit also introduces vchi_service_set_option and the associated option VCHI_SERVICE_OPTION_TRACE. vchiq: Make the synchronous-CLOSE logic more tolerant vchiq: Move logging control into debugfs vchiq: Take care of a corner case tickled by VCSM Closing a connection that isn't fully open requires care, since one side does not know the other side's port number. Code was present to handle the case where a CLOSE is sent immediately after an OPEN, i.e. before the OPENACK has been received, but this was incorrectly being used when an OPEN from a client using port 0 was rejected. (In the observed failure, the host was attempting to use the VCSM service, which isn't present in the 'cutdown' firmware. The failure was intermittent because sometimes the keepalive service would grab port 0.) This case can be distinguished because the client's remoteport will still be VCHIQ_PORT_FREE, and the srvstate will be OPENING. Either condition is sufficient to differentiate it from the special case described above. vchiq: Avoid high load when blocked and unkillable vchiq: Include SIGSTOP and SIGCONT in list of signals not-masked by vchiq to allow gdb to work vchiq_arm: Complete support for SYNCHRONOUS mode vchiq: Remove inline from suspend/resume vchiq: Allocation does not need to be atomic vchiq: Fix wrong condition check The log level is checked from within the log call. Remove the check in the call. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> BCM270x: Add vchiq device to platform file and Device Tree Prepare to turn the vchiq module into a driver. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2708: vchiq: Add Device Tree support Turn vchiq into a driver and stop hardcoding resources. Use devm_* functions in probe path to simplify cleanup. A global variable is used to hold the register address. This is done to keep this patch as small as possible. Also make available on ARCH_BCM2835. Based on work by Lubomir Rintel. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> vchiq: Change logging level for inbound data vchiq_arm: Two cacheing fixes 1) Make fragment size vary with cache line size Without this patch, non-cache-line-aligned transfers may corrupt (or be corrupted by) adjacent data structures. Both ARM and VC need to be updated to enable this feature. This is ensured by having the loader apply a new DT parameter - cache-line-size. The existence of this parameter guarantees that the kernel is capable, and the parameter will only be modified from the safe default if the loader is capable. 2) Flush/invalidate vmalloc'd memory, and invalidate after reads vchiq: fix NULL pointer dereference when closing driver The following code run as root will cause a null pointer dereference oops: int fd = open("/dev/vc-cma", O_RDONLY); if (fd < 0) err(1, "open failed"); (void)close(fd); [ 1704.877721] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 1704.877725] pgd = b899c000 [ 1704.877736] [00000000] *pgd=37fab831, *pte=00000000, *ppte=00000000 [ 1704.877748] Internal error: Oops: 817 [#1] PREEMPT SMP ARM [ 1704.877765] Modules linked in: evdev i2c_bcm2708 uio_pdrv_genirq uio [ 1704.877774] CPU: 2 PID: 3656 Comm: stress-ng-fstat Not tainted 3.19.1-12-generic-bcm2709 #12-Ubuntu [ 1704.877777] Hardware name: BCM2709 [ 1704.877783] task: b8ab9b00 ti: b7e68000 task.ti: b7e68000 [ 1704.877798] PC is at __down_interruptible+0x50/0xec [ 1704.877806] LR is at down_interruptible+0x5c/0x68 [ 1704.877813] pc : [<80630ee8>] lr : [<800704b0>] psr: 60080093 sp : b7e69e50 ip : b7e69e88 fp : b7e69e84 [ 1704.877817] r10: b88123c8 r9 : 00000010 r8 : 00000001 [ 1704.877822] r7 : b8ab9b00 r6 : 7fffffff r5 : 80a1cc34 r4 : 80a1cc34 [ 1704.877826] r3 : b7e69e50 r2 : 00000000 r1 : 00000000 r0 : 80a1cc34 [ 1704.877833] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user [ 1704.877838] Control: 10c5387d Table: 3899c06a DAC: 00000015 [ 1704.877843] Process do-oops (pid: 3656, stack limit = 0xb7e68238) [ 1704.877848] Stack: (0xb7e69e50 to 0xb7e6a000) [ 1704.877856] 9e40: 80a1cc3c 00000000 00000010 b88123c8 [ 1704.877865] 9e60: b7e69e84 80a1cc34 fff9fee9 ffffffff b7e68000 00000009 b7e69ea4 b7e69e88 [ 1704.877874] 9e80: 800704b0 80630ea fff9fee9 60080013 80a1cc28 fff9fee9 b7e69edc b7e69ea8 [ 1704.877884] 9ea0: 8040f558 80070460 fff9fee9 ffffffff 00000000 00000000 00000009 80a1cb7c [ 1704.877893] 9ec0: 00000000 80a1cb7c 00000000 00000010 b7e69ef4 b7e69ee0 803e1ba4 8040f514 [ 1704.877902] 9ee0: 00000e48 80a1cb7c b7e69f14 b7e69ef8 803e1c9c 803e1b74 b88123c0 b92acb18 [ 1704.877911] 9f00: b8812790 b8d815d8 b7e69f24 b7e69f18 803e2250 803e1bc8 b7e69f5c b7e69f28 [ 1704.877921] 9f20: 80167bac 803e222c 00000000 00000000 b7e69f54 b8ab9ffc 00000000 8098c794 [ 1704.877930] 9f40: b8ab9b00 8000efc4 b7e68000 00000000 b7e69f6c b7e69f60 80167d6c 80167b28 [ 1704.877939] 9f60: b7e69f8c b7e69f70 80047d38 80167d60 b7e68000 b7e68010 8000efc4 b7e69fb0 [ 1704.877949] 9f80: b7e69fac b7e69f90 80012820 80047c84 01155490 011549a8 00000001 00000006 [ 1704.877957] 9fa0: 00000000 b7e69fb0 8000ee5c 80012790 00000000 353d8c0f 7efc4308 00000000 [ 1704.877966] 9fc0: 01155490 011549a8 00000001 00000006 00000000 00000000 76cf3ba0 00000003 [ 1704.877975] 9fe0: 00000000 7efc42e4 0002272f 76e2ed66 60080030 00000003 00000000 00000000 [ 1704.877998] [<80630ee8>] (__down_interruptible) from [<800704b0>] (down_interruptible+0x5c/0x68) [ 1704.878015] [<800704b0>] (down_interruptible) from [<8040f558>] (vchiu_queue_push+0x50/0xd8) [ 1704.878032] [<8040f558>] (vchiu_queue_push) from [<803e1ba4>] (send_worker_msg+0x3c/0x54) [ 1704.878045] [<803e1ba4>] (send_worker_msg) from [<803e1c9c>] (vc_cma_set_reserve+0xe0/0x1c4) [ 1704.878057] [<803e1c9c>] (vc_cma_set_reserve) from [<803e2250>] (vc_cma_release+0x30/0x38) [ 1704.878069] [<803e2250>] (vc_cma_release) from [<80167bac>] (__fput+0x90/0x1e0) [ 1704.878082] [<80167bac>] (__fput) from [<80167d6c>] (____fput+0x18/0x1c) [ 1704.878094] [<80167d6c>] (____fput) from [<80047d38>] (task_work_run+0xc0/0xf8) [ 1704.878109] [<80047d38>] (task_work_run) from [<80012820>] (do_work_pending+0x9c/0xc4) [ 1704.878123] [<80012820>] (do_work_pending) from [<8000ee5c>] (work_pending+0xc/0x20) [ 1704.878133] Code: e50b1034 e3a01000 e50b2030 e580300c (e5823000) ..the fix is to ensure that we have actually initialized the queue before we attempt to push any items onto it. This occurs if we do an open() followed by a close() without any activity in between. Signed-off-by: Colin Ian King <colin.king@canonical.com> vchiq_arm: Sort out the vmalloc case See: raspberrypi#1055 vchiq: hack: Add include depecated dma include file

commit ec183d2 upstream. Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 #1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 #2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 #3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 #4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 #5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 #6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 #7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 #8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 #9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664 #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539 #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390 #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tong Zhang <ztong@vt.edu> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: popcornmix <popcornmix@gmail.com> vchiq: create_pagelist copes with vmalloc memory Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: fix the shim message release Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: export additional symbols Signed-off-by: Daniel Stone <daniels@collabora.com> VCHIQ: Make service closure fully synchronous (drv) This is one half of a two-part patch, the other half of which is to the vchiq_lib user library. With these patches, calls to vchiq_close_service and vchiq_remove_service won't return until any associated callbacks have been delivered to the callback thread. VCHIQ: Add per-service tracing The new service option VCHIQ_SERVICE_OPTION_TRACE is a boolean that toggles tracing for the specified service. This commit also introduces vchi_service_set_option and the associated option VCHI_SERVICE_OPTION_TRACE. vchiq: Make the synchronous-CLOSE logic more tolerant vchiq: Move logging control into debugfs vchiq: Take care of a corner case tickled by VCSM Closing a connection that isn't fully open requires care, since one side does not know the other side's port number. Code was present to handle the case where a CLOSE is sent immediately after an OPEN, i.e. before the OPENACK has been received, but this was incorrectly being used when an OPEN from a client using port 0 was rejected. (In the observed failure, the host was attempting to use the VCSM service, which isn't present in the 'cutdown' firmware. The failure was intermittent because sometimes the keepalive service would grab port 0.) This case can be distinguished because the client's remoteport will still be VCHIQ_PORT_FREE, and the srvstate will be OPENING. Either condition is sufficient to differentiate it from the special case described above. vchiq: Avoid high load when blocked and unkillable vchiq: Include SIGSTOP and SIGCONT in list of signals not-masked by vchiq to allow gdb to work vchiq_arm: Complete support for SYNCHRONOUS mode vchiq: Remove inline from suspend/resume vchiq: Allocation does not need to be atomic vchiq: Fix wrong condition check The log level is checked from within the log call. Remove the check in the call. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> BCM270x: Add vchiq device to platform file and Device Tree Prepare to turn the vchiq module into a driver. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2708: vchiq: Add Device Tree support Turn vchiq into a driver and stop hardcoding resources. Use devm_* functions in probe path to simplify cleanup. A global variable is used to hold the register address. This is done to keep this patch as small as possible. Also make available on ARCH_BCM2835. Based on work by Lubomir Rintel. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> vchiq: Change logging level for inbound data vchiq_arm: Two cacheing fixes 1) Make fragment size vary with cache line size Without this patch, non-cache-line-aligned transfers may corrupt (or be corrupted by) adjacent data structures. Both ARM and VC need to be updated to enable this feature. This is ensured by having the loader apply a new DT parameter - cache-line-size. The existence of this parameter guarantees that the kernel is capable, and the parameter will only be modified from the safe default if the loader is capable. 2) Flush/invalidate vmalloc'd memory, and invalidate after reads vchiq: fix NULL pointer dereference when closing driver The following code run as root will cause a null pointer dereference oops: int fd = open("/dev/vc-cma", O_RDONLY); if (fd < 0) err(1, "open failed"); (void)close(fd); [ 1704.877721] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 1704.877725] pgd = b899c000 [ 1704.877736] [00000000] *pgd=37fab831, *pte=00000000, *ppte=00000000 [ 1704.877748] Internal error: Oops: 817 [#1] PREEMPT SMP ARM [ 1704.877765] Modules linked in: evdev i2c_bcm2708 uio_pdrv_genirq uio [ 1704.877774] CPU: 2 PID: 3656 Comm: stress-ng-fstat Not tainted 3.19.1-12-generic-bcm2709 #12-Ubuntu [ 1704.877777] Hardware name: BCM2709 [ 1704.877783] task: b8ab9b00 ti: b7e68000 task.ti: b7e68000 [ 1704.877798] PC is at __down_interruptible+0x50/0xec [ 1704.877806] LR is at down_interruptible+0x5c/0x68 [ 1704.877813] pc : [<80630ee8>] lr : [<800704b0>] psr: 60080093 sp : b7e69e50 ip : b7e69e88 fp : b7e69e84 [ 1704.877817] r10: b88123c8 r9 : 00000010 r8 : 00000001 [ 1704.877822] r7 : b8ab9b00 r6 : 7fffffff r5 : 80a1cc34 r4 : 80a1cc34 [ 1704.877826] r3 : b7e69e50 r2 : 00000000 r1 : 00000000 r0 : 80a1cc34 [ 1704.877833] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user [ 1704.877838] Control: 10c5387d Table: 3899c06a DAC: 00000015 [ 1704.877843] Process do-oops (pid: 3656, stack limit = 0xb7e68238) [ 1704.877848] Stack: (0xb7e69e50 to 0xb7e6a000) [ 1704.877856] 9e40: 80a1cc3c 00000000 00000010 b88123c8 [ 1704.877865] 9e60: b7e69e84 80a1cc34 fff9fee9 ffffffff b7e68000 00000009 b7e69ea4 b7e69e88 [ 1704.877874] 9e80: 800704b0 80630ea fff9fee9 60080013 80a1cc28 fff9fee9 b7e69edc b7e69ea8 [ 1704.877884] 9ea0: 8040f558 80070460 fff9fee9 ffffffff 00000000 00000000 00000009 80a1cb7c [ 1704.877893] 9ec0: 00000000 80a1cb7c 00000000 00000010 b7e69ef4 b7e69ee0 803e1ba4 8040f514 [ 1704.877902] 9ee0: 00000e48 80a1cb7c b7e69f14 b7e69ef8 803e1c9c 803e1b74 b88123c0 b92acb18 [ 1704.877911] 9f00: b8812790 b8d815d8 b7e69f24 b7e69f18 803e2250 803e1bc8 b7e69f5c b7e69f28 [ 1704.877921] 9f20: 80167bac 803e222c 00000000 00000000 b7e69f54 b8ab9ffc 00000000 8098c794 [ 1704.877930] 9f40: b8ab9b00 8000efc4 b7e68000 00000000 b7e69f6c b7e69f60 80167d6c 80167b28 [ 1704.877939] 9f60: b7e69f8c b7e69f70 80047d38 80167d60 b7e68000 b7e68010 8000efc4 b7e69fb0 [ 1704.877949] 9f80: b7e69fac b7e69f90 80012820 80047c84 01155490 011549a8 00000001 00000006 [ 1704.877957] 9fa0: 00000000 b7e69fb0 8000ee5c 80012790 00000000 353d8c0f 7efc4308 00000000 [ 1704.877966] 9fc0: 01155490 011549a8 00000001 00000006 00000000 00000000 76cf3ba0 00000003 [ 1704.877975] 9fe0: 00000000 7efc42e4 0002272f 76e2ed66 60080030 00000003 00000000 00000000 [ 1704.877998] [<80630ee8>] (__down_interruptible) from [<800704b0>] (down_interruptible+0x5c/0x68) [ 1704.878015] [<800704b0>] (down_interruptible) from [<8040f558>] (vchiu_queue_push+0x50/0xd8) [ 1704.878032] [<8040f558>] (vchiu_queue_push) from [<803e1ba4>] (send_worker_msg+0x3c/0x54) [ 1704.878045] [<803e1ba4>] (send_worker_msg) from [<803e1c9c>] (vc_cma_set_reserve+0xe0/0x1c4) [ 1704.878057] [<803e1c9c>] (vc_cma_set_reserve) from [<803e2250>] (vc_cma_release+0x30/0x38) [ 1704.878069] [<803e2250>] (vc_cma_release) from [<80167bac>] (__fput+0x90/0x1e0) [ 1704.878082] [<80167bac>] (__fput) from [<80167d6c>] (____fput+0x18/0x1c) [ 1704.878094] [<80167d6c>] (____fput) from [<80047d38>] (task_work_run+0xc0/0xf8) [ 1704.878109] [<80047d38>] (task_work_run) from [<80012820>] (do_work_pending+0x9c/0xc4) [ 1704.878123] [<80012820>] (do_work_pending) from [<8000ee5c>] (work_pending+0xc/0x20) [ 1704.878133] Code: e50b1034 e3a01000 e50b2030 e580300c (e5823000) ..the fix is to ensure that we have actually initialized the queue before we attempt to push any items onto it. This occurs if we do an open() followed by a close() without any activity in between. Signed-off-by: Colin Ian King <colin.king@canonical.com> vchiq_arm: Sort out the vmalloc case See: raspberrypi#1055 vchiq: hack: Add include depecated dma include file

Signed-off-by: popcornmix <popcornmix@gmail.com> vchiq: create_pagelist copes with vmalloc memory Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: fix the shim message release Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: export additional symbols Signed-off-by: Daniel Stone <daniels@collabora.com> VCHIQ: Make service closure fully synchronous (drv) This is one half of a two-part patch, the other half of which is to the vchiq_lib user library. With these patches, calls to vchiq_close_service and vchiq_remove_service won't return until any associated callbacks have been delivered to the callback thread. VCHIQ: Add per-service tracing The new service option VCHIQ_SERVICE_OPTION_TRACE is a boolean that toggles tracing for the specified service. This commit also introduces vchi_service_set_option and the associated option VCHI_SERVICE_OPTION_TRACE. vchiq: Make the synchronous-CLOSE logic more tolerant vchiq: Move logging control into debugfs vchiq: Take care of a corner case tickled by VCSM Closing a connection that isn't fully open requires care, since one side does not know the other side's port number. Code was present to handle the case where a CLOSE is sent immediately after an OPEN, i.e. before the OPENACK has been received, but this was incorrectly being used when an OPEN from a client using port 0 was rejected. (In the observed failure, the host was attempting to use the VCSM service, which isn't present in the 'cutdown' firmware. The failure was intermittent because sometimes the keepalive service would grab port 0.) This case can be distinguished because the client's remoteport will still be VCHIQ_PORT_FREE, and the srvstate will be OPENING. Either condition is sufficient to differentiate it from the special case described above. vchiq: Avoid high load when blocked and unkillable vchiq: Include SIGSTOP and SIGCONT in list of signals not-masked by vchiq to allow gdb to work vchiq_arm: Complete support for SYNCHRONOUS mode vchiq: Remove inline from suspend/resume vchiq: Allocation does not need to be atomic vchiq: Fix wrong condition check The log level is checked from within the log call. Remove the check in the call. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> BCM270x: Add vchiq device to platform file and Device Tree Prepare to turn the vchiq module into a driver. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2708: vchiq: Add Device Tree support Turn vchiq into a driver and stop hardcoding resources. Use devm_* functions in probe path to simplify cleanup. A global variable is used to hold the register address. This is done to keep this patch as small as possible. Also make available on ARCH_BCM2835. Based on work by Lubomir Rintel. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> vchiq: Change logging level for inbound data vchiq_arm: Two cacheing fixes 1) Make fragment size vary with cache line size Without this patch, non-cache-line-aligned transfers may corrupt (or be corrupted by) adjacent data structures. Both ARM and VC need to be updated to enable this feature. This is ensured by having the loader apply a new DT parameter - cache-line-size. The existence of this parameter guarantees that the kernel is capable, and the parameter will only be modified from the safe default if the loader is capable. 2) Flush/invalidate vmalloc'd memory, and invalidate after reads vchiq: fix NULL pointer dereference when closing driver The following code run as root will cause a null pointer dereference oops: int fd = open("/dev/vc-cma", O_RDONLY); if (fd < 0) err(1, "open failed"); (void)close(fd); [ 1704.877721] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 1704.877725] pgd = b899c000 [ 1704.877736] [00000000] *pgd=37fab831, *pte=00000000, *ppte=00000000 [ 1704.877748] Internal error: Oops: 817 [#1] PREEMPT SMP ARM [ 1704.877765] Modules linked in: evdev i2c_bcm2708 uio_pdrv_genirq uio [ 1704.877774] CPU: 2 PID: 3656 Comm: stress-ng-fstat Not tainted 3.19.1-12-generic-bcm2709 #12-Ubuntu [ 1704.877777] Hardware name: BCM2709 [ 1704.877783] task: b8ab9b00 ti: b7e68000 task.ti: b7e68000 [ 1704.877798] PC is at __down_interruptible+0x50/0xec [ 1704.877806] LR is at down_interruptible+0x5c/0x68 [ 1704.877813] pc : [<80630ee8>] lr : [<800704b0>] psr: 60080093 sp : b7e69e50 ip : b7e69e88 fp : b7e69e84 [ 1704.877817] r10: b88123c8 r9 : 00000010 r8 : 00000001 [ 1704.877822] r7 : b8ab9b00 r6 : 7fffffff r5 : 80a1cc34 r4 : 80a1cc34 [ 1704.877826] r3 : b7e69e50 r2 : 00000000 r1 : 00000000 r0 : 80a1cc34 [ 1704.877833] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user [ 1704.877838] Control: 10c5387d Table: 3899c06a DAC: 00000015 [ 1704.877843] Process do-oops (pid: 3656, stack limit = 0xb7e68238) [ 1704.877848] Stack: (0xb7e69e50 to 0xb7e6a000) [ 1704.877856] 9e40: 80a1cc3c 00000000 00000010 b88123c8 [ 1704.877865] 9e60: b7e69e84 80a1cc34 fff9fee9 ffffffff b7e68000 00000009 b7e69ea4 b7e69e88 [ 1704.877874] 9e80: 800704b0 80630ea fff9fee9 60080013 80a1cc28 fff9fee9 b7e69edc b7e69ea8 [ 1704.877884] 9ea0: 8040f558 80070460 fff9fee9 ffffffff 00000000 00000000 00000009 80a1cb7c [ 1704.877893] 9ec0: 00000000 80a1cb7c 00000000 00000010 b7e69ef4 b7e69ee0 803e1ba4 8040f514 [ 1704.877902] 9ee0: 00000e48 80a1cb7c b7e69f14 b7e69ef8 803e1c9c 803e1b74 b88123c0 b92acb18 [ 1704.877911] 9f00: b8812790 b8d815d8 b7e69f24 b7e69f18 803e2250 803e1bc8 b7e69f5c b7e69f28 [ 1704.877921] 9f20: 80167bac 803e222c 00000000 00000000 b7e69f54 b8ab9ffc 00000000 8098c794 [ 1704.877930] 9f40: b8ab9b00 8000efc4 b7e68000 00000000 b7e69f6c b7e69f60 80167d6c 80167b28 [ 1704.877939] 9f60: b7e69f8c b7e69f70 80047d38 80167d60 b7e68000 b7e68010 8000efc4 b7e69fb0 [ 1704.877949] 9f80: b7e69fac b7e69f90 80012820 80047c84 01155490 011549a8 00000001 00000006 [ 1704.877957] 9fa0: 00000000 b7e69fb0 8000ee5c 80012790 00000000 353d8c0f 7efc4308 00000000 [ 1704.877966] 9fc0: 01155490 011549a8 00000001 00000006 00000000 00000000 76cf3ba0 00000003 [ 1704.877975] 9fe0: 00000000 7efc42e4 0002272f 76e2ed66 60080030 00000003 00000000 00000000 [ 1704.877998] [<80630ee8>] (__down_interruptible) from [<800704b0>] (down_interruptible+0x5c/0x68) [ 1704.878015] [<800704b0>] (down_interruptible) from [<8040f558>] (vchiu_queue_push+0x50/0xd8) [ 1704.878032] [<8040f558>] (vchiu_queue_push) from [<803e1ba4>] (send_worker_msg+0x3c/0x54) [ 1704.878045] [<803e1ba4>] (send_worker_msg) from [<803e1c9c>] (vc_cma_set_reserve+0xe0/0x1c4) [ 1704.878057] [<803e1c9c>] (vc_cma_set_reserve) from [<803e2250>] (vc_cma_release+0x30/0x38) [ 1704.878069] [<803e2250>] (vc_cma_release) from [<80167bac>] (__fput+0x90/0x1e0) [ 1704.878082] [<80167bac>] (__fput) from [<80167d6c>] (____fput+0x18/0x1c) [ 1704.878094] [<80167d6c>] (____fput) from [<80047d38>] (task_work_run+0xc0/0xf8) [ 1704.878109] [<80047d38>] (task_work_run) from [<80012820>] (do_work_pending+0x9c/0xc4) [ 1704.878123] [<80012820>] (do_work_pending) from [<8000ee5c>] (work_pending+0xc/0x20) [ 1704.878133] Code: e50b1034 e3a01000 e50b2030 e580300c (e5823000) ..the fix is to ensure that we have actually initialized the queue before we attempt to push any items onto it. This occurs if we do an open() followed by a close() without any activity in between. Signed-off-by: Colin Ian King <colin.king@canonical.com> vchiq_arm: Sort out the vmalloc case See: raspberrypi#1055 vchiq: hack: Add include depecated dma include file vchiq_arm: Tweak the logging output Signed-off-by: Phil Elwell <phil@raspberrypi.org> vchiq_arm: Access the dequeue_pending flag locked Reading through this code looking for another problem (now found in userland) the use of dequeue_pending outside a lock didn't seem safe. Signed-off-by: Phil Elwell <phil@raspberrypi.org> vchiq_arm: Service callbacks must not fail Service callbacks are not allowed to return an error. The internal callback that delivers events and messages to user tasks does not enqueue them if the service is closing, but this is not an error and should not be reported as such. Signed-off-by: Phil Elwell <phil@raspberrypi.org>

6by9 · 2016-08-02T08:02:46Z

Hi Eric.
What do you need from the camera subsystem to support this? Or is this a reminder to yourself that you need to support dmabuf importing?
I presume buffers need to be either allocated from VCSM and the reloc heap, or CMA and passed in to V4L2/MMAL, but whichever way you'll want them contiguous.

I'm hopefully going to be looking very soon at what I can do with VCSM and dmabuf (import and export hopefully), so please shout (or we can take the conversation offline) if there is anything in particular that would help you and I'll see if it's possible.

Are we going to have fun with the ARM and VPU fighting over power and clock controls? Ought to coordinate that with PhilE or Dom. There was code for BCM28145 that did have the VPU talking to a kernel service to deal with clock and power requests. Would that help and make the ARM in charge of everything? There's probably easy access to the VPU side code, but probably not the kernel driver.

Dave

anholt · 2016-08-02T20:06:05Z

I think it should be fine for the camera subsystem to be generating dmabuf buffers. My expectation is that I just need to have vc4 walk the page list, make sure they're contiguous, map the paddr to kernel virutal, and set up the drm_gem_cma_object's paddr/vaddr appropriately. I also need to then have vc4 participate in fencing, which I haven't looked into too much yet.

I would love to have the VPU be asking the linux kernel to do power and clock management (we only have a clock driver so far, but it would be great to control power, too). I suspect that will be a challenging architecture change to get Dom/Phil on board with, though.

6by9 · 2016-08-02T20:40:46Z

So currently the V4L2 driver produces non-contiguous buffers as VCHI is doing a copy from VC reloc heap (contiguous) to ARM buffers.
There is a zero copy option that will expose the reloc heap handle from which you can get the phys addr fairly easily. There are a couple of examples kicking around of userspace using MMAL and zero copy, but not mainstream.
Fitting that in with V4L2 is not trivial, however I'm working on the reverse of getting V4L2 to use the kernel contiguous allocator (the magic of videobuf2-dma-contig.h), and then get VCSM to wrap that into a reloc handle.
Fencing is going to be a nightmare, I suspect. Generally the approach taken was that buffer ownership went with buffer passing. It may be possible to make the kernel side of the V4L2 deal with fences as the buffers come back from the VPU, but I don't know for sure. First I'd have to investigate how fences worked with V4L2 at all!

A way down the line, but how were you anticipating getting the 3D side to consume YUV data, as AFAIK it will only accept RGB? It was one of the very troublesome areas with the GLES driver, and that was on on the VPU. Deal with the HVS first though, as that will quite happily swallow it.

One of Phil's team from back in the day wrote the sysman and clockman remote drivers, so if I can get it working then he and Dom can probably be persuaded. Making it backward compatible so that a new firmware with old kernel (missing the appropriate receiver) works may prove trickier.
Do you have access to the MPS Android kernel tree? Pi Towers have the VPU side of the driver, but not the kernel side. Getting hold of that may save some time if we want to go down this route.

Thanks for the feedback. My playtime is limited by young family, but knowing what is useful helps me prioritise. I will apologise in advance as I'm likely to overcommit myself for what I can actually get done.

anholt · 2016-08-02T23:59:23Z

The GL driver already has to do a bunch of shader customizations based on texture format, so inserting sample-3-times-and-colorspace-convert isn't going to be that bad. The math for the specific tiled format might be a pain, though.

I haven't ever seen that Android tree and don't know where I'd find it.

anholt · 2016-08-17T16:03:44Z

@6by9 Have you had any chance to work on the V4L2 stuff? I'm trying to work on the vc4 side of things using some terrible hacks to vchiq so the camera can be used with vb2-dc, but zero-copy should be better for everybody.

anholt · 2016-08-17T17:19:56Z

So, it appears that vc4 is ready to take dmabuf imports, even without new synchronization work. With crazy hacks to the camera module and some demo code, with the command line:

./dmabuf-sharing -M vc4 -i /dev/video0 -f BGR4 -F XB24 -b 2

I've got fullscreen camera output at the console.

Signed-off-by: popcornmix <popcornmix@gmail.com> vchiq: create_pagelist copes with vmalloc memory Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: fix the shim message release Signed-off-by: Daniel Stone <daniels@collabora.com> vchiq: export additional symbols Signed-off-by: Daniel Stone <daniels@collabora.com> VCHIQ: Make service closure fully synchronous (drv) This is one half of a two-part patch, the other half of which is to the vchiq_lib user library. With these patches, calls to vchiq_close_service and vchiq_remove_service won't return until any associated callbacks have been delivered to the callback thread. VCHIQ: Add per-service tracing The new service option VCHIQ_SERVICE_OPTION_TRACE is a boolean that toggles tracing for the specified service. This commit also introduces vchi_service_set_option and the associated option VCHI_SERVICE_OPTION_TRACE. vchiq: Make the synchronous-CLOSE logic more tolerant vchiq: Move logging control into debugfs vchiq: Take care of a corner case tickled by VCSM Closing a connection that isn't fully open requires care, since one side does not know the other side's port number. Code was present to handle the case where a CLOSE is sent immediately after an OPEN, i.e. before the OPENACK has been received, but this was incorrectly being used when an OPEN from a client using port 0 was rejected. (In the observed failure, the host was attempting to use the VCSM service, which isn't present in the 'cutdown' firmware. The failure was intermittent because sometimes the keepalive service would grab port 0.) This case can be distinguished because the client's remoteport will still be VCHIQ_PORT_FREE, and the srvstate will be OPENING. Either condition is sufficient to differentiate it from the special case described above. vchiq: Avoid high load when blocked and unkillable vchiq: Include SIGSTOP and SIGCONT in list of signals not-masked by vchiq to allow gdb to work vchiq_arm: Complete support for SYNCHRONOUS mode vchiq: Remove inline from suspend/resume vchiq: Allocation does not need to be atomic vchiq: Fix wrong condition check The log level is checked from within the log call. Remove the check in the call. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> BCM270x: Add vchiq device to platform file and Device Tree Prepare to turn the vchiq module into a driver. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2708: vchiq: Add Device Tree support Turn vchiq into a driver and stop hardcoding resources. Use devm_* functions in probe path to simplify cleanup. A global variable is used to hold the register address. This is done to keep this patch as small as possible. Also make available on ARCH_BCM2835. Based on work by Lubomir Rintel. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> vchiq: Change logging level for inbound data vchiq_arm: Two cacheing fixes 1) Make fragment size vary with cache line size Without this patch, non-cache-line-aligned transfers may corrupt (or be corrupted by) adjacent data structures. Both ARM and VC need to be updated to enable this feature. This is ensured by having the loader apply a new DT parameter - cache-line-size. The existence of this parameter guarantees that the kernel is capable, and the parameter will only be modified from the safe default if the loader is capable. 2) Flush/invalidate vmalloc'd memory, and invalidate after reads vchiq: fix NULL pointer dereference when closing driver The following code run as root will cause a null pointer dereference oops: int fd = open("/dev/vc-cma", O_RDONLY); if (fd < 0) err(1, "open failed"); (void)close(fd); [ 1704.877721] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 1704.877725] pgd = b899c000 [ 1704.877736] [00000000] *pgd=37fab831, *pte=00000000, *ppte=00000000 [ 1704.877748] Internal error: Oops: 817 [#1] PREEMPT SMP ARM [ 1704.877765] Modules linked in: evdev i2c_bcm2708 uio_pdrv_genirq uio [ 1704.877774] CPU: 2 PID: 3656 Comm: stress-ng-fstat Not tainted 3.19.1-12-generic-bcm2709 #12-Ubuntu [ 1704.877777] Hardware name: BCM2709 [ 1704.877783] task: b8ab9b00 ti: b7e68000 task.ti: b7e68000 [ 1704.877798] PC is at __down_interruptible+0x50/0xec [ 1704.877806] LR is at down_interruptible+0x5c/0x68 [ 1704.877813] pc : [<80630ee8>] lr : [<800704b0>] psr: 60080093 sp : b7e69e50 ip : b7e69e88 fp : b7e69e84 [ 1704.877817] r10: b88123c8 r9 : 00000010 r8 : 00000001 [ 1704.877822] r7 : b8ab9b00 r6 : 7fffffff r5 : 80a1cc34 r4 : 80a1cc34 [ 1704.877826] r3 : b7e69e50 r2 : 00000000 r1 : 00000000 r0 : 80a1cc34 [ 1704.877833] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user [ 1704.877838] Control: 10c5387d Table: 3899c06a DAC: 00000015 [ 1704.877843] Process do-oops (pid: 3656, stack limit = 0xb7e68238) [ 1704.877848] Stack: (0xb7e69e50 to 0xb7e6a000) [ 1704.877856] 9e40: 80a1cc3c 00000000 00000010 b88123c8 [ 1704.877865] 9e60: b7e69e84 80a1cc34 fff9fee9 ffffffff b7e68000 00000009 b7e69ea4 b7e69e88 [ 1704.877874] 9e80: 800704b0 80630ea fff9fee9 60080013 80a1cc28 fff9fee9 b7e69edc b7e69ea8 [ 1704.877884] 9ea0: 8040f558 80070460 fff9fee9 ffffffff 00000000 00000000 00000009 80a1cb7c [ 1704.877893] 9ec0: 00000000 80a1cb7c 00000000 00000010 b7e69ef4 b7e69ee0 803e1ba4 8040f514 [ 1704.877902] 9ee0: 00000e48 80a1cb7c b7e69f14 b7e69ef8 803e1c9c 803e1b74 b88123c0 b92acb18 [ 1704.877911] 9f00: b8812790 b8d815d8 b7e69f24 b7e69f18 803e2250 803e1bc8 b7e69f5c b7e69f28 [ 1704.877921] 9f20: 80167bac 803e222c 00000000 00000000 b7e69f54 b8ab9ffc 00000000 8098c794 [ 1704.877930] 9f40: b8ab9b00 8000efc4 b7e68000 00000000 b7e69f6c b7e69f60 80167d6c 80167b28 [ 1704.877939] 9f60: b7e69f8c b7e69f70 80047d38 80167d60 b7e68000 b7e68010 8000efc4 b7e69fb0 [ 1704.877949] 9f80: b7e69fac b7e69f90 80012820 80047c84 01155490 011549a8 00000001 00000006 [ 1704.877957] 9fa0: 00000000 b7e69fb0 8000ee5c 80012790 00000000 353d8c0f 7efc4308 00000000 [ 1704.877966] 9fc0: 01155490 011549a8 00000001 00000006 00000000 00000000 76cf3ba0 00000003 [ 1704.877975] 9fe0: 00000000 7efc42e4 0002272f 76e2ed66 60080030 00000003 00000000 00000000 [ 1704.877998] [<80630ee8>] (__down_interruptible) from [<800704b0>] (down_interruptible+0x5c/0x68) [ 1704.878015] [<800704b0>] (down_interruptible) from [<8040f558>] (vchiu_queue_push+0x50/0xd8) [ 1704.878032] [<8040f558>] (vchiu_queue_push) from [<803e1ba4>] (send_worker_msg+0x3c/0x54) [ 1704.878045] [<803e1ba4>] (send_worker_msg) from [<803e1c9c>] (vc_cma_set_reserve+0xe0/0x1c4) [ 1704.878057] [<803e1c9c>] (vc_cma_set_reserve) from [<803e2250>] (vc_cma_release+0x30/0x38) [ 1704.878069] [<803e2250>] (vc_cma_release) from [<80167bac>] (__fput+0x90/0x1e0) [ 1704.878082] [<80167bac>] (__fput) from [<80167d6c>] (____fput+0x18/0x1c) [ 1704.878094] [<80167d6c>] (____fput) from [<80047d38>] (task_work_run+0xc0/0xf8) [ 1704.878109] [<80047d38>] (task_work_run) from [<80012820>] (do_work_pending+0x9c/0xc4) [ 1704.878123] [<80012820>] (do_work_pending) from [<8000ee5c>] (work_pending+0xc/0x20) [ 1704.878133] Code: e50b1034 e3a01000 e50b2030 e580300c (e5823000) ..the fix is to ensure that we have actually initialized the queue before we attempt to push any items onto it. This occurs if we do an open() followed by a close() without any activity in between. Signed-off-by: Colin Ian King <colin.king@canonical.com> vchiq_arm: Sort out the vmalloc case See: raspberrypi#1055 vchiq: hack: Add include depecated dma include file [gregkh] added dependancy on CONFIG_BROKEN to make things sane for now. Cc: Eric Anholt <eric@anholt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Function ib_create_qp() was failing to return an error when rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp() when trying to dereferece the qp pointer which was actually a negative errno. The crash: crash> log|grep BUG [ 136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 crash> bt PID: 3736 TASK: ffff8808543215c0 CPU: 2 COMMAND: "kworker/u64:2" #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0 #1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758 #2 [ffff88084d323480] crash_kexec at ffffffff8111682d #3 [ffff88084d3234b0] oops_end at ffffffff81032bd6 #4 [ffff88084d3234e0] no_context at ffffffff8106e431 #5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610 #6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4 #7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc #8 [ffff88084d323620] do_page_fault at ffffffff8106f057 #9 [ffff88084d323660] page_fault at ffffffff816e3148 [exception RIP: ib_create_qp+427] RIP: ffffffffa02554fb RSP: ffff88084d323718 RFLAGS: 00010246 RAX: 0000000000000004 RBX: fffffffffffffff4 RCX: 000000018020001f RDX: ffff880830997fc0 RSI: 0000000000000001 RDI: ffff88085f407200 RBP: ffff88084d323778 R8: 0000000000000001 R9: ffffea0020bae210 R10: ffffea0020bae218 R11: 0000000000000001 R12: ffff88084d3237c8 R13: 00000000fffffff4 R14: ffff880859fa5000 R15: ffff88082eb89800 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm] #11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma] #12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma] #13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma] #14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma] #15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm] #16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm] #17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm] #18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm] #19 [ffff88084d323cb0] process_one_work at ffffffff810a1483 #20 [ffff88084d323d90] worker_thread at ffffffff810a211d #21 [ffff88084d323ec0] kthread at ffffffff810a6c5c #22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf Fixes: 632bc3f ("IB/core, RDMA RW API: Do not exceed QP SGE send limit") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>

commit 4dfce57 upstream. There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] #14 [ffff8800a57bbe00] evict at ffffffff811e1b67 #15 [ffff8800a57bbe28] iput at ffffffff811e23a5 #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 #17 [ffff8800a57bbe88] dput at ffffffff811dd06c #18 [ffff8800a57bbea8] __fput at ffffffff811c823b #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt · 2017-01-27T21:56:11Z

Heads up: I just submitted the camera driver as of a15ba87 upstream to the staging tree. This was a request from the staging maintainer so that we have the consumer of VCHI in-tree whlie people are actively fixing up VCHI.

gohai · 2017-01-29T03:53:09Z

@anholt Exciting news!

I stumbled upon an unrelated issue with the v4l2 camera driver and GStreamer 1.10, where the sizeimage provided by TRY/S_FMT does not match what GStreamer was expecting. Here are the details, including a workaround for GStreamer that does the trick for me. Hoping that this will get merged into future versions of GStreamer, but would it maybe make sense to add to the TODO for staging? (I have no experience with v4l2 myself, unfortunately.)

6by9 · 2017-01-29T07:29:49Z

This is not really the place to discuss unrelated issues - raise a bug on raspberrypi/linux, but a quick explanation.
It's not a bug. The GPU has a restriction that the height must be a multiple of 16. Your height of 200 is not. The buffer size has to reflect the padded height of a frame to 640x208, otherwise you'll corrupt the memory after your image. 640*208 = 133120 which is the result you have seen.
I've not seen a mechanism within V4L2 to allow a driver to specify vertical padding requirements except by requesting a bigger buffer than is required. This doesn't work on formats such as V4L2_PIX_FMT_YUV420 where you want to specify padding between the planes, so the GPU has to do some further munging of the data on those formats.

gohai · 2017-01-29T18:39:55Z

@6by9 Thank you for your explanation and sorry for bringing this up here. I will pass on your info to the GStreamer developers, and I hope they'll handle this in future 1.10.x releases.

[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

commit ba80aa9 upstream. This patch closes a long standing race in configfs between the creation of a new symlink in create_link(), while the symlink target's config_item is being concurrently removed via configfs_rmdir(). This can happen because the symlink target's reference is obtained by config_item_get() in create_link() before the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep() during configfs_rmdir() shutdown is actually checked.. This originally manifested itself on ppc64 on v4.8.y under heavy load using ibmvscsi target ports with Novalink API: [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added [ 7879.893760] ------------[ cut here ]------------ [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs] [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G O 4.8.17-customv2.22 #12 [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000 [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870 [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700 Tainted: G O (4.8.17-customv2.22) [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222242 XER: 00000000 [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1 GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820 GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000 GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80 GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40 GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940 GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000 GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490 GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940 [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs] [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893842] Call Trace: [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460 [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490 [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170 [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390 [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec [ 7879.893856] Instruction dump: [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000 [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000 [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]--- To close this race, go ahead and obtain the symlink's target config_item reference only after the existing CONFIGFS_USET_DROPPING check succeeds. This way, if configfs_rmdir() wins create_link() will return -ENONET, and if create_link() wins configfs_rmdir() will return -EBUSY. Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The following lockdep splat has been noticed during LTP testing ====================================================== WARNING: possible circular locking dependency detected 4.13.0-rc3-next-20170807 #12 Not tainted ------------------------------------------------------ a.out/4771 is trying to acquire lock: (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: lock_acquire+0xc9/0x230 __might_fault+0x70/0xa0 _copy_to_user+0x23/0x70 filldir+0xa7/0x110 xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs] xfs_readdir+0x1fa/0x2c0 [xfs] xfs_file_readdir+0x30/0x40 [xfs] iterate_dir+0x17a/0x1a0 SyS_getdents+0xb0/0x160 entry_SYSCALL_64_fastpath+0x1f/0xbe -> #2 (&type->i_mutex_dir_key#3){++++++}: lock_acquire+0xc9/0x230 down_read+0x51/0xb0 lookup_slow+0xde/0x210 walk_component+0x160/0x250 link_path_walk+0x1a6/0x610 path_openat+0xe4/0xd50 do_filp_open+0x91/0x100 file_open_name+0xf5/0x130 filp_open+0x33/0x50 kernel_read_file_from_path+0x39/0x80 _request_firmware+0x39f/0x880 request_firmware_direct+0x37/0x50 request_microcode_fw+0x64/0xe0 reload_store+0xf7/0x180 dev_attr_store+0x18/0x30 sysfs_kf_write+0x44/0x60 kernfs_fop_write+0x113/0x1a0 __vfs_write+0x37/0x170 vfs_write+0xc7/0x1c0 SyS_write+0x58/0xc0 do_syscall_64+0x6c/0x1f0 return_from_SYSCALL_64+0x0/0x7a -> #1 (microcode_mutex){+.+.+.}: lock_acquire+0xc9/0x230 __mutex_lock+0x88/0x960 mutex_lock_nested+0x1b/0x20 microcode_init+0xbb/0x208 do_one_initcall+0x51/0x1a9 kernel_init_freeable+0x208/0x2a7 kernel_init+0xe/0x104 ret_from_fork+0x2a/0x40 -> #0 (cpu_hotplug_lock.rw_sem){++++++}: __lock_acquire+0x153c/0x1550 lock_acquire+0xc9/0x230 cpus_read_lock+0x4b/0x90 drain_all_stock.part.35+0x18/0x140 try_charge+0x3ab/0x6e0 mem_cgroup_try_charge+0x7f/0x2c0 shmem_getpage_gfp+0x25f/0x1050 shmem_fault+0x96/0x200 __do_fault+0x1e/0xa0 __handle_mm_fault+0x9c3/0xe00 handle_mm_fault+0x16e/0x380 __do_page_fault+0x24a/0x530 do_page_fault+0x30/0x80 page_fault+0x28/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&type->i_mutex_dir_key#3); lock(&mm->mmap_sem); lock(cpu_hotplug_lock.rw_sem); *** DEADLOCK *** 2 locks held by a.out/4771: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530 #1: (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0 The problem is very similar to the one fixed by commit a459eeb ("mm, page_alloc: do not depend on cpu hotplug locks inside the allocator"). We are taking hotplug locks while we can be sitting on top of basically arbitrary locks. This just calls for problems. We can get rid of {get,put}_online_cpus, fortunately. We do not have to be worried about races with memory hotplug because drain_local_stock, which is called from both the WQ draining and the memory hotplug contexts, is always operating on the local cpu stock with IRQs disabled. The only thing to be careful about is that the target memcg doesn't vanish while we are still in drain_all_stock so take a reference on it. Link: http://lkml.kernel.org/r/20170913090023.28322-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Artem Savkov <asavkov@redhat.com> Tested-by: Artem Savkov <asavkov@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

spin_lock/unlock of health->wq_lock should be IRQ safe. It was changed to spin_lock_irqsave since adding commit 0179720 ("net/mlx5: Introduce trigger_health_work function") which uses spin_lock from asynchronous event (IRQ) context. Thus, all spin_lock/unlock of health->wq_lock should have been moved to IRQ safe mode. However, one occurrence on new code using this lock missed that change, resulting in possible deadlock: kernel: Possible unsafe locking scenario: kernel: CPU0 kernel: ---- kernel: lock(&(&health->wq_lock)->rlock); kernel: <Interrupt> kernel: lock(&(&health->wq_lock)->rlock); kernel: #12 *** DEADLOCK *** Fixes: 2a0165a ("net/mlx5: Cancel delayed recovery work when unloading the driver") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

…ns[] When adding GP-1-28 port pin support it was forgotten to remove the CLKOUT pin from the list of pins that are not associated with a GPIO port in pinmux_pins[]. This results in a warning when reading the pinctrl files in sysfs as the CLKOUT pin is still added as a none GPIO pin. Fix this by removing the duplicated entry which is no longer needed. ~ # cat /sys/kernel/debug/pinctrl/e6060000.pin-controller/pinconf-pins [ 89.432081] ------------[ cut here ]------------ [ 89.436904] Pin 496 is not in bias info list [ 89.441252] WARNING: CPU: 1 PID: 456 at drivers/pinctrl/sh-pfc/core.c:408 sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.451002] CPU: 1 PID: 456 Comm: cat Not tainted 4.16.0-rc1-arm64-renesas-00048-gdfafc344a4f24dde anholt#12 [ 89.460394] Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT) [ 89.468910] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 89.473747] pc : sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.478495] lr : sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.483241] sp : ffff00000aff3ab0 [ 89.486587] x29: ffff00000aff3ab0 x28: ffff00000893c698 [ 89.491955] x27: ffff000008ad7d98 x26: 0000000000000000 [ 89.497323] x25: ffff8006fb3f5028 x24: ffff8006fb3f5018 [ 89.502690] x23: 0000000000000001 x22: 00000000000001f0 [ 89.508057] x21: ffff8006fb3f5018 x20: ffff000008bef000 [ 89.513423] x19: 0000000000000000 x18: ffffffffffffffff [ 89.518790] x17: 0000000000006c4a x16: ffff000008d67c98 [ 89.524157] x15: 0000000000000001 x14: ffff00000896ca98 [ 89.529524] x13: 00000000cce5f611 x12: ffff8006f8d3b5a8 [ 89.534891] x11: ffff00000981e000 x10: ffff000008befa08 [ 89.540258] x9 : ffff8006f9b987a0 x8 : ffff000008befa08 [ 89.545625] x7 : ffff000008137094 x6 : 0000000000000000 [ 89.550991] x5 : 0000000000000000 x4 : 0000000000000001 [ 89.556357] x3 : 0000000000000007 x2 : 0000000000000007 [ 89.561723] x1 : 1ff24f80f1818600 x0 : 0000000000000000 [ 89.567091] Call trace: [ 89.569561] sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.573960] r8a7795_pinmux_get_bias+0x30/0xc0 [ 89.578445] sh_pfc_pinconf_get+0x1e0/0x2d8 [ 89.582669] pin_config_get_for_pin+0x20/0x30 [ 89.587067] pinconf_generic_dump_one+0x180/0x1c8 [ 89.591815] pinconf_generic_dump_pins+0x84/0xd8 [ 89.596476] pinconf_pins_show+0xc8/0x130 [ 89.600528] seq_read+0xe4/0x510 [ 89.603789] full_proxy_read+0x60/0x90 [ 89.607576] __vfs_read+0x30/0x140 [ 89.611010] vfs_read+0x90/0x170 [ 89.614269] SyS_read+0x60/0xd8 [ 89.617443] __sys_trace_return+0x0/0x4 [ 89.621314] ---[ end trace 99c8d0d39c13e794 ]--- Fixes: 82d2de5 ("pinctrl: sh-pfc: r8a7795: Add GP-1-28 port pin support") Reviewed-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Scheduler debug stats include newlines that display out of alignment when prefixed by timestamps. For example, the dmesg utility: % echo t > /proc/sysrq-trigger % dmesg ... [ 83.124251] runnable tasks: S task PID tree-key switches prio wait-time sum-exec sum-sleep ----------------------------------------------------------------------------------------------------------- At the same time, some syslog utilities (like rsyslog by default) don't like the additional newlines control characters, saving lines like this to /var/log/messages: Mar 16 16:02:29 localhost kernel: #012runnable tasks:anholt#12 S task PID tree-key ... ^^^^ ^^^^ Clean these up by moving newline characters to their own SEQ_printf invocation. This leaves the /proc/sched_debug unchanged, but brings the entire output into alignment when prefixed: % echo t > /proc/sysrq-trigger % dmesg ... [ 62.410368] runnable tasks: [ 62.410368] S task PID tree-key switches prio wait-time sum-exec sum-sleep [ 62.410369] ----------------------------------------------------------------------------------------------------------- [ 62.410369] I kworker/u12:0 5 1932.215593 332 120 0.000000 3.621252 0.000000 0 0 / and no escaped control characters from rsyslog in /var/log/messages: Mar 16 16:15:06 localhost kernel: runnable tasks: Mar 16 16:15:06 localhost kernel: S task PID tree-key ... Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>

The seg6_build_state() function is called with RCU read lock held, so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead. [ 92.770271] ============================= [ 92.770628] WARNING: suspicious RCU usage [ 92.770921] 4.16.0-rc4+ anholt#12 Not tainted [ 92.771277] ----------------------------- [ 92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 92.772279] [ 92.772279] other info that might help us debug this: [ 92.772279] [ 92.773067] [ 92.773067] rcu_scheduler_active = 2, debug_locks = 1 [ 92.773514] 2 locks held by ip/2413: [ 92.773765] #0: (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0 [ 92.774377] anholt#1: (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210 [ 92.775065] [ 92.775065] stack backtrace: [ 92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ anholt#12 [ 92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014 [ 92.776608] Call Trace: [ 92.776852] dump_stack+0x7d/0xbc [ 92.777130] __schedule+0x133/0xf00 [ 92.777393] ? unwind_get_return_address_ptr+0x50/0x50 [ 92.777783] ? __sched_text_start+0x8/0x8 [ 92.778073] ? rcu_is_watching+0x19/0x30 [ 92.778383] ? kernel_text_address+0x49/0x60 [ 92.778800] ? __kernel_text_address+0x9/0x30 [ 92.779241] ? unwind_get_return_address+0x29/0x40 [ 92.779727] ? pcpu_alloc+0x102/0x8f0 [ 92.780101] _cond_resched+0x23/0x50 [ 92.780459] __mutex_lock+0xbd/0xad0 [ 92.780818] ? pcpu_alloc+0x102/0x8f0 [ 92.781194] ? seg6_build_state+0x11d/0x240 [ 92.781611] ? save_stack+0x9b/0xb0 [ 92.781965] ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0 [ 92.782480] ? seg6_build_state+0x11d/0x240 [ 92.782925] ? lwtunnel_build_state+0x1bd/0x210 [ 92.783393] ? ip6_route_info_create+0x687/0x1640 [ 92.783846] ? ip6_route_add+0x74/0x110 [ 92.784236] ? inet6_rtm_newroute+0x8a/0xd0 Fixes: 6c8702c ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

[ Upstream commit 191f86c ] The seg6_build_state() function is called with RCU read lock held, so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead. [ 92.770271] ============================= [ 92.770628] WARNING: suspicious RCU usage [ 92.770921] 4.16.0-rc4+ #12 Not tainted [ 92.771277] ----------------------------- [ 92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 92.772279] [ 92.772279] other info that might help us debug this: [ 92.772279] [ 92.773067] [ 92.773067] rcu_scheduler_active = 2, debug_locks = 1 [ 92.773514] 2 locks held by ip/2413: [ 92.773765] #0: (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0 [ 92.774377] #1: (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210 [ 92.775065] [ 92.775065] stack backtrace: [ 92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12 [ 92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014 [ 92.776608] Call Trace: [ 92.776852] dump_stack+0x7d/0xbc [ 92.777130] __schedule+0x133/0xf00 [ 92.777393] ? unwind_get_return_address_ptr+0x50/0x50 [ 92.777783] ? __sched_text_start+0x8/0x8 [ 92.778073] ? rcu_is_watching+0x19/0x30 [ 92.778383] ? kernel_text_address+0x49/0x60 [ 92.778800] ? __kernel_text_address+0x9/0x30 [ 92.779241] ? unwind_get_return_address+0x29/0x40 [ 92.779727] ? pcpu_alloc+0x102/0x8f0 [ 92.780101] _cond_resched+0x23/0x50 [ 92.780459] __mutex_lock+0xbd/0xad0 [ 92.780818] ? pcpu_alloc+0x102/0x8f0 [ 92.781194] ? seg6_build_state+0x11d/0x240 [ 92.781611] ? save_stack+0x9b/0xb0 [ 92.781965] ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0 [ 92.782480] ? seg6_build_state+0x11d/0x240 [ 92.782925] ? lwtunnel_build_state+0x1bd/0x210 [ 92.783393] ? ip6_route_info_create+0x687/0x1640 [ 92.783846] ? ip6_route_add+0x74/0x110 [ 92.784236] ? inet6_rtm_newroute+0x8a/0xd0 Fixes: 6c8702c ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt · 2018-04-30T18:22:20Z

vc4 is fine at dma-bufs, and @6by9's doing the dma-buf support in the v4l2 camera driver.

gohai · 2018-05-01T18:26:11Z

@6by9 Do you have an estimate when the updated v4l2 camera driver will land in Raspbian? Any way to test this currently? Thanks!

6by9 · 2018-05-02T09:47:51Z

The https://github.com/6by9/linux/tree/rpi-4.14.y-v4l2-codec being developed for #13 has all the required mods for the camera too. It'll all get merged together, and is also hit by the same vcsm-cma cleanup issue.

Testing - it's no real difference from before. Eric's demo code he linked to above should still be valid for almost all pixel formats (not YUYV and variants as the hardware doesn't support them).
I haven't got a Pi with camera module set up at the moment, but I'd expect GStreamer's v4l2src io-mode=dmabuf to just work with it when sending it to kmssink.

This was referenced Jan 17, 2017

ARM64: What's missing at this point raspberrypi/linux#1801

Closed

V4L2 Improvements - videobuf2_contig allocator raspberrypi/linux#1807

Closed

anholt closed this as completed Apr 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

vc4: dmabuf import from camera driver not supported #12

vc4: dmabuf import from camera driver not supported #12

anholt commented Feb 15, 2016

MarcoBodega commented Mar 15, 2016

Uh oh!

6by9 commented Aug 2, 2016

Uh oh!

anholt commented Aug 2, 2016

Uh oh!

6by9 commented Aug 2, 2016

Uh oh!

anholt commented Aug 2, 2016

Uh oh!

anholt commented Aug 17, 2016

Uh oh!

anholt commented Aug 17, 2016

Uh oh!

anholt commented Jan 27, 2017

Uh oh!

gohai commented Jan 29, 2017

Uh oh!

6by9 commented Jan 29, 2017

Uh oh!

gohai commented Jan 29, 2017

Uh oh!

anholt commented Apr 30, 2018

Uh oh!

gohai commented May 1, 2018

Uh oh!

6by9 commented May 2, 2018

Uh oh!

vc4: dmabuf import from camera driver not supported #12

vc4: dmabuf import from camera driver not supported #12

Comments

anholt commented Feb 15, 2016

MarcoBodega commented Mar 15, 2016

Uh oh!

6by9 commented Aug 2, 2016

Uh oh!

anholt commented Aug 2, 2016

Uh oh!

6by9 commented Aug 2, 2016

Uh oh!

anholt commented Aug 2, 2016

Uh oh!

anholt commented Aug 17, 2016

Uh oh!

anholt commented Aug 17, 2016

Uh oh!

anholt commented Jan 27, 2017

Uh oh!

gohai commented Jan 29, 2017

Uh oh!

6by9 commented Jan 29, 2017

Uh oh!

gohai commented Jan 29, 2017

Uh oh!

anholt commented Apr 30, 2018

Uh oh!

gohai commented May 1, 2018

Uh oh!

6by9 commented May 2, 2018

Uh oh!