Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2C-6 Doesn't Work #15

Closed
iracigt opened this issue Jun 28, 2017 · 29 comments
Closed

I2C-6 Doesn't Work #15

iracigt opened this issue Jun 28, 2017 · 29 comments

Comments

@iracigt
Copy link

iracigt commented Jun 28, 2017

As was previously mentioned on the Intel forums, i2c-6 doesn't work with the newer kernels because it requires setting the pin mux. Intel had previously exported this functionality through a (somewhat clunky) debugfs interface that worked with the old version of the GPIO driver included with the official 3.x kernels.

For a project I'm working on, I need both I2C buses. What would it take to get i2c-6 working, both temporarily and in a way that could be sent upstream? I'm willing to work on this and to learn, but have limited background with Linux kernel development (took an operating systems course using a teaching kernel, but never contributed to Linux).

So far, I've considered writing a userspace program that sets the pin mux, but it looks like the I2C pins are in a protected region that requires using an IPC channel. This rules out a userspace program because that would be complicated and we couldn't safely lock the IPC.

It looks like much of the groundwork has been done for this in pinctrl-merrifield.c. What is the appropriate place to be calling the pinctrl code from?

@htot
Copy link

htot commented Jun 28, 2017

I have been checking how much work it would be to port mraa to newer kernels. In mraa's edison platform code I stumbled into https://github.com/intel-iot-devkit/mraa/blob/master/src/x86/intel_edison_fab_c.c line #127. Here it is attempted to change the pinmux using SYSFS_CLASS_GPIO "/gpio%i/pinmux", which doesn't exist on 3.14 eds and if that fails using DEBUGFS_PINMODE_PATH "%i/current_pinmux", which is not really standard. So I'm wondering if the former is standard for later kernels, or just another hack for eds that never got implemented. If more or less standard it might be a suitable interface for code that would live in pinctrl-merrifield.c

@andy-shev
Copy link
Owner

The patches are in my tree, though I didn't make them working. You may try to amend them. Though, I'm not going to upstream them by few reasons, main of which is badly prepared firmware. We might fix this via U-Boot nevertheless.

@andy-shev andy-shev self-assigned this Jun 30, 2017
@htot
Copy link

htot commented Jul 1, 2017

Looks like everything is in place and only needs to be set when the i2c driver is probed. Correct?

andy-shev pushed a commit that referenced this issue Aug 28, 2017
Currently we are allocating drm_device in rockchip_drm_bind, so if the
suspend/resume code access it when drm is not bound, we would hit this
crash:

[  253.402836] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[  253.402837] pgd = ffffffc06c9b0000
[  253.402841] [00000028] *pgd=0000000000000000, *pud=0000000000000000
[  253.402844] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  253.402859] Modules linked in: btusb btrtl btbcm btintel bluetooth ath10k_pci ath10k_core ar10k_ath ar10k_mac80211 cfg80211 ip6table_filter asix usbnet mii
[  253.402864] CPU: 4 PID: 1331 Comm: cat Not tainted 4.4.70 #15
[  253.402865] Hardware name: Google Scarlet (DT)
[  253.402867] task: ffffffc076c0ce00 ti: ffffffc06c2c8000 task.ti: ffffffc06c2c8000
[  253.402871] PC is at rockchip_drm_sys_suspend+0x20/0x5c

Add sanity checks to prevent that.

Reported-by: Brian Norris <briannorris@chromium.com>
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.kernel.org/patch/9890297/
andy-shev pushed a commit that referenced this issue Oct 1, 2017
If FF request comes in while uinput device is going away,
uinput_request_send() will fail with -ENODEV, and uinput_request_submit()
will attempt to mark the slot as unused by calling uinput_request_done().
Unfortunately in this case we haven't initialized request->done completion
yet, and we get a crash:

[   39.402036] BUG: spinlock bad magic on CPU#1, fftest/3108
[   39.402046]  lock: 0xffff88006a93bb00, .magic: 00000000, .owner: /39, .owner_cpu: 1217155072
[   39.402055] CPU: 1 PID: 3108 Comm: fftest Tainted: G        W 4.13.0+ #15
[   39.402059] Hardware name: LENOVO 20HQS0EG02/20HQS0EG02, BIOS N1MET37W (1.22 ) 07/04/2017
[   39.402064]  0000000000000086 f0fad82f3ceaa120 ffff88006a93b9a0 ffffffff9de941bb
[   39.402077]  ffff88026df8ae00 ffff88006a93bb00 ffff88006a93b9c0 ffffffff9dca62b7
[   39.402088]  ffff88006a93bb00 ffff88006a93baf8 ffff88006a93b9e0 ffffffff9dca62e7
[   39.402099] Call Trace:
[   39.402112]  [<ffffffff9de941bb>] dump_stack+0x4d/0x63
[   39.402123]  [<ffffffff9dca62b7>] spin_dump+0x97/0x9c
[   39.402130]  [<ffffffff9dca62e7>] spin_bug+0x2b/0x2d
[   39.402138]  [<ffffffff9dca6373>] do_raw_spin_lock+0x28/0xfd
[   39.402147]  [<ffffffff9e3055cd>] _raw_spin_lock_irqsave+0x19/0x1f
[   39.402154]  [<ffffffff9dca05b7>] complete+0x1d/0x48
[   39.402162]  [<ffffffffc04f30af>] 0xffffffffc04f30af
[   39.402167]  [<ffffffffc04f468c>] 0xffffffffc04f468c
[   39.402177]  [<ffffffff9dd59c16>] ? __slab_free+0x22f/0x359
[   39.402184]  [<ffffffff9dcc13e9>] ? tk_clock_read+0xc/0xe
[   39.402189]  [<ffffffffc04f471f>] 0xffffffffc04f471f
[   39.402195]  [<ffffffff9dc9ffe5>] ? __wake_up+0x44/0x4b
[   39.402200]  [<ffffffffc04f3240>] ? 0xffffffffc04f3240
[   39.402207]  [<ffffffff9e0f57f3>] erase_effect+0xa1/0xd2
[   39.402214]  [<ffffffff9e0f58c6>] input_ff_flush+0x43/0x5c
[   39.402219]  [<ffffffffc04f32ad>] 0xffffffffc04f32ad
[   39.402227]  [<ffffffff9e0f174f>] input_flush_device+0x3d/0x51
[   39.402234]  [<ffffffff9e0f69ae>] evdev_flush+0x49/0x5c
[   39.402243]  [<ffffffff9dd62d6e>] filp_close+0x3f/0x65
[   39.402253]  [<ffffffff9dd7dcf7>] put_files_struct+0x66/0xc1
[   39.402261]  [<ffffffff9dd7ddeb>] exit_files+0x47/0x4e
[   39.402270]  [<ffffffff9dc6b329>] do_exit+0x483/0x969
[   39.402278]  [<ffffffff9dc73211>] ? recalc_sigpending_tsk+0x3d/0x44
[   39.402285]  [<ffffffff9dc6c7a2>] do_group_exit+0x42/0xb0
[   39.402293]  [<ffffffff9dc767e1>] get_signal+0x58d/0x5bf
[   39.402300]  [<ffffffff9dc03701>] do_signal+0x37/0x53e
[   39.402307]  [<ffffffff9e0f8401>] ? evdev_ioctl_handler+0xac8/0xb04
[   39.402314]  [<ffffffff9e0f8464>] ? evdev_ioctl+0x10/0x12
[   39.402321]  [<ffffffff9dd74cfa>] ? do_vfs_ioctl+0x42e/0x501
[   39.402328]  [<ffffffff9dc0170e>] prepare_exit_to_usermode+0x66/0x90
[   39.402333]  [<ffffffff9dc0181b>] syscall_return_slowpath+0xe3/0xec
[   39.402339]  [<ffffffff9e305b7b>] int_ret_from_sys_call+0x25/0x8f

While we could solve this by simply initializing the completion earlier, we
are better off rearranging the code a bit so we avoid calling complete() on
requests that we did not send out. This patch consolidates marking request
slots as free in one place (in uinput_request_submit(), the same place
where we acquire them) and having everyone else simply signal completion
of the requests.

Fixes: 00ce756 ("Input: uinput - mark failed submission requests as free")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
@svenschwermer
Copy link

Hi! I spent some time and figured out a way to make i2c-6 work, see https://svenschwermer.de/2017/10/14/intel-edison-i2c6-with-vanilla-linux.html

@iracigt
Copy link
Author

iracigt commented Oct 14, 2017

Thanks for the link! Turns out I went down the same path last week and finally got it working too. I ended up creating an small module that just requested the pin muxing I wanted and loading that. It's more complicated, but had the advantage of allowing the I2C drivers to be present at boot and for the kernel to find the GPIO expander we're using at boot. I'll try to get my code up soon, though for most people your fix is probably better.

@farfromrefug
Copy link

@iracigt could you share your code? We are looking at doing the same thing. What is what you call a gpio expander? Cause may be we could use that too if you are willing to share. Thanks

@floion
Copy link

floion commented Dec 20, 2017

Hi @andy-shev @htot is uboot now doing this pinmuxing?

@andy-shev
Copy link
Owner

@floion
Nope. We need a volunteer who can do this.

See Sven's comment above how to enable it in Linux.

@floion
Copy link

floion commented Dec 21, 2017

Thanks for the follow-up, Andy.
Using Sven's patch at the moment for enabling i2c-6. I guess we will need do the same for other pinmux'ing we want?

@andy-shev
Copy link
Owner

@floion Not same for sure. I2C is just special, other muxings can be done in usual way via pin control lookup tables.

@alext-mkrs
Copy link

I'd be glad to volunteer, just need to finish some current project, that'd take me about a week or two from now and if no one does this by then I'll bite.

@htot
Copy link

htot commented Dec 22, 2017

@andy-shev What would be an upstream-able solution for this? Do you have any ideas?

@alext-mkrs
Copy link

So I think I'm ready to start looking into this one. But as I'm new to U-Boot's code, if you indeed have any hints or ideas, @andy-shev - I'm all ears.

@andy-shev
Copy link
Owner

@htot @alext-mkrs
Like I mentioned before, the upstream solution is to provide pin control and GPIO drivers for Intel Merrifield in U-Boot. This will give an essential part to configure board at early stage as needed.

@alext-mkrs
Copy link

@andy-shev, yeah, thanks - that part is understood, I meant more like the next level of detail, i.e. where specifically would that fit best in U-Boot, do you think it makes any sense to port the driver from Linux vs. write a new one tailored to U-Boot driver model - that sort of thing, like a basic direction advice.

@andy-shev
Copy link
Owner

@alext-mkrs
Linux driver (more precisely drivers) can be considered as a documentation, though U-Boot driver (or drivers) should be made according to U-Boot practices, e.g. Driver Model.

@htot
Copy link

htot commented Jan 25, 2018

But as I understood the issue is not with the I2C driver per sé, but setting up one of the needed pin mode? As @iracigt mentions above he created a small module to do that. Maybe this code could be used as example what to add to u-boot?

@alext-mkrs
Copy link

@htot, the thing is not with the code itself, but the order two IP blocks are probed, that's why the module approach helps - not because it contains some different code, but because it can be loaded after the SCU device (which owns those pins) is probed.

Ok, thanks Andy, I'll look into this.

@andy-shev
Copy link
Owner

@alext-mkrs
The pins are not owned by SCU, they are protected by internal bus firewall, and SCU just has more permissions than usual I/O writes/reads.

@alext-mkrs
Copy link

alext-mkrs commented Jan 29, 2018

Oh, I see, that's interesting - thanks. I don't think this is described in any external Edison docs, so hopefully the Linux driver has enough information to do it meaningfully for u-boot.

@andy-shev
Copy link
Owner

@alext-mkrs
...Linux driver from stock kernel. Unfortunately I have no firmware specification documents either. But I can share some of my observations (basically I extracted what is needed in my never-be-upstreamed patch set which enables I2C bus 6).

@alext-mkrs
Copy link

Thanks, I'll surely take a look. Haven't yet got this unfortunately still :\ but this is the first on my list.

@alext-mkrs
Copy link

Okay, so I planned to get a bit deeper into the u-boot, however life has executed its rights and looks like I won't and I won't have time to look into this particular one anytime soon :( So FYI and it's again up for grabs for anyone willing.

@htot
Copy link

htot commented Apr 22, 2018

That's unfortunate, I hope everything is ok with you. In the meanwhile I have gone ahead and put @svenschwermer patch as a temporary solution in my meta-intel-edison rocko. It works only in the non-acpi case.

@alext-mkrs
Copy link

Thanks. It's ok, just that I have to change focus somewhat and I honestly don't see how I can find time for this specific one in a foreseeable future - so I though it would only be fair to let everyone know about that explicitly.

@htot
Copy link

htot commented Sep 11, 2018

Hi! I spent some time and figured out a way to make i2c-6 work, see https://svenschwermer.de/2017/10/14/intel-edison-i2c6-with-vanilla-linux.html

@svenschwermer We now have a way of enabling I2C-6 from U-Boot. Nevertheless I have used your patch for our non-acpi enabled kernel and non-I2C enabled U-Boot - but never tested if I2C-6 really worked by connecting a device.

Now I have linux 4.18 and after testing the new U-Boot I decided to try our old U-Boot and kernel with your patch and find that it doesn't work. Our kernel recipe (ok, for the 4.16 kernel) is here, your patch here

root@edison:~# cat /sys/kernel/debug/pinctrl/pinctrl-merrifield/pins
registered pins: 233
....
pin 101 (GP19_I2C_1_SCL) mode 1 0x00203511
pin 102 (GP20_I2C_1_SDA) mode 1 0x00203511
pin 103 (GP21_I2C_2_SCL) mode 1 0x00203501
pin 104 (GP22_I2C_2_SDA) mode 1 0x00203581
pin 105 (GP17_I2C_3_SCL_HDMI) mode 1 0x20203421
pin 106 (GP18_I2C_3_SDA_HDMI) mode 1 0x202034a1
pin 107 (GP23_I2C_4_SCL) mode 1 0x00203501
pin 108 (GP24_I2C_4_SDA) mode 1 0x00203581
pin 109 (GP25_I2C_5_SCL) mode 1 0x00203101
pin 110 (GP26_I2C_5_SDA) mode 1 0x00203181
pin 111 (GP27_I2C_6_SCL) GPIO 0x00000510
pin 112 (GP28_I2C_6_SDA) GPIO 0x00000590
pin 113 (GP29_I2C_7_SCL) mode 1 0x00203501
pin 114 (GP30_I2C_7_SDA) mode 1 0x00203581

I looks pin mode is not set to the correct mode. Would you know what causes that? Something in 4.18?

Even though we are going in the direction of using ACPI and having this set in U-Boot, I would like to have this option working too.

@htot
Copy link

htot commented Sep 15, 2018

@svenschwermer I found it. The Edison Arduino Hardware guide tells us to do this:

echo 28 > /sys/class/gpio/export
echo 27 > /sys/class/gpio/export

But when we touch these pins, they change to GPIO instead of mode 1. For future reference, this script correctly sets the I2C-6 bus (tested with 4.18):

#!/bin/sh
echo 204 > /sys/class/gpio/export
echo 205 > /sys/class/gpio/export
echo 236 > /sys/class/gpio/export
echo 237 > /sys/class/gpio/export
echo 14 > /sys/class/gpio/export
echo 165 > /sys/class/gpio/export
echo 212 > /sys/class/gpio/export
echo 213 > /sys/class/gpio/export
echo 214 > /sys/class/gpio/export
echo low > /sys/class/gpio/gpio214/direction
echo low > /sys/class/gpio/gpio204/direction # HW Guide §11.6 is wrong, follow Table 4
echo low > /sys/class/gpio/gpio205/direction # HW Guide §11.6 is wrong, follow Table 4
echo in > /sys/class/gpio/gpio14/direction
echo in > /sys/class/gpio/gpio165/direction
echo low > /sys/class/gpio/gpio236/direction
echo low > /sys/class/gpio/gpio237/direction
echo in > /sys/class/gpio/gpio212/direction
echo in > /sys/class/gpio/gpio213/direction
echo high > /sys/class/gpio/gpio214/direction

With modprobe -i i2c-dev and TSL2561 connected:

root@edison:~# i2cdetect -r -y 6
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- 39 -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --                         

@andy-shev
Copy link
Owner

@htot
Just a side note, that this is a script for non-ACPI version of kernel/U-Boot. in ACPI case this settings are done via ASL code.

@htot
Copy link

htot commented Sep 21, 2018

Absolutely.

htot pushed a commit to htot/linux that referenced this issue May 17, 2019
If codec registration fails after the ASoC Intel SST driver has been probed,
the kernel will Oops and crash at suspend/resume.

general protection fault: 0000 [andy-shev#1] PREEMPT SMP KASAN PTI
CPU: 1 PID: 2811 Comm: cat Tainted: G        W         4.19.30 andy-shev#15
Hardware name: GOOGLE Clapper, BIOS Google_Clapper.5216.199.7 08/22/2014
RIP: 0010:snd_soc_suspend+0x5a/0xd21
Code: 03 80 3c 10 00 49 89 d7 74 0b 48 89 df e8 71 72 c4 fe 4c 89
fa 48 8b 03 48 89 45 d0 48 8d 98 a0 01 00 00 48 89 d8 48 c1 e8 03
<8a> 04 10 84 c0 0f 85 85 0c 00 00 80 3b 00 0f 84 6b 0c 00 00 48 8b
RSP: 0018:ffff888035407750 EFLAGS: 00010202
RAX: 0000000000000034 RBX: 00000000000001a0 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88805c417098
RBP: ffff8880354077b0 R08: dffffc0000000000 R09: ffffed100b975718
R10: 0000000000000001 R11: ffffffff949ea4a3 R12: 1ffff1100b975746
R13: dffffc0000000000 R14: ffff88805cba4588 R15: dffffc0000000000
FS:  0000794a78e91b80(0000) GS:ffff888068d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007bd5283ccf58 CR3: 000000004b7aa000 CR4: 00000000001006e0
Call Trace:
? dpm_complete+0x67b/0x67b
? i915_gem_suspend+0x14d/0x1ad
sst_soc_prepare+0x91/0x1dd
? sst_be_hw_params+0x7e/0x7e
dpm_prepare+0x39a/0x88b
dpm_suspend_start+0x13/0x9d
suspend_devices_and_enter+0x18f/0xbd7
? arch_suspend_enable_irqs+0x11/0x11
? printk+0xd9/0x12d
? lock_release+0x95f/0x95f
? log_buf_vmcoreinfo_setup+0x131/0x131
? rcu_read_lock_sched_held+0x140/0x22a
? __bpf_trace_rcu_utilization+0xa/0xa
? __pm_pr_dbg+0x186/0x190
? pm_notifier_call_chain+0x39/0x39
? suspend_test+0x9d/0x9d
pm_suspend+0x2f4/0x728
? trace_suspend_resume+0x3da/0x3da
? lock_release+0x95f/0x95f
? kernfs_fop_write+0x19f/0x32d
state_store+0xd8/0x147
? sysfs_kf_read+0x155/0x155
kernfs_fop_write+0x23e/0x32d
__vfs_write+0x108/0x608
? vfs_read+0x2e9/0x2e9
? rcu_read_lock_sched_held+0x140/0x22a
? __bpf_trace_rcu_utilization+0xa/0xa
? debug_smp_processor_id+0x10/0x10
? selinux_file_permission+0x1c5/0x3c8
? rcu_sync_lockdep_assert+0x6a/0xad
? __sb_start_write+0x129/0x2ac
vfs_write+0x1aa/0x434
ksys_write+0xfe/0x1be
? __ia32_sys_read+0x82/0x82
do_syscall_64+0xcd/0x120
entry_SYSCALL_64_after_hwframe+0x49/0xbe

In the observed situation, the problem is seen because the codec driver
failed to probe due to a hardware problem.

max98090 i2c-193C9890:00: Failed to read device revision: -1
max98090 i2c-193C9890:00: ASoC: failed to probe component -1
cht-bsw-max98090 cht-bsw-max98090: ASoC: failed to instantiate card -1
cht-bsw-max98090 cht-bsw-max98090: snd_soc_register_card failed -1
cht-bsw-max98090: probe of cht-bsw-max98090 failed with error -1

The problem is similar to the problem solved with commit 2fc995a
("ASoC: intel: Fix crash at suspend/resume without card registration"),
but codec registration fails at a later point. At that time, the pointer
checked with the above mentioned commit is already set, but it is not
cleared if the device is subsequently removed. Adding a remove function
to clear the pointer fixes the problem.

Cc: stable@vger.kernel.org
Cc: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
htot pushed a commit to htot/linux that referenced this issue May 17, 2019
By calling maps__insert() we assume to get 2 references on the map,
which we relese within maps__remove call.

However if there's already same map name, we currently don't bump the
reference and can crash, like:

  Program received signal SIGABRT, Aborted.
  0x00007ffff75e60f5 in raise () from /lib64/libc.so.6

  (gdb) bt
  #0  0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
  andy-shev#1  0x00007ffff75d0895 in abort () from /lib64/libc.so.6
  andy-shev#2  0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
  andy-shev#3  0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
  andy-shev#4  0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
  andy-shev#5  refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
  andy-shev#6  map__put (map=0x1224df0) at util/map.c:299
  andy-shev#7  0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
  andy-shev#8  maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
  andy-shev#9  0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65
  andy-shev#10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728
  andy-shev#11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741
  andy-shev#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
  andy-shev#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
  andy-shev#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322
  andy-shev#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
  ...

Add the map to the list and getting the reference event if we find the
map with same name.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 1e62856 ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
htot pushed a commit to edison-fw/linux that referenced this issue Nov 24, 2020
This fix is for a failure that occurred in the DWARF unwind perf test.

Stack unwinders may probe memory when looking for frames.

Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.

This can lead to false memory sanitizer failures for the use of an
uninitialized value.

Avoid this problem by removing the poison on the copied stack.

The full msan failure with track origins looks like:

==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
    andy-shev#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    andy-shev#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    andy-shev#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    andy-shev#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    andy-shev#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    andy-shev#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    andy-shev#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    andy-shev#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    andy-shev#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    andy-shev#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    andy-shev#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    andy-shev#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    andy-shev#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    andy-shev#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    andy-shev#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    andy-shev#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    andy-shev#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    andy-shev#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    andy-shev#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    andy-shev#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    andy-shev#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    andy-shev#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    andy-shev#23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
    andy-shev#1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
    andy-shev#2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    andy-shev#3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    andy-shev#4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    andy-shev#5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    andy-shev#6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    andy-shev#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    andy-shev#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    andy-shev#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    andy-shev#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    andy-shev#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    andy-shev#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    andy-shev#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    andy-shev#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    andy-shev#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    andy-shev#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    andy-shev#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    andy-shev#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    andy-shev#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    andy-shev#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    andy-shev#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    andy-shev#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    andy-shev#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    andy-shev#24 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
    andy-shev#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    andy-shev#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    andy-shev#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    andy-shev#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    andy-shev#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    andy-shev#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    andy-shev#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    andy-shev#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    andy-shev#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    andy-shev#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    andy-shev#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    andy-shev#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    andy-shev#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    andy-shev#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    andy-shev#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    andy-shev#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    andy-shev#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    andy-shev#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    andy-shev#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    andy-shev#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    andy-shev#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    andy-shev#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    andy-shev#23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
    andy-shev#1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
    andy-shev#2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
    andy-shev#3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    andy-shev#4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    andy-shev#5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    andy-shev#6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    andy-shev#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    andy-shev#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    andy-shev#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    andy-shev#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    andy-shev#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    andy-shev#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    andy-shev#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    andy-shev#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    andy-shev#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    andy-shev#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    andy-shev#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    andy-shev#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    andy-shev#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    andy-shev#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    andy-shev#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    andy-shev#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    andy-shev#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    andy-shev#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    andy-shev#25 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    andy-shev#1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
    andy-shev#2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
    andy-shev#3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
    andy-shev#4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    andy-shev#5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    andy-shev#6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    andy-shev#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    andy-shev#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    andy-shev#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    andy-shev#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    andy-shev#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    andy-shev#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    andy-shev#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    andy-shev#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    andy-shev#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    andy-shev#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    andy-shev#17 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
    #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445

SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
htot pushed a commit to edison-fw/linux that referenced this issue Apr 2, 2021
commit 0bb7883 upstream.

While removing a qgroup's sysfs entry we end up taking the kernfs_mutex,
through kobject_del(), while holding the fs_info->qgroup_lock spinlock,
producing the following trace:

  [821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
  [821.843641] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 28214, name: podman
  [821.843644] CPU: 3 PID: 28214 Comm: podman Tainted: G        W         5.11.6 andy-shev#15
  [821.843646] Hardware name: Dell Inc. PowerEdge R330/084XW4, BIOS 2.11.0 12/08/2020
  [821.843647] Call Trace:
  [821.843650]  dump_stack+0xa1/0xfb
  [821.843656]  ___might_sleep+0x144/0x160
  [821.843659]  mutex_lock+0x17/0x40
  [821.843662]  kernfs_remove_by_name_ns+0x1f/0x80
  [821.843666]  sysfs_remove_group+0x7d/0xe0
  [821.843668]  sysfs_remove_groups+0x28/0x40
  [821.843670]  kobject_del+0x2a/0x80
  [821.843672]  btrfs_sysfs_del_one_qgroup+0x2b/0x40 [btrfs]
  [821.843685]  __del_qgroup_rb+0x12/0x150 [btrfs]
  [821.843696]  btrfs_remove_qgroup+0x288/0x2a0 [btrfs]
  [821.843707]  btrfs_ioctl+0x3129/0x36a0 [btrfs]
  [821.843717]  ? __mod_lruvec_page_state+0x5e/0xb0
  [821.843719]  ? page_add_new_anon_rmap+0xbc/0x150
  [821.843723]  ? kfree+0x1b4/0x300
  [821.843725]  ? mntput_no_expire+0x55/0x330
  [821.843728]  __x64_sys_ioctl+0x5a/0xa0
  [821.843731]  do_syscall_64+0x33/0x70
  [821.843733]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [821.843736] RIP: 0033:0x4cd3fb
  [821.843741] RSP: 002b:000000c000906b20 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  [821.843744] RAX: ffffffffffffffda RBX: 000000c000050000 RCX: 00000000004cd3fb
  [821.843745] RDX: 000000c000906b98 RSI: 000000004010942a RDI: 000000000000000f
  [821.843747] RBP: 000000c000907cd0 R08: 000000c000622901 R09: 0000000000000000
  [821.843748] R10: 000000c000d992c0 R11: 0000000000000206 R12: 000000000000012d
  [821.843749] R13: 000000000000012c R14: 0000000000000200 R15: 0000000000000049

Fix this by removing the qgroup sysfs entry while not holding the spinlock,
since the spinlock is only meant for protection of the qgroup rbtree.

Reported-by: Stuart Shelton <srcshelton@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/7A5485BB-0628-419D-A4D3-27B1AF47E25A@gmail.com/
Fixes: 49e5fb4 ("btrfs: qgroup: export qgroups in sysfs")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
htot pushed a commit to htot/linux that referenced this issue Feb 3, 2022
Per HiFive Unmatched schematics, the card detect signal of the
micro SD card is connected to gpio pin andy-shev#15, which should be
reflected in the DT via the <gpios> property, as described in
Documentation/devicetree/bindings/mmc/mmc-spi-slot.txt.

[1] https://sifive.cdn.prismic.io/sifive/6a06d6c0-6e66-49b5-8e9e-e68ce76f4192_hifive-unmatched-schematics-v3.pdf

Signed-off-by: Bin Meng <bin.meng@windriver.com>
Fixes: d573b55 ("riscv: dts: add initial board data for the SiFive HiFive Unmatched")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
andy-shev pushed a commit that referenced this issue Jan 14, 2024
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a
cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when
removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be
dereferenced as wrong struct in irdma_free_pending_cqp_request().

  PID: 3669   TASK: ffff88aef892c000  CPU: 28  COMMAND: "kworker/28:0"
   #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
   #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
   #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
   #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
   #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
      [exception RIP: native_queued_spin_lock_slowpath+1291]
      RIP: ffffffff8127e72b  RSP: ffff88aa841ef778  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: ffff88b01f849700  RCX: ffffffff8127e47e
      RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffffffff83857ec0
      RBP: ffff88afe3e4efc8   R8: ffffed15fc7c9dfa   R9: ffffed15fc7c9dfa
      R10: 0000000000000001  R11: ffffed15fc7c9df9  R12: 0000000000740000
      R13: ffff88b01f849708  R14: 0000000000000003  R15: ffffed1603f092e1
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  -- <NMI exception stack> --
   #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
   #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
   #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
   #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
   #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
   #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
   #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
   #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
   #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
   #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
   #15 [ffff88aa841efb88] device_del at ffffffff82179d23
   #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
   #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
   #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
   #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
   #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
   #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn
Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
andy-shev pushed a commit that referenced this issue Jan 14, 2024
…s_del_by_dev()

I got the below warning trace:

WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify
CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0
Call Trace:
 rtnl_dellink
 rtnetlink_rcv_msg
 netlink_rcv_skb
 netlink_unicast
 netlink_sendmsg
 __sock_sendmsg
 ____sys_sendmsg
 ___sys_sendmsg
 __sys_sendmsg
 do_syscall_64
 entry_SYSCALL_64_after_hwframe

It can be repoduced via:

    ip netns add ns1
    ip netns exec ns1 ip link add bond0 type bond mode 0
    ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
    ip netns exec ns1 ip link set bond_slave_1 master bond0
[1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off
[2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
[3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0
[4] ip netns exec ns1 ip link set bond_slave_1 nomaster
[5] ip netns exec ns1 ip link del veth2
    ip netns del ns1

This is all caused by command [1] turning off the rx-vlan-filter function
of bond0. The reason is the same as commit 01f4fd2 ("bonding: Fix
incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands
[2] [3] add the same vid to slave and master respectively, causing
command [4] to empty slave->vlan_info. The following command [5] triggers
this problem.

To fix this problem, we should add VLAN_FILTER feature checks in
vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect
addition or deletion of vlan_vid information.

Fixes: 348a144 ("vlan: introduce functions to do mass addition/deletion of vids by another device")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
andy-shev pushed a commit that referenced this issue Apr 25, 2024
vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
andy-shev pushed a commit that referenced this issue Sep 4, 2024
A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants