Aufs kernel panics #121

Open
paralin opened this issue Aug 12, 2015 · 5 comments

@paralin

paralin commented Aug 12, 2015

I think there might be an issue with the aufs merge into the odroid-xu3 branch.

I looked up this error online, and the reports I found attribute it to a buggy implementation in the kernel.

[   26.595581] [c7] aufs au_opts_verify:1602:docker[432]: dirperm1 breaks the protection by the permission bits on the lower branch
[   26.644463] [c4] aufs au_opts_verify:1602:docker[442]: dirperm1 breaks the protection by the permission bits on the lower branch
[   26.778337] [c4] cgroup: docker (442) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[   26.792432] [c4] cgroup: "memory" requires setting use_hierarchy to 1 on the root.
[   26.916644] [c4] BUG: looking up invalid subclass: 8
[   26.920145] [c4] turning off the locking correctness validator.
[   26.926042] [c4] CPU: 4 PID: 629 Comm: docker Tainted: G           O 3.10.82 #1
[   26.933331] [c4] Backtrace:
[   26.936194] [c4] [<c0012e90>] (dump_backtrace+0x0/0x114) from [<c0013100>] (show_stack+0x20/0x24)
[   26.945045] [c4]  r6:c0911a5c r5:c095c99c r4:00000000 r3:00000000
[   26.951114] [c4] [<c00130e0>] (show_stack+0x0/0x24) from [<c06049c0>] (dump_stack+0x24/0x28)
[   26.959544] [c4] [<c060499c>] (dump_stack+0x0/0x28) from [<c008ef3c>] (__lock_acquire.isra.26+0x4c8/0xd24)
[   26.969171] [c4] [<c008ea74>] (__lock_acquire.isra.26+0x0/0xd24) from [<c008ffac>] (lock_acquire+0xa4/0x138)
[   26.978975] [c4] [<c008ff08>] (lock_acquire+0x0/0x138) from [<c0605d50>] (mutex_lock_nested+0x68/0x3dc)
[   26.988359] [c4] [<c0605ce8>] (mutex_lock_nested+0x0/0x3dc) from [<bf3e9d88>] (au_lkup_dentry+0x4ac/0x508 [aufs])
[   26.998597] [c4] [<bf3e98dc>] (au_lkup_dentry+0x0/0x508 [aufs]) from [<bf3f2804>] (aufs_lookup+0xd8/0x278 [aufs])
[   27.008833] [c4] [<bf3f272c>] (aufs_lookup+0x0/0x278 [aufs]) from [<c013d5c4>] (lookup_real+0x30/0x5c)
[   27.018102] [c4]  r8:00000001 r7:c5847e08 r6:c6639bd0 r5:c6639bd0 r4:c66cc888 r3:bf3f272c
[   27.026340] [c4] [<c013d594>] (lookup_real+0x0/0x5c) from [<c013e39c>] (__lookup_hash+0x48/0x50)
[   27.035108] [c4]  r5:c6639bd0 r4:00000001
[   27.039094] [c4] [<c013e354>] (__lookup_hash+0x0/0x50) from [<c013ea30>] (lookup_slow+0x4c/0xb8)
[   27.047862] [c4]  r5:c5847e10 r4:c5847e68
[   27.051851] [c4] [<c013e9e4>] (lookup_slow+0x0/0xb8) from [<c0140840>] (path_lookupat+0x23c/0x830)
[   27.060791] [c4]  r7:ffffff9c r6:c5847e08 r5:c5847e10 r4:c5846028
[   27.066860] [c4] [<c0140604>] (path_lookupat+0x0/0x830) from [<c0140e64>] (filename_lookup.isra.45+0x30/0x78)
[   27.076762] [c4] [<c0140e34>] (filename_lookup.isra.45+0x0/0x78) from [<c01432a0>] (user_path_at_empty+0x64/0x8c)
[   27.086996] [c4]  r7:c5847f00 r6:c5847e68 r5:00000001 r4:c630e000
[   27.093063] [c4] [<c014323c>] (user_path_at_empty+0x0/0x8c) from [<c01432ec>] (user_path_at+0x24/0x2c)
[   27.102353] [c4]  r8:c5847f40 r7:10c35920 r6:ffffff9c r5:00000001 r4:10c32a80
[   27.109462] [c4] [<c01432c8>] (user_path_at+0x0/0x2c) from [<c01388f8>] (vfs_fstatat+0x54/0xa8)
[   27.118150] [c4] [<c01388a4>] (vfs_fstatat+0x0/0xa8) from [<c0138974>] (vfs_stat+0x28/0x2c)
[   27.126473] [c4]  r8:c000ea64 r7:000000c3 r6:35386534 r5:00000000 r4:10c32a80
[   27.133586] [c4] [<c013894c>] (vfs_stat+0x0/0x2c) from [<c01390b4>] (SyS_stat64+0x24/0x40)
[   27.141841] [c4] [<c0139090>] (SyS_stat64+0x0/0x40) from [<c000e840>] (ret_fast_syscall+0x0/0x38)
[   27.150686] [c4]  r4:00000000
b26eb84e669fb72092415f848d54e851e38d1d8172d2526bac5821584f4b9e13

Any ideas?

@umiddelb

Hi @sfjro,

do you have any ideas? I know, aufs_3.10 has run out of support, but there is no 3.14 kernel available for this platform.

@paralin : Which docker version are you using?

Best regards
Uli

@paralin
Author

paralin commented Aug 13, 2015

Latest release from github.

@sfjro

sfjro commented Aug 13, 2015

Uli Middelberg:

do you have any ideas? I know, aufs_3.10 has run out of support, but there is no 3.14 kernel available for this platform.

This is a well-known generic issue in the Linux kernel.
On LKML, people sometimes post suggestions about how to increase the
LOCKDEP limits.
If you are not familiar with LOCKDEP, I'd suggest either disabling
CONFIG_LOCKDEP or reading
linux/Documentation/locking/lockdep-design.txt.

If you want to enable LOCKDEP with aufs, then you need to increase these
values in linux/include/linux/lockdep.h and
linux/kernel/lockdep_internals.h:

  • MAX_LOCKDEP_SUBCLASSES
  • MAX_LOCKDEP_KEYS_BITS
  • MAX_LOCKDEP_ENTRIES
  • MAX_LOCKDEP_CHAINS_BITS
  • MAX_STACK_TRACE_ENTRIES

J. R. Okajima
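
For readers following along, the first limit in the list above can be illustrated with a minimal userspace sketch of the kind of bounds check that appears to produce the "BUG: looking up invalid subclass: 8" line in the log. It assumes MAX_LOCKDEP_SUBCLASSES is 8, as in mainline include/linux/lockdep.h. The names lookup_subclass and validator_enabled are hypothetical stand-ins and this is not the kernel's code.

```c
#include <assert.h>   /* only needed if you add assert() checks when trying this out */
#include <stdbool.h>
#include <stdio.h>

/* Assumption: mirrors the mainline value (8UL in include/linux/lockdep.h). */
#define MAX_LOCKDEP_SUBCLASSES 8U

/* Hypothetical stand-in for lockdep's global on/off state. */
bool validator_enabled = true;

/*
 * Hypothetical userspace model of the bounds check.
 * Valid subclasses are 0 .. MAX_LOCKDEP_SUBCLASSES - 1, so a caller that
 * passes subclass 8 is rejected and the validator is switched off.
 * Returns true when the subclass is accepted.
 */
bool lookup_subclass(unsigned int subclass)
{
	if (subclass >= MAX_LOCKDEP_SUBCLASSES) {
		validator_enabled = false;
		fprintf(stderr, "BUG: looking up invalid subclass: %u\n", subclass);
		fprintf(stderr, "turning off the locking correctness validator.\n");
		return false;
	}
	return true;
}
```

Under these assumptions, lookup_subclass(7) is accepted while lookup_subclass(8) prints the two messages and disables the validator, which matches the order of the messages in the log. A module that uses nine or more subclasses for one lock class would therefore need the limit raised, or LOCKDEP disabled, as suggested above.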

@umiddelb

@sfjro Thank you very much for the fast reply

@paralin The latest release from github (1.8.0) requires additional patches

@paralin
Author

paralin commented Aug 14, 2015

mdrjr pushed a commit that referenced this issue Aug 5, 2019
[ Upstream commit 80bf6ce ]

When we get into activate_mm(), lockdep complains that we're doing
something strange:

    WARNING: possible circular locking dependency detected
    5.1.0-10252-gb00152307319-dirty #121 Not tainted
    ------------------------------------------------------
    inside.sh/366 is trying to acquire lock:
    (____ptrval____) (&(&p->alloc_lock)->rlock){+.+.}, at: flush_old_exec+0x703/0x8d7

    but task is already holding lock:
    (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++}:
           [...]
           __lock_acquire+0x12ab/0x139f
           lock_acquire+0x155/0x18e
           down_write+0x3f/0x98
           flush_old_exec+0x748/0x8d7
           load_elf_binary+0x2ca/0xddb
           [...]

    -> #0 (&(&p->alloc_lock)->rlock){+.+.}:
           [...]
           __lock_acquire+0x12ab/0x139f
           lock_acquire+0x155/0x18e
           _raw_spin_lock+0x30/0x83
           flush_old_exec+0x703/0x8d7
           load_elf_binary+0x2ca/0xddb
           [...]

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&mm->mmap_sem);
                                   lock(&(&p->alloc_lock)->rlock);
                                   lock(&mm->mmap_sem);
      lock(&(&p->alloc_lock)->rlock);

     *** DEADLOCK ***

    2 locks held by inside.sh/366:
     #0: (____ptrval____) (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file+0x12d/0x869
     #1: (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7

    stack backtrace:
    CPU: 0 PID: 366 Comm: inside.sh Not tainted 5.1.0-10252-gb00152307319-dirty #121
    Stack:
     [...]
    Call Trace:
     [<600420de>] show_stack+0x13b/0x155
     [<6048906b>] dump_stack+0x2a/0x2c
     [<6009ae64>] print_circular_bug+0x332/0x343
     [<6009c5c6>] check_prev_add+0x669/0xdad
     [<600a06b4>] __lock_acquire+0x12ab/0x139f
     [<6009f3d0>] lock_acquire+0x155/0x18e
     [<604a07e0>] _raw_spin_lock+0x30/0x83
     [<60151e6a>] flush_old_exec+0x703/0x8d7
     [<601a8eb8>] load_elf_binary+0x2ca/0xddb
     [...]

I think it's because in exec_mmap() we have

	down_read(&old_mm->mmap_sem);
	...
	task_lock(tsk);
	...
	activate_mm(active_mm, mm);
	(which does down_write(&mm->mmap_sem))

I'm not really sure why lockdep throws in the whole knowledge
about the task lock, but it seems that old_mm and mm shouldn't
ever be the same (and it doesn't deadlock) so tell lockdep that
they're different.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
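
The commit message's idea of telling lockdep that two locks of the same class are different can be pictured with a toy model. The sketch below is a deliberately simplified assumption: it only tracks held locks by (class name, subclass) and refuses a second acquisition of the same pair. It is not lockdep's actual algorithm, and it is not the code of the commit. All names (toy_acquire, toy_release_all, toy_held) are hypothetical.

```c
#include <assert.h>   /* only needed if you add assert() checks when trying this out */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_HELD 8

/* One entry per lock currently held in the toy model. */
struct toy_held_lock {
	const char *class_name;   /* e.g. "mmap_sem" */
	unsigned int subclass;    /* 0 by default; non-zero marks a nested acquisition */
};

struct toy_held_lock toy_held[TOY_MAX_HELD];
size_t toy_depth;

/*
 * Try to take a lock identified by (class_name, subclass).
 * Returns false if the same pair is already held, or if the table is full.
 */
bool toy_acquire(const char *class_name, unsigned int subclass)
{
	for (size_t i = 0; i < toy_depth; i++) {
		if (toy_held[i].subclass == subclass &&
		    strcmp(toy_held[i].class_name, class_name) == 0)
			return false;
	}
	if (toy_depth == TOY_MAX_HELD)
		return false;
	toy_held[toy_depth].class_name = class_name;
	toy_held[toy_depth].subclass = subclass;
	toy_depth++;
	return true;
}

/* Drop everything held in the toy model. */
void toy_release_all(void)
{
	toy_depth = 0;
}
```

In this model, taking "mmap_sem" twice with subclass 0 is rejected, while taking it once with subclass 0 and once with subclass 1 is accepted. That is the sense in which a subclass annotation lets a validator treat old_mm and mm as different locks even though they share a class.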
mdrjr pushed a commit that referenced this issue Aug 29, 2019
[ Upstream commit 80bf6ce ]
