repair.c:263: check_node: Assertion `!(level != btrfs_header_level(b))' failed. #8

kakra · 2011-12-07T23:35:22Z

I've copied repair.c from unstable to your master branch to use its new support for lzo compression.

Trying to repair my filesystem shows the following message:

root@jupiter /usr/src/josefbacik-btrfs-progs-df8b44a % ./repair /dev/sda3
Checking extent root
parent transid verify failed on 622147694592 wanted 130733 found 134506
parent transid verify failed on 622147694592 wanted 130733 found 134506
parent transid verify failed on 622147694592 wanted 130733 found 134506
parent transid verify failed on 622147694592 wanted 130733 found 134506
Ignoring transid failure
repair: repair.c:263: check_node: Assertion `!(level != btrfs_header_level(b))' failed.


kakra · 2011-12-15T03:52:20Z

Here is the backtrace in case it helps...

(gdb) run
Starting program: /usr/src/btrfs-stuff/josef-btrfs-progs/repair -d /dev/sda3
failed to read /dev/sr1: No medium found
failed to read /dev/sr0: No medium found
failed to read /dev/sr1: No medium found
failed to read /dev/sr0: No medium found
Checking extent root
parent transid verify failed on 622147694592 wanted 130733 found 134506
parent transid verify failed on 622147694592 wanted 130733 found 134506
parent transid verify failed on 622147694592 wanted 130733 found 134506
parent transid verify failed on 622147694592 wanted 130733 found 134506
Ignoring transid failure
repair: repair.c:263: check_node: Assertion `!(level != btrfs_header_level(b))' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff7451075 in raise () from /lib64/libc.so.6
(gdb) bt full
#0  0x00007ffff7451075 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ffff7452a26 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007ffff7449935 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000000000402997 in check_node (root=0x63fac0, path=0x63ec10, level=0) at repair.c:263
        b = 0x82c7a0
        key = {objectid = 622147694592, type = 192 '\300', offset = 4611686018427413498}
        offset = 101
        size = 33
        i = 0
        did_cow = 0
        ret = 0
        block = 8598509584
        __PRETTY_FUNCTION__ = "check_node"
#4  0x0000000000403da0 in check_children (root=0x63fac0, path=0x63ec10, level=1) at repair.c:777
        tmp = 0x82c7a0
        b = 0x82d810
        i = 3
        ret = 0
#5  0x0000000000403dd6 in check_children (root=0x63fac0, path=0x63ec10, level=2) at repair.c:782
        tmp = 0x82d810
        b = 0x22eb670
        i = 26
        ret = 0
#6  0x0000000000403dd6 in check_children (root=0x63fac0, path=0x63ec10, level=3) at repair.c:782
        tmp = 0x22eb670
        b = 0x67b8c0
        i = 15
        ret = 0
#7  0x0000000000404556 in main (argc=3, argv=0x7fffffffdc88) at repair.c:971
        b = 0x67b8c0
        root = 0x959110
        extent_root = 0x63fac0
        path = 0x63ec10
        tree_root = 0x647460
        n = 0x4016b0
        key = {objectid = 0, type = 125 '}', offset = 360287970189656076}
        opt = -1
        ret = 0
        level = 3

kakra · 2011-12-15T04:38:26Z

If I skip the call to check_node() when level <= 1 (around repair.c:777), the tool gets a lot further, at least in dry-run mode. I'm not sure this is the right fix. I tried it because BUG_ON(level==0) suggests check_node() is not meant to be called with (level-1)==0.

--- repair.c.orig       2011-12-15 05:35:36.838000374 +0100
+++ repair.c    2011-12-15 05:20:34.114717448 +0100
@@ -774,10 +774,12 @@
                }
                path->nodes[level - 1] = tmp;
                if (btrfs_header_level(tmp)) {
-                       ret = check_node(root, path, level - 1);
-                       if (ret) {
-                               free_extent_buffer(tmp);
-                               return ret;
+                       if (level > 1) {
+                               ret = check_node(root, path, level - 1);
+                               if (ret) {
+                                       free_extent_buffer(tmp);
+                                       return ret;
+                               }
                        }
                        ret = check_children(root, path, level - 1);
                        if (ret) {
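
For reference, the failing assertion compares the level the caller expects with the level stored in the block header. In the backtrace, check_children() at level=1 reaches check_node() with level=0 only after btrfs_header_level(tmp) returned nonzero, so the block itself appears to record a level other than 0. Below is a minimal sketch of the same comparison that reports an error instead of aborting. The names toy_header and toy_check_level are hypothetical and not part of repair.c.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the header of an on-disk tree block. */
struct toy_header {
    uint8_t level;  /* level recorded inside the block itself */
};

/*
 * Compare the level recorded in the block with the level implied by the
 * block's position in the path. Returns 0 on a match and -EIO on a
 * mismatch, so the caller can report corruption instead of hitting an
 * assertion.
 */
static int toy_check_level(const struct toy_header *hdr, int expected_level)
{
    if (expected_level < 0 || hdr->level != expected_level)
        return -EIO;
    return 0;
}
```

Whether repair.c should return an error here or skip the check as in the patch above is a design question this sketch does not settle.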

When a struct btrfs_fs_devices was being torn down by btrfs_close_devices(), there was an invalidated pointer in the global list fs_uuids which still pointed to it; if a device was closed and then reopened (which btrfs-convert does), freed memory would be accessed.

This was found using ThreadSanitizer (pretty much doing what AddressSanitizer would, but not exiting after the first failure). To reproduce, build with -fsanitize=thread and run 'make test'. Representative output is below. This change makes the current tests TSan-clean.

WARNING: ThreadSanitizer: heap-use-after-free (pid=29161)
  Read of size 8 at 0x7d180000eee0 by main thread:
    #0 memcmp ??:0
    #1 find_fsid .../volumes.c:81
    #2 device_list_add .../volumes.c:95
    #3 btrfs_scan_one_device .../volumes.c:259
    #4 btrfs_scan_fs_devices .../disk-io.c:1002
    #5 __open_ctree_fd .../disk-io.c:1090
    #6 open_ctree_fd .../disk-io.c:1191
    #7 do_convert .../btrfs-convert.c:2317
    #8 main .../btrfs-convert.c:2745

  Previous write of size 8 at 0x7d180000eee0 by main thread:
    #0 free ??:0
    #1 btrfs_close_devices .../volumes.c:191
    #2 close_ctree .../disk-io.c:1401
    #3 do_convert .../btrfs-convert.c:2300
    #4 main .../btrfs-convert.c:2745

  Location is heap block of size 96 at 0x7d180000eee0 allocated by main thread:
    #0 calloc ??:0 (exe+0x00000002acc6)
    #1 device_list_add .../volumes.c:97
    #2 btrfs_scan_one_device .../volumes.c:259
    #3 btrfs_scan_fs_devices .../disk-io.c:1002
    #4 __open_ctree_fd .../disk-io.c:1090
    #5 open_ctree_fd .../disk-io.c:1191
    #6 do_convert .../btrfs-convert.c:2256
    #7 main .../btrfs-convert.c:2745

Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
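
The pattern behind the use-after-free described in this commit message is a global list that keeps pointing at an object after it has been freed. Here is a minimal sketch of the usual remedy, unlinking the entry before freeing it. It uses a plain singly linked list and hypothetical toy_* names; the real code in volumes.c uses its own list type and is not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct btrfs_fs_devices. */
struct toy_fs_devices {
    unsigned char fsid[16];
    struct toy_fs_devices *next;
};

/* Global list of known filesystems, playing the role of fs_uuids. */
static struct toy_fs_devices *toy_fs_uuids;

/* Look up an entry by fsid; this walk reads every entry still on the list. */
static struct toy_fs_devices *toy_find_fsid(const unsigned char *fsid)
{
    for (struct toy_fs_devices *d = toy_fs_uuids; d; d = d->next)
        if (memcmp(d->fsid, fsid, sizeof(d->fsid)) == 0)
            return d;
    return NULL;
}

/* Return the existing entry for fsid, or allocate and link a new one. */
static struct toy_fs_devices *toy_device_list_add(const unsigned char *fsid)
{
    struct toy_fs_devices *d = toy_find_fsid(fsid);

    if (d)
        return d;
    d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    memcpy(d->fsid, fsid, sizeof(d->fsid));
    d->next = toy_fs_uuids;
    toy_fs_uuids = d;
    return d;
}

/* Tear down an entry: unlink it from the global list first, then free it. */
static void toy_close_devices(struct toy_fs_devices *victim)
{
    struct toy_fs_devices **link = &toy_fs_uuids;

    while (*link && *link != victim)
        link = &(*link)->next;
    if (*link)
        *link = victim->next;  /* without this step the list keeps a dangling pointer */
    free(victim);
}
```

With the unlink step in place, a close followed by a reopen of the same fsid allocates a fresh entry instead of running memcmp() over freed memory.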

@iref

…_info_cache()

This bug is exposed by fsck-test with D=asan, hit by test case 020, with the following error report:

=================================================================
==10740==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x621000061580 at pc 0x56051f0db6cd bp 0x7ffe170f3e20 sp 0x7ffe170f3e10
READ of size 1 at 0x621000061580 thread T0
    #0 0x56051f0db6cc in btrfs_extent_inline_ref_type /home/adam/btrfs/btrfs-progs/ctree.h:1727
    #1 0x56051f13b669 in build_roots_info_cache /home/adam/btrfs/btrfs-progs/cmds-check.c:14306
    #2 0x56051f13c86a in repair_root_items /home/adam/btrfs/btrfs-progs/cmds-check.c:14450
    #3 0x56051f13ea89 in cmd_check /home/adam/btrfs/btrfs-progs/cmds-check.c:14965
    #4 0x56051efe75bb in main /home/adam/btrfs/btrfs-progs/btrfs.c:302
    #5 0x7f04ddbb0f49 in __libc_start_main (/usr/lib/libc.so.6+0x20f49)
    #6 0x56051efe68c9 in _start (/home/adam/btrfs/btrfs-progs/btrfs+0x5b8c9)

0x621000061580 is located 0 bytes to the right of 4224-byte region [0x621000060500,0x621000061580)
allocated by thread T0 here:
    #0 0x7f04ded50ce1 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:70
    #1 0x56051f04685e in __alloc_extent_buffer /home/adam/btrfs/btrfs-progs/extent_io.c:553
    #2 0x56051f047563 in alloc_extent_buffer /home/adam/btrfs/btrfs-progs/extent_io.c:687
    #3 0x56051efff1d1 in btrfs_find_create_tree_block /home/adam/btrfs/btrfs-progs/disk-io.c:187
    #4 0x56051f000133 in read_tree_block /home/adam/btrfs/btrfs-progs/disk-io.c:327
    #5 0x56051efeddb8 in read_node_slot /home/adam/btrfs/btrfs-progs/ctree.c:652
    #6 0x56051effb0d9 in btrfs_next_leaf /home/adam/btrfs/btrfs-progs/ctree.c:2853
    #7 0x56051f13b343 in build_roots_info_cache /home/adam/btrfs/btrfs-progs/cmds-check.c:14267
    #8 0x56051f13c86a in repair_root_items /home/adam/btrfs/btrfs-progs/cmds-check.c:14450
    #9 0x56051f13ea89 in cmd_check /home/adam/btrfs/btrfs-progs/cmds-check.c:14965
    #10 0x56051efe75bb in main /home/adam/btrfs/btrfs-progs/btrfs.c:302
    #11 0x7f04ddbb0f49 in __libc_start_main (/usr/lib/libc.so.6+0x20f49)

It's completely possible that one extent/metadata item has no inline reference, while build_roots_info_cache() doesn't have such check.

Fix it by checking @iref against item end to avoid such problem.

Issue: #92
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
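
The fix described in this commit message amounts to a bounds check: before reading an inline ref, confirm that it lies inside the item. A minimal sketch of that check over plain byte offsets follows. The function name and parameters are hypothetical and do not mirror the actual code in cmds-check.c; offsets are assumed small enough that item_off + item_size does not overflow.

```c
#include <stddef.h>

/*
 * Return 1 if a ref header of ref_size bytes starting at iref_off fits
 * inside the item occupying [item_off, item_off + item_size), else 0.
 * An item with no inline ref has iref_off == item end, which leaves no
 * room for any read and is therefore rejected.
 */
static int toy_iref_in_item(size_t item_off, size_t item_size,
                            size_t iref_off, size_t ref_size)
{
    size_t item_end = item_off + item_size;

    if (iref_off < item_off || iref_off > item_end)
        return 0;
    return item_end - iref_off >= ref_size;
}
```

In the ASan report the read happens exactly at the end of the buffer, which corresponds to the iref_off == item_end case in this sketch.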

…y wrong condition to free delayed ref/head.

[BUG]
When btrfs-progs is compiled with D=asan, it can't pass even the very basic fsck tests due to btrfs-image has memory leak:

  === START TEST /home/adam/btrfs/btrfs-progs/tests//fsck-tests/001-bad-file-extent-bytenr
  restoring image default_case.img

  =================================================================
  ==7790==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 104 byte(s) in 1 object(s) allocated from:
      #0 0x7f1d3b738389 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:86
      #1 0x560ca6b7f4ff in btrfs_add_delayed_tree_ref /home/adam/btrfs/btrfs-progs/delayed-ref.c:569
      #2 0x560ca6af2d0b in btrfs_free_extent /home/adam/btrfs/btrfs-progs/extent-tree.c:2155
      #3 0x560ca6ac16ca in __btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:319
      #4 0x560ca6ac1d8c in btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:383
      #5 0x560ca6ac6c8e in btrfs_search_slot /home/adam/btrfs/btrfs-progs/ctree.c:1153
      #6 0x560ca6ab7e83 in fixup_device_size image/main.c:2113
      #7 0x560ca6ab9279 in fixup_chunks_and_devices image/main.c:2333
      #8 0x560ca6ab9ada in restore_metadump image/main.c:2455
      #9 0x560ca6abaeba in main image/main.c:2723
      #10 0x7f1d3b148ce2 in __libc_start_main (/usr/lib/libc.so.6+0x23ce2)

  ... tons of similar leakage for delayed_tree_ref ...

  Direct leak of 96 byte(s) in 1 object(s) allocated from:
      #0 0x7f1d3b738389 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:86
      #1 0x560ca6b7f5fb in btrfs_add_delayed_tree_ref /home/adam/btrfs/btrfs-progs/delayed-ref.c:583
      #2 0x560ca6af5679 in alloc_tree_block /home/adam/btrfs/btrfs-progs/extent-tree.c:2503
      #3 0x560ca6af57ac in btrfs_alloc_free_block /home/adam/btrfs/btrfs-progs/extent-tree.c:2524
      #4 0x560ca6ac115b in __btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:290
      #5 0x560ca6ac1d8c in btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:383
      #6 0x560ca6b7bb15 in commit_tree_roots /home/adam/btrfs/btrfs-progs/transaction.c:98
      #7 0x560ca6b7c525 in btrfs_commit_transaction /home/adam/btrfs/btrfs-progs/transaction.c:192
      #8 0x560ca6ab92be in fixup_chunks_and_devices image/main.c:2337
      #9 0x560ca6ab9ada in restore_metadump image/main.c:2455
      #10 0x560ca6abaeba in main image/main.c:2723
      #11 0x7f1d3b148ce2 in __libc_start_main (/usr/lib/libc.so.6+0x23ce2)

  ... tons of similar leakage for delayed_ref_head ...

  SUMMARY: AddressSanitizer: 1600 byte(s) leaked in 16 allocation(s).
  failed to restore image ./default_case.img

[CAUSE]
Commit c603970 ("btrfs-progs: Add delayed refs infrastructure") introduces delayed ref infrastructure for free space tree, however the refcount_dec_and_test() from kernel code is wrongly backported.

refcount_dec_and_test() will return true if the refcount reaches 0. So kernel code will free the allocated space as expected:

  if (refcount_dec_and_test(&ref->refs)) {
          kmem_cache_free();
  }

However btrfs-progs backport is using the opposite condition:

  if (--ref->refs) {
          kfree();
  }

This will not free the memory for the last user, but for refs >= 2. Causing both use-after-free and memory leak for any offline write operation.

[FIX]
Fix the (--ref->refs) condition to (--ref->refs == 0) to fix the backport error.

Fixes: c603970 ("btrfs-progs: Add delayed refs infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
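
The inverted condition described in this commit message can be shown in isolation. The sketch below uses a plain int counter and hypothetical toy_* names rather than the real delayed-ref structures. The two predicate functions only report whether each form of the condition would free the object for a given starting count, so the buggy form can be examined without triggering a use-after-free.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct toy_ref {
    int refs;
};

/* Buggy backport: "if (--refs)" is true while references remain. */
static int toy_buggy_would_free(int refs_before)
{
    return (refs_before - 1) != 0;
}

/* Fixed form: "if (--refs == 0)" is true only for the last reference. */
static int toy_fixed_would_free(int refs_before)
{
    return (refs_before - 1) == 0;
}

/* Allocate an object holding a single reference. */
static struct toy_ref *toy_ref_new(void)
{
    struct toy_ref *ref = calloc(1, sizeof(*ref));

    if (ref)
        ref->refs = 1;
    return ref;
}

/* Drop one reference; free the object and return 1 when it was the last. */
static int toy_ref_put(struct toy_ref *ref)
{
    if (--ref->refs == 0) {
        free(ref);
        return 1;
    }
    return 0;
}
```

With a starting count of 1 the buggy form never frees (a leak), and with a count of 2 it frees while one user remains (a use-after-free), matching the two symptoms named in the commit message.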

[BUG]
For certain fuzzed images, `btrfs check` will fail with the following call trace:

  Checking filesystem on issue_213.raw
  UUID: 99e50868-0bda-4d89-b0e4-7e8560312ef9
  [1/7] checking root items
  [2/7] checking extents

  Program received signal SIGABRT, Aborted.
  0x00007ffff7c88f25 in raise () from /usr/lib/libc.so.6
  (gdb) bt
  #0 0x00007ffff7c88f25 in raise () from /usr/lib/libc.so.6
  #1 0x00007ffff7c72897 in abort () from /usr/lib/libc.so.6
  #2 0x00005555555abc3e in run_next_block (...) at check/main.c:6398
  #3 0x00005555555b0f36 in deal_root_from_list (...) at check/main.c:8408
  #4 0x00005555555b1a3d in check_chunks_and_extents (fs_info=0x5555556a1e30) at check/main.c:8690
  #5 0x00005555555b1e3e in do_check_chunks_and_extents (fs_info=0x5555556a1e30) a
  #6 0x00005555555b5710 in cmd_check (cmd=0x555555696920 <cmd_struct_check>, argc
  #7 0x0000555555568dc7 in cmd_execute (cmd=0x555555696920 <cmd_struct_check>, ar
  #8 0x0000555555569713 in main (argc=2, argv=0x7fffffffde70) at btrfs.c:386

[CAUSE]
This fuzzed image has a corrupted EXTENT_DATA item in the data reloc tree:

  item 1 key (256 EXTENT_DATA 256) itemoff 16111 itemsize 12
          generation 0 type 2 (prealloc)
          prealloc data disk byte 16777216 nr 0
          prealloc data offset 0 nr 0

There are several problems with the item:
- Bad item size
  12 is too small.
- Bad key offset
  The offset of an EXTENT_DATA type key represents the file offset, which should always be aligned to sector size (4K in this particular case).

[FIX]
Do extra item size and key offset checks for original mode, and remove the abort() call in run_next_block().

And to show off how robust lowmem mode is, lowmem can handle it without any hiccup.

With this fix, original mode can detect the problem properly:

  Checking filesystem on issue_213.raw
  UUID: 99e50868-0bda-4d89-b0e4-7e8560312ef9
  [1/7] checking root items
  [2/7] checking extents
  ERROR: invalid file extent item size, have 12 expect (21, 16283]
  ERROR: errors found in extent allocation tree or chunk allocation
  [3/7] checking free space cache
  [4/7] checking fs roots
  root 18446744073709551607 root dir 256 error
  root 18446744073709551607 inode 256 errors 62, no orphan item, odd file extent, bad file extent
  ERROR: errors found in fs roots
  found 131072 bytes used, error(s) found
  total csum bytes: 0
  total tree bytes: 131072
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 124774
  file data blocks allocated: 0
   referenced 0

Issue: #213
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
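
The two checks named in this commit message (item size and key offset alignment) can be sketched as a single predicate. This is a simplification under stated assumptions: the size bounds are copied from the error message above, "expect (21, 16283]", which depend on the filesystem's node size, and the real check in check/main.c likely distinguishes inline from regular extents. The function name and macros are hypothetical.

```c
#include <stdint.h>

/* Bounds copied from the error message quoted above; assumed, not derived. */
#define TOY_MIN_ITEM_SIZE_EXCL 21u
#define TOY_MAX_ITEM_SIZE_INCL 16283u

/*
 * Return 1 if a file extent item looks sane, 0 otherwise:
 *  - item_size must lie in (21, 16283]
 *  - key_offset (the file offset) must be a multiple of sectorsize
 */
static int toy_file_extent_item_ok(uint32_t item_size, uint64_t key_offset,
                                   uint32_t sectorsize)
{
    if (item_size <= TOY_MIN_ITEM_SIZE_EXCL ||
        item_size > TOY_MAX_ITEM_SIZE_INCL)
        return 0;
    if (sectorsize == 0 || key_offset % sectorsize != 0)
        return 0;
    return 1;
}
```

For the corrupted item above (itemsize 12, key offset 256, 4K sectors) both conditions fail, so the item is rejected before any field is read.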

…level

[BUG]
When running lowmem mode with a METADATA_ITEM which has an invalid level, it will crash with the following backtrace:

  (gdb) bt
  #0 0x0000555555616b0b in btrfs_header_bytenr (eb=0x4) at ./kernel-shared/ctree.h:2134
  #1 0x0000555555620c78 in check_tree_block_backref (root_id=5, bytenr=30457856, level=256) at check/mode-lowmem.c:3818
  #2 0x0000555555621f6c in check_extent_item (path=0x7fffffffd9c0) at check/mode-lowmem.c:4334
  #3 0x00005555556235a5 in check_leaf_items (root=0x555555688e10, path=0x7fffffffd9c0, nrefs=0x7fffffffda30, account_bytes=1) at check/mode-lowmem.c:4835
  #4 0x0000555555623c6d in walk_down_tree (root=0x555555688e10, path=0x7fffffffd9c0, level=0x7fffffffd984, nrefs=0x7fffffffda30, check_all=1) at check/mode-lowmem.c:4967
  #5 0x000055555562494f in check_btrfs_root (root=0x555555688e10, check_all=1) at check/mode-lowmem.c:5266
  #6 0x00005555556254ee in check_chunks_and_extents_lowmem () at check/mode-lowmem.c:5556
  #7 0x00005555555f0b82 in do_check_chunks_and_extents () at check/main.c:9114
  #8 0x00005555555f50ea in cmd_check (cmd=0x55555567c640 <cmd_struct_check>, argc=3, argv=0x7fffffffdec0) at check/main.c:10892
  #9 0x000055555556b2b1 in cmd_execute (argv=0x7fffffffdec0, argc=3, cmd=0x55555567c640 <cmd_struct_check>) at cmds/commands.h:125

[CAUSE]
For function check_extent_item() it will go through inline extent items and then check their backrefs.

But for METADATA_ITEM, it doesn't really validate key.offset, which is u64 and can contain a value way larger than BTRFS_MAX_LEVEL (mostly caused by bit flip).

In that case, if we have a larger value like 256 in key.offset, then later check_tree_block_backref() will use 256 as level, and overflow path->nodes[level] and crash.

[FIX]
Just verify the level, no matter if it's from btrfs_tree_block_level() (which is just u8), or it's from key.offset (which is u64).

To do the check properly and detect higher bits corruption, also change the type of @level from u8 to u64.

Now lowmem mode can detect the problem properly:

  ...
  [2/7] checking extents
  ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
  ERROR: extent[30457856 16384] level mismatch, wanted: 0, have: 256
  ERROR: errors found in extent allocation tree or chunk allocation
  [3/7] checking free space tree
  ...

Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
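
The reason this commit message gives for widening the level type can be illustrated with a small sketch: if a 64-bit key offset is narrowed to 8 bits before the range check, a value such as 256 wraps to 0 and passes. The names below are hypothetical; TOY_MAX_LEVEL is set to 8 to match the "expect [0, 7]" range in the output above.

```c
#include <stdint.h>

/* Valid levels are 0..7, per the "expect [0, 7]" message above. */
#define TOY_MAX_LEVEL 8

/* Range check done on the full 64-bit value. */
static int toy_level_valid(uint64_t level)
{
    return level < TOY_MAX_LEVEL;
}

/*
 * Same check after narrowing to 8 bits first: corruption in the upper
 * bits is lost by the conversion, so 256 becomes 0 and looks valid.
 */
static int toy_level_valid_truncating(uint64_t level)
{
    uint8_t narrowed = (uint8_t)level;

    return narrowed < TOY_MAX_LEVEL;
}
```

Checking the 64-bit value before using it as an index is what keeps path->nodes[level] within bounds in this scenario.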

kakra closed this as completed Oct 28, 2017
