Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[testing] ABD2: linear/scatter dual typed buffer for ARC ([upstream] rebase master January 13th 2016) #4225

Conversation

kernelOfTruth
Copy link
Contributor

upload for buildbot-testing the rebased patchset

@tuxoko I apologize in advance if your rebase is still in WIP or not complete

it built fine locally - so I figured it's time for buildbot-testing to prepare an re-adoption/upgrade to the rebased branch

Thanks 👍

zfsolinux currently uses vmalloc backed slab for ARC buffers. There are some
major problems with this approach. One is that 32-bit system have only a
handful of vmalloc space. Another is that the fragmentation in slab will easily
deplete memory in busy system.

With ABD, we use scatterlist to allocate data buffers. In this approach we can
allocate in HIGHMEM, which alleviates vmalloc space pressure on 32-bit. Also,
we don't have to rely on slab, so there's no fragmentation issue.

But for metadata buffers, we still uses linear buffer from slab. The reason for
this is that there are a lot of *_phys pointers directly point to metadata
buffers. Thus, it would be much more complicated to use scatter buffer for
metadata.

Currently, ABD is not enabled and its API will treat them as normal buffers.
We will enable it once all relevant code is modified to use the API.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Modify/Add incremental fletcher function prototype to match abd_iterate_rfunc
callback type. Also, reduce duplicated code a bit in zfs_fletcher.c.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
1. Use abd_t in arc_buf_t->b_data, dmu_buf_t->db_data, zio_t->io_data and
zio_transform_t->zt_orig_data
2. zio_* function take abd_t for data

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
1. Add checksum function for abd_t
2. Use abd_t version checksum function in zio_checksum_table
3. Make zio_checksum_compute and zio_checksum_error handle abd_t

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
@kernelOfTruth kernelOfTruth changed the title [upstream][buildbot]ABD2: linear/scatter dual typed buffer for ARC [upstream/WIP][NON-production][buildbot]ABD2: linear/scatter dual typed buffer for ARC Jan 15, 2016
@tuxoko
Copy link
Contributor

tuxoko commented Jan 15, 2016

There's still bugs in it I haven't located yet.

tuxoko and others added 23 commits January 15, 2016 14:12
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
…d zil.c

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Use ABD API on related pointers and functions.(b_data, db_data, zio_*(), etc.)

Suggested-by: DHE <git@dehacked.net>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Currently, abd_uiomove repeatedly calls abd_copy_*_off. The problem is that it
will need to do abd_miter_advance repeatedly over the parts that were skipped
before.

We split out the miter part of the abd_copy_*_off into abd_miter_copy_*. These
new function will take miter directly and they will automatically advance it
after finish. We initialize an miter in uiomove and use the iterator copy
functions to solve the stated problem.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
The check is needed to make sure the user buffer is indeed in user space. Also
change copy_{to,from}_user to __copy_{to,from}_user so that we don't
repeatedly call access_ok.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
When we aren't allocating in HIGHMEM, we can try to allocate contiguous pages,
we can also use sg_alloc_table_from_pages to merge adjacent pages for us. This
will allow more efficient cache prefetch and also reduce sg iterator overhead.
And this has been tested to greatly improve performance.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
This patch adds non-highmem scatter ABD and also adds a new set of functions,
namely abd_buf_segment and friends, for accessing those ABD.

This is a preparation for metadata scatter ABD.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Some metadata types are not required or not easily to use scatter ABD. So to
allow ARC to accommodate both types of metadata, we add a new flag
ARC_FLAG_META_SCATTER to indicate which type the buffer belongs.

We also introduce a new type arc_buf_alloc_t, which is basically
arc_buf_contents_t with a flag indicating scatter metadata. Users can pass it
to arc_buf_alloc to decide which buffer type to allocate.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
We add a new member ot_scatter to dmu_ot to determine the ABD type for each
DMU type. We temporary set them all to FALSE, until some types are ready to
handle scatter abd.

BP_GET_BUFA_TYPE and DBUF_GET_BUFA_TYPE will returns with the scatter flag set
up accordingly, so they can be passed to arc_buf_alloc, and the ARC subsystem
will returns the correct ABD type.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Use non-highmem scatter ABD for indirect block, i.e. level > 0. abd_array
should be used to access the blkptr.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Use non-highmem scatter ABD for dnode array blocks, abd_array should be used
to access the dnodes.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
abd_alloc_scatter always allocate nr*PAGE_SIZE memory. So in order to save
memory, we allow it to fallback to do linear allocation when size is less than
PAGE_SIZE.

Note that orignally, we requires that zio->io_data remains the same type
before it reaches lower level, vdev_{queue,cache,disk}. After this patch,
however, transformation like compression may change a scatter io_data to
linear. This is fine, because every scatter ABD aware API can also take linear
ABD, but not vice versa. So we relax the type check in push transform to check
linear only.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
In dsl_scan_scrub_cb and spa_load_verify_cb, we originally always allocated
linear ABD. Now we try to allocate scatter ABD according to the BUFA type of
the blkptr to reduce unnecessary spl slab allocation.

Also in zio_ddt_read_start, we match the parent zio->io_data ABD type for the
same reason.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
SPA history is access purely through dmu_read/dmu_write. It doesn't require
any modification except for setting ot_scatter.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Add abd version byteswap function with the name "abd_<old bswap func name>".

Note that abd_byteswap_uint*_array and abd_dnode_buf_byteswap can handle
scatter buffer, so now we don't need extra borrow/copy.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
@kernelOfTruth kernelOfTruth force-pushed the tuxoko_zfs/abd_next_15.01.2016 branch from 154bdf9 to 17906c7 Compare January 15, 2016 23:17
@kernelOfTruth
Copy link
Contributor Author

There were dozens of changes, several large ones - so I guess it's natural that it'll take a few steps until it works.

If you have no objections I've force-pushed the changes to see how it goes.

Thanks

@kernelOfTruth
Copy link
Contributor Author

hm, that one rings a bell:

ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0x159930 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0xd70000 >= 0x2000000)' failed.
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0x159930 >= 0x2000000)' failed.
/sbin/ztest[0x408133]
/lib64/libpthread.so.0(+0xf790)[0x7f68ccbe2790]
/lib64/libc.so.6(gsignal+0x35)[0x7f68cc871625]
/lib64/libc.so.6(abort+0x175)[0x7f68cc872e05]
/lib64/libc.so.6(+0x2b74e)[0x7f68cc86a74e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x7f68cc86a810]
/sbin/ztest[0x408133]
/lib64/libc.so.6(+0x2b87b)[0x7f68cc86a87b]
/lib64/libzpool.so.2(+0x37638)[0x7f68cdd21638]
/lib64/libzpool.so.2(+0x395d3)[0x7f68cdd235d3]
/lib64/libzpool.so.2(arc_buf_alloc+0x1fa)[0x7f68cdd2a84a]
/lib64/libzpool.so.2(arc_read+0x1d6)[0x7f68cdd2ae76]
/lib64/libzpool.so.2(+0x4bfa5)[0x7f68cdd35fa5]
/lib64/libzpool.so.2(dbuf_read+0xe6)[0x7f68cdd36396]
/lib64/libzpool.so.2(+0x4f2c8)[0x7f68cdd392c8]
/lib64/libzpool.so.2(dbuf_hold_impl+0xe4)[0x7f68cdd38bd4]
/lib64/libzpool.so.2(dbuf_hold_level+0x2a)[0x7f68cdd3981a]
/lib64/libzpool.so.2(dmu_buf_hold_noread+0x96)[0x7f68cdd44a16]
/lib64/libzpool.so.2(dmu_buf_hold+0x42)[0x7f68cdd45032]
/lib64/libzpool.so.2(zap_lockdir+0x83)[0x7f68cddd8ce3]
/lib64/libzpool.so.2(zap_add+0x61)[0x7f68cddda3c1]
/sbin/ztest[0x412599]
/sbin/ztest[0x40ab8d]
/lib64/libzpool.so.2(zk_thread_helper+0x22c)[0x7f68cdd1565c]
/lib64/libpthread.so.0(+0x7a51)[0x7f68ccbdaa51]
/lib64/libc.so.6(clone+0x6d)[0x7f68cc92793d]
child died with signal 6
ztest: ../../module/zfs/arc.c:3811: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 

referencing: #3904 PANIC at arc.c:3760:arc_adapt()

@behlendorf
Copy link
Contributor

@kernelOfTruth you may want to add #4197 to your stack. This should address the PANIC you're seeing, for some reason it appears to be happening much more frequently now in the buildbot. Id definitely want o get that finalized and merged hopefully next week.

@tuxoko
Copy link
Contributor

tuxoko commented Jan 16, 2016

There's a segfault here http://build.zfsonlinux.org/builders/Ubuntu%2014.04%20x86_64%20%28TEST%29/builds/456/steps/shell_5/logs/stdio
Not sure where does that come from.

@kernelOfTruth
Copy link
Contributor Author

gladly at least one buildbot appeared to not suffer from that PANIC - let's see what the Amazon TEST buildbot will show

@behlendorf sure, will do that - thanks

@kernelOfTruth
Copy link
Contributor Author

https://github.com/zfsonlinux/zfs/blob/ab5cbbd1078bf007b50b084bb31fd58c7c5652f4/lib/libzpool/kernel.c

zk_thread_join

calls pthread_join

found one related issue: #989 ztest failures

judging from http://man7.org/linux/man-pages/man3/pthread_join.3.html

there could be an access, locking issue


edit:

adding dweeezil@d3fd50c "Update arc_c under a mutex" from #4197 only on top of the upstream ABD2 rebase - that should work better - and if that causes trouble, the commit can still be removed to TEST build the last commit of ABD2

@kernelOfTruth kernelOfTruth force-pushed the tuxoko_zfs/abd_next_15.01.2016 branch from 17906c7 to 7c6c2ee Compare January 16, 2016 01:23
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
@kernelOfTruth kernelOfTruth force-pushed the tuxoko_zfs/abd_next_15.01.2016 branch from 7c6c2ee to beb209c Compare January 16, 2016 15:53
@kernelOfTruth kernelOfTruth changed the title [upstream/WIP][NON-production][buildbot]ABD2: linear/scatter dual typed buffer for ARC [upstream][testing][buildbot]ABD2: linear/scatter dual typed buffer for ARC Jan 16, 2016
@kernelOfTruth kernelOfTruth changed the title [upstream][testing][buildbot]ABD2: linear/scatter dual typed buffer for ARC [testing][buildbot]ABD2: linear/scatter dual typed buffer for ARC Jan 16, 2016
@kernelOfTruth
Copy link
Contributor Author

Looking good 🍰

Great work & thanks @tuxoko 👍

@kernelOfTruth
Copy link
Contributor Author

@sempervictus ping

@kernelOfTruth kernelOfTruth changed the title [testing][buildbot]ABD2: linear/scatter dual typed buffer for ARC [testing]ABD2: linear/scatter dual typed buffer for ARC ([upstream] rebase master January 13th 2016) Jan 16, 2016
@kernelOfTruth kernelOfTruth changed the title [testing]ABD2: linear/scatter dual typed buffer for ARC ([upstream] rebase master January 13th 2016) [testing] ABD2: linear/scatter dual typed buffer for ARC ([upstream] rebase master January 13th 2016) Jan 16, 2016
@kernelOfTruth
Copy link
Contributor Author

closing, superseded by newer pull requests/testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants