-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[buildbot, proof-of-concept] #2129, #3115 #3188
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a revision of openzfs#2332 Currently, the optimization only applies to kernel space, because I haven't figured out how to it properly in user space. AVX2 is untested because I don't have such CPU. So use it with you own discretion.
…esystem openzfs#2784 Reviewed by: Adam Leventhal adam.leventhal@delphix.com Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Sebastien Roy sebastien.roy@delphix.com Reviewed by: Boris Protopopov bprotopopov@hotmail.com Approved by: Dan McDonald danmcd@omniti.com Ported-by: Richard Yao richard.yao@clusterhq.com Porting notes: 1. ZoL currently does not log discards to zvols, so the portion of this patch that modifies the discard logging to mark it as freeing space has been discarded. may_delete_now had been removed from zfs_remove() in ZoL. It has been reintroduced. https://illumos.org/issues/4950
… v2) In file included from /var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/module/zfs/../../module/zfs/sha256.c:29:0: /var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/include/sys/sha256.h:5:24: fatal error: asm/sha256.h: No such file or directory #include <asm/sha256.h>
The use of vdev GUIDs are a necessary workaround in edge cases where the names provided by zpool status are not accepted by the zpool detach/offline/remove/replace commands. The current method of obtaining them uses zdb, but this does not work in all cases (see openzfs#1530). This provides a method of obtaining vdev GUIDs that is more reliable and straightforward than zdb. It would be better to fix all edge cases that require the use of GUIDs as a workaround, but Linux's /dev design makes it difficult to anticipate such edge cases, which makes this option necessary. Note that this adds a new boolean parameter to zpool_vdev_name, which changes the libzfs interface. Closes openzfs#2011 Signed-off-by: Richard Yao ryao@gentoo.org
The commit d958324 fixed the deadlock between page lock and range lock by unlocking the page lock before aquiring the range lock. However, this create a new issue openzfs#3075. The problem is that if we don't set the writeback flag before releasing the page flag, other process will be unaware that the page is under writeback and may truncate or invalidate the page, or may break the sync semantic. To work around this problem, we lock the page again after aquiring the range lock and check the validity of the page. If all is well, we set the write back bit and continue on. Signed-off-by: Chunwei Chen tuxoko@gmail.com
Introduce ABD: linear/scatter dual typed buffer for ARC zfsolinux currently uses vmalloc backed slab for ARC buffers. There are some major problems with this approach. One is that 32-bit system have only a handful of vmalloc space. Another is that the fragmentation in slab will easily trigger OOM in busy system. With ABD, we use scatterlist to allocate data buffers. In this approach we can allocate in HIGHMEM, which alleviates vmalloc space pressure on 32-bit. Also, we don't have to rely on slab, so there's no fragmentation issue. But for metadata buffers, we still uses linear buffer from slab. The reason for this is that there are a lot of *_phys pointers directly point to metadata buffers. So it's kind of impractical to change all those code. And metadata should only be a small percentage so we probably don't have to worry about the 32-bit and fragmentation issue, right? This patchset is divided into the following part: 1. Add compatibility layer for {kmap,kunmap}_atomic 2. Introduce ABD, but with scatter type disabled 3. Modify checksum implementation for ABD 4. Modify every file which touches abd_t field 5. Enable scatter type ABD Edit: I forgot to mention... There are some parts that hasn't been able to handle scatter buffer, compression and raidz gaussian elimination to name a few. Enable them will cost you extra copy than without this patch set. Another thing is the xuio thing, which I totally skip. Todo: 1. Clean up the implementation and documentation 2. Compression and raidz gaussian elimination takes scatter buffer 3. Probably change vdev_cache and vdev_queue to scatter buffer 4. vdev_file should be able to submit a iovec. --------------------------------------------------------------------------------------- Rebased and add userspace scatter ABD --------------------------------------------------------------------------------------- Remove container_of kernel code --------------------------------------------------------------------------------------- Rebase to master and add two patch for missing things There're still a lot of checksum error when running ztest, and all of them are data blocks. I still haven't figure out why. Apr 24 2014 --------------------------------------------------------------------------------------- So the original sha256 implementation cannot be used as-is for scatter abd. This should work now. Apr 25 2014 --------------------------------------------------------------------------------------- Hi all, I just found out there's a bug causing lockup when rebuilding pool with raidz2 or above. I've already fixed it in my working tree, but I still need to test it together with other piles of changes. In the meantime, if you need to rebuild such pool, please use unpatched zfs to do so. Thanks Jun 18 2014 --------------------------------------------------------------------------------------- @sempervictus When the bug is triggered, there will be a zfs thread looping infinitely. So you should see that thread consuming 100% CPU in system monitor. Also, if the kernel is built with !CONFIG_PREEMPT, there should be "BUG: soft lockup" in dmesg. --------------------------------------------------------------------------------------- Hi all I updated the patch series. There are many changes, but most of them are cosmetic. The notable ones are: 1. Rebase to 0.6.3 2. Bug fixes are squashed back. 3. Fix make checkstyle 4. Rename some APIs. This is to differentiate "buf" and "linear", where the first denotes normal buffer and the latter denotes linear type abd. 5. Fix the above mentioned raidz2 rebuild bug. --------------------------------------------------------------------------------------- @DeHackEd I've updated these patches. How will buildbot run it? Edit: Never mind the question. Oct 3 2014 --------------------------------------------------------------------------------------- I updated this patch series. It turned out not many conflicts needed to be dealt with. Also, the panic above hit by @DeHackEd emerges when you directly rebase the old ABD revision onto the master. The problematic part is exactly where the compiler warning occurs as @sempervictus pointed out. So it would be a good idea not to ignore them. Anyway this issue is dealt with in restore_write(). So it shouldn't happen again. Jan 13 2015 --------------------------------------------------------------------------------------- @prakashsurya I personally think this patch series is quite stable. However, there are some corners that should be smoothed out before being merged. I'll work on them once the kmem-rework mentioned by @behlendorf is finished. --------------------------------------------------------------------------------------- @prakashsurya The most ugliest thing in this patch series is probably the last bit hack for determining scatter or linear type buffer. It should be hide behind the ABD struct and APIs. Although it's not that hard to fix (and the main reason it hasn't been done is due to my laziness), what I really like to do is to use scatterlist for both scatter buffer and linear buffer, so we only need single version of function for each API. However, earlier version of Linux doesn't provide "skip" for the sg_miter iterator, which would make everyone's life much easier when looking at the code. So, I'm kind of hesitate which approach I should use. Other problems like making things work with scatter buffer, so we don't need extra copy. The hardest part for this would be the compression algorithms. But these could be dealt afterward if it's acceptable with everyone. --------------------------------------------------------------------------------------- Hi all, I just push out a new revision of the patch series. This is a rather major update, so there could be new bug introduced, though I didn't find any in my testing. Anyway, test it before using it. Notable changes are: 1. Both scatter and linear type ABD are now under the ABD struct. No more bit hack (finally!) 2. In stead of using the sg_miter provided by kernel, I introduced my own abd_miter. The reason for this is that a) it can be use on both type of ABD, b) sg_miter cannot be used to simultaneously iterate on two scatterlist in older kernel, and c) we sometimes need to fall back to non-atomic kmap. 3. Some API changes. Now with this version being pushed out. I believe this patch series is ready for serious review. Jan 12 2015 --------------------------------------------------------------------------------------- The version I pushed yesterday would fail to build on older kernels. This one should fix the problem. Jan 13 2015
* zfs_arc_meta_prune now takes a number of objects to scan. * Scan 10,000 object per-batch in the dentry/inode caches when attempting to reclaim in order to honor arc_meta_limit. * Retry immediately in arc_adjust_meta(). * Fix arc_meta_max according, it should be updated in consume not return. Signed-off-by: Brian Behlendorf behlendorf1@llnl.gov Issue openzfs#3160 -------------------------------------------------------- zfs_arc_meta_prune raised to 12500
https://github.com/DeHackEd/zfs/blob/2827b75ae1fd5c0c1627dd46eac0b81992bf4afd/include/sys/zfs_vfsops.h to a higher value , going with experimental value of 2048
arc_evict_iterations zfs_arc_grow_retry zfs_arc_shrink_shift
This reverts commit f3f5ece.
2048 lead to trouble during import of pool(s)
This patch helps the most common cases. I can still get other deadlocks, however. openzfs#3183
This reverts commit 037763e. XXX - expand this comment as to why we're reverting it. (conflicts due to openzfs#2129)
This reverts commit 68121a0. Conflicts: module/zfs/arc.c
This reverts commit ecf3d9b. Conflicts: module/zfs/arc.c module/zfs/ddt.c module/zfs/spa_misc.c
5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Richard Lowe <richlowe@richlowe.net> (conflicts due to openzfs#2129)
5408 managing ZFS cache devices requires lots of RAM Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Don Brady <dev.fs.zfs@gmail.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> Ported by: Tim Chase <tim@chase2k.com> Porting notes: Due to the restructuring of the ARC-related structures, this patch conflicts with at least the following existing ZoL commits: 6e1d727 Fix inaccurate arcstat_l2_hdr_size calculations The ARC_SPACE_HDRS constant no longer exists and has been somewhat equivalently replaced by HDR_L2ONLY_SIZE. e0b0ca9 Add visibility in to cached dbufs The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr struct requires additional structure member names to be used when referencing the inner items. Also, the presence of L1 or L2 inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR macros. (conflicts due to openzfs#2129)
This reverts commit 8135db5.
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> Porting notes: This patch from illumos either duplicates or obviates a number of ZoL-specific patches which have been added over time. As part of this porting effort, several functions were copied in toto from illumos, effictively undoing some of the previous ZoL-specific patches: ... trying to remember which ones ... arc_adapt() ? arc_adjust_impl() arc_adjust_meta() arc_adjust_type() Other notes: The illumos 5368 patch (ARC should cache more metadata), which was never picked up by ZoL, are mostly undone by this patch. Since ZoL relies on the kernel asynchronously calling the shrinker to actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every time it runs. Similarly, and in order to operate in a similar manner as to illumos, the arc_adapt_thread does the same, analogously to the arc_reclaim_thread in illumos. Notable conflicting ZoL commits which conflict with this patch or whose effects are either duplicated or un-done by this patch: 302f753 - Integrate ARC more tightly with Linux 39e055c - Adjust arc_p based on "bytes" in arc_shrink f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c" 77765b5 - Remove "arc_meta_used" from arc_adjust calculation 94520ca - Prune metadata from ghost lists in arc_adjust_meta arcstat.py needs to be updated to deal with new/missing stats. I have removed "recycle_miss" from it in order that it still work. (conflicts due to openzfs#2129)
arc.c:3470:47: error: ‘buf’ undeclared (first use in this function) cmn_err(CE_PANIC, "invalid arc state 0x%p", buf->b_state); change introduced with "Illumos 5369 - arc flags should be an enum"
due to "5408 managing ZFS cache devices requires lots of RAM" the name changed again hdr->b_state to hdr->b_l1hdr.b_state
arc.c:1712:16: warning: assignment from incompatible pointer type df->l2df_func = free_func; (oversight) void (*free_func)(abd_t *, size_t))
This reverts commit fa8f5cd.
closed in favor of #3189 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Like written in #3115
If I had known that I would succeed, I would have done it on a clean base (D'oh ! sorry about that)
anyway, it's a waste of work to throw it away and start anew (perhaps I'll do so - but for now the buildbots can chew on it) - here's a branch / tree with #2129 and this patchset #3115 on top
https://github.com/kernelOfTruth/zfs/commits/zfs_kOT_15.03.2015
Midway through I realized that I had to revert #3181 otherwise it would have been even more complicated
it compiled cleanly - and besides the above mentioned message
module/zfs/multilist.c:30:1: warning: ‘multilist_d2l’ defined but not used [-Wunused-function]
multilist_d2l(multilist_t *ml, void *obj)
there were no error messages
Hope it helps
edit:
Now if I could figure out how to get the buildbots going ?