Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[buildbot, proof-of-concept] #2129, #3115 #3188

Closed
wants to merge 24 commits into from

Conversation

kernelOfTruth
Copy link
Contributor

Like written in #3115

If I had known that I would succeed, I would have done it on a clean base (D'oh ! sorry about that)

anyway, it's a waste of work to throw it away and start anew (perhaps I'll do so - but for now the buildbots can chew on it) - here's a branch / tree with #2129 and this patchset #3115 on top

https://github.com/kernelOfTruth/zfs/commits/zfs_kOT_15.03.2015

Midway through I realized that I had to revert #3181 otherwise it would have been even more complicated

it compiled cleanly - and besides the above mentioned message

module/zfs/multilist.c:30:1: warning: ‘multilist_d2l’ defined but not used [-Wunused-function]
multilist_d2l(multilist_t *ml, void *obj)
there were no error messages

Hope it helps

edit:

Now if I could figure out how to get the buildbots going ?

tuxoko and others added 24 commits March 13, 2015 05:44
This is a revision of openzfs#2332

Currently, the optimization only applies to kernel space,
because I haven't figured out how to it properly in user space.

AVX2 is untested because I don't have such CPU.
So use it with you own discretion.
…esystem openzfs#2784

Reviewed by: Adam Leventhal adam.leventhal@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed by: Sebastien Roy sebastien.roy@delphix.com
Reviewed by: Boris Protopopov bprotopopov@hotmail.com
Approved by: Dan McDonald danmcd@omniti.com
Ported-by: Richard Yao richard.yao@clusterhq.com

Porting notes:
1. ZoL currently does not log discards to zvols, so the portion of this patch that
modifies the discard logging to mark it as freeing space has been discarded.

may_delete_now had been removed from zfs_remove() in ZoL. It has been reintroduced.

https://illumos.org/issues/4950
… v2)

In file included from /var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/module/zfs/../../module/zfs/sha256.c:29:0:
/var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/include/sys/sha256.h:5:24: fatal error: asm/sha256.h: No such file or directory
 #include <asm/sha256.h>
The use of vdev GUIDs are a necessary workaround in edge cases where the
names provided by zpool status are not accepted by the zpool
detach/offline/remove/replace commands. The current method of obtaining
them uses zdb, but this does not work in all cases (see
openzfs#1530).

This provides a method of obtaining vdev GUIDs that is more reliable and
straightforward than zdb. It would be better to fix all edge cases that
require the use of GUIDs as a workaround, but Linux's /dev design makes
it difficult to anticipate such edge cases, which makes this option
necessary.

Note that this adds a new boolean parameter to zpool_vdev_name, which
changes the libzfs interface.

Closes openzfs#2011

Signed-off-by: Richard Yao ryao@gentoo.org
The commit d958324 fixed the deadlock between page lock and range lock by
unlocking the page lock before aquiring the range lock. However, this create a
new issue openzfs#3075.

The problem is that if we don't set the writeback flag before releasing the
page flag, other process will be unaware that the page is under writeback and
may truncate or invalidate the page, or may break the sync semantic.

To work around this problem, we lock the page again after aquiring the range
lock and check the validity of the page. If all is well, we set the write back
bit and continue on.

Signed-off-by: Chunwei Chen tuxoko@gmail.com
Introduce ABD: linear/scatter dual typed buffer for ARC

zfsolinux currently uses vmalloc backed slab for ARC buffers. There are some
major problems with this approach. One is that 32-bit system have only a
handful of vmalloc space. Another is that the fragmentation in slab will easily
trigger OOM in busy system.

With ABD, we use scatterlist to allocate data buffers. In this approach we can
allocate in HIGHMEM, which alleviates vmalloc space pressure on 32-bit. Also,
we don't have to rely on slab, so there's no fragmentation issue.

But for metadata buffers, we still uses linear buffer from slab. The reason for
this is that there are a lot of *_phys pointers directly point to metadata
buffers. So it's kind of impractical to change all those code. And metadata
should only be a small percentage so we probably don't have to worry about
the 32-bit and fragmentation issue, right?

This patchset is divided into the following part:
1. Add compatibility layer for {kmap,kunmap}_atomic
2. Introduce ABD, but with scatter type disabled
3. Modify checksum implementation for ABD
4. Modify every file which touches abd_t field
5. Enable scatter type ABD

Edit:
I forgot to mention...
There are some parts that hasn't been able to handle scatter buffer,
compression and raidz gaussian elimination to name a few.
Enable them will cost you extra copy than without this patch set.
Another thing is the xuio thing, which I totally skip.

Todo:
1. Clean up the implementation and documentation
2. Compression and raidz gaussian elimination takes scatter buffer
3. Probably change vdev_cache and vdev_queue to scatter buffer
4. vdev_file should be able to submit a iovec.

---------------------------------------------------------------------------------------

Rebased and add userspace scatter ABD

---------------------------------------------------------------------------------------

Remove container_of kernel code

---------------------------------------------------------------------------------------

Rebase to master and add two patch for missing things

There're still a lot of checksum error when running ztest, and all of them are data blocks.
I still haven't figure out why.

Apr 24 2014

---------------------------------------------------------------------------------------

So the original sha256 implementation cannot be used as-is for scatter abd.
This should work now.

Apr 25 2014

---------------------------------------------------------------------------------------

Hi all,

I just found out there's a bug causing lockup when rebuilding pool with raidz2 or above.
I've already fixed it in my working tree, but I still need to test it together with other piles of changes.
In the meantime, if you need to rebuild such pool, please use unpatched zfs to do so.

Thanks

Jun 18 2014

---------------------------------------------------------------------------------------

@sempervictus
When the bug is triggered, there will be a zfs thread looping infinitely.
So you should see that thread consuming 100% CPU in system monitor.
Also, if the kernel is built with !CONFIG_PREEMPT, there should be "BUG: soft lockup" in dmesg.

---------------------------------------------------------------------------------------

Hi all

I updated the patch series.
There are many changes, but most of them are cosmetic.

The notable ones are:
1. Rebase to 0.6.3
2. Bug fixes are squashed back.
3. Fix make checkstyle
4. Rename some APIs. This is to differentiate "buf" and "linear", where the first denotes normal buffer and the latter denotes linear type abd.
5. Fix the above mentioned raidz2 rebuild bug.

---------------------------------------------------------------------------------------

@DeHackEd
I've updated these patches.
How will buildbot run it?

Edit: Never mind the question.

Oct 3 2014

---------------------------------------------------------------------------------------

I updated this patch series.
It turned out not many conflicts needed to be dealt with.

Also, the panic above hit by @DeHackEd emerges when you directly rebase the old ABD revision onto the master. The problematic part is exactly where the compiler warning occurs as @sempervictus pointed
out. So it would be a good idea not to ignore them.
Anyway this issue is dealt with in restore_write(). So it shouldn't happen again.

Jan 13 2015

---------------------------------------------------------------------------------------

@prakashsurya
I personally think this patch series is quite stable. However, there are some corners that should be smoothed out before being merged. I'll work on them once the kmem-rework mentioned by @behlendorf is
finished.

---------------------------------------------------------------------------------------

@prakashsurya
The most ugliest thing in this patch series is probably the last bit hack for determining scatter or linear type buffer. It should be hide behind the ABD struct and APIs.
Although it's not that hard to fix (and the main reason it hasn't been done is due to my laziness), what I really like to do is to use scatterlist for both scatter buffer and linear buffer, so we only
need single version of function for each API. However, earlier version of Linux doesn't provide "skip" for the sg_miter iterator, which would make everyone's life much easier when looking at the code.
So, I'm kind of hesitate which approach I should use.

Other problems like making things work with scatter buffer, so we don't need extra copy. The hardest part for this would be the compression algorithms. But these could be dealt afterward if it's
acceptable with everyone.

---------------------------------------------------------------------------------------

Hi all,
I just push out a new revision of the patch series.
This is a rather major update, so there could be new bug introduced, though I didn't find any in my testing. Anyway, test it before using it.

Notable changes are:
1. Both scatter and linear type ABD are now under the ABD struct. No more bit hack (finally!)
2. In stead of using the sg_miter provided by kernel, I introduced my own abd_miter. The reason for this is that a) it can be use on both type of ABD, b) sg_miter cannot be used to simultaneously iterate
on two scatterlist in older kernel, and c) we sometimes need to fall back to non-atomic kmap.
3. Some API changes.

Now with this version being pushed out. I believe this patch series is ready for serious review.

Jan 12 2015

---------------------------------------------------------------------------------------

The version I pushed yesterday would fail to build on older kernels.
This one should fix the problem.

Jan 13 2015
* zfs_arc_meta_prune now takes a number of objects to scan.

* Scan 10,000 object per-batch in the dentry/inode caches
when attempting to reclaim in order to honor arc_meta_limit.

* Retry immediately in arc_adjust_meta().

* Fix arc_meta_max according, it should be updated in consume
not return.

Signed-off-by: Brian Behlendorf behlendorf1@llnl.gov
Issue openzfs#3160

--------------------------------------------------------

zfs_arc_meta_prune raised to 12500
arc_evict_iterations
zfs_arc_grow_retry
zfs_arc_shrink_shift
2048 lead to trouble during import of pool(s)
This patch helps the most common cases. I can still get other deadlocks, however.

openzfs#3183
This reverts commit 037763e.

XXX - expand this comment as to why we're reverting it.

(conflicts due to openzfs#2129)
This reverts commit 68121a0.

Conflicts:
	module/zfs/arc.c
This reverts commit ecf3d9b.

Conflicts:
	module/zfs/arc.c
	module/zfs/ddt.c
	module/zfs/spa_misc.c
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

(conflicts due to openzfs#2129)
5408 managing ZFS cache devices requires lots of RAM
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported by: Tim Chase <tim@chase2k.com>

Porting notes:

Due to the restructuring of the ARC-related structures, this
patch conflicts with at least the following existing ZoL commits:

    6e1d727
    Fix inaccurate arcstat_l2_hdr_size calculations

        The ARC_SPACE_HDRS constant no longer exists and has been
        somewhat equivalently replaced by HDR_L2ONLY_SIZE.

    e0b0ca9
    Add visibility in to cached dbufs

        The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr
        struct requires additional structure member names to be used
        when referencing the inner items.  Also, the presence of L1 or L2
        inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR
        macros.

(conflicts due to openzfs#2129)
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>

Porting notes:

This patch from illumos either duplicates or obviates a number of
ZoL-specific patches which have been added over time.  As part of this
porting effort, several functions were copied in toto from illumos,
effictively undoing some of the previous ZoL-specific patches:

	... trying to remember which ones ...
	arc_adapt() ?
	arc_adjust_impl()
	arc_adjust_meta()
	arc_adjust_type()

Other notes:

The illumos 5368 patch (ARC should cache more metadata), which
was never picked up by ZoL, are mostly undone by this patch.

Since ZoL relies on the kernel asynchronously calling the shrinker to
actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every
time it runs.  Similarly, and in order to operate in a similar manner
as to illumos, the arc_adapt_thread does the same, analogously to the
arc_reclaim_thread in illumos.

Notable conflicting ZoL commits which conflict with this patch
or whose effects are either duplicated or un-done by this patch:

302f753 - Integrate ARC more tightly with Linux
39e055c - Adjust arc_p based on "bytes" in arc_shrink
f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c"
77765b5 - Remove "arc_meta_used" from arc_adjust calculation
94520ca - Prune metadata from ghost lists in arc_adjust_meta

arcstat.py needs to be updated to deal with new/missing stats.
I have removed "recycle_miss" from it in order that it still work.

(conflicts due to openzfs#2129)
arc.c:3470:47: error: ‘buf’ undeclared (first use in this function)
   cmn_err(CE_PANIC, "invalid arc state 0x%p", buf->b_state);

change introduced with "Illumos 5369 - arc flags should be an enum"
due to "5408 managing ZFS cache devices requires lots of RAM"

the name changed again

hdr->b_state

to

hdr->b_l1hdr.b_state
arc.c:1712:16: warning: assignment from incompatible pointer type
  df->l2df_func = free_func;

(oversight)

void (*free_func)(abd_t *, size_t))
arc.c:1784:19: warning: passing argument 3 of ‘arc_buf_free_on_write’ from incompatible pointer type
      hdr->b_size, zio_data_buf_free);

change from

zio_data_buf_free
to
abd_free
@kernelOfTruth
Copy link
Contributor Author

closed in favor of #3189

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants