
send performance is very bad while all CPUs are occupied by low-priority processes #3607

Closed
AndCycle opened this issue Jul 19, 2015 · 11 comments
Labels
Type: Performance Performance improvement or performance problem

@AndCycle

My system has BOINC projects running on it, donating CPU time to grid computing; it runs one process per CPU core.

Current ZoL performance seems heavily affected by this:

when BOINC is on, send performance on a 6-disk raidz2 is as low as 20 MB/s, and nice 19 on BOINC doesn't help;

when it's off, it easily tops out at 400 MB/s.

Is there any module parameter I can tune, or do I have to touch the source?

@AndCycle
Author

Tracing the code, I found this in spa.c.

I thought lower priority means a bigger number, not a smaller one? Or am I wrong? I'm not familiar with prio.

pri_t pri = maxclsyspri;
/*
 * The write issue taskq can be extremely CPU
 * intensive.  Run it at slightly lower priority
 * than the other taskqs.
 */
if (t == ZIO_TYPE_WRITE && q == ZIO_TASKQ_ISSUE)
    pri--;

@behlendorf behlendorf added the Type: Performance Performance improvement or performance problem label Jul 20, 2015
@behlendorf
Contributor

This code could certainly use a second look to make sure it's doing something reasonable under Linux. The idea is to take the specified priority and translate it to the matching nice setting for the taskq threads. See http://lxr.free-electrons.com/source/include/linux/sched/prio.h#L24 for additional detail on how this is done.
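For reference, here is the arithmetic behind that translation, using the standard kernel constants from prio.h and the NICE_TO_PRIO/PRIO_TO_NICE macros that SPL 0.6.4 ships (see the sysmacros.h hunk later in this thread); exactly where SPL applies the resulting nice value to its taskq threads is assumed here, not quoted from the source:

/*
 * Standard Linux constants (include/linux/sched/prio.h):
 *
 *   MAX_RT_PRIO  = 100
 *   MAX_PRIO     = MAX_RT_PRIO + 40 = 140
 *   DEFAULT_PRIO = MAX_RT_PRIO + 20 = 120
 *
 * SPL 0.6.4 mapping:
 *
 *   NICE_TO_PRIO(nice) = MAX_RT_PRIO + (nice) + 20
 *   PRIO_TO_NICE(prio) = (prio) - MAX_RT_PRIO - 20
 *
 * So with maxclsyspri defined as (MAX_PRIO - 1) = 139:
 *
 *   PRIO_TO_NICE(139) = 19   -> the *lowest* Linux priority,
 *                               i.e. the same nice level as the
 *                               nice-19 BOINC workers.
 */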

The default values were originally selected for the Solaris scheduler so they may not be the optimal setting for Linux. It's definitely an area worth investigating. To make benchmarking easier I'd suggest temporarily adding a module option so you can explicitly set the priority of this taskq when it's created. Then see what impact specific values have. That way we can seriously consider changing the default.
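A minimal sketch of the kind of throwaway module option being suggested; zio_taskq_write_issue_pri is an invented name for illustration and is not an actual ZFS parameter:

/*
 * Hypothetical benchmarking knob, meant to sit in module/zfs/spa.c;
 * not part of ZFS.  -1 means "keep the built-in default".
 */
static int zio_taskq_write_issue_pri = -1;
module_param(zio_taskq_write_issue_pri, int, 0444);
MODULE_PARM_DESC(zio_taskq_write_issue_pri,
	"Override priority of the z_wr_iss taskq (benchmarking only)");

/* Then, in spa_taskqs_init(), in place of the unconditional pri--: */
if (t == ZIO_TYPE_WRITE && q == ZIO_TASKQ_ISSUE) {
	if (zio_taskq_write_issue_pri >= 0)
		pri = (pri_t)zio_taskq_write_issue_pri;
	else
		pri--;
}

Loading the module with something like "modprobe zfs zio_taskq_write_issue_pri=120" would then let each candidate value be benchmarked without rebuilding.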

@AndCycle
Author

I patched the zfs/spl maxclsyspri default to 70, which gives my system a reasonable speed I've never seen before.

Before this simple tuning, a full send from my main pool (5TB x6 raidz2) would have taken about 2 weeks to finish; now it's going to finish in one more day, at least 3 times faster.

This prio value definitely needs some attention. I think you can easily test how bad this is by spawning a CPU-intensive process for each core with nice 19;
it kills performance just that easily.

@kernelOfTruth
Contributor

@AndCycle care to share your patch? =)

@AndCycle
Author

@kernelOfTruth
sorry, I've never used a GitHub repository before, so I'm going to post these simple patches here.

The prio issue only has an effect if your CPU is always full; for people with plenty of idle CPU time this makes no difference.

Patches are against 0.6.4.2, with kernel 4.0.5.

for SPL

diff -rupN a/include/sys/sysmacros.h b/include/sys/sysmacros.h
--- a/include/sys/sysmacros.h   2015-06-24 07:58:01.000000000 +0800
+++ b/include/sys/sysmacros.h   2015-07-20 08:24:19.001026533 +0800
@@ -93,7 +93,7 @@
  * Treat shim tasks as SCHED_NORMAL tasks
  */
 #define minclsyspri                    (MAX_RT_PRIO)
-#define maxclsyspri                    (MAX_PRIO-1)
+#define maxclsyspri                    DEFAULT_PRIO

 #ifndef NICE_TO_PRIO
 #define NICE_TO_PRIO(nice)             (MAX_RT_PRIO + (nice) + 20)

for ZFS

diff -rupN a/include/sys/zfs_context.h b/include/sys/zfs_context.h
--- a/include/sys/zfs_context.h 2015-04-09 06:32:38.000000000 +0800
+++ b/include/sys/zfs_context.h 2015-07-20 01:52:22.119260755 +0800
@@ -611,7 +611,7 @@ extern void delay(clock_t ticks);
 #define        max_ncpus       64

 #define        minclsyspri     60
-#define        maxclsyspri     99
+#define        maxclsyspri     70

 #define        CPU_SEQID       (pthread_self() & (max_ncpus - 1))

minor fix ZFS

diff -rupN a/module/zfs/spa.c b/module/zfs/spa.c
--- a/module/zfs/spa.c  2015-06-23 09:14:46.000000000 +0800
+++ b/module/zfs/spa.c  2015-07-20 02:05:09.200483683 +0800
@@ -880,7 +880,7 @@ spa_taskqs_init(spa_t *spa, zio_type_t t
                         * than the other taskqs.
                         */
                        if (t == ZIO_TYPE_WRITE && q == ZIO_TASKQ_ISSUE)
-                               pri--;
+                               pri++;

                        tq = taskq_create_proc(name, value, pri, 50,
                            INT_MAX, spa->spa_proc, flags);
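For context, here is what those patched values work out to under the constants shown earlier; this annotation is not part of AndCycle's patch:

/*
 * old maxclsyspri = MAX_PRIO - 1  = 139  -> PRIO_TO_NICE(139) = 19 (lowest)
 * new maxclsyspri = DEFAULT_PRIO  = 120  -> PRIO_TO_NICE(120) =  0 (normal)
 *
 * After the patch the zio taskq threads compete at nice 0 instead of
 * sharing nice 19 with the BOINC workers.  The pri-- -> pri++ hunk keeps
 * z_wr_iss one step below the other taskqs under the Linux numbering
 * (120 -> 121, i.e. nice 1), which matches the code comment's intent of
 * a slightly lower priority for the CPU-intensive write-issue taskq.
 */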

@kernelOfTruth
Contributor

@AndCycle so it won't apply to my system most of the time - unless I stress-test or stress-use it (gaming, compiling, transferring files, etc. - at the same time)

Most likely a condition larger servers are very often in - and thus very useful for those cases =)

Thanks!

@behlendorf
Contributor

That's an impressive improvement, thanks for running this to ground! I've taken your original patch and reworked it into a form which we can get merged. @AndCycle could you please verify that the proposed patch in #3622 solves your issue? You'll need to use the latest version of the SPL because I've added a new defclsyspri rather than redefining maxclsyspri as you did in your original patch.

That patch also doesn't contain the pri-- to pri++ change. The intention here is to give the more CPU intensive threads a lower priority so they run first and accomplish their work as soon as possible. I believe this to be correct, but if your testing shows otherwise I'm willing to be convinced!

behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 25, 2015
Under Linux filesystem threads responsible for handling I/O are
normally created with the maximum priority.  Non-I/O filesystem
processes run with the default priority.  ZFS should adopt the
same priority scheme under Linux to maintain good performance
and so that it will complete fairly when other Linux filesystems
are active.  The priorities have been updated to the following:

$ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
     -  TS 10743  19 -20 [spl_kmem_cache]
     -  TS 10744  19 -20 [spl_system_task]
     -  TS 10745  19 -20 [spl_dynamic_tas]
     -  TS 10764  19   0 [dbu_evict]
     -  TS 10765  19   0 [arc_prune]
     -  TS 10766  19   0 [arc_reclaim]
     -  TS 10767  19   0 [arc_user_evicts]
     -  TS 10768  19   0 [l2arc_feed]
     -  TS 10769  39   0 [z_unmount]
     -  TS 10770  39 -20 [zvol]
     -  TS 11011  39 -20 [z_null_iss]
     -  TS 11012  39 -20 [z_null_int]
     -  TS 11013  39 -20 [z_rd_iss]
     -  TS 11014  39 -20 [z_rd_int_0]
     -  TS 11022  38 -19 [z_wr_iss]
     -  TS 11023  39 -20 [z_wr_iss_h]
     -  TS 11024  39 -20 [z_wr_int_0]
     -  TS 11032  39 -20 [z_wr_int_h]
     -  TS 11033  39 -20 [z_fr_iss_0]
     -  TS 11041  39 -20 [z_fr_int]
     -  TS 11042  39 -20 [z_cl_iss]
     -  TS 11043  39 -20 [z_cl_int]
     -  TS 11044  39 -20 [z_ioctl_iss]
     -  TS 11045  39 -20 [z_ioctl_int]
     -  TS 11046  39 -20 [metaslab_group_]
     -  TS 11050  19   0 [z_iput]
     -  TS 11121  38 -19 [z_wr_iss]

Note that under Linux the meaning of a process's priority is inverted
with respect to illumos.  High values on Linux indicate a _low_ priority
while high values on illumos indicate a _high_ priority.

In order to preserve the logical meaning of the minclsyspri and
maxclsyspri macros when they are used by the illumos wrapper functions
their values have been inverted.  This way, when changes are merged
from upstream illumos, we won't need to remember to invert the macro,
which could also lead to confusion.

This patch depends on openzfs/spl#466.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue openzfs#3607
behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 25, 2015
snajpa pushed a commit to vpsfreecz/zfs that referenced this issue Jul 26, 2015
behlendorf added a commit to openzfs/spl that referenced this issue Jul 29, 2015
On Linux the meaning of a process's priority is inverted with respect
to illumos.  High values on Linux indicate a _low_ priority while high
values on illumos indicate a _high_ priority.

In order to preserve the logical meaning of the minclsyspri and
maxclsyspri macros when they are used by the illumos wrapper functions
their values have been inverted.  This way, when changes are merged
from upstream illumos, we won't need to remember to invert the macro,
which could also lead to confusion.

Note this change also reverts some of the priority changes in prior
commit 62aa81a.  The rationale is as follows:

spl_kmem_cache    - High priority may result in blocked memory allocs
spl_system_taskq  - May perform I/O for file backed VDEVs
spl_dynamic_taskq - New taskq threads should be spawned promptly

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue openzfs/zfs#3607
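As a rough illustration of the inversion described above (the real definitions live in openzfs/spl#466; the exact macro values below are an assumption, though the resulting nice levels match the ps output quoted in the ZFS commit message):

/*
 * Post-change intent, expressed as Linux nice levels:
 *
 *   maxclsyspri -> around MAX_RT_PRIO   (nice -20, most favorable)
 *   defclsyspri -> DEFAULT_PRIO         (nice   0, normal)
 *   minclsyspri -> MAX_PRIO - 1         (nice  19, least favorable)
 *
 * Before the change maxclsyspri was MAX_PRIO - 1 (nice 19), which is why
 * the zio taskqs lost to any nice-19 CPU hog in the first place.
 */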
@AndCycle
Author

@behlendorf your work looks fine,

but I probably won't test your patch on my system; a few lines need to be removed for it to apply cleanly on 0.6.4, and I don't have time to read through it.

I think I can wait for 0.6.5 to see how it works out :)

@behlendorf
Contributor

@AndCycle no problem. Thanks for drawing our attention to this issue so we could address it.

@avg-I
Contributor

avg-I commented Mar 28, 2016

@behlendorf while doing some research I have come across this rather old issue and I have a comment (which is possibly irrelevant now).
You said:

That patch also doesn't contain the pri-- to pri++ change. The intention here is to give the more CPU intensive threads a lower priority so they run first and accomplish their work as soon as possible. I believe this to be correct, but if your testing shows otherwise I'm willing to be convinced!

I believe that on Solaris / illumos the greater numbers represent the higher / better priorities.
I believe that the relation is the opposite on Linux. That's certainly the case on FreeBSD.
So, I think that the intention in the original code is to give a lower (worse) priority to CPU-hungry threads, so that they do not stall other threads for too long.

@behlendorf
Contributor

@avg-I thanks for commenting. After taking another look at FreeBSD and illumos we determined the same thing: when 1229323 was finally merged to sort out the priorities we did take that into account, and I believe we got it as intended.
