send performance is very bad while all CPUs are occupied by low-priority processes #3607
Tracing the code, I found this in spa.c. I thought lower priority meant a bigger number, not a smaller one?
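The block in question is presumably this one from spa_taskqs_init() in module/zfs/spa.c, as quoted in the patch later in this thread:

```c
	/*
	 * ... than the other taskqs.
	 */
	if (t == ZIO_TYPE_WRITE && q == ZIO_TASKQ_ISSUE)
		pri--;
```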
This code could certainly use a second look to make sure it's doing something reasonable under Linux. The idea is to take the specified priority and translate it to the matching nice setting for the taskq threads. See http://lxr.free-electrons.com/source/include/linux/sched/prio.h#L24 for additional detail on how this is done. The default values were originally selected for the Solaris scheduler, so they may not be the optimal settings for Linux. It's definitely an area worth investigating. To make benchmarking easier, I'd suggest temporarily adding a module option so you can explicitly set the priority of this taskq when it's created. Then see what impact specific values have. That way we can seriously consider changing the default.
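For context, a sketch of the Linux-side mapping that comment refers to: the macro definitions below are the stock ones from include/linux/sched/prio.h (numeric comments simplified), and the worked example at the end is our own arithmetic based on the old shim values quoted later in this thread:

```c
/*
 * Linux scheduler priority constants (include/linux/sched/prio.h).
 * SCHED_NORMAL tasks occupy [MAX_RT_PRIO, MAX_PRIO) = [100, 140),
 * mapping 1:1 onto nice values -20..19; a LARGER prio number means
 * a LOWER priority.
 */
#define MAX_RT_PRIO		100
#define MAX_PRIO		(MAX_RT_PRIO + 40)	/* 140 */
#define DEFAULT_PRIO		(MAX_RT_PRIO + 20)	/* 120 == nice 0 */
#define NICE_TO_PRIO(nice)	(MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio)	((prio) - MAX_RT_PRIO - 20)

/*
 * Worked example (our arithmetic, not from the source): the old shim
 * set maxclsyspri = MAX_PRIO-1 = 139, and if the taskq code applies
 * PRIO_TO_NICE() to that value, the "maximum priority" zio threads run
 * at nice PRIO_TO_NICE(139) = 19, the LOWEST possible priority, no
 * better than a nice-19 background compute job.
 */
```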
I patched zfs/spl maxclsyspri to default to 70, which makes my system run at a reasonable speed I'd never seen before this simple tuning. This prio value definitely requires some attention.
@AndCycle care to share your patch? =)
@kernelOfTruth the prio issue only shows up when your CPU is always fully loaded. Patch is against 0.6.4.2, with kernel 4.0.5.

For SPL:

```diff
diff -rupN a/include/sys/sysmacros.h b/include/sys/sysmacros.h
--- a/include/sys/sysmacros.h	2015-06-24 07:58:01.000000000 +0800
+++ b/include/sys/sysmacros.h	2015-07-20 08:24:19.001026533 +0800
@@ -93,7 +93,7 @@
  * Treat shim tasks as SCHED_NORMAL tasks
  */
 #define minclsyspri (MAX_RT_PRIO)
-#define maxclsyspri (MAX_PRIO-1)
+#define maxclsyspri DEFAULT_PRIO
 #ifndef NICE_TO_PRIO
 #define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
```

For ZFS:

```diff
diff -rupN a/include/sys/zfs_context.h b/include/sys/zfs_context.h
--- a/include/sys/zfs_context.h	2015-04-09 06:32:38.000000000 +0800
+++ b/include/sys/zfs_context.h	2015-07-20 01:52:22.119260755 +0800
@@ -611,7 +611,7 @@ extern void delay(clock_t ticks);
 #define max_ncpus 64
 #define minclsyspri 60
-#define maxclsyspri 99
+#define maxclsyspri 70
 #define CPU_SEQID (pthread_self() & (max_ncpus - 1))
```

Minor fix for ZFS:

```diff
diff -rupN a/module/zfs/spa.c b/module/zfs/spa.c
--- a/module/zfs/spa.c	2015-06-23 09:14:46.000000000 +0800
+++ b/module/zfs/spa.c	2015-07-20 02:05:09.200483683 +0800
@@ -880,7 +880,7 @@ spa_taskqs_init(spa_t *spa, zio_type_t t
 	 * than the other taskqs.
 	 */
 	if (t == ZIO_TYPE_WRITE && q == ZIO_TASKQ_ISSUE)
-		pri--;
+		pri++;
 	tq = taskq_create_proc(name, value, pri, 50,
 	    INT_MAX, spa->spa_proc, flags);
```
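A minimal sketch of the benchmarking aid suggested earlier in the thread, a module option to explicitly set the zio taskq priority, might look like the following; the parameter name zfs_zio_taskq_pri and its -1 sentinel are invented for illustration, not an existing ZoL tunable:

```c
#include <linux/module.h>

/*
 * Hypothetical tunable for benchmarking taskq priorities; -1 keeps the
 * compiled-in default computed by spa_taskqs_init().
 */
static int zfs_zio_taskq_pri = -1;
module_param(zfs_zio_taskq_pri, int, 0444);
MODULE_PARM_DESC(zfs_zio_taskq_pri,
	"Override zio taskq priority (-1 = use default)");

/* In spa_taskqs_init(), just before the taskq_create_proc() call: */
	if (zfs_zio_taskq_pri >= 0)
		pri = zfs_zio_taskq_pri;
```

Loading the module with, e.g., zfs_zio_taskq_pri=120 would then let specific values be compared without rebuilding.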
@AndCycle so it won't apply to my system most of the time - unless I stress-test or stress-use it (gaming, compiling, transferring files, etc., at the same time). Most likely a condition larger servers are very often in, and thus very useful for those cases =) Thanks!
That's an impressive improvement, thanks for running this to ground! I've taken your original patch and reworked it into a form which we can get merged. @AndCycle, could you please verify that the proposed patch in #3622 solves your issue? You'll need to use the latest version of the SPL because I've added a new …. That patch also doesn't contain the ….
Under Linux, filesystem threads responsible for handling I/O are normally created with the maximum priority, while non-I/O filesystem processes run with the default priority. ZFS should adopt the same priority scheme under Linux to maintain good performance and so that it will compete fairly when other Linux filesystems are active. The priorities have been updated to the following:

```
$ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
     -  TS 10743  19 -20 [spl_kmem_cache]
     -  TS 10744  19 -20 [spl_system_task]
     -  TS 10745  19 -20 [spl_dynamic_tas]
     -  TS 10764  19   0 [dbu_evict]
     -  TS 10765  19   0 [arc_prune]
     -  TS 10766  19   0 [arc_reclaim]
     -  TS 10767  19   0 [arc_user_evicts]
     -  TS 10768  19   0 [l2arc_feed]
     -  TS 10769  39   0 [z_unmount]
     -  TS 10770  39 -20 [zvol]
     -  TS 11011  39 -20 [z_null_iss]
     -  TS 11012  39 -20 [z_null_int]
     -  TS 11013  39 -20 [z_rd_iss]
     -  TS 11014  39 -20 [z_rd_int_0]
     -  TS 11022  38 -19 [z_wr_iss]
     -  TS 11023  39 -20 [z_wr_iss_h]
     -  TS 11024  39 -20 [z_wr_int_0]
     -  TS 11032  39 -20 [z_wr_int_h]
     -  TS 11033  39 -20 [z_fr_iss_0]
     -  TS 11041  39 -20 [z_fr_int]
     -  TS 11042  39 -20 [z_cl_iss]
     -  TS 11043  39 -20 [z_cl_int]
     -  TS 11044  39 -20 [z_ioctl_iss]
     -  TS 11045  39 -20 [z_ioctl_int]
     -  TS 11046  39 -20 [metaslab_group_]
     -  TS 11050  19   0 [z_iput]
     -  TS 11121  38 -19 [z_wr_iss]
```

Note that under Linux the meaning of a process's priority is inverted with respect to illumos: high values on Linux indicate a _low_ priority, while high values on illumos indicate a _high_ priority. In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions, their values have been inverted. This way, when changes are merged from upstream illumos, we won't need to remember to invert the macros, which could otherwise lead to confusion.

This patch depends on openzfs/spl#466.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue openzfs#3607
On Linux the meaning of a process's priority is inverted with respect to illumos: high values on Linux indicate a _low_ priority, while high values on illumos indicate a _high_ priority. In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions, their values have been inverted. This way, when changes are merged from upstream illumos, we won't need to remember to invert the macros, which could otherwise lead to confusion.

Note this change also reverts some of the priority changes in prior commit 62aa81a. The rationale is as follows:

- spl_kmem_cache: high priority may result in blocked memory allocs
- spl_system_taskq: may perform I/O for file-backed VDEVs
- spl_dynamic_taskq: new taskq threads should be spawned promptly

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue openzfs/zfs#3607
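For reference, a sketch of what the inverted Linux-side definitions described above look like, based on this commit message and the stock kernel constants; see openzfs/spl#466 for the exact merged code, and note the default level (here called defclsyspri) is our reading of the change rather than a quote:

```c
#include <linux/sched/prio.h>	/* MAX_RT_PRIO, DEFAULT_PRIO, MAX_PRIO */

/*
 * Sketch of the inverted SPL shim macros: on Linux the numerically
 * smallest SCHED_NORMAL priority is the MOST favorable, so the illumos
 * "max" now maps to it. Via PRIO_TO_NICE() these become:
 */
#define minclsyspri	(MAX_PRIO - 1)	/* 139 -> nice  19 (least favorable) */
#define defclsyspri	(DEFAULT_PRIO)	/* 120 -> nice   0 (default)         */
#define maxclsyspri	(MAX_RT_PRIO)	/* 100 -> nice -20 (most favorable)  */
```

These map directly onto the nice values visible in the ps listing above: the zio taskqs at nice -20, and threads such as arc_prune and dbu_evict at nice 0.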
@behlendorf your work looks fine, but I probably won't test the patch on my system; I think I can wait for 0.6.5 to see how it works out :)
@AndCycle no problem. Thanks for drawing our attention to this issue so we could address it.
@behlendorf while doing some research I came across this rather old issue, and I have a comment (which is possibly irrelevant now):
I believe that on Solaris / illumos the greater numbers represent the higher / better priorities.
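A tiny standalone illustration of the two conventions; the linear mapping below is just the idea of the inversion, not the actual ZoL code, and the 60/99 endpoints come from the zfs_context.h values quoted earlier in the thread:

```c
#include <stdio.h>

/* illumos system-class priorities: larger number = HIGHER priority */
#define ILLUMOS_MINCLSYSPRI	60
#define ILLUMOS_MAXCLSYSPRI	99

/*
 * Map an illumos syspri onto a Linux nice value by linear inversion:
 * the 40-step illumos range 60..99 lands on the 40-step nice range
 * 19..-20, where a SMALLER nice = HIGHER priority.
 */
static int syspri_to_nice(int syspri)
{
	return 19 - (syspri - ILLUMOS_MINCLSYSPRI);
}

int main(void)
{
	printf("minclsyspri %d -> nice %d\n",
	    ILLUMOS_MINCLSYSPRI, syspri_to_nice(ILLUMOS_MINCLSYSPRI)); /* 19 */
	printf("maxclsyspri %d -> nice %d\n",
	    ILLUMOS_MAXCLSYSPRI, syspri_to_nice(ILLUMOS_MAXCLSYSPRI)); /* -20 */
	return 0;
}
```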
my system has BOINC projects running on it,
donating its CPU time to grid computing;
it runs one process per CPU core,
but current ZoL performance seems heavily affected by this:
when BOINC is on, send performance on a 6-disk raidz2 is as low as 20MB/s,
and nice 19 on BOINC doesn't help;
when it's off, it easily tops out at 400MB/s.
Is there anything I can tune via a module parameter, or do I have to touch the source?