Fix typo 'acceptible' #203
base: master
Username checks out
Indeed
I don't see why every little spelling error in the commentary. Comments do
Meant to say, "I don't see why you try to fix every little spelling error
@nkeck720 This is a bot :)
I should add this: If you think there is anything wrong with this pull request or just have a question, be kind to mail me. Looking for the source code of this bot? Well, you have to be patient! The bot is under development. If you decide to close this pull request, pleace specify why before doing so. With kind regards,
@TheTypoMaster You have a typo ("pleace") in your comment here on github:
@petr0001 Yep!
Jesus christ people hate this bot
What did I expect tbh
I'm not gonna hate this bot but you are making a pulling request to the Linux kernel which is..
The issue isn't your bot. The Linux project just does not accept Pull Requests on Github. There is a system for emailing patches to the Linux mailing list, search and you shall find.
Maybe it would be a good idea for your bot to specifically blacklist Github repos which do not accept PR:s?
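The suggestion above amounts to a deny-list check that the bot runs before opening a pull request. A minimal sketch in C, assuming the bot already knows the target repository as an "owner/name" string; the list contents and the function name `repo_accepts_prs()` are hypothetical. `torvalds/linux` is listed because, as noted above, the Linux project takes patches by email rather than through GitHub pull requests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deny list: repositories known not to accept GitHub PRs. */
static const char *const no_pr_repos[] = {
	"torvalds/linux",
};

/* Returns false if "owner/name" is on the deny list, true otherwise.
 * Uses an exact, case-sensitive match. */
static bool repo_accepts_prs(const char *repo)
{
	size_t i;

	for (i = 0; i < sizeof(no_pr_repos) / sizeof(no_pr_repos[0]); i++)
		if (strcmp(repo, no_pr_repos[i]) == 0)
			return false;
	return true;
}
```

Because GitHub treats owner and repository names case-insensitively, a real bot would probably normalize the name (or compare case-insensitively) before the lookup.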
When manually overwriting the HWS, rather than assume irq_seqno_barrier does the right thing, we can explicitly flush the cacheline instead. This avoids us calling the engine->irq_seqno_barrier() from an illegal context:

[ 1472.651797] BUG: scheduling while atomic: migration/0/11/0x00000002
[ 1472.651807] Modules linked in: ctr ccm arc4 snd_hda_codec_hdmi bnep rfcomm iwldvm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel mac80211 snd_hda_codec snd_hda_core snd_pcm dm_multipath snd_hwdep intel_powerclamp coretemp snd_seq_midi crct10dif_pclmul snd_seq_midi_event crc32_pclmul iwlwifi ghash_clmulni_intel btusb snd_rawmidi btrtl aesni_intel btbcm aes_x86_64 crypto_simd btintel cryptd glue_helper bluetooth snd_seq cfg80211 snd_timer snd_seq_device intel_ips binfmt_misc snd mei_me soundcore mei dm_mirror dm_region_hash dm_log i915 intel_gtt i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea prime_numbers e1000e drm ahci libahci
[ 1472.651897] CPU: 0 PID: 11 Comm: migration/0 Tainted: G U 4.11.0-rc1+ #203
[ 1472.651899] Hardware name: LENOVO 514328U/514328U, BIOS 6QET44WW (1.14 ) 04/20/2010
[ 1472.651900] Call Trace:
[ 1472.651913]  dump_stack+0x63/0x90
[ 1472.651922]  __schedule_bug+0x5d/0x6b
[ 1472.651930]  __schedule+0x46a/0x5f0
[ 1472.651934]  schedule+0x38/0x90
[ 1472.651938]  schedule_hrtimeout_range_clock+0x85/0x110
[ 1472.651945]  ? hrtimer_init+0x10/0x10
[ 1472.651949]  schedule_hrtimeout_range+0xe/0x10
[ 1472.651952]  usleep_range+0x4d/0x60
[ 1472.652037]  gen5_seqno_barrier+0x13/0x20 [i915]
[ 1472.652101]  intel_engine_init_global_seqno+0xd7/0x160 [i915]
[ 1472.652160]  __i915_gem_set_wedged_BKL+0xa0/0x180 [i915]
[ 1472.652166]  multi_cpu_stop+0xbb/0xe0
[ 1472.652170]  ? cpu_stop_queue_work+0x90/0x90
[ 1472.652174]  cpu_stopper_thread+0x82/0x110
[ 1472.652179]  smpboot_thread_fn+0x137/0x190
[ 1472.652184]  kthread+0xf7/0x130
[ 1472.652187]  ? sort_range+0x20/0x20
[ 1472.652191]  ? kthread_park+0x90/0x90
[ 1472.652195]  ret_from_fork+0x2c/0x40

Testcase: igt/gem_eio #ilk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170314111452.9375-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
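The trace shows gen5_seqno_barrier() reaching usleep_range() while running under multi_cpu_stop(), that is, in atomic context where sleeping is forbidden. The userspace model below sketches that rule and the shape of the fix; it is not i915 code. The atomic-depth counter, `may_sleep()`, and both barrier functions are hypothetical stand-ins, and the cacheline flush is only a comment because a real flush (clflush on x86) is architecture-specific.

```c
#include <stdbool.h>

/* Models preempt_count(): non-zero means we are in atomic context. */
static int atomic_depth;

static void enter_atomic(void) { atomic_depth++; }
static void leave_atomic(void) { atomic_depth--; }

/* Models the might_sleep() rule: sleeping is legal only outside atomic context. */
static bool may_sleep(void)
{
	return atomic_depth == 0;
}

/* Old approach (modelled): a barrier that waits by sleeping, like
 * usleep_range(). Returns false where the kernel would report
 * "BUG: scheduling while atomic". */
static bool sleeping_seqno_barrier(void)
{
	if (!may_sleep())
		return false;
	/* ... sleep briefly, then re-read the status page ... */
	return true;
}

/* New approach (modelled): write the seqno, then flush its cacheline.
 * Never sleeps, so it is legal in any context. */
static bool write_and_flush_seqno(volatile unsigned int *hws_slot,
				  unsigned int seqno)
{
	*hws_slot = seqno;
	/* A real driver would flush the cacheline holding *hws_slot here. */
	return true;
}
```

The design point is that the fix does not make the sleeping barrier safe; it removes the need to call it on this path by replacing the wait with a non-sleeping flush.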
Dasd uses completion_data from struct request to store per request private data - this is problematic since this member is part of a union which is also used by IO schedulers. Let the block layer maintain space for per request data behind each struct request.

Fixes crashes on block layer timeouts like this one:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000000001308007 R3:00000000fffc8007 S:00000000fffcc000 P:000000000000013d
Oops: 0004 ilc:2 [#1] PREEMPT SMP
Modules linked in: [...]
CPU: 0 PID: 1480 Comm: kworker/0:2H Not tainted 4.17.0-rc4-00046-gaa3bcd43b5af #203
Hardware name: IBM 3906 M02 702 (LPAR)
Workqueue: kblockd blk_mq_timeout_work
Krnl PSW : 0000000067ac406b 00000000b6960308 (do_raw_spin_trylock+0x30/0x78)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000c00 0000000000000000 0000000000000000 0000000000000001
           0000000000b9d3c8 0000000000000000 0000000000000001 00000000cf9639d8
           0000000000000000 0700000000000000 0000000000000000 000000000099f09e
           0000000000000000 000000000076e9d0 000000006247bb08 000000006247bae0
Krnl Code: 00000000001c159c: b90400c2 lgr %r12,%r2
           00000000001c15a0: a7180000 lhi %r1,0
          #00000000001c15a4: 583003a4 l %r3,932
          >00000000001c15a8: ba132000 cs %r1,%r3,0(%r2)
           00000000001c15ac: a7180001 lhi %r1,1
           00000000001c15b0: a784000b brc 8,1c15c6
           00000000001c15b4: c0e5004e72aa brasl %r14,b8fb08
           00000000001c15ba: 1812 lr %r1,%r2
Call Trace:
([<0700000000000000>] 0x700000000000000)
 [<0000000000b9d3d2>] _raw_spin_lock_irqsave+0x7a/0xb8
 [<000000000099f09e>] dasd_times_out+0x46/0x278
 [<000000000076ea6e>] blk_mq_terminate_expired+0x9e/0x108
 [<000000000077497a>] bt_for_each+0x102/0x130
 [<0000000000774e54>] blk_mq_queue_tag_busy_iter+0x74/0xd8
 [<000000000076fea0>] blk_mq_timeout_work+0x260/0x320
 [<0000000000169dd4>] process_one_work+0x3bc/0x708
 [<000000000016a382>] worker_thread+0x262/0x408
 [<00000000001723a8>] kthread+0x160/0x178
 [<0000000000b9e73a>] kernel_thread_starter+0x6/0xc
 [<0000000000b9e734>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<0000000000b9d3cc>] _raw_spin_lock_irqsave+0x74/0xb8
Kernel panic - not syncing: Fatal exception: panic_on_oops

Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ Upstream commit f0f59a2 ]

Dasd uses completion_data from struct request to store per request private data - this is problematic since this member is part of a union which is also used by IO schedulers. Let the block layer maintain space for per request data behind each struct request.

Fixes crashes on block layer timeouts like this one:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000000001308007 R3:00000000fffc8007 S:00000000fffcc000 P:000000000000013d
Oops: 0004 ilc:2 [#1] PREEMPT SMP
Modules linked in: [...]
CPU: 0 PID: 1480 Comm: kworker/0:2H Not tainted 4.17.0-rc4-00046-gaa3bcd43b5af #203
Hardware name: IBM 3906 M02 702 (LPAR)
Workqueue: kblockd blk_mq_timeout_work
Krnl PSW : 0000000067ac406b 00000000b6960308 (do_raw_spin_trylock+0x30/0x78)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000c00 0000000000000000 0000000000000000 0000000000000001
           0000000000b9d3c8 0000000000000000 0000000000000001 00000000cf9639d8
           0000000000000000 0700000000000000 0000000000000000 000000000099f09e
           0000000000000000 000000000076e9d0 000000006247bb08 000000006247bae0
Krnl Code: 00000000001c159c: b90400c2 lgr %r12,%r2
           00000000001c15a0: a7180000 lhi %r1,0
          #00000000001c15a4: 583003a4 l %r3,932
          >00000000001c15a8: ba132000 cs %r1,%r3,0(%r2)
           00000000001c15ac: a7180001 lhi %r1,1
           00000000001c15b0: a784000b brc 8,1c15c6
           00000000001c15b4: c0e5004e72aa brasl %r14,b8fb08
           00000000001c15ba: 1812 lr %r1,%r2
Call Trace:
([<0700000000000000>] 0x700000000000000)
 [<0000000000b9d3d2>] _raw_spin_lock_irqsave+0x7a/0xb8
 [<000000000099f09e>] dasd_times_out+0x46/0x278
 [<000000000076ea6e>] blk_mq_terminate_expired+0x9e/0x108
 [<000000000077497a>] bt_for_each+0x102/0x130
 [<0000000000774e54>] blk_mq_queue_tag_busy_iter+0x74/0xd8
 [<000000000076fea0>] blk_mq_timeout_work+0x260/0x320
 [<0000000000169dd4>] process_one_work+0x3bc/0x708
 [<000000000016a382>] worker_thread+0x262/0x408
 [<00000000001723a8>] kthread+0x160/0x178
 [<0000000000b9e73a>] kernel_thread_starter+0x6/0xc
 [<0000000000b9e734>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<0000000000b9d3cc>] _raw_spin_lock_irqsave+0x74/0xb8
Kernel panic - not syncing: Fatal exception: panic_on_oops

Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
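The dasd fix stops borrowing a union member that an I/O scheduler may also write and instead keeps driver-private data in space the block layer reserves directly behind each struct request. In the kernel, a blk-mq driver typically asks for this space through the `cmd_size` field of its `struct blk_mq_tag_set` and reaches it with `blk_mq_rq_to_pdu()`, which returns the memory immediately after the request. The userspace sketch below models only that memory layout; `struct fake_request`, `struct fake_pdu`, `alloc_request()`, and `rq_to_pdu()` are hypothetical names, not kernel API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct request. The anonymous union models a
 * slot that both the driver and an I/O scheduler may write. */
struct fake_request {
	int tag;
	union {
		void *completion_data; /* what the driver used to borrow */
		void *sched_priv;      /* also written by the I/O scheduler */
	};
};

/* Hypothetical driver-private per-request data (the "PDU"). */
struct fake_pdu {
	int retries;
	void *cqr;
};

/* Allocate the request and its PDU as one zeroed block, with the PDU placed
 * directly behind the request (the role cmd_size plays in blk-mq). */
static struct fake_request *alloc_request(size_t pdu_size)
{
	return calloc(1, sizeof(struct fake_request) + pdu_size);
}

/* Like blk_mq_rq_to_pdu(): the PDU starts right after the request.
 * Assumes the PDU's alignment does not exceed the request's. */
static void *rq_to_pdu(struct fake_request *rq)
{
	return rq + 1;
}
```

With this layout, a scheduler writing its union member can no longer clobber the driver's data, because the driver's data never lives in the union.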
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#12:
Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such

WARNING: braces {} are not necessary for single statement blocks
#163: FILE: fs/proc/generic.c:536:
+ if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) {
+ pde->flags |= PROC_ENTRY_PERMANENT;
+ }

WARNING: line over 80 characters
#203: FILE: fs/proc/generic.c:676:
+ WARN(1, "removing permanent /proc entry '%s'", de->name);

WARNING: braces {} are not necessary for single statement blocks
#207: FILE: fs/proc/generic.c:680:
+ if (S_ISDIR(de->mode)) {
+ parent->nlink--;
+ }

WARNING: line over 80 characters
#244: FILE: fs/proc/inode.c:198:
+static loff_t pde_lseek(struct proc_dir_entry *pde, struct file *file, loff_t offset, int whence)

WARNING: line over 80 characters
#274: FILE: fs/proc/inode.c:222:
+static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos)

WARNING: line over 80 characters
#303: FILE: fs/proc/inode.c:246:
+static ssize_t pde_write(struct proc_dir_entry *pde, struct file *file, const char __user *buf, size_t count, loff_t *ppos)

WARNING: line over 80 characters
#332: FILE: fs/proc/inode.c:270:
+static __poll_t pde_poll(struct proc_dir_entry *pde, struct file *file, struct poll_table_struct *pts)

WARNING: line over 80 characters
#361: FILE: fs/proc/inode.c:294:
+static long pde_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg)

WARNING: line over 80 characters
#391: FILE: fs/proc/inode.c:319:
+static long pde_compat_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg)

WARNING: line over 80 characters
#421: FILE: fs/proc/inode.c:343:
+static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma)

WARNING: line over 80 characters
#452: FILE: fs/proc/inode.c:368:
+pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr,

WARNING: line over 80 characters
#489: FILE: fs/proc/inode.c:393:
+ return pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags);

WARNING: line over 80 characters
#491: FILE: fs/proc/inode.c:395:
+ rv = pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags);

WARNING: braces {} are not necessary for single statement blocks
#518: FILE: fs/proc/inode.c:470:
+ if (release) {
+ return release(inode, file);
+ }

total: 0 errors, 15 warnings, 462 lines checked

NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace.

./patches/proc-faster-open-read-close-with-permanent-files.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
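Most of these warnings fall into two patterns: braces around a single-statement `if` body (as at fs/proc/generic.c:536) and prototypes longer than 80 columns (as at fs/proc/inode.c:222). The kernel coding style resolves them by dropping the braces and wrapping long parameter lists, aligning continuation lines under the opening parenthesis. The sketch below applies both fixes to code shaped like the flagged lines; the simplified structures, the `PROC_ENTRY_PERMANENT` value, and `pde_demo_read()` are hypothetical stand-ins so that it compiles in userspace, not the real fs/proc definitions.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flag value for this sketch. */
#define PROC_ENTRY_PERMANENT 1U

/* Simplified stand-ins for the fs/proc structures. */
struct proc_ops {
	unsigned int proc_flags;
};

struct proc_dir_entry {
	const struct proc_ops *proc_ops;
	unsigned int flags;
};

/* Single-statement body: no braces needed. */
static void pde_copy_permanent_flag(struct proc_dir_entry *pde)
{
	if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT)
		pde->flags |= PROC_ENTRY_PERMANENT;
}

/* Long prototype wrapped so that no line exceeds 80 columns.
 * Placeholder body: pretends to consume all count bytes. */
static ssize_t pde_demo_read(struct proc_dir_entry *pde, char *buf,
			     size_t count, size_t *ppos)
{
	(void)pde;
	(void)buf;
	*ppos += count;
	return (ssize_t)count;
}
```

As the NOTE in the output says, checkpatch's --fix or --fix-inplace options can apply some of these changes mechanically.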
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) torvalds#12: Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such WARNING: braces {} are not necessary for single statement blocks torvalds#163: FILE: fs/proc/generic.c:536: + if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) { + pde->flags |= PROC_ENTRY_PERMANENT; + } WARNING: line over 80 characters torvalds#203: FILE: fs/proc/generic.c:676: + WARN(1, "removing permanent /proc entry '%s'", de->name); WARNING: braces {} are not necessary for single statement blocks torvalds#207: FILE: fs/proc/generic.c:680: + if (S_ISDIR(de->mode)) { + parent->nlink--; + } WARNING: line over 80 characters torvalds#244: FILE: fs/proc/inode.c:198: +static loff_t pde_lseek(struct proc_dir_entry *pde, struct file *file, loff_t offset, int whence) WARNING: line over 80 characters torvalds#274: FILE: fs/proc/inode.c:222: +static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#303: FILE: fs/proc/inode.c:246: +static ssize_t pde_write(struct proc_dir_entry *pde, struct file *file, const char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#332: FILE: fs/proc/inode.c:270: +static __poll_t pde_poll(struct proc_dir_entry *pde, struct file *file, struct poll_table_struct *pts) WARNING: line over 80 characters torvalds#361: FILE: fs/proc/inode.c:294: +static long pde_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#391: FILE: fs/proc/inode.c:319: +static long pde_compat_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#421: FILE: fs/proc/inode.c:343: +static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma) WARNING: line over 80 characters 
torvalds#452: FILE: fs/proc/inode.c:368: +pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr, WARNING: line over 80 characters torvalds#489: FILE: fs/proc/inode.c:393: + return pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: line over 80 characters torvalds#491: FILE: fs/proc/inode.c:395: + rv = pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: braces {} are not necessary for single statement blocks torvalds#518: FILE: fs/proc/inode.c:470: + if (release) { + return release(inode, file); + } total: 0 errors, 15 warnings, 462 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/proc-faster-open-read-close-with-permanent-files.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) torvalds#12: Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such WARNING: braces {} are not necessary for single statement blocks torvalds#163: FILE: fs/proc/generic.c:536: + if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) { + pde->flags |= PROC_ENTRY_PERMANENT; + } WARNING: line over 80 characters torvalds#203: FILE: fs/proc/generic.c:676: + WARN(1, "removing permanent /proc entry '%s'", de->name); WARNING: braces {} are not necessary for single statement blocks torvalds#207: FILE: fs/proc/generic.c:680: + if (S_ISDIR(de->mode)) { + parent->nlink--; + } WARNING: line over 80 characters torvalds#244: FILE: fs/proc/inode.c:198: +static loff_t pde_lseek(struct proc_dir_entry *pde, struct file *file, loff_t offset, int whence) WARNING: line over 80 characters torvalds#274: FILE: fs/proc/inode.c:222: +static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#303: FILE: fs/proc/inode.c:246: +static ssize_t pde_write(struct proc_dir_entry *pde, struct file *file, const char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#332: FILE: fs/proc/inode.c:270: +static __poll_t pde_poll(struct proc_dir_entry *pde, struct file *file, struct poll_table_struct *pts) WARNING: line over 80 characters torvalds#361: FILE: fs/proc/inode.c:294: +static long pde_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#391: FILE: fs/proc/inode.c:319: +static long pde_compat_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#421: FILE: fs/proc/inode.c:343: +static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma) WARNING: line over 80 characters 
torvalds#452: FILE: fs/proc/inode.c:368: +pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr, WARNING: line over 80 characters torvalds#489: FILE: fs/proc/inode.c:393: + return pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: line over 80 characters torvalds#491: FILE: fs/proc/inode.c:395: + rv = pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: braces {} are not necessary for single statement blocks torvalds#518: FILE: fs/proc/inode.c:470: + if (release) { + return release(inode, file); + } total: 0 errors, 15 warnings, 462 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/proc-faster-open-read-close-with-permanent-files.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) torvalds#12: Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such WARNING: braces {} are not necessary for single statement blocks torvalds#163: FILE: fs/proc/generic.c:536: + if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) { + pde->flags |= PROC_ENTRY_PERMANENT; + } WARNING: line over 80 characters torvalds#203: FILE: fs/proc/generic.c:676: + WARN(1, "removing permanent /proc entry '%s'", de->name); WARNING: braces {} are not necessary for single statement blocks torvalds#207: FILE: fs/proc/generic.c:680: + if (S_ISDIR(de->mode)) { + parent->nlink--; + } WARNING: line over 80 characters torvalds#244: FILE: fs/proc/inode.c:198: +static loff_t pde_lseek(struct proc_dir_entry *pde, struct file *file, loff_t offset, int whence) WARNING: line over 80 characters torvalds#274: FILE: fs/proc/inode.c:222: +static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#303: FILE: fs/proc/inode.c:246: +static ssize_t pde_write(struct proc_dir_entry *pde, struct file *file, const char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#332: FILE: fs/proc/inode.c:270: +static __poll_t pde_poll(struct proc_dir_entry *pde, struct file *file, struct poll_table_struct *pts) WARNING: line over 80 characters torvalds#361: FILE: fs/proc/inode.c:294: +static long pde_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#391: FILE: fs/proc/inode.c:319: +static long pde_compat_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#421: FILE: fs/proc/inode.c:343: +static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma) WARNING: line over 80 characters 
torvalds#452: FILE: fs/proc/inode.c:368: +pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr, WARNING: line over 80 characters torvalds#489: FILE: fs/proc/inode.c:393: + return pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: line over 80 characters torvalds#491: FILE: fs/proc/inode.c:395: + rv = pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: braces {} are not necessary for single statement blocks torvalds#518: FILE: fs/proc/inode.c:470: + if (release) { + return release(inode, file); + } total: 0 errors, 15 warnings, 462 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/proc-faster-open-read-close-with-permanent-files.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) torvalds#12: Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such WARNING: braces {} are not necessary for single statement blocks torvalds#163: FILE: fs/proc/generic.c:536: + if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) { + pde->flags |= PROC_ENTRY_PERMANENT; + } WARNING: line over 80 characters torvalds#203: FILE: fs/proc/generic.c:676: + WARN(1, "removing permanent /proc entry '%s'", de->name); WARNING: braces {} are not necessary for single statement blocks torvalds#207: FILE: fs/proc/generic.c:680: + if (S_ISDIR(de->mode)) { + parent->nlink--; + } WARNING: line over 80 characters torvalds#244: FILE: fs/proc/inode.c:198: +static loff_t pde_lseek(struct proc_dir_entry *pde, struct file *file, loff_t offset, int whence) WARNING: line over 80 characters torvalds#274: FILE: fs/proc/inode.c:222: +static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#303: FILE: fs/proc/inode.c:246: +static ssize_t pde_write(struct proc_dir_entry *pde, struct file *file, const char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#332: FILE: fs/proc/inode.c:270: +static __poll_t pde_poll(struct proc_dir_entry *pde, struct file *file, struct poll_table_struct *pts) WARNING: line over 80 characters torvalds#361: FILE: fs/proc/inode.c:294: +static long pde_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#391: FILE: fs/proc/inode.c:319: +static long pde_compat_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#421: FILE: fs/proc/inode.c:343: +static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma) WARNING: line over 80 characters 
torvalds#452: FILE: fs/proc/inode.c:368: +pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr, WARNING: line over 80 characters torvalds#489: FILE: fs/proc/inode.c:393: + return pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: line over 80 characters torvalds#491: FILE: fs/proc/inode.c:395: + rv = pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: braces {} are not necessary for single statement blocks torvalds#518: FILE: fs/proc/inode.c:470: + if (release) { + return release(inode, file); + } total: 0 errors, 15 warnings, 462 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/proc-faster-open-read-close-with-permanent-files.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) torvalds#12: Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such WARNING: braces {} are not necessary for single statement blocks torvalds#163: FILE: fs/proc/generic.c:536: + if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) { + pde->flags |= PROC_ENTRY_PERMANENT; + } WARNING: line over 80 characters torvalds#203: FILE: fs/proc/generic.c:676: + WARN(1, "removing permanent /proc entry '%s'", de->name); WARNING: braces {} are not necessary for single statement blocks torvalds#207: FILE: fs/proc/generic.c:680: + if (S_ISDIR(de->mode)) { + parent->nlink--; + } WARNING: line over 80 characters torvalds#244: FILE: fs/proc/inode.c:198: +static loff_t pde_lseek(struct proc_dir_entry *pde, struct file *file, loff_t offset, int whence) WARNING: line over 80 characters torvalds#274: FILE: fs/proc/inode.c:222: +static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#303: FILE: fs/proc/inode.c:246: +static ssize_t pde_write(struct proc_dir_entry *pde, struct file *file, const char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#332: FILE: fs/proc/inode.c:270: +static __poll_t pde_poll(struct proc_dir_entry *pde, struct file *file, struct poll_table_struct *pts) WARNING: line over 80 characters torvalds#361: FILE: fs/proc/inode.c:294: +static long pde_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#391: FILE: fs/proc/inode.c:319: +static long pde_compat_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#421: FILE: fs/proc/inode.c:343: +static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma) WARNING: line over 80 characters 
torvalds#452: FILE: fs/proc/inode.c:368: +pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr, WARNING: line over 80 characters torvalds#489: FILE: fs/proc/inode.c:393: + return pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: line over 80 characters torvalds#491: FILE: fs/proc/inode.c:395: + rv = pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: braces {} are not necessary for single statement blocks torvalds#518: FILE: fs/proc/inode.c:470: + if (release) { + return release(inode, file); + } total: 0 errors, 15 warnings, 462 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/proc-faster-open-read-close-with-permanent-files.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) torvalds#12: Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such WARNING: braces {} are not necessary for single statement blocks torvalds#163: FILE: fs/proc/generic.c:536: + if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) { + pde->flags |= PROC_ENTRY_PERMANENT; + } WARNING: line over 80 characters torvalds#203: FILE: fs/proc/generic.c:676: + WARN(1, "removing permanent /proc entry '%s'", de->name); WARNING: braces {} are not necessary for single statement blocks torvalds#207: FILE: fs/proc/generic.c:680: + if (S_ISDIR(de->mode)) { + parent->nlink--; + } WARNING: line over 80 characters torvalds#244: FILE: fs/proc/inode.c:198: +static loff_t pde_lseek(struct proc_dir_entry *pde, struct file *file, loff_t offset, int whence) WARNING: line over 80 characters torvalds#274: FILE: fs/proc/inode.c:222: +static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#303: FILE: fs/proc/inode.c:246: +static ssize_t pde_write(struct proc_dir_entry *pde, struct file *file, const char __user *buf, size_t count, loff_t *ppos) WARNING: line over 80 characters torvalds#332: FILE: fs/proc/inode.c:270: +static __poll_t pde_poll(struct proc_dir_entry *pde, struct file *file, struct poll_table_struct *pts) WARNING: line over 80 characters torvalds#361: FILE: fs/proc/inode.c:294: +static long pde_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#391: FILE: fs/proc/inode.c:319: +static long pde_compat_ioctl(struct proc_dir_entry *pde, struct file *file, unsigned int cmd, unsigned long arg) WARNING: line over 80 characters torvalds#421: FILE: fs/proc/inode.c:343: +static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma) WARNING: line over 80 characters 
torvalds#452: FILE: fs/proc/inode.c:368: +pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr, WARNING: line over 80 characters torvalds#489: FILE: fs/proc/inode.c:393: + return pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: line over 80 characters torvalds#491: FILE: fs/proc/inode.c:395: + rv = pde_get_unmapped_area(pde, file, orig_addr, len, pgoff, flags); WARNING: braces {} are not necessary for single statement blocks torvalds#518: FILE: fs/proc/inode.c:470: + if (release) { + return release(inode, file); + } total: 0 errors, 15 warnings, 462 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/proc-faster-open-read-close-with-permanent-files.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
…ixes

WARNING: labels should not be indented
#158: FILE: ipc/msg.c:1319:
+ fail_msg_hdrs:

WARNING: labels should not be indented
#160: FILE: ipc/msg.c:1321:
+ fail_msg_bytes:

ERROR: space required after that ';' (ctx:VxV)
#203: FILE: ipc/util.h:75:
+static inline int msg_init_ns(struct ipc_namespace *ns) { return 0;}
 ^

total: 1 errors, 2 warnings, 144 lines checked

NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace.

./patches/ipc-msg-mitigate-the-lock-contention-with-percpu-counter.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Jiebin Sun <jiebin.sun@intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vasily Averin <vasily.averin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
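All three findings in this report concern layout only. The following sketch shows code that passes those particular checks. It is not the actual ipc/msg.c code. It uses a simplified userspace stand-in for struct ipc_namespace. The function msg_init_ns_unwind_demo() and its calloc()-based body are hypothetical, and they only show where the fail_msg_hdrs and fail_msg_bytes labels belong once they start in column 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in; the real struct ipc_namespace is a kernel type. */
struct ipc_namespace {
	long *msg_bytes;
	long *msg_hdrs;
};

/* ERROR fixed: a space now separates "return 0;" from the closing brace. */
static inline int msg_init_ns(struct ipc_namespace *ns) { return 0; }

/*
 * WARNINGs fixed: the error-unwind labels start in column 0.
 * Hypothetical body: calloc()/free() stand in for the real counters.
 */
static int msg_init_ns_unwind_demo(struct ipc_namespace *ns)
{
	assert(ns != NULL);
	ns->msg_bytes = calloc(1, sizeof(*ns->msg_bytes));
	if (!ns->msg_bytes)
		goto fail_msg_bytes;
	ns->msg_hdrs = calloc(1, sizeof(*ns->msg_hdrs));
	if (!ns->msg_hdrs)
		goto fail_msg_hdrs;
	return 0;

fail_msg_hdrs:
	/* Undo the first allocation, then fall through to the common exit. */
	free(ns->msg_bytes);
fail_msg_bytes:
	return -ENOMEM;
}
```

The labels are ordered in reverse of the allocations. A failure at a later step therefore falls through the cleanup of every earlier step.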
Some use-cases and/or data patterns may benefit from larger zspages. Currently the limit on the number of physical pages that are linked into a zspage is hardcoded to 4. A higher limit changes key characteristics of a number of the size classes, improving compactness of the pool and reducing the amount of memory the zsmalloc pool uses.

For instance, the huge size class watermark is currently set to 3264 bytes. With order 3 zspages we have more normal classes and the huge size watermark becomes 3632. With order 4 zspages the huge size watermark becomes 3840.

Commit #1 has more numbers and some analysis.

This patch (of 6):

zsmalloc has 255 size classes. Size classes contain a number of zspages, which store objects of the same size. A zspage can consist of up to four physical pages. The exact (most optimal) zspage size is calculated for each size class during zsmalloc pool creation.

As a reasonable optimization, zsmalloc merges size classes that have similar characteristics: the number of pages per zspage and the number of objects a zspage can store.

For example, let's look at the following size classes:

class  size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
..
   94  1536           0            0             0        0          0                3        0
  100  1632           0            0             0        0          0                2        0
..

Size classes #95-99 are merged with size class #100. That is, each time we store an object of, say, 1568 bytes, instead of using class #96 we end up storing it in size class #100. Class #100 is for objects of 1632 bytes, hence every 1568-byte object wastes 1632 - 1568 = 64 bytes. Class #100 zspages consist of 2 physical pages and can hold 5 objects. When we need to store, say, 13 objects of size 1568, we end up allocating three zspages; in other words, 6 physical pages.
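The following userspace sketch makes the pages-per-zspage choice and the class #96/#100 arithmetic concrete. It models get_pages_per_zspage() as follows: among zspage lengths of 1..max_pages pages, pick the one that leaves the highest integer percentage of the zspage used. The function names, the `max_pages` parameter and the constants are assumptions for illustration, not the kernel's code. The constants assume 4 KiB pages, a 32-byte minimum class and a 16-byte class step. The model also ignores class merging and any per-object metadata overhead.

```c
#include <assert.h>

/* Assumed constants: 4 KiB pages, classes 32..4096 bytes in 16-byte steps. */
#define PAGE_SIZE           4096
#define ZS_MIN_ALLOC_SIZE   32
#define ZS_SIZE_CLASS_DELTA 16
#define ZS_SIZE_CLASSES     255

/*
 * Pages per zspage (1..max_pages) that leaves the highest integer percentage
 * of the zspage used by class_size objects. The comparison is strict, so on
 * a tie the shorter zspage wins.
 */
static int pages_per_zspage(int class_size, int max_pages)
{
	int best_pages = 1, best_usedpc = 0;

	assert(class_size > 0 && max_pages >= 1);
	for (int i = 1; i <= max_pages; i++) {
		int zspage_size = i * PAGE_SIZE;
		int waste = zspage_size % class_size;
		int usedpc = (zspage_size - waste) * 100 / zspage_size;

		if (usedpc > best_usedpc) {
			best_usedpc = usedpc;
			best_pages = i;
		}
	}
	return best_pages;
}

/* Physical pages needed to store nr_objs objects in this class. */
static int pages_for_objects(int class_size, int max_pages, int nr_objs)
{
	int pages = pages_per_zspage(class_size, max_pages);
	int objs = pages * PAGE_SIZE / class_size;
	int zspages = (nr_objs + objs - 1) / objs; /* round up */

	return zspages * pages;
}

/*
 * Largest class size whose best zspage spans more than one page. Every
 * larger class gets one page per zspage and one object per page ("huge").
 */
static int huge_watermark(int max_pages)
{
	for (int i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
		int size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;

		if (pages_per_zspage(size, max_pages) != 1)
			return size;
	}
	return 0;
}
```

Under these assumptions, the sketch gives the following results:
- pages_per_zspage(1568, 4) returns 2, the same configuration as class #100. With a limit of 5 it returns 5.
- pages_for_objects(1632, 4, 13) gives 6 pages, against 5 for pages_for_objects(1568, 5, 13).
- huge_watermark() returns 3264, 3632 and 3840 for limits of 4, 8 and 16 pages. These match the watermarks quoted for orders 2, 3 and 4.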
However, if we look closer at size class #96 (which should hold objects of size 1568 bytes) and trace get_pages_per_zspage():

pages per zspage  wasted bytes  used%
               1           960     76
               2           352     95
               3          1312     89
               4           704     95
               5            96     99

we notice that the most optimal zspage configuration for this class is when it consists of 5 physical pages, but currently we never let zspages consist of more than 4 pages. A 5-page class #96 configuration would store 13 objects of size 1568 in a single zspage, allocating 5 physical pages, as opposed to the 6 physical pages that class #100 allocates.

A higher order zspage for class #96 also changes its key characteristics: pages per zspage and objects per zspage. As a result, classes #96 and #100 are not merged anymore, which gives us a more compact zsmalloc.

Let's take a closer look at the bottom of /sys/kernel/debug/zsmalloc/zram0/classes:

class  size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0        0          0                4        0
  254  4096           0            0             0        0          0                1        0
...

For exactly the same reason (a maximum of 4 pages per zspage), the last non-huge size class is #202, which stores objects of size 3264 bytes. Hence any object larger than 3264 bytes is considered to be huge and lands in size class #254, which uses a whole physical page to store every object. To put it slightly differently: objects in huge classes don't share physical pages.

3264 bytes is too low a watermark, and we have too many huge classes: classes from #203 to #254. As with size class #96 above, higher order zspages change key characteristics for some of those huge size classes, and thus those classes become normal classes, where stored objects share physical pages.

Higher order zspages move the huge class watermark. For order 3, the huge class watermark becomes 3632 bytes:

class  size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0        0          0                4        0
  211  3408           0            0             0        0          0                5        0
  217  3504           0            0             0        0          0                6        0
  222  3584           0            0             0        0          0                7        0
  225  3632           0            0             0        0          0                8        0
  254  4096           0            0             0        0          0                1        0
...

For order 4, the huge class watermark becomes 3840 bytes:

class  size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0        0          0                4        0
  206  3328           0            0             0        0          0               13        0
  207  3344           0            0             0        0          0                9        0
  208  3360           0            0             0        0          0               14        0
  211  3408           0            0             0        0          0                5        0
  212  3424           0            0             0        0          0               16        0
  214  3456           0            0             0        0          0               11        0
  217  3504           0            0             0        0          0                6        0
  219  3536           0            0             0        0          0               13        0
  222  3584           0            0             0        0          0                7        0
  223  3600           0            0             0        0          0               15        0
  225  3632           0            0             0        0          0                8        0
  228  3680           0            0             0        0          0                9        0
  230  3712           0            0             0        0          0               10        0
  232  3744           0            0             0        0          0               11        0
  234  3776           0            0             0        0          0               12        0
  235  3792           0            0             0        0          0               13        0
  236  3808           0            0             0        0          0               14        0
  238  3840           0            0             0        0          0               15        0
  254  4096           0            0             0        0          0                1        0
...

TESTS
=====

1) ChromeOS memory pressure test
-----------------------------------------------------------------------------

This is our standard memory pressure test, designed with reproducibility in mind. zram is configured as a swap device, using the lzo-rle compression algorithm. We captured /sys/block/zram0/mm_stat after every test and rebooted the device.
Columns (per Documentation/admin-guide/blockdev/zram.rst):
orig_data_size compr_data_size mem_used_total mem_limit mem_used_max same_pages pages_compacted huge_pages

ORDER 2 (BASE)
10353639424 2981711944 3166896128 0 3543158784 579494 825135 123707
10168573952 2932288347 3106541568 0 3499085824 565187 853137 126153
9950461952 2815911234 3035693056 0 3441090560 586696 748054 122103
9892335616 2779566152 2943459328 0 3514736640 591541 650696 119621
9993949184 2814279212 3021357056 0 3336421376 582488 711744 121273
9953226752 2856382009 3025649664 0 3512893440 564559 787861 123034
9838448640 2785481728 2997575680 0 3367219200 573282 777099 122739

ORDER 3
9509138432 2706941227 2823393280 0 3389587456 535856 1011472 90223
10105245696 2882368370 3013095424 0 3296165888 563896 1059033 94808
9531236352 2666125512 2867650560 0 3396173824 567117 1126396 88807
9561812992 2714536764 2956652544 0 3310505984 548223 827322 90992
9807470592 2790315707 2908053504 0 3378315264 563670 1020933 93725
10178371584 2948838782 3071209472 0 3329548288 548533 954546 90730
9925165056 2849839413 2958274560 0 3336978432 551464 1058302 89381

ORDER 4
9444515840 2613362645 2668232704 0 3396759552 573735 1162207 83475
10129108992 2925888488 3038351360 0 3499597824 555634 1231542 84525
9876594688 2786692282 2897006592 0 3469463552 584835 1290535 84133
10012909568 2649711847 2801512448 0 3171323904 675405 750728 80424
10120966144 2866742402 2978639872 0 3257815040 587435 1093981 83587
9578790912 2671245225 2802270208 0 3376353280 545548 1047930 80895
10108588032 2888433523 2983960576 0 3316641792 571445 1290640 81402

First, we establish that orders 3 and 4 don't cause any statistically significant change in `orig_data_size` (the number of bytes we store during the test); in other words, larger zspages don't cause regressions.
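The ministat comparisons in this section rest on a two-sample Student's t-test with a pooled standard deviation. The following sketch shows that calculation on the third column of the ORDER 2 and ORDER 3 mm_stat rows, which are the samples ministat labels "usedmem". The helper names are assumptions. So is the hard-coded critical value 2.179, the two-sided 95% value for 12 degrees of freedom from standard t tables. The square root uses Newton's method only so that the sketch links without libm.

```c
#include <assert.h>
#include <stddef.h>

/* Third mm_stat column ("usedmem" in the ministat runs), ORDER 2 (BASE). */
static const double order2_usedmem[] = {
	3166896128.0, 3106541568.0, 3035693056.0, 2943459328.0,
	3021357056.0, 3025649664.0, 2997575680.0,
};

/* Same column for the ORDER 3 runs. */
static const double order3_usedmem[] = {
	2823393280.0, 3013095424.0, 2867650560.0, 2956652544.0,
	2908053504.0, 3071209472.0, 2958274560.0,
};

/* Square root by Newton's method, so the sketch links without -lm. */
static double newton_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;

	if (x <= 0.0)
		return 0.0;
	for (int i = 0; i < 200; i++)
		r = 0.5 * (r + x / r);
	return r;
}

static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Sample variance (divides by n - 1). */
static double sample_variance(const double *v, size_t n)
{
	double m, ss = 0.0;

	assert(n >= 2);
	m = mean(v, n);
	for (size_t i = 0; i < n; i++)
		ss += (v[i] - m) * (v[i] - m);
	return ss / (double)(n - 1);
}

struct ttest {
	double diff;       /* mean(y) - mean(x) */
	double pooled_s;   /* pooled standard deviation */
	double half_width; /* the "+/-" part of the confidence interval */
	double pct;        /* diff as a percentage of mean(x) */
};

/*
 * Two-sample Student's t-test with pooled variance. t_crit is the two-sided
 * critical value for nx + ny - 2 degrees of freedom at the wanted confidence.
 */
static struct ttest pooled_ttest(const double *x, size_t nx,
				 const double *y, size_t ny, double t_crit)
{
	struct ttest r;
	double dof = (double)(nx + ny - 2);
	double pooled_var = ((double)(nx - 1) * sample_variance(x, nx) +
			     (double)(ny - 1) * sample_variance(y, ny)) / dof;

	r.diff = mean(y, ny) - mean(x, nx);
	r.pooled_s = newton_sqrt(pooled_var);
	r.half_width = t_crit * r.pooled_s *
		       newton_sqrt(1.0 / (double)nx + 1.0 / (double)ny);
	r.pct = 100.0 * r.diff / mean(x, nx);
	return r;
}
```

With these inputs, pooled_ttest(order2_usedmem, 7, order3_usedmem, 7, 2.179) should give:
- a difference of about -9.98e+07 (about -3.28%)
- a half-width of about 9.22e+07
- a pooled s of about 7.91e+07

These values are in line with the order 3 usedmem figures in this section. Roughly, a difference counts as proven when its magnitude exceeds the half-width; otherwise ministat prints "No difference proven".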
T-test for order 3:
x order-2-stored
+ order-3-stored
    N           Min           Max        Median           Avg        Stddev
x   7 9.8384486e+09 1.0353639e+10 9.9532268e+09 1.0021519e+10 1.7916718e+08
+   7 9.5091384e+09 1.0178372e+10 9.8074706e+09 9.8026344e+09 2.7856206e+08
No difference proven at 95.0% confidence

T-test for order 4:
x order-2-stored
+ order-4-stored
    N           Min           Max        Median           Avg        Stddev
x   7 9.8384486e+09 1.0353639e+10 9.9532268e+09 1.0021519e+10 1.7916718e+08
+   7 9.4445158e+09 1.0129109e+10  1.001291e+10 9.8959249e+09 2.7947784e+08
No difference proven at 95.0% confidence

Next, we establish that there is a statistically significant improvement in
the `mem_used_total` metric.
T-test for order 3:
x order-2-usedmem
+ order-3-usedmem
    N           Min           Max        Median           Avg        Stddev
x   7 2.9434593e+09 3.1668961e+09 3.0256497e+09 3.0424532e+09      73235062
+   7 2.8233933e+09 3.0712095e+09 2.9566525e+09 2.9426185e+09      84630851
Difference at 95.0% confidence
    -9.98347e+07 +/- 9.21744e+07
    -3.28139% +/- 3.02961%
    (Student's t, pooled s = 7.91383e+07)

T-test for order 4:
x order-2-usedmem
+ order-4-usedmem
    N           Min           Max        Median           Avg        Stddev
x   7 2.9434593e+09 3.1668961e+09 3.0256497e+09 3.0424532e+09      73235062
+   7 2.6682327e+09 3.0383514e+09 2.8970066e+09 2.8814248e+09 1.3098053e+08
Difference at 95.0% confidence
    -1.61028e+08 +/- 1.23591e+08
    -5.29272% +/- 4.0622%
    (Student's t, pooled s = 1.06111e+08)

Order 3 zspages also show a statistically significant improvement in the
`mem_used_max` metric.
T-test for order 3:
x order-2-maxmem
+ order-3-maxmem
    N           Min           Max        Median           Avg        Stddev
x   7 3.3364214e+09 3.5431588e+09 3.4990858e+09 3.4592294e+09      80073158
+   7 3.2961659e+09 3.3961738e+09 3.3369784e+09 3.3481822e+09      39840377
Difference at 95.0% confidence
    -1.11047e+08 +/- 7.36589e+07
    -3.21017% +/- 2.12934%
    (Student's t, pooled s = 6.32415e+07)

Order 4 zspages, on the other hand, do not show any statistically significant
improvement in the `mem_used_max` metric.

T-test for order 4:
x order-2-maxmem
+ order-4-maxmem
    N           Min           Max        Median           Avg        Stddev
x   7 3.3364214e+09 3.5431588e+09 3.4990858e+09 3.4592294e+09      80073158
+   7 3.1713239e+09 3.4995978e+09 3.3763533e+09 3.3554221e+09 1.1609062e+08
No difference proven at 95.0% confidence

Overall, with a sufficient level of confidence, order 3 zspages appear to be
beneficial for this particular use case and its data patterns. As expected,
we also observed fewer huge pages when zsmalloc is configured with order 3
and order 4 zspages, for the reason already explained.

2) Synthetic test
-----------------------------------------------------------------------------

The test untars linux-6.0.tar.xz and compiles the kernel. zram is configured
as a block device with an ext4 file system and the lzo-rle compression
algorithm. We captured /sys/block/zram0/mm_stat after every test and
rebooted the VM.
orig_data_size compr_data_size mem_used_total mem_limit mem_used_max same_pages pages_compacted huge_pages huge_pages_since

ORDER 2 (BASE)
    1691807744       628091753      655187968         0    655187968         59               0      34042            34043
    1691803648       628089105      655159296         0    655159296         60               0      34043            34043
    1691795456       628087429      655151104         0    655151104         59               0      34046            34046
    1691799552       628093723      655216640         0    655216640         60               0      34044            34044

ORDER 3
    1691787264       627781464      641740800         0    641740800         59               0      33591            33591
    1691795456       627794239      641789952         0    641789952         59               0      33591            33591
    1691811840       627788466      641691648         0    641691648         60               0      33591            33591
    1691791360       627790682      641781760         0    641781760         59               0      33591            33591

ORDER 4
    1691807744       627729506      639627264         0    639627264         59               0      33432            33432
    1691820032       627731485      639606784         0    639606784         59               0      33432            33432
    1691799552       627725753      639623168         0    639623168         59               0      33432            33433
    1691820032       627734080      639746048         0    639746048         61               0      33432            33432

Order 3 and order 4 show a statistically significant improvement in the
`mem_used_total` metric.

T-test for order 3:
x order-2-usedmem-comp
+ order-3-usedmem-comp
    N           Min           Max        Median           Avg        Stddev
x   4  6.551511e+08 6.5521664e+08 6.5518797e+08 6.5517875e+08     29795.878
+   4 6.4169165e+08 6.4178995e+08 6.4178176e+08 6.4175104e+08         45056
Difference at 95.0% confidence
    -1.34277e+07 +/- 66089.8
    -2.04947% +/- 0.0100873%
    (Student's t, pooled s = 38195.8)

T-test for order 4:
x order-2-usedmem-comp
+ order-4-usedmem-comp
    N           Min           Max        Median           Avg        Stddev
x   4  6.551511e+08 6.5521664e+08 6.5518797e+08 6.5517875e+08     29795.878
+   4 6.3960678e+08 6.3974605e+08 6.3962726e+08 6.3965082e+08     64101.637
Difference at 95.0% confidence
    -1.55279e+07 +/- 86486.9
    -2.37003% +/- 0.0132005%
    (Student's t, pooled s = 49984.1)

Order 3 and order 4 also show a statistically significant improvement in the
`mem_used_max` metric.

T-test for order 3:
x order-2-maxmem-comp
+ order-3-maxmem-comp
    N           Min           Max        Median           Avg        Stddev
x   4  6.551511e+08 6.5521664e+08 6.5518797e+08 6.5517875e+08     29795.878
+   4 6.4169165e+08 6.4178995e+08 6.4178176e+08 6.4175104e+08         45056
Difference at 95.0% confidence
    -1.34277e+07 +/- 66089.8
    -2.04947% +/- 0.0100873%
    (Student's t, pooled s = 38195.8)

T-test for order 4:
x order-2-maxmem-comp
+ order-4-maxmem-comp
    N           Min           Max        Median           Avg        Stddev
x   4  6.551511e+08 6.5521664e+08 6.5518797e+08 6.5517875e+08     29795.878
+   4 6.3960678e+08 6.3974605e+08 6.3962726e+08 6.3965082e+08     64101.637
Difference at 95.0% confidence
    -1.55279e+07 +/- 86486.9
    -2.37003% +/- 0.0132005%
    (Student's t, pooled s = 49984.1)

This test tends to benefit more from order 4 zspages because of its data
patterns. Data patterns that generate a considerable number of poorly
compressible objects benefit from a higher `huge_class_size` watermark,
which is achieved with order 4 zspages.

Link: https://lkml.kernel.org/r/20221024161213.3221725-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20221024161213.3221725-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
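The ministat summaries for the ChromeOS memory pressure test can be cross-checked against the raw mm_stat rows. The Python sketch below is only an illustration; ministat itself is a separate tool. It labels the first eight mm_stat fields in the order given by Documentation/admin-guide/blockdev/zram.rst. It assumes a two-sided 95% Student's t critical value of 2.179 for 7 + 7 - 2 = 12 degrees of freedom. For the order 2 vs. order 3 `mem_used_total` comparison it lands on approximately the -9.98347e+07 +/- 9.21744e+07 (-3.28139%) difference reported in the order 3 `usedmem` T-test.

```python
import math
import statistics

# First eight fields of /sys/block/zram0/mm_stat, in the order listed by
# Documentation/admin-guide/blockdev/zram.rst.
MM_STAT_FIELDS = (
    "orig_data_size", "compr_data_size", "mem_used_total", "mem_limit",
    "mem_used_max", "same_pages", "pages_compacted", "huge_pages",
)

# ChromeOS memory pressure test rows, copied from the ORDER 2 and ORDER 3 tables.
ORDER2 = """
10353639424 2981711944 3166896128 0 3543158784 579494 825135 123707
10168573952 2932288347 3106541568 0 3499085824 565187 853137 126153
9950461952 2815911234 3035693056 0 3441090560 586696 748054 122103
9892335616 2779566152 2943459328 0 3514736640 591541 650696 119621
9993949184 2814279212 3021357056 0 3336421376 582488 711744 121273
9953226752 2856382009 3025649664 0 3512893440 564559 787861 123034
9838448640 2785481728 2997575680 0 3367219200 573282 777099 122739
"""
ORDER3 = """
9509138432 2706941227 2823393280 0 3389587456 535856 1011472 90223
10105245696 2882368370 3013095424 0 3296165888 563896 1059033 94808
9531236352 2666125512 2867650560 0 3396173824 567117 1126396 88807
9561812992 2714536764 2956652544 0 3310505984 548223 827322 90992
9807470592 2790315707 2908053504 0 3378315264 563670 1020933 93725
10178371584 2948838782 3071209472 0 3329548288 548533 954546 90730
9925165056 2849839413 2958274560 0 3336978432 551464 1058302 89381
"""


def parse_mm_stat(line):
    """Map one whitespace-separated mm_stat row to its named fields."""
    return dict(zip(MM_STAT_FIELDS, map(int, line.split())))


def column(block, field):
    """Extract one named field from every row of a table."""
    return [parse_mm_stat(row)[field] for row in block.strip().splitlines()]


def student_t(base, new, t_crit):
    """Pooled-variance Student's t interval for mean(new) - mean(base)."""
    n1, n2 = len(base), len(new)
    s1, s2 = statistics.stdev(base), statistics.stdev(new)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    diff = statistics.mean(new) - statistics.mean(base)
    half = t_crit * pooled * math.sqrt(1 / n1 + 1 / n2)
    pct = diff / statistics.mean(base) * 100
    return diff, half, pct, pooled


base = column(ORDER2, "mem_used_total")
new = column(ORDER3, "mem_used_total")
# 2.179: two-sided 95% critical value of Student's t for 12 degrees of freedom.
diff, half, pct, pooled = student_t(base, new, t_crit=2.179)
print(f"{diff:.6g} +/- {half:.6g}  {pct:.6g}%  pooled s = {pooled:.6g}")
```

The same helpers can be pointed at `mem_used_max` or at the order 4 rows by changing the field name or the data block.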
zsmalloc has 255 size classes. Size classes contain a number of zspages,
which store objects of the same size. A zspage can consist of up to four
physical pages. The exact (most optimal) zspage size is calculated for each
size class during zsmalloc pool creation.

As a reasonable optimization, zsmalloc merges size classes that have similar
characteristics: the number of pages per zspage and the number of objects a
zspage can store.

For example, let's look at the following size classes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
..
   94  1536           0            0             0          0          0                3        0
  100  1632           0            0             0          0          0                2        0
..

Size classes #95-99 are merged with size class #100. That is, each time we
store an object of, say, 1568 bytes, instead of using class #96 we end up
storing it in size class #100. Class #100 is for objects of 1632 bytes,
hence every 1568-byte object wastes 1632 - 1568 = 64 bytes. Class #100
zspages consist of 2 physical pages and can hold 5 objects. When we need to
store, say, 13 objects of size 1568, we end up allocating three zspages; in
other words, 6 physical pages.

However, let's look closer at size class #96 (which should hold objects of
size 1568 bytes) and trace get_pages_per_zspage():

pages per zspage   wasted bytes   used%
       1               960          76
       2               352          95
       3              1312          89
       4               704          95
       5                96          99

We notice that the most optimal zspage configuration for this class is 5
physical pages, but currently we never let a zspage consist of more than 4
pages. A 5-page class #96 configuration would store 13 objects of size 1568
in a single zspage, allocating 5 physical pages, as opposed to the 6
physical pages that class #100 allocates.

A higher order zspage for class #96 also changes its key characteristics:
pages per zspage and objects per zspage. As a result, classes #96 and #100
are not merged anymore, which gives us a more compact zsmalloc.
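The get_pages_per_zspage() trace for class #96 is plain integer arithmetic. The Python sketch below re-derives it; it is not the kernel's C code. It assumes 4 KiB pages. It also assumes the page count with the highest integer used% wins, with ties going to the smaller count. That tie rule is consistent with class #96 currently sharing the 2-page, 5-object shape of class #100, even though 4 pages also reach 95%.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def zspage_trace(class_size, max_pages):
    """(pages, wasted bytes, used%) for zspages of 1..max_pages pages."""
    rows = []
    for pages in range(1, max_pages + 1):
        zspage_size = pages * PAGE_SIZE
        waste = zspage_size % class_size  # tail that cannot fit one more object
        used_pc = (zspage_size - waste) * 100 // zspage_size
        rows.append((pages, waste, used_pc))
    return rows


def pages_per_zspage(class_size, max_pages):
    """Page count with the highest integer used%; ties keep the smaller count."""
    best_pages, best_pc = 1, 0
    for pages, _waste, used_pc in zspage_trace(class_size, max_pages):
        if used_pc > best_pc:
            best_pages, best_pc = pages, used_pc
    return best_pages


if __name__ == "__main__":
    for pages, waste, used_pc in zspage_trace(1568, 5):
        print(f"{pages:>5} {waste:>6} {used_pc:>5}")
    # 4 pages is the current limit; 8 pages corresponds to order 3 zspages.
    print(pages_per_zspage(1568, 4), pages_per_zspage(1568, 8))
```

With a 4-page limit, class #96 ends up with 2-page zspages holding 5 objects, the same shape as class #100. Allowing 8 pages selects 5 pages, which hold 20480 // 1568 = 13 objects.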
Of course, the described effect does not apply only to size classes #96 and
#100. We still merge classes, but less often. In other words, classes are
grouped in a more compact way, which decreases memory wastage:

zspage order    # unique size classes
     2                   69
     3                  123
     4                  191

Let's take a closer look at the bottom of
/sys/kernel/debug/zsmalloc/zram0/classes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  254  4096           0            0             0          0          0                1        0
...

For exactly the same reason (a maximum of 4 pages per zspage), the last
non-huge size class is #202, which stores objects of size 3264 bytes. Any
object larger than 3264 bytes is, hence, considered to be huge and lands in
size class #254, which uses a whole physical page to store every object. To
put it slightly differently, objects in huge classes don't share physical
pages.

3264 bytes is too low a watermark, and we have too many huge classes:
classes #203 to #254. As with size class #96 above, higher order zspages
change key characteristics for some of those huge size classes, and thus
those classes become normal classes, where stored objects share physical
pages.

Hence yet another consequence of higher order zspages: they move the huge
size class watermark up, so we have fewer huge classes and store large
objects in a more compact way.

For order 3, the huge class watermark becomes 3632 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  211  3408           0            0             0          0          0                5        0
  217  3504           0            0             0          0          0                6        0
  222  3584           0            0             0          0          0                7        0
  225  3632           0            0             0          0          0                8        0
  254  4096           0            0             0          0          0                1        0
...

For order 4, the huge class watermark becomes 3840 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  206  3328           0            0             0          0          0               13        0
  207  3344           0            0             0          0          0                9        0
  208  3360           0            0             0          0          0               14        0
  211  3408           0            0             0          0          0                5        0
  212  3424           0            0             0          0          0               16        0
  214  3456           0            0             0          0          0               11        0
  217  3504           0            0             0          0          0                6        0
  219  3536           0            0             0          0          0               13        0
  222  3584           0            0             0          0          0                7        0
  223  3600           0            0             0          0          0               15        0
  225  3632           0            0             0          0          0                8        0
  228  3680           0            0             0          0          0                9        0
  230  3712           0            0             0          0          0               10        0
  232  3744           0            0             0          0          0               11        0
  234  3776           0            0             0          0          0               12        0
  235  3792           0            0             0          0          0               13        0
  236  3808           0            0             0          0          0               14        0
  238  3840           0            0             0          0          0               15        0
  254  4096           0            0             0          0          0                1        0
...

TESTS
=====

1) ChromeOS memory pressure test
=============================================================================

This is our standard memory pressure test, designed with reproducibility in
mind. zram is configured as a swap device with the lzo-rle compression
algorithm. We captured /sys/block/zram0/mm_stat after every test and
rebooted the device.

Columns, in the order given by Documentation/admin-guide/blockdev/zram.rst:

orig_data_size compr_data_size mem_used_total mem_limit mem_used_max same_pages pages_compacted huge_pages

ORDER 2 (BASE) zspage
   10353639424      2981711944     3166896128         0   3543158784     579494          825135     123707
   10168573952      2932288347     3106541568         0   3499085824     565187          853137     126153
    9950461952      2815911234     3035693056         0   3441090560     586696          748054     122103
    9892335616      2779566152     2943459328         0   3514736640     591541          650696     119621
    9993949184      2814279212     3021357056         0   3336421376     582488          711744     121273
    9953226752      2856382009     3025649664         0   3512893440     564559          787861     123034
    9838448640      2785481728     2997575680         0   3367219200     573282          777099     122739

ORDER 3 zspage
    9509138432      2706941227     2823393280         0   3389587456     535856         1011472      90223
   10105245696      2882368370     3013095424         0   3296165888     563896         1059033      94808
    9531236352      2666125512     2867650560         0   3396173824     567117         1126396      88807
    9561812992      2714536764     2956652544         0   3310505984     548223          827322      90992
    9807470592      2790315707     2908053504         0   3378315264     563670         1020933      93725
   10178371584      2948838782     3071209472         0   3329548288     548533          954546      90730
    9925165056      2849839413     2958274560         0   3336978432     551464         1058302      89381

ORDER 4 zspage
    9444515840      2613362645     2668232704         0   3396759552     573735         1162207      83475
   10129108992      2925888488     3038351360         0   3499597824     555634         1231542      84525
    9876594688      2786692282     2897006592         0   3469463552     584835         1290535      84133
   10012909568      2649711847     2801512448         0   3171323904     675405          750728      80424
   10120966144      2866742402     2978639872         0   3257815040     587435         1093981      83587
    9578790912      2671245225     2802270208         0   3376353280     545548         1047930      80895
   10108588032      2888433523     2983960576         0   3316641792     571445         1290640      81402

First, we establish that order 3 and order 4 zspages don't cause any
statistically significant change in `orig_data_size` (the number of bytes
we store during the test). In other words, larger zspages don't cause
regressions.

T-test for order 3:
x order-2-stored
+ order-3-stored
    N           Min           Max        Median           Avg        Stddev
x   7 9.8384486e+09 1.0353639e+10 9.9532268e+09 1.0021519e+10 1.7916718e+08
+   7 9.5091384e+09 1.0178372e+10 9.8074706e+09 9.8026344e+09 2.7856206e+08
No difference proven at 95.0% confidence

T-test for order 4:
x order-2-stored
+ order-4-stored
    N           Min           Max        Median           Avg        Stddev
x   7 9.8384486e+09 1.0353639e+10 9.9532268e+09 1.0021519e+10 1.7916718e+08
+   7 9.4445158e+09 1.0129109e+10  1.001291e+10 9.8959249e+09 2.7947784e+08
No difference proven at 95.0% confidence

Next, we establish that there is a statistically significant improvement in
the `mem_used_total` metric.
T-test for order 3:
x order-2-usedmem
+ order-3-usedmem
    N           Min           Max        Median           Avg        Stddev
x   7 2.9434593e+09 3.1668961e+09 3.0256497e+09 3.0424532e+09      73235062
+   7 2.8233933e+09 3.0712095e+09 2.9566525e+09 2.9426185e+09      84630851
Difference at 95.0% confidence
    -9.98347e+07 +/- 9.21744e+07
    -3.28139% +/- 3.02961%
    (Student's t, pooled s = 7.91383e+07)

T-test for order 4:
x order-2-usedmem
+ order-4-usedmem
    N           Min           Max        Median           Avg        Stddev
x   7 2.9434593e+09 3.1668961e+09 3.0256497e+09 3.0424532e+09      73235062
+   7 2.6682327e+09 3.0383514e+09 2.8970066e+09 2.8814248e+09 1.3098053e+08
Difference at 95.0% confidence
    -1.61028e+08 +/- 1.23591e+08
    -5.29272% +/- 4.0622%
    (Student's t, pooled s = 1.06111e+08)

Order 3 zspages also show a statistically significant improvement in the
`mem_used_max` metric.
T-test for order 3:
x order-2-maxmem
+ order-3-maxmem
    N           Min           Max        Median           Avg        Stddev
x   7 3.3364214e+09 3.5431588e+09 3.4990858e+09 3.4592294e+09      80073158
+   7 3.2961659e+09 3.3961738e+09 3.3369784e+09 3.3481822e+09      39840377
Difference at 95.0% confidence
    -1.11047e+08 +/- 7.36589e+07
    -3.21017% +/- 2.12934%
    (Student's t, pooled s = 6.32415e+07)

Order 4 zspages, on the other hand, do not show any statistically significant
improvement in the `mem_used_max` metric.

T-test for order 4:
x order-2-maxmem
+ order-4-maxmem
    N           Min           Max        Median           Avg        Stddev
x   7 3.3364214e+09 3.5431588e+09 3.4990858e+09 3.4592294e+09      80073158
+   7 3.1713239e+09 3.4995978e+09 3.3763533e+09 3.3554221e+09 1.1609062e+08
No difference proven at 95.0% confidence

Overall, with a sufficient level of confidence, order 3 zspages appear to be
beneficial for this particular use case and its data patterns. As expected,
we also observed fewer huge pages when zsmalloc is configured with order 3
and order 4 zspages, for the reason already explained.

2) Synthetic test
=============================================================================

The test untars linux-6.0.tar.xz and compiles the kernel. zram is configured
as a block device with an ext4 file system and the lzo-rle compression
algorithm. We captured /sys/block/zram0/mm_stat after every test and
rebooted the VM.
orig_data_size compr_data_size mem_used_total mem_limit mem_used_max same_pages pages_compacted huge_pages

ORDER 2 (BASE) zspage
    1691791360       628086729      655171584         0    655171584         60               0      34043
    1691787264       628089196      655175680         0    655175680         60               0      34046
    1691803648       628098840      655187968         0    655187968         59               0      34047
    1691795456       628091503      655183872         0    655183872         60               0      34044
    1691799552       628086877      655183872         0    655183872         60               0      34047

ORDER 3 zspage
    1691803648       627792993      641794048         0    641794048         60               0      33591
    1691787264       627779342      641708032         0    641708032         59               0      33591
    1691811840       627786616      641769472         0    641769472         60               0      33591
    1691803648       627794468      641818624         0    641818624         59               0      33592
    1691783168       627780882      641794048         0    641794048         61               0      33591

ORDER 4 zspage
    1691803648       627726635      639655936         0    639655936         60               0      33435
    1691811840       627733348      639643648         0    639643648         61               0      33434
    1691795456       627726290      639614976         0    639614976         60               0      33435
    1691803648       627730458      639688704         0    639688704         60               0      33434
    1691811840       627727771      639688704         0    639688704         60               0      33434

Order 3 and order 4 show a statistically significant improvement in the
`mem_used_max` metric.
T-test for order 3:
x order-2-maxmem
+ order-3-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08     42210.666
Difference at 95.0% confidence
    -1.34038e+07 +/- 44080.7
    -2.04581% +/- 0.00672802%
    (Student's t, pooled s = 30224.5)

T-test for order 4:
x order-2-maxmem
+ order-4-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.3961498e+08  6.396887e+08 6.3965594e+08 6.3965839e+08     31408.602
Difference at 95.0% confidence
    -1.55222e+07 +/- 33126.2
    -2.36915% +/- 0.00505604%
    (Student's t, pooled s = 22713.4)

This test tends to benefit more from order 4 zspages because of its data
patterns.

zsmalloc object distribution analysis
=============================================================================

Order 2 (4 pages per zspage) tends to put many objects in size class 2048,
which is merged with size classes #112-#125:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            0          6146       6146       1756                2        0
   74  1216           0            1          4560       4552       1368                3        0
   76  1248           0            1          2938       2934        904                4        0
   83  1360           0            0         10971      10971       3657                1        0
   91  1488           0            0         16126      16126       5864                4        0
   94  1536           0            1          5912       5908       2217                3        0
  100  1632           0            0         11990      11990       4796                2        0
  107  1744           0            1         15771      15768       6759                3        0
  111  1808           0            1         10386      10380       4616                4        0
  126  2048           0            0         45444      45444      22722                1        0
  144  2336           0            0         47446      47446      27112                4        0
  151  2448           1            0         10760      10759       6456                3        0
  168  2720           0            0         10173      10173       6782                2        0
  190  3072           0            1          1700       1697       1275                3        0
  202  3264           0            1           290        286        232                4        0
  254  4096           0            0         34051      34051      34051                1        0

Order 3 (8 pages per zspage) changed the pool characteristics and unmerged
some of the size classes. This resulted in fewer objects being put into size
class 2048, because lower size classes are now available for more compact
object storage:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            1          2996       2994        856                2        0
   72  1184           0            1          1632       1609        476                7        0
   73  1200           1            0          1445       1442        425                5        0
   74  1216           0            0          1510       1510        453                3        0
   75  1232           0            1          1495       1479        455                7        0
   76  1248           0            1          1456       1451        448                4        0
   78  1280           0            1          3040       3033        950                5        0
   79  1296           0            1          1584       1571        504                7        0
   83  1360           0            0          6375       6375       2125                1        0
   84  1376           0            1          1817       1796        632                8        0
   87  1424           0            1          6020       6006       2107                7        0
   88  1440           0            1          2108       2101        744                6        0
   89  1456           0            1          2072       2064        740                5        0
   91  1488           0            1          4169       4159       1516                4        0
   92  1504           0            1          2014       2007        742                7        0
   94  1536           0            1          3904       3900       1464                3        0
   95  1552           0            1          1890       1873        720                8        0
   96  1568           0            1          1963       1958        755                5        0
   97  1584           0            1          1980       1974        770                7        0
  100  1632           0            1          6190       6187       2476                2        0
  103  1680           0            0          6477       6477       2667                7        0
  104  1696           0            1          2256       2253        940                5        0
  105  1712           0            1          2356       2340        992                8        0
  107  1744           1            0          4697       4696       2013                3        0
  110  1792           0            1          7744       7734       3388                7        0
  111  1808           0            1          2655       2649       1180                4        0
  114  1856           0            1          8371       8365       3805                5        0
  116  1888           1            0          5863       5862       2706                6        0
  117  1904           0            1          2955       2942       1379                7        0
  118  1920           0            1          3009       2997       1416                8        0
  126  2048           0            0         25276      25276      12638                1        0
  128  2080           0            1          6060       6052       3232                8        0
  129  2096           1            0          3081       3080       1659                7        0
  134  2176           0            1         14835      14830       7912                8        0
  135  2192           0            1          2769       2758       1491                7        0
  137  2224           0            1          5082       5077       2772                6        0
  140  2272           0            1          7236       7232       4020                5        0
  144  2336           0            1          8428       8423       4816                4        0
  147  2384           0            1          5316       5313       3101                7        0
  151  2448           0            1          5445       5443       3267                3        0
  155  2512           0            0          4121       4121       2536                8        0
  158  2560           0            1          2208       2205       1380                5        0
  160  2592           0            0          1133       1133        721                7        0
  168  2720           0            0          2712       2712       1808                2        0
  177  2864           1            0          1100       1098        770                7        0
  180  2912           0            1           189        183        135                5        0
  184  2976           0            1           176        166        128                8        0
  190  3072           0            0           252        252        189                3        0
  197  3184           0            1           198        192        154                7        0
  202  3264           0            1           100         96         80                4        0
  211  3408           0            1           210        208        175                5        0
  217  3504           0            1            98         94         84                6        0
  222  3584           0            0           104        104         91                7        0
  225  3632           0            1            54         50         48                8        0
  254  4096           0            0         33591      33591      33591                1        0

Note that the huge size watermark has moved up to 3632 bytes. A number of
classes that were previously merged with the huge class are now normal
classes. For instance, size class #211 holds 210 objects of size 3408 and
uses 175 physical pages, while previously we would have used 210 physical
pages for those objects.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
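The pages_used column of the order 3 table follows from simple arithmetic. A zspage of pages_per_zspage pages holds pages_per_zspage * 4096 / size objects, rounded down, and a class always consumes whole zspages. The minimal Python sketch below assumes 4 KiB pages and ignores any allocator metadata. The helper names are illustrative, not zsmalloc's. It reproduces the class #211 numbers (175 pages instead of 210) along with other rows of the table.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def objs_per_zspage(size, pages_per_zspage):
    """Number of `size`-byte objects that fit into one zspage."""
    return pages_per_zspage * PAGE_SIZE // size


def physical_pages(size, pages_per_zspage, nr_objects):
    """Physical pages needed for `nr_objects`, rounded up to whole zspages."""
    per_zspage = objs_per_zspage(size, pages_per_zspage)
    zspages = -(-nr_objects // per_zspage)  # ceiling division
    return zspages * pages_per_zspage


if __name__ == "__main__":
    # (size, pages_per_zspage, obj_allocated) rows from the order 3 table.
    for size, ppz, nr in ((3408, 5, 210), (1568, 5, 1963), (2048, 1, 25276)):
        print(size, physical_pages(size, ppz, nr))
    # The same 210 objects in the 4096-byte huge class: one object per page.
    print(4096, physical_pages(4096, 1, 210))
```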
zsmalloc has 255 size classes. Size classes contain a number of zspages,
which store objects of the same size. A zspage can consist of up to four
physical pages. The exact (most optimal) zspage size is calculated for each
size class during zsmalloc pool creation.

As a reasonable optimization, zsmalloc merges size classes that have similar
characteristics: the number of pages per zspage and the number of objects a
zspage can store.

For example, let's look at the following size classes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
..
   94  1536           0            0             0          0          0                3        0
  100  1632           0            0             0          0          0                2        0
..

Size classes #95-99 are merged with size class #100. That is, each time we
store an object of, say, 1568 bytes, instead of using class #96 we end up
storing it in size class #100. Class #100 is for objects of 1632 bytes,
hence every 1568-byte object wastes 1632 - 1568 = 64 bytes. Class #100
zspages consist of 2 physical pages and can hold 5 objects. When we need to
store, say, 13 objects of size 1568, we end up allocating three zspages; in
other words, 6 physical pages.

However, let's look closer at size class #96 (which should hold objects of
size 1568 bytes) and trace get_pages_per_zspage():

pages per zspage   wasted bytes   used%
       1               960          76
       2               352          95
       3              1312          89
       4               704          95
       5                96          99

We notice that the most optimal zspage configuration for this class is 5
physical pages, but currently we never let a zspage consist of more than 4
pages. A 5-page class #96 configuration would store 13 objects of size 1568
in a single zspage, allocating 5 physical pages, as opposed to the 6
physical pages that class #100 allocates.

A higher order zspage for class #96 also changes its key characteristics:
pages per zspage and objects per zspage. As a result, classes #96 and #100
are not merged anymore, which gives us a more compact zsmalloc.
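The class merging rule can be sketched as a single pass over the classes from the largest down. A class is folded into the previously created class when both would use the same number of pages per zspage and hold the same number of objects per zspage. The Python sketch below assumes 4 KiB pages, a 32-byte class #0 and a 16-byte class step; these reproduce the sizes in the tables, e.g. #96 = 1568 and #100 = 1632. It also uses the page selection rule from the class #96 trace. The function names are illustrative, not zsmalloc's.

```python
PAGE_SIZE = 4096     # assumption: 4 KiB pages
MIN_SIZE = 32        # assumption: size of class #0
CLASS_DELTA = 16     # assumption: size step between classes
NR_CLASSES = 255     # classes #0..#254


def class_size(index):
    """Object size served by class `index`."""
    return MIN_SIZE + index * CLASS_DELTA


def pages_per_zspage(size, max_pages):
    """Page count with the highest integer used%; ties keep the smaller count."""
    best_pages, best_pc = 1, 0
    for pages in range(1, max_pages + 1):
        zspage_size = pages * PAGE_SIZE
        used_pc = (zspage_size - zspage_size % size) * 100 // zspage_size
        if used_pc > best_pc:
            best_pages, best_pc = pages, used_pc
    return best_pages


def merge_map(max_pages):
    """For every class index, the index of the class that actually serves it.

    Classes are walked from the largest down; a class is folded into the
    previously created (larger) class when both share the same
    (pages per zspage, objects per zspage) pair.
    """
    owner = [0] * NR_CLASSES
    prev_key, prev_index = None, None
    for index in range(NR_CLASSES - 1, -1, -1):
        size = class_size(index)
        pages = pages_per_zspage(size, max_pages)
        key = (pages, pages * PAGE_SIZE // size)
        if key != prev_key:
            prev_key, prev_index = key, index
        owner[index] = prev_index
    return owner


if __name__ == "__main__":
    for max_pages in (4, 8):
        owner = merge_map(max_pages)
        print(max_pages, "class #96 ->", owner[96], "class #203 ->", owner[203])
```

With a 4-page limit, classes #95-#100 collapse into #100 and the classes above #202 collapse into #254. With 8 pages, class #96 stands on its own.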
Of course, the described effect does not apply only to size classes #96 and
#100. We still merge classes, but less often. In other words, classes are
grouped in a more compact way, which decreases memory wastage:

zspage order    # unique size classes
     2                   69
     3                  123
     4                  191

Let's take a closer look at the bottom of
/sys/kernel/debug/zsmalloc/zram0/classes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  254  4096           0            0             0          0          0                1        0
...

For exactly the same reason (a maximum of 4 pages per zspage), the last
non-huge size class is #202, which stores objects of size 3264 bytes. Any
object larger than 3264 bytes is, hence, considered to be huge and lands in
size class #254, which uses a whole physical page to store every object. To
put it slightly differently, objects in huge classes don't share physical
pages.

3264 bytes is too low a watermark, and we have too many huge classes:
classes #203 to #254. As with size class #96 above, higher order zspages
change key characteristics for some of those huge size classes, and thus
those classes become normal classes, where stored objects share physical
pages.

Hence yet another consequence of higher order zspages: they move the huge
size class watermark up, so we have fewer huge classes and store large
objects in a more compact way.

For order 3, the huge class watermark becomes 3632 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  211  3408           0            0             0          0          0                5        0
  217  3504           0            0             0          0          0                6        0
  222  3584           0            0             0          0          0                7        0
  225  3632           0            0             0          0          0                8        0
  254  4096           0            0             0          0          0                1        0
...

For order 4, the huge class watermark becomes 3840 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  206  3328           0            0             0          0          0               13        0
  207  3344           0            0             0          0          0                9        0
  208  3360           0            0             0          0          0               14        0
  211  3408           0            0             0          0          0                5        0
  212  3424           0            0             0          0          0               16        0
  214  3456           0            0             0          0          0               11        0
  217  3504           0            0             0          0          0                6        0
  219  3536           0            0             0          0          0               13        0
  222  3584           0            0             0          0          0                7        0
  223  3600           0            0             0          0          0               15        0
  225  3632           0            0             0          0          0                8        0
  228  3680           0            0             0          0          0                9        0
  230  3712           0            0             0          0          0               10        0
  232  3744           0            0             0          0          0               11        0
  234  3776           0            0             0          0          0               12        0
  235  3792           0            0             0          0          0               13        0
  236  3808           0            0             0          0          0               14        0
  238  3840           0            0             0          0          0               15        0
  254  4096           0            0             0          0          0                1        0
...

TESTS
=====

The test untars linux-6.0.tar.xz and compiles the kernel. zram is configured
as a block device with an ext4 file system and the lzo-rle compression
algorithm. We captured /sys/block/zram0/mm_stat after every test and
rebooted the VM.

orig_data_size compr_data_size mem_used_total mem_limit mem_used_max same_pages pages_compacted huge_pages

ORDER 2 (BASE) zspage
    1691791360       628086729      655171584         0    655171584         60               0      34043
    1691787264       628089196      655175680         0    655175680         60               0      34046
    1691803648       628098840      655187968         0    655187968         59               0      34047
    1691795456       628091503      655183872         0    655183872         60               0      34044
    1691799552       628086877      655183872         0    655183872         60               0      34047

ORDER 3 zspage
    1691803648       627792993      641794048         0    641794048         60               0      33591
    1691787264       627779342      641708032         0    641708032         59               0      33591
    1691811840       627786616      641769472         0    641769472         60               0      33591
    1691803648       627794468      641818624         0    641818624         59               0      33592
    1691783168       627780882      641794048         0    641794048         61               0      33591

ORDER 4 zspage
    1691803648       627726635      639655936         0    639655936         60               0      33435
    1691811840       627733348      639643648         0    639643648         61               0      33434
    1691795456       627726290      639614976         0    639614976         60               0      33435
    1691803648       627730458      639688704         0    639688704         60               0      33434
    1691811840       627727771      639688704         0    639688704         60               0      33434

Order 3 and order 4 show a statistically significant improvement in the
`mem_used_max` metric.
T-test for order 3:
x order-2-maxmem
+ order-3-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08     42210.666
Difference at 95.0% confidence
    -1.34038e+07 +/- 44080.7
    -2.04581% +/- 0.00672802%
    (Student's t, pooled s = 30224.5)

T-test for order 4:
x order-2-maxmem
+ order-4-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.3961498e+08  6.396887e+08 6.3965594e+08 6.3965839e+08     31408.602
Difference at 95.0% confidence
    -1.55222e+07 +/- 33126.2
    -2.36915% +/- 0.00505604%
    (Student's t, pooled s = 22713.4)

This test tends to benefit more from order 4 zspages because of its data
patterns.

zsmalloc object distribution analysis
=============================================================================

Order 2 (4 pages per zspage) tends to put many objects in size class 2048,
which is merged with size classes #112-#125:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            0          6146       6146       1756                2        0
   74  1216           0            1          4560       4552       1368                3        0
   76  1248           0            1          2938       2934        904                4        0
   83  1360           0            0         10971      10971       3657                1        0
   91  1488           0            0         16126      16126       5864                4        0
   94  1536           0            1          5912       5908       2217                3        0
  100  1632           0            0         11990      11990       4796                2        0
  107  1744           0            1         15771      15768       6759                3        0
  111  1808           0            1         10386      10380       4616                4        0
  126  2048           0            0         45444      45444      22722                1        0
  144  2336           0            0         47446      47446      27112                4        0
  151  2448           1            0         10760      10759       6456                3        0
  168  2720           0            0         10173      10173       6782                2        0
  190  3072           0            1          1700       1697       1275                3        0
  202  3264           0            1           290        286        232                4        0
  254  4096           0            0         34051      34051      34051                1        0

Order 3 (8 pages per zspage) changed the pool characteristics and unmerged
some of the size classes. This resulted in fewer objects being put into size
class 2048, because lower size classes are now available for more compact
object storage:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            1          2996       2994        856                2        0
   72  1184           0            1          1632       1609        476                7        0
   73  1200           1            0          1445       1442        425                5        0
   74  1216           0            0          1510       1510        453                3        0
   75  1232           0            1          1495       1479        455                7        0
   76  1248           0            1          1456       1451        448                4        0
   78  1280           0            1          3040       3033        950                5        0
   79  1296           0            1          1584       1571        504                7        0
   83  1360           0            0          6375       6375       2125                1        0
   84  1376           0            1          1817       1796        632                8        0
   87  1424           0            1          6020       6006       2107                7        0
   88  1440           0            1          2108       2101        744                6        0
   89  1456           0            1          2072       2064        740                5        0
   91  1488           0            1          4169       4159       1516                4        0
   92  1504           0            1          2014       2007        742                7        0
   94  1536           0            1          3904       3900       1464                3        0
   95  1552           0            1          1890       1873        720                8        0
   96  1568           0            1          1963       1958        755                5        0
   97  1584           0            1          1980       1974        770                7        0
  100  1632           0            1          6190       6187       2476                2        0
  103  1680           0            0          6477       6477       2667                7        0
  104  1696           0            1          2256       2253        940                5        0
  105  1712           0            1          2356       2340        992                8        0
  107  1744           1            0          4697       4696       2013                3        0
  110  1792           0            1          7744       7734       3388                7        0
  111  1808           0            1          2655       2649       1180                4        0
  114  1856           0            1          8371       8365       3805                5        0
  116  1888           1            0          5863       5862       2706                6        0
  117  1904           0            1          2955       2942       1379                7        0
  118  1920           0            1          3009       2997       1416                8        0
  126  2048           0            0         25276      25276      12638                1        0
  128  2080           0            1          6060       6052       3232                8        0
  129  2096           1            0          3081       3080       1659                7        0
  134  2176           0            1         14835      14830       7912                8        0
  135  2192           0            1          2769       2758       1491                7        0
  137  2224           0            1          5082       5077       2772                6        0
  140  2272           0            1          7236       7232       4020                5        0
  144  2336           0            1          8428       8423       4816                4        0
  147  2384           0            1          5316       5313       3101                7        0
  151  2448           0            1          5445       5443       3267                3        0
  155  2512           0            0          4121       4121       2536                8        0
  158  2560           0            1          2208       2205       1380                5        0
  160  2592           0            0          1133       1133        721                7        0
  168  2720           0            0          2712       2712       1808                2        0
  177  2864           1            0          1100       1098        770                7        0
  180  2912           0            1           189        183        135                5        0
  184  2976           0            1           176        166        128                8        0
  190  3072           0            0           252        252        189                3        0
  197  3184           0            1           198        192        154                7        0
  202  3264           0            1           100         96         80                4        0
  211  3408           0            1           210        208        175                5        0
  217  3504           0            1            98         94         84                6        0
  222  3584           0            0           104        104         91                7        0
  225  3632           0            1            54         50         48                8        0
  254  4096           0            0         33591      33591      33591                1        0

Note that the huge size watermark has moved up to 3632 bytes. A number of
classes that were previously merged with the huge class are now normal
classes.
For instance, size class #211 holds 210 objects of size 3408 and uses
175 physical pages (35 zspages of 5 pages, 6 objects each). Previously
those objects would have used 210 physical pages, one per object in the
huge class.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
zsmalloc has 255 size classes. Size classes contain a number of zspages,
which store objects of the same size. A zspage can consist of up to
four physical pages. The exact (most optimal) zspage size is calculated
for each size class during zsmalloc pool creation.

As a reasonable optimization, zsmalloc merges size classes that have
similar characteristics: the number of pages per zspage and the number
of objects a zspage can store.

For example, let's look at the following size classes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
..
   94  1536           0            0             0          0          0                3        0
  100  1632           0            0             0          0          0                2        0
..

Size classes #95-#99 are merged with size class #100. That is, each time
we store an object of, say, 1568 bytes, we do not use class #96 and
instead end up storing it in size class #100. Class #100 is for objects
of 1632 bytes, hence every 1568-byte object wastes 1632 - 1568 = 64
bytes. Class #100 zspages consist of 2 physical pages and can hold 5
objects. When we need to store, say, 13 objects of size 1568, we end up
allocating three zspages; in other words, 6 physical pages.

However, if we look closer at size class #96 (which should hold objects
of size 1568 bytes) and trace get_pages_per_zspage():

pages per zspage   wasted bytes   used%
               1            960      76
               2            352      95
               3           1312      89
               4            704      95
               5             96      99

we notice that the most optimal zspage configuration for this class is
5 physical pages, but currently we never let zspages consist of more
than 4 pages. A 5-page class #96 configuration would store 13 objects
of size 1568 in a single zspage, allocating 5 physical pages, as opposed
to the 6 physical pages that class #100 allocates.

A higher order zspage for class #96 also changes its key
characteristics: pages per zspage and objects per zspage. As a result,
classes #96 and #100 are not merged anymore, which gives us a more
compact zsmalloc.
Of course, the described effect does not apply only to size classes #96
and #100. We still merge classes, but less often. In other words,
classes are grouped in a more compact way, which decreases memory
wastage:

zspage order   # unique size classes
           2                      69
           3                     123
           4                     191

Let's take a closer look at the bottom of
/sys/kernel/debug/zsmalloc/zram0/classes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  254  4096           0            0             0          0          0                1        0
...

For exactly the same reason (a maximum of 4 pages per zspage), the last
non-huge size class is #202, which stores objects of 3264 bytes. Any
object larger than 3264 bytes is therefore considered huge and lands in
size class #254, which uses a whole physical page to store every
object. To put it slightly differently: objects in huge classes don't
share physical pages.

3264 bytes is too low a watermark, and we have too many huge classes:
classes #203 to #254. Similarly to size class #96 above, higher order
zspages change the key characteristics of some of those huge size
classes, and thus those classes become normal classes, where stored
objects share physical pages.

Hence yet another consequence of higher order zspages: we move the huge
size class watermark, have fewer huge classes and store large objects
in a more compact way.

For order 3, the huge class watermark becomes 3632 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  211  3408           0            0             0          0          0                5        0
  217  3504           0            0             0          0          0                6        0
  222  3584           0            0             0          0          0                7        0
  225  3632           0            0             0          0          0                8        0
  254  4096           0            0             0          0          0                1        0
...

For order 4, the huge class watermark becomes 3840 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  206  3328           0            0             0          0          0               13        0
  207  3344           0            0             0          0          0                9        0
  208  3360           0            0             0          0          0               14        0
  211  3408           0            0             0          0          0                5        0
  212  3424           0            0             0          0          0               16        0
  214  3456           0            0             0          0          0               11        0
  217  3504           0            0             0          0          0                6        0
  219  3536           0            0             0          0          0               13        0
  222  3584           0            0             0          0          0                7        0
  223  3600           0            0             0          0          0               15        0
  225  3632           0            0             0          0          0                8        0
  228  3680           0            0             0          0          0                9        0
  230  3712           0            0             0          0          0               10        0
  232  3744           0            0             0          0          0               11        0
  234  3776           0            0             0          0          0               12        0
  235  3792           0            0             0          0          0               13        0
  236  3808           0            0             0          0          0               14        0
  238  3840           0            0             0          0          0               15        0
  254  4096           0            0             0          0          0                1        0
...

TESTS
=====

The test untars linux-6.0.tar.xz and compiles the kernel.

zram is configured as a block device with an ext4 file system and the
lzo-rle compression algorithm. We captured /sys/block/zram0/mm_stat
after every test and rebooted the VM.

orig_data_size compr_data_size mem_used_total mem_limit mem_used_max same_pages pages_compacted huge_pages

ORDER 2 (BASE) zspage

    1691791360       628086729      655171584         0    655171584         60               0      34043
    1691787264       628089196      655175680         0    655175680         60               0      34046
    1691803648       628098840      655187968         0    655187968         59               0      34047
    1691795456       628091503      655183872         0    655183872         60               0      34044
    1691799552       628086877      655183872         0    655183872         60               0      34047

ORDER 3 zspage

    1691803648       627792993      641794048         0    641794048         60               0      33591
    1691787264       627779342      641708032         0    641708032         59               0      33591
    1691811840       627786616      641769472         0    641769472         60               0      33591
    1691803648       627794468      641818624         0    641818624         59               0      33592
    1691783168       627780882      641794048         0    641794048         61               0      33591

ORDER 4 zspage

    1691803648       627726635      639655936         0    639655936         60               0      33435
    1691811840       627733348      639643648         0    639643648         61               0      33434
    1691795456       627726290      639614976         0    639614976         60               0      33435
    1691803648       627730458      639688704         0    639688704         60               0      33434
    1691811840       627727771      639688704         0    639688704         60               0      33434

Order 3 and order 4 show a statistically significant improvement in the
`mem_used_max` metric.
T-test for order 3:

x order-2-maxmem
+ order-3-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08     42210.666
Difference at 95.0% confidence
        -1.34038e+07 +/- 44080.7
        -2.04581% +/- 0.00672802%
        (Student's t, pooled s = 30224.5)

T-test for order 4:

x order-2-maxmem
+ order-4-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.3961498e+08  6.396887e+08 6.3965594e+08 6.3965839e+08     31408.602
Difference at 95.0% confidence
        -1.55222e+07 +/- 33126.2
        -2.36915% +/- 0.00505604%
        (Student's t, pooled s = 22713.4)

This test benefits more from order 4 zspages because of the test's data
patterns.

zsmalloc object distribution analysis
=============================================================================

Order 2 (4 pages per zspage) tends to put many objects in size class
2048 (#126), which is merged with size classes #112-#125:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            0          6146       6146       1756                2        0
   74  1216           0            1          4560       4552       1368                3        0
   76  1248           0            1          2938       2934        904                4        0
   83  1360           0            0         10971      10971       3657                1        0
   91  1488           0            0         16126      16126       5864                4        0
   94  1536           0            1          5912       5908       2217                3        0
  100  1632           0            0         11990      11990       4796                2        0
  107  1744           0            1         15771      15768       6759                3        0
  111  1808           0            1         10386      10380       4616                4        0
  126  2048           0            0         45444      45444      22722                1        0
  144  2336           0            0         47446      47446      27112                4        0
  151  2448           1            0         10760      10759       6456                3        0
  168  2720           0            0         10173      10173       6782                2        0
  190  3072           0            1          1700       1697       1275                3        0
  202  3264           0            1           290        286        232                4        0
  254  4096           0            0         34051      34051      34051                1        0

Order 3 (8 pages per zspage) changed the pool characteristics and
unmerged some of the size classes. As a result, fewer objects are put
into size class 2048, because lower size classes are now available for
more compact object storage:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            1          2996       2994        856                2        0
   72  1184           0            1          1632       1609        476                7        0
   73  1200           1            0          1445       1442        425                5        0
   74  1216           0            0          1510       1510        453                3        0
   75  1232           0            1          1495       1479        455                7        0
   76  1248           0            1          1456       1451        448                4        0
   78  1280           0            1          3040       3033        950                5        0
   79  1296           0            1          1584       1571        504                7        0
   83  1360           0            0          6375       6375       2125                1        0
   84  1376           0            1          1817       1796        632                8        0
   87  1424           0            1          6020       6006       2107                7        0
   88  1440           0            1          2108       2101        744                6        0
   89  1456           0            1          2072       2064        740                5        0
   91  1488           0            1          4169       4159       1516                4        0
   92  1504           0            1          2014       2007        742                7        0
   94  1536           0            1          3904       3900       1464                3        0
   95  1552           0            1          1890       1873        720                8        0
   96  1568           0            1          1963       1958        755                5        0
   97  1584           0            1          1980       1974        770                7        0
  100  1632           0            1          6190       6187       2476                2        0
  103  1680           0            0          6477       6477       2667                7        0
  104  1696           0            1          2256       2253        940                5        0
  105  1712           0            1          2356       2340        992                8        0
  107  1744           1            0          4697       4696       2013                3        0
  110  1792           0            1          7744       7734       3388                7        0
  111  1808           0            1          2655       2649       1180                4        0
  114  1856           0            1          8371       8365       3805                5        0
  116  1888           1            0          5863       5862       2706                6        0
  117  1904           0            1          2955       2942       1379                7        0
  118  1920           0            1          3009       2997       1416                8        0
  126  2048           0            0         25276      25276      12638                1        0
  128  2080           0            1          6060       6052       3232                8        0
  129  2096           1            0          3081       3080       1659                7        0
  134  2176           0            1         14835      14830       7912                8        0
  135  2192           0            1          2769       2758       1491                7        0
  137  2224           0            1          5082       5077       2772                6        0
  140  2272           0            1          7236       7232       4020                5        0
  144  2336           0            1          8428       8423       4816                4        0
  147  2384           0            1          5316       5313       3101                7        0
  151  2448           0            1          5445       5443       3267                3        0
  155  2512           0            0          4121       4121       2536                8        0
  158  2560           0            1          2208       2205       1380                5        0
  160  2592           0            0          1133       1133        721                7        0
  168  2720           0            0          2712       2712       1808                2        0
  177  2864           1            0          1100       1098        770                7        0
  180  2912           0            1           189        183        135                5        0
  184  2976           0            1           176        166        128                8        0
  190  3072           0            0           252        252        189                3        0
  197  3184           0            1           198        192        154                7        0
  202  3264           0            1           100         96         80                4        0
  211  3408           0            1           210        208        175                5        0
  217  3504           0            1            98         94         84                6        0
  222  3584           0            0           104        104         91                7        0
  225  3632           0            1            54         50         48                8        0
  254  4096           0            0         33591      33591      33591                1        0

Note that the huge size watermark is now above 3632 bytes, and a number
of new normal classes are available that were previously merged with
the huge class.
For instance, size class #211 holds 210 objects of size 3408 and uses
175 physical pages (35 zspages of 5 pages, 6 objects each). Previously
those objects would have used 210 physical pages, one per object in the
huge class.

Link: https://lkml.kernel.org/r/20221027042651.234524-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20221031054108.541190-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zsmalloc has 255 size classes. Size classes contain a number of zspages, which store objects of the same size. zspage can consist of up to four physical pages. The exact (most optimal) zspage size is calculated for each size class during zsmalloc pool creation. As a reasonable optimization, zsmalloc merges size classes that have similar characteristics: number of pages per zspage and number of objects zspage can store. For example, let's look at the following size classes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable .. 94 1536 0 0 0 0 0 3 0 100 1632 0 0 0 0 0 2 0 .. Size classes torvalds#95-99 are merged with size class torvalds#100. That is, each time we store an object of size, say, 1568 bytes instead of using class torvalds#96 we end up storing it in size class torvalds#100. Class torvalds#100 is for objects of 1632 bytes in size, hence every 1568 bytes object wastes 1632-1568 bytes. Class torvalds#100 zspages consist of 2 physical pages and can hold 5 objects. When we need to store, say, 13 objects of size 1568 we end up allocating three zspages; in other words, 6 physical pages. However, if we'll look closer at size class torvalds#96 (which should hold objects of size 1568 bytes) and trace get_pages_per_zspage(): pages per zspage wasted bytes used% 1 960 76 2 352 95 3 1312 89 4 704 95 5 96 99 We'd notice that the most optimal zspage configuration for this class is when it consists of 5 physical pages, but currently we never let zspages to consists of more than 4 pages. A 5 page class torvalds#96 configuration would store 13 objects of size 1568 in a single zspage, allocating 5 physical pages, as opposed to 6 physical pages that class torvalds#100 will allocate. A higher order zspage for class torvalds#96 also changes its key characteristics: pages per-zspage and objects per-zspage. As a result classes torvalds#96 and torvalds#100 are not merged anymore, which gives us more compact zsmalloc. 
Of course, the described effect does not apply only to size classes #96 and #100. We still merge classes, but less often. In other words, classes are grouped in a more compact way, which decreases memory wastage:

zspage order   # unique size classes
           2                      69
           3                     123
           4                     191

Let's take a closer look at the bottom of /sys/kernel/debug/zsmalloc/zram0/classes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  254  4096           0            0             0          0          0                1        0
...

For exactly the same reason (a maximum of 4 pages per zspage), the last non-huge size class is #202, which stores objects of size 3264 bytes. Any object larger than 3264 bytes is therefore considered huge and lands in size class #254, which uses a whole physical page to store every object. To put it slightly differently: objects in huge classes don't share physical pages.

3264 bytes is too low a watermark, and we have too many huge classes: classes #203 to #254. Similarly to size class #96 above, higher order zspages change the key characteristics of some of those huge size classes, and thus those classes become normal classes, where stored objects share physical pages.

Hence yet another consequence of higher order zspages: we move the huge size class watermark up, have fewer huge classes, and store large objects in a more compact way.

For order 3, the huge class watermark becomes 3632 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  211  3408           0            0             0          0          0                5        0
  217  3504           0            0             0          0          0                6        0
  222  3584           0            0             0          0          0                7        0
  225  3632           0            0             0          0          0                8        0
  254  4096           0            0             0          0          0                1        0
...

For order 4, the huge class watermark becomes 3840 bytes:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
  202  3264           0            0             0          0          0                4        0
  206  3328           0            0             0          0          0               13        0
  207  3344           0            0             0          0          0                9        0
  208  3360           0            0             0          0          0               14        0
  211  3408           0            0             0          0          0                5        0
  212  3424           0            0             0          0          0               16        0
  214  3456           0            0             0          0          0               11        0
  217  3504           0            0             0          0          0                6        0
  219  3536           0            0             0          0          0               13        0
  222  3584           0            0             0          0          0                7        0
  223  3600           0            0             0          0          0               15        0
  225  3632           0            0             0          0          0                8        0
  228  3680           0            0             0          0          0                9        0
  230  3712           0            0             0          0          0               10        0
  232  3744           0            0             0          0          0               11        0
  234  3776           0            0             0          0          0               12        0
  235  3792           0            0             0          0          0               13        0
  236  3808           0            0             0          0          0               14        0
  238  3840           0            0             0          0          0               15        0
  254  4096           0            0             0          0          0                1        0
...

TESTS
=====

Test untars linux-6.0.tar.xz and compiles the kernel.

zram is configured as a block device with an ext4 file system and the lzo-rle compression algorithm. We captured /sys/block/zram0/mm_stat after every test and rebooted the VM.

orig_data_size compr_data_size mem_used_total mem_limit mem_used_max same_pages pages_compacted huge_pages

ORDER 2 (BASE) zspage

    1691791360       628086729      655171584         0    655171584         60               0      34043
    1691787264       628089196      655175680         0    655175680         60               0      34046
    1691803648       628098840      655187968         0    655187968         59               0      34047
    1691795456       628091503      655183872         0    655183872         60               0      34044
    1691799552       628086877      655183872         0    655183872         60               0      34047

ORDER 3 zspage

    1691803648       627792993      641794048         0    641794048         60               0      33591
    1691787264       627779342      641708032         0    641708032         59               0      33591
    1691811840       627786616      641769472         0    641769472         60               0      33591
    1691803648       627794468      641818624         0    641818624         59               0      33592
    1691783168       627780882      641794048         0    641794048         61               0      33591

ORDER 4 zspage

    1691803648       627726635      639655936         0    639655936         60               0      33435
    1691811840       627733348      639643648         0    639643648         61               0      33434
    1691795456       627726290      639614976         0    639614976         60               0      33435
    1691803648       627730458      639688704         0    639688704         60               0      33434
    1691811840       627727771      639688704         0    639688704         60               0      33434

Order 3 and order 4 show a statistically significant improvement in the `mem_used_max` metric.
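The t-test output that follows compares the five `mem_used_max` values of each order against the order 2 baseline. A minimal C sketch of a two-sample Student's t comparison with pooled variance, fed with those values, reproduces the reported mean difference, percentage, pooled s and 95% half-width. Assumptions: exactly five runs per order, so the two-sided 95% critical value for 8 degrees of freedom (2.306) is hard-coded rather than computed; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: two-sample Student's t comparison with pooled variance.
 * Assumption: both samples have exactly NR_RUNS = 5 entries, so the
 * two-sided 95% critical value for 5 + 5 - 2 = 8 degrees of freedom
 * (2.306) is hard-coded.
 */
#define NR_RUNS 5
#define T_CRIT_95_DF8 2.306

struct ttest_result {
	double mean_diff;  /* mean(b) - mean(a), in bytes */
	double pct_diff;   /* mean_diff as a percentage of mean(a) */
	double pooled_s;   /* pooled standard deviation */
	double half_width; /* 95% confidence half-width of mean_diff */
};

/* Newton's method keeps the sketch free of a libm (-lm) dependency. */
static double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;

	assert(x >= 0.0);
	for (int i = 0; i < 200; i++)
		r = 0.5 * (r + x / r);
	return r;
}

static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}

/* Sample variance, with n - 1 in the denominator. */
static double variance(const double *v, size_t n, double m)
{
	double acc = 0.0;

	for (size_t i = 0; i < n; i++)
		acc += (v[i] - m) * (v[i] - m);
	return acc / (n - 1);
}

static struct ttest_result ttest(const double a[NR_RUNS],
				 const double b[NR_RUNS])
{
	double ma = mean(a, NR_RUNS), mb = mean(b, NR_RUNS);
	/* Equal sample sizes: the pooled variance is the plain average. */
	double pooled_var = (variance(a, NR_RUNS, ma) +
			     variance(b, NR_RUNS, mb)) / 2.0;
	struct ttest_result r;

	r.mean_diff = mb - ma;
	r.pct_diff = 100.0 * r.mean_diff / ma;
	r.pooled_s = sqrt_newton(pooled_var);
	r.half_width = T_CRIT_95_DF8 * r.pooled_s *
		       sqrt_newton(1.0 / NR_RUNS + 1.0 / NR_RUNS);
	return r;
}
```

For order 3 against order 2 the sketch gives a mean difference of about -1.34038e+07 bytes (about -2.0458%), a pooled s of about 30224.5 and a half-width of about 44080.7, in line with the output below.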
T-test for order 3:

x order-2-maxmem
+ order-3-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08     42210.666
Difference at 95.0% confidence
        -1.34038e+07 +/- 44080.7
        -2.04581% +/- 0.00672802%
        (Student's t, pooled s = 30224.5)

T-test for order 4:

x order-2-maxmem
+ order-4-maxmem
    N           Min           Max        Median           Avg        Stddev
x   5 6.5517158e+08 6.5518797e+08 6.5518387e+08  6.551806e+08     6730.4157
+   5 6.3961498e+08  6.396887e+08 6.3965594e+08 6.3965839e+08     31408.602
Difference at 95.0% confidence
        -1.55222e+07 +/- 33126.2
        -2.36915% +/- 0.00505604%
        (Student's t, pooled s = 22713.4)

This test tends to benefit more from order 4 zspages, due to the test's data patterns.

zsmalloc object distribution analysis
=====================================

Order 2 (4 pages per zspage) tends to put many objects in size class 2048, which is merged with size classes #112-#125:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            0          6146       6146       1756                2        0
   74  1216           0            1          4560       4552       1368                3        0
   76  1248           0            1          2938       2934        904                4        0
   83  1360           0            0         10971      10971       3657                1        0
   91  1488           0            0         16126      16126       5864                4        0
   94  1536           0            1          5912       5908       2217                3        0
  100  1632           0            0         11990      11990       4796                2        0
  107  1744           0            1         15771      15768       6759                3        0
  111  1808           0            1         10386      10380       4616                4        0
  126  2048           0            0         45444      45444      22722                1        0
  144  2336           0            0         47446      47446      27112                4        0
  151  2448           1            0         10760      10759       6456                3        0
  168  2720           0            0         10173      10173       6782                2        0
  190  3072           0            1          1700       1697       1275                3        0
  202  3264           0            1           290        286        232                4        0
  254  4096           0            0         34051      34051      34051                1        0

Order 3 (8 pages per zspage) changed the pool characteristics and unmerged some of the size classes. As a result, fewer objects are put into size class 2048, because lower size classes are now available for more compact object storage:

class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
...
   71  1168           0            1          2996       2994        856                2        0
   72  1184           0            1          1632       1609        476                7        0
   73  1200           1            0          1445       1442        425                5        0
   74  1216           0            0          1510       1510        453                3        0
   75  1232           0            1          1495       1479        455                7        0
   76  1248           0            1          1456       1451        448                4        0
   78  1280           0            1          3040       3033        950                5        0
   79  1296           0            1          1584       1571        504                7        0
   83  1360           0            0          6375       6375       2125                1        0
   84  1376           0            1          1817       1796        632                8        0
   87  1424           0            1          6020       6006       2107                7        0
   88  1440           0            1          2108       2101        744                6        0
   89  1456           0            1          2072       2064        740                5        0
   91  1488           0            1          4169       4159       1516                4        0
   92  1504           0            1          2014       2007        742                7        0
   94  1536           0            1          3904       3900       1464                3        0
   95  1552           0            1          1890       1873        720                8        0
   96  1568           0            1          1963       1958        755                5        0
   97  1584           0            1          1980       1974        770                7        0
  100  1632           0            1          6190       6187       2476                2        0
  103  1680           0            0          6477       6477       2667                7        0
  104  1696           0            1          2256       2253        940                5        0
  105  1712           0            1          2356       2340        992                8        0
  107  1744           1            0          4697       4696       2013                3        0
  110  1792           0            1          7744       7734       3388                7        0
  111  1808           0            1          2655       2649       1180                4        0
  114  1856           0            1          8371       8365       3805                5        0
  116  1888           1            0          5863       5862       2706                6        0
  117  1904           0            1          2955       2942       1379                7        0
  118  1920           0            1          3009       2997       1416                8        0
  126  2048           0            0         25276      25276      12638                1        0
  128  2080           0            1          6060       6052       3232                8        0
  129  2096           1            0          3081       3080       1659                7        0
  134  2176           0            1         14835      14830       7912                8        0
  135  2192           0            1          2769       2758       1491                7        0
  137  2224           0            1          5082       5077       2772                6        0
  140  2272           0            1          7236       7232       4020                5        0
  144  2336           0            1          8428       8423       4816                4        0
  147  2384           0            1          5316       5313       3101                7        0
  151  2448           0            1          5445       5443       3267                3        0
  155  2512           0            0          4121       4121       2536                8        0
  158  2560           0            1          2208       2205       1380                5        0
  160  2592           0            0          1133       1133        721                7        0
  168  2720           0            0          2712       2712       1808                2        0
  177  2864           1            0          1100       1098        770                7        0
  180  2912           0            1           189        183        135                5        0
  184  2976           0            1           176        166        128                8        0
  190  3072           0            0           252        252        189                3        0
  197  3184           0            1           198        192        154                7        0
  202  3264           0            1           100         96         80                4        0
  211  3408           0            1           210        208        175                5        0
  217  3504           0            1            98         94         84                6        0
  222  3584           0            0           104        104         91                7        0
  225  3632           0            1            54         50         48                8        0
  254  4096           0            0         33591      33591      33591                1        0

Note that the huge class watermark is now 3632 bytes, and a number of new normal classes are available that were previously merged with the huge class.
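The huge class watermarks quoted in this message (3264, 3632 and 3840 bytes for orders 2, 3 and 4) and the class #211 numbers can be checked with a small userspace C sketch. It assumes 4096-byte pages, 255 classes with class #i holding 32 + 16 * i bytes, classes visited from the largest down and merged into the last kept class when pages per zspage and objects per zspage both match, and it treats a class as huge when one zspage holds a single object. Any per-object metadata is ignored, and all names are hypothetical.

```c
#include <assert.h>

/*
 * Sketch (not kernel code): huge class watermark and merged class count
 * as a function of the pages-per-zspage limit. Assumptions: 4096-byte
 * pages; 255 classes, class #i holds 32 + 16 * i bytes; classes are
 * visited from the largest down and merged into the last kept class when
 * pages per zspage and objects per zspage both match; a class is "huge"
 * when a zspage holds a single object. Per-object metadata is ignored.
 */
#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_CLASSES 255

static int class_size(int index)
{
	return 32 + 16 * index;
}

/* Highest used% wins; on a tie the smaller zspage is kept. */
static int pages_per_zspage(int size, int max_pages)
{
	int best_pages = 1, best_usedpc = 0;

	assert(size > 0 && max_pages >= 1);
	for (int pages = 1; pages <= max_pages; pages++) {
		int zspage_size = pages * SKETCH_PAGE_SIZE;
		int used = zspage_size - zspage_size % size;
		int usedpc = used * 100 / zspage_size;

		if (usedpc > best_usedpc) {
			best_usedpc = usedpc;
			best_pages = pages;
		}
	}
	return best_pages;
}

static int objs_per_zspage(int size, int max_pages)
{
	return pages_per_zspage(size, max_pages) * SKETCH_PAGE_SIZE / size;
}

/* Size of the largest class whose zspage holds more than one object. */
static int huge_watermark(int max_pages)
{
	for (int i = SKETCH_NR_CLASSES - 1; i >= 0; i--) {
		int size = class_size(i);

		if (objs_per_zspage(size, max_pages) > 1)
			return size;
	}
	return 0;
}

/* Number of classes left after merging neighbours with equal geometry. */
static int unique_classes(int max_pages)
{
	int count = 0, prev_pages = -1, prev_objs = -1;

	for (int i = SKETCH_NR_CLASSES - 1; i >= 0; i--) {
		int size = class_size(i);
		int pages = pages_per_zspage(size, max_pages);
		int objs = objs_per_zspage(size, max_pages);

		if (pages == prev_pages && objs == prev_objs)
			continue; /* merged into the last kept class */
		count++;
		prev_pages = pages;
		prev_objs = objs;
	}
	return count;
}

/* Physical pages for nr_objs objects; zspages are allocated whole. */
static int pages_for_objects(int nr_objs, int size, int max_pages)
{
	int objs = objs_per_zspage(size, max_pages);

	return (nr_objs + objs - 1) / objs * pages_per_zspage(size, max_pages);
}
```

Under these assumptions `huge_watermark(4)`, `huge_watermark(8)` and `huge_watermark(16)` return 3264, 3632 and 3840, and `pages_for_objects(210, 3408, 8)` returns 175 pages versus 210 with a 4-page limit. `unique_classes()` applies the same merge rule, so its count grows with the page limit, as in the unique size classes table above.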
One of the new normal classes under order 3 is size class #211: it holds 210 objects of size 3408 and uses 175 physical pages, while previously those objects would have used 210 physical pages.

Link: https://lkml.kernel.org/r/20221031054108.541190-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zsmalloc has 255 size classes. Size classes contain a number of zspages, which store objects of the same size. zspage can consist of up to four physical pages. The exact (most optimal) zspage size is calculated for each size class during zsmalloc pool creation. As a reasonable optimization, zsmalloc merges size classes that have similar characteristics: number of pages per zspage and number of objects zspage can store. For example, let's look at the following size classes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable .. 94 1536 0 0 0 0 0 3 0 100 1632 0 0 0 0 0 2 0 .. Size classes torvalds#95-99 are merged with size class torvalds#100. That is, each time we store an object of size, say, 1568 bytes instead of using class torvalds#96 we end up storing it in size class torvalds#100. Class torvalds#100 is for objects of 1632 bytes in size, hence every 1568 bytes object wastes 1632-1568 bytes. Class torvalds#100 zspages consist of 2 physical pages and can hold 5 objects. When we need to store, say, 13 objects of size 1568 we end up allocating three zspages; in other words, 6 physical pages. However, if we'll look closer at size class torvalds#96 (which should hold objects of size 1568 bytes) and trace get_pages_per_zspage(): pages per zspage wasted bytes used% 1 960 76 2 352 95 3 1312 89 4 704 95 5 96 99 We'd notice that the most optimal zspage configuration for this class is when it consists of 5 physical pages, but currently we never let zspages to consists of more than 4 pages. A 5 page class torvalds#96 configuration would store 13 objects of size 1568 in a single zspage, allocating 5 physical pages, as opposed to 6 physical pages that class torvalds#100 will allocate. A higher order zspage for class torvalds#96 also changes its key characteristics: pages per-zspage and objects per-zspage. As a result classes torvalds#96 and torvalds#100 are not merged anymore, which gives us more compact zsmalloc. 
Of course the described effect does not apply only to size classes torvalds#96 and We still merge classes, but less often so. In other words classes are grouped in a more compact way, which decreases memory wastage: zspage order # unique size classes 2 69 3 123 4 191 Let's take a closer look at the bottom of /sys/kernel/debug/zsmalloc/zram0/classes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 202 3264 0 0 0 0 0 4 0 254 4096 0 0 0 0 0 1 0 ... For exactly same reason - maximum 4 pages per zspage - the last non-huge size class is torvalds#202, which stores objects of size 3264 bytes. Any object larger than 3264 bytes, hence, is considered to be huge and lands in size class torvalds#254, which uses a whole physical page to store every object. To put it slightly differently - objects in huge classes don't share physical pages. 3264 bytes is too low of a watermark and we have too many huge classes: classes from torvalds#203 to torvalds#254. Similarly to class size torvalds#96 above, higher order zspages change key characteristics for some of those huge size classes and thus those classes become normal classes, where stored objects share physical pages. Hence yet another consequence of higher order zspages: we move the huge size class watermark with higher order zspages, have less huge classes and store large objects in a more compact way. For order 3, huge class watermark becomes 3632 bytes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 202 3264 0 0 0 0 0 4 0 211 3408 0 0 0 0 0 5 0 217 3504 0 0 0 0 0 6 0 222 3584 0 0 0 0 0 7 0 225 3632 0 0 0 0 0 8 0 254 4096 0 0 0 0 0 1 0 ... For order 4, huge class watermark becomes 3840 bytes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 
202 3264 0 0 0 0 0 4 0 206 3328 0 0 0 0 0 13 0 207 3344 0 0 0 0 0 9 0 208 3360 0 0 0 0 0 14 0 211 3408 0 0 0 0 0 5 0 212 3424 0 0 0 0 0 16 0 214 3456 0 0 0 0 0 11 0 217 3504 0 0 0 0 0 6 0 219 3536 0 0 0 0 0 13 0 222 3584 0 0 0 0 0 7 0 223 3600 0 0 0 0 0 15 0 225 3632 0 0 0 0 0 8 0 228 3680 0 0 0 0 0 9 0 230 3712 0 0 0 0 0 10 0 232 3744 0 0 0 0 0 11 0 234 3776 0 0 0 0 0 12 0 235 3792 0 0 0 0 0 13 0 236 3808 0 0 0 0 0 14 0 238 3840 0 0 0 0 0 15 0 254 4096 0 0 0 0 0 1 0 ... TESTS ===== Test untars linux-6.0.tar.xz and compiles the kernel. zram is configured as a block device with ext4 file system, lzo-rle compression algorithm. We captured /sys/block/zram0/mm_stat after every test and rebooted the VM. orig_data_size mem_used_total mem_used_max pages_compacted compr_data_size mem_limit same_pages huge_pages ORDER 2 (BASE) zspage 1691791360 628086729 655171584 0 655171584 60 0 34043 1691787264 628089196 655175680 0 655175680 60 0 34046 1691803648 628098840 655187968 0 655187968 59 0 34047 1691795456 628091503 655183872 0 655183872 60 0 34044 1691799552 628086877 655183872 0 655183872 60 0 34047 ORDER 3 zspage 1691803648 627792993 641794048 0 641794048 60 0 33591 1691787264 627779342 641708032 0 641708032 59 0 33591 1691811840 627786616 641769472 0 641769472 60 0 33591 1691803648 627794468 641818624 0 641818624 59 0 33592 1691783168 627780882 641794048 0 641794048 61 0 33591 ORDER 4 zspage 1691803648 627726635 639655936 0 639655936 60 0 33435 1691811840 627733348 639643648 0 639643648 61 0 33434 1691795456 627726290 639614976 0 639614976 60 0 33435 1691803648 627730458 639688704 0 639688704 60 0 33434 1691811840 627727771 639688704 0 639688704 60 0 33434 Order 3 and order 4 show statistically significant improvement in `mem_used_max` metrics. 
T-test for order 3: x order-2-maxmem + order-3-maxmem N Min Max Median Avg Stddev x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157 + 5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08 42210.666 Difference at 95.0% confidence -1.34038e+07 +/- 44080.7 -2.04581% +/- 0.00672802% (Student's t, pooled s = 30224.5) T-test for order 4: x order-2-maxmem + order-4-maxmem N Min Max Median Avg Stddev x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157 + 5 6.3961498e+08 6.396887e+08 6.3965594e+08 6.3965839e+08 31408.602 Difference at 95.0% confidence -1.55222e+07 +/- 33126.2 -2.36915% +/- 0.00505604% (Student's t, pooled s = 22713.4) This test tends to benefit more from order 4 zspages, due to test's data patterns. zsmalloc object distribution analysis ============================================================================= Order 2 (4 pages per zspage) tends to put many objects in size class 2048, which is merged with size classes torvalds#112-torvalds#125: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 71 1168 0 0 6146 6146 1756 2 0 74 1216 0 1 4560 4552 1368 3 0 76 1248 0 1 2938 2934 904 4 0 83 1360 0 0 10971 10971 3657 1 0 91 1488 0 0 16126 16126 5864 4 0 94 1536 0 1 5912 5908 2217 3 0 100 1632 0 0 11990 11990 4796 2 0 107 1744 0 1 15771 15768 6759 3 0 111 1808 0 1 10386 10380 4616 4 0 126 2048 0 0 45444 45444 22722 1 0 144 2336 0 0 47446 47446 27112 4 0 151 2448 1 0 10760 10759 6456 3 0 168 2720 0 0 10173 10173 6782 2 0 190 3072 0 1 1700 1697 1275 3 0 202 3264 0 1 290 286 232 4 0 254 4096 0 0 34051 34051 34051 1 0 Order 3 (8 pages per zspage) changed pool characteristics and unmerged some of the size classes, which resulted in less objects being put into size class 2048, because there are lower size classes are now available for more compact object storage: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 
71 1168 0 1 2996 2994 856 2 0 72 1184 0 1 1632 1609 476 7 0 73 1200 1 0 1445 1442 425 5 0 74 1216 0 0 1510 1510 453 3 0 75 1232 0 1 1495 1479 455 7 0 76 1248 0 1 1456 1451 448 4 0 78 1280 0 1 3040 3033 950 5 0 79 1296 0 1 1584 1571 504 7 0 83 1360 0 0 6375 6375 2125 1 0 84 1376 0 1 1817 1796 632 8 0 87 1424 0 1 6020 6006 2107 7 0 88 1440 0 1 2108 2101 744 6 0 89 1456 0 1 2072 2064 740 5 0 91 1488 0 1 4169 4159 1516 4 0 92 1504 0 1 2014 2007 742 7 0 94 1536 0 1 3904 3900 1464 3 0 95 1552 0 1 1890 1873 720 8 0 96 1568 0 1 1963 1958 755 5 0 97 1584 0 1 1980 1974 770 7 0 100 1632 0 1 6190 6187 2476 2 0 103 1680 0 0 6477 6477 2667 7 0 104 1696 0 1 2256 2253 940 5 0 105 1712 0 1 2356 2340 992 8 0 107 1744 1 0 4697 4696 2013 3 0 110 1792 0 1 7744 7734 3388 7 0 111 1808 0 1 2655 2649 1180 4 0 114 1856 0 1 8371 8365 3805 5 0 116 1888 1 0 5863 5862 2706 6 0 117 1904 0 1 2955 2942 1379 7 0 118 1920 0 1 3009 2997 1416 8 0 126 2048 0 0 25276 25276 12638 1 0 128 2080 0 1 6060 6052 3232 8 0 129 2096 1 0 3081 3080 1659 7 0 134 2176 0 1 14835 14830 7912 8 0 135 2192 0 1 2769 2758 1491 7 0 137 2224 0 1 5082 5077 2772 6 0 140 2272 0 1 7236 7232 4020 5 0 144 2336 0 1 8428 8423 4816 4 0 147 2384 0 1 5316 5313 3101 7 0 151 2448 0 1 5445 5443 3267 3 0 155 2512 0 0 4121 4121 2536 8 0 158 2560 0 1 2208 2205 1380 5 0 160 2592 0 0 1133 1133 721 7 0 168 2720 0 0 2712 2712 1808 2 0 177 2864 1 0 1100 1098 770 7 0 180 2912 0 1 189 183 135 5 0 184 2976 0 1 176 166 128 8 0 190 3072 0 0 252 252 189 3 0 197 3184 0 1 198 192 154 7 0 202 3264 0 1 100 96 80 4 0 211 3408 0 1 210 208 175 5 0 217 3504 0 1 98 94 84 6 0 222 3584 0 0 104 104 91 7 0 225 3632 0 1 54 50 48 8 0 254 4096 0 0 33591 33591 33591 1 0 Note, the huge size watermark is above 3632 and there are a number of new normal classes available that previously were merged with the huge class. 
For instance, size class torvalds#211 holds 210 objects of size 3408 and uses 175 physical pages, while previously for those objects we would have used 210 physical pages. Link: https://lkml.kernel.org/r/20221031054108.541190-3-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zsmalloc has 255 size classes. Size classes contain a number of zspages, which store objects of the same size. zspage can consist of up to four physical pages. The exact (most optimal) zspage size is calculated for each size class during zsmalloc pool creation. As a reasonable optimization, zsmalloc merges size classes that have similar characteristics: number of pages per zspage and number of objects zspage can store. For example, let's look at the following size classes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable .. 94 1536 0 0 0 0 0 3 0 100 1632 0 0 0 0 0 2 0 .. Size classes torvalds#95-99 are merged with size class torvalds#100. That is, each time we store an object of size, say, 1568 bytes instead of using class torvalds#96 we end up storing it in size class torvalds#100. Class torvalds#100 is for objects of 1632 bytes in size, hence every 1568 bytes object wastes 1632-1568 bytes. Class torvalds#100 zspages consist of 2 physical pages and can hold 5 objects. When we need to store, say, 13 objects of size 1568 we end up allocating three zspages; in other words, 6 physical pages. However, if we'll look closer at size class torvalds#96 (which should hold objects of size 1568 bytes) and trace get_pages_per_zspage(): pages per zspage wasted bytes used% 1 960 76 2 352 95 3 1312 89 4 704 95 5 96 99 We'd notice that the most optimal zspage configuration for this class is when it consists of 5 physical pages, but currently we never let zspages to consists of more than 4 pages. A 5 page class torvalds#96 configuration would store 13 objects of size 1568 in a single zspage, allocating 5 physical pages, as opposed to 6 physical pages that class torvalds#100 will allocate. A higher order zspage for class torvalds#96 also changes its key characteristics: pages per-zspage and objects per-zspage. As a result classes torvalds#96 and torvalds#100 are not merged anymore, which gives us more compact zsmalloc. 
Of course the described effect does not apply only to size classes torvalds#96 and We still merge classes, but less often so. In other words classes are grouped in a more compact way, which decreases memory wastage: zspage order # unique size classes 2 69 3 123 4 191 Let's take a closer look at the bottom of /sys/kernel/debug/zsmalloc/zram0/classes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 202 3264 0 0 0 0 0 4 0 254 4096 0 0 0 0 0 1 0 ... For exactly same reason - maximum 4 pages per zspage - the last non-huge size class is torvalds#202, which stores objects of size 3264 bytes. Any object larger than 3264 bytes, hence, is considered to be huge and lands in size class torvalds#254, which uses a whole physical page to store every object. To put it slightly differently - objects in huge classes don't share physical pages. 3264 bytes is too low of a watermark and we have too many huge classes: classes from torvalds#203 to torvalds#254. Similarly to class size torvalds#96 above, higher order zspages change key characteristics for some of those huge size classes and thus those classes become normal classes, where stored objects share physical pages. Hence yet another consequence of higher order zspages: we move the huge size class watermark with higher order zspages, have less huge classes and store large objects in a more compact way. For order 3, huge class watermark becomes 3632 bytes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 202 3264 0 0 0 0 0 4 0 211 3408 0 0 0 0 0 5 0 217 3504 0 0 0 0 0 6 0 222 3584 0 0 0 0 0 7 0 225 3632 0 0 0 0 0 8 0 254 4096 0 0 0 0 0 1 0 ... For order 4, huge class watermark becomes 3840 bytes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 
202 3264 0 0 0 0 0 4 0 206 3328 0 0 0 0 0 13 0 207 3344 0 0 0 0 0 9 0 208 3360 0 0 0 0 0 14 0 211 3408 0 0 0 0 0 5 0 212 3424 0 0 0 0 0 16 0 214 3456 0 0 0 0 0 11 0 217 3504 0 0 0 0 0 6 0 219 3536 0 0 0 0 0 13 0 222 3584 0 0 0 0 0 7 0 223 3600 0 0 0 0 0 15 0 225 3632 0 0 0 0 0 8 0 228 3680 0 0 0 0 0 9 0 230 3712 0 0 0 0 0 10 0 232 3744 0 0 0 0 0 11 0 234 3776 0 0 0 0 0 12 0 235 3792 0 0 0 0 0 13 0 236 3808 0 0 0 0 0 14 0 238 3840 0 0 0 0 0 15 0 254 4096 0 0 0 0 0 1 0 ... TESTS ===== Test untars linux-6.0.tar.xz and compiles the kernel. zram is configured as a block device with ext4 file system, lzo-rle compression algorithm. We captured /sys/block/zram0/mm_stat after every test and rebooted the VM. orig_data_size mem_used_total mem_used_max pages_compacted compr_data_size mem_limit same_pages huge_pages ORDER 2 (BASE) zspage 1691791360 628086729 655171584 0 655171584 60 0 34043 1691787264 628089196 655175680 0 655175680 60 0 34046 1691803648 628098840 655187968 0 655187968 59 0 34047 1691795456 628091503 655183872 0 655183872 60 0 34044 1691799552 628086877 655183872 0 655183872 60 0 34047 ORDER 3 zspage 1691803648 627792993 641794048 0 641794048 60 0 33591 1691787264 627779342 641708032 0 641708032 59 0 33591 1691811840 627786616 641769472 0 641769472 60 0 33591 1691803648 627794468 641818624 0 641818624 59 0 33592 1691783168 627780882 641794048 0 641794048 61 0 33591 ORDER 4 zspage 1691803648 627726635 639655936 0 639655936 60 0 33435 1691811840 627733348 639643648 0 639643648 61 0 33434 1691795456 627726290 639614976 0 639614976 60 0 33435 1691803648 627730458 639688704 0 639688704 60 0 33434 1691811840 627727771 639688704 0 639688704 60 0 33434 Order 3 and order 4 show statistically significant improvement in `mem_used_max` metrics. 
T-test for order 3: x order-2-maxmem + order-3-maxmem N Min Max Median Avg Stddev x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157 + 5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08 42210.666 Difference at 95.0% confidence -1.34038e+07 +/- 44080.7 -2.04581% +/- 0.00672802% (Student's t, pooled s = 30224.5) T-test for order 4: x order-2-maxmem + order-4-maxmem N Min Max Median Avg Stddev x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157 + 5 6.3961498e+08 6.396887e+08 6.3965594e+08 6.3965839e+08 31408.602 Difference at 95.0% confidence -1.55222e+07 +/- 33126.2 -2.36915% +/- 0.00505604% (Student's t, pooled s = 22713.4) This test tends to benefit more from order 4 zspages, due to test's data patterns. zsmalloc object distribution analysis ============================================================================= Order 2 (4 pages per zspage) tends to put many objects in size class 2048, which is merged with size classes torvalds#112-torvalds#125: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 71 1168 0 0 6146 6146 1756 2 0 74 1216 0 1 4560 4552 1368 3 0 76 1248 0 1 2938 2934 904 4 0 83 1360 0 0 10971 10971 3657 1 0 91 1488 0 0 16126 16126 5864 4 0 94 1536 0 1 5912 5908 2217 3 0 100 1632 0 0 11990 11990 4796 2 0 107 1744 0 1 15771 15768 6759 3 0 111 1808 0 1 10386 10380 4616 4 0 126 2048 0 0 45444 45444 22722 1 0 144 2336 0 0 47446 47446 27112 4 0 151 2448 1 0 10760 10759 6456 3 0 168 2720 0 0 10173 10173 6782 2 0 190 3072 0 1 1700 1697 1275 3 0 202 3264 0 1 290 286 232 4 0 254 4096 0 0 34051 34051 34051 1 0 Order 3 (8 pages per zspage) changed pool characteristics and unmerged some of the size classes, which resulted in less objects being put into size class 2048, because there are lower size classes are now available for more compact object storage: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 
71 1168 0 1 2996 2994 856 2 0 72 1184 0 1 1632 1609 476 7 0 73 1200 1 0 1445 1442 425 5 0 74 1216 0 0 1510 1510 453 3 0 75 1232 0 1 1495 1479 455 7 0 76 1248 0 1 1456 1451 448 4 0 78 1280 0 1 3040 3033 950 5 0 79 1296 0 1 1584 1571 504 7 0 83 1360 0 0 6375 6375 2125 1 0 84 1376 0 1 1817 1796 632 8 0 87 1424 0 1 6020 6006 2107 7 0 88 1440 0 1 2108 2101 744 6 0 89 1456 0 1 2072 2064 740 5 0 91 1488 0 1 4169 4159 1516 4 0 92 1504 0 1 2014 2007 742 7 0 94 1536 0 1 3904 3900 1464 3 0 95 1552 0 1 1890 1873 720 8 0 96 1568 0 1 1963 1958 755 5 0 97 1584 0 1 1980 1974 770 7 0 100 1632 0 1 6190 6187 2476 2 0 103 1680 0 0 6477 6477 2667 7 0 104 1696 0 1 2256 2253 940 5 0 105 1712 0 1 2356 2340 992 8 0 107 1744 1 0 4697 4696 2013 3 0 110 1792 0 1 7744 7734 3388 7 0 111 1808 0 1 2655 2649 1180 4 0 114 1856 0 1 8371 8365 3805 5 0 116 1888 1 0 5863 5862 2706 6 0 117 1904 0 1 2955 2942 1379 7 0 118 1920 0 1 3009 2997 1416 8 0 126 2048 0 0 25276 25276 12638 1 0 128 2080 0 1 6060 6052 3232 8 0 129 2096 1 0 3081 3080 1659 7 0 134 2176 0 1 14835 14830 7912 8 0 135 2192 0 1 2769 2758 1491 7 0 137 2224 0 1 5082 5077 2772 6 0 140 2272 0 1 7236 7232 4020 5 0 144 2336 0 1 8428 8423 4816 4 0 147 2384 0 1 5316 5313 3101 7 0 151 2448 0 1 5445 5443 3267 3 0 155 2512 0 0 4121 4121 2536 8 0 158 2560 0 1 2208 2205 1380 5 0 160 2592 0 0 1133 1133 721 7 0 168 2720 0 0 2712 2712 1808 2 0 177 2864 1 0 1100 1098 770 7 0 180 2912 0 1 189 183 135 5 0 184 2976 0 1 176 166 128 8 0 190 3072 0 0 252 252 189 3 0 197 3184 0 1 198 192 154 7 0 202 3264 0 1 100 96 80 4 0 211 3408 0 1 210 208 175 5 0 217 3504 0 1 98 94 84 6 0 222 3584 0 0 104 104 91 7 0 225 3632 0 1 54 50 48 8 0 254 4096 0 0 33591 33591 33591 1 0 Note, the huge size watermark is above 3632 and there are a number of new normal classes available that previously were merged with the huge class. 
For instance, size class torvalds#211 holds 210 objects of size 3408 and uses 175 physical pages, while previously for those objects we would have used 210 physical pages. Link: https://lkml.kernel.org/r/20221031054108.541190-3-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zsmalloc has 255 size classes. Size classes contain a number of zspages, which store objects of the same size. zspage can consist of up to four physical pages. The exact (most optimal) zspage size is calculated for each size class during zsmalloc pool creation. As a reasonable optimization, zsmalloc merges size classes that have similar characteristics: number of pages per zspage and number of objects zspage can store. For example, let's look at the following size classes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable .. 94 1536 0 0 0 0 0 3 0 100 1632 0 0 0 0 0 2 0 .. Size classes torvalds#95-99 are merged with size class torvalds#100. That is, each time we store an object of size, say, 1568 bytes instead of using class torvalds#96 we end up storing it in size class torvalds#100. Class torvalds#100 is for objects of 1632 bytes in size, hence every 1568 bytes object wastes 1632-1568 bytes. Class torvalds#100 zspages consist of 2 physical pages and can hold 5 objects. When we need to store, say, 13 objects of size 1568 we end up allocating three zspages; in other words, 6 physical pages. However, if we'll look closer at size class torvalds#96 (which should hold objects of size 1568 bytes) and trace get_pages_per_zspage(): pages per zspage wasted bytes used% 1 960 76 2 352 95 3 1312 89 4 704 95 5 96 99 We'd notice that the most optimal zspage configuration for this class is when it consists of 5 physical pages, but currently we never let zspages to consists of more than 4 pages. A 5 page class torvalds#96 configuration would store 13 objects of size 1568 in a single zspage, allocating 5 physical pages, as opposed to 6 physical pages that class torvalds#100 will allocate. A higher order zspage for class torvalds#96 also changes its key characteristics: pages per-zspage and objects per-zspage. As a result classes torvalds#96 and torvalds#100 are not merged anymore, which gives us more compact zsmalloc. 
Of course the described effect does not apply only to size classes torvalds#96 and We still merge classes, but less often so. In other words classes are grouped in a more compact way, which decreases memory wastage: zspage order # unique size classes 2 69 3 123 4 191 Let's take a closer look at the bottom of /sys/kernel/debug/zsmalloc/zram0/classes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 202 3264 0 0 0 0 0 4 0 254 4096 0 0 0 0 0 1 0 ... For exactly same reason - maximum 4 pages per zspage - the last non-huge size class is torvalds#202, which stores objects of size 3264 bytes. Any object larger than 3264 bytes, hence, is considered to be huge and lands in size class torvalds#254, which uses a whole physical page to store every object. To put it slightly differently - objects in huge classes don't share physical pages. 3264 bytes is too low of a watermark and we have too many huge classes: classes from torvalds#203 to torvalds#254. Similarly to class size torvalds#96 above, higher order zspages change key characteristics for some of those huge size classes and thus those classes become normal classes, where stored objects share physical pages. Hence yet another consequence of higher order zspages: we move the huge size class watermark with higher order zspages, have less huge classes and store large objects in a more compact way. For order 3, huge class watermark becomes 3632 bytes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 202 3264 0 0 0 0 0 4 0 211 3408 0 0 0 0 0 5 0 217 3504 0 0 0 0 0 6 0 222 3584 0 0 0 0 0 7 0 225 3632 0 0 0 0 0 8 0 254 4096 0 0 0 0 0 1 0 ... For order 4, huge class watermark becomes 3840 bytes: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 
202 3264 0 0 0 0 0 4 0 206 3328 0 0 0 0 0 13 0 207 3344 0 0 0 0 0 9 0 208 3360 0 0 0 0 0 14 0 211 3408 0 0 0 0 0 5 0 212 3424 0 0 0 0 0 16 0 214 3456 0 0 0 0 0 11 0 217 3504 0 0 0 0 0 6 0 219 3536 0 0 0 0 0 13 0 222 3584 0 0 0 0 0 7 0 223 3600 0 0 0 0 0 15 0 225 3632 0 0 0 0 0 8 0 228 3680 0 0 0 0 0 9 0 230 3712 0 0 0 0 0 10 0 232 3744 0 0 0 0 0 11 0 234 3776 0 0 0 0 0 12 0 235 3792 0 0 0 0 0 13 0 236 3808 0 0 0 0 0 14 0 238 3840 0 0 0 0 0 15 0 254 4096 0 0 0 0 0 1 0 ... TESTS ===== Test untars linux-6.0.tar.xz and compiles the kernel. zram is configured as a block device with ext4 file system, lzo-rle compression algorithm. We captured /sys/block/zram0/mm_stat after every test and rebooted the VM. orig_data_size mem_used_total mem_used_max pages_compacted compr_data_size mem_limit same_pages huge_pages ORDER 2 (BASE) zspage 1691791360 628086729 655171584 0 655171584 60 0 34043 1691787264 628089196 655175680 0 655175680 60 0 34046 1691803648 628098840 655187968 0 655187968 59 0 34047 1691795456 628091503 655183872 0 655183872 60 0 34044 1691799552 628086877 655183872 0 655183872 60 0 34047 ORDER 3 zspage 1691803648 627792993 641794048 0 641794048 60 0 33591 1691787264 627779342 641708032 0 641708032 59 0 33591 1691811840 627786616 641769472 0 641769472 60 0 33591 1691803648 627794468 641818624 0 641818624 59 0 33592 1691783168 627780882 641794048 0 641794048 61 0 33591 ORDER 4 zspage 1691803648 627726635 639655936 0 639655936 60 0 33435 1691811840 627733348 639643648 0 639643648 61 0 33434 1691795456 627726290 639614976 0 639614976 60 0 33435 1691803648 627730458 639688704 0 639688704 60 0 33434 1691811840 627727771 639688704 0 639688704 60 0 33434 Order 3 and order 4 show statistically significant improvement in `mem_used_max` metrics. 
T-test for order 3: x order-2-maxmem + order-3-maxmem N Min Max Median Avg Stddev x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157 + 5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08 42210.666 Difference at 95.0% confidence -1.34038e+07 +/- 44080.7 -2.04581% +/- 0.00672802% (Student's t, pooled s = 30224.5) T-test for order 4: x order-2-maxmem + order-4-maxmem N Min Max Median Avg Stddev x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157 + 5 6.3961498e+08 6.396887e+08 6.3965594e+08 6.3965839e+08 31408.602 Difference at 95.0% confidence -1.55222e+07 +/- 33126.2 -2.36915% +/- 0.00505604% (Student's t, pooled s = 22713.4) This test tends to benefit more from order 4 zspages, due to test's data patterns. zsmalloc object distribution analysis ============================================================================= Order 2 (4 pages per zspage) tends to put many objects in size class 2048, which is merged with size classes torvalds#112-torvalds#125: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 71 1168 0 0 6146 6146 1756 2 0 74 1216 0 1 4560 4552 1368 3 0 76 1248 0 1 2938 2934 904 4 0 83 1360 0 0 10971 10971 3657 1 0 91 1488 0 0 16126 16126 5864 4 0 94 1536 0 1 5912 5908 2217 3 0 100 1632 0 0 11990 11990 4796 2 0 107 1744 0 1 15771 15768 6759 3 0 111 1808 0 1 10386 10380 4616 4 0 126 2048 0 0 45444 45444 22722 1 0 144 2336 0 0 47446 47446 27112 4 0 151 2448 1 0 10760 10759 6456 3 0 168 2720 0 0 10173 10173 6782 2 0 190 3072 0 1 1700 1697 1275 3 0 202 3264 0 1 290 286 232 4 0 254 4096 0 0 34051 34051 34051 1 0 Order 3 (8 pages per zspage) changed pool characteristics and unmerged some of the size classes, which resulted in less objects being put into size class 2048, because there are lower size classes are now available for more compact object storage: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable ... 
   71 1168           0            1          2996     2994        856                2        0
   72 1184           0            1          1632     1609        476                7        0
   73 1200           1            0          1445     1442        425                5        0
   74 1216           0            0          1510     1510        453                3        0
   75 1232           0            1          1495     1479        455                7        0
   76 1248           0            1          1456     1451        448                4        0
   78 1280           0            1          3040     3033        950                5        0
   79 1296           0            1          1584     1571        504                7        0
   83 1360           0            0          6375     6375       2125                1        0
   84 1376           0            1          1817     1796        632                8        0
   87 1424           0            1          6020     6006       2107                7        0
   88 1440           0            1          2108     2101        744                6        0
   89 1456           0            1          2072     2064        740                5        0
   91 1488           0            1          4169     4159       1516                4        0
   92 1504           0            1          2014     2007        742                7        0
   94 1536           0            1          3904     3900       1464                3        0
   95 1552           0            1          1890     1873        720                8        0
   96 1568           0            1          1963     1958        755                5        0
   97 1584           0            1          1980     1974        770                7        0
  100 1632           0            1          6190     6187       2476                2        0
  103 1680           0            0          6477     6477       2667                7        0
  104 1696           0            1          2256     2253        940                5        0
  105 1712           0            1          2356     2340        992                8        0
  107 1744           1            0          4697     4696       2013                3        0
  110 1792           0            1          7744     7734       3388                7        0
  111 1808           0            1          2655     2649       1180                4        0
  114 1856           0            1          8371     8365       3805                5        0
  116 1888           1            0          5863     5862       2706                6        0
  117 1904           0            1          2955     2942       1379                7        0
  118 1920           0            1          3009     2997       1416                8        0
  126 2048           0            0         25276    25276      12638                1        0
  128 2080           0            1          6060     6052       3232                8        0
  129 2096           1            0          3081     3080       1659                7        0
  134 2176           0            1         14835    14830       7912                8        0
  135 2192           0            1          2769     2758       1491                7        0
  137 2224           0            1          5082     5077       2772                6        0
  140 2272           0            1          7236     7232       4020                5        0
  144 2336           0            1          8428     8423       4816                4        0
  147 2384           0            1          5316     5313       3101                7        0
  151 2448           0            1          5445     5443       3267                3        0
  155 2512           0            0          4121     4121       2536                8        0
  158 2560           0            1          2208     2205       1380                5        0
  160 2592           0            0          1133     1133        721                7        0
  168 2720           0            0          2712     2712       1808                2        0
  177 2864           1            0          1100     1098        770                7        0
  180 2912           0            1           189      183        135                5        0
  184 2976           0            1           176      166        128                8        0
  190 3072           0            0           252      252        189                3        0
  197 3184           0            1           198      192        154                7        0
  202 3264           0            1           100       96         80                4        0
  211 3408           0            1           210      208        175                5        0
  217 3504           0            1            98       94         84                6        0
  222 3584           0            0           104      104         91                7        0
  225 3632           0            1            54       50         48                8        0
  254 4096           0            0         33591    33591      33591                1        0

Note that the huge class watermark is now 3632 bytes (only objects above that size are huge), and a number of normal classes that were previously merged with the huge class are now available.
For instance, size class #211 holds 210 objects of size 3408 and uses 175 physical pages, while the same objects would previously have used 210 physical pages.

Link: https://lkml.kernel.org/r/20221031054108.541190-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
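The pages_per_zspage values in the tables above, the huge class watermarks of 3264, 3632 and 3840 bytes, and the 175 vs 210 page figure for class #211 can be reproduced with a small model. This is a Python sketch written after our reading of get_pages_per_zspage(), not the kernel code. It assumes 4096-byte pages and class sizes of 32 + 16 * index (which matches the class/size pairs in the tables, e.g. #202 is 3264 and #254 is 4096), and it treats a class as huge when its zspage holds a single object. All helper names are hypothetical.

```python
PAGE_SIZE = 4096       # assumption: 4 KiB pages
MIN_ALLOC_SIZE = 32    # assumption: size of class #0
SIZE_CLASS_DELTA = 16  # assumption: size step between neighbouring classes
NUM_CLASSES = 255      # "zsmalloc has 255 size classes"


def pages_per_zspage(class_size, max_pages):
    """Pick the zspage size (in pages) with the best integer usage percentage.

    Only a strictly better percentage replaces the current choice,
    so ties keep the smaller zspage.
    """
    best_pages, best_usedpc = 1, 0
    for pages in range(1, max_pages + 1):
        zspage_size = pages * PAGE_SIZE
        waste = zspage_size % class_size
        usedpc = (zspage_size - waste) * 100 // zspage_size
        if usedpc > best_usedpc:
            best_pages, best_usedpc = pages, usedpc
    return best_pages


def objs_per_zspage(class_size, max_pages):
    return pages_per_zspage(class_size, max_pages) * PAGE_SIZE // class_size


def huge_watermark(max_pages):
    """Largest class size whose zspage holds more than one object."""
    sizes = (MIN_ALLOC_SIZE + i * SIZE_CLASS_DELTA for i in range(NUM_CLASSES))
    return max(s for s in sizes if objs_per_zspage(s, max_pages) > 1)


def pages_used(nr_objs, class_size, max_pages):
    """Physical pages needed to store nr_objs objects of one class."""
    per_zspage = objs_per_zspage(class_size, max_pages)
    nr_zspages = -(-nr_objs // per_zspage)  # ceiling division
    return nr_zspages * pages_per_zspage(class_size, max_pages)


if __name__ == "__main__":
    for max_pages in (4, 8, 16):  # zspage order 2, 3 and 4
        print(f"max {max_pages:2d} pages: huge watermark {huge_watermark(max_pages)}")
    print(pages_per_zspage(1568, 4), pages_per_zspage(1568, 8))  # 2 5
    print(pages_used(210, 3408, 4), pages_used(210, 3408, 8))    # 210 175
```

Because ties keep the smaller zspage, sizes above the watermark stay at one page per object: storing k objects in k pages never uses space better than one object in one page, so only a zspage that fits at least one extra object wins.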
The set channel operation "ethtool -L tx <n>" broke with the recent suspend/resume changes.

Revert to the original driver behaviour of not freeing the TX/RX IRQs in am65_cpsw_nuss_common_stop(). We now free them only in .suspend(), as we need to release the DMA channels there (DMA loses context), and re-acquiring them in .resume() may not necessarily give us the same IRQs.

Introduce am65_cpsw_nuss_remove_rx_chns(), which is similar to am65_cpsw_nuss_remove_tx_chns(), and invoke both in .suspend(). In .resume(), call am65_cpsw_nuss_init_rx/tx_chns() to acquire the DMA channels.

As the IRQs need to be requested after the IRQ numbers are known, move the am65_cpsw_nuss_ndev_add_tx_napi() call into am65_cpsw_nuss_init_tx_chns().

This also fixes the warning below during suspend/resume on a multi-CPU system.

[ 67.347684] ------------[ cut here ]------------
[ 67.347700] Unbalanced enable for IRQ 119
[ 67.347726] WARNING: CPU: 0 PID: 1080 at kernel/irq/manage.c:781 __enable_irq+0x4c/0x80
[ 67.347754] Modules linked in: wlcore_sdio wl18xx wlcore mac80211 libarc4 cfg80211 rfkill crct10dif_ce sch_fq_codel ipv6
[ 67.347803] CPU: 0 PID: 1080 Comm: rtcwake Not tainted 6.1.0-rc4-00023-gc826e5480732-dirty #203
[ 67.347812] Hardware name: Texas Instruments AM625 (DT)
[ 67.347818] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 67.347829] pc : __enable_irq+0x4c/0x80
[ 67.347838] lr : __enable_irq+0x4c/0x80
[ 67.347846] sp : ffff80000999ba00
[ 67.347850] x29: ffff80000999ba00 x28: ffff0000011c1c80 x27: 0000000000000000
[ 67.347863] x26: 00000000000001f4 x25: ffff000001058358 x24: ffff000001059080
[ 67.347876] x23: ffff000001058080 x22: ffff000001060000 x21: 0000000000000077
[ 67.347888] x20: ffff0000011c1c80 x19: ffff000001429600 x18: 0000000000000001
[ 67.347900] x17: 0000000000000080 x16: fffffc000176e008 x15: ffff0000011c21b0
[ 67.347913] x14: 0000000000000000 x13: 3931312051524920 x12: 726f6620656c6261
[ 67.347925] x11: 656820747563205b x10: 000000000000000a x9 : ffff80000999ba00
[ 67.347938] x8 : ffff800009121068 x7 : ffff80000999b810 x6 : 00000000fffff17f
[ 67.347950] x5 : ffff00007fb99b18 x4 : 0000000000000000 x3 : 0000000000000027
[ 67.347962] x2 : ffff00007fb99b20 x1 : 50dd48f7f19deb00 x0 : 0000000000000000
[ 67.347975] Call trace:
[ 67.347980] __enable_irq+0x4c/0x80
[ 67.347989] enable_irq+0x4c/0xa0
[ 67.347999] am65_cpsw_nuss_ndo_slave_open+0x4b0/0x568
[ 67.348015] am65_cpsw_nuss_resume+0x68/0x160
[ 67.348025] dpm_run_callback.isra.0+0x28/0x88
[ 67.348040] device_resume+0x78/0x160
[ 67.348050] dpm_resume+0xc0/0x1f8
[ 67.348057] dpm_resume_end+0x18/0x30
[ 67.348063] suspend_devices_and_enter+0x1cc/0x4e0
[ 67.348075] pm_suspend+0x1f8/0x268
[ 67.348084] state_store+0x8c/0x118
[ 67.348092] kobj_attr_store+0x18/0x30
[ 67.348104] sysfs_kf_write+0x44/0x58
[ 67.348117] kernfs_fop_write_iter+0x118/0x1a8
[ 67.348127] vfs_write+0x31c/0x418
[ 67.348140] ksys_write+0x6c/0xf8
[ 67.348150] __arm64_sys_write+0x1c/0x28
[ 67.348160] invoke_syscall+0x44/0x108
[ 67.348172] el0_svc_common.constprop.0+0x44/0xf0
[ 67.348182] do_el0_svc+0x2c/0xc8
[ 67.348191] el0_svc+0x2c/0x88
[ 67.348201] el0t_64_sync_handler+0xb8/0xc0
[ 67.348209] el0t_64_sync+0x18c/0x190
[ 67.348218] ---[ end trace 0000000000000000 ]---

Fixes: fd23df7 ("net: ethernet: ti: am65-cpsw: Add suspend/resume support")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
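The "Unbalanced enable for IRQ 119" warning comes from the IRQ core's per-line disable depth: enable_irq() is expected to undo an earlier disable_irq(), and enabling a line whose depth is already zero triggers the warning. One plausible reading of the trace, assumed here rather than stated in the commit, is that the IRQ was freed and requested again across suspend/resume, which leaves it enabled, and the open path then called enable_irq() as if the line were still disabled from stop time. The sketch below is a simplified user-space model with hypothetical model_* names; it is not the kernel's struct irq_desc handling.

```c
#include <stdio.h>

/* Simplified model of one IRQ line's disable depth (hypothetical names). */
struct model_irq {
	int irq;
	unsigned depth;    /* 0 = enabled, >0 = number of nested disables */
	unsigned warnings; /* how many unbalanced enables were detected */
};

/* A freshly requested (auto-enabled) IRQ starts with depth 0. */
static void model_request_irq(struct model_irq *d)
{
	d->depth = 0;
}

/* Freeing the last handler shuts the line down, leaving depth 1. */
static void model_free_irq(struct model_irq *d)
{
	d->depth = 1;
}

static void model_disable_irq(struct model_irq *d)
{
	d->depth++;
}

static void model_enable_irq(struct model_irq *d)
{
	if (d->depth == 0) {
		/* Enabling a line that was never disabled. */
		fprintf(stderr, "Unbalanced enable for IRQ %d\n", d->irq);
		d->warnings++;
		return;
	}
	d->depth--;
}
```

A balanced stop/open cycle is disable followed by enable. If a free and re-request happens in between, the re-request resets the depth to 0, so the later enable has nothing to undo and the model reports the warning, which is the imbalance the fix avoids by keeping the IRQs requested across common_stop().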
If hci_conn_del gets called on an LE connection linked to a CIS connection, a subsequent hci_conn_del on the CIS connection results in a use-after-free [1], as cis->link still points to the deleted connection. This occurs, e.g., if hci_cmd_sync_queue fails in hci_le_create_cis.

Fix it by doing the same as is done for SCO+ACL linked connections.

[1]:
BUG: KASAN: use-after-free in hci_conn_del+0xa4/0x3e0
Write of size 8 at addr ffff8880013d2668 by task iso-tester/29

CPU: 0 PID: 29 Comm: iso-tester Not tainted 6.2.0-rc7-00024-g0e21956501c0-dirty #203
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x19/0x27
 print_report+0x160/0x484
 ? __virt_addr_valid+0xd4/0x150
 ? hci_conn_del+0xa4/0x3e0
 kasan_report+0xc7/0xf0
 ? hci_conn_del+0xa4/0x3e0
 hci_conn_del+0xa4/0x3e0
 hci_conn_hash_flush+0xea/0x130
 hci_dev_close_sync+0x34f/0x930
 hci_unregister_dev+0x104/0x2a0
 vhci_release+0x4c/0x90
 __fput+0x102/0x410
 task_work_run+0xfe/0x180
 ? __pfx_task_work_run+0x10/0x10
 exit_to_user_mode_prepare+0xfd/0x100
 syscall_exit_to_user_mode+0x1c/0x50
 do_syscall_64+0x4e/0x90
 entry_SYSCALL_64_after_hwframe+0x70/0xda
RIP: 0033:0x7f9880de0944
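The fix follows the rule already applied to SCO+ACL pairs: when one side of a linked pair is deleted, the surviving side's pointer back to it must be cleared first. The sketch below models only that invariant in user space, using a hypothetical struct model_conn rather than the Bluetooth core's struct hci_conn, and omits the reference counting the real code also does.

```c
#include <stdlib.h>

/* Hypothetical connection: an LE link and a CIS point at each other. */
struct model_conn {
	struct model_conn *link;
};

static void model_conn_link(struct model_conn *a, struct model_conn *b)
{
	a->link = b;
	b->link = a;
}

/*
 * Delete a connection. Before freeing it, clear the peer's back-pointer so
 * a later delete of the peer never writes through a dangling pointer.
 */
static void model_conn_del(struct model_conn *conn)
{
	if (conn->link) {
		conn->link->link = NULL;
		conn->link = NULL;
	}
	free(conn);
}
```

Without the back-pointer clearing, deleting the CIS after the LE connection would write to freed memory, which matches the "Write of size 8" KASAN report above.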
…fuel gauge unbind

The charger manager obtained a reference to the fuel gauge power supply in probe with power_supply_get_by_name() for later use. However, if the fuel gauge driver was removed and re-added, this reference would point to the old power supply (from the driver which was removed). This led to accessing old (and probably invalid) memory, which could be observed with:

$ echo "12-0036" > /sys/bus/i2c/drivers/max17042/unbind
$ echo "12-0036" > /sys/bus/i2c/drivers/max17042/bind
$ cat /sys/devices/virtual/power_supply/battery/capacity
[ 240.480084] INFO: task cat:1393 blocked for more than 120 seconds.
[ 240.484799] Not tainted 3.17.0-next-20141007-00028-ge60b6dd79570 #203
[ 240.491782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.499589] cat D c0469530 0 1393 1 0x00000000
[ 240.505947] [<c0469530>] (__schedule) from [<c0469d3c>] (schedule_preempt_disabled+0x14/0x20)
[ 240.514449] [<c0469d3c>] (schedule_preempt_disabled) from [<c046af08>] (mutex_lock_nested+0x1bc/0x458)
[ 240.523736] [<c046af08>] (mutex_lock_nested) from [<c0287a98>] (regmap_read+0x30/0x60)
[ 240.531647] [<c0287a98>] (regmap_read) from [<c032238c>] (max17042_get_property+0x2e8/0x350)
[ 240.540055] [<c032238c>] (max17042_get_property) from [<c03247d8>] (charger_get_property+0x264/0x348)
[ 240.549252] [<c03247d8>] (charger_get_property) from [<c0320764>] (power_supply_show_property+0x48/0x1e0)
[ 240.558808] [<c0320764>] (power_supply_show_property) from [<c027308c>] (dev_attr_show+0x1c/0x48)
[ 240.567664] [<c027308c>] (dev_attr_show) from [<c0141fb0>] (sysfs_kf_seq_show+0x84/0x104)
[ 240.575814] [<c0141fb0>] (sysfs_kf_seq_show) from [<c0140b18>] (kernfs_seq_show+0x24/0x28)
[ 240.584061] [<c0140b18>] (kernfs_seq_show) from [<c0104574>] (seq_read+0x1b0/0x484)
[ 240.591702] [<c0104574>] (seq_read) from [<c00e1e24>] (vfs_read+0x88/0x144)
[ 240.598640] [<c00e1e24>] (vfs_read) from [<c00e1f20>] (SyS_read+0x40/0x8c)
[ 240.605507] [<c00e1f20>] (SyS_read) from [<c000e760>] (ret_fast_syscall+0x0/0x48)
[ 240.612952] 4 locks held by cat/1393:
[ 240.616589] #0: (&p->lock){+.+.+.}, at: [<c01043f4>] seq_read+0x30/0x484
[ 240.623414] #1: (&of->mutex){+.+.+.}, at: [<c01417dc>] kernfs_seq_start+0x1c/0x8c
[ 240.631086] #2: (s_active#31){++++.+}, at: [<c01417e4>] kernfs_seq_start+0x24/0x8c
[ 240.638777] #3: (&map->mutex){+.+...}, at: [<c0287a98>] regmap_read+0x30/0x60

The charger manager should get a reference to the fuel gauge power supply on each use of the get_property callback. The thermal zone 'tzd' field of the power supply should not be used, for the same reason.

Additionally, this change also solves the issue with nested thermal_zone_get_temp() calls and the related false lockdep positive for deadlock on the thermal zone's mutex [1]. When the fuel gauge is used as the source of temperature, the charger manager forwards its get_temp calls to the fuel gauge thermal zone. So actually different mutexes are used (one for the charger manager thermal zone and a second for the fuel gauge thermal zone), but for lockdep this is one class of mutex.

The recursion is removed by retrieving the temperature through the power supply's get_property().

In case an external thermal zone is used ('cm-thermal-zone' property is present in DTS), the recursion does not exist. The charger manager simply exports the POWER_SUPPLY_PROP_TEMP_AMBIENT property (instead of POWER_SUPPLY_PROP_TEMP), thus no thermal zone is created for this power supply.

[1] https://lkml.org/lkml/2014/10/6/309

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: <stable@vger.kernel.org>
Fixes: 3bb3dbb ("power_supply: Add initial Charger-Manager driver")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
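The core of the fix is "look it up on every use instead of caching a pointer at probe time". In the kernel, a reference returned by power_supply_get_by_name() is released with power_supply_put(). The sketch below models that pattern in user space: the one-slot registry, the "fuel-gauge" name, and all model_* functions are hypothetical stand-ins, chosen only to show that an unbind/bind cycle is picked up automatically when nothing is cached.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered power supply. */
struct model_supply {
	const char *name;
	int capacity;  /* stand-in for a property value */
	unsigned refs; /* references currently held by users */
};

/* One-slot registry: the currently bound fuel gauge, or NULL if unbound. */
static struct model_supply *registered;

static struct model_supply *model_get_by_name(const char *name)
{
	if (registered && strcmp(registered->name, name) == 0) {
		registered->refs++;
		return registered;
	}
	return NULL;
}

static void model_put(struct model_supply *psy)
{
	psy->refs--;
}

/*
 * Fetch a property by looking the supply up on each call and dropping the
 * reference afterwards, so a re-bound driver's new object is always used.
 */
static int model_get_capacity(const char *name, int *val)
{
	struct model_supply *psy = model_get_by_name(name);

	if (!psy)
		return -1; /* not bound; kernel code would return an errno */
	*val = psy->capacity;
	model_put(psy);
	return 0;
}
```

A pointer cached before the unbind would keep pointing at the old object; the per-call lookup instead either fails cleanly while the gauge is unbound or returns the newly bound instance.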
No description provided.