Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: using USB switch causes a memory spiral #1490

Closed
2 tasks done
tedcook94 opened this issue Jan 17, 2025 · 15 comments · Fixed by #1510
Closed
2 tasks done

Bug: using USB switch causes a memory spiral #1490

tedcook94 opened this issue Jan 17, 2025 · 15 comments · Fixed by #1510
Assignees
Labels
bug Something isn't working linux Issue pertains to Linux only

Comments

@tedcook94
Copy link

Requirements

Describe the bug

I have my keyboard and primary mouse connected to a USB switch so that I can quickly switch them between my desktop and work laptop (I have a secondary mouse that always stays connected to the desktop). Sometimes when switching from one computer to another (approximately 1 in 10 times), Kanata will go into a memory spiral until it consumes so much memory (~12 GB of memory and ~25 GB of swap) that my OS kills it with the message "Device memory is nearly full. An application was using a lot of memory and was forced to stop".

This has happened both when connecting and disconnecting the devices.

Relevant kanata config

config.kbd:

;; Change capslock to escape/left-control

(defsrc
  caps
)

(defalias
  escctrl (tap-hold 150 150 esc lctl)
)

(deflayer base
  @escctrl
)

To Reproduce

  1. Start the Kanata service
  2. Use the USB switcher to switch the keyboard and mouse from one computer to the other
  3. Wait for the computer to lock up as all available memory is used by Kanata (repeat step 2 if this part doesn't happen)

Expected behavior

I expect to be able to connect and/or disconnect devices as many times as I'd like without Kanata consuming all available memory.

Kanata version

kanata 1.7.0

Debug logs

journalctl -u kanata.service outputs nothing.

systemctl does provide some info (note the mem peak):

❯ systemctl --user -l --no-pager status kanata.service
× kanata.service - Kanata keyboard remapper
     Loaded: loaded (/home/ted/.config/systemd/user/kanata.service; enabled; preset: enabled)
     Active: failed (Result: oom-kill) since Thu 2025-01-16 21:09:03 CST; 6s ago
   Duration: 7min 23.899s
 Invocation: 3030fe72da5b4734aee1f318101f17d6
       Docs: https://github.com/jtroo/kanata
    Process: 415619 ExecStart=/usr/bin/sh -c exec $$(which kanata) -d --cfg $${HOME}/.config/kanata/config.kbd (code=killed, signal=KILL)
   Main PID: 415619 (code=killed, signal=KILL)
   Mem peak: 13.1G (swap: 28.3G)
        CPU: 58.540s

Jan 16 21:06:47 Desktop-EndeavourOS sh[415619]: 21:06:47.9486 [DEBUG] (1) kanata_state_machine::oskbd::linux: Use for input: false. Non-input device: HDA ATI HDMI HDMI/DP,pcm=11
Jan 16 21:06:48 Desktop-EndeavourOS sh[415619]: 21:06:48.0119 [DEBUG] (1) kanata_state_machine::oskbd::linux: Use for input: false. Non-input device: HD-Audio Generic Line Out CLFE
Jan 16 21:06:48 Desktop-EndeavourOS sh[415619]: 21:06:48.0819 [DEBUG] (1) kanata_state_machine::oskbd::linux: Use for input: false. Non-input device: HD-Audio Generic Front Headphone
Jan 16 21:06:48 Desktop-EndeavourOS sh[415619]: 21:06:48.1385 [DEBUG] (1) kanata_state_machine::oskbd::linux: Use for input: true. detect type KeyboardMice; device type KeyboardMouse, device name: Logitech G604
Jan 16 21:06:48 Desktop-EndeavourOS sh[415619]: 21:06:48.1719 [DEBUG] (1) kanata_state_machine::oskbd::linux: Use for input: true. detect type KeyboardMice; device type KeyboardMouse, device name: Logitech MX Master 2S
Jan 16 21:06:48 Desktop-EndeavourOS sh[415619]: 21:06:48.2485 [INFO] registering /dev/input/event6: "Logitech MX Master 2S"
Jan 16 21:09:03 Desktop-EndeavourOS systemd[1800]: kanata.service: A process of this unit has been killed by the OOM killer.
Jan 16 21:09:03 Desktop-EndeavourOS systemd[1800]: kanata.service: Main process exited, code=killed, status=9/KILL
Jan 16 21:09:03 Desktop-EndeavourOS systemd[1800]: kanata.service: Failed with result 'oom-kill'.
Jan 16 21:09:03 Desktop-EndeavourOS systemd[1800]: kanata.service: Consumed 58.540s CPU time, 13.1G memory peak, 28.3G memory swap peak.

I looked around for other log files and couldn't find any, but happy to provide more logs if someone can point me to them.

Operating system

Linux (EndeavourOS, kernel 6.12.8-arch1-1)

Additional context

No response

@tedcook94 tedcook94 added the bug Something isn't working label Jan 17, 2025
@jtroo jtroo added the linux Issue pertains to Linux only label Jan 17, 2025
@jtroo
Copy link
Owner

jtroo commented Jan 17, 2025

It's almost certainly in this file within the kbdin related functions:
https://github.com/jtroo/kanata/blob/main/src/oskbd/linux.rs

I don't spot any obvious infinite loops and I only use Linux in VMs these days, so will need help with reproduction+investigation. Might help to add some more lines of code logging and compile yourself.

@jtroo
Copy link
Owner

jtroo commented Jan 17, 2025

Also, running kanata with the --trace flag while reproducing the issue will print many more logs, which may or may not provide useful information.

@tedcook94
Copy link
Author

I was able to get it to crash again with --trace while writing out to a file: kanata.log

@jtroo
Copy link
Owner

jtroo commented Jan 17, 2025

Hm the trace logs unfortunately don't show any smoking gun to my eyes.

Valgrind might be of help. Ensure you have debug symbols; files downloaded from GitHub releases are stripped, so don't have them. Then run kanata under valgrind. Sample command:

valgrind --leak-check=full \
         --show-leak-kinds=all \
         --track-origins=yes \
         --log-file=valgrind-out.txt \
         <kanata execute params>

I'm not sure if valgrind will output useful logs upon process termination with a SIGKILL though. If SIGKILL causes problems you'd need to find a way to send a signal other than SIGKILL when the issue reproduces and memory starts climbing. E.g. https://unix.stackexchange.com/questions/172559/receive-signal-before-process-is-being-killed-by-oom-killer-cgroups.

@tedcook94
Copy link
Author

I read the Stackexchange post you linked, but I'm not really a kernel expert and got lost reading all of that documentation.

I did capture a OOM sigkill with Valgrind, so I figured I'd share that first to see if it was helpful before continuing down the rabbit hole of trying to send another signal: valgrind-sigkill.log

@jtroo
Copy link
Owner

jtroo commented Jan 20, 2025

As I guessed there's no useful output; there's no memcheck output at the end.

@jtroo jtroo added the question Further information is requested label Jan 26, 2025
@tedcook94
Copy link
Author

I was able to get a crash again and send it a SIGTERM before the OOM SIGKILL, so hopefully this log is more helpful: valgrind.log

@jtroo
Copy link
Owner

jtroo commented Jan 29, 2025

Indeed there is useful info; the memory is mostly in a single allocation. Unfortunately you didn't use a build with debug symbols. I wonder if maybe the addresses can be mapped to symbols retroactively...

==10450== 1,610,612,736 bytes in 1 blocks are still reachable in loss record 93 of 93
==10450==    at 0x484BE40: realloc (vg_replace_malloc.c:1801)
==10450==    by 0x2388B8: ??? (in /usr/bin/kanata)
==10450==    by 0x154C2B: ??? (in /usr/bin/kanata)
==10450==    by 0x1CE8A1: ??? (in /usr/bin/kanata)
==10450==    by 0x1D612D: ??? (in /usr/bin/kanata)
==10450==    by 0x13DAAE: ??? (in /usr/bin/kanata)
==10450==    by 0x137E2B: ??? (in /usr/bin/kanata)
==10450==    by 0x1410FE: ??? (in /usr/bin/kanata)
==10450==    by 0x49E7E07: (below main) (libc_start_call_main.h:58)

@tedcook94
Copy link
Author

Is there a build I could use with debug symbols, or a guide on how to build that myself?

@jtroo
Copy link
Owner

jtroo commented Jan 29, 2025

The command cargo build at root of the repository will have a build with debug symbols at <repo_dir>/target/debug/kanata.

@tedcook94
Copy link
Author

Here's another log with debug symbols now: valgrind.log

@jtroo
Copy link
Owner

jtroo commented Jan 29, 2025

Nice! This is helpful.

==19827== 25,165,824 bytes in 1 blocks are still reachable in loss record 79 of 79
==19827==    at 0x484BE40: realloc (vg_replace_malloc.c:1801)
==19827==    by 0x663A6D: alloc::raw_vec::finish_grow (in /home/ted/Downloads/kanata/target/debug/kanata)
==19827==    by 0x663C26: alloc::raw_vec::RawVecInner<A>::grow_amortized (in /home/ted/Downloads/kanata/target/debug/kanata)
==19827==    by 0x288F58: grow_one<alloc::alloc::Global> (raw_vec.rs:554)
==19827==    by 0x288F58: alloc::raw_vec::RawVec<T,A>::grow_one (raw_vec.rs:336)
==19827==    by 0x24697F: alloc::vec::Vec<T,A>::push (mod.rs:2406)
==19827==    by 0x286260: kanata_state_machine::oskbd::linux::KbdIn::read::{{closure}}::{{closure}} (linux.rs:152)
==19827==    by 0x269D39: core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (iterator.rs:810)
==19827==    by 0x28DED0: core::iter::traits::iterator::Iterator::fold (iterator.rs:2584)
==19827==    by 0x28DF2A: core::iter::traits::iterator::Iterator::for_each (iterator.rs:813)
==19827==    by 0x286223: kanata_state_machine::oskbd::linux::KbdIn::read::{{closure}} (linux.rs:152)
==19827==    by 0x237F52: core::result::Result<T,E>::map (result.rs:773)
==19827==    by 0x22FDD0: kanata_state_machine::oskbd::linux::KbdIn::read (linux.rs:150)

@jtroo
Copy link
Owner

jtroo commented Jan 29, 2025

Looks like this call needs to have limiter e.g. take(N) since Kanata is apparently reading an endless stream of events 🤔

                    if let Err(e) = device
                        .fetch_events()
                        .map(|evs| evs.into_iter().for_each(|ev| input_events.push(ev)))

@jtroo jtroo linked a pull request Jan 29, 2025 that will close this issue
4 tasks
@jtroo jtroo removed the question Further information is requested label Jan 29, 2025
@jtroo
Copy link
Owner

jtroo commented Jan 29, 2025

At your convenience, if you can test that the limit-event-read-linux branch fixes the issue, that would be appreciated.

@tedcook94
Copy link
Author

I haven't been able to cause another crash since switching to that branch, so I think your commit has fixed my issue!

Once the PR is merged, what does the release cadence look like? I'd like to go back to a stable release as soon as possible instead of continuing to use a local debug build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working linux Issue pertains to Linux only
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants