feat(ui): Enhances virtual keyboard with sticky modifier key support #500

IDisposable · 2025-05-22T10:25:57Z

Adds support for Shift, Ctrl, Alt, Meta, and AltGr keys to the virtual keyboard, treating them as "sticky" keys when entered on the Virtual Keyboard. This allows the user to click the Shift or Ctrl or Alt or Meta or AltGr button and then click another button to emit the combo (e.g. Ctrl-Alt-Del would be done using exactly that click sequence).

Tracks and displays the active state of modifier keys (Shift, Ctrl, Alt, Meta, AltGr, CapsLock, NumLock, ScrollLock) in the InfoBar.
Updates the virtual keyboard to reflect the state of these modifier keys, including a "depressed" visual style.
Changes the virtual keyboard layout based on current Shift and CapsLock states.
Improves keyboard event handling to correctly send keycodes with appropriate modifiers.
Updates the InfoBar to show the status of all sticky keys.
Adds code to WebRTCVideo.tsx to keep physical keys in sync, so you can have the Virtual Keyboard up, hold the physical Ctrl (or other modifier) key, and click the desired character (for example, Ctrl-C).
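
For illustration only, the sticky behaviour described above could be sketched roughly like this. The bit values are the standard USB HID modifier bits; `StickyModifiers`, `chord`, and the other names are hypothetical and are not taken from this PR's code.

```typescript
// Standard USB HID modifier bit positions (byte 0 of a keyboard report).
const MODIFIER_BITS = {
  ControlLeft: 0x01,
  ShiftLeft: 0x02,
  AltLeft: 0x04,
  MetaLeft: 0x08,
  ControlRight: 0x10,
  ShiftRight: 0x20,
  AltRight: 0x40, // acts as AltGr on many layouts
  MetaRight: 0x80,
} as const;

type ModifierName = keyof typeof MODIFIER_BITS;

// Hypothetical helper: remembers which modifiers have been clicked "down".
class StickyModifiers {
  private mask = 0;

  // Clicking a modifier button flips it between latched and released.
  toggle(name: ModifierName): number {
    this.mask ^= MODIFIER_BITS[name];
    return this.mask;
  }

  isActive(name: ModifierName): boolean {
    return (this.mask & MODIFIER_BITS[name]) !== 0;
  }

  // Combine the latched modifiers with one regular key into a combo.
  chord(keyCode: number): { modifier: number; keys: number[] } {
    return { modifier: this.mask, keys: [keyCode] };
  }
}

const HID_DELETE = 0x4c; // HID usage ID for "Delete Forward"

// Click Ctrl, click Alt, click Del.
const sticky = new StickyModifiers();
sticky.toggle("ControlLeft");
sticky.toggle("AltLeft");
const ctrlAltDel = sticky.chord(HID_DELETE);
```

The modifiers stay latched after the combo is emitted, so a second click on each modifier is what releases it.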

Fixes #396

IDisposable · 2025-05-22T10:30:54Z

@ym @adamshiervani This is ready for review and works really well ;)

I am not sure about the CSS style of the "depressed" buttons, I made something up and welcome other suggestions.

IDisposable · 2025-05-22T10:33:05Z

The sticky keys and InfoBar look like this

IDisposable · 2025-05-23T08:13:23Z

Rebased and ready for review @ym

IDisposable · 2025-06-12T18:33:48Z

Rebased and squashed. Ready again

adamshiervani · 2025-07-11T15:43:03Z

@ym and I just reviewed this PR. Overall, it looks good - but the underlying keyboard state management is broken.

There are two main issues. First, the data source is split: for keyboard LED state, we fetch from the device; for all other keyboard state, we rely on JavaScript variables - unless the device doesn’t support LED state. In that case, we should theoretically fall back to the JS state.

The problem is that the current codebase handles this inconsistency poorly. Some parts assume the keyboard LED state is always available from the device, while other parts assume it may not be. This results in a confusing and brittle implementation that’s hard to follow and maintain.

Our suggestion is to take a more declarative approach: treat the device as the single source of truth for the entire keyboard state, including LED status. Let that state flow into the UI via messages over WebRTC.

That would mean introducing a unified “super keyboard state” on the device side - one that includes all the elements we want to track. The device can manage any conflicts internally (e.g. between actual LED status and internal state) and then broadcast a consistent, resolved state to the UI. What do you think?
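
As a rough sketch of what the browser side of that could look like: the store below never mutates keyboard state itself, it only mirrors whatever the device last broadcast. The `KeyboardState` shape, the JSON payload, and the `DeviceKeyboardStore` name are all assumptions made for illustration, not an agreed message format.

```typescript
// Hypothetical shape of the unified state the device would broadcast.
interface KeyboardState {
  modifiers: number; // HID modifier bitmask
  keys: number[]; // HID usage IDs currently held down
  leds: { capsLock: boolean; numLock: boolean; scrollLock: boolean };
}

type KeyboardListener = (state: KeyboardState) => void;

// Read-only mirror of the device state for UI components to subscribe to.
class DeviceKeyboardStore {
  private state: KeyboardState = {
    modifiers: 0,
    keys: [],
    leds: { capsLock: false, numLock: false, scrollLock: false },
  };
  private listeners: KeyboardListener[] = [];

  get current(): KeyboardState {
    return this.state;
  }

  // Returns a function that removes the listener again.
  subscribe(listener: KeyboardListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Call with the payload of a message from the device (assumed to be JSON).
  receive(raw: string): void {
    this.state = JSON.parse(raw) as KeyboardState;
    this.listeners.forEach((l) => l(this.state));
  }
}

// Usage: the InfoBar or Virtual Keyboard would subscribe and re-render.
const store = new DeviceKeyboardStore();
const seen: KeyboardState[] = [];
const unsubscribe = store.subscribe((s) => {
  seen.push(s);
});
store.receive(
  JSON.stringify({
    modifiers: 0x02,
    keys: [],
    leds: { capsLock: true, numLock: false, scrollLock: false },
  }),
);
unsubscribe();
```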

IDisposable · 2025-07-11T16:48:17Z

Our suggestion is to take a more declarative approach: treat the device as the single source of truth for the entire keyboard state, including LED status. Let that state flow into the UI via messages over WebRTC.

I agree completely. I would be happy to make that distinct change and then (if we still want the "sticky" state for the VirtualKeyboard, which a lot of folks are waiting on) redo this PR.

I wrote this a long time before the LED stuff was merged and then rebased as needed, so it's a bit of a mess, sorry.

To be clear on the specification, we would want the keystate for all modifiers to be in a single store, with the remote-driven LED ones taking primacy if enabled for the locked states. Then in WebRTCVideo we would need to forward the modifier keypresses (LShift, RShift, LCtrl, RCtrl, LAlt, RAlt/AltGr, LMeta, RMeta, CapsLock, ScrollLock, NumLock, Kana, Compose) to that state manager with a sticky/not-sticky flag of false. In VirtualKeyboard, same logic, but passing true for sticky (if selected in options? could add that). In Macro playback, track the state as the macro is playing... but ignore the remote-driven LED state until macro completes?
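
A minimal sketch of the forwarding step, assuming the browser's `KeyboardEvent.code` values are used to identify modifiers. The `ModifierEvent` shape and `toModifierEvent` helper are hypothetical, and Kana and Compose are left out here because their `code` values vary by platform.

```typescript
// KeyboardEvent.code values for the modifier and lock keys to forward.
const FORWARDED_MODIFIER_CODES: readonly string[] = [
  "ShiftLeft",
  "ShiftRight",
  "ControlLeft",
  "ControlRight",
  "AltLeft",
  "AltRight", // AltGr on many layouts
  "MetaLeft",
  "MetaRight",
  "CapsLock",
  "ScrollLock",
  "NumLock",
];

// Hypothetical message handed to the single state manager.
interface ModifierEvent {
  code: string;
  pressed: boolean;
  sticky: boolean; // false from WebRTCVideo, true from VirtualKeyboard
}

// Returns null for keys that are not modifiers, so callers can skip them.
function toModifierEvent(
  code: string,
  pressed: boolean,
  sticky: boolean,
): ModifierEvent | null {
  if (FORWARDED_MODIFIER_CODES.indexOf(code) === -1) return null;
  return { code, pressed, sticky };
}

// Physical key press seen in WebRTCVideo: never sticky.
const fromPhysical = toModifierEvent("ShiftLeft", true, false);
// Click on a Virtual Keyboard modifier: sticky.
const fromVirtual = toModifierEvent("ControlLeft", true, true);
// Regular keys are not modifier events.
const notModifier = toModifierEvent("KeyA", true, false);
```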

One other thing... for the LED states, I'm not sure what is meant by Automatic in this... Host Only would be tracking LEDs sent by the remote, Browser Only would be tracking keystate (up/down) for the shifts in the client TS code. What does Automatic portend?

adamshiervani · 2025-08-04T13:24:12Z

Thanks @IDisposable - really appreciate the thoughtful breakdown!

Yes, treating the device as the single source of truth for the entire keyboard state - including both modifier keys and LED status - makes the most sense long term for me. Any internal conflicts (for example, between LED state and modifier press/release timing) should be resolved on the device side. From there, the device can emit a clean, unified “super keyboard state” that flows into the UI via WebRTC.

On the LED state specifically: even if it arrives with some delay, it should always take precedence and override whatever the UI thought earlier. That’s consistent with how the remote host behaves - if it says Caps Lock is off, then it’s off, regardless of what the browser previously assumed. As far as I can see, whatever the browser thinks is irrelevant if the LED state says something else.

If LED state isn’t available from the device, we’ll just fall back to JS-tracked state and, well, depending on your level of atheism, pray that it's in "sync" with the remote host.

As for the event flow: both WebRTCVideo and VirtualKeyboard should send all relevant modifier key events to the device. In the case of WebRTCVideo, we’d send them with sticky = false.

As for the Virtual Keyboard, should we have a sticky/record+release button in the Virtual Keyboard, or what do you think? I'm not super dogmatic on a certain approach.

After that, we can drop all local state and just react to whatever comes back from the device.

Regarding macro playback - that’s a great point. For now, I'd say we treat macros as stupidly as possible. We’ll forward the modifier key events to the device and let it manage state updates as usual. No need to ignore LED state during playback or resync afterward for now. If we end up needing tighter control (e.g. to reset before/after execution), we can always add a toggle to the macro creator later.

Lastly, I think we can safely remove the LED tracking mode setting (Host Only, Browser Only, Automatic). Let’s simplify and remove the settings. Always let the device decide what the correct state is, and if LED data is available, just merge that in and go with it. This should reduce ambiguity and eliminate some of the weird edge cases we’ve been dealing with.

Let me know if that makes sense or if you’d like to hash out any specifics further. Happy to help shape the unified state logic.

Lastly, sorry for the delay on this one. Totally get if you're frustrated with the latency, and don't want to spend time on this anymore. If so, totally understand, just let me know <3.

IDisposable · 2025-08-04T16:02:53Z

Thanks @IDisposable - really appreciate the thoughtful breakdown!

Hoping I'm being helpful instead of annoying. Please just let me know (e.g. via email IDisposable @ gmail . com) if you ever need me to do something else or shush.

Yes, treating the device as the single source of truth for the entire keyboard state - including both modifier keys and LED status - makes the most sense long term for me. Any internal conflicts (for example, between LED state and modifier press/release timing) should be resolved on the device side. From there, the device can emit a clean, unified “super keyboard state” that flows into the UI via WebRTC.

Yes. That's so much simpler. I think the big thing it drives is that when someone presses a modifier key, we need to just send the modifier key through to the device, let it reply with the current modifier state, and track that browser-side ONLY for the purposes of showing the correct thing on the modifier indicator (lower right) and keyboard state.

On the LED state specifically: even if it arrives with some delay, it should always take precedence and override whatever the UI thought earlier. That’s consistent with how the remote host behaves - if it says Caps Lock is off, then it’s off, regardless of what the browser previously assumed. As far as I can see, whatever the browser thinks is irrelevant if the LED state says something else.

I love this, it makes it trivial to forward key-up/down events to the device and trust the replied state in the browser. I wonder if adding an (optional) overlay on the full-screen view (e.g. lower right corner) that shows the modifier state in visible form (Shift/Ctrl/Alt/AltGr or something) would help users on the browser side see what's active as of the device's last reply?

If LED state isn’t available from the device, we’ll just fall back to JS-tracked state and, well, depending on your level of atheism, pray that it's in "sync" with the remote host.

Can we implement that as a fallback to the device-tracked state? E.g. send the current state from the device to the browser, and then let the user do the press-release dance on keys if the current state doesn't reflect what they think it should be based on the indicators on the lower right. That way, the device is always authoritative (and would be tracking, or not, the LED status from the host if enabled, but that's completely opaque to the browser).

As for the event flow: both WebRTCVideo and VirtualKeyboard should send all relevant modifier key events to the device. In the case of WebRTCVideo, we’d send them with sticky = false.

Gotcha, so we send what we know (from the last mod-status reply). For macros, do we want to send a "resync" message at the start to get back the current state?

As for the Virtual Keyboard, should we have a sticky/record+release button in the Virtual Keyboard, or what do you think? I'm not super dogmatic on a certain approach.

Hmmm, so the point of the virtual keyboard's sticky modification is that pressing the Shift key (or another modifier that is normally not a toggle) would send the key-down event for the modifier, but NOT send the key-up until you click the key a second time, right? So you could click Left Shift: the keyboard indicator shows Shift down, the Virtual Keyboard layout flips to the shifted state, and the Left Shift key chip on the Virtual Keyboard gets a style flip (or both Shifts?) indicating it's down. Then you click Left Ctrl, so now both keys are down in the keyboard indicator and the Virtual Keyboard layout updates to the Shift+Control state (important for some international keyboard characters). Then you click the Del key: the device sees Del go down, then Del go up... and poof, the OS on the controlled PC sees an actual Shift-Ctrl-Del (and both Shift and Ctrl are still down on the Virtual Keyboard).

After that, we can drop all local state and just react to whatever comes back from the device.

... sorta... I think we're ALWAYS sending just the Key-Down and Key-Up events, and always trusting the device's state management... the only thing that's odd is that when pressing the physical key we send the key down, and when it's released we send the key up... but for the Virtual Key "chips", one click presses down the Shift key, which is sent as a Key Down to the device, the device changes the keyboard state, which when received causes the Virtual Keyboard layout to change to the "Shifted" state... which shows the shift-keys "down"/highlighted... and they stay that way until you click the shift chip AGAIN... when it sends a Shift-Key-Up event to the device, and trusts the reply's keyboard state, which will now indicate the keyboard is in the unshifted state... so the Virtual Keyboard layout reverts to the unshifted state... and that means the shift-key chips are "up"/unhighlighted.
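
To make that flow concrete, here is a small sketch of a single chip under the assumption that the device echoes back the list of held keys after every event. `ModifierChip`, `KeyMessage`, and the fake device in the usage part are hypothetical stand-ins, not the real transport.

```typescript
type KeyMessage = { key: string; pressed: boolean };

// Hypothetical controller for one modifier "chip" on the Virtual Keyboard.
class ModifierChip {
  private key: string;
  private send: (message: KeyMessage) => void;
  private held = false; // mirrors the device's last reply, never a local guess

  constructor(key: string, send: (message: KeyMessage) => void) {
    this.key = key;
    this.send = send;
  }

  // One click sends key-down; the next click sends key-up.
  click(): void {
    this.send({ key: this.key, pressed: !this.held });
  }

  // Called whenever the device reports its keyboard state.
  onDeviceState(heldKeys: string[]): void {
    this.held = heldKeys.indexOf(this.key) !== -1;
  }

  get highlighted(): boolean {
    return this.held;
  }
}

// Fake device for the example: applies each event and echoes its state back.
const deviceHeld: string[] = [];
const sentMessages: KeyMessage[] = [];
const shiftChip = new ModifierChip("ShiftLeft", (message) => {
  sentMessages.push(message);
  const index = deviceHeld.indexOf(message.key);
  if (message.pressed && index === -1) deviceHeld.push(message.key);
  if (!message.pressed && index !== -1) deviceHeld.splice(index, 1);
  shiftChip.onDeviceState(deviceHeld);
});

shiftChip.click(); // sends Shift key-down; device reports Shift held
const highlightedAfterFirstClick = shiftChip.highlighted;
shiftChip.click(); // sends Shift key-up; device reports Shift released
const highlightedAfterSecondClick = shiftChip.highlighted;
```

The chip never flips its own highlight. It only changes when the device's reply says the key is held, which is the "trust the reply" part of the description above.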

Regarding macro playback - that’s a great point. For now, I'd say we treat macros as stupidly as possible. We’ll forward the modifier key events to the device and let it manage state updates as usual. No need to ignore LED state during playback or resync afterward for now. If we end up needing tighter control (e.g. to reset before/after execution), we can always add a toggle to the macro creator later.

I think we need the reset to always be present, and to ensure the macro has steps for the down and up of the modifier keys... I will have to circle back to that when I get things cleaned up on the virtual key side, and then will focus on the macro details.

Lastly, I think we can safely remove the LED tracking mode setting (Host Only, Browser Only, Automatic). Let’s simplify and remove the settings. Always let the device decide what the correct state is, and if LED data is available, just merge that in and go with it. This should reduce ambiguity and eliminate some of the weird edge cases we’ve been dealing with.

YES. I love this. So we need to assume a state device-side at first connection (possibly sending an eight-zeros keyboard report), and any time the controlled machine replies with an LED state, trust that as truth.
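
For reference, a sketch of what an "eight-zeros" report means, assuming the standard 8-byte USB HID boot keyboard layout (modifier byte, reserved byte, six key slots). The real device code is Go; this TypeScript version and the `buildBootReport` name are only illustrative.

```typescript
// Builds an 8-byte boot keyboard report:
// byte 0 = modifier bitmask, byte 1 = reserved, bytes 2..7 = up to 6 keys.
function buildBootReport(modifiers: number, keys: number[]): Uint8Array {
  if (keys.length > 6) {
    throw new RangeError("a boot keyboard report holds at most 6 keys");
  }
  const report = new Uint8Array(8); // zero-filled by default
  report[0] = modifiers & 0xff;
  // report[1] is reserved and stays 0.
  keys.forEach((key, index) => {
    report[2 + index] = key & 0xff;
  });
  return report;
}

// "Eight zeros": no modifiers and no keys held, i.e. release everything.
const releaseAllReport = buildBootReport(0, []);

// Ctrl (0x01) + Alt (0x04) with Delete (0x4c) held.
const ctrlAltDelReport = buildBootReport(0x01 | 0x04, [0x4c]);
```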

Let me know if that makes sense or if you’d like to hash out any specifics further. Happy to help shape the unified state logic.

I think I'm in sync now 😄 but let me know if my clarifications are wrong...

Lastly, sorry for the delay on this one. Totally get if you're frustrated with the latency, and don't want to spend time on this anymore. If so, totally understand, just let me know <3.

I am NOT frustrated by the latency, just concerned that I am not helping, which is my primary motivation. I don't want to waste your time on changes that aren't in line with your plans. Just point me the right direction and I'll happily work at the pace my real job allows... if I anticipate a delay, I'll let you know.

IDisposable · 2025-08-04T16:06:13Z

I'm going to leave this PR open until I've floated a new one implementing things discussed, we can then evaluate that one without all the distractions here (and close this one then). That okay?

adamshiervani · 2025-08-05T12:40:09Z

I love this, it makes it trivial to forward key-up/down events to the device and trust the replied state in the browser. I wonder if adding an (optional) overlay on the full-screen view (e.g. lower right corner) that shows the modifier state in visible form (Shift/Ctrl/Alt/AltGr or something) would help users on the browser side see what’s active as of the device’s last reply?

I'd be wary of adding more noise to the main UI, but don't we already show this implicitly by showing the key presses?

Can we implement that as a fallback to the device-tracked state? E.g. send the current state from the device to the browser… and then let the user do the press-release dance on keys if the current state doesn’t reflect what they think it should be based on the indicators on the lower right.

Yep, totally agree! To clarify, I misspoke earlier. Where I wrote “fallback to JS-tracked state,” I meant “device-tracked state.” Otherwise we’re missing the entire point of this rewrite 😅

Gotcha, so we send what we know (from the last mod-status reply). For macros, do we want to send a “resync” message at the start to get back the current state?

I’d vote to leave macros out of scope for this change unless you see a blocker. I’d prefer to preserve the current behavior and revisit improvements later, once the core input pipeline is stable.

… sorta… I think we’re ALWAYS sending just the Key-Down and Key-Up events…

Yes - agreed on all points here. The only unique bit is the sticky behavior from the Virtual Keyboard, where clicks simulate holding modifiers until toggled off. All good. Do you have any UX ideas on how to best support sticky combinations across different keyboard layouts and languages? That’s the one tricky bit I want to get right.

I am NOT frustrated by the latency…

Mate, you’ve been an incredible help. Your engagement, whether it’s PRs, feature work, or comments on issues, has been consistently thoughtful and impactful. Genuinely appreciate the collaboration!

And yes, sounds great to leave this PR open, until the new one is ready.

IDisposable · 2025-08-05T18:44:28Z

Regarding the keyboard layouts, I am planning on using something (like a keyboard.info) to source layouts, tie them to the keyboard languages/formats, and tweak or replace the on-screen keyboard layouts to be driven by those in the selected locale... initially I'm going to float en-US 102 key and at least Intl-102 for a POC and pull in the others in short order. As for the on-screen keyboard, I was thinking about making the keycaps indicate the characters available and have them shift appearance as the user stickies the modifiers.

Ideally, would be cool to just support uploading a definition file from keyboard.info, but that might be a stretch, what do you think?

adamshiervani · 2025-08-06T10:34:26Z

Something like keyboard.info sounds great, but let's start small. My biggest concern is feature creep of the keyboard rewrite. Let's keep it mainly about changing the underlying mechanism, and with that in place we can easily make more significant changes to consumers of the new data flow.

IDisposable · 2025-08-07T22:16:03Z

Work has started: I'm doing away with the LED State tracking option and adding the key press/release handling to the device (Go) code. I'll get some more done this weekend.

#725

adamshiervani · 2025-08-08T08:51:29Z

Awesome. Actually looking forward! <3

IDisposable · 2025-08-22T22:23:13Z

Lastly, I think we can safely remove the LED tracking mode setting (Host Only, Browser Only, Automatic). Let’s simplify and remove the settings. Always let the device decide what the correct state is, and if LED data is available, just merge that in and go with it. This should reduce ambiguity and eliminate some of the weird edge cases we’ve been dealing with.

In reviewing these notes now that #725 is about to merge... I realize I'm not sure that code handles the case where the host doesn't send LED status back to the device. We may want to add that for the rare host that doesn't send LED messages: when the device is connected to the USB port, it should initialize to something and then track the client/browser-sent keypresses to keep track of the CapsLock/NumLock states, which get tossed if an LED state comes in from the host.

It's really a minor edge case, because the only place I would ever expect not to get LED states from the host is in boot mode.
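
A sketch of that fallback, written in TypeScript only for illustration (the device code is Go, and none of these names come from #725). It assumes "initialize to something" means all locks off, which is a guess.

```typescript
interface LockState {
  capsLock: boolean;
  numLock: boolean;
  scrollLock: boolean;
}

// Hypothetical tracker: guesses lock state from keypresses until the host
// sends an LED report, at which point the host's report replaces the guess.
class LockTracker {
  // Assumption: start with every lock off when the USB port connects.
  private state: LockState = { capsLock: false, numLock: false, scrollLock: false };

  // A lock key toggles on press; key-up does not change the state.
  onKeyPress(code: string): void {
    if (code === "CapsLock") this.state.capsLock = !this.state.capsLock;
    else if (code === "NumLock") this.state.numLock = !this.state.numLock;
    else if (code === "ScrollLock") this.state.scrollLock = !this.state.scrollLock;
  }

  // An LED report from the host is treated as truth and overwrites the guess.
  onHostLeds(leds: LockState): void {
    this.state = {
      capsLock: leds.capsLock,
      numLock: leds.numLock,
      scrollLock: leds.scrollLock,
    };
  }

  // Returns a copy so callers cannot mutate the tracked state.
  snapshot(): LockState {
    return {
      capsLock: this.state.capsLock,
      numLock: this.state.numLock,
      scrollLock: this.state.scrollLock,
    };
  }
}

const lockTracker = new LockTracker();
lockTracker.onKeyPress("CapsLock"); // guessed: CapsLock is now on
const guessedLocks = lockTracker.snapshot();

// The host later reports its LEDs, which tosses the guess.
lockTracker.onHostLeds({ capsLock: false, numLock: true, scrollLock: false });
const reportedLocks = lockTracker.snapshot();
```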

IDisposable · 2025-08-28T00:39:37Z

Closing this now that everything here (except the concept of Caps Lock/Shift affecting the on-screen keyboard) is in #725 and #750