Global keyboard shortcut portal #624
This also sounds like something particularly useful on Wayland, since apps can't arbitrarily access keyboard input whenever they want (no keyboard event polling). So traditional app-provided global shortcuts won't work on Wayland. On Wayland you'd need the compositor to "capture" global shortcut usage on behalf of the app, but today that isn't very easy or accessible to manage. Having a standard way of asking the host system/compositor to provide a global shortcut for the app sounds like a very clean solution. |
An API proposal (or even better: an implementation) would be appreciated. |
This would definitely be necessary for things that have core functionality in global hotkeys, for example Mumble or Discord with push-to-talk. Is the xdg-desktop-portal project open to API proposals for this, and/or somewhere to discuss and flesh out what an API would look like? |
I can write a rough draft of portal API proposal for this feature if anyone is interested, I do have a few (optional) features that I wanted to include:
Let me know if anyone would be interested in these optional features. If so, I can include them in the draft as well. |
@rohmishra in my mind when proposing this idea, I was really thinking of limiting the scope to launching apps and FreeDesktop Actions, which as far as I know would cover most if not all of the uses by elementary AppCenter apps. I feel like PTT is different in that it assumes the app is running and able to intercept/handle a specific key internally; are there other examples of these generic shortcuts? But off the top of my head I am not sure this is a good fit without inventing yet another standard of types of keyboard shortcuts. I don't feel strongly, I just fail to see the other potential uses and am not entirely convinced it's a good fit. I think 2 makes sense and would be really convenient for in-app education! I'm not sure I understand the use of 3. |
For one, push-to-talk and apps that display an overlay/menu while you press and hold a key are the only two use cases I can think of right now. I did have a different one in mind a few days ago but I failed to write it down 😅 so it's lost to history I guess.
That said, I do have a different use case in mind that I glossed over previously: for PTT the app needs to know whether the keys are currently pressed or not. Do we want to support that use case? I feel like with the ubiquity of video conferencing these days, there might be a lot of requests to support this regardless. In that case, do we want that to be a separate API?
As for the third one, the shortcuts are automatically cleared when you sign out (or quit the app, not really sure which one would be better). PTT actions are probably a good example of its use case. But a UI that dynamically changes a lot might want to use this too. I personally don't care either way but just thought this might be something someone might be looking for. It was a valid feature to implement on X after all.
Except for the PTT/overlay case, I do think most use cases can be solved by the current shortcut system of just executing a command with a flag/dbus-send to let the app know of an action. It would certainly be easier: except for a portal to trigger the add-shortcut UI automatically and pre-fill the command, we already have everything else in place. I'm on mobile right now, but GSConnect can trigger the action in its own UI, so unless it uses a weird hack, GNOME already has most of the stuff in place.
Without 1 & 3, I have a simple structure in mind: we store a token, invisible to the user, for each app that has requested a global shortcut. The app can use that to request the current key binding, or clear it.
We may also (optionally) consider passing it back to the app in argv/dbus so that if the app allows users to have some custom action on keypress (eg run a series of user defined actions or enter a custom text) it can simply identify against the unique token/UUID. |
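To make the token idea above concrete, here is a rough sketch. It is only an in-memory model under assumptions, not a portal API: `ShortcutStore`, `request`, `user_binds`, `binding_for`, and `clear` are hypothetical names invented for illustration.

```python
import uuid
from typing import Dict, Optional, Tuple


class ShortcutStore:
    """Hypothetical in-memory model of the per-app token idea (not a real portal API)."""

    def __init__(self) -> None:
        # token -> (app_id, action, binding chosen by the user, or None if unset)
        self._entries: Dict[str, Tuple[str, str, Optional[str]]] = {}

    def request(self, app_id: str, action: str) -> str:
        """Record a shortcut request and hand back an opaque token."""
        token = str(uuid.uuid4())
        self._entries[token] = (app_id, action, None)
        return token

    def user_binds(self, token: str, binding: str) -> None:
        """Called from the system UI once the user has picked a key combination."""
        app_id, action, _ = self._entries[token]
        self._entries[token] = (app_id, action, binding)

    def binding_for(self, token: str) -> Optional[str]:
        """Let the app read back the current binding, e.g. for its help pages."""
        entry = self._entries.get(token)
        return entry[2] if entry else None

    def clear(self, token: str) -> None:
        """Drop the shortcut; unknown tokens are ignored."""
        self._entries.pop(token, None)
```

The app never chooses the binding itself in this model; it can only read back or clear whatever the user picked.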
PTT seems to me like one of the more common requests when it comes to global key bindings, and I imagine it cannot rely on executables or .desktop action entries since one would need both a start and a stop signal. It would also be nice to know what actions exactly tend to be bound to global keys in elementary apps, to better understand what they are actually trying to solve. As for storage in the permission store, I think it'd be good to take inspiration from the screen cast session storage method here; it solved a very similar issue, e.g. letting the portal backend provide the actual stored content, while using an x-d-p provided token to make it possible to restore. |
I have a use case for options 2 and 3. I am working on an application to calibrate mouse sensitivity between games. Without entering into technical details, you'd start the app, enter calibration mode, tab over to the video game and press a hotkey to start recording the mouse input and another hotkey to stop. So my application would need a way to configure a few global shortcuts that are active as long as it is running, and I'd like to make those hotkeys configurable by the user. Since the app requires raw access to /dev/input/event*, I can already simulate some kind of global shortcut system, but I would prefer to use the actual portal, if there's one available. |
@jadahl PTT would require sending key-up and key-down flags as distinct signals, which isn't something other apps want to deal with and something we want to avoid sharing unless absolutely necessary. And yes, I was thinking of a similar structure to the screencast API. I was about to sleep while writing the previous comment and it shows. @1player the plan is that apps won't be able to force a keyboard shortcut; rather, the user gets to choose them. I'm thinking of allowing apps to suggest keybindings to be helpful, but those still won't be binding on the user. That's why I want to include a way to read the binding, so that apps can dynamically update their documentation and help pages to reflect the right shortcuts. Also, we might actually want to explore hiding /dev/input behind a portal, because otherwise it defeats the purpose: apps can read the keyboard anyway by adding a single line to their manifest. Permissions granted by the manifest are automatically granted when running the app and users shouldn't be expected to review them for every app. |
True, but done right, they (the ones not caring about start/stop) could just ignore all but the start signal for example. |
It would result in inconsistent behavior. Some apps react to key-up, others to key-down. It is also a more complex API that would require more time to implement. You lose the ability for shortcuts to work even when the app isn't running yet (something other platforms don't really do and would result in a better experience if we can land that! We don't want users to have to check if the app is running yet or not.) Having PTT tied to session-only bindings may be a good idea too. The app can request a key binding the first time you enter a call for the day. Paired with suggested bindings, it should be a one-extra-click/enter-key, once-a-day/restart affair for most people. That said, people are not going to like it. But I don't think all bindings receiving key-down and key-up is a good idea. |
An alternative is for applications to ask for what type of event to receive, i.e. start/stop/cancel vs triggered. Anything that doesn't actually want the former would just receive the latter. What makes PTT special in that it shouldn't be able to be saved more persistently? |
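A small sketch of that alternative, where the app declares which kind of event it wants and the backend filters accordingly. Everything here is hypothetical (`EventMode`, `events_for`, and the raw event names "down"/"up"/"cancel"), and firing "triggered" on key-down rather than key-up is an assumption made only for the example.

```python
from enum import Enum
from typing import List


class EventMode(Enum):
    TRIGGERED = "triggered"      # one event per shortcut use
    START_STOP = "start_stop"    # separate events for press, release, cancel


def events_for(mode: EventMode, raw: List[str]) -> List[str]:
    """Translate raw key events ("down", "up", "cancel") into what the app asked for."""
    if mode is EventMode.START_STOP:
        mapping = {"down": "start", "up": "stop", "cancel": "cancel"}
        return [mapping[e] for e in raw if e in mapping]
    # TRIGGERED: only key-down produces an event; releases and cancels are dropped.
    return ["triggered" for e in raw if e == "down"]
```

An app that only cares about "triggered" never sees the release, so there is no per-app inconsistency about reacting to key-up versus key-down.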
I like your ideas @rohmishra , however I'd like to make one comment on this:
I believe it should be possible for apps to simultaneously register the same keyboard shortcut. For example, I use both Mumble and Discord - I never use the voice capabilities of each at the same time, however I don't want to have to register different PTT shortcuts for each. If I were a bit more insane and had friends scattered across Mumble, Discord, Ventrilo, Teamspeak, and Teams servers (granted I have no idea if Teams even does PTT), it would become very annoying very quickly to have to remember each one's individual PTT key depending on which app I'm using. Having 5 different push-to-talk keys could get unwieldy quickly. Also, one comment on the idea of "temporary global shortcuts" - I like the idea of having them be unregistered after an app exits, however does this mean you will be prompted for the keyboard shortcut every time the app is opened? That could get annoying very quickly and might discourage people from using that functionality. What if instead, you required that the prompt said something like "Application X would like to register key Y as a global shortcut", and then you could answer it with
or something like that? I think that would solve the use cases for temporary shortcuts. (As a bonus, it would be interesting to allow the user to change "key Y" from the prompt to something different, but not really necessary for an initial design.) |
The only thing I'm not sure about is how annoying a prompt like that would be if an app wants to register several shortcuts. As a bad example, a screenshot tool that registers one key for fullscreen, one key for current window, and one key for selecting a rectangle. If a user only wants those shortcuts registered temporarily, they'd have to answer three prompts every time they open the program. One solution to this would be to allow an app to register multiple shortcuts with a single prompt, but that gets rid of the option to allow some shortcuts but not others (at least, not without making the prompt quite a bit more complicated). I'm not a big fan of this, but I'm not wholly against it either - an app registering multiple shortcuts might be enough of an edge case that it's not worth going this route. Any thoughts? |
@k3d3 that's the idea. All apps request access to "common-PTT-key" or something like that, and the first app that actually uses it gets to use it in a call while others can request a temporary alternative. So for example, if you are on a Discord call and your PTT combo is ctrl+space, Discord PTT will be triggered by that, but if you open Teams for a call WHILE on a call in Discord, it will ask you for a temporary alternative. Also, that is the idea behind temporary shortcuts. The user will always be in control. The app can just ask for the shortcut to be temporarily assigned, or the user can force it on the assignment screen. Temporary shortcuts are intended for apps that you know you don't use often or, as discussed above, for apps that are meant to be used once/rarely so you don't want to pollute your shortcuts with them. They are optional. Allow once doesn't really make sense, so it's better to just skip that to minimise complexity. And yes, the user gets to decide the key shortcut as mentioned above. Apps can purely just SUGGEST shortcuts to simplify the flow, not enforce them. |
Agreed - so long as the UX isn't too complex, that works. User choice is better.
What about it doesn't make sense? I feel like it would keep the API much simpler for an app to just say "register a key, preferably key X" and that's it, then let the user decide if that shortcut should be allowed forever (permanently) or once (temporarily) - the naming could certainly change if the terminology is what's bothering you. I think it's better for the user to decide if an app is commonly or uncommonly used, rather than the app developer having to guess.
I have mixed opinions about this. While I do understand what you're getting at, this ends up being a bit of the same problem where now I have to remember multiple PTT keys. That said, I kinda like the idea of having a warning prompt when a keyboard shortcut is already in use. What if, instead of only asking for a temporary alternative, it asked for a temporary alternative but also gave the option to use the same key? |
My two cents: the request for global shortcuts should be as descriptive as possible making different solutions for the backend possible. One request should be a list of all shortcuts the application wants to set, each shortcut consists of:
That allows a backend to implement a UI where it's possible to:
For persistence we can reuse the screen cast session storage method like @jadahl suggested. If an action is bound to multiple shortcuts they all get triggered. This should be sufficient for all the use cases listed in this thread and give the user enough control. e: a note about event types: dbus activation is the only type which can be triggered while the application is not running (but also when running). |
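The "they all get triggered" rule could look roughly like this on the backend side. This is a sketch under assumptions: `Bindings`, `bind`, and `trigger` are made-up names, and the app and shortcut ids are placeholders, not part of any proposal in this thread.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class Bindings:
    """Hypothetical backend-side table: one user action can fan out to many shortcuts."""

    def __init__(self) -> None:
        # action (e.g. a key combination) -> list of (app_id, shortcut_id)
        self._by_action: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def bind(self, action: str, app_id: str, shortcut_id: str) -> None:
        """Bind an app's shortcut to an action; duplicate bindings are ignored."""
        target = (app_id, shortcut_id)
        if target not in self._by_action[action]:
            self._by_action[action].append(target)

    def trigger(self, action: str) -> List[Tuple[str, str]]:
        """Return every (app, shortcut) that should be notified for this action."""
        return list(self._by_action.get(action, []))
```

With this shape, the same push-to-talk key can be shared by two voice apps, which covers the Mumble/Discord case raised earlier in the thread.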
@swick completely agreed, I think. I'm just curious what you mean by "type of event" - is this something that would be shown to and selected by the user, or just handled by an app via the desktop-portal API? I'm not sure if that's all that important to show to a user (so long as it shows which key, just not necessarily if it's start/stop or just triggered), though I may be misunderstanding what you mean there. Also if an app expects a start/stop PTT-style binding and a user chooses triggered, that might break things. Also since this is xdg-desktop-portal, wouldn't everything behind the scenes be handled by dbus? Other than that, yeah, I like that a lot! |
The type of event is an implementation detail that cannot be overwritten by the user. It can influence what action can be bound to it. For example a PTT can not be bound to a touch gesture.
Yes. I think you're confused because of the dbus activation type? The dbus activation mechanism allows shortcuts to be activated even if the application is not running (https://specifications.freedesktop.org/desktop-entry-spec/1.1/ar01s07.html). |
Okay, thanks for the clarification. I completely understand and fully agree now. :) |
One more thought: we might not even need a session storage mechanism if we send the events to a well-known dbus service just like with dbus activation. The portal would then basically only be responsible for configuring the shortcuts. |
I think I'll stay out of the PTT discussion as that's just not the problem I was looking to solve (and not one I've put much thought into), but I trust you all to come to a solution. :)
Sure! The two types of shortcuts that were in use pre-Flatpak that have been requested when an app is not in the foreground are just simply launching the app, and launching a specific mode of the app. The latter could be handled e.g. by passing a CLI flag, but for user presentation reasons, I think it makes sense to lean on the existing FreeDesktop additional application actions as these are in use today across FreeDesktop apps and desktops, and give us built-in niceties like translatable human-readable names for features, icons, and the actual command to be executed. It would mean not duplicating these actions across multiple places and encourage their continued use for desktop interoperability.
Specific examples
These two apps exemplify the two categories of requests we're getting from developers:
Clips
Clips is a rich graphical clipboard manager; it launches with a view of your recently copied items to help you recall and paste them. Since this is the whole function of the app, opening it from your applications launcher, dock, etc. always launches into this view, so they are just requesting a more streamlined way to configure a desktop-wide shortcut that would launch their app. Currently, they use in-app messaging and a
Planner
Planner is a to-do planner with a very full-featured UI, but it has a "quick add" feature intended to be launched from anywhere in your OS when an idea strikes you:
It is currently implemented by manually writing a custom command to the custom keyboard shortcut GSettings which does not work reliably across desktops and in a Flatpak (besides requiring a big Flatpak sandbox hole). It could be implemented as a .desktop action to be accessible from the app launcher in addition to this system-wide keyboard shortcut. |
Thanks a lot @cassidyjames, that's really useful information! |
I think PTT/hold-for-action is something that might need further exploration. It is certainly a feature that we would want to have even if it is just for two use cases, one of which is rather niche. For now we should focus on just automating the existing global shortcut methods that we have: allowing apps to request a shortcut and set the command for execution. PTT is something that can probably be added later. For now, call apps can work around this limitation by allowing a command to both enable and disable (toggle) the mic. Not the most elegant solution, but it is workable.
Here is something we might want / what I have in mind:
We also might want to check and maybe limit what command or dbus the app is calling and limit it to itself or warn users if it is trying to set the shortcut to something else. |
I see that this discussion is revolving around keyboard shortcuts. I've come across it while trying to see what can be used for projects that want to send key presses. An example is keepassxreboot/keepassxc#2281. Would this be connected or such functionality would require another API? |
The issue you linked to should be solved with something similar to androids autofill feature. For emulating input libei is what you want to look at. |
We at Plasma have also been looking into this problem and we mostly reached very similar conclusions to the conversation here before we found this thread. We need:
As you've all mentioned repeatedly, Push-To-Talk can't easily be addressed within the former point, as we'd need to handle the release. Something we discussed too was the possibility of tackling Push-To-Talk using a dbus service like we do for MPRIS, where applications get to implement an interface stating their information and expectations and another process (the compositor in this case, presumably) would handle the logic. If you think that addressing either is something you are interested in, we can provide a proposal to discuss. |
Leaving Push-To-Talk out of the keyboard shortcuts is perhaps the best idea; it'll make the keyboard-release issue less urgent, if needed at all, and with a more aware (about Push-To-Talk) interface, the system providing it can be more clever and e.g. force-mute the microphone when the button/key isn't pushed. If separate, it should probably still be a portal, and not its own separate interface, so that it can more easily integrate with the permission store, have a libportal API etc. |
...not to mention that it'd be really nice if the API could be integrated into this unified control panel KDE has had for applications using KDE's own keybinding APIs since at least the early 2000s when I started using it. (One example of exactly the UX concern @rohmishra brought up is that, if you enter a keybinding in that dialog and it's already taken, it'll offer to reassign it from whatever else has it, across all applications using Sure, there's the hazard that you want two things to have the same global hotkey because you never intend to run them at the same time, but then ...and that's another concern. Making sure that applications that expect the desktop to take responsibility for that don't become less functional when run sandboxed. |
I'm not sure where this is coming from. I've included a user-readable description so it can be listed in the system's UI as desired. It's the implementation (i.e. the DE's) who do the shortcut creation, so they remain in control. Please make sure you understand the problem before adding a random rant. |
|
What's the alternative? If we don't define the format of the hints what are clients supposed to do? For example if a KDE app uses whatever format KDE expects the hint won't be useful to gnome. I guess you could have multiple hints for multiple desktops but you would still need to define the format of each of them which doesn't seem like an improvement at all.
Can you expand on what you mean by that? How should apps be in control of configuring the shortcuts if the system should be in control of that? Do you mean that apps want to be able to open a dialog to bind individual shortcuts? And if so, how would that be a better user experience than a system UI where the user can bind all shortcuts of the app? |
To expand on this, KDE apps already have this feature and we're hoping to be able to continue supporting it. The way it works is that from within an app, a user can show a window with just the shortcuts used by that app. There is also a global location in our System Settings app that aggregates all shortcuts used by all apps on the system. So we have both, because this split makes sense to us and our users. It's user-friendly for users to be able to edit the shortcuts for the app they're currently using by seeing a window of all of that app's own shortcuts. |
This window is part of the system?
I agree with that but to me it seems to contradict what @aleixpol said before:
So I feel like I'm missing something. |
It's a library component provided by a KDE library, and KDE apps can use it if they want (most do). I'll let Aleix answer your other questions; I just wanted to chime in that letting a sandboxed app show its own shortcuts to the user is something we would find desirable in KDE. |
https://docs.kde.org/stable5/en/khelpcenter/fundamentals/shortcuts.html That's what you're talking about? In that case I'm starting to understand where this is all coming from. We still have to pop up some system window which does the actual binding part, right? Can't you use another entry for global shortcuts: Settings → Configure Global Shortcuts? |
Because Wayland forbids processes from listening to system-wide keypresses, Discord and WebCord are unable to listen for a push-to-talk key. This commit hacks onto WebCord's socket server and allows for sending a payload to activate and deactivate push-to-talk. The delay between sending the packet and push-to-talk state changing appears to be low enough for daily use. You could activate push-to-talk like so: `echo '{"pttAction": "activate"}' | websocat --origin https://127.0.0.1 ws://127.0.0.1:6463` The two "actions" that are supported are `activate` and `deactivate`. The "hack" part of this commit also refers to the activation method - the F12 key is used to trigger push-to-talk. Make sure that F12 is bound before triggering actions, otherwise nothing will happen.
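For anyone scripting this instead of typing the `websocat` line by hand, the payload can be built like this. Only the `pttAction` key and the two action names come from the commit message above; `ptt_payload` is a hypothetical helper, and actually sending the string to the socket is left out.

```python
import json

# The two actions named in the commit message above.
PTT_ACTIONS = ("activate", "deactivate")


def ptt_payload(action: str) -> str:
    """Build the JSON text to send; rejects anything but the two known actions."""
    if action not in PTT_ACTIONS:
        raise ValueError(f"unsupported pttAction: {action!r}")
    return json.dumps({"pttAction": action})
```

Binding one global shortcut to send `activate` on press and `deactivate` on release would give the hold-to-talk behavior discussed in this thread.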
How about injecting the keypress events as XEvent from evdev? |
A portal implementation can implement support for evdev events. This is covered by this spec. |
Are there any simple test clients available for this protocol? |
I've done two ports so far, for our test app and for mumble. https://invent.kde.org/libraries/xdg-portal-test-kde/-/merge_requests/6 |
To be clear, are we talking about only allowing For example: I can to set a shortcut |
The portal API is explicitly clear that it doesn't restrict what the portal backend can use to trigger a global shortcut. It can be a modifier+key combination, it can be just a key press, it can be a foot pedal, looking at the computer angrily, or anything else. With that said, I'm not familiar with AutoKey, so if there is something missing from the portal API itself, I suggest opening an issue describing what is missing. |
Awesome, so that should work for apps like AutoKey. Does the API include communicating whether the key/shortcut is pressed/held down or released? The other missing part of the puzzle for AutoKey is knowing which app/window is in focus when the shortcut is triggered, but I imagine that's for another solution or portal, right? If true, is there any such portal I should look into? A "window in focus" kind of portal? |
At the moment, the state implementation embodies an earlier statement that it's either too big a security threat to allow 'remember this' and too irritating/impractical to prompt every time or "not everything needs to be a portal". Here's the open issue requesting a portal for it for use in things like time tracking apps, joystick-to-keyboard event synthesizers, and so on: #304 (The argument I made there is that, if it's not a portal, then any implementer is almost certainly going to wind up needing to lobby for and bind to a separate API for each Wayland compositor now that X11 is on the wane.) |
It has two signals, one for when the trigger is activated, and another for when it's deactivated. It's meant to handle e.g. the press and release of a key, so yes.
Was going to link to #304 too, but perhaps it's less "track user activity and behavior" like to include what the focused application was when a global shortcut was activated; I don't know. Edit: missed a "de" |
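On the app side, consuming the two signals for push-to-talk could be as simple as the sketch below. It deliberately leaves out any D-Bus wiring: `PushToTalk`, `on_activated`, and `on_deactivated` are hypothetical handler names, and how they get connected to the portal's signals is not shown.

```python
class PushToTalk:
    """Hypothetical app-side state driven by the two signals described above."""

    def __init__(self, shortcut_id: str) -> None:
        self.shortcut_id = shortcut_id
        self.mic_open = False  # muted until the trigger is held

    def on_activated(self, shortcut_id: str) -> None:
        """Trigger pressed: open the mic if this is our shortcut."""
        if shortcut_id == self.shortcut_id:
            self.mic_open = True

    def on_deactivated(self, shortcut_id: str) -> None:
        """Trigger released: close the mic if this is our shortcut."""
        if shortcut_id == self.shortcut_id:
            self.mic_open = False
```

Events for other shortcut ids are ignored, so one app can register several shortcuts without them interfering with the mic state.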
Excellent! All the pieces are already falling into place, then🙂 Thank you folks! |
We have received several requests from elementary AppCenter app developers for a way to set (or prompt a user to set) a global keyboard shortcut to launch their app or a specific feature of their app (e.g. an Application action).
Pre-sandbox, these developers would set the GSettings for the desktop itself to add their app to the global keyboard shortcuts. Obviously this is non-ideal and only reasonably worked because we human-reviewed every app’s source upon submission and update.
Today in Flatpaks, we recommend developers direct users to System Settings > Keyboard > Shortcuts and tell users how to add a custom shortcut manually.
Ideally, however, apps could use a Portal to request a system-wide shortcut (along with a description/rationale of the feature), and then we could provide a UI to display the request, let the user pick a shortcut, avoid conflicts, etc.