Terminal keyboard input support #163
Comments
Feel free to suggest better Unicode art for any of these, or for keys not covered yet! FWIW, I looked at the circled numbers for function keys, but they only go up to ㊿, while terminfo function keys go up to 63. Superscript digits give us as many function keys as we could want, though of course that's not the only way to go. |
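As a rough illustration of the superscript idea above, here is a minimal sketch that renders a function-key number using Unicode superscript digits. The helper name `superscript_number` is hypothetical, and the symbol that would precede the digits is left out because the thread does not fix one.

```rust
/// Render a number using Unicode superscript digits.
/// Hypothetical helper for illustration only; not part of any WASI proposal.
fn superscript_number(n: u32) -> String {
    // Superscript 1, 2 and 3 live in Latin-1 Supplement; the rest are in
    // the Superscripts and Subscripts block (U+2070..U+2079).
    const DIGITS: [char; 10] = [
        '\u{2070}', '\u{00B9}', '\u{00B2}', '\u{00B3}', '\u{2074}',
        '\u{2075}', '\u{2076}', '\u{2077}', '\u{2078}', '\u{2079}',
    ];
    n.to_string()
        .chars()
        // `to_string` on a u32 yields only ASCII digits, so `to_digit` cannot fail.
        .map(|c| DIGITS[c.to_digit(10).unwrap() as usize])
        .collect()
}
```

Because each decimal digit maps to one superscript character, this covers function key 63 and beyond without running out of code points.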
Why not just go all-out and do something like: and the JSON follows this template (derived from libSDL):
{
    // a SDL_Scancode value
    // from https://hg.libsdl.org/SDL/file/bc90ce38f1e2/include/SDL_scancode.h
    "scancode": 1234,
    // a SDL_Keycode value
    // from https://hg.libsdl.org/SDL/file/bc90ce38f1e2/include/SDL_keycode.h#l34
    "keycode": 5678,
    // bitmask of modifiers -- SDL_Keymod value
    // from https://hg.libsdl.org/SDL/file/bc90ce38f1e2/include/SDL_keycode.h#l322
    "modifiers": 1024,
}
Even if we don't use JSON, please pick a representation that can handle all modifier combinations. |
all these terminal-related questions are in fact much more complex than expected. just as an inspiration, you should perhaps take a look at this microsoft article series about the history and recent improvements of the windows console infrastructure: https://devblogs.microsoft.com/commandline/windows-command-line-backgrounder/ and there is also a nice introductory article about the POSIX/linux side and all its mysteries: http://www.linusakesson.net/programming/tty/ if you study these writings, you'll immediately grasp why the console handling interfaces look so different on the two sides: one family has its roots in small, compact PCs for local operation, while the other comes from a tradition where remote access was more or less a requirement from day one. the benefits of the latter approach in particular can hardly be preserved if we only provide an inadequately simple solution. i wouldn't underestimate the complexity of this field. IMHO it makes much more sense to reuse the infrastructure available on the given systems and already mature software components in a responsible manner (as also suggested in #161), instead of wasting too much energy reinventing everything (...and very likely falling into the same pitfalls as our predecessors ;)). sure, to some degree it's really necessary to make clear security-related decisions to avoid obvious risks, but in general WASI shouldn't restrict or complicate the freedom and flexibility of practical use more than necessary. |
My sense in this issue is to consider just low-level terminal input, and not try to design a general-purpose input-event system. A general-purpose input system would be a valuable thing to have, but I think terminal input is a sufficiently distinct domain that we don't need to unify them. Assuming that's reasonable, we can go with something much simpler than JSON for terminal input. An SDL-style scancode vs keycode distinction is a good idea, but for terminal input, I don't know of any situations where we have SDL-style scancode information, so my sense is that we don't need to include it here. Using SDL code numbers, which are based on the USB keyboard spec, is also a good idea, though terminal input doesn't usually use SDL, and it doesn't receive hardware or raw OS input values, so it doesn't have a strong affinity here. Modifiers: If we did go with Unicode symbols, we could represent modifiers with symbols too -- ⎈, ⎇, ⇧, prepended to the main character. However, as I research this domain more, I'm less excited about using Unicode here. It is nice if up-arrow on input can send the same sequence as move-the-cursor-up on output, and for output, we probably want to follow the established ANSI sequences for basic cursor movement and such. |
One of the benefits of JSON is easy extensibility and wide programming language support. If we restrict ourselves to a single object where all members have C identifier names and integer values, it should be quite easy to parse for those who don't want to use a general JSON parser. We should specify that all unknown names should be ignored, except for Example parser:
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub struct Keystroke {
    scancode: u32,
    keycode: u32,
    modifiers: u32,
}

#[derive(Clone, Debug)]
pub struct ParseError;

struct Parser<'a> {
    iter: std::str::Chars<'a>,
}

impl<'a> Parser<'a> {
    pub fn new(text: &'a str) -> Self {
        Self { iter: text.chars() }
    }

    fn peek(&self) -> Option<char> {
        self.iter.clone().next()
    }

    fn peek_some(&self) -> Result<char, ParseError> {
        self.peek().ok_or(ParseError)
    }

    fn expect(&mut self, ch: char) -> Result<(), ParseError> {
        if self.peek() == Some(ch) {
            self.iter.next();
            Ok(())
        } else {
            Err(ParseError)
        }
    }

    fn peek_digit(&self) -> Option<u32> {
        self.peek().and_then(|ch| ch.to_digit(10))
    }

    fn parse_int(&mut self) -> Result<u32, ParseError> {
        let mut retval = self.peek_digit().ok_or(ParseError)?;
        self.iter.next();
        while let Some(digit) = self.peek_digit() {
            if retval == 0 {
                // must be zero or start with non-zero digit
                return Err(ParseError);
            }
            retval = retval.checked_mul(10).ok_or(ParseError)?;
            retval = retval.checked_add(digit).ok_or(ParseError)?;
            self.iter.next();
        }
        Ok(retval)
    }

    fn parse_name(&mut self) -> Result<&'a str, ParseError> {
        self.expect('"')?;
        let initial_str = self.iter.as_str();
        match self.peek() {
            Some('_') => {}
            Some(ch) if ch.is_ascii_alphabetic() => {}
            _ => return Err(ParseError),
        }
        self.iter.next();
        while self.peek_some()? != '"' {
            match self.peek() {
                Some('_') => {}
                Some(ch) if ch.is_ascii_alphanumeric() => {}
                _ => return Err(ParseError),
            }
            self.iter.next();
        }
        let len = initial_str.len() - self.iter.as_str().len();
        let retval = &initial_str[..len];
        self.expect('"')?;
        Ok(retval)
    }

    fn skip_whitespace(&mut self) {
        while self.peek().map(char::is_whitespace) == Some(true) {
            self.iter.next();
        }
    }

    fn parse(mut self) -> Result<Option<Keystroke>, ParseError> {
        self.skip_whitespace();
        self.expect('{')?;
        self.skip_whitespace();
        let mut scancode = None;
        let mut keycode = None;
        let mut modifiers = None;
        let mut version = None;
        while self.peek() == Some('"') {
            let name = self.parse_name()?;
            self.skip_whitespace();
            self.expect(':')?;
            self.skip_whitespace();
            let value = self.parse_int()?;
            self.skip_whitespace();
            match name {
                "scancode" => scancode = Some(value),
                "keycode" => keycode = Some(value),
                "modifiers" => modifiers = Some(value),
                "version" => version = Some(value),
                _ => {}
            }
            if self.peek() == Some(',') {
                self.iter.next();
                self.skip_whitespace();
            } else {
                break;
            }
        }
        self.expect('}')?;
        self.skip_whitespace();
        if self.peek().is_some() {
            return Err(ParseError);
        }
        if version.is_some() {
            return Ok(None);
        }
        Ok(Some(Keystroke {
            scancode: scancode.ok_or(ParseError)?,
            keycode: keycode.ok_or(ParseError)?,
            modifiers: modifiers.ok_or(ParseError)?,
        }))
    }
}

impl Keystroke {
    pub fn parse(text: &str) -> Result<Option<Self>, ParseError> {
        Parser::new(text).parse()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test() {
        fn test_case(text: &str, expected: Result<Option<Keystroke>, ()>) {
            assert_eq!(
                Keystroke::parse(text).map_err(|_| ()),
                expected,
                "text = {:?}",
                text
            );
        }
        test_case(r#"{"too_big": 12345674738384848}"#, Err(()));
        test_case(r#"{"version": 1}"#, Ok(None));
        test_case(
            r#"{"scancode": 123, "keycode": 456, "modifiers": 789, "ignored": 0}"#,
            Ok(Some(Keystroke {
                scancode: 123,
                keycode: 456,
                modifiers: 789,
            })),
        );
    }
} |
Using JSON would cause excessive memory usage. For example, if you want to paste 1 MB of data into the terminal, that would send ~200 MB of data to the program, all of which needs to be formatted and parsed again. It is also a lot of extra mandatory code. Using something binary like An example parser would be:
#[repr(C)]
struct Keystroke {
    version: u32,
    scancode: u32,
    keycode: u32,
    modifiers: u32,
}

fn parse(b: &[u32; 4]) -> &Keystroke {
    // SAFETY: `Keystroke` is `#[repr(C)]` with four `u32` fields, so it has the
    // same size and alignment as `[u32; 4]`, and every bit pattern is valid.
    unsafe { std::mem::transmute(b) }
}
One problem with your current choice of fields is that it makes it impossible to paste characters without a keyboard equivalent, like emoji and control codes. |
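For comparison, a fixed-size binary record can also be decoded without `unsafe`. The sketch below assumes a 16-byte record of four little-endian `u32` fields in the same order as the struct above; the byte order and the function name `decode_keystroke` are assumptions made for illustration, not something the thread specifies.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Keystroke {
    version: u32,
    scancode: u32,
    keycode: u32,
    modifiers: u32,
}

/// Decode a 16-byte record into a `Keystroke`.
/// Assumption: fields are little-endian u32 values in declaration order.
fn decode_keystroke(bytes: &[u8; 16]) -> Keystroke {
    // Read the i-th 32-bit word from the record.
    let word = |i: usize| {
        u32::from_le_bytes([
            bytes[4 * i],
            bytes[4 * i + 1],
            bytes[4 * i + 2],
            bytes[4 * i + 3],
        ])
    };
    Keystroke {
        version: word(0),
        scancode: word(1),
        keycode: word(2),
        modifiers: word(3),
    }
}
```

Reading from bytes rather than transmuting makes the byte order explicit, which matters if the record ever crosses a boundary between hosts with different endianness.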
@bjorn3 I agree JSON seems rather excessive for this use case. (as does |
@bjorn3 JSON is only used for the things that don't have a normal UTF-8 representation (and for ESC itself), if the text you're pasting is UTF-8 without any ESC characters, then it can be sent to the input unmodified without increasing the input size. The JSON is only used instead of UTF-8 for keys like Ctrl+Shift+1 -- because that key combination doesn't have a defined UTF-8 representation. |
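To make the pass-through behaviour concrete, here is a minimal sketch of how a reader might split such a mixed stream. The framing is an assumption, since the thread does not show the exact sequence: a key event is taken to be an ESC character immediately followed by a flat `{...}` object with no nested braces, and everything else is plain UTF-8 text. The names `Chunk` and `split_input` are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Chunk<'a> {
    /// Plain UTF-8 text, passed through unmodified.
    Text(&'a str),
    /// The body of a key event, braces included (assumed framing).
    KeyJson(&'a str),
}

/// Split input into plain text and ESC-introduced JSON objects.
/// Assumption: objects are flat, so the first '}' ends the object.
fn split_input(input: &str) -> Vec<Chunk<'_>> {
    let mut chunks = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        match rest.find('\u{1b}') {
            None => {
                // No ESC left: the remainder is ordinary text.
                chunks.push(Chunk::Text(rest));
                break;
            }
            Some(esc) => {
                if esc > 0 {
                    chunks.push(Chunk::Text(&rest[..esc]));
                }
                // ESC is a single byte, so `esc + 1` is a char boundary.
                let after = &rest[esc + 1..];
                match (after.starts_with('{'), after.find('}')) {
                    (true, Some(end)) => {
                        chunks.push(Chunk::KeyJson(&after[..=end]));
                        rest = &after[end + 1..];
                    }
                    // Incomplete or malformed event: this sketch simply stops;
                    // a real reader would wait for more input or report an error.
                    _ => break,
                }
            }
        }
    }
    chunks
}
```

Pasted text that contains no ESC produces a single `Text` chunk that borrows the input directly, so it is neither copied nor expanded.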
i really share @sunfishcode's POV concerning this topic:
we simply have to differentiate between:
SDL and its data representation may be seen as a nice cross-platform solution for the latter case, but it isn't a suitable example of how to handle the requirements and capabilities of the first category. i don't think we should waste too much effort on inventing another translation/representation of control commands. it's more important to understand their original function in controlling editing operations on the local terminal and generating out-of-band signals. most of these terminal features became highly configurable over time and provide processing on the other side of the communication line as well. it's mainly this kind of terminal IO configuration which i would see as the most important requirement if we want to realize more satisfying input capabilities -- e.g. switching between canonical mode and non-canonical mode input processing. some of you may argue that i'm too focused on this very old-fashioned POSIX terminal interface, but that's on purpose, because it's in fact this group of systems where terminal IO and CLI-based work still has the most practical relevance in the real world. and those few exceptions which do not fall into this category (e.g. windows and web-based solutions like xterm.js) usually provide very similar terminal control capabilities. that's why i still see a straightforward WASI POSIX termio configuration extension module as a preferable, much more easily realizable and compatible solution than any more ambitious cross-platform compromise. |
I also think we can get by without a "version" field. The key events we're talking about here are described by terminfo and are very stable. And more broadly, while the kind of extensibility that JSON would bring would have advantages, I also think the main value in adding functionality like this to WASI is in compatibility with existing terminals. Existing terminals don't have extensibility at this layer, so we wouldn't gain much by making WASI extensible at this layer either, and we'd risk introducing complexity and new error conditions. So while I appreciate the ideas, I don't think JSON turns out to be a good fit here. FWIW, I'm also leaning away from the Unicode approach I suggested at the top of this issue. While this is a space where vt100-family terminals in use today differ significantly, leaving room for WASI to potentially also do something different, using more traditional-style escape codes seems good enough, and will simplify some implementations. |
The Linux terminal is absolutely horrible for all non-English users out there, because it doesn't transmit keycodes, so if you switch the layout, all shortcuts with letter keys stop working. As much as I like
The Linux terminal is absolutely horrible for users of desktop software, where you press
The Linux terminal has absolutely horrible latency, because for the parser to distinguish between the Esc key and any other key encoded with an "escape sequence", it needs to wait.
The Linux terminal is a horrible stateful mess. In other OSes, having a buffered queue of input events was the norm: in deterministic TUIs I played my keyboard like a piano and went to prepare some tea while the app processed shortcut after shortcut. In Linux I wait for each key combination to finish before I start the next one, because the Linux terminal needs a pause to know whether it was the Esc key I pressed or F1. The Linux terminal can also mess up its own state, so there is a
The rhetorical question: do you really want to bring this ancient mess (which I admit was an extremely useful thing back in the 1960s) into the post-COVID era to make people feel the pain as I felt it?
The bottom line: there is no other opportunity out there except in WASI to create an alternative 21st-century protocol for (terminal) keyboard input, convenient for developers to debug and implement, with low latency and standardized physical key ids (because the USB HID standard could not figure out how to add pictures to its PDF, you need to guess key names on your keyboard). The use cases should include all applications that now run in a GUI, such as games, so that hot-seat Mortal Kombat in a terminal could become possible. If WASI implements a good keyboard interface/abstraction (I dk, |
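The Esc ambiguity described in this comment can be sketched as a small classifier over the bytes received so far. This is an illustrative simplification, not how any particular terminal library works: only CSI sequences (ESC followed by `[`) are recognized, other introducers such as SS3 are out of scope, and the `timed_out` flag stands in for whatever inter-byte timer a real reader would use. All names here are hypothetical.

```rust
const ESC: u8 = 0x1B;

#[derive(Debug, PartialEq, Eq)]
enum Decoded {
    /// A lone ESC that was not followed by anything before the timeout.
    EscapeKey,
    /// A complete CSI sequence, e.g. ESC [ A for the up arrow.
    Sequence(Vec<u8>),
    /// ESC followed by a byte other than '['; a fuller decoder would
    /// handle SS3 (ESC O) and similar introducers here.
    EscPrefixed(u8),
    /// Cannot decide yet: the caller must wait for more bytes or a timeout.
    NeedMoreInput,
    /// An ordinary byte.
    Byte(u8),
}

/// Classify the start of `pending`. `timed_out` says whether the caller
/// already waited its (assumed) inter-byte timeout without new input.
fn classify(pending: &[u8], timed_out: bool) -> Decoded {
    match pending {
        [] => Decoded::NeedMoreInput,
        // The ambiguous case: Esc key, or the first byte of a sequence?
        [ESC] => {
            if timed_out {
                Decoded::EscapeKey
            } else {
                Decoded::NeedMoreInput
            }
        }
        // CSI: parameter and intermediate bytes, then a final byte 0x40..=0x7E.
        [ESC, b'[', rest @ ..] => {
            match rest.iter().position(|b| (0x40..=0x7E).contains(b)) {
                Some(i) => Decoded::Sequence(pending[..i + 3].to_vec()),
                None if timed_out => Decoded::EscapeKey,
                None => Decoded::NeedMoreInput,
            }
        }
        [ESC, other, ..] => Decoded::EscPrefixed(*other),
        [byte, ..] => Decoded::Byte(*byte),
    }
}
```

The `NeedMoreInput` result for a lone ESC is the wait the comment complains about: the decoder cannot report the Esc key until the timeout has elapsed.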
The most important thing needed for anything to happen in this space is for someone to volunteer to make it happen. The main reason for considering a design within the "ANSI" family of terminal emulators is compatibility, both with existing widely-popular terminals, and with large amounts of existing application code which knows how to talk to these kinds of terminals. This family is so popular that even Windows has chosen to join it. Also, I expect that we could scope a terminal input API such that it wouldn't be expected to be WASI's exclusive input method for all use cases. In particular, if WASI ever gains a GUI API, I wouldn't expect it to use a terminal input API for input. Programs that want to have both terminal and GUI UIs would usually need separate UI code for both in any case. |
Along with ANSI escape sequences for display, we should also consider escape sequences for input, so that arrow keys, function keys, page-up/page-down/home/end/etc. can be used.
Assuming we take the approach in #162 of avoiding exposing the termcap/terminfo/TERM information to applications, it seems logical to do the same thing for inputs, and just define escape sequences used by WASI, and have implementations translate into those sequences.
At the risk of being too cute, in the Unicode era, we could have quite descriptive sequences, something like this:
and so on, with "␛" here representing an actual ESC control character, so we can distinguish between the user entering a literal Unicode symbol and pressing one of these special keys.
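As a sketch of the "define the sequences once and let implementations translate" idea, the table below maps the four arrow keys to the conventional ANSI cursor-movement sequences, so that up-arrow on input matches cursor-up on output. The choice of these particular sequences for WASI is an assumption for illustration, and the names `Key`, `key_to_sequence` and `sequence_to_key` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    Up,
    Down,
    Right,
    Left,
}

/// The sequence an implementation would deliver to the application for `key`.
/// These are the ANSI cursor-movement sequences (CUU, CUD, CUF, CUB).
fn key_to_sequence(key: Key) -> &'static str {
    match key {
        Key::Up => "\x1b[A",
        Key::Down => "\x1b[B",
        Key::Right => "\x1b[C",
        Key::Left => "\x1b[D",
    }
}

/// The reverse lookup an application would use when reading input.
fn sequence_to_key(seq: &str) -> Option<Key> {
    [Key::Up, Key::Down, Key::Right, Key::Left]
        .iter()
        .copied()
        .find(|&key| key_to_sequence(key) == seq)
}
```

With a fixed table like this, the host implementation is the only place that needs to know what the underlying terminal actually sends.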