64 image minimum is too small #23
FWIW, the implementation of the Sixel, iTerm2 and Kitty image protocols in wezterm maps the incoming image into texture coordinates on cells in the display; those cells reference the same "atomic" image data chunk, but slice into it. Allowing for differing z-index values overlapping requires tracking multiple textures per cell, but even without that, the bare minimum that I think is generally useful would be

FWIW, as an implementor, I honestly didn't think about this feature in terms of a min or max number of images, cells or pixels that I wanted to support: my take was that the TE should try to display what was asked of it, and if the system runs out of resources then expose that issue to the user and/or via a response to the application so that it can react.

Implementing the kitty protocol was a bit frustrating wrt. the previous paragraph, because it separates transmission from placement, and that means there is potential for "unreferenced" data that needs to be garbage collected: this is the only place where I've put an explicit resource management constraint, as notcurses-demo seems to rely on the TE garbage collecting images rather than aggressively deleting them. The constraint is based on the total amount of RAM used by the images rather than the size or quantity of images.
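A minimal sketch of the cell/texture-slice model described above, in Rust. These are illustrative types only, not wezterm's actual internals: each cell holds a reference to the shared decoded image plus the normalized texture rectangle it displays, and overlapping z-indices would mean more than one such attachment per cell.

```rust
use std::sync::Arc;

/// Decoded image bytes, stored once and shared by every cell that shows part of it.
struct ImageData {
    width: u32,
    height: u32,
    rgba: Vec<u8>,
}

/// Per-cell attachment: which image, and which sub-rectangle of it, as
/// normalized texture coordinates. Overlapping z-indices require a cell to
/// carry a list of these rather than a single one.
struct ImageCell {
    data: Arc<ImageData>,
    top_left: (f32, f32),
    bottom_right: (f32, f32),
    z_index: i32,
}

/// Slice one image across a grid of cells that are `cell_w` x `cell_h` pixels.
fn slice_into_cells(img: Arc<ImageData>, cell_w: u32, cell_h: u32) -> Vec<Vec<ImageCell>> {
    let cols = (img.width + cell_w - 1) / cell_w;
    let rows = (img.height + cell_h - 1) / cell_h;
    (0..rows)
        .map(|r| {
            (0..cols)
                .map(|c| ImageCell {
                    data: Arc::clone(&img),
                    top_left: (
                        (c * cell_w) as f32 / img.width as f32,
                        (r * cell_h) as f32 / img.height as f32,
                    ),
                    bottom_right: (
                        ((c + 1) * cell_w).min(img.width) as f32 / img.width as f32,
                        ((r + 1) * cell_h).min(img.height) as f32 / img.height as f32,
                    ),
                    z_index: 0,
                })
                .collect()
        })
        .collect()
}
```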
So far as I know, yours is the only implementation of all three protocols in one terminal. I am curious whether any of the following ideas/questions hold true in your experience?
From zero -> iTerm2 wasn't all that difficult, and I like its relative simplicity compared to the other protocols. I think the biggest potential stumbling block was allowing for arbitrary sized OSC buffers in the parser, which I know some TE maintainers don't like, and I suppose the second is likely decoding the image containers, which is hard if you can't use a pre-existing library of some kind, but easy otherwise; Rust's

Using that as the foundation for the model made the others reasonably easy: reference the incoming image data and then you can "simply" track

If I'd started with Sixel, I think I would have built things differently, but not in a good way. Starting with the above made it easier to look at Sixel as two stages: 1) parse the sixel data into a bitmap, 2) feed the bitmap into the same slicing logic used for the iTerm2 protocol. If I'd started with Sixel, I might have been inclined to do something like per-cell bitmaps, and I think that would probably have been a bit horrible.

When it came to implementing kitty, I opted to also map it to the same data model I used for the others, which meant that the attached image data now turns into an array of

Implementing kitty was tedious because the surface area of the protocol is so high: I had to augment my parser to support APC sequences, allow for chunking data across multiple sequences (which meant introducing a buffer in a slightly awkward place, and the logic for re-assembling that), and then implement the (de)serialization of the large number of parameters. There are about 2000 lines of code for that, and probably <200 lines for the relevant part of the iTerm implementation. The image-distinct-from-text-ness of the protocol largely disappears in the wezterm implementation; the difference is dealt with largely at placement time, but does mean that some operations that are conceptually

I would agree that the same data structures work for all of the protocols, if you pick the right ones! If I were to do this again, I would do it in the same order, because the simpler protocol as a starting point led to a simpler internal design than I think I would have built if I'd started with a more complex protocol.

In terms of testing, there is a little bit of core code that is common, but I think the scary bits that would benefit most from testing are around protocol decoding. Sixel has more stuff going on than iTerm2, and some weird stuff too, like the hue in its HSL scheme being rotated away from the common standard hue angle (a small conversion sketch follows at the end of this comment). The kitty protocol has so many parameters with single character names that have different meanings between different commands (I'm not hating on the protocol: it looks like an honest case of the design evolving that way rather than a deliberate choice) that I'm 100% sure there are at least a handful of issues yet to be discovered in my implementation that aren't obvious dumb things like me just not having implemented some of the various deletion subcommands yet.

I think the biggest issue wrt. testing is that there isn't a great way for an external test suite to run and measure conformance. A TE author can write tests that look at internal state, but, for example, I can't take Kitty's image protocol tests and run them against wezterm. It would be interesting if there was a way to run a TE with a fixed font and a way to capture a bitmap of the display and compare it against known bit-patterns.
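A sketch of the shape such a check could take, assuming a TE could be run with a fixed font and its framebuffer captured somehow (which, as noted, doesn't exist in a standard way today). The function name and the capture step are hypothetical; only the comparison is shown.

```rust
/// Hypothetical conformance assertion: `rendered` is a captured RGBA
/// framebuffer from a TE running with a fixed font and window size, and
/// `reference` is the known-good bit pattern for the same input.
fn screen_matches(rendered: &[u8], reference: &[u8], per_channel_tolerance: u8) -> bool {
    rendered.len() == reference.len()
        && rendered
            .iter()
            .zip(reference)
            .all(|(a, b)| a.abs_diff(*b) <= per_channel_tolerance)
}
```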
Additionally/alternatively: defining a test protocol for TEs that exported the display information in a defined format that could then be used to perform assertions. esctest sort of does the latter by taking advantage of a screen region checksumming feature to validate xterm and iTerm2, but it doesn't appear to be actively maintained (general lack of activity, and I have an MR for wezterm that hasn't had a response so far), and the image stuff wouldn't be reflected in that checksum in any case.

Thinking about how all the above might shape GIP: my general inclination is that fewer protocol commands/parameters overall are "better" from a combinatorial-explosion perspective, and that consistency and clarity in naming would be nice. For example, there are a lot of single character parameters in kitty's protocol that are easy to confuse or misinterpret, and they would be less prone to misinterpretation if the names were slightly longer; 2-3 chars for some of them would increase clarity a lot without dramatically inflating the bandwidth, and it's probably worth making a pass over the GIP spec with that in mind to future-proof it for later versions of GIP.
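The hue-rotation oddity mentioned earlier in this comment, as a one-liner. The offset reflects my reading of the VT340 documentation (DEC's HLS scheme puts blue at 0°, red at 120° and green at 240°, versus red at 0° in the common convention); worth double-checking against a reference sixel decoder before relying on it.

```rust
/// Convert a sixel/DEC HLS hue angle (degrees) to the common convention
/// where red sits at 0 degrees. Assumes the DEC layout described above.
fn dec_hue_to_standard(dec_hue_deg: f32) -> f32 {
    (dec_hue_deg + 240.0) % 360.0
}
```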
i would love for there to be emphasis on supporting a greater number of smaller bitmaps. the ideal is what i have been calling "mosaics", where bitmaps are treated entirely as cell-sized entities, which i believe y'all are in agreement with. last i checked, Jexer uses wide graphics of one cell height, right @klamonte? that has its advantages, but going all the way to the sweet land of mosaics would essentially eliminate my most complicated state machines -- there would no longer be a need to "wipe" and "restore" cell-sized regions within a larger graphic.

with that said, since kitty 0.20.0's addition of reflective animation, this is not really a huge issue. a wipe involves transmitting a cell's worth of 0-alpha RGBA, constant across all cells (sketched just after this comment). a restore involves a single directive and no data transmission.

there is one place where it would still help, though: kitty lets you position graphics on a z-axis, a tristate with regards to glyphs. that z position is graphic-wide, though, so you cannot both (a) print a glyph atop a graphic and (b) print a glyph below a partially transparent cell of that same graphic. mosaics would resolve this last pain in my life neatly.
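A sketch of the "wipe" payload described above: one cell's worth of fully transparent RGBA pushed through the kitty graphics protocol. The key names (`a=t`, `f=32`, `s=`, `v=`) follow my reading of the kitty spec, chunking is ignored for brevity, and the `base64` crate is an arbitrary dependency choice.

```rust
use base64::Engine;

/// Build a kitty-protocol transmission of one cell's worth of 0-alpha RGBA.
/// Every byte is zero, so nothing visible is drawn wherever it is placed.
/// Payloads over 4096 bytes of base64 would additionally need m= chunking.
fn wipe_sequence(cell_w: u32, cell_h: u32) -> String {
    let zero_rgba = vec![0u8; (cell_w * cell_h * 4) as usize];
    let payload = base64::engine::general_purpose::STANDARD.encode(&zero_rgba);
    format!("\x1b_Ga=t,f=32,s={cell_w},v={cell_h};{payload}\x1b\\")
}
```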
@wez out here preaching the Good Word <3 <3 <3
Jexer permits every text cell to have its own image. On output, it concatenates adjacent images on the same row into a single image and encodes that to whichever protocol that particular user-facing screen is using (sixel, iTerm2, or jexer). All images are only one cell high. (It also caches previously-generated output for performance.) Since Jexer is both multiplexer and windowing system, it may show image pieces from different terminals or application windows, and any image fragment could be obscured by an overlapping window.
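A sketch of that row-concatenation step, in illustrative Rust (Jexer itself is Java and its real types differ): adjacent one-cell-high images on a row are stitched into a single bitmap before being handed to whichever encoder the screen is using.

```rust
/// One cell's image; in this model every image is exactly one cell high.
struct CellImage {
    width: u32,
    height: u32,
    rgba: Vec<u8>,
}

/// Concatenate adjacent per-cell images on a row into one bitmap.
/// Assumes all inputs share the same (one-cell) height.
fn concat_row(cells: &[CellImage]) -> CellImage {
    let height = cells.first().map(|c| c.height).unwrap_or(0);
    let width: u32 = cells.iter().map(|c| c.width).sum();
    let mut rgba = vec![0u8; (width * height * 4) as usize];
    let mut x_off = 0u32;
    for cell in cells {
        for y in 0..height {
            let dst = ((y * width + x_off) * 4) as usize;
            let src = ((y * cell.width) * 4) as usize;
            let len = (cell.width * 4) as usize;
            rgba[dst..dst + len].copy_from_slice(&cell.rgba[src..src + len]);
        }
        x_off += cell.width;
    }
    CellImage { width, height, rgba }
}
```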
I've seen a few references to different versions of the kitty protocol, but only one document online. Are these versions fixed and available, and can an application determine which "version" the TE actually complies with?
grokked
the kitty document specifies when various features were added, but only with respect to kitty versions, not terminal-independent versions of the protocol itself. this is going to cause problems moving forward as more terminals pick it up. the next new feature i use will have to be matched against a kitty version for kitty and a wezterm version for wezterm. i'd love to see the protocol versioned instead.
It would either have to be versioned as a whole protocol (examples of that: HTTP, VT100 (DA2)), or be able to negotiate specific features of that protocol with graceful fallback (example of that: Kermit). The thing with these bitmap image protocols (iTerm2, GIP) is that none of them except SIXEL are defined well enough that one could feasibly burn the encoder/decoder in hardware and stick it in a television that will last 15 years, or in an industrial machine that will last 40 years. That's the level of a standard I would like to see someday.
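For the "versioned as a whole protocol" route, DA2 is the existing precedent: the application sends `CSI > c` and the reply carries a firmware-version field. A small sketch of parsing that reply (the reply layout here is from memory of the VT/xterm documentation, so treat it as an approximation):

```rust
/// Parse a Secondary Device Attributes (DA2) reply of the assumed form
/// ESC [ > Pp ; Pv ; Pc c and return the version field Pv, if present.
/// This only illustrates whole-protocol versioning; it says nothing about
/// which image features a terminal actually supports.
fn parse_da2_version(reply: &str) -> Option<u32> {
    let body = reply.strip_prefix("\x1b[>")?.strip_suffix('c')?;
    let mut fields = body.split(';');
    let _model = fields.next()?;
    fields.next()?.parse().ok()
}
```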
eh, i've still never seen a true spec on Sixel. what are the failure modes when too little data is sent for a specified size? the meaning of
AFAIK there is no failure for "too little" data: the raster attribute just sets the initial background square; it's still fine to draw less than the raster or more than the raster (the image square gets bigger). In practice xterm will fully discard (not even crop) images that exceed 1000 pixels in either direction, so we are stuck with that if we want to be interoperable. STD 070 might have more details. jerch probably knows those answers too. :)
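Given that observed xterm behaviour, an encoder that wants to stay interoperable can split tall images into strips before emitting sixel. A small sketch; the 1000 comes from the comment above, not from any normative document.

```rust
// Observed xterm limit from the discussion above, not a spec value.
const OBSERVED_XTERM_LIMIT: u32 = 1000;

/// Split an image height into strips no taller than the observed limit.
/// Returns (y_offset, strip_height) pairs; each strip would be emitted as
/// its own sixel image on the appropriate row.
fn strips(image_height: u32) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    let mut y = 0;
    while y < image_height {
        let h = (image_height - y).min(OBSERVED_XTERM_LIMIT);
        out.push((y, h));
        y += h;
    }
    out
}
```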
Inspired by notcurses, I have been coding again and playing with transparency (missing pixels) in a multiplexed environment. The image-as-cell model does in fact work OK for images-over-text; I'm not as sure about images-under-text in a multiplexer, though. But who knows, maybe I will be happily surprised yet again. ;-) I have written up a few more notes and screenshots over here. (Plus a general thank you, including y'all here too. :-) )

It seems hard to obtain sixel images with missing pixels. I couldn't figure out how to get img2sixel or ImageMagick to do it. So I made a couple of small ones and put them over here. If anyone has more such images, or better yet knows how to generate them, I would love to include them. Anyway, happy holidays! :-)
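One way to get such an image is to write the sixel by hand: with the second DCS parameter (P2) set to 1, zero bits in the sixel data are left unpainted, so those pixels stay transparent. A sketch based on my reading of the VT330/340 documentation; worth checking the output in a couple of terminals.

```rust
/// Emit a 12x6 red image where every other column is a "missing" pixel
/// column. P2=1 in the DCS introducer means 0-bits are not painted with
/// the background colour, which is what keeps the holes transparent.
fn striped_sixel() -> String {
    let mut s = String::from("\x1bP0;1;0q"); // DCS ... q, with P2 = 1
    s.push_str("\"1;1;12;6");                // raster attributes: 1:1 aspect, 12 x 6 px
    s.push_str("#0;2;100;0;0#0");            // define colour 0 as RGB(100,0,0), then select it
    for col in 0..12u8 {
        let bits: u8 = if col % 2 == 0 { 0b11_1111 } else { 0 }; // full column or hole
        s.push((63 + bits) as char);          // sixel data char = 63 + 6-pixel bitmask
    }
    s.push_str("\x1b\\");                     // ST
    s
}
```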
Hey guys, hey @klamonte! Sorry for my sparse presence this year, that's due to some real-life implications. I'm almost free to resume my work on this protocol so it seems (yay). Happy holidays ;)
@wez @dankamongmen @christianparpart I have added kitty support to Jexer's output. Some notes as they relate to GIP:

Erasure

Erasing a single cell at (x, y) on an image erases the entire image. This makes the multiplexer case -- especially floating windows -- quite a bit harder, even when images are only one text row high. (You drag a text dialog over part of an image, erase that cell, and other areas are erased.) I resorted to erasing and redrawing all images on a row that:
I suspect my second option is buggy: I'm probably replacing every image on every frame at the moment, but I will dig into it more later. xterm's equivalent bugs with sixel are what led me to the horizontal strips design in the first place, and it works on alacritty+ayosec sixel; but alacritty-ayosec is not working well with notcurses due to its unusual erasure behavior. The point is that knowledge of how this frame damages images on previous frames is a bit hard to come by when you are composing overlapping windows. I believe GIP is already unambiguous here.

Base64

Kitty does not recognize valid base64, which can include line feeds. The default as per RFC 4648 is "don't add line feeds unless told to", so Kitty is not in actual error if that is the base64 standard it is expecting, but since Kitty does not actually refer to any base64 RFCs, it isn't clear until you try and then see in Kitty's log why it isn't displaying. Other image protocols (e.g. iTerm2) do handle line feeds in base64 cleanly. But base64 encoding within OSC (as used by iTerm2) did have one terminal erroneously handling the line feeds as C0 controls; it was fixed a while ago, and the VT320 manual covers the relevant C0 handling.
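A decoder-side sketch of the lenient behaviour described above: strip ASCII whitespace (including the line feeds RFC 4648 permits when explicitly requested), then decode strictly. The `base64` crate is an arbitrary choice here.

```rust
use base64::Engine;

/// Accept base64 payloads that contain line feeds or other ASCII whitespace
/// by removing it before handing the text to a strict decoder.
fn decode_lenient(text: &str) -> Result<Vec<u8>, base64::DecodeError> {
    let cleaned: String = text.chars().filter(|c| !c.is_ascii_whitespace()).collect();
    base64::engine::general_purpose::STANDARD.decode(cleaned)
}
```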
GIP should be very clear which base64 it is relying on: RFC 4648, or others. (I may have mentioned that elsewhere in GIP, sorry if I did...)

Chunking

The 4kb chunking is dumb. With the additional "the cursor can't move, nothing else can happen in between chunks" stuff (hmm, where did that come from? 🤔) it's just extra logic on the application's part to accomplish nothing. No CRC, no error recovery, no windowing, no actual protocol: just 10 extra lines in every application rather than 10 extra lines once in the terminal (those 10 lines are sketched after this comment). (If someone had looked around more they might have read about Kermit protocol's design changes around TCP, and why it's generally better to just let the TCP/IP stack handle that part. 🙄 Or just tried it out and seen what 75-300 kb/sec can do (and that's sixel!) and made some data-driven decisions.)

Off-topic and annoyed...

The 4kb chunking is probably why Kitty readily spews garbage on screen when you send it sixel. The whole point of APC + base64 (which is something I suggested to him when he started this, contrasting with Terminology, whose sequences can lead to artifacts on other terminals) was that other terminals should quietly ignore it. It's good that he understood that, as he references it directly in his spec, but not good that he fails to give other protocols the same consideration. (Plus the little "xterm keyboard protocol is obsolete, go tell the application to change" message in the log is rather cheeky coming from a terminal whose vttest score is lower than Hyperterm's was 20 years ago. It is now a hard design goal to make DOOM playable with stock xterm. If that keyboard protocol makes it into xterm, then great; until then, rot in hell Kitty.) Someone else can open issues there if they give a shit. I won't.

Bugs

wezterm and kitty are not producing the same output. I will file reports on wezterm in a little while.
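For concreteness, the application-side chunking being complained about amounts to roughly the following (key names per my reading of the kitty spec: control keys plus m=1 on the first chunk, m=0 on the last):

```rust
use std::io::Write;

/// Split a base64 payload into <=4096-byte pieces and wrap each in an APC
/// sequence; `ctrl` carries the image keys (format, size, action, ...).
fn send_chunked(ctrl: &str, b64: &str, out: &mut impl Write) -> std::io::Result<()> {
    let chunks: Vec<&[u8]> = b64.as_bytes().chunks(4096).collect();
    for (i, chunk) in chunks.iter().enumerate() {
        let more = if i + 1 < chunks.len() { 1 } else { 0 };
        if i == 0 {
            write!(out, "\x1b_G{ctrl},m={more};")?; // first chunk carries all keys
        } else {
            write!(out, "\x1b_Gm={more};")?;        // continuation chunks carry only m
        }
        out.write_all(chunk)?;
        write!(out, "\x1b\\")?;                      // ST terminates each APC
    }
    Ok(())
}
```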
The image facility is cell-based, and can be used for far more than just "images". It can be a fallback for fonts (which could be handy if alt-text or similar is available), font size (e.g. VT100 double-width/double-height), images of course, custom emojis, and much more. Example - this (multiplexed multihead) screen shows at least 20 images comprising the main picture (each text row is a separate image), plus 94 images, one for each CJK glyph.
A "64 images" max might be a "64 CJK glyphs" or "64 emojis". (I know the spec calls these minimums, but for the purpose of this discussion we should assume them to be a maximum.)
I think the minimum system requirement should be:
This also supports the design philosophy that text cell operations work on images the same way, on a per-cell basis, rather than other protocols' approach of treating images and text as fully distinct entities.