The Great Line Ending & Cursor Range Cleanup #362
Comments
Potentially related: #136
This is confusing.
This RFC kind of has two proposals in it:
I just want to make clear that the second point is very optional, and has a lot of open questions. And I'm happy to stick to just the first point during the cleanup, keeping the behavior of Helix the same. I do want to discuss the second point as well, however, if people have ideas.
Ah, sorry. To clarify, what I mean is that if you have this situation:
And hit
Re: "How to visualize this in a terminal?" I had some more thoughts about this. My general feeling now is that (assuming we want to change behavior at all), a mix of option 1 and option 2 might be best:
This way there is a visual change when e.g. first hitting
Ah, I don't mean the text is confusing but I mean the behavior is confusing. If Then doesn't And
@pickfire
My general proposal for how to handle operations like that is to operate as if the range were a minimum width of one, so that behavior is the same as now from the user perspective, and still visually makes sense in a terminal. So I don't think it would end up being confusing for the user.
Let's call this approach gap indexing, based on this comment on the LSP spec:
(As discussed on Matrix:) I think we should change the Range to use gap indexing, it would simplify some of the code. The transactions already use this approach, which is why a lot of the ranges have to be extended with I think we could also truncate the value inside the transaction so we no longer have to do the I'm a bit unsure on the approach to cursors, so I think we should start by enforcing that ranges are always at least 1-width. I don't think there's a good approach to solve this in the terminal (it makes
Awesome. In that case, I'll put this on my todo list. I'll switch Helix to gap indexing, but keep all current behavior the same and enforce a minimum-1-width range. Behavioral changes can be a separate discussion, and dealt with later as a separate project if we decide to change anything.
Started work on this in #376.
Resolved by #376
Update:
As discussed in the comments, I'll be switching Helix to gap indexing (what I called the "between-character model" in this RFC), but without any behavior changes to Helix. A minimum 1-width cursor range will still be enforced, for example.
At the same time, I'll be addressing the file-end line ending issue discussed in #309.
In this issue I'd like to propose what I am (overly grandiosely) calling The Great Line Ending & Cursor Range Cleanup.
As I see it, Helix currently has two issues with how it internally thinks about text:
I would like to change both of these things in one big PR, because the former creates special cases that... well, aren't special cases anymore if we also do the latter. But I want to get buy-off on both of these things before proceeding, because they're both changes to the underlying way that Helix thinks about text.
Since I already got buy-off about line endings in #309, I'm going to focus on cursor ranges here.
Proposal
Helix considers a cursor and a contiguous selection range to be the same thing. A cursor is simply a selection range where the start and end of the range are equal. This is a great architecture, and we should keep it. Helix calls this combined concept a `Range`, and it can be found in `selection.rs` in helix-core.
Right now, Helix considers the start and end indices of a range to sit on top of characters. So, for example, a range of [1, 1] looks like this (bold highlights the range ends, and square brackets mark the selected text):
And a range of [0, 3] looks like this:
I propose to change the start and end indices to sit between characters instead. A range of [1, 1] would then look like this (square brackets mark the range ends and the selected text):
And a range of [0, 3] would look like this:
(Side note: the on-character model can equivalently be formulated as "end-inclusive", and the between-character model as "end-exclusive". But with text, I think the "on-character" and "between-character" formulation is much more intuitive, and makes it more obvious how each model works, as well as how to code for them.)
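As a rough sketch of the two index models (not Helix's actual code: the function names and the sample text `"abcd"` are assumptions made for illustration, and both helpers assume `start <= end`), the difference comes down to whether `end` names a character or a gap:

```rust
/// Text covered by an on-character (end-inclusive) range.
/// Both `start` and `end` name characters, so the result is never empty
/// for in-bounds indices.
fn on_char_slice(text: &str, start: usize, end: usize) -> String {
    text.chars().skip(start).take(end - start + 1).collect()
}

/// Text covered by a between-character (end-exclusive) range.
/// `start` and `end` name gaps between characters, so `start == end`
/// selects nothing.
fn between_char_slice(text: &str, start: usize, end: usize) -> String {
    text.chars().skip(start).take(end - start).collect()
}
```

With the sample text `"abcd"`, the on-character range [1, 1] covers `"b"` while the between-character range [1, 1] covers nothing; [0, 3] covers `"abcd"` on-character and `"abc"` between-character.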
Quick justification
The quick justification for changing to the between-character model is that it's strictly more powerful: it can represent everything the on-character model can, plus zero-width ranges. So it gives us more flexibility.
If we want to, we could still make everything in Helix function identically to how it does now by ensuring that ranges are always at least one character wide, and keeping all the command behavior the same with respect to those one-character ranges. But it gives us the flexibility to have zero-width ranges if we decide we want that.
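One way the "always at least one character wide" rule could be enforced is sketched below. `GapRange` and `ensure_min_width` are hypothetical names, not Helix's API, and the choice to widen forward first (falling back to widening backward at the end of the text) is an assumption rather than something settled in this proposal. Indices are counted in characters and are assumed to be within the text.

```rust
/// Hypothetical between-character range: `start` and `end` are gap indices.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GapRange {
    start: usize,
    end: usize,
}

/// Widen a zero-width range to one character. Prefers the character after
/// the gap, and falls back to the character before it at the end of the text.
/// An empty text leaves the range zero-width, since there is nothing to cover.
fn ensure_min_width(range: GapRange, text_len: usize) -> GapRange {
    if range.start != range.end {
        return range; // already at least one character wide
    }
    if range.end < text_len {
        GapRange { start: range.start, end: range.end + 1 }
    } else if range.start > 0 {
        GapRange { start: range.start - 1, end: range.end }
    } else {
        range
    }
}
```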
Full justification
Zero-width ranges aren't just abstractly more powerful, they actually let us represent important things more precisely. I'll illustrate this with a few representative examples.
Example 1: text insertion
Consider the following two situations:
With the current on-character model, the two situations look like this:
The range itself can't distinguish between these two cases at all, so the distinction has to be at the operation/command level. Whether we want the distinction to be at the operation/command level or not is a reasonable discussion we can have. But the point is that the on-character model traps us into that choice because it cannot distinguish a non-selection vs a single-character selection.
In contrast, the same two situations can be distinguished easily with the between-character model:
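A minimal Rust sketch of that distinction follows. The names `Kind`, `classify`, and `to_on_char` are hypothetical and only serve the illustration. A between-character range can tell a plain cursor from a one-character selection, and converting back to on-character indices is only possible for the latter:

```rust
/// What a between-character range represents.
#[derive(Debug, PartialEq, Eq)]
enum Kind {
    /// Zero-width: a cursor sitting in a gap, with nothing selected.
    Cursor,
    /// A selection covering the given number of characters.
    Selection(usize),
}

/// Classify a between-character range (assumes `start <= end`).
fn classify(start: usize, end: usize) -> Kind {
    if start == end {
        Kind::Cursor
    } else {
        Kind::Selection(end - start)
    }
}

/// Convert to on-character (end-inclusive) indices.
/// Returns `None` for a zero-width range, which the on-character model
/// has no way to express.
fn to_on_char(start: usize, end: usize) -> Option<(usize, usize)> {
    if start == end {
        None
    } else {
        Some((start, end - 1))
    }
}
```

Here the between-character ranges [1, 1] and [1, 2] classify differently, while the only on-character form available is [1, 1], which corresponds to the one-character selection.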
Example 2: the extend-line command
The extend-line command (currently on key `x` in Helix) expands the current range to select the entire line, unless the entire line is already fully encompassed by the range, in which case it extends the range to select more and more lines on subsequent invocations.
This is a super useful command. For example, deleting a line is just the key combo `xd`... unless you're on a blank line. If you're on a blank line, then `xd` will actually delete both the current and next line. The reason why is obvious if we visualize the on-character ranges involved:
(cursor on the blank line) → `x` → (blank line and next line selected) → `d` → (both lines deleted)
The problem is that there is no way to distinguish between the cursor simply being on the line vs selecting the line. This is easily solved with the proposed between-character model:
(zero-width cursor on the blank line) → `x` → (blank line selected) → `d` → (blank line deleted)
This way `x` always behaves consistently: first selecting the current line, then on subsequent invocations extending the selection to more and more lines. This also makes `xd` a muscle-memory shortcut for deleting the current line: it will never delete two lines, even if the current line is blank.
Example 3: empty files
When we move away from requiring a file-end line ending (as per #309), there will sometimes be buffers with literally no text in them.
In such cases, the on-character model becomes ill-defined (or we have a weird "one-past-the-end" case) and we'll need to handle it as a special case in various places.
The between-character model, on the other hand, represents this gracefully with a zero-width range. Of course, that doesn't mean there won't be special cases: commands that require a non-zero-width range will still need to check for zero-width. But those checks fall naturally out of the relationship between those commands and the range representation itself.
More generally, that's one of the main differences between the on-character and between-character models: the on-character model has special cases because of ambiguity and things it can't represent, whereas the between-character model has special cases because of command/operation requirements. The latter is a lot easier to reason about.
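To make Examples 2 and 3 concrete, here is a hedged sketch of an extend-line operation on between-character indices. `extend_line`, `line_end_from`, and `delete` are hypothetical helpers written for this illustration, not Helix's implementation. The sketch assumes `\n` line endings, character-based gap indices, and `start <= end`.

```rust
/// Gap index just after the first '\n' at or after character `pos`,
/// or the end of the text if there is none.
fn line_end_from(text: &[char], pos: usize) -> usize {
    text[pos..]
        .iter()
        .position(|&c| c == '\n')
        .map_or(text.len(), |i| pos + i + 1)
}

/// Extend a between-character range to whole lines. If it already covers
/// whole lines exactly, extend it by one more line instead.
fn extend_line(text: &[char], start: usize, end: usize) -> (usize, usize) {
    // Gap just after the nearest '\n' before `start`, or the start of the text.
    let line_start = text[..start]
        .iter()
        .rposition(|&c| c == '\n')
        .map_or(0, |i| i + 1);
    // End of the last line touched: a zero-width range touches the line its
    // gap sits on, a non-empty range touches the line of its last character.
    let last = if start == end { start } else { end - 1 };
    let line_end = line_end_from(text, last);
    if (start, end) == (line_start, line_end) {
        (line_start, line_end_from(text, line_end))
    } else {
        (line_start, line_end)
    }
}

/// Text with the between-character range removed.
fn delete(text: &[char], start: usize, end: usize) -> String {
    text[..start].iter().chain(text[end..].iter()).collect()
}
```

With the text `"a\n\nb\n"`, a zero-width cursor on the blank line is (2, 2), and extend-line yields (2, 3), so deleting removes only the blank line. The on-character cursor [2, 2] corresponds to the between-character range (2, 3), which already covers the whole line, so extend-line grows it to (2, 5) and deleting removes two lines. On an empty buffer, (0, 0) simply maps to (0, 0) with no special handling.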
Other benefits
I think we probably want to support plugins/addons in the future. And the proposed between-character model is generally a simpler programming API, so we probably want to present that model to plugins for range manipulation and text edits.
Having that model consistent between the internal APIs and the plugin APIs will likely make everything simpler and less error prone.
If we want to enforce a minimum range-width on the user-facing side of things, that can easily be a validation step after plugin code runs: just expand all zero-width ranges to single-width. Such a validation pass will likely be needed anyway, for e.g. resolving overlapping ranges, etc.
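Such a validation pass might look roughly like the following. This is a sketch under assumptions: `GapRange` and `normalize` are hypothetical names, ranges are assumed to satisfy `start <= end` and to lie within the text, and "overlapping" is taken to mean sharing at least one character, so ranges that merely touch are left separate.

```rust
/// Hypothetical between-character range: `start` and `end` are gap indices.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GapRange {
    start: usize,
    end: usize,
}

/// Post-plugin validation: widen zero-width ranges to one character,
/// then sort the ranges and merge any that overlap.
fn normalize(mut ranges: Vec<GapRange>, text_len: usize) -> Vec<GapRange> {
    // Step 1: expand zero-width ranges where the text allows it.
    for r in ranges.iter_mut() {
        if r.start == r.end {
            if r.end < text_len {
                r.end += 1;
            } else if r.start > 0 {
                r.start -= 1;
            }
        }
    }
    // Step 2: sort by position, then fold overlapping ranges together.
    ranges.sort_by_key(|r| (r.start, r.end));
    let mut merged: Vec<GapRange> = Vec::with_capacity(ranges.len());
    for r in ranges {
        if let Some(prev) = merged.last_mut() {
            if r.start < prev.end {
                prev.end = prev.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

Running the expansion before the merge means a zero-width range that lands inside or next to an existing selection is absorbed by it rather than surviving as a duplicate cursor.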
Drawbacks
Of course, this change won't be all sunshine and rainbows:
How to visualize this in a terminal?
The main challenge with the proposed between-character model is that we can't draw vertical-line cursors in a terminal, we can only draw block-style cursors. (Some terminals do support vertical-line cursors, but only for the terminal's single "real" cursor, which isn't useful in a multi-cursor editor like Helix. And not all terminals support that anyway.)
At the moment, I can think of three ways to tackle this limitation:
… (`r`) would operate on next-character (if it exists) when the range is zero-width.
… (`x`) would take advantage of the zero-width distinction and gain the nicer behavior outlined earlier.
… (`v`) would automatically expand zero-width ranges to single-width when you enter it, and range manipulation will handle range-end crossing in a way that matches current behavior.
… `r` to operate on next-character in the case of zero-width ranges, because that keeps them useful at all times.
Assuming people generally agree on making the switch to between-character at all, this is definitely the aspect of things that I think needs the most consideration and discussion.
Introducing bugs
An obvious drawback of doing a "cleanup" like this is that it will inevitably (re-)introduce indexing and command-behavior bugs, and for a while we'll be (re-)fixing them.
Personally, I think that's acceptable at this stage in Helix's development. Helix is still (comparatively) in its early days, so if we're going to make this kind of switch, now is the time to do it. And I think such bugs will actually be easier to identify and fix with the between-character model, because the special cases involved are more localized and "natural", rather than arising from ambiguities in the range model itself.
But I definitely understand if people are hesitant about this risk.
Wrapping up
So that's my proposal. To summarize my arguments in favor of between-character ranges:
Thoughts? Opinions?