Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a line/column model for document content #18

Open
ymeine opened this issue Sep 18, 2013 · 1 comment
Open

Implement a line/column model for document content #18

ymeine opened this issue Sep 18, 2013 · 1 comment

Comments

@ymeine
Copy link
Contributor

ymeine commented Sep 18, 2013

Representing the document content, that is text, in a model of lines and columns would have several advantages:

Main editors work like this

A lot of editors (clients/frontend) implement this model.

This result in a different primary way of specifying positions in text: line/column coordinates instead of an offset.

Would it be only accepting such coordinates on the backend would already be a good thing, because then it would be straightforward for clients to work with it, just sending the coordinates as is.

This would avoid the common issue where on the client:

  1. the content is changed
  2. the event handler receives the diff, with positions referring to the old content
  3. the handler tries to convert these positions to offsets using a utility of the library
  4. too bad, this utility refers to the new content!
  5. the positions can be wrong

Note also that some clients even specify document changes as a set of lines (CodeMirror for instance).

More efficient

Accepting such coordinates would be good, but actually implementing this model behind would bring more efficiency.

Indeed, the content would be split into multiple strings, one per line. Accessing a line is just a matter of accessing an item in an array.

Then, it's far more efficient, especially for big documents, to manipulate the string corresponding to one line instead of the whole text. In usual conditions, lines don't exceed an average of 80 characters.

Technical details

Code properties / models

I would keep a property named source in the Code class, returning the whole source as a single string. This would be a computed property, basically concatenating the lines together (the result should be cached however).

It would be used by the parser for instance.

Another property would appear, content, returning this line/column model (that is an array of lines)

Line endings

Line endings has always been a problem, we can see the basic Windows Notepad application not displaying \n as line endings, git providing a feature through hooks to convert them, ... whereas it is simple: we just want to end a line. That's the only information we need.

There are two solutions to handle line endings: either we keep them in the line or not.

We should keep them.

Why?

Editor services should not make the assumption that line endings are consistent throughout the whole document, even if most editors ensure it.


In case we would like to make possible to bypass the original line endings.

We could imagine that the user wants to get the whole source as a string, but with a specific line ending he would have specified (like in CodeMirror for instance).

In this case, this would be annoying because we would have to remove actual line endings from every line before concatenating them.

So there are two solutions:

  • either we do that: it requires iterating over all lines once, every time checking fro three kinds of line endings, removing them, then joining the result with the custom line ending
  • or we keep a symmetrical array of line endings, and we don't keep them in line contents. To rebuild the original source, we would have to do the contrary of the first point, but it's easier because it's just a concatenation, no need to find something. Or we could join the arrays, and then simply join it.
@ymeine
Copy link
Contributor Author

ymeine commented Sep 20, 2013

And what about keeping the old interface for document updates?

I mean, if a client requests a document update by sending a diff with the old format, that is offsets instead of lines/columns, how will the backend properly update the code?

Because it was straightforward to update the string of the whole source like this, but now if it is a purely computed property, only the line/column model must be updated (fortunately this way it avoids models synchronization issues)

So I would increase the estimated effort for this task, since there are multiple issues to take into account then:

  • converting an offset to a line/column position, which basically means indexing lines by offset positions, computed with lengths of other lines
  • updating lines, that is inserting or removing entries efficiently in the array of lines - splice should do the trick however, but what about rebuilding the previously mentioned index for lines coming afterwards?

I would also decrease the priority (ans thus change the milestone maybe) since in practice the project is developed for Eclipse first, with compatibility and performances for this client. And Eclipse does not use this model for document changes, but a simple couple of offset and length (see DocumentEvent)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant