Implement a line/column model for document content #18

ymeine · 2013-09-18T12:35:57Z

Representing the document content, that is text, in a model of lines and columns would have several advantages:

Main editors work like this

A lot of editors (clients/frontend) implement this model.

This result in a different primary way of specifying positions in text: line/column coordinates instead of an offset.

Would it be only accepting such coordinates on the backend would already be a good thing, because then it would be straightforward for clients to work with it, just sending the coordinates as is.

This would avoid the common issue where on the client:

the content is changed
the event handler receives the diff, with positions referring to the old content
the handler tries to convert these positions to offsets using a utility of the library
too bad, this utility refers to the new content!
the positions can be wrong

Note also that some clients even specify document changes as a set of lines (CodeMirror for instance).

More efficient

Accepting such coordinates would be good, but actually implementing this model behind would bring more efficiency.

Indeed, the content would be split into multiple strings, one per line. Accessing a line is just a matter of accessing an item in an array.

Then, it's far more efficient, especially for big documents, to manipulate the string corresponding to one line instead of the whole text. In usual conditions, lines don't exceed an average of 80 characters.

Technical details

Code properties / models

I would keep a property named source in the Code class, returning the whole source as a single string. This would be a computed property, basically concatenating the lines together (the result should be cached however).

It would be used by the parser for instance.

Another property would appear, content, returning this line/column model (that is an array of lines)

Line endings

Line endings has always been a problem, we can see the basic Windows Notepad application not displaying \n as line endings, git providing a feature through hooks to convert them, ... whereas it is simple: we just want to end a line. That's the only information we need.

There are two solutions to handle line endings: either we keep them in the line or not.

We should keep them.

Why?

Editor services should not make the assumption that line endings are consistent throughout the whole document, even if most editors ensure it.

In case we would like to make possible to bypass the original line endings.

We could imagine that the user wants to get the whole source as a string, but with a specific line ending he would have specified (like in CodeMirror for instance).

In this case, this would be annoying because we would have to remove actual line endings from every line before concatenating them.

So there are two solutions:

either we do that: it requires iterating over all lines once, every time checking fro three kinds of line endings, removing them, then joining the result with the custom line ending
or we keep a symmetrical array of line endings, and we don't keep them in line contents. To rebuild the original source, we would have to do the contrary of the first point, but it's easier because it's just a concatenation, no need to find something. Or we could join the arrays, and then simply join it.

The text was updated successfully, but these errors were encountered:

ymeine · 2013-09-20T12:32:07Z

And what about keeping the old interface for document updates?

I mean, if a client requests a document update by sending a diff with the old format, that is offsets instead of lines/columns, how will the backend properly update the code?

Because it was straightforward to update the string of the whole source like this, but now if it is a purely computed property, only the line/column model must be updated (fortunately this way it avoids models synchronization issues)

So I would increase the estimated effort for this task, since there are multiple issues to take into account then:

converting an offset to a line/column position, which basically means indexing lines by offset positions, computed with lengths of other lines
updating lines, that is inserting or removing entries efficiently in the array of lines - splice should do the trick however, but what about rebuilding the previously mentioned index for lines coming afterwards?

I would also decrease the priority (ans thus change the milestone maybe) since in practice the project is developed for Eclipse first, with compatibility and performances for this client. And Eclipse does not use this model for document changes, but a simple couple of offset and length (see DocumentEvent)

ymeine added state:agreed labels Feb 7, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement a line/column model for document content #18

Implement a line/column model for document content #18

ymeine commented Sep 18, 2013

ymeine commented Sep 20, 2013

Implement a line/column model for document content #18

Implement a line/column model for document content #18

Comments

ymeine commented Sep 18, 2013

Main editors work like this

More efficient

Technical details

Code properties / models

Line endings

ymeine commented Sep 20, 2013