Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve indentation rules #17868

Closed
3 tasks done
rebornix opened this issue Dec 28, 2016 · 2 comments
Closed
3 tasks done

improve indentation rules #17868

rebornix opened this issue Dec 28, 2016 · 2 comments
Assignees
Milestone

Comments

@rebornix
Copy link
Member

rebornix commented Dec 28, 2016

Indentation

Intro

Indent Style is a convention to convey the structure of source code to human readers in programming languages. It's not a requirement in most programming languages but usually language authors adopt a style for their sample code and encourage users to use that. In some particular programming languages, Python for example, indentations are even meaningful to compilers/interpreters, wrong indentations may lead to compiling or runtime errors.

VS Code helps users to write correct indentations with ease. To make this happen, we followed TextMate's style and implemented three regexp patterns you can set to tell VS Code when to increase or decrease indentations.

  • increaseIndentPattern
  • decreaseIndentPattern
  • indentNextLinePattern
  • unIndentedLinePattern

However, these regexp patterns only decide whether we should modify the indentaion of a line if it matches one pattern, they don't tell us when is a good time to apply the modification. There are two ways to implement this feature, an active one and a passive one.

  • Active: Whenever the model changes(user input, code format, etc), run indentaion regex rules against current line.
  • Passive: Whenever users press enter key, run indentaion regex rules against current line.

Active or Passive

The active way looks really good as the indentation can be adjusted immediately when the content of current line matches the indentation rule. Take Ruby def/end block as an example:

ruby-indentation

Ruby programmers or anyone running into similar situation may only be 100% satisfied if we adjust the indentation correctly once they press d as part of the closing keyword end. But the catch is users might keep typing after we modify the indentation, the content of current line may not match any regex rule any more, should we keep an eye on that and undo previous indentation change? To make things worse, if any indentation regex rule contains $, which matches the ending position of a line, how can we tell whether users finish typing in current line or not?

Taking above into consideration, passive way might be more reasonable. When a user press enter, he or she is sure that the content in current line is satisfactory for the moment and wants to work on following lines. We can calcualte the final indentation for current line and do the updates if it is different from the existing one. Besides, we can calculate the potential correct indentation level for the following line.

In VS Code we chose the passive way.

Are we in good shape now?

We implemented the logic described above (there are several other components that contribute to the indentation, we'll discuss that later) but it's not precisely as expected. Currently we got somewhat wrong with the semantics of decreaseIndentPattern and there might be other cases that might be related or similar:

TODO

  • enrich support for decreaseIndentPattern
  • auto indent selection
  • move lines should honor indentation
@rebornix rebornix self-assigned this Dec 28, 2016
@rebornix rebornix added this to the January 2017 milestone Dec 28, 2016
@egamma egamma mentioned this issue Dec 28, 2016
56 tasks
@rebornix
Copy link
Member Author

rebornix commented Jan 9, 2017

In above description I just linked to the documentation of TextMate but unluckily, TextMate's write up about indentation is somewhat ambiguous so here I'd like to share how I understand it.


When we start to write a new line of code, most modern editors will insert some spaces/tabs, or the more popular concept indentation, at the beginning of the line. The reason for that is most programming languages and structured text have indentation conventions, and editors want to be smart about making indentation correct for us.

You might ask how editors decide how many spaces they should insert? There are several approaches to achieve that but here we'll talk about TextMate's mechanism first.

TextMate has four regexp patterns that you can set to tell it when and how to adjust the indentation:

  • increaseIndentPattern
  • decreaseIndentPattern
  • indentNextLinePattern
  • unIndentedLinePattern

they work perfectly in most cases and even become a de facto standard in this particular area. Successors like Sublime Text, Atom, VS Code ( 😄 ), etc adopt this concept while implementing their smart indentation system.

Indentation Inheritance

If we don't set any indentation rule mentioned above, when you start a new line, TextMate will just inherit the indentation from previous line.

var a;
    var b;

Take above code as example, we are editing a plain text file which has no indentation rules. The first line contains no whitespace but we add 8 spaces at the beginning of the second line by purpose. Now let's say the cursor is after ; in the second line and we press enter, the result will be like

var a;
    var b;
    var c;

The third line just takes the indentation from the second line. One thing to note here is that the third line doesn't take the first line's indentation into consideration in this case. Personally I call it Indentation Inheritance.

Indentation Rules

The four regexp indent patterns are all about adjusting the level of the inherited indentation. So the workflow of starting a new line by pressing enter is more or less

  • Inherit the indentation from previous lines (it's not always inheriting indentation from one line above, I'll explain it later).
  • Run these four regexp rules against the content of previous lines and current line to see whether and how to adjust the indentation.
  • Insert the adjusted indentation at the beginning of the new line.

Note 1

inheritedIndentationLevel doesn't equal indentationLevelOfPreviousLine always.\


increaseIndentPattern

This pattern means if a line matches it, the indentation level of the following line should be inheritedIndentationLevel + 1. We still use above sample code as example

var a;
    var b;

If we set increaseIndentPattern as ^.*b;$, after moving the cursor to the end of the second line and pressing enter, the result is like below ( tabSize=4 and insertSpaces=true )

var a;
    var b;
        var c;

The inherited indentation level is 1 and previous line matches increaseIndentPattern so the final indentation level increases to 2.

decreaseIndentPattern

TextMate documentation says it's the counterpart of increaseIndentPattern but it's not accurate. increaseIndentPattern is talking about the indentation of the following line. However this one is aiming to adjust the indentation of current line. If a line matches this pattern, the indentation level of the following line should be inheritedIndentationLevel - 1.

Since it's about the line we are working on at the moment, the only catch is if the pattern contains $ (match end of line), how can the editor know if the user finishes working on this line or not?

TextMate is really smart in this part, it checks the content of current line against decreaseIndentPattern whenever you type. If it matches, decrease the indentation, if it no longer matches, roll back to previous indentation.

textmate-decrease-indentation

Sublime Text doesn't roll back when the content no longer matches. However it has special handling when you delete all characters in current line, it rolls back to inherited indentation.

sublime-decrease-indentation

Atom is straight forward, if it no longer matches the decreaseIndentPattern rule, it just leaves it there.

atom-decrease-indentation


Note 2

I'm using inheritedIndentationLevel +/- 1 while describing above two indentation rules. This is because if a line matches decreaseIndentPattern, it doesn't mean that this line must decrease indentation by 1. Instead it means the indentation should be inheritedIndentationLevel - 1 and if current indentation doesn't match, adjust it to make it correct.

Take below code as example:

    if (true) {
        var b;
    |

| represents the cursor and we set the decreaseIndentPattern rule as \}. When we type in }, should we decrease the indentation of the third line? Apparently the answer is no and the expected correct output should be:

    if (true) {
        var b;
    }

The inherited indentation level is 2 so after the decrement, the result indentation level is 1. As the indentation of current line is already correct so we don't need to adjust the indentation any more.


indentNextLinePattern

This pattern is called Increase Only Next Line rule. As we highlighted in Note 1, the inherited indentation level is not always only decided by the line above, indentNextLinePattern is the first exception.

In some programming languages, you are Okay to write a block without brackets if there is just one single statement inside that block.

    if (condition)
        statement();
    otherStatement();

We usually increase the indentation if a line ends with an open bracket {, which indicates the beginning of a block. However in this case, as it's a single-statement-block, we should only increase the indentation of the second line statement() and then the third line should align with the first line as the if block already ends.

For this situation, there is the indentNextLinePattern which matches these lines. The indentation level of the first line is 1 and let's say it matches indentNextLinePattern, the third line should inherit the indentation from the first line, which should be 1 as well.

We can also think about it in another way. When we write the first line, inheritedIndentationLevel is 1. Since it matches indentNextLinePattern, the indentation level of second line increases to 2 but it doesn't change inheritedIndentationLevel. So when we start to work the third line, we just read inheritedIndentationLevel which is still 1.

But there is a really annoying problem:


Note 3

Let's recall what TextMate's documentation (section 24.2.5) says:

But generally in C-like languages, all lines not terminated with a semi-colon (or ending with starting a block), cause the next line to be indented. This is (with most conventions) also the case when manually breaking one expression into several lines, e.g.:

some_function(argument1,
   argument2,
   argument3);

more_code;

...

The pattern ends up as:

indentNextLinePattern = '^(?!.*;\s*//).*[^\s;{}]\s*$';

But the result in TextMate is not as expected. If I made anything wrong, like configuring TextMate improperly, free feel to correct me.

textmate-indent-nextline

In this case, the third line argument3 should inherit indentation from line some_function(argument1, first and since the second line argument2, matches indentNextLinePattern again, the result indentation of line argument3 turns out to be the same as the second line. But TextMate just reads the indentation level from the second line and increases that by 1.

Is TextMate wrong here? Actually no. Here is an example that explains well:

if (condition1)
    if (condition2)
        statement();

anotherStatement();

If we only indent next line when it's a single statement block, the first and second line would match indentNextLinePattern. It's similar to breaking one expression into three lines. In this situation, the third line's indentation level should be greater than the second one.

Am I saying you can't write a general indentNextLinePattern that makes two cases work at the same time?


Since we have increaseIndentPattern and indentNextLinePattern, we have to take above lines into consideration when we calculate the indentation for a new line. But sometimes we may want to ignore several lines completely, for instance comments, that's why we have another rule called unIndentedLinePattern.

unIndentedLinePattern

Any line that matches this rule is ignored when calculating indentation. Personally I'd like to put it this way: any line that matches unIndentedLinePattern can be removed from the editor while calculating the indentations.

For example, in JavaScript you may write some comments before code and we don't want these comments affect the indentation stuff, we can set the unIndentedLinePattern as '^\s*((/\*|\*/\s*$|//|#).*)?$'

function() {
    var a;
        // if block
    if () {

Then when we calculate the indentation for each time, we can actually see it as

function() {
    var a;
    if () {

The indentation of the comments are not useful and all the indentation rules on the comments will not take effect either.

@rebornix
Copy link
Member Author

rebornix commented Jan 11, 2017

It's always easy to describe what's the expected behavior but it's difficult to acknowledge that our current implementation is inconsistent with it. Here I'd like to list corner cases I ran into as many as possible

decreaseIndentPattern

  • check beforeEnterText and afterEnterText against decreaseIndentPattern when pressing enter

unIndentedLinePattern

I configured TextMate to make return false as unIndentedLine so we have below code sample

screen shot 2017-01-10 at 4 16 04 pm

any line contains return false doesn't affect the indentation of following lines. This is a big missing part in current implementation.

  • check previous lines against unIndentPattern when pressing enter
  • check beforeEnterText against unIndentPattern when pressing enter

Support selections

When we press enter, Code will check the content surrounding the selection against indentation rules. This is true when the selection is a single cursor. If we select multiple characters, our check against afterEnterText is possibly wrong.

  • Support selections when press enter

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant