Skip to content

Whitespace + allman checks#348

Closed
wilzbach wants to merge 1 commit intodlang-community:masterfrom
wilzbach:whitespace_checks
Closed

Whitespace + allman checks#348
wilzbach wants to merge 1 commit intodlang-community:masterfrom
wilzbach:whitespace_checks

Conversation

@wilzbach
Copy link
Member

Adds the ability to check for whitespace between operators (#318);

I realized it makes sense to bundle my changes in one PR. Thus it includes #344, #345, #346 and the whitespace checks

Fixes:

ubyte[] code;

///
this(string fileName, ubyte[] code, bool skipTests = false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for taking the raw source code as a parameter here instead of the token array? The tokens are avalable in the analysis.run.analize function so they could be passed to this constructor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know how I could access the tokens based on a statement and look ahead/before. I only found startLocation

@wilzbach wilzbach force-pushed the whitespace_checks branch 6 times, most recently from 4fdac36 to 1998b39 Compare December 10, 2016 18:54
@wilzbach
Copy link
Member Author

I managed to find some to rebase this :)

What was the reason for taking the raw source code as a parameter here instead of the token array? The tokens are avalable in the analysis.run.analize function so they could be passed to this constructor.

The lexer drops all unnecessary whitespace are and these checks explicitly depend on lexWhitespace.

(I excluded the more complex check for space between operators.)

@Hackerpilot
Copy link
Collaborator

I don't recall if I asked this before, but wouldn't it be easier to check the tokens array for line numbers and column numbers? The tokens do have that information.

- add allman style check
- detect trailing whitespace
- check for two or more consecutive empty lines
override void visit(const DoStatement st)
{
// the DoStatement only knows about the line and column of the expression
checkForBrace(st.statementNoCaseNoDefault, 0, 0);
Copy link
Member Author

@wilzbach wilzbach Dec 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For

do {

} while(true)

the AST is:

<doStatement>
  <expression>
    <unaryExpression>
      <primaryExpression>
        <true/>
      </primaryExpression>
    </unaryExpression>
  </expression>
  <statementNoCaseNoDefault>
    <blockStatement>
      <declarationsAndStatements>
        <declarationOrStatement>
          <statement>
            <statementNoCaseNoDefault>
              <expressionStatement>
                <expression>
                  <unaryExpression>
                    <primaryExpression>
                      <identifierOrTemplateInstance>
                        <identifier>i</identifier>
                      </identifierOrTemplateInstance>
                    </primaryExpression>
                    <suffix>++</suffix>
                  </unaryExpression>
                </expression>
              </expressionStatement>
            </statementNoCaseNoDefault>
          </statement>
        </declarationOrStatement>
      </declarationsAndStatements>
    </blockStatement>
  </statementNoCaseNoDefault>
</doStatement>

and none of the elements contains information about the line and column:

DoStatement: https://github.com/Hackerpilot/libdparse/blob/master/src/dparse/ast.d#L1467
BlockStatement: https://github.com/Hackerpilot/libdparse/blob/master/src/dparse/ast.d#L959
StatementNoCaseNoDefault: https://github.com/Hackerpilot/libdparse/blob/master/src/dparse/ast.d#L2249

@wilzbach
Copy link
Member Author

I don't recall if I asked this before, but wouldn't it be easier to check the tokens array for line numbers and column numbers? The tokens do have that information.

Yes you did, but afaict the tokens array doesn't contain information about whitespace / new lines as they are already ignored during the lexing phase. However, I am happy if you prove me wrong ;-)

@ghost
Copy link

ghost commented Dec 16, 2016

Try this, you must change the lexer config so that whites are included and also (edit) don't use getTokensForParser().

void main()
{
    import dparse.lexer;
    import std.stdio, std.file : read;

    ubyte[] src = cast(ubyte[]) read(__FILE_FULL_PATH__);
    StringCache cache = StringCache(StringCache.defaultBucketCount);
    LexerConfig config = LexerConfig("", StringBehavior.source, WhitespaceBehavior.include, CommentBehavior.intern);

    auto toks = DLexer(src, config, &cache);

    foreach(tk; toks)
    {
        if (tk.type == tok!"identifier")
            writeln("identifier: '", tk.text, "'");
        else if (tk.type == tok!"whitespace")
            writeln("whites for ", tk.text.length, " chars");
        else // many other cases...
            writeln(str(tk.type));
    }
}

but actually you only need them to detect the trailing spaces. Otherwise you can use the tokenN position, the tokenN length and the tokenN+1 position to detect missing whites after if, foreach and such tokens.
Also take care of the comments, since if/*bla bla*/(true) is perfectly valid.

@wilzbach
Copy link
Member Author

wilzbach commented Jun 7, 2017

but actually you only need them to detect the trailing spaces. Otherwise you can use the tokenN position, the tokenN length and the tokenN+1 position to detect missing whites after if, foreach and such tokens.
Also take care of the comments, since if/bla bla/(true) is perfectly valid.

Hmm I am still not sure how I could do this without access to the raw code. Allman is probably easy to look at as an example, in particular this method is the core of the Allman check:

/**
Checks whether a brace or newline comes first
*/
void findBraceOrNewLine(size_t start, size_t end, size_t line, size_t column)
{
	import std.algorithm : canFind;
	import std.utf : byCodeUnit;

	auto codeRange = (cast(char[]) code[start..end]).byCodeUnit;

	// inline statements are allowed -> search for newline
	if (codeRange.canFind('\n'))
	{
		foreach (s; codeRange)
		{
			// first brace
			if (s == '{')
			{
				// DoStatement hasn't a proper line and column attached
				// -> calculate ourselves
				if (line == 0 && column == 0)
				{
					// find line & column of brace
					auto t = findLineAndColumnForPos(start);
					line = t.line + 1; // Dscanner starts lines at 1
					column = t.column;
				}
				addErrorMessage(line, column, KEY, MESSAGE);
				break;
			}
			// newline - test passed
			else if (s == '\n')
			{
				break;
			}
		}
	}
}
  1. With the code array, I simply go back/forth because I know the {start,end}Location of an statement. If I only have the tokens array, how could I access the next token without iterating through all of them?

  2. And imho this check is rather easy to do with access to the code array and it's in memory anyways, so I don't really get why it's so bad to use it?

@ghost
Copy link

ghost commented Jun 7, 2017

Allman enforcer:

void main()
{
    import dparse.lexer;
    import std.stdio, std.file : read;

    ubyte[] src = cast(ubyte[]) read(__FILE_FULL_PATH__);
    StringCache cache = StringCache(StringCache.defaultBucketCount);
    LexerConfig config = LexerConfig("", StringBehavior.source);

    auto toks = getTokensForParser(src, config, &cache);

    foreach(const i; 1..toks.length) {
        if (toks[i].type == tok!"{" && toks[i-1].line == toks[i].line)
            writeln(__FILE_FULL_PATH__, "(", toks[i].line, "): Allman not respected !");
    }
}

You see, it's shorter with the tokens.

@wilzbach wilzbach mentioned this pull request Jun 12, 2017
@wilzbach
Copy link
Member Author

This is superseded by #446, #447 and #448.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add check for trailing whitespace check for more than two consecutive empty lines

2 participants