Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarifications about ninja spec #952

Open
mdempsky opened this issue Apr 1, 2015 · 9 comments
Open

Clarifications about ninja spec #952

mdempsky opened this issue Apr 1, 2015 · 9 comments

Comments

@mdempsky
Copy link
Contributor

mdempsky commented Apr 1, 2015

Bunch of minor clarifications:

1. The Ninja manual mentions "whitespace" in a few places; what exactly does that mean? Are tabs and/or carriage returns considered whitespace? (The fact that Ninja is encoding agnostic suggests at least that Unicode whitespace isn't allowed.)

The current ninja implementation seems to error on \r and \t. It might be clearer to just say "spaces" instead of "whitespace", and to explicitly state the rule about \r and \t.

2. Are there any restrictions on acceptable variable names? Is "foo$ bar = 3" a valid variable assignment? Are "${foo bar}" and/or "${foo$ bar}" valid references to that variable?

The ninja implementation seems to only allow identifiers like [a-zA-Z0-9_.-]+ (notably allowing identifiers like "42"), though the "${foo}" syntax needs to be used for identifiers with periods in them. This doesn't seem documented anywhere.

3. "If a line is indented more than the previous one, it’s considered part of its parent’s scope; if it is indented less than the previous one, it closes the previous scope."

It's not clear what "its parent" means here, IMO.

Also, strictly interpreted, for this input:

rule foo
    x = 1
    y = 2
  z = 3

the "y = 2" line is not indented "more than the previous one" (i.e., "x = 1"), so it should not be considered part of its parent's scope.

Similarly, the "z = 3" line is indented "less than the previous one", so it should "close the previous scope".

As far as I can tell looking at the ninja implementation, the amount of indenting is insignificant as there's no nested scoping within a file: any indented assignment is simply associated with the last non-indented line.

4. It seems that the number of spaces is significant on the right hand side of an assignment statement, but this isn't mentioned anywhere that I can tell.

5. Comments only seem to work as standalone lines, though intuitively I would have expected them to behave like shell comments. Probably worth noting.

6. The spec says "The full lookup order for a variable expanded in a build block (or the rule is uses) is: [...] 2. Build-level variables from the build block."

But the current implementation runs "echo 1 2 4" given this build.ninja:

rule echo
    command = echo $args
args = 1
args = $args 2
build hello: echo
    args = $args 3
    args = $args 4

7. The current implementation rejects undefined escape sequences like "$!"; seems worth documenting that other uses of $ are invalid.

8. There's no escape sequence for literal "|" characters, though that's an uncommon character for file names.

@evmar
Copy link
Collaborator

evmar commented Apr 1, 2015

Might be better to bring this up on the mailing list, or file separate bugs on the issues you specifically want to fix. :) Stuff like putting #5 in the docs seems like a simple pull request.

For many of these corner cases I deliberately left it unsaid, because I wasn't quite sure what the answer should be and I didn't want to tie the implementation, nor do I think it's really useful to have a manual with 20% more text that documents every last corner. I think Ninja's implementation is the definitive answer for most questions but also that we're fine with changing Ninja's implementation where it doesn't make sense. (I think the error message if you include a "\r" mentions that we could fix things to make it allowed if you really care.) I know that this ambiguity might cause problems if we change behaviors across versions but in practice it's never been a problem.

Re 3, originally I had imagined multiple nested scopes of indenting (like Haskell "where" clauses) and then it never ended up being useful. I agree the docs could remove all mention of parent and simplify to just "indented or not".

Re 4, could you elaborate?

Re 7 and 8, see this recent thread: https://groups.google.com/forum/#!topic/ninja-build/UVYk9RhCE4M

@mdempsky
Copy link
Contributor Author

mdempsky commented Apr 1, 2015

For item 4, these two builds have the same behavior:

build a.out: link foo.o bar.o
build a.out: link foo.o  bar.o

but these two rules have different behavior:

rule hello_one_space_world
    command = echo "hello world"
rule hello_two_space_world
    command = echo "hello  world"

I understand why, but I don't think it's very obvious from the manual.

Context: My first idea for writing a ninja parser was to treat the value part of "var = value" the same as a list of paths (i.e., a sequence of space-separated tokens) and then I'd just rejoin them with a space later, but that's actually wrong. (Of course, I need to lex "value" tokens separately anyway because of : and | characters too, so now I'm just tracking whether the last token was an "=".)

@dpranke
Copy link

dpranke commented Apr 1, 2015

fwiw, I wrote a formal (PEG) grammar for Ninja files a while ago that I meant to get merged into the Ninja docs but never got around to it. I think it would answer most of your questions ...

https://github.com/dpranke/glop/blob/master/test_grammars/ninja.pymeta

@evmar
Copy link
Collaborator

evmar commented Apr 1, 2015

Yes also these are different:

command = echo foo$
  bar
command = echo foo $
  bar

I agree the best way to treat this is that = begins a specially-lexed section (much like how " begins a specially-lexed section in C).

@mdempsky
Copy link
Contributor Author

mdempsky commented Apr 2, 2015

dpranke: Thanks, that is helpful. (I note a handful of discrepancies though; is it useful to write them up somewhere?)

That also makes me realize you can't use "$\n" to wrap identifiers (e.g., rule and variable names), so the lexer needs to contextually know to look for those too (as opposed to paths and values, where it is allowed). Seems like all the special cases make the traditional lexer + parser generator solution impractical for such a simple grammar. PEG seems nicer here.

@dpranke
Copy link

dpranke commented Apr 2, 2015

On Wed, Apr 1, 2015 at 5:38 PM, Matthew Dempsky notifications@github.com
wrote:

dpranke: Thanks, that is helpful. (I note a handful of discrepancies
though; is it useful to write them up somewhere?)

Yes, if there are discrepancies then my python ninja clone is probably
buggy and I'd like to fix that :). We should also push the grammar into the
ninja docs.

@evmar
Copy link
Collaborator

evmar commented Apr 2, 2015

In the implementation, there are two lexers: the one for reading tokens (which doesn't understand $) and the one for reading identifiers and variable values (which does understand $). Unfortunately as you observe that does mean the lexing is dependent on the parse.

@Alhadis
Copy link

Alhadis commented Nov 27, 2016

I think Ninja's implementation is the definitive answer for most questions but also that we're fine with changing Ninja's implementation where it doesn't make sense.

You might consider lifting the ban on tabs, then, because enforcing a specific style of formatting is just repeating Make's mistake.

The lexer should raise an error only if tabs and spaces are mixed within the same file, which (of course) should be forbidden. The first indented line could set the expected indentation for the rest of the file.

@jonesmz

This comment was marked as abuse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants