Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[K-Bug] Uncaught exception thrown of type NegativeArraySizeException #4592

Closed
2 of 6 tasks
pxhdev opened this issue Aug 15, 2024 · 8 comments
Closed
2 of 6 tasks

[K-Bug] Uncaught exception thrown of type NegativeArraySizeException #4592

pxhdev opened this issue Aug 15, 2024 · 8 comments

Comments

@pxhdev
Copy link

pxhdev commented Aug 15, 2024

What component is the issue in?

Front-End

Which command

  • kompile
  • kast
  • krun
  • kprove
  • kprovex
  • ksearch

What K Version?

K version: v7.1.106 Build date: Thu Aug 15 01:22:33 CST 2024

Operating System

Linux

K Definitions (If Possible)

module NEWLINE
  imports INT-SYNTAX

  syntax Stmt ::= List{Int, "\n"}
  syntax #Layout ::= r"([\\ \\r\\t])"
  configuration <k> $PGM:Stmt </k>
endmodule

Steps to Reproduce

kompile the definition and try parsing the newline.txt

1
3

The error message:

$ kast newline.txt --debug
java.lang.NegativeArraySizeException: -1
        at org.apache.commons.lang3.StringUtils.repeat(StringUtils.java:5210)
        at org.kframework.utils.errorsystem.KException.getSourceLine(KException.java:315)
        at org.kframework.utils.errorsystem.KException.getSourceLineText(KException.java:232)
        at org.kframework.utils.errorsystem.KException.<init>(KException.java:81)
        at org.kframework.utils.errorsystem.KException.<init>(KException.java:63)
        at org.kframework.utils.errorsystem.KEMException.create(KEMException.java:236)
        at org.kframework.utils.errorsystem.KEMException.innerParserError(KEMException.java:155)
        at org.kframework.parser.inner.kernel.Scanner.readTokenizedOutput(Scanner.java:498)
        at org.kframework.parser.inner.kernel.Scanner.tokenize(Scanner.java:464)
        at org.kframework.parser.inner.kernel.EarleyParser$ParserMetadata.<init>(EarleyParser.java:586)
        at org.kframework.parser.inner.kernel.EarleyParser.parse(EarleyParser.java:835)
        at org.kframework.parser.inner.ParseInModule.parseStringTerm(ParseInModule.java:396)
        at org.kframework.parser.inner.ParseInModule.parseString(ParseInModule.java:340)
        at org.kframework.parser.inner.ParseInModule.parseString(ParseInModule.java:295)
        at org.kframework.kompile.CompiledDefinition.parseSingleTerm(CompiledDefinition.java:208)
        at org.kframework.parser.KRead.prettyRead(KRead.java:59)
        at org.kframework.parser.KRead.prettyRead(KRead.java:43)
        at org.kframework.kast.KastFrontEnd.run(KastFrontEnd.java:254)
        at org.kframework.main.FrontEnd.main(FrontEnd.java:60)
        at org.kframework.main.Main.runApplication(Main.java:127)
        at org.kframework.main.Main.runApplication(Main.java:117)
        at org.kframework.main.Main.main(Main.java:58)
[Error] Internal: Uncaught exception thrown of type NegativeArraySizeException
(NegativeArraySizeException: -1)

Expected Results

kast successfully parses the file.

@pxhdev
Copy link
Author

pxhdev commented Aug 19, 2024

I believe I have identified the cause of this issue. The line:

  syntax Stmt ::= List{Int, "\n"}

is desugared to

  // Parsed.txt
  syntax Stmts ::= Stmt "" Stmts [userList(*)]
  ...

This results in the scanner assigning a kind of -1 to the \n separator, which then triggers an exception.

From my investigation, the root cause seems to be in the UserList.java file, where the separator is being trimmed:

  public UserList(Sort sort, String separator) {
    this.sort = sort;
    this.separator = separator.trim();
    this.listType = ZERO_OR_MORE;
  }

  public UserList(Sort sort, String separator, String listType) {
    this.sort = sort;
    this.separator = separator.trim();
    this.listType = listType;
  }

Wouldn't it be better to leave the separator as it is instead of trimming it?

@ehildenb
Copy link
Member

Huh, this is an interesting corner case I believe. Usually, we are counting on all whitespace to be deleted because of the #Layout, but you're setting custom #Layout in your example.

So with default #Layout, trimming whitespace is needed, but with custom layout, I guess what's instead needed is you want to "trim all parts of the production that match #Layout.

I'm tempted to say we just shouldn't trim at all, like you said. My guess is that this trimming was to prevent the case where we had users write List{Elem, " "}, which would never work because the layout would discard all space tokens.

Maybe we can get rid of this trim, but also have a check up front for all list-separator productions, to see if they intersect with any #Layout regexs? If so, then we throw an error. @Scott-Guest thoughts?

@Scott-Guest
Copy link
Contributor

Maybe we can get rid of this trim, but also have a check up front for all list-separator productions, to see if they intersect with any #Layout regexs? If so, then we throw an error. @Scott-Guest thoughts?

I agree this is the right approach! We'll need to implement a translation to the dk.brics.automaton syntax to compute the regex intersections (#4271 also needs this), but it shouldn't be too much work now that #4295 is done.

@ehildenb
Copy link
Member

I think for now we can probably remove the trim(), or put removing it behind an option, like --no-trim-list-separator, so that @pxhdev can make progress?

@pxhdev
Copy link
Author

pxhdev commented Aug 22, 2024

Maybe we can get rid of this trim, but also have a check up front for all list-separator productions, to see if they intersect with any #Layout regexs? If so, then we throw an error. @Scott-Guest thoughts?

I agree this is the right approach! We'll need to implement a translation to the dk.brics.automaton syntax to compute the regex intersections (#4271 also needs this), but it shouldn't be too much work now that #4295 is done.

I noticed that in K v7.1.116, the trim() has been removed, allowing the syntax syntax Stmts ::= List{Stmt, "\n"} to work now. Thank you very much for fixing this issue.

However, I have a follow-up question: I couldn't find a way to match \n on the LHS of a rewrite rule. For the syntax syntax Stmts ::= List{Stmt, ";"}, I can write the rule rule <k> S:Stmt; Ss:Stmts => S ~> Ss ... </k> for single-step execution. But I couldn't match \n on the LHS of the rule for List{Stmt, "\n"}. I guess I cannot place a real \n there, because it will be discarded by the K inner parser(?). I also tried using _, but it didn't work. Is there any way to address this? Thanks a lot!

@Scott-Guest
Copy link
Contributor

Scott-Guest commented Aug 22, 2024

@pxhdev You're correct - the newline gets discarded. Normally \n isn't meaningful inner syntax, so it's nice to strip it out and allow inner syntax to cross line breaks, at the expense of some annoyance when \n actually is meaningful.

To work around this, and other similar inner parsing issues, every syntax production in K has an associated name called its the "symbol" or "klabel", which you can set with the symbol attribute.

syntax Int ::= Int "if" Bool "else" Int [symbol("if-else"), function]

Rather than writing the production directly, you can instead refer to it by surrounding the symbol with backticks, so

rule `if-else`(X, true , _) => X
rule `if-else`(_, false, Y) => Y

is the same as

rule X if true  else _ => X
rule _ if false else Y => Y

For List syntax productions, you can set the symbol for the separator using symbol and the symbol for the terminator using terminator-symbol. That is, a definition like

syntax Stmts ::= List{Stmt, "\n"} [symbol("StmtsCons"), terminator-symbol("StmtsNil")]

is desugared under-the-hood to

syntax Stmts ::= Stmt "\n" Stmts [symbol("StmtsCons")]
               | ".Stmts" [symbol("StmtsNil")]

Your rule would then be written like

rule <k> `StmtsCons`(S:Stmt, Ss:Stmts) => S ~> Ss ... </k>

Let us know if this is unclear or if you run into any further issues!

@pxhdev
Copy link
Author

pxhdev commented Aug 22, 2024

@pxhdev You're correct - the newline gets discarded. Normally \n isn't meaningful inner syntax, so it's nice to strip it out and allow inner syntax to cross line breaks, at the expense of some annoyance when \n actually is meaningful.

To work around this, and other similar inner parsing issues, every syntax production in K has an associated name called its the "symbol" or "klabel", which you can set with the symbol attribute.

syntax Int ::= Int "if" Bool "else" Int [symbol("if-else"), function]

Rather than writing the production directly, you can instead refer to it by surrounding the symbol with backticks, so

rule `if-else`(X, true , _) => X
rule `if-else`(_, false, Y) => Y

is the same as

rule X if true  else _ => X
rule _ if false else Y => Y

For List syntax productions, you can set the symbol for the separator using symbol and the symbol for the terminator using terminator-symbol. That is, a definition like

syntax Stmts ::= List{Stmt, "\n"} [symbol("StmtsCons"), terminator-symbol("StmtsNil")]

is desugared under-the-hood to

syntax Stmts ::= Stmt "\n" Stmts [symbol("StmtsCons")]
               | ".Stmts" [symbol("StmtsNil")]

Your rule would then be written like

rule <k> `StmtsCons`(S:Stmt, Ss:Stmts) => S ~> Ss ... </k>

Let us know if this is unclear or if you run into any further issues!

Update:

I think this could be solved by introducing a new layer:

rule <k> #load(`StmtsCons`(S:Stmt, Ss:Stmts)) => #innerLoad(S) ~> #load(Ss) ... </k>
rule <k> #innerLoad(const V = I) => .K ... </k>
rule <k> #innerLoad(var V = I) => .K ... </k>

This seems slightly different from lists. My original code looks like this:

syntax Stmts ::= List{Stmt, ";"}
syntax Stmt ::= VarDef | ConstDef
syntax VarDef ::= "var" Id "=" Exp
syntax ConstDef ::= "const" Id "=" ConstExp

rule <k> #load(var V = I; Ss:Stmt) => #load(Ss) ... </k>
       ... // do something in other cells
 rule <k> #load(const V = I ; Ss:Stmt) => #load(Ss) ... </k>
      ... // do something in other cells

When I fetch the first element of the list, I can specify a particular subtype.
However, with this workaround, I encountered the following error:

// Inner Parser: Could not find any suitable productions for label `StmtCons`
rule <k> #load( `StmtCons`(var V:Id = I, Ss:Stmts) ) => #load(Ss) ... </k>

It seems to only accept

`StmtCons`(S:Stmt, Ss:Stmts) 

@pxhdev
Copy link
Author

pxhdev commented Aug 22, 2024

@pxhdev You're correct - the newline gets discarded. Normally \n isn't meaningful inner syntax, so it's nice to strip it out and allow inner syntax to cross line breaks, at the expense of some annoyance when \n actually is meaningful.

To work around this, and other similar inner parsing issues, every syntax production in K has an associated name called its the "symbol" or "klabel", which you can set with the symbol attribute.

syntax Int ::= Int "if" Bool "else" Int [symbol("if-else"), function]

Rather than writing the production directly, you can instead refer to it by surrounding the symbol with backticks, so

rule `if-else`(X, true , _) => X
rule `if-else`(_, false, Y) => Y

is the same as

rule X if true  else _ => X
rule _ if false else Y => Y

For List syntax productions, you can set the symbol for the separator using symbol and the symbol for the terminator using terminator-symbol. That is, a definition like

syntax Stmts ::= List{Stmt, "\n"} [symbol("StmtsCons"), terminator-symbol("StmtsNil")]

is desugared under-the-hood to

syntax Stmts ::= Stmt "\n" Stmts [symbol("StmtsCons")]
               | ".Stmts" [symbol("StmtsNil")]

Your rule would then be written like

rule <k> `StmtsCons`(S:Stmt, Ss:Stmts) => S ~> Ss ... </k>

Let us know if this is unclear or if you run into any further issues!

Everything works. Your explanation is really clear and helpful. Thank you soooooo much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants