TODO.txt

= Misc. =

* Should really consider whether it still is meaningful to keep this scoring
  system alive... It was a useful way to help identify cases which a grammar
  rule missed out on. But the grammar has grown so much by now that
  maintaining these rules may no longer be worth the effort.

* Idea for managing keywords in a better way:
{code}
# Nothing is a keyword by default.

# Following defines FOO to be a keyword anywhere in or below the foo or fee
# grammar rule.
FOO: in foo or in fee

# Following defines FUM to be a keyword anywhere, except in or below the fum
# grammar rule.
FUM: everywhere and not in fum
{code}
It works kind of like the scoring system, except that it inspects the runtime
stack of active grammar rules. Needs some more work on the checks we want to
express and the syntax to express them in.

* Abstract some of the classes which are now in koopa.parsers.cobol.

* Idea for parsing embedded languages:
{code}
# This skips to the "endOfConditionMarker", as usual.
--> endOfConditionMarker

# This first skips to the "endOfConditionMarker", and will then match what was skipped with a "condition". If "condition" can't be matched then the skipping will fail too.
-- condition --> endOfConditionMarker

# Same concept as above, but with "sql" being a separate parser which has a "statement" grammar rule.
EXEC SQL (-- sql:statement --> END-EXEC) END-EXEC 
{code}


* A "longest match" operator ? E.g.: $( identifier | arithmeticExpression ),
  which would classify inputs like "A" as an identifier, but inputs like
  "A + 1" as an arithmetic expression... Length is measured in number of tokens,
  though "furthest position in file" would be equivalent.

* Add caching strategy to testing of tokens. To help prevent duplicate work.
  I.e. if something wasn't an identifier the first time around it won't be 
  any later time either.

* Get rid of the syntax for capturing tokens and native code in the grammars ?

* Fix degenerate results from grammar rules. E.g. a complexCondition can
  degenerate into a simpleCondition. But the AST will still have both...

* Reconsider allowing these intermediate tokenizers. Maybe a listener pattern
  would be better.

* Remove the structure-shy ANTLR tree parsers and standardize on XML and XPath ?


= Customizing the generated ANTLR tree parser =

Due to a bug in ANTLR I can't use the generated tree parser in a composite
grammar. This means that any actions will have to be placed either in the
original tree parser or a copy of that one. Either way, this is not nice when
we're evolving the grammar.

Bug details: http://www.antlr.org/jira/browse/ANTLR-375;jsessionid=10155AFC751AEEF9A46B87FDBD3DA9DA?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel


= Comment entries =

Error on testsuite/cobol85/OBNC1M.CBL.

"Comment-entry can consist of any combination of characters from the computer
 character set.  Comment-entries can span several lines in Area B.  However,
 they cannot be continued by using a hyphen in the indicator area.  The end of
 comment-entry is the line before the next entry in Area A."

This was obsoleted in the 1985 standard. I'm not sure whether I should bother
with parsing this.


= Save to srcml ? =

http://www.sdml.info/projects/srcml/