Symbol conflicts in grammars #2153

Closed
kaby76 opened this issue Apr 20, 2021 · 1 comment
Labels
symbol-conflict Rule names that conflict with keywords in certain runtimes

Comments

kaby76 commented Apr 20, 2021

I am nearly done with trrename, which takes a grammar and a list of symbols to rename and generates a new grammar with the renamed symbols. I have also made an initial pass over all the grammars in the repo. Many changes are still required to remove all symbol conflicts. Most of them are now due to the Swift target, which I haven't yet added to the trgen template processor.

Below is a script that does a complete rename of the conflicting symbols in all grammars. It ran faster than I initially estimated: 9 minutes to rename all symbol conflicts in all grammars. There are still bugs in the trrename tool, so I don't know exactly how many grammars changed, but it's likely well over half, probably close to 3/4 of the grammars, that require some symbol renaming. If this were done manually, it would in all likelihood require months of work.

Most of the script just sets up the list of symbols to be renamed: a reserved-word list for each of the 9 Antlr targets, a merge of those lists into one, and a sort of the result. Only a few lines are required to find each grammar file and do the renaming.
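As a rough illustration of the merge-and-sort step (not the script used here), the same result can be sketched with `sort -u`. The two short lists are hypothetical stand-ins for the real per-target lists, and `mapfile` assumes Bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened per-target lists; the real ones are much longer.
sym_go=("break" "case" "func" "go")
sym_java=("break" "case" "class" "int")

# Merge the lists, drop duplicates, and sort in one step.
# LC_ALL=C keeps the ordering independent of the user's locale.
mapfile -t sorted < <(printf '%s\n' "${sym_go[@]}" "${sym_java[@]}" | LC_ALL=C sort -u)

# Build the "old,new" pairs, separated by semicolons,
# appending an underscore to each symbol.
list=""
for s in "${sorted[@]}"; do
  list+="${list:+;}${s},${s}_"
done
echo "$list"
```

The resulting string has the same `old,new;old,new` shape that the script passes to trrename with `-r`.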

Since the script is so quick and easy to apply, I'll likely supply a script that renames the symbols back, removing the underscores. This will let people who don't want the change keep using the grammars.
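A reverse rename list could be derived from the forward one by swapping each pair. The sketch below assumes the `old,new;old,new` format that the script builds, and the function name `reverse_list` is made up for illustration. Whether trrename handles the reversed list the same way is also an assumption.

```shell
#!/usr/bin/env bash
# Forward list in the "old,new;old,new" form (shortened, hypothetical).
list="break,break_;case,case_;class,class_"

# Swap each pair so that "sym,sym_" becomes "sym_,sym".
reverse_list() {
  local pair out=""
  local -a pairs
  IFS=';' read -r -a pairs <<< "$1"
  for pair in "${pairs[@]}"; do
    # ${pair#*,} is the part after the comma, ${pair%%,*} the part before it.
    out+="${out:+;}${pair#*,},${pair%%,*}"
  done
  printf '%s\n' "$out"
}

reverse_list "$list"
```

One caveat with this approach: a reverse pass would also rename any symbol that already ended in an underscore before the forward rename, if its stem is a reserved word.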

# Symbol conflicts arise when a grammar uses a target-specific reserved
# word as a symbol name. Up to now, this list was derived
# experimentally. It can actually be derived from the Antlr sources
# here:
# https://github.com/antlr/antlr4/tree/master/tool/src/org/antlr/v4/codegen/target
# The list here is a copy/paste of the reserved words in that source code,
# modified for consumption in Bash.

all_sym=("alignas")

# Note: despite the name, the words below are C++ reserved words, not C# ones.
sym_csharp=( \
		"alignas" "alignof" "and" "and_eq" "asm" "auto" "bitand" \
		"bitor" "bool" "break" "case" "catch" "char" "char16_t" \
		"char32_t" "class" "compl" "concept" "const" "constexpr" \
		"const_cast" "continue" "decltype" "default" "delete" "do" \
		"double" "dynamic_cast" "else" "enum" "explicit" "export" \
		"extern" "false" "float" "for" "friend" "goto" "if" \
		"inline" "int" "long" "mutable" "namespace" "new" \
		"noexcept" "not" "not_eq" "nullptr" "operator" "or" \
		"or_eq" "private" "protected" "public" "register" \
		"reinterpret_cast" "requires" "return" "short" "signed" \
		"sizeof" "static" "static_assert" "static_cast" "struct" \
		"switch" "template" "this" "thread_local" "throw" "true" \
		"try" "typedef" "typeid" "typename" "union" "unsigned" \
		"using" "virtual" "void" "volatile" "wchar_t" "while" \
		"xor" "xor_eq" \
		)
sym_dart=( \
		"abstract" "dynamic" "implements" "show" \
		"as" "else" "import" "static" \
		"assert" "enum" "in" "super" \
		"async" "export" "interface" "switch" \
		"await" "extends" "is" "sync" \
		"break" "external" "library" "this" \
		"case" "factory" "mixin" "throw" \
		"catch" "false" "new" "true" \
		"class" "final" "null" "try" \
		"const" "finally" "on" "typedef" \
		"continue" "for" "operator" "var" \
		"covariant" "Function" "part" "void" \
		"default" "get" "rethrow" "while" \
		"deferred" "hide" "return" "with" \
		"do" "if" "set" "yield" \
)

sym_go=( \
			"break" "default" "func" "interface" "select" \
			"case" "defer" "go" "map" "struct" \
			"chan" "else" "goto" "package" "switch" \
			"const" "fallthrough" "if" "range" "type" \
			"continue" "for" "import" "return" "var" \
)

sym_javascript=( \
		"break" "case" "class" "catch" "const" "continue" "debugger" \
		"default" "delete" "do" "else" "export" "extends" "finally" "for" \
		"function" "if" "import" "in" "instanceof" "let" "new" "return" \
		"super" "switch" "this" "throw" "try" "typeof" "var" "void" \
		"while" "with" "yield" \
		"enum" "await" "implements" "package" "protected" "static" \
		"interface" "private" "public" \
		"abstract" "boolean" "byte" "char" "double" "final" "float" \
		"goto" "int" "long" "native" "short" "synchronized" "transient" \
		"volatile" \
		"null" "true" "false" \
)

sym_java=( \
		"abstract" "assert" "boolean" "break" "byte" "case" "catch" \
		"char" "class" "const" "continue" "default" "do" "double" "else" \
		"enum" "extends" "false" "final" "finally" "float" "for" "goto" \
		"if" "implements" "import" "instanceof" "int" "interface" \
		"long" "native" "new" "null" "package" "private" "protected" \
		"public" "return" "short" "static" "strictfp" "super" "switch" \
		"synchronized" "this" "throw" "throws" "transient" "true" "try" \
		"void" "volatile" "while" \
)

sym_php=( \
		"abstract" "and" "array" "as" \
		"break" \
		"callable" "case" "catch" "class" "clone" "const" "continue" \
		"declare" "default" "die" "do" \
		"echo" "else" "elseif" "empty" "enddeclare" "endfor" "endforeach" \
		"endif" "endswitch" "endwhile" "eval" "exit" "extends" \
		"final" "finally" "for" "foreach" "function" \
		"global" "goto" \
		"if" "implements" "include" "include_once" "instanceof" "insteadof" "interface" "isset" \
		"list" \
		"namespace" "new" \
		"or" \
		"print" "private" "protected" "public" \
		"require" "require_once" "return" \
		"static" "switch" \
		"throw" "trait" "try" \
		"unset" "use" \
		"var" \
		"while" \
		"xor" \
		"yield" \
		"__halt_compiler" "__CLASS__" "__DIR__" "__FILE__" "__FUNCTION__" \
		"__LINE__" "__METHOD__" "__NAMESPACE__" "__TRAIT__" \
)

sym_python2=( \
		"abs" "all" "and" "any" "apply" "as" "assert" \
		"bin" "bool" "break" "buffer" "bytearray" \
		"callable" "chr" "class" "classmethod" "coerce" "compile" "complex" "continue" \
		"def" "del" "delattr" "dict" "dir" "divmod" \
		"elif" "else" "enumerate" "eval" "except" "exec" "execfile" \
		"file" "filter" "finally" "float" "for" "format" "from" "frozenset" \
		"getattr" "global" "globals" \
		"hasattr" "hash" "help" "hex" \
		"id" "if" "import" "in" "input" "int" "intern" "is" "isinstance" "issubclass" "iter" \
		"lambda" "len" "list" "locals" \
		"map" "max" "min" "next" "not" \
		"memoryview" \
		"object" "oct" "open" "or" "ord" \
		"pass" "pow" "print" "property" \
		"raise" "range" "raw_input" "reduce" "reload" "repr" "return" "reversed" "round" \
		"set" "setattr" "slice" "sorted" "staticmethod" "str" "sum" "super" \
		"try" "tuple" "type" \
		"unichr" "unicode" \
		"vars" \
		"while" "with" \
		"xrange" \
		"yield" \
		"zip" \
		"__import__" \
		"True" "False" "None" \
)

sym_python3=( \
		"abs" "all" "and" "any" "apply" "as" "assert" \
		"bin" "bool" "break" "buffer" "bytearray" \
		"callable" "chr" "class" "classmethod" "coerce" "compile" "complex" "continue" \
		"def" "del" "delattr" "dict" "dir" "divmod" \
		"elif" "else" "enumerate" "eval" "execfile" "except" \
		"file" "filter" "finally" "float" "for" "format" "from" "frozenset" \
		"getattr" "global" "globals" \
		"hasattr" "hash" "help" "hex" \
		"id" "if" "import" "in" "input" "int" "intern" "is" "isinstance" "issubclass" "iter" \
		"lambda" "len" "list" "locals" \
		"map" "max" "min" "memoryview" \
		"next" "nonlocal" "not" \
		"object" "oct" "open" "or" "ord" \
		"pass" "pow" "print" "property" \
		"raise" "range" "raw_input" "reduce" "reload" "repr" "return" "reversed" "round" \
		"set" "setattr" "slice" "sorted" "staticmethod" "str" "sum" "super" \
		"try" "tuple" "type" \
		"unichr" "unicode" \
		"vars" \
		"with" "while" \
		"yield" \
		"zip" \
		"__import__" \
		"True" "False" "None" \
)

sym_swift=( \
			"associatedtype" "class" "deinit" "enum" "extension" "func" "import" "init" "inout" "internal" \
			"let" "operator" "private" "protocol" "public" "static" "struct" "subscript" "typealias" "var" \
			"break" "case" "continue" "default" "defer" "do" "else" "fallthrough" "for" "guard" "if" \
			"in" "repeat" "return" "switch" "where" "while" \
			"as" "catch" "dynamicType" "false" "is" "nil" "rethrows" "super" "self" "Self" "throw" "throws" \
			"true" "try" "__COLUMN__" "__FILE__" "__FUNCTION__" "__LINE__" "#column" "#file" "#function" "#line" "_" "#available" "#else" "#elseif" "#endif" "#if" "#selector" \
			"associativity" "convenience" "dynamic" "didSet" "final" "get" "infix" "indirect" "lazy" \
			"left" "mutating" "none" "nonmutating" "optional" "override" "postfix" "precedence" \
			"prefix" "Protocol" "required" "right" "set" "Type" "unowned" "weak" "willSet" \
)

# Combine all lists into all_sym, skipping symbols that are already present.
for i in "${sym_csharp[@]}" "${sym_dart[@]}" "${sym_go[@]}" \
         "${sym_javascript[@]}" "${sym_java[@]}" "${sym_php[@]}" \
         "${sym_python2[@]}" "${sym_python3[@]}" "${sym_swift[@]}"
do
  found="false"
  for j in "${all_sym[@]}"
  do
	if [[ "$i" == "$j" ]]
	then
	  found="true"
	  break
	fi
  done
  if [[ "$found" == "false" ]]
  then
	all_sym+=("$i")
  fi
done

IFS=$'\n' sorted=($(sort <<<"${all_sym[*]}"))
unset IFS

list=""
for i in ${sorted[@]}; do
  list="$list;$i,${i}_"
done
# Strip the leading separator left over from the loop above.
list="${list#;}"
echo "$list"

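# Stop if the directory is missing, so the cleanup below never runs elsewhere.
pushd "../GitHub/grammars-v4-new - Copy" || exit 1
find . -name 'Generated' -type d -prune -exec rm -rf {} +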
trgen
for i in `find . -name '*.g4' | grep Generated | grep -v new.g4`
do
  echo "====="
  echo "$i"
  date
  thedir="$(dirname "${i}")"
  thefile="$(basename "${i}")"
  pushd "$thedir"
  trparse -t antlr4 -f "$thefile" | trrename.exe -r "$list" | trprint > a.a
  if [ -s a.a ]
  then
	mv a.a "$thefile"
  fi
  diff "$thefile" ..
  popd
  date
done
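The file loop at the end of the script relies on word splitting of the `find` output, which breaks on paths that contain spaces. A more defensive variant might look like the sketch below. `rename_one` is a hypothetical stand-in for the trparse/trrename/trprint pipeline, and the `find` filters only approximate the original `grep Generated | grep -v new.g4`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the trparse | trrename | trprint pipeline.
rename_one() {
  printf 'would rename: %s\n' "$1"
}

# Visit every .g4 file under a Generated directory, skipping *new.g4.
# NUL-delimited output keeps paths with spaces intact.
process_grammars() {
  local root="$1" file
  while IFS= read -r -d '' file; do
    rename_one "$file"
  done < <(find "$root" -path '*Generated*' -name '*.g4' ! -name '*new.g4' -print0)
}
```

It would be called as `process_grammars .` from the root of the grammars checkout.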

kaby76 commented Apr 21, 2021

Attached are the diffs between the original and fixed grammars. 188 .g4 files have no conflicts. 119 .g4 files have symbol conflicts.

out.txt
