Ken Domino edited this page Aug 12, 2024 · 29 revisions

Welcome to the grammars-v4

The grammars-v4 repository is a collection of ANTLR4 grammars contributed by authors around the world. Grammars-v4 uses trgen, antlr4test-maven-plugin, a number of scripts in the _scripts directory, and GitHub Actions to ensure that all grammars in the tree build and parse input files properly with ANTLR4.

Each grammar has a directory of examples, which contains input files and the expected output from the parse: parse errors are stored in .errors files, and the parse tree of the input is stored in .tree files. Testing is performed across Ubuntu, macOS, and Windows operating systems; Cpp (C++), CSharp (C#), Dart (Dart2), Go, Java, JavaScript, PHP, and Python3 targets; and Bash and PowerShell environments.

A core value of grammars-v4 is that any grammar downloaded from grammars-v4 compiles properly with ANTLR4 and has been validated against some example inputs.

FAQ

What are the licensing terms for Grammars-v4?

There is no single license for the grammars; each grammar has its own license. Check inside the grammar files for licensing terms.

There is no grammar for the language or file format I need. What do I do?

You are welcome to submit an issue ticket, and contributions to the grammars tree are also welcome.

If you add a grammar, you should add a desc.xml, an examples/ directory to test it, and a readme.md to document the grammar.

What is required to submit a grammar?

  • You need to place the grammar in a directory that is appropriately named.
  • In that directory (aka "the root directory for the grammar"), add the .g4's, a desc.xml, and examples in the directory examples/. Please include a readme.md with notes on the source for the grammar, version information, copyrights, authorship, etc.
  • You can make the grammar combined (one .g4) or split (two .g4's). If combined, the grammar name and the file name must be identical. Do not add "Parser" to the name for a combined grammar. If split, the name of the lexer must end in "Lexer" and the name of the parser must end in "Parser".
  • Actions or semantic predicates are ok if necessary for defining syntax. It is best if you use "target agnostic format".
  • Make sure you have tested the grammar for Java.
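
The naming rules above can be checked mechanically. The following Python sketch is illustrative only and is not part of the repository's tooling: the function name check_naming and the regular expression are assumptions made for this example, and the regex does not handle a grammar declaration that appears inside a comment.

```python
import re
from pathlib import Path

# Matches "grammar X;", "lexer grammar X;" or "parser grammar X;" at the start of a line.
GRAMMAR_DECL = re.compile(r"^\s*(lexer\s+|parser\s+)?grammar\s+(\w+)\s*;", re.MULTILINE)


def check_naming(g4_sources):
    """Check naming rules for a mapping of .g4 file name -> grammar text.

    Returns a list of human-readable problems; an empty list means no
    violation of the rules was detected by this (hypothetical) check.
    """
    problems = []
    for file_name, text in g4_sources.items():
        match = GRAMMAR_DECL.search(text)
        if match is None:
            problems.append(f"{file_name}: no grammar declaration found")
            continue
        kind = (match.group(1) or "combined").strip()
        name = match.group(2)
        # The grammar name and the file name must be identical.
        if name != Path(file_name).stem:
            problems.append(f"{file_name}: grammar is named '{name}'")
        # A combined grammar must not carry the "Parser" suffix.
        if kind == "combined" and name.endswith("Parser"):
            problems.append(f"{file_name}: combined grammar name ends in 'Parser'")
        # Split grammars must end in "Lexer" and "Parser" respectively.
        if kind == "lexer" and not name.endswith("Lexer"):
            problems.append(f"{file_name}: lexer grammar name must end in 'Lexer'")
        if kind == "parser" and not name.endswith("Parser"):
            problems.append(f"{file_name}: parser grammar name must end in 'Parser'")
    return problems
```

To run it on a grammar directory, one could build the mapping with {p.name: p.read_text() for p in Path(".").glob("*.g4")} and print whatever check_naming returns.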

My PR was rejected! Why?!

If your PR breaks the existing tests, it will be rejected. Additionally, we ask that any incremental changes made to grammar files have examples contributed to the /examples directory for that grammar to ensure that future changes to the grammars don't introduce regressions.

Are there examples of how to use the grammars?

Look here

Is there a coding standard for ANTLR4 grammars?

All grammars in the repository are formatted according to common rules. Formatting is checked for each PR; if the check fails, the PR is rejected. The tool used to format an ANTLR4 grammar is antlr-format. You need Node.js installed to run it.

All grammars in the repository contain formatting options that mirror the common rules. These options must not be changed in a PR, unless the maintainers change these rules and reformat the entire repository again.

New grammars usually do not contain these formatting options. You can either copy them from an existing grammar or let the antlr-format tool add them for you. To prepare a new grammar for a PR, consult the readme of the antlr-format terminal tool for how to run it, and read its Configuration section for details of the config file to use. Existing grammars don't need a config file because they carry all options as comments, which look like:

// $antlr-format alignTrailingComments true, columnLimit 150, useTab false ...

How can I use ANTLR4 to parse binary files?

There is an example at /tcpheader/

How can I download grammars from the GitHub page in a Maven build?

Use download-maven-plugin:

<plugin>
	<groupId>com.googlecode.maven-download-plugin</groupId>
	<artifactId>download-maven-plugin</artifactId>
	<version>1.4.0</version>
	<executions>
		<execution>
			<phase>generate-sources</phase>
			<goals>
				<goal>wget</goal>
			</goals>
			<configuration>
				<url>https://raw.githubusercontent.com/antlr/grammars-v4/master/arithmetic/arithmetic.g4</url>
				<outputFileName>arithmetic.g4</outputFileName>
				<outputDirectory>src/main/antlr4/com/khubla/antlr4example/</outputDirectory>
			</configuration>
		</execution>
	</executions>
</plugin>

How do I test the grammars?

Manually

This is the least desirable way to test a grammar because the ANTLR4 website does not give very good instructions on how to write a driver program and then build it. It is best to use trgen to generate a complete, functioning program from templates.

However, if you insist, do the following:

  1. Clone this repo and cd to the directory of the grammar you want to use.
  2. Verify that the <targets> element in desc.xml lists the target you want to use; a listed target is one that works for this grammar.
  3. Copy any files from the directory named after the target to the root directory for the grammar. For example, for the cpp grammar and the CSharp target, copy the files in the CSharp directory to the directory that contains the .g4's.
  4. If a transformGrammars.py file exists, then run python3 transformGrammars.py. This performs modifications to the grammar that are specifically for the target.
  5. Generate the parser and lexer recognizer files manually via antlr4 -Dlanguage=<target> *.g4, e.g., antlr4 -Dlanguage=CSharp CPP14Lexer.g4 CPP14Parser.g4.
  6. Follow the steps in the webpage Runtime Libraries and Code Generation Targets to write a driver program.
  7. Build your program.

It is very likely that you will have problems. You will need to resolve these issues yourself.
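
Steps 2 and 5 above can be sketched in Python. This is a minimal illustration, not a supported script: the function names supported_targets and antlr4_command are invented for this example, and it assumes the <targets> element is a semicolon-separated list directly under the root of desc.xml, as in the desc.xml sample on this page.

```python
import xml.etree.ElementTree as ET


def supported_targets(desc_xml: bytes) -> list[str]:
    """Return the targets listed in the top-level <targets> element of a desc.xml."""
    root = ET.fromstring(desc_xml)
    text = root.findtext("targets", default="")
    return [t.strip() for t in text.split(";") if t.strip()]


def antlr4_command(target: str, grammar_files: list[str], desc_xml: bytes) -> list[str]:
    """Build the `antlr4 -Dlanguage=<target> *.g4` command line from step 5.

    Raises ValueError if desc.xml does not list the target (step 2).
    """
    targets = supported_targets(desc_xml)
    if target not in targets:
        raise ValueError(f"target {target!r} is not listed in desc.xml: {targets}")
    return ["antlr4", f"-Dlanguage={target}", *grammar_files]
```

From the root directory for the grammar, the returned list could be passed to subprocess.run(..., check=True), with desc_xml read via Path("desc.xml").read_bytes(). The sketch does not cover steps 3 and 4 (copying target-specific files and running transformGrammars.py).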

Using Trgen (all targets)

  1. Install dotnet version 8.
  2. Install "antlr4-tools": pip install antlr4-tools. See https://github.com/antlr/antlr4-tools
  3. Install target-specific support, e.g., G++, Dart, Go, etc.
  4. Install the Trash toolkit. See the documentation.
  5. git clone https://github.com/antlr/grammars-v4.git
  6. cd grammars-v4/<grammar-of-your-choice>. E.g., cd grammars-v4/java/java.
  7. trgen. This will create a driver for all implemented targets that work with the grammar. See the desc.xml file for this list.
  8. cd Generated-<target-of-your-choice>. E.g., cd Generated-CSharp.
  9. In a Bash prompt, type make; make test. Or, in a PowerShell prompt, type pwsh build.ps1; pwsh test.ps1. The scripts create temporary files during the build. Use git clean -f to remove these files.
  10. Tests create .errors and .tree files automatically. If you want, you can check these in for testing across targets and OSes.
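
Before checking in generated output, it can help to see which example inputs still lack a .tree file. The Python sketch below is a hypothetical helper, not part of trgen. It assumes that the expected output for an input is named by appending the suffix to the full input file name (for example, input.txt and input.txt.tree); verify that against the grammar's examples/ directory before relying on it.

```python
def inputs_missing_tree(file_names):
    """Return, sorted, the input files that have no matching .tree file.

    `file_names` is an iterable of paths relative to examples/.
    Files ending in .tree or .errors are treated as generated output.
    Assumption: the tree for "x.ext" is stored as "x.ext.tree".
    """
    names = set(file_names)
    inputs = [n for n in names if not n.endswith((".tree", ".errors"))]
    return sorted(n for n in inputs if n + ".tree" not in names)
```

The list of names could be collected with [p.relative_to("examples").as_posix() for p in Path("examples").rglob("*") if p.is_file()], using pathlib.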

Using Maven (Java only)

  1. Clone grammars-v4. git clone https://github.com/antlr/grammars-v4.git
  2. Make sure you have Maven installed. See the documentation.
  3. cd grammars-v4 (the root directory), or cd to a specific grammar, e.g., cd grammars-v4/java/java.
  4. Execute mvn clean test.

How do I add testing to a new grammar?

If you want to add a grammar to the repo, you will need to create two XML files, pom.xml and desc.xml, and place them in the directory containing your .g4's.

pom.xml for Maven testing

This file is for the Antlr Maven Tester. This tester only tests the Java target.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<artifactId>abb</artifactId>
	<packaging>jar</packaging>
	<name>abb grammar</name>
	<parent>
		<groupId>org.antlr.grammars</groupId>
		<artifactId>grammarsv4</artifactId>
		<version>1.0-SNAPSHOT</version>
	</parent>
	<build>
		<plugins>
			<plugin>
				<groupId>org.antlr</groupId>
				<artifactId>antlr4-maven-plugin</artifactId>
				<version>${antlr.version}</version>
				<configuration>
					<sourceDirectory>${basedir}</sourceDirectory>
					<includes>
						<include>abbParser.g4</include>
						<include>abbLexer.g4</include>
					</includes>
					<visitor>true</visitor>
					<listener>true</listener>
				</configuration>
				<executions>
					<execution>
						<goals>
							<goal>antlr4</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
			<plugin>
				<groupId>com.khubla.antlr</groupId>
				<artifactId>antlr4test-maven-plugin</artifactId>
				<version>${antlr4test-maven-plugin.version}</version>
				<configuration>
					<verbose>false</verbose>
					<showTree>false</showTree>
					<entryPoint>module_</entryPoint>
					<grammarName>abb</grammarName>
					<packageName></packageName>
					<exampleFiles>examples/</exampleFiles>
				</configuration>
				<executions>
					<execution>
						<goals>
							<goal>test</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>
</project>

Make sure the <includes> element only lists the top-level .g4's of the grammar. Do not include "import" grammars. The <entryPoint> must be the start rule, which should have EOF as the last symbol in the right-hand side. The <packageName> must be empty because the concept of a "package" does not have meaning across all targets.

desc.xml for trgen testing

trgen is now used to test all grammars across all targets and OSes.

<?xml version="1.0" encoding="UTF-8" ?>
<desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
   <targets>Antlr4ng;CSharp;Cpp;Dart;Go;Java;JavaScript;PHP;Python3;TypeScript</targets>
   <inputs>examples/**/*.sys</inputs>
</desc>

The <targets> element specifies all targets to test. If the grammar is target-specific, then you should try to write ports for as many targets as possible. Alternatively, limit the list of targets to what the grammar can work with. The <inputs> element indicates the path to the input files to test. Globbing and wildcards are optional. You may need to add the <targets> and <inputs> to a specific <test> element if the grammar performance is very poor for certain targets. Other elements you may want to use are:

  • <entry-point> to specify a specific entry point.
  • <grammar-files> to specify top-level .g4 files.
  • <grammar-name> to specify a specific grammar to avoid confusion of which grammar to test if there are multiple top-level grammars.
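
To see how these elements fit together, the following Python sketch reads a desc.xml and collects the settings described above. It is an illustration under assumptions, not trgen's actual parsing logic: the function name read_desc is invented, only elements directly under the root are read (any <test> elements are ignored), and the text of the optional elements is returned unmodified because their exact format is not described here.

```python
import xml.etree.ElementTree as ET

# Optional elements named in the list above.
OPTIONAL_ELEMENTS = ("entry-point", "grammar-files", "grammar-name")


def read_desc(desc_xml: bytes) -> dict:
    """Collect top-level settings from a desc.xml document.

    Targets are split on ';'. Optional elements map to None when absent.
    """
    root = ET.fromstring(desc_xml)
    targets_text = root.findtext("targets", default="")
    settings = {
        "targets": [t.strip() for t in targets_text.split(";") if t.strip()],
        "inputs": root.findtext("inputs"),
    }
    for tag in OPTIONAL_ELEMENTS:
        settings[tag] = root.findtext(tag)
    return settings


# A shortened copy of the desc.xml sample shown above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
   <targets>CSharp;Java</targets>
   <inputs>examples/**/*.sys</inputs>
</desc>"""
```

Calling read_desc(SAMPLE) yields the two targets, the inputs pattern, and None for each optional element, which is a quick way to confirm what a given desc.xml actually declares.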