Commit

Squashed 'runtime/Rust/' changes from 13d5a35cd..f8beaf8b6
f8beaf8b6 fixed visitor architecture
f8da12f9e update readme and use rustfmt for formatting
d28736137 Fix `enterXXX` listener calls for alternative labels (antlr#13)
f0a2da766 fully finished support for zero-copy, generic token, and generic underlying data.
fdbf64f0f finished generic token support(almost, amend this)
d765c850a preliminary byte parser support, almost fully reformatted with rustfmt
6bb617b51 more flexible tree structure and listener can have any lifetime now, more type safety
1188be780 zero-copy done, input stream changed accordingly.
2e75727b8 zero-copy, input_stream rewritten, docs improved
679319354 wip zero-copy, almost done, most of the tests passing
7833ab8fe wip zerocopy, compiles/passes tests successfully, only parse tree changes remaining
466b370dc wip zerocopy x2, lib compiles successfully
97cb6f8e5 wip zerocopy, compiles successfully
d8078f5fa minor adjustments
6aa622437 added proper build.rs, first change for next version - generic over token type
5bf0b080f fixed sometimes missing hash for prediction context
REVERT: 13d5a35cd fixed sometimes missing hash for prediction context

git-subtree-dir: runtime/Rust
git-subtree-split: f8beaf8b6d54cffa9d262abc54ef8d89511544d3
rrevenantt committed Sep 23, 2020
1 parent 4ddbac3 commit fe6f621
Showing 70 changed files with 8,167 additions and 5,442 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,7 @@
.idea
.vscode
/target
/tests/gen/*.tokens
/tests/gen/*.interp
**/*.rs.bk
Cargo.lock
19 changes: 10 additions & 9 deletions Cargo.toml
@@ -1,25 +1,26 @@
[package]
name = "antlr-rust"
version = "0.1.1"
version = "0.2.0-dev.1"
authors = ["Konstantin Anisimov <rrevenantt@gmail.com>"]
homepage = "https://github.com/rrevenantt/antlr4rust"
repository = "https://github.com/rrevenantt/antlr4rust"
documentation = "https://docs.rs/antlr-rust"
description = "ANTLR4 runtime for Rust"
readme = "README.md"
edition = "2018"
license = "BSD-3-Clause"
keywords = ["ANTLR","ANTLR4","parsing","runtime"]
categories = ["parsing"]

[dependencies]
lazy_static = "1.4.*"
uuid = "0.6.*"
byteorder = "1"
murmur3 = "0.4"
bit-set = "0.5.*"
once_cell = "1.2.*"
backtrace = "0.3"
typed-arena = "2.0.*"
lazy_static = "^1.4.*"
uuid = "=0.6.*"
byteorder = "^1"
murmur3 = "=0.4"
bit-set = "=0.5.*"
once_cell = "^1.2.*"
backtrace = "=0.3"
typed-arena = "^2.0.*"

[lib]

70 changes: 37 additions & 33 deletions README.md
@@ -11,64 +11,59 @@ and [tests/my_tests.rs](tests/my_test.rs) for actual usage examples
### Implementation status

Everything is implemented, "business" logic is quite stable and well tested, but user facing
API is not very robust yet an very likely will have some changes.
API is not very robust yet and very likely will have some changes.

For now development is going on in this repository
but eventually it will be merged to main ANTLR4 repo

Currently requires nightly version of rust.
This very likely will be the case until `specialization`,`try_blocks` and `unsize` features are stabilized.
Currently, requires nightly version of rust.
This likely will be the case until `coerce_unsize` or some kind of coercion trait is stabilized.
There are other unstable features in use but only `CoerceUnsized` is essential.

Remaining things before merge:
- API stabilization
- [ ] Rust api guidelines compliance
- [ ] more tests for API because it is quite different from Java
- make parsing zero copy(i.e. use &str(or Cow) instead String in token and &Token in tree nodes)
- more generic `PredictionContext`
- generic over ownership for string
- generate enum for labeled alternatives without redundant `Error` option
- option to generate fields instead of getters by default
- move useful exports to lib.rs for better documentation

Can be done after merge:
- profiling and performance optimizations
- more profiling and performance optimizations
- Documentation
- [ ] Some things are already documented but still far from perfect, also more links needed.
- Code quality
- [ ] Rustfmt fails to run currently
- [ ] Clippy sanitation
- [ ] Not all warning are fixed
- visitor
- build.rs integration + example
- cfg to not build potentially unnecessary parts
(no Lexer if custom token stream, no ParserATNSimulator if LL(1) grammar)
- run rustfmt on generated parser
###### Long term improvements
- make tree generic over pointer type
- generate enum for labeled alternatives without redundant `Error` option
- option to generate fields instead of getters by default
- make tree generic over pointer type and allow tree nodes to arena.
(requires GAT, otherwise it would be a problem for users that want ownership for parse tree)
- support stable rust
- support no_std(although alloc would still be required)

### Usage

You use the ANTLR4 "tool" to generate a parser, that will use the ANTLR
runtime, located here.

Suppose you're using a UNIX system and have set up an alias for the ANTLR4 tool
as described in [the getting started guide](https://github.com/antlr/antlr4/blob/master/doc/getting-started.md).
To generate your Rust parser, run the following command:
You should use the ANTLR4 "tool" to generate a parser, that will use the ANTLR
runtime, located here. You can run it with the following command:
```bash
antlr4 -Dlanguage=Rust MyGrammar.g4
java -jar <path to ANTLR4 tool> -Dlanguage=Rust MyGrammar.g4
```

For a full list of antlr4 tool options, please visit the
[tool documentation page](https://github.com/antlr/antlr4/blob/master/doc/tool-options.md).

Then add to `Cargo.toml` of the crate from which generated parser is going to be used.
You can also see [build.rs](build.rs) as an example of `build.rs` configuration
to rebuild parser automatically if grammar file was changed

Then add following to `Cargo.toml` of the crate from which generated parser
is going to be used:
```toml
[dependencies]
lazy_static = "1.4"
antlr-rust = "0.1"
antlr-rust = "=0.2"
```
and `#![feature(try_blocks)]` in your project root module.
and `#![feature(try_blocks)]`(also `#![feature(specialization)]` if you are generating visitor) in your project root module.

### Parse Tree structure

@@ -89,21 +84,30 @@ It also is possible to disable generic parse tree creation to keep only selected
`parser.build_parse_trees = false`.

### Differences with Java
Although Rust runtime API is made as close as possible to Java,
Although Rust runtime API has been made as close as possible to Java,
there are quite some differences because Rust is not an OOP language and is much more explicit.

- All rule context variables (rule argument or rule return) should implement `Default + Clone`.
- Supports full zero-copy parsing including byte parsers.
- If you are using labeled alternatives,
struct generated for rule is a enum with variant for each alternative
- Parser needs to have ownership for listeners, but it is possible te get listener back via `ListenerId`
struct generated for rule is an enum with variant for each alternative
- Parser needs to have ownership for listeners, but it is possible to get listener back via `ListenerId`
otherwise `ParseTreeWalker` should be used.
- In embedded actions to access parser you should use `recog` variable instead of `self`.
This is because predicate have to be inserted into two syntactically different places in generated parser

- In embedded actions to access parser you should use `recog` variable instead of `self`/`this`.
This is because predicate have to be inserted into two syntactically different places in generated parser
- String `InputStream` have different index behavior when there are unicode characters.
If you need exactly the same behavior, use `[u32]` based `InputStream`, or implement custom `CharStream`.
- In actions you have to escape `'` in rust lifetimes with `\ ` because ANTLR considers them as strings, e.g. `Struct<\'lifetime>`
- To make custom tokens you should use `@tokenfactory` custom action, instead of usual `TokenLabelType` parser option.
In Rust target TokenFactory is main customisation interface that allows to specify input type of token type.
- All rule context variables (rule argument or rule return) should implement `Default + Clone`.
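
The index difference noted above for string-based `InputStream` can be illustrated with the standard library alone. This is a minimal sketch that assumes the difference comes from UTF-8 byte offsets versus code point positions, which is an assumption about the runtime rather than something the README states. The helper names `byte_offset_of_char` and `to_code_points` are hypothetical and are not part of `antlr-rust`.

```rust
/// Byte offset of the `n`-th character in a UTF-8 string, if it exists.
/// (Hypothetical helper, for illustration only.)
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

/// Converts a string to a vector of code points, so that index `i`
/// always refers to the `i`-th character regardless of its UTF-8 width.
/// (Hypothetical helper, for illustration only.)
fn to_code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}
```

For the input `"a\u{e9}b"`, the third character sits at byte offset 3 in the `&str` but at index 2 in the code point vector, which is the kind of mismatch a `[u32]` based stream avoids.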

### Unsafe
Currently unsafe is used only to cast from trait object back to original type
Currently, unsafe is used only to cast from trait object back to original type
and to update data inside Rc via `get_mut_unchecked`(returned mutable reference is used immediately and not stored anywhere)

### Versioning
In addition to usual Rust semantic versioning,
patch version changes of the crate should not require updating of generator part

## Licence

41 changes: 0 additions & 41 deletions build-inactive.rs

This file was deleted.

62 changes: 62 additions & 0 deletions build.rs
@@ -0,0 +1,62 @@
use std::convert::TryInto;
use std::env;
use std::env::VarError;
use std::error::Error;
use std::fs::{read_dir, DirEntry, File};
use std::io::Write;
use std::path::Path;
use std::process::Command;

fn main() {
    let grammars = vec![
        "CSV",
        "ReferenceToATN",
        "XMLLexer",
        "SimpleLR",
        "Labels",
        "FHIRPath",
    ];
    let additional_args = vec![Some("-visitor"), None, None, None, None];
    let antlr_path = "/home/rrevenantt/dev/antlr4/tool/target/antlr4-4.8-2-SNAPSHOT-complete.jar";

    for (grammar, arg) in grammars.into_iter().zip(additional_args) {
        //ignoring error because we do not need to run anything when deploying to crates.io
        let _ = gen_for_grammar(grammar, antlr_path, arg);
    }

    println!("cargo:rerun-if-changed=build.rs");

    println!("cargo:rerun-if-changed=/home/rrevenantt/dev/antlr4/tool/target/antlr4-4.8-2-SNAPSHOT-complete.jar");
}

fn gen_for_grammar(
    grammar_file_name: &str,
    antlr_path: &str,
    additional_arg: Option<&str>,
) -> Result<(), Box<Error>> {
    // let out_dir = env::var("OUT_DIR").unwrap();
    // let dest_path = Path::new(&out_dir);

    let input = env::current_dir().unwrap().join("grammars");
    let file_name = grammar_file_name.to_owned() + ".g4";

    let c = Command::new("java")
        .current_dir(input)
        .arg("-cp")
        .arg(antlr_path)
        .arg("org.antlr.v4.Tool")
        .arg("-Dlanguage=Rust")
        .arg("-o")
        .arg("../tests/gen")
        .arg(&file_name)
        .args(additional_arg)
        .spawn()
        .expect("antlr tool failed to start")
        .wait_with_output()?;
    // .unwrap()
    // .stdout;
    // eprintln!("xx{}",String::from_utf8(x).unwrap());

    println!("cargo:rerun-if-changed=grammars/{}", file_name);
    Ok(())
}
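
A note on the pairing in `main` above: `Iterator::zip` stops at the shorter input, and the diff lists six grammar names against five argument slots, so as written the last grammar (`FHIRPath`) appears never to reach `gen_for_grammar`. The sketch below shows the effect and one possible alternative that keeps each grammar next to its flag. The function names are hypothetical, and the assumption that `FHIRPath` needs no extra flag is mine, not something the commit states.

```rust
/// Counts how many (grammar, arg) pairs the two separate lists produce
/// when zipped, mirroring the shape of the lists in `build.rs`.
fn zipped_pair_count() -> usize {
    let grammars = vec![
        "CSV",
        "ReferenceToATN",
        "XMLLexer",
        "SimpleLR",
        "Labels",
        "FHIRPath",
    ];
    let additional_args: Vec<Option<&str>> = vec![Some("-visitor"), None, None, None, None];
    // `zip` ends as soon as either iterator is exhausted.
    grammars.into_iter().zip(additional_args).count()
}

/// Hypothetical alternative: one list of (grammar, optional flag) tuples,
/// so the two columns cannot drift apart in length.
fn grammar_jobs() -> Vec<(&'static str, Option<&'static str>)> {
    vec![
        ("CSV", Some("-visitor")),
        ("ReferenceToATN", None),
        ("XMLLexer", None),
        ("SimpleLR", None),
        ("Labels", None),
        // Assumption: no extra flag is needed for this grammar.
        ("FHIRPath", None),
    ]
}
```

With the tuple list, the loop in `main` could iterate over `grammar_jobs()` directly and pass each pair to `gen_for_grammar` unchanged.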
6 changes: 5 additions & 1 deletion grammars/CSV.g4
@@ -1,9 +1,13 @@
grammar CSV;

@tokenfactory{
pub type LocalTokenFactory<'input> = antlr_rust::token_factory::ArenaCommonFactory<'input>;
}

csvFile: hdr row+ ;
hdr : row ;

row : field (',' field)* '\r'? '\n' {println!("test");};
row : field (',' field)* '\r'? '\n';

field
: TEXT
14 changes: 7 additions & 7 deletions grammars/Labels.g4
@@ -1,13 +1,13 @@
grammar Labels;
s : q=e ;
e returns [String v]
: a=e op='*' b=e {$v = "* ".to_owned() + $a.v + " " + $b.v;} # mult
| a=e '+' b=e {$v = "+ ".to_owned() + $a.v + " " + $b.v;} # add
| INT {$v = $INT.text.to_owned();} # anInt
| '(' x=e ')' {$v = $x.v;} # parens
| x=e '++' {$v = " ++".to_owned() + $x.v;} # inc
| x=e '--' {$v = " --".to_owned() + $x.v;} # dec
| ID {$v = $ID.text.to_owned();} # anID
: a=e op='*' b=e {$v = "* ".to_owned() + $a.v + " " + $b.v;} # mult
| a=e '+' b=e {$v = "+ ".to_owned() + $a.v + " " + $b.v;} # add
| INT {$v = $INT.text.to_owned();} # anInt
| '(' x=e ')' {$v = $x.v;} # parens
| x=e '++' {$v = " ++".to_owned() + $x.v;} # inc
| x=e '--' {$v = " --".to_owned() + $x.v;} # dec
| ID {$v = $ID.text.to_owned();} # anID
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
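
The embedded actions in this grammar build prefix-notation strings with ordinary Rust `String` concatenation. As a rough plain-Rust analogue, assuming the generated accessors behind `$a.v` and `$b.v` yield values usable as `&str` (an assumption, since the generated code is not shown in this diff), the `mult` and `inc` alternatives behave like the hypothetical functions below.

```rust
/// Hypothetical stand-in for the `# mult` action: yields "* <a> <b>".
fn mult(a: &str, b: &str) -> String {
    // `String + &str` appends and returns the owned left-hand side.
    "* ".to_owned() + a + " " + b
}

/// Hypothetical stand-in for the `# inc` action: yields " ++<x>".
fn inc(x: &str) -> String {
    " ++".to_owned() + x
}
```

The `.to_owned()` call matters here: the `+` operator is defined for `String + &str`, not for two `&str` values, which is why each action starts from an owned string.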
5 changes: 5 additions & 0 deletions grammars/ReferenceToATN.g4
@@ -1,4 +1,9 @@
grammar ReferenceToATN;

@tokenfactory{
pub type LocalTokenFactory<\'input> = antlr_rust::token_factory::OwningTokenFactory;
}
a : (ID|ATN)* ATN? {println!("{}",$text);};
ID : 'a'..'z'+ ;
ATN : '0'..'9'+;
2 changes: 2 additions & 0 deletions rustfmt.toml
@@ -0,0 +1,2 @@
edition = "2018"
fn_single_line = true