Add name field to the lexical specification
nihei9 committed Sep 18, 2021
1 parent fe865a8 commit 7be1d27
Showing 8 changed files with 97 additions and 23 deletions.
26 changes: 15 additions & 11 deletions README.md
@@ -26,6 +26,7 @@ First, define your lexical specification in JSON format. As an example, let's wr

```json
{
"name": "statement",
"entries": [
{
"kind": "whitespace",
@@ -43,14 +44,14 @@ First, define your lexical specification in JSON format. As an example, let's wr
}
```

Save the above specification to a file in UTF-8. In this explanation, the file name is lexspec.json.
Save the above specification to a file in UTF-8. In this explanation, the file name is `statement.json`.

### 2. Compile the lexical specification

Next, generate a DFA from the lexical specification using `maleeni compile` command.

```sh
$ maleeni compile -l lexspec.json -o clexspec.json
$ maleeni compile -l statement.json -o statementc.json
```

### 3. Debug (Optional)
@@ -60,7 +61,7 @@ If you want to make sure that the lexical specification behaves as expected, you
⚠️ `maleeni lex` and the driver can handle only UTF-8.

```sh
$ echo -n 'The truth is out there.' | maleeni lex clexspec.json | jq -r '[.kind_name, .lexeme, .eof] | @csv'
$ echo -n 'The truth is out there.' | maleeni lex statementc.json | jq -r '[.kind_name, .lexeme, .eof] | @csv'
"word","The",false
"whitespace"," ",false
"word","truth",false
@@ -94,10 +95,10 @@ The JSON format of tokens that `maleeni lex` command prints is as follows:
Using the `maleeni-go` command, you can generate the source code of a lexer that recognizes your lexical specification.

```sh
$ maleeni-go clexspec.json > lexer.go
$ maleeni-go statementc.json
```

The above command generates the lexer and saves it to `lexer.go` file. To use the lexer, you need to call `NewLexer` function defined in `lexer.go`. The following code is a simple example. In this example, the lexer reads a source code from stdin and writes the result, tokens, to stdout.
The above command generates the lexer and saves it to the `statement_lexer.go` file. By default, the file name is `{spec name}_lexer.go`. To use the lexer, call the `NewLexer` function defined in `statement_lexer.go`. The following code is a simple example: the lexer reads source code from stdin and writes the resulting tokens to stdout.

```go
package main
@@ -136,14 +137,14 @@ Please save the above source code to `main.go` and create a directory structure

```
/project_root
├── lexer.go ... Lexer generated from the compiled lexical specification (the result of `maleeni-go`).
└── main.go .... Caller of the lexer.
├── statement_lexer.go ... Lexer generated from the compiled lexical specification (the result of `maleeni-go`).
└── main.go .............. Caller of the lexer.
```

Now, you can perform the lexical analysis.

```sh
$ echo -n 'I want to believe.' | go run main.go lexer.go
$ echo -n 'I want to believe.' | go run main.go statement_lexer.go
valid: word: 'I'
valid: whitespace: ' '
valid: word: 'want'
@@ -164,8 +165,9 @@ The lexical specification format to be passed to `maleeni compile` command is as

top level object:

| Field | Type | Nullable | Description |
|---------|------------------------|----------|-----------------------------------------------------------------------------------------------------------------------|
| Field | Type | Nullable | Description |
|---------|------------------------|----------|---------------------------------------------------------------------------------------------------------------------------|
| name | string | false | A specification name. |
| entries | array of entry objects | false | An array of entries sorted by priority. The first element has the highest priority, and the last has the lowest priority. |
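
As a rough illustration of the top-level object described in the table, the following Go sketch decodes a specification and checks the two non-nullable fields. The `lexSpec` type and `parseLexSpec` function are hypothetical names chosen for this example, not maleeni's own API, and rejecting an empty `name` is an assumption about what "Nullable: false" implies.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// lexSpec is a hypothetical stand-in for the top-level object.
// Entries are kept as raw JSON so that no entry fields are assumed.
type lexSpec struct {
	Name    string            `json:"name"`
	Entries []json.RawMessage `json:"entries"`
}

// parseLexSpec decodes src and rejects a specification that lacks
// a name or an entries array (assumed validation, see the lead-in).
func parseLexSpec(src []byte) (*lexSpec, error) {
	var s lexSpec
	if err := json.Unmarshal(src, &s); err != nil {
		return nil, fmt.Errorf("cannot parse the specification: %w", err)
	}
	if s.Name == "" {
		return nil, errors.New("name must not be empty")
	}
	if s.Entries == nil {
		return nil, errors.New("entries must not be null")
	}
	return &s, nil
}

func main() {
	// Other entry fields are omitted here for brevity.
	src := []byte(`{"name": "statement", "entries": [{"kind": "whitespace"}, {"kind": "word"}]}`)
	s, err := parseLexSpec(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("name=%s entries=%d\n", s.Name, len(s.Entries))
}
```

The order of `Entries` is preserved by `encoding/json`, which matters because the first element has the highest priority.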

entry object:
@@ -292,6 +294,7 @@ For instance, you can define [an identifier of golang](https://golang.org/ref/sp

```json
{
"name": "id",
"entries": [
{
"fragment": true,
@@ -326,6 +329,7 @@ For instance, you can define a subset of [the string literal of golang](https://

```json
{
"name": "string",
"entries": [
{
"kind": "string_open",
Expand Down Expand Up @@ -369,7 +373,7 @@ For instance, you can define a subset of [the string literal of golang](https://
In the above specification, when the `"` mark appears in default mode (it's the initial mode), the driver transitions to the `string` mode and interprets character sequences (`char_seq`) and escape sequences (`escaped_char`). When the `"` mark appears the next time, the driver returns to the `default` mode.

```sh
$ echo -n '"foo\nbar"foo' | maleeni lex go-string-cspec.json | jq -r '[.mode_name, .kind_name, .lexeme, .eof] | @csv'
$ echo -n '"foo\nbar"foo' | maleeni lex stringc.json | jq -r '[.mode_name, .kind_name, .lexeme, .eof] | @csv'
"default","string_open","""",false
"string","char_seq","foo",false
"string","escaped_char","\n",false
29 changes: 27 additions & 2 deletions cmd/maleeni-go/generate.go
@@ -24,13 +24,14 @@ func Execute() error {

var generateFlags = struct {
pkgName *string
output *string
}{}

var generateCmd = &cobra.Command{
Use: "maleeni-go",
Short: "Generate a lexer for Go",
Long: `maleeni-go generates a lexer for Go. The lexer recognizes the lexical specification specified as the argument.`,
Example: ` maleeni-go clexspec.json > lexer.go`,
Example: ` maleeni-go clexspec.json`,
Args: cobra.ExactArgs(1),
RunE: runGenerate,
SilenceErrors: true,
@@ -39,6 +40,7 @@ var generateCmd = &cobra.Command{

func init() {
generateFlags.pkgName = generateCmd.Flags().StringP("package", "p", "main", "package name")
generateFlags.output = generateCmd.Flags().StringP("output", "o", "", "output file path")
}

func runGenerate(cmd *cobra.Command, args []string) (retErr error) {
@@ -47,7 +49,30 @@ func runGenerate(cmd *cobra.Command, args []string) (retErr error) {
return fmt.Errorf("Cannot read a compiled lexical specification: %w", err)
}

return driver.GenLexer(clspec, *generateFlags.pkgName)
b, err := driver.GenLexer(clspec, *generateFlags.pkgName)
if err != nil {
return fmt.Errorf("Failed to generate a lexer: %w", err)
}

var filePath string
if *generateFlags.output != "" {
filePath = *generateFlags.output
} else {
filePath = fmt.Sprintf("%v_lexer.go", clspec.Name)
}

f, err := os.OpenFile(filePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
if err != nil {
return fmt.Errorf("Failed to create an output file: %w", err)
}
defer f.Close()

_, err = f.Write(b)
if err != nil {
return fmt.Errorf("Failed to write lexer source code: %w", err)
}

return nil
}
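
The output path selection in `runGenerate` can be isolated into a small function for illustration. The sketch below mirrors the logic shown in the diff; the `outputPath` helper name is hypothetical and does not exist in the commit.

```go
package main

import "fmt"

// outputPath returns the explicit path given via the output flag when it is
// non-empty; otherwise it derives the default "<spec name>_lexer.go".
func outputPath(flagValue, specName string) string {
	if flagValue != "" {
		return flagValue
	}
	return fmt.Sprintf("%v_lexer.go", specName)
}

func main() {
	fmt.Println(outputPath("", "statement"))         // prints "statement_lexer.go"
	fmt.Println(outputPath("lexer.go", "statement")) // prints "lexer.go"
}
```

Keeping the selection in a pure function like this would make the default naming rule easy to unit-test separately from the file I/O.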

func readCompiledLexSpec(path string) (*spec.CompiledLexSpec, error) {
1 change: 1 addition & 0 deletions compiler/compiler.go
@@ -106,6 +106,7 @@ func Compile(lexspec *spec.LexSpec, opts ...CompilerOption) (*spec.CompiledLexSp
}

return &spec.CompiledLexSpec{
Name: lexspec.Name,
InitialModeID: spec.LexModeIDDefault,
ModeNames: modeNames,
KindNames: kindNames,
3 changes: 3 additions & 0 deletions compiler/compiler_test.go
@@ -18,6 +18,7 @@ func TestCompile(t *testing.T) {
Caption: "allow duplicates names between fragments and non-fragments",
Spec: `
{
"name": "test",
"entries": [
{
"kind": "a2z",
@@ -36,6 +37,7 @@ func TestCompile(t *testing.T) {
Caption: "don't allow duplicates names in non-fragments",
Spec: `
{
"name": "test",
"entries": [
{
"kind": "a2z",
@@ -54,6 +56,7 @@ func TestCompile(t *testing.T) {
Caption: "don't allow duplicates names in fragments",
Spec: `
{
"name": "test",
"entries": [
{
"kind": "a2z",