Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/shfmt: add flags to read/write AST in JSON format to simplify integration with other languages #35

Closed
mvdan opened this issue Oct 12, 2016 · 9 comments · Fixed by #900

Comments

@mvdan
Copy link
Owner

mvdan commented Oct 12, 2016

Quite a few projects out there implement their own shell/bash parsers. Not only the shells themselves, but also some programs that work with shell scripts such as linters.

It's really a little silly for each one of them, especially tooling around shell scripts, to have to implement their own. Sadly, the shells out there don't allow their use as libraries (that I know of).

This is a library, so it does solve that problem. But being written in Go, it's not easily usable from other languages like Python or Shell itself.

Some ideas come to mind:

Both of those could be extended to also include the printer.

Any input will be appreciated, especially from people who would use something like this.

@mvdan
Copy link
Owner Author

mvdan commented Oct 12, 2016

The second option could also be in the form of flags in shfmt, like -dump to go from a shell script to a json AST and -format to do the opposite.

@mvdan
Copy link
Owner Author

mvdan commented Oct 13, 2016

Another point in favour of having this inside shfmt would be that depending on the binary would be a lot simpler, since for distributions it wouldn't mean an extra package and people using shfmt would already be set. And the two binaries would contain the same code anyway.

@mvdan mvdan added this to the 1.1 milestone Dec 17, 2016
@mvdan
Copy link
Owner Author

mvdan commented Dec 17, 2016

I'll go with the latter. The former will always be possible if anyone wants to build this package as a shared library.

JSON as the format makes sense because we're not too worried about efficiency in this case, since executing the program is in itself some overhead already. And it's a format well supported by many languages.

What I need to figure out is how to represent types for interface fields. For example, commands. If I just marshal a program, this type information is lost.

@mvdan mvdan changed the title proposal: make the parser easier to use externally cmd/shfmt: add flags to read/write AST in JSON format to simplify integration with other languages Dec 17, 2016
@mvdan mvdan removed this from the 1.1 milestone Jan 5, 2017
@mvdan mvdan mentioned this issue Mar 16, 2017
@mvdan mvdan closed this as completed in c0bfb7d Jun 20, 2017
@mvdan
Copy link
Owner Author

mvdan commented Jun 20, 2017

Shell code to JSON is now done. Note that this does not include unexported fields (File.lines) nor positions (e.g. Lit.ValuePos), since they both seem largely useless for tools that just want the AST of the code.

I haven't done JSON back to shell code yet. I might in the future. Please try the existing feature out and let me know if it would be useful. In my mind, it would only be useful to write back into the original shell source code.

But then we'd need position information too, so that complicates things a bit. Hence why I haven't included that second half just yet.

@mvdan
Copy link
Owner Author

mvdan commented Jun 20, 2017

Also, obligatory example:

$ shfmt -exp.tojson <<<'echo foo'
{
        "Comments": [],
        "Name": "",
        "Stmts": [
                {
                        "Assigns": [],
                        "Background": false,
                        "Cmd": {
                                "Args": [
                                        {
                                                "Parts": [
                                                        {
                                                                "Type": "Lit",
                                                                "Value": "echo"
                                                        }
                                                ]
                                        },
                                        {
                                                "Parts": [
                                                        {
                                                                "Type": "Lit",
                                                                "Value": "foo"
                                                        }
                                                ]
                                        }
                                ],
                                "Type": "CallExpr"
                        },
                        "Coprocess": false,
                        "Negated": false,
                        "Redirs": []
                }
        ]
}

@cilki
Copy link

cilki commented Jul 12, 2022

I find the -tojson flag useful. In particular I'm interested in manipulating the output JSON and turning it back into a script like so:

echo 'echo 123' | ./shfmt -tojson | jq '<optimizations>' | ./shfmt -fromjson >optimized.sh

Any guidance on what I would need to do to implement -fromjson? I'll send a PR if I can get it working (and this is a desirable feature outside of my probably unusual use-case).

@mvdan
Copy link
Owner Author

mvdan commented Jul 13, 2022

It is indeed desirable, and we've talked about it in the context of js/wasm - since calls from and to Go can only pass basic types like string around, and not the proper syntax tree. For example, see #885.

This issue does say read and write AST in JSON format, so I'm not entirely sure why I closed it after doing just one part of it. I'll take a look.

@mvdan mvdan reopened this Jul 13, 2022
mvdan added a commit that referenced this issue Jul 13, 2022
For the sake of consistency, -tojson is now also available as --to-json.

The following changes are made to --to-json:

1) JSON object keys are no longer sorted alphabetically.
   The new order is: the derived keys (Type, Pos, End),
   and then the Node's struct fields in their original order.
   This helps consistency across nodes and with the Go documentation.

2) File.Name is empty by default, rather than `<standard input>`.
   It did not make sense as a default for emitting JSON,
   as the flag always required the input to be stdin.

3) Empty values are no longer emitted, to help with verbosity.
   This includes `false`, `""`, `null`, `[]`, and zero positions.
   Positions offsets are exempt, as 0 is a valid byte offset.

4) All position fields in Node structs are now emitted.
   Some positions are redundant given the derived Pos and End keys,
   but some others are entirely separate, like IfClause.ThenPos.

As part of point 1 above, JSON encoding no longer uses Go maps.
It now uses reflect.StructOf, which further leans into Go reflection.

Any of these four changes could potentially break users,
as they slightly change the way we encode syntax trees as JSON.
We are working under the assumption that the changes are reasonable,
and that any breakage is unlikely and easy to fix.
If those assumptions turn out to be false once this change is released,
we can always swap the -tojson flag to be exactly the old behavior,
and --to-json can then add the new behavior in a safer way.

We also had other ideas to further improve the JSON format,
such as renaming the Type and Pos/End JSON keys,
but we leave those as v4 TODOs as they will surely break most users.

The new --from-json flag does the reverse; it decodes the typed JSON,
and fills a *syntax.File with all the information.
The derived Type field is used to select syntax.Node types,
and the derived Pos and End fields are ignored entirely.

It's worth noting that neither --to-json nor --from-json are optimized.
The decoding side first decodes into an empty interface, for example,
which leaves plenty of room for improvement.
Once we're happy with the functionality, we can look at faster
implementations and even dropping the need for reflection.

While here, I noticed that the godoc for Pos.IsValid was slightly wrong.

As a proof of concept, the following commands all produce the same result:

	shfmt <input.sh
	shfmt --to-json <input.sh | shfmt --from-json
	shfmt --to-json <input.sh | jq | shfmt --from-json

Fixes #35 again, as we never implemented the "read JSON" side.
@mvdan
Copy link
Owner Author

mvdan commented Jul 13, 2022

I had started writing -fromjson this all the way back in 2019 (jeez, time flies), and after four hours or so, it's finished - see #900.

@cilki
Copy link

cilki commented Jul 13, 2022

@mvdan wow that was amazingly fast. I'll try it out. Thank you!

mvdan added a commit that referenced this issue Jul 16, 2022
For the sake of consistency, -tojson is now also available as --to-json.

The following changes are made to --to-json:

1) JSON object keys are no longer sorted alphabetically.
   The new order is: the derived keys (Type, Pos, End),
   and then the Node's struct fields in their original order.
   This helps consistency across nodes and with the Go documentation.

2) File.Name is empty by default, rather than `<standard input>`.
   It did not make sense as a default for emitting JSON,
   as the flag always required the input to be stdin.

3) Empty values are no longer emitted, to help with verbosity.
   This includes `false`, `""`, `null`, `[]`, and zero positions.
   Positions offsets are exempt, as 0 is a valid byte offset.

4) All position fields in Node structs are now emitted.
   Some positions are redundant given the derived Pos and End keys,
   but some others are entirely separate, like IfClause.ThenPos.

As part of point 1 above, JSON encoding no longer uses Go maps.
It now uses reflect.StructOf, which further leans into Go reflection.

Any of these four changes could potentially break users,
as they slightly change the way we encode syntax trees as JSON.
We are working under the assumption that the changes are reasonable,
and that any breakage is unlikely and easy to fix.
If those assumptions turn out to be false once this change is released,
we can always swap the -tojson flag to be exactly the old behavior,
and --to-json can then add the new behavior in a safer way.

We also had other ideas to further improve the JSON format,
such as renaming the Type and Pos/End JSON keys,
but we leave those as v4 TODOs as they will surely break most users.

The new --from-json flag does the reverse; it decodes the typed JSON,
and fills a *syntax.File with all the information.
The derived Type field is used to select syntax.Node types,
and the derived Pos and End fields are ignored entirely.

It's worth noting that neither --to-json nor --from-json are optimized.
The decoding side first decodes into an empty interface, for example,
which leaves plenty of room for improvement.
Once we're happy with the functionality, we can look at faster
implementations and even dropping the need for reflection.

While here, I noticed that the godoc for Pos.IsValid was slightly wrong.

As a proof of concept, the following commands all produce the same result:

	shfmt <input.sh
	shfmt --to-json <input.sh | shfmt --from-json
	shfmt --to-json <input.sh | jq | shfmt --from-json

Fixes #35 again, as we never implemented the "read JSON" side.
mvdan added a commit that referenced this issue Jul 16, 2022
For the sake of consistency, -tojson is now also available as --to-json.

The following changes are made to --to-json:

1) JSON object keys are no longer sorted alphabetically.
   The new order is: the derived keys (Type, Pos, End),
   and then the Node's struct fields in their original order.
   This helps consistency across nodes and with the Go documentation.

2) File.Name is empty by default, rather than `<standard input>`.
   It did not make sense as a default for emitting JSON,
   as the flag always required the input to be stdin.

3) Empty values are no longer emitted, to help with verbosity.
   This includes `false`, `""`, `null`, `[]`, and zero positions.
   Positions offsets are exempt, as 0 is a valid byte offset.

4) All position fields in Node structs are now emitted.
   Some positions are redundant given the derived Pos and End keys,
   but some others are entirely separate, like IfClause.ThenPos.

As part of point 1 above, JSON encoding no longer uses Go maps.
It now uses reflect.StructOf, which further leans into Go reflection.

Any of these four changes could potentially break users,
as they slightly change the way we encode syntax trees as JSON.
We are working under the assumption that the changes are reasonable,
and that any breakage is unlikely and easy to fix.
If those assumptions turn out to be false once this change is released,
we can always swap the -tojson flag to be exactly the old behavior,
and --to-json can then add the new behavior in a safer way.

We also had other ideas to further improve the JSON format,
such as renaming the Type and Pos/End JSON keys,
but we leave those as v4 TODOs as they will surely break most users.

The new --from-json flag does the reverse; it decodes the typed JSON,
and fills a *syntax.File with all the information.
The derived Type field is used to select syntax.Node types,
and the derived Pos and End fields are ignored entirely.

It's worth noting that neither --to-json nor --from-json are optimized.
The decoding side first decodes into an empty interface, for example,
which leaves plenty of room for improvement.
Once we're happy with the functionality, we can look at faster
implementations and even dropping the need for reflection.

While here, I noticed that the godoc for Pos.IsValid was slightly wrong.

As a proof of concept, the following commands all produce the same result:

	shfmt <input.sh
	shfmt --to-json <input.sh | shfmt --from-json
	shfmt --to-json <input.sh | jq | shfmt --from-json

Fixes #35 again, as we never implemented the "read JSON" side.
mvdan added a commit that referenced this issue Jul 16, 2022
For the sake of consistency, -tojson is now also available as --to-json.

The following changes are made to --to-json:

1) JSON object keys are no longer sorted alphabetically.
   The new order is: the derived keys (Type, Pos, End),
   and then the Node's struct fields in their original order.
   This helps consistency across nodes and with the Go documentation.

2) File.Name is empty by default, rather than `<standard input>`.
   It did not make sense as a default for emitting JSON,
   as the flag always required the input to be stdin.

3) Empty values are no longer emitted, to help with verbosity.
   This includes `false`, `""`, `null`, `[]`, and zero positions.
   Positions offsets are exempt, as 0 is a valid byte offset.

4) All position fields in Node structs are now emitted.
   Some positions are redundant given the derived Pos and End keys,
   but some others are entirely separate, like IfClause.ThenPos.

As part of point 1 above, JSON encoding no longer uses Go maps.
It now uses reflect.StructOf, which further leans into Go reflection.

Any of these four changes could potentially break users,
as they slightly change the way we encode syntax trees as JSON.
We are working under the assumption that the changes are reasonable,
and that any breakage is unlikely and easy to fix.
If those assumptions turn out to be false once this change is released,
we can always swap the -tojson flag to be exactly the old behavior,
and --to-json can then add the new behavior in a safer way.

We also had other ideas to further improve the JSON format,
such as renaming the Type and Pos/End JSON keys,
but we leave those as v4 TODOs as they will surely break most users.

The new --from-json flag does the reverse; it decodes the typed JSON,
and fills a *syntax.File with all the information.
The derived Type field is used to select syntax.Node types,
and the derived Pos and End fields are ignored entirely.

It's worth noting that neither --to-json nor --from-json are optimized.
The decoding side first decodes into an empty interface, for example,
which leaves plenty of room for improvement.
Once we're happy with the functionality, we can look at faster
implementations and even dropping the need for reflection.

While here, I noticed that the godoc for Pos.IsValid was slightly wrong.

As a proof of concept, the following commands all produce the same result:

	shfmt <input.sh
	shfmt --to-json <input.sh | shfmt --from-json
	shfmt --to-json <input.sh | jq | shfmt --from-json

Fixes #35 again, as we never implemented the "read JSON" side.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants