cmd/shfmt: add flags to read/write AST in JSON format to simplify integration with other languages #35

mvdan · 2016-10-12T19:30:16Z

Quite a few projects out there implement their own shell/bash parsers. Not only the shells themselves, but also some programs that work with shell scripts such as linters.

It's really a little silly for each one of them, especially tooling around shell scripts, to have to implement their own. Sadly, the shells out there don't allow their use as libraries (that I know of).

This is a library, so it does solve that problem. But being written in Go, it's not easily usable from other languages like Python or Shell itself.

Some ideas come to mind:

Building the packages as shared libraries, linkable by C: http://blog.ralch.com/tutorial/golang-sharing-libraries/
Adding an executable alongside shfmt that parses an input file and outputs the AST in json or some other easy-to-import format

Both of those could be extended to also include the printer.

Any input will be appreciated, especially from people who would use something like this.

The text was updated successfully, but these errors were encountered:

mvdan · 2016-10-12T19:32:14Z

The second option could also be in the form of flags in shfmt, like -dump to go from a shell script to a json AST and -format to do the opposite.

mvdan · 2016-10-13T17:16:30Z

Another point in favour of having this inside shfmt would be that depending on the binary would be a lot simpler, since for distributions it wouldn't mean an extra package and people using shfmt would already be set. And the two binaries would contain the same code anyway.

mvdan · 2016-12-17T19:38:36Z

I'll go with the latter. The former will always be possible if anyone wants to build this package as a shared library.

JSON as the format makes sense because we're not too worried about efficiency in this case, since executing the program is in itself some overhead already. And it's a format well supported by many languages.

What I need to figure out is how to represent types for interface fields. For example, commands. If I just marshal a program, this type information is lost.

mvdan · 2017-06-20T17:27:50Z

Shell code to JSON is now done. Note that this does not include unexported fields (File.lines) nor positions (e.g. Lit.ValuePos), since they both seem largely useless for tools that just want the AST of the code.

I haven't done JSON back to shell code yet. I might in the future. Please try the existing feature out and let me know if it would be useful. In my mind, it would only be useful to write back into the original shell source code.

But then we'd need position information too, so that complicates things a bit. Hence why I haven't included that second half just yet.

mvdan · 2017-06-20T17:28:39Z

Also, obligatory example:

$ shfmt -exp.tojson <<<'echo foo'
{
        "Comments": [],
        "Name": "",
        "Stmts": [
                {
                        "Assigns": [],
                        "Background": false,
                        "Cmd": {
                                "Args": [
                                        {
                                                "Parts": [
                                                        {
                                                                "Type": "Lit",
                                                                "Value": "echo"
                                                        }
                                                ]
                                        },
                                        {
                                                "Parts": [
                                                        {
                                                                "Type": "Lit",
                                                                "Value": "foo"
                                                        }
                                                ]
                                        }
                                ],
                                "Type": "CallExpr"
                        },
                        "Coprocess": false,
                        "Negated": false,
                        "Redirs": []
                }
        ]
}

cilki · 2022-07-12T23:16:30Z

I find the -tojson flag useful. In particular I'm interested in manipulating the output JSON and turning it back into a script like so:

echo 'echo 123' | ./shfmt -tojson | jq '<optimizations>' | ./shfmt -fromjson >optimized.sh

Any guidance on what I would need to do to implement -fromjson? I'll send a PR if I can get it working (and this is a desirable feature outside of my probably unusual use-case).

mvdan · 2022-07-13T09:17:52Z

It is indeed desirable, and we've talked about it in the context of js/wasm - since calls from and to Go can only pass basic types like string around, and not the proper syntax tree. For example, see #885.

This issue does say read and write AST in JSON format, so I'm not entirely sure why I closed it after doing just one part of it. I'll take a look.

For the sake of consistency, -tojson is now also available as --to-json. The following changes are made to --to-json: 1) JSON object keys are no longer sorted alphabetically. The new order is: the derived keys (Type, Pos, End), and then the Node's struct fields in their original order. This helps consistency across nodes and with the Go documentation. 2) File.Name is empty by default, rather than `<standard input>`. It did not make sense as a default for emitting JSON, as the flag always required the input to be stdin. 3) Empty values are no longer emitted, to help with verbosity. This includes `false`, `""`, `null`, `[]`, and zero positions. Positions offsets are exempt, as 0 is a valid byte offset. 4) All position fields in Node structs are now emitted. Some positions are redundant given the derived Pos and End keys, but some others are entirely separate, like IfClause.ThenPos. As part of point 1 above, JSON encoding no longer uses Go maps. It now uses reflect.StructOf, which further leans into Go reflection. Any of these four changes could potentially break users, as they slightly change the way we encode syntax trees as JSON. We are working under the assumption that the changes are reasonable, and that any breakage is unlikely and easy to fix. If those assumptions turn out to be false once this change is released, we can always swap the -tojson flag to be exactly the old behavior, and --to-json can then add the new behavior in a safer way. We also had other ideas to further improve the JSON format, such as renaming the Type and Pos/End JSON keys, but we leave those as v4 TODOs as they will surely break most users. The new --from-json flag does the reverse; it decodes the typed JSON, and fills a *syntax.File with all the information. The derived Type field is used to select syntax.Node types, and the derived Pos and End fields are ignored entirely. It's worth noting that neither --to-json nor --from-json are optimized. The decoding side first decodes into an empty interface, for example, which leaves plenty of room for improvement. Once we're happy with the functionality, we can look at faster implementations and even dropping the need for reflection. While here, I noticed that the godoc for Pos.IsValid was slightly wrong. As a proof of concept, the following commands all produce the same result: shfmt <input.sh shfmt --to-json <input.sh | shfmt --from-json shfmt --to-json <input.sh | jq | shfmt --from-json Fixes #35 again, as we never implemented the "read JSON" side.

mvdan · 2022-07-13T16:28:57Z

I had started writing -fromjson this all the way back in 2019 (jeez, time flies), and after four hours or so, it's finished - see #900.

cilki · 2022-07-13T19:25:47Z

@mvdan wow that was amazingly fast. I'll try it out. Thank you!

For the sake of consistency, -tojson is now also available as --to-json. The following changes are made to --to-json: 1) JSON object keys are no longer sorted alphabetically. The new order is: the derived keys (Type, Pos, End), and then the Node's struct fields in their original order. This helps consistency across nodes and with the Go documentation. 2) File.Name is empty by default, rather than `<standard input>`. It did not make sense as a default for emitting JSON, as the flag always required the input to be stdin. 3) Empty values are no longer emitted, to help with verbosity. This includes `false`, `""`, `null`, `[]`, and zero positions. Positions offsets are exempt, as 0 is a valid byte offset. 4) All position fields in Node structs are now emitted. Some positions are redundant given the derived Pos and End keys, but some others are entirely separate, like IfClause.ThenPos. As part of point 1 above, JSON encoding no longer uses Go maps. It now uses reflect.StructOf, which further leans into Go reflection. Any of these four changes could potentially break users, as they slightly change the way we encode syntax trees as JSON. We are working under the assumption that the changes are reasonable, and that any breakage is unlikely and easy to fix. If those assumptions turn out to be false once this change is released, we can always swap the -tojson flag to be exactly the old behavior, and --to-json can then add the new behavior in a safer way. We also had other ideas to further improve the JSON format, such as renaming the Type and Pos/End JSON keys, but we leave those as v4 TODOs as they will surely break most users. The new --from-json flag does the reverse; it decodes the typed JSON, and fills a *syntax.File with all the information. The derived Type field is used to select syntax.Node types, and the derived Pos and End fields are ignored entirely. It's worth noting that neither --to-json nor --from-json are optimized. The decoding side first decodes into an empty interface, for example, which leaves plenty of room for improvement. Once we're happy with the functionality, we can look at faster implementations and even dropping the need for reflection. While here, I noticed that the godoc for Pos.IsValid was slightly wrong. As a proof of concept, the following commands all produce the same result: shfmt <input.sh shfmt --to-json <input.sh | shfmt --from-json shfmt --to-json <input.sh | jq | shfmt --from-json Fixes #35 again, as we never implemented the "read JSON" side.

mvdan added this to the 1.1 milestone Dec 17, 2016

mvdan changed the title ~~proposal: make the parser easier to use externally~~ cmd/shfmt: add flags to read/write AST in JSON format to simplify integration with other languages Dec 17, 2016

mvdan removed this from the 1.1 milestone Jan 5, 2017

mvdan mentioned this issue Mar 16, 2017

Sh stringifier #71

Closed

mvdan closed this as completed in c0bfb7d Jun 20, 2017

mvdan reopened this Jul 13, 2022

mvdan mentioned this issue Jul 13, 2022

cmd/shfmt: implement --from-json and improve -tojson #900

Merged

mvdan closed this as completed in #900 Jul 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/shfmt: add flags to read/write AST in JSON format to simplify integration with other languages #35

cmd/shfmt: add flags to read/write AST in JSON format to simplify integration with other languages #35

mvdan commented Oct 12, 2016

mvdan commented Oct 12, 2016

mvdan commented Oct 13, 2016

mvdan commented Dec 17, 2016

mvdan commented Jun 20, 2017

mvdan commented Jun 20, 2017

cilki commented Jul 12, 2022

mvdan commented Jul 13, 2022

mvdan commented Jul 13, 2022

cilki commented Jul 13, 2022