cmd/shfmt: add flags to read/write AST in JSON format to simplify integration with other languages #35
Comments
The second option could also be in the form of flags in shfmt.
Another point in favour of having this inside shfmt.
I'll go with the latter. The former will always be possible if anyone wants to build this package as a shared library. JSON makes sense as the format because we're not too worried about efficiency in this case: executing the program is already some overhead in itself, and JSON is well supported by many languages.

What I need to figure out is how to represent types for interface fields, such as commands, where a single field can hold any of several concrete node types. If I just marshal a program, that type information is lost.
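Here is a toy sketch of the problem in Go (illustrative types only, not the real syntax package): marshalling a struct whose field is an interface keeps the concrete value's fields but drops its type name, and emitting a derived "Type" key alongside the node is one way to keep it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative node types only; the real ones live in the syntax package.
type Command interface{ command() }

type CallExpr struct{ Args []string }
type IfClause struct{ CondLen int }

func (CallExpr) command() {}
func (IfClause) command() {}

// Stmt has an interface-typed field: a plain Marshal writes the concrete
// value's fields but not which concrete type it was.
type Stmt struct {
	Cmd Command
}

// typedCmd wraps a command with a derived "Type" key so that a decoder can
// tell CallExpr apart from IfClause.
func typedCmd(cmd Command) map[string]any {
	return map[string]any{
		"Type": fmt.Sprintf("%T", cmd), // e.g. "main.CallExpr"
		"Node": cmd,
	}
}

func main() {
	stmt := Stmt{Cmd: CallExpr{Args: []string{"echo", "123"}}}

	plain, _ := json.Marshal(stmt)
	fmt.Println(string(plain)) // {"Cmd":{"Args":["echo","123"]}}; the type name is gone

	tagged, _ := json.Marshal(typedCmd(stmt.Cmd))
	fmt.Println(string(tagged)) // {"Node":{"Args":["echo","123"]},"Type":"main.CallExpr"}
}
```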
Shell code to JSON is now done. Note that this does not include unexported fields.

I haven't done JSON back to shell code yet. I might in the future. Please try the existing feature out and let me know if it would be useful. In my mind, it would only be useful for writing back into the original shell source code, but then we'd need position information too, which complicates things a bit. That's why I haven't included that second half just yet.
Also, an obligatory example:
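A minimal Go sketch, assuming a shfmt build with the -tojson flag is on PATH: pipe a one-line script through the flag and print whatever JSON comes back (the exact JSON shape is whatever the flag emits, not shown here).

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// Pipe a one-line script into shfmt's -tojson flag and let the JSON
	// AST flow straight to our stdout.
	cmd := exec.Command("shfmt", "-tojson")
	cmd.Stdin = strings.NewReader("echo 123\n")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```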
I would find a pipeline like the following useful:

    echo 'echo 123' | ./shfmt -tojson | jq '<optimizations>' | ./shfmt -fromjson >optimized.sh

Any guidance on what I would need to do to implement -fromjson?
It is indeed desirable, and we've talked about it in the context of js/wasm, since calls from and to Go can only pass basic types like strings, so the AST would need to cross that boundary in a serialized form such as JSON.

This issue does say read and write the AST in JSON format, so I'm not entirely sure why I closed it after doing just one part of it. I'll take a look.
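A rough sketch of that js/wasm idea, assuming the current mvdan.cc/sh/v3/syntax import path; the exported name parseShell and the overall shape are hypothetical, not anything the project ships. The point is simply that the script goes in as a string and the result comes back as a string.

```go
//go:build js && wasm

package main

import (
	"bytes"
	"strings"
	"syscall/js"

	"mvdan.cc/sh/v3/syntax"
)

func main() {
	// Expose a function to JavaScript. Only basic values such as strings
	// cross the Go/JS boundary comfortably, so both the input script and
	// the result travel as strings.
	js.Global().Set("parseShell", js.FuncOf(func(this js.Value, args []js.Value) any {
		src := args[0].String()
		f, err := syntax.NewParser().Parse(strings.NewReader(src), "")
		if err != nil {
			return err.Error()
		}
		// The AST could be serialized to JSON here and returned as a string;
		// printing it back as shell keeps this sketch self-contained.
		var buf bytes.Buffer
		if err := syntax.NewPrinter().Print(&buf, f); err != nil {
			return err.Error()
		}
		return buf.String()
	}))
	select {} // keep the Go side alive so JavaScript can keep calling in
}
```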
For the sake of consistency, -tojson is now also available as --to-json. The following changes are made to --to-json:

1) JSON object keys are no longer sorted alphabetically. The new order is: the derived keys (Type, Pos, End), and then the Node's struct fields in their original order. This helps consistency across nodes and with the Go documentation.

2) File.Name is empty by default, rather than `<standard input>`. It did not make sense as a default for emitting JSON, as the flag always required the input to be stdin.

3) Empty values are no longer emitted, to help with verbosity. This includes `false`, `""`, `null`, `[]`, and zero positions. Position offsets are exempt, as 0 is a valid byte offset.

4) All position fields in Node structs are now emitted. Some positions are redundant given the derived Pos and End keys, but some others are entirely separate, like IfClause.ThenPos.

As part of point 1 above, JSON encoding no longer uses Go maps. It now uses reflect.StructOf, which further leans into Go reflection.

Any of these four changes could potentially break users, as they slightly change the way we encode syntax trees as JSON. We are working under the assumption that the changes are reasonable, and that any breakage is unlikely and easy to fix. If those assumptions turn out to be false once this change is released, we can always swap -tojson back to exactly the old behavior, and --to-json can then add the new behavior in a safer way.

We also had other ideas to further improve the JSON format, such as renaming the Type and Pos/End JSON keys, but we leave those as v4 TODOs as they will surely break most users.

The new --from-json flag does the reverse: it decodes the typed JSON and fills a *syntax.File with all the information. The derived Type field is used to select syntax.Node types, and the derived Pos and End fields are ignored entirely.

It's worth noting that neither --to-json nor --from-json is optimized. The decoding side first decodes into an empty interface, for example, which leaves plenty of room for improvement. Once we're happy with the functionality, we can look at faster implementations and even at dropping the need for reflection.

While here, I noticed that the godoc for Pos.IsValid was slightly wrong.

As a proof of concept, the following commands all produce the same result:

    shfmt <input.sh
    shfmt --to-json <input.sh | shfmt --from-json
    shfmt --to-json <input.sh | jq | shfmt --from-json

Fixes #35 again, as we never implemented the "read JSON" side.
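A toy sketch of that decoding approach (illustrative types, not the project's actual implementation): decode the JSON into untyped values first, then use the derived Type key to pick which concrete node to build.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative node types; the real ones are the syntax.Node implementations.
type Node interface{ node() }

type CallExpr struct{ Args []string }
type Comment struct{ Text string }

func (*CallExpr) node() {}
func (*Comment) node()  {}

// decodeNode mirrors the described approach: look at the derived "Type" key
// to pick a concrete node, then fill in that node's fields.
func decodeNode(raw map[string]any) (Node, error) {
	typ, _ := raw["Type"].(string)
	delete(raw, "Type") // a derived key, not a struct field

	var dst Node
	switch typ {
	case "CallExpr":
		dst = &CallExpr{}
	case "Comment":
		dst = &Comment{}
	default:
		return nil, fmt.Errorf("unknown node type %q", typ)
	}
	// Re-marshalling the remaining fields is the lazy way to fill dst; a
	// real decoder would assign the fields via reflection instead.
	b, err := json.Marshal(raw)
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(b, dst); err != nil {
		return nil, err
	}
	return dst, nil
}

func main() {
	input := `{"Type":"CallExpr","Args":["echo","123"]}`

	// First decode into untyped values, as described above.
	var raw map[string]any
	if err := json.Unmarshal([]byte(input), &raw); err != nil {
		panic(err)
	}
	node, err := decodeNode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#v\n", node) // &main.CallExpr{Args:[]string{"echo", "123"}}
}
```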
I had started writing
@mvdan wow, that was amazingly fast. I'll try it out. Thank you!
Quite a few projects out there implement their own shell/bash parsers. Not only the shells themselves, but also some programs that work with shell scripts such as linters.
It's really a little silly for each one of them, especially tooling around shell scripts, to have to implement their own. Sadly, the shells out there don't allow their use as libraries (that I know of).
This is a library, so it does solve that problem. But being written in Go, it's not easily usable from other languages like Python or Shell itself.
Some ideas come to mind:

- building this package as a shared library, so that other languages could use it
- a program like shfmt that parses an input file and outputs the AST in json or some other easy-to-import format

Both of those could be extended to also include the printer.
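For reference, a short sketch of what using the package as a Go library looks like (with the current mvdan.cc/sh/v3/syntax import path); the ideas above are essentially about exposing this parse/print cycle to languages other than Go.

```go
package main

import (
	"fmt"
	"os"
	"strings"

	"mvdan.cc/sh/v3/syntax"
)

func main() {
	src := "if true; then echo 123; fi\n"

	// Parse the script into an AST.
	f, err := syntax.NewParser().Parse(strings.NewReader(src), "example.sh")
	if err != nil {
		panic(err)
	}

	// Walk visits every node (and calls the function with nil on the way
	// back out); here we just report each node's Go type.
	syntax.Walk(f, func(node syntax.Node) bool {
		if node != nil {
			fmt.Printf("%T\n", node)
		}
		return true
	})

	// The printer turns the AST back into shell source.
	if err := syntax.NewPrinter().Print(os.Stdout, f); err != nil {
		panic(err)
	}
}
```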
Any input will be appreciated, especially from people who would use something like this.