-
-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/shfmt: change the language default from "bash" to "auto" #622
Comments
Sounds good to me! |
It seems like their algorithm is defined in koalaman/shellcheck@a4b9cec:
I don't think I want to teach shfmt about shellcheck directives. It's an explicit design decision of our parser to not treat any comments as special directives. I think shebangs are the only universal exception here, and already provide what we need. I also slightly disagree with the order. If a I agree that it would be nicer if all tools would agree on a "shell language detection" standard, but given that ShellCheck went for a custom algorithm based on directives, I'm afraid that's already an impossibility. |
Thanks, that makes sense. |
Consider following repo:
In this example autodetection will detect progs as posix(probably?) and funcs as bash. |
The current proposal wouldn't understand posix-y shells like ash or dash, so your prog files would default to bash. We could extend the rules in the future, but for now I want to keep them simple. Similarly, since funcs.sh has no shebang and the In general, if you have a library POSIX shell file that's meant to be sourced but not executed directly, I think your best bet is to add an explicit POSIX shebang at the top of it, like If you don't like having shebangs in library files, then your other option is EditorConfig, like:
|
That's another inconsistency
Nobody's going to do that.
Current rules are pretty simple - parse everything as bash unless EditorConfig says otherwise. This proposal lets group of programmers targeting bash avoid writing EditorConfig, but group targeting posix shell (and other implementations) will have to deal with some inconsistencies. In the end it's a question about whether one group is large enough that it's worth creating more rules (more code, more docs) for solving their problem, while adding some inconsistencies for the other group. |
Like I said, we could teach the tool to understand other posix-y shells in the future.
Nobody is forced to do that, either. The bash fallback is still there if you don't care. And you can always be explicit via a flag or via EditorConfig if you want.
The rules are simple, but IMO the default is too rigid. It's fine for the parser to have extremely simple rules, where you always have to say what language you want if it's not Bash. But
I disagree. If anything, almost nothing changes for them. If they weren't specifying posix, they'll just keep on getting the same bash fallback. If they're specifying posix in some way, they'll get posix. And they always have the option to explicitly state that all of their code is posix. |
The man page describes the logic in detail. In short, the default is no longer just "assume bash", as it will try to detect the language from the filename and shebang, and only fall back to "bash" if those fail. Users who specify a language themselves, either via -ln or editorconfig, should see no change in behavior whatsoever. Those relying on the default should generally see an improvement, as they might no longer have issues with directories mixing languages. One potential regression is cases like: $ cat foo.sh foo=(bar baz) # actually bash That is, bash scripts which use "sh" extensions or shebangs. "sh" stands for POSIX shell, so the new logic will parse them as such, whereas the old logic would parse them as bash. Still, that seems like an easy fix on the part of the user: they can either change the extension or shebang to be clear, or they can explicitly set up the option to "bash". Note that the added "auto" logic applies to all kinds of inputs: explicit files, directories to be walked, and standard input. Fixes #622.
It's taken me a while to come back to this - I had a half-finished local branch from over a year ago, and I hadn't realised it would take me a few extra hours to iron out the last few details with tests. Please give #796 a try and let me know how it works :) Reviews are also very much welcome. |
The man page describes the logic in detail. In short, the default is no longer just "assume bash", as it will try to detect the language from the filename and shebang, and only fall back to "bash" if those fail. Users who specify a language themselves, either via -ln or editorconfig, should see no change in behavior whatsoever. Those relying on the default should generally see an improvement, as they might no longer have issues with directories mixing languages. One potential regression is cases like: $ cat foo.sh foo=(bar baz) # actually bash That is, bash scripts which use "sh" extensions or shebangs. "sh" stands for POSIX shell, so the new logic will parse them as such, whereas the old logic would parse them as bash. Still, that seems like an easy fix on the part of the user: they can either change the extension or shebang to be clear, or they can explicitly set up the option to "bash". Note that the added "auto" logic applies to all kinds of inputs: explicit files, directories to be walked, and standard input. Fixes #622.
That is, auto-detection of what language the input shell should be in. This is because, right now, running something like
shfmt -l -w .
on a large repository will simply use the default of-ln=bash
for all files. One can use-ln=posix
to change that, but then all files are parsed as POSIX.It's not terribly common to mix different shell languages in a single repository, but it happens. For example, imagine a repository which has a few simple POSIX shell scripts for extreme portability, some complex Bash scripts for development, and a few Bats files for unit testing other shell scripts. Right now, this would require at least three
shfmt
calls, and manually hard-coding which files are in which language.We can do better. The files themselves often tell us what language they are in via the filename extension and the shebang.
I propose the following algorithm to auto-detect the language. When a language is selected, the following steps aren't considered.
.bash
,.mksh
, or.bats
, we use that language.#!/bin/sh
or#!/usr/bin/env bash
, we use that language. Here,sh
means POSIX shell.Step 1 does not assume
.sh
means POSIX, because Bash scripts with that extension are too common. If a file is meant to be POSIX shell, give it ash
shebang, which will be matched by step 2.Step 3 allows us to not break backwards compatibility. If we don't know what language the input is in, we'll just keep the current conservative fallback that is Bash.
To my mind, the only downside to this change is that it adds a bit of complexity; we now need to explain what
-ln=auto
means. But I think that should be doable in a few lines in the added man page, just like we did in three list items above.--
This is a reboot of #429, which I actually forgot about. Fun how I came up with almost exactly the same idea again. The reason I think this is worth considering again is two-fold:
-ln=bash
is still too stiff, in my opinion. Support for Bats is a clear example; a project having both Bash and Bats files is entirely normal.[*.bats]\nshell_variant = "bats"
is silly. Plus, since EditorConfig is entirely filename-based, it does not cover Step 2 from this proposal.CC @jansorg @kolyshkin @lassik @kaey since you all contributed to related issues. If we reach consensus on this, I'd implement this for 3.3.0, which should come out in early 2021.
The text was updated successfully, but these errors were encountered: