-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[git] Correctly call git ls-tree
so unicode files are not quoted
#3879
Conversation
cb2fa2e
to
119cb5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
- There is a small problem with the string-reading code
- For the test perhaps you can create the file at runtime to avoid having to commit it
src/stdune/io.ml
Outdated
| Some s, eos -> process_buf buf eos (s :: acc) | ||
| None, rest -> (acc, rest) | ||
in | ||
let rec loop ic pos acc = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks buggy. If one of the strings has length >= Bytes.length buf
, this code will keep reading the string until buf
gets filled up (ie pos = buf_size
) and then the exit clause will trigger res = 0
prematurely.
Either we need to fail if we come across a long string or you need to keep some kind of variable-length buffer to store the string that is currently being read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh indeed the code doesn't support a string longer than >= Bytes.length buf
; that's OK for filenames but not in general.
As I mentioned I hacked something quick and dirty, but surely there has to be a reference implementation we could use [maybe in Base
?] instead of rolling our own, right? This kind of stuff can be tricky (specially perf-wise) so I prefer to use a tested implementation.
Otherwise I'd be happy to do a pass on the code a fix the bugs and maybe tweak a few things here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK to be simple here. You can look at the implementation of input_line
for inspiration; there they keep a list of fixed-size buffers instead of resizing a single buffer to fit the string being read.
https://github.com/ocaml/ocaml/blob/10364b67d5d7af71b8626c805ffc0ec49d9b1522/stdlib/stdlib.ml#L439-L465
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do rely on caml_input_scan_line
tho, we don't have anything equivalent so I'm not sure how we can simplify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I did a new, I hope simpler and more correct version, can you have a look while I add the test?
src/dune_engine/vcs.ml
Outdated
@@ -142,5 +151,5 @@ let files = | |||
|
|||
let to_dyn = Dyn.Encoder.list Path.to_dyn | |||
end )) | |||
~git:(f [ "ls-tree"; "-r"; "--name-only"; "HEAD" ]) | |||
~git:(f_zero [ "ls-tree"; "-z"; "-r"; "--name-only"; "HEAD" ]) | |||
~hg:(f [ "files" ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hg
should be given a similar treatment with hg files -0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/stdune/io.ml
Outdated
| Some s, eos -> process_buf buf eos (s :: acc) | ||
| None, rest -> (acc, rest) | ||
in | ||
let rec loop ic pos acc = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK to be simple here. You can look at the implementation of input_line
for inspiration; there they keep a list of fixed-size buffers instead of resizing a single buffer to fit the string being read.
https://github.com/ocaml/ocaml/blob/10364b67d5d7af71b8626c805ffc0ec49d9b1522/stdlib/stdlib.ml#L439-L465
eb83249
to
5d37f2b
Compare
Test added, verified it fails on master. |
5d37f2b
to
fb069f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I left a few comments, but they are very minor; up to you to take them or not. Thanks!
let actual_input = | ||
match rem with | ||
| None -> actual_input | ||
| Some rem -> rem ^ actual_input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to improve the worst-case scenario of repeatedly concatenating a string here, rem
could be a string list
instead (the function scan_inputs_buf
would have to be adapted in turn).
But I understand that this is not relevant for the case at hand of dealing with filenames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's improve this indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After giving it a go IMHO the extra complexity of the code was not worth it; I've added a comment instead. We can improve it later if we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks.
fb069f7
to
fb10080
Compare
…uoted In order to avoid quoting of file names, we need to call `git ls-tree` with the `-z` option. This fixes problems with `dune subst` in the presence of unicode files, in particular fixes ocaml#3219 Signed-off-by: Emilio Jesus Gallego Arias <e+git@x80.org>
Signed-off-by: Emilio Jesus Gallego Arias <e+git@x80.org>
fb10080
to
5bd579b
Compare
Merged, thanks! |
… is propagated `dune subst` is broken on unicode files, see ocaml/dune#3879 and ocaml/dune#3879 This is a frequent problem, so disabling pending on dune 2.8 being released.
… is propagated `dune subst` is broken on unicode files, see ocaml/dune#3879 and ocaml/dune#3879 This is a frequent problem, introduced by coq#13374 , so disabling pending on dune 2.8 being released.
…ne-action-plugin, dune-private-libs and dune-glob (2.8.0) CHANGES: - `dune rules` accepts aliases and other non-path rules (ocaml/dune#4063, @mrmr1993) - Action `(diff reference test_result)` now accept `reference` to be absent and in that case consider that the reference is empty. Then running `dune promote` will create the reference file. (ocaml/dune#3795, @bobot) - Ignore special files (BLK, CHR, FIFO, SOCKET), (ocaml/dune#3570, fixes ocaml/dune#3124, ocaml/dune#3546, @ejgallego) - Experimental: Simplify loading of additional files (data or code) at runtime in programs by introducing specific installation sites. In particular it allow to define plugins to be installed in these sites. (ocaml/dune#3104, ocaml/dune#3794, fixes ocaml/dune#1185, @bobot) - Move all temporary files created by dune to run actions to a single directory and make sure that actions executed by dune also use this directory by setting `TMPDIR` (or `TEMP` on Windows). (ocaml/dune#3691, fixes ocaml/dune#3422, @rgrinberg) - Fix bootstrap script with custom configuration. (ocaml/dune#3757, fixes ocaml/dune#3774, @marsam) - Add the `executable` field to `inline_tests` to customize the compilation flags of the test runner executable (ocaml/dune#3747, fixes ocaml/dune#3679, @lubegasimon) - Add `(enabled_if ...)` to `(copy_files ...)` (ocaml/dune#3756, @nojb) - Make sure Dune cleans up the status line before exiting (ocaml/dune#3767, fixes ocaml/dune#3737, @alan-j-hu) - Add `{gitlab,bitbucket}` as options for defining project sources with `source` stanza `(source (<host> user/repo))` in the `dune-project` file. (ocaml/dune#3813, @rgrinberg) - Fix generation of `META` and `dune-package` files when some targets (byte, native, dynlink) are disabled. Previously, dune would generate all archives for regardless of settings. (ocaml/dune#3829, ocaml/dune#4041, @rgrinberg) - Do not run ocamldep to for single module executables & libraries. The dependency graph for such artifacts is trivial (ocaml/dune#3847, @rgrinberg) - Fix cram tests inside vendored directories not being interpreted correctly. (ocaml/dune#3860, fixes ocaml/dune#3843, @rgrinberg) - Add `package` field to private libraries. This allows such libraries to be installed and to be usable by other public libraries in the same project (ocaml/dune#3655, fixes ocaml/dune#1017, @rgrinberg) - Fix the `%{make}` variable on Windows by only checking for a `gmake` binary on UNIX-like systems as a unrelated `gmake` binary might exist on Windows. (ocaml/dune#3853, @kit-ty-kate) - Fix `$ dune install` modifying the build directory. This made the build directory unusable when `$ sudo dune install` modified permissions. (fix ocaml/dune#3857, @rgrinberg) - Fix handling of aliases given on the command line (using the `@` and `@@` syntax) so as to correctly handle relative paths. (ocaml/dune#3874, fixes ocaml/dune#3850, @nojb) - Allow link time code generation to be used in preprocessing executable. This makes it possible to use the build info module inside the preprocessor. (ocaml/dune#3848, fix ocaml/dune#3848, @rgrinberg) - Correctly call `git ls-tree` so unicode files are not quoted, this fixes problems with `dune subst` in the presence of unicode files. Fixes ocaml/dune#3219 (ocaml/dune#3879, @ejgallego) - `dune subst` now accepts common command-line arguments such as `--debug-backtraces` (ocaml/dune#3878, @ejgallego) - `dune describe` now also includes information about executables in addition to that of libraries. (ocaml/dune#3892, ocaml/dune#3895, @nojb) - instrumentation backends can now receive arguments via `(instrumentation (backend <name> <args>))`. (ocaml/dune#3906, ocaml/dune#3932, @nojb) - Tweak auto-formatting of `dune` files to improve readability. (ocaml/dune#3928, @nojb) - Add a switch argument to opam when context is not default. (ocaml/dune#3951, @tmattio) - Avoid pager when running `$ git diff` (ocaml/dune#3912, @AltGr) - Add `(root_module ..)` field to libraries & executables. This makes it possible to use library dependencies shadowed by local modules (ocaml/dune#3825, @rgrinberg) - Allow `(formatting ...)` field in `(env ...)` stanza to set per-directory formatting specification. (ocaml/dune#3942, @nojb) - [coq] In `coq.theory`, `:standard` for the `flags` field now uses the flags set in `env` profile flags (ocaml/dune#3931 , @ejgallego @rgrinberg) - [coq] Add `-q` flag to `:standard` `coqc` flags , fixes ocaml/dune#3924, (ocaml/dune#3931 , @ejgallego) - Add support for Coq's native compute compilation mode (@ejgallego, ocaml/dune#3210) - Add a `SUFFIX` directive in `.merlin` files for each dialect with no preprocessing, to let merlin know of additional file extensions (ocaml/dune#3977, @vouillon) - Stop promoting `.merlin` files. Write per-stanza Merlin configurations in binary form. Add a new subcommand `dune ocaml-merlin` that Merlin can use to query the configuration files. The `allow_approximate_merlin` option is now useless and deprecated. Dune now conflicts with `merlin < 3.4.0` and `ocaml-lsp-server < 1.3.0` (ocaml/dune#3554, @voodoos) - Configurator: fix a bug introduced in 2.6.0 where the configurator V1 API doesn't work at all when used outside of dune. (ocaml/dune#4046, @aalekseyev) - Fix `libexec` and `libexec-private` variables. In cross-compilation settings, they now point to the file in the host context. (ocaml/dune#4058, fixes ocaml/dune#4057, @TheLortex) - When running `$ dune subst`, use project metadata as a fallback when package metadata is missing. We also generate a warning when `(name ..)` is missing in `dune-project` files to avoid failures in production builds. - Remove support for passing `-nodynlink` for executables. It was bypassed in most cases and not correct in other cases in particular on arm32. (ocaml/dune#4085, fixes ocaml/dune#4069, fixes ocaml/dune#2527, @emillon) - Generate archive rules compatible with 4.12. Dune longer attempt to generate an archive file if it's unnecessary (ocaml/dune#3973, fixes ocaml/dune#3766, @rgrinberg) - Fix generated Merlin configurations when multiple preprocessors are defined for different modules in the same folder. (ocaml/dune#4092, fixes ocaml/dune#2596, ocaml/dune#1212 and ocaml/dune#3409, @voodoos) - Add the option `use_standard_c_and_cxx_flags` to `dune-project` that 1. disables the unconditional use of the `ocamlc_cflags` and `ocamlc_cppflags` from `ocamlc -config` in C compiler calls, these flags will be present in the `:standard` set instead; and 2. enables the detection of the C compiler family and populates the `:standard` set of flags with common default values when building CXX stubs. (ocaml/dune#3875, ocaml/dune#3802, fix ocaml/dune#3718 and ocaml/dune#3528, @voodoos)
In order to avoid quoting we need to call
git ls-tree
with the-z
option.
This fixes problems with
dune subst
in the presence of unicodefiles, in particular fixes #3219
I did a quick-and-dirty implementation of
input_zero_separated
, as I assume we would like to have it elsewhere?Let me know what you think folks and I can finish it.
Also, I didn't add a test case as I'm a bit scared about unicode filenames in the tree indeed.