Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[git] Correctly call git ls-tree so unicode files are not quoted #3879

Merged
merged 2 commits into from
Oct 23, 2020

Conversation

ejgallego
Copy link
Collaborator

In order to avoid quoting we need to call git ls-tree with the -z
option.

This fixes problems with dune subst in the presence of unicode
files, in particular fixes #3219

I did a quick-and-dirty implementation of input_zero_separated, as I assume we would like to have it elsewhere?

Let me know what you think folks and I can finish it.

Also, I didn't add a test case as I'm a bit scared about unicode filenames in the tree indeed.

Copy link
Collaborator

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

  • There is a small problem with the string-reading code
  • For the test perhaps you can create the file at runtime to avoid having to commit it

src/stdune/io.ml Outdated
| Some s, eos -> process_buf buf eos (s :: acc)
| None, rest -> (acc, rest)
in
let rec loop ic pos acc =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks buggy. If one of the strings has length >= Bytes.length buf, this code will keep reading the string until buf gets filled up (ie pos = buf_size) and then the exit clause will trigger res = 0 prematurely.

Either we need to fail if we come across a long string or you need to keep some kind of variable-length buffer to store the string that is currently being read.

Copy link
Collaborator Author

@ejgallego ejgallego Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh indeed the code doesn't support a string longer than >= Bytes.length buf; that's OK for filenames but not in general.

As I mentioned I hacked something quick and dirty, but surely there has to be a reference implementation we could use [maybe in Base?] instead of rolling our own, right? This kind of stuff can be tricky (specially perf-wise) so I prefer to use a tested implementation.

Otherwise I'd be happy to do a pass on the code a fix the bugs and maybe tweak a few things here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK to be simple here. You can look at the implementation of input_line for inspiration; there they keep a list of fixed-size buffers instead of resizing a single buffer to fit the string being read.
https://github.com/ocaml/ocaml/blob/10364b67d5d7af71b8626c805ffc0ec49d9b1522/stdlib/stdlib.ml#L439-L465

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do rely on caml_input_scan_line tho, we don't have anything equivalent so I'm not sure how we can simplify.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I did a new, I hope simpler and more correct version, can you have a look while I add the test?

@nojb nojb self-assigned this Oct 21, 2020
@@ -142,5 +151,5 @@ let files =

let to_dyn = Dyn.Encoder.list Path.to_dyn
end ))
~git:(f [ "ls-tree"; "-r"; "--name-only"; "HEAD" ])
~git:(f_zero [ "ls-tree"; "-z"; "-r"; "--name-only"; "HEAD" ])
~hg:(f [ "files" ])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hg should be given a similar treatment with hg files -0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/stdune/io.ml Outdated
| Some s, eos -> process_buf buf eos (s :: acc)
| None, rest -> (acc, rest)
in
let rec loop ic pos acc =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK to be simple here. You can look at the implementation of input_line for inspiration; there they keep a list of fixed-size buffers instead of resizing a single buffer to fit the string being read.
https://github.com/ocaml/ocaml/blob/10364b67d5d7af71b8626c805ffc0ec49d9b1522/stdlib/stdlib.ml#L439-L465

@ejgallego ejgallego force-pushed the git+fix_ls_tree branch 2 times, most recently from eb83249 to 5d37f2b Compare October 23, 2020 10:52
@ejgallego
Copy link
Collaborator Author

Test added, verified it fails on master.

Copy link
Collaborator

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I left a few comments, but they are very minor; up to you to take them or not. Thanks!

src/stdune/io.ml Outdated Show resolved Hide resolved
src/stdune/io.ml Outdated Show resolved Hide resolved
let actual_input =
match rem with
| None -> actual_input
| Some rem -> rem ^ actual_input
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to improve the worst-case scenario of repeatedly concatenating a string here, rem could be a string list instead (the function scan_inputs_buf would have to be adapted in turn).

But I understand that this is not relevant for the case at hand of dealing with filenames.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's improve this indeed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After giving it a go IMHO the extra complexity of the code was not worth it; I've added a comment instead. We can improve it later if we want.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks.

…uoted

In order to avoid quoting of file names, we need to call `git ls-tree`
with the `-z` option.

This fixes problems with `dune subst` in the presence of unicode
files, in particular fixes ocaml#3219

Signed-off-by: Emilio Jesus Gallego Arias <e+git@x80.org>
Signed-off-by: Emilio Jesus Gallego Arias <e+git@x80.org>
@nojb nojb merged commit 2a1ed9e into ocaml:master Oct 23, 2020
@nojb
Copy link
Collaborator

nojb commented Oct 23, 2020

Merged, thanks!

@ejgallego ejgallego deleted the git+fix_ls_tree branch October 23, 2020 15:16
ejgallego added a commit to ejgallego/coq that referenced this pull request Dec 4, 2020
… is propagated

`dune subst` is broken on unicode files, see
ocaml/dune#3879 and
ocaml/dune#3879

This is a frequent problem, so disabling pending on dune 2.8 being
released.
ejgallego added a commit to ejgallego/coq that referenced this pull request Dec 4, 2020
… is propagated

`dune subst` is broken on unicode files, see
ocaml/dune#3879 and
ocaml/dune#3879

This is a frequent problem, introduced by
coq#13374 , so disabling pending on dune
2.8 being released.
rgrinberg added a commit to rgrinberg/opam-repository that referenced this pull request Jan 13, 2021
…ne-action-plugin, dune-private-libs and dune-glob (2.8.0)

CHANGES:

- `dune rules` accepts aliases and other non-path rules (ocaml/dune#4063, @mrmr1993)

- Action `(diff reference test_result)` now accept `reference` to be absent and
  in that case consider that the reference is empty. Then running `dune promote`
  will create the reference file. (ocaml/dune#3795, @bobot)

- Ignore special files (BLK, CHR, FIFO, SOCKET), (ocaml/dune#3570, fixes ocaml/dune#3124, ocaml/dune#3546,
  @ejgallego)

- Experimental: Simplify loading of additional files (data or code) at runtime
  in programs by introducing specific installation sites. In particular it allow
  to define plugins to be installed in these sites. (ocaml/dune#3104, ocaml/dune#3794, fixes ocaml/dune#1185,
  @bobot)

- Move all temporary files created by dune to run actions to a single directory
  and make sure that actions executed by dune also use this directory by setting
  `TMPDIR` (or `TEMP` on Windows). (ocaml/dune#3691, fixes ocaml/dune#3422, @rgrinberg)

- Fix bootstrap script with custom configuration. (ocaml/dune#3757, fixes ocaml/dune#3774, @marsam)

- Add the `executable` field to `inline_tests` to customize the compilation
  flags of the test runner executable (ocaml/dune#3747, fixes ocaml/dune#3679, @lubegasimon)

- Add `(enabled_if ...)` to `(copy_files ...)` (ocaml/dune#3756, @nojb)

- Make sure Dune cleans up the status line before exiting (ocaml/dune#3767,
  fixes ocaml/dune#3737, @alan-j-hu)

- Add `{gitlab,bitbucket}` as options for defining project sources with `source`
  stanza `(source (<host> user/repo))` in the `dune-project` file.  (ocaml/dune#3813,
  @rgrinberg)

- Fix generation of `META` and `dune-package` files when some targets (byte,
  native, dynlink) are disabled. Previously, dune would generate all archives
  for regardless of settings. (ocaml/dune#3829, ocaml/dune#4041, @rgrinberg)

- Do not run ocamldep to for single module executables & libraries. The
  dependency graph for such artifacts is trivial (ocaml/dune#3847, @rgrinberg)

- Fix cram tests inside vendored directories not being interpreted correctly.
  (ocaml/dune#3860, fixes ocaml/dune#3843, @rgrinberg)

- Add `package` field to private libraries. This allows such libraries to be
  installed and to be usable by other public libraries in the same project
  (ocaml/dune#3655, fixes ocaml/dune#1017, @rgrinberg)

- Fix the `%{make}` variable on Windows by only checking for a `gmake` binary
  on UNIX-like systems as a unrelated `gmake` binary might exist on Windows.
  (ocaml/dune#3853, @kit-ty-kate)

- Fix `$ dune install` modifying the build directory. This made the build
  directory unusable when `$ sudo dune install` modified permissions. (fix
  ocaml/dune#3857, @rgrinberg)

- Fix handling of aliases given on the command line (using the `@` and `@@`
  syntax) so as to correctly handle relative paths. (ocaml/dune#3874, fixes ocaml/dune#3850, @nojb)

- Allow link time code generation to be used in preprocessing executable. This
  makes it possible to use the build info module inside the preprocessor.
  (ocaml/dune#3848, fix ocaml/dune#3848, @rgrinberg)

- Correctly call `git ls-tree` so unicode files are not quoted, this fixes
  problems with `dune subst` in the presence of unicode files. Fixes ocaml/dune#3219
  (ocaml/dune#3879, @ejgallego)

- `dune subst` now accepts common command-line arguments such as
  `--debug-backtraces` (ocaml/dune#3878, @ejgallego)

- `dune describe` now also includes information about executables in addition to
  that of libraries. (ocaml/dune#3892, ocaml/dune#3895, @nojb)

- instrumentation backends can now receive arguments via `(instrumentation
  (backend <name> <args>))`. (ocaml/dune#3906, ocaml/dune#3932, @nojb)

- Tweak auto-formatting of `dune` files to improve readability. (ocaml/dune#3928, @nojb)

- Add a switch argument to opam when context is not default. (ocaml/dune#3951, @tmattio)

- Avoid pager when running `$ git diff` (ocaml/dune#3912, @AltGr)

- Add `(root_module ..)` field to libraries & executables. This makes it
  possible to use library dependencies shadowed by local modules (ocaml/dune#3825,
  @rgrinberg)

- Allow `(formatting ...)` field in `(env ...)` stanza to set per-directory
  formatting specification. (ocaml/dune#3942, @nojb)

- [coq] In `coq.theory`, `:standard` for the `flags` field now uses the
  flags set in `env` profile flags (ocaml/dune#3931 , @ejgallego @rgrinberg)

- [coq] Add `-q` flag to `:standard` `coqc` flags , fixes ocaml/dune#3924, (ocaml/dune#3931 , @ejgallego)

- Add support for Coq's native compute compilation mode (@ejgallego, ocaml/dune#3210)

- Add a `SUFFIX` directive in `.merlin` files for each dialect with no
  preprocessing, to let merlin know of additional file extensions (ocaml/dune#3977,
  @vouillon)

- Stop promoting `.merlin` files. Write per-stanza Merlin configurations in
  binary form. Add a new subcommand `dune ocaml-merlin` that Merlin can use to
  query the configuration files. The `allow_approximate_merlin` option is now
  useless and deprecated. Dune now conflicts with `merlin < 3.4.0` and
  `ocaml-lsp-server < 1.3.0` (ocaml/dune#3554, @voodoos)

- Configurator: fix a bug introduced in 2.6.0 where the configurator V1 API
  doesn't work at all when used outside of dune. (ocaml/dune#4046, @aalekseyev)

- Fix `libexec` and `libexec-private` variables. In cross-compilation settings,
  they now point to the file in the host context. (ocaml/dune#4058, fixes ocaml/dune#4057,
  @TheLortex)

- When running `$ dune subst`, use project metadata as a fallback when package
  metadata is missing. We also generate a warning when `(name ..)` is missing in
  `dune-project` files to avoid failures in production builds.

- Remove support for passing `-nodynlink` for executables. It was bypassed in
  most cases and not correct in other cases in particular on arm32.
  (ocaml/dune#4085, fixes ocaml/dune#4069, fixes ocaml/dune#2527, @emillon)

- Generate archive rules compatible with 4.12. Dune longer attempt to generate
  an archive file if it's unnecessary (ocaml/dune#3973, fixes ocaml/dune#3766, @rgrinberg)

- Fix generated Merlin configurations when multiple preprocessors are defined
  for different modules in the same folder. (ocaml/dune#4092, fixes ocaml/dune#2596, ocaml/dune#1212 and
  ocaml/dune#3409, @voodoos)

- Add the option `use_standard_c_and_cxx_flags` to `dune-project` that 1.
  disables the unconditional use of the `ocamlc_cflags` and `ocamlc_cppflags`
  from `ocamlc -config` in C compiler calls, these flags will be present in the
  `:standard` set instead; and 2. enables the detection of the C compiler family
  and populates the `:standard` set of flags with common default values when
  building CXX stubs. (ocaml/dune#3875, ocaml/dune#3802, fix ocaml/dune#3718 and ocaml/dune#3528, @voodoos)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dune segfaults if the codebase contains some UTF-8 chars in (Coq) filenames
2 participants