builtins.toXML can return strings with invalid UTF-8 encoding #12061

NaN-git opened this issue Dec 15, 2024 · 2 comments
Labels
bug
idea approved: The given proposal has been discussed and approved by the Nix team. An implementation is welcome.
language: The Nix expression language; parser, interpreter, primops, evaluation, etc.

Comments

NaN-git commented Dec 15, 2024

Describe the bug

Applying builtins.toXML to a string with invalid UTF-8 encoding returns

"<?xml version='1.0' encoding='utf-8'?>\n<expr>\n  <string value=\"[...]\" />\n</expr>\n"

where [...] is the input string with invalid UTF-8 encoding.
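To illustrate why this matters: the XML specification treats byte sequences that are illegal in the declared encoding as a fatal error, so consumers may refuse such output entirely. The sketch below is not part of Nix. It uses Python's standard-library `xml.etree.ElementTree`, and the byte `0xFF` is an assumed stand-in for the invalid input (it can never occur in UTF-8). The helper names `make_doc` and `parses` are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_doc(value: bytes) -> bytes:
    """Build a document with the same shape as the toXML output in the report."""
    return (
        b"<?xml version='1.0' encoding='utf-8'?>\n<expr>\n"
        b'  <string value="' + value + b'" />\n</expr>\n'
    )


def parses(doc: bytes) -> bool:
    """Return True if the expat-based parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses(make_doc(b"ok")))    # valid UTF-8 attribute value
print(parses(make_doc(b"\xff")))  # invalid UTF-8 byte inside the attribute
```

The first call should be accepted and the second rejected, which is the failure a downstream XML consumer would hit.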

Steps To Reproduce

  1. Download the test file:
     wget -O - https://github.com/flenniken/utf8tests/raw/refs/heads/main/utf8tests.bin | head -n 208 > utf8tests.bin
  2. Evaluate the following Nix expression, e.g. in nix repl:
     builtins.toXML (builtins.readFile ./utf8tests.bin)
  3. The output contains the invalid UTF-8 input string.
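As an independent check that a file really contains invalid UTF-8, and where it first goes wrong, a strict decode can be used. This is a minimal sketch in Python rather than Nix; the function name `first_invalid_utf8` is hypothetical, and the byte strings are small illustrative inputs, not the contents of the test file.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any invalid sequence
        return None
    except UnicodeDecodeError as err:
        return err.start


print(first_invalid_utf8("héllo".encode("utf-8")))  # valid input
print(first_invalid_utf8(b"abc\xffdef"))            # 0xFF at offset 3
```

For the file from step 1, the same function can be applied to `open("utf8tests.bin", "rb").read()`.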

Expected behavior

Either builtins.readFile or builtins.toXML should fail and a proper error message should be displayed.

Additional context

Related to issue #12060.


@NaN-git added the bug label Dec 15, 2024
@roberth added the language label Dec 18, 2024
@roberth added this to Nix team Dec 18, 2024
@github-project-automation bot moved this to To triage in Nix team Dec 18, 2024
@edolstra added the idea approved label Dec 18, 2024
@edolstra (Member) commented:

Team discussion:

  • Agreed that this should be either an error or a warning (if we're afraid of breaking stuff). Not a high priority though since it's a bit garbage-in-garbage-out (i.e. currently it's the user's responsibility to ensure that the input strings are all valid UTF-8).
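The two options discussed (hard error versus warning) can be sketched as a single validation step with a switch. This is only an illustration in Python under stated assumptions, not the Nix implementation: the function name `check_utf8`, the `strict` flag, and the message wording are all hypothetical.

```python
import warnings


def check_utf8(value: bytes, strict: bool = True) -> bool:
    """Validate `value` as UTF-8.

    strict=True  -> raise ValueError (the "error" option)
    strict=False -> emit a warning and return False (the "warning" option)
    """
    try:
        value.decode("utf-8")
        return True
    except UnicodeDecodeError as err:
        msg = f"string is not valid UTF-8 (first invalid byte at offset {err.start})"
        if strict:
            raise ValueError(msg) from err
        warnings.warn(msg)
        return False
```

The warning mode keeps existing expressions evaluating while still surfacing the problem, which matches the concern about breaking existing users.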

@edolstra removed this from Nix team Dec 18, 2024
@nixos-discourse commented:

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-12-18-nix-team-meeting-minutes-204/57602/1
