Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash with Fatal error: exception Zed_string.Invalid #334

Open
jmid opened this issue Aug 6, 2020 · 7 comments
Open

crash with Fatal error: exception Zed_string.Invalid #334

jmid opened this issue Aug 6, 2020 · 7 comments

Comments

@jmid
Copy link

jmid commented Aug 6, 2020

I use utop in combination with QCheck - for randomized property-based testing.

However, sometimes when I inspect or illustrate QCheck's ability to produce random strings I encounter a crash.

To reproduce:

$ utop
───────────────────────────────┬─────────────────────────────────────────────────────────────┬───────────────────────────────
                               │ Welcome to utop version 2.4.3 (using OCaml version 4.09.0)! │                               
                               └─────────────────────────────────────────────────────────────┘                               
Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads


Type #utop_help for help about using utop.

─( 13:18:23 )─< command 0 >───────────────────────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # #require "qcheck";;
─( 13:18:23 )─< command 1 >───────────────────────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # open QCheck;;
─( 13:18:30 )─< command 2 >───────────────────────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # Random.init 12; Gen.generate1 ~rand:(Random.get_state ()) Gen.small_string;;
Fatal error: exception Zed_string.Invalid("at position 157: individual combining marks encountered", "- : string =\n\"\195\134x`Ihq\196\178$&\\031\\015\\026k*\\031\194\185\\t\194\179x~jm\195\136i\195\185\195\175\195\166\195\137/\195\156\\025(\194\168\\\\\194\146\195\160^\195\181\194\131[\195\155\195\128\195\161\194\186(5\\021\194\175\202\167\195\180\195\148\195\158\195\1551Y_/E\194\164\194\163\194\191`\195\160h\194\166\195\174\195\153#\194\144\195\188\195\164rG\\026\194\160\194\173\195\152(\194\176\194\158\214\149\\029\194\168G\194\150\\030\194\141\"\n")

Note how I explicitly set the random seed to 12 to attempt something reproducable.
On my machine (MacBook Pro, 64-bit, 3,1 GHz Intel Core i5, w/MacPorts), this consistently crashes utop.
Thus if this particular seed does not crash your machine, try another.
I just started from 1 myself, before I hit the number 12.

I've then managed to reduce the crashing string into the following minimal one:

utop # "\158\214\149";;
Fatal error: exception Zed_string.Invalid("at position 16: individual combining marks encountered", "- : string = \"\194\158\214\149\"\n")

Since I'm not familiar with utop's internals, I cannot tell if this may be a bug in Zed.

@jmid
Copy link
Author

jmid commented Aug 6, 2020

To confirm that this remains an issue, I can reproduce it on utop v.2.6.0:

$ utop
───────────────────────────────┬─────────────────────────────────────────────────────────────┬───────────────────────────────
                               │ Welcome to utop version 2.6.0 (using OCaml version 4.09.0)! │                               
                               └─────────────────────────────────────────────────────────────┘                               
Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads


Type #utop_help for help about using utop.

─( 14:23:08 )─< command 0 >───────────────────────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # "\158\214\149";;
Fatal error: exception Zed_string.Invalid("at position 16: individual combining marks encountered", "- : string = \"\194\158\214\149\"\n")

@jmid
Copy link
Author

jmid commented Aug 7, 2020

So I tried with a backtrace:

$ env OCAMLRUNPARAM=b utop
───────────────────────────────┬─────────────────────────────────────────────────────────────┬───────────────────────────────
                               │ Welcome to utop version 2.6.0 (using OCaml version 4.09.0)! │                               
                               └─────────────────────────────────────────────────────────────┘                               
(* ... *)
utop # "\158\214\149";;
Fatal error: exception Zed_string.Invalid("at position 16: individual combining marks encountered", "- : string = \"\194\158\214\149\"\n")
Raised at file "src/zed_string.ml", line 23, characters 29-88
Called from file "src/zed_string.ml", line 211, characters 14-30
Called from file "src/zed_string.ml", line 295, characters 23-43
Called from file "src/lTerm_text_impl.ml", line 24, characters 65-96
Called from file "src/lib/uTop_main.ml", line 301, characters 17-42
Called from file "src/lib/uTop_main.ml", line 810, characters 30-61
Re-raised at file "parsing/location.ml", line 898, characters 22-25
Called from file "src/lib/uTop.ml", line 114, characters 2-11
Called from file "src/lib/uTop_main.ml", line 816, characters 21-61
Called from file "src/lib/uTop_main.ml", line 1555, characters 8-17
Called from file "src/lib/uTop_main.ml", line 1570, characters 4-25

It appears this conversion https://github.com/ocaml-community/utop/blob/master/src/lib/uTop_main.ml#L301
helps assemble escape codes into "prettier utf8 glyphs" (if I understand correctly). The lTerm call
https://github.com/ocaml-community/lambda-term/blob/master/src/lTerm_text_impl.ml#L30
itself calls Zed_string.unsafe_of_utf8.

Since a conversion failure is possible (we are calling an unsafe function), I guess a reasonable fix would be to keep the "unstyled string" in this case. I don't know if a "partial styling" is possible.

@jmid
Copy link
Author

jmid commented Aug 25, 2020

I understand that utop is maintained by the community.
However I would appreciate if someone more familiar with the internals could briefly (a) confirm that I'm on the right track (and ideally: the bug), or (b) suggest an alternative approach for a fix.

@emillon
Copy link
Collaborator

emillon commented Aug 25, 2020

Hi,

Yes, I agree that it's a bug and that printing a string should never result in an exception.

The core issue seems to be that ocaml strings are just a sequence of bytes, and many operations that utop does only work strings that are valid utf8 encodings. That's why there are of_string and of_utf8 functions.

When rendering an output phrase, we only have a string, and we don't know if it's valid utf8. So it seems that there's a sort of preprocessing phase:

utop/src/lib/uTop_main.ml

Lines 300 to 301 in 7bc5117

let string = fix_string string in
let styled = LTerm_text.of_utf8 string in

My understanding is that fix_string is expected to return a valid utf8 string, but in some cases (found by your fuzzer), it does not work. So, it looks like patching fix_string could be a solution too.

@emillon
Copy link
Collaborator

emillon commented Aug 25, 2020

I just noticed that in the case of long strings, it just does not attempt to do any styling:

utop/src/lib/uTop_main.ml

Lines 297 to 298 in 7bc5117

if String.length string >= 100 * 1024 then
LTerm.fprint term string

This can be a reasonable fallback in case of an exception if there's no easy way to alter fix_string's behaviour.

@jmid
Copy link
Author

jmid commented Oct 7, 2020

OK, I've now taken a look at this.
I've managed to print the (invalid uft-8) string which ends up as question marks in the terminal.
However I have not been able to do so while maintaining utop's coloring of the output ("orange for strings").
This is not exactly optimal, as printing"\158\214\149" with any of the three characters removed brings back color...

My best attempt is to wrap the relevant part of render_out_phrase in an exception handler as follows:

let render_out_phrase term string =
  if String.length string >= 100 * 1024 then
    LTerm.fprint term string
  else
    try
      begin
        let string = fix_string string in
        let styled = LTerm_text.of_utf8 string in
        let stylise loc token_style =
          for i = loc.idx1 to loc.idx2 - 1 do
            let ch, style = styled.(i) in
            styled.(i) <- (ch, LTerm_style.merge token_style style)
          done
        in
        UTop_styles.stylise stylise (UTop_lexer.lex_string string);
        LTerm.fprints term styled
      end
    with (Zed_string.Invalid _) ->
      (Printf.printf "got you!\n%!";
       LTerm.fprint term (fix_string string))

I think I'm bitten by this design decision of LambdaTerm:

"All these functions accept only valid UTF-8 strings (or unicode styled text)"
https://ocaml-community.github.io/lambda-term/3.1.0/lambda-term/LTerm/index.html#printing

This means a LTerm.fprints-printer accepting a LTerm_text.t will end up having to convert the string first - which rethrows the exception 🤔

Further input is welcome!

@Armael
Copy link

Armael commented Sep 7, 2022

I also just hit this issue, on a longer input (~350 bytes) but with the same error:

exception Zed_string.Invalid("at position 331: individual combining marks encountered", "val contents : string =\n  \"y\.....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants