Skip to content
This repository has been archived by the owner on Sep 25, 2020. It is now read-only.

Codec.string should fail on chars >=256 #79

Open
dominique-unruh opened this issue May 8, 2019 · 4 comments
Open

Codec.string should fail on chars >=256 #79

dominique-unruh opened this issue May 8, 2019 · 4 comments

Comments

@dominique-unruh
Copy link
Contributor

Currently, Codec.string.encode will send any string passed to it to Isabelle.

However, Isabelle can only handle character codes 0-255. Anything else leads to a Chr exception being thrown in Isabelle. (By function Char.chr.)

For example system.invoke(Operation.Hello)("α") will fail.

I think Codec.string.encode should already raise an exception if the input contains chars >=256 since otherwise one gets a hard to track down exception (The exception from Char.chr comes without any information why this exception is raised.)

@larsrh
Copy link
Owner

larsrh commented May 8, 2019

I'm a little surprised. I thought that everything above 128 gets escaped:

@dominique-unruh
Copy link
Contributor Author

It's hard to be sure because I don't know how to see a stack trace for the ML side. But I suspect the following happens:

  • libisabelle produces properly escaped XML, e.g., ÈӒ
  • The ML code decodes the XML. When it encounters È, it calls Char.chr 200 to get the first character. This succeeds. Then it encounters Ӓ, it calls Char.chr 1234. This raises an exception because chars in ML cannot be >256.

As far as I understand, strings in ML cannot have code positions >=256, so there is no valid interpretation of ÈӒ (unless we encode the strings in UTF8 but I think that's not what is done).

Usually, one should not need non-ASCII strings in Isabelle anyway because it uses \<...> sequences for all non-ASCII symbols anyway. So an error seems to be the right response.

(One can convert a certain fragment of Unicode to symbols and back, https://github.com/dominique-unruh/qrhl-tool/blob/1f1f94f2ec7e86d6867f55495026458c5fdf21d8/src/main/scala/qrhl/isabelle/Isabelle.scala#L566 and https://github.com/dominique-unruh/qrhl-tool/blob/1f1f94f2ec7e86d6867f55495026458c5fdf21d8/src/main/scala/qrhl/isabelle/Isabelle.scala#L547 contain code for that.)

@larsrh
Copy link
Owner

larsrh commented May 9, 2019

Okay, I can see that now. Could you PR the change?

@dominique-unruh
Copy link
Contributor Author

To make a PR I would have to set up the whole libisabelle build.
Instead, here is just the code fragment that I would replace Codec.string with:

  private def check256(str:String): String = {
    if (str.exists(_>=256))
      throw new RuntimeException // TODO put here whatever exception is appropriate according to the design of libisabelle
    str
  }
  implicit def string: Codec[String] = text[String](check256, Some.apply, "string").tagged("string")

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants