Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix non-ascii string/binary literals; Add multi-line binary literals #22

Merged
merged 19 commits into from
Jul 7, 2021
Merged

Fix non-ascii string/binary literals; Add multi-line binary literals #22

merged 19 commits into from
Jul 7, 2021

Conversation

michallepicki
Copy link
Contributor

@michallepicki michallepicki commented Jun 9, 2021

Also: require (and ignore) one newline at the start of a string block edit: This change is to have first line aligned horizontally with other lines when you don't want a newline at the start.

Also: require (and ignore) one newline at the start of a string block
src/parser.mly Outdated Show resolved Hide resolved
@michallepicki michallepicki changed the title Allow using string blocks to create binary literals Allow using string blocks to create multi-line binary literals Jun 9, 2021
@michallepicki
Copy link
Contributor Author

michallepicki commented Jun 10, 2021

Hmm I think this may not be handling UTF-8 characters correctly. But possibly we already had this problem for erlang FFI snippets

edit: OR they are escaped correctly but I need to pass some flags for erlang printing / writing to files.

@michallepicki
Copy link
Contributor Author

michallepicki commented Jun 10, 2021

I think the way to go would be to pass the binary to the erlang source mostly as is from the .sest source file, so remove the String.escaped from here when using the "" syntax and add /utf8 modifier in printed erlang binary literal. So the "" syntax would accept all escape sequences that the erlang <<""/utf8>> literal accepts.

For the new multiline ` syntax, newlines need to be handled additionally I think, and I would propose this new syntax to keep user's input "as is" so escape all escape sequences they include. Or just dump it in the erlang file as integers (less readable but maybe easier). What do you think?

@michallepicki
Copy link
Contributor Author

michallepicki commented Jun 10, 2021

It seems that this change is also causing wrong line numbers to be reported in the reported errors when e.g. a type error happens after such string block binary, this should probably be fixed before merging.

edit: This was fixed.

@michallepicki
Copy link
Contributor Author

@gfngfn Please let me know if this would be a welcome addition and I can work on this more.

@gfngfn
Copy link
Owner

gfngfn commented Jun 13, 2021

Oops, I’m sorry for the late response. The addition you are working on is largely welcomed. Thank you!

@michallepicki
Copy link
Contributor Author

@gfngfn So this is semantically more-or-less done, how I see it:

  • '' translates to Erlang list of codepoints ("") so user can use the same characters and escape characters as in Erlang syntax (with exception to " that will be automatically escaped when not escaped in Sesterl)
  • "" translates to Erlang UTF-8 encoded binary (<<""/utf8>>) so user can use the same characters and escape characters as in Erlang syntax
  • `` has raw binary value of the contents, no escape characters supported (or needed since number of ` characters can be changed). If it starts with a line break, the line beak is ignored.

I left some TODOs to think about and tests/documentation are missing, but I would appreciate your thoughts about this design.

src/lexer.mll Outdated Show resolved Hide resolved
@michallepicki
Copy link
Contributor Author

michallepicki commented Jun 14, 2021

I think this is ready for review/merge now :)

@gfngfn
Copy link
Owner

gfngfn commented Jun 15, 2021

Thank you for looking into the implementation. I would like to review your modification.

…ter was treated as escaped; Add more escaped (or not) characters to test
@michallepicki michallepicki changed the title Allow using string blocks to create multi-line binary literals Fix non-ascii string/binary literals; Add multi-line binary literals Jun 19, 2021
@michallepicki
Copy link
Contributor Author

@gfngfn How does it look after my updates from a week ago?

@gfngfn
Copy link
Owner

gfngfn commented Jul 6, 2021

Oh, I’m so sorry for the late response and for not having given any suggestions… The updated change looks very nice to me!

src/outputErlangCode.ml Outdated Show resolved Hide resolved
@gfngfn
Copy link
Owner

gfngfn commented Jul 7, 2021

The change made by this PR looks excellent and correct according to the added tests. I would then like to merge this PR. Thank you so much for the development!

@gfngfn gfngfn merged commit d71be32 into gfngfn:master Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants