Map string <-> ettBinary, []byte -> ettBitBinary #68
if bits != 8 {
	b[n-1] = b[n-1] >> (8 - bits)
}
useless
Benchmark data first before coming to a conclusion?
buf := b.Extend(1 + 4 + lenBinary)
buf[0] = ettBinary
buf := b.Extend(1 + 4 + 1 + lenBinary)
buf[0] = ettBitBinary
I see your idea of using ettBinary as a transport for strings and ettBitBinary for real binary data, but it makes Ergo-Ergo interaction a bit harder when strings are used, or when real binary data (not a string) is sent from the Erlang side. That's why I would prefer to see:
Ergo -> | transport -> | Erlang -> | transport -> | Ergo |
---|---|---|---|---|
[]byte | ettBinary | <<...>> | ettBinary | []byte |
string (no utf8) | ettString | ".." | ettString | string |
string (utf8) | ettString | [byte()] | ettList | string (via TermIntoStruct, TermMapIntoStruct, TermToString) |
etf.Charlist | ettList | charlist | ettList | etf.Charlist (via ...) |
etf.String | ettBinary | <<..>> | ettBinary | etf.String (via ...) |
And here are the transitions I prioritize so far, since they don't require any extra conversions:
Ergo -> | transport -> | Ergo |
---|---|---|
string (utf8) | ettString | string |
[]byte | ettBinary | []byte |
This commit was based on your previous input that this library's priority is Ergo <-> Ergo.
If we start considering the case for Erlang, we should also consider the case for Elixir. Elixir strings are binaries.
Also note that the current Ergo implementation can only send ASCII strings to Erlang, and there are no safety checks to ensure that the user passes only ASCII Go strings.
Rather than have the Go side waste CPU cycles checking whether a string contains UTF-8, etf.String was added to support sending legacy-style ASCII-only strings to Erlang.
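For reference, the kind of safety check mentioned above would be a single pass over the string. A minimal sketch; `isASCII` is a hypothetical helper, not part of Ergo:

```go
package main

import "fmt"

// isASCII reports whether s contains only 7-bit ASCII bytes.
// Hypothetical helper for illustration; not an Ergo function.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("123"))
	fmt.Println(isASCII("\u00e5\u00e4\u00f6")) // the "åäö" example
}
```

The cost is linear in the string length, which is the overhead that a dedicated type such as etf.String avoids.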
Ergo -> | transport -> | Elixir | transport -> | Ergo |
---|---|---|---|---|
[]byte | ettBitBinary | <<.....>> | ettBitBinary | []byte |
string (ascii) | ettBinary | "...." | ettBinary | string |
string (utf8) | ettBinary | "...." | ettBinary | string |
etf.String | ettString | '....' or [ , , ] | ettString | etf.String |
Note that single quotes in Elixir produce charlists (and only support ASCII characters), unlike double quotes.
Charlists are defined as a linked list of positive integers that can use [ h | tail ] pattern matching. (Charlist is not a concrete type in Elixir.)
ettString automatically becomes a charlist in Elixir (and is displayed as a string with single quotes in the Elixir shell).
A list of positive integers (a charlist) is also displayed as a string with quotes in the Erlang shell: https://erlang.org/doc/apps/stdlib/unicode_usage.html#heuristic-string-detection
This is because "..." in Erlang by default creates a list of integers (i.e. charlist).
There is no impact on Ergo <-> Ergo integration.
Ergo -> | transport -> | Ergo |
---|---|---|
string (utf8) | ettBinary | string |
[]byte | ettBitBinary | []byte |
Not sure about Elixir <<...>> -> ettBitBinary. May I ask you to show the same output
16> term_to_binary(<<1,2,3>>).
<<131,109,0,0,0,3,1,2,3>>
but in Elixir shell? (I'm not familiar with it)
just found
iex(1)> :erlang.term_to_binary(<<1,2,3>>);
<<131, 109, 0, 0, 0, 3, 1, 2, 3>>
As you may notice, it was encoded as ettBinary (109), which means there is no way to get []byte on the Ergo side using your approach.
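To make the tag check concrete, here is a rough Go sketch that reads the header of such a payload. The constant names follow this PR; the values 131, 109 and 77 are the ETF version byte, BINARY_EXT and BIT_BINARY_EXT. `decodeBinary` is a hypothetical helper, not Ergo's decoder:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	etfVersion   = 131 // leading version byte of term_to_binary output
	ettBitBinary = 77  // BIT_BINARY_EXT
	ettBinary    = 109 // BINARY_EXT
)

// decodeBinary returns the tag and payload of a versioned binary term.
// Hypothetical helper for illustration only.
func decodeBinary(p []byte) (byte, []byte, error) {
	if len(p) < 6 || p[0] != etfVersion {
		return 0, nil, errors.New("not a versioned ETF binary term")
	}
	tag := p[1]
	n := int(binary.BigEndian.Uint32(p[2:6]))
	switch tag {
	case ettBinary:
		if n > len(p)-6 {
			return tag, nil, errors.New("truncated binary")
		}
		return tag, p[6 : 6+n], nil
	case ettBitBinary:
		// One extra byte holds the number of bits used in the last byte.
		if len(p) < 7 || n > len(p)-7 {
			return tag, nil, errors.New("truncated bit binary")
		}
		return tag, p[7 : 7+n], nil
	}
	return tag, nil, fmt.Errorf("unexpected tag %d", tag)
}

func main() {
	tag, data, err := decodeBinary([]byte{131, 109, 0, 0, 0, 3, 1, 2, 3})
	fmt.Println(tag, data, err)
}
```

With the iex output above as input, the tag is 109, so a decoder that maps only ettBitBinary to []byte would never produce []byte for a plain Elixir or Erlang binary.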
Elixir:
Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> <<195,165,195,164,195,182>>
"åäö"
iex(2)> :erlang.term_to_binary("åäö")
<<131, 109, 0, 0, 0, 6, 195, 165, 195, 164, 195, 182>>
iex(3)> :erlang.term_to_binary("123")
<<131, 109, 0, 0, 0, 3, 49, 50, 51>>
iex(4)> :erlang.term_to_binary("日本")
<<131, 109, 0, 0, 0, 6, 230, 151, 165, 230, 156, 172>>
Erlang:
Eshell V10.7.2.12 (abort with ^G)
1> <<195,165,195,164,195,182>>.
<<"åäö"/utf8>>
2> term_to_binary(<<"åäö"/utf8>>).
<<131,109,0,0,0,6,195,165,195,164,195,182>>
3> term_to_binary("åäö").
<<131,107,0,3,229,228,246>>
4> term_to_binary("123").
<<131,107,0,3,49,50,51>>
5> term_to_binary("日本").
<<131,108,0,0,0,2,98,0,0,101,229,98,0,0,103,44,106>>
I do understand your point, but it's not a coincidence that we can use strings as byte slices in Go: https://go101.org/article/string.html#use-string-as-byte-slice
(Not just for copy/append, but even when indexing a string.)
Since string is just an immutable []byte, according to Rob Pike...
So I think the question would be: should decoded binary values be immutable or mutable?
Does immutability in this case help prevent a class of programming bugs?
This topic is still contentious even within the Go issue tracker:
See "Strengths of This Proposal" from:
- proposal: spec: read-only types golang/go#22876
- proposal: spec: immutable data golang/go#37303
- proposal: spec: support read-only and practical immutable values in Go golang/go#32245
- proposal: spec: immutable type qualifier golang/go#27975
Which is why I would approach it from language design: what is a string in Elixir, what is a string in Erlang, and what is a string in Go?
And I find that the common denominator is that strings are just an immutable sequence of bytes in all three languages.
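A small Go illustration of the string-as-bytes point, using the same "åäö" bytes as the shell sessions above. `cloneBytes` is a hypothetical helper name:

```go
package main

import "fmt"

// cloneBytes copies the bytes of s into a new, mutable slice.
// Hypothetical helper for illustration only.
func cloneBytes(s string) []byte {
	return append([]byte(nil), s...) // append accepts a string as the source
}

func main() {
	s := "\u00e5\u00e4\u00f6" // "åäö", six bytes of UTF-8

	fmt.Println(len(s)) // length is counted in bytes, not characters
	fmt.Println(s[0])   // indexing yields a byte

	b := cloneBytes(s)
	b[0] = 'x'        // the copy is mutable
	fmt.Println(s[0]) // the original string is unchanged

	dst := make([]byte, 2)
	n := copy(dst, s) // copy also accepts a string as the source
	fmt.Println(n, dst)
}
```

The string itself cannot be modified in place (`s[0] = 'x'` does not compile), which is the immutability being discussed.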
In other words, 107 is just an optimisation on 108 (or 109).
Try searching for "StrangeList" in the Erlang docs. (Those are caused by 107).
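A rough Go sketch of that relationship, written to reproduce the Erlang shell output above (minus the leading 131 version byte). The tag values 97, 98, 106, 107 and 108 are the ETF tags for SMALL_INTEGER_EXT, INTEGER_EXT, NIL_EXT, STRING_EXT and LIST_EXT; the constant names and `encodeCharlist` are assumptions for illustration, not Ergo's encoder:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	ettSmallInteger = 97
	ettInteger      = 98
	ettNil          = 106
	ettString       = 107
	ettList         = 108
)

// encodeCharlist encodes the code points of s as an Erlang list of integers.
// It uses the compact 107 form when every element fits in a byte and the
// length fits in 16 bits, and falls back to 108 otherwise.
func encodeCharlist(s string) []byte {
	runes := []rune(s)
	if len(runes) == 0 {
		return []byte{ettNil} // the empty list
	}
	compact := len(runes) <= 0xFFFF
	for _, r := range runes {
		if r > 255 {
			compact = false
			break
		}
	}
	if compact {
		buf := make([]byte, 3, 3+len(runes))
		buf[0] = ettString
		binary.BigEndian.PutUint16(buf[1:3], uint16(len(runes)))
		for _, r := range runes {
			buf = append(buf, byte(r))
		}
		return buf
	}
	buf := make([]byte, 5, 5+5*len(runes)+1)
	buf[0] = ettList
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(runes)))
	for _, r := range runes {
		if r <= 255 {
			buf = append(buf, ettSmallInteger, byte(r))
		} else {
			buf = append(buf, ettInteger, byte(r>>24), byte(r>>16), byte(r>>8), byte(r))
		}
	}
	return append(buf, ettNil) // proper lists end with a NIL tail
}

func main() {
	fmt.Println(encodeCharlist("\u00e5\u00e4\u00f6")) // "åäö"
	fmt.Println(encodeCharlist("\u65e5\u672c"))       // "日本"
}
```

Both forms describe the same list of integers; only the wire representation differs.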
Another Interesting Side Note:
All of the Erlang standard library and modern third-party Erlang libraries can always accept (and behave the same on) either 108 (CharList) or 109 (Binary), but not always 107 (StrangeList).
We have tested this pretty extensively.
Shall we just make this configurable?
I appreciate your effort to make this project better, but this approach seems to differ from the direction this project is taking.
See the explanation in #68 (comment) of why this approach can't be accepted.
First off, thanks so much for the great work on this library. I hope to be able to bring to light the extensive research done by the Elixir team with regard to interoperability with Erlang: https://elixirschool.com/en/lessons/advanced/erlang/#strings Please see #68 (comment) on why this approach should be accepted.
I'll let the creator of Erlang explain further: https://elixirforum.com/t/why-are-charlist-and-string-co-existing-in-erlang/31566
encode/decode strings longer than 65535
Two different Erlang authors working on Lua <-> Erlang interfaces both came to the same conclusion independently: Erlang's binary strings should be mapped to Lua "string", and Erlang's list strings should be mapped to Lua lists. https://github.com/rtraschke/erlang-lua#value-translation-from-erlang-to-lua https://github.com/rvirding/luerl/wiki/0.7-Data-representation#data-types https://github.com/rvirding/luerl/wiki/0.7-Data-representation#strings
Because Lua doesn't have a binary type. Anyway, you may want to try my implementation in the 'otp2324' branch. See the README file there.
Ruby has byte arrays. See how types are mapped:
Who said that's a good mapping? :) Especially looking at ... let alone that this project has been on hold for the last 13 years )) Feel free to make a fork with the data mapping you like. This PR won't be accepted. I've made etf.String and etf.Charlist to support interaction with Erlang/Elixir, and that should be enough so far.
Not valid any more.
Implement encode/decode for the types below without affecting node logic. (All tests pass)
Mapping Types
Pros:
Cons:
[]byte values are now sent as ETF bit_binary with bits set to 8. (Overhead of 1 byte)
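The one-byte overhead can be seen by laying out both encodings side by side. A minimal sketch, assuming the standard ETF layouts for BINARY_EXT (109) and BIT_BINARY_EXT (77); `encodeBinary` and `encodeBitBinary` are hypothetical helpers that allocate their own buffers rather than using the `b.Extend` call from the diff:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	ettBitBinary = 77
	ettBinary    = 109
)

// encodeBinary lays out tag, 4-byte big-endian length, then the data.
func encodeBinary(data []byte) []byte {
	buf := make([]byte, 1+4+len(data))
	buf[0] = ettBinary
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(data)))
	copy(buf[5:], data)
	return buf
}

// encodeBitBinary adds one byte after the length: the number of bits
// used in the last data byte (8 for whole-byte data).
func encodeBitBinary(data []byte, bits byte) []byte {
	buf := make([]byte, 1+4+1+len(data))
	buf[0] = ettBitBinary
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(data)))
	buf[5] = bits
	copy(buf[6:], data)
	return buf
}

func main() {
	data := []byte{1, 2, 3}
	fmt.Println(encodeBinary(data))
	fmt.Println(encodeBitBinary(data, 8))
}
```

For the same three bytes, the bit_binary form is exactly one byte longer than the binary form.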