Map string <-> ettBinary, []byte -> ettBitBinary #68

Closed
wants to merge 3 commits into from

Conversation

heri16

@heri16 heri16 commented Aug 2, 2021

Implement encode/decode for the types below without affecting node logic. (All tests pass)

Mapping Types

Golang Type    ETF Type
string         binary
[]byte         bitBinary

Pros:

  • Support longer strings (max length is now 4294967295 bytes; previously it was 65535)

Cons:

  • []byte values are now sent as ETF bit_binary with bits set to 8 (overhead of 1 byte).
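
The length limits and the 1-byte overhead follow from the External Term Format headers. A minimal Go sketch of the three encodings, using standalone helper names that are hypothetical and not taken from this PR or from the library:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// External Term Format tags.
const (
	tagString    = 107 // STRING_EXT: 2-byte length
	tagBinary    = 109 // BINARY_EXT: 4-byte length
	tagBitBinary = 77  // BIT_BINARY_EXT: 4-byte length plus 1-byte "bits"
)

// encodeString107 writes tag, 16-bit length, payload.
// The 16-bit length field caps the payload at 65535 bytes.
func encodeString107(s string) ([]byte, error) {
	if len(s) > 0xFFFF {
		return nil, fmt.Errorf("string too long for STRING_EXT: %d bytes", len(s))
	}
	buf := make([]byte, 1+2+len(s))
	buf[0] = tagString
	binary.BigEndian.PutUint16(buf[1:3], uint16(len(s)))
	copy(buf[3:], s)
	return buf, nil
}

// encodeBinary109 writes tag, 32-bit length, payload.
// The payload length is assumed to fit in 32 bits.
func encodeBinary109(s string) []byte {
	buf := make([]byte, 1+4+len(s))
	buf[0] = tagBinary
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(s)))
	copy(buf[5:], s)
	return buf
}

// encodeBitBinary77 writes tag, 32-bit length, bits=8, payload.
// The extra "bits" byte is the 1-byte overhead mentioned above.
func encodeBitBinary77(p []byte) []byte {
	buf := make([]byte, 1+4+1+len(p))
	buf[0] = tagBitBinary
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(p)))
	buf[5] = 8 // all 8 bits of the last byte are significant
	copy(buf[6:], p)
	return buf
}

func main() {
	s, _ := encodeString107("123")
	fmt.Println(s)                                  // [107 0 3 49 50 51]
	fmt.Println(encodeBinary109("123"))             // [109 0 0 0 3 49 50 51]
	fmt.Println(encodeBitBinary77([]byte{1, 2, 3})) // [77 0 0 0 3 8 1 2 3]
}
```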

@CLAassistant

CLAassistant commented Aug 2, 2021

CLA assistant check
All committers have signed the CLA.

etf/encode.go (outdated)
Comment on lines +441 to +443
if bits != 8 {
b[n-1] = b[n-1] >> (8 - bits)
}
Collaborator

useless

Author

Benchmark data first before coming to a conclusion?

etf/decode.go (outdated)
- buf := b.Extend(1 + 4 + lenBinary)
- buf[0] = ettBinary
+ buf := b.Extend(1 + 4 + 1 + lenBinary)
+ buf[0] = ettBitBinary
Collaborator

@halturin halturin Aug 2, 2021

I see your idea of using ettBinary as the transport for strings and ettBitBinary for real binary data, but it makes Ergo-Ergo interaction a bit harder when strings are used, or when I send real binary data (not a string) from the Erlang side. That's why I prefer to see:

Ergo              ->  transport  ->  Erlang    ->  transport  ->  Ergo
[]byte                ettBinary      <<...>>       ettBinary      []byte
string (no utf8)      ettString      ".."          ettString      string
string (utf8)         ettString      [byte()]      ettList        string (via TermIntoStruct, TermMapIntoStruct, TermToString)
etf.Charlist          ettList        charlist      ettList        etf.Charlist (via ...)
etf.String            ettBinary      <<..>>        ettBinary      etf.String (via ...)

And these are the transitions I prioritize, since they don't require any extra conversions:

Ergo           ->  transport  ->  Ergo
string (utf8)      ettString      string
[]byte             ettBinary      []byte

Author

@heri16 heri16 Aug 2, 2021

This commit was based on your previous input that this library's priority is Ergo <-> Ergo.

If we start considering the case for Erlang, we should also consider the case for Elixir. Elixir strings are binaries.

Do also note that the current Ergo implementation can only send ASCII strings to Erlang, and there are no safety checks to ensure that the user passes only ASCII Go strings.

Rather than have the Go side waste CPU cycles checking whether a string contains UTF-8, etf.String was added to support sending legacy-style ASCII-only strings to Erlang.

Ergo            ->  transport     ->  Elixir             ->  transport     ->  Ergo
[]byte              ettBitBinary      <<.....>>              ettBitBinary      []byte
string (ascii)      ettBinary         "...."                 ettBinary         string
string (utf8)       ettBinary         "...."                 ettBinary         string
etf.String          ettString         '....' or [ , , ]      ettString         etf.String

Note that single quotes in Elixir produce charlists (and only support ASCII characters), unlike double quotes.
Charlists are defined as a linked list of positive integers that can use [ h | tail ] pattern matching. (Charlist is not a concrete type in elixir).

ettString automatically becomes a charlist in Elixir (and is displayed as a string with single quotes in the Elixir shell).

Lists of positive integers (charlists) are also displayed as strings with quotes in the Erlang shell: https://erlang.org/doc/apps/stdlib/unicode_usage.html#heuristic-string-detection
This is because "..." in Erlang creates a list of integers (i.e. a charlist) by default.

There is no impact on Ergo <-> Ergo integration.

Ergo           ->  transport     ->  Ergo
string (utf8)      ettBinary         string
[]byte             ettBitBinary      []byte
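
The mapping proposed in this comment can be read as a type switch on the Go value. The sketch below only illustrates that reading: `String` is a local stand-in for the library's etf.String, and `proposedTag` is a hypothetical helper, not an API of the library.

```go
package main

import "fmt"

// String is a local stand-in for the library's etf.String type.
type String string

// proposedTag returns the ETF tag that the mapping proposed in this
// comment would pick for v.
func proposedTag(v interface{}) (byte, error) {
	switch v.(type) {
	case String:
		return 107, nil // ettString (STRING_EXT)
	case string:
		return 109, nil // ettBinary (BINARY_EXT)
	case []byte:
		return 77, nil // ettBitBinary (BIT_BINARY_EXT) with bits = 8
	default:
		return 0, fmt.Errorf("unsupported type %T", v)
	}
}

func main() {
	for _, v := range []interface{}{"abc", []byte{1, 2, 3}, String("abc")} {
		tag, err := proposedTag(v)
		fmt.Printf("%T -> %d (err: %v)\n", v, tag, err)
	}
}
```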

Collaborator

Not sure about Elixir <<...>> -> ettBitBinary. May I ask you to show the same output

16> term_to_binary(<<1,2,3>>).
<<131,109,0,0,0,3,1,2,3>>

but in Elixir shell? (I'm not familiar with it)

Collaborator

@halturin halturin Aug 2, 2021

just found

iex(1)> :erlang.term_to_binary(<<1,2,3>>);
<<131, 109, 0, 0, 0, 3, 1, 2, 3>>

as you may notice it was encoded as ettBinary (109) which means there is no way to get []byte on the Ergo side using your approach.
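
To make the byte layout concrete, here is a minimal Go sketch that parses the term shown above. `decodeBinaryTerm` is a hypothetical helper written for this illustration, not the library's decoder.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeBinaryTerm parses a term_to_binary result that holds a single
// BINARY_EXT: version byte 131, tag 109, 4-byte big-endian length, payload.
func decodeBinaryTerm(b []byte) (tag byte, payload []byte, err error) {
	if len(b) < 6 {
		return 0, nil, errors.New("term too short")
	}
	if b[0] != 131 {
		return 0, nil, fmt.Errorf("unexpected version byte %d", b[0])
	}
	tag = b[1]
	if tag != 109 {
		return tag, nil, fmt.Errorf("not a BINARY_EXT: tag %d", tag)
	}
	n := binary.BigEndian.Uint32(b[2:6])
	if uint64(len(b)-6) < uint64(n) {
		return tag, nil, errors.New("truncated payload")
	}
	return tag, b[6 : 6+int(n)], nil
}

func main() {
	term := []byte{131, 109, 0, 0, 0, 3, 1, 2, 3}
	tag, payload, err := decodeBinaryTerm(term)
	fmt.Println(tag, payload, err) // 109 [1 2 3] <nil>
}
```

The tag alone does not tell a decoder whether the sender meant text or raw bytes, which is the crux of the disagreement in this thread.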

Author

@heri16 heri16 Aug 6, 2021

Elixir:

Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> <<195,165,195,164,195,182>> 
"åäö"
iex(2)> :erlang.term_to_binary("åäö")
<<131, 109, 0, 0, 0, 6, 195, 165, 195, 164, 195, 182>>
iex(3)> :erlang.term_to_binary("123")
<<131, 109, 0, 0, 0, 3, 49, 50, 51>>
iex(4)> :erlang.term_to_binary("日本")
<<131, 109, 0, 0, 0, 6, 230, 151, 165, 230, 156, 172>>

Erlang:

Eshell V10.7.2.12  (abort with ^G)
1> <<195,165,195,164,195,182>>.
<<"åäö"/utf8>>
2> term_to_binary(<<"åäö"/utf8>>).
<<131,109,0,0,0,6,195,165,195,164,195,182>>
3> term_to_binary("åäö").
<<131,107,0,3,229,228,246>>
4> term_to_binary("123"). 
<<131,107,0,3,49,50,51>>
5> term_to_binary("日本").
<<131,108,0,0,0,2,98,0,0,101,229,98,0,0,103,44,106>>

I do understand your point, but it's not a coincidence that we can Use Strings as Byte Slices in golang: https://go101.org/article/string.html#use-string-as-byte-slice
(Not just for copy/append, but even when indexing a string.)

Since string is just an immutable []byte according to Rob Pike...
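
A short Go sketch of that point, using the same "åäö" value as the shell sessions above (`mutableCopy` is a hypothetical helper name used only here):

```go
package main

import "fmt"

// mutableCopy returns the bytes of s as a new slice. The conversion copies,
// so writes to the result never affect the original string.
func mutableCopy(s string) []byte {
	return []byte(s)
}

func main() {
	s := "åäö"
	fmt.Println(len(s)) // 6: length in bytes, not characters
	fmt.Println(s[0])   // 195: indexing a string yields a byte

	b := mutableCopy(s)
	fmt.Println(b) // [195 165 195 164 195 182]

	b[0] = 'x'              // allowed: slices are mutable
	fmt.Println(s == "åäö") // true: the string itself is unchanged
}
```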

So I think the question is: should decoded binary values be immutable or mutable?
Does immutability in this case help prevent a class of programming bugs?

This topic is still heavily debated, even on the Go issue tracker:
See "Strengths of This Proposal" from:

Which is why I would approach it from a language-design angle: what is a string in Elixir, what is a string in Erlang, and what is a string in Go?

And I find that the common denominator is that strings are just immutable sequences of bytes in all three languages.

Author

@heri16 heri16 Aug 6, 2021

In other words, 107 is just an optimisation on 108 (or 109).
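
The relationship between 107 and 108 can be sketched in Go. This is an illustration based on the shell transcripts in this thread and on the External Term Format tag layout, not code from this PR; `encodeCharlist` is a hypothetical name, and the leading version byte 131 is omitted.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeCharlist encodes a list of code points. Lists of at most 65535
// elements that are all in 0..255 use the compact STRING_EXT (107) form;
// everything else falls back to the general LIST_EXT (108) form.
func encodeCharlist(cs []rune) []byte {
	if len(cs) == 0 {
		return []byte{106} // NIL_EXT: the empty list
	}
	compact := len(cs) <= 0xFFFF
	for _, c := range cs {
		if c < 0 || c > 255 {
			compact = false
			break
		}
	}
	if compact {
		// STRING_EXT: 2-byte length, then one byte per element.
		buf := make([]byte, 3, 3+len(cs))
		buf[0] = 107
		binary.BigEndian.PutUint16(buf[1:3], uint16(len(cs)))
		for _, c := range cs {
			buf = append(buf, byte(c))
		}
		return buf
	}
	// LIST_EXT: 4-byte element count, tagged elements, NIL_EXT tail.
	buf := make([]byte, 5)
	buf[0] = 108
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(cs)))
	for _, c := range cs {
		if c >= 0 && c <= 255 {
			buf = append(buf, 97, byte(c)) // SMALL_INTEGER_EXT
		} else {
			var e [5]byte
			e[0] = 98 // INTEGER_EXT: 32-bit big-endian signed integer
			binary.BigEndian.PutUint32(e[1:], uint32(c))
			buf = append(buf, e[:]...)
		}
	}
	return append(buf, 106)
}

func main() {
	fmt.Println(encodeCharlist([]rune{229, 228, 246})) // [107 0 3 229 228 246]
	fmt.Println(encodeCharlist([]rune{0x65E5, 0x672C}))
	// [108 0 0 0 2 98 0 0 101 229 98 0 0 103 44 106]
}
```

Both outputs match the Erlang results for "åäö" and "日本" shown earlier, minus the version byte.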

Try searching for "StrangeList" in the Erlang docs. (Those are caused by 107).

Author

@heri16 heri16 Aug 6, 2021

Another Interesting Side Note:

All of the Erlang standard library and modern third-party Erlang libraries can always accept (and behave the same on) either 108 (CharList) or 109 (Binary), but not always 107 (StrangeList).

We have tested this pretty extensively.

Author

Shall we just make this configurable?

Collaborator

I appreciate your effort to make this project better, but this approach seems to differ from the direction this project is taking.

@halturin
Collaborator

halturin commented Aug 2, 2021

explanation #68 (comment) why this approach can't be accepted

@heri16 heri16 mentioned this pull request Aug 2, 2021
2 tasks
@heri16
Author

heri16 commented Aug 2, 2021

First off, thanks so much for the great work on this library.

I hope to be able to bring to light the extensive research done by the Elixir team with regards to interoperability with Erlang: https://elixirschool.com/en/lessons/advanced/erlang/#strings

Please see #68 (comment) on why this approach should be accepted.

@heri16
Author

heri16 commented Aug 2, 2021

I'll let the creator of Erlang explain further:

[image]

https://elixirforum.com/t/why-are-charlist-and-string-co-existing-in-erlang/31566

@heri16
Author

heri16 commented Aug 13, 2021

Two different Erlang authors have built Lua <-> Erlang interfaces.

Both independently came to the same conclusion: Erlang's binary strings should be mapped to the Lua "string" type, and Erlang's list strings should be mapped to Lua lists.

https://github.com/rtraschke/erlang-lua#value-translation-from-erlang-to-lua

https://github.com/rvirding/luerl/wiki/0.7-Data-representation#data-types

https://github.com/rvirding/luerl/wiki/0.7-Data-representation#strings

@halturin
Collaborator

halturin commented Aug 13, 2021

Because Lua doesn't have a binary type. Anyway, you may want to try my implementation in the 'otp2324' branch. See README file there.

@heri16
Author

heri16 commented Aug 17, 2021

Ruby has byte arrays.

See how types are mapped:
http://www.erlang-factory.com/upload/presentations/36/tom_preston_werner_erlectricity.pdf

@halturin
Collaborator

halturin commented Aug 17, 2021

Who said that's a good mapping? :) especially looking at

[image]

... let alone that this project has been on hold for the last 13 years ))

Feel free to make a fork with the data mapping you like. This PR won't be accepted. I've made etf.String and etf.Charlist to support interaction with Erlang/Elixir, and that should be enough so far.

@halturin
Collaborator

not valid any more

@halturin halturin closed this Oct 11, 2021