Map string <-> ettBinary, []byte -> ettBitBinary #68

Closed
wants to merge 3 commits into from
12 changes: 6 additions & 6 deletions etf/decode.go
@@ -146,7 +146,7 @@ func Decode(packet []byte, cache []Atom) (retTerm Term, retByte []byte, retErr e
return nil, nil, errMalformedString
}

-term = string(packet[2 : n+2])
+term = String(packet[2 : n+2])
packet = packet[n+2:]

case ettCacheRef:
@@ -357,10 +357,7 @@ func Decode(packet []byte, cache []Atom) (retTerm Term, retByte []byte, retErr e
return nil, nil, errMalformedBinary
}

-b := make([]byte, n)
-copy(b, packet[4:n+4])
-
-term = b
+term = string(packet[4 : n+4])
packet = packet[n+4:]

case ettNil:
@@ -438,7 +435,10 @@ func Decode(packet []byte, cache []Atom) (retTerm Term, retByte []byte, retErr e

b := make([]byte, n)
copy(b, packet[5:n+5])
-b[n-1] = b[n-1] >> (8 - bits)
+
+if bits != 8 {
+    b[n-1] = b[n-1] >> (8 - bits)
+}
Comment on lines +439 to +441
Collaborator

useless

Author

Benchmark data first before coming to a conclusion?
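
For readers following this exchange, here is a minimal sketch of what the guarded shift does. It assumes the decoder semantics shown in the diff (the last byte is shifted right by 8 - bits); the helper name alignLastByte is hypothetical and not part of the library. Since a shift by zero leaves a byte unchanged, the guard does not alter the decoded value when bits is 8, so any difference would be one of speed, which only a benchmark could settle.

```go
package main

import "fmt"

// alignLastByte mirrors the shift from the diff: the significant bits of the
// final byte arrive in its high end and are moved down to the low end.
// Hypothetical helper, for illustration only.
func alignLastByte(last byte, bits uint8) byte {
	// With bits == 8 the shift count is 0 and the byte is returned unchanged.
	return last >> (8 - bits)
}

func main() {
	fmt.Println(alignLastByte(160, 3)) // 0b10100000 >> 5 = 5, as in TestDecodeBitBinaryWithLastBits
	fmt.Println(alignLastByte(255, 8)) // 255: shifting by zero is a no-op
}
```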


term = b
packet = packet[n+5:]
29 changes: 22 additions & 7 deletions etf/decode_test.go
@@ -35,7 +35,7 @@ func TestDecodeAtom(t *testing.T) {
}

func TestDecodeString(t *testing.T) {
-expected := "abc"
+expected := String("abc")
packet := []byte{ettString, 0, 3, 97, 98, 99}
term, _, err := Decode(packet, []Atom{})
if err != nil || term != expected {
@@ -198,7 +198,7 @@ func TestDecodeMap(t *testing.T) {
Atom("abc"): 123,
"abc": 4.56,
}
-packet := []byte{116, 0, 0, 0, 2, 100, 0, 3, 97, 98, 99, 97, 123, 107, 0, 3, 97, 98,
+packet := []byte{116, 0, 0, 0, 2, 100, 0, 3, 97, 98, 99, 97, 123, 109, 0, 0, 0, 3, 97, 98,
99, 70, 64, 18, 61, 112, 163, 215, 10, 61}

term, _, err := Decode(packet, []Atom{})
@@ -214,8 +214,23 @@ func TestDecodeMap(t *testing.T) {
}

func TestDecodeBinary(t *testing.T) {
+    expected := "abc"
+    packet := []byte{ettBinary, 0, 0, 0, 3, 97, 98, 99}
+    term, _, err := Decode(packet, []Atom{})
+    if err != nil || term != expected {
+        t.Fatal(err)
+    }
+
+    packet = []byte{ettBinary, 0, 3, 97, 98, 99}
+    term, _, err = Decode(packet, []Atom{})
+    if err != errMalformedBinary {
+        t.Fatal(err)
+    }
+}
+
+func TestDecodeBitBinary(t *testing.T) {
expected := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 0}
-packet := []byte{ettBinary, 0, 0, 0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0}
+packet := []byte{ettBitBinary, 0, 0, 0, 10, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0}

term, _, err := Decode(packet, []Atom{})
if err != nil {
@@ -228,9 +243,9 @@ func TestDecodeBinary(t *testing.T) {
}
}

-func TestDecodeBitBinary(t *testing.T) {
+func TestDecodeBitBinaryWithLastBits(t *testing.T) {
expected := []byte{1, 2, 3, 4, 5}
-packet := []byte{77, 0, 0, 0, 5, 3, 1, 2, 3, 4, 160}
+packet := []byte{ettBitBinary, 0, 0, 0, 5, 3, 1, 2, 3, 4, 160}

term, _, err := Decode(packet, []Atom{})
if err != nil {
@@ -401,9 +416,9 @@ func TestDecodeComplex(t *testing.T) {
expected := Tuple{"hello", List{},
Map{Atom("v1"): List{Tuple{3, 13, 3.13}, Tuple{Atom("abc"), "abc"}},
Atom("v2"): int64(12345)}}
-packet := []byte{104, 3, 107, 0, 5, 104, 101, 108, 108, 111, 106, 116, 0, 0, 0, 2,
+packet := []byte{104, 3, 109, 0, 0, 0, 5, 104, 101, 108, 108, 111, 106, 116, 0, 0, 0, 2,
100, 0, 2, 118, 49, 108, 0, 0, 0, 2, 104, 3, 97, 3, 97, 13, 70, 64, 9, 10,
-61, 112, 163, 215, 10, 104, 2, 100, 0, 3, 97, 98, 99, 107, 0, 3, 97, 98,
+61, 112, 163, 215, 10, 104, 2, 100, 0, 3, 97, 98, 99, 109, 0, 0, 0, 3, 97, 98,
99, 106, 100, 0, 2, 118, 50, 98, 0, 0, 48, 57}
term, _, err := Decode(packet, []Atom{})
if err != nil {
68 changes: 59 additions & 9 deletions etf/encode.go
@@ -403,15 +403,15 @@ func Encode(term Term, b *lib.Buffer,
}
lenString := len(t)

-if lenString > 65535 {
+if lenString > math.MaxUint32 {
return ErrStringTooLong
}

-// 1 (ettString) + 2 (len) + string
-buf := b.Extend(1 + 2 + lenString)
-buf[0] = ettString
-binary.BigEndian.PutUint16(buf[1:3], uint16(lenString))
-copy(buf[3:], t)
+// 1 (ettBinary) + 4 (len) + string
+buf := b.Extend(1 + 4 + lenString)
+buf[0] = ettBinary
+binary.BigEndian.PutUint32(buf[1:5], uint32(lenString))
+copy(buf[5:], t)

case Atom:
if cacheEnabled && cacheIndex < 256 {
@@ -514,6 +514,38 @@ func Encode(term Term, b *lib.Buffer,
stringAsCharlist: stringAsCharlist,
}

+case String:
+    // Spec: optimization for sending lists of bytes
+    // See: https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext
+    lenString := len(t)
+
+    charlist := []rune(t)
+    lenCharlist := len(charlist)
+    // each character takes up only one byte in extended ASCII charset (i.e. Latin1)
+    if lenString != lenCharlist {
+        // Spec: when unicode detected, must send list of unicode/rune, as list of UTF8 bytes is a "StrangeList" in Erlang
+        // See: https://erlang.org/doc/apps/stdlib/unicode_usage.html#lists-of-utf-8-bytes
+        // Usage: erl +pc unicode -name erl-demo@localhost -setcookie 123
+        term = charlist
+        goto recasting
+    } else if lenString > 65535 {
+        // Spec: implementations must ensure that lists longer than 65535 elements are encoded as LIST_EXT.
+        // See: https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext
+        l := make(List, lenString)
+        for i := 0; i < lenString; i++ {
+            // each element = one char
+            l[i] = t[i]
+        }
+        term = l
+        goto recasting
+    }
+
+    // 1 (ettString) + 2 (len) + string
+    buf := b.Extend(1 + 2 + lenString)
+    buf[0] = ettString
+    binary.BigEndian.PutUint16(buf[1:3], uint16(lenString))
+    copy(buf[3:], t)
+
case List:
lenList := len(t)
buf := b.Extend(5)
@@ -527,12 +559,30 @@ func Encode(term Term, b *lib.Buffer,
stringAsCharlist: stringAsCharlist,
}

+case []rune:
+    lenList := len(t)
+    buf := b.Extend(5)
+    buf[0] = ettList
+    binary.BigEndian.PutUint32(buf[1:], uint32(lenList))
+    l := make(List, lenList)
+    for i := 0; i < lenList; i++ {
+        l[i] = t[i]
+    }
+    child = &stackElement{
+        parent: stack,
+        termType: ettList,
+        term: l,
+        children: lenList + 1,
+        stringAsCharlist: stringAsCharlist,
+    }
+
case []byte:
lenBinary := len(t)
-buf := b.Extend(1 + 4 + lenBinary)
-buf[0] = ettBinary
+buf := b.Extend(1 + 4 + 1 + lenBinary)
+buf[0] = ettBitBinary
Collaborator @halturin, Aug 2, 2021

I see your idea of using ettBinary as a transport for strings and ettBitBinary for real binary data, but it makes Ergo-to-Ergo interaction a bit harder for string usage, and also for the case where real binary data (not a string) is sent from the Erlang side. That's why I would prefer to see:

Ergo             -> transport    -> Erlang     -> transport    -> Ergo
[]byte              ettBinary       <<...>>       ettBinary       []byte
string (no utf8)    ettString       ".."          ettString       string
string (utf8)       ettString       [byte()]      ettList         string (via TermIntoStruct, TermMapIntoStruct, TermToString)
etf.Charlist        ettList         charlist      ettList         etf.Charlist (via ...)
etf.String          ettBinary       <<..>>        ettBinary       etf.String (via ...)

These are the transitions I prioritize so far, as they don't require any extra conversions:

Ergo             -> transport    -> Ergo
string (utf8)       ettString       string
[]byte              ettBinary       []byte

Author @heri16, Aug 2, 2021

This commit was based on your previous input that this library's priority is Ergo <-> Ergo.

If we start considering the case for Erlang, we should also consider the case for Elixir. Elixir strings are binaries.

Do also note that the current Ergo implementation can only send ASCII strings to Erlang, and there are no safety checks to ensure that the user passes only ASCII Go strings.

That is why etf.String was added: rather than having the Go side spend CPU cycles checking whether a string contains UTF-8, it lets the user explicitly send legacy-style ASCII-only strings to Erlang.
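
To make the cost being discussed concrete, the per-string check that this comment wants to avoid could look roughly like the sketch below. The function name isASCII is hypothetical and is not taken from the library; utf8.RuneSelf is the standard-library constant 0x80, the first byte value that cannot be a single-byte character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether every byte of s is below utf8.RuneSelf (0x80).
// It has to scan the whole string in the worst case, which is the
// per-encode cost the comment refers to.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("123"))                // true
	fmt.Println(isASCII("\u00e5\u00e4\u00f6")) // false: each character needs two UTF-8 bytes
}
```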

Ergo             -> transport    -> Elixir               -> transport    -> Ergo
[]byte              ettBitBinary    <<.....>>               ettBitBinary    []byte
string (ascii)      ettBinary       "...."                  ettBinary       string
string (utf8)       ettBinary       "...."                  ettBinary       string
etf.String          ettString       '....' or [ , , ]       ettString       etf.String

Note that single quotes in Elixir produce charlists (and only support ascii characters), unlike double quotes.
Charlists are defined as a linked list of positive integers that can use [ h | tail ] pattern matching. (Charlist is not a concrete type in Elixir.)

An ettString automatically becomes a charlist in Elixir (and is displayed as a string with single quotes in the Elixir shell).

Lists of positive integers (charlists) are also displayed as quoted strings in the Erlang shell: https://erlang.org/doc/apps/stdlib/unicode_usage.html#heuristic-string-detection
This is because "..." in Erlang by default creates a list of integers (i.e. a charlist).

There is no impact on Ergo <-> Ergo integration.

Ergo             -> transport    -> Ergo
string (utf8)       ettBinary       string
[]byte              ettBitBinary    []byte

Collaborator

Not sure about Elixir <<...>> -> ettBitBinary. May I ask you to show the same output

16> term_to_binary(<<1,2,3>>).
<<131,109,0,0,0,3,1,2,3>>

but in Elixir shell? (I'm not familiar with it)

Collaborator @halturin, Aug 2, 2021

just found

iex(1)> :erlang.term_to_binary(<<1,2,3>>);
<<131, 109, 0, 0, 0, 3, 1, 2, 3>>

As you may notice, it was encoded as ettBinary (109), which means there is no way to get []byte on the Ergo side using your approach.
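
The point can be illustrated with a minimal, self-contained sketch of parsing the bytes shown in the shell output above. It is not the library's Decode function; the names decodeBinaryExt and errMalformed are hypothetical. The wire data carries only tag 109, a length, and the payload, so whether the result surfaces as a string or a []byte is entirely the decoder's choice.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	versionTag = 131 // leading version byte emitted by term_to_binary
	binaryExt  = 109 // BINARY_EXT
)

var errMalformed = errors.New("malformed binary")

// decodeBinaryExt parses <<131, 109, Len:32, Data>> and returns a copy of Data.
// Hypothetical helper, for illustration only.
func decodeBinaryExt(packet []byte) ([]byte, error) {
	if len(packet) < 6 || packet[0] != versionTag || packet[1] != binaryExt {
		return nil, errMalformed
	}
	n := binary.BigEndian.Uint32(packet[2:6])
	if uint64(n) > uint64(len(packet)-6) {
		return nil, errMalformed // declared length exceeds the available payload
	}
	out := make([]byte, n)
	copy(out, packet[6:6+int(n)])
	return out, nil
}

func main() {
	data, err := decodeBinaryExt([]byte{131, 109, 0, 0, 0, 3, 1, 2, 3})
	fmt.Println(data, err) // [1 2 3] <nil>

	// The same tag carries text and raw bytes alike; the caller picks the Go type.
	text, _ := decodeBinaryExt([]byte{131, 109, 0, 0, 0, 3, 49, 50, 51})
	fmt.Println(string(text)) // 123
}
```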

Author @heri16, Aug 6, 2021

Elixir:

Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> <<195,165,195,164,195,182>> 
"åäö"
iex(2)> :erlang.term_to_binary("åäö")
<<131, 109, 0, 0, 0, 6, 195, 165, 195, 164, 195, 182>>
iex(3)> :erlang.term_to_binary("123")
<<131, 109, 0, 0, 0, 3, 49, 50, 51>>
iex(4)> :erlang.term_to_binary("日本")
<<131, 109, 0, 0, 0, 6, 230, 151, 165, 230, 156, 172>>

Erlang:

Eshell V10.7.2.12  (abort with ^G)
1> <<195,165,195,164,195,182>>.
<<"åäö"/utf8>>
2> term_to_binary(<<"åäö"/utf8>>).
<<131,109,0,0,0,6,195,165,195,164,195,182>>
3> term_to_binary("åäö").
<<131,107,0,3,229,228,246>>
4> term_to_binary("123"). 
<<131,107,0,3,49,50,51>>
5> term_to_binary("日本").
<<131,108,0,0,0,2,98,0,0,101,229,98,0,0,103,44,106>>
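
The shell transcripts above can be reproduced with a short sketch. This is not the PR's encoder: the function and constant names are hypothetical, and only the tag values (97, 98, 106, 107, 108, 109) come from the Erlang external term format. It assumes the rule visible in the Erlang output: a list whose elements all fit in one byte and whose length is at most 65535 is sent as tag 107, otherwise as tag 108 with one integer per element and a trailing nil.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	tagVersion  = 131
	tagSmallInt = 97  // SMALL_INTEGER_EXT
	tagInteger  = 98  // INTEGER_EXT
	tagNil      = 106 // NIL_EXT
	tagString   = 107 // STRING_EXT
	tagList     = 108 // LIST_EXT
	tagBinary   = 109 // BINARY_EXT
)

// encodeBinary emits the UTF-8 bytes of s as a binary (what Elixir does for "...").
func encodeBinary(s string) []byte {
	out := []byte{tagVersion, tagBinary, 0, 0, 0, 0}
	binary.BigEndian.PutUint32(out[2:], uint32(len(s)))
	return append(out, s...)
}

// encodeCharlist emits the code points of s as a list (what Erlang does for "...").
func encodeCharlist(s string) []byte {
	runes := []rune(s)
	if len(runes) == 0 {
		return []byte{tagVersion, tagNil}
	}
	compact := len(runes) <= 65535
	for _, r := range runes {
		if r > 255 {
			compact = false
			break
		}
	}
	if compact {
		out := []byte{tagVersion, tagString, 0, 0}
		binary.BigEndian.PutUint16(out[2:], uint16(len(runes)))
		for _, r := range runes {
			out = append(out, byte(r)) // one byte per code point (Latin-1 range)
		}
		return out
	}
	out := []byte{tagVersion, tagList, 0, 0, 0, 0}
	binary.BigEndian.PutUint32(out[2:], uint32(len(runes)))
	for _, r := range runes {
		if r <= 255 {
			out = append(out, tagSmallInt, byte(r))
		} else {
			out = append(out, tagInteger, 0, 0, 0, 0)
			binary.BigEndian.PutUint32(out[len(out)-4:], uint32(r))
		}
	}
	return append(out, tagNil)
}

func main() {
	fmt.Println(encodeBinary("\u00e5\u00e4\u00f6"))   // [131 109 0 0 0 6 195 165 195 164 195 182]
	fmt.Println(encodeCharlist("\u00e5\u00e4\u00f6")) // [131 107 0 3 229 228 246]
	fmt.Println(encodeCharlist("\u65e5\u672c"))       // [131 108 0 0 0 2 98 0 0 101 229 98 0 0 103 44 106]
}
```

Note that this rule treats "åäö" as three one-byte elements by code point, whereas the String case in the diff compares byte length with rune length, so the two would classify Latin-1 text differently.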

I do understand your point, but it's not a coincidence that we can use strings as byte slices in Go: https://go101.org/article/string.html#use-string-as-byte-slice
(Not just for copy/append, but even when indexing a string.)

Since string is just an immutable []byte according to Rob Pike...

So I think the question would be: should decoded binary values be immutable or mutable?
Does immutability in this case help prevent a class of programming bugs?

This topic is still a big contention even within the Golang Issue Tracker:
See "Strengths of This Proposal" from:

Which is why I would approach it from language design: what is a string in Elixir, what is a string in Erlang, and what is a String in Golang?

And I find that the common denominator is that strings are just an immutable sequence of bytes in all three languages.
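
The Go side of this argument can be checked with a small sketch using only built-in language features; the helper name cloneBytes is hypothetical. A string can be indexed, copied from, and appended from like a byte slice, but it cannot be written to, so mutation requires an explicit copy.

```go
package main

import "fmt"

// cloneBytes returns a mutable copy of s; copy accepts a string source directly.
func cloneBytes(s string) []byte {
	b := make([]byte, len(s))
	copy(b, s)
	return b
}

func main() {
	s := "abc"
	fmt.Println(s[0], len(s)) // 97 3: indexing a string yields a byte

	b := cloneBytes(s)
	b[0] = 'X'                // allowed: b is a separate, mutable slice
	b = append(b, "def"...)   // append also accepts a string source
	fmt.Println(s, string(b)) // abc Xbcdef

	// s[0] = 'X' would not compile: strings are immutable.
}
```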

Author @heri16, Aug 6, 2021

In other words, 107 is just an optimisation on 108 (or 109).

Try searching for "StrangeList" in the Erlang docs. (Those are caused by 107).

Author @heri16, Aug 6, 2021

Another Interesting Side Note:

The Erlang standard library and modern third-party Erlang libraries can always accept (and behave the same on) either 108 (CharList) or 109 (Binary), but not always 107 (StrangeList).

We have tested this pretty extensively.

Author

Shall we just make this configurable?

Collaborator

I appreciate your effort to make this project better, but seemingly this approach differs from the way this project goes.

binary.BigEndian.PutUint32(buf[1:5], uint32(lenBinary))
-copy(buf[5:], t)
+buf[5] = 8 // 1 byte = 8 bits
+copy(buf[6:], t)

default:
v := reflect.ValueOf(t)