Getting Started
NOTE: This document is still being developed. See issue #78. If you have questions or find bugs in this documentation, please file an issue.
You can download and install msgp using the standard go toolchain. For Go 1.17 and later, do:
$ go install github.com/tinylib/msgp@latest
Or for earlier Go versions:
$ go get -u -t github.com/tinylib/msgp
msgp is made up of two parts:
- a tool that generates code, and
- a library that is used by the generated code.
It is not used the way most packages are. Instead of only importing it and calling its functions, you run the tool ahead of time to generate serialization code for the types you declare, and that generated code is compiled into your program. This is what makes msgp's optimizations possible.
The primary difference between msgp and other serialization libraries for Go (such as those found in the standard library) is that msgp doesn't perform runtime reflection. Instead, the msgp tool reads .go source files and generates code that binds methods to your existing type declarations.
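To make the contrast concrete, here is a minimal sketch that uses only the standard library. One function discovers field names at runtime with reflect, and the other is the sort of type-specific method a generator could write ahead of time. The names keysByReflection and keysGenerated are made up for illustration; neither is part of msgp.

```go
package main

import (
	"fmt"
	"reflect"
)

// Foo mirrors the example type used in this guide.
type Foo struct {
	Bar string  `msg:"bar"`
	Baz float64 `msg:"baz"`
}

// keysByReflection discovers the wire names at runtime by inspecting
// struct tags, which is roughly what reflection-based encoders do.
// (Hypothetical helper, not msgp API.)
func keysByReflection(v interface{}) []string {
	t := reflect.TypeOf(v)
	keys := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		keys = append(keys, t.Field(i).Tag.Get("msg"))
	}
	return keys
}

// keysGenerated is the kind of code a generator can emit ahead of time:
// the names are fixed at compile time, so no type inspection is needed.
// (Hypothetical method, not msgp API.)
func (Foo) keysGenerated() []string {
	return []string{"bar", "baz"}
}

func main() {
	f := Foo{Bar: "hi", Baz: 1.5}
	fmt.Println(keysByReflection(f))
	fmt.Println(f.keysGenerated())
}
```

Both calls produce the same answer, but the second one does no work beyond returning values the compiler already knows about.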
First, create a new directory in your GOPATH, and create main.go.
$ mkdir -p $GOPATH/src/msgp-demo
$ cd $GOPATH/src/msgp-demo
$ touch main.go
Then open main.go in your editor of choice and add the following:
package main

import (
	"fmt"
)

//go:generate msgp

type Foo struct {
	Bar string  `msg:"bar"`
	Baz float64 `msg:"baz"`
}

func main() {
	fmt.Println("Nothing to see here yet!")
}
(You can verify that this builds and runs with $ go build && ./msgp-demo.)
Now let's bind some methods to Foo by running go generate:
$ go generate
======== MessagePack Code Generator =======
>>> Input: "main.go"
>>> Wrote and formatted "main_gen.go"
>>> Wrote and formatted "main_gen_test.go"
$ ls
main.go main_gen.go main_gen_test.go
$ go test -v -bench .
=== RUN TestMarshalUnmarshalFoo
--- PASS: TestMarshalUnmarshalFoo (0.00s)
=== RUN TestEncodeDecodeFoo
--- PASS: TestEncodeDecodeFoo (0.00s)
PASS
BenchmarkMarshalMsgFoo-8 20000000 97.9 ns/op 32 B/op 1 allocs/op
BenchmarkAppendMsgFoo-8 30000000 41.4 ns/op 458.43 MB/s 0 B/op 0 allocs/op
BenchmarkUnmarshalFoo-8 20000000 94.7 ns/op 200.57 MB/s 0 B/op 0 allocs/op
BenchmarkEncodeFoo-8 20000000 57.2 ns/op 332.15 MB/s 0 B/op 0 allocs/op
BenchmarkDecodeFoo-8 10000000 135 ns/op 140.50 MB/s 0 B/op 0 allocs/op
ok msgp-demo 9.712s
Let's break down what happened here:
- go generate scanned each file in msgp-demo for a go:generate directive.
- //go:generate msgp was found in main.go, which caused $GOFILE to be set to main.go.
- msgp was invoked by go generate, and it parsed $GOFILE and extracted type declarations.
- msgp created main_gen.go, which contains all of the generated methods, and main_gen_test.go, which has tests and benchmarks for each generated method.
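The "parsed $GOFILE and extracted type declarations" step can be sketched with the standard library's go/parser and go/ast packages. This is only an illustration of the idea, not msgp's actual implementation: the function structNames and the embedded source string are assumptions made for the example, and this sketch only looks at struct types.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src stands in for the contents of the file named by $GOFILE.
const src = `package main

//go:generate msgp

type Foo struct {
	Bar string
	Baz float64
}

type unrelated int
`

// structNames parses Go source and returns the names of the struct
// types declared at the top level, in source order.
// (Hypothetical helper for illustration.)
func structNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.TYPE {
			continue // skip funcs, imports, vars and consts
		}
		for _, spec := range gen.Specs {
			ts, ok := spec.(*ast.TypeSpec)
			if !ok {
				continue
			}
			if _, isStruct := ts.Type.(*ast.StructType); isStruct {
				names = append(names, ts.Name.Name)
			}
		}
	}
	return names, nil
}

func main() {
	names, err := structNames(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names)
}
```

Because only a single file is parsed, none of the other files in the package need to be read or type-checked.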
The key takeaway here is that msgp works on a per-file, not a per-package basis. (You can, however, invoke the code generator on an entire directory at once by passing a directory path using the -file flag.)
There are a couple of reasons why we designed msgp to operate on files rather than on Go packages:
- Integration with build tools like make is dead simple.
- Reading one file is much faster than reading a whole directory. The msgp tool itself typically runs in less time than the go generate tool takes just to find the directive.
Our suggestion is that users put types requiring code generation in their own file (say, wiretypes.go), and put //go:generate msgp at the top. However, other workflows are possible.
Let's look at the generated code in main_gen.go:
(Note: the interfaces that the code generator implements are stable, but the code that it generates in order to implement those interfaces has changed over time in order to provide performance and stability improvements. Don't be alarmed if you see output that's different from what is listed below.)
package main

// NOTE: THIS FILE WAS PRODUCED BY THE
// MSGP CODE GENERATION TOOL (github.com/tinylib/msgp)
// DO NOT EDIT

import (
	"github.com/tinylib/msgp/msgp"
)

// DecodeMsg implements msgp.Decodable
func (z *Foo) DecodeMsg(dc *msgp.Reader) (err error) {
	var field []byte
	_ = field
	var isz uint32
	isz, err = dc.ReadMapHeader()
	if err != nil {
		return
	}
	for isz > 0 {
		isz--
		field, err = dc.ReadMapKeyPtr()
		if err != nil {
			return
		}
		switch msgp.UnsafeString(field) {
		case "bar":
			z.Bar, err = dc.ReadString()
			if err != nil {
				return
			}
		case "baz":
			z.Baz, err = dc.ReadFloat64()
			if err != nil {
				return
			}
		default:
			err = dc.Skip()
			if err != nil {
				return
			}
		}
	}
	return
}

// EncodeMsg implements msgp.Encodable
func (z Foo) EncodeMsg(en *msgp.Writer) (err error) {
	// map header, size 2
	// write "bar"
	err = en.Append(0x82, 0xa3, 0x62, 0x61, 0x72)
	if err != nil {
		return err
	}
	err = en.WriteString(z.Bar)
	if err != nil {
		return
	}
	// write "baz"
	err = en.Append(0xa3, 0x62, 0x61, 0x7a)
	if err != nil {
		return err
	}
	err = en.WriteFloat64(z.Baz)
	if err != nil {
		return
	}
	return
}

// MarshalMsg implements msgp.Marshaler
func (z Foo) MarshalMsg(b []byte) (o []byte, err error) {
	o = msgp.Require(b, z.Msgsize())
	// map header, size 2
	// string "bar"
	o = append(o, 0x82, 0xa3, 0x62, 0x61, 0x72)
	o = msgp.AppendString(o, z.Bar)
	// string "baz"
	o = append(o, 0xa3, 0x62, 0x61, 0x7a)
	o = msgp.AppendFloat64(o, z.Baz)
	return
}

// UnmarshalMsg implements msgp.Unmarshaler
func (z *Foo) UnmarshalMsg(bts []byte) (o []byte, err error) {
	var field []byte
	_ = field
	var isz uint32
	isz, bts, err = msgp.ReadMapHeaderBytes(bts)
	if err != nil {
		return
	}
	for isz > 0 {
		isz--
		field, bts, err = msgp.ReadMapKeyZC(bts)
		if err != nil {
			return
		}
		switch msgp.UnsafeString(field) {
		case "bar":
			z.Bar, bts, err = msgp.ReadStringBytes(bts)
			if err != nil {
				return
			}
		case "baz":
			z.Baz, bts, err = msgp.ReadFloat64Bytes(bts)
			if err != nil {
				return
			}
		default:
			bts, err = msgp.Skip(bts)
			if err != nil {
				return
			}
		}
	}
	o = bts
	return
}

func (z Foo) Msgsize() (s int) {
	s = 1 + 4 + msgp.StringPrefixSize + len(z.Bar) + 4 + msgp.Float64Size
	return
}
As we just saw, by default there are 5 methods implemented by the code generator:
- MarshalMsg([]byte) ([]byte, error) implements msgp.Marshaler
- UnmarshalMsg([]byte) ([]byte, error) implements msgp.Unmarshaler
- EncodeMsg(*msgp.Writer) error implements msgp.Encodable
- DecodeMsg(*msgp.Reader) error implements msgp.Decodable
- Msgsize() int implements msgp.Sizer
Each of those methods is actually an implementation of an interface defined in the msgp library. In effect, the library at github.com/tinylib/msgp/msgp contains everything we need to encode and decode MessagePack, and the code generator exists simply to write boilerplate code using that library. We could, of course, implement all of these interfaces ourselves, but that would be unnecessarily laborious and error-prone. (Plus, the code generator can perform optimizations such as pre-encoding static strings, as in the generated code above. This would be especially cumbersome to write by hand!)
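To see where pre-encoded bytes such as 0x82, 0xa3, 0x62, 0x61, 0x72 come from, here is a small sketch that rebuilds them by hand from the MessagePack format rules: a map of up to 15 entries has the one-byte header 0x80 | n, and a string of up to 31 bytes has the one-byte header 0xa0 | len followed by its raw bytes. The helper names fixMapHeader and appendFixStr are made up for this example and are not msgp functions.

```go
package main

import "fmt"

// fixMapHeader returns the one-byte MessagePack header for a map with
// n entries. The fixmap format only covers n <= 15.
// (Hypothetical helper, not msgp API.)
func fixMapHeader(n int) byte {
	return 0x80 | byte(n)
}

// appendFixStr appends s as a MessagePack fixstr: one header byte
// (0xa0 | length) followed by the raw bytes. fixstr only covers
// strings of at most 31 bytes. (Hypothetical helper, not msgp API.)
func appendFixStr(b []byte, s string) []byte {
	b = append(b, 0xa0|byte(len(s)))
	return append(b, s...)
}

func main() {
	// Map header for two fields, then the key "bar".
	prefix := appendFixStr([]byte{fixMapHeader(2)}, "bar")
	fmt.Printf("% x\n", prefix) // 82 a3 62 61 72

	// The key "baz" on its own.
	fmt.Printf("% x\n", appendFixStr(nil, "baz")) // a3 62 61 7a
}
```

Since the field names never change at runtime, the generator can work these bytes out once and emit them as literals.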
The "memory interfaces" are interfaces through which chunks of memory ([]byte, in this case) are written or read as MessagePack.
Go veterans will notice that msgp.Marshaler differs slightly from the conventional Marshaler interfaces in the standard library (json.Marshaler and friends) in that it takes a []byte as its first and only argument. The semantics of msgp.Marshaler dictate that it return a slice that is the concatenation of the input slice and the body of the object itself, and that it is allowed to use the memory between len and cap if at all possible. In practice, this allows for zero-allocation marshaling. (If you don't happen to have a slice lying around that you can use, you can always pass a nil slice, and a new slice will be allocated for you.) There is a similar set of zero-allocation APIs in the standard library's strconv package.
foo1 := Foo{ /* ... */ }
foo2 := Foo{ /* ... */ }
// data contains the body of foo1
data, _ := foo1.MarshalMsg(nil)
fmt.Printf("foo1 is encoded as %x\n", data)
// data is overwritten with the
// body of foo2. if it fits within
// the old slice, no new memory
// is allocated.
data, _ = foo2.MarshalMsg(data[:0])
fmt.Printf("foo2 is encoded as %x\n", data)
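For comparison, here is the same append-and-reuse pattern written against the strconv package, which needs nothing beyond the standard library. The function appendRecord and its record format are invented for this sketch; the point is only the shape of the API, which takes a slice and returns the extended slice.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendRecord appends a textual record to b and returns the extended
// slice, in the same style as MarshalMsg([]byte) ([]byte, error).
// (Hypothetical helper for illustration.)
func appendRecord(b []byte, name string, n int64) []byte {
	b = strconv.AppendQuote(b, name)
	b = append(b, '=')
	return strconv.AppendInt(b, n, 10)
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, reused below

	buf = appendRecord(buf[:0], "first", 1)
	fmt.Printf("%s (cap %d)\n", buf, cap(buf))

	// Truncating to buf[:0] keeps the backing array, so the second
	// record overwrites the first without a new allocation as long
	// as it fits within cap(buf).
	buf = appendRecord(buf[:0], "second", 2)
	fmt.Printf("%s (cap %d)\n", buf, cap(buf))
}
```

The capacity stays at 64 across both calls, which shows that the second record reused the memory of the first.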
As you may have already guessed, the msgp.Unmarshaler interface is simply the inverse of the msgp.Marshaler interface. The returned []byte should be a sub-slice of the argument slice pointing to the memory not yet consumed.
For example, here's a convoluted way to switch the values contained in two structs:
foo1 := Foo{ /* ... */ }
foo2 := Foo{ /* ... */ }
fmt.Printf("foo1: %v\n", foo1)
fmt.Printf("foo2: %v\n", foo2)
// Here, we append two messages
// to the same slice.
data, _ := foo1.MarshalMsg(nil)
data, _ = foo2.MarshalMsg(data)
// Now we'll just decode them
// in reverse:
data, _ = foo2.UnmarshalMsg(data)
data, _ = foo1.UnmarshalMsg(data)
// at this point, len(data) should be 0
fmt.Println("len(data) =", len(data))
fmt.Printf("foo1: %v\n", foo1)
fmt.Printf("foo2: %v\n", foo2)
Because MessagePack is self-describing, we can interleave it with other pieces of data without framing and still re-construct the original input. (Notably, the same cannot be said of a number of other popular protocols, including Protocol Buffers.)
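The following sketch shows why no framing is needed, using a hand-rolled reader for one MessagePack format (fixstr, for strings of at most 31 bytes) so that it runs with only the standard library. The helpers appendFixStr and readFixStr are assumptions made for this example, not msgp functions. Each value's header says how long it is, so the reader can hand back the unread remainder in the same way UnmarshalMsg does.

```go
package main

import (
	"errors"
	"fmt"
)

// appendFixStr appends s as a MessagePack fixstr (strings of at most
// 31 bytes). (Hypothetical helper, not msgp API.)
func appendFixStr(b []byte, s string) []byte {
	b = append(b, 0xa0|byte(len(s)))
	return append(b, s...)
}

// readFixStr decodes one fixstr from the front of b and returns the
// unread remainder. (Hypothetical helper, not msgp API.)
func readFixStr(b []byte) (s string, rest []byte, err error) {
	if len(b) == 0 || b[0]&0xe0 != 0xa0 {
		return "", b, errors.New("not a fixstr")
	}
	n := int(b[0] & 0x1f) // low five bits hold the length
	if len(b) < 1+n {
		return "", b, errors.New("short buffer")
	}
	return string(b[1 : 1+n]), b[1+n:], nil
}

func main() {
	// Two values back to back, with no length prefix or delimiter
	// between them.
	data := appendFixStr(nil, "hello")
	data = appendFixStr(data, "msgp")

	first, data, err := readFixStr(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	second, data, err := readFixStr(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(first, second, len(data))
}
```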
"Streaming interfaces" are interfaces through which MessagePack can be written to an io.Writer
or read from an io.Reader
.
msgp handles streaming a little differently than the Go standard library. The msgp.Writer and msgp.Reader types are MessagePack-aware versions of bufio.Writer and bufio.Reader, respectively.
The implementation of msgp.Encodable writes the object to the msgp.Writer. Since the buffered writer maintains its own buffer, no memory allocation is performed.
foo := Foo{ /* ... */ }
w := msgp.NewWriter(os.Stdout)
foo.EncodeMsg(w)
w.Flush()
msgp.Decodable, as you may have already guessed, is the converse of msgp.Encodable. It is the interface through which objects read themselves out of a msgp.Reader.
pr, pw := io.Pipe()
go func() {
	w := msgp.NewWriter(pw)
	fooIn := Foo{ /* ... */ }
	fmt.Printf("fooIn is %v\n", fooIn)
	fooIn.EncodeMsg(w)
	w.Flush()
}()
var fooOut Foo
fooOut.DecodeMsg(msgp.NewReader(pr))
fmt.Printf("fooOut is %v\n", fooOut)
msgp.Sizer is a helper interface used in a couple places inside the msgp library, as well as in the implementation of msgp.Marshaler. Users will typically not need to use it. Its purpose is to help estimate the right amount of memory to allocate in order to fit a particular object. (In practice, it systematically over-estimates the encoded size of the object.)
TODO: edit main.go so that it prints the raw hex of Foo along with its JSON-equivalent plaintext representation.
TODO: point to other wiki documents that document more complicated features.