Alejandro García Montoro edited this page Jul 29, 2024 · 27 revisions

NOTE: This document is still being developed. See issue #78. If you have questions or find bugs in this documentation, please file an issue.

Installation

You can download and install msgp using the standard go toolchain. For Go 1.17 and later do:

$ go install github.com/tinylib/msgp@latest

Or for earlier Go versions:

$ go get -u -t github.com/tinylib/msgp

Serialization Strategy: Tool + Library

Important notice: msgp has two parts:

  1. a tool that generates code, and
  2. a library that is used by the generated code.

It is NOT used in a conventional manner like most packages are.

msgp gets its performance from this split: you declare the types you want to serialize ahead of time, and the tool generates Go code for them that is compiled into your program.

Go experience level: intermediate to advanced

The primary difference between msgp and other serialization libraries for Go (such as those found in the standard library) is that msgp doesn't perform runtime reflection. Instead, the msgp tool reads .go source files and generates code that binds methods to your existing type declarations.

Hello World

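First, create a new directory in your GOPATH, and create main.go. (If your Go toolchain runs in module mode, which is the default since Go 1.16, you will likely also need to run go mod init msgp-demo and go get github.com/tinylib/msgp in that directory before the build and test commands below will work.)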

$ mkdir -p $GOPATH/src/msgp-demo
$ cd $GOPATH/src/msgp-demo
$ touch main.go

Then open main.go in your editor of choice and add the following:

package main

import (
    "fmt"
)

//go:generate msgp

type Foo struct {
    Bar string  `msg:"bar"`
    Baz float64 `msg:"baz"`
}

func main() {
    fmt.Println("Nothing to see here yet!")
}

(You can verify that this builds and runs with $ go build && ./msgp-demo.)

Now let's bind some methods to Foo by running go generate:

$ go generate
======== MessagePack Code Generator =======
>>> Input: "main.go"
>>> Wrote and formatted "main_gen.go"
>>> Wrote and formatted "main_gen_test.go"
$ ls
main.go			main_gen.go		main_gen_test.go
$ go test -v -bench .
=== RUN   TestMarshalUnmarshalFoo
--- PASS: TestMarshalUnmarshalFoo (0.00s)
=== RUN   TestEncodeDecodeFoo
--- PASS: TestEncodeDecodeFoo (0.00s)
PASS
BenchmarkMarshalMsgFoo-8	20000000	        97.9 ns/op	      32 B/op	       1 allocs/op
BenchmarkAppendMsgFoo-8 	30000000	        41.4 ns/op	 458.43 MB/s	       0 B/op	       0 allocs/op
BenchmarkUnmarshalFoo-8 	20000000	        94.7 ns/op	 200.57 MB/s	       0 B/op	       0 allocs/op
BenchmarkEncodeFoo-8    	20000000	        57.2 ns/op	 332.15 MB/s	       0 B/op	       0 allocs/op
BenchmarkDecodeFoo-8    	10000000	       135 ns/op	 140.50 MB/s	       0 B/op	       0 allocs/op
ok  	msgp-demo	9.712s

Let's break down what happened here:

  • go generate scanned each file in msgp-demo for a go:generate directive.
  • //go:generate msgp was found in main.go, which caused $GOFILE to be set to main.go.
  • msgp was invoked by go generate, and it parsed $GOFILE and extracted type declarations.
  • msgp created main_gen.go, which contains all of the generated methods, and main_gen_test.go, which has tests and benchmarks for each generated method.

The key takeaway here is that msgp works on a per-file basis, not a per-package basis. (You can, however, invoke the code generator on an entire directory at once by passing a directory path using the -file flag.)

There are a couple of reasons why we designed msgp to operate on files rather than on Go packages:

  • Integration with build tools like make is dead simple.
  • Reading one file is much faster than reading a whole directory. The msgp tool itself typically runs in less time than the go generate tool takes just to find the directive.

Our suggestion is that users put types requiring code generation in their own file (say, wiretypes.go), and put //go:generate msgp at the top. However, other workflows are possible.

Let's look at the generated code in main_gen.go:

(Note: the interfaces that the code generator implements are stable, but the code that it generates in order to implement those interfaces has changed over time in order to provide performance and stability improvements. Don't be alarmed if you see output that's different from what is listed below.)

package main

// NOTE: THIS FILE WAS PRODUCED BY THE
// MSGP CODE GENERATION TOOL (github.com/tinylib/msgp)
// DO NOT EDIT

import (
	"github.com/tinylib/msgp/msgp"
)

// DecodeMsg implements msgp.Decodable
func (z *Foo) DecodeMsg(dc *msgp.Reader) (err error) {
	var field []byte
	_ = field
	var isz uint32
	isz, err = dc.ReadMapHeader()
	if err != nil {
		return
	}
	for isz > 0 {
		isz--
		field, err = dc.ReadMapKeyPtr()
		if err != nil {
			return
		}
		switch msgp.UnsafeString(field) {
		case "bar":
			z.Bar, err = dc.ReadString()
			if err != nil {
				return
			}
		case "baz":
			z.Baz, err = dc.ReadFloat64()
			if err != nil {
				return
			}
		default:
			err = dc.Skip()
			if err != nil {
				return
			}
		}
	}
	return
}

// EncodeMsg implements msgp.Encodable
func (z Foo) EncodeMsg(en *msgp.Writer) (err error) {
	// map header, size 2
	// write "bar"
	err = en.Append(0x82, 0xa3, 0x62, 0x61, 0x72)
	if err != nil {
		return err
	}
	err = en.WriteString(z.Bar)
	if err != nil {
		return
	}
	// write "baz"
	err = en.Append(0xa3, 0x62, 0x61, 0x7a)
	if err != nil {
		return err
	}
	err = en.WriteFloat64(z.Baz)
	if err != nil {
		return
	}
	return
}

// MarshalMsg implements msgp.Marshaler
func (z Foo) MarshalMsg(b []byte) (o []byte, err error) {
	o = msgp.Require(b, z.Msgsize())
	// map header, size 2
	// string "bar"
	o = append(o, 0x82, 0xa3, 0x62, 0x61, 0x72)
	o = msgp.AppendString(o, z.Bar)
	// string "baz"
	o = append(o, 0xa3, 0x62, 0x61, 0x7a)
	o = msgp.AppendFloat64(o, z.Baz)
	return
}

// UnmarshalMsg implements msgp.Unmarshaler
func (z *Foo) UnmarshalMsg(bts []byte) (o []byte, err error) {
	var field []byte
	_ = field
	var isz uint32
	isz, bts, err = msgp.ReadMapHeaderBytes(bts)
	if err != nil {
		return
	}
	for isz > 0 {
		isz--
		field, bts, err = msgp.ReadMapKeyZC(bts)
		if err != nil {
			return
		}
		switch msgp.UnsafeString(field) {
		case "bar":
			z.Bar, bts, err = msgp.ReadStringBytes(bts)
			if err != nil {
				return
			}
		case "baz":
			z.Baz, bts, err = msgp.ReadFloat64Bytes(bts)
			if err != nil {
				return
			}
		default:
			bts, err = msgp.Skip(bts)
			if err != nil {
				return
			}
		}
	}
	o = bts
	return
}

func (z Foo) Msgsize() (s int) {
	s = 1 + 4 + msgp.StringPrefixSize + len(z.Bar) + 4 + msgp.Float64Size
	return
}

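As we just saw, by default the code generator implements five methods:

  • DecodeMsg, which implements msgp.Decodable
  • EncodeMsg, which implements msgp.Encodable
  • MarshalMsg, which implements msgp.Marshaler
  • UnmarshalMsg, which implements msgp.Unmarshaler
  • Msgsize, which implements msgp.Sizer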

Each of those methods is actually an implementation of an interface defined in the msgp library. In effect, the library at github.com/tinylib/msgp/msgp contains everything we need to encode and decode MessagePack, and the code generator exists simply to write boilerplate code using that library. We could, of course, implement all of these interfaces ourselves, but that would be unnecessarily laborious and error-prone. (Plus, the code generator can perform optimizations such as pre-encoding static strings, as in the example above. This would be especially cumbersome to write by hand!)
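To make the pre-encoded bytes in EncodeMsg and MarshalMsg less mysterious, here is a minimal stdlib-only sketch that rebuilds them by hand from the MessagePack "fixmap" and "fixstr" formats. The helper names (fixMapHeader, fixStr, preEncodedPrefix) are hypothetical and are not part of the msgp library; the generator's actual implementation may differ.

```go
package main

import "fmt"

// fixMapHeader returns the one-byte MessagePack "fixmap" header for a map
// with n entries. The format reserves 0x80-0x8f for maps of up to 15 entries.
// (Hypothetical helper, not part of msgp.)
func fixMapHeader(n int) byte {
	if n < 0 || n > 15 {
		panic("fixmap holds at most 15 entries")
	}
	return 0x80 | byte(n)
}

// fixStr returns the MessagePack "fixstr" encoding of s: one header byte
// (0xa0 | length) followed by the raw bytes. It only covers strings of up
// to 31 bytes. (Hypothetical helper, not part of msgp.)
func fixStr(s string) []byte {
	if len(s) > 31 {
		panic("fixstr holds at most 31 bytes")
	}
	return append([]byte{0xa0 | byte(len(s))}, s...)
}

// preEncodedPrefix builds the constant bytes that precede the first value
// of a struct encoded as a map: the map header plus the first key.
func preEncodedPrefix(fields int, firstKey string) []byte {
	return append([]byte{fixMapHeader(fields)}, fixStr(firstKey)...)
}

func main() {
	fmt.Printf("% x\n", preEncodedPrefix(2, "bar")) // prints "82 a3 62 61 72"
	fmt.Printf("% x\n", fixStr("baz"))              // prints "a3 62 61 7a"
}
```

Because the field names and the field count are known when the code is generated, these bytes never change at run time, which is why the generated methods can emit them as literals.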

Memory Interfaces

The "memory interfaces" are interfaces through which chunks of memory ([]byte, in this case) are written or read as MessagePack.

Go veterans will notice that msgp.Marshaler differs slightly from the conventional Marshaler interfaces in the standard library (json.Marshaler and friends) in that it takes a []byte as its first and only argument. The semantics of msgp.Marshaler dictate that it return a slice that is the concatenation of the input slice and the body of the object itself, and that it may use the memory between len and cap whenever possible. In practice, this allows for zero-allocation marshaling. (If you don't happen to have a slice lying around that you can use, you can always pass a nil slice, and a new slice will be allocated for you.) The standard library's strconv package offers a similar set of zero-allocation APIs in its Append functions (strconv.AppendInt and friends).

foo1 := Foo{ /* ... */ }
foo2 := Foo{ /* ... */ }

// data contains the body of foo1
data, _ := foo1.MarshalMsg(nil)

fmt.Printf("foo1 is encoded as %x\n", data)

// data is overwritten with the
// body of foo2. if it fits within
// the old slice, no new memory
// is allocated.
data, _ = foo2.MarshalMsg(data[:0])

fmt.Printf("foo2 is encoded as %x\n", data)
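The same append-style convention can be tried out with the standard library alone. The following sketch uses strconv.AppendInt as a stand-in for MarshalMsg; appendPair is a hypothetical helper written for this illustration, not an msgp function.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendPair appends "a,b" in decimal to dst and returns the extended
// slice, following the same convention as strconv.AppendInt: the result
// is the input slice plus the newly written bytes.
func appendPair(dst []byte, a, b int64) []byte {
	dst = strconv.AppendInt(dst, a, 10)
	dst = append(dst, ',')
	dst = strconv.AppendInt(dst, b, 10)
	return dst
}

func main() {
	buf := make([]byte, 0, 64) // scratch space, allocated once

	buf = appendPair(buf, 1, 2)
	fmt.Printf("%s (cap %d)\n", buf, cap(buf)) // prints "1,2 (cap 64)"

	// Re-slicing to length zero keeps the backing array, so the second
	// call overwrites the old contents instead of allocating.
	buf = appendPair(buf[:0], 30, 40)
	fmt.Printf("%s (cap %d)\n", buf, cap(buf)) // prints "30,40 (cap 64)"
}
```

Passing data[:0] to MarshalMsg in the example above plays the same role as buf[:0] here: the length is reset while the capacity is kept for reuse.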

As you may have already guessed, the msgp.Unmarshaler interface is simply the inverse of the msgp.Marshaler interface. The returned []byte should be a sub-slice of the argument slice pointing to the memory not yet consumed.

For example, here's a convoluted way to switch the values contained in two structs:

foo1 := Foo{ /* ... */ }
foo2 := Foo{ /* ... */ }

fmt.Printf("foo1: %v\n", foo1)
fmt.Printf("foo2: %v\n", foo2)

// Here, we append two messages
// to the same slice.
data, _ := foo1.MarshalMsg(nil)
data, _ = foo2.MarshalMsg(data)

// Now we'll just decode them
// in reverse:
data, _ = foo2.UnmarshalMsg(data)
data, _ = foo1.UnmarshalMsg(data)

// at this point, len(data) should be 0
fmt.Println("len(data) =", len(data))

fmt.Printf("foo1: %v\n", foo1)
fmt.Printf("foo2: %v\n", foo2)

Because MessagePack is self-describing, we can interleave it with other pieces of data without framing and still re-construct the original input. (Notably, the same cannot be said of a number of other popular protocols, including Protocol Buffers.)
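As a rough illustration of why no framing is needed, the sketch below decodes two back-to-back MessagePack strings using only the standard library. Each call consumes exactly one value and returns the unread remainder, which is the same contract UnmarshalMsg follows. readFixStr is a hypothetical helper that handles only the short "fixstr" format; it is not the msgp implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errShort is returned when the input ends before the value does.
var errShort = errors.New("short input")

// readFixStr decodes one MessagePack fixstr (header byte 0xa0-0xbf) from
// the front of b and returns the string plus the bytes not yet consumed.
func readFixStr(b []byte) (s string, rest []byte, err error) {
	if len(b) == 0 {
		return "", b, errShort
	}
	if b[0]&0xe0 != 0xa0 {
		return "", b, fmt.Errorf("byte 0x%02x is not a fixstr header", b[0])
	}
	n := int(b[0] & 0x1f) // the low five bits hold the length
	if len(b) < 1+n {
		return "", b, errShort
	}
	return string(b[1 : 1+n]), b[1+n:], nil
}

func main() {
	// Two messages back to back, with no delimiter between them.
	data := []byte{0xa2, 'h', 'i', 0xa3, 'y', 'o', 'u'}

	for len(data) > 0 {
		var s string
		var err error
		s, data, err = readFixStr(data)
		if err != nil {
			fmt.Println("decode error:", err)
			return
		}
		fmt.Printf("%q, %d bytes left\n", s, len(data))
	}
	// prints:
	// "hi", 4 bytes left
	// "you", 0 bytes left
}
```

The header byte of each value says how long the value is, so the decoder always knows where the next message begins.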

Streaming Interfaces

"Streaming interfaces" are interfaces through which MessagePack can be written to an io.Writer or read from an io.Reader.

msgp handles streaming a little differently than the Go standard library. The msgp.Writer and msgp.Reader types are MessagePack-aware versions of bufio.Writer and bufio.Reader, respectively.

The implementation of msgp.Encodable writes the object to the msgp.Writer. Since the buffered writer maintains its own buffer, no memory allocation is performed.

foo := Foo{ /* ... */ }

w := msgp.NewWriter(os.Stdout)
foo.EncodeMsg(w)
w.Flush()

msgp.Decodable, as you may have already guessed, is the converse of msgp.Encodable. It is the interface through which objects read themselves out of a msgp.Reader.

pr, pw := io.Pipe()

go func() {
    w := msgp.NewWriter(pw)
    fooIn := Foo{ /* ... */ }
    fmt.Printf("fooIn is %v\n", fooIn)
    fooIn.EncodeMsg(w)
    w.Flush()
}()

var fooOut Foo
fooOut.DecodeMsg(msgp.NewReader(pr))

fmt.Printf("fooOut is %v\n", fooOut)

Helper Methods

msgp.Sizer is a helper interface used in a couple of places inside the msgp library, as well as in the implementation of msgp.Marshaler. Users will typically not need to use it. Its purpose is to help estimate the right amount of memory to allocate in order to fit a particular object. (In practice, it systematically over-estimates the encoded size of the object.)
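To see where the over-estimate comes from, the sketch below compares the generated Msgsize formula for Foo with the number of bytes a compact encoding would need. The two constants are assumptions modelled on msgp.StringPrefixSize and msgp.Float64Size; consult the library for the authoritative values. The function names are hypothetical.

```go
package main

import "fmt"

// Assumed upper bounds, modelled on msgp.StringPrefixSize and
// msgp.Float64Size. The authoritative values live in the msgp library.
const (
	stringPrefixSize = 5 // worst-case string header (1 type byte + 4 length bytes)
	float64Size      = 9 // 1 type byte + 8 payload bytes
)

// estimate mirrors the generated Msgsize formula for Foo: 1 byte of map
// header, 4 bytes per pre-encoded key ("bar" and "baz"), plus the values.
func estimate(bar string) int {
	return 1 + 4 + stringPrefixSize + len(bar) + 4 + float64Size
}

// actual counts the bytes of a compact encoding of Foo when bar is short
// enough (at most 31 bytes) to use a one-byte fixstr header.
func actual(bar string) int {
	if len(bar) > 31 {
		panic("this sketch only models the fixstr case")
	}
	return 1 + 4 + (1 + len(bar)) + 4 + 9
}

func main() {
	bar := "hello"
	// prints "estimate 28, actual 24"
	fmt.Printf("estimate %d, actual %d\n", estimate(bar), actual(bar))
}
```

The estimate reserves the largest possible string header, so short strings leave a few bytes of slack. That is a cheap price for sizing the buffer once instead of growing it mid-encode.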

Revisiting main

TODO: edit main.go so that it prints the raw hex of Foo along with its JSON-equivalent plaintext representation.

TODO: point to other wiki documents that document more complicated features.