
Alternative data format for matrix shards storage #5

Open
5kg opened this issue Jun 11, 2015 · 26 comments

Comments

@5kg
Contributor

5kg commented Jun 11, 2015

Quote 1, 2, 3:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

For example, you have to parse a message all at once, and serialize it all at once. So if you have a message containing a 100MB blob, you can't read any part of the message unless you read in the entire 100MB and block the calling thread while it parses. Also problematic is the fact that the 100MB blob will be allocated as one gigantic flat byte array. On 64-bit systems this may be fine but on 32-bit you may have address space fragmentation issues. Finally, there is a hard message size limit at 2GB.

Options:

  • csv
  • hdf5
  • leveldb
  • lmdb

We can probably use snappy for data compression.
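
A rough sketch of what that could look like, assuming the github.com/golang/snappy package (the shard bytes below are just a placeholder):

    package main

    import (
        "fmt"

        "github.com/golang/snappy"
    )

    func main() {
        // Placeholder for a serialized matrix shard.
        shard := make([]byte, 1<<20)

        // Compress before writing to storage or sending over the wire.
        compressed := snappy.Encode(nil, shard)

        // Decompress on the read path; Decode returns an error on
        // corrupt input.
        restored, err := snappy.Decode(nil, compressed)
        if err != nil {
            panic(err)
        }
        fmt.Printf("original=%d compressed=%d restored=%d\n",
            len(shard), len(compressed), len(restored))
    }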

@hongchaodeng

Thanks @5kg for sharing. This is interesting.
Do you suggest that we shouldn't use protobuf to transfer large datasets of models? If so, can you give more details w.r.t. the other options?

@xiang90
Contributor

xiang90 commented Jun 11, 2015

@5kg

Well, I was also thinking about this when I wanted to use pb to send a 1GB snapshot.

Then I asked the designer of protobuf and the person who implemented go-protobuf.

They told me "never mind, you can do it actually". Then I benchmarked it, and everything just worked fine.

I suggest you benchmark it first.

I am not against using a compression mechanism. However, I am not quite sure why lmdb and leveldb (kv stores), csv and hdf5 (file formats), and snappy (compression) are grouped together. A little confused.

@5kg 5kg changed the title Alternative data format for matrix shards Alternative data format for matrix shards storage Jun 15, 2015
@5kg
Contributor Author

5kg commented Jun 15, 2015

Hi,

The original description is indeed confusing. Sorry for that.

I think the data format for matrix shard storage and for transfer between taskgraph nodes are two separate issues. Of course, if we find a data format that fits both, that would be great.

For data transfer, we may take a look at flatbuffers or Cap’n Proto. They both feature "access to serialized data without parsing/unpacking", which might be favorable for our usage (transferring huge float arrays). I've not used these two libraries before. We need to do some additional research and benchmarking before adopting either library. Or maybe just wait until protobuf becomes the bottleneck.

For data storage, there is a 2G hard limit on the size of a protobuf message, since they encode sizes with a 32-bit integer, I think. This limit has been biting people who ignore the "limit your message size to 1 MB" rule of thumb -- BVLC/caffe#2006 😄 .

The most commonly used data formats for storage are csv & hdf5. Some machine learning libraries like caffe also use lmdb/leveldb for data storage. Their use case for a kv store is mostly fetching independent data blobs such as images, which I think is irrelevant for bwmf.

@5kg
Contributor Author

5kg commented Jun 15, 2015

In terms of snappy, could we integrate it into taskgraph?

Not a priority for now.

@xiang90
Contributor

xiang90 commented Jun 15, 2015

For data transfer, we may take a look at flatbuffers or Cap’n Proto. They both feature "access to serialized data without parsing/unpacking"

Several things to think about:
These encoding mechanisms simply amortize the overall encoding/decoding cost across each access.

If you only access the data once, it should be slower than encoding/decoding in bulk, both intuitively and in theory.

It really depends on your data access pattern. If you can provide the common access pattern of bwmf, we can make a better decision.

For data storage, there is a 2G hard limit on the size of a protobuf message

Reference? On a 64-bit machine, I do not think there is actually a 2GB limit.

@xiang90
Contributor

xiang90 commented Jun 15, 2015

The most commonly used data formats for storage are csv & hdf5. Some machine learning libraries like caffe also use lmdb/leveldb for data storage. Their use case for a kv store is mostly fetching independent data blobs such as images, which I think is irrelevant for bwmf.

If you use flatbuffers or Cap’n Proto, one nice thing you get for free is that the data is mmap-able. Thus you do not need to save it into any kv store.
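
Roughly, on the storage side (a Unix-only sketch using syscall.Mmap; GetRootAsMatrixShard is a hypothetical accessor standing in for whatever flatbuffers would generate from a real schema):

    package main

    import (
        "log"
        "os"
        "syscall"
    )

    func main() {
        f, err := os.Open("shard.dat")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        fi, err := f.Stat()
        if err != nil {
            log.Fatal(err)
        }

        // Map the file read-only: no copy is made, and pages are
        // faulted in lazily by the OS as they are touched.
        data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
            syscall.PROT_READ, syscall.MAP_SHARED)
        if err != nil {
            log.Fatal(err)
        }
        defer syscall.Munmap(data)

        // A generated flatbuffers accessor would then read fields
        // directly out of the mapped bytes, e.g. (hypothetical):
        //   shard := GetRootAsMatrixShard(data, 0)
        //   v := shard.Values(i)
    }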

@xiang90
Contributor

xiang90 commented Jun 15, 2015

In terms of snappy, could we integrate it into taskgraph?

I think grpc will support snappy eventually. So you do not need to worry about it for communication.

@5kg
Contributor Author

5kg commented Jun 15, 2015

Reference? On a 64-bit machine, I do not think there is actually a 2GB limit.

They use int explicitly: https://github.com/google/protobuf/blob/4644f99d1af4250dec95339be6a13e149787ab33/src/google/protobuf/message_lite.cc#L243

@hongchaodeng

Could we write a test case to try the 2GB limit?

@xiang90
Contributor

xiang90 commented Jun 15, 2015

@5kg int on 32bit = 32 bit; int on 64bit = 64bit. int32=32; int64=64.

I may be wrong though... I've been using Go too much.

@hongchaodeng

That's C++.

@5kg
Contributor Author

5kg commented Jun 15, 2015

@5kg int on 32bit = 32 bit; int on 64bit = 64bit. int32=32; int64=64

For C++, sizeof(int) = 4 even on a 64-bit machine. Maybe it's only a limit for C++?

Let me write a test for it.

@hongchaodeng

Great. Thanks @5kg !

@5kg
Contributor Author

5kg commented Jun 18, 2015

@fengjingchao @xiang90
I tried to generate protobuf files larger than 2G -> https://gist.github.com/5kg/f27a32f238c376635024

Marshaling/unmarshaling seems to be OK. But I found that memory usage blew up during the process: it uses as much as 20G of memory to unmarshal a 2.8G protobuf file.
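
The core of the test is roughly the following (a sketch; the Test message is a stand-in with a single repeated double field, and the real gist may use a different schema):

    package main

    import (
        "io/ioutil"
        "log"

        "github.com/golang/protobuf/proto"
    )

    // Assumes a schema like `message Test { repeated double data = 1; }`
    // compiled with protoc; the Test type below comes from that
    // (hypothetical) generated code.
    func main() {
        // ~2.8G of payload: 350M float64 values at 8 bytes each.
        msg := &Test{Data: make([]float64, 350*1000*1000)}

        buf, err := proto.Marshal(msg)
        if err != nil {
            log.Fatal(err)
        }
        if err := ioutil.WriteFile("test.pb", buf, 0644); err != nil {
            log.Fatal(err)
        }

        // Round trip: read the file back and unmarshal it, which is
        // where the memory blow-up shows up.
        raw, err := ioutil.ReadFile("test.pb")
        if err != nil {
            log.Fatal(err)
        }
        var out Test
        if err := proto.Unmarshal(raw, &out); err != nil {
            log.Fatal(err)
        }
        log.Printf("unmarshaled %d values", len(out.Data))
    }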

@xiang90
Contributor

xiang90 commented Jun 18, 2015

@5kg Have you tried gogoproto? I think the problem is the inefficient goproto library...

@5kg
Contributor Author

5kg commented Jun 18, 2015

@xiang90 Never heard of gogoproto. Good to know.

All I need to do is regenerate test.pb.go with protoc --gofast_out=. test.proto, right? I don't see much difference.

Before:

$ /usr/bin/time -v go run unmarshal.go
    Command being timed: "go run unmarshal.go"
    User time (seconds): 28.09
    System time (seconds): 25.28
    Percent of CPU this job got: 89%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:59.73
......
    Maximum resident set size (kbytes): 20805872 # (20.8059 G)
......
    Exit status: 0

After:

zifei@wuxiaoyun-desktop:/tmp/f27a32f238c376635024$ /usr/bin/time -v go run unmarshal.go
    Command being timed: "go run unmarshal.go"
    User time (seconds): 17.43
    System time (seconds): 34.02
    Percent of CPU this job got: 90%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:56.63
......
    Maximum resident set size (kbytes): 20800400
......
    Exit status: 0

@5kg
Contributor Author

5kg commented Jun 18, 2015

I also changed import "github.com/golang/protobuf/proto" to import "github.com/gogo/protobuf/proto"

@hongchaodeng

Can you upload the code to the gist with the commands too?
Then we can investigate it along with you.


@5kg
Contributor Author

5kg commented Jun 18, 2015

@fengjingchao Sure. Please see https://gist.github.com/5kg/f27a32f238c376635024/revisions

You can clone the gist and then check out an older version.

@xiang90
Contributor

xiang90 commented Jun 18, 2015

@5kg gogoproto will generate marshal code, which is significantly better than reflection-based marshaling.

@5kg
Contributor Author

5kg commented Jun 18, 2015

No luck. It's faster, but memory usage is still the same.

zifei@wuxiaoyun-desktop:/tmp/f27a32f238c376635024$ /usr/bin/time -v go run unmarshal.go
    Command being timed: "go run unmarshal.go"
    User time (seconds): 13.53
    System time (seconds): 12.72
    Percent of CPU this job got: 75%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:34.91
    Maximum resident set size (kbytes): 20803320
    Exit status: 0

Updated gist: https://gist.github.com/5kg/f27a32f238c376635024

@xiang90
Contributor

xiang90 commented Jun 18, 2015

@5kg Weird... Can you try using Go's runtime pkg to print out the memstats?
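
Something like this (a minimal sketch; printMemStats is just a helper I made up for this thread, not part of the gist):

    package main

    import (
        "log"
        "runtime"
    )

    // printMemStats dumps the allocator counters most relevant here.
    func printMemStats(label string) {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        log.Printf("%s: HeapAlloc=%dMB TotalAlloc=%dMB Sys=%dMB NumGC=%d",
            label, m.HeapAlloc>>20, m.TotalAlloc>>20, m.Sys>>20, m.NumGC)
    }

    func main() {
        printMemStats("before unmarshal")
        // ... proto.Unmarshal(...) as in the gist ...
        printMemStats("after unmarshal")
    }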

@xiang90
Contributor

xiang90 commented Jun 18, 2015

@5kg I can help you debug this when I have time, too.

@xiang90
Contributor

xiang90 commented Jun 18, 2015

@5kg Also you can quickly scan the generated code. It should be very easy to understand.

@xiaoyunwu
Contributor

Is it because there is a map in the proto message?

