Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple database wrappers #1748

Merged
merged 4 commits into from
Jan 30, 2015
Merged

Simple database wrappers #1748

merged 4 commits into from
Jan 30, 2015

Conversation

longjon
Copy link
Contributor

@longjon longjon commented Jan 19, 2015

After discussing with @shelhamer, we'd like to merge a simplified version of #1568 very soon, before addressing #1743, and in line with the discussion at #1238. To accelerate this process, I created this pared-down version of that PR. (History has been heavily rewritten, but keeps @sguada's authorship.)

This PR is meant to instantiate the original conception of #1238, i.e., a simple DB wrapper class that abstracts LevelDB vs LMDB without switch statements, and nothing more. The remaining functionality of #1568 may be rebased for continuing consideration.

Notes:

  • The DB interface lives in caffe::db as suggested in Datum db #1568 (comment).
  • The interface is a further simplified version of that from Datum db #1568. One reads (and only reads) from Cursors (and only Cursors), and writes (and only writes) from Transactions and (only Transactions). The wrappers are uncoupled from protobuf.
  • I used boost::scoped_ptr in some instances where unique_ptr would be appropriate, but is disallowed by our avoidance of C++11.
  • Tools are not well-tested. The changes are pretty light, so hopefully nothing is broken.

@sguada
Copy link
Contributor

sguada commented Jan 19, 2015

@longjon I took a quick pass, and although the interface is cleaner, I'm concerned that the management of cursors and transactions for LMDB could create problems.

I'm not sure if the proposed way of creating a new lmdb_txn for each cursor will cause problems, specially since they are never closed.

Also creating a new transaction for each batch implies creating a new lmdb_txn that it not clearly closed.

As it is now one could try to write while READ, or try to read while NEW.

Have you done some stress test? I mean trying to train a ImageNet like model? Have you tried that you can train and test on the same lmdb?

virtual void Next() = 0;
virtual string key() = 0;
virtual string value() = 0;
virtual bool valid() = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should Valid() to align better with leveldb

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with valid since it's an accessor and we generally lowercase those, but presumably LevelDB is also following the Google C++ style guide, so I could see the argument either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case for consistency with leveldb I would vote for Valid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with lowercase valid by the accessor rule. No need to match LevelDB exactly since this is an interface over DBs.

@longjon
Copy link
Contributor Author

longjon commented Jan 19, 2015

Thanks for your apt comments @sguada, I've fixed the sloppiness as noted.

Re: writing after READ, LMDB is opened read-only, I don't immediately see a corresponding option for LevelDB. We could enforce that in the wrapper or not; in this patch I've tried to provide nothing more than the functionality in master, and master of course doesn't enforce that. For LevelDB, READ still has the semantics of "don't create if missing".

Re: reading after WRITE or NEW, that seems fine to me, just as files are usually opened read-only or read/write.

Re: stress testing, I haven't done it yet, but can do.

This was referenced Jan 21, 2015
@shelhamer
Copy link
Member

I plan to merge this shortly, so let me know if there are any objections.

@longjon @sguada I stress tested this with 8x K40s doing simultaneous CaffeNet train and test on the same LMDB at a rate of 1 iter / s each. I left a CaffeNet solver running to check for leaks or other issues and there's no problem at 30,000+ iterations.

shelhamer added a commit that referenced this pull request Jan 30, 2015
@shelhamer shelhamer merged commit 92a66e3 into BVLC:dev Jan 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants