Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshot API #47

Closed
juliangruber opened this issue May 13, 2013 · 27 comments
Closed

Snapshot API #47

juliangruber opened this issue May 13, 2013 · 27 comments
Labels
discussion Discussion enhancement New feature or request

Comments

@juliangruber
Copy link
Member

What about this API for snapshots

db#snapshot()

Create a new snapshot.

snapshot#get(key, value[, opts])

Read a single value from a snapshot.

snapshot#create{Read,Key,Value}Stream

Read multiple values from a snapshot.

snapshot#dispose()

Delete the snapshot.

Maybe also:

snapshot#{del,createWriteStream,...}

Just forwarded to db#*

Usage

For a consistend read stream this would be:

var snapshot() = db.snapshot();
snapshot.createReadStream()
  .pipe(...)
  .on('close', shapshot.dispose.bind(snapshot));
@rvagg
Copy link
Member

rvagg commented May 13, 2013

nice creative thinking here, I was thinking more in terms of a token to be applied to the root db for read operations but this looks quite nice.

@juliangruber
Copy link
Member Author

This makes it easy to do multiple operations on a snapshot but for just one operations it introduces some overhead, comparing the example from above to

db.createReadStream({ snapshot : true }).pipe(...)

@rvagg
Copy link
Member

rvagg commented May 13, 2013

Yeah, but { snapshot : true } is unnecessary, every time you create a LevelDB iterator you create a snapshot that gets discarded when you finish, so you're using snapshots even though you don't see them. Snapshots in the sense we're talking about here are for longer-lived operations, where multiple read types (get & iterator) can reference the same point-in-time.

I was thinking more:

var snapshot = db.createSnapshot()
db.createReadStream({ snapshot : snapshot }).pipe(...)
snapshot.dispose()

@juliangruber
Copy link
Member Author

Oh, true.

What I like about the syntax I proposed is that it

  • extends no options objects (which should stay as small as possible)
  • just swaps db with snapshot and all the APIs stay the same

@juliangruber
Copy link
Member Author

Bad about my syntax:

Plugins that overwrite methods need to do it in 2 places, or just use hooks.

But, I think we need more hooks libraries anyways, as only they should do db.get = function() {...}. With them in place there's no problem here.

@rvagg
Copy link
Member

rvagg commented May 13, 2013

ahh, perhaps we could do both approaches - i.e. hide the fact that it uses an options object on createReadStream() and get() to present your API. That way, plugins mostly won't need to worry since they will be routed back through the standard calls:

snapshot.createReadStream() === db.createReadStream({ snapshot: snapshot })
snapshot.get('foo', cb) === db.get('foo', { snapshot: snapshot }, cb)

We wouldn't even necessarily need to document the option.

@juliangruber
Copy link
Member Author

Nice! Let's do it like this.

Does LevelDOWN expose snapshots already?

@rvagg
Copy link
Member

rvagg commented May 13, 2013

there's a tiny bit of snapshot code in there, not much and it's not tested.

@mcollina
Copy link
Member

Are you sure we should expose the del, put, batch and createWriteStreammethods in the snapshots? The snapshot will totally ignore their changes
The following example shows the problem:

var snap = db.snapshot();
snap.put("a", "b", function() {
  snap.get("a", function(err, result) {
    // unfortunately, the key will not exists on the snapshot
  });
  db.get("a", function(err, result) {
    // but it will exists here!!
  });
});

@juliangruber
Copy link
Member Author

Yeah, that kinda sucks.

I was thinking to store new state in memory, so snap.get('a') would return a value from memory...But, no, that's going to cause problems.

On the other hand, it could be nice to do plugin(snapshot) instead of plugin(db) and it just works...Can't think of a real use case though...

The let's just do the reading methods for now.

@mcollina
Copy link
Member

If you implement it with the {snapshot: snap } option, any plugin will work by attaching itself to the main db.

@juliangruber
Copy link
Member Author

Hm true, any plugin that will need snapshots will just have to use them. I was thinking about a plugin that would read 2 values and store 1. If it didn't have to be consistent you could call it with plugin(db) but if it had do, it would be plugin(snap).

Anyways, there's no real use case for this. So,

  • expose snapshots in LevelDOWN
  • add snapshot option to all methods that read something
  • add snapshot sugar with fluent api

@kesla
Copy link
Contributor

kesla commented May 13, 2013

I've been lagging a bit behind descussion about snapshots, so I'd like to know what the real use case for snapshots are to begin with? I'd say that we're in the risk of bloating levelup if we add snapshots only because we can...

@dominictarr
Copy link
Contributor

I can't think of any uses for snapshots either, hmm, except providing a consistent view of data while you perform some computation -- but what is the point of consistent, if it's not the current data?

Anyway, on aesthetic grounds, I agree with @mcollina that snapshots shouldn't have any write methods,
also, if the db emitted a snapshot event, then plugins could listen on this, and apply themselves to the snapshot also.

But most importantly - what do we need snapshots for?

Oh, I remember one thing: I had some problems with createReadStream and then syncly putting data, and having it turn up in the read stream... i.e. a normal createReadStream should not return data that was inserted after it was created...

@rvagg
Copy link
Member

rvagg commented May 13, 2013

ref #128 for that last problem, there's some discussion there that covers some relevant ground

@mcollina
Copy link
Member

If we put the snapshosts in LevelDown, then also MemDown and Level.js should support them.
Is this feasible?

@rvagg
Copy link
Member

rvagg commented May 13, 2013

No, I'm not going to implement it in MemDOWN (unless some clever bean submits a pull request). I don't know if IndexedDB exposes a usable API for this either.

Relevant thoughts on this in this comment: Level/levelup#129 (comment)

LevelUP is becoming more of a generic API to a storage architecture so it's important to be careful with API bloat here. As of 0.9, you'll be able to directly refer to a 'db' property of your LevelUP instance which is the underlying LevelDOWN compatible instance. I'd be happy to remove approximateSize(), repair() and destroy() from LevelUP and just say that they can be accessed from LevelDOWN which can be require()d for the static functions and referred to on 'db' for approximateSize() (mydb.db.approximateSize(...)).

Same thoughts apply here, it appears that there could be something of an emerging suspicion that the snapshot API may not be "core enough" to expose here, it would be easy enough to just expose it in LevelDOWN and it'll still be usable; we'd just have to forgo @juliangruber's nice API--but that could just as easily become a plugin.

@juliangruber
Copy link
Member Author

Agree.

We could make it part of a extended abstractLevelDOWN test suite and each implementation could refer to that in the README.

@fritzy
Copy link

fritzy commented May 14, 2013

Use case: could a snapshot be used to read keys, create an atomic batch determined from those keys and use an optimistic lock? Say, if the snapshot is no longer valid for the read keys, fail the batch.

@fritzy
Copy link

fritzy commented May 14, 2013

Essentially, you could use snapshots to implement optimistic lock transactions.

I found an example of what I meant.
http://codeofrob.com/entries/writing-a-transaction-manager-on-top-of-leveldb.html

@dominictarr
Copy link
Contributor

oh, right!

Yes, that would work great!

if there was a way to check if the snapshot was invalid!

you could do this with a bloom filter also, a snapshot creates a bloom filter, but also insert any puts into the bloom filter.
if there is a collision, (because you read a key, and then it was updated) then abort the transaction.

@mirkokiefer
Copy link

Has anyone worked on getting snapshots exposed?
I think there are definitely use-cases where you need support for atomic batch reads.

@rvagg
Copy link
Member

rvagg commented Jun 27, 2013

They are partially implemented in LevelDOWN, from long ago, waiting for someone to either apply enough pressure to get it done because they have a compelling use-case or just do it themselves!

@dominictarr
Copy link
Contributor

@mirkokiefer can you describe your usecase? we are having trouble thinking of a compelling usecase for this, so if you can give us one, that would be very helpful!

@rvagg
Copy link
Member

rvagg commented Jul 1, 2013

Perhaps I should run a bountysource for this. If enough people want it and are willing to demonstrate that with $$ the I could probably crank it out. I just don't have a need for it right now and like @dominictarr said, I haven't seen any compelling use-cases that convince me that it's the next thing I should spend my limited time on.

@mirkokiefer
Copy link

@dominictarr I thought I had a usecase with having to iterate over a snapshot of entries while new entries are written. But readonly iterators do in fact already guarantee that they operate on a snapshot of the data - which is pretty cool :)

@ralphtheninja ralphtheninja reopened this Dec 18, 2018
@ralphtheninja ralphtheninja transferred this issue from Level/levelup Dec 18, 2018
@vweevers vweevers added the discussion Discussion label Jan 1, 2019
@vweevers vweevers added the enhancement New feature or request label Jan 1, 2019
vweevers added a commit to Level/abstract-level that referenced this issue Sep 25, 2022
At the moment, this is a documentation-only PR, acting as an RFC. In
particular I'd like review of my choice to use a token-based
approach (as first suggested by Rod Vagg in
Level/community#45) as opposed to a
dedicated snapshot API surface (as suggested by Julian Gruber in
Level/community#47). Main reasons for not
choosing the latter:

- It would be a third API surface. We have the main database API,
  the sublevel API which is equal to that except for implementation-
  specific additional methods, and would now add another API which
  only has read methods (get, iterator, etc, but not put and batch).
  If the API was reusable, meaning you could pass around a snapshot
  as if it was a regular database (like you can with sublevels) then
  I'd be cool with it. But for that to happen, we'd have to
  implement transactions as well, which I consider to be out of
  scope although transactions are in fact a use case of snapshots.
- It would have a higher complexity once you factor in sublevels.
  I.e. to make `db.sublevel().snapshot().get(key)` read from the
  snapshot but also prefix the given key. By instead doing
  `db.sublevel().get(key, { snapshot })`, the sublevel can just
  forward that snapshot option to its parent database.
- Furthermore, with the token approach, you can pass a snapshot to
  multiple sublevels, which cleanly solves the main use case of
  retrieving data from an index (I included an example in the docs).

I renamed the existing snapshot mechanism to "implicit snapshots"
and attempted to clarify the behavior of those as well.

If accepted, some related issues can be closed, because this PR:

- Includes (a more complete write-up than)
  Level/classic-level#40.
- Mentions Level/leveldown#796 (which
  should be moved rather than closed).
- Answers Level/classic-level#28.
- Solves the main use case as described in
  Level/leveldown#486.
- Effectively completes Level/awesome#19.
@vweevers
Copy link
Member

Closing in favor of the approach outlined in #45. Tracked in #118.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

9 participants