Add hooks and deprecate batch, put & del events #45

Merged
merged 15 commits into from
Nov 5, 2022

vweevers
Member

@vweevers vweevers commented Oct 30, 2022

Adds postopen, prewrite and newsub hooks that allow userland "hook functions" to customize behavior of the database. See README for details. A quick example:

db.hooks.prewrite.add(function (op, batch) {
  if (op.type === 'put') {
    batch.add({
      type: 'put',
      key: op.value.foo,
      value: op.key,
      sublevel: fooIndex
    })
  }
})

More generally, this is a move towards "renewed modularity". Our ecosystem is old and many modules no longer work because they had no choice but to monkeypatch database methods, whose signatures have since changed.

So in addition to hooks, this:

  • Introduces a new write event that is emitted on db.batch(), db.put() and db.del() and has richer data: userland options, encoded data, keyEncoding and valueEncoding. The batch, put and del events are now deprecated and will be removed in a future version. Related to Level/level#222 ("Live streaming values from a level").
  • Restores support of userland options on batch operations. In particular, options passed to db.batch(ops, options) are copied to each operation, so that code like db.batch(ops, { ttl: 123 }) applies a default userland ttl option to all ops.
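
To picture the second bullet, here is a minimal sketch of copying batch-level options onto operations. The helper name `withDefaults` is hypothetical and this is not the abstract-level implementation; it also assumes that an option set on an individual operation takes precedence over the batch-level default.

```js
// Hypothetical helper, not part of abstract-level: copy batch-level
// options onto each operation. Assumption: per-operation values win.
function withDefaults (ops, options) {
  return ops.map(op => ({ ...options, ...op }))
}

const opsWithTtl = withDefaults([
  { type: 'put', key: 'a', value: '1' },
  { type: 'put', key: 'b', value: '2', ttl: 456 }, // keeps its own ttl
  { type: 'del', key: 'c' }
], { ttl: 123 })

console.log(opsWithTtl.map(op => op.ttl)) // [ 123, 456, 123 ]
```

A hook function or write event listener could then read `op.ttl` from each operation without caring whether it was set per operation or per batch.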

No breaking changes, yet. Using hooks means opting in to new behaviors (like the new write event) and disabling some old behaviors (like the deprecated events). Later on we can make those the default behavior, regardless of whether hooks are used.

TODO:

  • Benchmark
  • Write tests
  • Documentation for batch argument of prewrite hook function
  • Canary-test memory-level
  • Canary-test classic-level

Closes Level/community#44.

@vweevers vweevers added the semver-minor label (New features that are backward compatible) Oct 30, 2022
@vweevers
Member Author

Initial batch benchmarks (on memory-level) look good. If you're not using hooks or events, the hooks branch of abstract-level is faster than main. If you are using events, db.batch() with a write event listener is 3-4% slower than db.batch() with a batch event listener (on either branch). Which is fair; the write event has more data.

@vweevers
Member Author

vweevers commented Nov 1, 2022

db.put() performance is good too (after 75c75e2). The hooks branch is faster than the main branch if no events are used. As expected, it becomes slower when you use events or prehooks. In the table below, events.put=1 means the benchmark had one listener for the put event. Similarly, hooks.prewrite=1 means one prewrite hook function, and hooks.prewrite=100 means it did:

for (let i = 0; i < 100; i++) {
  db.hooks.prewrite.add(function () {})
}
$ level-bench plot put
benchmark put on memory-level@1.0.0 win32 x64
node@16.9.1 n=1M concurrency=4 valueSize=100B keys=random values=random

1  memory-level#hooks                      36588 ops/s ±8.51%  fastest
2  memory-level#main                       35837 ops/s ±8.12%   +1.70%
3  memory-level#hooks  hooks.prewrite=1    35286 ops/s ±6.51%   +1.75%
4  memory-level#main   events.put=1        35508 ops/s ±7.47%   +2.01%
5  memory-level#hooks  events.write=1      34692 ops/s ±6.83%   +3.69%
6  memory-level#hooks  hooks.prewrite=100  34302 ops/s ±8.28%   +6.05%

@vweevers
Member Author

vweevers commented Nov 1, 2022

In classic-level, adding a prewrite hook function has a bigger effect. That is not a blocker for this PR, but we may want to look into optimizing batches at some point.

$ level-bench plot put
benchmark put on classic-level@1.2.0 win32 x64
node@16.9.1 n=1M concurrency=4 valueSize=100B keys=random values=random

1  classic-level#main                     30548 ops/s ±7.59%  fastest
2  classic-level#hooks                    30424 ops/s ±7.32%   +0.16%
3  classic-level#hooks  hooks.prewrite=1  28070 ops/s ±7.55%   +8.08%

@vweevers
Member Author

vweevers commented Nov 3, 2022

There's one remaining issue to fix (or not). If you do:

const data = db.sublevel('data')
const users = data.sublevel('users')

data.on('write', function (ops) {
  const wrongKey = ops[0].key
})

data.batch().del('alice', { sublevel: users })

Then the wrongKey emitted by the data sublevel is !data!!users!alice rather than !users!alice. This is a result of how sublevels work in general and I don't yet have a solution.
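
The two keys can be reproduced with a toy model of sublevel prefixing. The function names below are illustrative only, and the sketch assumes the default `!` separator seen in the keys above; it does not use abstract-level itself.

```js
// Toy model of sublevel key prefixing; names are illustrative.
function prefixOf (path) {
  return path.map(name => '!' + name + '!').join('')
}

// Key as stored in the root database for a sublevel at `path`.
function absoluteKey (path, key) {
  return prefixOf(path) + key
}

// Key as it should look from an ancestor sublevel at `ancestorPath`.
function relativeKey (ancestorPath, path, key) {
  const abs = absoluteKey(path, key)
  const base = prefixOf(ancestorPath)
  if (!abs.startsWith(base)) throw new Error('Not a descendant')
  return abs.slice(base.length)
}

const emittedKey = absoluteKey(['data', 'users'], 'alice')
const expectedKey = relativeKey(['data'], ['data', 'users'], 'alice')

console.log(emittedKey)  // !data!!users!alice
console.log(expectedKey) // !users!alice
```

In other words, the listener on data currently receives the key as the root database sees it, not the key relative to data.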

@vweevers vweevers marked this pull request as ready for review November 3, 2022 19:52
@vweevers vweevers changed the title from "Add hooks" to "Add hooks and deprecate batch, put & del events" Nov 3, 2022
@vweevers
Member Author

vweevers commented Nov 4, 2022

I have a solution and a PoC implementation, but it'll hurt performance for nested sublevels. Given users = db.sublevel('data').sublevel('users'), instead of users forwarding its operations directly to db, it'll forward to the data sublevel which in turn forwards to db. I.e. users.batch([]) calls data.batch([]) which calls db.batch([]).

I have to benchmark that and see what tweaks can be made, but even if performance is significantly worse (and I think it will be) it might be worth it. Because it benefits both events and hooks: users.batch([]) would trigger the prewrite hook of users, then of data, then of db. Same for the write event. So, no matter what kind of database you have (sublevel or not, nested or not) it works the same. Which should benefit modularity.
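
A toy model of that forwarding, to show the order in which prewrite hooks would run. This is not the PoC and not abstract-level code; `ToyLevel` and its fields are made up for illustration, and hook functions here only receive the operation.

```js
// Toy model only: each level runs its own prewrite hooks, prefixes the
// keys with its name, then forwards the batch to its parent.
class ToyLevel {
  constructor (name, parent = null) {
    this.name = name
    this.parent = parent
    this.prewrite = [] // hook functions: (op) => void
    this.store = new Map() // only used by the root
  }

  sublevel (name) {
    return new ToyLevel(name, this)
  }

  batch (ops) {
    for (const op of ops) {
      for (const fn of this.prewrite) fn(op)
    }

    if (this.parent !== null) {
      // Forward to the parent, which sees keys relative to itself
      const forwarded = ops.map(op => ({ ...op, key: `!${this.name}!${op.key}` }))
      return this.parent.batch(forwarded)
    }

    // Root: actually write
    for (const op of ops) {
      if (op.type === 'put') this.store.set(op.key, op.value)
      else this.store.delete(op.key)
    }
  }
}

const db = new ToyLevel('db')
const data = db.sublevel('data')
const users = data.sublevel('users')
const seen = []

for (const level of [db, data, users]) {
  level.prewrite.push(op => seen.push(`${level.name}:${op.key}`))
}

users.batch([{ type: 'put', key: 'alice', value: '1' }])
console.log(seen)
// [ 'users:alice', 'data:!users!alice', 'db:!data!!users!alice' ]
```

Each level sees the key relative to itself, which is the behavior that the wrongKey example was missing.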

It would make this PR semver-major, for two reasons:

  1. The change in performance. We could add support of db.sublevel(['data', 'users']) to give users the ability to negate it.
  2. We'd no longer support passing a sublevel option that isn't a descendant. So given a = db.sublevel('a') and b = db.sublevel('b') you can no longer do b.batch().del('1', { sublevel: a }).

In which case, I might just remove the batch, put and del events rather than deprecating them.

@vweevers
Member Author

vweevers commented Nov 4, 2022

@juliangruber @ralphtheninja any objections? The batch, put and del events are 10 years old, so I don't want to take removing them lightly.

@vweevers
Member Author

vweevers commented Nov 4, 2022

> I have to benchmark that

Results for db.put() on memory-level, comparing no sublevel, 1 sublevel (!foo!), 2 sublevels (!foo!!bar!) and more:

$ level-bench plot put
benchmark put on memory-level@1.0.0 win32 x64
node@16.9.1 n=1M concurrency=4 valueSize=100B keys=random values=random

1   memory-level#hooks                             35781 ops/s ±7.11%  fastest
2   memory-level#main                              35794 ops/s ±8.69%   +1.43%
3   memory-level#hooks  !foo!                      30701 ops/s ±7.04%  +14.15%
4   memory-level#main   !foo!                      30183 ops/s ±5.55%  +14.40%
5   memory-level#main   !foo!!bar!                 29386 ops/s ±5.66%  +16.75%
6   memory-level#hooks  !foo!!bar!                 28865 ops/s ±5.07%  +17.76%
7   memory-level#main   !foo!!bar!!baz!            28813 ops/s ±5.46%  +18.22%
8   memory-level#main   !foo!!bar!!baz!!bam!       28720 ops/s ±5.48%  +18.49%
9   memory-level#main   !foo!!bar!!baz!!bam!!boo!  27970 ops/s ±5.96%  +20.98%
10  memory-level#hooks  !foo!!bar!!baz!            27796 ops/s ±5.59%  +21.20%
11  memory-level#hooks  !foo!!bar!!baz!!bam!       26067 ops/s ±6.95%  +27.04%
12  memory-level#hooks  !foo!!bar!!baz!!bam!!boo!  25255 ops/s ±5.33%  +28.22%

At a depth of 2 sublevels, the difference between main and hooks is negligible. But it gets progressively worse the deeper you go. That's partially explained by having to copy longer prefixes, but main has a more consistent performance between sublevel depths.

With support of db.sublevel(['foo', 'bar']) (marked by flat below) we can recover:

1   memory-level#hooks                                   35781 ops/s ±7.11%  fastest
2   memory-level#main                                    35794 ops/s ±8.69%   +1.43%
3   memory-level#hooks  !foo!                            30701 ops/s ±7.04%  +14.15%
4   memory-level#main   !foo!                            30183 ops/s ±5.55%  +14.40%
5   memory-level#main   !foo!!bar!                       29386 ops/s ±5.66%  +16.75%
6   memory-level#hooks  !foo!!bar!                       28865 ops/s ±5.07%  +17.76%
7   memory-level#main   !foo!!bar!!baz!                  28813 ops/s ±5.46%  +18.22%
8   memory-level#hooks  !foo!!bar!!baz!!bam!!boo!  flat  29008 ops/s ±6.28%  +18.30%
9   memory-level#main   !foo!!bar!!baz!!bam!             28720 ops/s ±5.48%  +18.49%
10  memory-level#main   !foo!!bar!!baz!!bam!!boo!        27970 ops/s ±5.96%  +20.98%
11  memory-level#hooks  !foo!!bar!!baz!                  27796 ops/s ±5.59%  +21.20%
12  memory-level#hooks  !foo!!bar!!baz!!bam!             26067 ops/s ±6.95%  +27.04%
13  memory-level#hooks  !foo!!bar!!baz!!bam!!boo!        25255 ops/s ±5.33%  +28.22%
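
One way to read the flat result is in terms of forwarding hops. The sketch below is a toy model, not the abstract-level implementation: `makeLevel` is a made-up helper, and it assumes that a sublevel created from an array of names computes its combined prefix once and forwards directly to the root.

```js
// Toy model: count how many batch() calls a write needs to reach the root.
function makeLevel (names, parent) {
  // Accepts a single name or an array of names (the "flat" form)
  const prefix = [].concat(names).map(n => `!${n}!`).join('')

  return {
    prefix,
    parent,
    batch (keys, hops = 1) {
      const prefixed = keys.map(k => prefix + k)
      return parent ? parent.batch(prefixed, hops + 1) : { keys: prefixed, hops }
    }
  }
}

const root = makeLevel([], null)
const nested = makeLevel('bar', makeLevel('foo', root))
const flat = makeLevel(['foo', 'bar'], root)

const nestedResult = nested.batch(['a'])
const flatResult = flat.batch(['a'])

console.log(nestedResult) // { keys: [ '!foo!!bar!a' ], hops: 3 }
console.log(flatResult)   // { keys: [ '!foo!!bar!a' ], hops: 2 }
```

Both forms end up writing the same key; the flat form skips the intermediate level, along with any hooks or events an intermediate level would have had.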

@vweevers vweevers removed the semver-minor label (New features that are backward compatible) Nov 5, 2022
@vweevers vweevers added the semver-major label (Changes that break backward compatibility) Nov 5, 2022
@vweevers vweevers added this to the 2.0.0 milestone Nov 5, 2022
@vweevers
Member Author

vweevers commented Nov 5, 2022

I've created a v2 branch as new base for this PR. Allows me to move ahead with items of #47.

@vweevers vweevers changed the base branch from main to v2 November 5, 2022 20:18
@vweevers vweevers merged commit ad3a813 into v2 Nov 5, 2022
@vweevers vweevers deleted the hooks branch November 5, 2022 20:25
@juliangruber
Member

Sorry @vweevers, I don't have time to review this ATM :|

@vweevers
Member Author

vweevers commented Nov 7, 2022

OK! Thanks for letting me know. FWIW I'll probably mark the hooks API as experimental (before v2 goes out the door) so there will be room for changes.

vweevers added a commit that referenced this pull request Nov 10, 2022
vweevers added a commit that referenced this pull request Nov 10, 2022
vweevers added a commit to Level/bench that referenced this pull request Nov 19, 2022
vweevers added a commit that referenced this pull request Jan 27, 2024
- Adds postopen, prewrite and newsub hooks that allow userland "hook
  functions" to customize behavior of the database.
- Introduces a new `write` event that is emitted on `db.batch()`,
  `db.put()` and `db.del()`.
- Restores support of userland options on batch operations.
- Changes nested sublevels to be actually nested. Comes with two
  low-impact breaking changes, described in `UPGRADING.md`.
vweevers added a commit that referenced this pull request Jan 27, 2024
vweevers added a commit that referenced this pull request Feb 3, 2024
Successfully merging this pull request may close these issues.

plugin extension points: merge level-hooks / level-sublevel
2 participants