Skip to content
This repository has been archived by the owner on Mar 22, 2023. It is now read-only.

batch_put #708

Open
igchor opened this issue Jun 8, 2020 · 2 comments
Open

batch_put #708

igchor opened this issue Jun 8, 2020 · 2 comments

Comments

@igchor
Copy link
Contributor

igchor commented Jun 8, 2020

FEAT: batch_put

Rationale

With current API it's impossible to insert/update several values in one atomic action. I propose adding batch_putmethod which would solve this.

Description

Currently, we do not have any functions which return a consistent view of the database (more than one value). This means that it's probably ok for batch_put not to be fully serializable.

API Changes

New function batch_put(std::vector<std::pair<string_view, string_view>> batch) (or similar).

Implementation details

It's quite easy to implement update if all elements already exist (we just take all locks and do updates in a single tx). The problem is that inserting new nodes cannot be done in a tx (internally, in csmap and cmap we sometimes drop the locks and require them - it would be impossible to do that while locks are added to tx).

For both cmap and csmap we could solve this by algorithm similar to this one (based on custom redo log):

batch_put(batch) {
    for (e : batch)
        redo_log.add(e);

    std::vector<element_lock> locks;
    for (e : batch) {
        auto i = map.insert(e);
        lock.push_back(i->lock);
    }

    for (r : redo)
        r.perform_operation();

    free(redo);
    // drop locks
}

On recovery, if there were multiple batch_puts in progress we can redo them in any order. Since we had lock on all elements, no reader could have seen updated value. This implementation would guarantee full isolation (serializability).

Alternatively, we could use undo logs (it would save us allocations in redo logs, and we can optimize undo log buffer by preallocating it):

batch_put(batch) {
    std::vector<element_lock> locks;
    std::vector<iterators> its;

    for (e : batch) {
        auto i = map.insert({e, newly_inserted});
        // on recovery we would have to free the newly_inserted
        // nodes, or add logic to other functions (get/put etc.)
        lock.push_back(i->lock);
        its.push_back(i);
    }

    tx {
        for (i,e : its, batch) {
            i->value.assign(e);
            i->newly_inserted = false;
        }
    }

    // drop locks
}

Meta

@szyrom szyrom changed the title FEAT: batch_update FEAT: batch_put Jun 8, 2020
@szyrom
Copy link

szyrom commented Jun 8, 2020

decided to go for second approach (undo log with a usage of persistent TLS (from libpmemobj-cpp))

@igchor
Copy link
Contributor Author

igchor commented Jun 8, 2020

One problem with the second approach and TLS usage is that we need to add this TLS entry and insert new map node atomically. (maybe we could use some callback on map insert?)

@lukaszstolarczuk lukaszstolarczuk changed the title FEAT: batch_put batch_put Jul 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants