
Snapshot Optimization (https://github.com/speedb-io/speedb/issues/35) #547

Merged
merged 3 commits into from
Aug 2, 2023

@ofriedma ofriedma commented Jun 8, 2023

Motivation:
The most important information in a snapshot is its sequence number, which lets compaction decide whether a key-value pair can be deleted.
The sequence number changes whenever the db is modified.
This feature lets the db take a snapshot without acquiring the db mutex when the last snapshot has the same sequence number as the new one.
In a transactional db with mostly read operations, it should improve performance in multithreaded environments, as well as in other scenarios that take a large number of snapshots with mostly read operations.

This feature requires the folly library to be installed.

To cache snapshots, there is last_snapshot_ (a folly::atomic_shared_ptr, a lock-free atomic shared pointer) that points to the most recently created snapshot.
On every GetSnapshotImpl call (where snapshots are created), the function checks whether the current sequence number differs from that of last_snapshot_.
If it does not, it creates a new snapshot that holds a reference to last_snapshot_ (the reference is cached_snapshot), so this sequence number remains in SnapshotList (the list of snapshots in the system, used by compaction to see which snapshots are in use) as long as any snapshot still holds it.
If the sequence number has changed or last_snapshot_ is nullptr, it creates the snapshot while holding db_mutex.
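
A minimal sketch of the fast path described above, under stated assumptions: `Snap` and `SnapshotCache` are hypothetical names, `std::atomic_load`/`std::atomic_store` on `std::shared_ptr` stand in for `folly::atomic_shared_ptr`, and a plain `std::mutex` stands in for the db mutex. It is not the code from this PR; it only illustrates reusing the cached snapshot when the sequence number has not changed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical, simplified stand-in for SnapshotImpl.
struct Snap {
  uint64_t seq;
  bool is_write_conflict_boundary;
};

class SnapshotCache {
 public:
  // Returns a snapshot for `current_seq`, sharing the cached one if it matches.
  std::shared_ptr<Snap> Get(uint64_t current_seq, bool boundary) {
    // Fast path: one atomic load of the cached pointer, no mutex.
    std::shared_ptr<Snap> cached = std::atomic_load(&last_snapshot_);
    if (Matches(cached, current_seq, boundary)) {
      return cached;
    }
    // Slow path: the sequence number moved on, or nothing is cached yet.
    std::lock_guard<std::mutex> guard(db_mutex_);
    // Re-check under the lock: another thread may have refreshed the cache.
    cached = std::atomic_load(&last_snapshot_);
    if (Matches(cached, current_seq, boundary)) {
      return cached;
    }
    slow_path_calls_.fetch_add(1);
    auto fresh = std::make_shared<Snap>(Snap{current_seq, boundary});
    std::atomic_store(&last_snapshot_, fresh);
    return fresh;
  }

  // How many times a new snapshot had to be created under the mutex.
  int slow_path_calls() const { return slow_path_calls_.load(); }

 private:
  static bool Matches(const std::shared_ptr<Snap>& s, uint64_t seq,
                      bool boundary) {
    return s && s->seq == seq && s->is_write_conflict_boundary == boundary;
  }

  std::shared_ptr<Snap> last_snapshot_;  // accessed only via atomic_load/store
  std::mutex db_mutex_;                  // stand-in for the db mutex
  std::atomic<int> slow_path_calls_{0};
};
```

In this sketch the cache itself keeps the last snapshot alive; the PR instead tracks references so that the entry can leave SnapshotList once nobody holds it.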

For ReleaseSnapshotImpl (deleting a snapshot):
We unref last_snapshot_ (using compare_exchange_weak), and if the refcount becomes 0, it calls Deleter, removes the snapshot entirely from the SnapshotList, and continues while taking the db mutex.
If there are still references, it returns without removing the snapshot from the SnapshotList and without taking the db mutex.
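
The release behaviour can be sketched with a `std::shared_ptr` custom deleter, assuming hypothetical names `Snap` and `SnapList` and a plain `std::mutex` in place of the db mutex. Only the release that drops the last reference takes the lock and unlinks the entry; earlier releases touch nothing but the reference count. This is an illustration of the idea, not the PR's `Deleter`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <mutex>

// Hypothetical, simplified stand-in for SnapshotImpl.
struct Snap {
  uint64_t seq;
};

class SnapList {
 public:
  // Adds an entry and returns a handle; the entry is unlinked automatically
  // when the last copy of the handle is released.
  std::shared_ptr<Snap> Add(uint64_t seq) {
    Snap* node = nullptr;
    {
      std::lock_guard<std::mutex> guard(mu_);
      items_.push_back(Snap{seq});
      node = &items_.back();  // std::list keeps element addresses stable
    }
    // The deleter runs exactly once, when the reference count reaches zero.
    return std::shared_ptr<Snap>(node, [this](Snap* s) { Remove(s); });
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mu_);
    return items_.size();
  }

 private:
  void Remove(Snap* s) {
    std::lock_guard<std::mutex> guard(mu_);  // taken only on the last release
    items_.remove_if([s](const Snap& x) { return &x == s; });
  }

  std::mutex mu_;
  std::list<Snap> items_;
};
```

The `SnapList` object must outlive every handle it returns, since the deleter captures `this`.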

@ofriedma ofriedma requested a review from ayulas June 8, 2023 19:04
@Yuval-Ariel Yuval-Ariel linked an issue Jun 11, 2023 that may be closed by this pull request
@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch 2 times, most recently from 620f851 to edcfc11 Compare June 14, 2023 11:56
@ofriedma

@ayulas Ready for review, thank you.

@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch from edcfc11 to e23dbcf Compare June 14, 2023 12:02
@ofriedma ofriedma added the enhancement New feature or request label Jun 14, 2023
@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch from e23dbcf to 849576a Compare June 14, 2023 13:30
@@ -3711,7 +3711,57 @@ Status DBImpl::GetTimestampedSnapshots(
timestamped_snapshots_.GetSnapshots(ts_lb, ts_ub, timestamped_snapshots);
return Status::OK();
}
#ifdef ROCKSDB_SNAP_OPTIMIZATION
SnapshotImpl* DBImpl::GetSnapshotImpl(bool is_write_conflict_boundary,

You need to check whether snapshots are supported first, and exit immediately if they are not.

bool lock) {
int64_t unix_time = 0;
immutable_db_options_.clock->GetCurrentTime(&unix_time)
.PermitUncheckedError(); // Ignore error

Read the last published seq once into a local variable. You use it in several places, and they all need to refer to the same point in time. If you read it again, as in the code above, you may get a different value!
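
A small sketch of the point made in this comment, assuming a hypothetical atomic counter `last_published_seq` in place of `GetLastPublishedSequence()`: the value is loaded once, and every later decision in the function uses that same local copy.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the value behind GetLastPublishedSequence().
std::atomic<uint64_t> last_published_seq{0};

struct Decision {
  uint64_t seq_used;  // the single value every check was made against
  bool reuse_cached;  // whether the cached snapshot can be shared
};

Decision Decide(uint64_t cached_seq) {
  // One load; concurrent writers cannot change `seq` underneath us.
  const uint64_t seq = last_published_seq.load(std::memory_order_acquire);
  return Decision{seq, cached_seq == seq};
}
```

Loading the counter a second time inside `Decide` could return a newer value, so the comparison and the sequence number stored in the snapshot might disagree.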

immutable_db_options_.clock->GetCurrentTime(&unix_time)
.PermitUncheckedError(); // Ignore error
SnapshotImpl* s = new SnapshotImpl;
std::shared_ptr<SnapshotImpl> snap = snapshots_.last_snapshot_;

Please mention here that last_snapshot_ is atomic.

.PermitUncheckedError(); // Ignore error
SnapshotImpl* s = new SnapshotImpl;
std::shared_ptr<SnapshotImpl> snap = snapshots_.last_snapshot_;
if (snap && snap->GetSequenceNumber() == GetLastPublishedSequence() &&

As mentioned, GetLastPublishedSequence should be called once in this function, and the local value used from then on.

SnapshotImpl* s = new SnapshotImpl;
std::shared_ptr<SnapshotImpl> snap = snapshots_.last_snapshot_;
if (snap && snap->GetSequenceNumber() == GetLastPublishedSequence() &&
snap->is_write_conflict_boundary_ == is_write_conflict_boundary) {

The whole point of this feature is to exploit the fact that you don't need to take a new snapshot when it isn't needed (no new writes, and is_write_conflict_boundary is the same), so there is no point in creating a new snap. Use the existing one and rely on the shared_ptr facilities (it won't be deleted as long as someone holds it, e.g. while it is still in the snap list). This is the approach I would choose, handling the snap time by adding a snap time range (time_t is a long, so it can be atomic). But even if you do create a new snap with new, you shouldn't increase the snap list count, because it is just a reference to a snap in the snap list.

mutex_.AssertHeld();
}
// returns null if the underlying memtable does not support snapshot.
if (!is_snapshot_supported_) {

This should be checked first, before doing anything else.

auto snapshot_seq = GetLastPublishedSequence();
SnapshotImpl* snapshot =
snapshots_.New(s, snapshot_seq, unix_time, is_write_conflict_boundary);
SnapshotImpl* user_snapshot = new SnapshotImpl;

Why? You already have s. Why create a new object?
You need to add to the snap structure the notion of whether it is a holder or the original snap, that's it.
You should use s and not create a new one.
Also, why not move most of this logic into the snapshots_.New function?
We want to hold the db mutex as little as we can.
First, use the list mutex.
Then, the list should be a set ordered by seq number: many threads can take snaps while writes are in progress, and because you update the snap list under a lock held for a short scope, you might insert a snap with an older seq after a newer one. Then update the last-snap object of the list by taking the last element of the set (still under the list mutex), storing it with an atomic store, and increasing the snap count to 1.
When you create a new snap, after checking whether snap creation is supported at all, load the last snap atomically and do your checks on it.
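
One possible reading of the ordered-set suggestion, sketched with hypothetical names (`OrderedSnapshots`, `newest`, `oldest`) and bare sequence numbers instead of snapshot objects. Insertions happen under a small list mutex, the container stays sorted even if threads arrive out of order, and the newest entry is republished with an atomic store so readers do not need the mutex. A `std::multiset` is used here only so that equal sequence numbers are allowed; that choice is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

class OrderedSnapshots {
 public:
  void Insert(uint64_t seq) {
    std::lock_guard<std::mutex> guard(mu_);  // list mutex, not the db mutex
    seqs_.insert(seq);                       // kept sorted by sequence number
    // Publish the largest element, still under the list mutex.
    newest_.store(*seqs_.rbegin(), std::memory_order_release);
  }

  // Lock-free read of the most recent sequence number (0 if none yet).
  uint64_t newest() const { return newest_.load(std::memory_order_acquire); }

  // Oldest live sequence number (0 if the set is empty).
  uint64_t oldest() {
    std::lock_guard<std::mutex> guard(mu_);
    return seqs_.empty() ? 0 : *seqs_.begin();
  }

 private:
  std::mutex mu_;
  std::multiset<uint64_t> seqs_;
  std::atomic<uint64_t> newest_{0};
};
```

With this layout, a thread that inserts an older sequence number after a newer one cannot overwrite the published "last" value with a stale one.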

// inplace_update_support enabled.
return;
}
snapshots_.count_.fetch_sub(1);

No. If it's a reference snap, it wasn't part of the list.

InstrumentedMutexLock l(&mutex_);
std::scoped_lock<std::mutex> snaplock(snapshots_.lock);
snapshots_.deleteitem = false;
uint64_t oldest_snapshot;

This is exactly the same code as in the original scope; you copied it.
I don't think we should split it at all.

};

class SnapshotList {
public:
#ifdef ROCKSDB_SNAP_OPTIMIZATION

I don't think you should split the code. The snap list should have a mutex and a last snap; the delete item is not needed, and the code can be much, much simpler. As it stands, some parts have holes.

@ofriedma

ofriedma commented Jul 4, 2023

Summary of the proposed changes:

Fix Release and Get so that we always delete when needed.
Separate count and logical count.
Unix time and sequence number should be set together.
Move the ifdef inside ReleaseSnapshot.
Try the same for GetSnapshot as well.
Add a new RefSnapshot to db_impl.
Move the snapshot_list new-node logic into New.
Add a re-add parameter to New.

@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch 2 times, most recently from 16ce5d0 to 523a6fa Compare July 6, 2023 17:52
@ofriedma

ofriedma commented Jul 6, 2023

@ayulas Please review

@ofriedma ofriedma requested a review from ayulas July 6, 2023 17:53
@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch 2 times, most recently from 2e2af9d to b6d6717 Compare July 6, 2023 22:47
@@ -3714,10 +3714,16 @@ Status DBImpl::GetTimestampedSnapshots(

SnapshotImpl* DBImpl::GetSnapshotImpl(bool is_write_conflict_boundary,
bool lock) {
if (!is_snapshot_supported_) {
return nullptr;
}
int64_t unix_time = 0;

Not needed. The unix time should be taken inside RefSnapshot and New.

SnapshotImpl* s = new SnapshotImpl;
#ifdef SPEEDB_SNAP_OPTIMIZATION
if (RefSnapshot(unix_time, is_write_conflict_boundary, s)) {

As mentioned, unix_time shouldn't be an input parameter.

@@ -3714,10 +3714,16 @@ Status DBImpl::GetTimestampedSnapshots(

SnapshotImpl* DBImpl::GetSnapshotImpl(bool is_write_conflict_boundary,
bool lock) {
if (!is_snapshot_supported_) {

You didn't remove this check at line 3734.

@@ -3732,6 +3738,8 @@ SnapshotImpl* DBImpl::GetSnapshotImpl(bool is_write_conflict_boundary,
delete s;
return nullptr;
}
immutable_db_options_.clock->GetCurrentTime(&unix_time)

You should change the signature of the New function, and of the Ref function, so that the seq and time are set inside.

SnapshotImpl* New(SnapshotImpl* s, SequenceNumber seq, uint64_t unix_time,
bool is_write_conflict_boundary,
uint64_t ts = std::numeric_limits<uint64_t>::max()) {
#ifdef SPEEDB_SNAP_OPTIMIZATION

Why split it? Put the logic in the non-folly build as well. The folly ifdef should be minimal; the logic should not be under the ifdef.

struct Deleter {
inline void operator()(SnapshotImpl* snap) const;
};
int64_t unix_time_;

Why did you declare it again under the ifdef?
You need it to be public; change it for all builds.

SnapshotImpl* New(SnapshotImpl* s, SequenceNumber seq, uint64_t unix_time,
bool is_write_conflict_boundary,
uint64_t ts = std::numeric_limits<uint64_t>::max()) {
#ifdef SPEEDB_SNAP_OPTIMIZATION
std::unique_lock<std::mutex> l(lock);

No need for the lock; the db mutex already protects you.

@@ -81,9 +104,29 @@ class SnapshotList {
return list_.prev_;
}

#ifdef SPEEDB_SNAP_OPTIMIZATION
SnapshotImpl* NewSnapRef(SequenceNumber seq, uint64_t unix_time,

I don't understand why you allocated another snapshot. It's not right.

// changing count_ always under snapshot_list mutex
uint64_t count_;
uint64_t logical_count() const { return logical_count_; }
std::atomic_uint64_t logical_count_;

This should be outside the ifdef as well.

@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch 2 times, most recently from d30509c to 3b7a705 Compare July 19, 2023 11:50
@ofriedma

ready for review

@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch from 3b7a705 to 6be2ecf Compare July 19, 2023 12:46
return nullptr;
}
SnapshotImpl* snapshot = snapshots_.RefSnapshot(
is_write_conflict_boundary, GetLastPublishedSequence(), GetSystemClock());

GetSystemClock shouldn't be a parameter. It should be called inside; that's what we discussed.
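
A sketch of the signature shape requested here, with hypothetical names (`Snap`, `SnapFactory`) and `std::chrono::system_clock` standing in for the db's clock: the caller passes only the sequence number, and the function reads the time itself and writes it straight into the snapshot's field.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical, simplified stand-in for SnapshotImpl.
struct Snap {
  uint64_t seq = 0;
  int64_t unix_time = 0;
};

class SnapFactory {
 public:
  // No time or clock parameter: the timestamp is taken inside.
  Snap New(uint64_t seq) const {
    Snap s;
    s.seq = seq;
    s.unix_time = NowSeconds();  // set directly, no intermediate local
    return s;
  }

 private:
  static int64_t NowSeconds() {
    using namespace std::chrono;
    return duration_cast<seconds>(system_clock::now().time_since_epoch())
        .count();
  }
};
```

Keeping the clock read inside the function means the sequence number and the timestamp are set in one place, which is what the earlier "should be together" item asks for.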

auto snapshot_seq = GetLastPublishedSequence();
SnapshotImpl* snapshot =
snapshots_.New(s, snapshot_seq, unix_time, is_write_conflict_boundary);
snapshot = snapshots_.New(GetLastPublishedSequence(), GetSystemClock(),

Again, don't pass GetSystemClock.

@@ -3747,7 +3742,6 @@ DBImpl::CreateTimestampedSnapshotImpl(SequenceNumber snapshot_seq, uint64_t ts,
int64_t unix_time = 0;

You don't need that.

@@ -3747,7 +3742,6 @@ DBImpl::CreateTimestampedSnapshotImpl(SequenceNumber snapshot_seq, uint64_t ts,
int64_t unix_time = 0;
immutable_db_options_.clock->GetCurrentTime(&unix_time)

As mentioned above.

return std::make_pair(status, ret);
} else {
status.PermitUncheckedError();
}
}

SnapshotImpl* snapshot =
snapshots_.New(s, snapshot_seq, unix_time,

The function signature shouldn't contain unix_time.

shared_snap->is_write_conflict_boundary_ ==
is_write_conflict_boundary) {
SnapshotImpl* snapshot = new SnapshotImpl;
int64_t unix_time;

Why set it twice? Set it directly in snapshot->unix_time_.

return false;
}

SnapshotImpl* New(SequenceNumber seq, SystemClock* clock,

Don't pass the time in the function signature.

{
InstrumentedMutexLock l(&mutex_);
snapshots_.Delete(casted_s);
std::unique_lock<std::mutex> snapshotlist_lock(snapshots_.lock);
casted_s = nullptr;

Why do you need that? You deleted s in the Delete function; do it inside Delete.

@@ -81,9 +121,54 @@ class SnapshotList {
return list_.prev_;
}

SnapshotImpl* New(SnapshotImpl* s, SequenceNumber seq, uint64_t unix_time,
#ifdef SPEEDB_SNAP_OPTIMIZATION
SnapshotImpl* NewSnapRef(SnapshotImpl* s, SequenceNumber seq,

NewSnapRef gets s as an input, so why pass all the parameters again in the function signature? You need to create the user_snap by passing the base s to the constructor.

@@ -94,15 +179,25 @@ class SnapshotList {
s->prev_->next_ = s;
s->next_->prev_ = s;
count_++;
#ifdef SPEEDB_SNAP_OPTIMIZATION
l.unlock();
return NewSnapRef(s, seq, unix_time, is_write_conflict_boundary, ts);

As mentioned, pass s and ts only.

@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch from 6be2ecf to 94ed6c4 Compare July 20, 2023 14:57
@ofriedma ofriedma requested a review from ayulas July 20, 2023 14:58
@ofriedma

@ayulas fixed

immutable_db_options_.clock->GetCurrentTime(&unix_time)
.PermitUncheckedError(); // Ignore error
SnapshotImpl* s = new SnapshotImpl;

const bool need_update_seq = (snapshot_seq != kMaxSequenceNumber);

if (lock) {

Do this after the is_snapshot_supported_ check. You will avoid a lock/unlock, like in GetSnapshotImpl.

@ofriedma

Done
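
The ordering discussed in this thread can be sketched as follows, assuming a hypothetical `Db` class whose support flag is fixed at construction (in the real code the flag's synchronization may differ, so treat the unlocked read as an assumption). The early return happens before the mutex is touched, so unsupported configurations never pay for a lock/unlock pair.

```cpp
#include <cassert>
#include <mutex>

class Db {
 public:
  explicit Db(bool supported) : is_snapshot_supported_(supported) {}

  // Returns true if a snapshot was (notionally) created.
  bool CreateSnapshot() {
    if (!is_snapshot_supported_) {
      return false;  // bail out first: no lock taken on this path
    }
    std::lock_guard<std::mutex> guard(mutex_);
    ++lock_acquisitions_;  // counted only to make the behaviour observable
    return true;
  }

  int lock_acquisitions() const { return lock_acquisitions_; }

 private:
  const bool is_snapshot_supported_;  // assumed immutable in this sketch
  std::mutex mutex_;
  int lock_acquisitions_ = 0;
};
```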

@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch from 94ed6c4 to c64d847 Compare July 23, 2023 12:35
@ofriedma ofriedma requested a review from ayulas July 23, 2023 12:36
@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch from c64d847 to bcb828a Compare July 23, 2023 13:12
@ofriedma ofriedma force-pushed the wip-ofriedma-snap-optimization branch from bcb828a to c1ddc54 Compare July 23, 2023 16:09
@Yuval-Ariel Yuval-Ariel merged commit ff1b086 into main Aug 2, 2023
@Yuval-Ariel Yuval-Ariel deleted the wip-ofriedma-snap-optimization branch August 2, 2023 11:05
udi-speedb pushed a commit that referenced this pull request Nov 22, 2023
* Snapshot Optimization (#35)

udi-speedb pushed a commit that referenced this pull request Dec 5, 2023
* Snapshot Optimization (#35)
