-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip expired object while using DBWithTtl #403
Conversation
Does the partial / hybrid support (Get / MultiGet strict ttl while iterator / merge non-strict ttl) make sense for users? |
3ddffa1
to
1b0add1
Compare
Please add the issue's / pr's number at the end of the commit message |
ASSERT_TRUE(values[0].empty()); | ||
CloseTtl(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as above, but also:
- Please also test the mutli-get versions that accept a single cf and multiple cf-s
I suggest adding another cf, but have the 2 cf-s have different ttl-s and verifying that each cf complies with its ttl
88cc532
to
9997ce2
Compare
b81f150
to
55f9d83
Compare
Please add the PR's number (#403) at the end of the commit message. |
statuses = db->MultiGet(ropts, keys, &values); | ||
for (const auto& i : statuses) { | ||
if (s.IsNotFound()) { | ||
std::cout << "Key has been expired as expected by MultiGet" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to cout the key's contents that has been expired
statuses[i] = StripTS(&(*values)[i]); | ||
} else { | ||
(*values)[i] = ""; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest factoring out the duplicate code (used in both Get and here) and add a helper function that accepts value and a cf handle and returns whether or not that value is stale. Basically an IsStale() but with a cf handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment
@@ -514,7 +527,25 @@ std::vector<Status> DBWithTTLImpl::MultiGet( | |||
if (!statuses[i].ok()) { | |||
continue; | |||
} | |||
statuses[i] = StripTS(&(*values)[i]); | |||
if (options.skip_expired_data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move line 521 (is_stale = false
) to here, right above the if (options.skip_expired_data)
. This is where it is actually used and make it clearer that it is re set / calculated (correctly) in each loop iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment
Options cfopts = GetOptions(column_family); | ||
auto filter = std::static_pointer_cast<TtlCompactionFilterFactory>( | ||
cfopts.compaction_filter_factory); | ||
int32_t ttl = filter->GetTtl(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also suggesting adding a second helper function (that may be used by the 1st I have suggested above) to extract_ttl(cf handle, cf opts)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment
@ofriedma - I have submitted all of my comments except for the ttl_test.cc file |
55f9d83
to
1c2ac34
Compare
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support of for Merge operation. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394 Pull Request: #403
1c2ac34
to
2556636
Compare
@ofriedma - Please add the issue's number to the commit message. Thanks |
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support of for Merge operation. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394 Pull Request: #403
2556636
to
3e7a325
Compare
What about ttl support in db_stress? |
tools/db_bench_tool.cc
Outdated
DEFINE_int32(ttl, -1, | ||
"Opens the db with this ttl value if this is not -1. " | ||
"Carefully specify a large value such that verifications on " | ||
"deleted values don't fail"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If ttl <= 0 the ttl is useless. So, better document this and also validate (below, where applicable)
- How will the user know how to "Carefully specify a large value"? What is a large value?
- You should sanitize (validate) these 2 flags. For example, it doesn't make sense to skip expired data if ttl <= 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I wanted to keep the same documentation as the db_stress (this is a copy of line by line).
I will try to change it to be better understandable here and I will sanitise those.
tools/db_bench_tool.cc
Outdated
} | ||
db->cfh.resize(1); | ||
db->num_created = 1; | ||
db->num_hot = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why are you not supporting ttl / db with ttl when there are multiple column families?
- Why no ttl db when in read only?
- If it is indeed the case, you should add it to the sanitization part. IMO invalid combinations should cause db_bench to abort and not run as the user has explicitly requested for specific behaviour and db_bench is not testing the desired behaviour.
- And, if this code stays, you should refactor somehow the duplicate code in this case and the one below (the else block)
{(int32_t)FLAGS_ttl_seconds}); | ||
if (s.ok()) { | ||
db->db = dbttl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Open fails (!s.ok()) we exit below, but, for better readability IMO, I would execute the 3 lines below only if s.ok().
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support of for Merge operation. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: speedb-io#394 Pull Request: speedb-io#403
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support of for Merge operation. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394 Pull Request: #403
fe0473c
to
a836ba2
Compare
@ofriedma - When you have made up your mind about the ttl support / changes in db_stress please update (following our discussion yesterday). Thanks |
@udi-speedb about your assumption, it was correct, small ttl is causing failure of the test, which means the test has not been suitable to handle expiration and deletion by db_stress, probably never, I'm getting not found error whether with or without skip_expired_data when use small ttl (--ttl=2). @udi-speedb what do you think we should do in that case? |
Thanks for checking this.
|
The pull request is ready to be reviewed, will squash it once the review will over. Thank you |
If db_stress will fail if a user is really trying to run it with a reasonable ttl, why not just not touch it at all? |
@@ -682,6 +682,12 @@ void StressTest::OperateDb(ThreadState* thread) { | |||
read_opts.async_io = FLAGS_async_io; | |||
read_opts.adaptive_readahead = FLAGS_adaptive_readahead; | |||
read_opts.readahead_size = FLAGS_readahead_size; | |||
if (gflags::GetCommandLineFlagInfoOrDie("ttl").is_default && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does changing the ttl to 0 (non default) mean with respect to the skip_expired_data flag and should it be included in this check and error reporting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add a check for the ttl with skip_expired_data to be greater than 0
@@ -514,7 +527,25 @@ std::vector<Status> DBWithTTLImpl::MultiGet( | |||
if (!statuses[i].ok()) { | |||
continue; | |||
} | |||
statuses[i] = StripTS(&(*values)[i]); | |||
if (options.skip_expired_data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment
statuses[i] = StripTS(&(*values)[i]); | ||
} else { | ||
(*values)[i] = ""; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment
Options cfopts = GetOptions(column_family); | ||
auto filter = std::static_pointer_cast<TtlCompactionFilterFactory>( | ||
cfopts.compaction_filter_factory); | ||
int32_t ttl = filter->GetTtl(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment
utilities/ttl/db_ttl_impl.h
Outdated
@@ -86,6 +86,10 @@ class DBWithTTLImpl : public DBWithTTL { | |||
|
|||
static bool IsStale(const Slice& value, int32_t ttl, SystemClock* clock); | |||
|
|||
// IsStale for strict ttl | |||
bool IsStale(const Slice& value, ColumnFamilyHandle* column_family, | |||
const ReadOptions& options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2 IsStale methods are confusing IMO. I suggest having a common IsStale that accepts a time to which to compare, and does all the common functionality and 2 IsStaleXXX where XXX indicates in what manner is it stale.
std::string value; | ||
ASSERT_OK(db_ttl_->Put(WriteOptions(), key_1, put_value)); | ||
ropts.snapshot = db_ttl_->GetSnapshot(); | ||
ASSERT_NE(ropts.snapshot, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using the gtest:
using ::testing::NotNull;
ASSERT_THAT(ptr, NotNull());
Which also supports smart pointers and provides descriptive errors
env_->Sleep(2); | ||
ASSERT_OK(db_ttl_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); | ||
// TODO prevent from ttl compaction to delete keys referenced by snapshot | ||
// ASSERT_OK(db_ttl_->Get(ropts, key_1, &value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this test currently a placeholder for a time in which we properly support ttl compaction?
If it is, it might be preferable to have it disables and definitely document that before the test
: opts.env->GetSystemClock().get(); | ||
return IsStale(value, ttl, clock); | ||
} else { | ||
int64_t snapshot_time = options.snapshot->GetUnixTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ttl <= 0 you may just avoid doing anything.
And, is it in any way evn-specific (Unix Time)? Should it run correctly on Windows?
Options opts = GetOptions(column_family); | ||
auto filter = std::static_pointer_cast<TtlCompactionFilterFactory>( | ||
opts.compaction_filter_factory); | ||
int32_t ttl = filter->GetTtl(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be preferable to just return at the top when ttl <= 0 and save any time spent below in any case
std::string key_1 = "a"; | ||
std::string put_value = "1"; | ||
ASSERT_OK(DBWithTTL::Open(options, dbname_, &db_ttl_, 1)); | ||
auto ropts = ReadOptions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is skip_expired_data left false on purpose?
Do you expect the test's results to depend on this?
Regardless, worth documenting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@udi-speedb
I've added documentation and comments, change the name of the strict ttl added a condition to ttl <= 0.
The last thing I need to test is the windows environment will run make check on top of Windows tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Please check with @maxb-io the status of building on Windows. He is very familiar with this subject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran ttl_test on windows, ran without issues
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support of for Merge operation. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394, #417 Pull Request: #403
9460180
to
119b336
Compare
@udi-speedb for a reason i can't understand the squash caused your review to be cancelled, the code hasn't changed. |
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support of for Merge operation. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394, #417 Pull Request: #403
119b336
to
a42ec6c
Compare
@udi-speedb please approve |
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support for Merge operations. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394, #417 Pull Request: #403
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support for Merge operations. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394, #417 Pull Request: #403
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support for Merge operations. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394, #417 Pull Request: #403
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support for Merge operations. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394, #417 Pull Request: #403
Currently even if objects have been expired we still return them until compaction delete those objects. This commit adds a skip_expired_data to the ReadOptions, if this option is true, it Get, MultiGet and iterator will not return objects that have been expired by the configured ttl. Currently this feature does not include support for Merge operations. For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys will not be returned. Resolves: #394, #417 Pull Request: #403
Currently even if objects have been expired we still return them until
compaction delete those objects.
This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.
Resolves: #394 , #417