Implement the new encoding with 64bit size and expire time in milliseconds #1342

Merged: 27 commits, Mar 23, 2023

Conversation

PragmaTwice
Member

@PragmaTwice PragmaTwice commented Mar 21, 2023

It closes #1033.

See the proposal in #1033. After this change, old data remains readable and writable, but all new data will be written using the new encoding.

NOTE: although the encoding now supports 64-bit sizes, some places in the code may still use int32_t/int, so it may not work correctly when the number of items exceeds the 32-bit range. We can fix those after this PR.

After some discussion, we created a new build option ENABLE_NEW_ENCODING (currently OFF by default); users need to turn this option on to use this feature.
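
The NOTE above concerns code paths that still hold sizes in 32-bit variables. A minimal sketch of that failure mode follows; `NarrowSize`, `FitsIn32Bits`, and `DemoNarrowing` are hypothetical names for illustration and are not part of Kvrocks. The unsigned case is used because its wrap-around behavior is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: models a 64-bit element count passing through a
// 32-bit variable somewhere in the code path. Only the low 32 bits survive.
inline uint32_t NarrowSize(uint64_t size) {
  return static_cast<uint32_t>(size);
}

// Hypothetical guard: true when the size survives the narrowing unchanged.
inline bool FitsIn32Bits(uint64_t size) {
  return size <= std::numeric_limits<uint32_t>::max();
}

inline void DemoNarrowing() {
  const uint64_t big_size = 100000000000ULL;  // needs more than 32 bits
  assert(!FitsIn32Bits(big_size));
  assert(NarrowSize(big_size) != big_size);   // the count is silently changed
  assert(NarrowSize(1000000000ULL) == 1000000000ULL);  // small counts are fine
}
```

A check like `FitsIn32Bits` is one way to locate such call sites before widening them; whether Kvrocks does this is not stated in this thread.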

@PragmaTwice PragmaTwice requested review from git-hulk and torwig March 21, 2023 10:24
@torwig
Contributor

torwig commented Mar 21, 2023

@PragmaTwice Wow! Awesome feature, especially 64-bit size.

@aleksraiden
Contributor

Awesome, I really need this feature!!

@PragmaTwice PragmaTwice marked this pull request as ready for review March 22, 2023 10:01
@PragmaTwice
Member Author

It is ready for review now.

// element size of the key-value
uint64_t size;

explicit Metadata(RedisType type, bool generate_version = true, bool use_64bit_common_field = true);
Contributor

Is there a case when we need to pass explicitly the use_64bit_common_field parameter?

Member Author

No, we do not need to change this parameter. Old data will be decoded normally, and new data will be encoded using the new encoding (64-bit).

Member Author

@PragmaTwice PragmaTwice Mar 22, 2023

But I am wondering whether we should set the parameter to false to allow users to roll back to previous kvrocks versions (i.e. users cannot roll back, e.g. to 2.3.0, once they have written new data to the database with the new encoding, although old data will still be handled correctly). Do you have any ideas?

Contributor

Yes, I also had the same idea.
So, maybe we can set 64-bit encoding in the constructor by default, and it will be overwritten by the Decode method in case of decoding from the 'old' format?

Member Author

@PragmaTwice PragmaTwice Mar 22, 2023

> maybe we can set 64-bit encoding in the constructor by default, and it will be overwritten by the Decode method in case of decoding from the 'old' format?

Yes, you are right.
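
A minimal sketch of the behavior agreed on above: the constructor defaults to the 64-bit encoding, and Decode replaces that choice with whatever the stored bytes say. `SketchMetadata`, the mask values, and the type constants are assumptions for illustration; the real definitions live in the Kvrocks source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed bit layout for this sketch only.
constexpr uint8_t kSketchTypeMask = 0x0f;
constexpr uint8_t kSketch64BitMask = 0x80;
// Assumed type codes for this sketch only.
constexpr uint8_t kSketchNone = 0;
constexpr uint8_t kSketchString = 1;

struct SketchMetadata {
  uint8_t flags;

  // Newly constructed objects default to the 64-bit encoding.
  explicit SketchMetadata(uint8_t type, bool use_64bit = true)
      : flags(static_cast<uint8_t>((use_64bit ? kSketch64BitMask : 0) |
                                   (kSketchTypeMask & type))) {}

  bool Is64BitEncoded() const { return (flags & kSketch64BitMask) != 0; }
  uint8_t Type() const { return static_cast<uint8_t>(flags & kSketchTypeMask); }

  // Only the flags byte is written in this sketch.
  void Encode(std::string *dst) const { dst->push_back(static_cast<char>(flags)); }

  // Decode takes the flags from the stored bytes, overriding the
  // constructor's choice of encoding and type.
  bool Decode(const std::string &src) {
    if (src.empty()) return false;
    flags = static_cast<uint8_t>(src[0]);
    return true;
  }
};

inline void DemoDecodeOverridesDefault() {
  SketchMetadata old_md(kSketchString, /*use_64bit=*/false);
  std::string bytes;
  old_md.Encode(&bytes);

  SketchMetadata new_md(kSketchNone);  // defaults to 64-bit
  assert(new_md.Is64BitEncoded());
  assert(new_md.Decode(bytes));
  assert(!new_md.Is64BitEncoded());    // the stored flags win
  assert(new_md.Type() == kSketchString);
}
```

Under this design the explicit constructor argument only matters for objects that are encoded without being decoded first, which is why it remains useful for backward-compatibility tests.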

Contributor

I just thought that the use_64bit_common_field can be useful in unit tests to test backward compatibility, for example.

Member Author

> I just thought that the use_64bit_common_field can be useful in unit tests to test backward compatibility, for example.

Good idea.

Co-authored-by: hulk <hulk.website@gmail.com>
@PragmaTwice PragmaTwice added the feature (type: new feature), major decision (requires project management committee consensus), and A-build (area: build) labels Mar 23, 2023
git-hulk
git-hulk previously approved these changes Mar 23, 2023
@git-hulk git-hulk requested a review from torwig March 23, 2023 05:57
@torwig
Contributor

torwig commented Mar 23, 2023

@PragmaTwice Hi, I'm trying to add a few unit tests, and the following one is failing:

TEST(Metadata, MetadataDecodingBackwardCompatibleSimpleKey) {
  auto expire_at = Util::GetTimeStamp() + 10;
  Metadata md_old(kRedisString, true, false);
  EXPECT_FALSE(md_old.Is64BitEncoded());
  md_old.expire = expire_at;
  std::string encoded_bytes;
  md_old.Encode(&encoded_bytes);

  Metadata md_new(kRedisNone, false, true); // decoding existing metadata with 64-bit feature activated
  md_new.Decode(encoded_bytes);
  EXPECT_FALSE(md_new.Is64BitEncoded());
  EXPECT_EQ(md_new.Type(), kRedisString);
  EXPECT_EQ(md_new.expire, expire_at*1000);
}

With the following error:

Expected equality of these values:
  md_new.expire
    Which is: 1679553000
  expire_at*1000
    Which is: 1679553143000

I'm just wondering whether there is an error in the logic or whether my test is somehow incorrect (typo/misunderstanding/etc.) :)
Could you please have a look at it?

@PragmaTwice
Member Author

PragmaTwice commented Mar 23, 2023

Hi @torwig, thanks for your testing.

In this PR, metadata.expire is now always in milliseconds (whether it is in 64-bit mode or not). It is converted to seconds only when we encode it into raw data in non-64-bit mode.

This design aims to keep the code simple. If metadata.expire could be in either milliseconds or seconds, it would be a mess.
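
A minimal sketch of this rule, assuming the old format stores a 4-byte seconds value and the new format an 8-byte milliseconds value. The helper names (`PutFixedBE`, `GetFixedBE`, `EncodeExpire`, `DecodeExpire`) and the big-endian byte order are assumptions for illustration, not the Kvrocks implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append the low `width` bytes of `value`, most significant byte first.
inline void PutFixedBE(std::string *dst, uint64_t value, int width) {
  for (int shift = (width - 1) * 8; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((value >> shift) & 0xff));
  }
}

// Read `width` bytes from the front of `src`; at() throws if src is too short.
inline uint64_t GetFixedBE(const std::string &src, int width) {
  uint64_t value = 0;
  for (int i = 0; i < width; ++i) {
    value = (value << 8) | static_cast<uint8_t>(src.at(static_cast<size_t>(i)));
  }
  return value;
}

// In memory the expire time is always in milliseconds. The unit changes
// only at the encoding boundary, and only for the old (non-64-bit) format.
inline std::string EncodeExpire(uint64_t expire_ms, bool is_64bit) {
  std::string out;
  if (is_64bit) {
    PutFixedBE(&out, expire_ms, 8);          // milliseconds, 8 bytes
  } else {
    PutFixedBE(&out, expire_ms / 1000, 4);   // seconds, 4 bytes
  }
  return out;
}

inline uint64_t DecodeExpire(const std::string &raw, bool is_64bit) {
  return is_64bit ? GetFixedBE(raw, 8) : GetFixedBE(raw, 4) * 1000;
}

inline void DemoExpireRoundTrip() {
  const uint64_t expire_ms = 1679553143000ULL;
  assert(DecodeExpire(EncodeExpire(expire_ms, true), true) == expire_ms);
  assert(DecodeExpire(EncodeExpire(expire_ms, false), false) == expire_ms);
}
```

Note that the old format cannot represent sub-second precision, so a millisecond value that is not a multiple of 1000 is rounded down to the second on a round trip through it.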

PragmaTwice and others added 2 commits March 23, 2023 15:07
Co-authored-by: Yaroslav <torwigua@gmail.com>
@torwig
Contributor

torwig commented Mar 23, 2023

@PragmaTwice I see.
But suppose I use a new version of Kvrocks and I have existing data whose metadata contains the expiration time in seconds. When I call Decode on it, I expect that the expiration time won't change; the seconds will just be converted to milliseconds. Am I right?

@PragmaTwice
Member Author

> @PragmaTwice I see. But suppose I use a new version of Kvrocks and I have existing data whose metadata contains the expiration time in seconds. When I call Decode on it, I expect that the expiration time won't change; the seconds will just be converted to milliseconds. Am I right?

Sure. The logic is here:

https://github.com/apache/incubator-kvrocks/blob/11f2a80c1fa1fd17e8fa4a2c600a8e34eea1e3e0/src/storage/redis_metadata.cc#L264

@torwig
Contributor

torwig commented Mar 23, 2023

@PragmaTwice So there is a mistake in my test?

@PragmaTwice
Member Author

PragmaTwice commented Mar 23, 2023

> @PragmaTwice So there is a mistake in my test?

The correct version is like this:

TEST(Metadata, MetadataDecodingBackwardCompatibleSimpleKey) {
  auto expire_at = (Util::GetTimeStamp() + 10) * 1000;
  Metadata md_old(kRedisString, true, false);
  EXPECT_FALSE(md_old.Is64BitEncoded());
  md_old.expire = expire_at;
  std::string encoded_bytes;
  md_old.Encode(&encoded_bytes);

  Metadata md_new(kRedisNone, false, true); // decoding existing metadata with 64-bit feature activated
  md_new.Decode(encoded_bytes);
  EXPECT_FALSE(md_new.Is64BitEncoded());
  EXPECT_EQ(md_new.Type(), kRedisString);
  EXPECT_EQ(md_new.expire, expire_at);
}

You need to use milliseconds for metadata.expire.

@torwig
Contributor

torwig commented Mar 23, 2023

@PragmaTwice I intentionally use seconds and explicit false as the last parameter of a constructor to model the current encoding of metadata. And then I'm decoding that metadata with the new version by passing explicit true as the last parameter to the constructor.

@PragmaTwice
Member Author

PragmaTwice commented Mar 23, 2023

> I intentionally use seconds and explicit false as the last parameter of a constructor to model the current encoding of metadata. And then I'm decoding that metadata with the new version by passing explicit true as the last parameter to the constructor.

metadata.expire can no longer be in seconds. It must be in milliseconds; when it is encoded to raw data with use_64bit_common_field=false, it is converted to seconds.

Your test is added to the codebase, thanks! 558bece

@PragmaTwice PragmaTwice requested a review from git-hulk March 23, 2023 07:57
Co-authored-by: Yaroslav <torwigua@gmail.com>
@torwig
Contributor

torwig commented Mar 23, 2023

@PragmaTwice For testing the new features, I have the following tests; feel free to add them:

TEST(Metadata, Metadata64bitExpiration) {
  auto expire_at = Util::GetTimeStampMS() + 1000;
  Metadata md_src(kRedisString, true, true);
  EXPECT_TRUE(md_src.Is64BitEncoded());
  md_src.expire = expire_at;
  std::string encoded_bytes;
  md_src.Encode(&encoded_bytes);

  Metadata md_decoded(kRedisNone, false, true);
  md_decoded.Decode(encoded_bytes);
  EXPECT_TRUE(md_decoded.Is64BitEncoded());
  EXPECT_EQ(md_decoded.Type(), kRedisString);
  EXPECT_EQ(md_decoded.expire, expire_at);
}

TEST(Metadata, Metadata64bitSize) {
  uint64_t big_size = 100000000000;
  Metadata md_src(kRedisHash, true, true);
  EXPECT_TRUE(md_src.Is64BitEncoded());
  md_src.size = big_size;
  std::string encoded_bytes;
  md_src.Encode(&encoded_bytes);

  Metadata md_decoded(kRedisNone, false, true);
  md_decoded.Decode(encoded_bytes);
  EXPECT_TRUE(md_decoded.Is64BitEncoded());
  EXPECT_EQ(md_decoded.Type(), kRedisHash);
  EXPECT_EQ(md_decoded.size, big_size);
}

To test backward compatibility, I have the following two tests, but they are failing:

TEST(Metadata, MetadataDecodingBackwardCompatibleSimpleKey) {
  auto expire_at = Util::GetTimeStamp() + 10; // <-- assign seconds here
  Metadata md_old(kRedisString, true, false);
  EXPECT_FALSE(md_old.Is64BitEncoded());
  md_old.expire = expire_at;
  std::string encoded_bytes;
  md_old.Encode(&encoded_bytes);

  Metadata md_new(kRedisNone, false, true);
  md_new.Decode(encoded_bytes);
  EXPECT_FALSE(md_new.Is64BitEncoded());
  EXPECT_EQ(md_new.Type(), kRedisString);
  EXPECT_EQ(md_new.expire, expire_at*1000);
}

TEST(Metadata, MetadataDecodingBackwardCompatibleComplexKey) {
  auto expire_at = Util::GetTimeStamp() + 100; // <-- assign seconds here
  uint32_t size = 1000000000;
  Metadata md_old(kRedisHash, true, false);
  EXPECT_FALSE(md_old.Is64BitEncoded());
  md_old.expire = expire_at;
  md_old.size = size;
  std::string encoded_bytes;
  md_old.Encode(&encoded_bytes);

  Metadata md_new(kRedisHash, false, true);
  md_new.Decode(encoded_bytes);
  EXPECT_FALSE(md_new.Is64BitEncoded());
  EXPECT_EQ(md_new.Type(), kRedisHash);
  EXPECT_EQ(md_new.expire, expire_at*1000);
  EXPECT_EQ(md_new.size, size);
}

But fixing them by assigning milliseconds instead of seconds won't work because in this case, they would not test backward compatibility anymore. So these two tests can be discarded.

@PragmaTwice
Member Author

@torwig Thanks! Added.

Contributor

@torwig torwig left a comment

LGTM.
@PragmaTwice Now I realized what was wrong with my failing tests: I set the expiration timestamp in seconds while 64-bit encoding was false, trying to emulate "old" behavior, and PutExpire made a division by 1000, and the result confused me :)
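
The numbers from the failing run can be traced by hand. The sketch below assumes only what the thread states (the old format divides by 1000 on encode and multiplies by 1000 on decode); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Old-format boundary conversions as described in this thread.
constexpr uint64_t ToStoredSeconds(uint64_t expire_ms) { return expire_ms / 1000; }
constexpr uint64_t FromStoredSeconds(uint64_t stored_s) { return stored_s * 1000; }

inline void TraceFailingTest() {
  // The failing test assigned a seconds timestamp to the millisecond field.
  const uint64_t seconds_by_mistake = 1679553143ULL;

  // Encode treats it as milliseconds and stores 1679553 "seconds".
  const uint64_t stored = ToStoredSeconds(seconds_by_mistake);
  assert(stored == 1679553ULL);

  // Decode multiplies back, which gives the value reported by the test.
  assert(FromStoredSeconds(stored) == 1679553000ULL);

  // Assigning milliseconds, as in the corrected test, round-trips exactly.
  const uint64_t expire_ms = seconds_by_mistake * 1000;
  assert(FromStoredSeconds(ToStoredSeconds(expire_ms)) == expire_ms);
}
```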

@PragmaTwice
Member Author

Thanks @torwig and @git-hulk. Merging...

@PragmaTwice PragmaTwice merged commit bfeb26c into apache:unstable Mar 23, 2023
@mapleFU
Member

mapleFU commented Mar 23, 2023

Can older versions of kvrocks recognize this format? If a user enables it, finds another bug, and reverts kvrocks to an old version, would the user get a segmentation fault on these keys?

@PragmaTwice
Member Author

> Can older versions of kvrocks recognize this format? If a user enables it, finds another bug, and reverts kvrocks to an old version, would the user get a segmentation fault on these keys?

Firstly, we currently do not enable this new encoding by default.

And if users write some data via the new encoding, they cannot revert their kvrocks version to 2.3.0 or earlier. But if we release 2.4.0 with ENABLE_NEW_ENCODING=OFF, and then release 2.5.0 with it turned on by default, users can revert from 2.5.0 to 2.4.0 without breaking any data.

BTW, I think any encoding change may be a pain for users; this is not specific to this PR.

if (generate_version) version = generateVersion();
}
Metadata::Metadata(RedisType type, bool generate_version, bool use_64bit_common_field)
: flags((use_64bit_common_field ? METADATA_64BIT_ENCODING_MASK : 0) | (METADATA_TYPE_MASK & type)),
Member

Hi twice, when decoding, should we always take use_64bit from the encoded data rather than from the constructor argument?

Member Author

sure.

@@ -155,6 +158,7 @@ rocksdb::Status Database::TTL(const Slice &user_key, int *ttl) {
Metadata metadata(kRedisNone, false);
metadata.Decode(value);
Member

When Decode happens here, would it get the right data when an instance built with 32-bit sizes reads 64-bit data?

Member Author

Sure. You can read the implementation of Decode.

Member

Oops, it seems Decode resets the flags from the data rather than using the default argument. Seems OK to me.

Labels
A-build (area: build), feature (type: new feature), major decision (requires project management committee consensus)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: New encoding with compatibility to old data
6 participants