Skip to content
This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat: make the block size configurable for mutation logs #889

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

empiredan
Copy link
Contributor

The block size used to read the files of mutation logs should be configurable to limit the memory consumed by rDSN, with log_private_block_bytes specified for private logs and log_shared_block_bytes for shared logs.

The default block size is still 1MB, as it's used before.

@@ -348,6 +350,10 @@ void replication_options::initialize()
"log_private_reserve_max_time_seconds",
log_private_reserve_max_time_seconds,
"max time in seconds of useless private log to be reserved");
log_private_block_bytes = dsn_config_get_value_int64("replication",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now a new gflag like config defination is introduced, you can use it instead.
See

#define DSN_DECLARE_int64(name) DSN_DECLARE_VARIABLE(int64_t, name)

log_private_block_bytes = dsn_config_get_value_int64("replication",
"log_private_block_bytes",
log_private_block_bytes,
"block size by bytes for private log");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'in bytes' ?

@@ -35,7 +35,8 @@ namespace dsn {
namespace replication {

log_file::~log_file() { close(); }
/*static */ log_file_ptr log_file::open_read(const char *path, /*out*/ error_code &err)
/*static */ log_file_ptr
log_file::open_read(const char *path, /*out*/ error_code &err, size_t block_bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you using the new FLAGS_XXX instead, then these parameters are not needed to pass everywhere.

@Smityz Smityz added the type/config-change PR that made modification on configs, which should be noted in release note. label Aug 30, 2021
@Smityz
Copy link
Contributor

Smityz commented Aug 30, 2021

Can you add some test results about your changes? Like the relationship between configuration and memory

@hycdong
Copy link
Contributor

hycdong commented Aug 31, 2021

Hi, empiredan~ I notice that you would like to make block size configurable to limit the memory consumed by rDSN. I'm really interested in the memory related problems. Could you please adding more comments about it? Thanks a lot.

@empiredan
Copy link
Contributor Author

Can you add some test results about your changes? Like the relationship between configuration and memory

@Smityz Yeah, it's necessary to provide some test results that show how much we can limit memory, though we've not make some formal tests. I'll try to make some ones.

@empiredan
Copy link
Contributor Author

Hi, empiredan~ I notice that you would like to make block size configurable to limit the memory consumed by rDSN. I'm really interested in the memory related problems. Could you please adding more comments about it? Thanks a lot.

@hycdong Great, I'm also interested in the memory related problems, we can discuss about this one.

Since the original description for this PR is too simple, I'll explain some more. Since the memory of our machine is very limited (also deployed with many other services), we'll try our best to save memory.

Actually this PR is added in our internal version more than a year ago. At that time, I observed from the memory profile by tcmalloc that a lot of memory had been consumed by reading plog files into the caches. Therefore, I make the buffer size for plog configurable to limit memory consumed, which is tuned from 1MB to 256KB.

As for this PR, I also make the buffer size for slog configurable. Now there is a problem we can discuss. According to the suggestion from @acelyc111 , we can use FLAGS_XXX to denote the buffer size for slog and plog. However, besides the slog and plog, some other components, such as mutation log tools and unit tests, are operating the underlying log_file class, which will lead to a deep call chain to pass the parameter of block size.

This will also result in a lot of modifications of source files. To some extent maybe a default block size will alleviate this problem, but it seems not a graceful method. Have some advices about this problem, or this PR ? Thanks !

@hycdong
Copy link
Contributor

hycdong commented Sep 6, 2021

Hi, empiredan~ I notice that you would like to make block size configurable to limit the memory consumed by rDSN. I'm really interested in the memory related problems. Could you please adding more comments about it? Thanks a lot.

@hycdong Great, I'm also interested in the memory related problems, we can discuss about this one.

Since the original description for this PR is too simple, I'll explain some more. Since the memory of our machine is very limited (also deployed with many other services), we'll try our best to save memory.

Actually this PR is added in our internal version more than a year ago. At that time, I observed from the memory profile by tcmalloc that a lot of memory had been consumed by reading plog files into the caches. Therefore, I make the buffer size for plog configurable to limit memory consumed, which is tuned from 1MB to 256KB.

As for this PR, I also make the buffer size for slog configurable. Now there is a problem we can discuss. According to the suggestion from @acelyc111 , we can use FLAGS_XXX to denote the buffer size for slog and plog. However, besides the slog and plog, some other components, such as mutation log tools and unit tests, are operating the underlying log_file class, which will lead to a deep call chain to pass the parameter of block size.

This will also result in a lot of modifications of source files. To some extent maybe a default block size will alleviate this problem, but it seems not a graceful method. Have some advices about this problem, or this PR ? Thanks !

Thanks for your detailed explanation~ I agree that make the buffer size for slog configurable is a good job~
I have an idea to avoid lots of modification of source files. By default, we use a default block size, such as in unit tests and mutation log tools etc. But if it is really used (in my view, it is really used in log_file_stream.h ), you can implement it like:

// in log_file::file_streamer construction
auto current_value = default_value
if(current_value != FLAGS_value){
      current_value = FLAGS_value;
}

@empiredan
Copy link
Contributor Author

Hi, empiredan~ I notice that you would like to make block size configurable to limit the memory consumed by rDSN. I'm really interested in the memory related problems. Could you please adding more comments about it? Thanks a lot.

@hycdong Great, I'm also interested in the memory related problems, we can discuss about this one.
Since the original description for this PR is too simple, I'll explain some more. Since the memory of our machine is very limited (also deployed with many other services), we'll try our best to save memory.
Actually this PR is added in our internal version more than a year ago. At that time, I observed from the memory profile by tcmalloc that a lot of memory had been consumed by reading plog files into the caches. Therefore, I make the buffer size for plog configurable to limit memory consumed, which is tuned from 1MB to 256KB.
As for this PR, I also make the buffer size for slog configurable. Now there is a problem we can discuss. According to the suggestion from @acelyc111 , we can use FLAGS_XXX to denote the buffer size for slog and plog. However, besides the slog and plog, some other components, such as mutation log tools and unit tests, are operating the underlying log_file class, which will lead to a deep call chain to pass the parameter of block size.
This will also result in a lot of modifications of source files. To some extent maybe a default block size will alleviate this problem, but it seems not a graceful method. Have some advices about this problem, or this PR ? Thanks !

Thanks for your detailed explanation~ I agree that make the buffer size for slog configurable is a good job~
I have an idea to avoid lots of modification of source files. By default, we use a default block size, such as in unit tests and mutation log tools etc. But if it is really used (in my view, it is really used in log_file_stream.h ), you can implement it like:

// in log_file::file_streamer construction
auto current_value = default_value
if(current_value != FLAGS_value){
      current_value = FLAGS_value;
}

@hycdong Good idea ! Thanks for suggestion ! Does this mean we use a single config FLAGS_block_bytes to denote the block size of both slog and plog ?

@hycdong
Copy link
Contributor

hycdong commented Sep 27, 2021

Hi, empiredan~ I notice that you would like to make block size configurable to limit the memory consumed by rDSN. I'm really interested in the memory related problems. Could you please adding more comments about it? Thanks a lot.

@hycdong Great, I'm also interested in the memory related problems, we can discuss about this one.
Since the original description for this PR is too simple, I'll explain some more. Since the memory of our machine is very limited (also deployed with many other services), we'll try our best to save memory.
Actually this PR is added in our internal version more than a year ago. At that time, I observed from the memory profile by tcmalloc that a lot of memory had been consumed by reading plog files into the caches. Therefore, I make the buffer size for plog configurable to limit memory consumed, which is tuned from 1MB to 256KB.
As for this PR, I also make the buffer size for slog configurable. Now there is a problem we can discuss. According to the suggestion from @acelyc111 , we can use FLAGS_XXX to denote the buffer size for slog and plog. However, besides the slog and plog, some other components, such as mutation log tools and unit tests, are operating the underlying log_file class, which will lead to a deep call chain to pass the parameter of block size.
This will also result in a lot of modifications of source files. To some extent maybe a default block size will alleviate this problem, but it seems not a graceful method. Have some advices about this problem, or this PR ? Thanks !

Thanks for your detailed explanation~ I agree that make the buffer size for slog configurable is a good job~
I have an idea to avoid lots of modification of source files. By default, we use a default block size, such as in unit tests and mutation log tools etc. But if it is really used (in my view, it is really used in log_file_stream.h ), you can implement it like:

// in log_file::file_streamer construction
auto current_value = default_value
if(current_value != FLAGS_value){
      current_value = FLAGS_value;
}

@hycdong Good idea ! Thanks for suggestion ! Does this mean we use a single config FLAGS_block_bytes to denote the block size of both slog and plog ?

Sorry for long-time no-reply. Yes, you can use FLAGS_block_bytes to denote the block size of both slog and plog, and other method that won't modify too many code~

@empiredan
Copy link
Contributor Author

We can divide all the related classes into several levels, where the lowest level is log_file (with file_streamer as its nested class):

  • From the standpoint of log_file itself, it should not care about who will use it, such as slog, plog, tools, unit tests, etc. In other words, to instantiate more appropriately, log_file must make block_bytes configurable.
  • From the standpoint of the users of the log_file, actually there are 2 kinds of users: one will instantiate a log_file with a customized block_bytes (such as slog or plog), and another won't need to care about what size the block_bytes is (such as unit tests or some tools that have little to do with the block size).

Based on the 2 points discussed above, we can draw some conclusions.

Firstly, we should import some configuration items as below:

  • FLAGS_log_default_block_bytes
  • FLAGS_log_shared_block_bytes
  • FLAGS_log_private_block_bytes

If block_bytes is customized with FLAGS_log_shared_block_bytes, FLAGS_log_private_block_bytes, or some other configurations which can be added if necessary at any time, the buffer will be initiated with the specified value. Otherwise, if block_bytes is not important for some cases, FLAGS_log_default_block_bytes will be used to create the buffer and there's no need to write any code to specify the value.

Secondly, use optional pattern to simplify the constructor of each related class. Specify the value of block_bytes directly as the parameter which will implicitly converted to some of optional. If not specified, none of optional is adopted. At the log_file level(the lowest level), none will be translated to FLAGS_log_default_block_bytes as the size of the buffer.

By the way, some bugs are found in dsn::operational, which are also fixed in this PR.

@Smityz @hycdong @acelyc111 please feel free to review the updated conclusions and codes. If this is feasible, later I'll add some unit tests for this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
memory-control type/config-change PR that made modification on configs, which should be noted in release note.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants