Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-master detection for storages, rebalancer location selection, and bug fixes #439

Merged

Commits on Nov 3, 2023

  1. storage: fix too early drop of SENT buckets

    SENT bucket could be deleted by GC before bucket_send() is
    completed. It could lead to a problem that bucket_send() would
    notice that in the end and treat as an error. The bucket should
    not be deleted while its transfer is still in progress.
    
    The bug is fixed now because it started manifesting frequently in
    new tests for the next commits.
    
    Closes #433
    
    NO_DOC=bugfix
    Gerold103 committed Nov 3, 2023
    Configuration menu
    Copy the full SHA
    b2df47c View commit details
    Browse the repository at this point in the history

Commits on Nov 15, 2023

  1. storage: fix false-positive recovery of SENDING

    SENDING bucket could be recovered back to ACTIVE state if the
    destination instance didn't yet apply the first bucket_recv().
    Could happen if the recovery was just faster than the bucket
    sending procedure.
    
    That shouldn't happen. The patch makes the recovery skip the
    local buckets which are being transferred right now, regardless of
    their state.
    
    NO_DOC=bugfix
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    1986fbd View commit details
    Browse the repository at this point in the history
  2. replicaset: rename is_auto_master to is_master_auto

    The old name looked like it meant the object itself is an auto
    assigned master. But in fact it is a replicaset flag which means
    the master role should be auto-discovered.
    
    The patch makes the name clearer. The reason for it now is that
    the flag is going to be used more in the next commits.
    
    Part of #429
    
    NO_DOC=refactoring
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    6985ddc View commit details
    Browse the repository at this point in the history
  3. storage: factor out check_is_master() function

    The master check is going to be used in more than one place in the
    next commits. Better do it via one function which this commit
    introduces.
    
    Part of #429
    
    NO_DOC=refactoring
    NO_TEST=refactoring
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    54c027e View commit details
    Browse the repository at this point in the history
  4. rebalancer: return nil in rebalancer_apply_routes

    The function used to return 'false' on error. But all functions
    which can fail are supposed to return nil on error (exception on
    critical errors such as misusage).
    
    The patch makes the function conform with the rules. This is done
    as a preparation before the function would start returning another
    error in a next commit.
    
    Part of #429
    
    NO_DOC=not visible
    NO_TEST=already covered
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    1d5a824 View commit details
    Browse the repository at this point in the history
  5. storage: use util.fiber_cond_wait everywhere

    A few places in storage code still used fiber_cond:wait()
    directly. It is not safe for negative timeouts and is too easy to
    forget to convert an error to vshard format. The patch replaces
    those usages with util.fiber_cond_wait.
    
    This was noticed while fixed another bug related to sleeps and
    fiber cancellation.
    
    NO_DOC=not visible
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    ba75476 View commit details
    Browse the repository at this point in the history
  6. storage: scatter some fiber cancellation points

    Apparently storage service fibers could live quite a while after
    their cancellation, because all what they are doing is wrapped
    into pcalls (built-in cancellation points didn't throw), and the
    cancel wasn't checked often enough.
    
    This commit adds some more cancellation points to recovery, gc,
    rebalancer.
    
    It is fixed now because in the next commits it was getting
    problematic to change master role. When a master was switched off
    and then back on, then its new services couldn't be started
    right away until the old ones finally died. It would clog the logs
    a lot.
    
    NO_DOC=bugfix
    NO_TEST=covered in next commits
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    f4beb7e View commit details
    Browse the repository at this point in the history
  7. storage: move master management code

    It is moved into its own section where the next commits will
    almost fully rewrite it. The move is done separately, because some
    parts will remain unchanged and it should be easier to see them in
    the diff this way.
    
    Part of #429
    
    NO_DOC=refactoring
    NO_TEST=refactoring
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    3bf51b2 View commit details
    Browse the repository at this point in the history
  8. storage: move rebalancer activation code

    It was inside storage_cfg, but it is going to grow too big to
    contain it in the cfg code further. The patch moves it to the
    rebalancer management section.
    
    Part of #429
    
    NO_DOC=refactoring
    NO_TEST=refactoring
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    a56d360 View commit details
    Browse the repository at this point in the history
  9. storage: refactor storage_cfg()

    It was almost entirely done in a single function, was too
    difficult to extend in next commits. This patch splits the config
    code into several functions doing a certain step of the
    configuration.
    
    The patch also introduces a config context. It is an object where
    one time per cfg call are calculated and saved all the most useful
    config details such as the current replica and replicaset UUID,
    whether the instance was a master before and is now, config of
    this particular instance and replicaset, and more.
    
    The context is introduced because these attributes are going to be
    used more widely across several functions, and passing them as
    arguments everywhere would be just unhandy. As well as calculating
    them again in each function.
    
    For example, it allows storage_cfg_master_prepare() and
    storage_cfg_master_commit() just use the already provided
    was_master_in_cfg and is_master_in_cfg and concentrate on the
    switch logic instead of figuring out how did the state change.
    
    Part of #429
    
    NO_DOC=refactoring
    NO_TEST=refactoring
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    f976c0f View commit details
    Browse the repository at this point in the history
  10. storage: move master and rebalancer role commit

    It was done after schema upgrade, but now is done before. It makes
    sense because master commit should enable read-write mode which is
    needed for the schema changes.
    
    It worked so far, because in manual master config the read-only
    cfg update is done at the box.cfg stage. I.e. before the schema
    update.
    
    But soon the master role will be allowed to turn on and off
    automatically. And then the old order wouldn't work anymore.
    
    At the same time not everything can be done before the schema is
    installed. Because some services, like recovery and bucket GC do
    need the schema (`_bucket` space at least). The solution is to
    make those services depend on whether the first configuration is
    performed. They are started only after that.
    
    Part of #429
    
    NO_DOC=refactoring
    NO_TEST=refactoring
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    6de3cff View commit details
    Browse the repository at this point in the history
  11. storage: introduce is_master flag

    There is this_is_master() function, but it wasn't used in all
    places, and it used to rely on the config. Soon the config won't
    be telling who the master is when the role is set to be automatic
    in next commits.
    
    The patch introduces a flag is_master which reflects the current
    actual state, not the config. For now it is only managed by the
    config, but in the next commit it won't be so anymore.
    
    Vtest also now won't rely on the config. It uses the actual master
    state.
    
    Part of #429
    
    NO_DOC=internal change
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    9ef6c3b View commit details
    Browse the repository at this point in the history
  12. test: rename storage_2_test.lua

    It is now storage_call_test.lua. The reason is that it contains
    just vshard.storage.call() tests. There are going to be added some
    new files and this one was getting more confusing.
    
    Needed for #429
    
    NO_DOC=refactoring
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    f7126de View commit details
    Browse the repository at this point in the history
  13. storage: introduce master='auto' for own rs

    Replicaset config gets a new option 'master' with the only
    possible value (besides nil) 'auto'. That makes the nodes of this
    replicaset decide who of them is the master by looking at their
    local box.info.ro value.
    
    Such replicaset is yet unable to run the rebalancer, and other
    replicasets and routers can't make rw requests to it. This is all
    supported in next commits.
    
    Closes #429
    
    NO_DOC=covered by a next commit
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    295b5dc View commit details
    Browse the repository at this point in the history
  14. storage: allow some internal funcs only on master

    bucket_recv, bucket_send, rebalancer_apply_routes, and
    rebalancer_request_state only make sense on the master instance.
    Now it is checked explicitly.
    
    This becomes necessary for the next commits which will enable
    master discovery at the storage level, and the likelihood of
    making those calls on a non-master instances rises up.
    
    Part of #430
    
    NO_DOC=internal change
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    20ca1ca View commit details
    Browse the repository at this point in the history
  15. storage: introduce recovery_bucket_stat()

    The new function is used by bucket recovery instead of
    bucket_stat(). The problem of the latter is that it allows to be
    invoked on a non-master instance which is not ok for recovery - it
    needs to get the most up to date state of the bucket.
    
    On the other hand it wasn't possible to just ban bucket_stat() on
    non-master instances, because it is a public function and it was
    never documented to work exclusively on master instances.
    
    Part of #430
    
    NO_DOC=internal change
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    604b16b View commit details
    Browse the repository at this point in the history
  16. storage: introduce master_call() wrapper

    It wraps replicaset:callrw(). The purpose is to handle the
    NON_MASTER error in a single place. It is going to become much
    more likely to get after the next commits which will enable master
    discovery at the storage level.
    
    It will be important to notice that the node, which was thought to
    be the master, is not in that role anymore, and a new discovery is
    needed.
    
    Part of #430
    
    NO_DOC=internal change
    NO_TEST=covered in next commits
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    dbc4a64 View commit details
    Browse the repository at this point in the history
  17. cfg: introduce extract_vshard and extract_box

    Those functions replace the old split() function. It is needed
    mainly for tests which are going to pass read_only options
    per-instance in the global config. But it also looks like an
    actually useful feature even outside of tests.
    
    In tests that is needed for auto-master cases. There is already
    one place - auto_master_2_test.lua, but for that one the patch in
    vtest was enough and helped only for bootstrap. The current patch
    allows to reconfigure the instances after bootstrap via their
    own private config sections.
    
    Needed for #430
    
    @TarantoolBot document
    Title: vshard: per-instance box options in the config
    
    Previously the config could include box options only on the root
    level like so:
    
    ```Lua
    {
        -- VShard options:
        sharding = ...,
        weights = ...,
        bucket_count = ...,
        ...
        -- Box options:
        replication_timeout = ...,
        memtx_use_mvcc_engine = ...,
        net_msg_max = ...,
    }
    ```
    Those box-options are propagated to `box.cfg{}` on each instance
    using this config. But there was no way to pass different options
    for difference instances via one config.
    
    Now it is possible via their config sections in `sharding`:
    ```Lua
    {
        sharding = {
            [replicaset_uuid] = {
                replicas = {
                    [replica_uuid] = {
                        replication_timeout = ...,
                        net_msg_max = ...,
                        read_only = ...,
                    }
                },
                ...
            },
            ...
        },
        ...
        -- Box options:
        replication_timeout = ...,
        memtx_use_mvcc_engine = ...,
        net_msg_max = ...,
    }
    ```
    
    For each instance its resulting `box.cfg` is merged from the root
    box options and from the instance's section in `sharding`. If some
    options are specified in both places, then the priority is for the
    replica-local options.
    
    For example:
    ```
    {
        sharding = {
            [replicaset_uuid] = {
                replicas = {
                    [replica_uuid] = {
                        replication_timeout = 100,
                        net_msg_max = 200,
                    }
                },
            },
        },
        replication_timeout = 50,
        txn_timeout = 300,
    }
    ```
    The resulting `box.cfg` config for the given replica will be:
    ```Lua
    {
        replication_timeout = 100,
        net_msg_max = 200,
        txn_timeout = 300,
    }
    ```
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    760eb70 View commit details
    Browse the repository at this point in the history
  18. storage: add timeouts to all master_call()

    Many of them used to use the default timeout
    (replica.net_timeout). It is working good enough, but isn't really
    suitable for storages - this auto-tuning timeout thing was
    designed for routers mostly.
    
    Besides, soon in master_call() it will be needed to know the
    timeout before doing the call itself. Getting the net_timeout
    out of the master object would work for now, but soon this won't
    be possible when auto-master will be enabled with discovery.
    
    That all means either trying schemas like the router does for the
    same job, or simply always pass an explicit timeout to all
    master calls. The patch follows the latter way.
    
    The need to know the timeout will appear for doing retries inside
    master_call() when the master is not known right now, but chances
    are high is going to be discovered in a few milliseconds. For the
    waiting a deadline is needed, and for that a timeout is necessary.
    
    Part of #430
    
    NO_DOC=internal change
    NO_TEST=covered
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    0728c29 View commit details
    Browse the repository at this point in the history
  19. replicaset: create missing conns in locate_master()

    locate_master() would not even try to create connections to the
    instances, which didn't have an already existing one. As a result,
    if the actual master wasn't ever contacted by any calls, then it
    would stay undiscovered.
    
    Part of #430
    
    NO_DOC=bugfix
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    0664937 View commit details
    Browse the repository at this point in the history
  20. storage: introduce full master discovery

    Until now only master discovery of own replicaset was supported on
    the storage level. But now the storages can find masters of other
    replicasets.
    
    The design is similar to the router's master discovery, except
    that it is not proactive. Storages rarely need to talk to each
    other which makes pointless to constantly maintain all the
    connections and re-check the instances' roles to keep the master's
    location info up to date.
    
    Instead, the discovery is reactive. It only works for the
    replicasets which are being used for any calls right now.
    
    Part of #430
    
    @TarantoolBot document
    Title: vshard: master discovery on storage level
    Routers do already support `master = 'auto'` setting at the
    replicaset level in the config. It makes them able to
    automatically find the master instance location in the so
    configured replicaset.
    
    On the storages it wasn't supported at all. But now it is. The
    storages can specify in their configs the same option
    `master = 'auto'` for their own replicaset and for any other
    replicasets too.
    
    If the storage has `master = 'auto'` for its own replicaset, then
    it will constantly monitor `box.info.ro`. When it becomes `true`,
    the instance considers itself a replica. When false - considers
    itself a master. It also means that such automatic masters will
    start and stop all needed background fibers automatically as well,
    depending on the actual state of the instance, not on the config.
    
    However it is worth keeping in mind that this option doesn't mean
    master election. It only means master **detection**. I.e. if this
    option is used, then the user's code is supposed to assign the
    masters using `box.cfg.read_only` or any other way.
    
    If the storage sees `master = 'auto'` for any other replicasets
    than its own, then it will be able to automatically discover their
    master instance for internal operations, such as bucket recovery
    and rebalancing.
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    e1083fd View commit details
    Browse the repository at this point in the history
  21. rebalancer: enable it on auto-master replicasets

    Previous commit added support for full master discovery on the
    storages, but the rebalancer didn't work on replicasets having
    auto master enabled for themselves. Now it works on those too.
    
    The priority is firstly given to manual masters. If none are found
    in the entire cluster, then the rebalancer appears on the
    auto-assigned master with the minimal replicaset UUID among all
    the other auto-masters.
    
    Closes #430
    
    NO_DOC=documented later with new rebalancer options
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    1cd2873 View commit details
    Browse the repository at this point in the history
  22. storage: introduce no-activity timeout for conns

    Automatic master discovery on the storages means, that the
    instances are going to establish more connections between
    replicasets to locate the masters of each other.
    
    That might easily lead to fullmesh of the whole cluster - it means
    lots of connections majority of which are not even used most of
    the time. Need to get rid of them somehow if they are not used for
    a while.
    
    This commit makes the storages drop the too long unused
    connections. Also the automatically discovered master locations
    are erased by the same no-activity timeout. The logic behind it is
    that if an instance wasn't contacted anyhow for a very long time,
    then it might be not a master anymore. It is better not to treat
    it as a master then and let it be re-discovered when a master
    connection is needed again.
    
    Follow-up #429
    Follow-up #430
    
    NO_DOC=internal
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    a0d7ddc View commit details
    Browse the repository at this point in the history
  23. rebalancer: introduce rebalancer flag

    The flags allows to explicitly assign the rebalancer role to a
    specific instance or a replicaset. Or forbid its automatic
    assignment to instances or whole replicasets.
    
    Part of #432
    
    NO_DOC=in the final commit with rebalancer options
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    9380966 View commit details
    Browse the repository at this point in the history
  24. rebalancer: introduce rebalancer_mode

    Closes #432
    
    @TarantoolBot document
    Title: vshard: rebalancer flag and mode
    So far it was impossible to specify which instance should run the
    rebalancer. It was always automatically assigned using some
    internal rules based on UUIDs.
    
    Now the users can choose:
    - Which specific instance should run the rebalancer. Can be a
        replica or a master - won't matter.
    - In which replicaset the instance to run the rebalancer should be
        selected automatically.
    - Which instances and whole replicasets should not run the
        rebalancer even when it is selected automatically.
    
    For that there are 2 new options: `rebalancer = <bool>` and
    `rebalancer_mode = <name>`.
    
    The `rebalancer` flag can be either omitted, or set to true, or
    false. It can be set for replicasets and for specific instances.
    There can be only one `rebalancer = true` in the whole config. But
    can be many `rebalancer = false`.
    
    * `rebalancer = true` assigned to an instance means that this
        instance is guaranteed to run the rebalancer service on it.
        The instance role doesn't matter - it can be a replica or a
        master. Will run the rebalancer anyway.
    
    * `rebalancer = true` assigned to a replicaset means that the
        service will run only on the master of this replicaset. Can be
        combined with `master = 'auto'` on the given replicaset.
    
    * `rebalancer = false` assigned to an instance means that it will
        not run the rebalancer.
    
    * `rebalancer = false` assigned to a replicaset means that all the
        instances of this replicaset will not run the rebalancer.
    
    * `rebalancer = nil` (same as omitted, default) means that the
        instance/replicaset will be eligible to run the rebalancer
        only if `rebalancer_mode = 'auto'` is set and there are no
        `rebalancer = true` anywhere.
    
    The option `rebalancer_mode` should be specified in the root of
    the config. It can have one of those values:
    
    * `'auto'` - default. Means that the rebalancer service location
        is chosen automatically among all master instances in the
        cluster. Excluding those which have `rebalancer = false` on
        them or on their replicaset. If there are any
        `rebalancer = true`, then this mode works the same as
        `'manual'`.
    
    * `'manual'`. The rebalancer will run only if there is at least
        one `rebalancer = true` in the config. And only on the given
        instance / replicaset (depending on at which level the flag
        was specified - for a specific instance or for a whole
        replicaset).
    
    * `'off'`. The rebalancer will not run anywhere, regardless of all
        the `rebalancer = true/false` specified in the config.
    Gerold103 committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    201f1ee View commit details
    Browse the repository at this point in the history