Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tt: add command tt upgrade #936

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mandesero
Copy link
Contributor

@mandesero mandesero commented Sep 3, 2024

tt upgrade command steps:

  • For each replicaset:
    • On the master instance:
      1. Execute the following commands sequentially:
        box.schema.upgrade()
        box.snapshot()
    • On each replica:
      1. Wait until the replica applies all transactions produced by box.schema.upgrade() on the master by comparing the vector clocks (vclock).
      2. Once synchronization is confirmed, execute the following command on the replica:
        box.snapshot()

If any errors occur during the upgrade process, the process will stop and an error report will be generated.


The replica is waiting for synchronization for Timeout seconds. The default value for Timeout is 5 seconds, but you can specify it manually using the --timeout option.

$tt upgrade [<APP_NAME>] --timeout 10

You can also specify which replicaset(s) to upgrade by using the --replicaset option.

$tt upgrade [<APP_NAME>] --replicaset <RS_NAME_1> -r <RS_NAME_2> ...

@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch 3 times, most recently from b10d9bd to 34a3589 Compare September 6, 2024 15:31
@mandesero
Copy link
Contributor Author

mandesero commented Sep 6, 2024

Examples:

Case 1: OK

INSTANCE            STATUS   PID    MODE
 app2:storage-002-b  RUNNING  12556  RO
 app2:router-001-a   RUNNING  12560  RW
 app2:storage-001-a  RUNNING  12567  RO
 app2:storage-001-b  RUNNING  12554  RO
 app2:storage-002-a  RUNNING  12555  RW
$ tt upgrade app2
• storage-001: ok
• storage-002: ok
• router-001: ok

Case 2: More than one master in the same replicaset

INSTANCE            STATUS   PID    MODE
app2:storage-002-b  RUNNING  12556  RO
app2:router-001-a   RUNNING  12560  RW
app2:storage-001-a  RUNNING  12567  RO
app2:storage-001-b  RUNNING  12554  RW
app2:storage-002-a  RUNNING  12555  RW
$ tt upgrade app2
• storage-001: error
  ⨯ [storage-001]: app2:storage-001-a and app2:storage-001-b are both masters

Case 3: LSN didn't update

$ tt upgrade app2
• storage-001: error
   ⨯ [storage-001]: LSN wait timeout: error waiting LSN 2003085 in vclock component 1 on app2:storage-001-b: time quota 5 seconds exceeded

Case 4: There is a replicaset that does not have a master

 INSTANCE            STATUS   PID    MODE
 app2:storage-002-b  RUNNING  12556  RO
 app2:router-001-a   RUNNING  12560  RW
 app2:storage-001-a  RUNNING  12567  RO
 app2:storage-001-b  RUNNING  12554  RO
 app2:storage-002-a  RUNNING  12555  RO
$ tt upgrade app2
• storage-001: error
   ⨯ [storage-001]: has not master instance

Case 5: A non-existent replicaset was specified

$ tt upgrade app2 --replicaset foo
   ⨯ replicaset with alias "foo" doesn't exist

@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch 4 times, most recently from 2b33e94 to f2406ee Compare September 8, 2024 12:06
@mandesero mandesero changed the title tt: add command tt upgrade tt: add command tt upgrade [WIP] Sep 9, 2024
@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch 3 times, most recently from 234b980 to e12fe07 Compare September 10, 2024 11:52
cli/upgrade/upgrade.go Outdated Show resolved Hide resolved
@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch 8 times, most recently from a30a1e5 to 45fc816 Compare September 17, 2024 09:59
@mandesero mandesero changed the title tt: add command tt upgrade [WIP] tt: add command tt upgrade Sep 17, 2024
@mandesero mandesero marked this pull request as ready for review September 17, 2024 10:14
cli/upgrade/upgrade.go Outdated Show resolved Hide resolved
cli/upgrade/upgrade.go Outdated Show resolved Hide resolved
cli/upgrade/upgrade.go Outdated Show resolved Hide resolved
cli/upgrade/upgrade.go Outdated Show resolved Hide resolved
@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch 2 times, most recently from 5168866 to 8f0f987 Compare October 8, 2024 13:37
00000000000000000000.snap Outdated Show resolved Hide resolved
test/integration/replicaset/test_replicaset_upgrade.py Outdated Show resolved Hide resolved
test/integration/replicaset/test_replicaset_upgrade.py Outdated Show resolved Hide resolved
test/integration/replicaset/test_replicaset_upgrade.py Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
@oleg-jukovec
Copy link
Contributor

Please, rebase on the latest commit in the master branch.

@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch 4 times, most recently from 869a5f6 to 633fb46 Compare October 28, 2024 08:18
@mandesero
Copy link
Contributor Author

Please, rebase on the latest commit in the master branch.

Rebased.

@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch from 633fb46 to 85e12e1 Compare October 28, 2024 08:48
Copy link
Contributor

@oleg-jukovec oleg-jukovec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update. I have a couple of non-critical comments.
Please, rebase to the master too.

cli/replicaset/cmd/lua/upgrade.lua Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
test/integration/replicaset/test_replicaset_upgrade.py Outdated Show resolved Hide resolved
cli/replicaset/cmd/lua/upgrade.lua Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Outdated Show resolved Hide resolved
test/integration/replicaset/single-t2-app/init.lua Outdated Show resolved Hide resolved
cli/replicaset/cmd/upgrade.go Show resolved Hide resolved
@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch from e294018 to f3c6c07 Compare November 14, 2024 12:40
@oleg-jukovec oleg-jukovec added the full-ci Enables full ci tests label Nov 14, 2024
@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch from f3c6c07 to 6b062d1 Compare November 18, 2024 11:01
Part of tarantool#924

@TarantoolBot document
Title: `tt replicaset upgrade` upgrades database schema.

The `tt replicaset upgrade` command allows for a automate upgrade of each
replicaset in a Tarantool cluster. The process is performed sequentially on
the master instance and its replicas to ensure data consistency. Below are
the steps involved:

For Each Replicaset:
- **On the Master Instance**:
  1. Run the following commands in sequence to upgrade the schema and take
  a snapshot:
     ```lua
     box.schema.upgrade()
     box.snapshot()
     ```

- **On Each Replica**:
  1. Wait for the replica to apply all transactions produced by the
  `box.schema.upgrade()` command executed on the master. This is done
  by monitoring the vector clocks (vclock) to ensure synchronization.
  2. Once the repica has caught up, run the following command to take
  a snapshot:
     ```lua
     box.snapshot()
     ```

> **Error Handling**: If any errors occur during the upgrade process, the
operation will halt, and an error report will be generated.

---

- Timeout for Synchronization

Replicas will wait for synchronization for a maximum of `Timeout` seconds.
The default timeout is set to 5 seconds, but this can be adjusted manually
using the `--timeout` option.

**Example:**
```bash
$ tt replicaset upgrade [<APP_NAME>] --timeout 10
```

- Selecting Replicasets for Upgrade

You can specify which replicaset(s) to upgrade by using the `--replicaset`
or `-r` option to target specific replicaset names.

**Example:**
```bash
$ tt replicaset upgrade [<APP_NAME> | <URI>] --replicaset <RS_NAME_1> -r <RS_NAME_2> ...
```

This provides flexibility in upgrading only the desired parts of the cluster
without affecting the entire system.
@mandesero mandesero force-pushed the mandesero/gh-924-upgrade-db-schema branch from 6b062d1 to 62ec8a3 Compare November 18, 2024 11:05
@mandesero
Copy link
Contributor Author

mandesero commented Nov 18, 2024

It was found that sometimes the replicaset name might not reach tt for some reason. I simplified the check in the test so that the test wouldn't be flaky.

For example:

• 5b3a3d0d-5ee2-40d1-989e-d4d68687581e: ok
• router-001: ok
• storage-001: ok

But should be:

• storage-001: ok
• storage-002: ok
• router-001: ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-ci Enables full ci tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants