Add a switch to choose between Scylla and Rclone API for backup/restore #4163

Open
Michal-Leszczynski opened this issue Dec 13, 2024 · 6 comments

@Michal-Leszczynski
Collaborator

Should we allow the user to decide whether backup/restore tasks should use the Rclone or the Scylla API?

It makes sense to have a way of falling back to Rclone (without rolling back to a previous SM release, which is not always possible) when issues with the Scylla API or its usage by SM are encountered.
It would also allow for easier testing of cross Rclone/Scylla backup/restore.

So I would say that we should add a switch to choose between those APIs, and that the Scylla API should be the default choice (when possible).

Such a switch could be implemented as new restore/backup flags or as a config field in the scylla-manager.yaml file.

I would say that it's easier for the user and for testing if we go with the new flags.
The only drawback is that we keep adding more and more flags to those commands, which might make them more difficult for users to configure. On the other hand, in an ideal scenario, we don't expect users to use this flag at all. But if that's the case, maybe it's enough to just put it in the scylla-manager.yaml file, where it can still be configured on the really rare occasions when it's needed?

What do you think?
cc: @karol-kokoszka @VAveryanov8 @regevran

@regevran

I don't think we need to keep both options - as the user has nothing to select from. It is an implementation detail, and being given the option to select just confuses.

Having said that, if the same SM version needs to support both rclone and scylla - one solution could be to make it configurable. A better solution (IMO) would be to query scylla for its ability to back up and restore - and when it can't - fall back to rclone. No user setup at all.

@Michal-Leszczynski
Collaborator Author

> Having said that, if the same SM version needs to support both rclone and scylla - one solution could be to make it configurable.

This will most likely be the case (cc: @karol-kokoszka @tzach)

> A better solution (IMO) would be to query scylla for its ability to back up and restore - and when it can't - fall back to rclone. No user setup at all.

Is there a better way to check Scylla features than checking the Scylla version (SM does not have access to the Scylla swagger)?

I agree in general - my only concern is that if there is a bug on SM or Scylla side regarding the new backup/restore implementation, then it would probably be a good idea to add a safety switch allowing the user to use Rclone.
In order not to confuse the user, it could be part of the scylla-manager.yaml config, which the majority of users never notice or modify. By default, SM would use the Scylla backup/restore API when possible - the only use case of this config option would be to not use the Scylla backup/restore API, even when possible.
But if you think that's still a bad direction, then I'm fine with making it not user-configurable at all.
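
For reference, the version-based check mentioned above could look roughly like the following Go sketch. All function names are hypothetical, the minimum version is passed in by the caller (the thread does not state which Scylla release first exposes the API), and pre-release suffixes such as `-rc1` are simply ignored, which is a simplification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "major.minor.patch" into three integers.
// Anything after the first '-', '~' or '+' (build or pre-release
// suffix) is dropped; missing components default to zero.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	if i := strings.IndexAny(v, "-~+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) > 3 {
		parts = parts[:3]
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("parse version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// versionAtLeast reports whether v >= floor, comparing component by component.
func versionAtLeast(v, floor [3]int) bool {
	for i := range v {
		if v[i] != floor[i] {
			return v[i] > floor[i]
		}
	}
	return true
}

// ScyllaAPISupported is a hypothetical gate: an unparsable version is
// treated as "not supported", so the caller would stay on Rclone.
func ScyllaAPISupported(nodeVersion string, minVersion [3]int) bool {
	v, err := parseVersion(nodeVersion)
	if err != nil {
		return false
	}
	return versionAtLeast(v, minVersion)
}
```

A caller would pass the reported node version and a threshold, e.g. `ScyllaAPISupported(version, [3]int{2025, 1, 0})`, where the threshold shown is only a placeholder.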

@regevran

> if there is a bug on SM or Scylla side

I am not familiar with defensive programming - finding solutions to problems that have not happened yet.
I think it would add more confusion and may itself become a source of trouble.
E.g. someone someday changes the yaml, then experiences a degradation in performance, and we need to look for the reason.
But again - if it is more convenient for SM to support older Scylla versions this way, then, taking the risks into consideration, I do not object to the yaml solution.

@tzach
Collaborator

tzach commented Dec 16, 2024

To validate:

  • Scylla API: ScyllaDB core interacts directly with S3
  • Rclone API: Manager agents use Rclone to interact directly with S3

My 2c:

  • The user should not care whether we are using rclone
  • It would be best to converge on one solution ASAP. Supporting two solutions is a pain.
  • As long as the core backup does not support all clouds, we must keep agent backup as an option.

@karol-kokoszka
Collaborator

We still need to support older versions of Scylla (correct me if I'm wrong), which means that RClone support for backups and restores stays in the Scylla Manager code.

Besides, for now, the Scylla API covers just transferring SSTables from/to S3. Things like deduplication or creating the manifest file are not covered, which means that SM still has to use RClone for those.
And yeah, the Scylla API doesn't cover Azure or GCP.

I understand that Scylla Manager must do a validation step before proceeding with the backup/restore, just to check whether the Scylla API for it is available.
Why not leave a safety valve that simply checks the Scylla Manager configuration for the presence of a flag, e.g. "force_non_scylla_api_for_backup_restore" or something similar?
By default, the flag is not included, which means that the Scylla API is always used when available.
But in case of a production emergency, or even a simple experiment, it can easily be set to true.
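
A rough Go sketch of how such a validation step and flag could combine. Only the flag name comes from this discussion; the struct, the provider constants, and `SelectTransferAPI` are hypothetical, and the "S3 only" rule reflects what is stated in this thread rather than any documented guarantee.

```go
package main

// TransferAPI names the path used to transfer SSTables.
type TransferAPI string

const (
	TransferScylla TransferAPI = "scylla"
	TransferRclone TransferAPI = "rclone"
)

// Provider names the backup location type (hypothetical constants).
type Provider string

const (
	ProviderS3    Provider = "s3"
	ProviderGCS   Provider = "gcs"
	ProviderAzure Provider = "azure"
)

// BackupRestoreConfig models the proposed scylla-manager.yaml option.
// The zero value (flag absent) keeps the default behaviour.
type BackupRestoreConfig struct {
	ForceNonScyllaAPI bool `yaml:"force_non_scylla_api_for_backup_restore"`
}

// SelectTransferAPI picks the API for SSTable transfer and returns a
// short reason that could be logged. Deduplication and manifest
// handling are outside its scope and stay with Rclone either way.
func SelectTransferAPI(cfg BackupRestoreConfig, p Provider, scyllaAPIAvailable bool) (TransferAPI, string) {
	switch {
	case cfg.ForceNonScyllaAPI:
		return TransferRclone, "forced by configuration"
	case p != ProviderS3:
		return TransferRclone, "provider not covered by the Scylla API"
	case !scyllaAPIAvailable:
		return TransferRclone, "Scylla API not available on this cluster"
	default:
		return TransferScylla, "Scylla API available"
	}
}
```

Returning the reason alongside the choice would make it easier to spot, e.g. in task logs, that someone has set the flag when investigating a performance change.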

@regevran

> Things like deduplication or creating the manifest file are not covered, which means that SM still has to use RClone for those.

Yes, by decision (12 Sep 2024):

> Rclone is going to serve us. There is no short-term target to remove it completely. We’ll use it when there is no other solution. With time, as a long-term target, we’ll do our best to move functionality from it to Scylla.

> "force_non_scylla_api_for_backup_restore" or something similar

Sure. Fine with me.

4 participants