Restructured DR backup section for clarity and thoroughness #20873
Restructured backup and dr section for clarity and thoroughness
thanks for getting this started! let me know if you have clarifying questions about my feedback.
|
|
||
| ## Disaster recovery | ||
|
|
||
| When cluster virtualization is enabled, [backup]({% link {{ page.version.version }}/backup.md %}) and [restore]({% link {{ page.version.version }}/restore.md %}) commands are scoped to the virtual cluster by default. |
nit: i think this line can be removed. i don't think it adds much.
+1
| Cockroach Labs recommends that you regularly [back up]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#full-backups) your _application virtual cluster (app VC)_. Only the app VC's data and settings are included in these backups, and data and settings for other virtual clusters or for the _system virtual cluster (system VC)_ are omitted. If needed, you can [restore](#restore-a-virtual-cluster) these backups to a new app VC. Use the following process to back up your app VC. | ||
|
|
||
| To back up a virtual cluster: | ||
| 1. [Connect](#connect-to-a-virtual-cluster) to the app VC as a user with the `admin` role on the app VC: |
nit: i would say "a user with the BACKUP privilege". You don't need to be admin to take a backup. Reference doc: https://www.cockroachlabs.com/docs/stable/security-reference/authorization#supported-privileges
Before getting specific about the privilege type, we could make a generic statement that says 'connect to the app VC as a user with supported privileges {link to the doc Michael linked}. In this example, we connect to the app VC as a user with the `BACKUP` privilege:'
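For reference, a minimal sketch of what granting that privilege could look like. The user name `backup_operator` is hypothetical, the statements assume an `admin` session on the app VC, and the `USAGE` grant assumes the `backup_s3` external connection used elsewhere in this PR already exists in that VC.

~~~ sql
-- Hypothetical user name, for illustration only.
CREATE USER IF NOT EXISTS backup_operator;

-- System-level BACKUP privilege, so the user can back up without the admin role.
GRANT SYSTEM BACKUP TO backup_operator;

-- Assumed to be needed because the examples write to an external connection.
GRANT USAGE ON EXTERNAL CONNECTION backup_s3 TO backup_operator;
~~~

Whether these grants behave identically inside an app VC and the system VC should be confirmed against the authorization reference before it goes into the page.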
| ~~~ | ||
|
|
||
| For details about restoring a backup of a virtual cluster, refer to [Restore a virtual cluster](#restore-a-virtual-cluster). | ||
| 1. [Perform a full backup]({% link {{ page.version.version }}/backup.md %}#back-up-a-cluster): |
we recommend that users run backups via backup schedules (ref). Schedules have a nicer UX (no need to manually take backups), and they manage the backup's protected timestamp.
I am curious though, as a new hire, do you think our backups docs should more explicitly point customers to use schedules?
I think it would be nice to have both. "Perform a one off full backup or create a backup schedule so that backups can be automatically taken on your behalf at a set frequency" (that wasn't great wording but something along those lines) and then we have a code snippet for setting a backup schedule too.
hrm, maybe we gotta align here lol. What is the use case for taking a one off full backup?
if we look at our current docs for backup and restore, 'scheduled backups' is like a subsection of that. If we want to restructure our backup/restore docs to emphasize scheduled backups more, we can do that, but I'm trying to match what our docs say today. @peachdawnleach it'd be good to hear your opinion here, but I do agree with Michael that we need to show an example of creating a backup schedule
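As a starting point for that example, a minimal sketch of the two alternatives discussed in this thread. It assumes the `backup_s3` external connection from the existing snippet, and the schedule label `app_vc_daily` and the `@daily` cadence are placeholders rather than recommendations.

~~~ sql
-- Alternative 1: a one-off full backup, run while connected to the app VC.
BACKUP INTO 'external://backup_s3' AS OF SYSTEM TIME '-10s';

-- Alternative 2: a schedule that takes a full backup every day.
-- The label and the cadence are hypothetical.
CREATE SCHEDULE app_vc_daily
  FOR BACKUP INTO 'external://backup_s3'
  RECURRING '@daily'
  FULL BACKUP ALWAYS
  WITH SCHEDULE OPTIONS first_run = 'now';
~~~

`SHOW SCHEDULES;` lists the schedule after it is created, which could serve as a verification step in the doc.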
| 1. [Back up the cluster]({% link {{ page.version.version }}/backup.md %}), and include the `INCLUDE_ALL_SECONDARY_TENANTS` flag in the `BACKUP` command. All virtual clusters and the system virtual cluster are included in the backup. | ||
| You can also back up your system VC to preserve metadata such as users and cluster settings. Use the following process to back up your system VC. | ||
|
|
||
| 1. [Connect](#connect-to-the-system-virtual-cluster) to the system VC as a user with the `admin` role on the system VC: |
nit: same comment about admin role
|
|
||
| {% include_cached copy-clipboard.html %} | ||
| ~~~ sql | ||
| BACKUP INTO 'external://backup_s3' AS OF SYSTEM TIME '-10s'; |
same thing about schedules.
| 1. [Connect to the destination virtual cluster](#connect-to-a-virtual-cluster) as a user with the `admin` role on the virtual cluster. | ||
| 1. [Restore the cluster]({% link {{ page.version.version }}/restore.md %}). Only the virtual cluster's data and settings are restored. | ||
| ### Restore the entire cluster |
I think the section above can also be deleted. it applies to tenant level backups.
You can restore a backup of a virtual cluster to:
- The original virtual cluster on the original CockroachDB cluster.
- A different virtual cluster on the original CockroachDB cluster.
- A different virtual cluster on a different CockroachDB cluster with cluster virtualization enabled.
To restore only a virtual cluster:
1. [Connect to the destination virtual cluster](#connect-to-a-virtual-cluster) as a user with the `admin` role on the virtual cluster.
1. [Restore the cluster]({% link {{ page.version.version }}/restore.md %}). Only the virtual cluster's data and settings are restored.
|
|
||
| 1. [Connect to the destination system virtual cluster](#connect-to-the-system-virtual-cluster) as a user with the `admin` role on the system virtual cluster. | ||
| 1. [Restore the cluster]({% link {{ page.version.version }}/restore.md %}) from a backup that included the `INCLUDE_ALL_SECONDARY_VIRTUAL_CLUSTERS` flag. All virtual clusters and the system virtual cluster are restored. | ||
|
|
I think we could be a bit more explicit on how to restore one of these backups: connect to a virtual cluster, and run restore like you normally would (for example, if the VC is empty and you took a cluster level backup, you can do a cluster level restore).
agree, we should have an example and code snippet
> (for example, if the VC is empty and you took a cluster level backup, you can do a cluster level restore)
I don't think we should use this example though. We're not telling users to use cluster level backups. We should just add some examples about restoring an app VC to the same CRDB cluster and to a different CRDB cluster.
> We're not telling users to use cluster level backups
why not?
whatttttt? i thought you said not to!! Like i thought everything should be done per-VC. Otherwise we can't do table or DB level restores from a full cluster backup
i think you might be conflating cluster level backups and virtual cluster (or tenant) level backups (another reason i can't with the virtual cluster terminology).
- this can be run within the app vc, to back up the whole app vc: `BACKUP INTO ....`
- this can be run from the system vc, to back up the whole app vc: `BACKUP VIRTUAL CLUSTER appvc`, but i don't want to document this cmd as it is insanely confusing.
Let's be more specific: by cluster level I mean taking a backup of the whole physical cluster. If we're talking about app vc level, then we should say 'vc level.'
My point was that we should not document your second bullet, where we take an app vc backup from the system vc, AND we should also not document 'taking a backup of their entire cluster' which would be a 'cluster level backup.' I'm referring to the 'include secondary clusters' option when you take a backup
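To make the VC-level case concrete, a minimal sketch of restoring a backup that was taken from inside an app VC. It assumes a session on the destination app VC, the `backup_s3` external connection from the existing snippets, and a hypothetical database named `bank`. The full restore also assumes the destination VC has no user-created databases or tables yet.

~~~ sql
-- List the backups available in the collection.
SHOW BACKUPS IN 'external://backup_s3';

-- Restore everything in the most recent backup into the connected app VC.
RESTORE FROM LATEST IN 'external://backup_s3';

-- Alternatively, restore a single database from the same backup.
-- The database name is hypothetical.
RESTORE DATABASE bank FROM LATEST IN 'external://backup_s3';
~~~

The same statements would apply whether the destination app VC is on the original CockroachDB cluster or on a different one, as long as the external connection resolves to the same storage location.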
Thanks - great start!
|
|
||
| ## Disaster recovery | ||
|
|
||
| When cluster virtualization is enabled, [backup]({% link {{ page.version.version }}/backup.md %}) and [restore]({% link {{ page.version.version }}/restore.md %}) commands are scoped to the virtual cluster by default. |
+1
|
|
||
| When connected to a virtual cluster from the DB Console, metrics which measure SQL and related activity show data scoped to the virtual cluster. All other metrics are collected system-wide and display the same data on all virtual clusters including the system virtual cluster. | ||
|
|
||
| ## Disaster recovery |
Can we change this to "Backup and Restore"? Was it always "Disaster recovery"?
|
|
||
| When cluster virtualization is enabled, [backup]({% link {{ page.version.version }}/backup.md %}) and [restore]({% link {{ page.version.version }}/restore.md %}) commands are scoped to the virtual cluster by default. | ||
|
|
||
| ### Back up a virtual cluster |
Just curious, why was this removed? I wonder if we should say "Backups when cluster virtualization is enabled" or something
| When cluster virtualization is enabled, [backup]({% link {{ page.version.version }}/backup.md %}) and [restore]({% link {{ page.version.version }}/restore.md %}) commands are scoped to the virtual cluster by default. | ||
|
|
||
| ### Back up a virtual cluster | ||
| Cockroach Labs recommends that you regularly [back up]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#full-backups) your _application virtual cluster (app VC)_. Only the app VC's data and settings are included in these backups, and data and settings for other virtual clusters or for the _system virtual cluster (system VC)_ are omitted. If needed, you can [restore](#restore-a-virtual-cluster) these backups to a new app VC. Use the following process to back up your app VC. |
I would prefer if the first sentence were a little more generic. like "Cockroach Labs recommends that you regularly back up your data. When working with virtual clusters, backups should be performed on the application virtual cluster."
And then, we can add some sort of 'note' about system virtual clusters and how you can back those up too if you want to keep a record of your system settings somewhere.
Can we also remove the sentence "If needed, you can restore these backups to a new app VC"? We already have that Restore section down below, and IMO it flows a bit nicer if we keep this focused on backups.
| Cockroach Labs recommends that you regularly [back up]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#full-backups) your _application virtual cluster (app VC)_. Only the app VC's data and settings are included in these backups, and data and settings for other virtual clusters or for the _system virtual cluster (system VC)_ are omitted. If needed, you can [restore](#restore-a-virtual-cluster) these backups to a new app VC. Use the following process to back up your app VC. | ||
|
|
||
| To back up a virtual cluster: | ||
| 1. [Connect](#connect-to-a-virtual-cluster) to the app VC as a user with the `admin` role on the app VC: |
Before getting specific about the privilege type, we could make a generic statement that says 'connect to the app VC as a user with supported privileges {link to the doc Michael linked}. In this example, we connect to the app VC as a user with the `BACKUP` privilege:'
| ~~~ | ||
|
|
||
| For details about restoring a backup of a virtual cluster, refer to [Restore a virtual cluster](#restore-a-virtual-cluster). | ||
| 1. [Perform a full backup]({% link {{ page.version.version }}/backup.md %}#back-up-a-cluster): |
I think it would be nice to have both. "Perform a one off full backup or create a backup schedule so that backups can be automatically taken on your behalf at a set frequency" (that wasn't great wording but something along those lines) and then we have a code snippet for setting a backup schedule too.
| ~~~ | ||
|
|
||
| To back up the entire CockroachDB cluster, including all virtual clusters and the system virtual cluster: | ||
| {% include {{ page.version.version }}/backups/backup-storage-collision.md %} |
@msbutler I feel like I barely understand this directive. It reads to me like we don't want to take multiple full backups to the same place. Any ideas on how we could make this clearer?
yeah, there could be a sentence explaining the backup collision problem: "each backup schedule should have a unique URI to avoid two backup schedules colliding in the same URI", or something
| ### Back up the entire cluster | ||
| {% include_cached copy-clipboard.html %} | ||
| ~~~ sql | ||
| BACKUP INTO 'external://backup_s3' AS OF SYSTEM TIME '-10s'; |
I wonder if in this code snippet and in the system vc code snippet we should change the URI example to be: 'external://backup_s3/app' and 'external://backup_s3/system' respectively. If the user is taking both system and app backups, then won't we run into the collision issue if we just use 'external://backup_s3' for both? @msbutler
sounds reasonable to me.
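A minimal sketch of the two snippets with the URIs proposed above, so that app VC and system VC backups land in separate collections. It assumes an external connection named `backup_s3` is available in each virtual cluster. The `/app` and `/system` path suffixes come from this thread and are not an established convention.

~~~ sql
-- Run while connected to the app VC.
BACKUP INTO 'external://backup_s3/app' AS OF SYSTEM TIME '-10s';

-- Run while connected to the system VC.
BACKUP INTO 'external://backup_s3/system' AS OF SYSTEM TIME '-10s';
~~~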
|
|
||
| 1. [Connect to the destination system virtual cluster](#connect-to-the-system-virtual-cluster) as a user with the `admin` role on the system virtual cluster. | ||
| 1. [Restore the cluster]({% link {{ page.version.version }}/restore.md %}) from a backup that included the `INCLUDE_ALL_SECONDARY_VIRTUAL_CLUSTERS` flag. All virtual clusters and the system virtual cluster are restored. | ||
|
|
agree, we should have an example and code snippet
> (for example, if the VC is empty and you took a cluster level backup, you can do a cluster level restore)
I don't think we should use this example though. We're not telling users to use cluster level backups. We should just add some examples about restoring an app VC to the same CRDB cluster and to a different CRDB cluster.
Fixes: DOC-15097
Restructured backup and dr section for clarity and thoroughness