cliccl/debug_backup: create a command line utility for offline backup inspection #60790

Closed
Elliebababa opened this issue Feb 19, 2021 · 0 comments
Labels: A-disaster-recovery, C-enhancement, T-disaster-recovery

Elliebababa commented Feb 19, 2021

Description

Currently, users who want to inspect a backup must run SHOW BACKUP against a running cluster. A command line utility would let them inspect backups without running a cluster at all.

With this tool:

  • Users can list the tables and databases in a backup
  • Users can list the backups in a collection
  • When listing backups, users can see whether each backup is full or incremental
  • When listing backups, users can see the time range each backup covers
  • Users can read backup tables and convert them to CSV (whether other formats will be supported is undecided)
  • Users can look up data starting from a specific key

This would enable user stories such as:

  • AOST checks
    What data is included in a table at a given time -- am I restoring the right time?
  • INCREMENTAL BACKUP checks
    Is it an incremental backup? Can I do an incremental backup on top of this backup?
  • Extracting a subset of data from backups to insert into a new table.
  • Converting backups to “logical” backups in another format (Parquet, CSV, etc.; how to handle the schema is TBD for a format like CSV)

Proposed command line design

Command line options
./cockroach debug backup show <backup_url>
Lets users inspect the metadata, databases, tables, user-defined types, and user-defined schemas in <backup_url>

./cockroach debug backup list-incremental <backup_url>
Lets users inspect the incremental backup paths of the backup at <backup_url>

./cockroach debug backup list-backups <collection_url>
Lets users inspect the backups in a collection

./cockroach debug backup export <backup_url> --table=<tablename>
Lets users read a table from a backup and write it to a CSV file
Flags supported by this command:

  • --as-of: read a snapshot of the backup data as of a specific timestamp
  • --start-key and --max-rows: read backup data starting from a specific key and limit the number of rows in the output
  • --with-revisions (experimental): read the revisions of data in the backup. Note: this flag only supports displaying row updates since the last schema change.
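
The intended interaction of --as-of, --start-key, and --max-rows can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not the tool's implementation: the `Revision` type, the integer keys and timestamps, and the `export_rows` helper are all hypothetical, and real backup keys and timestamps are encoded differently.

```python
import csv
import io
from typing import NamedTuple, Optional


class Revision(NamedTuple):
    key: int               # hypothetical: primary key of the row
    ts: int                # hypothetical: logical write timestamp
    value: Optional[str]   # None marks a deletion


def export_rows(revisions, as_of, start_key=None, max_rows=None):
    """Return CSV text for the rows visible at `as_of`."""
    # Keep, per key, the newest revision written at or before `as_of`.
    latest = {}
    for rev in sorted(revisions, key=lambda r: (r.key, r.ts)):
        if rev.ts <= as_of:
            latest[rev.key] = rev

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for key in sorted(latest):
        rev = latest[key]
        if rev.value is None:
            continue  # row was deleted as of this timestamp
        if start_key is not None and key < start_key:
            continue  # before the requested start key
        if max_rows is not None and written >= max_rows:
            break     # row limit reached
        writer.writerow([key, rev.value])
        written += 1
    return out.getvalue()


revisions = [
    Revision(1, 10, "alice"),
    Revision(2, 10, "bob"),
    Revision(2, 20, "bobby"),
    Revision(3, 15, "carol"),
    Revision(3, 25, None),
    Revision(4, 30, "dave"),
]
print(export_rows(revisions, as_of=20, start_key=2, max_rows=2), end="")
```

In this model, reading as of timestamp 20 sees the updated row 2 and the not-yet-deleted row 3, while row 4 does not exist yet.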

Other common command line flags
--external-io-dir: directory for accessing local storage.
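
To make the proposed command surface easier to review, here is a rough model of it using Python's `argparse`. It is only an illustration of the subcommands and flags listed above, not the actual CLI code. Treating `--table` as required, `--max-rows` as an integer, and `--with-revisions` as a boolean switch are assumptions, as is the fully qualified table name in the usage example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="cockroach debug backup")

    # Flag shared by every subcommand in the proposal.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--external-io-dir",
                        help="directory for accessing local storage")

    sub = parser.add_subparsers(dest="command", required=True)

    show = sub.add_parser("show", parents=[common])
    show.add_argument("backup_url")

    list_inc = sub.add_parser("list-incremental", parents=[common])
    list_inc.add_argument("backup_url")

    list_backups = sub.add_parser("list-backups", parents=[common])
    list_backups.add_argument("collection_url")

    export = sub.add_parser("export", parents=[common])
    export.add_argument("backup_url")
    export.add_argument("--table", required=True)     # assumption: required
    export.add_argument("--as-of")
    export.add_argument("--start-key")
    export.add_argument("--max-rows", type=int)       # assumption: integer
    export.add_argument("--with-revisions", action="store_true")
    return parser


args = build_parser().parse_args([
    "export", "nodelocal://self/f1",
    "--table", "defaultdb.public.products",
    "--max-rows", "10",
])
print(args.command, args.table, args.max_rows)
```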

Elliebababa added the C-enhancement label Feb 19, 2021
Elliebababa self-assigned this Feb 19, 2021
craig bot pushed a commit that referenced this issue Mar 12, 2021
61131: cliccl/load: update `load show` with `summary` subcommand to display backup meta information r=pbardea,itsbilal a=Elliebababa

Previously, `cockroach load show <backup_path>` was used for debugging purposes and displayed all the metadata (including Spans, Files, and Descriptors) of a single backup manifest file.

This PR adds a `summary` subcommand to `load show`. Users can inspect the metadata of a single manifest with `load show summary <backup_url>`. User-defined types and user-defined schemas are added to the output. The PR also adds support for "nodelocal://0/" access.

see
#18434
#60790

Release justification: Non-production code changes
Release note (cli change): Added a `summary` subcommand to `load show` that displays backup metadata. Users can inspect the metadata of a single manifest with `cockroach load show summary <backup_url>`.

--------------------
```
$ ./cockroach load show summary "nodelocal://self/f1"                       
StartTime: 1970-01-01T00:00:00Z (0,0)
EndTime: 2021-02-24T02:14:12.635069Z (1614132852.635069000,0)
DataSize: 15772 (15 KiB)
Rows: 58
IndexEntries: 70
FormatVersion: 1
ClusterID: 171d37c2-0cb1-4f14-af20-5d9a51ccbbf3
NodeID: 0
BuildInfo: CockroachDB CCL v21.1.0-alpha.1-2022-g497a9c01d1 (x86_64-apple-darwin19.6.0, built 2021/02/17 17:23:27, go1.15.7)
Spans:
        /Table/4/{1-2}
        /Table/5/{1-2}
        /Table/6/{1-2}
        /Table/14/{1-2}
        /Table/15/{1-4}
        /Table/21/{1-2}
        /Table/23/{1-4}
        /Table/24/{1-2}
        /Table/33/{1-2}
        /Table/37/{1-3}
        /Table/52/{1-2}
        /Table/75/{1-2}
        /Table/76/{1-2}
        /Table/77/{1-2}
Files:
        635903846438207489.sst:
                Span: /Table/4/{1-2}
                DataSize: 99 (99 B)
                Rows: 2
                IndexEntries: 0
        635903846681772033.sst:
                Span: /Table/5/{1-2}
                DataSize: 282 (282 B)
                Rows: 7
                IndexEntries: 0
        635903846438338561.sst:
                Span: /Table/6/{1-2}
                DataSize: 374 (374 B)
                Rows: 5
                IndexEntries: 0
        635903846605324289.sst:
                Span: /Table/15/{1-4}
                DataSize: 14480 (14 KiB)
                Rows: 34
                IndexEntries: 68
        635903846439354369.sst:
                Span: /Table/21/{1-2}
                DataSize: 261 (261 B)
                Rows: 5
                IndexEntries: 0
        635903846599852033.sst:
                Span: /Table/23/{1-4}
                DataSize: 94 (94 B)
                Rows: 1
                IndexEntries: 2
        635903847072792577.sst:
                Span: /Table/75/{1-2}
                DataSize: 182 (182 B)
                Rows: 4
                IndexEntries: 0
Databases:
        1: system
        50: defaultdb
        51: postgres
Schemas:
        29: public
Types:
        (No user-defined types included in the specified backup path.)
Tables:
        4: system.public.users
        5: system.public.zones
        6: system.public.settings
        14: system.public.ui
        15: system.public.jobs
        21: system.public.locations
        23: system.public.role_members
        24: system.public.comments
        33: system.public.role_options
        37: system.public.scheduled_jobs
        52: defaultdb.public.products
        75: defaultdb.public.customers
        76: defaultdb.public.abc
        77: defaultdb.public.deep_folder
```

Co-authored-by: elliebababa <ellie24.huang@gmail.com>
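
The summary output shown in this commit message is regular enough to be read by a script, which touches on the "is it an incremental backup?" user story. Below is a minimal sketch that parses an abbreviated copy of that output. The `parse_summary` and `is_full_backup` helpers are hypothetical, the output format is not a stable interface, and treating a zero `StartTime` as the marker of a full backup is an assumption based on the sample.

```python
SAMPLE = """\
StartTime: 1970-01-01T00:00:00Z (0,0)
EndTime: 2021-02-24T02:14:12.635069Z (1614132852.635069000,0)
Rows: 58
Databases:
        1: system
        50: defaultdb
Tables:
        4: system.public.users
        52: defaultdb.public.products
"""


def parse_summary(text):
    """Split summary output into top-level scalars and ID-keyed sections."""
    scalars, sections = {}, {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, rest = line.strip().partition(":")
        rest = rest.strip()
        if line[0].isspace():
            # Indented entry: keep only "<id>: <name>" lines (Databases,
            # Schemas, Tables); span and file details are skipped.
            if current is not None and key.isdigit():
                sections[current][int(key)] = rest
        elif rest:
            scalars[key] = rest          # e.g. "Rows: 58"
            current = None
        else:
            sections[key] = {}           # e.g. "Tables:"
            current = key
    return scalars, sections


def is_full_backup(scalars):
    # Assumption: a full backup reports the zero timestamp as StartTime.
    return scalars["StartTime"].split()[0] == "1970-01-01T00:00:00Z"


scalars, sections = parse_summary(SAMPLE)
print(is_full_backup(scalars), sections["Tables"])
```
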
Elliebababa changed the title from "backup: create a command line utility for offline backup inspection" to "cliccl/debug_backup: create a command line utility for offline backup inspection" Apr 27, 2021