
backupccl: allow planning a backup in user txn #47539

Closed

dt opened this issue Apr 15, 2020 · 28 comments
@dt (Member) commented Apr 15, 2020

Currently BACKUP executes entirely during the execution of the statement, including running the actual job and its side-effects of writing files to external storage. This differs from most OLTP statements, whose side-effects (e.g. writes to rows) are not observable outside the txn until it commits -- i.e. BACKUP lacks transactional isolation. This is why the BACKUP statement currently refuses to run inside an explicitly started transaction: the explicit transaction implies we want transactional isolation, but we can't provide that.

In theory though, the backup could be planned inside a transaction that writes a row to the jobs table, and once that transaction commits, the job would become runnable and be executed. Doing this will require some technical legwork to ensure we use the planner txn and move all the external storage writes into the job itself, which is somewhat debatable: right now we like to confirm the destination can be written to before scheduling a job that writes to it -- do we want to lose that? Are we okay with a slightly leaky transaction?

Even more fundamental though is that the backup needs to actually run in order to compute things like rows backed up, bytes, etc., and those are currently returned as the result rows of the BACKUP statement. Backups planned in transactions will be unable to return such rows, so inside the txn we'll only be able to say OK or return the job ID, and once it commits, our chance to send results has passed. We can use SHOW JOB or something to get the actual results.
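To sketch what that flow could look like from the client's side (a hypothetical; the URI and job ID are placeholders, and SHOW JOB is just the kind of follow-up lookup described above):

```
-- Hypothetical flow: inside an explicit transaction, BACKUP would only enqueue
-- the job and return its ID; the job itself runs after the transaction commits.
BEGIN;
BACKUP TO 'foo://bar';  -- would return just a job ID, not row/byte counts
COMMIT;
-- Afterwards, look up the actual results on the job record:
SHOW JOB 123456;  -- job ID is illustrative
```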

@miretskiy (Contributor)

Ideally, we would like to return just the job id as the result of executing backup/import/restore statements when those statements run under an explicit transaction.

To do this, we need to add a new AST node to indicate that we would like to execute a job without blocking until the job completes.

We have discussed various options:

-- regular backup for reference.
BACKUP TO 'foo://bar' WITH x;
BACKUP TO 'foo://bar' WITH x RETURNING JOB;
BACKUP ASYNC TO 'foo://bar' WITH x;
BACKUP IN BACKGROUND TO 'foo://bar' WITH x;
BACKUP ASYNC TO 'foo://bar' ASYNC WITH x;
BACKUP TO 'foo://bar' WITH x IN BACKGROUND;
CREATE JOB EXECUTE IMMEDIATELY AS BACKUP TO 'foo://bar' WITH x;

-- regular imports
IMPORT INTO foo CSV DATA 'foo://bar';
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar');
IMPORT PGDUMP 'foo://bar.sql';
IMPORT INTO foo CSV DATA 'foo://bar' RETURNING JOB;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') RETURNING JOB;
IMPORT PGDUMP 'foo://bar.sql' RETURNING JOB;

IMPORT IN BACKGROUND INTO foo CSV DATA 'foo://bar';
IMPORT IN BACKGROUND TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar');
IMPORT IN BACKGROUND PGDUMP 'foo://bar.sql';
IMPORT INTO foo CSV DATA 'foo://bar' IN BACKGROUND;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') IN BACKGROUND;
IMPORT PGDUMP 'foo://bar.sql' IN BACKGROUND;
IMPORT ASYNC INTO foo CSV DATA 'foo://bar';
IMPORT ASYNC TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar');
IMPORT ASYNC PGDUMP 'foo://bar.sql';
IMPORT INTO foo CSV DATA 'foo://bar' ASYNC;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') ASYNC;
IMPORT PGDUMP 'foo://bar.sql' ASYNC;

My preference is the "ASYNC" keyword, but CREATE JOB AS ... looks very interesting too.

@dt (Member Author) commented May 15, 2020

Some ideas for syntax to request this behavior have come out of prior discussion: a RETURNING NOTHING or RETURNING JOB suffix, inspired by our RETURNING NOTHING for fire-and-forget ops elsewhere; adding a (non-reserved) keyword like "ASYNC" or "IN BACKGROUND", either as a suffix or immediately following the leading verb (e.g. "BACKUP ASYNC"); or adding it to the WITH options (though this would require special handling in the parser to make it available in the AST before execution, unlike the other arbitrary strings and expressions that can usually appear in a WITH kv options list).

A regular BACKUP statement might look like this:

BACKUP TO 'foo://bar' WITH x;

Some of these would then look like:

BACKUP TO 'foo://bar' WITH x RETURNING JOB;
BACKUP TO 'foo://bar' WITH x IN BACKGROUND;
BACKUP ASYNC TO 'foo://bar' ASYNC WITH x;
BACKUP TO 'foo://bar' WITH x, ASYNC;
BACKUP ASYNC TO 'foo://bar' WITH x;
BACKUP IN BACKGROUND TO 'foo://bar' WITH x;
CREATE JOB EXECUTE IMMEDIATELY AS BACKUP TO 'foo://bar' WITH x;

Similarly here are a few forms of IMPORTs that run blocking right now:

IMPORT INTO foo CSV DATA 'foo://bar';
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar');
IMPORT PGDUMP 'foo://bar.sql';

And with some of those proposed modifiers in them:

IMPORT INTO foo CSV DATA 'foo://bar' RETURNING JOB;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') RETURNING JOB;
IMPORT PGDUMP 'foo://bar.sql' RETURNING JOB;

IMPORT IN BACKGROUND INTO foo CSV DATA 'foo://bar';
IMPORT IN BACKGROUND TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar');
IMPORT IN BACKGROUND PGDUMP 'foo://bar.sql';

IMPORT INTO foo CSV DATA 'foo://bar' IN BACKGROUND;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') IN BACKGROUND;
IMPORT PGDUMP 'foo://bar.sql' IN BACKGROUND;

IMPORT ASYNC INTO foo CSV DATA 'foo://bar';
IMPORT ASYNC TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar');
IMPORT ASYNC PGDUMP 'foo://bar.sql';

IMPORT INTO foo CSV DATA 'foo://bar' WITH x, ASYNC;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') WITH x, ASYNC;
IMPORT PGDUMP 'foo://bar.sql' WITH x, ASYNC;

IMPORT INTO foo CSV DATA 'foo://bar' ASYNC WITH x;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') ASYNC WITH x;
IMPORT PGDUMP 'foo://bar.sql' ASYNC WITH x;

@dt (Member Author) commented May 15, 2020

One of the motivations of CREATE JOB was that we eventually expect to use a syntax like that to create jobs that run arbitrary SQL statements, so that you can schedule SQL to run at a given time or on a schedule (say, running EXPORTs nightly or something).

However in that case, it is arbitrary SQL -- we'll wrap it up in some serialized record and then, when it is time to run it, unpack that and execute the payload. Importantly, we will not be executing that payload or planning the statements in it using the user's transaction in which they ran CREATE JOB. In a way, that feels like the opposite of what we're talking about here, which is evaluating the (backup) statement in a very explicit transaction. I feel like CREATE JOB ... AS BACKUP ... should instead just be an instance of the generalized create-a-job-that-runs-arbitrary-sql, so we shouldn't give it different semantics (plan a backup using the open txn) now. I'm 👎 on using CREATE JOB here.
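For contrast, a purely hypothetical sketch of that generalized arbitrary-SQL form (no such CREATE JOB syntax exists; it is shown only to illustrate why it feels different from planning a backup inside the open txn):

```
-- Hypothetical: the job wraps an arbitrary SQL payload that is serialized now
-- and planned/executed later, outside the transaction that created it.
CREATE JOB AS EXPORT INTO CSV 'foo://nightly' FROM SELECT * FROM foo;
```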

@dt (Member Author) commented May 15, 2020

I think my preference leans towards a trailing modifier, either RETURNING JOB or IN BACKGROUND. IN and RETURNING are both already reserved keywords, so if we put the modifier at the end we don't need to worry about ambiguity with the trailing optional WITH kv_option_list, and we don't need to pick a new non-reserved keyword like ASYNC and then fret over where it can be unambiguously inserted.

@miretskiy (Contributor)

Between "RETURNING JOB" and "IN BACKGROUND" my preference would be "IN BACKGROUND". "RETURNING JOB" -- not sure what that means, really.
One could imagine that maybe we extend those statements to support e.g. "IMPORT ... RETURNING status" --- i.e. wait for job to finish, but just tell me if it succeeded;

I think "IN BACKGROUND" is more explicit.

@dt (Member Author) commented May 15, 2020

Okay, let's go ahead and prototype with the IN BACKGROUND suffix and see how it looks/feels, then sync up with SQL schema and feature folks later to see what, if anything, makes sense to share with schema changes or arbitrary-SQL-statement jobs. I think having something concrete to play with could help that discussion, plus it lets us move forward with scheduled backups now.

@miretskiy (Contributor)

Looks like "IN BACKGROUND" is a no go. "IN" is an operator, so "IN BACKGROUND" makes
yacc yuck.

I would still prefer to have suffix options (as opposed to e.g. IMPORT ASYNC...).
So, with that in mind, and also, trying to avoid any operator keywords, I can think of few more options:

  1. "RUNNING ASYNC"

IMPORT INTO foo CSV DATA 'foo://bar' RUNNING ASYNC;
IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') RUNNING ASYNC;
IMPORT PGDUMP 'foo://bar.sql' RUNNING ASYNC;

  1. "DETACHED"
    IMPORT INTO foo CSV DATA 'foo://bar' DETACHED;
    IMPORT TABLE foo (id INT PRIMARY KEY, s string) CSV DATA ('foo://bar') DETACHED;
    IMPORT PGDUMP 'foo://bar.sql' DETACHED;

I kinda like option 2: DETACHED

@dt what's your take? Any other options/suggestions?

@miretskiy (Contributor)

Just published #49172 to have something concrete to discuss. I implemented the "DETACHED" option.

@knz (Contributor) commented May 21, 2020

I'd like to hear more:

  • what exactly is the technical problem with IN (in my experience, there's nothing particular about reserved keywords; there may be a way to make it succeed)

  • what is your rationale for "DETACHED", and why is it better than "ASYNC", "IN BACKGROUND", or "RETURNING NOTHING", which were discussed before

@miretskiy (Contributor) commented May 21, 2020

I'm not a yacc expert (I never actually had a chance to work with yacc), but it appears that if I use "IN BACKGROUND", I get the following error:
"conflicts: 1 shift/reduce"

The grammar is ambiguous because there are two ways to interpret "IN BACKGROUND": as the "IN" operator applied to arguments, or as the "IN BACKGROUND" clause. The "IN" operator is currently marked as a "%nonassoc" token, which I think explains the ambiguity. Perhaps I could spend (probably quite a bit more) time to get it to work (and perhaps "IN" could be changed to be "%left"-associative), but I didn't think it was that critical to get "IN BACKGROUND" working. Also, I personally really dislike overloading the behavior of an operator.
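For context, IN is already an ordinary scalar operator (a minimal example, unrelated to IMPORT/BACKUP):

```
-- IN is a scalar operator, so when it follows an expression the parser can
-- read it as the start of a membership test rather than a new clause.
SELECT 1 IN (1, 2, 3);
```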

Since I didn't want to spend more time wrestling with yacc and changing the grammar to make a (likely) dangerous change to the "IN" precedence and associativity, I just wanted to come up with something synonymous in meaning and spirit with "in background".

The first candidate was ASYNC, and my initial attempt actually did use ASYNC. However, I felt that "async" is a bit too much tech-speak: it's an abbreviation, and it's not as clear as "in background". Then I thought, how do I normally specify "in background"? That's the "&" ampersand on the command line; that's also how daemons run. And that's where "DETACHED" came in.

I chose "DETACHED". I thought it's better than "ASYNC", though "ASYNC" at the end of the statement works equally well. I guess it's a matter of what we think is a nicer/clearer syntax.
My preference is "DETACHED", but I'm open to changing to "ASYNC" if people think it's better.

Regarding "RETURNING NOTHING" -- I discarded this approach because we are returning something; namely "job id". Also, I didn't want to use "RETURNING" at all because I thought what if the user wants to run (attached) job, but doesn't care about any of the result fields and just wants to see subset of results: "IMPORT ... RETURNING status" (that's not currently supported, but I din't want to close the door for that"). I could have used "RETURNING NOTHING", but then I'm overloading the meaning of "RETURNING" to both mean how the job runs and what it returns. Again, I don't like overloading like that. Another knock against "RETURNING" is that what it returns is actually an expression list. Expressions are evaluated at the time plan hook runs.
But that's already too late for me -- I need to know if the job will run in detached mode because
I return different set of columns if the job runs detached.

@knz (Contributor) commented May 21, 2020

The following diff does not produce any yacc error:

diff --git a/pkg/sql/parser/sql.y b/pkg/sql/parser/sql.y
index 2ed9c5f892..923fd26128 100644
--- a/pkg/sql/parser/sql.y
+++ b/pkg/sql/parser/sql.y
@@ -549,7 +549,7 @@ func (u *sqlSymUnion) alterTypeAddValuePlacement() *tree.AlterTypeAddValuePlacem
 %token <str> ALL ALTER ALWAYS ANALYSE ANALYZE AND AND_AND ANY ANNOTATE_TYPE ARRAY AS ASC
 %token <str> ASYMMETRIC AT ATTRIBUTE AUTHORIZATION AUTOMATIC

-%token <str> BACKUP BEFORE BEGIN BETWEEN BIGINT BIGSERIAL BIT
+%token <str> BACKGROUND BACKUP BEFORE BEGIN BETWEEN BIGINT BIGSERIAL BIT
 %token <str> BUCKET_COUNT
 %token <str> BOOLEAN BOTH BUNDLE BY

@@ -2111,46 +2111,50 @@ import_format:
 //    comment = '...'        [CSV-specific]
 //
 // %SeeAlso: CREATE TABLE
+
+opt_in_bg:
+       |       IN BACKGROUND
+
 import_stmt:
- IMPORT import_format '(' string_or_placeholder ')' opt_with_options
+ IMPORT opt_in_bg import_format '(' string_or_placeholder ')' opt_with_options
   {
     /* SKIP DOC */
-    $$.val = &tree.Import{Bundle: true, FileFormat: $2, Files: tree.Exprs{$4.expr()}, Options: $6.kvOptions()}
+    $$.val = &tree.Import{Bundle: true, FileFormat: $3, Files: tree.Exprs{$5.expr()}, Options: $7.kvOptions()}
   }
-| IMPORT import_format string_or_placeholder opt_with_options
+| IMPORT opt_in_bg import_format string_or_placeholder opt_with_options
   {
-    $$.val = &tree.Import{Bundle: true, FileFormat: $2, Files: tree.Exprs{$3.expr()}, Options: $4.kvOptions()}
+    $$.val = &tree.Import{Bundle: true, FileFormat: $3, Files: tree.Exprs{$4.expr()}, Options: $5.kvOptions()}
   }
-| IMPORT TABLE table_name FROM import_format '(' string_or_placeholder ')' opt_with_options
+| IMPORT opt_in_bg TABLE table_name FROM import_format '(' string_or_placeholder ')' opt_with_options
   {
     /* SKIP DOC */
-    name := $3.unresolvedObjectName().ToTableName()
-    $$.val = &tree.Import{Bundle: true, Table: &name, FileFormat: $5, Files: tree.Exprs{$7.expr()}, Options: $9.kvOptions()}
+    name := $4.unresolvedObjectName().ToTableName()
+    $$.val = &tree.Import{Bundle: true, Table: &name, FileFormat: $6, Files: tree.Exprs{$8.expr()}, Options: $10.kvOptions()}
   }
-| IMPORT TABLE table_name FROM import_format string_or_placeholder opt_with_options
+| IMPORT opt_in_bg TABLE table_name FROM import_format string_or_placeholder opt_with_options
   {
-    name := $3.unresolvedObjectName().ToTableName()
-    $$.val = &tree.Import{Bundle: true, Table: &name, FileFormat: $5, Files: tree.Exprs{$6.expr()}, Options: $7.kvOptions()}
+    name := $4.unresolvedObjectName().ToTableName()
+    $$.val = &tree.Import{Bundle: true, Table: &name, FileFormat: $6, Files: tree.Exprs{$7.expr()}, Options: $8.kvOptions()}
   }
-| IMPORT TABLE table_name CREATE USING string_or_placeholder import_format DATA '(' string_or_placeholder_list ')' opt_with_options
+| IMPORT opt_in_bg TABLE table_name CREATE USING string_or_placeholder import_format DATA '(' string_or_placeholder_list ')' opt_with_options
   {
-    name := $3.unresolvedObjectName().ToTableName()
-    $$.val = &tree.Import{Table: &name, CreateFile: $6.expr(), FileFormat: $7, Files: $10.exprs(), Options: $12.kvOptions()}
+    name := $4.unresolvedObjectName().ToTableName()
+    $$.val = &tree.Import{Table: &name, CreateFile: $7.expr(), FileFormat: $8, Files: $11.exprs(), Options: $13.kvOptions()}
   }
-| IMPORT TABLE table_name '(' table_elem_list ')' import_format DATA '(' string_or_placeholder_list ')' opt_with_options
+| IMPORT opt_in_bg TABLE table_name '(' table_elem_list ')' import_format DATA '(' string_or_placeholder_list ')' opt_with_options
   {
-    name := $3.unresolvedObjectName().ToTableName()
-    $$.val = &tree.Import{Table: &name, CreateDefs: $5.tblDefs(), FileFormat: $7, Files: $10.exprs(), Options: $12.kvOptions()}
+    name := $4.unresolvedObjectName().ToTableName()
+    $$.val = &tree.Import{Table: &name, CreateDefs: $6.tblDefs(), FileFormat: $8, Files: $11.exprs(), Options: $13.kvOptions()}
   }
-| IMPORT INTO table_name '(' insert_column_list ')' import_format DATA '(' string_or_placeholder_list ')' opt_with_options
+| IMPORT opt_in_bg INTO table_name '(' insert_column_list ')' import_format DATA '(' string_or_placeholder_list ')' opt_with_options
   {
-    name := $3.unresolvedObjectName().ToTableName()
-    $$.val = &tree.Import{Table: &name, Into: true, IntoCols: $5.nameList(), FileFormat: $7, Files: $10.exprs(), Options: $12.kvOptions()}
+    name := $4.unresolvedObjectName().ToTableName()
+    $$.val = &tree.Import{Table: &name, Into: true, IntoCols: $6.nameList(), FileFormat: $8, Files: $11.exprs(), Options: $13.kvOptions()}
   }
-| IMPORT INTO table_name import_format DATA '(' string_or_placeholder_list ')' opt_with_options
+| IMPORT opt_in_bg INTO table_name import_format DATA '(' string_or_placeholder_list ')' opt_with_options
   {
-    name := $3.unresolvedObjectName().ToTableName()
-    $$.val = &tree.Import{Table: &name, Into: true, IntoCols: nil, FileFormat: $4, Files: $7.exprs(), Options: $9.kvOptions()}
+    name := $4.unresolvedObjectName().ToTableName()
+    $$.val = &tree.Import{Table: &name, Into: true, IntoCols: nil, FileFormat: $5, Files: $8.exprs(), Options: $10.kvOptions()}
   }
 | IMPORT error // SHOW HELP: IMPORT

@@ -10201,6 +10205,7 @@ unreserved_keyword:
 | ATTRIBUTE
 | AUTOMATIC
 | AUTHORIZATION
+| BACKGROUND
 | BACKUP
 | BEFORE
 | BEGIN

@knz (Contributor) commented May 21, 2020

Also, you could put opt_in_bg at the end, after/before opt_with_options, and it would be even simpler.

@knz (Contributor) commented May 21, 2020

I personally recommend putting the new qualifier completely at the end.

The rationale is that it makes adding the qualifier to an existing SQL string easier, by simple string concatenation.

@miretskiy (Contributor)

Interesting... I didn't even try "IN BACKGROUND" as a "prefix" option (IMPORT IN BACKGROUND) -- I always had it as a suffix. I don't see in your patch where you actually set any values on, e.g., the import specification. Where is the information on whether or not that option was specified?

@miretskiy (Contributor) commented May 21, 2020

As I said, I'm not a yacc expert; it may well be possible to make "in background" work. I have defined the detached option (and previously tried the same with "in background") as follows:

opt_detached:
  DETACHED
  {
    $$.val = true
  }
| /* EMPTY */
  {
    $$.val = false
  }

I think I need a true/false value recording whether that option was set, and then the statements need to set that value for the option as well.
Can you try that?

Regardless though: do we really think "IN BACKGROUND" is better than "DETACHED"?

@knz (Contributor) commented May 21, 2020

Yes, looking into it now, thanks.

@miretskiy (Contributor)

Here is my updated backup statement:

backup_stmt:
  BACKUP TO partitioned_backup opt_as_of_clause opt_incremental opt_with_options opt_detached
  {
    $$.val = &tree.Backup{
    	DescriptorCoverage: tree.AllDescriptors,
    	To: $3.partitionedBackup(),
    	IncrementalFrom: $5.exprs(),
    	AsOf: $4.asOfClause(),
    	Options: $6.kvOptions(),
    	Detached: $7.bool(),
    }
  }

@knz (Contributor) commented May 21, 2020

  • it's not too bad, but there's one more UX improvement which I'm going to prototype for you

  • I'm not sold on the "DETACHED" keyword.

    I don't understand, looking at "detached", what it means. I can't even google it and get some intuition that way. With "ASYNC" you may think it's "technical", but at least I can google it and find some intuition that way.

    "IN BACKGROUND" is not ideal in a similar way, but at least folks with a modicum of experience in batch control systems will know what to think about it.

btw, it's SQL, so we don't do abbreviations. It would be "ASYNCHRONOUS", not "async" -- but that's a detail.

I'm not going to let the keyword discussion block you, but give me a minute to try out the other UX thing I'm thinking about.

@miretskiy (Contributor)

Google gives the definition of detached as "separate or disconnected"... The disconnected bit is exactly what it is. I think detached is equally familiar to anybody who has used a command line, or a batch control system for that matter.

I'm not hard set on "detached" either way -- it's just something I could prototype to start the discussion.

@knz (Contributor) commented May 21, 2020

I have an intermediate finding for y'all: a mistake was made in the past in passing AS OF SYSTEM TIME to BACKUP without parentheses or a separating comma.

The AS OF SYSTEM TIME clause ends with a scalar expression, and that expression can expand arbitrarily on the right side with all the expression keywords (including IN, but also other things we may add to scalar expressions in the future).

There is thus a large exposure in the backup statement to future breakage from further innovation/additions in SQL scalar expressions.
I believe this was unintended.

Not sure what to make of it, but it explains why "IN" didn't work at the end.
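To make the exposure concrete, a hedged illustration (the statement is hypothetical; the issue is that the parser can't tell where the timestamp expression ends):

```
-- Because AS OF SYSTEM TIME ends in a scalar expression, a parser seeing a
-- trailing IN cannot tell whether it starts a membership test on that
-- expression or a new IN BACKGROUND clause:
BACKUP TO 'foo://bar' AS OF SYSTEM TIME '-10s' IN BACKGROUND;
```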

@knz (Contributor) commented May 21, 2020

Ok, so the idea I had doesn't work because of the thing I just found.
Here's the thing: I am very annoyed by statements that have positional arguments, because they force me to remember the order.

How am I to remember that "AS OF" must come before "WITH", and "INCREMENTAL" after "AS OF" but before "WITH"?

Conversely, what's fundamentally blocking the ability to accept things in arbitrary order? (There's no very good reason -- the main reason is the mistake pointed out above.)

And now you're thinking about adding yet one more positional argument, which compounds the difficulty of remembering the order.

IMHO I'd rather see the new "detached" option come as an option after WITH, for this reason. This way I know I can stick it in there interleaved in any order with the other options.

Arguably I'd find it easier to use if the "incremental" option was also after WITH, for the same reason.
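To make the ordering constraint concrete, this is roughly the only order the current grammar accepts when all clauses are present (URIs are placeholders, per the grammar quoted above):

```
-- Today the clauses are positional: AS OF SYSTEM TIME, then INCREMENTAL FROM,
-- then WITH; any other order is a parse error.
BACKUP TO 'foo://full' AS OF SYSTEM TIME '-10s' INCREMENTAL FROM 'foo://prev' WITH x;
```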

@knz (Contributor) commented May 21, 2020

Ooh, actually I just discovered that if we adopt DETACHED instead of IN BACKGROUND, we can do the "any order" thing. I'll show you the diff in a moment.

@miretskiy (Contributor)

The problem with adding "async/detached/background" to the "WITH" option clause is that it is evaluated too late.
Yeah, detached can be used anywhere; that's not a problem. So, if you can think of ways to make that option better behaved and more future-proof, that'd be great.

@dt (Member Author) commented May 21, 2020

FWIW, I think we can add keyword/existence-only options to the WITH clause if we want to, and have them available during planning, if we add them to the grammar and have sql.y set a real field in the AST accordingly, instead of going through the opaque map of arbitrary expressions.
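In statement form, that direction would look roughly like this (a sketch; revision_history is just an example of an existing BACKUP option, and detached is the proposed existence-only one):

```
-- Sketch: the parser recognizes detached inside the WITH list and sets a real
-- field on the BACKUP AST node, so planning can see it before execution.
BACKUP TO 'foo://bar' WITH revision_history, detached;
```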

@knz (Contributor) commented May 21, 2020

Here's your diff:

diff --git a/pkg/sql/parser/sql.y b/pkg/sql/parser/sql.y
index 2ed9c5f892..0710c3f528 100644
--- a/pkg/sql/parser/sql.y
+++ b/pkg/sql/parser/sql.y
@@ -563,7 +563,7 @@ func (u *sqlSymUnion) alterTypeAddValuePlacement() *tree.AlterTypeAddValuePlacem
 %token <str> CURRENT_USER CYCLE

 %token <str> DATA DATABASE DATABASES DATE DAY DEC DECIMAL DEFAULT DEFAULTS
-%token <str> DEALLOCATE DECLARE DEFERRABLE DEFERRED DELETE DESC
+%token <str> DEALLOCATE DECLARE DEFERRABLE DEFERRED DELETE DESC DETACHED
 %token <str> DISCARD DISTINCT DO DOMAIN DOUBLE DROP

 %token <str> ELSE ENCODING END ENUM ESCAPE EXCEPT EXCLUDE EXCLUDING
@@ -835,9 +835,9 @@ func (u *sqlSymUnion) alterTypeAddValuePlacement() *tree.AlterTypeAddValuePlacem
 %type <tree.Statement> declare_cursor_stmt
 %type <tree.Statement> reindex_stmt

-%type <[]string> opt_incremental
+%type <[]string> opt_incremental incremental
 %type <tree.KVOption> kv_option
-%type <[]tree.KVOption> kv_option_list opt_with_options var_set_list
+%type <[]tree.KVOption> kv_option_list opt_with_options kv_with var_set_list
 %type <str> import_format
 %type <tree.StorageParam> storage_parameter
 %type <[]tree.StorageParam> storage_parameter_list opt_table_with
@@ -2014,9 +2014,9 @@ alter_attribute_action:
 //
 // %SeeAlso: RESTORE, WEBDOCS/backup.html
 backup_stmt:
-  BACKUP TO partitioned_backup opt_as_of_clause opt_incremental opt_with_options
+  BACKUP TO partitioned_backup opt_bulk_io_options
   {
-    $$.val = &tree.Backup{DescriptorCoverage: tree.AllDescriptors, To: $3.partitionedBackup(), IncrementalFrom: $5.exprs(), AsOf: $4.asOfClause(), Options: $6.kvOptions()}
+    $$.val = &tree.Backup{DescriptorCoverage: tree.AllDescriptors, To: $3.partitionedBackup()}
   }
 | BACKUP targets TO partitioned_backup opt_as_of_clause opt_incremental opt_with_options
   {
@@ -2185,6 +2185,23 @@ string_or_placeholder:
     $$.val = p
   }

+opt_bulk_io_options:
+  /* EMPTY */ {}
+| bulk_io_options {}
+
+bulk_io_options:
+  bulk_io_option {}
+| bulk_io_options bulk_io_option {}
+
+bulk_io_option:
+  as_of_clause {}
+| incremental {}
+| kv_with {}
+| detached {}
+
+detached:
+  DETACHED {}
+
 string_or_placeholder_list:
   string_or_placeholder
   {
@@ -2196,15 +2213,18 @@ string_or_placeholder_list:
   }

 opt_incremental:
-  INCREMENTAL FROM string_or_placeholder_list
-  {
-    $$.val = $3.exprs()
-  }
+  incremental
 | /* EMPTY */
   {
     $$.val = tree.Exprs(nil)
   }

+incremental:
+  INCREMENTAL FROM string_or_placeholder_list
+  {
+    $$.val = $3.exprs()
+  }
+
 kv_option:
   name '=' string_or_placeholder
   {
@@ -2234,6 +2254,10 @@ kv_option_list:
   }

 opt_with_options:
+  kv_with {}
+| /* EMPTY */ {}
+
+kv_with:
   WITH kv_option_list
   {
     $$.val = $2.kvOptions()
@@ -2242,10 +2266,6 @@ opt_with_options:
   {
     $$.val = $4.kvOptions()
   }
-| /* EMPTY */
-  {
-    $$.val = nil
-  }

 copy_from_stmt:
   COPY table_name opt_column_list FROM STDIN opt_with_options
@@ -10241,6 +10261,7 @@ unreserved_keyword:
 | DELETE
 | DEFAULTS
 | DEFERRED
+| DETACHED
 | DISCARD
 | DOMAIN
 | DOUBLE

Note that I have left the yacc actions empty ({}) -- that part is still left to be figured out, but it's not too complicated. The diff demonstrates that you can do the more flexible grammar without a yacc error.

With this change, all the orderings are now valid:

BACKUP ... AS OF SYSTEM TIME z DETACHED WITH x = y
BACKUP ... DETACHED WITH x = y AS OF SYSTEM TIME z
BACKUP ... WITH x = y AS OF SYSTEM TIME z DETACHED
etc

which IMHO is a big UX improvement

@knz (Contributor) commented May 21, 2020

I still prefer a WITH-based solution, with the caveat that we need to do what @dt suggests.

@miretskiy (Contributor) commented Jun 18, 2020

Sorry to resurrect this issue... What exactly have we agreed on?

Change the WITH option so that it also takes "DETACHED"? My concern is that this suddenly and significantly expands the scope of this particular issue. The import statement alone has a few dozen options that would now need to be explicitly parsed and set inside tree structures (not to mention the expanded scope of proving that the new parsed WITH options behave the same way as the old key/value based ones, plus a much larger code change to all of the bulk-y statement implementations).

Or, just take @knz's diff above and try to make the DETACHED option more flexible (with respect to its position)?

Or, alternatively, go with what I have in #49172 and refactor/modify/improve later?

@dt? @knz?

craig bot pushed a commit that referenced this issue Jun 29, 2020
50723: sql: Switch BACKUP statement to use allowed options r=miretskiy a=miretskiy

Informs #47539

Modify the BACKUP statement's "WITH" clause to only accept allowed
explicit options.

This change replaces the "WITH" clause implementation that used
to accept any set of key/value pairs with one that only
accepts options allowed by the BACKUP statement.

This change does not introduce any new options.
However, it does introduce two changes to how
we parse those options:
  1. We now check that each option is specified only once
     and produce an error message if an option is specified multiple
     times (a mistake that could not have been detected prior
     to this change).
  2. We now disallow quoted option names.


Release notes (usability): Improve WITH option parsing for BACKUP.
When a BACKUP statement incorrectly specifies the same
option multiple times inside the WITH clause, we now
produce a useful error message.
In addition, we now disallow quoted option names. This is
a backward-incompatible change.


50752: cloud: update orchestrator configs to point to v20.1.3 image r=jlinder a=asubiotto

Release note: None (orchestrator config change)

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Jun 29, 2020
50724: sql: Add 'DETACHED" option to the list of allowed BACKUP options r=miretskiy a=miretskiy

Informs #47539

Add the `DETACHED` option to the list of allowed BACKUP options.
This option requests that the backup run in detached mode: that is,
the backup statement should not block, and should instead simply return the job id.

Release notes: None


Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Jul 2, 2020
50775: bulkio: Support running BACKUP under transaction. r=miretskiy a=miretskiy

Informs #47539

Allow execution of the BACKUP statement under an explicit transaction,
provided the "DETACHED" option is specified:

```
BACKUP TO <location_uri> ... WITH DETACHED
```

When a backup runs in `detached` mode, we do not wait for the job to
complete. Instead, we return the job id to the caller, and
the job executes asynchronously.

Release notes (enterprise change): BACKUP can now run in "detached" mode.
That is, we do not wait for the BACKUP job to finish. Instead, the
job ID is returned immediately, and the job itself runs in the background.

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
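For reference, the resulting usage is roughly the following (URI and job ID are illustrative; the job's outcome can be looked up afterwards, e.g. via SHOW JOB):

```
BEGIN;
BACKUP TO 'foo://bar' WITH DETACHED;  -- returns the job ID immediately
COMMIT;
-- once the transaction commits, the job runs in the background; check on it with:
SHOW JOB 123456;
```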
@miretskiy (Contributor)

Closed by #50775
