
Conversation

@dbatomic (Owner) commented Apr 1, 2024

Draft implementation of sql batch scripting

Examples

CREATE TABLE t1 (a INT) USING parquet;
CREATE TABLE t2 (a INT) USING parquet;
DECLARE totalInsertCount = 0;
WHILE (SELECT COUNT(*) < 2 FROM t1) DO
  INSERT INTO t1 VALUES (1);
  SET VAR totalInsertCount = totalInsertCount + 1;
  WHILE (SELECT COUNT(*) < 2 FROM t2) DO
    INSERT INTO t2 VALUES (1);
    SET VAR totalInsertCount = totalInsertCount + 1;
    SELECT COUNT(*) AS t2Count FROM t2;
  END WHILE;
  TRUNCATE TABLE t2;
END WHILE;
SELECT COUNT(*) AS t1FinalCount FROM t1;
SELECT COUNT(*) AS t2FinalCount FROM t2;
SELECT totalInsertCount;

Note: this is test-only, proof-of-concept syntax. The actual syntax will likely differ.

High-level design

  1. Extending the Spark parser to support batch statements: IF/ELSE, variable scopes, and WHILE loops. Support for TRY/CATCH, cursors, and PROCEDURES is TBD (but the path forward should be reasonably clear).
  2. Interpreter implementation. The parser output is handled by a builder that creates new AST-like structures. For each Spark statement we first build a logical plan that is stored as a leaf operator in the aforementioned AST. The interpreter implements an iterator interface that returns a stream of DataFrames; by walking the iterator, the consumer (which can also be a debugger) gets a DataFrame for each Spark statement executed by the interpreter. A rough sketch follows after this list.
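
For illustration, here is a minimal sketch of the interpreter idea from point 2. All names here (BatchNode, SingleStatement, WhileLoop, BatchInterpreter) are hypothetical stand-ins for the actual builder output and pre-built logical plans, not the real implementation:

import org.apache.spark.sql.{DataFrame, SparkSession}

// AST-like structures produced by the builder. Each leaf holds the SQL text of
// a single Spark statement (standing in for a pre-built logical plan).
sealed trait BatchNode
case class SingleStatement(sqlText: String) extends BatchNode
case class WhileLoop(conditionSql: String, body: Seq[BatchNode]) extends BatchNode

// The interpreter exposes execution as an Iterator[DataFrame]: the consumer
// (which could also be a debugger) pulls one DataFrame per executed statement.
class BatchInterpreter(spark: SparkSession, program: Seq[BatchNode])
    extends Iterator[DataFrame] {

  // Lazily flatten the program into a stream of single statements,
  // re-evaluating each loop condition between iterations of its body.
  private def walk(nodes: Seq[BatchNode]): Iterator[SingleStatement] =
    nodes.iterator.flatMap {
      case s: SingleStatement => Iterator.single(s)
      case WhileLoop(cond, body) =>
        Iterator
          .continually(())
          .takeWhile(_ => spark.sql(cond).head().getBoolean(0))
          .flatMap(_ => walk(body))
    }

  private val statements = walk(program)
  override def hasNext: Boolean = statements.hasNext
  override def next(): DataFrame = spark.sql(statements.next().sqlText)
}

// Example: insert into t1 until it has two rows, then return the final count.
val program = Seq(
  WhileLoop(
    conditionSql = "SELECT COUNT(*) < 2 FROM t1",
    body = Seq(SingleStatement("INSERT INTO t1 VALUES (1)"))),
  SingleStatement("SELECT COUNT(*) AS t1FinalCount FROM t1"))
new BatchInterpreter(spark, program).foreach(_.show())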
