Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for enabling/disabling DDL / DML / Config changes via SQL #7328

Closed
alamb opened this issue Aug 18, 2023 · 2 comments · Fixed by #7333
Closed

API for enabling/disabling DDL / DML / Config changes via SQL #7328

alamb opened this issue Aug 18, 2023 · 2 comments · Fixed by #7333
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Aug 18, 2023

Is your feature request related to a problem or challenge?

Some people want to use DataFusion as a read only engine (for example we do in IOx). We do not want to allow users to:

  1. Create memory backed tables (the state is ephemeral, so they won't be able to use them)
  2. Write to local files (via COPY) as this is a security issue
  3. Set session configuration (e.g. batch_size) as this can cause unwanted memory use / Denial of service attacks

Other users, such as datafusion-cli want to allow all the features

Also, DataFusion has gained additional capabilities, such as the ability to INSERT into the included table providers like Csv and Json, it may not be obvious to builders on top of DataFusion that such modifications are allowed and depending on their usecase may actually be a security risk

While working on #7272 from @UlfarErl , it is pretty clear that the distinction between APIs that handle read only sql and SQL that modifies the catalog is confusing. Additionally
the new COPY command, is a normal execution plan, and thus without additional work on IOx (see https://github.com/influxdata/influxdb_iox/pull/8515#discussion_r1297654343 ) datafusion could allow users to run COPY (and overwrite local files, etc)

Describe the solution you'd like

Thus I propose making an API on SessionContext and SessionState with the specific options about what types of operations are supported:

Something like:

struct SQLOptions {
  /// allow DDL catalog modification commands (e.g. `CREATE TABLE ...`)
  allow_ddl: bool,
  /// allow DML data modification commands (e.g. `INSERT and COPY`)
  allow_dml: bool,
/// allow configuration changes (e.g. `SET ...`)
allow_config: bool
}

And then add this:

impl SessionContext {

/// Existing API will allow all types of SQL:
pub async fn sql(&self, sql: &str) -> Result<DataFrame> {.
  self.sql_with_options(sql SQLOptions {
    allow_ddl: true,
    allow_dml: true,
    allow_config: true,
    })
}

/// New API will generate errors if a type of command is not allowed
pub async fn sql_with_options(&self, sql: &str, options: SQLOptions) -> Result<DataFrame> {
  let plan = ...;
  if is_dml(plan) && !optiobs.allow_dml {
    return plan_err!("DML Plan {plan} is not allowed")
  }
  ...
}

Describe alternatives you've considered

No response

Additional context

Related to an earlier proposal #4720

@alamb
Copy link
Contributor Author

alamb commented Aug 18, 2023

@tustvold you had strong opinions on #4720

Do you have any opinions for this one?

@tustvold
Copy link
Contributor

I've honestly lost context on this, and haven't been following the recent additions, so I'll refrain from comment. So long as we're able to maintain some notion of atomicity, its fine by me 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants