updates (#11830)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
soyeric128 and mergify[bot] authored Jun 21, 2023
1 parent 6a66544 commit 5490176
Showing 2 changed files with 28 additions and 8 deletions.
18 changes: 13 additions & 5 deletions docs/doc/14-sql-commands/10-dml/dml-copy-into-table.md
@@ -219,10 +219,6 @@ copyOptions ::=
| ON_ERROR | Decides how to handle a file that contains errors: 'continue' to skip the file and proceed, 'abort' to terminate on error, 'abort_N' to terminate when the number of errors ≥ N. Default is 'abort'. Note: 'abort_N' is not available for Parquet files. | Optional |
| MAX_FILES | Sets the maximum number of files to load. Defaults to `0`, meaning no limit. | Optional |
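As a rough sketch of how the two options above combine in one statement, the following assumes a table `mytable` and a stage `@mystage` holding CSV files (placeholder names). CSV is used because ON_ERROR handling is limited for Parquet files. The unquoted `continue` value is an assumption based on the `copyOptions` grammar; check it against that grammar before relying on it.

```sql
-- Skip any file that contains errors and keep loading the rest,
-- but load at most 10 files in this run.
COPY INTO mytable
FROM @mystage/
PATTERN = '.*[.]csv'
FILE_FORMAT = (TYPE = CSV)
ON_ERROR = continue
MAX_FILES = 10;
```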

:::info
The ON_ERROR parameter currently does not work for Parquet files.
:::

## Examples

### 1. Loading Data from an Internal Stage
@@ -386,4 +382,16 @@ CONNECTION = (
)
PATTERN = '.*[.]parquet'
FILE_FORMAT = (TYPE = PARQUET);
```

### 8. Controlling Parallel Processing

In Databend, the *max_threads* setting specifies the maximum number of threads that can be used to execute a request. By default, it is typically set to the number of CPU cores available on the machine.
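Before overriding *max_threads*, it can help to check the value currently in effect. A minimal sketch, assuming SHOW SETTINGS accepts a `LIKE` filter on the setting name:

```sql
-- Show the current value (and default) of the max_threads setting
SHOW SETTINGS LIKE 'max_threads';
```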

When loading data with COPY INTO, you can control its degree of parallelism by injecting a hint into the statement that sets *max_threads*. For example:

```sql
COPY /*+ set_var(max_threads=6) */ INTO mytable FROM @mystage/ pattern='.*[.]parq' FILE_FORMAT=(TYPE=parquet);
```

For more information about injecting hints, see [SET_VAR](../80-setting-cmds/03-set-var.md).
18 changes: 15 additions & 3 deletions docs/doc/14-sql-commands/80-setting-cmds/03-set-var.md
@@ -4,9 +4,12 @@ title: SET_VAR

SET_VAR specifies optimizer hints within a single SQL statement, giving you finer control over the execution plan of that specific statement. You can use it to:

- Configure settings temporarily, for the duration of a single SQL statement only. Settings specified with SET_VAR affect only the statement being executed and have no lasting effect on the overall database configuration. For a list of settings that can be configured with SET_VAR, see [SHOW SETTINGS](../40-show/show-settings.md). For examples, see:

  - [Example 1. Temporarily Set Timezone](#example-1-temporarily-set-timezone)
  - [Example 2: Control Parallel Processing for COPY INTO](#example-2-control-parallel-processing-for-copy-into)

- Control deduplication for [INSERT](../10-dml/dml-insert.md), [UPDATE](../10-dml/dml-update.md), and [REPLACE](../10-dml/dml-replace.md) operations with the *deduplicate_label* setting. When several statements carry the same deduplicate_label value, Databend executes only the first one and ignores the rest, regardless of the data they would modify. A deduplicate_label remains in effect for 24 hours after it is set. For a walkthrough, see [Example 3: Set Deduplicate Label](#example-3-set-deduplicate-label).

See also: [SET](01-set-global.md)

@@ -69,8 +72,17 @@ SELECT

1 row in 0.010 sec. Processed 1 rows, 1B (104.34 rows/s, 104B/s)
```

### Example 2: Control Parallel Processing for COPY INTO

In Databend, the *max_threads* setting specifies the maximum number of threads that can be used to execute a request. By default, it is typically set to the number of CPU cores available on the machine.

When loading data with COPY INTO, you can control its degree of parallelism by injecting a SET_VAR hint into the statement that sets *max_threads*. For example:

```sql
COPY /*+ set_var(max_threads=6) */ INTO mytable FROM @mystage/ pattern='.*[.]parq' FILE_FORMAT=(TYPE=parquet);
```
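The hint above changes *max_threads* for that single COPY INTO statement only. For comparison, here is a sketch of the session-level alternative using [SET](01-set-global.md), which keeps the new value for every later statement in the session. `mytable` and `@mystage` are the same placeholder names as in the example above.

```sql
-- Session-level: max_threads stays at 6 for all following statements in this session
SET max_threads = 6;

-- This COPY INTO (and any later statement) runs with the session value
COPY INTO mytable FROM @mystage/ pattern='.*[.]parq' FILE_FORMAT=(TYPE=parquet);
```

The SET_VAR hint is the better fit when only one heavy load needs a different thread count, since it leaves the session configuration unchanged.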

### Example 3: Set Deduplicate Label

```sql
CREATE TABLE t1(a Int, b bool);

1 comment on commit 5490176


@vercel vercel bot commented on 5490176 Jun 21, 2023


Successfully deployed to the following URLs:

databend – ./

databend.vercel.app
databend.rs
databend-git-main-databend.vercel.app
databend-databend.vercel.app
