title | summary | toc
---|---|---
IMPORT INTO | Import CSV data into an existing CockroachDB table. | true
The `IMPORT INTO` statement imports CSV, Avro, or delimited data into an existing table, by appending new rows to the table.

- `IMPORT INTO` only works for existing tables. To import data into new tables, see `IMPORT`.
- `IMPORT INTO` cannot be used within a transaction or during a rolling upgrade.
- `IMPORT INTO` invalidates all foreign keys on the target table. To validate the foreign key(s), use the `VALIDATE CONSTRAINT` statement (see the example below).
- `IMPORT INTO` does not offer `SELECT` or `WHERE` clauses to specify subsets of rows. To do this, use `INSERT`.
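For example, to re-validate a foreign key after the import finishes, you can run a statement along these lines (a minimal sketch; the `orders` table and `fk_customer` constraint names are hypothetical):

```sql
-- Hypothetical table and constraint names: re-validate a foreign key
-- that IMPORT INTO left unvalidated on the target table.
ALTER TABLE orders VALIDATE CONSTRAINT fk_customer;
```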
The user must have the `INSERT` and `DROP` privileges on the specified table. (`DROP` is required because the table is taken offline during the `IMPORT INTO`.)
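For example, you can grant these privileges with a statement like the following (the `maxroach` user and `customers` table are hypothetical):

```sql
-- Hypothetical user and table names: grant the privileges IMPORT INTO requires.
GRANT INSERT, DROP ON TABLE customers TO maxroach;
```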
The source file URL does not require the `ADMIN` role in the following scenarios:

- S3 and GS using `SPECIFIED` (and not `IMPLICIT`) credentials. Azure is always `SPECIFIED` by default.
- Userfile (see the example below)

The source file URL does require the `ADMIN` role in the following scenarios:

- S3 or GS using `IMPLICIT` credentials
- Use of a custom endpoint on S3
- Nodelocal, HTTP, or HTTPS

Learn more about cloud storage for bulk operations.
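For instance, an import from `userfile` storage does not require the `ADMIN` role. The following is a minimal sketch, assuming the CSV was already uploaded with `cockroach userfile upload` and stored at the default `userfile` path (the table and file names are placeholders):

```sql
-- Placeholder table and file names: userfile imports do not require the ADMIN role.
IMPORT INTO customers (id, name)
    CSV DATA ('userfile:///customers.csv');
```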
{{site.data.alerts.callout_info}} While importing into an existing table, the table is taken offline. {{site.data.alerts.end}}
Parameter | Description
---|---
`table_name` | The name of the table you want to import into.
`column_name` | The table columns you want to import. Note: Currently, target columns are not enforced.
`file_location` | The URL of a CSV or Avro file containing the table data. This can be a comma-separated list of URLs. For an example, see Import into an existing table from multiple CSV files below.
`<option> [= <value>]` | Control your import's behavior with import options.
New in v20.2: The `DELIMITED DATA` format can be used to import delimited data from any text file type, while ignoring characters that need to be escaped, like the following:

- The file's delimiter (`\t` by default)
- Double quotes (`"`)
- Newline (`\n`)
- Carriage return (`\r`)

For examples showing how to use the `DELIMITED DATA` format, see the Examples section below.
You can control the `IMPORT` process's behavior using any of the following key-value pairs as an `<option> [= <value>]`.
Key | Context | Value
---|---|---
`delimiter` | `CSV DATA` | The unicode character that delimits columns in your rows. Default: `,`. <br><br> Example: To use tab-delimited values: `IMPORT INTO foo (..) CSV DATA ('file.csv') WITH delimiter = e'\t';`
`comment` | `CSV DATA` | The unicode character that identifies rows to skip. <br><br> Example: `IMPORT INTO foo (..) CSV DATA ('file.csv') WITH comment = '#';`
`nullif` | `CSV DATA`, `DELIMITED DATA` | The string that should be converted to `NULL`. <br><br> Example: To use empty columns as `NULL`: `IMPORT INTO foo (..) CSV DATA ('file.csv') WITH nullif = '';`
`skip` | `CSV DATA`, `DELIMITED DATA` | The number of rows to be skipped while importing a file. Default: `'0'`. <br><br> Example: To import CSV files with column headers: `IMPORT INTO ... CSV DATA ('file.csv') WITH skip = '1';`
`decompress` | General | The decompression codec to be used: `gzip`, `bzip`, `auto`, or `none`. Default: `'auto'`, which guesses based on file extension (`.gz`, `.bz`, `.bz2`). `none` disables decompression. <br><br> Example: `IMPORT INTO ... WITH decompress = 'bzip';`
`rows_terminated_by` | `DELIMITED DATA` | The unicode character to indicate new lines in the input file. Default: `\n` <br><br> Example: `IMPORT INTO ... WITH rows_terminated_by='\m';`
`fields_terminated_by` | `DELIMITED DATA` | The unicode character used to separate fields in each input line. Default: `\t` <br><br> Example: `IMPORT INTO ... WITH fields_terminated_by='.';`
`fields_enclosed_by` | `DELIMITED DATA` | The unicode character that encloses fields. Default: `"` <br><br> Example: `IMPORT INTO ... WITH fields_enclosed_by='"';`
`fields_escaped_by` | `DELIMITED DATA` | The unicode character, when preceding one of the above `DELIMITED DATA` options, to be interpreted literally. <br><br> Example: `IMPORT INTO ... WITH fields_escaped_by='\';`
`strict_validation` | `AVRO DATA` | Rejects Avro records that do not have a one-to-one mapping between Avro fields and the target CockroachDB schema. By default, CockroachDB ignores unknown Avro fields and sets missing SQL fields to `NULL`. CockroachDB will also attempt to convert the Avro field to the CockroachDB data type; otherwise, it will report an error. <br><br> Example: `IMPORT INTO foo (..) AVRO DATA ('file.avro') WITH strict_validation;`
`records_terminated_by` | `AVRO DATA` | The unicode character to indicate new lines in the input binary or JSON file. This is not needed for Avro OCF. Default: `\n` <br><br> Example: To use tab-terminated records: `IMPORT INTO foo (..) AVRO DATA ('file.csv') WITH records_terminated_by = e'\t';`
`data_as_binary_records` | `AVRO DATA` | Use when importing a binary file containing Avro records. The schema is not included in the file, so you need to specify the schema with either the `schema` or `schema_uri` option. <br><br> Example: `IMPORT INTO foo (..) AVRO DATA ('file.bjson') WITH data_as_binary_records, schema_uri='..';`
`data_as_json_records` | `AVRO DATA` | Use when importing a JSON file containing Avro records. The schema is not included in the file, so you need to specify the schema with either the `schema` or `schema_uri` option. <br><br> Example: `IMPORT INTO foo (..) AVRO DATA ('file.bjson') WITH data_as_json_records, schema='{ "type": "record",..}';`
`schema` | `AVRO DATA` | The schema of the Avro records included in the binary or JSON file. This is not needed for Avro OCF. See the `data_as_json_records` example above.
`schema_uri` | `AVRO DATA` | The URI of the file containing the schema of the Avro records included in the binary or JSON file. This is not needed for Avro OCF. See the `data_as_binary_records` example above.
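Options can also be combined in a single `WITH` clause. The following sketch (with hypothetical table and file names) skips a header row, treats empty columns as `NULL`, and reads tab-delimited values:

```sql
-- Hypothetical table and file names: combine several import options in one WITH clause.
IMPORT INTO customers (id, name)
    CSV DATA ('s3://acme-co/customers.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]')
    WITH skip = '1', nullif = '', delimiter = e'\t';
```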
For examples showing how to use these options, see the `IMPORT` Examples section.

For instructions and working examples showing how to migrate data from other databases and formats, see the Migration Overview. For information on how to import data into new tables, see `IMPORT`.
Before using `IMPORT INTO`, you should have:

- An existing table to import into (use `CREATE TABLE`; a minimal example follows this list). New in v20.2: `IMPORT INTO` supports computed columns and the `DEFAULT` expressions listed below.
- The CSV or Avro data you want to import, preferably hosted on cloud storage. This location must be equally accessible to all nodes using the same import file location. This is necessary because the `IMPORT INTO` statement is issued once by the client, but is executed concurrently across all nodes of the cluster. For more information, see the Import file location section below.
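For instance, the examples later on this page import into a `customers` table with `id` and `name` columns. A minimal matching table definition might look like the following (the column types are assumptions):

```sql
-- Minimal sketch of a target table for the examples below; column types are assumptions.
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name STRING
);
```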
New in v20.2: `IMPORT INTO` supports computed columns and the following `DEFAULT` expressions (see the sketch after this list):

- Constant `DEFAULT` expressions, which are expressions that return the same value in different statements. Examples include:
  - Literals (booleans, strings, integers, decimals, dates)
  - Functions where each argument is a constant expression and the functions themselves depend solely on their arguments (e.g., arithmetic operations, boolean logical operations, string operations).
- Current `TIMESTAMP` functions that record the transaction timestamp, which include:
  - `current_date()`
  - `current_timestamp()`
  - `localtimestamp()`
  - `now()`
  - `statement_timestamp()`
  - `timeofday()`
  - `transaction_timestamp()`
- `random()`
- `gen_random_uuid()`
- `unique_rowid()`
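As a sketch of how this fits together (the table, column, and file names are hypothetical), a target table can combine supported `DEFAULT` expressions with a computed column, and the import then supplies only the remaining columns:

```sql
-- Hypothetical schema: 'id' and 'created_at' use supported DEFAULT expressions,
-- and 'name_upper' is a computed column; none of them appear in the import's column list.
CREATE TABLE accounts (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    name STRING,
    name_upper STRING AS (upper(name)) STORED,
    created_at TIMESTAMPTZ DEFAULT now()
);

IMPORT INTO accounts (name)
    CSV DATA ('s3://acme-co/accounts.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]');
```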
Each node in the cluster is assigned an equal part of the imported data, and so must have enough temp space to store it. In addition, data is persisted as a normal table, and so there must also be enough space to hold the final, replicated data. The node's first-listed/default store directory must have enough available storage to hold its portion of the data.

On `cockroach start`, if you set `--max-disk-temp-storage`, it must also be greater than the portion of the data a node will store in temp space.
CockroachDB uses the URL provided to construct a secure API call to the service you specify. The URL structure depends on the type of file storage you are using. For more information, see the following:

- Use Cloud Storage for Bulk Operations
- New in v20.2: Use `userfile` for Bulk Operations
- Use a Local File Server for Bulk Operations
All nodes are used during the import job, which means all nodes' CPU and RAM will be partially consumed by the `IMPORT` task in addition to serving normal traffic.

After CockroachDB successfully initiates an import into an existing table, it registers the import as a job, which you can view with `SHOW JOBS`.

After the import has been initiated, you can control it with `PAUSE JOB`, `RESUME JOB`, and `CANCEL JOB`.
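For example, you can find the import's job ID with `SHOW JOBS` and then pause it (the job ID shown is hypothetical):

```sql
-- Locate the running IMPORT job, then pause it; the job ID below is hypothetical.
SHOW JOBS;

PAUSE JOB 27536791415282;
```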
{{site.data.alerts.callout_info}} If initiated correctly, the statement returns when the import is finished or if it encounters an error. In some cases, the import can continue after an error has been returned (the error message will tell you that the import has resumed in background). {{site.data.alerts.end}}
{{site.data.alerts.callout_danger}}
Pausing and then resuming an `IMPORT INTO` job will cause it to restart from the beginning.
{{site.data.alerts.end}}
Amazon S3:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers (id, name)
    CSV DATA (
      's3://acme-co/customers.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]&AWS_SESSION_TOKEN=[placeholder]'
    );
```
Azure:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers (id, name)
    CSV DATA (
      'azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
    );
```
Google Cloud:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers (id, name)
    CSV DATA (
      'gs://acme-co/customers.csv'
    );
```
{{site.data.alerts.callout_info}} The column order in your statement must match the column order in the CSV being imported, regardless of the order in the existing table's schema. {{site.data.alerts.end}}
Amazon S3:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers (id, name)
    CSV DATA (
      's3://acme-co/customers.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]',
      's3://acme-co/customers2.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]',
      's3://acme-co/customers3.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]',
      's3://acme-co/customers4.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]'
    );
```
Azure:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers (id, name)
    CSV DATA (
      'azure://acme-co/customer-import-data1.1.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co',
      'azure://acme-co/customer-import-data1.2.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co',
      'azure://acme-co/customer-import-data1.3.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co',
      'azure://acme-co/customer-import-data1.4.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co',
      'azure://acme-co/customer-import-data1.5.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
    );
```
Google Cloud:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers (id, name)
    CSV DATA (
      'gs://acme-co/customers.csv',
      'gs://acme-co/customers2.csv',
      'gs://acme-co/customers3.csv',
      'gs://acme-co/customers4.csv'
    );
```
Avro OCF data, JSON records, or binary records can be imported. The following are examples of importing Avro OCF data.
Amazon S3:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers
    AVRO DATA (
      's3://acme-co/customers.avro?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]&AWS_SESSION_TOKEN=[placeholder]'
    );
```
Azure:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers
    AVRO DATA (
      'azure://acme-co/customers.avro?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
    );
```
Google Cloud:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers
    AVRO DATA (
      'gs://acme-co/customers.avro'
    );
```
For more detailed information about importing data from Avro and examples, see Migrate from Avro.
Amazon S3:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers
    DELIMITED DATA (
      's3://your-external-storage/employees.csv?AWS_ACCESS_KEY_ID=[placeholder]&AWS_SECRET_ACCESS_KEY=[placeholder]'
    )
    WITH
      fields_terminated_by='|',
      fields_enclosed_by='"',
      fields_escaped_by='\';
```
Azure:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers
    DELIMITED DATA (
      'azure://acme-co/employees.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
    )
    WITH
      fields_terminated_by='|',
      fields_enclosed_by='"',
      fields_escaped_by='\';
```
Google Cloud:
{% include copy-clipboard.html %}
```sql
> IMPORT INTO customers
    DELIMITED DATA (
      'gs://acme-co/employees.csv'
    )
    WITH
      fields_terminated_by='|',
      fields_enclosed_by='"',
      fields_escaped_by='\';
```
- You cannot import into a table with partial indexes.
- While importing into an existing table, the table is taken offline.
- After importing into an existing table, constraints will be un-validated and need to be re-validated.
- Imported rows must not conflict with existing rows in the table or any unique secondary indexes.
- `IMPORT INTO` works for only a single existing table, and the table must not be interleaved.
- `IMPORT INTO` cannot be used within a transaction.
- `IMPORT INTO` can sometimes fail with a "context canceled" error, or can restart itself many times without ever finishing. If this is happening, it is likely due to a high amount of disk contention. This can be mitigated by setting the `kv.bulk_io_write.max_rate` cluster setting to a value below your max disk write speed. For example, to set it to 10MB/s, execute:

  {% include copy-clipboard.html %}
  ```sql
  > SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10MB';
  ```