Skip to content

Bulk insertion

Oleg V. Kozlyuk edited this page Sep 19, 2020 · 18 revisions

Rationale

ClickHouse is specialized in storing huge volumes of logs/metrics data. While it can accept single-row inserts (and this client supports them), the specialized MergeTree family engines work best when data is inserted in bulk. ClickHouse.Client supports this scenario via specialized ClickHouseBulkCopy class

Overview

Using ClickHouseBulkCopy requires:

  • Target connection (ClickHouseConnection instance)
  • Target table name (DestinationTableName property)
  • Data source (IDataReader or IEnumerable<object[]>)

Notes

  • To make best use of ClickHouse properties, ClickHouseBulkCopy utilizes TPL to process batches of data, with up to 4 parallel insertion tasks (tweakable)
  • Column names can be optionally provided if source data has less columns than target table
  • Following parameters are tweakable: BatchSize, MaxDegreeOfParallelism
  • Before copying, a SELECT * FROM <table> LIMIT 0 query is performed to get information about target table structure. Types of provided objects must (reasonably) match the target table - i.e. it is pointless to try to write string to integer column, so an exception will be thrown on such attempt
  • It's recommended to not use sessions when using Bulk insertion, or set MaxDegreeOfParallelism to 1

Example usage

using var connection = new ClickHouseConnection("Host=<host>;Driver=Binary;<..other..>");

using var bulkCopyInterface = new ClickHouseBulkCopy(connection)
{
    DestinationTableName = "<database>.<table>",
    BatchSize = 100000
};

// Example data to insert
var values = Enumerable.Range(0, 1000000).Select(i => new object[] { (long)i, "value" + i.ToString() });
await bulkCopyInterface.WriteToServerAsync(values);
Console.WriteLine(bulkCopyInterface.RowsWritten);
Clone this wiki locally