add script to auto-generate list of data formats from system table #2946

Draft
wants to merge 1 commit into
base: main
184 changes: 107 additions & 77 deletions docs/en/chdb/data-formats.md
@@ -18,83 +18,113 @@ As well as the data formats that ClickHouse supports, chDB also supports:

The supported data formats from ClickHouse are:

| Format | Input | Output |
|---------------------------------|-------|--------|
| TabSeparated | ✔ | ✔ |
| TabSeparatedRaw | ✔ | ✔ |
| TabSeparatedWithNames | ✔ | ✔ |
| TabSeparatedWithNamesAndTypes | ✔ | ✔ |
| TabSeparatedRawWithNames | ✔ | ✔ |
| TabSeparatedRawWithNamesAndTypes| ✔ | ✔ |
| Template | ✔ | ✔ |
| TemplateIgnoreSpaces | ✔ | ✗ |
| CSV | ✔ | ✔ |
| CSVWithNames | ✔ | ✔ |
| CSVWithNamesAndTypes | ✔ | ✔ |
| CustomSeparated | ✔ | ✔ |
| CustomSeparatedWithNames | ✔ | ✔ |
| CustomSeparatedWithNamesAndTypes| ✔ | ✔ |
| SQLInsert | ✗ | ✔ |
| Values | ✔ | ✔ |
| Vertical | ✗ | ✔ |
| JSON | ✔ | ✔ |
| JSONAsString | ✔ | ✗ |
| JSONStrings | ✔ | ✔ |
| JSONColumns | ✔ | ✔ |
| JSONColumnsWithMetadata | ✔ | ✔ |
| JSONCompact | ✔ | ✔ |
| JSONCompactStrings | ✗ | ✔ |
| JSONCompactColumns | ✔ | ✔ |
| JSONEachRow | ✔ | ✔ |
| PrettyJSONEachRow | ✗ | ✔ |
| JSONEachRowWithProgress | ✗ | ✔ |
| JSONStringsEachRow | ✔ | ✔ |
| JSONStringsEachRowWithProgress | ✗ | ✔ |
| JSONCompactEachRow | ✔ | ✔ |
| JSONCompactEachRowWithNames | ✔ | ✔ |
| JSONCompactEachRowWithNamesAndTypes | ✔ | ✔ |
| JSONCompactStringsEachRow | ✔ | ✔ |
| JSONCompactStringsEachRowWithNames | ✔ | ✔ |
<!-- DO NOT REMOVE THE LINES BELOW - used to generate table of data formats -->
<!-- DATA FORMATS TABLE BEGIN -->
| Name | Input | Output |
| --- | --- | --- |
| Arrow | ✔ | ✔ |
| ArrowStream | ✔ | ✔ |
| Avro | ✔ | ✔ |
| AvroConfluent | ✔ | ✗ |
| BSONEachRow | ✔ | ✔ |
| CSV | ✔ | ✔ |
| CSVWithNames | ✔ | ✔ |
| CSVWithNamesAndTypes | ✔ | ✔ |
| CapnProto | ✔ | ✔ |
| CustomSeparated | ✔ | ✔ |
| CustomSeparatedIgnoreSpaces | ✔ | ✗ |
| CustomSeparatedIgnoreSpacesWithNames | ✔ | ✗ |
| CustomSeparatedIgnoreSpacesWithNamesAndTypes | ✔ | ✗ |
| CustomSeparatedWithNames | ✔ | ✔ |
| CustomSeparatedWithNamesAndTypes | ✔ | ✔ |
| DWARF | ✔ | ✗ |
| Form | ✔ | ✗ |
| HiveText | ✔ | ✗ |
| JSON | ✔ | ✔ |
| JSONAsObject | ✔ | ✗ |
| JSONAsString | ✔ | ✗ |
| JSONColumns | ✔ | ✔ |
| JSONColumnsWithMetadata | ✔ | ✔ |
| JSONCompact | ✔ | ✔ |
| JSONCompactColumns | ✔ | ✔ |
| JSONCompactEachRow | ✔ | ✔ |
| JSONCompactEachRowWithNames | ✔ | ✔ |
| JSONCompactEachRowWithNamesAndTypes | ✔ | ✔ |
| JSONCompactStrings | ✗ | ✔ |
| JSONCompactStringsEachRow | ✔ | ✔ |
| JSONCompactStringsEachRowWithNames | ✔ | ✔ |
| JSONCompactStringsEachRowWithNamesAndTypes | ✔ | ✔ |
| JSONObjectEachRow | ✔ | ✔ |
| BSONEachRow | ✔ | ✔ |
| TSKV | ✔ | ✔ |
| Pretty | ✗ | ✔ |
| PrettyNoEscapes | ✗ | ✔ |
| PrettyMonoBlock | ✗ | ✔ |
| PrettyNoEscapesMonoBlock | ✗ | ✔ |
| PrettyCompact | ✗ | ✔ |
| PrettyCompactNoEscapes | ✗ | ✔ |
| PrettyCompactMonoBlock | ✗ | ✔ |
| PrettyCompactNoEscapesMonoBlock | ✗ | ✔ |
| PrettySpace | ✗ | ✔ |
| PrettySpaceNoEscapes | ✗ | ✔ |
| PrettySpaceMonoBlock | ✗ | ✔ |
| PrettySpaceNoEscapesMonoBlock | ✗ | ✔ |
| Prometheus | ✗ | ✔ |
| Protobuf | ✔ | ✔ |
| ProtobufSingle | ✔ | ✔ |
| Avro | ✔ | ✔ |
| AvroConfluent | ✔ | ✗ |
| Parquet | ✔ | ✔ |
| ParquetMetadata | ✔ | ✗ |
| Arrow | ✔ | ✔ |
| ArrowStream | ✔ | ✔ |
| ORC | ✔ | ✔ |
| One | ✔ | ✗ |
| RowBinary | ✔ | ✔ |
| RowBinaryWithNames | ✔ | ✔ |
| RowBinaryWithNamesAndTypes | ✔ | ✔ |
| RowBinaryWithDefaults | ✔ | ✔ |
| Native | ✔ | ✔ |
| Null | ✗ | ✔ |
| XML | ✗ | ✔ |
| CapnProto | ✔ | ✔ |
| LineAsString | ✔ | ✔ |
| Regexp | ✔ | ✗ |
| RawBLOB | ✔ | ✔ |
| MsgPack | ✔ | ✔ |
| MySQLDump | ✔ | ✗ |
| Markdown | ✗ | ✔ |
| JSONEachRow | ✔ | ✔ |
| JSONEachRowWithProgress | ✗ | ✔ |
| JSONLines | ✔ | ✔ |
| JSONObjectEachRow | ✔ | ✔ |
| JSONStrings | ✗ | ✔ |
| JSONStringsEachRow | ✔ | ✔ |
| JSONStringsEachRowWithProgress | ✗ | ✔ |
| LineAsString | ✔ | ✔ |
| LineAsStringWithNames | ✗ | ✔ |
| LineAsStringWithNamesAndTypes | ✗ | ✔ |
| Markdown | ✗ | ✔ |
| MsgPack | ✔ | ✔ |
| MySQLDump | ✔ | ✗ |
| MySQLWire | ✗ | ✔ |
| NDJSON | ✔ | ✔ |
| Native | ✔ | ✔ |
| Npy | ✔ | ✔ |
| Null | ✗ | ✔ |
| ODBCDriver2 | ✗ | ✔ |
| ORC | ✔ | ✔ |
| One | ✔ | ✗ |
| Parquet | ✔ | ✔ |
| ParquetMetadata | ✔ | ✗ |
| PostgreSQLWire | ✗ | ✔ |
| Pretty | ✗ | ✔ |
| PrettyCompact | ✗ | ✔ |
| PrettyCompactMonoBlock | ✗ | ✔ |
| PrettyCompactNoEscapes | ✗ | ✔ |
| PrettyCompactNoEscapesMonoBlock | ✗ | ✔ |
| PrettyJSONEachRow | ✗ | ✔ |
| PrettyJSONLines | ✗ | ✔ |
| PrettyMonoBlock | ✗ | ✔ |
| PrettyNDJSON | ✗ | ✔ |
| PrettyNoEscapes | ✗ | ✔ |
| PrettyNoEscapesMonoBlock | ✗ | ✔ |
| PrettySpace | ✗ | ✔ |
| PrettySpaceMonoBlock | ✗ | ✔ |
| PrettySpaceNoEscapes | ✗ | ✔ |
| PrettySpaceNoEscapesMonoBlock | ✗ | ✔ |
| Prometheus | ✗ | ✔ |
| Protobuf | ✔ | ✔ |
| ProtobufList | ✔ | ✔ |
| ProtobufSingle | ✔ | ✔ |
| Raw | ✔ | ✔ |
| RawBLOB | ✔ | ✔ |
| RawWithNames | ✔ | ✔ |
| RawWithNamesAndTypes | ✔ | ✔ |
| Regexp | ✔ | ✗ |
| RowBinary | ✔ | ✔ |
| RowBinaryWithDefaults | ✔ | ✗ |
| RowBinaryWithNames | ✔ | ✔ |
| RowBinaryWithNamesAndTypes | ✔ | ✔ |
| SQLInsert | ✗ | ✔ |
| TSKV | ✔ | ✔ |
| TSV | ✔ | ✔ |
| TSVRaw | ✔ | ✔ |
| TSVRawWithNames | ✔ | ✔ |
| TSVRawWithNamesAndTypes | ✔ | ✔ |
| TSVWithNames | ✔ | ✔ |
| TSVWithNamesAndTypes | ✔ | ✔ |
| TabSeparated | ✔ | ✔ |
| TabSeparatedRaw | ✔ | ✔ |
| TabSeparatedRawWithNames | ✔ | ✔ |
| TabSeparatedRawWithNamesAndTypes | ✔ | ✔ |
| TabSeparatedWithNames | ✔ | ✔ |
| TabSeparatedWithNamesAndTypes | ✔ | ✔ |
| Template | ✔ | ✔ |
| TemplateIgnoreSpaces | ✔ | ✗ |
| Values | ✔ | ✔ |
| Vertical | ✗ | ✔ |
| XML | ✗ | ✔ |
<!-- DATA FORMATS TABLE END -->

For further information and examples, see [ClickHouse formats for input and output data](/docs/en/interfaces/formats).
4 changes: 3 additions & 1 deletion package.json
@@ -19,9 +19,11 @@
     "new-build": "bash ./copyClickhouseRepoDocs.sh && bash ./scripts/settings/autogenerate-settings.sh && yarn build-api-doc && yarn build && yarn build-swagger",
     "start": "docusaurus start",
     "swizzle": "docusaurus swizzle",
-    "write-heading-ids": "docusaurus write-heading-ids"
+    "write-heading-ids": "docusaurus write-heading-ids",
+    "autogen_data_formats_table" : "node scripts/autogenerated-content/autogen_data_formats_tables.mjs"
   },
   "dependencies": {
+    "@clickhouse/client": "^1.10.0",
     "@docusaurus/core": "2.3.1",
     "@docusaurus/plugin-client-redirects": "2.3.1",
     "@docusaurus/preset-classic": "2.3.1",
32 changes: 32 additions & 0 deletions scripts/autogenerated-content/autogen_data_formats_tables.mjs
@@ -0,0 +1,32 @@
import { createClient } from '@clickhouse/client'
import {jsonToTable, insertTextBetweenTags} from './utilities.mjs';

/*
This script is used to automatically generate the tables of data formats found at:
https://clickhouse.com/docs/en/interfaces/formats
https://clickhouse.com/docs/en/chdb/data-formats
*/

const play_endpoint = 'https://play.clickhouse.com/';
const client = createClient({
  /* configuration */
  url: play_endpoint,
  username: 'explorer'
})

// One row per format, with the is_input / is_output flags rendered as ✔ / ✗
const resultSet = await client.query({
  query: 'SELECT name AS Name, if(is_input, \'✔\', \'✗\') AS Input,' +
    'if(is_output, \'✔\', \'✗\') AS Output ' +
    'FROM system.formats ORDER BY name ASC'
})
const dataset = await resultSet.json()

let data_formats_table = jsonToTable(dataset.data)
// file paths are relative to the directory the script is run from
// (the repository root when invoked through the package.json script)
const file_paths = ['docs/en/interfaces/formats.md', 'docs/en/chdb/data-formats.md']
const startTag = '<!-- DATA FORMATS TABLE BEGIN -->';
const endTag = '<!-- DATA FORMATS TABLE END -->';

file_paths.forEach((file_path) => {
  insertTextBetweenTags(file_path, data_formats_table, startTag, endTag);
})
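
To illustrate the transformation this script performs, here is a minimal sketch of how rows shaped like the `system.formats` query result could become a Markdown table. The sample rows and the helper name `rowsToMarkdownTable` are hypothetical stand-ins for illustration, not the PR's code, and no ClickHouse connection is made.

```javascript
// Hypothetical sample, shaped like the `data` array of the query's JSON result
const sampleRows = [
  { Name: 'CSV', Input: '✔', Output: '✔' },
  { Name: 'Markdown', Input: '✗', Output: '✔' },
];

// Illustrative stand-in for a JSON-to-table helper (name is an assumption)
function rowsToMarkdownTable(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return '';
  // The keys of the first row become the column headers
  const headers = Object.keys(rows[0]);
  const lines = [
    `| ${headers.join(' | ')} |`,
    `| ${headers.map(() => '---').join(' | ')} |`,
    // `??` keeps falsy values such as 0 or false instead of blanking them
    ...rows.map((row) => `| ${headers.map((h) => row[h] ?? '').join(' | ')} |`),
  ];
  return lines.join('\n');
}

const table = rowsToMarkdownTable(sampleRows);
console.log(table);
```

The result has one header row, one separator row, and one row per format, which is the same layout as the table between the `DATA FORMATS TABLE BEGIN` and `DATA FORMATS TABLE END` markers.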
41 changes: 41 additions & 0 deletions scripts/autogenerated-content/utilities.mjs
@@ -0,0 +1,41 @@
import * as fs from 'fs'

// Render an array of flat objects as a Markdown table.
// The keys of the first object become the column headers.
const jsonToTable = (jsonData) => {
  if (!Array.isArray(jsonData) || jsonData.length === 0) {
    return "";
  }

  const headers = Object.keys(jsonData[0]);
  const headerRow = `| ${headers.join(' | ')} |\n`;
  const separatorRow = `| ${headers.map(() => '---').join(' | ')} |\n`;

  // `??` rather than `||` so that falsy values such as 0 or false are still rendered
  const rows = jsonData.map(obj => `| ${headers.map(key => obj[key] ?? '').join(' | ')} |`);

  return `\n${headerRow}${separatorRow}${rows.join('\n')}\n`;
}

// Replace everything between startTag and endTag in the file at filePath with textToInsert.
const insertTextBetweenTags = (filePath, textToInsert, startTag, endTag) => {
  try {
    const fileContent = fs.readFileSync(filePath, 'utf-8');

    // Compare against -1 before adding the tag length;
    // otherwise a missing start tag would go undetected
    const startTagIndex = fileContent.indexOf(startTag);
    const endIndex = fileContent.indexOf(endTag);

    if (startTagIndex === -1 || endIndex === -1) {
      console.error(`Error: Tags "${startTag}" or "${endTag}" not found in the file.`);
      return;
    }

    const startIndex = startTagIndex + startTag.length;

    const newContent =
      fileContent.substring(0, startIndex) +
      textToInsert +
      fileContent.substring(endIndex);

    fs.writeFileSync(filePath, newContent, 'utf-8');
    console.log('Text inserted successfully.');

  } catch (err) {
    console.error(`Error: ${err}`);
  }
}

export {jsonToTable, insertTextBetweenTags}
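
A variant that keeps the string manipulation separate from file I/O can be easier to test. The sketch below is one possible shape, not part of the PR: the function name `replaceBetweenTags` and the sample document are hypothetical. It looks up each tag, compares the raw index against -1 before adding the tag length, and searches for the end tag only after the start tag.

```javascript
// Hypothetical pure helper: returns the new content instead of touching the file system
function replaceBetweenTags(content, text, startTag, endTag) {
  const startTagIndex = content.indexOf(startTag);
  if (startTagIndex === -1) {
    throw new Error(`Start tag not found: ${startTag}`);
  }
  // Position just past the start tag; everything before it is preserved
  const startIndex = startTagIndex + startTag.length;

  // Only accept an end tag that appears after the start tag
  const endIndex = content.indexOf(endTag, startIndex);
  if (endIndex === -1) {
    throw new Error(`End tag not found after start tag: ${endTag}`);
  }

  return content.slice(0, startIndex) + text + content.slice(endIndex);
}

// Made-up document for illustration
const doc = 'intro\n<!-- BEGIN -->\nold table\n<!-- END -->\noutro\n';
const updated = replaceBetweenTags(doc, '\nnew table\n', '<!-- BEGIN -->', '<!-- END -->');
console.log(updated);
```

Because both tags are left in place, applying the function again with the same text returns the same content, so a generator built this way can be re-run safely.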