Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Databend vs. Snowflake Features and Capabilities #13059

Open
BohuTANG opened this issue Sep 27, 2023 · 10 comments
Open

Databend vs. Snowflake Features and Capabilities #13059

BohuTANG opened this issue Sep 27, 2023 · 10 comments

Comments

@BohuTANG
Copy link
Member

BohuTANG commented Sep 27, 2023

Features and Capabilities

Databend is an open-source alternative to Snowflake, so we need a page to introduce the differences between Databend and Snowflake.
Feel free to contribute if you notice any omissions.

Category Snowflake Databend
Numeric Data Types Data Types Yes Yes [1, 2]
String & Binary Data Types Data Types Yes Yes
Logical Data Types Data Types Yes Yes
Date & Time Data Types Data Types Yes Yes
Geospatial Data Types Data Types Yes Yes
Bitmap Data Types Data Types No Yes
Semi-structured Data Types Data Types Yes Yes
Database Databases Yes Yes
Sequence Sequences Yes Yes
Table Tables Yes Yes
External Location Table(Store data in your own S3 bucket) Tables No (Snowflake Native Format with Micro-Partition) Yes (Databend Native Format with FUSE Table Format)
Vacuum Tables No Yes
Clustering Tables Yes Yes(Distributed)
Full-Text Index Tables No Yes
Hive Lake Yes Yes
Iceberg Lake Yes Yes
View Views Yes Yes
Materialized Views Views Yes Yes
User, Role, & Privilege Privileges Yes Yes
Query Queries Yes Yes
Update General DML Yes Yes(Distributed)
Replace General DML No Yes(Distributed)
Delete General DML Yes Yes(Distributed)
Merge General DML Yes Yes
Stage(Internal/External/Remove/List) Data Loading Yes Yes
Query Stage File Data Loading Yes Yes
Copy Into Table/Location Data Loading Yes Yes(Distributed)
Transforming Data During a Load Data Loading Yes Yes
Pipe Data Loading Yes Yes(Cloud)
Task Data Loading Yes, minimum schedule is 1 minute Yes, minimum schedule is 500 millisecond
Stream Data Loading Yes Yes
Table Stream(Append-only/Standard) Data Loading Yes Yes
User-Defined Functions Functions Yes Yes
External Functions Functions Yes Yes
Stored Procedure Functions Yes Planning
AI Functions Functions Yes, 2024 Add Yes
Window Function Functions Yes Yes
Time Travel Data Recovery Yes Yes
Data Sharing Data Sharing Yes Yes
RBAC Security Yes Yes
Network Policies Security Yes Yes
Masking Policies Security Yes Yes
Kafka Connect Connectors Yes Yes
Flink CDC Connectors No Yes
dbt Connectors Yes Yes
Drivers Drivers Go
Java
Ptyhon
Node.js
Go
Java
Python
Node.js
Rust
Embedded as a Python Library No Yes
Storage Format Storage MicroPartition Parquet
Aggregate&Join&Sort Spill Performance Optimization Yes Yes
Local Disk Cache Performance Optimization Yes Yes
Profiling Performance Optimization Powerful Powerful
CLI Client Client SnowSQL BendSQL
TPC-H Benchmark Yes Yes
TPC-DS Benchmark Yes Yes
Supported Cloud Platforms Cloud AWS/GCP/AZURE Any Cloud

Databend and Snowflake Functions Comparison

Mathematical Functions

Databend Function Snowflake Function
ABS(x) ABS(x)
ACOS(x) ACOS(x)
ASIN(x) ASIN(x)
ATAN(x) ATAN(x)
ATAN2(y, x) ATAN2(y, x)
CEIL(x) CEIL(x)
CEILING(x) CEILING(x)
COS(x) COS(x)
DEGREES(x) DEGREES(x)
EXP(x) EXP(x)
FLOOR(x) FLOOR(x)
LN(x) LN(x)
LOG(x) LOG(x)
LOG10(x) LOG10(x)
LOG2(x) LOG2(x)
PI() PI()
POW(x, y) POW(x, y)
POWER(x, y) POWER(x, y)
RADIANS(x) RADIANS(x)
RAND() RAND()
ROUND(x, d) ROUND(x, d)
SIGN(x) SIGN(x)
SIN(x) SIN(x)
SQRT(x) SQRT(x)
TAN(x) TAN(x)

String Functions

Databend Function Snowflake Function
ASCII(expr) ASCII(expr)
CHAR(n, ...) CHAR(n, ...)
CONCAT(expr1, ...) CONCAT(expr1, ...)
CONCAT_WS(separator, expr1, ...) CONCAT_WS(separator, expr1, ...)
INSTR(str, substr) INSTR(str, substr)
LENGTH(str) LENGTH(str)
LCASE(str) LOWER(str)
LEFT(str, len) LEFT(str, len)
LOWER(str) LOWER(str)
LTRIM(str) LTRIM(str)
POSITION(substr IN str) POSITION(substr IN str)
REPEAT(str, count) REPEAT(str, count)
REPLACE(str, from_str, to_str) REPLACE(str, from_str, to_str)
REVERSE(str) REVERSE(str)
RIGHT(str, len) RIGHT(str, len)
RTRIM(str) RTRIM(str)
SPACE(n) SPACE(n)
SUBSTR(str, pos) SUBSTR(str, pos)
SUBSTR(str, pos, len) SUBSTR(str, pos, len)
SUBSTRING(str, pos) SUBSTRING(str, pos)
SUBSTRING(str, pos, len) SUBSTRING(str, pos, len)
`TRIM([{BOTH LEADING
UCASE(str) UPPER(str)
UPPER(str) UPPER(str)

Date and Time Functions

Databend Function Snowflake Function
ADD_DAYS(date, interval) DATEADD(day, interval, date)
ADD_HOURS(date, interval) DATEADD(hour, interval, date)
ADD_MINUTES(date, interval) DATEADD(minute, interval, date)
ADD_MONTHS(date, interval) DATEADD(month, interval, date)
ADD_SECONDS(date, interval) DATEADD(second, interval, date)
ADD_YEARS(date, interval) DATEADD(year, interval, date)
DATE(expr) DATE(expr)
DAY(date) DAY(date)
DAYNAME(date) DAYNAME(date)
DAY_OF_MONTH(date) DAYOFMONTH(date)
DAY_OF_WEEK(date) DAYOFWEEK(date)
DAY_OF_YEAR(date) DAYOFYEAR(date)
HOUR(date) HOUR(date)
LAST_DAY(date) LAST_DAY(date)
MINUTE(time) MINUTE(time)
MONTH(date) MONTH(date)
NOW() CURRENT_TIMESTAMP()
SECOND(time) SECOND(time)
TO_DATE(expr) TO_DATE(expr)
TO_TIMESTAMP(expr) TO_TIMESTAMP(expr)
WEEK(date) WEEK(date)
YEAR(date) YEAR(date)
YESTERDAY() DATEADD(day, -1, CURRENT_DATE())
TOMORROW() DATEADD(day, 1, CURRENT_DATE())
TO_START_OF_DAY(expr) DATE_TRUNC('day', expr)
TO_START_OF_HOUR(expr) DATE_TRUNC('hour', expr)
TO_START_OF_MINUTE(expr) DATE_TRUNC('minute', expr)
TO_START_OF_MONTH(expr) DATE_TRUNC('month', expr)
TO_START_OF_QUARTER(expr) DATE_TRUNC('quarter', expr)
TO_START_OF_WEEK(expr) DATE_TRUNC('week', expr)
TO_START_OF_YEAR(expr) DATE_TRUNC('year', expr)
TO_UNIX_TIMESTAMP(expr) TO_TIMESTAMP(expr)
TO_YEARMONTH(expr) TO_CHAR(expr, 'YYYY-MM')
TO_YYYYMMDD(expr) TO_CHAR(expr, 'YYYYMMDD')
TO_YYYYMMDDHH(expr) TO_CHAR(expr, 'YYYYMMDDHH')
TO_YYYYMMDDHHMMSS(expr) TO_CHAR(expr, 'YYYYMMDDHHMMSS')

Aggregate Functions

Databend Function Snowflake Function
AVG(x) AVG(x)
COUNT(x) COUNT(x)
MAX(x) MAX(x)
MIN(x) MIN(x)
SUM(x) SUM(x)

Conversion Functions

Databend Function Snowflake Function
CAST(expr AS type) CAST(expr AS type)
TO_BOOLEAN(expr) CAST(expr AS BOOLEAN)
TO_STRING(expr) CAST(expr AS STRING)
TO_INT32(expr) CAST(expr AS INT)
TO_INT64(expr) CAST(expr AS BIGINT)
TO_UINT32(expr) CAST(expr AS NUMBER)
TO_UINT64(expr) CAST(expr AS NUMBER)
TO_FLOAT32(expr) CAST(expr AS FLOAT)
TO_FLOAT64(expr) CAST(expr AS DOUBLE)
TO_DATE(expr) CAST(expr AS DATE)
TO_TIMESTAMP(expr) CAST(expr AS TIMESTAMP)
TRY_CAST(expr AS type) TRY_CAST(expr AS type)

JSON Functions

Databend Function Snowflake Function
JSON_OBJECT(key1, value1[, key2, value2[, ...]]) OBJECT_CONSTRUCT(key1, value1[, key2, value2[, ...]])
PARSE_JSON(expr) PARSE_JSON(expr)
JSON_EXTRACT_PATH_TEXT(expr, path_name) GET(expr, path_name)
JSON_PRETTY(json_string) TO_JSON_STRING(json_string)
JSON_ARRAY(value1[, value2[, ...]]) ARRAY_CONSTRUCT(value1[, value2[, ...]])
JSON_PATH_QUERY(variant, path_name) PATH(expr, path_name)
JSON_PATH_QUERY_FIRST(variant, path_name) PATH(expr, path_name)
CHECK_JSON(expr) IS_JSON(expr)
GET_PATH(variant, path_name) GET_PATH(variant, path_name)
JSON_PATH_EXISTS(json_data, json_path_expression) JSON_PATH_EXISTS(json_data, json_path_expression)
JSON_OBJECT_KEEP_NULL(key1, value1[, key2, value2[, ...]]) OBJECT_CONSTRUCT_KEEP_NULL(key1, value1[, key2, value2[, ...]])
IS_INTEGER(expr) IS_INTEGER(expr)
IS_NULL_VALUE(expr) IS_NULL_VALUE(expr)
IS_STRING(expr) IS_STRING(expr)
IS_BOOLEAN(expr) IS_BOOLEAN(expr)
`FLATTEN(INPUT => expr [, PATH => expr ] [ , OUTER => TRUE FALSE ] [ , RECURSIVE => TRUE
IS_ARRAY(expr) IS_ARRAY(expr)
GET(variant, index) GET(variant, index)
JSON_PATH_QUERY_ARRAY(variant, path_name) PATH_ARRAY(variant, path_name)
GET_IGNORE_CASE(variant, field_name) GET_IGNORE_CASE(variant, field_name)
JSON_ARRAY_ELEMENTS(json_string) ARRAY_ELEMENTS(json_string)
JSON_STRIP_NULLS(json_string) REMOVE_NULLS(json_string)
AS_BOOLEAN(variant) CAST(variant AS BOOLEAN)
AS_INTEGER(variant) CAST(variant AS INTEGER)
AS_FLOAT(variant) CAST(variant AS FLOAT)
AS_STRING(variant) CAST(variant AS STRING)
AS_ARRAY(variant) CAST(variant AS ARRAY)
AS_OBJECT(variant) CAST(variant AS OBJECT)

Bitmap Functions

Databend Function Snowflake Function
BUILD_BITMAP(expr) N/A
TO_BITMAP(expr) N/A
BITMAP_AND(expr1, expr2) N/A
BITMAP_OR(expr1, expr2) N/A
BITMAP_XOR(expr1, expr2) N/A
BITMAP_HAS_ALL(expr1, expr2) N/A
BITMAP_HAS_ANY(expr1, expr2) N/A
BITMAP_TO_STRING(expr) N/A

Conditional Functions

Databend Function Snowflake Function
IF(cond, expr1, expr2) IFF(cond, expr1, expr2)
NULLIF(expr1, expr2) NULLIF(expr1, expr2)
IFNULL(expr1, expr2) IFNULL(expr1, expr2)
COALESCE(expr1, expr2[, ...]) COALESCE(expr1, expr2[, ...])

Encryption and Hash Functions

Databend Function Snowflake Function
MD5(expr) MD5(expr)
SHA(expr) SHA(expr)
SHA2(expr, x) SHA2(expr, x)
CRC32(expr) CRC32(expr)
XXHASH32(expr) N/A
XXHASH64(expr) N/A
BLAKE3(expr) N/A
SIPHASH64(expr) N/A
CITY64WITHSEED(expr1, expr2) N/A

Window Functions

Databend Function Snowflake Function
ROW_NUMBER() ROW_NUMBER()
RANK() RANK()
DENSE_RANK() DENSE_RANK()
PERCENT_RANK() PERCENT_RANK()
CUME_DIST() CUME_DIST()
NTILE(n) NTILE(n)
LAG(expr, n, default) LAG(expr, n, default)
LEAD(expr, n, default) LEAD(expr, n, default)
FIRST_VALUE(expr) FIRST_VALUE(expr)
LAST_VALUE(expr) LAST_VALUE(expr)
NTH_VALUE(expr, n) NTH_VALUE(expr, n)

Array Functions

Databend Function Snowflake Function
ARRAY_REDUCE(array, lambda) N/A
ARRAY_FLATTEN(array) ARRAY_FLATTEN( <array> )
SLICE(array, start[, end]) SLICE(array, start[, end])
ARRAY_TO_STRING(array, '<separator>') ARRAY_TO_STRING(array, '<separator>')
CONTAINS(array, element) ARRAY_CONTAINS(array, element)
ARRAY_FILTER(array, lambda) N/A
ARRAY_SORT(array[, order[, nullposition]]) ARRAY_SORT(array[, order[, nullposition]])
UNNEST(array) UNNEST(array)
ARRAY_CONCAT(array1, array2) ARRAY_CAT(array1, array2)
GET(array, index) GET(array, index)
ARRAY_INDEX_OF(array, element) ARRAY_INDEX_OF(array, element)
ARRAY_UNIQUE(array) ARRAY_UNIQUE(array)
ARRAY_REMOVE_FIRST(array) N/A
ARRAY_TRANSFORM(array, lambda) TRANSFORM( <array> , <lambda_expression> )
ARRAY_DISTINCT(array) ARRAY_DISTINCT(array)
ARRAY_APPEND(array, element) ARRAY_APPEND(array, element)
ARRAY_PREPEND(element, array) ARRAY_PREPEND(element, array)
ARRAY_LENGTH(array) ARRAY_LENGTH(array)
ARRAY_APPLY(array, lambda) N/A

Geospatial Functions

Databend Function Snowflake Function
H3_EXACT_EDGE_LENGTH_KM(h3) H3_EXACT_EDGE_LENGTH_KM(h3)
H3_EDGE_LENGTH_M(1) H3_EDGE_LENGTH_M(1)
H3_LINE(h3, a_h3) H3_LINE(h3, a_h3)
H3_EDGE_LENGTH_KM(res) H3_EDGE_LENGTH_KM(res)
H3_EXACT_EDGE_LENGTH_M(h3) H3_EXACT_EDGE_LENGTH_M(h3)
GEOHASH_DECODE('<geohashed-string>') GEOHASH_DECODE('<geohashed-string>')
H3_HEX_AREA_M2(res) H3_HEX_AREA_M2(res)
H3_TO_GEO_BOUNDARY(h3) H3_TO_GEO_BOUNDARY(h3)
H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(h3) H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(h3)
H3_HEX_AREA_KM2(res) H3_HEX_AREA_KM2(res)
H3_IS_VALID(h3) H3_IS_VALID(h3)
H3_TO_GEO(h3) H3_TO_GEO(h3)
H3_UNIDIRECTIONAL_EDGE_IS_VALID(h3) H3_UNIDIRECTIONAL_EDGE_IS_VALID(h3)
GEO_TO_H3(lon, lat, res) GEO_TO_H3(lon, lat, res)
GEOHASH_ENCODE(lon, lat) GEOHASH_ENCODE(lon, lat)
POINT_IN_POLYGON((x,y), [(a,b), (c,d), (e,f) ... ]) POINT_IN_POLYGON((x,y), [(a,b), (c,d), (e,f) ... ])
H3_DISTANCE(h3, a_h3) H3_DISTANCE(h3, a_h3)
H3_CELL_AREA_RADS2(h3) H3_CELL_AREA_RADS2(h3)
H3_TO_CENTER_CHILD(h3, res) H3_TO_CENTER_CHILD(h3, res)
H3_CELL_AREA_M2(h3) H3_CELL_AREA_M2(h3)
H3_GET_FACES(h3) H3_GET_FACES(h3)
H3_IS_PENTAGON(h3) H3_IS_PENTAGON(h3)
STRING_TO_H3(h3) STRING_TO_H3(h3)
H3_GET_UNIDIRECTIONAL_EDGE(h3, a_h3) H3_GET_UNIDIRECTIONAL_EDGE(h3, a_h3)
H3_GET_BASE_CELL(h3) H3_GET_BASE_CELL(h3)
H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(h3) H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(h3)
H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3)
H3_TO_PARENT(h3, parent_res) H3_TO_PARENT(h3, parent_res)
H3_NUM_HEXAGONS(res) H3_NUM_HEXAGONS(res)
H3_TO_CHILDREN(h3, child_res) H3_TO_CHILDREN(h3, child_res)
H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(h3) H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(h3)
H3_TO_STRING(h3) H3_TO_STRING(h3)
H3_K_RING(h3, k) H3_K_RING(h3, k)
H3_IS_RES_CLASS_III(h3) H3_IS_RES_CLASS_III(h3)
H3_INDEXES_ARE_NEIGHBORS(h3, a_h3) H3_INDEXES_ARE_NEIGHBORS(h3, a_h3)
H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3)
H3_GET_RESOLUTION(h3) H3_GET_RESOLUTION(h3)

Other Functions

Databend Function Snowflake Function
NEXTVAL(seq_name) NEXTVAL(seq_name)
QUOTE(str) QUOTE(str)
@BohuTANG BohuTANG pinned this issue Sep 27, 2023
@ZhiHanZ

This comment was marked as outdated.

@PsiACE

This comment was marked as outdated.

@xudong963
Copy link
Member

We also support aggregate and hash join spilling.

@BohuTANG BohuTANG changed the title doc: Snowflake & Databend Comprehensive Comparison: Databend vs. Snowflake Features and Capabilities Oct 8, 2023
@BohuTANG BohuTANG changed the title Comprehensive Comparison: Databend vs. Snowflake Features and Capabilities Databend vs. Snowflake Features and Capabilities Oct 8, 2023
@Chasen-Zhang
Copy link
Member

Chasen-Zhang commented Dec 4, 2023

@BohuTANG Could you help me change the address of the connection,
Here are the rules:
/doc/sql-reference/* => /sql/sql-reference/*
/doc/sql-commands/* => /sql/sql-commands/*
/doc/sql-functions/* =>/sql/sql-functions/*
(👆 Although I redirected all of these, it's better to change them all together. And you can directly change the domain name to https://docs.databend.com )

https://docs.databend.com/load-data/pipeline => https://docs.databend.com/doc/load-data/pipeline
https://databend.rs/blog/2023-08-01-iceberg-integration=> https://www.databend.com/blog/2023-08-01-iceberg-integration/

Thank you in advance!

@cdmikechen
Copy link

@BohuTANG
Wish to support Geospatial Data Types, we used to have a lot of spatial data needs, and if databend can support geo, it will be a great help to us ~

@BohuTANG BohuTANG unpinned this issue Dec 27, 2023
@BohuTANG BohuTANG pinned this issue Dec 27, 2023
@sfc-gh-space
Copy link

I didn't do a comprehensive review, but your Snowflake features are out of date. A few examples:

@BohuTANG
Copy link
Member Author

Hi @sfc-gh-space,

Thank you for the information. I've updated the outdated sections with the information you provided.

Snowflake supports external tables, both in your own VPC and on-prem from S3-compatible storage arrays.

While Snowflake supports external tables, it's important to note that Databend’s external location tables operate differently (see: Databend External Location Tables). In Snowflake, you can't place the Snowflake-specific micro-partition table format into your own S3 bucket, as it's confined to Snowflake-managed storage. However, Databend allows external table data to be stored directly in your S3 bucket. Databend’s approach is more akin to Snowflake's external stage queries (see: Querying an External Stage).

Embedded Python Library:...

Here we have changed it to Embedded as a Python Library.

@sfc-gh-space
Copy link

Regarding your comment: "Databend allows external table data to be stored directly in your S3 bucket." Apache Iceberg™ is a fully native table format (e.g a first party object) for Snowflake since June 2024. This means Snowflake customers can choose to store their data in their own S3 bucket if they like. Apache Iceberg™ tables are Parquet files with metadata. Databend appears to store data in Parquet as well. I see zero distinction.

@BohuTANG
Copy link
Member Author

Hi @sfc-gh-space,

Just to clarify, Databend's FUSE table format is our default format and is similar to Snowflake's table format with micro-partitions. It is optimized for performance, multi-transaction support, streaming capabilities, and includes significant improvements for columnar storage.

One key difference is that Databend allows you to store FUSE tables in external locations, such as your own bucket, while Snowflake's external tables only support external formats.

@sfc-gh-space
Copy link

I assume this purpose of this matrix is to help companies understand the difference between Databend and Snowflake. In this case, you've used a term ("external table") that means something specific to Snowflake which is causing confusion. You've attempted to clarify this by redefining external table to mean "native table in a customer's VPC". But here's the issue:

12 years ago Snowflake only had one internal table format, FDN. However, in 2024, Snowflake has additional native tables formats for different use cases. FDN (micro-partitions), Apache Iceberg™ (Parquet + metadata), hybrid tables (row+column), Event Tables (OpenTelemetry), etc.

If you have redefined this column to mean "native table format in customer VPC", then the answer for Snowflake is Yes: Apache Iceberg™ (Parquet + metadata). The answer for Databend is Yes: FUSE (Parquet + metadata).

Apache Iceberg™ tables in Snowflake are fully supported first party objects that use the same query compiler, have the same ability to prune, and support other native Snowflake features like live sharing, data governance, etc. More importantly for customers, performance on these tables are excellent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants