Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: User Guide: Move Data Types and Information Schema to their own pages #3131

Merged
merged 6 commits into from
Aug 15, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 51 additions & 0 deletions docs/source/user-guide/sql/data_types.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Data Types

DataFusion uses Arrow, and thus the Arrow type system, for query
execution. The SQL types from
[sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/data_type.rs#L27)
are mapped to Arrow types according to the following table

| SQL Data Type | Arrow DataType |
| ------------- | ------------------------------------------------------------------------ |
| `CHAR` | `Utf8` |
| `VARCHAR` | `Utf8` |
| `UUID` | _Not yet supported_ |
| `CLOB` | _Not yet supported_ |
| `BINARY` | _Not yet supported_ |
| `VARBINARY` | _Not yet supported_ |
| `DECIMAL` | `Float64` |
| `FLOAT` | `Float32` |
| `SMALLINT` | `Int16` |
| `INT` | `Int32` |
| `BIGINT` | `Int64` |
| `REAL` | `Float32` |
| `DOUBLE` | `Float64` |
| `BOOLEAN` | `Boolean` |
| `DATE` | `Date32` |
| `TIME` | `Time64(TimeUnit::Nanosecond)` |
| `TIMESTAMP` | `Timestamp(TimeUnit::Nanosecond)` |
| `INTERVAL` | `Interval(YearMonth)` or `Interval(MonthDayNano)` or `Interval(DayTime)` |
| `REGCLASS` | _Not yet supported_ |
| `TEXT` | `Utf8` |
| `BYTEA` | `Binary` |
| `CUSTOM` | _Not yet supported_ |
| `ARRAY` | _Not yet supported_ |
4 changes: 3 additions & 1 deletion docs/source/user-guide/sql/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,10 @@ SQL Reference
.. toctree::
:maxdepth: 2

sql_status
data_types
select
ddl
information_schema
aggregate_functions
scalar_functions
sql_status
68 changes: 68 additions & 0 deletions docs/source/user-guide/sql/information_schema.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Information Schema

DataFusion supports showing metadata about the tables available. This information can be accessed using the views
of the ISO SQL `information_schema` schema or the DataFusion specific `SHOW TABLES` and `SHOW COLUMNS` commands.

More information can be found in the [Postgres docs](https://www.postgresql.org/docs/13/infoschema-schema.html)).

To show tables available for use in DataFusion, use the `SHOW TABLES` command or the `information_schema.tables` view:

```sql
> show tables;
+---------------+--------------------+------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+------------+------------+
| datafusion | public | t | BASE TABLE |
| datafusion | information_schema | tables | VIEW |
+---------------+--------------------+------------+------------+

> select * from information_schema.tables;

+---------------+--------------------+------------+--------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+------------+--------------+
| datafusion | public | t | BASE TABLE |
| datafusion | information_schema | TABLES | SYSTEM TABLE |
+---------------+--------------------+------------+--------------+
```

To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the or `information_schema.columns` view:

```sql
> show columns from t;
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion | public | t | a | Int32 | NO |
| datafusion | public | t | b | Utf8 | NO |
| datafusion | public | t | c | Float32 | NO |
+---------------+--------------+------------+-------------+-----------+-------------+

> select table_name, column_name, ordinal_position, is_nullable, data_type from information_schema.columns;
+------------+-------------+------------------+-------------+-----------+
| table_name | column_name | ordinal_position | is_nullable | data_type |
+------------+-------------+------------------+-------------+-----------+
| t | a | 0 | NO | Int32 |
| t | b | 1 | NO | Utf8 |
| t | c | 2 | NO | Float32 |
+------------+-------------+------------------+-------------+-----------+
```
101 changes: 0 additions & 101 deletions docs/source/user-guide/sql/sql_status.md
Original file line number Diff line number Diff line change
Expand Up @@ -143,104 +143,3 @@ DataFusion is designed to be extensible at all points. To that end, you can prov
## Rust Version Compatibility

This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.

# Supported SQL
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this since this is covered already in "SELECT syntax" page


This library currently supports many SQL constructs, including

- `CREATE EXTERNAL TABLE X STORED AS PARQUET LOCATION '...';` to register a table's locations
- `SELECT ... FROM ...` together with any expression
- `ALIAS` to name an expression
- `CAST` to change types, including e.g. `Timestamp(Nanosecond, None)`
- Many mathematical unary and binary expressions such as `+`, `/`, `sqrt`, `tan`, `>=`.
- `WHERE` to filter
- `GROUP BY` together with one of the following aggregations: `MIN`, `MAX`, `COUNT`, `SUM`, `AVG`, `CORR`, `VAR`, `COVAR`, `STDDEV` (sample and population)
- `ORDER BY` together with an expression and optional `ASC` or `DESC` and also optional `NULLS FIRST` or `NULLS LAST`

## Supported Functions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this since most of the functions are now documented in the SQL reference


DataFusion strives to implement a subset of the [PostgreSQL SQL dialect](https://www.postgresql.org/docs/current/functions.html) where possible. We explicitly choose a single dialect to maximize interoperability with other tools and allow reuse of the PostgreSQL documents and tutorials as much as possible.

Currently, only a subset of the PostgreSQL dialect is implemented, and we will document any deviations.

## Schema Metadata / Information Schema Support

DataFusion supports the showing metadata about the tables available. This information can be accessed using the views of the ISO SQL `information_schema` schema or the DataFusion specific `SHOW TABLES` and `SHOW COLUMNS` commands.

More information can be found in the [Postgres docs](https://www.postgresql.org/docs/13/infoschema-schema.html)).

To show tables available for use in DataFusion, use the `SHOW TABLES` command or the `information_schema.tables` view:

```sql
> show tables;
+---------------+--------------------+------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+------------+------------+
| datafusion | public | t | BASE TABLE |
| datafusion | information_schema | tables | VIEW |
+---------------+--------------------+------------+------------+

> select * from information_schema.tables;

+---------------+--------------------+------------+--------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+------------+--------------+
| datafusion | public | t | BASE TABLE |
| datafusion | information_schema | TABLES | SYSTEM TABLE |
+---------------+--------------------+------------+--------------+
```

To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the or `information_schema.columns` view:

```sql
> show columns from t;
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion | public | t | a | Int32 | NO |
| datafusion | public | t | b | Utf8 | NO |
| datafusion | public | t | c | Float32 | NO |
+---------------+--------------+------------+-------------+-----------+-------------+

> select table_name, column_name, ordinal_position, is_nullable, data_type from information_schema.columns;
+------------+-------------+------------------+-------------+-----------+
| table_name | column_name | ordinal_position | is_nullable | data_type |
+------------+-------------+------------------+-------------+-----------+
| t | a | 0 | NO | Int32 |
| t | b | 1 | NO | Utf8 |
| t | c | 2 | NO | Float32 |
+------------+-------------+------------------+-------------+-----------+
```

## Supported Data Types

DataFusion uses Arrow, and thus the Arrow type system, for query
execution. The SQL types from
[sqlparser-rs](https://github.com/ballista-compute/sqlparser-rs/blob/main/src/ast/data_type.rs#L57)
are mapped to Arrow types according to the following table

| SQL Data Type | Arrow DataType |
| ------------- | --------------------------------- |
| `CHAR` | `Utf8` |
| `VARCHAR` | `Utf8` |
| `UUID` | _Not yet supported_ |
| `CLOB` | _Not yet supported_ |
| `BINARY` | _Not yet supported_ |
| `VARBINARY` | _Not yet supported_ |
| `DECIMAL` | `Float64` |
| `FLOAT` | `Float32` |
| `SMALLINT` | `Int16` |
| `INT` | `Int32` |
| `BIGINT` | `Int64` |
| `REAL` | `Float32` |
| `DOUBLE` | `Float64` |
| `BOOLEAN` | `Boolean` |
| `DATE` | `Date32` |
| `TIME` | `Time64(TimeUnit::Millisecond)` |
| `TIMESTAMP` | `Timestamp(TimeUnit::Nanosecond)` |
| `INTERVAL` | _Not yet supported_ |
| `REGCLASS` | _Not yet supported_ |
| `TEXT` | _Not yet supported_ |
| `BYTEA` | _Not yet supported_ |
| `CUSTOM` | _Not yet supported_ |
| `ARRAY` | _Not yet supported_ |