docs: stage files with presign (#11900)
* updates

* Update 00-stage.md

* updated
soyeric128 authored Jun 28, 2023
1 parent a26cc30 commit cbdf2a7
Showing 7 changed files with 96 additions and 95 deletions.
1 change: 0 additions & 1 deletion docs/doc/01-guides/index.md
@@ -162,7 +162,6 @@ Databend's rich ecosystem offers a range of powerful tools and integrations, all
* [MySQL Handler](../03-develop/00-api/01-mysql-handler.md)
* [ClickHouse Handler](../03-develop/00-api/02-clickhouse-handler.md)
* [Streaming Load API](../03-develop/00-api/03-streaming-load.md)
* [File Upload API](../03-develop/00-api/10-put-to-stage.md)

</TabItem>

25 changes: 0 additions & 25 deletions docs/doc/03-develop/00-api/10-put-to-stage.md

This file was deleted.

3 changes: 1 addition & 2 deletions docs/doc/03-develop/index.md
@@ -9,8 +9,7 @@ Databend offers a variety of powerful APIs, allowing you to seamlessly interact
| HTTP Handler | Allows interaction with Databend through HTTP requests. |
| MySQL Handler | Enables communication between Databend and MySQL databases. |
| ClickHouse Handler | Facilitates data exchange between Databend and ClickHouse. |
| Streaming Load API | Designed for real-time data ingestion into Databend, allowing seamless streaming of data from various sources. |
| File Upload API | Enables uploading files directly into Databend, simplifying the process of importing data stored in files. | |
| Streaming Load API | Designed for real-time data ingestion into Databend, allowing seamless streaming of data from various sources. | |

Learn to use programming languages such as Go, Python, Node.js, Java, and Rust to develop applications that interact with Databend. Drivers described in the table below can be used to access Databend or Databend Cloud from these applications, enabling communication with Databend from the supported languages.

84 changes: 48 additions & 36 deletions docs/doc/12-load-data/00-stage/02-stage-files.md
@@ -1,91 +1,103 @@
---
title: Staging Files
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

You can use the [File Upload API](../../03-develop/00-api/10-put-to-stage.md) to stage a file by uploading a local file to a stage. This API can be called using curl or other HTTP client tools. Alternatively, you can also upload files directly to the folder in your bucket that maps to a stage using a web browser. Once uploaded, Databend can recognize them as staged files.
Databend recommends uploading files to a stage with a presigned URL. A presigned URL is a time-limited, signed URL that lets the client upload a file directly to the stage's storage location, so the traffic does not pass through Databend servers. This offloads network traffic from Databend, which can improve performance and scalability, and it can reduce upload latency because the data moves directly between the client and the storage destination.

See also: [PRESIGN](../../14-sql-commands/00-ddl/80-presign/presign.md)

## Examples

The following examples demonstrate how to upload a sample file ([books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet)) to the user stage, an internal stage, and an external stage with the File Upload API.
### Uploading with Presigned URL

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
The following examples demonstrate how to upload a sample file ([books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet)) to the user stage, an internal stage, and an external stage with presigned URLs.

<Tabs groupId="operating-systems">
<Tabs groupId="presign">

<TabItem value="user" label="Upload to User Stage">

Use cURL to make a request to the File Upload API:
```sql
PRESIGN UPLOAD @~/books.parquet;

```shell title='Put books.parquet to stage'
curl -u root: -H "stage_name:~" -F "upload=@books.parquet" -XPUT "http://localhost:8000/v1/upload_to_stage"
Name |Value |
-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
method |PUT |
headers|{"host":"s3.us-east-2.amazonaws.com"} |
url |https://s3.us-east-2.amazonaws.com/databend-toronto/stage/user/root/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIASTQNLUZWP2UY2HSN%2F20230627%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20230627T153448Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=84f1c280bff52f33c1914d64b2091d19650ad4882137013601fc44d26b607933|
```

```shell title='Response'
{"id":"bf2574bd-a467-4690-82b9-12549a1875d4","stage_name":"~","state":"SUCCESS","files":["books.parquet"]}
```shell
curl -X PUT -T books.parquet "https://s3.us-east-2.amazonaws.com/databend-toronto/stage/user/root/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIASTQNLUZWP2UY2HSN%2F20230627%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20230627T153448Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=84f1c280bff52f33c1914d64b2091d19650ad4882137013601fc44d26b607933"
```

Check the staged file:

```sql
LIST @~;

name |size|md5|last_modified |creator|
-------------+----+---+-----------------------------+-------+
books.parquet| 998| |2023-04-20 20:55:03.100 +0000| |
name |size|md5 |last_modified |creator|
-------------+----+----------------------------------+-----------------------------+-------+
books.parquet| 998|"88432bf90aadb79073682988b39d461c"|2023-06-27 16:03:51.000 +0000| |
```
</TabItem>

<TabItem value="internal" label="Upload to Internal Stage">

1. Create a named internal stage:
```sql
CREATE STAGE my_internal_stage;
```
2. Use cURL to make a request to the File Upload API:
```sql
PRESIGN UPLOAD @my_internal_stage/books.parquet;

```shell title='Put books.parquet to stage'
curl -u root: -H "stage_name:my_internal_stage" -F "upload=@books.parquet" -XPUT "http://localhost:8000/v1/upload_to_stage"
Name |Value |
-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
method |PUT |
headers|{"host":"s3.us-east-2.amazonaws.com"} |
url |https://s3.us-east-2.amazonaws.com/databend-toronto/stage/internal/my_internal_stage/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIASTQNLUZWP2UY2HSN%2F20230628%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20230628T022951Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=9cfcdf3b3554280211f88629d60358c6d6e6a5e49cd83146f1daea7dfe37f5c1|
```

```shell title='Response'
{"id":"a3b21915-b3a3-477f-8e31-b676074539ea","stage_name":"my_internal_stage","state":"SUCCESS","files":["books.parquet"]}
```shell
curl -X PUT -T books.parquet "https://s3.us-east-2.amazonaws.com/databend-toronto/stage/internal/my_internal_stage/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIASTQNLUZWP2UY2HSN%2F20230628%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20230628T022951Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=9cfcdf3b3554280211f88629d60358c6d6e6a5e49cd83146f1daea7dfe37f5c1"
```

Check the staged file:

```sql
LIST @my_internal_stage;

name |size|md5|last_modified |creator|
-------------+----+---+-----------------------------+-------+
books.parquet| 998| |2023-04-19 19:34:51.303 +0000| |
name |size |md5 |last_modified |creator|
-----------------------------------+------+----------------------------------+-----------------------------+-------+
books.parquet | 998|"88432bf90aadb79073682988b39d461c"|2023-06-28 02:32:15.000 +0000| |
```
</TabItem>
<TabItem value="external" label="Upload to External Stage">

1. Create a named external stage:

```sql
CREATE STAGE my_external_stage url = 's3://databend' CONNECTION =(ENDPOINT_URL= 'http://127.0.0.1:9000' aws_key_id='ROOTUSER' aws_secret_key='CHANGEME123');
```
2. Use cURL to make a request to the File Upload API:

```shell title='Put books.parquet to stage'
curl -u root: -H "stage_name:my_external_stage" -F "upload=@books.parquet" -XPUT "http://127.0.0.1:8000/v1/upload_to_stage"
```
```sql
PRESIGN UPLOAD @my_external_stage/books.parquet;

```shell title='Response'
{"id":"a21844fc-4c06-4b95-85a0-d57c28b9a142","stage_name":"my_external_stage","state":"SUCCESS","files":["books.parquet"]}
Name |Value |
-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
method |PUT |
headers|{"host":"127.0.0.1:9000"} |
url |http://127.0.0.1:9000/databend/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ROOTUSER%2F20230628%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230628T040959Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=697d608750fdcfe4a0b739b409cd340272201351023baa823382bf8c3718a4bd|
```
```shell
curl -X PUT -T books.parquet "http://127.0.0.1:9000/databend/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ROOTUSER%2F20230628%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230628T040959Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=697d608750fdcfe4a0b739b409cd340272201351023baa823382bf8c3718a4bd"
```

Check the staged file:

```sql
LIST @my_external_stage;

+-------------------+------+------------------------------------+-------------------------------+---------+
| name | size | md5 | last_modified | creator |
+-------------------+------+------------------------------------+-------------------------------+---------+
| books.parquet | 998 | "88432bf90aadb79073682988b39d461c" | 2023-04-24 04:57:35.447 +0000 | NULL |
+-------------------+------+------------------------------------+-------------------------------+---------+
name |size|md5 |last_modified |creator|
-------------+----+----------------------------------+-----------------------------+-------+
books.parquet| 998|"88432bf90aadb79073682988b39d461c"|2023-06-28 04:13:15.178 +0000| |
```
</TabItem>
</Tabs>
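
All three tabs above follow the same pattern: run `PRESIGN UPLOAD`, copy the value of the `url` row, and `PUT` the file to that URL with curl. A minimal sketch of how that copy step might be scripted is shown below. It assumes the `PRESIGN` output has been saved as pipe-delimited text in the layout shown above; the helper names `extract_presigned_url` and `upload_to_presigned_url` and the file `presign.txt` are hypothetical and not part of Databend.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of Databend.

# Read PRESIGN output (the pipe-delimited rows shown above) on stdin and
# print the value of the `url` row with surrounding whitespace removed.
extract_presigned_url() {
  awk -F'|' '$1 ~ /^url[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Upload a local file to a presigned URL with an HTTP PUT, as in the
# curl commands above. --fail makes curl exit non-zero on an HTTP error.
upload_to_presigned_url() {
  local file="$1" url="$2"
  curl --fail -X PUT -T "$file" "$url"
}

# Example (not run here): save the PRESIGN output to presign.txt, then
#   url="$(extract_presigned_url < presign.txt)"
#   upload_to_presigned_url books.parquet "$url"
```

The example URLs above carry `X-Amz-Expires=3600`, so a URL obtained this way has to be used within an hour of running `PRESIGN`; after that, request a new one.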
2 changes: 1 addition & 1 deletion docs/doc/12-load-data/00-transform/05-querying-stage.md
@@ -129,7 +129,7 @@ This example shows how to query data in a Parquet file stored in different locat
<Tabs groupId="query2stage">
<TabItem value="Stages" label="Stages">

Let's assume you have a sample file named [books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet) and you have uploaded it to your user stage, an internal stage named *my_internal_stage*, and an external stage named *my_external_stage*. To upload files to a stage, use the [File Upload API](../../03-develop/00-api/10-put-to-stage.md).
Let's assume you have a sample file named [books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet) and you have uploaded it to your user stage, an internal stage named *my_internal_stage*, and an external stage named *my_external_stage*. To upload files to a stage, use the [PRESIGN](../../14-sql-commands/00-ddl/80-presign/presign.md) method.

```sql
-- Query file in user stage
@@ -24,7 +24,7 @@ This section provides several brief tutorials that offer practical guidance on h

### Before You Begin

Download the sample file [employees.parquet](https://datasets.databend.rs/employees.parquet) and then upload it to your user stage using the [File Upload API](../../03-develop/00-api/10-put-to-stage.md). If you query the file, you will find that it contains these records:
Download the sample file [employees.parquet](https://datasets.databend.rs/employees.parquet) and then upload it to your user stage with [PRESIGN](../../14-sql-commands/00-ddl/80-presign/presign.md). If you query the file, you will find that it contains these records:

```sql
-- Query remote sample file directly

1 comment on commit cbdf2a7

@vercel vercel bot commented on cbdf2a7 Jun 28, 2023

Successfully deployed to the following URLs:

databend – ./

databend-git-main-databend.vercel.app
databend-databend.vercel.app
databend.vercel.app
databend.rs
