DOC-324 Add more SQL to Snowflake import doc #362

Merged Nov 6, 2024
242 changes: 233 additions & 9 deletions content/collections/source-catalog/en/snowflake.md

## SQL query examples

To make the data selection step easier, here are a few example SQL snippets to get you started.

### Event data example


### Common snippets

Create a JSON object:

```sql
OBJECT_CONSTRUCT('city', CITY, 'state', STATE) as "user_properties"
```

Convert a timestamp column to milliseconds:

```sql
DATE_PART('EPOCH_MILLISECOND', TIMESTAMP_COLUMN) as "time"
```

Convert milliseconds to the `TIMESTAMP_NTZ` format needed for time-based import. This example uses the `scale` argument set to `3` to convert to milliseconds. See the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/functions/to_timestamp.html) for more details.

```sql
TO_TIMESTAMP_NTZ(TIME_COLUMN_IN_MILLIS, 3) as "update_time_column"
```

Convert a timestamp column with a timezone to `TIMESTAMP_NTZ` format needed for time-based import.

```sql
TO_TIMESTAMP_NTZ(CONVERT_TIMEZONE('UTC', TIMESTAMP_TZ_COLUMN)) as "update_time_column"
```
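Taken together, the snippets above can be combined into a single import query. The table and column names in this sketch are placeholders, not part of any real schema:

```sql
-- Illustrative only: MY_EVENTS and its columns are placeholder names
SELECT
    EVENT_NAME                                            as "event_type",
    USER_ID                                               as "user_id",
    DATE_PART('EPOCH_MILLISECOND', EVENT_TS)              as "time",
    OBJECT_CONSTRUCT('city', CITY, 'state', STATE)        as "user_properties",
    TO_TIMESTAMP_NTZ(CONVERT_TIMEZONE('UTC', UPDATED_AT)) as "update_time_column"
FROM MY_EVENTS
```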

## SQL troubleshooting


The following sections provide example SQL queries you can use to configure your import connectors.

### Required event properties

The Snowflake SQL queries you write for Amplitude's data warehouse import connectors must return specific columns that match Amplitude's Event API schema. Use the following examples to help structure your query.

{{partial:tabs tabs="Basic template, Complete template"}}
{{partial:tab name="Basic template"}}
```sql
SELECT
event_type, -- String: Name of the event
user_id, -- String: Unique identifier for the user
EXTRACT(EPOCH_MILLISECOND FROM event_timestamp) as time -- Timestamp in milliseconds
FROM your_events_table
```
{{/partial:tab}}
{{partial:tab name="Complete template"}}
```sql
SELECT
event_name as event_type,
user_identifier as user_id,
EXTRACT(EPOCH_MILLISECOND FROM event_timestamp) as time,
device_id,
session_id,

-- Event Properties as JSON object using OBJECT_CONSTRUCT
OBJECT_CONSTRUCT(
'property1', property1_value,
'property2', property2_value,
'category', category,
'value', amount
) as event_properties,
-- [tl! collapse:start ]
-- User Properties as JSON object
OBJECT_CONSTRUCT(
'user_type', user_type,
'subscription_status', subscription_status,
'city', data:address:city::string,
'last_updated', TO_VARCHAR(last_updated)
) as user_properties,

app_version,
platform,
os_name,
os_version,
device_brand,
device_manufacturer,
device_model,
carrier,
country,
region,
city,
dma,
language,
price::FLOAT as price,
quantity::INTEGER as quantity,
revenue::FLOAT as revenue,
product_id as productId,
revenue_type as revenueType,
location_lat::FLOAT as location_lat,
location_lng::FLOAT as location_lng,
ip
-- [tl! collapse:end ]
FROM your_events_table
WHERE event_timestamp >= DATEADD(day, -7, CURRENT_DATE())
```
{{/partial:tab}}
{{/partial:tabs}}

### Basic event query with properties

```sql
SELECT
event_name as event_type,
user_id,
EXTRACT(EPOCH_MILLISECOND FROM event_timestamp) as time,
device_id,
-- Construct event properties from multiple columns
OBJECT_CONSTRUCT(
'page_name', page_name,
'button_id', button_id,
'interaction_type', interaction_type,
'duration_ms', duration_ms
) as event_properties,
-- Construct user properties
OBJECT_CONSTRUCT(
'account_type', account_type,
'subscription_tier', subscription_tier,
'last_login', TO_VARCHAR(last_login_date)
) as user_properties,
platform,
app_version
FROM app_events
WHERE event_timestamp >= DATEADD(day, -7, CURRENT_DATE())
```

### Snowflake-specific features and best practices

The following are examples of Snowflake-specific features and best practices.

#### Working with JSON

```sql
-- Combining multiple JSON objects
SELECT
event_type,
user_id,
EXTRACT(EPOCH_MILLISECOND FROM event_timestamp) as time,
OBJECT_CONSTRUCT(
'base_properties', base_properties, -- existing JSON column
'additional_data', OBJECT_CONSTRUCT(
'new_field1', value1,
'new_field2', value2
)
) as event_properties
FROM events

-- Parsing JSON fields
SELECT
event_type,
user_id,
time,
PARSE_JSON(raw_properties):field_name::string as extracted_value
FROM events
```

#### Handling timestamps

```sql
-- Converting different timestamp formats
SELECT
event_type,
user_id,
CASE
WHEN TRY_TO_TIMESTAMP(timestamp_string) IS NOT NULL
THEN EXTRACT(EPOCH_MILLISECOND FROM TRY_TO_TIMESTAMP(timestamp_string))
WHEN TRY_TO_TIMESTAMP_NTZ(timestamp_string) IS NOT NULL
THEN EXTRACT(EPOCH_MILLISECOND FROM TRY_TO_TIMESTAMP_NTZ(timestamp_string))
ELSE NULL
END as time
FROM events
```

#### Data validation queries

```sql
-- Validate required fields
SELECT COUNT(*)
FROM (
YOUR_QUERY_HERE
) t
WHERE event_type IS NULL
OR user_id IS NULL
OR time IS NULL;

-- Validate JSON structure
SELECT COUNT(*)
FROM (
YOUR_QUERY_HERE
) t
WHERE NOT (
  -- IS_OBJECT checks that each value is a JSON object
  IS_OBJECT(TO_VARIANT(event_properties))
  AND IS_OBJECT(TO_VARIANT(user_properties))
);

-- Validate timestamp range
SELECT
MIN(time) as min_time,
MAX(time) as max_time,
TIMEADD(millisecond, MIN(time), '1970-01-01'::timestamp) as min_readable_time,
TIMEADD(millisecond, MAX(time), '1970-01-01'::timestamp) as max_readable_time
FROM (
YOUR_QUERY_HERE
) t;
```
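As a final sanity check before scheduling the import, you can also confirm that daily row counts look reasonable. As in the queries above, `YOUR_QUERY_HERE` stands for your import query:

```sql
-- Row counts per day, to spot gaps or spikes before import
SELECT
    TO_DATE(TIMEADD(millisecond, time, '1970-01-01'::timestamp)) as event_date,
    COUNT(*) as events
FROM (
    YOUR_QUERY_HERE
) t
GROUP BY 1
ORDER BY 1;
```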

## Performance optimization tips

Use the following examples to help optimize the performance of your integration.

### Use clustering keys

Use the appropriate clustering keys on your source tables.

```sql
ALTER TABLE your_events_table CLUSTER BY (event_timestamp, user_id);
```

### Use materialized views

Use materialized views for complex transformations.

```sql
CREATE MATERIALIZED VIEW amplitude_ready_events AS
SELECT
-- Your transformed columns here
FROM source_events;
```
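For example, a view that pre-computes the columns Amplitude expects might look like the following sketch. The table and column names are placeholders, and Snowflake's materialized view restrictions (single table, no joins, no nondeterministic functions) still apply:

```sql
-- Placeholder names: source_events and its columns are illustrative
CREATE MATERIALIZED VIEW amplitude_ready_events AS
SELECT
    event_name as event_type,
    user_id,
    DATE_PART('EPOCH_MILLISECOND', event_timestamp) as time,
    device_id,
    platform
FROM source_events;
```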

### Date partitioning in WHERE clauses

Constrain import queries to a bounded date range so each run scans only the data it needs.

```sql
WHERE event_timestamp >= DATEADD(day, -7, CURRENT_DATE())
AND event_timestamp < CURRENT_DATE()
```

### Micro-partitions

Filter on a date or timestamp column so Snowflake can prune micro-partitions and scan less data.

```sql
SELECT ...
FROM your_table
WHERE TO_DATE(event_timestamp) BETWEEN '2024-01-01' AND '2024-01-31'
```