Fixing `<WHCode>` and changing it to `<Tabs>` #6877

Open · wants to merge 2 commits into base: `current`
`website/docs/docs/build/python-models.md` (53 changes: 26 additions & 27 deletions)
@@ -266,9 +266,9 @@ Python models can't be materialized as `view` or `ephemeral`. Python isn't suppo

For incremental models, like SQL models, you need to filter incoming tables to only new rows of data:

-<WHCode>
+<Tabs>

-<div warehouse="Snowpark">
+<TabItem value="Snowpark">

<File name='models/my_python_model.py'>

@@ -295,9 +295,9 @@ def model(dbt, session):

</File>

-</div>
+</TabItem>

-<div warehouse="PySpark">
+<TabItem value="PySpark">

<File name='models/my_python_model.py'>

@@ -324,9 +324,9 @@ def model(dbt, session):

</File>

-</div>
+</TabItem>

-</WHCode>
+</Tabs>
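
The model bodies are collapsed in the diff above. For reference, a minimal sketch of the incremental pattern on Snowpark, assuming a hypothetical upstream model `upstream_table` with an `updated_at` column:

```python
def model(dbt, session):
    dbt.config(materialized="incremental")
    df = dbt.ref("upstream_table")  # hypothetical upstream model

    if dbt.is_incremental:
        # only keep rows newer than the latest timestamp already in this table
        max_from_this = f"select max(updated_at) from {dbt.this}"
        df = df.filter(df.updated_at >= session.sql(max_from_this).collect()[0][0])

    return df
```

The PySpark version has the same shape; only the session and DataFrame method details differ.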

## Python-specific functionality

@@ -358,10 +358,9 @@ Currently, Python functions defined in one dbt model can't be imported and reuse
You can also define functions that depend on third-party packages so long as those packages are installed and available to the Python runtime on your data platform. See notes on "Installing Packages" for [specific data platforms](#specific-data-platforms).

In this example, we use the `holidays` package to determine if a given date is a holiday in France. The code below uses the pandas API for simplicity and consistency across platforms. The exact syntax, and the need to refactor for multi-node processing, still vary.
+<Tabs>

-<WHCode>

-<div warehouse="Snowpark">
+<TabItem value="Snowpark">

<File name='models/my_python_model.py'>

@@ -395,9 +394,9 @@ def model(dbt, session):

</File>

-</div>
+</TabItem>

-<div warehouse="PySpark">
+<TabItem value="PySpark">

<File name='models/my_python_model.py'>

@@ -434,9 +433,9 @@ def model(dbt, session):

</File>

-</div>
+</TabItem>

-</WHCode>
+</Tabs>
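
Since the tab contents are collapsed above, here is a minimal pandas-style sketch of the `holidays` example, assuming a hypothetical upstream model `stg_orders` with an `ORDER_DATE` column:

```python
import holidays

def is_holiday(date_col):
    # membership checks against holidays.France() work for dates and timestamps
    france_holidays = holidays.France()
    return date_col in france_holidays

def model(dbt, session):
    dbt.config(
        materialized="table",
        packages=["holidays"],  # ask the platform to install the package
    )

    # hypothetical upstream model; Snowpark shown (PySpark would use .toPandas())
    orders_df = dbt.ref("stg_orders").to_pandas()

    # apply the plain-Python helper row by row via the pandas API
    orders_df["IS_HOLIDAY"] = orders_df["ORDER_DATE"].apply(is_holiday)

    return orders_df
```

On PySpark, the same helper would more often be wrapped in a UDF or pandas UDF rather than converting the whole DataFrame to pandas.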

#### Configuring packages

@@ -474,9 +473,9 @@ You can use the `@udf` decorator or `udf` function to define an "anonymous" func
- [Snowpark Python: Creating UDFs](https://docs.snowflake.com/en/developer-guide/snowpark/python/creating-udfs.html)
- [PySpark functions: udf](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.udf.html)

-<WHCode>
+<Tabs>

-<div warehouse="Snowpark">
+<TabItem value="Snowpark">

<File name='models/my_python_model.py'>

@@ -516,9 +515,9 @@ def model(dbt, session):
- Writing [`create function`](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch.html) inside a SQL macro, to run as a hook or run-operation
- [Registering from a staged file](https://docs.snowflake.com/en/developer-guide/snowpark/python/creating-udfs#creating-a-udf-from-a-python-source-file) within your Python model code

-</div>
+</TabItem>

-<div warehouse="PySpark">
+<TabItem value="PySpark">

<File name='models/my_python_model.py'>

@@ -548,9 +547,9 @@ def model(dbt, session):

</File>

-</div>
+</TabItem>

-</WHCode>
+</Tabs>
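
Because the example bodies above are collapsed, here is a minimal PySpark sketch of the `@udf` decorator pattern, assuming a hypothetical upstream model `temperatures` with a `degrees_f` column:

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

@udf(returnType=FloatType())
def fahrenheit_to_celsius(degrees_f):
    # plain Python logic wrapped as a Spark UDF; guard against NULL inputs
    if degrees_f is None:
        return None
    return (degrees_f - 32.0) * 5.0 / 9.0

def model(dbt, session):
    dbt.config(materialized="table")
    temps_df = dbt.ref("temperatures")  # hypothetical upstream model

    # the decorated function can be applied directly to a column
    return temps_df.withColumn("degrees_c", fahrenheit_to_celsius(temps_df["degrees_f"]))
```

On Snowpark, the equivalent is `snowflake.snowpark.functions.udf`, which takes explicit `input_types` and a `return_type`.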

#### Code reuse

@@ -633,9 +632,9 @@ As a general rule, if there's a transformation you could write equally well in S

In their initial launch, Python models are supported on three of the most popular data platforms: Snowflake, Databricks, and BigQuery/GCP (via Dataproc). Both Databricks and GCP's Dataproc use PySpark as the processing framework. Snowflake uses its own framework, Snowpark, which has many similarities to PySpark.

-<WHCode>
+<Tabs>

-<div warehouse="Snowflake">
+<TabItem value="Snowflake">

**Additional setup:** You will need to [acknowledge and accept Snowflake Third Party Terms](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#getting-started) to use Anaconda packages.

@@ -713,9 +712,9 @@ def model(dbt, session):
For more information on using this configuration, refer to [Snowflake's documentation](https://community.snowflake.com/s/article/how-to-use-other-python-packages-in-snowpark) on uploading and using other Python packages in Snowpark not published on Snowflake's Anaconda channel.
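
The full example is collapsed above, but the Anaconda-channel path described earlier in this tab is small. A minimal sketch, assuming `numpy` is the package being requested and `upstream_table` is a hypothetical upstream model:

```python
import numpy

def model(dbt, session):
    dbt.config(
        materialized="table",
        packages=["numpy==1.23.1"],  # resolved from Snowflake's Anaconda channel; the version pin is optional
    )

    df = dbt.ref("upstream_table").to_pandas()  # hypothetical upstream model
    df["RANDOM_NOISE"] = numpy.random.normal(size=len(df))
    return df
```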


-</div>
+</TabItem>

-<div warehouse="Databricks">
+<TabItem value="Databricks">

**Submission methods:** Databricks supports a few different mechanisms to submit PySpark code, each with relative advantages. Some are better for supporting iterative development, while others are better for supporting lower-cost production deployments. The options are:
- `all_purpose_cluster` (default): dbt will run your Python model using the cluster ID configured as `cluster` in your connection profile or for this specific model. These clusters are more expensive but also much more responsive. We recommend using an interactive all-purpose cluster for quicker iteration in development.
@@ -763,9 +762,9 @@ If not configured, `dbt-spark` will use the built-in defaults: the all-purpose c
- [PySpark DataFrame syntax](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html)
- [Databricks: Introduction to DataFrames - Python](https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html)
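
A minimal sketch of per-model submission configuration, assuming the adapter's `submission_method`, `create_notebook`, and `cluster_id` model configs (the cluster ID and upstream model name are placeholders):

```python
def model(dbt, session):
    dbt.config(
        submission_method="all_purpose_cluster",  # or "job_cluster" for lower-cost scheduled runs
        create_notebook=True,                     # upload the model as a notebook for easier debugging
        cluster_id="1234-567890-abc123de",        # placeholder; defaults to the cluster set in your profile
    )

    df = dbt.ref("upstream_table")  # hypothetical upstream model
    return df
```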

-</div>
+</TabItem>

-<div warehouse="BigQuery">
+<TabItem value="BigQuery">

The `dbt-bigquery` adapter uses a service called Dataproc to submit your Python models as PySpark jobs. That Python/PySpark code will read from your tables and views in BigQuery, perform all computation in Dataproc, and write the final result back to BigQuery.

@@ -860,7 +859,7 @@ Installation of third-party packages on Dataproc varies depending on whether it'
- [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets)
- [PySpark DataFrame syntax](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html)
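
A minimal sketch of the Dataproc-side model configuration, assuming the `submission_method` and `dataproc_cluster_name` configs (cluster and model names are placeholders):

```python
def model(dbt, session):
    dbt.config(
        submission_method="cluster",                  # or "serverless" for Dataproc Serverless
        dataproc_cluster_name="my-favorite-cluster",  # placeholder; only needed for the "cluster" method
    )

    df = dbt.ref("upstream_table")  # hypothetical upstream model
    return df
```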

-</div>
+</TabItem>

-</WHCode>
+</Tabs>
