We've run into this in our production, so I thought it might be interesting for everyone else to see as well.
Let us assume the following materialized view:
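(The original model definition did not survive the copy here; below is a minimal sketch of what it could have looked like, reconstructed from the generated query further down -- the exact config keys and values are my assumptions, not the original model.)

{{
    config(
        materialized='incremental',
        table_type='iceberg',
        partitioned_by=['bucket(user_id, 5)'],
    )
}}

select
    'A' as user_id,
    'pi' as name,
    'active' as status,
    17.89 as cost,
    1 as quantity,
    100000000 as quantity_big,
    current_date as my_date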
Assume further that the materialization of the above model fails with TOO_MANY_OPEN_PARTITIONS (improbable with such a simple query, but please bear with me).
As per the current version of the code, the following code will get executed:
https://github.com/dbt-athena/dbt-athena/blob/bae9fce151317b235eba2d497d9e16a1aaa7975c/dbt/include/athena/macros/materializations/models/incremental/helpers.sql#L71-L73
The batch_incremental_insert function will in turn call the get_partition_batches function:
https://github.com/dbt-athena/dbt-athena/blob/bae9fce151317b235eba2d497d9e16a1aaa7975c/dbt/include/athena/macros/materializations/models/incremental/helpers.sql#L25-L27
which will then do a select distinct with the contents of partitioned_by as the parameter:
https://github.com/dbt-athena/dbt-athena/blob/bae9fce151317b235eba2d497d9e16a1aaa7975c/dbt/include/athena/macros/materializations/models/helpers/get_partition_batches.sql#L7-L13
In our specific situation this leads to a query on the order of:
select distinct bucket(user_id, 5) from (
    select
        'A' as user_id,
        'pi' as name,
        'active' as status,
        17.89 as cost,
        1 as quantity,
        100000000 as quantity_big,
        current_date as my_date
) order by bucket(user_id, 5);
This query, however, will necessarily fail, as the bucket function is not defined in Athena/Presto(/Trino) and (to the best of my knowledge) actually originates from Iceberg/Spark (https://iceberg.apache.org/docs/latest/spark-ddl/#partitioned-by). The error will be on the order of (here's hoping this will help others find this issue via a search engine in the future):
FUNCTION_NOT_FOUND: line 2:21: Function 'bucket' not registered
I am not sure what a proper fix for something like this would look like, but as things currently stand, when one runs into TOO_MANY_OPEN_PARTITIONS with a model that uses the bucket function in its partitions, there is no way of getting it to work. Note that this behaviour was introduced in #360, but the queries that would trigger it did not "stand a chance" anyway, so this certainly does not mean that #360 introduced a bug -- on the contrary, it certainly helps a large class of queries pass despite the TOO_MANY_OPEN_PARTITIONS issue.
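The only naive direction I can think of would be for the batching query to fall back to the raw column instead of the bucket() transform -- purely a sketch on my side, not something dbt-athena does today. Since every distinct user_id maps to exactly one bucket, batching on the raw column should still keep each individual insert under the partition limit:

-- hypothetical rewrite of the partition-batching query: select the raw
-- column instead of the bucket() transform that Athena cannot evaluate
select distinct user_id from (
    select
        'A' as user_id,
        'pi' as name,
        'active' as status,
        17.89 as cost,
        1 as quantity,
        100000000 as quantity_big,
        current_date as my_date
) order by user_id;

This would likely produce more (and smaller) batches than batching on the bucket value itself, but each batch would still only touch a single partition.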
I hope this might help a fellow dbt-athena user in the future. If anyone has any ideas as to what a fix would look like, please do not hesitate to suggest them -- I would be very eager to learn about them.
Thanks!
@mrshu thanks for pointing that out. This case is indeed not covered in dbt-athena. It can be fixed at the stage where the query that gets the partition list is generated. Can you help us by contributing such a fix?
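Roughly, I imagine something along these lines in the macro that builds that query (just a sketch for discussion -- the parsing and the variable names here are made up, not the actual dbt-athena implementation):

{# Hypothetical sketch: before building the "select distinct ..." batching
   query, rewrite an Iceberg transform such as bucket(user_id, 5) to the
   bare column name, which Athena can evaluate in a regular SELECT. #}
{%- set partition_columns = [] -%}
{%- for partition_expr in partitioned_by -%}
  {%- if partition_expr.strip().lower().startswith('bucket(') -%}
    {# e.g. 'bucket(user_id, 5)' -> 'user_id' #}
    {%- do partition_columns.append(partition_expr.split('(')[1].split(',')[0].strip()) -%}
  {%- else -%}
    {%- do partition_columns.append(partition_expr) -%}
  {%- endif -%}
{%- endfor -%}
select distinct {{ partition_columns | join(', ') }} from {{ relation }}

(Here {{ relation }} just stands in for wherever the macro reads the model's relation from.)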
I would be very happy to contribute a fix for this, but I am not really sure how to go about it. Since you are trying to implement it, please feel free to keep me in the loop -- I would be happy to help with testing it.
Hi @mrshu!
I added a fix for the issue you mentioned above. Please feel free to test it and let me know whether everything works fine or something goes wrong ☺️