wrong name for configuration of Repartition output data before write #487
Comments
Good catch. The correct config name should be
Closing this. The doc has been fixed.
@zsxwing
@SteffenMangold please check the latest doc: https://docs.delta.io/latest/delta-update.html#-merge-in-dedup We don't update these archived docs.
Hi there,
I'm on Spark 3.0.0 + io.delta:delta-core_2.12:0.7.0, and I'd like to activate the "Repartition output data before write" feature described in https://docs.delta.io/latest/delta-update.html. The docs say that setting
spark.delta.merge.repartitionBeforeWrite
to true activates this feature, but after going through the source code and trying it out, it seems that only setting
spark.databricks.delta.merge.repartitionBeforeWrite.enabled
to true makes the number of output files much smaller than before. Is the configuration name in the docs wrong?
Here's the spark-defaults.conf I have been using to start the Spark session:
And I'm using PySpark to run the code below:
With
spark.databricks.delta.merge.repartitionBeforeWrite.enabled=true
what I get is:
Without
spark.databricks.delta.merge.repartitionBeforeWrite.enabled=true
what I get is:
And with only
spark.delta.merge.repartitionBeforeWrite=true
(without spark.databricks.delta.merge.repartitionBeforeWrite.enabled=true), what I get is the same as without spark.databricks.delta.merge.repartitionBeforeWrite.enabled=true: the merge ends up with tons of small Parquet files.
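For anyone trying to reproduce the comparison above, here is a minimal sketch of how the working key could be set and how the number of output files could be checked. It is not the reporter's original configuration or code. The helper names (`merge_repartition_conf`, `to_spark_defaults`, `apply_to_builder`, `count_data_files`) are hypothetical, and the extension and catalog settings are the ones Delta Lake 0.7.0 expects on Spark 3.0, added here as an assumption about the setup.

```python
from pathlib import Path

# Key reported in this issue as the one that takes effect on delta-core 0.7.0.
REPARTITION_KEY = "spark.databricks.delta.merge.repartitionBeforeWrite.enabled"


def merge_repartition_conf(enabled=True):
    """Return Spark settings for a Delta 0.7.0 session (assumed setup, not the reporter's)."""
    return {
        "spark.jars.packages": "io.delta:delta-core_2.12:0.7.0",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        REPARTITION_KEY: "true" if enabled else "false",
    }


def to_spark_defaults(conf):
    """Render settings as spark-defaults.conf lines: key, whitespace, value."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


def apply_to_builder(builder, conf):
    """Apply settings to a builder exposing config(key, value), e.g. SparkSession.builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def count_data_files(table_path):
    """Count Parquet files under a table directory, skipping the _delta_log folder."""
    root = Path(table_path)
    return sum(
        1
        for p in root.rglob("*.parquet")
        if "_delta_log" not in p.relative_to(root).parts
    )


if __name__ == "__main__":
    # Print lines that could be pasted into spark-defaults.conf.
    print(to_spark_defaults(merge_repartition_conf()))
```

With PySpark installed, `apply_to_builder(SparkSession.builder, merge_repartition_conf()).getOrCreate()` would start a session with these settings. Note that `count_data_files` counts every Parquet file still on disk, including files no longer referenced by the current table version, so the comparison is only meaningful on a freshly written table.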