[WIP] Loading spark-defaults.conf when creating SparkConf instances #1233
Conversation
Merged build triggered.

Merged build started.

Merged build finished.

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16159/

Merged build triggered.

Merged build started.

Merged build finished. All automated tests passed.

All automated tests passed.
I think this will end up conflicting with the spark-submit options.
spark-submit only loads spark-defaults.conf if --properties-file is not given on the command line. It seems like this code will always load spark-defaults.conf if one exists.
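For clarity, here is a minimal Scala sketch of the semantics being described: the defaults file is consulted only when no explicit --properties-file was passed, and only spark.* keys are honored. The object and method names are illustrative, not the actual spark-submit code.

```scala
import java.io.{File, FileInputStream, InputStreamReader}
import java.util.Properties
import scala.collection.JavaConverters._

// Illustrative sketch only: names and structure are hypothetical,
// not the actual spark-submit implementation.
object DefaultsLoadingSketch {

  // The defaults file is consulted only when the user has NOT passed
  // an explicit --properties-file; this is the behavior the comment
  // above says the PR would break.
  def resolvePropertiesFile(explicitFile: Option[String]): Option[File] =
    explicitFile.map(new File(_)).orElse {
      sys.env.get("SPARK_CONF_DIR")
        .orElse(sys.env.get("SPARK_HOME").map(_ + File.separator + "conf"))
        .map(dir => new File(dir, "spark-defaults.conf"))
        .filter(_.isFile)
    }

  // Load the file and keep only spark.* keys, mirroring spark-submit's
  // filtering of the properties file.
  def loadSparkProperties(file: File): Map[String, String] = {
    val props = new Properties()
    val reader = new InputStreamReader(new FileInputStream(file), "UTF-8")
    try props.load(reader) finally reader.close()
    props.stringPropertyNames().asScala
      .filter(_.startsWith("spark."))
      .map(k => k -> props.getProperty(k).trim)
      .toMap
  }
}
```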
Hi @witgo, can you elaborate, in the change summary, on why this is needed? Maybe file a bug? I think you guys don't use spark-submit, which is why you might need this; but spark-submit translates the config file into system properties, so the config is picked up by SparkConf. It seems like this change breaks the semantics of how spark-submit works. (I think it would be nice to stop using system properties like that - it gets really confusing when parts of the code use system properties and others use SparkConf - but that's a separate discussion.)
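To illustrate the hand-off described here: spark-submit turns the entries from the properties file into JVM system properties, and a SparkConf created with its default constructor (loadDefaults = true) picks up every system property whose key starts with "spark.". A small sketch, with an illustrative property value:

```scala
import org.apache.spark.SparkConf

object SysPropsHandOff {
  def main(args: Array[String]): Unit = {
    // Simulate what spark-submit does after reading the properties file:
    // each entry becomes a JVM system property.
    sys.props("spark.executor.memory") = "2g"

    // new SparkConf() defaults to loadDefaults = true, which copies every
    // system property whose key starts with "spark." into the conf.
    val conf = new SparkConf()
    println(conf.get("spark.executor.memory")) // prints "2g"
  }
}
```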
@vanzin The situation is, e.g.:
Ah, so it's SPARK-2098. I think it's a nice feature to have (I filed the bug, after all), but we can't break the existing semantics. For daemons, the command line parsers could handle this (by taking a "--properties-file" argument similar to spark-submit's). But if you want arbitrary SparkConf instances to read these conf files, it gets trickier, since you now need to propagate that command line information somehow.
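A rough sketch of what such a daemon-side "--properties-file" argument could look like, following the spark-submit convention of only honoring spark.* keys. parseDaemonArgs, applyPropertiesFile, and the setIfMissing-based merge policy are all hypothetical choices, not existing Spark code:

```scala
import java.io.{File, FileInputStream, InputStreamReader}
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.SparkConf

// Hypothetical daemon-side argument parsing; none of this is existing Spark code.
object DaemonArgsSketch {

  // Merge a properties file into the conf without clobbering values that
  // were already set explicitly (one possible precedence policy).
  def applyPropertiesFile(conf: SparkConf, path: String): SparkConf = {
    val props = new Properties()
    val reader = new InputStreamReader(new FileInputStream(new File(path)), "UTF-8")
    try props.load(reader) finally reader.close()
    props.stringPropertyNames().asScala
      .filter(_.startsWith("spark."))
      .foreach(k => conf.setIfMissing(k, props.getProperty(k).trim))
    conf
  }

  // A daemon launcher would call this with its raw argv.
  def parseDaemonArgs(args: Array[String]): SparkConf = {
    val conf = new SparkConf()
    args.sliding(2, 2).foreach {
      case Array("--properties-file", path) => applyPropertiesFile(conf, path)
      case other => sys.error(s"Unrecognized arguments: ${other.mkString(" ")}")
    }
    conf
  }
}
```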
You're right; the corresponding code should be submitted over the weekend.