Conversation

@witgo
Contributor

@witgo witgo commented Jun 26, 2014

No description provided.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16159/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16160/

Contributor

I think this will end up causing disagreement with spark-submit options.

spark-submit will only load spark-defaults.conf if --properties-file is not defined in the command line. It seems like this code will always load spark-defaults.conf if one exists.
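To make the concern above concrete, here is a minimal sketch of the selection rule as described in this comment: an explicit `--properties-file` suppresses the default file entirely. The function name `resolve_properties_file` and its parameters are hypothetical and written in Python for brevity; this is not Spark's actual code.

```python
import os
from typing import Optional


def resolve_properties_file(cli_properties_file: Optional[str], conf_dir: str) -> Optional[str]:
    """Pick the single properties file to load (hypothetical helper, not Spark's API)."""
    # An explicit --properties-file always wins; spark-defaults.conf is not consulted.
    if cli_properties_file is not None:
        return cli_properties_file
    # Otherwise fall back to spark-defaults.conf, but only if it exists.
    default_path = os.path.join(conf_dir, "spark-defaults.conf")
    return default_path if os.path.isfile(default_path) else None
```

Code that unconditionally reads `spark-defaults.conf` would skip the first branch, which is where the disagreement with spark-submit would come from.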

@vanzin
Contributor

vanzin commented Jun 26, 2014

Hi @witgo, can you elaborate, in the change summary, why this is needed? Maybe file a bug?

I think you guys don't use spark-submit, which is why you might need this; but spark-submit translates the config file into system properties, so the config is picked up by SparkConf. And it seems like this change is breaking the semantics of how spark-submit works.

(I think it would be nice to stop using system properties like that - it gets really confusing when parts of the code use system properties and others use SparkConf - but that's a separate discussion.)
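A rough sketch of the flow described in this comment, in Python with plain dictionaries standing in for JVM system properties. All function names are hypothetical, the `key value` file format is simplified, and the choice to let already-set properties win over file values is an assumption made for illustration.

```python
from typing import Dict, Iterable


def parse_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simplified 'key value' lines; blank lines and # comments are skipped."""
    props: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def apply_as_system_properties(file_props: Dict[str, str], system_props: Dict[str, str]) -> None:
    """Copy file settings into the simulated system properties.

    Assumption: values that are already set are not overridden.
    """
    for key, value in file_props.items():
        system_props.setdefault(key, value)


def conf_from_system_properties(system_props: Dict[str, str]) -> Dict[str, str]:
    """Mimic a conf object that only picks up keys starting with 'spark.'."""
    return {k: v for k, v in system_props.items() if k.startswith("spark.")}
```

The point of the sketch is the indirection: the conf object never reads the file itself, it only sees whatever the launcher placed into the system properties.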

@witgo
Contributor Author

witgo commented Jun 27, 2014

@vanzin The issue is that the sbin/start-*.sh scripts do not support spark-defaults.conf.

For example, sbin/start-history-server.sh cannot load the spark.history.fs.logDirectory setting from spark-defaults.conf.

@vanzin
Contributor

vanzin commented Jun 27, 2014

Ah, so it's SPARK-2098.

I think it's a nice feature to have (I filed the bug after all), but we can't break the existing semantics. For daemons, the command line parsers could do that (by having a "--properties-file" argument similar to spark-submit).

But if you want to support arbitrary SparkConf instances to read these conf files, it will become trickier, since now you need to propagate that command line information somehow.
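As an illustration of the daemon-side option suggested above, a command line parser could accept the same flag that spark-submit does. This is a sketch using Python's `argparse`; the program name and the function `parse_daemon_args` are hypothetical and do not correspond to any existing Spark daemon code.

```python
import argparse
from typing import List


def parse_daemon_args(argv: List[str]) -> argparse.Namespace:
    """Parse daemon arguments (hypothetical parser, for illustration only)."""
    parser = argparse.ArgumentParser(prog="history-server")  # assumed program name
    parser.add_argument(
        "--properties-file",
        default=None,
        help="Path to a properties file; when omitted, the caller may fall back "
             "to spark-defaults.conf",
    )
    return parser.parse_args(argv)
```

Because the flag is parsed by the daemon's own entry point, the caller knows whether the user supplied a file. That is exactly the information an arbitrary SparkConf instance created elsewhere would not have.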

@witgo
Contributor Author

witgo commented Jun 27, 2014

You're right. I should be able to submit the corresponding code over the weekend.

@witgo
Contributor Author

witgo commented Jun 28, 2014

@vanzin I submitted a new PR, #1256, so I am closing this one.

@witgo witgo closed this Jun 28, 2014
@witgo witgo deleted the defaults-conf branch March 13, 2015 08:58
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
### What changes were proposed in this pull request?

This PR cherry-picks the OPTIMIZE command to Spark 3.2.

### Why are the changes needed?

These changes are needed to support data compaction via SQL for Iceberg.

### Does this PR introduce _any_ user-facing change?

Yes but the changes are isolated and will be supported only by Iceberg.

### How was this patch tested?

This PR comes with tests. More tests are in Iceberg.
wangyum added a commit that referenced this pull request May 26, 2023
…1233)

* Push partial aggregate through range join condition

* Add lower cost expression threshold

* Fix data issue:
```sql
SELECT
    c.session_start_dt,
    COUNT(*) AS clav_session_cats_cnt
FROM
    p_soj_cl_v.clav_session_cats c
    FULL OUTER JOIN p_soj_cl_v.clav_session_ext s ON
        c.guid = s.guid
        AND c.session_skey = s.session_skey
        AND c.site_id = s.site_id
        AND c.session_start_dt = s.session_start_dt
        AND c.cobrand = s.cobrand
WHERE
    c.session_start_dt = '2021-07-14'
GROUP BY
    1
ORDER BY 1, 2
```

* Fix NPE:
```sql
SELECT
    IF(CSS.SLR_ID > 10, 'B2C', 'C2C') AS key,
    COUNT(*)
FROM P_ATEE_T.DW_ACCOUNTS_ALL AS REV
LEFT JOIN PRS_RESTRICTED_V.DNA_CUST_SELLER_SGMNTN_HIST AS CSS
    ON REV.USER_ID = CSS.SLR_ID
    AND REV.ACCT_TRANS_DT = CSS.CUST_SLR_SGMNTN_BEG_DT
WHERE REV.ACCT_TRANS_DT = DATE '2019-12-01'
    AND REV.AUCT_TYPE_CODE NOT IN (12, 15)
GROUP BY 1
```

* fix test
3 participants