forked from apache/spark
Merged
…e view/table does not exists
### What changes were proposed in this pull request?
This PR fixes the undesired behavior that concurrent `CREATE VIEW IF NOT EXISTS` queries could throw `TABLE_OR_VIEW_ALREADY_EXISTS` exceptions. The current implementation did not propagate `IF NOT EXISTS` when detecting that the view/table does not exist.
### Why are the changes needed?
To fix the above issue.
### Does this PR introduce _any_ user-facing change?
Yes, in the sense that it fixes an issue in the concurrent case.
### How was this patch tested?
Without the fix the following test failed, while with this PR it passed. But following the [comment](#44603 (comment)), I removed the test from this PR.
```scala
test("CREATE VIEW IF NOT EXISTS never throws TABLE_OR_VIEW_ALREADY_EXISTS") {
  // Concurrently create a view with the same name, so that some of the queries may all
  // get that the view does not exist and try to create it. But with IF NOT EXISTS, the
  // queries should not fail.
  import ExecutionContext.Implicits.global
  val concurrency = 10
  val tableName = "table_name"
  val viewName = "view_name"
  withTable(tableName) {
    sql(s"CREATE TABLE $tableName (id int) USING parquet")
    withView(viewName) {
      val futures = (0 to concurrency).map { _ =>
        Future {
          Try {
            sql(s"CREATE VIEW IF NOT EXISTS $viewName AS SELECT * FROM $tableName")
          }
        }
      }
      futures.foreach { future =>
        val res = ThreadUtils.awaitResult(future, 5.seconds)
        assert(
          res.isSuccess,
          s"Failed to create view: ${res.failed.get.getMessage}"
        )
      }
    }
  }
}
```
### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44603 from anchovYu/create-view-if-not-exist-fix.

Authored-by: Xinyi Yu <xinyi.yu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ingUnaryExecNode
### What changes were proposed in this pull request?
This is a followup of #37525. When expanding the output partitioning/ordering with aliases, we have a threshold to avoid exponential explosion. However, we failed to apply this threshold in one place. This PR fixes it.
### Why are the changes needed?
To avoid OOM.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
New test.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44614 from cloud-fan/oom.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Added a MONTHNAME function, which returns the three-letter abbreviated month name for a given date, to:
- Scala API
- Python API
- R API
- Spark Connect Scala Client
- Spark Connect Python Client
### Why are the changes needed?
For parity with Snowflake.
### Does this PR introduce _any_ user-facing change?
Yes, new MONTHNAME function.
### How was this patch tested?
With newly added unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44483 from stefankandic/monthname-function.

Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
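The expected semantics (three-letter abbreviated month name) can be sketched with plain `java.time`; this is only an illustration of the output the new Spark function is described as producing, not Spark's Catalyst implementation:

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.util.Locale;

public class MonthNameDemo {
    // Returns the three-letter abbreviated English month name for a date,
    // mirroring the MONTHNAME semantics described in the PR.
    static String monthName(LocalDate date) {
        return date.getMonth().getDisplayName(TextStyle.SHORT, Locale.US);
    }

    public static void main(String[] args) {
        System.out.println(monthName(LocalDate.of(2024, 1, 5))); // Jan
    }
}
```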
…ode when creating column vectors for the missing column
### What changes were proposed in this pull request?
This PR fixes a long-standing bug that `OrcColumnarBatchReader` does not respect the memory mode when creating column vectors for missing columns.
### Why are the changes needed?
To not violate the memory mode requirement.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
New test.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44598 from cloud-fan/orc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
The `Literal.create` function supports `immutable.ArraySeq`, but the `Literal.apply` function does not, so this PR makes the `Literal.apply` function also support `immutable.ArraySeq`.
### Why are the changes needed?
Make `Literal.apply` support `s.c.immutable.ArraySeq` like `Literal.create`.
### Does this PR introduce _any_ user-facing change?
Yes, users can create a `Literal` using the `Literal.apply` function with `s.c.immutable.ArraySeq` as input.
### How was this patch tested?
Add a new test.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44607 from LuciferYang/SPARK-46604.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
… support `s.c.immutable.ArraySeq`
### What changes were proposed in this pull request?
This PR aims to make the `lit`/`typedLit` functions in the connect module support `s.c.immutable.ArraySeq`.
### Why are the changes needed?
`s.c.immutable.ArraySeq` is a commonly used data type in Scala 2.13.
### Does this PR introduce _any_ user-facing change?
Yes, users can use the `lit`/`typedLit` functions with the `s.c.immutable.ArraySeq` type.
### How was this patch tested?
Add new test.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44608 from LuciferYang/connect-lit-imm-ArraySeq.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…ompatibility check
### What changes were proposed in this pull request?
Make the following changes to the XML schema inference:
- Use `TypeCoercion.findTightestCommonType` for the compatibility check.
- Update `DecimalType` to support scale > 0.
- Create a Spark job so that `TypeCoercion` can access the `SQLConf`.
- Added `reduceOption` so that each partition returns just one `StructType` as opposed to a list of `StructType`.
### Why are the changes needed?
To achieve consistency of dataType compatibility checks with other formats.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Existing and new unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44601 from sandip-db/xml-typecoercion.

Authored-by: Sandip Agarwala <131817656+sandip-db@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This follow-up refactors the handling of value tags and `endElement`.
1. As value tags only exist in structured data, their handling will be confined to the `inferObject` method, eliminating the need for processing in `inferField`. This implies that when we encounter non-whitespace characters, we can invoke `inferObject`. For structures with a single primitive field, we simplify them into primitive types during schema inference.
2. We make sure that the entire entry, including the starting tag, value, and ending tag, is fully consumed when parsing completes.
### Why are the changes needed?
This follow-up simplifies the handling of value tags.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44571 from shujingyang-db/cpature-values-follow-up.

Lead-authored-by: Shujing Yang <shujing.yang@databricks.com>
Co-authored-by: Shujing Yang <135740748+shujingyang-db@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…ingFiles options
### What changes were proposed in this pull request?
This PR corrects the handling of corrupt or missing multiline XML files by respecting user-specified options. It also improves the error for malformed records during schema inference and parsing, to stay consistent with other file formats.
### Why are the changes needed?
This PR fixes a bug.
### Does this PR introduce _any_ user-facing change?
Previously, corrupt/missing files weren't ignored based on user-specified options. This PR fixes that issue.
### How was this patch tested?
Unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44163 from shujingyang-db/ignore-missing-files.

Lead-authored-by: Shujing Yang <shujing.yang@databricks.com>
Co-authored-by: Shujing Yang <135740748+shujingyang-db@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
1. Check the testing mode:
- `test_assert_vanilla_mode` in `ReusedPySparkTestCase`
- `test_assert_remote_mode` in `ReusedConnectTestCase`
These use different function names in case a test suite inherits them both.
2. Fix the incorrect testing mode introduced in #44196.
### Why are the changes needed?
Incorrect usage of `PandasOnSparkTestCase` (a subclass of `ReusedPySparkTestCase`) in parity tests caused tests to run with a vanilla Spark session: #44196 #44592
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI, added UT.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44611 from zhengruifeng/py_testing_mode.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…e_dt_interval/make_interval`
### What changes were proposed in this pull request?
The PR aims to refine the docstrings of `convert_timezone`/`make_dt_interval`/`make_interval`.
### Why are the changes needed?
To improve PySpark documentation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- Pass GA.
- Manually test.
### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44610 from panbingkun/SPARK-46606.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…/url_decode`
### What changes were proposed in this pull request?
This PR refines the docstrings of `parse_url`/`url_encode`/`url_decode` and adds some new examples.
### Why are the changes needed?
To improve PySpark documentation.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44604 from LuciferYang/url-functions.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…ython Data Sources
### What changes were proposed in this pull request?
This PR proposes to log the full exception when the lookup of Python Data Sources fails.
### Why are the changes needed?
In my internal testing it logs something like:
```
...
24/01/05 03:49:49 WARN DataSourceManager: Skipping the lookup of Python Data Sources due to the failure: java.lang.StackOverflowError
24/01/05 03:49:49 WARN DataSourceManager: Skipping the lookup of Python Data Sources due to the failure: java.lang.StackOverflowError
24/01/05 03:49:49 WARN DataSourceManager: Skipping the lookup of Python Data Sources due to the failure: java.lang.StackOverflowError
24/01/05 03:49:49 WARN PythonWorkerFactory: Failed to open socket to Python daemon: java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect0(Native Method)
...
```
which is hard to debug. It should show the full error messages so developers can debug.
### Does this PR introduce _any_ user-facing change?
No, the main change has not been released yet.
### How was this patch tested?
Manually.
### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44617 from HyukjinKwon/log-full-warning.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…ons documentation" This reverts commit e4b5977.
…th DateTimeFormatter
### What changes were proposed in this pull request?
This PR proposes to remove `ThreadLocal` by replacing `SimpleDateFormat` with `DateTimeFormatter`.
### Why are the changes needed?
`SimpleDateFormat` is not thread safe, so we wrap it with `ThreadLocal`. `DateTimeFormatter` is thread safe, so we can use it instead. The javadoc of `SimpleDateFormat` also recommends using `DateTimeFormatter`.  In addition, `DateTimeFormatter` has better performance than `SimpleDateFormat`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA tests.
### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44613 from beliefer/sdf-to-dtf.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
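The pattern this change adopts can be sketched with the JDK alone: a single shared, immutable `DateTimeFormatter` used from many threads, with no `ThreadLocal` wrapper. This is a minimal illustration, not the Spark code itself:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FormatterDemo {
    // One shared, immutable, thread-safe formatter -- no ThreadLocal needed.
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String format(LocalDateTime ts) {
        return FMT.format(ts);
    }

    public static void main(String[] args) throws Exception {
        LocalDateTime ts = LocalDateTime.of(2024, 1, 5, 3, 49, 49);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<String>> results = new ArrayList<>();
        // Hammer the shared formatter from several threads at once.
        for (int i = 0; i < 8; i++) {
            results.add(pool.submit(() -> format(ts)));
        }
        for (Future<String> f : results) {
            System.out.println(f.get()); // each prints 2024-01-05 03:49:49
        }
        pool.shutdown();
    }
}
```

With `SimpleDateFormat`, the same shared-instance usage would be a data race; with `DateTimeFormatter`, it is safe by construction.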
… conflicts
### What changes were proposed in this pull request?
This PR prevents the registration of a Python data source if its name conflicts with either a built-in data source or a loadable custom Java/Scala data source.
### Why are the changes needed?
To improve usability. For example, currently, users can register a data source under a name that already exists, but they are then unable to use it:
```python
spark.dataSource.registerPython("json", MyDataSource) # OK
spark.read.format("json").load()
[FOUND_MULTIPLE_DATA_SOURCES] Detected multiple data sources with the name 'json'. Please check the data source isn't simultaneously registered and located in the classpath. SQLSTATE: 42710
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
New unit tests
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44507 from allisonwang-db/spark-46522-check-name.
Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
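The conflict check described above can be sketched generically. This is a hypothetical registry, not Spark's `DataSourceManager`; the class name, the built-in set, and the error message are all illustrative assumptions:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DataSourceRegistry {
    // Stand-in for the set of built-in / classpath-provided source names.
    private final Set<String> builtIn = Set.of("json", "parquet", "csv", "orc");
    private final Map<String, Object> pythonSources = new ConcurrentHashMap<>();

    // Reject registration eagerly instead of failing later at read time
    // with an ambiguous "multiple data sources" error.
    public void registerPython(String name, Object source) {
        if (builtIn.contains(name.toLowerCase())) {
            throw new IllegalArgumentException(
                "Data source '" + name + "' already exists as a built-in source.");
        }
        pythonSources.put(name, source);
    }
}
```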
### What changes were proposed in this pull request?
Fix a logging error in `handleStatusMessage`.
### Why are the changes needed?
When `needMergeOutput` is true, "map/merge" should be logged instead of "map"; the code had the two log messages in the wrong places.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA.
### Was this patch authored or co-authored using generative AI tooling?

Closes #44606 from jiaoqingbo/46601.

Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…cale in Parquet readers
### What changes were proposed in this pull request?
This is a follow-up from #44368 implementing an additional type promotion to decimals with larger precision and scale. As long as the precision increases by at least as much as the scale, the decimal values can be promoted without loss of precision: Decimal(6, 2) -> Decimal(8, 4): 1234.56 -> 1234.5600. The non-vectorized reader (parquet-mr) is already able to do this type promotion; this PR implements it for the vectorized reader.
### Why are the changes needed?
This allows reading multiple Parquet files that contain decimals with different precisions/scales.
### Does this PR introduce _any_ user-facing change?
Yes, the following now succeeds when using the vectorized Parquet reader:
```scala
Seq(20).toDF("a").select($"a".cast(DecimalType(4, 2))).write.parquet(path)
spark.read.schema("a decimal(6, 4)").parquet(path).collect()
```
It failed before with the vectorized reader and succeeded with the non-vectorized reader.
### How was this patch tested?
- Tests added to `ParquetWideningTypeSuite` to cover decimal promotion between decimals with different physical types: INT32, INT64, FIXED_LEN_BYTE_ARRAY.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44513 from johanl-db/SPARK-40876-parquet-type-promotion-decimal-scale.

Authored-by: Johan Lasperas <johan.lasperas@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
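Why the promotion is lossless can be seen with `java.math.BigDecimal`: increasing the scale only appends trailing zeros, and `setScale` without a rounding mode throws rather than silently losing digits. A small sketch (illustrative only, not Spark's `Decimal` class):

```java
import java.math.BigDecimal;

public class DecimalPromotionDemo {
    // Widen a decimal to a larger scale. setScale(int) performs no rounding;
    // it throws ArithmeticException if any digit would be dropped, so an
    // increase in scale is always exact.
    static BigDecimal promote(BigDecimal v, int newScale) {
        return v.setScale(newScale);
    }

    public static void main(String[] args) {
        // Decimal(6, 2) -> Decimal(8, 4): 1234.56 -> 1234.5600
        System.out.println(promote(new BigDecimal("1234.56"), 4).toPlainString());
    }
}
```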
### What changes were proposed in this pull request?
This PR adds a new optimizer rule, `EliminateWindowPartitions`, which removes the window partition if the partition expressions are foldable.

sql1: `select row_number() over(order by a) b from t`
sql2: `select row_number() over(partition by 1 order by a) b from t`

After this PR, the `optimizedPlan` for sql1 and sql2 is the same.
### Why are the changes needed?
A foldable partition is redundant; removing it not only simplifies the plan, but also lets other rules take effect when the partitions are all foldable, such as `LimitPushDownThroughWindow`.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43144 from zml1206/SPARK-45352.

Authored-by: zml1206 <zhuml1206@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Two changes:
- Update the comment on `AccumulatorV2`'s `isZero` to reflect what it actually does.
- Update the variable name in `SQLMetrics` to `defaultValidValue` to reflect this.
### Why are the changes needed?
`AccumulatorV2`'s `isZero` doesn't do what the comment implies - it actually checks if the accumulator hasn't been updated. The comment implies that for a `LongAccumulator`, for example, a value of `0` would cause `isZero` to be `true`. But if we were to `add(0)`, then the value would still be `0` but `isZero` would return `false`.

Changing the name of `zeroValue` to `defaultValidValue` avoids confusion, since `isZero` doesn't use `zeroValue` in `SQLMetric`.

Thanks arvindsaik for pointing this out.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44583 from davintjong-db/sqlmetric-zerovalue-refactor.

Authored-by: Davin Tjong <davin.tjong@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
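The distinction the comment fix is about can be shown with a toy accumulator (a sketch, not Spark's actual `AccumulatorV2`): `isZero` reports whether the accumulator was ever updated, not whether its value equals zero.

```java
public class LongAcc {
    private long sum = 0;
    private long count = 0;

    public void add(long v) { sum += v; count += 1; }
    public long value() { return sum; }

    // True only if no update has occurred. Note that add(0) leaves the
    // value at 0 but still makes this return false.
    public boolean isZero() { return count == 0; }
}
```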
…one in UIUtils
### What changes were proposed in this pull request?
Simplify code and fix the new use of `DateTimeFormatter.withZone` introduced in #44613; we need to use the new object this copy creates.
### Why are the changes needed?
Describing on the mailing list from Janda Martin:
```
DateTimeFormatter is thread-safe and immutable according to JavaDoc so method DateTimeFormatter::withZone returns new instance when zone is changed.

Following code has no effect:
val oldTimezones = (batchTimeFormat.getZone, batchTimeFormatWithMilliseconds.getZone)
if (timezone != null) {
  val zoneId = timezone.toZoneId
  batchTimeFormat.withZone(zoneId)
  batchTimeFormatWithMilliseconds.withZone(zoneId)
}

Suggested fix: introduce local variables for "batchTimeFormat" and "batchTimeFormatWithMilliseconds" and remove "oldTimezones" and "finally" block.
```
### Does this PR introduce _any_ user-facing change?
Unlikely, the path in question is apparently test-only.
### How was this patch tested?
Existing tests.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44619 from srowen/SPARK-46611.2.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
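The bug class is easy to reproduce with the JDK alone: `DateTimeFormatter` is immutable, so `withZone` returns a new formatter and leaves the receiver untouched. A minimal sketch:

```java
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class WithZoneDemo {
    // Correct pattern: keep and use the copy that withZone returns.
    static DateTimeFormatter withZoneApplied(DateTimeFormatter f, ZoneId zone) {
        return f.withZone(zone);
    }

    public static void main(String[] args) {
        DateTimeFormatter base = DateTimeFormatter.ofPattern("yyyy-MM-dd");

        base.withZone(ZoneId.of("UTC"));     // bug pattern: result discarded
        System.out.println(base.getZone());  // still null -- call had no effect

        DateTimeFormatter zoned = withZoneApplied(base, ZoneId.of("UTC"));
        System.out.println(zoned.getZone()); // UTC
    }
}
```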
…captured exception
### What changes were proposed in this pull request?
This PR proposes to address `null` from `Exception.getMessage` in the Py4J captured exception. It returns an empty string instead.
### Why are the changes needed?
So whitelisted exceptions with empty arguments are also covered.
### Does this PR introduce _any_ user-facing change?
Virtually no. It only happens when whitelisted exceptions are created without any argument, so `null` ends up in `message`.
### How was this patch tested?
Manually.
### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44623 from HyukjinKwon/SPARK-46621.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
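The underlying JVM behavior: `Throwable.getMessage()` returns `null` for an exception constructed with no arguments, so any caller that forwards the message must substitute an empty string. A minimal sketch of that null-guard (illustrative helper name, not Spark's code):

```java
public class MessageDemo {
    // Substitute an empty string for a null exception message, mirroring
    // the guard this PR describes.
    static String safeMessage(Throwable t) {
        String m = t.getMessage();
        return m == null ? "" : m;
    }

    public static void main(String[] args) {
        System.out.println(new Exception().getMessage()); // null
        System.out.println(safeMessage(new Exception())); // empty string
    }
}
```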
…ssifyException`
### What changes were proposed in this pull request?
In the PR, I propose to restore `classifyException()` of `JdbcDialect` as it was before commit 14a933b, and to extend `classifyException()` with a `description` parameter in addition to the error class parameters:
```scala
def classifyException(
    e: Throwable,
    errorClass: String,
    messageParameters: Map[String, String],
    description: String): AnalysisException
```
The `description` parameter has the same meaning as `message` in the old, now-deprecated version of `classifyException()`. The old implementation of `classifyException()` has also been restored in the JDBC dialects: MySQL, PostgreSQL and so on.
### Why are the changes needed?
To restore compatibility with existing JDBC dialects.
### Does this PR introduce _any_ user-facing change?
No, this PR restores the behaviour prior to #44358.
### How was this patch tested?
By running the affected test suite:
```
$ build/sbt "core/testOnly *SparkThrowableSuite"
```
and the modified test suites:
```
$ build/sbt "test:testOnly *JDBCV2Suite"
$ build/sbt "test:testOnly *JDBCTableCatalogSuite"
```
### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44449 from MaxGekk/restore-jdbc-classifyException.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?
This PR aims to upgrade joda-time from 2.12.5 to 2.12.6.
### Why are the changes needed?
The new version brings the following fixes:
- JodaOrg/joda-time#733
- JodaOrg/joda-time#755
- JodaOrg/joda-time#731

The full release notes:
- https://github.com/JodaOrg/joda-time/releases/tag/v2.12.6
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions.
### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44626 from LuciferYang/SPARK-46624.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
pull bot pushed a commit that referenced this pull request on Nov 22, 2024
…ead pool ### What changes were proposed in this pull request? This PR aims to use a meaningful class name prefix for REST Submission API thread pool instead of the default value of Jetty QueuedThreadPool, `"qtp"+super.hashCode()`. https://github.com/dekellum/jetty/blob/3dc0120d573816de7d6a83e2d6a97035288bdd4a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L64 ### Why are the changes needed? This is helpful during JVM investigation. **BEFORE (4.0.0-preview2)** ``` $ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh $ jstack 28217 | grep qtp "qtp1925630411-52" #52 daemon prio=5 os_prio=31 cpu=0.07ms elapsed=19.06s tid=0x0000000134906c10 nid=0xde03 runnable [0x0000000314592000] "qtp1925630411-53" #53 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134ac6810 nid=0xc603 runnable [0x000000031479e000] "qtp1925630411-54" #54 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x000000013491ae10 nid=0xdc03 runnable [0x00000003149aa000] "qtp1925630411-55" #55 daemon prio=5 os_prio=31 cpu=0.08ms elapsed=19.06s tid=0x0000000134ac9810 nid=0xc803 runnable [0x0000000314bb6000] "qtp1925630411-56" #56 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134ac9e10 nid=0xda03 runnable [0x0000000314dc2000] "qtp1925630411-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134aca410 nid=0xca03 runnable [0x0000000314fce000] "qtp1925630411-58" #58 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134acaa10 nid=0xcb03 runnable [0x00000003151da000] "qtp1925630411-59" #59 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x0000000134acb010 nid=0xcc03 runnable [0x00000003153e6000] "qtp1925630411-60-acceptor-0108e9815-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.11ms elapsed=19.06s tid=0x00000001317ffa10 nid=0xcd03 runnable [0x00000003155f2000] 
"qtp1925630411-61-acceptor-11d90f2aa-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.10ms elapsed=19.06s tid=0x00000001314ed610 nid=0xcf03 waiting on condition [0x00000003157fe000] ``` **AFTER** ``` $ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh $ jstack 28317 | grep StandaloneRestServer "StandaloneRestServer-52" #52 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284a8e10 nid=0xdb03 runnable [0x000000032cfce000] "StandaloneRestServer-53" #53 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284acc10 nid=0xda03 runnable [0x000000032d1da000] "StandaloneRestServer-54" #54 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284ae610 nid=0xd803 runnable [0x000000032d3e6000] "StandaloneRestServer-55" #55 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284aec10 nid=0xd703 runnable [0x000000032d5f2000] "StandaloneRestServer-56" #56 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284af210 nid=0xc803 runnable [0x000000032d7fe000] "StandaloneRestServer-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284af810 nid=0xc903 runnable [0x000000032da0a000] "StandaloneRestServer-58" #58 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284afe10 nid=0xcb03 runnable [0x000000032dc16000] "StandaloneRestServer-59" #59 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284b0410 nid=0xcc03 runnable [0x000000032de22000] "StandaloneRestServer-60-acceptor-04aefbaa8-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.13ms elapsed=60.05s tid=0x000000015cda1a10 nid=0xcd03 runnable [0x000000032e02e000] "StandaloneRestServer-61-acceptor-148976251-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.12ms elapsed=60.05s tid=0x000000015cd1c810 nid=0xce03 waiting on condition 
[0x000000032e23a000] ``` ### Does this PR introduce _any_ user-facing change? No, the thread names are accessed during the debugging. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48924 from dongjoon-hyun/SPARK-50385. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: panbingkun <panbingkun@apache.org>
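The PR sets the name on Jetty's `QueuedThreadPool`; the same idea expressed with the JDK alone is a `ThreadFactory` that applies a meaningful prefix, so threads show up in `jstack` as `StandaloneRestServer-0` instead of `qtp<hash>-0`. A sketch under that analogy (the factory below is illustrative, not the Jetty or Spark code):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreads {
    // Build a ThreadFactory whose threads carry a descriptive prefix,
    // making them easy to pick out during JVM investigations.
    static ThreadFactory named(String prefix) {
        AtomicInteger n = new AtomicInteger();
        return r -> new Thread(r, prefix + "-" + n.getAndIncrement());
    }

    public static void main(String[] args) {
        Thread t = named("StandaloneRestServer").newThread(() -> {});
        System.out.println(t.getName()); // StandaloneRestServer-0
    }
}
```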
pull bot pushed a commit that referenced this pull request on Jul 21, 2025
…ingBuilder` ### What changes were proposed in this pull request? This PR aims to improve `toString` by `JEP-280` instead of `ToStringBuilder`. In addition, `Scalastyle` and `Checkstyle` rules are added to prevent a future regression. ### Why are the changes needed? Since Java 9, `String Concatenation` has been handled better by default. | ID | DESCRIPTION | | - | - | | JEP-280 | [Indify String Concatenation](https://openjdk.org/jeps/280) | For example, this PR improves `OpenBlocks` like the following. Both Java source code and byte code are simplified a lot by utilizing JEP-280 properly. **CODE CHANGE** ```java - return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE) - .append("appId", appId) - .append("execId", execId) - .append("blockIds", Arrays.toString(blockIds)) - .toString(); + return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" + + Arrays.toString(blockIds) + "]"; ``` **BEFORE** ``` public java.lang.String toString(); Code: 0: new #39 // class org/apache/commons/lang3/builder/ToStringBuilder 3: dup 4: aload_0 5: getstatic #41 // Field org/apache/commons/lang3/builder/ToStringStyle.SHORT_PREFIX_STYLE:Lorg/apache/commons/lang3/builder/ToStringStyle; 8: invokespecial #47 // Method org/apache/commons/lang3/builder/ToStringBuilder."<init>":(Ljava/lang/Object;Lorg/apache/commons/lang3/builder/ToStringStyle;)V 11: ldc #50 // String appId 13: aload_0 14: getfield #7 // Field appId:Ljava/lang/String; 17: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder; 20: ldc #55 // String execId 22: aload_0 23: getfield #13 // Field execId:Ljava/lang/String; 26: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder; 29: ldc #56 // String blockIds 31: aload_0 32: getfield #16 // Field 
blockIds:[Ljava/lang/String; 35: invokestatic #57 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String; 38: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder; 41: invokevirtual #61 // Method org/apache/commons/lang3/builder/ToStringBuilder.toString:()Ljava/lang/String; 44: areturn ``` **AFTER** ``` public java.lang.String toString(); Code: 0: aload_0 1: getfield #7 // Field appId:Ljava/lang/String; 4: aload_0 5: getfield #13 // Field execId:Ljava/lang/String; 8: aload_0 9: getfield #16 // Field blockIds:[Ljava/lang/String; 12: invokestatic #39 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String; 15: invokedynamic #43, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; 20: areturn ``` ### Does this PR introduce _any_ user-facing change? No. This is an `toString` implementation improvement. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#51572 from dongjoon-hyun/SPARK-52880. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
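The post-change style can be sketched as a self-contained class: plain string concatenation, which javac compiles to a single `invokedynamic` (JEP 280) rather than a chain of `ToStringBuilder.append` calls. The constructor and field set below are a simplified stand-in for the real `OpenBlocks` message:

```java
import java.util.Arrays;

public class OpenBlocksDemo {
    final String appId;
    final String execId;
    final String[] blockIds;

    OpenBlocksDemo(String appId, String execId, String[] blockIds) {
        this.appId = appId;
        this.execId = execId;
        this.blockIds = blockIds;
    }

    @Override
    public String toString() {
        // Plain concatenation: compiled via JEP-280 makeConcatWithConstants.
        return "OpenBlocks[appId=" + appId + ",execId=" + execId +
            ",blockIds=" + Arrays.toString(blockIds) + "]";
    }

    public static void main(String[] args) {
        System.out.println(new OpenBlocksDemo("app-1", "exec-1", new String[]{"b1"}));
    }
}
```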
Created by pull[bot]