Generated on 2022-06-17
#5451 | [FEA] Update Spark2 explain code for 22.06 |
#5261 | [FEA] Create MIG with Cgroups on YARN Dataproc scripts |
#5476 | [FEA] extend concat on arrays to all nested types. |
#5113 | [FEA] ANSI mode: Support CAST between types |
#5112 | [FEA] ANSI mode: allow casting between numeric type and timestamp type |
#5323 | [FEA] Enable floating point by default |
#4518 | [FEA] Add support for escaped unicode hex in regular expressions |
#5405 | [FEA] Support map_concat function |
#5547 | [FEA] Regexp: Can we transpile \W and \D to Java's definition so we can support on GPU? |
#5512 | [FEA] Qualification tool, hook up final output and output execs table |
#5507 | [FEA] Support GpuRaiseError |
#5325 | [FEA] Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys |
#3682 | [FEA] Use conventional jar layout in dist jar if there is only one input shim |
#1556 | [FEA] Implement ANSI mode tests for string to timestamp functions |
#4425 | [FEA] Support line anchor $ and string anchors \z and \Z in regexp_replace |
#5176 | [FEA] Qualification tool UI |
#5111 | [FEA] ANSI mode: CAST between ANSI intervals and IntegralType |
#4605 | [FEA] Add regular expression support for new character classes introduced in Java 8 |
#5273 | [FEA] Support map_filter |
#1557 | [FEA] Enable ANSI mode for CAST string to date |
#5446 | [FEA] Remove hasNans check for array_contains |
#5445 | [FEA] Support reading Int as Byte/Short/Date from parquet |
#5449 | [FEA] QualificationTool. Add speedup information to AppSummaryInfo |
#5322 | [FEA] remove hasNans for Pivot |
#4800 | [FEA] Enable support for more regular expressions with \A and \Z |
#5404 | [FEA] Add Shim for the Spark version shipped with Cloudera CDH 7.1.7 |
#5226 | [FEA] Support array_repeat |
#5229 | [FEA] Support arrays_zip |
#5119 | [FEA] Support ANSI mode for SQL functions/operators |
#4532 | [FEA] Re-enable support for \Z in regular expressions |
#3985 | [FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown |
#5034 | [FEA] Implement ExistenceJoin for BroadcastNestedLoopJoin Exec |
#4533 | [FEA] Re-enable support for $ in regular expressions |
#5263 | [FEA] Write out operator mapping from plugin to CSV file for use in qualification tool |
#5095 | [FEA] Support collect_set on struct in reduction context |
#4811 | [FEA] Support ANSI intervals for Cast and Sample |
#2062 | [FEA] support collect aggregations |
#5060 | [FEA] Support Count on Struct of [ Struct of [String, Map(String,String)], Array(String), Map(String,String) ] |
#4528 | [FEA] Add support for regular expressions containing \s and \S |
#4557 | [FEA] Add support for regexp_replace with back-references |
#5148 | Add the MULTI-THREADED reading support for avro |
#5304 | [FEA] Optimize remote Avro reading for a PartitionFile |
#5257 | [FEA][Audit] - [SPARK-34863][SQL] Support complex types for Parquet vectorized reader |
#5149 | Add the COALESCING reading support for avro |
#5769 | [BUG] arithmetic ops tests failing on Spark 3.3.0 |
#5785 | [BUG] Tests module build failed in OrcEncryptionSuite for 321cdh |
#5765 | [BUG] Container decimal overflow when casting float/double to decimal |
#5246 | Verify Parquet columnar encryption is handled safely |
#5770 | [BUG] test_buckets failed |
#5733 | [BUG] Integration test test_orc_write_encryption_fallback fail |
#5719 | [BUG] test_cast_float_to_timestamp_ansi_for_nan_inf failed in spark330 |
#5739 | [BUG] Spark 3.3 build failure - QueryExecutionErrors package scope changed |
#5670 | [BUG] Job failed when parsing "java.lang.reflect.InvocationTargetException: org.apache.spark.sql.catalyst.parser.ParseException:" |
#4860 | [BUG] GPU writing ORC columns statistics |
#5717 | [BUG] div_by_zero test is failing on Spark 330 on 22.06 |
#5632 | [BUG] udf_cudf tests failed: EOFException DataInputStream.readInt(DataInputStream.java:392) |
#5672 | [BUG] Read exception occurs when clipped schema is empty |
#5694 | [BUG] Inconsistent behavior with Spark when reading a non-existent column from Parquet |
#5562 | [BUG] read ORC file with various file schemas |
#5654 | [BUG] Transpiler produces regex pattern that cuDF cannot compile |
#5655 | [BUG] Regular expression pattern [&&1] produces incorrect results on GPU |
#4862 | [FEA] Add support for regular expressions containing octal digits inside character classes, e.g. [\0177] |
#5615 | [BUG] GpuBatchScanExec only reports output row metrics |
#4505 | [BUG] RegExp parse fails to parse character ranges containing escaped characters |
#4865 | [BUG] Add support for regular expressions containing hexadecimal digits inside character classes, e.g. [\x7f] |
#5513 | [BUG] NoClassDefFoundError with caller classloader off in GpuShuffleCoalesceIterator in local-cluster |
#5530 | [BUG] regexp: \d, \w inconsistencies with non-latin unicode input |
#5594 | [BUG] 3.3 test_div_overflow_exception_when_ansi test failures |
#5596 | [BUG] Shim service provider failure when using jar built with -DallowConventionalDistJar |
#5582 | [BUG] Nightly CI failed with : 'dist/target/rapids-4-spark_2.12-22.06.0-SNAPSHOT.jar' not exists |
#5577 | [BUG] test_cast_neg_to_decimal_err failing in databricks |
#5557 | [BUG] dist jar does not contain reduced pom, creates an unnecessary jar |
#5474 | [BUG] Spark 3.2.1 arithmetic_ops_test failures |
#5497 | [BUG] 3 tests in IntervalSuite are failing on 330 |
#5544 | [BUG] GpuCreateMap needs to set hasSideEffects in some cases |
#5469 | [BUG] NPE during serialization for shuffle in array-aggregation-with-limit query |
#5496 | [BUG] avg literals bools is failing on 330 |
#5511 | [BUG] orc_test failures on 321cdh |
#5439 | [BUG] Encrypted Parquet writes are being replaced with a GPU unencrypted write |
#5108 | [BUG] GpuArrayExists encounters a CudfException on an input partition consisting of just empty lists |
#5492 | [BUG] com.nvidia.spark.rapids.RegexCharacterClass cannot be cast to com.nvidia.spark.rapids.RegexCharacterClassComponent |
#4818 | [BUG] ASYNC: the spill store needs to synchronize on spills against the allocating stream |
#5481 | [BUG] test_parquet_check_schema_compatibility failed in databricks runtimes |
#5482 | [BUG] test_cast_string_date_invalid_ansi_before_320 failed in databricks runtime |
#5457 | [BUG] 330 AnsiCastOpSuite Unit tests failed 22 cases |
#5098 | [BUG] Harden calls to RapidsBuffer.free |
#5464 | [BUG] Query failure with java.lang.AssertionError when using partitioned Iceberg tables |
#4746 | [FEA] Add support for regular expressions containing octal digits in range \200 to 377 |
#5200 | [BUG] More detailed logs to show which parquet file and which data type has mismatch. |
#4866 | [BUG] Add support for regular expressions containing hexadecimal digits greater than 0x7f |
#5140 | [BUG] NPE on array_max of transformed empty array |
#5444 | [BUG] build failed on Databricks |
#5357 | [BUG] Spark 3.3 cache_test test_passing_gpuExpr_as_Expr failures |
#5429 | [BUG] test_cache_expand_exec fails on Spark 3.3 |
#5312 | [BUG] The coalesced AVRO file may contain different sync markers if the sync marker varies in the avro files being coalesced. |
#5415 | [BUG] Regular Expressions: matching the dot . doesn't fully exclude all unicode line terminator characters |
#5413 | [BUG] Databricks 321 build fails - not found: type OrcShims320untilAllBase |
#5286 | [BUG] assert failed test_struct_self_join and test_computation_in_grpby_columns |
#5351 | [BUG] Build fails for Spark 3.3 due to extra arguments to mapKeyNotExistError |
#5260 | [BUG] map_test failures on Spark 3.3.0 |
#5189 | [BUG] Reading from iceberg table will fail. |
#5130 | [BUG] string_split does not respect spark.rapids.sql.regexp.enabled config |
#5267 | [BUG] markdown link check failed issue |
#5295 | [BUG] Build fails for Spark 3.3 due to extra arguments to mapKeyNotExistError |
#5264 | [BUG] Delete unused generic type. |
#5275 | [BUG] rlike cannot run on GPU because invalid or unsupported escape character ']' near index 14 |
#5278 | [BUG] build 311cdh failed: unable to find valid certification path to requested target |
#5211 | [BUG] csv_test:test_basic_csv_read FAILED |
#5244 | [BUG] Spark 3.3 integration test failures logic_test.py::test_logical_with_side_effect |
#5041 | [BUG] Implement hasSideEffects for all expressions that have side-effects |
#4980 | [BUG] window_function_test FAILED on PASCAL GPU |
#5240 | [BUG] EGX integration test_collect_list_reductions failures |
#5242 | [BUG] Executor falls back to cudaMalloc if the pool can't be initialized |
#5215 | [BUG] Coalescing reading is not working for v2 parquet/orc datasource |
#5104 | [BUG] Unconditional warning in UDF Plugin "The compiler is disabled by default" |
#5099 | [BUG] Profiling tool should not sum gettingResultTime |
#5182 | [BUG] Spark 3.3 integration tests arithmetic_ops_test.py::test_div_overflow_exception_when_ansi failures |
#5147 | [BUG] object LZ4Compressor is not a member of package ai.rapids.cudf.nvcomp |
#4695 | [BUG] Segfault with UCX and ASYNC allocator |
#5138 | [BUG] xgboost job failed if we enable PCBS |
#5135 | [BUG] GpuRegExExtract is not aligned with RegExExtract |
#5084 | [BUG] GpuWriteTaskStatsTracker complains for all writes in local mode |
#5123 | [BUG] Compile error for Spark330 because of VectorizedColumnReader constructor added a new parameter. |
#5133 | [BUG] Compile error for Spark330 because of Spark changed the method signature: QueryExecutionErrors.mapKeyNotExistError |
#4959 | [BUG] Test case in OpcodeSuite failed on Spark 3.3.0 |
#5861 | [Doc]Add Spark3.3 support in doc for 22.06 branch[skip ci] |
#5851 | Update 22.06 changelog to include new commits [skip ci] |
#5848 | Update spark330shim to use released lib |
#5840 | [DOC] Updated RapidsConf to reflect the default value of spark.rapids.sql.improvedFloatOps.enabled [skip ci] |
#5816 | Update 22.06.0 changelog to latest [skip ci] |
#5795 | Update FAQ to include local jar deployment via extraClassPath [skip ci] |
#5802 | Update spark-rapids-jni.version to release 22.06.0 |
#5798 | Fall back to CPU for RoundCeil and RoundFloor expressions |
#5791 | Remove ORC encryption test from 321cdh |
#5766 | Fix the overflow of container type when casting floats to decimal |
#5786 | Fix rounds over decimal in Spark 330+ |
#5761 | Throw an exception when attempting to read columnar encrypted Parquet files on the GPU |
#5784 | Update the error string for test_cast_neg_to_decimal_err on 330 |
#5781 | Correct the exception string for test_mod_pmod_by_zero on Spark 3.3.0 |
#5764 | Add test for encrypted ORC write |
#5760 | Enable avrotest in nightly tests [skip ci] |
#5746 | Init 22.06 changelog [skip ci] |
#5716 | Disable Avro support when spark-avro classes not loadable by Shim classloader |
#5737 | Remove the ORC encryption tests |
#5753 | [DOC] Update regexp compatibility for 22.06 [skip ci] |
#5738 | Update Spark2 explain code for 22.06 |
#5731 | Throw SparkDateTimeException for InvalidInput while casting in ANSI mode |
#5742 | Spark-3.3 build fix - Move QueryExecutionErrors to sql package |
#5641 | [Doc]Update 22.06 documentation[skip ci] |
#5701 | Update docs for qualification tool to reflect recommendations and UI [skip ci] |
#5283 | Add documentation for MIG on Dataproc [skip ci] |
#5728 | Qualification tool: Add test for stage failures |
#5681 | Branch 22.06 nvcomp notice binary [skip ci] |
#5713 | Fix GpuCast losing the timezoneId during canonicalization |
#5715 | Update GPU ORC statistics write support |
#5718 | Update the error message for div_by_zero test |
#5604 | ORC encrypted write should fallback to CPU |
#5674 | Fix reading ORC/PARQUET over empty clipped schema |
#5676 | Fix ORC reading over different schemas |
#5693 | Temporarily allow 3.3.1 for 3.3.0 shims. |
#5591 | Enable regular expressions by default |
#5664 | Fix edge case where one side of regexp choice ends in duplicate string anchors |
#5542 | Support arrays of arrays and structs for concat on arrays |
#5677 | Qualification tool Enable UI by default |
#5575 | Regexp: Transpile \D, \W to Java's definitions |
#5668 | Add user as CI owner [skip ci] |
#5627 | Install locales and generate en_US.UTF-8 |
#5514 | ANSI mode: allow casting between numeric type and timestamp type |
#5600 | Qualification tool UI cosmetics and CSV output changes |
#5658 | Fallback to CPU when && found in character class |
#5644 | Qualification tool: Enable UDF reporting in potential problems |
#5645 | Add support for octal digits in character classes |
#5643 | Fix missing GpuBatchScanExec metrics in SQL UI |
#5441 | Enable optional float confs and update docs mentioning them |
#5532 | Support hex digits in character classes and escaped characters in character class ranges |
#5625 | [DOC]update links for 2206 release[skip ci] |
#5623 | Handle duplicates in negated character classes |
#5533 | Support GpuMapConcat |
#5614 | Move HostConcatResultUtil out of unshimmed classes |
#5612 | Qualification tool: update SQL Df value used and look at jobs in SQL |
#5526 | Fix whitespace \s and \S tests |
#5541 | Regexp: Transpile \d, \w to Java's definitions |
#5598 | Qualification tool: Update RunningQualificationApp tests |
#5601 | Update test_div_overflow_exception_when_ansi test for Spark-3.3 |
#5588 | Update Databricks build scripts |
#5599 | Move ShimServiceProvider file re-init/truncate |
#5531 | Filter rows with null keys when coalescing due to reaching cuDF row limits |
#5550 | Qualification tool hook up final output based on per exec analysis |
#5540 | Support RaiseError |
#5505 | Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys |
#5583 | Disable spark snapshot shims build for pre-merge |
#5584 | Enable automerge from branch-22.06 to 22.08 [skip ci] |
#5581 | nightly CI to install and deploy cuda11 classifier dist jar [skip ci] |
#5579 | Update test_cast_neg_to_decimal_err to work with Databricks 10.4 where exception is different |
#5578 | Fix unfiltered partitions being used to create GpuBatchScanExec RDD |
#5560 | Minor: Clean up the tests of concat_list |
#5528 | Enable build and test with JDK11 |
#5571 | Update array_min and array_max to use new cudf operations |
#5558 | Fix target file for update from extra-resources in dist module |
#5556 | Move FsInput creation into AvroFileReader |
#5483 | Don't distinguish between types of ArithmeticException for Spark 3.2.x |
#5539 | Fix IntervalSuite cases failure |
#5421 | Support multi-threaded reading for avro |
#5538 | Add tests for string to timestamp functions in ANSI mode |
#5546 | Set hasSideEffects correctly for GpuCreateMap |
#5529 | Fix failing bool agg test in Spark 3.3 |
#5500 | Fallback parquet reading with merged schema and native footer reader |
#5534 | MVN_OPT to last, as it is empty in most cases |
#5523 | Enable forcePositionEvolution for 321cdh |
#5501 | Build against specified spark-rapids-jni snapshot jar [skip ci] |
#5489 | Fallback to the CPU if Parquet encryption keys are set |
#5527 | Fix bug with character class immediately following a string anchor |
#5506 | Fix ClassCastException in regular expression transpiler |
#5519 | Address feedback in "string anchors regexp replace" PR |
#5520 | [DOC] Remove Spark from our naming of Tools [skip ci] |
#5491 | Enables $, \z, and \Z in REGEXP_REPLACE on the GPU |
#5470 | Qualification tool support UI code generation |
#5353 | Supports casting between ANSI interval types and integral types |
#5487 | Add limited support for captured vars and athrow |
#5499 | [DOC]update doc for emr6.6[skip ci] |
#5485 | Add cudaStreamSynchronize when a new device buffer is added to the spill framework |
#5477 | Add support for \h, \H, \v, \V, and \R character classes |
#5490 | Qualification tool: Update speedup factor for few operators |
#5494 | Fix Databricks shim to support ANSI mode when casting from string to date |
#5498 | Enable 330 unit tests for nightly |
#5504 | Fix printing of split information when dumping debug data |
#5486 | Fix regression in AnsiCastOpSuite with Spark 3.3.0 |
#5436 | Support map_filter operator |
#5471 | Add implicit safeFree for RapidsBuffer |
#5465 | Fix query planning issue when Iceberg is used with DPP and AQE |
#5459 | Add test cases for casting string to date in ANSI mode |
#5443 | Add support for regular expressions containing octal digits greater than \200 |
#5468 | Qualification tool: Add support for join, pandas, aggregate execs |
#5473 | Remove hasNan check over array_contains |
#5434 | Check schema compatibility when building parquet readers |
#5442 | Add support for regular expressions containing hexadecimal digits greater than 0x7f |
#5466 | [Doc] Change the picture of the query plan to text format. [skip ci] |
#5310 | Use C++ to parse and filter parquet footers. |
#5454 | QualificationTool. Add speedup information to AppSummaryInfo |
#5455 | Moved ShimCurrentBatchIterator so it's visible to db312 and db321 |
#5354 | Plugin should throw same arithmetic exceptions as Spark part1 |
#5440 | Qualification tool support for read and write execs and more, add mapping stage times to sql execs |
#5431 | [DOC] Update the ubuntu repo key [skip ci] |
#5425 | Handle readBatch changes for Spark 3.3.0 |
#5438 | Add tests for all-null data for array_max |
#5428 | Make the sync marker uniform for the Avro coalescing reader |
#5432 | Test case insensitive reading for Parquet and CSV |
#5433 | [DOC] Removed mention of 30x from shims.md [skip ci] |
#5424 | Exclude all unicode line terminator characters from matching dot |
#5426 | Qualification tool: Parsing Execs to get the ExecInfo #2 |
#5427 | Workaround to fix cuda repo key rotation in ubuntu images [skip ci] |
#5419 | Append my id to blossom-ci whitelist [skip ci] |
#5422 | xfail tests for spark 3.3.0 due to changes in readBatch |
#5420 | Qualification tool: Parsing Execs to get the ExecInfo #1 |
#5418 | Add GpuEqualToNoNans and update GpuPivotFirst to use to handle PivotFirst with NaN support enabled on GPU |
#5306 | Support coalescing reading for avro |
#5410 | Update docs for removal of 311cdh |
#5414 | Add 320+-noncdh to Databricks to fix 321db build |
#5349 | Enable some repetitions for \A and \Z |
#5346 | ADD 321cdh shim to rapids and remove 311cdh shim |
#5408 | [DOC] Add rebase mode notes for databricks doc [skip ci] |
#5348 | Qualification tool: Skip GPU event logs |
#5400 | Restore test_computation_in_grpby_columns and test_struct_self_join |
#5399 | Update New Issue template to recommend a Discussion or Question [skip ci] |
#5293 | Support array_repeat |
#5359 | Qualification tool base plan parsing infrastructure |
#5360 | Revert "skip failing tests for Spark 3.3.0 (#5313)" |
#5326 | Update GCP doc and scripts [skip ci] |
#5352 | Fix spark330 build due to mapKeyNotExistError changed |
#5317 | Support arrays_zip |
#5316 | Support ANSI mode for ToUnixTimestamp, UnixTimestamp, GetTimestamp, DateAddInterval |
#5319 | Re-enable support for \Z in regular expressions on the GPU |
#5315 | Simplify conditional catalyst expressions generated by udf-compiler |
#5301 | Support existence join type for broadcast nested loop join |
#5313 | skip failing tests for Spark 3.3.0 |
#5311 | Add information about the discussion board to the README and FAQ [skip ci] |
#5308 | Remove unused ColumnViewUtil |
#5289 | Re-enable dollar ($) line anchor in regular expressions in find mode |
#5274 | Perform explicit UnsafeRow projection in ColumnarToRow transition |
#5297 | GpuStringSplit now honors the spark.rapids.sql.regexp.enabled configuration option |
#5307 | Remove compatibility guide reference to issue #4060 |
#5298 | Qualification tool: Operator mapping from plugin to CSV file |
#5266 | Update Outdated GCP getting started guide[skip ci] |
#5300 | Fix DIST_JAR PATH in coverage-report [skip ci] |
#5290 | Add documentation about reporting security issues [skip ci] |
#5277 | Support multiple datatypes in TypeSig.withPsNote() |
#5296 | Fix spark330 build due to removal of isElementAt parameter from mapKeyNotExistError |
#5291 | fix dead links in shims.md [skip ci] |
#5276 | fix markdown check issue[skip ci] |
#5270 | Include dependency of common jar in tools jar |
#5265 | Remove unused generic types |
#5288 | Temporarily xfail tests to restore premerge builds |
#5287 | Fix nightly scripts to deploy w/ classifier correctly [skip ci] |
#5134 | Support division on ANSI interval types |
#5279 | Add test case for ANSI pmod and ANSI Remainder |
#5284 | Enable support for escaping the right square bracket |
#5280 | [BUG] Fix incorrect plugin nightly deployment and release [skip ci] |
#5249 | Use a bundled spark-rapids-jni dependency instead of external cudf dependency |
#5268 | [BUG] When ASYNC is enabled GDS needs to handle cudaMalloced bounce buffers |
#5230 | Update csv float tests to reflect changes in precision in cuDF |
#5001 | Add fuzzing test for JSON reader |
#5155 | Support casting between day-time interval and string |
#5247 | Fix test failure caused by change in Spark 3.3 exception |
#5254 | Fix the integration test of collect_list_reduction |
#5243 | Throw again after logging that RMM could not initialize |
#5105 | Support multiplication on ANSI interval types |
#5171 | Fix the bug where COALESCING reading does not work for v2 parquet/orc datasource |
#5157 | Update the log warning of UDF compiler |
#5213 | Support sample on ANSI interval types |
#5218 | XFAIL tests that are failing due to issue 5211 |
#5202 | Profiling tool: Remove gettingResultTime from stages & jobs aggregation |
#5201 | Fix merge conflict from branch-22.04 |
#5195 | Refactor Spark33XShims to avoid code duplication |
#5185 | Fix test failure with Spark 3.3 by looking for less specific error message |
#4992 | Support Collect-like Reduction Aggregations |
#5193 | Fix auto merge conflict 5192 [skip ci] |
#5020 | Support arithmetic operators on ANSI interval types |
#5174 | Fix auto merge conflict 5173 [skip ci] |
#5168 | Fix auto merge conflict 5166 |
#5151 | Remove NvcompLZ4CompressionCodec single-buffer APIs |
#5132 | Add count support for all types |
#5141 | Upgrade to UCX 1.12.1 for 22.06 |
#5143 | Fix merge conflict with branch-22.04 |
#5144 | Adapt to storage-partitioned join additions in SPARK-37377 |
#5139 | Make mvn-verify check name more descriptive [skip ci] |
#5136 | Fix GpuRegExExtract inconsistency with Spark |
#5107 | Fix GpuFileFormatDataWriter failing to stat file after commit |
#5124 | Fix ShimVectorizedColumnReader construction for recent Spark 3.3.0 changes |
#5047 | Change Cast.toString as "cast" instead of "ansi_cast" under ANSI mode |
#5089 | Enable regular expressions containing \s and \S |
#5087 | Add support for regexp_replace with back-references |
#5110 | Appending my id (mattahrens) to the blossom-ci whitelist [skip ci] |
#5090 | Add nvtx ranges around pre, agg, and post steps in hash aggregate |
#5092 | Remove single-buffer compression codec APIs |
#5093 | Fix leak when GDS buffer store closes |
#5067 | Premerge databricks CI autotrigger [skip ci] |
#5083 | Remove EMRShimVersion |
#5076 | Unshim cache serializer and other 311+-all code |
#5074 | Make ASYNC the default allocator for 22.06 |
#5073 | Add in nvtx ranges for parquet filterBlocks |
#5077 | Change Scala style continuation indentation to be 2 spaces to match guide [skip ci] |
#5070 | Fix merge from 22.04 to 22.06 |
#5046 | Init 22.06.0-SNAPSHOT |
#5059 | Fix merge from 22.04 to 22.06 |
#5036 | Unshim many expressions |
#4993 | PCBS and Parquet support ANSI year month interval type |
#5031 | Unshim many SparkShim interfaces |
#5027 | Fix merge of branch-22.04 to branch-22.06 |
#5022 | Unshim many Pandas execs |
#5013 | Unshim GpuRowBasedScalaUDF |
#5012 | Unshim GpuOrcScan and GpuParquetScan |
#5010 | Unshim GpuSumDefaults |
#5007 | Remove schema utils, case class copying, file partition, and legacy statistical aggregate shims |
#4999 | Enable automerge from branch-22.04 to branch-22.06 [skip ci] |
#4734 | [FEA] Support approx_percentile in reduction context |
#1922 | [FEA] Support ORC forced positional evolution |
#123 | [FEA] add in support for dayfirst formats in the CSV parser |
#4863 | [FEA] Improve timestamp support in JSON and CSV readers |
#4935 | [FEA] Support reading Avro: primitive types |
#4915 | [FEA] Drop support for Spark 3.0.1, 3.0.2, 3.0.3, Databricks 7.3 ML LTS |
#4815 | [FEA] Support org.apache.spark.sql.catalyst.expressions.ArrayExists |
#3245 | [FEA] GpuGetMapValue should support all valid value data types and non-complex key types |
#4914 | [FEA] Support for Databricks 10.4 ML LTS |
#4945 | [FEA] Support filter and comparisons on ANSI day time interval type |
#4004 | [FEA] Add support for percent_rank |
#1111 | [FEA] support spark.sql.legacy.timeParserPolicy when parsing CSV files |
#4849 | [FEA] Support parsing dates in JSON reader |
#4789 | [FEA] Add Spark 3.1.4 shim |
#4646 | [FEA] Make JSON parsing of NaN and Infinity values fully compatible with Spark |
#4824 | [FEA] Support reading decimals from JSON and CSV |
#4814 | [FEA] Support element_at with non-literal index |
#4816 | [FEA] Support org.apache.spark.sql.catalyst.expressions.GetArrayStructFields |
#3542 | [FEA] Support str_to_map function |
#4721 | [FEA] Support regular expression delimiters for str_to_map |
#4791 | Update Spark 3.1.3 to be released |
#4712 | [FEA] Allow to partition on Decimal 128 when running on the GPU |
#4762 | [FEA] Improve support for reading JSON integer types |
#4696 | [FEA] Support casting map to string |
#1572 | [FEA] Add in decimal support for pmod, remainder and divide |
#4763 | [FEA] Improve support for reading JSON boolean types |
#4003 | [FEA] Add regular expression support to GPU implementation of StringSplit |
#4626 | [FEA] cannot run on GPU because unsupported data types in 'partitionSpec' |
#33 | [FEA] hypot SQL function |
#4515 | [FEA] Set RMM async allocator as default |
#3026 | [FEA] [Audit]: Set the list of read columns in the task configuration to reduce reading of ORC data |
#4895 | Add support for structs in GpuScalarSubquery |
#4393 | [BUG] Columnar to Columnar transfers are very slow |
#589 | [FEA] Support ExistenceJoin |
#4784 | [FEA] Improve copying decimal data from CPU columnar data |
#4685 | [FEA] Avoid regexp cost in string_split for escaped characters |
#4777 | Remove input upcast in GpuExtractChunk32 |
#4722 | Optimize DECIMAL128 average aggregations |
#4645 | [FEA] Investigate ASYNC allocator performance with additional queries |
#4539 | [FEA] semaphore optimization in shuffled hash join |
#2441 | [FEA] Use AST for filter in join APIs |
#5233 | [BUG] rapids-tools v22.04.0 release jar reports maven dependency issue : rapids-4-spark-common_2.12:jar:22.04.0 NOT FOUND |
#5183 | [BUG] UCX EGX integration test array_test.py::test_array_exists failures |
#5180 | [BUG] create_map failed with java.lang.IllegalStateException: This is not supported yet |
#5181 | [BUG] Dataproc tests failing when trying to detect for accelerated row conversions |
#5154 | [BUG] build failed in databricks 10.4 runtime (updated recently) |
#5159 | [BUG] Approx percentile query fails with UnsupportedOperationException |
#5164 | [BUG] Databricks 9.1ML failed with "java.lang.NoSuchMethodError: org.apache.spark.sql.execution.metric.SQLMetrics$.createSizeMetric" |
#5125 | [BUG] GpuCast.hasSideEffects does not check if child expression has side effects |
#5091 | [BUG] Profiling tool fails to process custom task accumulators of type CollectionAccumulator |
#5050 | [BUG] Release build of v22.04.0 FAILED on "Execution attach-javadoc failed: NullPointerException" with maven option '-P source-javadoc' |
#5035 | [BUG] Different CSV parsing behavior between 22.04 and 22.02 |
#5065 | [BUG] spark330+ build error due to SPARK-37463 |
#5019 | [BUG] udf compiler failed to translate UDF in spark-shell |
#5048 | [BUG] OOM for q18 of TPC-DS benchmark testing on Spark2a |
#5038 | [BUG] When spark.rapids.sql.regexp.enabled is on in 22.04 snapshot jars, Reading a Delta table in Databricks may cause driver error |
#5023 | [BUG] When+sequence could trigger "Illegal sequence boundaries" error |
#5021 | [BUG] test_cache_reverse_order failed |
#5003 | [BUG] Cloudera 3.1.1 tests fail due to ClouderaShimVersion |
#4960 | [BUG] Spark 3.3 IT cache_test:test_passing_gpuExpr_as_Expr failure |
#4913 | [BUG] Fall back to the CPU if we see a scale on Ceil or Floor |
#4806 | [BUG] When running xgboost training, if PCBS is enabled, it fails with java.lang.AssertionError |
#4542 | [BUG] test_write_round_trip failed Maximum pool size exceeded |
#4911 | [BUG][Audit] [SPARK-38314] - Fail to read parquet files after writing the hidden file metadata |
#4936 | [BUG] databricks nightly window_function_test failures |
#4931 | [BUG] Spark 3.3 IT test cache_test.py::test_passing_gpuExpr_as_Expr fails with IllegalArgumentException |
#4710 | [BUG] cudaErrorIllegalAddress for q95 (3TB) on GCP with ASYNC allocator |
#4918 | [BUG] databricks nightly build failed |
#4826 | [BUG] cache_test failures when testing with 128-bit decimal |
#4855 | [BUG] Shim tests in sql-plugin module are not running |
#4487 | [BUG] regexp_find hangs with some patterns |
#4486 | [BUG] Regular expressions with hex digits not working as expected |
#4879 | [BUG] [SPARK-38237][SQL] ClusteredDistribution clustering keys break build with wrong arguments |
#4883 | [BUG] row-based_udf_test.py::test_hive_empty_* fail nightly tests |
#4876 | [BUG] Nightly build failed on Databricks with "pip: No such file or directory" |
#4739 | [BUG] Plugin will crash with query > 100 columns on pascal GPU |
#4840 | [BUG] test_dpp_via_aggregate_subquery_aqe_off failed with table already exists |
#4841 | [BUG] test_compress_write_round_trip failed on Spark 3.3 |
#4668 | [FEA][Audit] - [SPARK-37750][SQL] ANSI mode: optionally return null result if element not exists in array/map |
#3971 | [BUG] udf-examples dependencies are incorrect |
#4022 | [BUG] Ensure shims.v2.ParquetCachedBatchSerializer and similar classes are at most package-private |
#4526 | [BUG] Short circuit AND/OR in ANSI mode |
#4787 | [BUG] Dataproc notebook IT test failure - NoSuchMethodError: org.apache.spark.network.util.ByteUnit.toBytes |
#4704 | [BUG] Update the premerge and nightly tests after moving the UDF example to external repository |
#4795 | [BUG] Read ORC does not ignoreCorruptFiles |
#4802 | [BUG] GPU CSV read does not honor ignoreCorruptFiles or ignoreMissingFiles |
#4803 | [BUG] GPU JSON read does not honor ignoreCorruptFiles or ignoreMissingFiles |
#1986 | [BUG] CSV reading null inconsistent between spark.rapids.sql.format.csv.enabled=true&false |
#126 | [BUG] CSV parsing large number values overflow |
#4759 | [BUG] Profiling tool can miss datasources when they are GPU reads |
#4798 | [BUG] Integration test builds failing with worker_id not found |
#4727 | [BUG] Read Parquet does not ignoreCorruptFiles |
#4744 | [BUG] test_groupby_std_variance_partial_replace_fallback failed |
#4761 | [BUG] test_simple_partitioned_read failed on Spark 3.3 |
#2071 | [BUG] parsing invalid boolean CSV values return true instead of null |
#4749 | [BUG] test_write_empty_parquet_round_trip failed |
#4730 | [BUG] python UDF tests are leaking |
#4290 | [BUG] Investigate q32 and q67 for decimals potential regression |
#4409 | [BUG] Possible race condition in regular expression support for octal digits |
#4728 | [BUG] test_mixed_compress_read orc_test.py failures |
#4736 | [BUG] buildall --profile=321 fails on missing spark301 rapids-4-spark-sql dependency |
#4702 | [BUG] cache_test.py failed w/ cache.serializer in spark 3.3.0 |
#4031 | [BUG] Spark 3.3.0 test failure: NoSuchMethodError org.apache.orc.TypeDescription.getAttributeValue |
#4664 | [BUG] MortgageAdaptiveSparkSuite failed with duplicate buffer exception |
#4564 | [BUG] map_test ansi failed in spark330 |
#119 | [BUG] LIKE does not work if null chars are in the string |
#124 | [BUG] CSV/JSON Parsing some float values results in overflow |
#4045 | [BUG] q93 failed in this week's NDS runs |
#4488 | [BUG] isCastingStringToNegDecimalScaleSupported seems set wrong for some Spark versions |
#5251 | Update 22.04 changelog to latest [skip ci] |
#5232 | Fix issue in GpuArrayExists where a parent view outlived the child |
#5239 | Fix tools depending on the common jar |
#5205 | Update 22.04 changelog to latest [skip ci] |
#5190 | Fix column->row conversion GPU check: |
#5184 | Fix CPU fallback for Map lookup |
#5191 | Update version-def to use released cudfjni 22.04.0 [skip ci] |
#5167 | Update cudfjni version to released 22.04.0 |
#5169 | Terminate test earlier if pytest ENV issue [skip ci] |
#5160 | Fix approximate percentile reduction UnsupportedOperationException |
#5165 | Update Databricks 10.4 for changes to the QueryStageExec and ClusteredDistribution |
#4997 | Update docs for the 22.04 release [skip ci] |
#5146 | Support env var INTEGRATION_TEST_VERSION to override shim version |
#5103 | Init 22.04 changelog [skip ci] |
#5122 | Disable GPU accelerated row-column transpose for Pascal GPUs: |
#5127 | GpuCast.hasSideEffects now checks to see if the child expression has side-effects |
#5118 | On task failure catch some CUDA exceptions and kill executor |
#5069 | Update for the public release [skip ci] |
#5097 | Implement hasSideEffects for GpuGetArrayItem, GpuElementAt, GpuGetMapValue, GpuUnaryMinus, and GpuAbs |
#5079 | Disable spark snapshot shims pre-merge build in 22.04 |
#5094 | Fix profiling tool reading collectionAccumulator |
#5078 | Disable JSON and CSV floating-point reads by default |
#4961 | Support approx_percentile in reduction context |
#5062 | Update Spark 2.x explain API with changes in 22.04 |
#5066 | Add getOrcSchemaString for OrcShims |
#5030 | Fix regression from 21.12 where udfs defined in repl no longer worked |
#5051 | Revert "Replace ParquetFileReader.readFooter with open() and getFooter " |
#5052 | Work around incompatibility between Databricks Delta loads and GpuRegExpExtract |
#4972 | Add support for ORC forced positional evolution |
#5042 | Implement hasSideEffects for GpuSequence |
#5040 | Fix missing imports for 321db shim |
#5033 | Removed limit from the test |
#4938 | Improve compatibility when reading timestamps from JSON and CSV sources |
#5026 | Update RoCE doc URL [skip ci] |
#4976 | Replace ParquetFileReader.readFooter with open() and getFooter |
#4989 | Use conf.useCompression config to decide if we should be compressing the cache |
#4956 | Add avro reader support |
#5009 | Remove references of shims folder in docs [skip ci] |
#5004 | Add ClouderaShimVersion to unshimmed files |
#4971 | Fall back to the CPU for non-zero scale on Ceil or Floor functions |
#4996 | Fix collect_set on struct type |
#4998 | Added the id back for struct children to make them unique |
#4995 | Include 321db shim in distribution build [skip ci] |
#4981 | Update doc for CSV reading interval |
#4973 | Implement support for ArrayExists expression |
#4988 | Remove support for Spark 3.0.x |
#4955 | Add UDT support to ParquetCachedBatchSerializer (CPU) |
#4994 | Add databricks 10.4 build in pre-merge |
#4990 | Remove 30X premerge support for version 22.04 and above [skip ci] |
#4958 | Add independent mvn verify check [skip ci] |
#4933 | Set OrcConf.INCLUDE_COLUMNS for ORC reading |
#4944 | Support for non-string key-types for GetMapValue and element_at() |
#4974 | Add shim for Databricks 10.4 |
#4907 | Add markdown check action |
#4977 | Add missing 314 to buildall script |
#4927 | Support reading ANSI day time interval type from CSV source |
#4965 | Documentation: add example python api call for ExplainPlan.explainPotentialGpuPlan [skip ci] |
#4957 | Document agg pushdown on ORC file limitation [skip ci] |
#4946 | Support predictors on ANSI day time interval type |
#4952 | Have a fixed GPU memory size for integration tests |
#4954 | Fix of failing to read parquet files after writing the hidden file metadata in |
#4953 | Add Decimal 128 as a supported type in partition by for databricks running window |
#4941 | Use new list reduction API to improve performance |
#4926 | Support DayTimeIntervalType in ParquetCachedBatchSerializer |
#4947 | Fallback to ARENA if ASYNC configured and driver < 11.5.0 |
#4934 | Replace MetadataAttribute with FileSourceMetadataAttribute to follow the update in Spark for 3.3.0+ |
#4942 | Fix window rank integration tests on |
#4928 | Disable regular expressions on GPU by default |
#4923 | Support GpuScalarSubquery on nested types |
#4924 | Implement percent_rank() on GPU |
#4853 | Improve date support in JSON and CSV readers |
#4930 | Add in support for sorting arrays with structs in sort_array |
#4861 | Add Apache Spark 3.1.4-SNAPSHOT Shims |
#4925 | Remove unused Spark322PlusShims |
#4921 | Add DatabricksShimVersion to unshimmed class list |
#4917 | Default some configs to protect against cluster settings in integration tests |
#4922 | Add support for decimal 128 for db and spark 320+ |
#4919 | Case-insensitive PR title check [skip ci] |
#4796 | Implement ExistenceJoin Iterator using an auxiliary left semijoin |
#4857 | Transition to v2 shims [Databricks] |
#4899 | Fixed Decimal 128 bug in ParquetCachedBatchSerializer |
#4810 | Support ANSI intervals to/from Parquet |
#4909 | Make ARENA the default allocator for 22.04 |
#4856 | Enable shim tests in sql-plugin module |
#4880 | Bump hadoop-client dependency to 3.1.4 |
#4825 | Initial support for reading decimal types from JSON and CSV |
#4859 | Fallback to CPU when Spark pushes down Aggregates (Min/Max/Count) for ORC |
#4872 | Speed up copying decimal column from parquet buffer to GPU buffer |
#4904 | Relocate Hive UDF Classes |
#4871 | Minor changes to print revision differences when building shims |
#4882 | Disable write/read Parquet when Parquet field IDs are used |
#4858 | Support non-literal index for GpuElementAt and GpuGetArrayItem |
#4875 | Support running GetArrayStructFields on GPU |
#4885 | Enable fuzz testing for Regular Expression repetitions and move remaining edge cases to CPU |
#4869 | Support for hexadecimal digits in regular expressions on the GPU |
#4854 | Avoid regexp_cost with stringSplit on the GPU using transpilation |
#4888 | Clean up leak detection code |
#4901 | fix a broken link in CONTRIBUTING.md [skip ci] |
#4891 | update getting started doc because aws-emr 6.5.0 released [skip ci] |
#4881 | Fix compilation error caused by ClusteredDistribution parameters |
#4890 | Integration-test tests jar for hive UDF tests |
#4878 | Set conda/mamba default to Python version to 3.8 [skip ci] |
#4874 | Fix spark-tests syntax issue [skip ci] |
#4850 | Also check cuda runtime version when using the ASYNC allocator |
#4851 | Add worker ID to temporary table names in tests |
#4847 | Fix test_compress_write_round_trip failure on Spark 3.3 |
#4848 | Profile tool: fix printing of task failed reason |
#4636 | Support str_to_map |
#4835 | Trim parquet_write_test to reduce integration test runtime |
#4819 | Throw exception if casting from double to datetime |
#4838 | Trim cache tests to improve integration test time |
#4839 | Optionally return null if element not exists map/array |
#4822 | Push decimal workarounds to cuDF |
#4619 | Move the udf-examples module to the external repository spark-rapids-examples |
#4844 | Update spark313 dep to released one |
#4827 | Make InternalExclusiveModeGpuDiscoveryPlugin and ExplainPlanImpl as protected class. |
#4836 | Support WindowExec partitioning by Decimal 128 on the GPU |
#4760 | Short circuit AND/OR in ANSI mode |
#4829 | Make bloopInstall version configurable in buildall |
#4823 | Reduce redundancy of decimal testing |
#4715 | Patterns such as (3?)+ should now fall back to CPU |
#4809 | Add ignoreCorruptFiles for ORC readers |
#4790 | Improve JSON and CSV parsing of integer values |
#4812 | Default integration test configs to allow negative decimal scale |
#4805 | Avoid output cast by using unsigned type output for GpuExtractChunk32 |
#4804 | Profiling tool can miss datasources when they are GPU reads |
#4797 | Do not check for metadata during schema comparison |
#4785 | Support casting Map to String |
#4794 | Decimal-128 support for mod and pmod |
#4799 | Fix failure to generate worker_id when xdist is not present |
#4742 | Add ignoreCorruptFiles feature for Parquet reader |
#4792 | Ensure GpuM2 merge aggregation does not produce a null mean or m2 |
#4770 | Improve columnarCopy for HostColumnarToGpu |
#4776 | Improve aggregation performance of average on DECIMAL128 columns |
#4786 | Add shims to compare ORC TypeDescription |
#4780 | Improve JSON and CSV support for boolean values |
#4778 | Decrease chance of random collisions in test temporary paths |
#4782 | Check in host leak detection code |
#4781 | Add Spark properties table to profiling tool output |
#4714 | Add regular expression support to string_split |
#4754 | Close SpillableBatch to avoid leaks |
#4758 | Fix merge conflict with branch-22.02 [skip ci] |
#4694 | Add clarifications and details to integration-tests README [skip ci] |
#4740 | Enable regular expressions on GPU by default |
#4735 | Re-enables partial regex support for octal digits on the GPU |
#4737 | Check for a null compression codec when creating ORC OutStream |
#4738 | Change resume-from to aggregator in buildall [skip ci] |
#4698 | Add tests for few json options |
#4731 | Trim join tests to improve runtime of tests |
#4732 | Fix failing serializer tests on Spark 3.3.0 |
#4709 | Update centos 8 dockerfile to handle EOL issue [skip ci] |
#4724 | Debug dump to Parquet support for DECIMAL128 columns |
#4688 | Optimize DECIMAL128 sum aggregations |
#4692 | Add FAQ entry to discuss executor task concurrency configuration [skip ci] |
#4588 | Optimize semaphore acquisition in GpuShuffledHashJoinExec |
#4697 | Add preliminary test and test framework changes for ExistenceJoin |
#4716 | GpuStringSplit should return an array of not-null elements |
#4611 | Support BitLength and OctetLength |
#4408 | Use the ORC version that corresponds to the Spark version |
#4686 | Fall back to CPU for queries referencing hidden metadata columns |
#4669 | Prevent deadlock between RapidsBufferStore and RapidsBufferBase on close |
#4707 | Fix auto merge conflict 4705 [skip ci] |
#4690 | Fix map_test ANSI failure in Spark 3.3.0 |
#4681 | Reimplement check for non-regexp strings using RegexParser |
#4683 | Fix documentation link, clarify documentation [skip ci] |
#4677 | Make Collect, first and last as deterministic aggregate functions for Spark-3.3 |
#4682 | Enable test for LIKE with embedded null character |
#4673 | Allow GpuWindowExec to partition on structs |
#4637 | Improve support for reading CSV and JSON floating-point values |
#4629 | Remove shims module |
#4648 | Append new authorized user to blossom-ci safelist |
#4623 | Fallback to CPU when aggregate push down used for parquet |
#4606 | Set default RMM pool to ASYNC for cuda 11.2+ |
#4531 | Use libcudf mixed joins for conditional hash semi and anti joins |
#4624 | Enable integration test results report on Jenkins [skip ci] |
#4597 | Update plugin version to 22.04.0-SNAPSHOT |
#4592 | Adds SQL function HYPOT using the GPU |
#4504 | Implement AST-based regular expression fuzz tests |
#4560 | Make shims.v2.ParquetCachedBatchSerializer as protected |
#4305 | [FEA] write nvidia tool wrappers to allow old YARN versions to work with MIG |
#4410 | [FEA] ReplicateRows - Support ReplicateRows for decimal 128 type |
#4360 | [FEA] Add explain api for Spark 2.X |
#3541 | [FEA] Support max on single-level struct in aggregation context |
#4238 | [FEA] Add a Spark 3.X Explain only mode to the plugin |
#3952 | [Audit] [FEA][SPARK-32986][SQL] Add bucketed scan info in query plan of data source v1 |
#4412 | [FEA] Improve support for \A, \Z, and \z in regular expressions |
#3979 | [FEA] Improvements for CPU(Row) based UDF |
#4467 | [FEA] Add support for regular expression with repeated digits (\d+ , \d* , \d? ) |
#4439 | [FEA] Enable GPU broadcast exchange reuse for DPP when AQE enabled |
#3512 | [FEA] Support org.apache.spark.sql.catalyst.expressions.Sequence |
#3475 | [FEA] Spark 3.2.0 reads Parquet unsigned int64(UINT64) as Decimal(20,0) but CUDF does not support it |
#4091 | [FEA] regexp_replace: Improve support for ^ and $ |
#4104 | [FEA] Support org.apache.spark.sql.catalyst.expressions.ReplicateRows |
#4027 | [FEA] Support SubqueryBroadcast on GPU to enable exchange reuse during DPP |
#4284 | [FEA] Support idx = 0 in GpuRegExpExtract |
#4002 | [FEA] Implement regexp_extract on GPU |
#3221 | [FEA] Support GpuFirst and GpuLast on nested types under reduction aggregations |
#3944 | [FEA] Full support for sum with overflow on Decimal 128 |
#4028 | [FEA] support GpuCast from non-nested ArrayType to StringType |
#3250 | [FEA] Make CreateMap duplicate key handling compatible with Spark and enable CreateMap by default |
#4170 | [FEA] Make regular expression behavior with $ and \r consistent with CPU |
#4001 | [FEA] Add regexp support to regexp_replace |
#3962 | [FEA] Support null characters in regular expressions in RLIKE |
#3797 | [FEA] Make RLike support consistent with Apache Spark |
#4392 | [FEA] could the parquet scan code avoid acquiring the semaphore for an empty batch? |
#679 | [FEA] move some deserialization code out of the scope of the gpu-semaphore to increase cpu concurrent |
#4350 | [FEA] Optimize the all-true and all-false cases in GPU If and CaseWhen |
#4309 | [FEA] Leverage cudf conditional nested loop join to implement semi/anti hash join with condition |
#4395 | [FEA] acquire the semaphore after concatToHost in GpuShuffleCoalesceIterator |
#4134 | [FEA] Allow EliminateJoinToEmptyRelation in GpuBroadcastExchangeExec |
#4189 | [FEA] understand why between is so expensive |
#4725 | [DOC] Broken links in guide doc |
#4675 | [BUG] Jenkins integration build timed out at 10 hours |
#4665 | [BUG] Spark321Shims.getParquetFilters failed with NoSuchMethodError |
#4635 | [BUG] nvidia-smi wrapper script ignores ENABLE_NON_MIG_GPUS=1 on a heterogeneous multi-GPU machine |
#4500 | [BUG] Build failures against Spark 3.2.1 rc1 and make 3.2.1 non snapshot |
#4631 | [BUG] Release build with mvn option -P source-javadoc FAILED |
#4625 | [BUG] NDS query 5 fails with AdaptiveSparkPlanExec assertion |
#4632 | [BUG] Build failing for Spark 3.3.0 due to deprecated method warnings |
#4599 | [BUG] test_group_apply_udf and test_group_apply_udf_more_types hangs on Databricks 9.1 |
#4600 | [BUG] crash if we have a decimal128 in a struct in an array |
#4581 | [BUG] Build error "GpuOverrides.scala:924: wrong number of arguments" on DB9.1.x spark-3.1.2 |
#4593 | [BUG] dup GpuHashJoin.diff case-folding issue |
#4559 | [BUG] regexp_replace with replacement string containing \ can produce incorrect results |
#4503 | [BUG] regexp_replace with back references produces incorrect results on GPU |
#4567 | [BUG] Profile tool hangs in compare mode |
#4315 | [BUG] test_hash_reduction_decimal_overflow_sum[30] failed OOM in integration tests |
#4551 | [BUG] protobuf-java version changed to 3.x |
#4499 | [BUG] GpuSequence blows up when nulls exist in any of the inputs (start, stop, step) |
#4454 | [BUG] Shade warnings when building the tools artifact |
#4541 | [BUG] Column vector leak in conditionals_test.py |
#4514 | [BUG] test_hash_reduction_pivot_without_nans failed |
#4521 | [BUG] Inconsistencies in handling of newline characters and string and line anchors |
#4548 | [BUG] ai.rapids.cudf.CudaException: an illegal instruction was encountered in databricks 9.1 |
#4475 | [BUG] \D and \W match newline in Spark but not in cuDF |
#1866 | [BUG] GpuFileFormatWriter does not close the data writer |
#4524 | [BUG] RegExp transpiler fails to detect some choice expressions that cuDF cannot compile |
#3226 | [BUG] OOM happened when doing cube operations |
#2504 | [BUG] OOM when running NDS queries with UCX and GDS |
#4273 | [BUG] Rounding past the size that can be stored in a type produces incorrect results |
#4060 | [BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed intermittently |
#4039 | [BUG] Spark 3.3.0 IT Array test failures |
#3849 | [BUG] In ANSI mode we can fail in cases Spark would not due to conditionals |
#4445 | [BUG] mvn clean prints an error message on a clean dir |
#4421 | [BUG] the driver is trying to load CUDA with latest 22.02 |
#4455 | [BUG] join_test.py::test_struct_self_join[IGNORE_ORDER({'local': True})] failed in spark330 |
#4442 | [BUG] mvn build FAILED with option -P noSnapshotsWithDatabricks |
#4281 | [BUG] q9 regression between 21.10 and 21.12 |
#4280 | [BUG] q88 regression between 21.10 and 21.12 |
#4422 | [BUG] Host column vectors are being leaked during tests |
#4446 | [BUG] GpuCast crashes when casting from Array with unsupportable child type |
#4432 | [BUG] nightly build 3.3.0 failed: HashClusteredDistribution is not a member of org.apache.spark.sql.catalyst.plans.physical |
#4443 | [BUG] SPARK-37705 breaks parquet filters from Spark 3.3.0 and Spark 3.2.2 onwards |
#4316 | [BUG] Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly intermittently |
#4378 | [BUG] udf_test udf_cudf_test failed require_minimum_pandas_version check in spark 320+ |
#4423 | [BUG] Build is failing due to FileScanRDD changes in Spark 3.3.0-SNAPSHOT |
#4401 | [BUG] array_test.py::test_array_contains failures |
#4403 | [BUG] NDS query 72 logs codegen fallback exception and produces incorrect results |
#4386 | [BUG] conditionals_test.py FAILED with side_effects_cast[Integer/Long] on Databricks 9.1 Runtime |
#3934 | [BUG] Dependencies of published integration tests jar are missing |
#4341 | [BUG] GpuCast.scala:nnn warning: discarding unmoored doc comment |
#4356 | [BUG] nightly spark303 deploy pulling spark301 aggregator |
#4347 | [BUG] Dist jar pom lists aggregator jar as dependency |
#4176 | [BUG] ParseDateTimeSuite UT failed |
#4292 | [BUG] no meaningful message is surfaced to maven when binary-dedupe fails |
#4351 | [BUG] Tests FAILED On SPARK-3.2.0, com.nvidia.spark.rapids.SerializedTableColumn cannot be cast to com.nvidia.spark.rapids.GpuColumnVector |
#4346 | [BUG] q73 decimal was twice as slow in weekly results |
#4334 | [BUG] GpuColumnarToRowExec will always be tagged False for exportColumnarRdd after Spark311 |
#4339 | The parameter dataType is not necessary in resolveColumnVector method. |
#4275 | [BUG] Row-based Hive UDF will fail if arguments contain a foldable expression. |
#4229 | [BUG] regexp_replace [^a] has different behavior between CPU and GPU for multiline strings |
#4294 | [BUG] parquet_write_test.py::test_ts_write_fails_datetime_exception failed in spark 3.1.1 and 3.1.2 |
#4205 | [BUG] Get different results when casting from timestamp to string |
#4277 | [BUG] cudf_udf nightly cudf import rmm failed |
#4246 | [BUG] Regression in CastOpSuite due to cuDF change in parsing NaN |
#4243 | [BUG] test_regexp_replace_null_pattern_fallback[ALLOW_NON_GPU(ProjectExec,RegExpReplace)] failed in databricks |
#4244 | [BUG] Cast from string to float using hand-picked values failed |
#4227 | [BUG] RAPIDS Shuffle Manager doesn't fallback given encryption settings |
#3374 | [BUG] minor deprecation warnings in a 3.2 shim build |
#3613 | [BUG] release312db profile pulls in 311until320-apache |
#4213 | [BUG] unused method with a misleading outdated comment in ShimLoader |
#3609 | [BUG] GpuShuffleExchangeExec in v2 shims has inconsistent packaging |
#4127 | [BUG] CUDF 22.02 nightly test failure |
#4773 | Update 22.02 changelog to latest [skip ci] |
#4771 | revert cudf api links from legacy to stable [skip ci] |
#4767 | Update 22.02 changelog to latest [skip ci] |
#4750 | Updated doc for decimal support |
#4757 | Update qualification tool to remove DECIMAL 128 as potential problem |
#4755 | Fix databricks doc for limitations. [skip ci] |
#4751 | Fix broken hyperlinks in documentation [skip ci] |
#4706 | Update 22.02 changelog to latest [skip ci] |
#4700 | Update cudfjni version to released 22.02.0 |
#4701 | Decrease nightly tests upper limitation to 7 [skip ci] |
#4639 | Update changelog for 22.02 and archive info of some older releases [skip ci] |
#4572 | Add download page for 22.02 [skip ci] |
#4672 | Revert "Disable 311cdh build due to missing dependency (#4659)" |
#4662 | Update the deploy script [skip ci] |
#4657 | Upmerge spark2 directory to the latest 22.02 changes |
#4659 | Disable 311cdh build by default because of a missing dependency |
#4508 | Fix Spark 3.2.1 build failures and make it non-snapshot |
#4652 | Remove non-deterministic test order in nightly [skip ci] |
#4643 | Add profile release301 when mvn help:evaluate |
#4630 | Fix the incomplete capture of SubqueryBroadcast |
#4633 | Suppress newTaskTempFile method warnings for Spark 3.3.0 build |
#4618 | [DB31x] Pick the correct Python runner for flatmap-group Pandas UDF |
#4622 | Fallback to CPU when encoding is not supported for JSON reader |
#4470 | Add in HashPartitioning support for decimal 128 |
#4535 | Revert "Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075 (#4471)" |
#4583 | Avoid unapply on PromotePrecision |
#4573 | Correct version from 21.12 to 22.02 [skip ci] |
#4575 | Correct and update links in UDF doc [skip ci] |
#4501 | Switch and/or to use new cudf binops to improve performance |
#4594 | Resolve case-folding issue [skip ci] |
#4585 | Spark2 module upmerge, deploy script, and updates for Jenkins |
#4589 | Increase premerge databricks IDLE_TIMEOUT to 4 hours [skip ci] |
#4485 | Add json reader support |
#4556 | regexp_replace with back-references should fall back to CPU |
#4569 | Fix infinite loop with Profiling tool compare mode and app with no sql ids |
#4529 | Add support for Spark 2.x Explain Api |
#4577 | Revert "Fix CVE-2021-22569 (#4545)" |
#4520 | GpuSequence refactor |
#4570 | A few quick fixes to try to reduce max memory usage in the tests |
#4477 | Use libcudf mixed joins for conditional hash joins |
#4566 | remove scala-library from combined tools jar |
#4552 | Fix resource leak in GpuCaseWhen |
#4553 | Reenable test_hash_reduction_pivot_without_nans |
#4530 | Fix correctness issues in regexp and add \r and \n to fuzz tests |
#4549 | Fix typos in integration tests README [skip ci] |
#4545 | Fix CVE-2021-22569 |
#4543 | Enable auto-merge from branch-22.02 to branch-22.04 [skip ci] |
#4540 | Remove user kuhushukla |
#4434 | Support max on single-level struct in aggregation context |
#4534 | Temporarily disable integration test - test_hash_reduction_pivot_without_nans |
#4322 | Add an explain only mode to the plugin |
#4497 | Make better use of pinned memory pool |
#4512 | remove hadoop version requirement [skip ci] |
#4527 | Fall back to CPU for regular expressions containing \D or \W |
#4525 | Properly close data writer in GpuFileFormatWriter |
#4502 | Removed the redundant test for element_at and fixed the failing one |
#4523 | Add more integration tests for decimal 128 |
#3762 | Call the right method to convert table from row major <=> col major |
#4482 | Simplified the construction of zero scalar in GpuUnaryMinus |
#4510 | Update copyright in NOTICE [skip ci] |
#4484 | Update GpuFileFormatWriter to stay in sync with recent Spark changes, but still not support writing Hive bucketed table on GPU. |
#4492 | Fall back to CPU for regular expressions containing hex digits |
#4495 | Enable approx_percentile by default |
#4420 | Fix up incorrect results of rounding past the max digits of data type |
#4483 | Update test case of reading nested unsigned parquet file |
#4490 | Remove warning about RMM default allocator |
#4461 | [Audit] Add bucketed scan info in query plan of data source v1 |
#4489 | Add arrays of decimal128 to join tests |
#4476 | Don't acquire the semaphore for empty input while scanning |
#4424 | Improve support for regular expression string anchors \A , \Z , and \z |
#4491 | Skip the test for spark versions 3.1.1, 3.1.2 and 3.2.0 only |
#4459 | Use merge sort for struct types in non-key columns |
#4494 | Append new authorized user to blossom-ci whitelist [skip ci] |
#4400 | Enable approx percentile tests |
#4471 | Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075 |
#4462 | Rename DECIMAL_128_FULL and rework usage of TypeSig.gpuNumeric |
#4479 | Change signoff check image to slim-buster [skip ci] |
#4464 | Throw SparkArrayIndexOutOfBoundsException for Spark 3.3.0+ |
#4469 | Support repetition of \d and \D in regexp functions |
#4472 | Modify docs for 22.02 to address issue-4319 [skip ci] |
#4440 | Enable GPU broadcast exchange reuse for DPP when AQE enabled |
#4376 | Add sequence support |
#4460 | Abstract the text based PartitionReader |
#4383 | Fix correctness issue with CASE WHEN with expressions that have side-effects |
#4465 | Refactor for shims 320+ |
#4463 | Avoid replacing a hash join if build side is unsupported by the join type |
#4456 | Fix build issues: 1 clean non-exists target dirs; 2 remove duplicated plugin |
#4416 | Unshim join execs |
#4172 | Support String to Decimal 128 |
#4458 | Exclude some metadata operators when checking GPU replacement |
#4451 | Some metrics improvements and timeline reporting |
#4435 | Disable add profile src execution by default to make the build log clean |
#4436 | Print error log to stderr output |
#4155 | Add partial support for line begin and end anchors in regexp_replace |
#4428 | Exhaustively iterate ColumnarToRow iterator to avoid leaks |
#4430 | update pca example link in ml-integration.md [skip ci] |
#4452 | Limit parallelism of nightly tests [skip ci] |
#4449 | Add recursive type checking and fallback tests for casting array with unsupported element types to string |
#4437 | Change logInfo to logWarning |
#4447 | Fix 330 build error and add 322 shims layer |
#4417 | Fix an Intellij debug issue |
#4431 | Add DateType support for AST expressions |
#4433 | Import the right pandas from conda [skip ci] |
#4419 | Import the right pandas from conda |
#4427 | Update getFileScanRDD shim for recent changes in Spark 3.3.0 |
#4397 | Ignore cufile.log |
#4388 | Add support for ReplicateRows |
#4399 | Update docs for Profiling and Qualification tool to change wording |
#4407 | Fix GpuSubqueryBroadcast on multi-fields relation |
#4396 | GpuShuffleCoalesceIterator acquire semaphore after host concat |
#4361 | Accommodate altered semantics of cudf::lists::contains() |
#4394 | Use correct column name in GpuIf test |
#4385 | Add missing GpuSubqueryBroadcast replacement rule for spark31x |
#4387 | Fix auto merge conflict 4384 [skip ci] |
#4374 | Fix the IT module depends on the tests module |
#4365 | Not publishing integration_tests jar to Maven Central [skip ci] |
#4358 | Update GpuIf to support expressions with side effects |
#4382 | Remove unused scallop dependency from integration_tests |
#4364 | Replace Scala document with Scala comment for inner functions |
#4373 | Add pytest tags for nightly test parallel run [skip ci] |
#4150 | Support GpuSubqueryBroadcast for DPP |
#4372 | Move casting to string tests from array_test.py and struct_test.py to cast_test.py |
#4371 | Fix typo in skipTestsFor330 calculation [skip ci] |
#4355 | Dedicated deploy-file with reduced pom in nightly build [skip ci] |
#4352 | Revert "Ignore failing string to timestamp tests temporarily (#4197)" |
#4359 | Audit - SPARK-37268 - Remove unused variable in GpuFileScanRDD [Databricks] |
#4327 | Print meaningful message when calling scripts in maven |
#4354 | Fix regression in AQE optimizations |
#4343 | Fix issue with binding to hash agg columns with computation |
#4285 | Add support for regexp_extract on the GPU |
#4349 | Fix PYTHONPATH in pre-merge |
#4269 | The option for the nightly script not deploying jars [skip ci] |
#4335 | Fix the issue of exporting Column RDD |
#4336 | Split expensive pytest files in cases level [skip ci] |
#4328 | Change the explanation of why the operator will not work on GPU |
#4338 | Use scala Int.box instead of Integer constructors |
#4340 | Remove the unnecessary parameter dataType in resolveColumnVector method |
#4256 | Allow returning an EmptyHashedRelation when a broadcast result is empty |
#4333 | Add tests about writing empty table to ORC/PARQUET |
#4337 | Support GpuFirst and GpuLast on nested types under reduction aggregations |
#4331 | Fix parquet options builder calls |
#4310 | Fix typo in shim class name |
#4326 | Fix 4315 decrease concurrentGpuTasks to avoid sum test OOM |
#4266 | Check revisions for all shim jars while build all |
#4282 | Use data type to create an inspector for a foldable GPU expression. |
#3144 | Optimize AQE with Spark 3.2+ to avoid redundant transitions |
#4317 | [BUG] Update nightly test script to dynamically set mem_fraction [skip ci] |
#4206 | Porting GpuRowToColumnar converters to InternalColumnarRDDConverter |
#4272 | Full support for SUM overflow detection on decimal |
#4255 | Make regexp pattern [^a] consistent with Spark for multiline strings |
#4306 | Revert commonizing the int96ParquetRebase* functions |
#4299 | Fix auto merge conflict 4298 [skip ci] |
#4159 | Optimize sample perf |
#4235 | Commonize v2 shim |
#4274 | Add tests for timestamps that overflowed before. |
#4271 | Skip test_regexp_replace_null_pattern_fallback on Spark 3.1.1 and later |
#4278 | Use mamba for cudf conda install [skip ci] |
#4270 | Document exponent differences when casting floating point to string [skip ci] |
#4268 | Fix merge conflict with branch-21.12 |
#4093 | Add tests for regexp() and regexp_like() |
#4259 | fix regression in cast from string to float that caused signed NaN to be considered valid |
#4241 | fix bug in parsing regex character classes that start with ^ and contain an unescaped ] |
#4224 | Support row-based Hive UDFs |
#4221 | GpuCast from ArrayType to StringType |
#4007 | Implement duplicate key handling for GpuCreateMap |
#4251 | Skip test_regexp_replace_null_pattern_fallback on Databricks |
#4247 | Disable failing CastOpSuite test |
#4239 | Make EOL anchor behavior match CPU for strings ending with newline |
#4153 | Regexp: Only transpile once per expression rather than once per batch |
#4230 | Change to build tools module with all the versions by default |
#4223 | Fixes a minor deprecation warning |
#4215 | Rebalance testing load |
#4214 | Fix pre_merge ci_2 [skip ci] |
#4212 | Remove an unused method with its outdated comment |
#4211 | Update test_floor_ceil_overflow to be more lenient on exception type |
#4203 | Move all the GpuShuffleExchangeExec shim v2 classes to org.apache.spark |
#4193 | Rename 311until320-apache to 311until320-noncdh |
#4197 | Ignore failing string to timestamp tests temporarily |
#4160 | Fix merge issues for branch 22.02 |
#4081 | Convert String to DecimalType without casting to FloatType |
#4132 | Fix auto merge conflict 4131 [skip ci] |
#4099 | [REVIEW] Init version 22.02.0 |
#4113 | Fix pre-merge CI 2 conditions [skip ci] |
#4064 | Regex: transpile . to [^\r\n] in cuDF |
#4044 | RLike: Fall back to CPU for regex that would produce incorrect results |
#1571 | [FEA] Better precision range for decimal multiply, and possibly others |
#3953 | [FEA] Audit: Add array support to union by name |
#4085 | [FEA] Decimal 128 Support: Concat |
#4073 | [FEA] Decimal 128 Support: MapKeys, MapValues, MapEntries |
#3432 | [FEA] Qualification tool checks if there is any "Scan JDBCRelation" and count it as "problematic" |
#3824 | [FEA] Support MapType in ParquetCachedBatchSerializer |
#4048 | [FEA] WindowExpression support for Decimal 128 in Spark 320 |
#4047 | [FEA] Literal support for Decimal 128 in Spark 320 |
#3863 | [FEA] Add Spark 3.3.0-SNAPSHOT Shim |
#3814 | [FEA] stddev stddev_samp and std should be supported over a window |
#3370 | [FEA] Add support for Databricks 9.1 runtime |
#3876 | [FEA] Support REGEXP_REPLACE to replace null values |
#3784 | [FEA] Support ORC write Map column(single level) |
#3470 | [FEA] Add shims for 3.2.1-SNAPSHOT |
#3855 | [FEA] CPU based UDF to run efficiently and transfer data back to GPU for supported operations |
#3739 | [FEA] Provide an explicit config for fallback on CPU if plan rewrite fails |
#3888 | [FEA] Decimal 128 Support: Add a "Trust me I know it will not overflow config" |
#3088 | [FEA] Profile tool print problematic operations |
#3886 | [FEA] Decimal 128 Support: Extend the range for Decimal Multiply and Divide |
#79 | [FEA] Support Size operation |
#3880 | [FEA] Decimal 128 Support: Average aggregation |
#3659 | [FEA] External tool integration with Qualification tool |
#2 | [FEA] RLIKE support |
#3192 | [FEA] Support decimal type in ORC writer |
#3419 | [FEA] Add support for org.apache.spark.sql.execution.SampleExec |
#3535 | [FEA] Qualification tool can detect RDD APIs in SQL plan |
#3494 | [FEA] Support structs in ORC writer |
#3514 | [FEA] Support collect_set on struct in aggregation context |
#3515 | [FEA] Support CreateArray to produce array(struct) |
#3116 | [FEA] Support Maps, Lists, and Structs as non-key columns on joins |
#2054 | [FEA] Add support for Arrays to ParquetCachedBatchSerializer |
#3573 | [FEA] Support Cache(PCBS) Array-of-Struct |
#3768 | [DOC] document databricks init script required for UCX |
#2867 | [FEA] Make LZ4_CHUNK_SIZE configurable |
#3832 | [FEA] AST enabled GpuBroadcastNestedLoopJoin left side can't be small |
#3798 | [FEA] bounds checking in joins can be expensive |
#3603 | [FEA] Allocate UCX bounce buffers outside of RMM if ASYNC allocator is enabled |
#4253 | [BUG] Dependencies missing of spark-rapids v21.12.0 release jars |
#4216 | [BUG] AQE Crashing Spark RAPIDS when using filter() and union() |
#4188 | [BUG] data corruption in GpuBroadcastNestedLoopJoin with empty relations edge case |
#4191 | [BUG] failed to read DECIMAL128 within MapType from ORC |
#4175 | [BUG] arithmetic_ops_test failed in spark 3.2.0 |
#4162 | [BUG] isCastDecimalToStringEnabled is never called |
#3894 | [BUG] test_pandas_scalar_udf and test_pandas_map_udf failed in UCX standalone CI run |
#3970 | [BUG] mismatching timezone settings on executor and driver can cause ORC read data corruption |
#4141 | [BUG] Unable to start the RapidsShuffleManager in databricks 9.1 |
#4102 | [BUG] udf-example build failed: Unknown CMake command "cpm_check_if_package_already_added". |
#4084 | [BUG] window on unbounded preceding and unbounded following can produce incorrect results. |
#3990 | [BUG] Scaladoc link warnings in ParquetCachedBatchSerializer and ExplainPlan |
#4108 | [BUG] premerge fails due to Spark 3.3.0 HadoopFsRelation after SPARK-37289 |
#4042 | [BUG] cudf_udf tests fail on nightly Integration test run |
#3743 | [BUG] Implicitly catching all exceptions warning in GpuOverrides |
#4069 | [BUG] parquet_test.py pytests FAILED on Databricks-9.1-ML-spark-3.1.2 |
#3461 | [BUG] Cannot build project from a sub-directory |
#4053 | [BUG] buildall uses a stale aggregator dependency during test compilation |
#3703 | [BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed with TypeError |
#3706 | [BUG] approx_percentile returns array of zero percentiles instead of null in some cases |
#4017 | [BUG] Why is the hash aggregate not handling empty result expressions |
#3994 | [BUG] can't open notebook 'docs/demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb' |
#3996 | [BUG] Exception happened when getting a null row |
#3999 | [BUG] Integration cache_test failures - ArrayIndexOutOfBoundsException |
#3532 | [BUG] DatabricksShimVersion must carry runtime version info |
#3834 | [BUG] Approx_percentile deserialize error when calling "show" rather than "collect" |
#3992 | [BUG] failed create-parallel-world in databricks build |
#3987 | [BUG] "mvn clean package -DskipTests" is no longer working |
#3866 | [BUG] RLike integration tests failing on Azure Databricks 7.3 |
#3980 | [BUG] udf-example build failed due to maven-antrun-plugin upgrade |
#3966 | [BUG] udf-examples module fails on mvn compile and mvn test |
#3977 | [BUG] databricks aggregator jar deployed failed |
#3915 | [BUG] typo in verify_same_sha_for_unshimmed prevents the offending class file name from being logged. |
#1304 | [BUG] Query fails with HostColumnarToGpu doesn't support Structs |
#3924 | [BUG] ExpressionEncoder does not work for input in GpuScalaUDF |
#3911 | [BUG] CI fails on an inconsistent set of partial builds |
#2896 | [BUG] Extra GpuColumnarToRow when using ParquetCachedBatchSerializer on databricks |
#3864 | [BUG] test_sample_produce_empty_batch failed in dataproc |
#3823 | [BUG] binary-dedup.sh script fails on mac |
#3658 | [BUG] DataFrame actions failing with error: Error : java.lang.NoClassDefFoundError: Could not initialize class com.nvidia.spark.rapids.GpuOverrides with latest 21.10 jars |
#3857 | [BUG] nightly build push dist package w/ single version of spark |
#3854 | [BUG] not found: type PoissonDistribution in databricks build |
#3852 | spark-nightly-build deploys all modules due to typo in -pl |
#3844 | [BUG] nightly spark311cdh build failed |
#3843 | [BUG] databricks nightly deploy failed |
#3705 | [BUG] Change nullOnDivideByZero from runtime parameter to aggregate expression for stddev and variance aggregation families |
#3614 | [BUG] ParquetMaterializer.scala appears in both v1 and v2 shims |
#3430 | [BUG] Profiling tool silently stops without producing any output on a Synapse Spark event log |
#3311 | [BUG] cache_test.py failed w/ cache.serializer in spark 3.1.2 |
#3710 | [BUG] Usage of Class.forName without specifying a classloader |
#3462 | [BUG] IDE complains about duplicate ShimBasePythonRunner instances |
#3476 | [BUG] test_non_empty_ctas fails on yarn |
#4362 | Decimal128 support for Parquet |
#4391 | update gcp custom dataproc image version to avoid log4j issue [skip ci] |
#4379 | update hot fix cudf link v21.12.2 |
#4367 | update 21.12 branch for doc [skip ci] |
#4245 | Update changelog 21.12 to latest [skip ci] |
#4258 | Sanitize column names in ParquetCachedBatchSerializer before writing to Parquet |
#4308 | Bump up GPU reserve memory to 640MB |
#4307 | Update Download page for 21.12 [skip ci] |
#4261 | Update cudfjni version to released 21.12.0 |
#4265 | Remove aggregator dependency before deploying dist artifact |
#4030 | Support code coverage report with single version jar [skip ci] |
#4287 | Update 21.12 compatibility guide for known regexp issue [skip ci] |
#4242 | Fix indentation issue in getting-started-k8s guide [skip ci] |
#4263 | Add missing ORC write tests on Map of Decimal |
#4257 | Implement getShuffleRDD and fixup mismatched output types on shuffle reuse |
#4250 | Update the release script [skip ci] |
#4222 | Add arguments support to 'databricks/run-tests.py' |
#4233 | Add databricks init script for UCX |
#4231 | RAPIDS Shuffle Manager fallback if security is enabled |
#4228 | Fix unconditional nested loop joins on empty tables |
#4217 | Enable event log for qualification & profiling tools testing from IT |
#4202 | Parameter for the Databricks zone-id [skip ci] |
#4199 | modify some words for synapse getting started guide [skip ci] |
#4200 | Disable approx percentile tests that intermittently fail |
#4187 | Added a getting started guide for Synapse [skip ci] |
#4192 | Fix ORC read DECIMAL128 inside MapType |
#4173 | Update approx percentile docs to link to issue 4060 [skip ci] |
#4174 | Document Bloop, Metals and VS code as an IDE option [skip ci] |
#4181 | Fix element_at for 3.2.0 and array/struct cast |
#4110 | Add a getting started guide on workload qualification [skip ci] |
#4106 | Add docs for MIG on YARN [skip ci] |
#4100 | Add PCA example to ml-integration page [skip ci] |
#4177 | Decimal128: added missing decimal128 signature on Spark 32X |
#4161 | More integration tests with decimal128 |
#4165 | Fix type checks for get array item in 3.2.0 |
#4163 | Enable config to check for casting decimals to strings |
#4154 | Use num_slices to guarantee partition shape in the pandas udf tests |
#4129 | Check executor timezone is same as driver timezone when running on GPU |
#4139 | Decimal128 Support |
#4128 | Fix build errors in udf-examples native build |
#4063 | Regexp_replace support regexp |
#4125 | Remove unused imports |
#4052 | Support null safe host column vector |
#4116 | Add in tests to check for overflow in unbounded window |
#4111 | Added external doc links for JRE and Spark |
#4105 | Enforce checks for unused imports and missed interpolation |
#4107 | Set the task context in background reader threads |
#4114 | Refactoring cudf_udf test setup |
#4109 | Stop using redundant partitionSchemaOption dropped in 3.3.0 |
#4097 | Enable auto-merge from branch-21.12 to branch-22.02 [skip ci] |
#4094 | Remove spark311db shim layer |
#4082 | Add abfs and abfss to the cloud scheme |
#4071 | Treat scalac warnings as errors |
#4043 | Promote cudf as dist direct dependency, mark aggregator provided |
#4076 | Sets the GPU device id in the UCX early start thread |
#4087 | Regex parser improvements and bug fixes |
#4079 | verify "Add array support to union by name" by adding an integration test |
#4090 | Update pre-merge expression for 2022+ CI [skip ci] |
#4049 | Change Databricks image from 8.2 to 9.1 [skip ci] |
#4051 | Upgrade ORC version from 1.5.8 to 1.5.10 |
#4080 | Add case insensitive when clipping parquet blocks |
#4083 | Fix compiler warning in regex transpiler |
#4070 | Support building from sub directory |
#4072 | Fix overflow checking on optimized decimal sum |
#4067 | Append new authorized user to blossom-ci whitelist [skip ci] |
#4066 | Temporarily disable cudf_udf test |
#4057 | Restore original ASL 2.0 license text |
#3937 | Qualification tool: Detect JDBCRelation in eventlog |
#3925 | verify AQE and DPP both on |
#3982 | Fix the issue of parquet reading with case insensitive schema |
#4054 | Use install for the base version build thread [skip ci] |
#4008 | [Doc] Update the getting started guide for databricks: Change from 8.2 to 9.1 runtime [skip ci] |
#4010 | Enable MapType for ParquetCachedBatchSerializer |
#4046 | lower GPU memory reserve to 256MB |
#3770 | Enable approx percentile tests |
#4038 | Change the catalystConverter to be a Scala val |
#4035 | Hash aggregate fix empty resultExpressions |
#3998 | Check for CPU cores and free memory in IT script |
#3984 | Check for data write command before inserting hash sort optimization |
#4019 | initialize RMM with a single pool size |
#3993 | Qualification tool: Remove "unsupported" word for nested complex types |
#4033 | skip spark 330 tests temporarily in nightly [skip ci] |
#4029 | Update buildall script and the build doc [skip ci] |
#4014 | fix can't open notebook 'docs/demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb' [skip ci] |
#4024 | Allow using a custom Spark Resource Name for a GPU |
#4012 | Add Apache Spark 3.3.0-SNAPSHOT Shims |
#4021 | Explicitly use the public version of ParquetCachedBatchSerializer |
#3869 | Add Std dev samp for windowing |
#3960 | Use a fixed RMM pool size |
#3767 | Add shim for Databricks 9.1 |
#3862 | Prevent approx_percentile aggregate from being split between CPU and GPU |
#3871 | Add integration test for RLike with embedded null in input |
#3968 | Allow null character in regexp_replace pattern |
#3821 | Support ORC write Map column |
#3991 | Fix aggregator jar copy logic |
#3973 | Add shims for Apache Spark 3.2.1-SNAPSHOT builds |
#3967 | Bring back AST support for BNLJ inner joins |
#3947 | Enable rlike tests on databricks |
#3981 | Replace tasks w/ target of maven-antrun-plugin in udf-example |
#3976 | Replace long artifact lists with an ant loop |
#3972 | Revert udf-examples dependency change to restore test build phase |
#3978 | Update aggregator jar name in databricks deploy script |
#3965 | Add how-to resolve auto-merge conflict [skip ci] |
#3963 | Add a dedicated RapidsConf option to tolerate GpuOverrides apply failures |
#3923 | Prepare for 3.2.1 shim, various shim build fixes and improvements |
#3969 | add doc on using compute-sanitizer |
#3964 | Qualification tool: Catch exception for invalid regex patterns |
#3961 | Avoid using HostColumnarToGpu for nested types |
#3910 | Refactor the aggregate API |
#3897 | Support running CPU based UDF efficiently |
#3950 | Fix failed auto-merge #3939 |
#3946 | Document compatibility of operations with side effects. |
#3945 | Update udf-examples dependencies to use dist jar |
#3938 | remove GDS alignment code |
#3943 | Add artifact revisions check for nightly tests [skip ci] |
#3933 | Profiling tool: Print potential problems |
#3926 | Add zip unzip to integration tests dockerfiles [skip ci] |
#3757 | Update to nvcomp-2.x JNI APIs |
#3922 | Stop using -U in build merges aggregator jars of nightly [skip ci] |
#3907 | Add version properties to integration tests modules |
#3912 | Stop using -U in the build that merges all aggregator jars |
#3909 | Fix warning when catching all throwables in GpuOverrides |
#3766 | Use JCudfSerialization to deserialize a table to host columns |
#3820 | Advertise CPU orderingSatisfies |
#3858 | update emr 6.4 getting started doc and pic [skip ci] |
#3899 | Fix sample test cases |
#3896 | Xfail the sample tests temporarily |
#3848 | Fix binary-dedupe failures and improve its performance on macOS |
#3867 | Disable rlike integration tests on Databricks |
#3850 | Add explain Plugin API for CPU plan |
#3868 | Fix incorrect schema of nested types of union - audit SPARK-36673 |
#3860 | Add unit test for GpuKryoRegistrator |
#3847 | Add Running Qualification App API |
#3861 | Revert "Fix typo in nightly deploy project list (#3853)" [skip ci] |
#3796 | Add Rlike support |
#3856 | Fix not found: type PoissonDistribution in databricks build |
#3853 | Fix typo in nightly deploy project list |
#3831 | Support decimal type in ORC writer |
#3789 | GPU sample exec |
#3846 | Include pluginRepository for cdh build |
#3819 | Qualification tool: Detect RDD APIs in SQL plan |
#3835 | Minor cleanup: do not set cuda stream to null |
#3845 | Include 'DB_SHIM_NAME' from Databricks jar path to fix nightly deploy [skip ci] |
#3523 | Interpolate spark.version.classifier in build.dir |
#3813 | Change nullOnDivideByZero from runtime parameter to aggregate expression for stddev and variance aggregations |
#3791 | Add audit script to get list of commits from Apache Spark master branch |
#3744 | Add developer documentation for setting up Microk8s [skip ci] |
#3817 | Fix auto-merge conflict 3816 [skip ci] |
#3804 | Missing statistics in GpuBroadcastNestedLoopJoin |
#3799 | Optimize out bounds checking for joins when the gather map has only valid entries |
#3801 | Update premerge to use the combined snapshots jar |
#3696 | Support nested types in ORC writer |
#3790 | Fix overflow when casting integral to neg scale decimal |
#3779 | Enable some union of structs tests that were marked xfail |
#3787 | Fix auto-merge conflict 3786 from branch-21.10 [skip ci] |
#3782 | Fix auto-merge conflict 3781 [skip ci] |
#3778 | Remove extra ParquetMaterializer.scala file |
#3773 | Restore disabled ORC and Parquet tests |
#3714 | Qualification tool: Error handling while processing large event logs |
#3758 | Temporarily disable timestamp read tests for Parquet and ORC |
#3748 | Fix merge conflict with branch-21.10 |
#3700 | CollectSet supports structs |
#3740 | Throw Exception if failure to load ParquetCachedBatchSerializer class |
#3726 | Replace Class.forName with ShimLoader.loadClass |
#3690 | Added support for Array[Struct] to GpuCreateArray |
#3728 | Qualification tool: Fix bug to process correct listeners |
#3734 | Fix squashed merge from #3725 |
#3725 | Fix merge conflict with branch-21.10 |
#3680 | cudaMalloc UCX bounce buffers when async allocator is used |
#3681 | Clean up and document metrics |
#3674 | Move file TestingV2Source.Scala |
#3617 | Update Version to 21.12.0-SNAPSHOT |
#3612 | Add support for nested types as non-key columns on joins |
#3619 | Added support for Array of Structs |
#1601 | [FEA] Support AggregationFunction StddevSamp |
#3223 | [FEA] Rework the shim layer to robustly handle ABI and API incompatibilities across Spark releases |
#13 | [FEA] Percentile support |
#3606 | [FEA] Support approx_percentile on GPU with decimal type |
#3552 | [FEA] extend allowed datatypes for add and multiply in ANSI mode |
#3450 | [FEA] test the UCX shuffle with the new build changes |
#3043 | [FEA] Qualification tool: Add support to filter specific configuration values |
#3413 | [FEA] Add in support for transform_keys |
#3297 | [FEA] ORC reader supports reading Map columns. |
#3367 | [FEA] Support GpuRowToColumnConverter on BinaryType |
#3380 | [FEA] Support CollectList/CollectSet on nested input types in GroupBy aggregation |
#1923 | [FEA] Fall back to the CPU when LEAD/LAG wants to IGNORE NULLS |
#3044 | [FEA] Qualification tool: Report the nested data types |
#3045 | [FEA] Qualification tool: Report the write data formats. |
#3224 | [FEA] Add maven compile/package plugin executions, one for each supported Spark dependency version |
#3047 | [FEA] Profiling tool: Structured output format |
#2877 | [FEA] Support HashAggregate on struct and nested struct |
#2916 | [FEA] Support GpuCollectList and GpuCollectSet as TypedImperativeAggregate |
#463 | [FEA] Support NESTED_SCHEMA_PRUNING_ENABLED for ORC |
#1481 | [FEA] ORC Predicate pushdown for Nested fields |
#2879 | [FEA] ORC reader supports reading Struct columns. |
#27 | [FEA] test current_date and current_timestamp |
#3229 | [FEA] Improve CreateMap to support multiple key and value expressions |
#3111 | [FEA] Support conditional nested loop joins |
#3177 | [FEA] Support decimal type in ORC reader |
#3014 | [FEA] Add initial support for CreateMap |
#3110 | [FEA] Support Map as input to explode and pos_explode |
#3046 | [FEA] Profiling tool: Scale to run large number of event logs. |
#3156 | [FEA] Support casting struct to struct |
#2876 | [FEA] Support joins(SHJ and BHJ) on struct as join key with nested struct in the selected column list |
#68 | [FEA] support StringRepeat |
#3042 | [FEA] Qualification tool: Add conjunction and disjunction filters. |
#2615 | [FEA] support collect_list and collect_set as groupby aggregation |
#2943 | [FEA] Support PreciseTimestampConversion when using windowing function |
#2878 | [FEA] Support Sort on nested struct |
#2133 | [FEA] Join support for passing MapType columns along when not join keys |
#3041 | [FEA] Qualification tool: Add filters based on Regex and user name. |
#576 | [FEA] Spark 3.1 orc nested predicate pushdown support |
#3651 | [DOC] Point users to UCX 1.11.2 |
#2370 | [FEA] RAPIDS Shuffle Manager enable/disable config |
#2923 | [FEA] Move to dispatched binops instead of JIT binops |
#3929 | [BUG] published rapids-4-spark dist artifact references aggregator |
#3837 | [BUG] Spark-rapids v21.10.0 release candidate jars failed on the OSS validation check. |
#3769 | [BUG] dedupe fails with find: './parallel-world/spark301/ ...' No such file or directory |
#3783 | [BUG] spark-rapids v21.10.0 release build failed on script "dist/scripts/binary-dedupe.sh" |
#3775 | [BUG] Hash aggregate with structs crashes with IllegalArgumentException |
#3704 | [BUG] Executor-side ClassCastException when testing with Spark 3.2.1-SNAPSHOT in k8s environment |
#3760 | [BUG] Databricks class cast exception failure |
#3736 | [BUG] Crossjoin performance degraded a lot on RAPIDS 21.10 snapshot |
#3369 | [BUG] UDF compiler can cause crashes with unexpected class input |
#3713 | [BUG] AQE shuffle coalesce optimization is broken with Spark 3.2 |
#3720 | [BUG] Qualification tool warnings |
#3718 | [BUG] plugin failing to build for CDH due to missing dependency |
#3653 | [BUG] Issue seen with AQE on in Q5 (possibly others) using Spark 3.2 rc3 |
#3686 | [BUG] binary-dedupe doesn't fail the build on errors |
#3520 | [BUG] Scaladoc warnings emitted during build |
#3516 | [BUG] MultiFileParquetPartitionReader can fail while trying to write the footer |
#3648 | [BUG] test_cast_decimal_to failing in databricks 7.3 |
#3670 | [BUG] mvn test failed compiling rapids-4-spark-tests-next-spark_2.12 |
#3640 | [BUG] q82 regression after #3288 |
#3642 | [BUG] Shims improperly overridden |
#3611 | [BUG] test_no_fallback_when_ansi_enabled failed in databricks |
#3601 | [BUG] Latest 21.10 snapshot jars failing with java.lang.ClassNotFoundException: com.nvidia.spark.rapids.ColumnarRdd with XGBoost |
#3589 | [BUG] Latest 21.10 snapshot jars failing with java.lang.ClassNotFoundException: com.nvidia.spark.ExclusiveModeGpuDiscoveryPlugin |
#3424 | [BUG] Aggregations in ANSI mode do not detect overflows |
#3592 | [BUG] Failed to find data source: com.nvidia.spark.rapids.tests.datasourcev2.parquet.ArrowColumnarDataSourceV2 |
#3580 | [BUG] Class deduplication pulls wrong class for ProxyRapidsShuffleInternalManagerBase |
#3331 | [BUG] Failed to read file into buffer in CuFile.readFromFile in gds standalone test |
#3376 | [BUG] Unit test failures in Spark 3.2 shim build |
#3382 | [BUG] Support years with up to 7 digits when casting from String to Date in Spark 3.2 |
#3266 | CDP - Flakiness in JoinSuite in Integration tests |
#3415 | [BUG] Fix regressions in WindowFunctionSuite with Spark 3.2.0 |
#3548 | [BUG] GpuSum overflow on 3.2.0+ |
#3472 | [BUG] GpuAdd and GpuMultiply do not include failOnError |
#3502 | [BUG] Spark 3.2.0 TimeAdd/TimeSub fail due to new DayTimeIntervalType |
#3511 | [BUG] "Sequence" function fails with "java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData" |
#3518 | [BUG] Nightly tests failed with RMM outstanding allocations on shutdown |
#3383 | [BUG] ParseDateTime should not support special dates with Spark 3.2 |
#3384 | [BUG] AQE does not work with Spark 3.2 due to unrecognized GPU partitioning |
#3478 | [BUG] CastOpSuite and ParseDateTimeSuite failures spark 302 and others |
#3495 | Fix shim override config |
#3482 | [BUG] ClassNotFound error when running a job |
#1867 | [BUG] In Spark 3.2.0 and above dynamic partition pruning and AQE are not mutually exclusive |
#3468 | [BUG] GpuKryoRegistrator ClassNotFoundException |
#3488 | [BUG] databricks 8.2 runtime build failed |
#3429 | [BUG] test_sortmerge_join_struct_mixed_key_with_null_filter LeftSemi/LeftAnti fails |
#3400 | [BUG] Canonicalized GPU plans sometimes not consistent when using Spark 3.2 |
#3440 | [BUG] Followup comments from PR3411 |
#3372 | [BUG] 3.2.0 shim: ShuffledBatchRDD.scala:141: match may not be exhaustive. |
#3434 | [BUG] Fix the unit test failure of KnownNotNull in Scala UDF for Spark 3.2 |
#3084 | [AUDIT] [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala |
#3463 | [BUG] 301+-nondb is named incorrectly |
#3435 | [BUG] tools - test dsv1 complex and decimal test fails |
#3388 | [BUG] maven scalastyle checks don't appear to work for alternate source directories |
#3416 | [BUG] Resource cleanup issues with Spark 3.2 |
#3339 | [BUG] Databricks test fails test_hash_groupby_collect_partial_replace_fallback |
#3375 | [BUG] SPARK-35742 Replace semanticEquals with canonicalize |
#3334 | [BUG] UCX join_test FAILED on spark standalone |
#3058 | [BUG] GPU ORC reader reports errors when specifying columns that do not exist in file schema. |
#3385 | [BUG] misc_expr_test FAILED on Dataproc |
#2052 | [BUG] Spark 3.2.0 test fails due to SPARK-34906 Refactor TreeNode's children handling methods into specialized traits |
#3401 | [BUG] Qualification tool failed with java.lang.ArrayIndexOutOfBoundsException |
#3333 | [BUG] Mortgage ETL input_file_name is not correct when using CPU's CsvScan |
#3391 | [BUG] UDF example build fail |
#3379 | [BUG] q93 failed w/ UCX |
#3364 | [BUG] analysis tool cannot handle a job with no tasks. |
#3235 | Classes directly in Apache Spark packages |
#3237 | BasicColumnWriteJobStatsTracker might be affected by spark change SPARK-34399 |
#3134 | [BUG] Add more checks before coalescing ORC files |
#3324 | [BUG] Databricks builds failing with missing dependency issue |
#3244 | [BUG] join_test LeftAnti failing on Databricks |
#3268 | [BUG] CDH ParquetCachedBatchSerializer fails to build due to api change in VectorizedColumnReader |
#3305 | [BUG] test_case_when failed on Databricks 7.3 nightly build |
#3139 | [BUG] case when on some nested types can produce a crash |
#3253 | [BUG] ClassCastException for unsupported TypedImperativeAggregate functions |
#3256 | [BUG] udf-examples native build broken |
#3271 | [BUG] Databricks 301 shim compilation error |
#3255 | [BUG] GpuRunningWindowExecMeta is missing ExecChecks for partitionSpec in databricks runtime |
#3222 | [BUG] test_running_window_function_exec_for_all_aggs failed in the UCX EGX run |
#3195 | [BUG] failures parquet_test test:read_round_trip |
#3176 | [BUG] test_window_aggs_for_rows_collect_list[IGNORE_ORDER({'local': True})] FAILED on EGX Yarn cluster |
#3187 | [BUG] NullPointerException in SLF4J on startup |
#3166 | [BUG] Unable to build rapids-4-spark jar from source due to missing 3.0.3-SNAPSHOT for spark-sql |
#3131 | [BUG] hash_aggregate_test TypedImperativeAggregate tests failed |
#3147 | [BUG] window_function_test.py::test_window_ride_along failed in databricks runtime |
#3094 | [BUG] join_test.py::test_sortmerge_join_with_conditionals failed in databricks 8.2 runtime |
#3078 | [BUG] test_hash_join_map, test_sortmerge_join_map failed in databricks runtime |
#3059 | [BUG] orc_test:test_pred_push_round_trip failed |
#3940 | Update changelog [skip ci] |
#3930 | Dist artifact with provided aggregator dependency |
#3918 | Update changelog [skip ci] |
#3906 | Doc updated for v2110 [skip ci] |
#3840 | Update changelog [skip ci] |
#3838 | Update deploy script [skip ci] |
#3827 | Update changelog 21.10 to latest [skip ci] |
#3808 | Rewording qualification and profiling tools doc files [skip ci] |
#3815 | Correct 21.10 docs such as PCBS related FAQ [skip ci] |
#3807 | Update 21.10.0 release doc [skip ci] |
#3800 | Update approximate percentile documentation |
#3810 | Update to include Spark 3.2.0 in nosnapshots target so it gets released officially. |
#3806 | Update spark320.version to 3.2.0 |
#3795 | Reduce usage of escaping in xargs |
#3785 | [BUG] Update cudf version in version-dev script [skip ci] |
#3771 | Update cudfjni version to 21.10.0 |
#3777 | Ignore nullability when checking for need to cast aggregation input |
#3763 | Force parallel world in Shim caller's classloader |
#3756 | Simplify shim classloader logic |
#3746 | Avoid using AST on inner joins and avoid coalesce after nested loop join filter |
#3719 | Advertise CPU sort order and partitioning expressions to Catalyst |
#3737 | Add note referencing known issues in approx_percentile implementation |
#3729 | Update to ucx 1.11.2 for 21.10 |
#3711 | Surface problems with overrides and fallback |
#3722 | CDH build stopped working due to missing jars in maven repo |
#3691 | Fix issues with AQE and DPP enabled on Spark 3.2 |
#3373 | Support stddev and variance aggregations families |
#3708 | disable percentile approx tests |
#3695 | Remove duplicated data types for collect_list tests |
#3687 | Improve dedupe script |
#3646 | Debug utility method to dump a table or columnar batch to Parquet |
#3683 | Change deploy scripts for new build system |
#3301 | Approx Percentile |
#3673 | Add the Scala jar as an external lib for a linkage warning |
#3668 | Improve the diagnostics in udf compiler for try-and-catch. |
#3666 | Recompute Parquet block metadata when estimating footer from multiple file input |
#3671 | Fix tests-spark310+ dependency |
#3663 | Add back the tests-spark310+ |
#3657 | Revert "Use cudf to compute exact hash join output row sizes (#3288)" |
#3643 | Properly override Shims for int96Rebase |
#3645 | Verify unshimmed classes are bitwise-identical |
#3650 | Fix dist copy dependencies |
#3649 | Add ignore_order to other fallback tests for the aggregate |
#3631 | Change premerge to build all Spark versions |
#3630 | Fix CDH Build |
#3636 | Change nightly build to not deploy dist for each classifier version [skip ci] |
#3632 | Revert disabling of ctas test |
#3628 | Fix 313 ShuffleManager build |
#3618 | Update changelog script to strip ambiguous annotation [skip ci] |
#3626 | Add in support for casting decimal to other number types |
#3615 | Ignore order for the test_no_fallback_when_ansi_enabled |
#3602 | Dedupe proxy rapids shuffle manager byte code |
#3330 | Support int96RebaseModeInWrite and int96RebaseModeInRead |
#3438 | Parquet read unsigned int: uint8, uint16, uint32 |
#3607 | com.nvidia.spark.rapids.ColumnarRdd not exposed to user for XGBoost |
#3566 | Enable String Array Max and Min |
#3590 | Unshim ExclusiveModeGpuDiscoveryPlugin |
#3597 | ANSI check for aggregates |
#3595 | Update the overflow check algorithm for Subtract |
#3588 | Disable test_non_empty_ctas test |
#3577 | Commonize more shim module files |
#3594 | Fix nightly integration test script for specific artifacts |
#3544 | Add test for nested grouping sets, rollup, cube |
#3587 | Revert shared class list modifications in PR#3545 |
#3570 | ANSI Support for Abs, UnaryMinus, and Subtract |
#3574 | Add in ANSI date time fallback |
#3578 | Deploy all of the classifier versions of the jars [skip ci] |
#3569 | Add commons-lang3 dependency to tests |
#3568 | Enable 3.2.0 unit test in premerge and nightly |
#3559 | Commonize shim module join and shuffle files |
#3565 | Auto-dedupe ASM-relocated shim dependencies |
#3531 | Fall back to the CPU for date/time parsing we cannot support yet |
#3561 | Follow on to ANSI Add |
#3557 | Add IDEA profile switch workarounds |
#3504 | Fix reserialization of broadcasted tables |
#3556 | Fix databricks test.sh script for passing spark shim version |
#3545 | Dynamic class file deduplication across shims in dist jar build |
#3551 | Fix window sum overflow for 3.2.0+ |
#3537 | GpuAdd supports ANSI mode. |
#3533 | Define a SPARK_SHIM_VER to pick up specific rapids-4-spark-integration-tests jars |
#3547 | Range window supports DayTime on 3.2+ |
#3534 | Fix package name and sql string issue for GpuTimeAdd |
#3536 | Enable auto-merge from branch 21.10 to 21.12 [skip ci] |
#3521 | Qualification tool: Report nested complex types in Potential Problems and improve write csv identification. |
#3507 | TimeAdd supports DayTimeIntervalType |
#3529 | Support UnsafeArrayData in scalars |
#3528 | Update NOTICE copyrights to 2021 |
#3527 | Ignore CBO tests that fail against Spark 3.2.0 |
#3439 | Stop parsing special dates for Spark 3.2+ |
#3524 | Update hashing to normalize -0.0 on 3.2+ |
#3508 | Auto abort dup pre-merge builds [skip ci] |
#3501 | Add limitations for Databricks doc |
#3517 | Update empty CTAS testing to avoid Hive if possible |
#3513 | Allow spark320 tests to run with 320 or 321 |
#3493 | Initialize RAPIDS Shuffle Manager at driver/executor startup |
#3496 | Update parse date to leverage cuDF support for single digit components |
#3454 | Catch UDF compiler exceptions and fallback to CPU |
#3505 | Remove doc references to cudf JIT |
#3503 | Have average support nulls for 3.2.0 |
#3500 | Fix GpuSum type to match resultType |
#3485 | Fix regressions in cast from string to date and timestamp |
#3487 | Add databricks build tests to pre-merge CI [skip ci] |
#3497 | Re-enable spark.rapids.shims-provider-override |
#3499 | Fix Spark 3.2.0 test_div_by_zero_ansi failures |
#3418 | Qualification tool: Add filtering based on configuration parameters |
#3498 | Update the scala repl loader to avoid issues with broadcast. |
#3479 | Test with Spark 3.2.1-SNAPSHOT |
#3474 | Build fixes and IDE instructions |
#3460 | Add DayTimeIntervalType/YearMonthIntervalType support |
#3491 | Shim GpuKryoRegistrator |
#3489 | Fix 311 databricks shim for AnsiCastOpSuite failures |
#3456 | Fallback to CPU when datasource v2 enables RuntimeFiltering |
#3417 | Adds pre/post steps for merge and update aggregate |
#3431 | Reinstate test_sortmerge_join_struct_mixed_key_with_null_filter |
#3477 | Update supported docs to clarify casting floating point to string |
#3447 | Add CUDA async memory resource as an option |
#3473 | Create non-shim specific version of ParquetCachedBatchSerializer |
#3471 | Fix canonicalization of GpuScalarSubquery |
#3480 | Temporarily disable failing cast string to date tests |
#3377 | Fix AnsiCastOpSuite failures with Spark 3.2 |
#3467 | Update docs to better describe support for floating point aggregation and NaNs |
#3459 | Use Shims v2 for ShuffledBatchRDD |
#3457 | Update the children unpacking pattern for GpuIf. |
#3464 | Add test for empty relation propagation |
#3458 | Fix log info GPU BroadcastExchangeExec |
#3466 | Databricks build fixes for missing shouldFailDivOverflow and removal of needed imports |
#3465 | Fix name of 301+-nondb directory to stop at Spark 3.2.0 |
#3452 | Enable AQE/DPP test for Spark 3.2 |
#3436 | Qualification tool: Update expected result for test |
#3455 | Decrease pre_merge_ci parallelism to 4 and reordering time-consuming tests |
#3420 | IntegralDivide throws an exception on overflow in ANSI mode |
#3433 | Batch scalastyle checks across all modules upfront |
#3453 | Fix spark-tests script for classifier |
#3445 | Update nightly build to pull Databricks jars |
#3446 | Format aggregator pom and commonize some configuration |
#3444 | Add in tests for unaligned parquet pages |
#3451 | Fix typo in spark-tests.sh |
#3443 | Remove 301emr shim |
#3441 | update deploy script for Databricks |
#3414 | Add in support for transform_keys |
#3320 | Add AST support for logical AND and logical OR |
#3425 | Throw an error by default if CREATE TABLE AS SELECT overwrites data |
#3422 | Stop double closing SerializeBatchDeserializeHostBuffer host buffers when running with Spark 3.2 |
#3411 | Make new build default and combine into dist package |
#3368 | Extend TagForReplaceMode to adapt Databricks runtime |
#3428 | Remove commented-out semanticEquals overrides |
#3421 | Revert to CUDA runtime image for build |
#3381 | Implement per-shim parallel world jar classloader |
#3303 | Update to cudf conditional join change that removes null equality argument |
#3408 | Add leafNodeDefaultParallelism support |
#3426 | Correct grammar in qualification tool doc |
#3423 | Fix hash_aggregate tests that leaked configs |
#3412 | Restore AST conditional join tests |
#3403 | Fix canonicalization regression with Spark 3.2 |
#3394 | Orc read map |
#3392 | Support transforming BinaryType between Row and Columnar |
#3393 | Fill with null columns for the names exist only in read schema in ORC reader |
#3399 | Fix collect_list test so it covers nested types properly |
#3410 | Specify number of RDD slices for ID tests |
#3363 | Add AST support for null literals |
#3396 | Throw exception on parse error in ANSI mode when casting String to Date |
#3315 | Add in reporting of time taken to transition plan to GPU |
#3409 | Use devel cuda image for premerge CI |
#3405 | Qualification tool: Filter empty strings from Read Schema |
#3387 | Fallback to the CPU for IGNORE NULLS on lead and lag |
#3398 | Fix NPE on string repeat when there is no data buffer |
#3366 | Fix input_file_xxx issue when FileScan is running on CPU |
#3397 | Add tests for GpuInSet |
#3395 | Fix UDF native example build |
#3389 | Bring back setRapidsShuffleManager in the driver side |
#3263 | Qualification tool: Report write data format and nested types |
#3378 | Make Dockerfile.cuda consistent with getting-started-kubernetes.md |
#3359 | UnionExec array and nested array support |
#3342 | Profiling tool add CSV output option and add new combined mode |
#3365 | fix databricks builds |
#3323 | Enable optional Spark 3.2.0 shim build |
#3361 | Fix databricks 3.1.1 arrow dependency version |
#3354 | Support HashAggregate on struct and nested struct |
#3341 | ArrayMax and ArrayMin support plus map_entries, map_keys, map_values |
#3356 | Support Databricks 3.0.1 with new build profiles |
#3344 | Move classes out of Apache Spark packages |
#3345 | Add job commit time to task tracker stats |
#3357 | Avoid RAT checks on any CSV file |
#3355 | Add new authorized user to blossom-ci whitelist [skip ci] |
#3340 | xfail AST nested loop join tests until cudf empty left table bug is fixed |
#3276 | Use child type in some places to make it more clear |
#3346 | Mark more tests as premerge_ci_1 |
#3353 | Fix automerge conflict 3349 [skip ci] |
#3335 | Support Databricks 3.1.1 in new build profiles |
#3317 | Adds in support for the transform_values SQL function |
#3299 | Insert buffer converters for TypedImperativeAggregate |
#3325 | Fix spark version classifier being applied properly |
#3288 | Use cudf to compute exact hash join output row sizes |
#3318 | Fix LeftAnti nested loop join missing condition case |
#3316 | Fix GpuProjectAstExec when projecting only literals |
#3262 | Re-enable the struct support for the ORC reader. |
#3312 | Fix inconsistent function name and add backward compatibility support for premerge job [skip ci] |
#3319 | Temporarily disable cache test except for spark 3.1.1 |
#3308 | Branch 21.10 FAQ update forward compatibility, update Spark and CUDA versions |
#3309 | Prepare Spark 3.2.0 related changes |
#3289 | Support for ArrayTransform |
#3307 | Fix generation of null scalars in tests |
#3306 | Update guava to be 30.0-jre |
#3304 | Fix nested cast type checks |
#3302 | Fix shim aggregator dependencies when snapshot-shims profile provided |
#3291 | Bump guava from 28.0-jre to 29.0-jre in /tests |
#3292 | Bump guava from 28.0-jre to 29.0-jre in /integration_tests |
#3293 | Bump guava from 28.0-jre to 29.0-jre in /udf-compiler |
#3294 | Update Qualification and Profiling tool documentation for gh-pages |
#3282 | Test for current_date, current_timestamp and now |
#3298 | Minor parent pom fixes |
#3296 | Support map type in case when expression |
#3295 | Rename pytest 'slow_test' tag as 'premerge_ci_1' to avoid confusion |
#3274 | Add m2 cache to fast premerge build |
#3283 | Fix ClassCastException for unsupported TypedImperativeAggregate functions |
#3251 | CreateMap support for multiple key-value pairs |
#3234 | Parquet support for MapType |
#3277 | Build changes for Spark 3.0.3, 3.0.4, 3.1.1, 3.1.2, 3.1.3, 3.1.1cdh and 3.0.1emr |
#3275 | Improve over-estimating for ORC coalescing reading |
#3280 | Update project URL to the public doc website |
#3285 | Qualification tool: Check for metadata being null |
#3281 | Decrease parallelism for pre-merge pod to avoid potential OOM kill |
#3264 | Add parallel support to nightly spark standalone tests |
#3257 | Add maven compile/package plugin executions for Spark302 and Spark301 |
#3272 | Fix Databricks shim build |
#3270 | Remove reference to old maven-scala-plugin |
#3259 | Generate docs for AST from checks |
#3164 | Support Union on Map types |
#3261 | Fix some typos [skip ci] |
#3242 | Support for LeftOuter/BuildRight and RightOuter/BuildLeft nested loop joins |
#3239 | Support decimal type in orc reader |
#3258 | Add ExecChecks to Databricks shims for RunningWindowFunctionExec |
#3230 | Initial support for CreateMap on GPU |
#3252 | Update to new cudf AST API |
#3249 | Fix typo in Spark311dbShims |
#3183 | Add TypeSig checks for join keys and other special cases |
#3246 | Disable test_broadcast_nested_loop_join_condition_missing_count on Databricks |
#3241 | Split pytest by 'slow_test' tag and run from different k8s pods to reduce premerge job duration |
#3184 | Support broadcast nested loop join for LeftSemi and LeftAnti |
#3236 | Fix Scaladoc warnings in GpuScalaUDF and BufferSendState |
#2846 | default rmm alloc fraction to the max to avoid unnecessary fragmentation |
#3231 | Fix some resource leaks in GpuCast and RapidsShuffleServerSuite |
#3179 | Support GpuFirst/GpuLast on more data types |
#3228 | Fix unreachable code warnings in GpuCast |
#3200 | Enable a smoke test for UCX in pre-merge |
#3203 | Fix Parquet test_round_trip to avoid CPU write exception |
#3220 | Use LongRangeGen instead of IntegerGen |
#3218 | Add UCX 1.11.0 to the pre-merge Docker image |
#3204 | Decrease parallelism for pre-merge integration tests |
#3212 | Fix merge conflict 3211 [skip ci] |
#3188 | Exclude slf4j classes from the spark-rapids jar |
#3189 | Disable snapshot shims by default |
#3178 | Fix hash_aggregate test failures due to TypedImperativeAggregate |
#3190 | Update GpuInSet for SPARK-35422 changes |
#3193 | Append res-life to blossom-ci whitelist [skip ci] |
#3175 | Add in support for explode on maps |
#3171 | Refine upload log stage naming in workflow file [skip ci] |
#3173 | Profile tool: Fix reporting app contains Dataset |
#3165 | Add optional projection via AST expression evaluation |
#3113 | Fix order of operations when using mkString in typeConversionInfo |
#3161 | Rework Profile tool to not require Spark to run and process files faster |
#3169 | Fix auto-merge conflict 3167 [skip ci] |
#3162 | Add in more generalized support for casting nested types |
#3158 | Enable joins on nested structs |
#3099 | Decimal_128 type checks |
#3155 | Simple nested additions v2 |
#2728 | Support string repeat SQL |
#3148 | Updated RunningWindow to support extended types too |
#3112 | Qualification tool: Add conjunction and disjunction filters |
#3117 | First pass at enabling structs, arrays, and maps for more parts of the plan |
#3109 | Cudf agg type changes |
#2971 | Support GpuCollectList and GpuCollectSet as TypedImperativeAggregate |
#3107 | Add setting to enable/disable RAPIDS Shuffle Manager dynamically |
#3105 | Add filter in query plan for conditional nested loop and cartesian joins |
#3096 | add spark311db GpuSortMergeJoinExec conditional joins filter |
#3086 | Fix Support of MapType in joins on Databricks |
#3089 | Add filter node in the query plan for conditional joins |
#3074 | Partial support for time windows |
#3061 | Support Union on Struct of Map |
#3034 | Support Sort on nested struct |
#3011 | Support MapType in joins |
#3031 | add doc for PR status checks [skip ci] |
#3028 | Enable parallel build for pre-merge job to reduce overall duration [skip ci] |
#3025 | Qualification tool: Add regex and username filters. |
#2980 | Init version 21.10.0 |
#3000 | Merge branch-21.08 to branch-21.10 |
#3350 | [BUG] Qualification tool: check for metadata being null |
#3351 | Update changelog for tools v21.08.1 release [skip CI] |
#3348 | Change tool version to 21.08.1 [skip ci] |
#3343 | Qualification tool backport: Check for metadata being null (#3285) |
#1584 | [FEA] Support rank as window function |
#1859 | [FEA] Optimize row_number/rank for memory usage |
#2976 | [FEA] support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec |
#2398 | [FEA] GpuIf and GpuCoalesce supports ArrayType |
#2445 | [FEA] Support literal arrays in case/when statements |
#2757 | [FEA] Profiling tool display input data types |
#2860 | [FEA] Minimal support for LEGACY timeParserPolicy |
#2693 | [FEA] Profiling Tool: Print GDS + UCX related parameters |
#2334 | [FEA] Record GPU time and Fetch time separately, instead of recording Total Time |
#2685 | [FEA] Profiling compare mode for table SQL Duration and Executor CPU Time Percent |
#2742 | [FEA] include App Name from profiling tool output |
#2712 | [FEA] Display job and stage info in the dot graph for profiling tool |
#2562 | [FEA] Implement KnownNotNull on the GPU |
#2557 | [FEA] support sort_array on GPU |
#2307 | [FEA] Enable Parquet writing for arrays |
#1856 | [FEA] Create a batch chunking iterator and integrate it with GpuWindowExec |
#866 | [FEA] combine window operations into single call |
#2800 | [FEA] Support ORC small files coalescing reading |
#737 | [FEA] handle peer timeouts in shuffle |
#1590 | Rapids Shuffle - UcpListener |
#2275 | [FEA] UCP error callback deal with cleanup |
#2799 | [FEA] Support ORC multi-file cloud reading |
#3135 | [BUG] Regression seen in concatenate in NDS with RAPIDS Shuffle Manager enabled |
#3017 | [BUG] orc_write_test failed in databricks runtime |
#3060 | [BUG] ORC read can corrupt data when specified schema does not match file schema ordering |
#3065 | [BUG] window exec tries to do too much on the GPU |
#3066 | [BUG] Profiling tool generate dot file fails to convert |
#3038 | [BUG] leak in getDeviceMemoryBuffer for the unspill case |
#3007 | [BUG] data mess up reading from ORC |
#3029 | [BUG] udf_test failed in ucx standalone env |
#2723 | [BUG] test failures in CI build (observed in UCX job) after starting to use 21.08 |
#3016 | [BUG] databricks script failed to return correct exit code |
#3002 | [BUG] writing parquet with partitionBy() loses sort order |
#2959 | [BUG] Resolve common code source incompatibility with supported Spark versions |
#2589 | [BUG] RapidsShuffleHeartbeatManager needs to remove executors that are stale |
#2964 | [BUG] IGNORE ORDER, WITH DECIMALS: [Window] [MIXED WINDOW SPECS] FAILED in spark 3.0.3+ |
#2942 | [BUG] Cache of Array using ParquetCachedBatchSerializer failed with "DATA ACCESS MUST BE ON A HOST VECTOR" |
#2965 | [BUG] test_round_robin_sort_fallback failed with ValueError: 'a_1' is not in list |
#2891 | [BUG] Discrepancy in getting count before and after caching |
#2972 | [BUG] When using timeout option(-t) of qualification tool, it does not print anything in output after timeout. |
#2958 | [BUG] When AQE=on, SMJ with a Map in SELECTed list fails with "key not found: numPartitions" |
#2929 | [BUG] No validation of format strings when formatting dates in legacy timeParserPolicy mode |
#2900 | [BUG] CAST string to float/double produces incorrect results in some cases |
#2957 | [BUG] Builds failing due to breaking changes in SPARK-36034 |
#2901 | [BUG] GpuCompressedColumnVector cannot be cast to GpuColumnVector with AQE |
#2899 | [BUG] CAST string to integer produces incorrect results in some cases |
#2937 | [BUG] Fix more edge cases when parsing dates in legacy timeParserPolicy |
#2939 | [BUG] Window integration tests failing with Lead expected at least 3 but found 0 |
#2912 | [BUG] Profiling compare mode fails when comparing spark 2 eventlog to spark 3 event log |
#2892 | [BUG] UCX error Message truncated observed with UCX 1.11 RC in Q77 NDS |
#2807 | [BUG] Use UCP_AM_FLAG_WHOLE_MSG and UCP_AM_FLAG_PERSISTENT_DATA for receive handlers |
#2930 | [BUG] Profiling tool does not show "Potential Problems" for dataset API in section "SQL Duration and Executor CPU Time Percent" |
#2902 | [BUG] CAST string to bool produces incorrect results in some cases |
#2850 | [BUG] "java.io.InterruptedIOException: getFileStatus on s3a://xxx" for ORC reading in Databricks 8.2 runtime |
#2856 | [BUG] cache of struct does not work on databricks 8.2ML |
#2790 | [BUG] In Comparison mode health check does not show the application id |
#2713 | [BUG] profiling tool does not error or warn if incompatible options are given |
#2477 | [BUG] test_single_sort_in_part is failed in nightly UCX and AQE (no UCX) integration |
#2868 | [BUG] to_date produces wrong value on GPU for some corner cases |
#2907 | [BUG] incorrect expression to detect previously set --master |
#2893 | [BUG] TransferRequest request transactions are getting leaked |
#120 | [BUG] GPU InitCap supports too much white space. |
#2786 | [BUG] [initCap function] There is an issue converting the uppercase character to lowercase on GPU. |
#2754 | [BUG] cudf_udf tests failed w/ 21.08 |
#2820 | [BUG] Metrics are inconsistent for GpuRowToColumnarToExec |
#2710 | [BUG] dot file generation can go over the limits of dot |
#2772 | [BUG] new integration test failures w/ maxFailures=1 |
#2739 | [BUG] CBO causes less efficient plan for NDS q84 |
#2717 | [BUG] CBO forces joins back onto CPU in some cases |
#2718 | [BUG] CBO falls back to CPU to write to Parquet in some cases |
#2692 | [BUG] Profiling tool: Add error handling for comparison functions |
#2711 | [BUG] reused stages should not appear multiple times in dot |
#2746 | [BUG] test_single_nested_sort_in_part integration test failure 21.08 |
#2690 | [BUG] Profiling tool doesn't properly read rolled log files |
#2546 | [BUG] Build Failure when building from source |
#2750 | [BUG] nightly test failed with lists: testStringReplaceWithBackrefs |
#2644 | [BUG] test event logs should be compressed |
#2725 | [BUG] Heartbeat from unknown executor when running with UCX shuffle in local mode |
#2715 | [BUG] Part of the plan is not columnar class com.databricks.sql.execution.window.RunningWindowFunc |
#2521 | [BUG] cudf_udf failed in all spark release intermittently |
#1712 | [BUG] Scala UDF compiler can decompile UDFs with RAPIDS implementation |
#3216 | Update changelog to include download doc update [skip ci] |
#3214 | Update download and databricks doc for 21.06.2 [skip ci] |
#3210 | Update 21.08.0 changelog to latest [skip ci] |
#3197 | Databricks parquetFilters api change in db 8.2 runtime |
#3168 | Update 21.08 changelog to latest [skip ci] |
#3146 | update cudf Java binding version to 21.08.2 |
#3080 | Update docs for 21.08 release |
#3136 | Update tool docs to explain default filesystem [skip ci] |
#3128 | Fix merge conflict 3126 from branch-21.06 [skip ci] |
#3124 | Fix merge conflict 3122 from branch-21.06 [skip ci] |
#3100 | Update databricks 3.0.1 shim to new ParquetFilter api |
#3083 | Initial CHANGELOG.md update for 21.08 |
#3079 | Remove the struct support in ORC reader |
#3062 | Fix ORC read corruption when specified schema does not match file order |
#3064 | Tweak scaladoc to callout the GDS+unspill case in copyBuffer |
#3049 | Handle mmap exception more gracefully in RapidsShuffleServer |
#3067 | Update to UCX 1.11.0 |
#3024 | Check validity of any() or all() results that could be null |
#3069 | Fall back to the CPU on window partition by struct or array |
#3068 | Profiling tool generate dot file fails on unescaped html characters |
#3048 | Apply unique committer job ID fix from SPARK-33230 |
#3050 | Updates for google analytics [skip ci] |
#3015 | Fix ORC read error when read schema reorders file schema columns |
#3053 | cherry-pick #3028 [skip ci] |
#2887 | ORC reader supports struct |
#3032 | Add disorder read schema test case for Parquet |
#3022 | Add in docs to describe window performance |
#3018 | [BUG] fix db script hides error issue |
#2953 | Add in support for rank and dense_rank |
#3009 | Propagate child output ordering in GpuCoalesceBatches |
#2989 | Re-enable Array support in Cartesian Joins, Broadcast Nested Loop Joins |
#2999 | Remove unused configuration setting spark.rapids.sql.castStringToInteger.enabled |
#2967 | Resolve hidden source incompatibility between Spark30x and Spark31x Shims |
#2982 | Add FAQ entry for timezone error |
#2839 | GpuIf and GpuCoalesce support array and struct types |
#2987 | Update documentation for unsupported edge cases when casting from string to timestamp |
#2977 | Expire executors from the RAPIDS shuffle heartbeat manager on timeout |
#2985 | Move tools README to docs/additional-functionality/qualification-profiling-tools.md with some modification |
#2992 | Remove commented/redundant window-function tests. |
#2994 | Tweak RAPIDS Shuffle Manager configs for 21.08 |
#2984 | Avoid comparing window range canonicalized plans on Spark 3.0.x |
#2970 | Put the GPU data back on host before processing cache on CPU |
#2986 | Avoid struct aliasing in test_round_robin_sort_fallback |
#2935 | Read the complete batch before returning when selectedAttributes is empty |
#2826 | CaseWhen supports scalar of list and struct |
#2978 | enable auto-merge from branch 21.08 to 21.10 [skip ci] |
#2946 | ORC reader supports list |
#2947 | Qualification tool: Filter based on timestamp in event logs |
#2973 | Assert that CPU and GPU row fields match when present |
#2974 | Qualification tool: fix performance regression |
#2948 | Remove unnecessary copies of ParquetCachedBatchSerializer |
#2968 | Fix AQE CustomShuffleReaderExec not seeing ShuffleQueryStageExec |
#2969 | Make the dir for spark301 shuffle shim match package name |
#2933 | Improve CAST string to float implementation to handle more edge cases |
#2963 | Add override getParquetFilters for shim 304 |
#2956 | Profile Tool: make order consistent between runs |
#2924 | Fix bug when collecting directly from a GPU shuffle query stage with AQE on |
#2950 | Fix shutdown bugs in the RAPIDS Shuffle Manager |
#2922 | Improve UCX assertion to show the failed assertion |
#2961 | Fix ParquetFilters issue |
#2951 | Qualification tool: Allow app start and app name filtering and test with filesystem filters |
#2941 | Make test event log compression codec configurable |
#2919 | Fix bugs in CAST string to integer |
#2944 | Fix childExprs list for GpuWindowExpression, for Spark 3.1.x. |
#2917 | Refine GpuHashAggregateExec.setupReference |
#2909 | Support orc coalescing reading |
#2938 | Qualification tool: Add negation filter |
#2940 | qualification tool: add filtering by app start time |
#2928 | Qualification tool support recognizing decimal operations |
#2934 | Qualification tool: Add filter based on appName |
#2904 | Qualification and Profiling tool handle Read formats and datatypes |
#2927 | Restore aggregation sorted data hint |
#2932 | Profiling tool: Fix comparing spark2 and spark3 event logs |
#2926 | GPU Active Messages for all buffer types |
#2888 | Type check with the information from RapidsMeta |
#2903 | Fix cast string to bool |
#2895 | Add in running window optimization using scan |
#2859 | Add spillable batch caching and sort fallback to hash aggregation |
#2898 | Add fuzz tests for cast from string to other types |
#2881 | fix orc readers leak issue for ORC PERFILE type |
#2842 | Support STRUCT/STRING for LEAD()/LAG() |
#2880 | Added ParquetCachedBatchSerializer support for Databricks |
#2911 | Add in ID as sort for Job + Stage level aggregated task metrics |
#2914 | Profiling tool: add app index to tables that don't have it |
#2906 | Fix compiler warning |
#2890 | Fix cast to date bug |
#2908 | Fixes bad string contains in run_pyspark_from_build |
#2886 | Use UCP Listener for UCX connections and enable peer error handling |
#2875 | Add support for timeParserPolicy=LEGACY |
#2894 | Fixes a JVM leak for UCX TransactionRequests |
#2854 | Qualification Tool to output only the 'k' highest-ranked or 'k' lowest-ranked applications |
#2873 | Fix infinite loop in MultiFileCloudPartitionReaderBase |
#2838 | Replace toTitle with capitalize for GpuInitCap |
#2870 | Avoid readers acquiring GPU on next batch query if not first batch |
#2882 | Refactor window operations to do them in the exec |
#2874 | Update audit script to clone branch-3.2 instead of master |
#2843 | Qualification/Profiling tool add tests for Spark2 event logs |
#2828 | add cloud reading for orc |
#2721 | Check-list for corner cases in testing. |
#2675 | Support for Decimals with negative scale for Parquet Cached Batch Serializer |
#2849 | Update release notes to include qualification and profiling tool |
#2852 | Fix hash aggregate tests leaking configs into other tests |
#2845 | Split window exec into multiple stages if needed |
#2853 | Tag last batch when coalescing |
#2851 | Fix build failure - update ucx profiling test to fix parameter type to getEventLogInfo |
#2785 | Profiling tool: Print UCX and GDS parameters |
#2840 | Fix Gpu -> GPU |
#2844 | Document Qualification tool Spark requirements |
#2787 | Add metrics definition link to tool README.md [skip ci] |
#2841 | Add a threadpool to Qualification tool to process logs in parallel |
#2833 | Stop running so many versions of Spark unit tests for premerge |
#2837 | Append new authorized user to blossom-ci whitelist [skip ci] |
#2822 | Rewrite Qualification tool for better performance |
#2823 | Add semaphoreWaitTime and gpuOpTime for GpuRowToColumnarExec |
#2829 | Fix filtering directories on compression extension match |
#2720 | Add metrics documentation to the tuning guide |
#2816 | Improve some existing collectTime handling |
#2821 | Truncate long plan labels and refer to "print-plans" |
#2827 | Update cmake to build udf native [skip ci] |
#2793 | Report equivalent stages/sql ids as a part of compare |
#2810 | Use SecureRandom for UCPListener TCP port choice |
#2798 | Mirror apache repos to urm |
#2788 | Update the type signatures for some expressions |
#2792 | Automatically set spark.task.maxFailures and local[*, maxFailures] |
#2805 | Revert "Use UCX Active Messages for all shuffle transfers (#2735)" |
#2796 | show disk bytes spilled when GDS spill is enabled |
#2801 | Update pre-merge to use reserved_pool [skip ci] |
#2795 | Improve CBO debug logging |
#2794 | Prevent integer overflow when estimating data sizes in cost-based optimizer |
#2784 | Make spark303 shim version w/o snapshot and add shim layer for spark304 |
#2744 | Cost-based optimizer: Implement simple cost model that demonstrates benefits with NDS queries |
#2762 | Profiling tool: Update comparison mode output format and add error handling |
#2761 | Update dot graph to include stages and remove some duplication |
#2760 | Add in application timeline to profiling tool |
#2735 | Use UCX Active Messages for all shuffle transfers |
#2732 | qualification and profiling tool support rolled and compressed event logs for CSPs and Apache Spark |
#2768 | Make window function test results deterministic. |
#2769 | Add developer documentation for Adaptive Query Execution |
#2532 | date_format should not suggest enabling incompatibleDateFormats for formats we cannot support |
#2743 | Disable dynamicAllocation and set maxFailures to 1 in integration tests |
#2749 | Revert "Add in support for lists in some joins (#2702)" |
#2181 | abstract the parquet coalescing reading |
#2753 | Merge branch-21.06 to branch-21.08 [skip ci] |
#2751 | remove invalid blossom-ci users [skip ci] |
#2707 | Support KnownNotNull running on GPU |
#2747 | Fix num_slices for test_single_nested_sort_in_part |
#2729 | fix 301db-shim typecheck typo |
#2726 | Fix local mode starting RAPIDS shuffle heartbeats |
#2722 | Support aggregation on NullType in RunningWindowExec |
#2719 | Avoid executing child plan twice in CoalesceExec |
#2586 | Update metrics use in GpuUnionExec and GpuCoalesceExec |
#2716 | Add file size check to pre-merge CI |
#2554 | Upload build failure log to Github for external contributors access |
#2596 | Initial running window memory optimization |
#2702 | Add in support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec |
#2699 | Add a pre-commit hook to reject large files |
#2700 | Set numSlices and use parallelize to build dataframe for partition-se… |
#2548 | support collect_set in rolling window |
#2661 | Make tools inherit common dependency versions from parent pom |
#2668 | Remove CUDA 10.x from getting started guide [skip ci] |
#2676 | Profiling tool: Print Job Information in compare mode |
#2679 | Merge branch-21.06 to branch-21.08 [skip ci] |
#2677 | Add pre-merge independent stage timeout [skip ci] |
#2616 | support GpuSortArray |
#2582 | support parquet write arrays |
#2609 | Fix automerge failure from branch-21.06 to branch-21.08 |
#2570 | Added nested structs to UnionExec |
#2581 | Fix merge conflict 2580 [skip ci] |
#2458 | Split batch by key for window operations |
#2565 | Merge branch-21.06 into branch-21.08 |
#2563 | Document: git commit twice when copyright year updated by hook |
#2561 | Fixing the merge of 21.06 to 21.08 for comment changes in Profiling tool |
#2558 | Fix cdh shim version in 21.08 [skip ci] |
#2543 | Init branch-21.08 |
#3191 | [BUG] Databricks parquetFilters build failure in db 8.2 runtime |
#3209 | Update 21.06.2 changelog [skip ci] |
#3208 | Update rapids plugin version to 21.06.2 [skip ci] |
#3207 | Disable auto-merge from 21.06 to 21.08 [skip ci] |
#3205 | Branch 21.06 databricks update [skip ci] |
#3198 | Databricks parquetFilters api change in db 8.2 runtime |
#3098 | [BUG] Databricks parquetFilters build failure |
#3127 | Update CHANGELOG for the release v21.06.1 [skip ci] |
#3123 | Update rapids plugin version to 21.06.1 [skip ci] |
#3118 | Fix databricks 3.0.1 for ParquetFilters api change |
#3119 | Branch 21.06 databricks update [skip ci] |
#2483 | [FEA] Profiling and qualification tool |
#951 | [FEA] Create Cloudera shim layer |
#2481 | [FEA] Support Spark 3.1.2 |
#2530 | [FEA] Add support for Struct columns in CoalesceExec |
#2512 | [FEA] Report gpuOpTime not totalTime for expand, generate, and range execs |
#63 | [FEA] support ConcatWs sql function |
#2501 | [FEA] Add support for scalar structs to named_struct |
#2286 | [FEA] update UCX documentation for branch 21.06 |
#2436 | [FEA] Support nested types in CreateNamedStruct |
#2461 | [FEA] Report gpuOpTime instead of totalTime for project, filter, window, limit |
#2465 | [FEA] GpuFilterExec should report gpuOpTime not totalTime |
#2013 | [FEA] Support concatenating ArrayType columns |
#2425 | [FEA] Support for casting array of floats to array of doubles |
#2012 | [FEA] Support Window functions(lead & lag) for ArrayType |
#2011 | [FEA] Support creation of 2D array type |
#1582 | [FEA] Allow StructType as input and output type to InMemoryTableScan and InMemoryRelation |
#216 | [FEA] Range window-functions must support non-timestamp order-by expressions |
#2390 | [FEA] CI/CD for databricks 8.2 runtime |
#2273 | [FEA] Enable struct type columns for GpuHashAggregateExec |
#20 | [FEA] Support out of core joins |
#2160 | [FEA] Support Databricks 8.2 ML Runtime |
#2330 | [FEA] Enable hash partitioning with arrays |
#1103 | [FEA] Support date_format on GPU |
#1125 | [FEA] explode() can take expressions that generate arrays |
#1605 | [FEA] Support sorting on struct type keys |
#1445 | [FEA] GDS Integration |
#1588 | Rapids shuffle - UCX active messages |
#2367 | [FEA] CBO: Implement costs for memory access and launching kernels |
#2431 | [FEA] CBO should show benefits with q24b with decimals enabled |
#2652 | [BUG] No Job Found. Exiting. |
#2659 | [FEA] Group profiling tool "Potential Problems" |
#2680 | [BUG] cast can throw NPE |
#2628 | [BUG] failed to build plugin in databricks runtime 8.2 |
#2605 | [BUG] test_pandas_map_udf_nested_type failed in Yarn integration |
#2622 | [BUG] compressed event logs are not processed |
#2478 | [BUG] When tasks complete, cancel pending UCX requests |
#1953 | [BUG] Could not allocate native memory when running DLRM ETL with --output_ordering input on A100 |
#2495 | [BUG] scaladoc warning GpuParquetScan.scala:727 "discarding unmoored doc comment" |
#2368 | [BUG] Mismatched number of columns while performing GpuSort |
#2407 | [BUG] test_round_robin_sort_fallback failed |
#2497 | [BUG] GpuExec failed to find metric totalTime in databricks env |
#2473 | [BUG] enable test_window_aggs_for_rows_lead_lag_on_arrays and make the order unambiguous |
#2489 | [BUG] Queries with window expressions fail when cost-based optimizer is enabled |
#2457 | [BUG] test_window_aggs_for_rows_lead_lag_on_arrays failed |
#2371 | [BUG] Performance regression for crossjoin on 0.6 comparing to 0.5 |
#2372 | [BUG] FAILED ../../src/main/python/udf_cudf_test.py::test_window |
#2404 | [BUG] test_hash_pivot_groupby_nan_fallback failed on Dataproc |
#2474 | [BUG] when ucp listener enabled we bind 16 times always |
#2427 | [BUG] test_union_struct_missing_children[(Struct(not_null) failed in databricks310 and spark 311 |
#2455 | [BUG] CaseWhen crashes on literal arrays |
#2421 | [BUG] NPE when running mapInPandas Pandas UDF in 0.5GA |
#2428 | [BUG] Intermittent ValueError in test_struct_groupby_count |
#1628 | [BUG] TPC-DS-like query 24a and 24b at scale=3TB fails with OOM |
#2276 | [BUG] SPARK-33386 - ansi-mode changed ElementAt/Elt/GetArray behavior in Spark 3.1.1 - fallback to cpu |
#2309 | [BUG] legacy cast of a struct column to string with a single nested null column yields null instead of '[]' |
#2315 | [BUG] legacy struct cast to string crashes on a two field struct |
#2406 | [BUG] test_struct_groupby_count failed |
#2378 | [BUG] java.lang.ClassCastException: GpuCompressedColumnVector cannot be cast to GpuColumnVector |
#2355 | [BUG] convertDecimal64ToDecimal32Wrapper leaks ColumnView instances |
#2346 | [BUG] segfault when using UcpListener in TCP-only setup |
#2364 | [BUG] qa_nightly_select_test.py::test_select integration test fails |
#2302 | [BUG] Int96 are not being written as expected |
#2359 | [BUG] Alias is different in spark 3.1.0 but our canonicalization code doesn't handle |
#2277 | [BUG] spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED or LEGACY still fails to read LEGACY date from parquet |
#2320 | [BUG] TypeChecks diagnostics outputs column ids instead of unsupported types |
#2238 | [BUG] Unnecessary to cache the batches that will be sent to Python in FlatMapGroupInPandas. |
#1811 | [BUG] window_function_test.py::test_multi_types_window_aggs_for_rows_lead_lag[partBy failed |
#2817 | Update changelog for v21.06.0 release [skip ci] |
#2806 | Noted testing for A10, noted that min driver ver is HW specific |
#2797 | Update documentation for InitCap incompatibility |
#2774 | Update changelog for 21.06 release [skip ci] |
#2770 | [Doc] add more for Alluxio page [skip ci] |
#2745 | Add link to Mellanox RoCE documentation and mention --without-ucx installation option |
#2740 | Update cudf Java bindings to 21.06.1 |
#2664 | Update changelog for 21.06 release [skip ci] |
#2697 | fix GDS spill bug when copying from the batch write buffer |
#2691 | Update properties to check if table there |
#2687 | Remove CUDA 10.x from getting started guide (#2668) |
#2686 | Profiling tool: Print Job Information in compare mode |
#2657 | Print CPU and GPU output when _assert_equal fails to help debug given… |
#2681 | Avoid NPE when casting empty strings to ints |
#2669 | Fix multiple problems reported and improve error handling |
#2666 | [DOC] Update custom image guide in GCP Dataproc to reduce cluster startup time |
#2665 | Update docs to move RAPIDS Shuffle out of beta [skip ci] |
#2671 | Clean up profiling & qualification tool README |
#2673 | Profiling tool: Enable tests and update compressed event log |
#2672 | Update cudfjni dependency version to 21.06.0 |
#2663 | Qualification tool - add in estimating the App end time when the event log is missing the application end event |
#2600 | Accelerate RunningWindow queries on GPU |
#2651 | Profiling tool - fix reporting contains dataset when sql time 0 |
#2623 | Fixed minor mistakes in documentation |
#2631 | Update docs for Databricks 8.2 ML |
#2638 | Add an init script for databricks 7.3ML with CUDA11.0 installed |
#2643 | Profiling tool: Health check follow on |
#2640 | Add physical plan to the dot file as the graph label |
#2637 | Fix databricks for 3.1.1 |
#2577 | Update download.md and FAQ.md for 21.06.0 |
#2636 | Profiling tool - Fix file writer for generating dot graphs, supporting writing sql plans to a file, change output to subdirectory |
#2625 | Exclude failed jobs/queries from Qualification tool output |
#2626 | Enable processing of compressed Spark event logs |
#2632 | Profiling tool: Add support for health check. |
#2627 | Ignore order for map udf test |
#2620 | Change aggregation of executor CPU and run time for Qualification tool to speed up query |
#2618 | Correct an issue in the tools README and also correct s3 solution in Args.scala |
#2612 | Profiling tool, add in job to stage, duration, executor cpu time, fix writing to HDFS |
#2614 | change rapids-4-spark-tools directory to tools in deploy script [skip ci] |
#2611 | Revert "disable cudf_udf tests for #2521" |
#2604 | Profile/qualification tool error handling improvements and support spark < 3.1.1 |
#2598 | Rename rapids-4-spark-tools directory to tools |
#2576 | Add filter support for qualification and profiling tool. |
#2603 | Add the doc for -g option of the profiling tool. |
#2594 | Change the README of the qualification and profiling tool to match the current version. |
#2591 | Implement test for qualification tool sql metric aggregates |
#2590 | Profiling tool support for collection and analysis |
#2587 | Handle UCX connection timeouts from heartbeats more gracefully |
#2588 | Fix package name |
#2574 | Add Qualification tool support |
#2571 | Change test_single_sort_in_part to print source data frame on failure |
#2569 | Remove -SNAPSHOT in documentation in preparation for release |
#2429 | Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation |
#2553 | Cancel requests that are queued for a client/handler on error |
#2566 | expose unspill config option |
#2460 | align GDS reads/writes to 4 KiB |
#2515 | Remove fetchTime and standardize on collectTime |
#2523 | Do not compile RapidsUDF when udf compiler is enabled |
#2538 | Fixed code indentation in ParquetCachedBatchSerializer |
#2559 | Release profiling tool jar to maven central |
#2423 | Add cloudera shim layer |
#2520 | Add event logs for integration tests |
#2525 | support interval.microseconds for range window TimeStampType |
#2536 | Don't do an extra shuffle in some TopN cases |
#2508 | Refactor the code for conditional expressions |
#2542 | enable auto-merge from 21.06 to 21.08 [skip ci] |
#2540 | Update spark 312 shim, and Add spark 313-SNAPSHOT shim |
#2539 | disable cudf_udf tests for #2521 |
#2514 | Add Struct support for ParquetWriter |
#2534 | Remove scaladoc on an internal method to avoid warning during build |
#2537 | Add CentOS documentation and improve dockerfiles for UCX |
#2531 | Add nested types and decimals to CoalesceExec |
#2513 | Report opTime not totalTime for expand, range, and generate execs |
#2533 | Fix concat_ws test specifying only a separator for databricks |
#2528 | Make GenerateDot test more robust |
#2529 | Change Databricks 310 shim to be 311 to match reported spark.version |
#2479 | Support concat with separator on GPU |
#2507 | Improve test coverage for sorting structs |
#2526 | Improve debug print to include addresses and null counts |
#2463 | Add EMR 6.3 documentation |
#2516 | Avoid listener race collecting wrong plan in assert_gpu_fallback_collect |
#2505 | Qualification tool updates for datasets, udf, and misc fixes |
#2509 | Added in basic support for scalar structs to named_struct |
#2449 | Add code for generating dot file visualizations |
#2475 | Update shuffle documentation for branch-21.06 and UCX 1.10.1 |
#2500 | Update Dockerfile for native UDF |
#2506 | Support creating Scalars/ColumnVectors from utf8 strings directly. |
#2502 | Remove work around for nulls in semi-anti joins |
#2503 | Remove temporary logging and adjust test column names |
#2499 | Fix regression in TOTAL_TIME metrics for Databricks |
#2498 | Add in basic support for scalar maps and allow nesting in named_struct |
#2496 | Add comments for lazy binding in WindowInPandas |
#2493 | improve window agg test for range numeric types |
#2491 | Fix regression in cost-based optimizer when calculating cost for Window operations |
#2482 | Window tests with smaller batches |
#2490 | Add temporary logging for Dataproc round robin fallback issue |
#2486 | Remove the null replacement in computePredicate |
#2469 | Adding additional functionalities to profiling tool |
#2462 | Report gpuOpTime instead of totalTime for project, filter, limit, and window |
#2484 | Fix the failing test test_window on Databricks |
#2472 | Fix hash_aggregate_test |
#2476 | Fix for UCP Listener created spark.port.maxRetries times |
#2471 | skip test_window_aggs_for_rows_lead_lag_on_arrays |
#2446 | Update plugin version to 21.06.0 |
#2409 | Change shuffle metadata messages to use UCX Active Messages |
#2397 | Include memory access costs in cost models (cost-based optimizer) |
#2442 | fix GpuCreateNamedStruct not serializable issue |
#2379 | support GpuConcat on ArrayType |
#2456 | Fall back to the CPU for literal array values on case/when |
#2447 | Filter out the nulls after slicing the batches. |
#2426 | Implement cast of nested arrays |
#2299 | support creating array of array |
#2451 | Update tuning docs to add batch size recommendations. |
#2435 | support lead/lag on arrays |
#2448 | support creating list ColumnVector for Literal(ArrayType(NullType)) |
#2402 | Add profiling tool |
#2313 | Supports GpuLiteral of array type |
Changelog of older releases can be found at docs/archives