12 changes: 6 additions & 6 deletions R/pkg/R/functions.R
@@ -879,8 +879,8 @@ setMethod("factorial",
#'
#' The function by default returns the first values it sees. It will return the first non-missing
#' value it sees when na.rm is set to true. If all values are missing, then NA is returned.
#' Note: the function is non-deterministic because its results depends on order of rows which
#' may be non-deterministic after a shuffle.
#' Note: the function is non-deterministic because its results depends on the order of the rows
#' which may be non-deterministic after a shuffle.
#'
#' @param na.rm a logical value indicating whether NA values should be stripped
#' before the computation proceeds.
@@ -1024,8 +1024,8 @@ setMethod("kurtosis",
#'
#' The function by default returns the last values it sees. It will return the last non-missing
#' value it sees when na.rm is set to true. If all values are missing, then NA is returned.
#' Note: the function is non-deterministic because its results depends on order of rows which
#' may be non-deterministic after a shuffle.
#' Note: the function is non-deterministic because its results depends on the order of the rows
#' which may be non-deterministic after a shuffle.
#'
#' @param x column to compute on.
#' @param na.rm a logical value indicating whether NA values should be stripped
@@ -3706,7 +3706,7 @@ setMethod("create_map",
#' @details
#' \code{collect_list}: Creates a list of objects with duplicates.
#' Note: the function is non-deterministic because the order of collected results depends
#' on order of rows which may be non-deterministic after a shuffle.
#' on the order of the rows which may be non-deterministic after a shuffle.
#'
#' @rdname column_aggregate_functions
#' @aliases collect_list collect_list,Column-method
@@ -3727,7 +3727,7 @@ setMethod("collect_list",
#' @details
#' \code{collect_set}: Creates a list of objects with duplicate elements eliminated.
#' Note: the function is non-deterministic because the order of collected results depends
#' on order of rows which may be non-deterministic after a shuffle.
#' on the order of the rows which may be non-deterministic after a shuffle.
#'
#' @rdname column_aggregate_functions
#' @aliases collect_set collect_set,Column-method
12 changes: 6 additions & 6 deletions python/pyspark/sql/functions.py
@@ -196,7 +196,7 @@ def _options_to_str(options):
Aggregate function: returns a list of objects with duplicates.

.. note:: The function is non-deterministic because the order of collected results depends
on order of rows which may be non-deterministic after a shuffle.
on the order of the rows which may be non-deterministic after a shuffle.

>>> df2 = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
>>> df2.agg(collect_list('age')).collect()
@@ -206,7 +206,7 @@ def _options_to_str(options):
Aggregate function: returns a set of objects with duplicate elements eliminated.

.. note:: The function is non-deterministic because the order of collected results depends
on order of rows which may be non-deterministic after a shuffle.
on the order of the rows which may be non-deterministic after a shuffle.

>>> df2 = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
>>> df2.agg(collect_set('age')).collect()
@@ -444,8 +444,8 @@ def first(col, ignorenulls=False):
The function by default returns the first values it sees. It will return the first non-null
value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

.. note:: The function is non-deterministic because its results depends on order of rows which
may be non-deterministic after a shuffle.
.. note:: The function is non-deterministic because its results depends on the order of the
rows which may be non-deterministic after a shuffle.
"""
sc = SparkContext._active_spark_context
jc = sc._jvm.functions.first(_to_java_column(col), ignorenulls)
@@ -535,8 +535,8 @@ def last(col, ignorenulls=False):
The function by default returns the last values it sees. It will return the last non-null
value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

.. note:: The function is non-deterministic because its results depends on order of rows
which may be non-deterministic after a shuffle.
.. note:: The function is non-deterministic because its results depends on the order of the
rows which may be non-deterministic after a shuffle.
"""
sc = SparkContext._active_spark_context
jc = sc._jvm.functions.last(_to_java_column(col), ignorenulls)
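The `first`/`last` docstrings above describe two behaviours: optional null-skipping, and dependence on row order. The following is a minimal plain-Python sketch of both. It assumes the aggregate simply walks the rows of a group in whatever order they arrive. `first_value` and `last_value` are hypothetical helpers that model the documented semantics; they are not PySpark APIs, and `None` stands in for SQL NULL. The `ignore_nulls=True` calls reuse the rows from the `First`/`Last` SQL examples further down.

```python
from typing import Any, Iterable, Optional


def first_value(rows: Iterable[Any], ignore_nulls: bool = False) -> Optional[Any]:
    """Hypothetical model of first(): the first value seen, or the first
    non-None value when ignore_nulls is True. None if nothing qualifies."""
    for value in rows:
        if value is not None or not ignore_nulls:
            return value
    return None


def last_value(rows: Iterable[Any], ignore_nulls: bool = False) -> Optional[Any]:
    """Hypothetical model of last(): the last value seen, or the last
    non-None value when ignore_nulls is True. None if nothing qualifies."""
    result = None
    for value in rows:
        if value is not None or not ignore_nulls:
            result = value
    return result


print(first_value([None, 5, 20]))                     # None
print(first_value([None, 5, 20], ignore_nulls=True))  # 5
print(last_value([10, 5, None], ignore_nulls=True))   # 5

# The same rows delivered in a different order (as a shuffle might do)
# give a different answer: this is the non-determinism the note describes.
print(first_value([20, None, 5]))                     # 20
```

Nothing in the model is random. The result is fully determined by the input sequence, so any variation comes from the order in which rows reach the aggregate.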
Original file line number Diff line number Diff line change
@@ -43,6 +43,10 @@ import org.apache.spark.sql.types._
> SELECT _FUNC_(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
5
""",
note = """
The function is non-deterministic because its results depends on the order of the rows
which may be non-deterministic after a shuffle.
""",
since = "2.0.0")
case class First(child: Expression, ignoreNullsExpr: Expression)
extends DeclarativeAggregate with ExpectsInputTypes {
Original file line number Diff line number Diff line change
@@ -43,6 +43,10 @@ import org.apache.spark.sql.types._
> SELECT _FUNC_(col, true) FROM VALUES (10), (5), (NULL) AS tab(col);
5
""",
note = """
The function is non-deterministic because its results depends on the order of the rows
which may be non-deterministic after a shuffle.
""",
since = "2.0.0")
case class Last(child: Expression, ignoreNullsExpr: Expression)
extends DeclarativeAggregate with ExpectsInputTypes {
Original file line number Diff line number Diff line change
@@ -92,6 +92,10 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
> SELECT _FUNC_(col) FROM VALUES (1), (2), (1) AS tab(col);
[1,2,1]
""",
note = """
The function is non-deterministic because the order of collected results depends
on the order of the rows which may be non-deterministic after a shuffle.
""",
since = "2.0.0")
case class CollectList(
child: Expression,
@@ -121,6 +125,10 @@ case class CollectList(
> SELECT _FUNC_(col) FROM VALUES (1), (2), (1) AS tab(col);
[1,2]
""",
note = """
The function is non-deterministic because the order of collected results depends
on the order of the rows which may be non-deterministic after a shuffle.
""",
since = "2.0.0")
case class CollectSet(
child: Expression,
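For `collect_list` and `collect_set`, the note concerns only the *order* of the collected elements. Which elements end up in the result does not depend on row order, but their positions do. Below is a small plain-Python sketch of that distinction. It assumes rows are appended to the buffer in arrival order. `collect_list_model` and `collect_set_model` are hypothetical names, and the set model keeps first-occurrence order purely for illustration; the notes above make no ordering promise for Spark's output.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def collect_list_model(rows: Iterable[Hashable]) -> List[Hashable]:
    """Keep every value, duplicates included, in the order rows arrive."""
    return list(rows)


def collect_set_model(rows: Iterable[Hashable]) -> List[Hashable]:
    """Keep each distinct value once; order here is first occurrence."""
    seen = set()
    result = []
    for value in rows:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


arrival_a = [1, 2, 1]  # same rows as the SQL examples above
arrival_b = [1, 1, 2]  # same rows, different arrival order

print(collect_list_model(arrival_a), collect_list_model(arrival_b))  # [1, 2, 1] [1, 1, 2]
print(collect_set_model([2, 1, 1]), collect_set_model(arrival_a))    # [2, 1] [1, 2]

# The contents agree even though the order does not.
assert Counter(collect_list_model(arrival_a)) == Counter(collect_list_model(arrival_b))
```

If a caller in Spark itself needs an order-independent array, sorting the collected result (for example `sort_array(collect_list(col))`) is one common option.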
40 changes: 20 additions & 20 deletions sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -273,7 +273,7 @@ object functions {
* Aggregate function: returns a list of objects with duplicates.
*
* @note The function is non-deterministic because the order of collected results depends
* on order of rows which may be non-deterministic after a shuffle.
* on the order of the rows which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.6.0
@@ -284,7 +284,7 @@ object functions {
* Aggregate function: returns a list of objects with duplicates.
*
* @note The function is non-deterministic because the order of collected results depends
* on order of rows which may be non-deterministic after a shuffle.
* on the order of the rows which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.6.0
@@ -295,7 +295,7 @@ object functions {
* Aggregate function: returns a set of objects with duplicate elements eliminated.
*
* @note The function is non-deterministic because the order of collected results depends
* on order of rows which may be non-deterministic after a shuffle.
* on the order of the rows which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.6.0
@@ -306,7 +306,7 @@ object functions {
* Aggregate function: returns a set of objects with duplicate elements eliminated.
*
* @note The function is non-deterministic because the order of collected results depends
* on order of rows which may be non-deterministic after a shuffle.
* on the order of the rows which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.6.0
@@ -424,8 +424,8 @@ object functions {
* The function by default returns the first values it sees. It will return the first non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 2.0.0
@@ -440,8 +440,8 @@ object functions {
* The function by default returns the first values it sees. It will return the first non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 2.0.0
@@ -456,8 +456,8 @@ object functions {
* The function by default returns the first values it sees. It will return the first non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.3.0
@@ -470,8 +470,8 @@ object functions {
* The function by default returns the first values it sees. It will return the first non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.3.0
@@ -549,8 +549,8 @@ object functions {
* The function by default returns the last values it sees. It will return the last non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 2.0.0
@@ -565,8 +565,8 @@ object functions {
* The function by default returns the last values it sees. It will return the last non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 2.0.0
@@ -581,8 +581,8 @@ object functions {
* The function by default returns the last values it sees. It will return the last non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.3.0
@@ -595,8 +595,8 @@ object functions {
* The function by default returns the last values it sees. It will return the last non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
* @note The function is non-deterministic because its results depends on order of rows which
* may be non-deterministic after a shuffle.
* @note The function is non-deterministic because its results depends on the order of the rows
* which may be non-deterministic after a shuffle.
*
* @group agg_funcs
* @since 1.3.0
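The phrase "after a shuffle" matters for the following reason. Once rows are redistributed, the order in which partitions reach the aggregate is not fixed, so `first`/`last` can legitimately pick different rows on different runs. The sketch below enumerates every arrival order for two hypothetical partitions of `(event_time, value)` rows. It shows that an unordered "take the first row" yields more than one answer, while sorting on an explicit key first yields exactly one. The partition contents and helper names are invented for illustration; they are not Spark internals.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Set, Tuple

Row = Tuple[int, str]  # hypothetical (event_time, value) row

# Hypothetical shuffle output: two partitions whose arrival order is not fixed.
PARTITIONS: List[List[Row]] = [
    [(3, "c"), (1, "a")],
    [(2, "b")],
]


def first_of(rows: Sequence[Row]) -> Optional[str]:
    """Value of the first row encountered, or None when there are no rows."""
    return rows[0][1] if rows else None


def concat_in_order(partitions: List[List[Row]], order: Sequence[int]) -> List[Row]:
    """Concatenate partitions in the given arrival order."""
    return [row for i in order for row in partitions[i]]


def all_arrival_orders(partitions: List[List[Row]]) -> List[List[Row]]:
    """Every row sequence the aggregate could see, one per partition order."""
    return [concat_in_order(partitions, order)
            for order in permutations(range(len(partitions)))]


unordered: Set[Optional[str]] = {first_of(rows) for rows in all_arrival_orders(PARTITIONS)}
ordered: Set[Optional[str]] = {first_of(sorted(rows)) for rows in all_arrival_orders(PARTITIONS)}

print(sorted(unordered))  # ['b', 'c']: depends on which partition arrives first
print(sorted(ordered))    # ['a']: sorting by event_time removes the dependence
```

Sorting on tuples orders by `event_time` first, so the model's "first" becomes the earliest event regardless of partition arrival order.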