Closed
@@ -256,4 +256,9 @@ class VectorAssemblerSuite
assert(runWithMetadata("keep", additional_filter = "id1 > 2").count() == 4)
}

+ test("SPARK-25371: VectorAssembler with empty inputCols") {
Contributor: Do we still need this test?

Contributor Author: I think so. This test passes after this patch and fails before it, so I think it is helpful for avoiding regressions like this in the future.

+ val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
Member: Is a VectorAssembler with zero input columns useful?

Member: It doesn't sound that useful, but the JIRA suggests this is the behavior in 2.2, and in 2.3 it throws a weird error. I could imagine either allowing this behavior or throwing a better exception. Is there a use case for no input? Maybe you have a reusable pipeline that is applied to a subset of columns, and sometimes that subset matches nothing. The output is empty, but maybe that doesn't matter for whatever purpose it serves; maybe it's assembled with something else afterwards. I could picture a valid use case.

+ val output = vectorAssembler.transform(dfWithNullsAndNaNs)
+ assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
+ }
}
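
The use case floated in the review (a reusable step that assembles whatever columns match a filter, which sometimes is none) can be sketched without Spark. This is a toy model in plain Scala, not the `VectorAssembler` implementation: `assembleByPrefix` and the `Map`-based row are hypothetical, chosen only to show that an empty selection yields an empty vector that still composes with later steps.

```scala
object EmptyAssemblySketch {
  // Hypothetical row model: column name -> numeric value.
  type Row = Map[String, Double]

  // Collect the values of every column whose name starts with `prefix`,
  // in sorted column-name order. No match yields an empty vector.
  def assembleByPrefix(row: Row, prefix: String): Vector[Double] =
    row.keys.toVector.filter(_.startsWith(prefix)).sorted.map(row)

  def main(args: Array[String]): Unit = {
    val row: Row = Map("id1" -> 1.0, "id2" -> 2.0, "label" -> 0.0)

    val ids = assembleByPrefix(row, "id")      // two matching columns
    val none = assembleByPrefix(row, "feat_")  // nothing matches

    // The empty result is harmless when concatenated with other features.
    val combined = none ++ ids
    println(s"ids=$ids none=$none combined=$combined")
  }
}
```

Whether the real assembler should accept an empty selection or fail with a clearer message is the open question in the thread; the sketch only illustrates the "allow it" option.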
@@ -379,10 +379,7 @@ trait CreateNamedStructLike extends Expression {
}

override def checkInputDataTypes(): TypeCheckResult = {
- if (children.length < 1) {
- TypeCheckResult.TypeCheckFailure(
- s"input to function $prettyName requires at least one argument")
- } else if (children.size % 2 != 0) {
+ if (children.size % 2 != 0) {
TypeCheckResult.TypeCheckFailure(s"$prettyName expects an even number of arguments.")
} else {
val invalidNames = nameExprs.filterNot(e => e.foldable && e.dataType == StringType)
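
To make the effect of this hunk concrete, here is a minimal standalone sketch that models the argument-count check before and after the patch in plain Scala. It is not Catalyst code: `CheckResult`, `checkBefore`, and `checkAfter` are hypothetical stand-ins for `TypeCheckResult` and `checkInputDataTypes`, and only the arity logic visible in the diff is modeled (the name validation that follows is omitted).

```scala
// Hypothetical stand-ins for Catalyst's TypeCheckResult; not the real Spark classes.
sealed trait CheckResult
case object Success extends CheckResult
final case class Failure(message: String) extends CheckResult

object NamedStructArityCheck {
  // Before the patch: zero arguments are rejected first.
  def checkBefore(prettyName: String, numChildren: Int): CheckResult =
    if (numChildren < 1) {
      Failure(s"input to function $prettyName requires at least one argument")
    } else if (numChildren % 2 != 0) {
      Failure(s"$prettyName expects an even number of arguments.")
    } else {
      Success
    }

  // After the patch: only the even-count rule remains, so zero arguments pass.
  def checkAfter(prettyName: String, numChildren: Int): CheckResult =
    if (numChildren % 2 != 0) {
      Failure(s"$prettyName expects an even number of arguments.")
    } else {
      Success
    }
}
```

In this model the two versions differ only for zero arguments, which is consistent with the `named_struct` entries being dropped from the "must have at least one argument" test list.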
@@ -2677,8 +2677,6 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
val funcsMustHaveAtLeastOneArg =
("coalesce", (df: DataFrame) => df.select(coalesce())) ::
("coalesce", (df: DataFrame) => df.selectExpr("coalesce()")) ::
- ("named_struct", (df: DataFrame) => df.select(struct())) ::
- ("named_struct", (df: DataFrame) => df.selectExpr("named_struct()")) ::
("hash", (df: DataFrame) => df.select(hash())) ::
("hash", (df: DataFrame) => df.selectExpr("hash()")) :: Nil
funcsMustHaveAtLeastOneArg.foreach { case (name, func) =>