Conversation

@ssimeonov
Contributor

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jul 21, 2015

CC @marmbrus just in case

@marmbrus
Contributor

I don't think this is correct: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L147

scala> sql("SELECT a FROM test GROUP BY b")
org.apache.spark.sql.AnalysisException: expression 'a' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;

scala> sql("SELECT first(a) FROM test GROUP BY b")
res3: org.apache.spark.sql.DataFrame = [_c0: int]

@ssimeonov
Contributor Author

@marmbrus can you please provide a complete example that can execute in spark-shell?

You can find a standalone runnable example with complete shell output in this gist. Here is the summary of what happens:

// ERROR RetryingHMSHandler: MetaException(message:NoSuchObjectException(message:Function default.first does not exist))
// INFO FunctionRegistry: Unable to lookup UDF in metastore: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.first does not exist))
// java.lang.RuntimeException: Couldn't find function first
ctx.sql("select first(num) from test_first group by category").show

// OK
ctx.sql("select first_value(num) from test_first group by category").show

Perhaps the difference is that I'm using HiveContext?

@marmbrus
Contributor

Which version of hive/spark are you running?

@ssimeonov
Contributor Author

@marmbrus you can see the version and full INFO-level shell output in the gist. I'm running 1.4.1.

@rxin
Contributor

rxin commented Aug 12, 2015

cc @yhuai since you are working on a related issue.

@yhuai
Contributor

yhuai commented Aug 12, 2015

In Spark 1.4, first and last were not in the function registry. Right now, first_value and last_value point to Hive's first_value and last_value, respectively. I am adding first and last to our function registry as well in #8113. @ssimeonov I will update the exception message in #8113.
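
To illustrate the lookup behavior described above, here is a minimal sketch of a name-based function registry. It is a toy model only, not Spark's actual FunctionRegistry: the `Registry` class, its `register` and `lookup` methods, and the `before`/`after` values are all hypothetical names introduced for illustration. It assumes a registry that initially knows only `first_value` and `last_value`, then gains `first` and `last`.

```scala
// Toy model of name-based function lookup; NOT Spark's actual FunctionRegistry.
// All names below (Registry, register, lookup, before, after) are hypothetical.
object RegistrySketch {
  // A "registry" here is just the set of lower-cased function names it knows.
  final case class Registry(names: Set[String]) {
    // Returns a new registry that also knows `name` (case-insensitive).
    def register(name: String): Registry = Registry(names + name.toLowerCase)

    // Left holds an error message, Right holds the resolved (lower-cased) name.
    def lookup(name: String): Either[String, String] = {
      val key = name.toLowerCase
      if (names.contains(key)) Right(key)
      else Left(s"Couldn't find function $name")
    }
  }

  // Assumption: models the situation reported in this thread, where only
  // first_value and last_value can be resolved.
  val before: Registry = Registry(Set("first_value", "last_value"))

  // Assumption: models the state once first and last are registered too.
  val after: Registry = before.register("first").register("last")
}
```

In this sketch, `RegistrySketch.before.lookup("first")` yields a `Left` carrying the error message, while the same lookup against `RegistrySketch.after` yields a `Right`. The real registry maps names to expression builders rather than to plain strings.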

@ssimeonov
Contributor Author

@yhuai great

@JoshRosen
Contributor

Hey @ssimeonov, would you mind closing this PR now that its change has been incorporated into #8113? Thanks!

@asfgit closed this in 8d4449c on Oct 18, 2015