[SPARK-8345] [ML] Add an SQL node as a feature transformer #7465

yanboliang · 2015-07-17T10:53:38Z

Implements transformations defined by a SQL statement.
Currently only SQL syntax like 'SELECT ... FROM __THIS__' is supported,
where '__THIS__' represents the underlying table of the input dataset.
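
For illustration, a minimal usage sketch of the transformer described above, using the `__THIS__` placeholder as spelled in the merged commit message. It assumes the `setStatement`/`getStatement` accessors on `SQLTransformer` and a local `SparkSession` (the Spark 2.x+ entry point, which postdates this PR). The column names `id`, `v1`, `v2`, `v3` and the object name are hypothetical.

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession

object SQLTransformerSketch {
  // Build a transformer whose statement reads from the placeholder table.
  // "__THIS__" stands for the input dataset passed to transform().
  def buildTransformer(): SQLTransformer =
    new SQLTransformer().setStatement("SELECT *, (v1 + v2) AS v3 FROM __THIS__")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for a quick experiment.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SQLTransformerSketch")
      .getOrCreate()

    // Hypothetical toy input with columns id, v1, v2.
    val df = spark.createDataFrame(Seq((0, 1.0, 3.0), (2, 2.0, 5.0)))
      .toDF("id", "v1", "v2")

    // Run the SQL statement against the input and print the result.
    buildTransformer().transform(df).show()

    spark.stop()
  }
}
```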

AmplabJenkins · 2015-07-17T10:57:12Z

Merged build triggered.

AmplabJenkins · 2015-07-17T10:57:18Z

Merged build started.

SparkQA · 2015-07-17T11:02:07Z

Test build #37623 has started for PR 7465 at commit 51eb9e7.

SparkQA · 2015-07-17T11:41:32Z

Test build #37623 has finished for PR 7465 at commit 51eb9e7.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class SQLTransformer (override val uid: String) extends Transformer

AmplabJenkins · 2015-07-17T11:42:09Z

Merged build finished. Test PASSed.

mengxr · 2015-07-18T17:29:23Z

mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala

I see the issue but I don't think this is the right solution. See my comments below.

liancheng · 2015-07-19T12:34:34Z

@mengxr I guess what we need here is essentially a wrapper helper function which wraps a DataFrame => DataFrame function as a transformer, and a SQL statement is just a (questionably) more convenient way to express this function. One of the benefits of the DataFrame DSL over SQL is that you don't need a temporary table name.
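
A rough sketch of the wrapper idea in this comment, written without Spark so it runs on its own. `Table` and the local `Transformer` trait are hypothetical stand-ins for Spark's `DataFrame` and `ml.Transformer`, and `asTransformer` is an illustrative name, not an existing Spark helper.

```scala
object FunctionTransformerSketch {
  // Stand-in for DataFrame: each row maps a column name to a value.
  type Table = Seq[Map[String, Double]]

  // Stand-in for the pipeline Transformer abstraction.
  trait Transformer {
    def transform(dataset: Table): Table
  }

  // Wrap any Table => Table function as a Transformer.
  def asTransformer(f: Table => Table): Transformer = new Transformer {
    def transform(dataset: Table): Table = f(dataset)
  }

  // Example: add a derived column v3 = v1 + v2 to every row.
  // No temporary table name is involved, unlike the SQL form.
  val addSum: Transformer =
    asTransformer(_.map(row => row.updated("v3", row("v1") + row("v2"))))

  def main(args: Array[String]): Unit = {
    val input: Table = Seq(Map("v1" -> 1.0, "v2" -> 3.0))
    println(addSum.transform(input))
  }
}
```

A real version would also have to supply the schema transformation and parameter copying that the pipeline API expects, which this sketch leaves out.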

AmplabJenkins · 2015-07-20T09:37:14Z

Merged build triggered.

AmplabJenkins · 2015-07-20T15:37:13Z

Merged build started.

SparkQA · 2015-07-20T15:41:38Z

Test build #37827 has started for PR 7465 at commit 0d4bb15.

SparkQA · 2015-07-20T16:17:06Z

Test build #37827 has finished for PR 7465 at commit 0d4bb15.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class SQLTransformer (override val uid: String) extends Transformer

AmplabJenkins · 2015-07-20T16:17:40Z

Merged build finished. Test PASSed.

yanboliang · 2015-07-28T10:35:48Z

@mengxr

mengxr · 2015-08-07T06:59:11Z

mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala

I think it is okay to return the DataFrame from sqlContext.sql directly. Users should use * if they want to keep existing columns.

This will behave differently from the other transformers in ml.feature, which return a DataFrame composed of the original DataFrame and the transformed output. Here, if the user does not use *, the existing columns will not be kept in the output DataFrame.
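
The difference discussed here can be seen by comparing output columns for two statements. This is a sketch under assumptions: it uses `SQLTransformer.setStatement` with the `__THIS__` placeholder from the merged commit message and a local `SparkSession` (Spark 2.x+). The helper `outputColumns` and the column names are hypothetical.

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnRetentionSketch {
  // Apply a SQL statement through SQLTransformer and return the output column names.
  def outputColumns(df: DataFrame, statement: String): Seq[String] =
    new SQLTransformer().setStatement(statement).transform(df).columns.toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ColumnRetentionSketch")
      .getOrCreate()

    // Hypothetical toy input with columns id, v1, v2.
    val df = spark.createDataFrame(Seq((0, 1.0, 3.0), (2, 2.0, 5.0)))
      .toDF("id", "v1", "v2")

    // With *, the input columns are kept and v3 is appended.
    println(outputColumns(df, "SELECT *, (v1 + v2) AS v3 FROM __THIS__").mkString(", "))

    // Without *, only the selected expression appears in the output.
    println(outputColumns(df, "SELECT (v1 + v2) AS v3 FROM __THIS__").mkString(", "))

    spark.stop()
  }
}
```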

AmplabJenkins · 2015-08-09T08:22:49Z

Merged build triggered.

AmplabJenkins · 2015-08-09T08:22:56Z

Merged build started.

SparkQA · 2015-08-09T08:28:50Z

Test build #40267 has started for PR 7465 at commit b403fcb.

SparkQA · 2015-08-09T09:06:18Z

Test build #40267 has finished for PR 7465 at commit b403fcb.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class SQLTransformer (override val uid: String) extends Transformer

AmplabJenkins · 2015-08-09T09:06:54Z

Merged build finished. Test PASSed.

mengxr · 2015-08-11T18:04:12Z

LGTM. Merged into master. I think it is okay if the output columns do not contain all input columns. It is not a requirement for transformers in the pipeline API. Thanks for working on this!

Implements the transforms which are defined by SQL statement. Currently we only support SQL syntax like 'SELECT ... FROM __THIS__' where '__THIS__' represents the underlying table of the input dataset.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#7465 from yanboliang/spark-8345 and squashes the following commits:

b403fcb [Yanbo Liang] address comments
0d4bb15 [Yanbo Liang] a better transformSchema() implementation
51eb9e7 [Yanbo Liang] Add an SQL node as a feature transformer

Add an SQL node as a feature transformer

51eb9e7

yanboliang changed the title ~~[SPARK-8345] [ML] Add an SQL node as a feature transformer~~ [WIP] [SPARK-8345] [ML] Add an SQL node as a feature transformer Jul 17, 2015

mengxr reviewed Jul 18, 2015

yanboliang changed the title ~~[WIP] [SPARK-8345] [ML] Add an SQL node as a feature transformer~~ [SPARK-8345] [ML] Add an SQL node as a feature transformer Jul 20, 2015

a better transformSchema() implementation

0d4bb15

mengxr reviewed Aug 7, 2015

address comments

b403fcb

asfgit closed this in 8cad854 Aug 11, 2015

yanboliang deleted the spark-8345 branch August 26, 2015 07:11
