[SPARK-13770] [Documentation][ML] Document the ML feature Interaction #15658
This reverts commit 3fad195.
Can one of the admins verify this patch?
docs/ml-features.md
## Interaction
`Implements` is a `Transformer` which implements interaction transform.
"Interaction" is a Transformer ...
I think this example is valuable, but it actually says nothing about what the transformer does. I know it outputs the various interaction features, but we should define them and give an example here and/or in the comments of the examples.
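A compact way to state the definition asked for here, offered as a sketch rather than Spark's own wording: for numeric and vector inputs, with a double-valued column treated as a vector of length 1, the output for each row can be read as the Kronecker product of that row's input values, flattened so that the last index varies fastest. This agrees with the example outputs elsewhere in this thread; how Spark handles columns that carry nominal (categorical) metadata is not covered by it.

```latex
% x^{(j)}: value of input column j in one row (a double is a length-1 vector)
x^{(1)} \in \mathbb{R}^{d_1}, \;\dots,\; x^{(k)} \in \mathbb{R}^{d_k}
\quad\Longrightarrow\quad
z = x^{(1)} \otimes x^{(2)} \otimes \cdots \otimes x^{(k)} \in \mathbb{R}^{d_1 d_2 \cdots d_k},
\qquad
z_{(i_1, \dots, i_k)} = \prod_{j=1}^{k} x^{(j)}_{i_j}

% Example with k = 2, d_1 = d_2 = 4:
(0, 1, 2, 3) \otimes (1, 4, 3, 9)
  = (0, 0, 0, 0,\; 1, 4, 3, 9,\; 2, 8, 6, 18,\; 3, 12, 9, 27)
```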
Thank you for your comment. I added an example and changed the description.
docs/ml-features.md
| 2 | 6 | 1 | [12.0] |
| 3 | 10 | 8 | [240.0] |
| 4 | 9 | 2 | [72.0] |
| 5 | 1 | 1 | [5.0] |
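The table above can be reproduced with a small plain-Scala sketch. It assumes the first three cells of each row are the scalar inputs and the last cell is the output, and `interact` is a hypothetical helper, not a Spark API. When every input column holds a single value there is only one combination, so the result collapses to a one-element product, which is why these rows look like plain multiplication.

```scala
// Plain Scala, no Spark required. `interact` is a hypothetical helper,
// not part of Spark's API.
object ScalarInteractionSketch {
  // For every combination of one value from each column, multiply the values.
  // Earlier columns form the outer loop; the last column varies fastest.
  def interact(columns: Seq[Seq[Double]]): Seq[Double] =
    columns.foldLeft(Seq(1.0)) { (acc, col) =>
      for (a <- acc; b <- col) yield a * b
    }

  def main(args: Array[String]): Unit = {
    // Assumed reading of the table: the first three cells are the inputs.
    val rows = Seq(
      Seq(2.0, 6.0, 1.0),
      Seq(3.0, 10.0, 8.0),
      Seq(4.0, 9.0, 2.0),
      Seq(5.0, 1.0, 1.0)
    )
    for (row <- rows) {
      // Each scalar input is treated as a 1-dimensional vector.
      val out = interact(row.map(v => Seq(v)))
      println(row.mkString("| ", " | ", " | ") + out.mkString("[", ",", "]") + " |")
    }
  }
}
```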
This example doesn't really show what Interaction does. It looks like it just outputs the product of the columns, but that's a corner case. Really, it outputs all possible products of one element from each column. Here's a rough example:
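// Assumes a spark-shell session, where `spark` is predefined.
import org.apache.spark.ml.feature.{Interaction, VectorAssembler}
import org.apache.spark.ml.linalg.Vectors

val df = spark.createDataFrame(Seq(
  (Vectors.dense(0, 1, 2, 3), Vectors.dense(1, 4, 3, 9)),
  (Vectors.dense(2, 6, 1, 7), Vectors.dense(3, 10, 8, 11))
)).toDF("data1", "data2")
// Pass each vector column through a single-input VectorAssembler.
val assembler1 = new VectorAssembler().setInputCols(Array("data1")).setOutputCol("vec1")
val assembler2 = new VectorAssembler().setInputCols(Array("data2")).setOutputCol("vec2")
// Multiply every element of vec1 by every element of vec2.
val interaction = new Interaction().setInputCols(Array("vec1", "vec2")).setOutputCol("interactedCol")
interaction.transform(assembler2.transform(assembler1.transform(df))).select("vec1", "vec2", "interactedCol").show(truncate = false)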
+-----------------+-------------------+------------------------------------------------------------------------------+
|vec1 |vec2 |interactedCol |
+-----------------+-------------------+------------------------------------------------------------------------------+
|[0.0,1.0,2.0,3.0]|[1.0,4.0,3.0,9.0] |[0.0,0.0,0.0,0.0,1.0,4.0,3.0,9.0,2.0,8.0,6.0,18.0,3.0,12.0,9.0,27.0] |
|[2.0,6.0,1.0,7.0]|[3.0,10.0,8.0,11.0]|[6.0,20.0,16.0,22.0,18.0,60.0,48.0,66.0,3.0,10.0,8.0,11.0,21.0,70.0,56.0,77.0]|
+-----------------+-------------------+------------------------------------------------------------------------------+
I think a useful example should show something more like this?
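The rule described above, every product of one element from each column, can be checked without Spark. This plain-Scala sketch uses a hypothetical `interact` helper and assumes the ordering visible in the output above: the first column's element is the outer loop and the last column varies fastest.

```scala
// Plain Scala, no Spark required. `interact` is a hypothetical helper that
// mirrors the ordering in the output above; it is not Spark's implementation.
object VectorInteractionSketch {
  def interact(columns: Seq[Seq[Double]]): Seq[Double] =
    columns.foldLeft(Seq(1.0)) { (acc, col) =>
      // With two columns, each element of the first column (outer loop)
      // scales the whole second column (inner loop).
      for (a <- acc; b <- col) yield a * b
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      (Seq(0.0, 1.0, 2.0, 3.0), Seq(1.0, 4.0, 3.0, 9.0)),
      (Seq(2.0, 6.0, 1.0, 7.0), Seq(3.0, 10.0, 8.0, 11.0))
    )
    for ((vec1, vec2) <- rows) {
      val out = interact(Seq(vec1, vec2))
      println(s"${vec1.mkString("[", ",", "]")} x ${vec2.mkString("[", ",", "]")} -> ${out.mkString("[", ",", "]")}")
    }
  }
}
```

Two 4-dimensional inputs give 4 × 4 = 16 products per row, matching the width of `interactedCol` above.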
Ping @hayashidac: I'd be happy to work with you to get a more complete example into the docs for this operation. It could use them, yes.
@srowen Sorry, I have another task and will work on this tomorrow.
I changed the input column format. Please check again.
Yeah, I see what you've done, but the example still doesn't indicate what the output is. I think it's OK if the example has to be run to show the output, but the documentation needs to exhibit a more representative input and output so that people understand what it does without having to run the example.
I addressed the points you raised. Please take another look.
srowen left a comment
Looks good with a few edits to the text.
docs/ml-features.md
## Interaction
`Interaction` is a `Transformer` which takes a vector/double columns, and generate a single vector column that contains multiplication results of all combination of each vector/double values.
Two small changes: "... which takes vector or double-valued columns, and generates a ..."
"... contains the product of all combinations of one value from each input column"
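To illustrate the suggested wording, the sketch below mixes one double-valued column with two vector columns. It is plain Scala rather than Spark, the names `Cell`, `Scalar`, `Vec` and `interact` are hypothetical, the input values are chosen only for illustration, and a double-valued column is assumed to act like a 1-dimensional vector.

```scala
// Plain Scala, no Spark required. `Cell`, `Scalar`, `Vec` and `interact` are
// hypothetical names used only to illustrate the suggested description.
object MixedInteractionSketch {
  // One cell of an input column: a double value or a vector of doubles.
  sealed trait Cell { def values: Seq[Double] }
  // A double-valued column behaves like a 1-dimensional vector.
  final case class Scalar(v: Double) extends Cell { def values: Seq[Double] = Seq(v) }
  final case class Vec(vs: Double*) extends Cell { def values: Seq[Double] = vs }

  // Product of every combination of one value from each input column,
  // with the last column varying fastest.
  def interact(cells: Seq[Cell]): Seq[Double] =
    cells.foldLeft(Seq(1.0)) { (acc, cell) =>
      for (a <- acc; b <- cell.values) yield a * b
    }

  def main(args: Array[String]): Unit = {
    val row = Seq(Scalar(1.0), Vec(1.0, 2.0, 3.0), Vec(8.0, 4.0, 5.0))
    println(interact(row).mkString("[", ",", "]"))
  }
}
```

Changing the scalar from 1.0 to 2.0 doubles every entry, since that column contributes the same factor to all nine combinations.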
docs/ml-features.md
For example, if you have two vector type columns each of which contains three double type values as input columns, then you'll get a vector with 9 double type values as the output column.
Suggest: "... each of which has 3 dimensions ..." "... then you'll get a 9-dimensional vector ..."
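The arithmetic behind the suggested sentence is that the output dimension is the product of the input dimensions (3 × 3 = 9), with a double-valued column counting as 1. A minimal plain-Scala check, using hypothetical `outputSize` and `interact` helpers:

```scala
// Plain Scala, no Spark required. `outputSize` and `interact` are
// hypothetical helpers used only to check the dimension arithmetic.
object InteractionSizeSketch {
  // One output entry per combination of one value from each input column,
  // so the input sizes multiply (a double-valued column counts as size 1).
  def outputSize(inputSizes: Seq[Int]): Int = inputSizes.product

  // Product of every combination of one value from each column.
  def interact(columns: Seq[Seq[Double]]): Seq[Double] =
    columns.foldLeft(Seq(1.0)) { (acc, col) =>
      for (a <- acc; b <- col) yield a * b
    }

  def main(args: Array[String]): Unit = {
    // Two 3-dimensional input vectors.
    val a = Seq(1.0, 2.0, 3.0)
    val b = Seq(4.0, 5.0, 6.0)
    val out = interact(Seq(a, b))
    println(s"predicted size ${outputSize(Seq(a.size, b.size))}, actual size ${out.size}")
  }
}
```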
Thank you for the review. I updated the documentation to address your comments.
I created Scala and Java example and added documentation.

Author: chie8842 <hayashidac@nttdata.co.jp>

Closes #15658 from hayashidac/SPARK-13770.

(cherry picked from commit ee2e741)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Merged to master/2.1