
Conversation

@lresende
Member

@lresende lresende commented Feb 25, 2016

Enable Zeppelin to be built with both Scala 2.10
and Scala 2.11, mostly to start supporting interpreters
that are moving to Scala 2.11 only, such as Spark.

Before testing this PR, one would need to build Spark (for example, 1.6.1) with Scala 2.11 and build Flink 1.0 with Scala 2.11.

@lresende lresende changed the title [ZEPPELIN-605] Add support for Scala 2.11 [ZEPPELIN-605][WIP] Add support for Scala 2.11 Feb 25, 2016
```diff
 <dependency>
   <groupId>com.typesafe.akka</groupId>
-  <artifactId>akka-actor_${flink.scala.binary.version}</artifactId>
+  <artifactId>akka-actor_${scala.binary.version}</artifactId>
```
Member


There might still be a need for individual interpreters requiring different versions of Scala. I think for now it might make sense to have separate flink.scala, ignite.scala, and so on.

Member Author


What would be the use case for interpreter-specific Scala versions? Right now everything seems to work with the latest Scala 2.10. I lean towards simplifying the build now and making it more complex only when actually needed. Do we have a concrete case where this is needed now?

If we are talking about Scala 2.10 versus 2.11 specifically, I plan to handle that by profiles/modules.

Member


Yes, I was referring to flink.scala.binary.version and ignite.scala.binary.version.
Maybe Ignite doesn't support Scala 2.11? https://github.com/apache/ignite/blob/master/pom.xml#L453
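The single-vs-per-interpreter `scala.binary.version` question discussed above is usually settled with Maven profiles. A hypothetical sketch of how a shared property could be switched per build (profile ids, Scala versions, and activation are assumptions for illustration, not the PR's actual pom):

```xml
<!-- Hypothetical sketch: one scala.binary.version switched by profile -->
<profiles>
  <profile>
    <id>scala-2.10</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <scala.version>2.10.5</scala.version>
      <scala.binary.version>2.10</scala.binary.version>
    </properties>
  </profile>
  <profile>
    <id>scala-2.11</id>
    <activation>
      <!-- makes -Dscala-2.11 on the command line activate this profile -->
      <property><name>scala-2.11</name></property>
    </activation>
    <properties>
      <scala.version>2.11.7</scala.version>
      <scala.binary.version>2.11</scala.binary.version>
    </properties>
  </profile>
</profiles>
```

Interpreter artifacts such as `akka-actor_${scala.binary.version}` then resolve against whichever profile is active.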

@lresende lresende force-pushed the scala-210-211 branch 16 times, most recently from 29facb9 to 5a02318 on April 2, 2016
@lresende lresende force-pushed the scala-210-211 branch 2 times, most recently from 2886a4f to 88452f0 on April 16, 2016
@lresende
Member Author

The current status of this PR is that it works with both Scala 2.10 and Scala 2.11. I still need to try to have a single source tree for both Scala versions, and also to rebase to pick up the most recent projects added to trunk.

@adeandrade

adeandrade commented May 11, 2016

Hi @lresende, I rebased the PR and built a distribution with:

mvn clean package -Pbuild-distr -Pscala-2.11 -Pspark-1.6 -Phadoop-2.6 -Pyarn -Ppyspark -Dscala-2.11 -DskipTests -Dcheckstyle.skip=true clean install

But I still don't have access to a SparkContext instance:

```
java.lang.NoSuchMethodException: scala.tools.nsc.interpreter.ILoop$ILoopInterpreter.classServerUri()
	at java.lang.Class.getMethod(Class.java:1786)
	at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:276)
	at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:518)
	at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
```

Am I missing anything?

Thanks.

@felixcheung
Member

@adeandrade this seems to be a breaking change in Spark 2.0, also being fixed in #868

@karuppayya
Contributor

@adeandrade is your Spark binary compiled with Scala 2.11? Otherwise it will continue to use the REPL classes from Scala 2.10 (https://issues.apache.org/jira/browse/SPARK-1812) and you will hit that exception.
I was able to consume this PR and use Zeppelin with Spark 1.5 compiled with Scala 2.11.
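The stack trace above points at a reflective lookup: Zeppelin asks the repl object for its class-server URI, a method present on the Scala 2.10 repl classes that ship inside a 2.10 Spark binary but absent from Scala 2.11's plain `ILoop` interpreter. A hedged sketch of that kind of defensive lookup (class and method names besides `classServerUri` are illustrative, not Zeppelin's exact code):

```java
import java.lang.reflect.Method;

// Sketch of why createSparkContext fails: classServerUri() exists on the
// Scala 2.10 repl but not on scala.tools.nsc.interpreter.ILoop for 2.11.
public class ClassServerUriProbe {

    // Stand-in for the old Scala 2.10 repl, which exposes a class server.
    public static class LegacyRepl {
        public String classServerUri() { return "http://localhost:1234"; }
    }

    /** Look up classServerUri() reflectively; null if the repl has none. */
    public static String classServerUri(Object intp) {
        try {
            Method m = intp.getClass().getMethod("classServerUri");
            return (String) m.invoke(intp);
        } catch (NoSuchMethodException e) {
            return null; // Scala 2.11 repl: no class-server method to call
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```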

@adeandrade

@karup1990 Yes, it is. Did you rebase with master? I think it works if you don't. @felixcheung suggests the problem is being fixed in #868. I'm waiting for that PR to be merged to try again. If it doesn't work then I'll fall back to your suggestion. Thanks.

@lresende
Member Author

Sorry for not being more responsive here; I was busy with conferences, etc., but I should be able to devote some time to this again.

This should work if you don't rebase; otherwise we need to update the new modules, such as the R extensions, to work properly on Scala 2.11. Note that either way, you probably need to build Spark and Flink with Scala 2.11 and make sure you use Maven "install" and not only "package".
If folks are eager to try it, I can try to rebase while I work on the final version of this PR, which will merge the Scala 2.10 and Scala 2.11 code paths into one, using some reflection magic.

@lresende lresende force-pushed the scala-210-211 branch 3 times, most recently from 31d4234 to 6952c4c on May 26, 2016
@Leemoonsoo
Member

@lresende Awesome!

Tested and looks good to me!

```java
  return invokeMethod(o, name, new Class[]{}, new Object[]{});
}

private Object invokeMethod(Object o, String name, Class[] argTypes, Object[] params) {
```
Member


It's awesome, you did a lot of work to get both Scala 2.10 and 2.11 supported!

Probably a nit, but what do you think: should these be pulled up to some common ancestor to avoid duplication between DepInterpreter and SparkInterpreter? Or maybe just extracted and reused?
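The extraction suggested here could look roughly like the following (a hypothetical sketch; the class name, static shape, and lenient error handling are assumptions, not the PR's exact refactoring):

```java
import java.lang.reflect.Method;

// Hypothetical shared utility for the reflective calls duplicated
// between DepInterpreter and SparkInterpreter.
public final class ReflectionUtils {
    private ReflectionUtils() {}

    /** Invoke a no-argument public method by name. */
    public static Object invokeMethod(Object o, String name) {
        return invokeMethod(o, name, new Class[]{}, new Object[]{});
    }

    /** Invoke a public method by name with the given argument types. */
    public static Object invokeMethod(Object o, String name,
                                      Class[] argTypes, Object[] params) {
        try {
            Method m = o.getClass().getMethod(name, argTypes);
            return m.invoke(o, params);
        } catch (ReflectiveOperationException e) {
            return null; // keep callers tolerant of version-specific methods
        }
    }
}
```

Both interpreters could then call the helper instead of carrying private copies.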

@bzz
Member

bzz commented Jul 12, 2016

Looks 👍 💯 to me

@lresende
Member Author

@bzz and @Leemoonsoo

I have updated the README to match the CI, and also refactored the reflection utility methods into a separate class.

@Leemoonsoo
Member

Looks like there was some network problem on the last CI build. Could you trigger CI once again and see if it goes green?

@lresende
Member Author

@Leemoonsoo all back to green

@Leemoonsoo
Member

Looks good to me!

@bzz
Member

bzz commented Jul 14, 2016

Looks great to me! Thank you @lresende

Let's merge if there is no further discussion

@Leemoonsoo
Member

Merging it into master and branch-0.6

@bzz
Member

bzz commented Jul 15, 2016

@Leemoonsoo could you look into the non-trivial merge conflicts that happen on merging to branch-0.6?

@Leemoonsoo
Member

@bzz Sure

asfgit pushed a commit that referenced this pull request Jul 15, 2016
### What is this PR for?
Add new interpreter to Python group: `%python.sql` for SQL over DataFrame support

### What type of PR is it?
Improvement

### TODOs
* [x] add new interpreter `%python.sql`
* [x] add test
* [x] make Python-dependent tests excluded from CI
   * PythonInterpreterWithPythonInstalledTest
   * PythonPandasSqlInterpreterTest
   * run manually by `mvn -Dpython.test.exclude='' test -pl python -am`
* [x] add docs `%python.sql`
* [x] make `%python.sql` fail gracefully when Pandas or pandasql is not installed
* [x] after #747 is merged - rebase and remove `-Dpython.test.exclude=''` from both profiles

### What is the Jira issue?
[ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115)

### How should this be tested?
`mvn -Dpython.test.exclude='' test -pl python -am` should pass or manually run
 - Given a DataFrame, e.g.

  ```
%python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
  ```
 - query it with SQL, like

  ```
%python.sql
SELECT * FROM rates LIMIT 10
  ```
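The idea behind `%python.sql` (run SQL over in-memory tabular data) can be sketched without Zeppelin. This is an illustrative stand-in only: the real interpreter uses Pandas plus pandasql, while the `sqldf` helper below is a hypothetical stdlib-sqlite3 version of the same pattern:

```python
# Minimal sketch of SQL-over-data, the idea behind %python.sql.
# Assumption: tables are dicts of column-named rows, not real DataFrames.
import sqlite3

def sqldf(query, tables):
    """Run a SQL query against {name: list-of-row-dicts} via an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    for name, rows in tables.items():
        cols = list(rows[0].keys())
        conn.execute(f"CREATE TABLE {name} ({', '.join(cols)})")
        conn.executemany(
            f"INSERT INTO {name} VALUES ({', '.join('?' for _ in cols)})",
            [tuple(r[c] for c in cols) for r in rows],
        )
    return conn.execute(query).fetchall()

rates = [{"age": 30, "job": "admin."}, {"age": 41, "job": "technician"}]
print(sqldf("SELECT job FROM rates WHERE age > 35", {"rates": rates}))
```

pandasql does essentially this behind the scenes: it loads the named DataFrames into SQLite and runs the query there.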

### Screenshots (if appropriate)
![screen shot 2016-07-11 at 23 56 04](https://cloud.githubusercontent.com/assets/5582506/16735171/1ebb9354-47c3-11e6-9354-6364e9374a20.png)

### Questions:
* Does the licenses files need update? No, no dependencies were included in source or binary release
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: Alexander Bezzubov <bzz@apache.org>

Closes #1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits:

0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed
aca2bdf [Alexander Bezzubov] Fix typos in docs ⚡
158ba6a [Alexander Bezzubov] Remove third-party dependant test from CI
5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example
72884c8 [Alexander Bezzubov] Add docs for %python.sql feature
e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable
76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter
f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menue
11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames
@lresende lresende deleted the scala-210-211 branch July 15, 2016 21:03
asfgit pushed a commit that referenced this pull request Jul 15, 2016
Enable Zeppelin to be built with both Scala 2.10
and Scala 2.11, mostly to start supporting interpreters
that are moving to Scala 2.11 only such as Spark.

Before testing this PR, one would need to [build Spark 1.6.1 for example with Scala 2.11](http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211) and [build Flink 1.0 with Scala 2.11](https://ci.apache.org/projects/flink/flink-docs-master/setup/building.html#scala-versions)

Author: Luciano Resende <lresende@apache.org>
Author: Lee moon soo <moon@apache.org>

Closes #747 from lresende/scala-210-211 and squashes the following commits:

b9bdf86 [Luciano Resende] Properly invoke createTempDir from spark utils
c208e69 [Luciano Resende] Fix class reference
87f46de [Luciano Resende] Force build
6e5e5ad [Luciano Resende] Refactor utility methods to helper class
4e2237a [Luciano Resende] Update readme to use profile to build scala 2.11 and match CI
dd79443 [Luciano Resende] Minor formatting change to force build
de4fc10 [Luciano Resende] Minor change to force build
9194218 [Lee moon soo] initialize imain
cbf84c7 [Luciano Resende] Force Scala 2.11 profile to be called
98790a6 [Luciano Resende] Remove obsolete/commented config
6e4f7b0 [Luciano Resende] Force scala-library dependency version based on scala
a3d0525 [Luciano Resende] Fix new code to support both scala versions
e068593 [Luciano Resende] Fix pom.xml merge conflict
736d055 [Lee moon soo] make binary built with scala 2.11 work with spark_2.10 binary
74d8a62 [Luciano Resende] Force close
9f5d2a2 [Lee moon soo] Remove unused methods
fc9e8a0 [Lee moon soo] Update ignite interpreter
6d3e7e2 [Lee moon soo] Update FlinkInterpreter
6b9ff1d [Lee moon soo] SparkContext sharing seems not working in scala 2.11, disable the test
9424769 [Lee moon soo] style
2ec51a3 [Lee moon soo] Fix reflection
c999a2d [Lee moon soo] fix style
dfe6e83 [Lee moon soo] Fix reflection around HttpServer and createTempDir
222e4e7 [Lee moon soo] Fix reflection on creating SparkCommandLine
112ae7d [Lee moon soo] Fix some reflections
b9e0e1e [Lee moon soo] scala 2.11 support for spark interpreter
c88348d [Lee moon soo] Initial scala-210, 211 support in the single binary
5c47d9a [Luciano Resende] [ZEPPELIN-605] Rewrite Spark interpreter based on Scala 2.11 support
a73b68d [Luciano Resende] [ZEPPELIN-605] Enable Scala 2.11 REPL support for Spark Interpreter
175be7a [Luciano Resende] [ZEPPELIN-605] Add Scala 2.11 build profile
82eaefa [Luciano Resende] [ZEPPELIN-605] Add support for Scala 2.11

(cherry picked from commit bd714c2)
Signed-off-by: Lee moon soo <moon@apache.org>
@Leemoonsoo Leemoonsoo mentioned this pull request Jul 16, 2016
asfgit pushed a commit that referenced this pull request Jul 24, 2016
### What is this PR for?
This PR implements Spark 2.0 support based on #747.
It takes the approach from #980, which reimplements the code in Scala.

You can try building this branch:

```
mvn clean package -Dscala-2.11 -Pspark-2.0 -Dspark.version=2.0.0-preview -Ppyspark -Psparkr -Pyarn -Phadoop-2.6 -DskipTests
```

### What type of PR is it?
Improvements

### Todos
* [x] - Spark 2.0 support
* [x] - Rebase after #747 merge
* [x] - Update LICENSE file
* [x] - Update related document (build)

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-759

### How should this be tested?

Build and try
```
mvn clean package -Dscala-2.11 -Pspark-2.0 -Dspark.version=2.0.0-preview -Ppyspark -Psparkr -Pyarn -Phadoop-2.6 -DskipTests
```

### Screenshots (if appropriate)
![spark2](https://cloud.githubusercontent.com/assets/1540981/16771611/fe804038-4805-11e6-8447-3fa4258bb51d.gif)

### Questions:
* Does the licenses files need update? yes
* Is there breaking changes for older versions? no
* Does this needs documentation? yes

Author: Lee moon soo <moon@apache.org>

Closes #1195 from Leemoonsoo/spark-20 and squashes the following commits:

d78b322 [Lee moon soo] trigger ci
8017e8b [Lee moon soo] Remove unnecessary spark.version property
e3141bd [Lee moon soo] restart sparkcluster before sparkr test
1493b2c [Lee moon soo] print spark standalone cluster log when ci test fails
a208cd0 [Lee moon soo] Debug sparkRTest
31369c6 [Lee moon soo] Update license
293896a [Lee moon soo] Update build instruction
862ff6c [Lee moon soo] Make ZeppelinSparkClusterTest.java work with spark 2
839912a [Lee moon soo] Update SPARK_HOME directory detection pattern for 2.0.0-preview in the test
3413707 [Lee moon soo] Update .travis.yml
02bcd5d [Lee moon soo] Update SparkSqlInterpreterTest
f06a2fa [Lee moon soo] Spark 2.0 support
asfgit pushed a commit that referenced this pull request Jul 24, 2016
(cherry picked from commit 8546666)
Signed-off-by: Lee moon soo <moon@apache.org>
PhilippGrulich pushed a commit to SWC-SENSE/zeppelin that referenced this pull request Aug 8, 2016
PhilippGrulich pushed a commit to SWC-SENSE/zeppelin that referenced this pull request Aug 8, 2016
PhilippGrulich pushed a commit to SWC-SENSE/zeppelin that referenced this pull request Aug 8, 2016
9 participants