4 changes: 2 additions & 2 deletions .travis.yml
@@ -35,11 +35,11 @@ matrix:
include:
# Test all modules with scala 2.10
- jdk: "oraclejdk7"
env: SPARK_VER="1.6.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Pr -Phadoop-2.3 -Ppyspark -Psparkr -Pscalding -Pexamples" BUILD_FLAG="package -Dscala-2.10 -Pbuild-distr" TEST_FLAG="verify -Pusing-packaged-distr" TEST_PROJECTS="-Dpython.test.exclude=''"
env: SPARK_VER="1.6.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Pr -Phadoop-2.3 -Ppyspark -Psparkr -Pscalding -Pexamples" BUILD_FLAG="package -Dscala-2.10 -Pbuild-distr" TEST_FLAG="verify -Pusing-packaged-distr" TEST_PROJECTS=""

# Test all modules with scala 2.11
- jdk: "oraclejdk7"
env: SPARK_VER="1.6.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Pr -Phadoop-2.3 -Ppyspark -Psparkr -Pscalding -Pexamples -Pscala-2.11" BUILD_FLAG="package -Dscala-2.11 -Pbuild-distr" TEST_FLAG="verify -Pusing-packaged-distr" TEST_PROJECTS="-Dpython.test.exclude=''"
env: SPARK_VER="1.6.1" HADOOP_VER="2.3" PROFILE="-Pspark-1.6 -Pr -Phadoop-2.3 -Ppyspark -Psparkr -Pscalding -Pexamples -Pscala-2.11" BUILD_FLAG="package -Dscala-2.11 -Pbuild-distr" TEST_FLAG="verify -Pusing-packaged-distr" TEST_PROJECTS=""

# Test spark module for 1.5.2
- jdk: "oraclejdk7"
35 changes: 32 additions & 3 deletions docs/interpreter/python.md
@@ -46,7 +46,7 @@ To access the help, type **help()**
## Python modules
The interpreter can use all modules already installed (with pip, easy_install...)

## Use Zeppelin Dynamic Forms
## Using Zeppelin Dynamic Forms
You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your Python code.

**Zeppelin Dynamic Form can only be used if the py4j Python library is installed on your system. If not, you can install it with `pip install py4j`.**
@@ -65,6 +65,7 @@ print (z.select("f1",[("o1","1"),("o2","2")],"2"))
print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
```


## Zeppelin features not fully supported by the Python Interpreter

* Interrupting a paragraph execution (`cancel()` method) is currently only supported on Linux and macOS. If the interpreter runs on another operating system (for instance MS Windows), interrupting a paragraph will close the whole interpreter. A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) is open to implement this feature in a future release of the interpreter.
@@ -94,7 +95,7 @@ z.show(plt, height='150px')


## Pandas integration
[Zeppelin Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides simple API to visualize data in Pandas DataFrames, same as in Matplotlib.
Apache Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. The Python interpreter leverages it to visualize Pandas DataFrames through a similar `z.show()` API, the same as with [Matplotlib integration](#matplotlib-integration).

Example:

@@ -104,6 +105,34 @@ rates = pd.read_csv("bank.csv", sep=";")
z.show(rates)
```
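As a rough illustration of what the Table Display System consumes, the sketch below renders a header and rows in Zeppelin's `%table` text format: tab-separated columns, newline-separated rows, header first. The helper name `to_zeppelin_table` is hypothetical, and the `max_rows` cut-off only mirrors the interpreter's "Max number of dataframe rows to display" setting; the real `show_dataframe` in `bootstrap.py` may differ in detail.

```python
from itertools import islice


def to_zeppelin_table(columns, rows, max_rows=1000):
    """Render columns and rows as Zeppelin %table text (hypothetical helper).

    Columns are joined with tabs and rows with newlines; the first line is
    the header. Tabs and newlines inside values are replaced by spaces so
    they cannot break the table layout.
    """
    def clean(value):
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(c) for c in columns)]
    # Stop after max_rows data rows, like a "max rows to display" limit
    for row in islice(rows, max_rows):
        lines.append("\t".join(clean(v) for v in row))
    return "%table " + "\n".join(lines) + "\n"
```

For example, `print(to_zeppelin_table(["age", "job"], [(30, "admin"), (45, "technician")]))` prints text that Zeppelin would display as a two-column table.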

## SQL over Pandas DataFrames

There is a convenience `%python.sql` interpreter that matches the Apache Spark experience in Zeppelin and enables using SQL to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and visualizing the results through the built-in [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table).

**Prerequisites**

- Pandas `pip install pandas`
- PandaSQL `pip install -U pandasql`

If the default bound interpreter is Python (first in the interpreter list, under the _Gear Icon_), you can simply use it as `%sql`, i.e.:

- first paragraph

```python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
```

- next paragraph

```sql
%sql
SELECT * FROM rates WHERE age < 40
```

Otherwise, it can be referred to as `%python.sql`.
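pandasql executes queries with SQLite, so SQLite syntax applies. The stdlib-only sketch below imitates that idea without pandas: tables are looked up by name in a namespace (much like the `globals()` passed to `sqldf` in `bootstrap_sql.py`), copied into an in-memory SQLite database and queried there. The `mini_sqldf` function and the `(columns, rows)` table representation are hypothetical stand-ins for DataFrames, not the pandasql API.

```python
import re
import sqlite3


def mini_sqldf(query, env):
    """Run `query` over tables found in `env` (hypothetical helper).

    Each table is a (columns, rows) tuple standing in for a DataFrame.
    Only names that appear as whole words in the query are loaded.
    Returns (header, rows).
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name, value in env.items():
            if not (isinstance(value, tuple) and len(value) == 2):
                continue  # not table-like, e.g. an int or a function
            if not re.search(r"\b%s\b" % re.escape(name), query):
                continue  # table not referenced by the query
            columns, rows = value
            cols = ", ".join('"%s"' % c for c in columns)
            marks = ", ".join("?" for _ in columns)
            conn.execute('CREATE TABLE "%s" (%s)' % (name, cols))
            conn.executemany('INSERT INTO "%s" VALUES (%s)' % (name, marks), rows)
        cursor = conn.execute(query)
        header = [d[0] for d in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()
```

Copying the data into SQLite keeps the SQL dialect well defined at the cost of duplicating each referenced table for every query.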


## Technical description

For in-depth technical details on current implementation plese reffer [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).
For in-depth technical details on current implementation please refer to [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).
67 changes: 37 additions & 30 deletions notebook/2BQA35CJZ/note.json

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions python/README.md
@@ -6,6 +6,14 @@ Current interpreter implementation spawns new system python process through `Pro

# Details

- **UnitTests**

To run the full suite of tests, including ones that depend on a real Python interpreter AND installed external libraries (like Pandas, Pandasql, etc.), run

```
mvn -Dpython.test.exclude='' test -pl python -am
```

- **Py4j support**

[Py4j](https://www.py4j.org/) enables Python programs to dynamically access Java objects in a JVM.
@@ -40,3 +48,5 @@ Current interpreter implementation spawns new system python process through `Pro
* JavaBuilder can't send a SIGINT signal to interrupt paragraph execution. Therefore the interpreter directly sends a `kill SIGINT PID` to the Python process to interrupt execution. The Python process catches the SIGINT signal with some code defined in bootstrap.py

* The Matplotlib display feature exports the plot as SVG (in a string) and then displays it with HTML code.

* `%python.sql` support for Pandas DataFrames is optional and provided by https://github.com/yhat/pandasql, if the user has it installed
5 changes: 4 additions & 1 deletion python/pom.xml
@@ -35,7 +35,10 @@

<properties>
<py4j.version>0.9.2</py4j.version>
<python.test.exclude>**/PythonInterpreterWithPythonInstalledTest.java</python.test.exclude>
<python.test.exclude>
**/PythonInterpreterWithPythonInstalledTest.java,
**/PythonInterpreterPandasSqlTest.java
</python.test.exclude>
</properties>

<dependencies>
@@ -204,15 +204,20 @@ private Job getRunningJob(String paragraphId) {
}


private String sendCommandToPython(String cmd) {
/**
* Sends the given text to the Python interpreter, blocks, and returns the output
* @param cmd Python expression text
* @return output
*/
String sendCommandToPython(String cmd) {
String output = "";
LOG.info("Sending : \n" + (cmd.length() > 200 ? cmd.substring(0, 200) + "..." : cmd));
LOG.debug("Sending : \n" + (cmd.length() > 200 ? cmd.substring(0, 200) + "..." : cmd));
try {
output = process.sendAndGetResult(cmd);
} catch (IOException e) {
LOG.error("Error when sending commands to python process", e);
}
//logger.info("Got : \n" + output);
LOG.debug("Got : \n" + output);
return output;
}

@@ -243,11 +248,7 @@ public Integer getPy4jPort() {

public Boolean isPy4jInstalled() {
String output = sendCommandToPython("\n\nimport py4j\n");
if (output.contains("ImportError")) {
return false;
} else {
return true;
}
return !output.contains("ImportError");
}

private int findRandomOpenPortOnAllLocalInterfaces() {
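`isPy4jInstalled()` and `isPandasAndPandasqlInstalled()` detect a missing library by searching the interpreter output for the string `ImportError`. A possible alternative, sketched here under the assumption that the check runs inside the spawned Python 3 process (`importlib.util.find_spec` does not exist on Python 2), is to ask the import system whether a module can be located without executing it. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found.

    find_spec() locates a module without running its code, so there is no
    need to parse interpreter output for "ImportError".
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["pandas", "pandasql"])` returns an empty list when both libraries are available, and otherwise names exactly the ones to `pip install`.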
@@ -0,0 +1,115 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.zeppelin.python;

import java.io.IOException;
import java.util.Properties;

import org.apache.zeppelin.interpreter.Interpreter;
import org.apache.zeppelin.interpreter.InterpreterContext;
import org.apache.zeppelin.interpreter.InterpreterResult;
import org.apache.zeppelin.interpreter.LazyOpenInterpreter;
import org.apache.zeppelin.interpreter.WrappedInterpreter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* SQL over Pandas DataFrame interpreter for %python group
*
* Matches the experience of %spark.sql over Spark DataFrames
*/
public class PythonInterpreterPandasSql extends Interpreter {
private static final Logger LOG = LoggerFactory.getLogger(PythonInterpreterPandasSql.class);

private String SQL_BOOTSTRAP_FILE_PY = "/bootstrap_sql.py";

public PythonInterpreterPandasSql(Properties property) {
super(property);
}

PythonInterpreter getPythonInterpreter() {
LazyOpenInterpreter lazy = null;
PythonInterpreter python = null;
Interpreter p = getInterpreterInTheSameSessionByClassName(PythonInterpreter.class.getName());

while (p instanceof WrappedInterpreter) {
if (p instanceof LazyOpenInterpreter) {
lazy = (LazyOpenInterpreter) p;
}
p = ((WrappedInterpreter) p).getInnerInterpreter();
}
python = (PythonInterpreter) p;

if (lazy != null) {
lazy.open();
}
return python;
}

@Override
public void open() {
LOG.info("Open Python SQL interpreter instance: {}", this.toString());
try {
LOG.info("Bootstrap {} interpreter with {}", this.toString(), SQL_BOOTSTRAP_FILE_PY);
PythonInterpreter python = getPythonInterpreter();
python.bootStrapInterpreter(SQL_BOOTSTRAP_FILE_PY);
} catch (IOException e) {
LOG.error("Can't execute " + SQL_BOOTSTRAP_FILE_PY + " to import SQL dependencies", e);
}
}

/**
* Checks if Python dependencies pandas and pandasql are installed
* @return True if they are
*/
boolean isPandasAndPandasqlInstalled() {
PythonInterpreter python = getPythonInterpreter();
String output = python.sendCommandToPython("\n\nimport pandas\nimport pandasql\n");
return !output.contains("ImportError");
}

@Override
public void close() {
LOG.info("Close Python SQL interpreter instance: {}", this.toString());
Interpreter python = getPythonInterpreter();
python.close();
}

@Override
public InterpreterResult interpret(String st, InterpreterContext context) {
LOG.info("Running SQL query: '{}' over Pandas DataFrame", st);
Interpreter python = getPythonInterpreter();
return python.interpret("z.show(pysqldf('" + st + "'))", context);
}

@Override
public void cancel(InterpreterContext context) {

}

@Override
public FormType getFormType() {
return FormType.SIMPLE;
}

@Override
public int getProgress(InterpreterContext context) {
return 0;
}

}
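`interpret()` splices the SQL text into `z.show(pysqldf('...'))` unchanged, so a query that itself contains a single quote (e.g. `WHERE job = 'admin'`) or a line break yields an invalid Python expression. The sketch below reproduces that concatenation and contrasts it with a hypothetical `build_show_command` that lets `repr()` produce a properly escaped string literal; the Java side would need equivalent escaping, which this Python sketch does not show.

```python
import ast


def naive_show_command(query):
    """Mirrors the concatenation in interpret(): no escaping at all."""
    return "z.show(pysqldf('" + query + "'))"


def build_show_command(query):
    """Hypothetical safer variant: repr() emits a valid Python string
    literal, escaping quotes, backslashes and newlines as needed."""
    return "z.show(pysqldf(%r))" % (query,)


def is_valid_python(expr):
    """True if `expr` parses as a single Python expression."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False
```

With `q = "SELECT * FROM df WHERE job = 'admin'"`, `is_valid_python(naive_show_command(q))` is `False`, while the `repr()`-based command parses and carries the query through unchanged.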
25 changes: 25 additions & 0 deletions python/src/main/resources/bootstrap.py
@@ -72,6 +72,8 @@ def help():
print ('''<pre>z.show(plt,width='50px')
z.show(plt,height='150px') </pre></div>''')
print ('<h3>Pandas DataFrame</h3>')
print ('<div> You need to have the Pandas module installed ')
print ('to use this functionality (pip install pandas) !</div><br/>')
print """
<div>The interpreter can visualize Pandas DataFrame
with the function z.show()
@@ -81,6 +83,27 @@ def help():
z.show(df)
</pre></div>
"""
print ('<h3>SQL over Pandas DataFrame</h3>')
print ('<div> You need to have the Pandas and Pandasql modules installed ')
print ('to use this functionality (pip install pandas pandasql) !</div><br/>')
print """
<div>The Python interpreter group includes a %sql interpreter that can query
Pandas DataFrames using SQL and visualize the results using the Zeppelin Table Display System

<pre>
%python
import pandas as pd
df = pd.read_csv("bank.csv", sep=";")
</pre>
<br />

<pre>
%python.sql
%sql
SELECT * from df LIMIT 5
</pre></div>
"""


class PyZeppelinContext(object):
""" If py4j is detected, these class will be override
Expand Down Expand Up @@ -109,6 +132,8 @@ def show(self, p, **kwargs):
# `isinstance(p, DataFrame)` would req `import pandas.core.frame.DataFrame`
# and so a dependency on pandas
self.show_dataframe(p, **kwargs)
elif hasattr(p, '__call__'):
p() #error reporting

def show_dataframe(self, df, **kwargs):
"""Pretty prints DF using Table Display System
28 changes: 28 additions & 0 deletions python/src/main/resources/bootstrap_sql.py
@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Setup SQL over Pandas DataFrames
# It requires the following dependencies to be installed:
# - pandas
# - pandasql

from __future__ import print_function

try:
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
except ImportError:
pysqldf = lambda q: print("Can not run SQL over Pandas DataFrame. " +
"Make sure 'pandas' and 'pandasql' libraries are installed")
6 changes: 6 additions & 0 deletions python/src/main/resources/interpreter-setting.json
Expand Up @@ -17,5 +17,11 @@
"description": "Max number of dataframe rows to display."
}
}
},
{
"group": "python",
"name": "sql",
"className": "org.apache.zeppelin.python.PythonPandasSqlInterpreter",
"properties": { }
}
]