@bzz
Member

@bzz bzz commented Jul 11, 2016

What is this PR for?

Add a new interpreter to the Python group, %python.sql, for SQL over DataFrame support

What type of PR is it?

Improvement

TODOs

  • add new interpreter %python.sql
  • add test
  • exclude Python-dependent tests from CI:
    • PythonInterpreterWithPythonInstalledTest
    • PythonPandasSqlInterpreterTest
    • run them manually with mvn -Dpython.test.exclude='' test -pl python -am
  • add docs for %python.sql
  • make %python.sql fail gracefully if Pandas or PandaSQL is not installed
  • after #747 ([ZEPPELIN-605] Add support for Scala 2.11) is merged, rebase and remove -Dpython.test.exclude='' from both profiles
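
One way to make %python.sql fail gracefully is to probe for its modules before running any query and return a readable message instead of a traceback. The following is a minimal sketch, assuming the check runs inside the Python process; `check_sql_dependencies` is a hypothetical helper name, not the interpreter's actual code.

```python
import importlib.util

# Modules %python.sql relies on, per the TODO item above (assumed list).
REQUIRED_MODULES = ("pandas", "pandasql")


def check_sql_dependencies(modules=REQUIRED_MODULES):
    """Hypothetical helper: return None if every module is importable,
    otherwise a human-readable error message listing the missing ones."""
    # find_spec() returns None for a top-level module that cannot be found,
    # without actually importing it.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    if not missing:
        return None
    return ("%python.sql requires the following Python modules, "
            "which are not installed: " + ", ".join(missing))


if __name__ == "__main__":
    error = check_sql_dependencies()
    print(error or "All %python.sql dependencies are available")
```

Returning a message rather than raising keeps the decision of how to surface the error (for example, as the paragraph result) with the caller.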

What is the Jira issue?

ZEPPELIN-1115

How should this be tested?

mvn -Dpython.test.exclude='' test -pl python -am should pass; alternatively, test manually:

  • Given a DataFrame, e.g.

    %python
    import pandas as pd
    rates = pd.read_csv("bank.csv", sep=";")
    
  • query it with SQL, e.g.

    %python.sql
    SELECT * FROM rates LIMIT 10
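
PandaSQL, which %python.sql depends on, roughly works by copying DataFrames into an in-memory SQLite database and running the query there. The sketch below illustrates that idea using only pandas and the standard-library sqlite3 module; it is not the interpreter's implementation, `query_dataframes` is a hypothetical helper, and a small made-up DataFrame stands in for bank.csv.

```python
import sqlite3

import pandas as pd


def query_dataframes(sql, frames):
    """Hypothetical helper: run `sql` against the DataFrames in `frames`
    (a dict of table name -> DataFrame) using an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        # Copy every DataFrame into a SQLite table named after its key.
        for name, df in frames.items():
            df.to_sql(name, conn, index=False)
        # Run the query and load the result back into a new DataFrame.
        return pd.read_sql_query(sql, conn)
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up stand-in for pd.read_csv("bank.csv", sep=";").
    rates = pd.DataFrame({"age": [30, 33, 35],
                          "job": ["admin.", "services", "management"]})
    print(query_dataframes("SELECT * FROM rates LIMIT 10", {"rates": rates}))
```

Because the query runs in SQLite, the SQL dialect available is SQLite's, not that of any other database.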
    

Screenshots (if appropriate)

screen shot 2016-07-11 at 23 56 04

Questions:

  • Do the license files need an update? No, no dependencies were included in the source or binary release
  • Are there breaking changes for older versions? No
  • Does this need documentation? Yes

@bzz bzz changed the title from "ZEPPELIN-1115:" to "ZEPPELIN-1115: Python - new interpreter for SQL over DataFrame" Jul 12, 2016
@bzz bzz changed the title from "ZEPPELIN-1115: Python - new interpreter for SQL over DataFrame" to "ZEPPELIN-1115: Python - interpreter for SQL over DataFrame" Jul 12, 2016

## Pandas integration
[Zeppelin Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides simple API to visualize data in Pandas DataFrames, same as in Matplotlib.
Apace Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides build-in data visualization capabilities. Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).
Member

Apache?

Member Author

Thank you for proof-reading! Late-night commits are bad...

Contributor

You mean built-in?
And how about adding the link http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html to "Pandas DataFrames"? It would be helpful to users, I think :)

(Great work indeed! 👍 )
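
For context on the z.show() API discussed in this thread: Zeppelin's table display system renders any output that starts with %table, with columns separated by tabs and rows by newlines. Below is a simplified sketch of turning a Pandas DataFrame into that text, assuming pandas is installed; `dataframe_to_table` and its `max_rows` cap are hypothetical and are not the interpreter's actual z.show() code.

```python
import pandas as pd


def dataframe_to_table(df, max_rows=1000):
    """Hypothetical helper: render a DataFrame as Zeppelin %table text,
    i.e. tab-separated columns and newline-separated rows."""

    def clean(value):
        # Tabs and newlines inside a cell would break the row/column layout,
        # so replace them with spaces (an assumption of this sketch).
        return str(value).replace("\t", " ").replace("\n", " ")

    header = "\t".join(clean(col) for col in df.columns)
    rows = ["\t".join(clean(v) for v in row)
            for row in df.head(max_rows).itertuples(index=False)]
    return "%table " + "\n".join([header] + rows) + "\n"


if __name__ == "__main__":
    df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
    print(dataframe_to_table(df), end="")
```

Capping the number of rows keeps very large DataFrames from flooding the notebook frontend.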

@bzz bzz force-pushed the ZEPPELIN-1115/python/add-sql-for-dataframes branch from d20c678 to 886949b July 13, 2016 00:42

## Technical description

For in-depth technical details on current implementation plese reffer [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).
Contributor

@AhyoungRyu AhyoungRyu Jul 13, 2016

There is a typo: "plese reffer" -> "please refer to"

@bzz
Member Author

bzz commented Jul 14, 2016

Documentation review addressed in e432961

@bzz
Member Author

bzz commented Jul 14, 2016

Feedback on graceful failure addressed in a378226

@khalidhuseynov
Member

Thanks for the improvement, LGTM

@bzz bzz force-pushed the ZEPPELIN-1115/python/add-sql-for-dataframes branch from 11da87c to a378226 July 14, 2016 06:05
@bzz
Member Author

bzz commented Jul 14, 2016

Thank you guys for the prompt reviews!

I have added one minor TODO item to clean up the test profiles on CI; will merge after #747

@bzz bzz force-pushed the ZEPPELIN-1115/python/add-sql-for-dataframes branch from a378226 to 0f2f852 July 15, 2016 08:07
@bzz
Member Author

bzz commented Jul 15, 2016

Done, merging after CI ♻️ if there is no further discussion

@asfgit asfgit closed this in d8b54cf Jul 15, 2016
@bzz bzz deleted the ZEPPELIN-1115/python/add-sql-for-dataframes branch July 15, 2016 09:38
PhilippGrulich pushed a commit to SWC-SENSE/zeppelin that referenced this pull request Aug 8, 2016
### What is this PR for?
Add a new interpreter to the Python group, `%python.sql`, for SQL over DataFrame support

### What type of PR is it?
Improvement

### TODOs
* [x] add new interpreter `%python.sql`
* [x] add test
* [x] exclude Python-dependent tests from CI:
   * PythonInterpreterWithPythonInstalledTest
   * PythonPandasSqlInterpreterTest
   * run them manually with `mvn -Dpython.test.exclude='' test -pl python -am`
* [x] add docs for `%python.sql`
* [x] make `%python.sql` fail gracefully if Pandas or PandaSQL is not installed
* [x] after apache#747 is merged, rebase and remove `-Dpython.test.exclude=''` from both profiles

### What is the Jira issue?
[ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115)

### How should this be tested?
`mvn -Dpython.test.exclude='' test -pl python -am` should pass; alternatively, test manually:
 - Given a DataFrame, e.g.

   ```
   %python
   import pandas as pd
   rates = pd.read_csv("bank.csv", sep=";")
   ```
 - query it with SQL, e.g.

   ```
   %python.sql
   SELECT * FROM rates LIMIT 10
   ```

### Screenshots (if appropriate)
![screen shot 2016-07-11 at 23 56 04](https://cloud.githubusercontent.com/assets/5582506/16735171/1ebb9354-47c3-11e6-9354-6364e9374a20.png)

### Questions:
* Do the license files need an update? No, no dependencies were included in the source or binary release
* Are there breaking changes for older versions? No
* Does this need documentation? Yes

Author: Alexander Bezzubov <bzz@apache.org>

Closes apache#1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits:

0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed
aca2bdf [Alexander Bezzubov] Fix typos in docs ⚡
158ba6a [Alexander Bezzubov] Remove third-party dependant test from CI
5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example
72884c8 [Alexander Bezzubov] Add docs for %python.sql feature
e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable
76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter
f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menue
11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames