ZEPPELIN-1115: Python - interpreter for SQL over DataFrame #1164
docs/interpreter/python.md
Outdated
## Pandas integration
[Zeppelin Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides simple API to visualize data in Pandas DataFrames, same as in Matplotlib.
Apace Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides build-in data visualization capabilities. Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).
Apache?
Thank you for proof-reading! Late night commits are bad...
You mean built-in?
And how about adding this link http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html to Pandas DataFrames? It would be helpful to users, I think :)
(Great work indeed! 👍 )
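For readers following this thread, here is a minimal sketch of the text format that the Zeppelin table display system accepts: output that starts with `%table`, with tab-separated cells and one row per line. The helper name `to_zeppelin_table` and the sample values are hypothetical, and this is not the `z.show()` implementation from this PR, only an illustration of the display format it targets.

```python
def to_zeppelin_table(header, rows):
    """Render a header and rows as Zeppelin table-display text.

    The format is '%table ' followed by tab-separated cells,
    with one row per line. This helper is hypothetical.
    """
    lines = ["\t".join(str(cell) for cell in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Hypothetical sample data standing in for a DataFrame's columns and values.
print(to_zeppelin_table(["age", "balance"], [[30, 1787], [33, 4789]]))
```

Printing such a string from a `%python` paragraph should be rendered by Zeppelin as a table, which is presumably what a DataFrame-aware `z.show()` does on the user's behalf.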
d20c678 to 886949b
docs/interpreter/python.md
Outdated
## Technical description

For in-depth technical details on current implementation plese reffer [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).
There is a typo. plese reffer -> please refer to
Documentation review addressed in e432961

feedback on graceful failure addressed in a378226

Thanks for the improvement, LGTM
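The graceful-failure change in a378226 is not shown in this conversation. As a rough sketch of the idea only, assuming the check happens on the Python side, one could probe for the two dependencies before running a query and return a readable message instead of a stack trace. The function names and the message text below are hypothetical; `pandas` and `pandasql` are the import names of the packages mentioned in the TODO list.

```python
import importlib.util


def missing_dependencies(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def check_sql_dependencies(required=("pandas", "pandasql")):
    """Return a user-facing error message, or None when everything is installed.

    Hypothetical helper: the real interpreter may word or surface this differently.
    """
    missing = missing_dependencies(required)
    if missing:
        return "%python.sql is unavailable, please install: " + ", ".join(missing)
    return None


message = check_sql_dependencies()
if message is not None:
    print(message)
```

Checking with `find_spec` avoids actually importing the packages, so the probe stays cheap even when both are present.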
11da87c to a378226
Thank you guys for prompt reviews! Have added one minor TODO item to cleanup test profiles on CI, will merge after #747
a378226 to 0f2f852
Done, merging after CI ♻️ if there is no further discussion
### What is this PR for?
Add new interpreter to Python group: `%python.sql` for SQL over DataFrame support

### What type of PR is it?
Improvement

### TODOs
* [x] add new interpreter `%python.sql`
* [x] add test
* [x] make Python-dependant tests, excluded from CI
  * PythonInterpreterWithPythonInstalledTest
  * PythonPandasSqlInterpreterTest
  * run manually by `mvn -Dpython.test.exclude='' test -pl python -am`
* [x] add docs `%python.sql`
* [x] make `%python.sql` fail gracefully in case there is no Pandas or PandaSQL installed
* [x] after apache#747 is merged - rebase and remove `-Dpython.test.exclude=''` from both profiles

### What is the Jira issue?
[ZEPPELIN-1115](https://issues.apache.org/jira/browse/ZEPPELIN-1115)

### How should this be tested?
`mvn -Dpython.test.exclude='' test -pl python -am` should pass or manually run
- Given the DataFrame i.e
```
%python
import pandas as pd
rates = pd.read_csv("bank.csv", sep=";")
```
- SQL query it like
```
%python.sql
SELECT * FROM rates LIMIT 10
```

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No, no dependencies were included in source or binary release
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: Alexander Bezzubov <bzz@apache.org>

Closes apache#1164 from bzz/ZEPPELIN-1115/python/add-sql-for-dataframes and squashes the following commits:

0f2f852 [Alexander Bezzubov] Fail SQL gracefully if no python dependencies installed
aca2bdf [Alexander Bezzubov] Fix typos in docs ⚡
158ba6a [Alexander Bezzubov] Remove third-party dependant test from CI
5fe46fc [Alexander Bezzubov] Update Python Matplotlib notebook example
72884c8 [Alexander Bezzubov] Add docs for %python.sql feature
e931dc4 [Alexander Bezzubov] Make test for PythonPandasSqlInterpreter usable
76bbb44 [Alexander Bezzubov] Complete implementation of the PythonPandasSqlInterpreter
f6ca1eb [Alexander Bezzubov] Add %python.sql to interpreter menue
11ba490 [Alexander Bezzubov] Add draft implementation of %python.sql for DataFrames
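To illustrate what "SQL over DataFrame" means in the test scenario above, here is a small stdlib-only sketch that loads rows into an in-memory SQLite table named after the Python variable and runs the same kind of `SELECT ... LIMIT` query against it. It is an assumption-laden stand-in, not the PR's implementation: the real interpreter relies on Pandas and PandaSQL, and the helper `query_rows`, the column names, and the sample rows are all hypothetical.

```python
import sqlite3


def query_rows(table_name, columns, rows, sql):
    """Load rows into an in-memory SQLite table and run a SQL query over it.

    Hypothetical helper: identifiers are interpolated directly, so it is
    only suitable for trusted, illustrative input.
    """
    conn = sqlite3.connect(":memory:")
    try:
        column_list = ", ".join('"{}"'.format(c) for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute('CREATE TABLE "{}" ({})'.format(table_name, column_list))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders), rows
        )
        cursor = conn.execute(sql)
        header = [description[0] for description in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


# Hypothetical rows standing in for the `rates` DataFrame read from bank.csv.
rates = [(30, "unemployed", 1787), (33, "services", 4789), (35, "management", 1350)]
header, result = query_rows(
    "rates", ["age", "job", "balance"], rates, "SELECT * FROM rates LIMIT 2"
)
print(header)
print(result)
```

The returned header and rows are the pieces a table display needs, which is why the SQL result can be shown with the same visualization as a DataFrame.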