Add run SQLFlow with hive server tutorial #868
I believe the reader of this document is a SQLFlow user, so the document should focus on deploying SQLFlow with their existing Hive service. As written, the content reads more like "Hi SQLFlow developers, here are two examples of setting up a SQLFlow-Hive testing environment".
We should assume the reader is not familiar with SQLFlow, so every new concept needs a detailed explanation. For example, when we mention `--datasource='hive://root:root@localhost:10000/'`, we should explain that the data source is in the format `hive://user:password@address/database?param_n=arg_n`, so that users can adapt it to their own use case.
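To make the format concrete, here is a minimal Python sketch that splits such a data source string into its parts using only the standard library. It is not SQLFlow code: the function name `parse_hive_datasource` is hypothetical, and it assumes the string follows ordinary URL syntax as in `hive://user:password@address/database?param_n=arg_n`.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_hive_datasource(datasource):
    """Split a hive:// data source string into its components.

    Illustrative helper only; SQLFlow does not expose this function.
    """
    parts = urlsplit(datasource)
    if parts.scheme != "hive":
        raise ValueError("expected a hive:// data source, got %r" % datasource)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # The path carries the database name; it may be empty,
        # as in 'hive://root:root@localhost:10000/'.
        "database": parts.path.lstrip("/"),
        # Everything after '?' becomes a plain key/value mapping.
        "params": dict(parse_qsl(parts.query)),
    }


if __name__ == "__main__":
    print(parse_hive_datasource("hive://root:root@localhost:10000/"))
```

A breakdown like this could accompany the prose explanation so that readers see which part of the string to change for their own user, host, port, and database.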
        
          
doc/run_with_hive.md (outdated)
| This document is a tutorial on how to run SQLFlow, which connects to the hive server2. | ||
|  | ||
| For the most production environment, the system administrators may setup hive server with [authentication configuration](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Authentication/SecurityConfiguration): e.g., KERBEROS, LDAP, PAM, or CUSTOM. | 
Do we support KERBEROS, LDAP, ...? If we support them all, we should provide specific examples.
sql-machine-learning/gohive uses beltran/gohive to connect to HiveServer2 with SASLTransport. I think we need to add more tests in sql-machine-learning/gohive to make sure it works well with SQLFlow; until then, we can leave a single example (PAM auth) here.
| Test SQLFlow by running a query in Jupyter Notebook | ||
|  | ||
| ``` bash | ||
| > docker run --rm --net=container:hive sqlflow/sqlflow \ | 
Do we need to add -p 8888:8888 so that the Jupyter server in the container can be accessed by the browser on the host?
The sqlflow Docker container shares the network stack of the hive container via `--net=container:hive`, and the hive container exposes the port to the host with `-p 8888:8888`.
@Yancey1989 Maybe the following structure would help.

# Run SQLFlow with Hive via HiveServer2

This tutorial explains how to connect SQLFlow with Hive via HiveServer2. It has been tested on the Hive version [...].

## Connect existing Hive Service

To connect an existing Hive Service, we only need to configure a data source string in the format of [...]

The data source string contains the credential and the configurations for connecting Hive. For example, if we want to connect the database [...]

Using the data source string, we can start an all-in-one SQLFlow container by running

``` bash
docker run --rm -p 8888:8888 sqlflow/sqlflow bash -c \
"sqlflowserver --datasource='hive://root:root@localhost:10000/iris' &
SQLFLOW_SERVER=localhost:50051 jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --NotebookApp.token=''"
```

Then we can open a web browser and go to [...]

## Connect standalone Hive server for testing

We also pack a Hive server in a Docker image for testing. ....

@typhoonzero @weiguoz Maybe we should let MySQL and MaxCompute tutorials follow a similar structure?
        
          
doc/run_with_hive.md (outdated)
      | @@ -0,0 +1,51 @@ | |||
| # Run SQLFlow with Hive via HiveServer2 | |||
How to connect Hive with SQLFlow
Thanks @tonyyang-svail, I updated this PR following your comments.
5927b2c to e52c83f
LGTM.
| To connect an existing Hive server instance, we only need to configure a `datasource` string in the format of | ||
|  | ||
| ``` text | ||
| hive://user:password@ip:port/dbname[?auth=<auth_mechanism>&session.<cfg_key1>=<cfg_value1>...&session<cfg_keyN>=valueN] | 
Missing dot: `session<cfg_keyN>=valueN]` => `session.<cfg_keyN>=valueN]`
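One way to avoid this kind of typo on the user side is to assemble the string programmatically. The sketch below is only an illustration under a few assumptions: the helper name `build_hive_datasource` is hypothetical, the `auth` value and the session key in the usage line are placeholders, and whether the driver decodes percent-escaped credentials has not been verified.

```python
from urllib.parse import quote, urlencode


def build_hive_datasource(user, password, host, port, dbname,
                          auth=None, session=None):
    """Assemble hive://user:password@ip:port/dbname[?auth=...&session.<key>=<value>...].

    Illustrative helper only; it is not part of SQLFlow or gohive.
    """
    params = {}
    if auth is not None:
        params["auth"] = auth
    # Every session configuration key gets the 'session.' prefix, dot included.
    for key, value in (session or {}).items():
        params["session." + key] = value

    # Percent-escape credentials so characters such as '@' or ':' do not
    # break the URL structure (assumption: the consumer decodes them).
    url = "hive://%s:%s@%s:%d/%s" % (
        quote(user, safe=""), quote(password, safe=""), host, port, dbname)
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(build_hive_datasource(
        "root", "root", "localhost", 10000, "iris",
        auth="PAM", session={"mapreduce.job.queuename": "default"}))
```

Because the prefix is added in one place, every session key ends up as `session.<cfg_key>` rather than `session<cfg_key>`.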
* run sqlflow with hive
* add how sqlflow connects with hive tutorial
fixed #835