Closed
50 commits
2a2bedc
BigQuery Interpreter for Apazhe Zeppelin
babupe-g Jul 12, 2016
75d8ee6
ZEPPELIN-1153 comments committed
babupe-g Jul 15, 2016
50c41fc
Modified code and license
babupe-g Jul 15, 2016
17846f1
Interpreter modification, License, doc changes
babupe-g Jul 15, 2016
a00b48e
Incorporated feedback
babupe-g Jul 15, 2016
089820b
Trying to add screenshot in README
babupe-g Jul 15, 2016
73e3f6d
Added technical description to bigquery.md
babupe-g Jul 17, 2016
2254a49
removed bad package from import
babupe-g Jul 18, 2016
6132d78
Fixing License and skipping failing tests
babupe-g Jul 18, 2016
87f5efe
Added path and specific wording
babupe-g Jul 18, 2016
4b82abd
Fixed license header and added manual unit test documentation
babupe-g Jul 18, 2016
11e88dc
Replaced license header with formatting
babupe-g Jul 18, 2016
5983e36
Exclude constants.json file for rat plugin since its static config file
babupe-g Jul 18, 2016
f872aa0
Removed unnecessary dependencies in pom.xml
babupe-g Jul 19, 2016
17fd4e8
Added cropped interpreter screenshot
babupe-g Jul 19, 2016
f318b20
Pushing cropped screenshots
babupe-g Jul 19, 2016
287744c
Merge branch 'babupe-bigquery' of https://github.com/babupe/zeppelin …
babupe-g Jul 19, 2016
31c373f
Fixed formatting with readme file
babupe-g Jul 19, 2016
4db74c1
Adding license stuff
babupe-g Jul 21, 2016
5a2e674
Added license info for Jackson library and added BQ API source
babupe-g Jul 21, 2016
3d5f8e7
Modified license header
babupe-g Jul 24, 2016
20962d2
Pushing license changes
babupe-g Jul 26, 2016
ae096d2
License changes
babupe-g Jul 26, 2016
aa52553
Merge branch 'master' of https://github.com/apache/zeppelin
babupe-g Jul 26, 2016
d90e10f
Removed BigQuery from notice
babupe-g Jul 26, 2016
e88b017
Created a new license file
babupe-g Jul 26, 2016
22e3487
Update LICENSE
babupe-g Jul 26, 2016
8fa647b
BigQuery Interpreter for Apazhe Zeppelin
babupe-g Jul 12, 2016
17f6d89
ZEPPELIN-1153 comments committed
babupe-g Jul 15, 2016
d85abd2
Modified code and license
babupe-g Jul 15, 2016
764385c
Interpreter modification, License, doc changes
babupe-g Jul 15, 2016
569757f
Incorporated feedback
babupe-g Jul 15, 2016
b6d181c
Trying to add screenshot in README
babupe-g Jul 15, 2016
d0c8e01
Added technical description to bigquery.md
babupe-g Jul 17, 2016
4a3153f
removed bad package from import
babupe-g Jul 18, 2016
bbf26cc
Added path and specific wording
babupe-g Jul 18, 2016
69cb724
Fixed license header and added manual unit test documentation
babupe-g Jul 18, 2016
e520b7b
Exclude constants.json file for rat plugin since its static config file
babupe-g Jul 18, 2016
4a1d29c
Removed unnecessary dependencies in pom.xml
babupe-g Jul 19, 2016
64affbb
Added cropped interpreter screenshot
babupe-g Jul 19, 2016
97874a4
Pushing cropped screenshots
babupe-g Jul 19, 2016
41e076e
Fixed formatting with readme file
babupe-g Jul 19, 2016
3be1912
New changes
babupe-g Jul 26, 2016
7d4f40b
Add exidentaly removed licenses due to merge conflict
bzz Jul 28, 2016
6a95333
Rename Apach2.0 license for google's code to adhere naming conventions
bzz Jul 28, 2016
fcab6b7
Merge pull request #1 from bzz/babupe-final
babupe-g Jul 28, 2016
03a777f
add docs for BigQuery auth outside of GCE
bzz Jul 29, 2016
64525b8
Fix typos in docs
bzz Jul 29, 2016
d3c2316
Merge pull request #2 from bzz/babupe-add-auth-docs
babupe-g Jul 30, 2016
ffed801
pushing BQ Exception to logs and Interpreter error output
babupe-g Jul 30, 2016
3 changes: 2 additions & 1 deletion LICENSE
@@ -251,6 +251,7 @@ The following components are provided under the Apache License. See project link
The text of each license is also included at licenses/LICENSE-[project]-[version].txt.

(Apache 2.0) Bootstrap v3.0.2 (http://getbootstrap.com/) - https://github.com/twbs/bootstrap/blob/v3.0.2/LICENSE
(Apache 2.0) Software under ./bigquery/* was developed at Google (http://www.google.com/). Licensed under the Apache v2.0 License.

========================================================================
BSD 3-Clause licenses
@@ -270,4 +271,4 @@ BSD 2-Clause licenses
The following components are provided under the BSD 3-Clause license. See file headers and project links for details.

(BSD 2 Clause) portions of SQLLine (http://sqlline.sourceforge.net/) - http://sqlline.sourceforge.net/#license
jdbc/src/main/java/org/apache/zeppelin/jdbc/SqlCompleter.java
jdbc/src/main/java/org/apache/zeppelin/jdbc/SqlCompleter.java
1 change: 0 additions & 1 deletion NOTICE
@@ -4,5 +4,4 @@ Copyright 2015 - 2016 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).


Portions of this software were developed at NFLabs, Inc. (http://www.nflabs.com)
109 changes: 109 additions & 0 deletions bigquery/README.md
@@ -0,0 +1,109 @@
# Overview
BigQuery interpreter for Apache Zeppelin

# Prerequisites
You can follow the instructions at [Apache Zeppelin on Dataproc](https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/blob/master/apache-zeppelin/README.MD) to bring up Zeppelin on Google Cloud Dataproc.
You can also install and run Zeppelin on Google Compute Engine.

# Unit Tests
The BigQuery unit tests are excluded from the default build because they depend on the external BigQuery service, which does not have a local mock at this point.

To run these tests manually, follow these steps:
* [Create a new project](https://support.google.com/cloud/answer/6251787?hl=en)
* [Create a Google Compute Engine instance](https://cloud.google.com/compute/docs/instances/create-start-instance)
* Copy the ID of the project that you created and add it to the property "projectId" in `resources/constants.json`
* Run the command `mvn <options> -Dbigquery.test.exclude='' test -pl bigquery -am`


# Interpreter Configuration

Configure the following properties during Interpreter creation.

<table class="table-configuration">
<tr>
<th>Name</th>
<th>Default Value</th>
<th>Description</th>
</tr>
<tr>
<td>zeppelin.bigquery.project_id</td>
<td> </td>
<td>Google Project Id</td>
</tr>
<tr>
<td>zeppelin.bigquery.wait_time</td>
<td>5000</td>
<td>Query Timeout in Milliseconds</td>
</tr>
<tr>
<td>zeppelin.bigquery.max_no_of_rows</td>
<td>100000</td>
<td>Max result set size</td>
</tr>
</table>
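
As a rough illustration of how the properties in the table above might be consumed, the sketch below reads them from a `java.util.Properties` object and falls back to the documented defaults. The class name `BigQueryConfigSketch`, its helper methods, and the decision to reject an unset project ID are assumptions made for this example and are not taken from the interpreter's source; only the property names and default values come from the table.

```java
import java.util.Properties;

// Hypothetical helper, not part of the interpreter: it only mirrors the documented defaults.
final class BigQueryConfigSketch {
    static final String PROJECT_ID = "zeppelin.bigquery.project_id";
    static final String WAIT_TIME = "zeppelin.bigquery.wait_time";
    static final String MAX_ROWS = "zeppelin.bigquery.max_no_of_rows";

    // No default is documented for the project ID, so this sketch rejects an unset value.
    static String projectId(Properties p) {
        String id = p.getProperty(PROJECT_ID, "").trim();
        if (id.isEmpty()) {
            throw new IllegalArgumentException(PROJECT_ID + " must be set");
        }
        return id;
    }

    // Query timeout in milliseconds; the documented default is 5000.
    static long waitTimeMillis(Properties p) {
        return Long.parseLong(p.getProperty(WAIT_TIME, "5000").trim());
    }

    // Maximum result set size; the documented default is 100000.
    static long maxRows(Properties p) {
        return Long.parseLong(p.getProperty(MAX_ROWS, "100000").trim());
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(PROJECT_ID, "my-sample-project"); // placeholder project ID
        System.out.println(projectId(p) + " " + waitTimeMillis(p) + " " + maxRows(p));
    }
}
```

Failing early on a missing project ID is one possible design choice; it would surface a configuration problem with a readable message before any query is attempted.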

# Connection
The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables.

# Google BigQuery API Javadoc
[API Javadocs](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/)
Member

This is JavaDoc for the artefact

<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>v2-rev265-1.21.0</version>

right?

AFAIK it's an open-source library, so would you be so kind as to add a link here to its source code, please? This could help future maintainers keep up with changes, etc.

Contributor Author

Got it. These packages are licensed under Apache 2.0. I have asked around to see if the code is publicly available.

Member

Any updates on this one?

[Source](http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar)

We used the curated veneer version of the Java APIs rather than the [idiomatic Java client](https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter, mainly for usability reasons.

# Enabling the BigQuery Interpreter

In a notebook, to enable the **BigQuery** interpreter, click the **Gear** icon and select **bigquery**.

# Using the BigQuery Interpreter

In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter and then input SQL statements against your datasets stored in BigQuery.
You can use [BigQuery SQL Reference](https://cloud.google.com/bigquery/query-reference) to build your own SQL.

For example, the following SQL returns the 10 departure airports with the most delayed departures in the public flights dataset:

```bash
%bigquery.sql
SELECT departure_airport, SUM(CASE WHEN departure_delay > 0 THEN 1 ELSE 0 END) AS no_of_delays
FROM [bigquery-samples:airline_ontime_data.flights]
GROUP BY departure_airport
ORDER BY 2 DESC
LIMIT 10
```

As another example, the following SQL finds the most commonly used Java packages in the GitHub data hosted in BigQuery:

```bash
%bigquery.sql
SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      content CONTAINS 'import'
      AND sample_path LIKE '%.java'
    HAVING
      LEFT(line, 6)='import' )
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  40
```
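
To make the `REGEXP_EXTRACT` step in the query above easier to follow, here is a small Java sketch that applies the same pattern to individual import lines. The class `ImportPackageSketch` and the sample lines are hypothetical, and `java.util.regex` is used as a stand-in for BigQuery's regular expression engine, so behavior may differ on other inputs. Because the character class is lowercase-only, the match stops before the capitalized class name and keeps only the package prefix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows what the query's pattern captures from an import line.
final class ImportPackageSketch {
    // Same pattern as in the REGEXP_EXTRACT call, written with Java string escaping.
    private static final Pattern PACKAGE = Pattern.compile(" ([a-z0-9\\._]*)\\.");

    // Returns the captured package prefix, or null when the line does not match.
    static String extractPackage(String line) {
        Matcher m = PACKAGE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractPackage("import java.util.List;"));                       // java.util
        System.out.println(extractPackage("import static org.junit.Assert.assertEquals;")); // org.junit
        System.out.println(extractPackage("int x = 1;"));                                   // null
    }
}
```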

# Sample Screenshot

![Zeppelin BigQuery](https://cloud.githubusercontent.com/assets/10060731/16938817/b9213ea0-4db6-11e6-8c3b-8149a0bdf874.png)
177 changes: 177 additions & 0 deletions bigquery/pom.xml
@@ -0,0 +1,177 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
Member

This is the standard license header that is used in the Apache Zeppelin project.
Could you please make sure that it's the same for all new files added in this PR?

~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<artifactId>zeppelin</artifactId>
<groupId>org.apache.zeppelin</groupId>
<version>0.7.0-SNAPSHOT</version>
</parent>

<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-bigquery</artifactId>
<packaging>jar</packaging>
<version>0.7.0-SNAPSHOT</version>
<name>Zeppelin: BigQuery interpreter</name>
<url>http://www.apache.org</url>

<dependencies>

<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>v2-rev265-1.21.0</version>
</dependency>
<dependency>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client</artifactId>
<version>${project.oauth.version}</version>
</dependency>
<dependency>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client-jackson2</artifactId>
<version>${project.http.version}</version>
</dependency>
<dependency>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client-jetty</artifactId>
<version>${project.oauth.version}</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.6</version>
</dependency>

<dependency>
<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-interpreter</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</dependency>

<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

<properties>
<project.http.version>1.21.0</project.http.version>
<project.oauth.version>1.21.0</project.oauth.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<bigquery.test.exclude>**/BigQueryInterpreterTest.java</bigquery.test.exclude>
Member

@bzz Jul 18, 2016

Excluding tests will fix the CI, but for future maintainers, who may not know much about this at first, I think we need to have a few things documented in ./bigquery/README.md:

  • a command for running the test manually: mvn -Dbigquery.test.exclude='' test -pl bigquery
  • the prerequisites that need to be configured (and how) in order for this test to pass locally

I tried on my local env and got

$mvn -Dbigquery.test.exclude='' test -pl bigquery
....
java.lang.NullPointerException: null
    at org.apache.zeppelin.bigquery.BigQueryInterpreter.run(BigQueryInterpreter.java:261)
    at org.apache.zeppelin.bigquery.BigQueryInterpreter.executeSql(BigQueryInterpreter.java:246)
    at org.apache.zeppelin.bigquery.BigQueryInterpreter.interpret(BigQueryInterpreter.java:287)
    at org.apache.zeppelin.bigquery.BigQueryInterpreterTest.badSqlSyntaxFails(BigQueryInterpreterTest.java:110)

sqlSuccess(org.apache.zeppelin.bigquery.BigQueryInterpreterTest)  Time elapsed: 0 sec  <<< ERROR!
java.lang.NullPointerException: null
    at org.apache.zeppelin.bigquery.BigQueryInterpreter.run(BigQueryInterpreter.java:261)
    at org.apache.zeppelin.bigquery.BigQueryInterpreter.executeSql(BigQueryInterpreter.java:246)
    at org.apache.zeppelin.bigquery.BigQueryInterpreter.interpret(BigQueryInterpreter.java:287)
    at org.apache.zeppelin.bigquery.BigQueryInterpreterTest.sqlSuccess(BigQueryInterpreterTest.java:101)

Results :

Tests in error:
  BigQueryInterpreterTest.badSqlSyntaxFails:110 » NullPointer
  BigQueryInterpreterTest.sqlSuccess:101 » NullPointer

Tests run: 2, Failures: 0, Errors: 2, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------

which does not leave many clues about what went wrong. What do you think, does it make sense or did I miss something here?

Member

A way to run it is documented under README.md now!

</properties>

<build>
<plugins>
<plugin>
<artifactId>maven-enforcer-plugin</artifactId>
<version>1.3.1</version>
<executions>
<execution>
<id>enforce</id>
<phase>none</phase>
</execution>
</executions>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<excludes>
<exclude>${bigquery.test.exclude}</exclude>
</excludes>
</configuration>
</plugin>

<plugin>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.8</version>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/../../interpreter/bqsql</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
<includeScope>runtime</includeScope>
</configuration>
</execution>
<execution>
<id>copy-artifact</id>
<phase>package</phase>
<goals>
<goal>copy</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/../../interpreter/bqsql</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
<includeScope>runtime</includeScope>
<artifactItems>
<artifactItem>
<groupId>${project.groupId}</groupId>
<artifactId>${project.artifactId}</artifactId>
<version>${project.version}</version>
<type>${project.packaging}</type>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>
org.apache.zeppelin.bigquery.BigQueryInterpreter
</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
</plugins>
</build>
</project>