
Python 3 fix gbq #4335

Closed
3 changes: 3 additions & 0 deletions client/app/pages/queries/query.html
@@ -283,6 +283,9 @@ <h3>
<span class="query-metadata__property" ng-if="queryResult.query_result.data.metadata.data_scanned">Data Scanned
<strong>{{ queryResult.query_result.data.metadata.data_scanned | prettySize}}</strong>
</span>
<span class="query-metadata__property" ng-if="queryResult.query_result.data.metadata.query_cost">Query cost in USD
<strong>{{ queryResult.query_result.data.metadata.query_cost }}</strong>
</span>
</span>

<div>
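For reference, a sketch of the query result payload this template binds to, inferred from the runner changes below; the field names follow the diff, the values are illustrative:

data = {
    'columns': [{'name': 'f0_', 'friendly_name': 'f0_', 'type': 'integer'}],
    'rows': [{'f0_': 1}],
    'metadata': {
        'data_scanned': 500 * 10**9,  # bytes; rendered through the prettySize filter
        'query_cost': '$0.55',        # present only when GBQ_EXPOSE_COST is enabled
    },
}
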
34 changes: 28 additions & 6 deletions redash/query_runner/big_query.py
@@ -1,5 +1,6 @@
import datetime
import logging
import os
import sys
import time
from base64 import b64decode
@@ -12,6 +13,7 @@
from redash.utils import json_dumps, json_loads

logger = logging.getLogger(__name__)
EXPOSE_COST = settings.parse_boolean(os.environ.get('GBQ_EXPOSE_COST', 'false'))

try:
import apiclient.errors
@@ -92,7 +94,17 @@ def enabled(cls):

@classmethod
def configuration_schema(cls):
return {
if EXPOSE_COST:
schema['order'].append('cost_per_tb')
schema['properties'].update({
'cost_per_tb': {
'type': 'number',
'title': 'Google Big Query cost per Tb scanned',
'default': 1.1
Member: Default pricing for querying is $5 per TB, no?

Author: Oh, my bad, that was the BigQuery Storage API pricing: "$1.10 per TB read. The BigQuery Storage API is not included in the free tier." Will edit ASAP.

}
})

schema.update({
'type': 'object',
'properties': {
'projectId': {
@@ -132,7 +144,9 @@ def configuration_schema(cls):
'required': ['jsonKeyFile', 'projectId'],
"order": ['projectId', 'jsonKeyFile', 'loadSchema', 'useStandardSql', 'location', 'totalMBytesProcessedLimit', 'maximumBillingTier', 'userDefinedFunctionResourceUri'],
'secret': ['jsonKeyFile']
}
})

return schema
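
Note that, as ordered in the diff above, the EXPOSE_COST branch appears to mutate schema['order'] and schema['properties'] before schema.update() defines those keys, so the appended 'cost_per_tb' entry would either raise or be overwritten. A minimal sketch of one working ordering (base schema body abbreviated; not the PR's final code):

@classmethod
def configuration_schema(cls):
    schema = {
        'type': 'object',
        'properties': {},  # projectId, jsonKeyFile, etc., as in the diff
        'required': ['jsonKeyFile', 'projectId'],
        'order': ['projectId', 'jsonKeyFile', 'loadSchema', 'useStandardSql',
                  'location', 'totalMBytesProcessedLimit', 'maximumBillingTier',
                  'userDefinedFunctionResourceUri'],
        'secret': ['jsonKeyFile'],
    }

    if EXPOSE_COST:  # mutate only after the base keys exist
        schema['order'].append('cost_per_tb')
        schema['properties']['cost_per_tb'] = {
            'type': 'number',
            'title': 'Google Big Query cost per Tb scanned',
            'default': 1.1,
        }

    return schema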

def _get_bigquery_service(self):
scope = [
@@ -146,7 +160,7 @@ def _get_bigquery_service(self):
http = httplib2.Http(timeout=settings.BIGQUERY_HTTP_TIMEOUT)
http = creds.authorize(http)

return build("bigquery", "v2", http=http)
return build("bigquery", "v2", http=http, cache_discovery=False)
Member: What does this do?

Author (@Yuriowindiatmoko2401, Nov 10, 2019): It's a bug fix for the "ImportError: file_cache is unavailable" error that can occur when using the Python client with a Google service account.

Member: This is a known issue in the google-api-python-client library: googleapis/google-api-python-client#299 (comment). I would at least apply this fix.
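
For context, a minimal sketch of the call in question, assuming google-api-python-client and httplib2 are installed: build() tries by default to cache the discovery document, and that cache's oauth2client-based backend fails on newer oauth2client releases; cache_discovery=False simply skips the cache.

import httplib2
from googleapiclient.discovery import build

http = httplib2.Http(timeout=30)  # in Redash this is authorized with service-account credentials
# cache_discovery=False disables the discovery-document cache whose backend
# fails with "ImportError: file_cache is unavailable" on oauth2client >= 4.0.
service = build('bigquery', 'v2', http=http, cache_discovery=False)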


def _get_project_id(self):
return self.configuration["projectId"]
@@ -207,7 +221,7 @@ def _get_query_result(self, jobs, query):

rows = []

while ("rows" in query_reply) and current_row < query_reply['totalRows']:
while ("rows" in query_reply) and int(current_row) < int(query_reply['totalRows']):
for row in query_reply["rows"]:
rows.append(transform_row(row, query_reply["schema"]["fields"]))
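
The added int() casts are a Python 3 correctness fix: the BigQuery REST API returns int64 fields such as totalRows as JSON strings, and while Python 2 silently compared an int to a str, Python 3 raises a TypeError. A tiny illustration with hypothetical reply values:

current_row = 0
query_reply = {'totalRows': '2', 'rows': ['r1', 'r2']}  # hypothetical reply

# Python 2: current_row < query_reply['totalRows'] "worked" (comparing by type name).
# Python 3: TypeError: '<' not supported between instances of 'int' and 'str'.
assert int(current_row) < int(query_reply['totalRows'])  # well-defined on both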

@@ -231,12 +245,20 @@
else types_map.get(f['type'], "string")
} for f in query_reply["schema"]["fields"]]

qbytes = int(query_reply['totalBytesProcessed'])

data = {
"columns": columns,
"rows": rows,
'metadata': {'data_scanned': int(query_reply['totalBytesProcessed'])}
'metadata': {
'data_scanned': qbytes
}
}

if EXPOSE_COST:
price = self.configuration.get('cost_per_tb', 1.1)
data['metadata'].update({'query_cost': '${0:.2f}'.format(price * qbytes * 1e-12)})

return data
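
A hypothetical worked example of the cost line, assuming 500 GB scanned at the 1.1 default:

price = 1.1                     # USD per TB, from the cost_per_tb setting
qbytes = 500 * 10**9            # 500 GB scanned (hypothetical)
cost = price * qbytes * 1e-12   # 1 TB = 1e12 bytes
print('${0:.2f}'.format(cost))  # -> $0.55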

def _get_columns_schema(self, table_data):
@@ -317,4 +339,4 @@ def run_query(self, query, user):
return json_data, error


register(BigQuery)