Closes #927, #928: Schema refresh improvements. #930

emtwo · 2019-04-03T16:30:13Z

This PR makes a variety of changes to fix bugs and more gracefully handle schema processing for data sources with many tables. A summary:

Batch insert and update of column and table metadata
Time limit on schema refresh increased to a soft limit of 5 min and a hard limit of 10 min
Column types are truncated to fit the 255-character limit
Fixed bug with str() function on TableMetadata accessing a non-existent variable
Refresh schema every 12 hours instead of 30 min
Check whether we should collect data samples before we spin off tasks to do so
When a user requests a refresh, do it asynchronously
Don’t force a schema refresh when there is no schema data to display (this clogs up the queue); just wait for the next refresh
Changes to data sources (in data_sources.py) should trigger a refresh for the changed data source only (instead of all data sources)
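
For readers skimming the summary, two of the items above (truncating column types and batching metadata writes) can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper names `truncate_column_type` and `chunked` and the batch size are hypothetical, and the only value taken from the summary is the 255 character limit.

```python
# Illustrative sketch only; helper names and batch size are hypothetical.
MAX_COLUMN_TYPE_LENGTH = 255  # limit named in the PR summary


def truncate_column_type(column_type, limit=MAX_COLUMN_TYPE_LENGTH):
    """Cut a column type string down to at most `limit` characters."""
    if column_type is None:
        return None
    return column_type[:limit]


def chunked(items, size):
    """Yield lists of at most `size` items so rows can be written per batch."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Example: prepare column rows and group them into batches of 2.
columns = [
    {"name": "id", "type": "integer"},
    {"name": "payload", "type": "struct<" + "x" * 300 + ">"},
    {"name": "created_at", "type": "timestamp"},
]
rows = [dict(c, type=truncate_column_type(c["type"])) for c in columns]
batches = list(chunked(rows, 2))
```

Each batch could then be handed to a single bulk insert or update call instead of one statement per column, which is the general idea behind the first bullet.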

washort

nice improvement, r+

washort · 2019-04-03T20:42:33Z

redash/settings/__init__.py

@@ -46,7 +46,7 @@ def all_settings():
 QUERY_RESULTS_CLEANUP_COUNT = int(os.environ.get("REDASH_QUERY_RESULTS_CLEANUP_COUNT", "100"))
 QUERY_RESULTS_CLEANUP_MAX_AGE = int(os.environ.get("REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE", "7"))

-SCHEMAS_REFRESH_SCHEDULE = int(os.environ.get("REDASH_SCHEMAS_REFRESH_SCHEDULE", 30))
+SCHEMAS_REFRESH_SCHEDULE = int(os.environ.get("REDASH_SCHEMAS_REFRESH_SCHEDULE", 720))
 SCHEMAS_REFRESH_QUEUE = os.environ.get("REDASH_SCHEMAS_REFRESH_QUEUE", "celery")


With the faster schema refresh do we still want this longer interval?

It's not necessary at the moment, but I'm being cautious because the column examples are still a bit slow since they do select * from table limit 1 for each table. We can always turn it back up if it seems too slow I think.
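
For context on the interval discussed here: the setting appears to be expressed in minutes, since the default moves from 30 to 720 and the summary describes that as 30 min to 12 hours. A minimal sketch of how the default and an environment override interact, assuming that unit (the function wrapper is hypothetical; only the variable name and the defaults come from the diff):

```python
import os


def schemas_refresh_schedule(environ=os.environ):
    """Return the refresh interval, read the same way as in the diff above."""
    # Assumption: the value is in minutes, so 720 corresponds to 12 hours.
    return int(environ.get("REDASH_SCHEMAS_REFRESH_SCHEDULE", 720))


default_minutes = schemas_refresh_schedule({})
override_minutes = schemas_refresh_schedule(
    {"REDASH_SCHEMAS_REFRESH_SCHEDULE": "30"}
)
default_hours = default_minutes / 60
```

So a deployment that wants the old cadence back could set `REDASH_SCHEMAS_REFRESH_SCHEDULE=30` without a code change.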

redash/tasks/queries.py

emtwo requested a review from jezdez April 3, 2019 16:35

washort approved these changes Apr 3, 2019


emtwo force-pushed the emtwo/fix_schema_bugs branch from 527eddc to 01ef94d Compare April 3, 2019 22:50

Closes #927, #928: Schema refresh improvements.

f7c12a2

emtwo force-pushed the emtwo/fix_schema_bugs branch from 01ef94d to f7c12a2 Compare April 3, 2019 22:52

emtwo merged commit 1662f47 into master Apr 3, 2019

emtwo deleted the emtwo/fix_schema_bugs branch April 3, 2019 23:05

jezdez mentioned this pull request Apr 4, 2019

Schema refresh review fixes. #931

Merged

emtwo pushed a commit that referenced this pull request Apr 4, 2019

Another follow-up to #930

a1bbbdb

emtwo pushed a commit that referenced this pull request Apr 4, 2019

Another follow-up to #930

1dff4c4

emtwo pushed a commit that referenced this pull request Apr 4, 2019

Another follow-up to #930

63cb0d5

jezdez mentioned this pull request May 13, 2019

M21 rebase #952

Closed

2 tasks
