
Add link to scheduled pipeline #7536

Closed

wants to merge 11 commits

Conversation

betodealmeida
Member

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

This PR makes it possible to add a link from the scheduled query to the pipeline running it. The user can provide a URL template that is formatted with the query's attributes to produce a link to the corresponding pipeline (see the example in the updated docs).
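As a rough sketch of the mechanism described above (the helper name, template, and query fields here are illustrative assumptions, not the actual code in this PR): a user-provided URL template is filled in with the scheduled query's attributes via ordinary string formatting.

```python
def get_scheduler_link(url_template: str, query: dict) -> str:
    """Fill a user-provided URL template with the query's attributes.

    Illustrative helper; the PR's real helper functions may differ.
    """
    return url_template.format(**query)


# Hypothetical template pointing at an Airflow DAG named after the
# query id and output table (names are assumptions, not this PR's code).
template = (
    "https://airflow.example.com/admin/airflow/tree"
    "?dag_id=query_{id}_{output_table}"
)
print(get_scheduler_link(template, {"id": 42, "output_table": "revenue_daily"}))
# → https://airflow.example.com/admin/airflow/tree?dag_id=query_42_revenue_daily
```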

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

[Screenshot: Screen Shot 2019-05-16 at 10 28 18 PM]

I also added some CSS to remove the disabled + button in the form, so it looks better.

TEST PLAN

Tested locally, and added unit tests for the helper functions.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@khtruong @DiggidyDave @datability-io

Alex Berghage and others added 4 commits May 15, 2019 15:32
) (apache#7518)

* [WIP] Live query validation, where supported

This builds on apache#7422 to add check-as-you-type SQL
query validation in SQL Lab. This closes apache#6707 too.

It adds a (debounced) call to the validate_sql_json
API endpoint with the querytext, and on Lyft infra is
able to return feedback to the user (end to end) in
$TBD seconds.

At present feedback is provided only through the
"annotations" mechanism built into ACE, although
I'd be open to adding full text elsewhere on the
page if there's interest.

* fix: Unbreak lints and tests
…#7517) (apache#7519)

This change makes the query progress bar only show
whole number percentage changes, instead of numbers
like 12.13168276%.
* Making Talisman configurable

* Fixing double quotes

* Fixing flake8

* Removing default
@codecov-io

codecov-io commented May 17, 2019

Codecov Report

Merging #7536 into lyft-develop will decrease coverage by <.01%.
The diff coverage is 45%.

Impacted file tree graph

@@               Coverage Diff                @@
##           lyft-develop    #7536      +/-   ##
================================================
- Coverage         65.19%   65.18%   -0.01%     
================================================
  Files               433      434       +1     
  Lines             21431    21446      +15     
  Branches           2362     2368       +6     
================================================
+ Hits              13971    13980       +9     
- Misses             7340     7346       +6     
  Partials            120      120
Impacted Files Coverage Δ
superset/views/sql_lab.py 85.71% <0%> (-1.25%) ⬇️
superset/assets/src/showSavedQuery/index.jsx 0% <0%> (ø) ⬆️
superset/assets/src/showSavedQuery/utils.js 100% <100%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0acbb04...0702068.

@@ -906,6 +906,12 @@ To allow scheduled queries, add the following to your `config.py`:
'container': 'end_date',
},
],
# link to the scheduler; this example links to an Airflow pipeline
# that uses the query id and the output table as its name
'linkback': (
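(The hunk above is truncated by the review view. Purely for illustration, an entry of this shape might look like the following; the template and hostname are assumptions, not the text of the updated docs:)

```
'linkback': (
    'https://airflow.example.com/admin/airflow/tree'
    '?dag_id=query_{id}_{output_table}'
),
```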
Contributor

Is this not a circular dependency? Superset should not know anything about the scheduler (eg Airflow), should it? The scheduler knows about superset, and grabs work from a known endpoint, and neither the user nor superset system itself should actually care who is doing that work.

I think we should consider letting any arbitrary scheduler PUT back information (like a URL) about how to view its pipelines, or whatever representation it uses for the work it is doing.

Member Author

I agree it establishes a bi-directional connection, but Superset still doesn't know anything about Airflow with this (it's just an example config). The user is simply saying "when you show the scheduled information, put a link to this URL", and Superset does.

Contributor

But it does require the running instance of superset to have internal airflow details (via its configuration). This has a "correctness" problem IMO which could manifest as actual issues. It requires the configurator of superset to know (at deployment time?) who will be servicing these and what their URLs look like.

With this approach it is coupled such that it prevents the possibility of multiple systems being able to service these scheduled queries, or if the owners of those services decide to migrate them to a new system it will break the feature in superset. Imagine if the load was migrated partially to another internal system like flyte for example, this would unnecessarily cause us to have to do significant eng work to accommodate that (if it even can be accommodated at all), whereas if the servicer itself PUTs the URL to superset, we don't have any concerns or opinions about that at all, it will just work.

Member Author

But it does require the running instance of superset to have internal airflow details (via its configuration). This has a "correctness" problem IMO which could manifest as actual issues. It requires the configurator of superset to know (at deployment time?) who will be servicing these and what their URLs look like.

The SCHEDULED_QUERIES feature flag config is a way of informing Superset of the internals of a scheduler: it basically tells Superset what information is needed for a given scheduler. I don't see how the linkback is different from the information stored in the configuration, since the configuration is already scheduler-specific.

With this approach it is coupled such that it prevents the possibility of multiple systems being able to service these scheduled queries, or if the owners of those services decide to migrate them to a new system it will break the feature in superset. Imagine if the load was migrated partially to another internal system like flyte for example, this would unnecessarily cause us to have to do significant eng work to accommodate that (if it even can be accommodated at all), whereas if the servicer itself PUTs the URL to superset, we don't have any concerns or opinions about that at all, it will just work.

Migrating to a new scheduler would most probably require updating the extra_json in all the existing queries, in addition to updating the SCHEDULED_QUERIES config, so significant engineering work would already be expected.

And while I agree that having the consumers PUT the URL would be nice because it could support multiple schedulers (and we'd get the information from the system that knows more about it), I don't think it's a scenario likely to happen in practice.

I'm also worried about PUTting the URL because in order for the consumer to update the scheduled query with the pipeline URL it needs to impersonate the user, opening a backdoor for running arbitrary queries in the user's name. And technically it could also result in race conditions, but I think that's an unlikely scenario.
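The PUT-back alternative debated above could be sketched roughly as follows. The endpoint path and payload are invented for illustration; no such API exists in this PR or in Superset:

```python
import json
from urllib.request import Request


def build_linkback_put(superset_base: str, query_id: int, pipeline_url: str) -> Request:
    """Build the PUT request a scheduler could send to report its pipeline URL.

    The endpoint path below is hypothetical, purely to illustrate the
    reviewer's proposal; it is not a real Superset API.
    """
    return Request(
        f"{superset_base}/api/saved_query/{query_id}/linkback",  # hypothetical endpoint
        data=json.dumps({"pipeline_url": pipeline_url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_linkback_put(
    "http://localhost:8088", 42, "https://airflow.example.com/tree?dag_id=query_42"
)
```

In this model the scheduler, which knows its own URLs, pushes the link after picking up the work, so Superset's config never encodes scheduler internals; the trade-off raised above is that the scheduler then needs credentials to write back to the saved query.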

betodealmeida and others added 3 commits May 17, 2019 17:30
* Validate start/end when scheduling queries

* Use chrono instead of Sugar
* Show scheduled queries

* Remove column

* Secure views

* Add import

* Fix unit tests

* Reuse existing db connection from view

* Remove unnecessary import
@betodealmeida
Member Author

@DiggidyDave is this good to go then?

@DiggidyDave
Contributor

👍

@betodealmeida
Member Author

Closing since I merged to master in #7584.

5 participants