Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC server (#1274) #1301

Merged
merged 8 commits into from
Mar 5, 2019
Merged

RPC server (#1274) #1301

merged 8 commits into from
Mar 5, 2019

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Feb 15, 2019

dbt RPC server - resolves #1274

This implements a basic json-rpc server using werkzeug for routing. You can compile/run sql with the compile/run RPC commands, both of which accept a mandatory name parameter, a mandatory base64-encoded sql parameter, and an optional integer timeout parameter. ref, source, and all jinja things work like you'd expect, the created pseudo-nodes go through the regular compilation model.

In [10]: def encode_request(methodName, sql, request_id=1, timeout=None):
    ...:     encoded = base64.standard_b64encode(sql.encode('utf-8')).decode('utf-8')
    ...:     return { 'jsonrpc': '2.0', 'method': methodName, 'params': {'sql': encoded, 'name': 'foo', 'timeout': timeout}, 'id': request_id}

In [11]: requests.post(url, headers=headers, data=json.dumps(encode_request('run', 'select pg_sleep(3)', timeout=1))).json()
Out[11]:
{'error': {'code': -32000,
  'message': 'Server error',
  'data': {'type': 'RPCException',
   'args': ['timed out after 1s'],
   'message': 'timed out after 1s'}},
 'id': 1,
 'jsonrpc': '2.0'}

In [12]: requests.post(url, headers=headers, data=json.dumps(encode_request('run', 'select * from {{ ref("model_1") }}', timeout=1))).json()
Out[12]:
{'result': {'raw_sql': 'select * from {{ ref("model_1") }}',
  'compiled_sql': 'select * from "postgres"."dbt_postgres_schema_1"."model_1"',
  'timing': [{'name': 'compile',
    'started_at': '2019-02-19T17:30:22.615144Z',
    'completed_at': '2019-02-19T17:30:22.619642Z'},
   {'name': 'execute',
    'started_at': '2019-02-19T17:30:22.619742Z',
    'completed_at': '2019-02-19T17:30:22.633849Z'}],
  'table': [{'id': 1.0}]},
 'id': 1,
 'jsonrpc': '2.0'}

In [13]: requests.post(url, headers=headers, data=json.dumps(encode_request('compile', 'select * from {{ refff("model_1") }}', timeout=1))).json()
Out[13]:
{'error': {'code': -32000,
  'message': 'Server error',
  'data': {'type': 'RPCException',
   'args': ["Compilation Error in rpc call foo (from remote system)\n  'refff' is undefined"],
   'message': "Compilation Error in rpc call foo (from remote system)\n  'refff' is undefined"}},
 'id': 1,
 'jsonrpc': '2.0'}

@beckjake beckjake changed the title RPC server RPC server WIP Feb 15, 2019
@cmcarthur
Copy link
Member

@beckjake this looks really good! (and yeah, there's more work to do)

can you rebase this against dev/wilt-chamberlain? I've just created & pushed it

also, sometime between now and Weds can you spend some time taking your bulleted todo list and writing up some issues? this is a big project, and I think it'd be too much to try to tackle all of that in this branch.

@beckjake beckjake force-pushed the feature/rpc-tasks branch 2 times, most recently from 26131d6 to 8b48bf5 Compare February 19, 2019 17:42
- make tasks all have a "from_args" that handles initializing correct config type, etc
- make it possible to process a single node's refs at a time
- Make remote run/compile tasks + rpc server task, wire them up
- add ref() and source() support, and vestigial doc() support
- refactor results a bit to support new result behavior
- don't write to the filesystem on requests
- handle uniqueness issues
@beckjake
Copy link
Contributor Author

also, sometime between now and Weds can you spend some time taking your bulleted todo list and writing up some issues? this is a big project, and I think it'd be too much to try to tackle all of that in this branch.

Oops, I finished them all instead! I still have to write tests but this seems to work ok

@beckjake beckjake marked this pull request as ready for review February 19, 2019 21:42
@beckjake beckjake changed the base branch from dev/stephen-girard to dev/wilt-chamberlain February 19, 2019 21:42
@beckjake beckjake changed the title RPC server WIP RPC server (#1274) Feb 19, 2019
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks so good! I'm going to spend some time playing with it. You indicated that there may be some tweaks to make around error handling / formatting. Can check that out, and will also report back with any other initial thoughts too.

This is really, very exciting!

core/dbt/compilation.py Show resolved Hide resolved
@@ -0,0 +1,8 @@
from contextlib import contextmanager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's up with this file?

Copy link
Contributor Author

@beckjake beckjake Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is what happens when you git add core and have dead code lying around...

core/dbt/contracts/results.py Show resolved Hide resolved
core/dbt/main.py Show resolved Hide resolved
core/dbt/main.py Outdated
run_sub = _build_run_subparser(subs, base_subparser)
compile_sub = _build_compile_subparser(subs, base_subparser)
generate_sub = _build_docs_generate_subparser(docs_subs, base_subparser)
_add_common_arguments(run_sub, compile_sub, generate_sub)
_add_common_arguments(run_sub, compile_sub, generate_sub, rpc_sub)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the common args like --models, --exclude, --non-destructive, and --full-refresh applicable for the rpc task? I imagine that --threads and --no-version-check might be

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that's what this does!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok! I think we probably don't want --models, --exclude, --non-destructive, and --full-refresh to be acceptable flags to this task. I don't think they're sensible in the context of a client <> server paradigm. Can we remove these?

I could buy the use case for --threads and --no-version-check though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, and since --models, --exclude are both ignored, removing them won't change anything.

It's worth pointing out that although I'm not sure of the ultimate consequences, --full-refresh and --non-destructive do have impacts on the execution of code in the dbt rpc server model! It flips a global flag value stored in dbt.flags that's then accessible to macros and such, and some adapter methods also inspect them. I think the solution here is probably that flags should be an object generated at runtime based on config/args and passed around as needed, rather than a module with attributes that are set during argument parsing. And then we should accept them in RPC calls, probably.

I think --no-version-check should probably be available everywhere that loads project files (which the server does), and we do actually use the profile's threads in the server, so it makes sense to keep the cli override around.

core/dbt/parser/util.py Show resolved Hide resolved
}
unique_id, node = self.parser.parse_sql_node(node_dict)

self.manifest = ParserUtils.add_new_refs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this addresses my comment above about refs missing from parsing

self.job_queue = self.linker.as_graph_queue(self.manifest,
selected_uids)

# TODO: how can we get a timeout in here? Spin off a subprocess?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question, and one which I think @cmcarthur would be well suited to help think through

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, need to remove this TODO.

@@ -320,3 +333,73 @@ def get_result(self, results, elapsed_time, generated_at):

def task_end_messages(self, results):
dbt.ui.printer.print_run_end_messages(results)


class RemoteCallable(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i need to come back and give this class another look

error = str(exc)
conn.send([result, error])

def safe_handle_request(self, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How happy are you with this implementation of timeouts? It doesn't look unreasonable to me, but curious if you view this as something close to production-ready, or if your TODO above indicates that we need to give this some more thought

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this this is probably how it will shake out, I'm fine with it, I think. I mostly anticipate issues on Windows, potentially.

I do want to make sure we're close()ing everything that needs it though.

@drewbanin
Copy link
Contributor

drewbanin commented Feb 20, 2019

General thought: making requests against this server from a webapp is difficult because the webapp will necessarily be running on a different port. While Chrome shows me:

Access to XMLHttpRequest at 'http://localhost:8580/' from origin 'http://cloud.getdbt.local:31000' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

I think the server hits a snag before that -- it looks like the OPTIONS request is handled ungracefully by the server.

Generally, is this something we should handle in dbt core? Or should it be handled by some layer in front of dbt-core? I don't feel strongly either way -- curious what you guys think.

@cmcarthur
Copy link
Member

@drewbanin IMO this should be handled by cloud or whatever layer people put in front of this server, not by core

@cmcarthur
Copy link
Member

this looks great to me. the only comment i have is i could foresee wanting a separate log handler for access logs. that would make it easy to divert just the access logs or just the general logs when this is included as a library

@beckjake
Copy link
Contributor Author

@drewbanin IMO this should be handled by cloud or whatever layer people put in front of this server, not by core

Yes. You don't want your python programs handling stuff like this. Toss it behind nginx or apache.

Fix python 2.7
remove TODO
remove useless file I added by accident
close Pipe members
Give RPC its own logger, include remote addrs
@beckjake
Copy link
Contributor Author

I believe with the last push I've addressed all the feedback! I would really like to get windows tests fully working again for this PR.

Jacob Beck added 3 commits February 20, 2019 15:06
argument parsing fixes
change the table to a list of columns + list of rows
@beckjake
Copy link
Contributor Author

The RPC server does not work on windows. I bet it's a simple change (change some parameter to werkzeug's server) but per our discussion at sprint planning I'm not going to get into it right now.

@drewbanin
Copy link
Contributor

sounds good, thanks for the update @beckjake

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel really satisfied with the results of this code review, and i'm trying though unable to break the server locally :)

I imagine there will be a batch of minor updates to make here, but this is a phenomenal starting point. ship it.

Edit: after resolving merge conflicts haha

@beckjake beckjake merged commit 0a4eea4 into dev/wilt-chamberlain Mar 5, 2019
@beckjake beckjake deleted the feature/rpc-tasks branch March 5, 2019 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MVP: Remotely Callable Tasks
3 participants