-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize switch between different datasources #1078
Conversation
2ef6f52
to
8eb5292
Compare
from caravel.utils import generic_find_constraint_name | ||
|
||
def find_constraint_name(col_names, referenced_table): | ||
cols = col_names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if you like cols better just rename the function argument
Please state why you need these changes, what problem do they solve? |
Our team is adding a third kind of datasource, our backend api to caravel besides Druid and Sql. Currently in caravel, the switch between these two datasources are hard-coded, which requires us to do the same when we add our customized datasource engine. So we need to make changes inline to make things work. So we need to maintain our fork and deal with merges. Ideally, we want to keep our changes from upstream caravel as separated as possible. So it's easier to maintain and keep updated with upstream. I've created an issue before #1035, and was suggested to send a PR. So here's the changes I'm hoping to get in. I'm creating an instance of a registry that register all desired datasource type and Models at initialization of the app. So it'll pick up the Model associated with the datasource type in Slice Model and Views.py explore method. Let me know what you think. |
@ShengyaoQian It looks like there are some checkstyle issues that have to be fixed: https://codeclimate.com/github/airbnb/caravel/pull/1078 Just to clarify, with this change you are trying to support the addition of 'generic' engines (not SQLAlchemy or Druid), correct? How exactly will this pickup new datasource engines? I may have missed that in the code. |
@joshwalters he's adding a generic way to handle datasources. So that you don't need to touch every line of code that discriminate between sqla and druid. @ShengyaoQian i think you need to update the add method of SliceModelView no? |
@ShengyaoQian may i ask which datasource are you adding? |
from caravel.utils import generic_find_constraint_name | ||
|
||
def find_constraint_name(cols, referenced_table): | ||
return generic_find_constraint_name( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the codebase is all 4 spaces indents, no tabs pls!
I'm in favor of this PR but highlighted a few nits inline |
@@ -404,7 +404,11 @@ def readfile(filepath): | |||
content = f.read() | |||
return content | |||
|
|||
|
|||
def register_sources(models, registry): | |||
print("Inside register_sources") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should remove debug prints.
@joshwalters I've made some changes in registering datasource. The modules and class will be passed in through config file. And be imported upon initialization of the app. |
@xrmx SliceModelView has been updated. Thanks for the notice. I'm adding our API as a third datasource engine. I take the query object generated by caravel and parse it into our desired format and make request to our api and then parse the json response into pandas df. ps is "She" ;P |
I'm having difficulties making the db migration work for sqlite and postgres. So I'm dropping fk constraints between table, druiddatasources and slices. But when it was originally created, the fk doesn't have a name. So sqlite couldn't drop them. I referred to this file https://github.com/airbnb/caravel/blob/master/caravel/migrations/versions/1226819ee0e3_fix_wrong_constraint_on_table_columns.py where you seems to be dealing with the same issue. But I couldn't really get it to work with adding naming_convention. Any suggestions? |
) | ||
|
||
db.session.delete(datasource) | ||
db.session.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove those test since the order of redirecting no longer applies.
@ShengyaoQian @mistercrunch Do you think we can add the relevant sources to the datasource table when migrating so that we are not breaking every installation? |
constraint_2 = find_constraint_name(cols_2, 'tables') | ||
batch_op.drop_constraint(constraint_2, type_="foreignkey") | ||
|
||
batch_op.drop_column('druid_datasource_id') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you don't want to drop these two columns first. Add the new column. Use the other two ids to populate datasource id. Meaning either druid_datasource_id or table_id would be non-null and you could set datasource_id to whichever one is not null. However, the downgrade would be difficult since you'd need to use the datasource type to determine which column to populate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes are made accordingly.
c54dc44
to
d4b2976
Compare
@mistercrunch ping! |
This is good to go, thanks for the PR! |
Ooops merge conflict, can't merge... |
4cc8f63
to
a386f60
Compare
Thanks for bearing through with us! |
THANKS! |
objects = session.query(models.Slice).all() | ||
objects += session.query(models.Dashboard).all() | ||
objects = session.query(Slice).all() | ||
objects += session.query(Dashboard).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created_by
does not exist as a column in the slice
or dashboard
tables causing an exception to be raised two lines below this if objects
is not empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to modify existing migration files? Existing environments would not rerun this script right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the-dcruz could you please open a new issue with the stacktrace please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the-dcruz I added a column datasource_id in Slice and Dashboard. If it uses models.Slice and models.Dashboard the session.query would fail because it couldn't find this column at this migration point. It would be helpful if you can share the stacktrace. Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the-dcruz I run into some problem as well and I think it's what you have. I think the tests run on empty databases that's why I didn't catch it the first time. When there's existing data, you'd see the error. I've created a pull request for the fix. Can you take a look to see if it fixes what you have? #1262
No description provided.