Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize switch between different datasources #1078

Merged
merged 33 commits into from
Sep 21, 2016

Conversation

ShengyaoQian
Copy link

No description provided.

@ShengyaoQian ShengyaoQian changed the title Src reg Generalize switch between different datasources Sep 8, 2016
from caravel.utils import generic_find_constraint_name

def find_constraint_name(col_names, referenced_table):
cols = col_names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if you like cols better just rename the function argument

@xrmx
Copy link
Contributor

xrmx commented Sep 8, 2016

Please state why you need these changes, what problem do they solve?

@ShengyaoQian
Copy link
Author

ShengyaoQian commented Sep 8, 2016

Our team is adding a third kind of datasource, our backend api to caravel besides Druid and Sql. Currently in caravel, the switch between these two datasources are hard-coded, which requires us to do the same when we add our customized datasource engine. So we need to make changes inline to make things work. So we need to maintain our fork and deal with merges. Ideally, we want to keep our changes from upstream caravel as separated as possible. So it's easier to maintain and keep updated with upstream. I've created an issue before #1035, and was suggested to send a PR. So here's the changes I'm hoping to get in. I'm creating an instance of a registry that register all desired datasource type and Models at initialization of the app. So it'll pick up the Model associated with the datasource type in Slice Model and Views.py explore method. Let me know what you think.

@joshwalters
Copy link
Contributor

joshwalters commented Sep 8, 2016

@ShengyaoQian It looks like there are some checkstyle issues that have to be fixed: https://codeclimate.com/github/airbnb/caravel/pull/1078

Just to clarify, with this change you are trying to support the addition of 'generic' engines (not SQLAlchemy or Druid), correct?

How exactly will this pickup new datasource engines? I may have missed that in the code.

@xrmx
Copy link
Contributor

xrmx commented Sep 9, 2016

@joshwalters he's adding a generic way to handle datasources. So that you don't need to touch every line of code that discriminate between sqla and druid.

@ShengyaoQian i think you need to update the add method of SliceModelView no?

@xrmx
Copy link
Contributor

xrmx commented Sep 9, 2016

@ShengyaoQian may i ask which datasource are you adding?

from caravel.utils import generic_find_constraint_name

def find_constraint_name(cols, referenced_table):
return generic_find_constraint_name(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the codebase is all 4 spaces indents, no tabs pls!

@mistercrunch
Copy link
Member

I'm in favor of this PR but highlighted a few nits inline

@@ -404,7 +404,11 @@ def readfile(filepath):
content = f.read()
return content


def register_sources(models, registry):
print("Inside register_sources")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove debug prints.

@ShengyaoQian
Copy link
Author

@joshwalters I've made some changes in registering datasource. The modules and class will be passed in through config file. And be imported upon initialization of the app.

@ShengyaoQian
Copy link
Author

@xrmx SliceModelView has been updated. Thanks for the notice. I'm adding our API as a third datasource engine. I take the query object generated by caravel and parse it into our desired format and make request to our api and then parse the json response into pandas df.

ps is "She" ;P

@ShengyaoQian
Copy link
Author

I'm having difficulties making the db migration work for sqlite and postgres. So I'm dropping fk constraints between table, druiddatasources and slices. But when it was originally created, the fk doesn't have a name. So sqlite couldn't drop them. I referred to this file https://github.com/airbnb/caravel/blob/master/caravel/migrations/versions/1226819ee0e3_fix_wrong_constraint_on_table_columns.py where you seems to be dealing with the same issue. But I couldn't really get it to work with adding naming_convention. Any suggestions?

)

db.session.delete(datasource)
db.session.commit()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove those test since the order of redirecting no longer applies.

@xrmx
Copy link
Contributor

xrmx commented Sep 10, 2016

@ShengyaoQian @mistercrunch Do you think we can add the relevant sources to the datasource table when migrating so that we are not breaking every installation?

constraint_2 = find_constraint_name(cols_2, 'tables')
batch_op.drop_constraint(constraint_2, type_="foreignkey")

batch_op.drop_column('druid_datasource_id')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you don't want to drop these two columns first. Add the new column. Use the other two ids to populate datasource id. Meaning either druid_datasource_id or table_id would be non-null and you could set datasource_id to whichever one is not null. However, the downgrade would be difficult since you'd need to use the datasource type to determine which column to populate.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are made accordingly.

@ShengyaoQian
Copy link
Author

@mistercrunch ping!

@mistercrunch
Copy link
Member

This is good to go, thanks for the PR!

@mistercrunch
Copy link
Member

Ooops merge conflict, can't merge...

@ShengyaoQian
Copy link
Author

@mistercrunch !

@mistercrunch
Copy link
Member

Thanks for bearing through with us!

@mistercrunch mistercrunch merged commit 5a0e06e into apache:master Sep 21, 2016
@mistercrunch
Copy link
Member

img

@ShengyaoQian
Copy link
Author

THANKS! ^o^ Y(^_^)Y

objects = session.query(models.Slice).all()
objects += session.query(models.Dashboard).all()
objects = session.query(Slice).all()
objects += session.query(Dashboard).all()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created_by does not exist as a column in the slice or dashboard tables causing an exception to be raised two lines below this if objects is not empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to modify existing migration files? Existing environments would not rerun this script right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the-dcruz could you please open a new issue with the stacktrace please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the-dcruz I added a column datasource_id in Slice and Dashboard. If it uses models.Slice and models.Dashboard the session.query would fail because it couldn't find this column at this migration point. It would be helpful if you can share the stacktrace. Thank you

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the-dcruz I run into some problem as well and I think it's what you have. I think the tests run on empty databases that's why I didn't catch it the first time. When there's existing data, you'd see the error. I've created a pull request for the fix. Can you take a look to see if it fixes what you have? #1262

zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this pull request Nov 17, 2021
zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this pull request Nov 24, 2021
zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this pull request Nov 25, 2021
zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this pull request Nov 26, 2021
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.11.0 labels Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants