Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time grain config #1188

Closed
wants to merge 2 commits into from
Closed

Time grain config #1188

wants to merge 2 commits into from

Conversation

LAlbertalli
Copy link
Contributor

Allow the user to configure the Time Grain for the Database.

This should make solving issues like #971 easier without having to fork

Luca Albertalli added 2 commits September 23, 2016 18:15
This will make easier to support additional database supported by SQLAlchemy as well
as adding additional time grains (e.g: the problem raised in issue
#971
@xrmx
Copy link
Contributor

xrmx commented Sep 24, 2016

@LAlbertalli what's wrong with adding the missing time grains instead?

@LAlbertalli
Copy link
Contributor Author

@xrmx It is possible to do what you say but I don't think it is a wise choice. Basically, you are proposing or polluting the time grain namespace (so basically a bad User Experience) or that everyone that need a different time grain set should fork and maintain it.

I don't find both options attractive. The first make the usability of Caravel worst, the second just make updating a pain (from pip install --upgrade to having the need to merge, test, compile, check it works). I've seen too many unmaintained forks. There are reasons to avoid making everything configurable, but I think that in this case it makes perfect sense...

@xrmx
Copy link
Contributor

xrmx commented Sep 25, 2016

@LAlbertalli I'm not saying we should make it a kitchen sink, but i don't see why we cannot add the useful ones, like the one you mentioned, before thinking of making it configurable. Do you have many more that you cannot share?

My personal opinion is I don't see anything wrong maintaining our own forks (more like branches anyway) for our production. I maintain one for my employer, airbnb has their branches pushed to the repo. Mine is a commit that was stable plus 4 backported patches. If they become stale or unmaintained that's not an upstream problem. Adding configuration over code makes it easier for every downstream to sit on their stable releases and don't give a damn about upstream stabilization and development. As a free software developer i hate that :)
Finally i don't think updating caravel is so demanding, the only annoying bits is building js taking ages, but the rest is easy and documented, the real problem is fixing regression and make sure they are not introduced. That's the hard part for everyone so better if it's shared by many parties :)

@mistercrunch
Copy link
Member

I understand the rational for this, but I agree with @xrmx on this one. I want to refactor all of the database engine specific code into a single place where we can use inheritance schemes to define all this. Supporting for configuration would make that surgery a bit more complex.

I don't think our needs in terms of time grains are so different here that each install has to manage their own fork for that purpose. I'd rather overload the grain namespace more at this point and see examples of where we start saying "no, these grains are useless to most people" before allowing for customization.

@LAlbertalli
Copy link
Contributor Author

@xrmx sorry, I think you misunderstood my point. Maybe you have different experiences, but, in the long run, every project you fork and patch become a liability. This means, by the way, that you should look into contributing back the patch that you think are worth for the product and you try to make things that are specific to your use case modular.
Another problem is on usability: I know it sounds absurd from an engineer point of view, but more choices are bad. If I could make Caravel smart enough to understand the proper time grain for a visualization, I will submit a patch to do that and remove the choice. Adding one choice, especially when it could lead to confusion is bad.

@mistercrunch Using inheritance was my first choice but the code doesn't allow it easily right now. I saw in Master the code for adding DataSources but it doesn't seem to be ready yet.
As for your question, well, week is a good example. The starting day of the week could change from company to company, usuall choices are Sunday or Monday and slightly less common Saturday. Plus the databases are sometimes a little bit strange (e.g: HP Vertica uses Monday or Sunday depending on the function you use). And you are not even considering the problem of converting Timezones, now in Caravel everything is in UTC. I think these area need a little bit of attention, this configuration was a small piece to make it better, but these are the areas that need a little bit of attention:

  • Config of Granularities for SQL datasource, I agree, inheritance is a more general solution
  • Better Granularities for Druid: right now Caravel support only duration with an optional time origin. But Druid provides also support for Period Granularities that work betters in most cases (especially if you are using Druid for analytics use over long periods)
  • Time Zone support: we have a patch ready that covers Druid datasources but we are not sure to submit it because it touches several point in the code base. I think it would make sense to add it when the code is refactored. Incidentally, in Druid timezones requires to use Period granularities instead of duration ones...

@xrmx
Copy link
Contributor

xrmx commented Sep 25, 2016

Il 25/09/2016 19:29, Luca Albertalli ha scritto:

@xrmx https://github.com/xrmx sorry, I think you misunderstood my point. Maybe
you have different experiences, but, in the long run, every project you fork and
patch become a liability. This means, by the way, that you should look into
contributing back the patch that you think are worth for the product and you try
to make things that are specific to your use case modular.

I very much understand that, and that's because i always work upstream on the
fixes and features i need. So that i have no fork to maintain :)

Another problem is on usability: I know it sounds absurd from an engineer point
of view, but more choices are bad. If I could make Caravel smart enough to
understand the proper time grain for a visualization, I will submit a patch to
do that and remove the choice. Adding one choice, especially when it could lead
to confusion is bad.

I understand the paradox of choice, but I don't think adding another time grain
would be an usability problem, how many others do you have?

@LAlbertalli
Copy link
Contributor Author

@xrmx That's exactly my point of proposing to make it configurable instead of having to patch it at every new version... As for the numbers of time grain added, since there's no Timezone support we would like to hack it in the time grain. This means adding to the main repo day PDT, week PDT, Month PDT etc. I don't see it as a smart move!

I agree with @mistercrunch point of using inheritance, if properly done it means I could just override the default Vertica class with my own with the grains we need. I'm waiting for his patch right now...

@0x0ece
Copy link
Contributor

0x0ece commented Sep 26, 2016

@mistercrunch are you also including timezone support in your refactor? :) I have something working internally that I could try to push next week (this week is insane), but if you're already looking into that I'll wait.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants