Sql backend implementation #430

Closed

Conversation

kevinhinterlong (Member) commented Jul 14, 2017

@archolewa @michael-mclawhorn Here's the WIP Part 2 of #413

Here's what I can think of that needs to be done:

  • [CLEANUP] Create a SqlDimensionLoader instead of relying on a backing Druid instance to load dimension data
  • [CLEANUP] Remove schema/timestampColumn from code and Extend existing Tables/Schema to support SQL Backends #435
  • [CLEANUP] Fix the SqlRequestHandler once we decide on a schema
  • [DOCS] Mention that tables can have a custom timezone, but for your sanity you should only use UTC
  • [DOCS] Fix instructions for configuring schema/timestamp column
  • [TESTS] Check more extensively that multiple intervals work fine (probably working)
  • [CONFIG] Add support for custom DruidQueryToSqlConverter (AbstractBinderFactory? Custom Filter/Having/PostAggregation evaluator?)

Not yet implemented:

  • [DruidAggregationQuery] Nested GroupByQuery
  • [DruidAggregationQuery] LookbackQuery
  • [DruidQuery] SearchQuery
  • [DruidQuery] SelectQuery
  • [DruidQuery] DataSourceMetadataQuery
  • [DruidQuery] TimeBoundaryQuery

String sql = writeSql(sqlWriter, relToSql, localBuilder);
return "(" + sql + ")";
})
.collect(Collectors.joining("\nUNION ALL\n"));
kevinhinterlong (Member Author) commented Jul 17, 2017

OK, so after looking into it further, it appears this is supported by MySQL/PostgreSQL (not sure what else), but is not supported by standard SQL. So, for example, SQLite will fail with this method.

Relevant Stackoverflow

So it looks like the best way to accomplish the ORDER BY and LIMIT functionality on subqueries (these are used by TopN over each bucket and the results are unioned together) would be to make multiple queries and aggregate the results manually like @michael-mclawhorn suggested.

I've filed a bug report with Calcite, but I'm not sure how much can be done for this issue.
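
As a rough sketch of that workaround (illustrative only, not the actual implementation), each bucket's ordered/limited query could be run on its own and the rows concatenated in code with plain JDBC:

// Illustrative only: run each per-bucket ORDER BY/LIMIT query separately and
// concatenate the rows in code instead of relying on UNION ALL over ordered subqueries.
private List<Map<String, Object>> runPerBucketQueries(Connection connection, List<String> bucketQueries)
        throws SQLException {
    List<Map<String, Object>> mergedRows = new ArrayList<>();
    for (String bucketSql : bucketQueries) {
        try (Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery(bucketSql)) {
            ResultSetMetaData metaData = resultSet.getMetaData();
            while (resultSet.next()) {
                // Copy each row into a column-name -> value map so the buckets can be merged
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= metaData.getColumnCount(); i++) {
                    row.put(metaData.getColumnLabel(i), resultSet.getObject(i));
                }
                mergedRows.add(row);
            }
        }
    }
    return mergedRows;
}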

kevinhinterlong force-pushed the SqlBackendImplementation branch 2 times, most recently from 83dfbe5 to 2bf9144 on July 17, 2017 14:43
Fili has initial support for using a SQL database as the backend instead of Druid. Please
note that there are some restrictions, since Fili is optimized for Druid.

## Setup
kevinhinterlong (Member Author):

Maybe mention that DruidDimensionLoader should be turned off?

/**
* Provides a mapping from Druid's {@link Aggregation} to a {@link SqlAggregationBuilder}.
*/
public interface DruidSqlAggregationConverter {
kevinhinterlong (Member Author):

@archolewa I've done something similar to what you mentioned in #413, but tied it specifically to aggregations. Let me know if you have any concerns with this approach.
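
For illustration, one possible shape for such a converter is a lookup from the Druid aggregation type to a Calcite operator; the map and the type strings below are illustrative, not the code in this PR:

// Illustrative only: map Druid aggregation type names to Calcite aggregation
// operators; the real converter is tied to the Aggregation objects themselves.
private static final Map<String, SqlAggFunction> DRUID_TO_SQL_AGGREGATION = new HashMap<>();
static {
    DRUID_TO_SQL_AGGREGATION.put("longSum", SqlStdOperatorTable.SUM);
    DRUID_TO_SQL_AGGREGATION.put("doubleSum", SqlStdOperatorTable.SUM);
    DRUID_TO_SQL_AGGREGATION.put("longMax", SqlStdOperatorTable.MAX);
    DRUID_TO_SQL_AGGREGATION.put("doubleMax", SqlStdOperatorTable.MAX);
    DRUID_TO_SQL_AGGREGATION.put("longMin", SqlStdOperatorTable.MIN);
    DRUID_TO_SQL_AGGREGATION.put("doubleMin", SqlStdOperatorTable.MIN);
    DRUID_TO_SQL_AGGREGATION.put("count", SqlStdOperatorTable.COUNT);
}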

*/
public class FilterEvaluator implements ReflectiveVisitor {
private RelBuilder builder;
private final List<String> dimensions;
kevinhinterlong (Member Author):

This should probably be a Set
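
For example, the suggested change could look like this (a sketch, not the actual diff):

// Sketch of the suggestion above: a Set removes duplicate dimension names;
// a LinkedHashSet would also preserve insertion order if column order matters.
private final Set<String> dimensions;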

}
return sub;
case DIVIDE:
if (arithmeticPostAggregation.getFields().size() != 2) {
kevinhinterlong (Member Author):

This PR needs to be rebased on #413; this was changed there.

* Small utility class to help with connection to databases, building, and writing sql.
*/
public class CalciteHelper {
public static final String DEFAULT_SCHEMA = "PUBLIC";
kevinhinterlong (Member Author):

I don't think this is true for all databases. Another reason this should be specified in the Table logic and not here.

* @throws SQLException if failed while reading database.
* @throws IllegalStateException if no {@link JDBCType#TIMESTAMP} column could be found.
*/
public static String getTimestampColumn(Connection connection, String schema, String table)
kevinhinterlong (Member Author):

This should also be specified in the Table; repurpose this method to check that the time column is actually a TIMESTAMP.
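
For illustration, that check could be done with standard JDBC metadata; the method name and exception messages below are placeholders, not the actual code:

// Illustrative sketch: verify that a configured time column really is a
// TIMESTAMP instead of scanning the table for the first TIMESTAMP column.
public static void validateTimestampColumn(Connection connection, String schema, String table, String timeColumn)
        throws SQLException {
    try (ResultSet columns = connection.getMetaData().getColumns(null, schema, table, timeColumn)) {
        if (!columns.next()) {
            throw new IllegalStateException("No column " + timeColumn + " found on table " + table);
        }
        int jdbcType = columns.getInt("DATA_TYPE");
        if (jdbcType != JDBCType.TIMESTAMP.getVendorTypeNumber()) {
            throw new IllegalStateException("Column " + timeColumn + " is not a TIMESTAMP");
        }
    }
}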

private void initializeSqlBackend(ObjectMapper mapper) {
String dbUrl = SYSTEM_CONFIG.getStringProperty(DATABASE_URL);
String driver = SYSTEM_CONFIG.getStringProperty(DATABASE_DRIVER);
String schema = SYSTEM_CONFIG.getStringProperty(DATABASE_SCHEMA);
kevinhinterlong (Member Author):

remove schema from here as well
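
For illustration, once the schema (and timestamp column) move onto the table configuration, the initialization could shrink to just the connection details; the DefaultSqlBackedClient constructor shown below is assumed, not the actual signature:

// Illustrative only: with schema and timestamp column owned by the table
// configuration, only the connection details remain here.
private void initializeSqlBackend(ObjectMapper mapper) {
    String dbUrl = SYSTEM_CONFIG.getStringProperty(DATABASE_URL);
    String driver = SYSTEM_CONFIG.getStringProperty(DATABASE_DRIVER);
    this.sqlBackedClient = new DefaultSqlBackedClient(dbUrl, driver, mapper); // assumed constructor
}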

private static final String TRUE = "TRUE"
private static final String FALSE = "FALSE"
private static final String FIRST_COMMENT = "added project"
//this is the first result in the database
kevinhinterlong (Member Author):

clarify "this"

kevinhinterlong force-pushed the SqlBackendImplementation branch from 1937592 to aa2b76f on July 19, 2017 18:11
kevinhinterlong added this to the Kevin End Date milestone on Jul 19, 2017
Added security module with chaining filters.
kevinhinterlong force-pushed the SqlBackendImplementation branch from aac7302 to e7a1e95 on July 24, 2017 15:48
- Need to properly distinguish between api and field names
- Before this, everything was assumed to have the same field and api name
- Need to also correct this for metrics, and there should probably be a test for this
- maybe delete the SqlTimeConverter comments?
- Update and add comments to a lot of code
- Refactoring (SqlConverter -> SqlBackedClient + DefaultSqlBackedClient)
- SqlBackedClient/DefaultSqlBackedClient now only handle the top level operations
  and shouldn't need to be overridden
- Cleaned up how aggregations are converted from druid to sql
- The DruidQuery is now converted into multiple SQL queries which should be
  processed in order (Only affects TopN Queries)
- Calcite already handles this to make sure all columns needed are available
- The dimensions were not being mapped to their field names correctly
- The dimensions should be the last position in the order by
- The timegrain to date part functions didn't work correctly for ZonedTimeGrains
- Also added a test to show that timezones are mapped correctly
- Both now have tests showing that the mapping to field names is working