Sql backend implementation #430

Closed

Conversation

kevinhinterlong (Member) commented Jul 14, 2017

@archolewa @michael-mclawhorn Here's the WIP Part 2 of #413

Here's what I can think of that needs to be done:

  • [CLEANUP] Create a SqlDimensionLoader instead of relying on a backing Druid instance to load dimension data
  • [CLEANUP] Remove schema/timestampColumn from code and Extend existing Tables/Schema to support SQL Backends #435
  • [CLEANUP] Fix the SqlRequestHandler once we decide on a schema
  • [DOCS] Mention that tables can have a custom timezone, but for your sanity you should only use UTC
  • [DOCS] Fix instructions for configuring schema/timestamp column
  • [TESTS] Check more extensively that multiple intervals work fine (probably working)
  • [CONFIG] Add support for custom DruidQueryToSqlConverter (AbstractBinderFactory? Custom Filter/Having/PostAggregation evaluator?)

Not yet implemented:

  • [DruidAggregationQuery] Nested GroupByQuery
  • [DruidAggregationQuery] LookbackQuery
  • [DruidQuery] SearchQuery
  • [DruidQuery] SelectQuery
  • [DruidQuery] DataSourceMetadataQuery
  • [DruidQuery] TimeBoundaryQuery

String sql = writeSql(sqlWriter, relToSql, localBuilder);
return "(" + sql + ")";
})
.collect(Collectors.joining("\nUNION ALL\n"));
kevinhinterlong (Member Author) commented Jul 17, 2017

OK, so after looking into it further, it appears this is supported by MySQL/PostgreSQL (not sure what else), but is not supported by standard SQL. So, for example, SQLite will fail with this method.

Relevant Stackoverflow

So it looks like the best way to accomplish the ORDER BY and LIMIT functionality on subqueries (these are used by TopN over each bucket and the results are unioned together) would be to make multiple queries and aggregate the results manually like @michael-mclawhorn suggested.

I've filed a bug report with Calcite, but I'm not sure how much can be done for this issue.
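
As a rough sketch of that workaround (illustrative only, not the actual implementation), each bucket's ordered/limited query could be run on its own and the rows concatenated in code with plain JDBC:

// Illustrative only: run each per-bucket ORDER BY/LIMIT query separately and
// concatenate the rows in code instead of relying on UNION ALL over ordered subqueries.
private List<Map<String, Object>> runPerBucketQueries(Connection connection, List<String> bucketQueries)
        throws SQLException {
    List<Map<String, Object>> mergedRows = new ArrayList<>();
    for (String bucketSql : bucketQueries) {
        try (Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery(bucketSql)) {
            ResultSetMetaData metaData = resultSet.getMetaData();
            while (resultSet.next()) {
                // Copy each row into a column-name -> value map so the buckets can be merged
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= metaData.getColumnCount(); i++) {
                    row.put(metaData.getColumnLabel(i), resultSet.getObject(i));
                }
                mergedRows.add(row);
            }
        }
    }
    return mergedRows;
}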

kevinhinterlong force-pushed the SqlBackendImplementation branch 2 times, most recently from 83dfbe5 to 2bf9144 on July 17, 2017 14:43
Fili has initial support for using a SQL database as the backend instead of Druid. Please
note that there are some restrictions, since Fili is optimized for Druid.

## Setup
kevinhinterlong (Member Author):

Maybe mention that DruidDimensionLoader should be turned off?

/**
* Provides a mapping from Druid's {@link Aggregation} to a {@link SqlAggregationBuilder}.
*/
public interface DruidSqlAggregationConverter {
kevinhinterlong (Member Author):

@archolewa I've done something similar to what you mentioned in #413, but tied it specifically to aggregations. Let me know if you have any concerns with this approach.
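
For illustration, one possible shape for such a converter is a lookup from the Druid aggregation type to a Calcite operator; the map and the type strings below are illustrative, not the code in this PR:

// Illustrative only: map Druid aggregation type names to Calcite aggregation
// operators; the real converter is tied to the Aggregation objects themselves.
private static final Map<String, SqlAggFunction> DRUID_TO_SQL_AGGREGATION = new HashMap<>();
static {
    DRUID_TO_SQL_AGGREGATION.put("longSum", SqlStdOperatorTable.SUM);
    DRUID_TO_SQL_AGGREGATION.put("doubleSum", SqlStdOperatorTable.SUM);
    DRUID_TO_SQL_AGGREGATION.put("longMax", SqlStdOperatorTable.MAX);
    DRUID_TO_SQL_AGGREGATION.put("doubleMax", SqlStdOperatorTable.MAX);
    DRUID_TO_SQL_AGGREGATION.put("longMin", SqlStdOperatorTable.MIN);
    DRUID_TO_SQL_AGGREGATION.put("doubleMin", SqlStdOperatorTable.MIN);
    DRUID_TO_SQL_AGGREGATION.put("count", SqlStdOperatorTable.COUNT);
}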

*/
public class FilterEvaluator implements ReflectiveVisitor {
private RelBuilder builder;
private final List<String> dimensions;
kevinhinterlong (Member Author):

This should probably be a Set
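
For example, the suggested change could look like this (a sketch, not the actual diff):

// Sketch of the suggestion above: a Set removes duplicate dimension names;
// a LinkedHashSet would also preserve insertion order if column order matters.
private final Set<String> dimensions;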

}
return sub;
case DIVIDE:
if (arithmeticPostAggregation.getFields().size() != 2) {
kevinhinterlong (Member Author):

This PR needs to be rebased on #413; this was changed there.

* Small utility class to help with connection to databases, building, and writing sql.
*/
public class CalciteHelper {
public static final String DEFAULT_SCHEMA = "PUBLIC";
kevinhinterlong (Member Author):

I don't think this is true for all databases. Another reason this should be specified in the Table logic and not here.

* @throws SQLException if failed while reading database.
* @throws IllegalStateException if no {@link JDBCType#TIMESTAMP} column could be found.
*/
public static String getTimestampColumn(Connection connection, String schema, String table)
kevinhinterlong (Member Author):

This should also be specified in the Table; repurpose this method to check that the time column is actually a TIMESTAMP.
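
For illustration, that check could be done with standard JDBC metadata; the method name and exception messages below are placeholders, not the actual code:

// Illustrative sketch: verify that a configured time column really is a
// TIMESTAMP instead of scanning the table for the first TIMESTAMP column.
public static void validateTimestampColumn(Connection connection, String schema, String table, String timeColumn)
        throws SQLException {
    try (ResultSet columns = connection.getMetaData().getColumns(null, schema, table, timeColumn)) {
        if (!columns.next()) {
            throw new IllegalStateException("No column " + timeColumn + " found on table " + table);
        }
        int jdbcType = columns.getInt("DATA_TYPE");
        if (jdbcType != JDBCType.TIMESTAMP.getVendorTypeNumber()) {
            throw new IllegalStateException("Column " + timeColumn + " is not a TIMESTAMP");
        }
    }
}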

private void initializeSqlBackend(ObjectMapper mapper) {
String dbUrl = SYSTEM_CONFIG.getStringProperty(DATABASE_URL);
String driver = SYSTEM_CONFIG.getStringProperty(DATABASE_DRIVER);
String schema = SYSTEM_CONFIG.getStringProperty(DATABASE_SCHEMA);
kevinhinterlong (Member Author):

remove schema from here as well
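
For illustration, once the schema (and timestamp column) move onto the table configuration, the initialization could shrink to just the connection details; the DefaultSqlBackedClient constructor shown below is assumed, not the actual signature:

// Illustrative only: with schema and timestamp column owned by the table
// configuration, only the connection details remain here.
private void initializeSqlBackend(ObjectMapper mapper) {
    String dbUrl = SYSTEM_CONFIG.getStringProperty(DATABASE_URL);
    String driver = SYSTEM_CONFIG.getStringProperty(DATABASE_DRIVER);
    this.sqlBackedClient = new DefaultSqlBackedClient(dbUrl, driver, mapper); // assumed constructor
}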

private static final String TRUE = "TRUE"
private static final String FALSE = "FALSE"
private static final String FIRST_COMMENT = "added project"
//this is the first result in the database
kevinhinterlong (Member Author):

clarify "this"

kevinhinterlong force-pushed the SqlBackendImplementation branch from 1937592 to aa2b76f on July 19, 2017 18:11
kevinhinterlong added this to the Kevin End Date milestone on Jul 19, 2017
Added security module with chaining filters.
kevinhinterlong force-pushed the SqlBackendImplementation branch from aac7302 to e7a1e95 on July 24, 2017 15:48
- Need to properly distinguish between api and field names
- Before this, everything was assumed to have the same field and api name
- Need to also correct this for metrics, and there should probably be a test for this
- maybe delete the SqlTimeConverter comments?
- Update and add comments to a lot of code
- Refactoring (SqlConverter -> SqlBackedClient + DefaultSqlBackedClient)
- SqlBackedClient/DefaultSqlBackedClient now only handle the top level operations
  and shouldn't need to be overridden
- Cleaned up how aggregations are converted from druid to sql
- The DruidQuery is now converted into multiple SQL queries which should be
  processed in order (Only affects TopN Queries)
- Calcite already handles this to make sure all columns needed are available
- The dimensions were not being mapped to their field names correctly
- The dimensions should be the last position in the order by
- The timegrain to date part functions didn't work correctly for ZonedTimeGrains
- Also added a test to show that timezones are mapped correctly
- Both now have tests showing that the mapping to field names is working