Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Druid SQL queries #90

Closed
anskarl opened this issue May 2, 2020 · 0 comments
Closed

Support Druid SQL queries #90

anskarl opened this issue May 2, 2020 · 0 comments

Comments

@anskarl
Copy link
Contributor

anskarl commented May 2, 2020

Apache Druid supports two query languages, native JSON-based queries and SQL. At the moment native Druid API is more powerful compare to the SQL, perhaps this may change in the future and become equal. Internally, Druid uses a parser and planner based on Apache Calcite which translates to the native API.

So far Scruid performs queries on top the native API of Druid, either using the specialized case classes or the DQL. Additionally, Scruid uses Akka Http and Streams to provide a robust HTTP client for performing requests and handling of large payloads.

According to the documentation of Druid there are two client APIs for SQL, HTTP POST and JDBC. The non-JDBC JSON over HTTP API is stateless and does not require connection stickiness.

I think that SQL support it would be nice addition to Scruid and take advantage of the HTTP clients that it provides, as well as the bindings to case classes (via circe) regarding the results.

Druid SQL technical details

To perform SQL queries using HTTP, Druid provides the endpoint /druid/v2/sql/. An example query is given below:

{
  "query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar' AND __time > TIMESTAMP '2000-01-01 00:00:00'",
  "context" : {
    "sqlTimeZone" : "America/Los_Angeles"
  }
}

Druid also supports parameterized queries:

{
  "query" : "SELECT COUNT(*) FROM data_source WHERE foo = ? AND __time > ?",
  "parameters": [
    { "type": "VARCHAR", "value": "bar"},
    { "type": "TIMESTAMP", "value": "2000-01-01 00:00:00" }
  ]
}

The structure of JSON object to perform SQL queries contains the query and optionally the context, parameters and resultFormat. The latter controls the format of the resulting response, for Scruid setting resultFormat to object is an option that can be directly supported in order to retain the case class mapping via circe.

Implementation features

In https://github.com/anskarl/scruid/tree/feature/sql there is a full implementation of SQL in Scruid and supports the following:

  • standard Druid SQL
  • parameterized queries
  • context parameters for Druid SQL
  • string interpolation
  • multi-line queries
  • case class mapping
  • uses Http clients of Scruid
  • supports Akka streams

Examples

SQL queries are expressed using the sql processed string function. The examples below demonstrate the functionality:

import scala.concurrent.Future
import ing.wbaa.druid.SQL._
import ing.wbaa.druid.DruidSQLResults

val query = sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""

val response: Future[DruidSQLResults] = query.execute()

// The resulting response can also mapped to a case class
case class Result(count: Double)

val result: Future[Result] = response.map(_.list[Result])

Multi-line queries:

import ing.wbaa.druid.SQL._

val query = sql"""
    |SELECT SELECT COUNT(*) as "count" 
    |FROM wikipedia 
    |WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'
    """.stripMargin

String interpolation:

import ing.wbaa.druid.SQL._

val countColumnName = "count"
val dataSourceName = "wikipedia"
val dateTime = "2015-09-12 00:00:00"

val query = sql"""
    |SELECT SELECT COUNT(*) as "${countColumnName}" 
    |FROM ${dataSourceName} 
    |WHERE "__time" >= TIMESTAMP '${dateTime}'
    """.stripMargin

Parameterized Queries:

import java.time.LocalDateTime
import ing.wbaa.druid.SQLQuery

import ing.wbaa.druid.SQL._

val fromDateTime  = LocalDateTime.of(2015, 9, 12, 0, 0, 0, 0)
val untilDateTime = fromDateTime.plusDays(1)

val queryParameterized: SQLQuery.Parameterized =
  sql"""
  |SELECT FLOOR(__time to HOUR) AS hourTime, count(*) AS "count"
  |FROM wikipedia
  |WHERE "__time" BETWEEN ? AND ?
  |GROUP BY 1
  |""".stripMargin.parameterized

val query: SQLQuery = queryParameterized
  .withParameter(fromDateTime)
  .withParameter(untilDateTime)
  .create()

Context parameters:

import ing.wbaa.druid.SQL._
import ing.wbaa.druid.definitions.QueryContext

val contextParameters = Map(
  QueryContext.SqlQueryId -> "scruid-sql-example-query",
  QueryContext.SqlTimeZone -> "America/Los_Angeles"
)
val query =
  sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""
      .setContext(contextParameters)

Akka Streams:

import akka.NotUsed
import akka.stream.scaladsl.Source
import ing.wbaa.druid.SQL._
import ing.wbaa.druid.{DruidSQLResult, DruidSQLResults}

val query = sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""

val source: Source[DruidSQLResult, NotUsed] = query.stream()

// The resulting stream of DruidSQLResult can also mapped to a case class
case class Result(count: Double)

val sourceOfResults: Source[Result, NotUsed] = query.streamAs[Result]()
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - standard Druid SQL
  - parameterized queries
  - SQL context parameters
  - string interpolation
  - multi-line queries

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - updates logging calls in DruidHttpClient
  - updates documentation

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - drops standard string interpolation
  - uses a custom interpolator that unifies plain and parameterized queries
  - drops explicit parameterized queries
  - updates documentation and unit tests

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - adds supoort for Instant
  - adds configuration zone-id, in order to correctly convert Instant, Timestam, LocalDateTime and LocalDate to Druid SQL TIMESTAMP (y-MM-dd HH:mm:dd)
  - adds unit test to verify the creation of parameterized sql query
  - updates documentation (for zone-id)
  - moves SQLQuerySpec from package ing.wbaa.druid.sql to ing.wbaa.druid

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - standard Druid SQL
  - parameterized queries
  - SQL context parameters
  - string interpolation
  - multi-line queries

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - updates logging calls in DruidHttpClient
  - updates documentation

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - drops standard string interpolation
  - uses a custom interpolator that unifies plain and parameterized queries
  - drops explicit parameterized queries
  - updates documentation and unit tests

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - adds supoort for Instant
  - adds configuration zone-id, in order to correctly convert Instant, Timestam, LocalDateTime and LocalDate to Druid SQL TIMESTAMP (y-MM-dd HH:mm:dd)
  - adds unit test to verify the creation of parameterized sql query
  - updates documentation (for zone-id)
  - moves SQLQuerySpec from package ing.wbaa.druid.sql to ing.wbaa.druid

ing-bankGH-90
anskarl added a commit to anskarl/scruid that referenced this issue May 8, 2020
  - adds supoort for Instant
  - adds configuration zone-id, in order to correctly convert Instant, Timestam, LocalDateTime and LocalDate to Druid SQL TIMESTAMP (y-MM-dd HH:mm:dd)
  - adds unit test to verify the creation of parameterized sql query
  - updates documentation (for zone-id)
  - moves SQLQuerySpec from package ing.wbaa.druid.sql to ing.wbaa.druid

ing-bankGH-90
@krisgeus krisgeus closed this as completed May 8, 2020
@anskarl anskarl mentioned this issue May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants