Releases: apache/druid
Druid 24.0.2
Apache Druid 24.0.2 is a bug fix release that fixes some issues in the 24.0.1 release.
See the complete set of changes for additional details.
# Bug fixes
#13138 to fix dependency errors while launching a Hadoop task.
# Credits
Druid 24.0.1
Apache Druid 24.0.1 is a bug fix release that fixes some issues in the 24.0 release.
See the complete set of changes for additional details.
# Notable Bug fixes
#13214 to fix SQL planning when using the JSON_VALUE function.
#13297 to fix values that match a range filter on nested columns.
#13077 to fix detection of nested objects while generating an MSQ SQL in the web-console.
#13172 to correctly handle overlord leader election even when tasks cannot be reacquired.
#13259 to fix memory leaks from SQL statement objects.
#13273 to fix overlord API failures by de-duplicating task entries in memory.
#13049 to fix a race condition while processing query context.
#13151 to fix assertion error in SQL planning.
# Credits
Thanks to everyone who contributed to this release!
@abhishekagarwal87
@AmatyaAvadhanula
@clintropolis
@gianm
@kfaraz
@LakshSingla
@paul-rogers
@vogievetsky
# Known issues
- Hadoop ingestion does not work with custom extension config due to injection errors (fixed in #13138)
Druid 24.0.0
Apache Druid 24.0.0 contains over 300 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 67 contributors. See the complete set of changes for additional details.
# Major version bump
Starting with this release, we have dropped the leading 0 from the release version and promoted all other digits one place to the left. Druid is now at major version 24, a jump up from the prior 0.23.0 release. In terms of backward compatibility or breaking changes, this release is not significantly different from previous major releases such as 0.23.0 or 0.22.0. We are continuing with the same policy as we have used in prior releases: minimizing the number of changes that require special attention when upgrading, and calling out any that do exist in the release notes. For this release, please refer to the Upgrading to 24.0.0 section for a list of backward-incompatible changes.
# New Features
# Multi-stage query task engine
SQL-based ingestion for Apache Druid uses a distributed multi-stage query architecture, which includes a query engine called the multi-stage query task engine (MSQ task engine). The MSQ task engine extends Druid's query capabilities, so you can write queries that reference external data as well as perform ingestion with SQL INSERT and REPLACE. Essentially, you can perform SQL-based ingestion instead of using JSON ingestion specs that Druid's native ingestion uses. In addition to the easy-to-use syntax, the SQL interface lets you perform transformations that involve multiple shuffles of data.
SQL-based ingestion using the multi-stage query task engine is recommended for batch ingestion starting in Druid 24.0.0. Native batch and Hadoop-based ingestion continue to be supported as well. We recommend you review the known issues and test the feature in a staging environment before rolling out in production. Using the multi-stage query task engine with plain SELECT statements (not INSERT ... SELECT or REPLACE ... SELECT) is experimental.
If you're upgrading from an earlier version of Druid or you're using Docker, you'll need to add the druid-multi-stage-query extension to druid.extensions.loadList in your common.runtime.properties file.
For more information, refer to the Overview documentation for SQL-based ingestion.
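For illustration, here is a minimal sketch of an MSQ ingestion query; the HTTP URI, input columns, and target datasource "wiki" are hypothetical:
REPLACE INTO "wiki" OVERWRITE ALL
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "page",
  "added"
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/wiki.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
)
PARTITIONED BY DAY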
# Nested columns
Druid now supports directly storing nested data structures in a newly added COMPLEX<json> column type. COMPLEX<json> columns store a copy of the structured data in JSON format as well as specialized internal columns and indexes for nested literal values of STRING, LONG, and DOUBLE types. An optimized virtual column allows Druid to read and filter these values at speeds consistent with standard Druid LONG, DOUBLE, and STRING columns.
Newly added Druid SQL functions, native JSON functions, and a virtual column allow you to extract, transform, and create COMPLEX<json> values at query time. You can also use the JSON functions in INSERT and REPLACE statements in SQL-based ingestion, or in a transformSpec in native ingestion as an alternative to using a flattenSpec object to "flatten" nested data for ingestion.
See SQL JSON functions, native JSON functions, Nested columns, virtual columns, and the feature summary for more detail.
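As a sketch, assuming a hypothetical "orders" datasource with a COMPLEX<json> column named "shipTo", a query can extract a nested value directly:
SELECT
  JSON_VALUE("shipTo", '$.address.country') AS "country",
  COUNT(*) AS "order_count"
FROM "orders"
GROUP BY 1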
# Updated Java support
Java 11 is fully supported and is no longer experimental. Java 17 support is improved.
# Query engine updates
# Updated column indexes and query processing of filters
Reworked column indexes to be extraordinarily flexible, which will eventually allow us to model a wide range of index types. Added machinery to build the filters that use the updated indexes, while also allowing other column implementations to provide the built-in index types so that the current set of Druid filters can make use of their indexing.
# Time filter operator
You can now use the Druid SQL operator TIME_IN_INTERVAL to filter query results based on time. Prefer TIME_IN_INTERVAL over the SQL BETWEEN operator to filter on time. For more information, see Date and time functions.
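For example, a sketch against a hypothetical "wikipedia" datasource that filters to one month of data:
SELECT "page", COUNT(*) AS "edits"
FROM "wikipedia"
WHERE TIME_IN_INTERVAL("__time", '2022-06-01/2022-07-01')
GROUP BY 1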
# Null values and the "in" filter
If a values array contains null, the "in" filter matches null values. This differs from the SQL IN filter, which does not match null values.
For more information, see Query filters and SQL data types.
#12863
# Virtual columns in search queries
Previously, a search query could only search on dimensions that existed in the data source. Search queries now support virtual columns as a parameter in the query.
# Optimize simple MIN / MAX SQL queries on __time
Simple queries like select max(__time) from ds now run as timeBoundary queries to take advantage of the time dimension sorting in a segment. You can set a feature flag to enable this feature.
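A sketch of a query that benefits from this rewrite; note that the context flag name in the comment is our assumption, so verify it against the documentation for your version:
-- assumed query context: {"enableTimeBoundaryPlanning": true}
SELECT MIN("__time") AS "first_event", MAX("__time") AS "last_event"
FROM "ds"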
# String aggregation results
The first/last string aggregator now only compares based on values. Previously, the first/last string aggregator's values were compared based on the __time column first and then on values.
If you have existing queries and want to continue using both the __time column and values, update your queries to use ORDER BY MAX(timeCol).
# Reduced allocations due to Jackson serialization
Introduced and implemented new helper functions in JacksonUtils to enable reuse of SerializerProvider objects.
Additionally, disabled backwards compatibility for map-based rows in the GroupByQueryToolChest by default, which eliminates the need to copy the heavyweight ObjectMapper. Introduced a configuration option to allow administrators to explicitly enable backwards compatibility.
# Updated IPAddress Java library
Added a new IPAddress Java library dependency to handle IP addresses. The library includes IPv6 support. Additionally, migrated IPv4 functions to use the new library.
# Query performance improvements
Optimized SQL operations and functions as follows:
- Vectorized numeric latest aggregators (#12439)
- Optimized isEmpty() and equals() on RangeSets (#12477)
- Optimized reuse of Yielder objects (#12475)
- Operations on numeric columns with indexes are now faster (#12830)
- Optimized GroupBy by reducing allocations. Reduced allocations by reusing entry and key holders (#12474)
- Added a vectorized version of string last aggregator (#12493)
- Added Direct UTF-8 access for IN filters (#12517)
- Enabled virtual columns to cache their outputs in case Druid calls them multiple times on the same underlying row (#12577)
- Druid now rewrites a join as a filter when possible in IN joins (#12225)
- Added automatic sizing for GroupBy dictionaries (#12763)
- Druid now distributes JDBC connections more evenly amongst brokers (#12817)
# Streaming ingestion
# Kafka consumers
Previously, consumers that were registered and used for ingestion persisted until Kafka deleted the...
Druid 0.23.0
Apache Druid 0.23.0 contains over 450 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 81 contributors. See the complete set of changes for additional details.
# New Features
# Query engine
# Grouping on arrays without exploding the arrays
You can now group on a multi-value dimension as an array. For a datasource named "test":
{"timestamp": "2011-01-12T00:00:00.000Z", "tags": ["t1","t2","t3"]} #row1
{"timestamp": "2011-01-13T00:00:00.000Z", "tags": ["t3","t4","t5"]} #row2
{"timestamp": "2011-01-14T00:00:00.000Z", "tags": ["t5","t6","t7"]} #row3
{"timestamp": "2011-01-14T00:00:00.000Z", "tags": []} #row4
The following query:
{
"queryType": "groupBy",
"dataSource": "test",
"intervals": [
"1970-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
],
"granularity": {
"type": "all"
},
"virtualColumns" : [ {
"type" : "expression",
"name" : "v0",
"expression" : "mv_to_array(\"tags\")",
"outputType" : "ARRAY<STRING>"
} ],
"dimensions": [
{
"type": "default",
"dimension": "v0",
"outputName": "tags"
"outputType":"ARRAY<STRING>"
}
],
"aggregations": [
{
"type": "count",
"name": "count"
}
]
}
Returns the following:
[
{
"timestamp": "1970-01-01T00:00:00.000Z",
"event": {
"count": 1,
"tags": "[]"
}
},
{
"timestamp": "1970-01-01T00:00:00.000Z",
"event": {
"count": 1,
"tags": "["t1","t2","t3"]"
}
},
{
"timestamp": "1970-01-01T00:00:00.000Z",
"event": {
"count": 1,
"tags": "[t3","t4","t5"]"
}
},
{
"timestamp": "1970-01-01T00:00:00.000Z",
"event": {
"count": 2,
"tags": "["t5","t6","t7"]"
}
}
]
# Specify a column other than __time column for row comparison in first/last aggregators
You can specify the time column in *first/*last aggregators by using the LATEST_BY/EARLIEST_BY SQL functions. This provides support for cases where the time is stored as part of a column other than "__time". You can also specify another logical time column.
(#11949)
(#12145)
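A sketch, assuming a hypothetical "trades" datasource whose event time lives in an "updated_at" column rather than "__time":
-- "updated_at" is assumed to be a TIMESTAMP column; wrap a long millis column with MILLIS_TO_TIMESTAMP(...) if needed
SELECT
  "symbol",
  EARLIEST_BY("price", "updated_at") AS "opening_price",
  LATEST_BY("price", "updated_at") AS "latest_price"
FROM "trades"
GROUP BY 1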
# Improvements to querying user experience
This release includes several improvements for querying:
- Added the SQL query ID to the response header for failed SQL queries to aid in locating the error messages (#11756)
- Added input type validation for DataSketches HLL (#12131)
- Improved JDBC logging (#11676)
- Added SQL functions MV_FILTER_ONLY and MV_FILTER_NONE to filter rows of multi-value string dimensions to include only the supplied list of values or none of them respectively (#11650)
- Added ARRAY_CONCAT_AGG to aggregate array inputs together into a single array (#12226)
- Added the ability to authorize the usage of query context parameters (#12396)
- Improved query IDs to make it easier to link queries and sub-queries for end-to-end query visibility (#11809)
- Added a safe divide function to protect against division by 0 (#11904); see the sketch after this list
- You can now add a query context to internally generated SegmentMetadata queries (#11429)
- Added support for Druid complex types to the native expression processing system to make all Druid data usable within expressions (#11853, #12016)
- You can control the size of the on-heap segment-level dictionary via druid.query.groupBy.maxSelectorDictionarySize when grouping on string or array-valued expressions that do not have pre-existing dictionaries.
- You have better protection against filter explosion during CNF conversion (#12314) (#12324)
- You can get the complete native query on explaining the SQL query by setting useNativeQueryExplain to true in the query context (#11908)
- You can have the broker ignore realtime nodes or specific historical tiers. (#11766) (#11732)
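As referenced in the list above, a minimal sketch of the safe divide function, assuming a hypothetical "ad_events" datasource with "clicks" and "impressions" columns; SAFE_DIVIDE guards against a zero divisor instead of producing an error or an infinite result:
SELECT
  "campaign",
  SAFE_DIVIDE(SUM("clicks"), SUM("impressions")) AS "ctr"
FROM "ad_events"
GROUP BY 1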
# Streaming Ingestion
# Kafka input format for parsing headers and key
We've introduced a Kafka input format so you can ingest header data in addition to the message contents. For example:
- the event key field
- event headers
- the Kafka event timestamp
- the Kafka event value that stores the payload.
(#11630)
# Kinesis ingestion improvements
We have made the following improvements to Kinesis ingestion:
- Re-sharding can affect and slow down ingestion, as many intermediate empty shards are created. These shards get assigned to tasks, causing an imbalance in load assignment. You can set skipIgnorableShards to true in the Kinesis ingestion tuning config to ignore such shards. (#12235)
- Currently, Kinesis ingestion uses DescribeStream to fetch the list of shards. This call is deprecated and slower. In this release, you can switch to the newer ListShards API by setting useListShards to true in the Kinesis ingestion tuning config. (#12161)
# Native Batch Ingestion
# Multi-dimension range partitioning
Multi-dimension range partitioning allows users to partition their data on the ranges of any number of dimensions. It builds further on the concepts behind "single-dim" partitioning and is now arguably the preferred secondary partitioning scheme, both for query performance and storage efficiency.
(#11848)
(#11973)
# Improved replace data behavior
In previous versions of Druid, if you ingested data with the dropExisting flag set to replace data, Druid would retain the existing data for a time chunk if there was no new data to replace it. Now, if you set dropExisting to true in your ioConfig and ingest data for a time range that includes a time chunk with no data, Druid uses a tombstone to overshadow the existing data in the empty time chunk.
(#12137)
This release includes several improvements for native batch ingestion:
- Druid now emits a new metric when a batch task finishes waiting for segment availability. (#11090)
- Added segmentAvailabilityWaitTimeMs, the duration in milliseconds that a task waited for its segments to be handed off to Historical nodes, to IngestionStatsAndErrorsTaskReportData (#11090)
- Added functionality to preserve existing metrics during ingestion (#12185)
- Parallel native batch tasks can now provide task reports for the sequential and single phase mode (e.g., used with dynamic partitioning) as well as single phase mode subtasks (#11688)
- Added support for RowStats in the druid/indexer/v1/task/{task_id}/reports API for multi-phase parallel indexing tasks (#12280)
- Fixed the OOM failures in the dimension distribution phase of parallel indexing (#12331)
- Added support to handle null dimension values while creating partition boundaries (#11973)
# Improvements to ingestion in general
This release includes several improvements for ingestion in general:
- Removed the template modifier from IncrementalIndex<AggregatorType> because it is no longer required
- You can now use JsonPath functions in JsonPath expressions during ingestion (#11722)
- Druid no longer creates a materialized list of segment files and eliminated looping over the files to reduce OOM issues (#11903)
- Added an intermediate-persist IndexSpec to the main "merge" method in IndexMerger (#11940)
- Granularity.granularitiesFinerThan now returns ALL if you pass in ALL (#12003)
- Added a configuration parameter for appending tasks to allow them to use a SHARED lock (#12041)
- SchemaRegistryBasedAvroBytesDecoder now throws a ParseException instead of RE when it fails to retrieve a schema (#12080)
- Added includeAllDimensions to dimensionsSpec to put all explicit dimensions first in InputRow and subsequently any other dimensions found in input data (#12276)
- Added the ability to store null columns in segments (#12279)
# Compaction
This release includes several improvements for compaction:
- Automatic compaction now supports complex dimensions (#11924)
- Automatic compaction now supports overlapping segment in...
druid-0.22.1
Apache Druid 0.22.1 is a bug fix release that fixes some security issues. See the complete set of changes for additional details.
# Bug fixes
#12051 Update log4j to 2.15.0 to address CVE-2021-44228
#11787 JsonConfigurator no longer logs sensitive properties
#11786 Update axios to 0.21.4 to address CVE-2021-3749
#11844 Update netty4 to 4.1.68 to address CVE-2021-37136 and CVE-2021-37137
# Credits
Thanks to everyone who contributed to this release!
@abhishekagarwal87
@andreacyc
@clintropolis
@gianm
@jihoonson
@kfaraz
@xvrl
druid-0.22.0
Apache Druid 0.22.0 contains over 400 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 73 contributors. See the complete set of changes for additional details.
# New features
# Query engine
# Support for multiple distinct aggregators in same query
Druid can now support multiple DISTINCT 'exact' counts using the grouping aggregator typically used with grouping sets. Note that this only applies to exact counts (when druid.sql.planner.useApproximateCountDistinct is false), and it can be enabled by setting druid.sql.planner.useGroupingSetForExactDistinct to true.
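For illustration, a sketch of a query that computes two exact distinct counts in one pass, assuming a hypothetical "wikipedia" datasource; the comments restate the planner settings named above:
-- assumes druid.sql.planner.useApproximateCountDistinct=false
-- and druid.sql.planner.useGroupingSetForExactDistinct=true
SELECT
  COUNT(DISTINCT "user") AS "distinct_users",
  COUNT(DISTINCT "page") AS "distinct_pages"
FROM "wikipedia"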
# SQL ARRAY_AGG and STRING_AGG aggregator functions
The ARRAY_AGG aggregation function has been added to allow accumulating values or distinct values of a column into a single array result. This release also adds STRING_AGG, which is similar to ARRAY_AGG, except that it joins the array values into a single string with a supplied 'delimiter' and ignores null values. Both of these functions accept a maximum size parameter to control the maximum result size, and will fail if this value is exceeded. See the SQL documentation for additional details.
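A short sketch of both functions, assuming a hypothetical "wikipedia" datasource; the numeric argument to ARRAY_AGG is the optional maximum result size in bytes:
SELECT
  "channel",
  ARRAY_AGG(DISTINCT "user", 16384) AS "users",
  STRING_AGG(DISTINCT "page", ',') AS "pages"
FROM "wikipedia"
GROUP BY 1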
# Bitwise math function expressions and aggregators
Several new SQL functions for performing 'bitwise' math (along with corresponding native expressions) have been added, including BITWISE_AND, BITWISE_OR, BITWISE_XOR, and so on. Additionally, the aggregation functions BIT_AND, BIT_OR, and BIT_XOR have been added to accumulate values in a column with the corresponding bitwise function. For complete details see the SQL documentation.
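A brief sketch combining a scalar bitwise expression with bitwise aggregations, assuming a hypothetical "events" datasource with a long-typed "flags" column:
SELECT
  BITWISE_AND("flags", 15) AS "low_bits",
  BIT_OR("flags") AS "flags_union",
  COUNT(*) AS "row_count"
FROM "events"
GROUP BY 1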
# Human readable number format functions
Three new SQL and native expression number format functions have been added in Druid 0.22.0: HUMAN_READABLE_BINARY_BYTE_FORMAT, HUMAN_READABLE_DECIMAL_BYTE_FORMAT, and HUMAN_READABLE_DECIMAL_FORMAT. They allow transforming query results into a friendlier format for consumption. For more information see the SQL documentation.
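As a sketch, assuming a hypothetical "requests" datasource with a long-typed "bytes" column; the binary variant formats using powers of 1024 (KiB, MiB, ...) while the decimal variants use powers of 1000:
SELECT
  HUMAN_READABLE_BINARY_BYTE_FORMAT(SUM("bytes")) AS "total_binary",
  HUMAN_READABLE_DECIMAL_BYTE_FORMAT(SUM("bytes")) AS "total_decimal",
  HUMAN_READABLE_DECIMAL_FORMAT(COUNT(*)) AS "request_count"
FROM "requests"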
# Expression aggregator
Druid 0.22.0 adds a new 'native' JSON query expression aggregator function that lets you use Druid native expressions to perform "fold" (alternatively known as "reduce") operations to accumulate some value on any number of input columns. This adds significant flexibility to what can be done in a Druid aggregator, similar in a lot of ways to what was possible with the JavaScript aggregator, but in a much safer, sandboxed manner.
The ability of expressions to perform a "fold" on input columns also rounds out the abilities of native expressions, in addition to the previously possible "map" (expression virtual columns), "filter" (expression filters), and post-transform (expression post-aggregators) functions.
Since this uses expressions, performance is not yet optimal, and it is not directly documented yet, but it is the underlying technology behind the SQL ARRAY_AGG, STRING_AGG, and bitwise aggregator functions also added in this release.
# SQL query routing improvements
Druid 0.22 adds some new facilities to provide extension writers with enhanced control over how queries are routed between Druid routers and brokers. The first adds a new manual broker selection strategy to the Druid router, which allows a query to manually specify which Druid brokers the query should be sent to, based on a query context parameter brokerService that names any broker pool defined in druid.router.tierToBrokerMap (this corresponds to the 'service name' of the broker set, druid.service).
The second new feature allows the Druid router to parse and examine SQL queries so that broker selection strategies can also function for SQL queries. This can be enabled by setting druid.router.sql.enable to true. This does not affect JDBC queries, which use a different mechanism to facilitate "sticky" connections to a single broker.
# Avatica protobuf JDBC Support
Druid now supports Avatica Protobuf JDBC connections, for example for use with the Avatica Golang Driver, and uses a separate endpoint from the JSON JDBC URI.
String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica-protobuf/;serialization=protobuf";
# Improved query error logging
Query exceptions have been changed from WARN level to ERROR level to include additional information in the logs to help troubleshoot query failures. Additionally, a new query context flag, enableQueryDebugging, has been added that includes stack traces in these query error logs, to provide even more information without the need to enable logs at the DEBUG level.
# Streaming Ingestion
# Task autoscaling for Kafka and Kinesis streaming ingestion
Druid 0.22.0 now offers experimental support for dynamic Kafka and Kinesis task scaling. The included strategies are driven by periodic measurement of stream lag (which is based on message count for Kafka, and difference of age between the message iterator and the oldest message for Kinesis), and will adjust the number of tasks based on the amount of 'lag' and several configuration parameters. See Kafka and Kinesis documentation for complete information.
# Avro and Protobuf streaming InputFormat and Confluent Schema Registry Support
Druid streaming ingestion now has support for Avro and Protobuf in the updated InputFormat specification format, which replaces the deprecated firehose/parser specification used by legacy Druid streaming formats. Alongside this comes support for obtaining schemas for these formats from Confluent Schema Registry. See the data formats documentation for further information.
# Kafka ingestion support for specifying group.id
Druid Kafka streaming ingestion now optionally supports specifying group.id on the connections Druid tasks make to the Kafka brokers. This is useful for accessing clusters that require it to be set as part of authorization, and it can be specified in the consumerProperties section of the Kafka supervisor spec. See the Kafka ingestion documentation for more details.
# Native Batch Ingestion
# Support for using deep storage for intermediary shuffle data
Druid native 'perfect rollup' 2-phase ingestion tasks now support using deep storage as a shuffle location, as an alternative to local disks on MiddleManagers or Indexers. To use this feature, set druid.processing.intermediaryData.storage.type to deepstore, which uses the configured deep storage type.
Note: with the "deepstore" type, data is stored in the shuffle-data directory under the configured deep storage path. Automatic cleanup of this directory is not supported yet; you can set up cloud storage lifecycle rules to automatically clean up data at the shuffle-data prefix location.
# Improved native batch ingestion task memory usage
Druid native batch ingestion has received a new configuration option, druid.indexer.task.batchProcessingMode, which introduces two new operating modes that should allow batch ingestion to operate with a smaller and more predictable heap memory usage footprint. The CLOSED_SEGMENTS_SINKS mode is the most aggressive and should have the smallest memory footprint. It works by eliminating in-memory tracking and mmap of intermediary segments produced during segment creation, but it isn't well tested at this point, so it is considered experimental...
druid-0.21.1
Apache Druid 0.21.1 is a bug fix release that fixes a few regressions with the 0.21 release. The first is an issue with the published Docker image, which causes containers to fail to start due to volume permission issues, described in #11166 and fixed in #11167. This release also fixes an issue caused by a bug in the upgraded Jetty version which was released in 0.21, described in #11206 and fixed in #11207. Finally, a fix for a web console regression related to field validation was added in #11228.
# Bug fixes
#11167 fix docker volume permissions
#11207 Upgrade jetty version
#11228 Web console: Fix required field treatment
#11299 Fix permission problems in docker
# Credits
Thanks to everyone who contributed to this release!
druid-0.21.0
Apache Druid 0.21.0 contains around 120 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 36 contributors. Refer to the complete list of changes and everything tagged to the milestone for further details.
# New features
# Operation
# Service discovery and leader election based on Kubernetes
The new Kubernetes extension supports service discovery and leader election based on Kubernetes. This extension works in conjunction with the HTTP-based server view (druid.serverview.type=http) and task management (druid.indexer.runner.type=httpRemote) to allow you to run a Druid cluster with zero ZooKeeper dependencies. This extension is still experimental. See Kubernetes extension for more details.
# New dynamic coordinator configuration to limit the number of segments when finding a candidate segment for segment balancing
You can set percentOfSegmentsToConsiderPerMove to limit the number of segments considered when picking a candidate segment to move. The candidates are searched up to maxSegmentsToMove * 2 times. This new configuration prevents Druid from iterating through all available segments, speeding up the segment balancing process, especially if you have lots of available segments in your cluster. See Coordinator dynamic configuration for more details.
# status and selfDiscovered endpoints for Indexers
The Indexer now supports the status and selfDiscovered endpoints. See Processor information APIs for details.
# Querying
# New grouping aggregator function
You can use the new grouping aggregator SQL function with GROUPING SETS or CUBE to indicate which grouping dimensions are included in the current grouping set. See Aggregation functions for more details.
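A minimal sketch against a hypothetical "wikipedia" datasource; GROUPING returns a number whose bits indicate which of the listed dimensions are rolled up in each result row:
SELECT
  "channel",
  "page",
  GROUPING("channel", "page") AS "grouping_id",
  COUNT(*) AS "row_count"
FROM "wikipedia"
GROUP BY GROUPING SETS (("channel", "page"), ("channel"), ())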
# Improved missing argument handling in expressions and functions
Expression processing can now be vectorized when inputs are missing, for example when a column does not exist. When an argument is missing in an expression, Druid can now infer the proper result type based on the non-null arguments. For instance, for longColumn + nonExistentColumn, nonExistentColumn is treated as (long) 0 instead of (double) 0.0. Finally, in default null handling mode, math functions can produce output properly by treating missing arguments as zeros.
# Allow zero period for TIMESTAMPADD
The TIMESTAMPADD function now allows a zero period. This functionality is required by some BI tools such as Tableau.
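For example, a zero period is now accepted (a sketch against a hypothetical "wikipedia" datasource):
SELECT TIMESTAMPADD(DAY, 0, "__time") AS "unchanged_time"
FROM "wikipedia"
LIMIT 1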
# Ingestion
# Native parallel ingestion no longer requires explicit intervals
The parallel task no longer requires you to set explicit intervals in granularitySpec. If intervals are missing, the parallel task executes an extra step for input sampling, which collects the intervals to index.
# Old Kafka version support
Druid now supports Apache Kafka versions older than 0.11. To read from an old version of Kafka, set isolation.level to read_uncommitted in consumerProperties. Only version 0.10.2.1 has been tested up until this release. See Kafka supervisor configurations for details.
# Multi-phase segment merge for native batch ingestion
A new tuningConfig parameter, maxColumnsToMerge, controls how many segments can be merged at the same time in the task. This configuration can be useful to avoid high memory pressure during the merge. See tuningConfig for native batch ingestion for more details.
# Native re-ingestion is less memory intensive
Parallel tasks now sort segments by ID before assigning them to subtasks. This sorting minimizes the number of time chunks for each subtask to handle. As a result, each subtask is expected to use less memory, especially when a single Parallel task is issued to re-ingest segments covering a long time period.
# Web console
# Updated and improved web console styles
The new web console styles make better use of the Druid brand colors and standardize paddings and margins throughout. The icon and background colors are now derived from the Druid logo.
# Partitioning information is available in the web console
The web console now shows datasource partitioning information in the new Segment granularity and Partitioning columns.
[Screenshot: the Segment granularity column in the Datasources tab]
[Screenshot: the Partitioning column in the Segments tab]
# The column order in the Schema table matches the dimensionsSpec
The Schema table now reflects the dimension ordering in the dimensionsSpec.
# Metrics
# Coordinator duty runtime metrics
The coordinator performs several 'duty' tasks, for example segment balancing and loading new segments. There are now two new metrics to help you analyze how fast the Coordinator is executing these duties.
- coordinator/time: the time for an individual duty to execute
- coordinator/global/time: the time for the whole duties runnable to execute
# Query timeout metric
A new metric provides the number of timed-out queries. Previously, timed-out queries were treated as interrupted and included in query/interrupted/count (see Changed HTTP status codes for query errors for more details).
- query/timeout/count: the number of timed-out queries during the emission period
# Shuffle metrics for batch ingestion
Two new metrics provide shuffle statistics for MiddleManagers and Indexers. These metrics have the supervisorTaskId as their dimension.
- ingest/shuffle/bytes: number of bytes shuffled per emission period
- ingest/shuffle/requests: number of shuffle requests per emission period
To enable the shuffle metrics, add org.apache.druid.indexing.worker.shuffle.ShuffleMonitor to druid.monitoring.monitors. See Shuffle metrics for more details.
# New clock-drift safe metrics monitor scheduler
The default metrics monitor scheduler is implemented based on ScheduledThreadPoolExecutor, which is prone to unbounded clock drift. A new monitor scheduler, ClockDriftSafeMonitorScheduler, overcomes this limitation. To use the new scheduler, set druid.monitoring.schedulerClassName to org.apache.druid.java.util.metrics.ClockDriftSafeMonitorScheduler in the runtime.properties file.
# Others
# New extension for a password p...
druid-0.20.2
Apache Druid 0.20.2 introduces new configurations to address CVE-2021-26919: Authenticated users can execute arbitrary code from malicious MySQL database systems. We recommend enabling the new configurations below to mitigate vulnerable JDBC connection properties. These configurations are applied to all JDBC connections for ingestion and lookups, but not for the metadata store. See security configurations for more details.
- druid.access.jdbc.enforceAllowedProperties: When true, Druid applies druid.access.jdbc.allowedProperties to JDBC connections starting with jdbc:postgresql: or jdbc:mysql:. When false, Druid allows any kind of JDBC connection without JDBC property validation. This config is set to false by default to avoid breaking rolling upgrades. It is deprecated and may be removed in a future release, in which case the allow list will always be enforced.
- druid.access.jdbc.allowedProperties: Defines a list of allowed JDBC properties. Druid always enforces the list for all JDBC connections starting with jdbc:postgresql: or jdbc:mysql: if druid.access.jdbc.enforceAllowedProperties is set to true. This option is tested against MySQL connector 5.1.48 and PostgreSQL connector 42.2.14. Other connector versions might not work.
- druid.access.jdbc.allowUnknownJdbcUrlFormat: When false, Druid only accepts JDBC connections starting with jdbc:postgresql: or jdbc:mysql:. When true, Druid allows JDBC connections to any kind of database, but only enforces druid.access.jdbc.allowedProperties for PostgreSQL and MySQL.
druid-0.20.1
Apache Druid 0.20.1 is a bug fix release that addresses CVE-2021-25646: Authenticated users can override system configurations in their requests which allows them to execute arbitrary code.
# Known issues
# Incorrect Druid version in docker-compose.yml
The Druid version is specified as 0.20.0 in the docker-compose.yml file. We recommend updating the version to 0.20.1 before you run a Druid cluster using docker compose.