Apache Druid 24.0.0 contains over 300 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 67 contributors. See the complete set of changes for additional details.
# New Features
# Multi-stage query task engine
SQL-based ingestion for Apache Druid uses a distributed multi-stage query architecture, which includes a query engine called the multi-stage query task engine (MSQ task engine). The MSQ task engine extends Druid's query capabilities, so you can write queries that reference external data as well as perform ingestion with SQL INSERT and REPLACE. Essentially, you can perform SQL-based ingestion instead of using JSON ingestion specs that Druid's native ingestion uses.
SQL-based ingestion using the multi-stage query task engine is the recommended solution starting in Druid 24.0.0. Alternative ingestion solutions such as native batch and Hadoop-based ingestion systems will still be supported. We recommend you read all known issues and test the feature in a development environment before rolling out in production. Using the multi-stage query task engine with `SELECT` statements that do not write to a datasource is experimental.
The extension for it (`druid-multi-stage-query`) is loaded by default. If you're upgrading from an earlier version of Druid, you'll need to add the extension to `druid.extensions.loadList` in your `common.runtime.properties` file.
For more information, see the overview for the multi-stage query architecture.
#12524
#12386
#12523
#12589
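As a quick illustration, here is a minimal sketch of an MSQ ingestion query. The datasource name, S3 URI, and column list below are hypothetical and only show the general shape of an `INSERT` with `EXTERN`:

```sql
-- Hypothetical example: ingest a JSON file from S3 into a new datasource.
-- The bucket, file, and columns are placeholders, not from the release notes.
INSERT INTO wikipedia_new
SELECT
  TIME_PARSE("timestamp") AS __time,
  channel,
  page
FROM TABLE(
  EXTERN(
    '{"type": "s3", "uris": ["s3://example-bucket/wikipedia.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "channel", "type": "string"}, {"name": "page", "type": "string"}]'
  )
)
PARTITIONED BY DAY
```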
# Nested columns
Druid now supports directly storing nested data structures in a newly added `COMPLEX<json>` column type. `COMPLEX<json>` columns store a copy of the structured data in JSON format as well as specialized internal columns and indexes for nested literal values—`STRING`, `LONG`, and `DOUBLE` types. An optimized virtual column allows Druid to read and filter these values at speeds consistent with standard Druid `LONG`, `DOUBLE`, and `STRING` columns.
Newly added Druid SQL functions, native JSON functions, and a virtual column allow you to extract, transform, and create `COMPLEX<json>` values at query time. You can also use the JSON functions in `INSERT` and `REPLACE` statements in SQL-based ingestion, or in a `transformSpec` in native ingestion as an alternative to using a `flattenSpec` object to "flatten" nested data for ingestion.
See SQL JSON functions, native JSON functions, Nested columns, virtual columns, and the feature summary for more detail.
#12753
#12714
#12753
#12920
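For instance, here is a hedged sketch of reading nested values at query time with `JSON_VALUE`. The datasource, column name, and JSON paths are hypothetical:

```sql
-- Hypothetical example: extract nested literal values from a COMPLEX<json>
-- column named "attributes" in a datasource named "events_datasource".
SELECT
  JSON_VALUE(attributes, '$.browser') AS browser,
  JSON_VALUE(attributes, '$.geo.country') AS country,
  COUNT(*) AS events
FROM events_datasource
GROUP BY 1, 2
```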
# Updated Java support
Java 11 is fully supported and is no longer experimental. Java 17 support is improved.
#12839
# Query engine updates
# Updated column indexes and query processing of filters
Reworked column indexes to be extraordinarily flexible, which will eventually allow us to model a wide range of index types. Added machinery to build the filters that use the updated indexes, while also allowing other column implementations to implement the built-in index types, providing adapters so that the current set of filters that Druid provides can make use of indexing.
#12388
# Time filter operator
You can now use the Druid SQL operator TIME_IN_INTERVAL to filter query results based on time. Prefer TIME_IN_INTERVAL over the SQL BETWEEN operator to filter on time. For more information, see Date and time functions.
#12662
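A brief sketch of the operator in use; the datasource and column names are hypothetical:

```sql
-- Hypothetical example: filter rows to a single day using an ISO 8601 interval.
SELECT channel, COUNT(*) AS edits
FROM wikipedia
WHERE TIME_IN_INTERVAL(__time, '2022-06-01/2022-06-02')
GROUP BY channel
```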
# Null values and the "in" filter
If a `values` array contains `null`, the "in" filter matches null values. This differs from the SQL IN filter, which does not match null values.
For more information, see Query filters and SQL data types.
#12863
# Virtual columns in search queries
Previously, a search query could only search on dimensions that existed in the data source. Search queries now support virtual columns as a parameter in the query.
#12720
# Optimize simple MIN / MAX SQL queries on __time
Simple queries like `select max(__time) from ds` now run as `timeBoundary` queries to take advantage of the time dimension sorting in a segment. You can set a feature flag to enable this feature.
#12472
#12491
# String aggregation results
The first/last string aggregator now only compares based on values. Previously, the first/last string aggregator's values were compared based on the `__time` column first and then on values.
If you have existing queries and want to continue using both the `__time` column and values, update your queries to use ORDER BY MAX(timeCol).
#12773
# Reduced allocations due to Jackson serialization
Introduced and implemented new helper functions in `JacksonUtils` to enable reuse of `SerializerProvider` objects.
Additionally, disabled backwards compatibility for map-based rows in the `GroupByQueryToolChest` by default, which eliminates the need to copy the heavyweight `ObjectMapper`. Introduced a configuration option to allow administrators to explicitly enable backwards compatibility.
#12468
# Updated IPAddress Java library
Added a new IPAddress Java library dependency to handle IP addresses. The library includes IPv6 support. Additionally, migrated IPv4 functions to use the new library.
#11634
# Query performance improvements
Optimized SQL operations and functions as follows:
- `isEmpty()` and `equals()` on RangeSets (DimensionRangeShardSpec speed boost. #12477)
# Streaming ingestion
# Kafka consumers
Previously, consumers that were registered and used for ingestion persisted until Kafka deleted them. They were only used to make sure that an entire topic was consumed. There are no longer consumer groups that linger.
#12842
# Kinesis ingestion
You can now perform Kinesis ingestion even if there are empty shards. Previously, all shards had to have at least one record.
#12792
# Batch ingestion
# Batch ingestion from S3
You can now ingest data from endpoints that are different from your default S3 endpoint and signing region.
For more information, see S3 config.
#11798
# Improvements to ingestion in general
This release includes the following improvements for ingestion in general.
# Increased robustness for task management
Added `setNumProcessorsPerTask` to prevent various automatically-sized thread pools from becoming unreasonably large. It isn't ideal for each task to size its pools as if it is the only process on the entire machine. On large machines, this solves a common cause of `OutOfMemoryError` due to "unable to create native thread".
#12592
# Avatica JDBC driver
The JDBC driver now follows the JDBC standard and uses two kinds of statements, Statement and PreparedStatement.
#12709
# Eight hour granularity
Druid now accepts the `EIGHT_HOUR` granularity. You can segment incoming data into `EIGHT_HOUR` buckets as well as group query results by eight-hour granularity.
#12717
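In SQL, an equivalent grouping can be sketched with `TIME_FLOOR` and an ISO 8601 period; the `EIGHT_HOUR` string itself is used in native granularity specs. The datasource below is hypothetical:

```sql
-- Hypothetical example: group query results into eight-hour buckets.
SELECT
  TIME_FLOOR(__time, 'PT8H') AS time_bucket,
  COUNT(*) AS row_count
FROM wikipedia
GROUP BY 1
ORDER BY 1
```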
# Ingestion general
# Updated Avro extension
The previous Avro extension leaked objects from the parser. If these objects leaked into your ingestion, they were stored in a string column with the value of their `.toString()`. This string column will remain after you upgrade but will return `Map.toString()` instead of `GenericRecord.toString()`. If you relied on the previous behavior, you can use the Avro extension from an earlier release.
#12828
# Sampler API
The sampler API has additional limits: `maxBytesInMemory` and `maxClientResponseBytes`. These options augment the existing options `numRows` and `timeoutMs`. `maxBytesInMemory` can be used to control the memory usage on the Overlord while sampling. `maxClientResponseBytes` can be used by clients to specify the maximum size of response they would prefer to handle.
#12947
# SQL
# Column order
The `DruidSchema` and `SegmentMetadataQuery` properties now preserve column order instead of ordering columns alphabetically. This means that query order better matches ingestion order.
#12754
# Converting JOINs to filter
You can improve performance by pushing JOINs partially or fully to the base table as a filter at runtime by setting the `enableRewriteJoinToFilter` context parameter to `true` for a query.
Druid now pushes down join filters when the query computing the join references any columns from the right-hand side.
#12749
#12868
# Add is_active to sys.segments
Added `is_active` as shorthand for `((is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1)`. This represents "all the segments that should be queryable, whether or not they actually are right now".
. This represents "all the segments that should be queryable, whether or not they actually are right now".#11550
# useNativeQueryExplain now defaults to true
The `useNativeQueryExplain` property now defaults to `true`. This means that EXPLAIN PLAN FOR returns the explain plan as a JSON representation of the equivalent native query(s) by default. For more information, see Broker Generated Query Configuration Supplementation.
#12936
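For example, with the new default, the following returns the equivalent native query as JSON; the datasource is hypothetical:

```sql
-- Example: the plan is returned as the native query JSON rather than a logical plan.
EXPLAIN PLAN FOR
SELECT channel, COUNT(*) AS edits
FROM wikipedia
GROUP BY channel
```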
# Running queries with inline data using the Druid query engine
Some queries that do not refer to any table, such as `select 1`, are now always translated to a native Druid query with `InlineDataSource` before execution. If translation is not possible, as for queries such as `SELECT (1, 2)`, an error occurs. In earlier versions, such a query would still run.
#12897
# Coordinator/Overlord
# You can configure the Coordinator to kill segments in the future
You can now set `druid.coordinator.kill.durationToRetain` to a negative period to configure the Druid cluster to kill segments whose `interval_end` is a date in the future. For example, `PT-24H` would allow segments to be killed if their `interval_end` date was 24 hours or less into the future at the time that the kill task is generated by the system.
A cluster operator can also disregard the `druid.coordinator.kill.durationToRetain` entirely by setting a new configuration, `druid.coordinator.kill.ignoreDurationToRetain=true`. This ignores the `interval_end` date when looking for segments to kill, and can instead kill any segment marked unused. This new configuration is turned off by default, and a cluster operator should fully understand and accept the risks before enabling it.
# Improved Overlord stability
Reduced contention between the management thread and the reception of status updates from the cluster. This improves the stability of Overlord and all tasks in a cluster when there are large (1000+) task counts.
#12099
# Improved Coordinator segment logging
Updated Coordinator load rule logging to include current replication levels. Added missing segment ID and tier information from some of the log messages.
#12511
# Optimized overlord GET tasks memory usage
Addressed the significant memory overhead caused by the web-console indirectly calling the Overlord’s GET tasks API. This could cause unresponsiveness or Overlord failure when the ingestion tab was opened multiple times.
#12404
# Reduced time to create intervals
To optimize segment cost computation time, Druid now stores the segment interval instead of creating it each time from primitives, and reduces the memory overhead of storing intervals by interning them, since the set of intervals for segments has low cardinality.
#12670
# Brokers/Overlord
Brokers now have a default of 25 MB maximum queued bytes per query. Previously, there was no default limit. Depending on your use case, you may need to increase the value, especially if you have large result sets or large amounts of intermediate data. To adjust the maximum memory available, use the `druid.broker.http.maxQueuedBytes` property.
For more information, see Configuration reference.
# Web console
Prepare to have your web console experience elevated! - @vogievetsky
# New query view (WorkbenchView) with tabs and long running query support
You can use the new query view to run multi-stage, task-based queries with the `/druid/v2/sql/task` and `/druid/indexer/v1/task/*` APIs, as well as native and SQL-native queries, just like the old Query view. A key point of the SQL MSQ task-based queries is that they may run for a long time. This inspired and necessitated many UX changes, including but not limited to the following:
# Tabs
You can now have many queries stored and running at the same time, significantly improving the query view UX.
You can open several tabs, duplicate them, and copy them as text to paste into any console and reopen there.
# Progress reports (counter reports)
Queries run with the multi-stage query task engine show detailed progress reports in the summary progress bar and in the detailed execution table, which provides summaries of the counters for every step.
# Error and warning reports
Queries run with the multi-stage query task engine present user-friendly warnings and errors should anything go wrong.
The new query view has components to visualize these with their full detail including a stack-trace.
# Recent query tasks panel
Queries run with the multi-stage query task engine are tasks. This makes it possible to show queries that are executing currently and that have executed in the recent past.
For any query in the Recent query tasks panel you can view the execution details for it and you can also attach it as a new tab and continue iterating on the query. It is also possible to download the "query detail archive", a JSON file containing all the important details for a given query to use for troubleshooting.
# Connect external data flow
Connect external data flow lets you use the sampler to sample your source data, determine its schema, and generate a fully formed SQL query that you can edit to fit your use case before you launch your ingestion job. This point-and-click flow will save you much typing.
# Preview button
The Preview button appears when you type in an INSERT or REPLACE SQL query. Click the button to remove the INSERT or REPLACE clause and execute your query as an "inline" query with a limit. This gives you a sense of the shape of your data after Druid applies all the transformations from your SQL query.
# Results table
The query results table has been improved in style and function. It now shows you type icons for the column types and supports the ability to manipulate nested columns with ease.
# Helper queries
The web console now has some UI affordances for notebook and CTE users. You can reference helper queries, collapsible elements that hold a query, from the main query just as if they were defined with a WITH statement. When you are composing a complicated query, it is helpful to break it down into multiple queries and preview the parts individually.
# Additional Web Console tools
More tools are available from the ... menu:
# New SQL-based data loader
The data loader exists as a GUI wizard to help users craft a JSON ingestion spec using point and click and quick previews. The SQL data loader is the SQL-based ingestion analog of that.
Like the native-based data loader, the SQL-based data loader stores all the state in the SQL query itself. You can opt to manipulate the query directly at any stage. See (#12919) for more information about how the data loader differs from the Connect external data workflow.
# Other changes and improvements
See (#12919) for more details and other improvements
# Metrics
# Sysmonitor stats for Peons
Sysmonitor stats, such as memory or swap, are no longer reported by Peons, since Peons always run on the same host as MiddleManagers. This means that duplicate stats will no longer be reported.
#12802
# Prometheus
You can now include the host and service as labels for Prometheus by setting the following properties to true:
druid.emitter.prometheus.addHostAsLabel
druid.emitter.prometheus.addServiceAsLabel
#12769
# Rows per segment
(Experimental) You can now see the average number of rows in a segment and the distribution of segments in predefined buckets with the following metrics: `segment/rowCount/avg` and `segment/rowCount/range/count`.
Enable the metrics with the following property: `org.apache.druid.server.metrics.SegmentStatsMonitor`
#12730
# New sqlQuery/planningTimeMs metric
There's a new `sqlQuery/planningTimeMs` metric for SQL queries that computes the time it takes to build a native query from a SQL query.
#12923
# StatsD metrics reporter
The StatsD metrics reporter extension now includes the following metrics:
Add missing metrics into statsd-reporter. #12762
# New worker level task metrics
Added a new monitor, `WorkerTaskCountStatsMonitor`, that allows each MiddleManager worker to report metrics for successful/failed tasks and task slot usage.
#12446
# Improvements to the JvmMonitor
The JvmMonitor can now handle more generation and collector scenarios. The monitor is more robust and works properly for ZGC on both Java 11 and 15.
#12469
# Garbage collection
Garbage collection metrics now use MXBeans.
#12481
# Metric for task duration in the pending queue
Introduced the metric `task/pending/time` to measure how long a task stays in the pending queue.
#12492
# Emit metrics object for Scan, Timeseries, and GroupBy queries during cursor creation
Added a vectorized metric for Scan, Timeseries, and GroupBy queries.
#12484
# Emit state of replace and append for native batch tasks
Druid now emits metrics so you can monitor and assess the use of different types of batch ingestion, in particular replace and tombstone creation.
#12488
#12840
# KafkaEmitter emits queryType
The KafkaEmitter now properly emits the `queryType` property for native queries.
#12915
# Security
You can now hide properties that are sensitive in the API response from `/status/properties`, such as S3 access keys. Use the `druid.server.hiddenProperties` property in `common.runtime.properties` to specify the properties (case insensitive) you want to hide.
#12950
# Other changes
- You can now configure the retention period for request logs stored on disk with the `druid.request.logging.durationToRetain` property. Set the retention period to be longer than `P1D` (Add retention for file request logs #12559).
- Added the `Zstandard` compression library to `CompressionStrategy` (Adding zstandard compression library #12408).
- Updated `inputSegmentSizeBytes` in the Compaction configuration to 100,000,000,000,000 (~100TB).
# Bug fixes
Druid 24.0 contains over 68 bug fixes. You can find the complete list here
# Upgrading to 24.0
# Permissions for multi-stage query engine
To read external data using the multi-stage query task engine, you must have READ permissions for the EXTERNAL resource type. Users without the correct permission encounter a 403 error when trying to run SQL queries that include EXTERN.
The way you assign the permission depends on your authorizer. For example, with [basic security](/docs/development/extensions-core/druid-basic-security.md) in Druid, add the `EXTERNAL READ` permission by sending a `POST` request to the roles API.
The example adds permissions for users with the `admin` role using a basic authorizer named `MyBasicMetadataAuthorizer`. The following permissions are granted:
# Behavior for unused segments
Druid automatically retains any segments marked as unused. Previously, Druid permanently deleted unused segments from metadata store and deep storage after their duration to retain passed. This behavior was reverted from `0.23.0`.
#12693
# Default for druid.processing.fifo
The default for `druid.processing.fifo` is now `true`. This means that tasks of equal priority are treated in a FIFO manner. For most use cases, this change can improve performance on heavily loaded clusters.
#12571
# Update to JDBC statement closure
In previous releases, Druid automatically closed the JDBC Statement when the ResultSet was closed. Druid closed the ResultSet on EOF. Druid closed the statement on any exception. This behavior is, however, non-standard.
In this release, Druid's JDBC driver follows the JDBC standards more closely:
- The ResultSet closes automatically on EOF, but does not close the Statement or PreparedStatement. Your code must close these statements, perhaps by using a try-with-resources block.
- The PreparedStatement can now be used multiple times with different parameters. (Previously this was not true since closing the ResultSet closed the PreparedStatement.)
- If any call to a Statement or PreparedStatement raises an error, the client code must still explicitly close the statement. According to the JDBC standards, statements are not closed automatically on errors. This allows you to obtain information about a failed statement before closing it.
If you have code that depended on the old behavior, you may have to change your code to add the required close statement.
#12709
# Known issues
# Credits
@2bethere
@317brian
@a2l007
@abhagraw
@abhishekagarwal87
@abhishekrb19
@adarshsanjeev
@aggarwalakshay
@AmatyaAvadhanula
@BartMiki
@capistrant
@chenrui333
@churromorales
@clintropolis
@cloventt
@CodingParsley
@cryptoe
@dampcake
@dependabot[bot]
@dherg
@didip
@dongjoon-hyun
@ektravel
@EsoragotoSpirit
@exherb
@FrankChen021
@gianm
@hellmarbecker
@hwball
@iandr413
@imply-cheddar
@jarnoux
@jasonk000
@jihoonson
@jon-wei
@kfaraz
@LakshSingla
@liujianhuanzz
@liuxiaohui1221
@lmsurpre
@loquisgon
@machine424
@maytasm
@MC-JY
@Mihaylov93
@nishantmonu51
@paul-rogers
@petermarshallio
@pjfanning
@rockc2020
@rohangarg
@somu-imply
@suneet-s
@superivaj
@techdocsmith
@tejaswini-imply
@TSFenwick
@vimil-saju
@vogievetsky
@vtlim
@williamhyun
@wiquan
@writer-jill
@xvrl
@yuanlihan
@zachjsh
@zemin-piao