Time series points have a timestamp, a measurement, and dimensions associated with them. The common queries are range queries on the timestamp, metric aggregations on the measurement, and grouping on dimensions, or a similar query with a histogram on the timestamp.
tsid1 is the time series ID of the point. It will be the unit of storage for time series points, and in the prototype each TSID is represented by a unique field in Lucene.
timestamp is the actual point in the BKD tree on which the index is created.
Comparison with DocValues
Below is the comparison of running the unit test for the DocValues approach vs the TSPoint approach -
This test ingests 10,000,000 docs against a given TSID and performs a range query on timestamp 100 times against the same TSID. The merge function used is sum.
This is not an apples-to-apples comparison, since there are 3 segments in the DocValues approach whereas there are 10 in the TSPoint approach.
Limitation of this feature
Doc deletion is currently not supported. We need to evaluate how important it is and possibly find a way to support it in the future.
Only decomposable aggregation functions can be supported, e.g. min, max, sum, avg, count.
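To illustrate what "decomposable" means here, below is a minimal sketch (not the prototype's code) of a summary that can be merged pairwise. The class name Summary and its fields are assumptions made for illustration only. avg is not merged directly; it is derived from the merged sum and count, which is why it still qualifies.

```java
import java.util.function.BinaryOperator;

// Hypothetical summary type, not taken from the prototype: it carries enough
// state to derive min, max, sum, count and avg after any number of merges.
final class Summary {
    final double min;
    final double max;
    final double sum;
    final long count;

    Summary(double min, double max, double sum, long count) {
        this.min = min;
        this.max = max;
        this.sum = sum;
        this.count = count;
    }

    // Summary of a single measurement.
    static Summary of(double value) {
        return new Summary(value, value, value, 1L);
    }

    // The merge is associative, so summaries of subtrees or segments can be
    // combined in any grouping and still give the same result.
    static final BinaryOperator<Summary> MERGE = (a, b) -> new Summary(
            Math.min(a.min, b.min),
            Math.max(a.max, b.max),
            a.sum + b.sum,
            a.count + b.count);

    // avg is derived from the merged sum and count rather than merged itself.
    double avg() {
        return sum / count;
    }
}
```

A function such as median would not fit this shape, because the median of two partial results cannot be computed from the two partial medians alone.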
Todos
Implementation for multiple TSIDs. For now, we need to create a new field named after the TSID for each time series.
Segment merge for BKD with summaries. Currently, the UTs disable merging, perform the search across multiple segments, and accumulate the results.
Pluggable merge function to merge two TSPoints. Currently it's hardcoded in FieldInfo.java, which isn't the right place to define it.
Measurement compression in BKD. I'm thinking of using delta encoding to store measurement values and summaries while packing the summaries associated with nodes of the tree.
Persist the first and last docID in internal nodes of BKD with summaries in an efficient way. This will make it possible to use precomputed summaries and skip over batches of documents when iterating using DocIdSetIterator. It's a blocker for integration with the OpenSearch aggregation framework.
Integrate with the OpenSearch aggregation framework.
Benchmark against a real time series dataset.
Compare against the SortedDocValues approach.
Compare against other time series databases.
Evaluate support for deleting a document, a time series, or a batch of documents (matching a timestamp range).
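For the measurement compression item above, a rough sketch of delta encoding could look like the following. It assumes measurements are stored as long values; the class name DeltaCodec is hypothetical, and the actual on-disk packing is still undecided. A real implementation would presumably bit-pack the small deltas, which is where the space saving would come from.

```java
// Hypothetical helper, not part of the prototype: stores the first value
// as-is and every later value as the difference from its predecessor.
final class DeltaCodec {

    static long[] encode(long[] values) {
        long[] deltas = new long[values.length];
        long prev = 0L;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - prev; // first delta equals the first value
            prev = values[i];
        }
        return deltas;
    }

    // Rebuilds the original values by accumulating the deltas.
    static long[] decode(long[] deltas) {
        long[] values = new long[deltas.length];
        long prev = 0L;
        for (int i = 0; i < deltas.length; i++) {
            prev += deltas[i];
            values[i] = prev;
        }
        return values;
    }
}
```

This pays off when neighbouring values are close to each other, which is commonly the case for counters and slowly changing gauges.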
Proposal:
Prototype can be found here.
TSPoint which can be added as -
tsid1 is the time series ID of the point. It will be the unit of storage for time series points, and in the prototype each TSID is represented by a unique field in Lucene.
timestamp is the actual point in the BKD tree on which the index is created.
Full definition can be found here.
Sum function -
LeafReader to perform range queries on TSPoint and retrieve summarized results -
Instead of BKDReader and BKDWriter, we will be using BKDSummaryWriter and BKDSummaryReader, which support writing summaries in internal nodes of the tree.
Changes in IntersectVisitor interface here.
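To show why summaries in internal nodes help range queries, here is a simplified sketch. It is a plain binary tree over sorted timestamps, not Lucene's BKD layout and not the BKDSummaryWriter/BKDSummaryReader code; the class SummaryNode and its methods are made up for illustration, and sum stands in for the merge function.

```java
// Hypothetical stand-in for a tree with per-node summaries. Each node keeps
// the timestamp bounds it covers and the precomputed sum of everything below.
final class SummaryNode {
    final long minTs;
    final long maxTs;
    final double sum;
    final SummaryNode left;  // null for leaves
    final SummaryNode right; // null for leaves

    private SummaryNode(long minTs, long maxTs, double sum,
                        SummaryNode left, SummaryNode right) {
        this.minTs = minTs;
        this.maxTs = maxTs;
        this.sum = sum;
        this.left = left;
        this.right = right;
    }

    // Builds a balanced tree over points [from, to) sorted by timestamp.
    // Requires from < to.
    static SummaryNode build(long[] ts, double[] values, int from, int to) {
        if (to - from == 1) {
            return new SummaryNode(ts[from], ts[from], values[from], null, null);
        }
        int mid = (from + to) >>> 1;
        SummaryNode l = build(ts, values, from, mid);
        SummaryNode r = build(ts, values, mid, to);
        return new SummaryNode(l.minTs, r.maxTs, l.sum + r.sum, l, r);
    }

    // Sums measurements with lo <= timestamp <= hi.
    double rangeSum(long lo, long hi) {
        if (hi < minTs || lo > maxTs) {
            return 0.0; // node is outside the query range
        }
        if (lo <= minTs && maxTs <= hi) {
            return sum; // node is fully inside: use the summary, skip the subtree
        }
        // Partial overlap only happens on internal nodes, because a leaf
        // covers a single timestamp and is either inside or outside.
        return left.rangeSum(lo, hi) + right.rangeSum(lo, hi);
    }
}
```

With this shape, a query that covers most of the data touches only the nodes along the two range boundaries instead of every point, which is the effect the summaries are meant to provide.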