TSDB followups #98877

Open
2 of 14 tasks
martijnvg opened this issue Aug 25, 2023 · 3 comments


martijnvg commented Aug 25, 2023

Not completed tasks from #74660 that we may want to followup at some point in the future.

  • Move IndexRequest#autoGenerateId? It's a bit spooky where it is, but I don't like it in any other place.
  • Improve error messages in _update_by_query when modifying the dimensions or @timestamp
  • During translog replay, during recovery, and on replicas we regenerate the _id and assert that it matches the _id from the primary. Should we? Probably. Let's make sure.
  • Treating data stream/index as a dimension
  • Support text field labels in downsampling
  • Histograms support in downsampling
  • SQL/ES|QL support for downsampling
  • Optimization of merge policies (Move backing indices of data streams to LogByteMergePolicy #87684)
  • Default the setting's value to all of the keyword dimensions
  • Support shard splitting on time_series indices
  • Make an object or interface for _id's values. Right now it's a String that we encode with Uid.encodeId. That was reasonable. Maybe it still is. But it feels complex, and for tsdb the _id is always some bytes. Encoding it also wastes a byte about 1/128 of the time. It's a common prefix byte, so this is probably not really an issue. But still. This is a big change, but it'd make ES easier to read. It probably wouldn't really improve storage, though.
  • Figure out how to specify tsdb settings in component templates. For example, index.routing_path can be specified in a composable index template if the data stream template's index_mode is set to time_series. But if this setting is specified in a component template, then it is also required to set the index.mode index setting. This feels backwards. @martijnvg
  • In order to retrieve the routing values (defined in index.routing_path), the source needs to be parsed on the coordinating node. However, if an ingest pipeline is executed, the source of the document is then parsed a second time. Ideally the routing values should be extracted when ingest is performed, similar to how the @timestamp field is already retrieved from a document during pipeline execution.
  • In order to determine the backing index a document should be written to, a timestamp is parsed into an Instant. The format being used is: strict_date_optional_time_nanos||strict_date_optional_time||epoch_millis. This allows the regular date format, the date nanos format, and epoch millis defined as a string. We can optimise the date parsing if we know the exact format being used. For example, if the data stream has a parameter that indicates the exact date format, we can optimise parsing by using only one of strict_date_optional_time_nanos, strict_date_optional_time or epoch_millis.
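
To make the last item more concrete, here is a rough sketch of the idea, not the actual Elasticsearch code. It uses only java.time. The class name TimestampParsingSketch and the methods parseIso, parseEpochMillis, parseWithFallback and parserFor are hypothetical names made up for illustration, and the ISO parser is a simplification that requires a full date-time with offset (the real strict_date_optional_time formats accept more inputs). The point is only to contrast "try each format in turn" with "pick one parser up front when the format is declared".

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Elasticsearch.
final class TimestampParsingSketch {

    // Simplified stand-in for the strict_date_optional_time(_nanos) formats:
    // accepts only a full ISO-8601 date-time with an offset, e.g. 2023-08-25T00:00:00Z.
    static Instant parseIso(String value) {
        return OffsetDateTime.parse(value).toInstant();
    }

    // Stand-in for epoch_millis supplied as a string.
    static Instant parseEpochMillis(String value) {
        return Instant.ofEpochMilli(Long.parseLong(value));
    }

    // Mimics a "format_a||format_b" chain: try the ISO parser first and
    // fall back to epoch millis only if that fails. A failed first attempt
    // costs an exception for every epoch-millis timestamp.
    static Instant parseWithFallback(String value) {
        try {
            return parseIso(value);
        } catch (DateTimeParseException e) {
            return parseEpochMillis(value);
        }
    }

    // If the exact format were declared somewhere (assumed here to be a plain
    // string), the parser could be chosen once, skipping the fallback chain.
    static Function<String, Instant> parserFor(String declaredFormat) {
        switch (declaredFormat) {
            case "epoch_millis":
                return TimestampParsingSketch::parseEpochMillis;
            case "strict_date_optional_time":
            case "strict_date_optional_time_nanos":
                return TimestampParsingSketch::parseIso;
            default:
                // Unknown or undeclared format: keep today's behaviour.
                return TimestampParsingSketch::parseWithFallback;
        }
    }
}
```

With something like this, the parser returned by parserFor("epoch_millis") would go straight to Long.parseLong for every document, instead of first failing the ISO attempt. Where such a declared format would live (a data stream parameter, a setting, or something else) is exactly the open question in the item above.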
@martijnvg martijnvg added Meta :StorageEngine/TSDB You know, for Metrics labels Aug 25, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 25, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@mbudge

mbudge commented Jan 27, 2024

Please fix this

#104839

@wchaparro wchaparro removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 29, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)


4 participants