No viable timestamp field in ElasticSearch #480

Closed
robinkb opened this issue Oct 18, 2017 · 8 comments

Comments

@robinkb

robinkb commented Oct 18, 2017

Hello,

We are using ElasticSearch to store our Jaeger traces, and we would like to run additional queries on this data outside of Jaeger. However, the span documents that Jaeger writes to ElasticSearch do not have a viable timestamp field, which makes querying the data externally very difficult, if not impossible.

I understand that Jaeger intentionally saves the StartTime field as a Long type in ES, because ES does not support dates with microsecond accuracy. If Jaeger saved the StartTime field as Date, ES would truncate the timestamp to millisecond accuracy, losing valuable precision.

Zipkin works around this issue by adding an additional field, timestamp_millis, which ES can use as a date field for sorting. I looked into implementing this in Jaeger myself, but I have little to no experience with Go.
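
For illustration, here is a trimmed, hypothetical span document showing what such an extra field could look like next to the existing microsecond value (field set abbreviated; startTime is epoch microseconds stored as a long, the extra field is epoch milliseconds that ES can map as a date):

{
  "traceID": "f6c3c1a03b2b4f5e",
  "operationName": "HTTP GET /orders",
  "startTime": 1508320945123456,
  "startTimeMillis": 1508320945123,
  "duration": 3456
}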

Please consider adding a similar solution in Jaeger, as it would greatly increase the value of using ElasticSearch as a storage backend.

@robinkb
Author

robinkb commented Oct 22, 2017

In case anyone is looking into this, I went and developed a patch myself. I'll try to create a pull request in the coming days, though, as I said, I am not a Go developer (or even an actual developer), so I make no promises about its quality :-)

@yurishkuro
Member

+1 especially if it can be made backwards compatible, i.e. not force any schema upgrades on people

@robinkb
Author

robinkb commented Oct 24, 2017

Opened pull request #491

The change should be backwards compatible. The only issue that I can see now is that the ElasticSearch mapping must be updated in advance to include startTimeMillis as a date type. If it is not, the startTimeMillis field on that day's traces will be mapped to a number and won't be usable as a timerange field.

I expect that when the collector creates a new index for the next day, it will use the new mapping, and use startTimeMillis as a timestamp field.
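
One way to verify which mapping a given day's index ended up with is to inspect it directly (the index name is a placeholder); if startTimeMillis comes back as "long" rather than "date", that day's documents were dynamically mapped and won't support date queries:

GET /jaeger-span-<current day>/_mapping/span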

@yurishkuro
Member

The only issue that I can see now is that the ElasticSearch mapping must be updated in advance to include startTimeMillis as a date type. If it is not, the startTimeMillis field on that day's traces will be mapped to a number and won't be usable as a timerange field.

What would that mean to the end user?

@robinkb
Author

robinkb commented Oct 25, 2017

Jaeger Query/UI functionality is not affected at all. You can even switch between 0.8.0 and my patched version, and traces will continue to show up correctly in Jaeger.

If the mapping in ElasticSearch is not updated in advance, the updated collector will send spans using the new format, with startTimeMillis added, into the existing index. ElasticSearch will then automatically map the startTimeMillis field to a number type. This means that we are back at square one, and timerange queries on ElasticSearch will not work for that index. There is no regression, but no progression either.

I still need to test if the index of the next day is created with the correct mapping. I suspect that it will, because the Jaeger Collector is responsible for creating indices and mappings, but I have not verified this yet. If this works as I expect, the user will simply have to wait until the next day, when the next index is created, to start running timerange queries. In any case, existing functionality will not be affected.

If the mapping is updated in advance, before the Jaeger Collector is updated, the user can start running timerange queries and will find traces starting from when the collector is updated.
An example for updating the mapping:

PUT /jaeger-span-<current day>/_mapping/span
{
  "properties": {
    "startTimeMillis": {
      "type": "date",
      "format": "epoch_millis"
    }
  }
}
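
The same request can also be sent with curl (assuming ElasticSearch is reachable on localhost:9200; adjust the host and index name as needed):

curl -XPUT 'http://localhost:9200/jaeger-span-<current day>/_mapping/span' \
  -H 'Content-Type: application/json' \
  -d '{
    "properties": {
      "startTimeMillis": {
        "type": "date",
        "format": "epoch_millis"
      }
    }
  }'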

I tested this on my Kibana instance, and I had to re-add my index pattern. I don't think this results in any data loss. If the user does not have an existing index pattern for jaeger-span-*, there is no issue.

An example of what this looks like in Kibana:
[screenshot]

I manually updated the mapping of the current index, and then updated my Jaeger Collector from 0.8.0 to my patched version, at around 10:22.

Another example of a quick visualization using Grafana on timestamped tracing data from Redis spans:
[screenshot]

@robinkb
Author

robinkb commented Oct 26, 2017

Had a chance to test this scenario last night:

I still need to test if the index of the next day is created with the correct mapping. I suspect that it will, because the Jaeger Collector is responsible for creating indices and mappings, but I have not verified this yet.

The new index is created with the correct mapping, but apparently Kibana doesn't like it when indices matching a pattern (like jaeger-span-*) have differing mappings.

So, if users want to use the new functionality, there are two possible scenarios:

First scenario:

  1. The user updates the mappings of every Jaeger Span index to include startTimeMillis (a sketch of this follows after the list)
  2. The user upgrades Jaeger Collector
  3. If the user has an index pattern for jaeger-span-* in Kibana, the user re-adds the index pattern
  4. The user can now perform timerange queries from Kibana
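
For step 1 above, a hedged sketch of updating all existing span indices in one call; the put-mapping API accepts wildcards in the index name, but verify this against your ElasticSearch version before running it:

PUT /jaeger-span-*/_mapping/span
{
  "properties": {
    "startTimeMillis": {
      "type": "date",
      "format": "epoch_millis"
    }
  }
}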

Second scenario:

  1. The user upgrades Jaeger Collector without updating the mappings in advance
  2. The user reindexes all old jaeger-span-* indices with the new mapping (a rough sketch of this follows below)
  3. If the user has an index pattern for jaeger-span-* in Kibana, the user re-adds the index pattern
  4. The user can now perform timerange queries from Kibana

This is only a problem for Kibana. Grafana and Jaeger have no issue with the differing mappings.
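
For step 2 of the second scenario, a rough sketch of what reindexing one old index could look like (index names, including the -fixed suffix, are hypothetical; the documents already carry startTimeMillis as epoch milliseconds, so the date mapping with epoch_millis accepts them as-is):

PUT /jaeger-span-<old day>-fixed
{
  "mappings": {
    "span": {
      "properties": {
        "startTimeMillis": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "jaeger-span-<old day>" },
  "dest": { "index": "jaeger-span-<old day>-fixed" }
}

In practice you would want the target index to be created from the collector's full span mapping (or the new index template) rather than just this one field, and to swap the old index out afterwards.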

@yurishkuro
Member

Is it worth adding a script that updates the mappings for existing indices? It seems like it would make the migration the least painful: (1) run the script, (2) upgrade collectors.

@black-adder
Contributor

If users only persist traces for 2 days like we do, I don't see much need for a script to update all the previous spans. Let's open a task for it, but I don't think it's super urgent.
