More browser / parser friendly return format for timestamps #273

Closed
torkelo opened this issue Feb 1, 2019 · 10 comments · Fixed by #1010 or #1022
Labels
component/loki type/feature Something new we should do

Comments

@torkelo
Member

torkelo commented Feb 1, 2019

Currently Loki returns timestamps in a string format; these date strings can be quite expensive to parse when dealing with many thousands of entries.

In the browser, timestamps are millisecond epochs, so that would be the ideal format to return values in (i.e. as plain JSON numbers). Since the internal format is in nanoseconds, this will lose some precision (which we throw away anyway when parsing to a local date). So I'm not sure what the best approach is here. Maybe a query flag that controls the timestamp format & precision.

A related issue is how labels are currently returned as a string that has to be JSON-parsed separately (from the rest of the body). When dealing with a large number of streams this could also be a minor performance issue, but mostly I think it just looks very strange from an API perspective to have a string that contains JSON rather than making it a proper part of the JSON response.

@tomwilkie tomwilkie added component/loki type/feature Something new we should do labels Feb 5, 2019
@candlerb
Contributor

candlerb commented Apr 9, 2019

> Currently loki returns timestamps in a string format, these date strings can be quite expensive to parse when dealing with many thousands of entries.

It's also inconsistent that the 'start' and 'end' query parameters are in Unix epoch nanoseconds, but the returned ts values are ISO datetime. It's certainly painful to keep converting back and forth.

The problem with using Unix nanoseconds with JavaScript is that an IEEE double has only 53 bits of precision, and the current time requires 61 bits. This means you lose precision when going via JSON to systems that implement numeric values as double-precision floats, like JavaScript, although you should still get roughly microsecond precision.
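To make the 53-bit limit concrete, here is a small sketch (using an arbitrary current-era nanosecond timestamp) showing that a nanosecond epoch does not survive a round trip through a JavaScript `Number`, while `BigInt` keeps it exactly:

```typescript
// A current-era Unix-nanosecond timestamp needs ~61 bits, but a JS Number
// (IEEE 754 double) has only a 53-bit mantissa, so parsing the string as a
// Number silently rounds it (to roughly the nearest 256 ns at this magnitude).
const ns = "1568215619994013001";

const asNumber = Number(ns); // rounded: does not round-trip
const asBigInt = BigInt(ns); // exact: BigInt keeps all the bits

console.log(String(asNumber) === ns); // false, precision lost
console.log(asBigInt.toString() === ns); // true, exact
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991 (only 16 digits)
```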

> A related issue is how labels are currently returned in string that has to be json parsed separately (from the rest of the body).

It's not even JSON; it's loki/prometheus labels format.

Labels: {foo="bar"}

JSON would be: {"foo":"bar"}
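For illustration, a hypothetical client-side helper that converts the simple case of the label-string format into a plain object. This is only a sketch: it assumes `=` matchers and values without escaped quotes, and does not handle the full matcher syntax (`=~`, `!=`, `!~`):

```typescript
// Naive conversion of a Loki/Prometheus label string like
// {foo="bar", job="varlogs"} into a plain object. Hypothetical helper,
// not Loki code; it ignores escapes and non-equality matchers.
function labelsToObject(labels: string): Record<string, string> {
  const out: Record<string, string> = {};
  const re = /(\w+)="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(labels)) !== null) {
    out[m[1]] = m[2];
  }
  return out;
}

console.log(labelsToObject('{foo="bar", job="varlogs"}'));
```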

@kylebrandt

I wonder if the JSON for labels might be easier to work with if the JSON representation were closer to the Labels structure in Prometheus, to keep ordering (https://github.com/prometheus/prometheus/blob/master/pkg/labels/labels.go#L43).

So instead of an object, something like [ [k, v], [k, v] ], sorted by name.

@candlerb
Contributor

candlerb commented Aug 26, 2019

The fact that labels are not JSON makes it harder to build a valid string of labels too: e.g. see #927

However, labels are also a mini query language in their own right: e.g. {"foo"=~".*bar.*"}. That's not something that would be pretty or convenient in pure JSON.

I don't think the ordering of labels is important: {foo="1",bar="2"} and {bar="2",foo="1"} are surely the same label set. The storage engine might need to sort them on ingestion if it requires them in a consistent order - but if that's the case, it must be doing so already.
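If labels do arrive as a JSON object, a client that needs to compare label sets can normalize ordering itself. A minimal sketch (hypothetical helper, not Loki code):

```typescript
// Two label sets are the same regardless of key order if their sorted
// key/value pairs match. Canonicalize by sorting entries before comparing.
function sameLabelSet(
  a: Record<string, string>,
  b: Record<string, string>
): boolean {
  const canon = (l: Record<string, string>) =>
    JSON.stringify(Object.entries(l).sort(([x], [y]) => x.localeCompare(y)));
  return canon(a) === canon(b);
}

console.log(sameLabelSet({ foo: "1", bar: "2" }, { bar: "2", foo: "1" })); // true
```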

@cyriltovena
Contributor

cyriltovena commented Sep 11, 2019

@torkelo @dprokop @aocenas @slim-bean @joe-elliott

We had a discussion today and I think the suggestion will be to change the streams result type from:

{
  "resultType": "streams",
  "result": [
    {
      "labels": "{filename=\"/var/log/myproject.log\", job=\"varlogs\", level=\"info\"}",
      "entries": [
        {
          "ts": "2019-06-06T19:25:41.972739Z",
          "line": "foo"
        },
        {
          "ts": "2019-06-06T19:25:41.972722Z",
          "line": "bar"
        }
      ]
    }
  ]
}

to

{
  "resultType": "streams",
  "result": [
    {
      "stream": {
        "app": "foo"
      },
      "values": [
        [
          "1568215619994013001",
          "a log message"
        ],
        [
          "1568215619994013002",
          "another log message"
        ],
        [
          "1568215619994013003",
          "blah"
        ]
      ]
    }
  ]
}

Values will be nanosecond timestamps, encoded as strings.
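A sketch of what consuming the proposed shape could look like in a browser client. The field names (`resultType`, `result`, `stream`, `values`) follow the example above; the flattening helper itself is illustrative, not Loki's client code:

```typescript
// Proposed response shape: labels arrive as a real JSON object, and each
// value is a [nanosecond-string, log-line] pair.
interface StreamsResult {
  resultType: "streams";
  result: { stream: Record<string, string>; values: [string, string][] }[];
}

// Flatten all streams into { ts, line } entries. Dropping the last 6
// digits of the nanosecond string yields a millisecond epoch, which fits
// safely in a JS Number / Date.
function entries(res: StreamsResult): { ts: Date; line: string }[] {
  return res.result.flatMap((s) =>
    s.values.map(([ns, line]) => ({
      ts: new Date(Number(ns.slice(0, -6))),
      line,
    }))
  );
}
```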

@cyriltovena
Contributor

This will only be true for the new API endpoint, the current one won't change.

@joe-elliott joe-elliott self-assigned this Sep 11, 2019
@candlerb
Contributor

When you say "nanoseconds float", bear in mind that an IEEE double has only 53 bits of precision, and in the current era, time will be rounded to about a quarter of a microsecond.

$ node
> 1435781451.123456789
1435781451.1234567
> 1435781451.123456800
1435781451.1234567
> 1435781451.123456900
1435781451.123457

If the new API uses this form consistently, I'd be fine with that - although there seems to be little point storing exact nanoseconds if these are not exposed in the API. Would query start and end parameters be changed to float?

Double Extended (80-bit) would have enough precision, but not many clients implement that.

@ryantxu
Member

ryantxu commented Sep 11, 2019

discussed with @torkelo and @aocenas -- we think sending nanoseconds as strings is a good solution:

       "values": [
            [
                "1568215619994013001",
                "a log message"
            ],

This makes parsing dates to epoch something like:

parseInt(stringTs.slice(0, -6))

and later when we add nanosecond support we can look at nano-date
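Sketching that idea out: for current-era timestamps, the leading digits of the nanosecond string are the millisecond epoch (enough for a JS `Date`), and the last 6 digits are a sub-millisecond remainder that could be kept around for future nanosecond support. A hypothetical helper:

```typescript
// Split a nanosecond-string timestamp into a millisecond epoch (safe as a
// JS Number) plus the 6-digit sub-millisecond remainder.
function splitNs(ts: string): { ms: number; nsRemainder: string } {
  return { ms: Number(ts.slice(0, -6)), nsRemainder: ts.slice(-6) };
}

const { ms, nsRemainder } = splitNs("1568215619994013001");
console.log(ms); // 1568215619994
console.log(nsRemainder); // "013001"
```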

@candlerb
Contributor

Thanks: I think nanoseconds as strings is fine.

It would be great if the rest of the API could be similarly updated (e.g. ts values for POST to /api/prom/push; start and end query values). It would be very cheap to attempt to parse them as integers first, before falling back to full ISO datetime parsing.
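The suggested integer-first parsing could look something like the following sketch (hypothetical server-side helper, returning `BigInt` nanoseconds to avoid double-precision loss):

```typescript
// Try Unix nanoseconds as an integer first; fall back to ISO 8601 parsing.
function parseTimestamp(s: string): bigint {
  if (/^\d+$/.test(s)) return BigInt(s); // already nanoseconds
  const msEpoch = Date.parse(s); // ISO 8601 fallback
  if (Number.isNaN(msEpoch)) throw new Error(`bad timestamp: ${s}`);
  return BigInt(msEpoch) * 1_000_000n; // milliseconds -> nanoseconds
}

console.log(parseTimestamp("1568215619994013001")); // 1568215619994013001n
console.log(parseTimestamp("2019-06-06T19:25:41.972Z")); // 1559849141972000000n
```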

@cyriltovena
Contributor

Good point, @joe-elliott what's your take on the push API?

@joe-elliott
Member

I'm not opposed to moving the push API forward to this new format, but I think we should do it in a new issue/PR.

I believe the push API exclusively uses grpc objects for both json and protobuf unmarshalling. The PR I've put together would have to touch entirely new parts of the codebase to start handling that logic.

periklis pushed a commit to periklis/loki that referenced this issue Mar 25, 2024