
Add semantic conventions for errors #427

Closed

Conversation

mwear
Member

@mwear mwear commented Jan 27, 2020

Having a way to record errors and annotate spans with that information is essential functionality that is currently missing from OpenTelemetry.

There is currently OTEP-69 that proposes typed events and includes structures for recording errors.

Since it was authored, array-value attributes have been added to the specification. This addition makes it possible to record errors using the existing event type by representing a stack trace as an array of strings, where each entry corresponds to a stackframe.

This PR doesn't intend to compete with OTEP-69, but offers another approach to be considered. Hopefully, between this proposal and OTEP-69 we can begin to drive a discussion on how to record errors forward.

@mwear
Member Author

mwear commented Jan 27, 2020

cc: @kbrockhoff

| Attribute name | Type | Notes and examples | Required? |
| :------------- | :--------- | :----------------------------------------------------------- | :-------- |
| error.type | String | The type of the error. E.g. "Exception", "OSError" | Yes |
| error.message | String | The error message. E.g. `"Division by zero"`, `"Can't convert 'int' object to str implicitly"` | Yes |
| error.stack | Array\<String\> | A stack trace formatted as an array of strings. The stackframe at which the exception occurred should be the first entry. E.g. `["Exception in thread \"main\" java.lang.RuntimeException: Test exception", " at com.example.GenerateTrace.methodB(GenerateTrace.java:13)", " at com.example.GenerateTrace.methodA(GenerateTrace.java:9)", " at com.example.GenerateTrace.main(GenerateTrace.java:5)" ]` | No |
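For concreteness, here is a minimal sketch (not part of the PR; everything other than the three attribute keys is illustrative) of how a Java instrumentation might derive these attribute values from a caught Throwable:

import java.util.ArrayList;
import java.util.List;

final class ErrorAttributes {
    final String type;          // value for error.type
    final String message;       // value for error.message
    final List<String> stack;   // value for error.stack, innermost frame first

    ErrorAttributes(Throwable t) {
        this.type = t.getClass().getSimpleName();
        this.message = String.valueOf(t.getMessage());
        this.stack = new ArrayList<>();
        stack.add(t.toString());                 // header line, e.g. "java.lang.RuntimeException: Test exception"
        for (StackTraceElement frame : t.getStackTrace()) {
            stack.add("\tat " + frame);          // e.g. "\tat com.example.GenerateTrace.methodB(GenerateTrace.java:13)"
        }
    }
}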
@vmihailenco vmihailenco Jan 28, 2020

I like how simple this is, but it does not expose information about file:line etc. in a structured format for backends, which may use it for grouping and/or filtering. And given that most languages (at least JS, Java, Go, Python, Ruby & PHP) provide access to stack info in a structured, already-parsed way, we will end up with:

  • Languages/runtimes providing stack info in a structured way like {file: string, line: number, func: string}
  • OpenTelemetry requiring encoding that stack info as array of strings in language native human-readable format
  • Backends having to parse those strings to extract information they need

That is unneeded extra work both in OpenTelemetry and in backends. So my 2 cents are that error.stack should have some internal structure, like in this proposal. Perhaps by default it should be an array of JSON-encoded strings like this:

[
  "{\"file\": \"GenerateTrace.java\", \"line\": 13, \"func\": \"com.example.GenerateTrace.methodB\"}",
  "{\"file\": \"GenerateTrace.java\", \"line\": 9, \"func\": \"com.example.GenerateTrace.methodA\"}"
]

And as a fallback, if a string can't be parsed as JSON, it is treated as a string in the human-readable format you proposed.
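As an illustration of how cheap that could be to produce on the SDK side, here is a hedged sketch in Java (a purely hypothetical helper, not part of any proposal) that emits the suggested JSON-encoded frame strings directly from StackTraceElement, with no parsing step:

import java.util.ArrayList;
import java.util.List;

final class JsonFrames {
    // Encodes each frame as the {"file", "line", "func"} object suggested above.
    // A real implementation would also escape quotes/backslashes in the values.
    static List<String> encode(Throwable t) {
        List<String> frames = new ArrayList<>();
        for (StackTraceElement e : t.getStackTrace()) {
            frames.add(String.format(
                "{\"file\": \"%s\", \"line\": %d, \"func\": \"%s.%s\"}",
                e.getFileName(), e.getLineNumber(), e.getClassName(), e.getMethodName()));
        }
        return frames;
    }
}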

Member Author

Defining an internal structure makes sense if languages have a mechanism (other than string parsing) to access the data. If folks can chime in with what fields are readily available for stackframes, perhaps we can come up with a set of required and optional attributes to collect.

For Ruby we have Thread::Backtrace::Location. The line and file are readily available, but func is not.

Member

There are existing systems that solved this problem a long time ago. We don't need to reinvent the wheel.

E.g. https://docs.sentry.io/development/sdk-dev/event-payloads/stacktrace/

Member Author

I suppose that I have some reservations about the cost and complexity of collecting data at such a granular level. Most (if not all) of the relevant information is available in a stack trace formatted as an array of strings (or a newline-delimited one). If we added the language as an attribute, backends would have the information they need to parse the relevant data, and it would spare OpenTelemetry users from having to gather it. While this does increase the complexity for the backend, the trade-off is simplicity and improved performance for OpenTelemetry users.

If we do go the route of more granular data, I think we should have actual structures for collecting it, and possibly separate APIs. That might make OTEP-69 a better fit.

@vmihailenco vmihailenco Jan 29, 2020

@mwear It would be helpful if you described the complexity (ideally write some code), even for one language of your choice like Ruby. I insist that, practically, for most languages there is no significant difference between generating `at com.example.GenerateTrace.methodA(GenerateTrace.java:9)` and `{file: "com.example.GenerateTrace.methodA(GenerateTrace.java", line: 9}`. Or the difference will be the same as between sprintf and json_encode. And it is not that hard to use sprintf to generate JSON if you are worried about micro-optimizations. Overall, the requirement to use JSON is imposed by the OpenTelemetry spec; it is not specific to this use case. IMHO OpenTelemetry could be more flexible and let exporters decide what to do in this case instead of fitting data structures to protobuf usage. E.g. JSON or MsgPack handles it without any problems.

@yurishkuro all those error reporting services reinvent their own wheels to solve exactly the same task. And Sentry is not a particularly good example - it is a very old wheel with some legacy.

I would start with something very simple like:

  • file - required
  • line - required
  • func - required
  • column - optional for JS

It is self-documented and covers 99% of use cases. I intentionally omit:

  • language-specific things like module, class, namespace, etc. - they can be stored in the func attribute and are not interesting enough to justify complicating things
  • frame vars, a.k.a. context-locals - AFAIK only supported by Python & Ruby, and they double/triple the stack size
  • iOS and symbolication in general - too complex and too specific
  • in_app, abs_path, raw_function - specific to Sentry
  • source_version - it is not clear why every frame needs this information or how to obtain it


@vmihailenco - I think that stack frame addressing may be a bit more varied than that. In Java you need both a class and a function to fully qualify a function unit, and you won't have a column. In some cases a frame will not have a file or line number; I believe that an exception thrown by a system framework in .NET would be an example of the latter.

@vmihailenco vmihailenco Feb 20, 2020

In Java I see StackTraceElement that has

  • getMethodName
  • getClassName that can be used together with ClassUtil.getFilepath to obtain filepath
  • getLineNumber

That information can be nicely fitted into the {func, file, line} structure I suggested.

In some cases a frame will not have a file or number

Ack, but that is not a very popular or useful case; debugging becomes very complicated without such basic info. But we could omit missing fields and use only {func} when file:line is not available.


That was actually just the first case I thought of, and that case is indeed useful: you may not go to the source to view the location, but a system framework frame is as useful for troubleshooting as an application source location. I believe a C/C++ stack trace won't have line numbers when debugging symbols have been stripped, either. I think that, to @yurishkuro's point, having a "simple" tuple is liable to back us into a corner, and there is some prior work that could inform a more extensible approach.


Well, the idea that there is some prior work with good ideas has been repeated several times already; what are those ideas, and how do they solve the C++-without-debug-symbols use case?

Member

@Oberon00 Oberon00 Feb 21, 2020

AFAIK, in C++ you often have at least unwind information for exception handling, so you can get the stack in the form of a list of return addresses (e.g. using libunwind). Along with each return address you record the base address of the module it points into, so you get something like 0xfff12345 (libfoobar.so+0xabc12). At the backend, you may have (detached) debugging symbols for libfoobar.so available, so you can now combine that information and symbolize the module+offset pair to get something like 0xfff12345 (libfoobar.so!MyClass::myFunction(int,long)+0xa1, MyClass.cpp:356).

Member

You say you needed arrays for these conventions, but I can't see any advantages of the array over a newline-separated string.

Member Author

I was (incorrectly) under the assumption that there were length limits for attribute values and that a collection would be treated differently, in that the length limits would apply to each entry rather than the value as a whole.

While it's true that OpenTelemetry doesn't specify length limits, backends and alternate SDK implementations might choose to do so. Having a stack trace stored as an array might make it easier to avoid truncation in this situation, so I still think the idea has some merit. Aside from that, I think a newline-delimited string and an array of strings are roughly equivalent, and transforming between the two representations is fairly trivial.
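For example (Java, purely illustrative), going between the two representations is a one-liner each way:

import java.util.Arrays;
import java.util.List;

public class StackFormats {
    public static void main(String[] args) {
        // Array-of-strings form, as in the error.stack example above.
        List<String> frames = Arrays.asList(
            "java.lang.RuntimeException: Test exception",
            "\tat com.example.GenerateTrace.methodB(GenerateTrace.java:13)");

        // Newline-delimited form, and back again.
        String joined = String.join("\n", frames);
        List<String> roundTripped = Arrays.asList(joined.split("\n"));

        System.out.println(roundTripped.equals(frames));   // prints: true
    }
}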

Member

A couple of attributes that will be important, and which are covered in open-telemetry/oteps#69, are a unique hash for the particular exception and another unique hash for the exception type and the line in the code where it was thrown. This is important for deduplication and aggregation.

Also, I still think it is important to support more kinds of errors besides application exceptions, for example core dumps from IoT devices, which would get sent after the device reboots.

Member

There also needs to be an attribute which contains specific input parameter values. This is supported by several backend systems. It helps in debugging exceptions caused by unexpected data inputs.

Member

@kbrockhoff could you explain why the hashes can't be computed in the backend?

Member

In some cases, the only way to effectively calculate the identity of an exception object is its address in memory. The only alternative I can think of that would work on the backend would be recording the file and line number of the entire nested exception chain, plus a full list of local variable values from the stack.

Member

Address of what in memory? That sounds deployment-dependent. At Uber we have an internal system that performs deduping of exceptions, all in the backend, by hashing the stacktrace.

Member

Hashing of the stacktrace works fine for identifying multiple triggers of the same exception/event, but does not help in identifying multiple reports of the same invocation or event; for example, if auto-instrumentation of a method call records the exception, which is then also recorded by tracing instrumentation higher up in the call stack.


Hashing of the stacktrace could be important when information is dropped: for example, a Java stacktrace with nested causes can run into the hundreds of lines, so it's not uncommon to limit the data to the nearest so many frames and drop the rest. Retaining a hash of the full exception would allow determining whether two now-identical stack traces started out that way.

On the other hand, creating a hash of the exception is going to require serializing the whole exception in order to hash it, which might not be needed if you are only going to send a few frames.
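To make that trade-off concrete, a sketch (illustrative only, plain JDK) of hashing the full trace, including causes, before any frames are dropped:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StackHash {
    // Hashes every frame of the full exception chain, so that two traces that are
    // identical after truncation can still be told apart by their hashes.
    // (A production version would also guard against cyclic cause chains.)
    static String sha256(Throwable t) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        for (Throwable cause = t; cause != null; cause = cause.getCause()) {
            for (StackTraceElement frame : cause.getStackTrace()) {
                digest.update(frame.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}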


A string representation, whether it's an array of strings or newline-separated, does encode data in a way that can be troublesome to parse later on. In most languages I'm familiar with, the language offers access to the stack at the time of exception handling; it would seem that putting the stack frames into a normalized form would make sense at that point, rather than requiring a language-dependent parse stage on the backend.

@@ -0,0 +1,19 @@
# Semantic conventions for errors

This document defines semantic conventions for recording errors.
Member

Reading the semantic conventions here (especially error.type), it seems that these conventions are specifically geared towards recording thrown exceptions rather than any kind of error (cf. #306 which, despite the title, contains quite some interesting discussion of the issue).

@vmihailenco

vmihailenco commented Jan 29, 2020

Here is my attempt at extending this PR to support logs. I wonder whether this is a step in the right direction or not...

@mwear
Member Author

mwear commented Jan 30, 2020

@vmihailenco I would encourage you to open a PR for your attempt as a third option to consider. I think between this, your approach, and OTEP-69 we have reasonable options to choose from. We may end up with a combination of approaches, but having these here to drive the conversation forward is useful.

@@ -16,4 +16,5 @@ their types.
| :------------- | :--------- | :----------------------------------------------------------- | :-------- |
| error.type | String | The type of the error (its fully-qualified class name, if applicable). E.g. "java.net.ConnectException", "OSError" | Yes |
| error.message | String | The error message. E.g. `"Division by zero"`, `"Can't convert 'int' object to str implicitly"` | Yes |
| error.language | String | The language from which the error was generated. E.g. "Java", "Python", "Ruby". | Yes |
Member

We already have recommendations for how to spell language names in the confusingly named resource semantic conventions for the telemetry library: https://github.com/open-telemetry/opentelemetry-specification/blob/3bbed37ba65df027721a6a3ab62f62dfc7fcf467/specification/data-resource-semantic-conventions.md#library

Member

Language should be captured at the SDK level. Do we need it repeated on errors?

Member

This is all SDK. Do you mean resource level?

Member

Yes, I mean resource that identifies the SDK.

Member Author

Good point @yurishkuro. I think we can assume that the backend can get this information from the resource identifying the application. We shouldn't have to repeat it on every error. I removed error.language for now. If it turns out that we need it, I can add it back in later.

Member

There are cases like Jython, or C# calling into native code...

Member

In fact, Visual Studio shows the language for each frame in a Stack trace, so you can get as complicated as you want here.

their types.

| Attribute name | Type | Notes and examples | Required? |
| :------------- | :--------- | :----------------------------------------------------------- | :-------- |
@vikrambe vikrambe Feb 6, 2020

@mwear thanks for tagging this PR in open-telemetry/oteps#86.
The proposed fields are already part of the OpenTracing specification. Please refer to https://github.com/opentracing/specification/blob/master/semantic_conventions.md#log-fields-table

I like the discussion on error.stack. We should also add the tag below, from https://github.com/opentracing/specification/blob/master/semantic_conventions.md#span-tags-table:

| error | bool | true if and only if the application considers the operation represented by the Span to have failed |

Member

If this were added, I'd rather call it fatal_error, unrecoverable, or similar. An error has still occurred even if it was possible to recover from it by retrying or using another fallback.

Contributor

@tigrannajaryan suggested that we use log.severity to indicate how severe the exception is. I would prefer log.severity in this case.

Member

Thanks @jmacd. Small correction: the log data model uses SeverityNumber. It would be useful to keep the name consistent here too (and I am not sure what casing we want to use here; the log data model does not mandate a particular casing).

@mwear
Member Author

mwear commented Feb 11, 2020

I went ahead and wrote some Ruby code for this proposal and #432 to see how they compare.

Example for #427

begin
  buggy_method     # raises an exception when called
rescue => e
  span.add_event(name: 'error',
                 attributes: {
                   'error.type' => e.class,
                   'error.message' => e.message,
                   'error.stack' => e.backtrace
                 })
end

Example for #432

begin
  buggy_method     # raises an exception when called
rescue => e
  span.add_event(name: 'error',
                 attributes: {
                   'log.type' => e.class,
                   'log.severity' => 'error',
                   'log.message' => e.message,
                   'frame.file' => e.backtrace_locations[0].absolute_path,
                   'frame.line' => e.backtrace_locations[0].lineno,
                   'frame.func' => e.backtrace_locations[0].label,
                   'frame.stack' => stack_to_json(e)
                 })
end

def stack_to_json(e)
  JSON.dump(
    e.backtrace_locations.map do |loc|
      { 'frame.file' => loc.absolute_path, 'frame.line' => loc.lineno, 'frame.func' => loc.label }
    end
  )
end

#427 has merit in its simplicity. It does not cover logs as #432 does, although it could be abstracted for that purpose.

I like that #432 works generically for logs, with an error being a specific type of log. There is more work to transform the backtrace into JSON. I also noticed that frame.file, frame.line and frame.func duplicate the first entry of frame.stack in the case of an exception. They are listed as optional, so perhaps they should be omitted when also recording frame.stack.

Both of these approaches seem viable and I'd be 👍 on either. It'd be interesting to see how this plays out in other languages.

@jmacd
Contributor

jmacd commented Jun 26, 2020

I want to add that I like the schema in https://cloud.google.com/trace/docs/reference/v2/rest/v2/StackTrace that you pointed out. If we're going with a structured representation, that looks good.

The 50-line version (Cloud Trace Java) does not include causes or limit stack depth. The 150-line version (X-Ray) does include causes and limit stack depth. Which do you prefer? I think I would prefer to include causes; sometimes I believe they're necessary to categorize an error stack. We should compare these two functions with the builtin Throwable.printStackTrace(), and see what it does and why. I also believe the Java printed stacktrace takes care to elide common stack prefixes of causal stacks, so even the 150-line code example isn't our best option.

The limit on stack depth is pretty important, as a complete stack in a common Java framework can easily add tens of KB of annotation if we are not careful.

I looked up the Go implementation of runtime.Stack(). Here it is:

// Stack formats a stack trace of the calling goroutine into buf
// and returns the number of bytes written to buf.
// If all is true, Stack formats stack traces of all other goroutines
// into buf after the trace for the current goroutine.
func Stack(buf []byte, all bool) int {
	if all {
		stopTheWorld("stack trace")
	}

	n := 0
	if len(buf) > 0 {
		gp := getg()
		sp := getcallersp()
		pc := getcallerpc()
		systemstack(func() {
			g0 := getg()
			// Force traceback=1 to override GOTRACEBACK setting,
			// so that Stack's results are consistent.
			// GOTRACEBACK is only about crash dumps.
			g0.m.traceback = 1
			g0.writebuf = buf[0:0:len(buf)]
			goroutineheader(gp)
			traceback(pc, sp, 0, gp)
			if all {
				tracebackothers(gp)
			}
			g0.m.traceback = 0
			n = len(g0.writebuf)
			g0.writebuf = nil
		})
	}

	if all {
		startTheWorld()
	}
	return n
}

That's optimized in a way not available to the user. I'd like to leave this task to the runtime.

I'm still worried for the user, regardless of the implementation. Including a complete stack can be dangerous if, for some reason, suddenly every request encounters an error. Will the SDK need to implement limits on error stack rate? (I am reminded of https://www.sigops.org/s/conferences/sosp/2009/papers/glerum-sosp09.pdf)

@anuraaga
Contributor

The 50-line version (Cloud Trace Java) does not include causes or limit stack depth. The 150-line version (X-Ray) does include causes and limit stack depth. Which do you prefer?

Yeah would prefer to have causes (with a limit on the entire stack payload). That being said, it's true the difference in overhead for having cause vs not is pretty huge, so leaving it out at least in a first version that only had the handled exception would be pretty good bang for buck I think.

We should compare these two functions with the builtin Throwable.printStackTrace(), see what it does and why.

Yup, the complexity of that seems to indeed be trimming out common frames.

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Throwable.java#L705

If we include causes, this would be a nice optimization to copy into any description of the stack trace in the model, but maybe not in an initial version. Luckily, once it's implemented for a language it should be stable for a long time while they implement the myriad of framework instrumentations :)
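To make the trimming concrete, here is a sketch of the same idea (illustrative; not the OpenJDK code): drop the frames a cause shares with its enclosing trace and replace them with a "... N more" marker:

import java.util.ArrayList;
import java.util.List;

final class CauseTrimming {
    // Returns the cause's frames with the common suffix it shares with the
    // enclosing trace replaced by a "... N more" marker, as printStackTrace does.
    static List<String> trimmedCauseFrames(StackTraceElement[] enclosing, StackTraceElement[] cause) {
        int m = cause.length - 1;
        int n = enclosing.length - 1;
        while (m >= 0 && n >= 0 && cause[m].equals(enclosing[n])) {
            m--;
            n--;
        }
        int common = cause.length - 1 - m;

        List<String> out = new ArrayList<>();
        for (int i = 0; i <= m; i++) {
            out.add("\tat " + cause[i]);
        }
        if (common > 0) {
            out.add("\t... " + common + " more");
        }
        return out;
    }
}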

I looked up the Go implementation of runtime.Stack(). Here it is:

API-wise, Go's Callers API seems pretty clean.

https://golang.org/pkg/runtime/#CallersFrames

We'd need to run a benchmark to compare the performance of that vs runtime.Stack to see how much we lose by working on the objects instead of the string. I'm sure it's a lot, but need to see if it's so significant that we'd instead pick String as the API for accessing stack traces. Maybe it is.

I'm still worried for the user, regardless of the implementation. Including a complete stack can be dangerous if, for some reason, suddenly every request encounters an error.

Yeah, this is definitely an issue - to clarify, I guess this is more about including a stack trace in any form, structured or not, though structured would probably have a larger impact. I agree this should either be opt-out or even opt-in. I suspect sites where this could really be an issue, due to serving huge traffic, will have sophisticated setups like log correlation in place, and the importance of the stack trace in the span goes down. The sites that only look at tracing independently, though, can get a lot out of seeing the stack trace. Ideally the number of sites not using correlation goes down over time, but I think we have a long way to go for that.

@trask
Member

trask commented Jun 26, 2020

I would not want to lose the cause chain.

Adding to the X-Ray and GCP examples from above, here's an example from Glowroot that has both common-frame trimming (which really helps reduce payload) and an overall frame limit:

https://github.com/glowroot/glowroot/blob/master/agent/core/src/main/java/org/glowroot/agent/model/ErrorMessage.java

@jmacd
Contributor

jmacd commented Jun 26, 2020

I'm still trying to compare the CPU and memory allocation cost of recording structured stacks as compared to using a string encoding. The contention was that coding this structure by allocating a bunch of objects and then calling a protobuf encoder adds a lot of cost compared with sending a string form.

This shifts the complexity and risk to the server side, and it relies on a stable format across time. OpenJDK's printStackTrace writes:

The format of this information depends on the implementation, but the following example may be regarded as typical.

and it's followed by a lengthy explanation of how to parse the output. I believe letting the server parse the stack is a better arrangement from a total operational cost standpoint. I wonder if -XX:MaxJavaStackTraceDepth is sufficient as a way to limit the number of frames printed, as opposed to requiring a custom implementation.

I looked at Throwable.printStackTrace() and see one safety issue ("Guard against malicious overrides of Throwable.equals") that is not captured in any of the three examples under examination (Google Cloud Trace, X-Ray, Glowroot).

I would put safety above cost in this discussion.

@anuraaga
Contributor

anuraaga commented Jun 27, 2020

I agree safety is the most important - it's the main reason I want to use structs :) Even printStackTrace has the caveat that "The format of this information depends on the implementation", and there has indeed been a significant format change with Java 7. Having parsing fail because a backend didn't catch up with a Java or Android version seems fragile, with a higher chance of failure; admittedly it's not something that happens often, but there's no guarantee of safety here. This is just one programming language; others may take even less care with stack trace strings, but the native APIs of the languages don't have this issue.

I appreciate the concern about the missing checks / features in our reference implementations. But I don't think that's a reason to assume we can't get it right - we're doing a good job of learning from prior art and going through the details. Serializing a stack isn't such a complex issue that this is untenable, and once we have it down right it should be forwards compatible through any changes in the runtime, I don't see a big reason to expect our code to be less safe than a runtime.

I'm still trying to compare the CPU and memory allocation cost of recording structured stacks as compared to using a string encoding.

I spiked a change to marshal exceptions into the proto and compare with string.

https://github.com/anuraaga/opentelemetry-java/blob/exception-serialize/exporters/otlp/src/jmh/java/io/opentelemetry/trace/ExceptionDescriptionBenchmark.java
https://gist.github.com/anuraaga/4e6f09408392264fedbfc13c98f2d3f8
(note the proto change is in the gist itself in case anyone wants to compile themselves, since it'd be tedious to propagate that through submodules in a way that can be pushed)

As with all benchmarks, there's always a chance it's missing something; happy to adjust with any feedback. FWIW, I tried a default StringWriter before and tried changing to a presized StringBuilder wrapped with a writer, but didn't see any real difference.

From the numbers, it seems that proto is a little faster, maybe due to the reduced amount of whitespace and words like "at". But even with 150 stack frames, both are <50µs; I don't know what people's thresholds are, but this seems low enough to me not to worry too much about. Of course, this is only Java, but I figure other languages like Go would be at least as fast.

One thing I noticed with MaxJavaStackTraceDepth is that it affects population of the stack trace, not printing, so the flag affects both approaches the same. If we wanted to use printStackTrace with our own limit on depth, I can't think of a way other than a line-limited PrintWriter, which still has to iterate the entire stack either way.
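For what it's worth, a sketch of that line-limited PrintWriter idea (illustrative only; as noted, the runtime still walks the whole stack, we just stop recording lines):

import java.io.PrintWriter;
import java.io.StringWriter;

final class LineLimitedWriter extends PrintWriter {
    private final int maxLines;
    private int lines;

    LineLimitedWriter(StringWriter out, int maxLines) {
        super(out);
        this.maxLines = maxLines;
    }

    @Override
    public void println(Object x) {
        if (lines++ < maxLines) super.println(x);
    }

    @Override
    public void println(String x) {
        if (lines++ < maxLines) super.println(x);
    }
}

// Usage (illustrative):
//   StringWriter sw = new StringWriter();
//   throwable.printStackTrace(new LineLimitedWriter(sw, 50));
//   String truncated = sw.toString();   // at most the first 50 printed lines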

Let me know if any other info can help here (though I can't commit to writing similar code in other languages).

@jmacd
Contributor

jmacd commented Jun 29, 2020

We're thinking about "safety" differently.

I agree safety is the most important - it's the main reason I want to use structs :)

To me, safety means not crashing or deadlocking the caller. I think your concern is more about "reliability". The safety issue in this case is that a malicious Throwable would cause an infinite loop or a stack overflow, and one of the reasons I prefer to use a builtin stack-to-string method is that we should expect the language runtime to be safer. We were looking at OpenJDK code, but another implementation of Throwable.printStackTrace() might incorporate implementation-specific safety features as well.

There are many risks in calling out to a protocol-buffer implementation, but we're already factoring some of this risk in when we export data. Still, adding a bunch of memory allocations creates risk of OOMing and otherwise destabilizing the application. Builtin stack-to-string methods generally are optimized in ways that the user does not have access to. For example, in Java Throwable.printStackTrace() does not need to clone the stack trace, whereas a user-reported stack does have to clone the stack trace. The builtin method is predicted to use less memory than our library would.

I sense that a lot of demand for this "feature" is essentially coming from vendors who are trying to simplify their services (I suspect there is a tiny amount of cost-shifting here too: why pay for something if you can ask your users to pay for it?)

Lastly, let's ask: is reliability a real problem when parsing a stack from the builtin stack-to-string function? I will argue that this concern is over-inflated. The stack trace format of most language runtimes is stable over time, precisely because it is meant to be parsed and language implementors know that users are already parsing these stacks. Java 7 changed its stacktrace format in a backward-compatible way when it introduced Suppressed exceptions.

It is a significant amount of work to implement these parsers, but I do not expect a great deal of ongoing maintenance will be needed. This functionality will be owned by the community and contributed to the opentelemetry collector. When problems are discovered, the parsers will be improved. Vendors and/or users can operate the otel-collector as a service for stacktrace parsing. I believe the total cost of maintaining stack parsers will be lower than the total cost of maintaining structured stack exporters in every language.

Note that extracting and parsing stacks from unstructured text is a more difficult task than parsing a stack from a structured field (where length is delimited for you), but this is not what we're talking about.

Note that users who are operating custom runtimes will be forced to output stacks in a well-known text format when annotating for OTel, given my proposal. I would recommend they re-use one of the supported formats.

Lastly, I can see that I'm arguing an uphill battle here. I would be willing to compromise with both solutions. Let users send stacks as strings, which gives us the safer, more-performant option. Let us also specify the structure of a stack and allow libraries to implement the structured representation too. Then we'll let users choose: the less-risky less-expensive option that does require a collector? Or, the more-risky more-expensive option that does not require a collector?

@anuraaga
Contributor

This is the only thing I'll respond to here; the rest seems like mostly rehash, with an inappropriate cost-shifting allegation.

Java 7 changed their stacktrace format in a backward compatible way when they introduced Suppressed exceptions.

In a text format, changing the way the text is output, with different text and whitespace, is a backwards-incompatible change. A parser may do fine with it, or may not, and unfortunately that's generally only known after the change has been made.

Let users send stacks as strings. Let us also specify the structure of a stack and allow libraries to implement the structured representation too.

This seems fine to me. There was some interest in this, with concern about CSV; I would still argue the structured version should be proto, like the rest of our structure.

I think we've added our input as well as we can. Committee, can we get some steering? @carlosalberto @bogdandrutu @yurishkuro @SergeyKanzhelev

@jmacd
Contributor

jmacd commented Jun 30, 2020

I don't see the point about cost shifting as inappropriate. One of the major techniques in reducing the cost of instrumentation is to defer work. It's cheaper to defer stacktrace parsing until it's needed. Cost and safety are both important, and I still think the concern about reliability is over-inflated.

When stacktrace parsing fails, the result is still a human-readable stacktrace in its natural representation. The user is still able to read their data; it's only an automated error monitoring system that will fail at the point stacktraces do not parse. System operators would of course be advised to test their error monitoring systems before deploying a major new language runtime release. I expect stacktrace formats to evolve, just not quickly with respect to OTel collector releases. When problems arise they'll be fixed long before responsible system operators are deploying those language runtimes in production, I would assert.

About the benchmarks:

I am not a Java or JMH expert, but I want to be sure that you are not microbenchmarking only the CPU cost and that your benchmark factors in GC. I believe you can use the THROUGHPUT BenchmarkMode for this?

The printStackTrace variation is testing the builtin method, which does eliminate common stack frames from causes. The structured variation is not eliminating common stack frames. I suspect some of the difference in performance is related to this cost.

@trask
Member

trask commented Jun 30, 2020

I want to be sure that you are not microbenchmarking only the CPU cost and that your benchmark factors in GC

I would assume structs are going to be way cheaper memory-wise, because they get to just point to all of the existing JVM-provided string constants that represent class/method names, and don't have to concatenate them all into a separate, very large, string.

I can review the benchmark if that would help the discussion.

@anuraaga
Contributor

anuraaga commented Jun 30, 2020

When stacktrace parsing fails, the result is still a human-readable stacktrace in its natural representation.

Ah didn't realize you were treating the free-form output and automated monitoring / code jump type features separately. Agree that there aren't really any compatibility concerns for the free-form output to a user.

One thing to keep in mind is a collector will generally be infrastructure, probably managed by a different team than business logic. If a team wants to try out a new language that isn't even supported by their infrastructure, it's nice that this would still work as best as it can without requiring the infrastructure team to deploy an updated collector, it somewhat leaves the destiny of the language in the hands of the language itself, not the infrastructure.

I am not a Java or JMH expert, but I want to be sure that you are not microbenchmarking only the CPU cost and that your benchmark factors in GC. I believe you can use the THROUGHPUT BenchmarkMode for this?

Thanks, happy to tweak as needed. AFAIK, GC is being reflected (using something like forceGC to force collection between runs would reduce its impact); my understanding is that average time and throughput are basically just inverse numbers, but indeed throughput might give a clearer idea for operations where we care about QPS. I added throughput numbers too; they indeed seem to follow the same trends as the average times.

Looking at the GC stats, there seems to be some small difference but over different runs I think I've seen both versions take more GC time than the other and it seems like noise.

The printStackTrace variation is testing the builtin method, which does eliminate common stack frames from causes. The structured variation is not eliminating common stack frames. I suspect some of the difference in performance is related to this cost.

I should have linked to the code too, but it does at least intend to eliminate common frames; I didn't see anything out of the ordinary when comparing the output of the main function while playing with it.

https://github.com/anuraaga/opentelemetry-java/blob/exception-serialize/api/src/main/java/io/opentelemetry/trace/StackTrace.java#L25
https://github.com/anuraaga/opentelemetry-java/blob/exception-serialize/api/src/main/java/io/opentelemetry/trace/ExceptionDescription.java#L12

@yurishkuro
Member

I suggest supporting both text and structured formats. There may be languages where structured access to stack frames is not available and text is the only option. When structured access is available, I'd be surprised if it's not possible to write serialization into a structured format (in a pre-allocated buffer) that is just as efficient as the runtime's toString. Also, efficiency is an interesting topic given that we're talking about thrown exceptions; one could argue that it's not necessary to optimize the error conditions.

FWIW, the safety argument does not sound very convincing to me. There is plenty of prior art in libraries that capture very rich stack traces (e.g. https://github.com/getsentry/sentry#official-sentry-sdks). I don't think it's as hard a problem as @jmacd is implying.

I am also very interested in having the ability to represent application state (like the local vars) for each frame. Plain text stack trace output won't be able to represent them, but structured format can be further extended, if Frame is its own struct like in https://cloud.google.com/trace/docs/reference/v2/rest/v2/StackTrace or in Sentry Go.
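If Frame became its own struct, extending it with application state might look roughly like this (a sketch of the shape only, not a proposed OTLP message):

import java.util.List;
import java.util.Map;

// Roughly the Cloud Trace / Sentry shape mentioned above.
final class Frame {
    String function;              // fully qualified function or method name
    String file;                  // source file, if known
    int line;                     // line number, 0 if unknown
    Map<String, String> locals;   // stringified local variables, optional
}

final class Stack {
    List<Frame> frames;           // innermost frame first
    String language;              // lets the backend interpret function naming
}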

@jmacd
Contributor

jmacd commented Jun 30, 2020

Presumably the structured protocol would be a part of OTLP. I am concerned that the amount of work implied by supporting structured exceptions is large, and I would be much happier with this plan if I knew there were committed resources willing to add this support in the collector and all the languages. Since I don't think that commitment exists, can we agree to make a string-valued error.stack semantic convention for a GA release, and follow that with support for structured exceptions?

@yurishkuro points to the solution: a hand-coded implementation that encodes structured data can be as safe and fast as the built-in stack-to-string method. That's a lot of work (more than writing a parser, probably 😀 ), and this has been my concern all along, that we're getting in the way of a GA release.

@anuraaga
Contributor

Agree that we should get error.stack in for now and can add structure to the protocol separately - I'd probably stick with a single string instead of an array of strings.

I'm quite interested in the topic of structured exceptions and am interested in helping drive it through; my spike contains much of the code change that would be required in proto + Java, though it is by no means cleaned up yet. But I'm very hesitant to continue on this, as I don't see any reason it won't result in another wall of defensiveness. When even benchmarks can't, in the end, produce the thought "hey, maybe the performance issue isn't so clear after all" (it took no hand-coded encoding for proto to perform generally better than the string, and that's with buffer recycling for the string and not for the proto; that doesn't guarantee the benchmark is ideal, but it's saying something), I can't imagine working on this topic resulting in anything more than burnout... I'll just go ahead and wait on the sidelines :)

@jmacd
Contributor

jmacd commented Jun 30, 2020

@anuraaga I'm sorry that you see this as a wall of defensiveness. It appears to be just me. If we are not going to block GA on this feature request, I'm feeling less defensive, although I'm still not convinced that a structured representation improves life for the user (structured stacks are less readable, meaning the user needs more tooling to study their data).

I raised a number of points related to performance, because it seems like there is a common argument that a structured approach is going to be faster or at least not slower. I've pointed out that the Java printStackTrace() function has more safety than the version you tested, precisely because it builds a map to avoid a real pitfall, so I still haven't seen a convincing benchmark.

@trask
Member

trask commented Jun 30, 2020

I reviewed the benchmark.

The struct-based benchmark does have the same safety check as printStackTrace(), so I think it's a valid comparison.

And the benchmark does show the struct-based approach being faster (I ran it locally to validate).

[Note: the JMH results do not show GC time normalized per operation, so you need to do some division to compare those numbers.]

@jmacd
Contributor

jmacd commented Jun 30, 2020

Thank you @trask.

I don't think it's worth continuing to debate the benchmark. It's curious that the text version is allocating more memory than the structured version, but I'm not qualified to scrutinize them any further.

@trask
Member

trask commented Jun 30, 2020

It's curious that the text version is allocating more memory than the structured version

See my comment above:

I would assume structs are going to be way cheaper memory-wise, because they get to just point to all of the existing JVM-provided string constants that represent class/method names, and don't have to concatenate them all into a separate, very large, string.

@jmacd
Contributor

jmacd commented Jun 30, 2020

because they get to just point to all of the existing JVM-provided string constants

In the end, the structured form is copied into a byte array, and the text form is copied into a string. Each of the underlying strings should be copied once, so that doesn't sound like the explanation I'm looking for. Either you serialize the underlying strings (copying them once) with protobuf encoding bits in between, or you serialize the underlying strings (copying them once) with syntactic markers in between. Anyway, I shouldn't be discussing Java performance.

@trask
Member

trask commented Jun 30, 2020

In the end, the structured form is copied into a byte array, and the text form is copied into a string

and then the text form (string) still has to be copied (again) into a byte array (or streamed directly over the wire, but you could do that with the structured form also)

@jmacd
Contributor

jmacd commented Jul 1, 2020

copied (again) into a byte array

The point was that each string should be copied once in both cases. In the benchmark, the string representation is being copied twice, once into a StringBuilder buffer and once as a result of StringBuilder.toString(). Since the StringBuilder's buffer is a thread local, that doesn't explain the difference in rate of memory allocation shown in the JMH numbers as far as I can tell; I would expect one large buffer allocation per run of this benchmark in both cases (a toByteArray() or a toString()).

My guess is that the difference in memory allocation rate has something not obvious to do with Throwable.printStackTrace(). Writing a stack as a string is supposed to require fewer memory allocations, and if that's not happening, something else is wrong. I should not have waded this far into a discussion of Java performance. This contention began for me, because it sounded like a repeat of the debate we had in Golang, where I benchmarked a text format (OpenMetrics) vs a protobuf format (OpenCensus). The text format was significantly faster.

@Oberon00
Member

Oberon00 commented Jul 7, 2020

Is it OK to rename this PR to "Add semantic conventions for exceptions"? Errors are a more generic topic, and we have several PRs & issues with similar names.

@mwear
Member Author

mwear commented Jul 10, 2020

I created another, updated version of this proposal in #697, PTAL.

@reyang reyang added release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:miscellaneous For issues that don't match any other spec label labels Jul 10, 2020
@mwear
Member Author

mwear commented Jul 28, 2020

Closing. This work was addressed in #697.
