
Improve performance of the runtime state dump #399

Merged

merged 2 commits into master from ImproveCorePerformance on May 27, 2020

Conversation

ConnorMcMahon (Contributor)

Dumping the orchestration state for the telemetry event type
TaskOrchestrationDispatcher-ExecuteUserOrchestration-Begin is expensive. The
cost scales linearly with the number of history events, so orchestrations with
large histories end up spending most of their time writing this verbose
telemetry item, which is not ideal.

Instead, we now just log the number of events, which is a constant-time
operation.
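
For illustration, a minimal sketch of what the constant-time trace looks like; the member names (runtimeState.Events, TraceHelper.TraceInstance) mirror the snippet quoted later in this thread and are assumptions rather than the exact merged diff.

// Sketch only: log the history event count instead of serializing the full
// runtime state dump, so the cost no longer grows with history size.
TraceHelper.TraceInstance(
    TraceEventType.Verbose,
    "TaskOrchestrationDispatcher-ExecuteUserOrchestration-Begin",
    runtimeState.OrchestrationInstance,
    "Executing user orchestration with {0} history events",
    runtimeState.Events.Count);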

@cgillum (Member) commented May 26, 2020

Wow, I have known about this verbose tracing for a while but had not taken the next step to see whether it might be impactful! Thanks for finding and making this fix.

To avoid impacting anyone who might depend on this behavior, I suggest we employ conditional compilation: for #if DEBUG builds, let's keep the behavior as-is; otherwise, we can use your new trace. Does that sound like a good idea?
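
In skeleton form, that suggestion is roughly the following; the two helper names are hypothetical placeholders for illustration, not methods that exist in the repository.

#if DEBUG
    // Debug builds: keep the existing full runtime state dump (linear in history size).
    LogFullRuntimeStateDump(runtimeState);   // hypothetical helper
#else
    // Release builds: emit only the history event count (constant time).
    LogHistoryEventCount(runtimeState);      // hypothetical helper
#endif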

@sebastianburckhardt (Collaborator)

Should the verbose tracing for DurableTask.Core even be enabled by default? I am not sure we need it. If it is not enabled by default, this whole problem has an easy solution: check the verbosity level before doing the expensive formatting, e.g. as in:

if (TraceHelper.IsVerboseEnabled)
{
    TraceHelper.TraceInstance(
        TraceEventType.Verbose,
        "TaskOrchestrationDispatcher-ExecuteUserOrchestration-Begin",
        runtimeState.OrchestrationInstance,
        "Executing user orchestration: {0}",
        DataConverter.Serialize(runtimeState.GetOrchestrationRuntimeStateDump(this.efficientTelemetry), true));
}

@ConnorMcMahon (Contributor, Author) commented May 26, 2020

@sebastianburckhardt,

Unfortunately, it looks like ETW doesn't give you very good data on whether the Event Source is enabled unless you have a specific listener enabled. See here. Since TraceHelper's Is<log-level>Enabled methods rely on the EventSource.IsEnabled() method, I'm not sure we can rely on it.

I think that as a quick workaround in the meantime, I will go with Chris's suggestion.
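
For context, a small self-contained sketch of the EventSource behavior being discussed; the event source name and event ID are made up for illustration.

using System.Diagnostics.Tracing;

// IsEnabled reflects what subscribed listeners or ETW sessions have requested;
// with no listener attached it returns false, so this producer-side guard is
// ultimately controlled by the consumer, which is the concern raised above.
if (DemoEventSource.Log.IsEnabled(EventLevel.Verbose, EventKeywords.All))
{
    DemoEventSource.Log.Verbose("expensive formatting happens only here");
}

[EventSource(Name = "Example-Demo")] // hypothetical source name
sealed class DemoEventSource : EventSource
{
    public static readonly DemoEventSource Log = new DemoEventSource();

    [Event(1, Level = EventLevel.Verbose)]
    public void Verbose(string message) => WriteEvent(1, message);
}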

@sebastianburckhardt (Collaborator)

As far as I understand, the general idea (for both EventSource and ILogger) is that the log level is determined by the subscribing consumers, not by the producer. I think we should try to follow this standard pattern where possible.

In this particular case I would suggest that for our DF hosted scenarios, we stop collecting verbose tracing from DurableTask.Core. I don't think it gets stored in the Kusto logs anyway, so I am not sure why we should generate those events.
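
As an aside, the standard producer-side pattern being referred to looks roughly like this with ILogger; the effective level is decided by the consumer's logging configuration, and the class and message template here are just examples.

using Microsoft.Extensions.Logging;

static class OrchestrationTraceExample
{
    public static void TraceStart(ILogger logger, int historyEventCount)
    {
        // The effective level comes from the consumer's logging configuration
        // (filters, appsettings, etc.), not from the code emitting the event.
        if (logger.IsEnabled(LogLevel.Trace))
        {
            // Expensive formatting runs only when some consumer asked for Trace.
            logger.LogTrace("Executing user orchestration with {EventCount} events", historyEventCount);
        }
    }
}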

@cgillum (Member) commented May 27, 2020

I believe you are correct that this behavior is determined by the consumer. The problem I think we're facing is that we do not control the consumer. Rather, the consumer is a piece of infrastructure that runs on the Azure VMs (we call it "MDS"). We explicitly ask it to listen to DurableTask-Core event traces so that we can capture specific event IDs, but I suspect this infrastructure is subscribing to all trace levels. We are able to have more controlled filters before pushing the data to Kusto, but I don't know if we have the ability to control the trace levels on the ETW consumer side. It's something we'd need to look into and I'm not confident that the MDS configuration allows us this level of control.

@sebastianburckhardt (Collaborator)

I see. In that case, perhaps the next best solution is to extend the TraceHelper so that a lower limit on the tracing level can be specified by a configuration parameter. This is what I did for the EventSourced backend. We don't have a nice way to pass configuration parameters right now, though; I was discussing that with @ConnorMcMahon last week.
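
A hypothetical shape for that kind of configurable floor on the trace level; every name here is made up for illustration and is not the EventSourced backend's actual implementation.

using System.Diagnostics;

// Hypothetical producer-side gate: the minimum level could be read from a
// configuration parameter at host startup once a way to pass one exists.
static class TraceLevelGate
{
    public static TraceEventType MinimumLevel { get; set; } = TraceEventType.Warning;

    // TraceEventType values are smaller for more severe levels (Critical = 1,
    // Verbose = 16), so "<=" keeps everything at or above the configured floor.
    public static bool ShouldTrace(TraceEventType level) => level <= MinimumLevel;
}

// Usage sketch: skip the expensive runtime state dump unless Verbose is allowed.
// if (TraceLevelGate.ShouldTrace(TraceEventType.Verbose)) { ... }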

@ConnorMcMahon ConnorMcMahon merged commit e996f92 into master May 27, 2020
@ConnorMcMahon ConnorMcMahon deleted the ImproveCorePerformance branch May 27, 2020 23:43
@oising commented May 31, 2020

@sebastianburckhardt Sorry to jump in here (hi @cgillum), but any chance of a private preview/alpha for Microsoft.Azure.DurableTask.EventSourced with durable entities? I'm trying to make a critical path decision over moving to Orleans or not in the next 6 months... :P

@oising commented Jun 5, 2020

Switched to Orleans.

shankarsama pushed a commit to shankarsama/durabletask that referenced this pull request Apr 27, 2021
Dumping the orchestration state for the telemetry event type
TaskOrchestrationDispatcher-ExecuteUserOrchestration-Begin is expensive. The
cost scales linearly with the number of history events, so orchestrations with
large histories end up spending most of their time writing this verbose
telemetry item, which is not ideal.

Instead, we now just log the number of events for release bits, which is a
constant-time operation.