-
Notifications
You must be signed in to change notification settings - Fork 151
Add modified WCF CallTarget instrumentation via opt-in environment variable #1992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add a new Custom binding that will throw an exception when accessing the RequestMessage, similar to what we've seen in production - Test all 4 bindings in the integration tests - Update WcfTests to use snapshot testing - Create test cases for the new WCF tracing mode, enabled by setting DD_TRACE_WCF_ENABLE_NEW_INSTRUMENTATION=true
…CF integration is disabled
…gumentsToLoad and TargetMethodArgumentsToLoadLength properties. This allows the OnBegin method to use a subset of the instrumented method's parameters, allowing us to instrument methods with ref arguments
…ENTATION setting, disable original WCF instrumentation when DD_TRACE_WCF_ENABLE_NEW_INSTRUMENTATION=true, and update tests/snapshots
…Invoker and update snapshots
…Invoker and update snapshots
…dInvoker and update snapshots. IMPORTANT: Does not work with TcpBinding; OnMethodEnd for the InvokeEnd operation does not detect ambient Scope
This comment has been minimized.
This comment has been minimized.
andrewlock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! That looked like "fun" 😅 Minor comments, mostly for my understanding
tracer/test/Datadog.Trace.ClrProfiler.IntegrationTests/WcfTests.cs
Outdated
Show resolved
Hide resolved
...nding=BasicHttpBinding_enableCallTarget=False_enableNewWcfInstrumentation=False.verified.txt
Show resolved
Hide resolved
|
|
||
| auto numArguments = (int) methodArguments.size(); | ||
| auto numArgsToLoad = argsToLoad.size(); | ||
| bool useArgToLoad = numArgsToLoad > 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unlikely, but would we ever want to load 0 arguments (i.e. just access the instance)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible, so I added another field to allow us to disambiguate between 0 arguments and NOT use the feature in f9b0cf0
| auto numArguments = (int) methodArguments.size(); | ||
| auto numArgsToLoad = argsToLoad.size(); | ||
| bool useArgToLoad = numArgsToLoad > 0; | ||
| auto numArguments = useArgToLoad ? (int) numArgsToLoad : (int) methodArguments.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, just to clarify, changing this doesn't mean we accidentally select the wrong overload, it just means we don't pass those arguments to our instrumentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right in this part of the code we are already rewriting the method, not selecting it.
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/SyncMethodInvokerIntegration.cs
Outdated
Show resolved
Hide resolved
| var operationContextType = Type.GetType("System.ServiceModel.OperationContext, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089", throwOnError: false); | ||
| if (operationContextType is not null) | ||
| { | ||
| var property = operationContextType.GetProperty("Current", BindingFlags.Public | BindingFlags.Static); | ||
| var method = property.GetGetMethod(); | ||
| _getCurrentOperationContext = (Func<object>)method.CreateDelegate(typeof(Func<object>)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is the same (I think) in all the new integrations, is it worth moving to a helper, and using a generic helper with <TTarget> to cache it between all the instances)? We could also avoid the reflection entirely if the integration or the feature flag is disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha @andrewlock also notice it... I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I got too excited about posting the PR that I forgot to consolidate the methods. Done in c5987de
...atadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/AsyncMethodInvoker_InvokeEnd_Integration.cs
Show resolved
Hide resolved
|
Can we keep the |
tonyredondo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, just a couple of comments.
|
|
||
| HRESULT WriteBeginMethod(void* rewriterWrapperPtr, mdTypeRef integrationTypeRef, const TypeInfo* currentType, | ||
| std::vector<FunctionMethodArgument>& methodArguments, ILInstr** instruction); | ||
| std::vector<FunctionMethodArgument>& methodArguments, std::vector<USHORT>& argsToLoad, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe the new vector can be passed as a const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 75f06c9
| MethodReplacement( | ||
| {}, | ||
| MethodReference(targetAssembly, targetType, targetMethod, EmptyWStr, minVersion, maxVersion, | ||
| {}, signatureTypes), | ||
| {}, signatureTypes, targetMethodArgumentsToLoad), | ||
| MethodReference(wrapperAssembly, wrapperType, EmptyWStr, calltarget_modification_action, {}, {}, {}, | ||
| {}))); | ||
| {}, {}))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm checking this, porting this fix to release/2.0 will be a rewrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...I can do the rewrite for 2.0
| static AsyncMethodInvoker_InvokeEnd_Integration() | ||
| { | ||
| var operationContextType = Type.GetType("System.ServiceModel.OperationContext, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089", throwOnError: false); | ||
| if (operationContextType is not null) | ||
| { | ||
| var property = operationContextType.GetProperty("Current", BindingFlags.Public | BindingFlags.Static); | ||
| var method = property.GetGetMethod(); | ||
| _getCurrentOperationContext = (Func<object>)method.CreateDelegate(typeof(Func<object>)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as the integration above? (AsyncMethodInvoker_InvokeBegin_Integration)
Maybe having an Integration common class so we are not creating two delegates for the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's the same. I got so excited that everything was working that I posted the PR without consolidating these into a helper class. Done in c5987de
| static SyncMethodInvokerIntegration() | ||
| { | ||
| var operationContextType = Type.GetType("System.ServiceModel.OperationContext, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089", throwOnError: false); | ||
| if (operationContextType is not null) | ||
| { | ||
| var property = operationContextType.GetProperty("Current", BindingFlags.Public | BindingFlags.Static); | ||
| var method = property.GetGetMethod(); | ||
| _getCurrentOperationContext = (Func<object>)method.CreateDelegate(typeof(Func<object>)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, fixed in c5987de
| var operationContextType = Type.GetType("System.ServiceModel.OperationContext, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089", throwOnError: false); | ||
| if (operationContextType is not null) | ||
| { | ||
| var property = operationContextType.GetProperty("Current", BindingFlags.Public | BindingFlags.Static); | ||
| var method = property.GetGetMethod(); | ||
| _getCurrentOperationContext = (Func<object>)method.CreateDelegate(typeof(Func<object>)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha @andrewlock also notice it... I agree.
| new("System.Data.SQLite", "System.Data.SQLite.SQLiteCommand", "ExecuteReader", new[] { "System.Data.SQLite.SQLiteDataReader", "System.Data.CommandBehavior" }, 1, 0, 0, 2, 65535, 65535, assemblyFullName, "Datadog.Trace.ClrProfiler.AutoInstrumentation.AdoNet.CommandExecuteReaderWithBehaviorIntegration"), | ||
| new("System.Data.SQLite", "System.Data.SQLite.SQLiteCommand", "ExecuteScalar", new[] { "System.Object" }, 1, 0, 0, 2, 65535, 65535, assemblyFullName, "Datadog.Trace.ClrProfiler.AutoInstrumentation.AdoNet.CommandExecuteScalarIntegration"), | ||
| new("System.Data.SQLite", "System.Data.SQLite.SQLiteCommand", "ExecuteScalar", new[] { "System.Object", "System.Data.CommandBehavior" }, 1, 0, 0, 2, 65535, 65535, assemblyFullName, "Datadog.Trace.ClrProfiler.AutoInstrumentation.AdoNet.CommandExecuteScalarWithBehaviorIntegration"), | ||
| new("Microsoft.Data.SqlClient", "Microsoft.Data.SqlClient.SqlCommand", "ExecuteDbDataReader", new[] { "System.Data.Common.DbDataReader", "System.Data.CommandBehavior" }, new ushort[] { }, 1, 0, 0, 3, 65535, 65535, assemblyFullName, "Datadog.Trace.ClrProfiler.AutoInstrumentation.AdoNet.CommandExecuteReaderWithBehaviorIntegration"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should accept null values instead of new ushort[] { }, this seems like a waste of allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would also play into the case of where we don't need to pass any arguments to the integration as I mentioned - an empty array would mean "no arguments", as opposed to null which would mean "all arguments"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed both issues in f9b0cf0 . null results in us not using the feature, and an empty array will allow us to use the feature by passing zero arguments to the instrumentation method
| TargetMethodArgumentsToLoad = IntPtr.Zero; | ||
| if (targetMethodArgumentsToLoad?.Length > 0) | ||
| { | ||
| // The Marshal operations only operate on short not ushort (CLS-compliance) | ||
| TargetMethodArgumentsToLoad = Marshal.AllocHGlobal(targetMethodArgumentsToLoad.Length * Marshal.SizeOf(typeof(short))); | ||
| var ptr = TargetMethodArgumentsToLoad; | ||
| for (var i = 0; i < targetMethodArgumentsToLoad.Length; i++) | ||
| { | ||
| Marshal.WriteInt16(ptr, (short)targetMethodArgumentsToLoad[i]); | ||
| ptr += Marshal.SizeOf(typeof(short)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should also do the disposal of the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0992cfb . Does that look fine to you?
…the WcfCommon.cs class (intentionally new file to avoid rename conflicts with existing WcfIntegration.cs) - Populate the GetCurrentOperationContext delegate lazily - Check that the WCF integration is enabled and the feature flag is enabled FIRST, before checking whether the GetCurrentOperationContext is non-null
…NTATION_ENABLED - Move the DelayWcfInstrumentationEnabled to the FeatureFlags grouping of ConfigurationKeys - Make the DelayWcfInstrumentationEnabled TracerSetting internal
Sure thing! This was changed to |
…oLoad array in the Marshal'ed NativeCallTargetDefinition
This comment has been minimized.
This comment has been minimized.
…ation definitions that do not override the TargetMethodArgumentsToLoad array - Add additional bool field in NativeCallTargetDefinition to disambiguate between "Don't use the TargetMethodArgumentsToLoad feature" and "Use the feature but don't pass any arguments"
…iteBeginMethod constant
…AY_WCF_INSTRUMENTATION_ENABLED
…sses into a Bindings/Custom folder, and suppress CS0067 build warning
|
|
||
| public ushort TargetSignatureTypesLength; | ||
|
|
||
| [MarshalAs(UnmanagedType.U1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool vs BOOL in native side :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this fine or do I need to make a change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfectly fine!, you don't need to modify anything. I was just pointing out this line is required because you used bool instead of BOOL in the native struct.
tonyredondo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…hanging and getting unpredictable snapshots
Commit SHA at time of cherry-pick: 9ccf9dd Fix the native Windows tests Additional changes include: - Move CreateScope method from ChannelHandlerIntegration.cs to WcfCommon.cs
This comment has been minimized.
This comment has been minimized.
…x instead of 20s max for spans to reach the MockTracerAgent, in hopes that it fixes the flakiness in the snapshot tests where some snapshots don't have all expected spans.
…s flaky in CI where it does not consistently emit the server spans. Reset the snapshots to exclude the Async spans for now. Note: This didn't affect the NetTcpBinding with the new instrumentation because the Async endpoint wasn't working, and it didn't affect the CustomBinding because that doesn't return any spans.
Code Coverage Report 📊✔️ Merging #1992 into master will not change line coverage
View the full report for further details: Datadog.Trace Breakdown ✔️
The following classes have significant coverage changes.
The following classes were added in #1992:
View the full reports for further details: |
Benchmarks Report 🐌Benchmarks for #1992 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored. Benchmark detailsBenchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Same allocations ✔️Raw results
|
Additional changes include: - Move CreateScope method from ChannelHandlerIntegration.cs to WcfCommon.cs
Additional changes include: - Move CreateScope method from ChannelHandlerIntegration.cs to WcfCommon.cs
Additional changes include: - Move CreateScope method from ChannelHandlerIntegration.cs to WcfCommon.cs Strip out all CallTarget changes and use the new CallTarget ByRef instrumentation to instrument WCF methods which have out parameters (thanks Tony!)
…ent variable (#2041) * Port changes from #1992 into the release/2.0 branch. Additional changes include: - Move CreateScope method from ChannelHandlerIntegration.cs to WcfCommon.cs - Instead of breaking the CallTarget native structures (causing the original PR to be reverted for 1.30.1), use the new CallTarget ByRef instrumentation to instrument WCF methods which have out parameters (thanks Tony!) - Remove testing of CallSite codepaths in WcfTests
* Upstream sync 4153faa..7fa9a93 * Don't throw in AgentWriterBenchmark (#2057) Currently we're throwing every time we flush in the `AgentWriterBenchmark`. That's obviously very expensive, variable, and makes for difficult comparisons. * Remove unused and defunct throughput test (#1802) * [AppSec] Waf fixes: reading register for waf and unit tests for alpine (#2060) * Remove musl suffix in TryLoadLibrary too * Don't read memory if no reg key could be read * typo fix * Remove logging info * simplify filter switch * Remove dependency from the shared project to main project (#2043) By including the dllmain.h file, we add a dependency between the shared project that contains the loader and the main project (today, the profiler C++ project) __ImageBase is a linker (Windows) trick that allows us to get the HINSTANCE and to remove the dependency * Revert "Add modified WCF CallTarget instrumentation via opt-in environment variable (#1992)" This reverts commit d009bf0. * [2.0] - Remove CallSite instrumentation (#1958) * Initial removal POC * first commit to remove .net 4.5 support * changes. * change rejithandler pointer to smart pointer * changes * Fixes and refactoring. * Add stats from the beginning to the end of a process * Remove unused clr_helpers functions. * Changes * Changes * Initial Remove of CallSite integrations. * Remove testing callsite instrumentation * Remove ADO.NET CallSite integrations * Changes * fix test * fix test * fix test * fix * remove integrations.json references * fix build * fix test * fix windows installer * changes * fix native tests. * fix merge * Remove integrations.json references. * -Remove integrations.json references -Fix Tests -CleanUp * Fixes * MSI installer fix * Improve cor profiler * Removes Datadog.Trace.ClrProfiler.Integrations namespace * Fix CallTargetNativeTest without relying on integrations.json file. * Upgrade ICorProfilerInfo6 to ICorProfilerInfo7. * remove unnecessary IntegrationName field. * Remove custom member data on tests. * - Enable NGEN by default in containers. - Fix Crank NGEN settings. * [2.0] - Native stats fixes (#1967) * Fixes a possible crash if Stats::ToString() is called more than once. * fix algorithm on multiple Refresh() calls. * [2.0] Additional cleanup from CallSite removal (#1966) * Remove remaining CallSite references from PrepareRelease and Dependabot file manager * Delete CallSite snapshots * Remove .CallTarget from snapshot file names (everything is calltarget) * Update expected snapshot names * Inline now-constant booleans * Remove references to DD_TRACE_NETSTANDARD_ENABLED and DD_TRACE_CALLTARGET_ENABLED - These keys doesn't do anything any more. * Remove unused "enableCallTarget" parameter This reverts commit 662c177ac6a96b20d18f2642e98635beb7d76220. * [2.0] Drop support for .NET < 4.6.1 (#1957) * Drop support for .NET < 4.6.1 * Switch header collections to struct (#1971) * Switch header collections to struct * Convert more collections and remove unused one * [2.0] Mark deprecated settings obsolete (#1974) * Mark DD_TRACE_ANALYTICS_ENABLED obsolete App Analytics was replaced by Tracing without Limits last year. * Mark DD_TRACE_LOG_PATH as deprecated Prefer DD_TRACE_LOG_DIRECTORY instead * Update public api * [2.0] Remove obsolete APIs (#1952) * Remove obsolete APIs * Add missing namespace * [2.0] - Refactor Integration struct, and remove unused code. (#1969) * Refactor Native Integration struct. * Remove unused code and refactor remaining for calltarget only. * Rename all WrapperType references to IntegrationType * Rename InegrationMethods to IntegrationDefinitions * [2.0] - Adds a simple agent port checker on CI Visibility scenarios. (#1972) * Adds a simple agent port checker on CI Visibility scenarios. * Use AgentWriter.Ping to do the check. * Restore target framework to netcoreapp3.1 * Address review comments. * Write to stdout only on error. * [2.0] Update default value of DD_TRACE_ROUTE_TEMPLATE_RESOURCE_NAMES_ENABLED to true (#1961) * Update default value of DD_TRACE_ROUTE_TEMPLATE_RESOURCE_NAMES_ENABLED to true * Update ASP.NET Core test snapshots * Update ASP.NET test snapshots * Update Datadog.Security.IntegrationTests to work with the new default AspNetCore spans * Remove unused files (#1985) * [2.0] Cleanup old factory method and tests (#1980) * Delete old "Create" method * Delete old tests that were never run, and wouldn't pass if they were These are superseded by proper integration tests anyway * Fix broken OpenTracing.IntegrationTests These were never run, and wouldn't have passed if they were. Given this is the only testing of the OpenTracing agent, converted these to use the MockTracerAgent instead of removing * Remove unused handlers and methods * Update public API * Remove commented out code * Remove unnescessary copies. (#1989) * [2.0] Make various APIs internal (#1968) * Made Tags and SpanTypes associated specifically with our automatic instrumentation internal * Mark duck types for integrations as not browsable Also mark Azure functions integration as not browsable * Make duck typing internal The DuckType itself had to stay Public, as it's invoked by the proxies themselves for converting types * Hide IgnoresAccessChecksToAttribute * Update public API Simplify test, because the API is now consistent between framework versions * Make some more tags internal * [2.0] Cleanup some more obsolete usages (#1997) * Add workaround for "Unable move and reuse existing repository to required location." * Delete EntityFramework6xMdTokenLookupFailure as this tested a callsite-specific code path * Stop setting the obsolete AnalyticsEnabled in test apps * [2.0] Add support for .NET 6 (#1885) Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com> * Update gitlab build to use .NET 6 SDK (#2010) * [2.0] Fix ASP.NET Core tests + snapshots (#2012) * Fix ASP.NET Core integration tests should disable new setting as well as enable it * Revert snapshot changes introduced in #1961 These _shouldn't_ have changed, the fact they did shows we did something wrong * Fix owin integration tests * Set AspNetMvc flag correctly * Fix WebApi2 tests * [2.0] Handle multiple tracer instances gracefully (#1990) * Add TracerManager for managing the lifetime of "inner" tracer objects The Tracer instance now delegates most of its behaviour to the TracerManager, and instead primarily becomes a "helper" on top of IScopeManager. This changes the "main" API for users from being the Tracer constructor, to the static Tracer.ReplaceGlobalSettings(). This makes the real behaviour clear. We maintain the existing constructor for compatibility, but mark it obsolete. The TracerManager is responsible for cleaning up resources when we replace global agentwriter/sampler etc. Currently it will _always_ replace the services, though we could make this smarter and not replace them if nothing has changed (which would reduce some overhead on startup for users that just setting a couple of things in code, like the service name) Note that we now create a singleton IScopeManager, as logs injection with lib log can not support multiple concurrent traces. Had to refactor Api.cs so as to not rely on Tracer.Instance. This commit temporarily removes the LibLog integration, it will be added back in the next commit * Update LibLogScopeEventSubscriber implementation It doesn't actually require a tracer instance, it just requires a scope manager and the Tracing Without limits properties. This correctly creates a new lib log instance if settings change, and disposes the old one. * Update tests and test apps - Add a helper method that avoids the old constructors (which replace the global tracer) - Replace some usages of the old APIs - Replace usages of Tracer.Instance that cause issues in CI otherwise - Fix snapshots * Update SpanExtensions to explicitly pass a tracer instance In the diagnostic observer (for example) this previously resulted in creating a separate Tracer instance (in practical terms, only during tests, but better to be explicit I think) * #pragma ignore uses of obsolete Tracer.Instance set in benchmarks * Move cleanup to a separate Task as it requires async work * Fix cleanup of old TracerManager to be proper fire and forget * Replace usages of obsolete API in benchmarks * Pass TracerSettings into SpanExtensions.SetHttpCode instead of Tracer * Use a callback to update sample rates instead of an ISampler instance * Fix sampling rate test * Feedback from PR * fixes to previous commit * Rename ReplaceGlobalSettings() -> Configure() (#2023) * fold `IntegrationIds.ElasticsearchNet5` into `ElasticsearchNet` (#1983) * [2.0] Use buster-based debian for building (#2032) * Revert debian dockerfile to use buster instead of bullseye * Update build_in_docker sdk version * [2.0] Add .NET 6 WebApplicationBuilder tests (#2016) * Add .NET 6 minimal APIs sample project * Add integration tests for .NET 6 minimal APIs Note that the NoFF /api/delay/0 tests have a different resource name than the equivalent ApiController version: `GET /api/delay/?` instead of `GET api/delay/{seconds}`. This is because we don't run the endpoint routing events, so we end up using the "default" resource names, which don't include the route templates, even though they're technically available. * Add snapshots for .net 6 webapplication/minimal api tests * Add explicit DiagnosticObserver tests for .NET 6's WebApplicationBuilder The most annoying thing here is how WebApplication adds an implicit UseRouting call at the start of the pipeline (before our existing middleware). That messes up the `UseMultipleErrorHandlerPipelines()` helper we had for testing multiple types of pipeline re-execution. The upshot is that to handle the same paths, we need to build the pipeline in slightly different ways. This is "fine", just a bit annoying. I couldn't see a good alternative way around it, other than separating the executions to entirely different apps (but didn't want to that refactor here) * Don't try and run diff tool inside CI * [2.0] Minor restructuring of logging files (#2040) * Move logging files to separate folders To make it clear what's related to "internal logging" and what's related to "logs injection" Keep the namespaces the same for now * Inline unused static variable * [2.0] - Detect current module at runtime and use it for PInvoke rewriting (#2044) * Initial current module function for windows. * Adds linux/osx version of the function * Use current module filepath for PInvoke rewrites. * Removing unnecessary code. * Adds validation on Initialization. * Add GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT flag. * [2.0] Various small DiagnosticObserver optimisations (#2024) * Make EndpointFeatureProxy a virtual class instead of an interface as slightly faster * Use request.PathBase.Value not request.PathBase Otherwise we fall back to string.Format() * Use separate ASP.NET Core tags implementations These more correctly represent the tags we can (and want) to set on the inner tags * Calculate "obfuscated" urls on request end, not request start We always throw away the calculation _except_ in error (404) cases or where middleware returns the response. By calculating it on exit, we avoid wasted work. I'm not entirely clear of the Azure Functions behaviour here, so kept existing behaviour for that. Also minor refactor to move the integrationsEnabled check outside of AspNetCoreHttpRequestHandler so we don't check it twice * Remove unneeded properties from RequestTrackingFeature We record these on the tags of the initial span anyway, so no need to duplicate * Avoid calling HttpContext.Request.GetUrl() where possible As this allocates. Do our best to avoid allocation by using PathBase and Path directly, recording the original values in the tracking feature. This assumes that the Scheme and Host don't change, but those seem like reasonable assumptions to make here * Remove unused field * Use TryDuckCast instead of DuckCast * Fix HttpRequestIn_PopulateSpan test as we no longer populate resource name at the start of the pipeline * [2.0] Add Linux installer smoke tests (#2036) * minor: don't install so much in update-github-status-jobs.yml * minor: remove checkout from generator steps * Add an ASP.NET Core smoke test app Sends a request to itself, confirms there's no errors, and crashes if there are. Enforces that the profiler is attached. Doesn't use the existing SampleHelpers to make building in docker easier * Add a smoke-test stage for the Linux installers - Builds and publishes an ASP.NET Core app using the .NET 6 SDK, for a specific framework version - Downloads the linux installer binaries from the current pipeline - Installs the binaries in a given runtime image - Runs the smoke test application. Fails the job on non-zero exit code * Add arm64 debian installer tests * Add testing of fedora images * Fix incorrect alpine tag for .NET Core 2.1 * [2.0] Relax duck type visibility requirements (#2045) * Add [DoesNotReturn] to throw helpers Fixes some null/code-flow IDE issues Remove unused exception * fix typo * Remove unused method * Remove visibility checks for DuckTypes As we no longer support NET45, we don't need our duck types to be public. * Couchbase instrumentation (#1925) * Instrumentation * ITests (no ARM) * Switch to using curl to send AzDo build GitHub status (#2049) Installing .NET 6 and using Nuke for this is a bit heavyweight, as it potentially slows down every stage by 2/3 mins. Using curl reduces that to 2s * Remove .bat files from repository root (#1763) * Remove script files from repository root This is obviously opinionated, but I don't use them, so noone else can either, right?! * Move pre-build-event-cpp.bat file to the native file (as that's where it's used anyway) * Revert "Couchbase instrumentation (#1925)" (#2054) This reverts commit 1bea1dfaa6e92f2c4736eed2defe5c608a848648. * Fix snapshots introduced in #1938 but not renamed in #1966 * [2.0] Add ImmutableTracerSettings (#1951) * Add ImmutableTracerSettings Changing TracerSettings after they've been passed to a Tracer instance is not valid and could lead to unpredictable results. To try and emphasise that, this adds a new ImmutableTracerSettings type, that does not allow changing any properties For backwards compatability, I've kept the TracerSettings object as not-obsolete, as the de-facto "builder" for ImmutableTracerSettings. This also avoids the need to use any base classes, avoids having to duplicate initialization code, and also avoids the need for a dedicated "builder" type for the immutable settings (we can't use `init set;` properties because of missing references unfortunately) * Make some mutable properties immutable Update the readonly properties on ImmutableTracerSettings to be "immutable" collections. The biggest change here is ImmutableTracerSettingsCollection, which is a little bit clumsy, but seemed to be the best way to make Integrations immutable-ish. * Add test that immutable tracer settings is immutable * Update public API * Add unit test to confirm ImmutableTracerSettings contains all (most) of the properties on TracerSettings * ImmutableIntegrationSettings is actually immutable! * Delete AdoNetExcludedTypes as will be removed in #2008 * Use IReadOnlyDictionary for HeaderTags and GlobalTags * Revert "Delete AdoNetExcludedTypes as will be removed in #2008" This reverts commit 8c6602d1c19c55084c66a5153fe437ae1fd22257. * Fix public API * [2.0] Simplify IntegrationSettingsCollection (#2048) * Replace usages of IntegrationInfo with IntegrationIds Thanks to the version conflict resolution, we will never have integrations where the integration name is unknown. Therefore we don't need to keep the dictionary of names, and we can use IntegrationIds directly instead of IntegrationInfo If a customer is using manual tracing, and calls TracerSettings.Integrations["SomeIntegration"] with an unknown integration name, we will now just return a "default" IntegrationSettings, and log a warning. Replaced all the old usages of IntegrationInfo with IntegrationIds * fix comment * Rename IntegrationIds -> IntegrationId * Fix obsolete API usages (#2056) * Allow to run ITests locally on Mac (#2063) * Restoring Couchbase instrumentation (#2064) * Revert "Revert "Couchbase instrumentation (#1925)" (#2054)" This reverts commit 803f27b573fc24a6cede4c3807ac502f49404744. With the addition of a few changes around the sample app, and added one method for particular v3 versions. Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com> * Try switch dependabot generation to use pull_request_target (#2055) * Try switch dependabot generation to use pull_request_target This is all a bit confusing, but I think this will resolve our permission issues, and it's safe because we're only checking out code from `master/main`, never the PR code itself * Bump .NET SDK version used in workflows * Remove the newline between the log message and properties (#2070) Currently the properties (e.g. `{ MachineName: ".", Process: "[44096 iisexpress]", AppDomain: "[2 /LM/W3SVC/1/ROOT-1-132820633973714685]", TracerVersion: "1.30.0.0" }`) are written on a new line _after_ the log line. This can make it difficult to tell whether a given set of properties is related to the line above or the line below. This change removes the newline. * [2.0] Split integration id `AdoNet` into individual ids per db type (#2008) Co-authored-by: Zach Montoya <zach.montoya@datadoghq.com> * remove unused class MethodBuilder and test references (#2074) * Fix disabled integrations log message (#2078) Was trying to serialize the whole integration setting * [AppSec] Upgrade WAF rule set (#2086) * Upgrade WAF rule set * Update ruleset in unit tests too and waf tests Co-authored-by: Anna <anna.yafi@datadoghq.com> * [AppSec] Change waf initialization to have better logging (#2068) * Refactoring of waf initialization, no more static constructor, split classes * Link the ruleset instead of duplicating it in unit tests * Remove CIApp Feature Tracking. (#2088) * Just add a pointer to the ms doc (#2093) * Add space in log line (#2094) * [AppSec] Report attacks by the _dd.appsec.triggers tag (#1923) * Report attacks by the _dd.appsec.triggers tag It's been decided to sent the app sec event data via a tag to remove the need for the backend to correlated two data sources. I also used this to fix several related issue: - report all attacks generated by the waf - report all matches within an attack using the new rule_matches field - add the standard logs about attack data APPSEC-1839 APPSEC-1541 APPSEC-983 Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com> Co-authored-by: Pierre Bonet <pierre.bonet@datadoghq.com> * Redesign package version generator to allow more granular control (#2091) * Bump package versions with existing task * Delete unused file * Rename Program -> PackageVersionGenerator * Expand PackageVersionEntry format This allows us to be more flexible about which package versions we run. This is necessary as certain frameworks that we support cease being supported by NuGet packages (e.g. npgsl 6.0 isn't supported on netcoreapp2.1) Takes the target frameworks from the samples themselves, so shouldn't need updating often * Run GeneratePackageVersions with new format The generated files are much more granular, but should be equivalent * Downgrade tests on MSTest to 2.2.7 - there's a bug in the instrumentation of 2.2.8 * CallTarget ByRef (#2092) * Initial work for calltarget by ref. * Fix CallTarget ref * Fixes * Fix mapper. * Fixes. * revert config. * Ensure compatibility when using and old native profiler by keeping an overload without the ref. * Ensure profiler compatibility with older managed version, by a new PInvoke to enable byref instrumentation. * Fix native test. * Ensures arguments are by ref in the managed side. * Add helper for collecting host metadata (#2097) * Add helper for collecting host metadata i.e. Hostname, Linux kernel information Required for some new features (e.g. log shipping, telemetry) * Use EnvironmentHelpers and add named params * CallTarget ByRef: Fix ByRef Generics arguments (#2099) * Fix ByRef Generic arguments * Update tracer/src/Datadog.Trace/ClrProfiler/CallTarget/Handlers/IntegrationMapper.cs Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com> Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com> * [Appsec] Fix duplicate key in cookies when building waf args (#2098) * Fix duplicate key in cookies * Make sure no duplicate key * Uniformize order * Fix looping through cookies, otherwise always same value * Add lists for all headers, cookies, query strings to make sure we handle them in case of different IIS behavior * Specializing for .net core with try add * Fix else query string * [2.0] Add modified WCF CallTarget instrumentation via opt-in environment variable (#2041) * Port changes from DataDog/dd-trace-dotnet#1992 into the release/2.0 branch. Additional changes include: - Move CreateScope method from ChannelHandlerIntegration.cs to WcfCommon.cs - Instead of breaking the CallTarget native structures (causing the original PR to be reverted for 1.30.1), use the new CallTarget ByRef instrumentation to instrument WCF methods which have out parameters (thanks Tony!) - Remove testing of CallSite codepaths in WcfTests * [2.0] remove leftover references to `integrations.json` and `DD_INTEGRATIONS` (#2073) * remove references to integrations.json and DD_INTEGRATIONS * fix build step description (but don't change it's name to avoid breaking CI) * remove DD_INTEGRATIONS from AppSec loader * Change the CallTargetReturn struct to a ref struct (#2103) * Update supported ADO.NET versions (#2077) * Bump supported versions of ADO.NET providers * Update generator definitions Note that we have not been testing the latest Npgsql release Also Npgsql 6.0.0 does not support netcoreapp2.1 or netcoreapp3.0 * Update package versions * Don't use Assembly.LoadFrom() with v4.0.0 as it crashes Looks unrelated to the profiler, prob a bug in 4.0.0 * Version 4.0.0 doesn't work on Linux (can't connect) * Explicitly set CheckEolTargetFramework=false in linux sample builds For some reason these aren't picking it up from elsewhere * Microsoft.Data.Sqlite hangs on Alpine with 6.0.0 on .NET Core 3.0 So skip the test. It hangs whether we are attached or not, so skip the test. * Fix host metadata kernel information (#2104) * update CLSID_CorProfiler after upstream * update old log path after rebase * fix max System.ServiceModel version. There is no 5.0 version * fix DD_ env variables * env variables - ultimate ultimate-pipeline * propagate Propagators from TracerSettings to ImmutableTracerSettings * use TestHelper class instead of direct call to Tracer constructor * Use real command for CreateDbCommandScope_SetsDbStatementBasedOnCommandText test There is no longer support dor non standard IDbCommands implementations * remove not used property ConfigurationSource * remove setter from ImmutableTracerSettings * fix TracerManager documentation comment * propagate plugins from Instrumentation class * bump donet sdk to 6.0 for CI pipeline * fix compilation issues after next-ver merge * remove .net 4.5.2 from integration tests and upstream sync documentation support was dropped on upstream * fix compilation issues * remove leftovers for SIGNALFX_INTEGRATIONS * install all neded dotnet versions * fix reference to snk file * use DataDog.Agent exporter for ReplacingGlobalTracerManagerMidTraceWritesTheTrace * use DataDogAgent exporter in OpenTracingSendTracesToAgent * fix internalsvisible to argument for CallTargetNativeTest * update verified files * fix OpenTracingSendTracesToAgent * verified files for .NET6 * verified files for .NET5, .NET 4.6.1 .NET Core 3.1 * fix Couchbase tests * fix HttpContextKey typos * update WcfTests verified files * update OwinWebApi2Tests based on next-ver branch * fix launchSettings.json form Samples.Couchbase * update documentation after upstream sync * Skip Couchbase3 tests locally, upstream code ends with the same issue: Expecting at least 9 spans, only received 0 * update upstream sync documentation Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com> Co-authored-by: Colin Higgins <colin-higgins@users.noreply.github.com> Co-authored-by: Anna <anna.yafi@datadoghq.com> Co-authored-by: Gregory LEOCADIE <gregory.leocadie@datadoghq.com> Co-authored-by: Tony Redondo <tony.redondo@datadoghq.com> Co-authored-by: Kevin Gosse <krix33@gmail.com> Co-authored-by: Zach Montoya <zach.montoya@datadoghq.com> Co-authored-by: Lucas Pimentel-Ordyna <lucas.pimentel@datadoghq.com> Co-authored-by: Pierre Bonet <pierre.bonet@datadoghq.com> Co-authored-by: Robert Pickering <robert.pickering@datadoghq.com>
## Summary of changes Fixes orphan traces in WCF ## Reason for change Many customers have reported that in v3, with "delayed WCF" instrumentation enabled, they get orphan spans. This can be easily reproduced, so this PR fixes the issue. ## Background There are two "modes" WCF can operate in: delayed instrumentation, and "simple" instrumentation. Delayed instrumentation was introduced in #1992 to avoid a problematic error scenario. Instead of instrumenting early across the whole "channel", we delayed instrumentation. A subsequent PR #5206 updated the instrumentation point to ensure the wcf span was active while `IDispatchMessageInspector` were running. This was somewhat complex, due to the convoluted and legacy way wcf uses async and threading. Basically, we use a weak table to hold the current span, and us the `MessageRpc` object as a key. Later on, we retrieve the scope and dispose it. ## Implementation details The missing part was that we were not "restoring" the wcf scope for when the actual Invoker implementations executed. The process was: - Update the tests and snapshots to: - Record _all_ spans and traces (it makes it easier to [understand the snapshots using](https://andrewlock.github.io/DatadogSpanSnapshotViewer/)) - Create a span inside the "service" calls. - These _should_ have been children of the wcf server spans, but they were not. They were orphaned - Update the "Invoker" implementations to grab the wcf spans, make it active, and then deactivate it afterwards - Regenerate the snapshots, and confirm the custom spans are no longer orphaned 🎉 Additionally, do a small amount of refactoring: - "Fix" nullability in some places - Extract the "restore wcf span" functionality to `WcfCommon` as it's used in multiple places ## Test coverage Covered by existing, after the tweaks above. The snapshots change like this: Before the fix, after adding the additional custom spans, `WSHttpBinding` traces: <img width="1468" height="622" alt="image" src="https://github.com/user-attachments/assets/7973514c-660e-4871-8c4d-7495473e9944" /> After the fix, `WSHttpBinding` traces: <img width="1459" height="96" alt="image" src="https://github.com/user-attachments/assets/0698e5d3-cb06-4ea6-ab00-5a3680f7e3ee" /> Before the fix, after adding the additional custom spans, WebHttp traces: <img width="1458" height="268" alt="image" src="https://github.com/user-attachments/assets/b0a9f4c9-8584-4fa2-9441-74268d6fd279" /> After the fix, Web Http traces <img width="1465" height="126" alt="image" src="https://github.com/user-attachments/assets/b06f0255-0818-4516-a6a1-b1807c9781ad" /> Unfortunately, this ends up being somewhat unstable in regards to sorting, so I did some more refactoring to separate the root span for each separate httpcall. This should make sure we have more stable ordering of the spans using the built-in sort order The result is we get separate traces for each API call instead: <img width="1458" height="713" alt="image" src="https://github.com/user-attachments/assets/989a0cc8-befa-429f-9a9e-ede09c2abc48" /> ## Other details I wonder if we really need a weak table here? 🤔 seeing as we have an instrumentation point that explicitly disposes the span, shouldn't we just remove the entry at the same time?
Issue
Our current WCF instrumentation (both CallSite and CallTarget) targets
System.ServiceModel.Dispatcher.ChannelHandler.HandleRequest, which is very early in the callstack before a lot of the other WCF infrastructure is called, so any exceptions that propagate outside of the instrumentation will crash the server. For our instrumentation to create meaningful spans that are also connected to distributed parents, we must access the request message to gather information and create the span.Unfortunately, in one known scenario, the first access to the request message should throw an exception that the WCF infrastructure should catch and it should cease processing of that request. Since we currently swallow exceptions during scope creation, the message continues to be processed and the application behavior is changed. Even if we decided to throw an exception, because we instrument a method before the main message processing loop, we would actually crash the server instead of stopping the processing of that one message. As a result, the only fix for this scenario is to defer our instrumentation until later in the WCF infrastructure code.
Since this is a big change and our WCF server testing is not as robust as other integrations, this change is being implemented by a feature flag.
Changes
MessageRequestobjectOnMethodBegincallbacks. This allows us to now instrument methods that haveref/in/outparameters, as long as we bypass those specific parametersIOperationInvokerimplementations, which is the last bit of WCF infrastructure code before the actual WCF services are invokedSyncMethodInvokerAsyncMethodInvokerTaskMethodInvokerDD_TRACE_DELAY_WCF_INSTRUMENTATION_ENABLEDTesting
Samples.Wcftest application to invoke all 3 built-inIOperationInvokerimplementations:SyncMethodInvoker,AsyncMethodInvoker, andTaskMethodInvokerSyncMethodInvokerSamples.Wcftest application with a new custom binding that throws an exception on the firstRequestMessageaccess, exposing the issue with our existing CallSite and CallTarget WCF instrumentationWcfTestsintegration test to exercise all 4 bindingsWcfTestsWork Remaining
The
AsyncMethodInvoker(BeginXXXandEndXXX) does not work correctly with the built-inNetTcpBinding. When invoking theEndXXXmethod, the ambient Tracer scope returnsnull. In order to get this critical fix out more quickly, I propose deferring this work until we find a user that runs into this.@DataDog/apm-dotnet