Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(agent): Add aws-sdk-php SQS instrumentation #1003

Open
wants to merge 26 commits into
base: feat/message_segment
Choose a base branch
from

Conversation

zsistla
Copy link
Contributor

@zsistla zsistla commented Dec 30, 2024

Uses message segments to create SQS instrumentation.

Added unit tests and multiverse tests.

SQS commands sendMessage, sendMessageBatch, receiveMessage are instrumented
to create message segment spans.
Added unit tests.
* Added helper func to get command arg value for reuse with future AWS instrumentations.
* Added unit tests for helper functions.
Separated cloud attributes from message segment.
They can be applied to multiple segment types and don't belong in typed attributes.
It will make it easier to add to other segments in the future.

Added a function to add the cloud attributes.

Reworked unit tests with new functionality.
SQS commands sendMessage, sendMessageBatch, receiveMessage are instrumented
to create message segment spans.
Added unit tests.
* Added helper func to get command arg value for reuse with future AWS instrumentations.
* Added unit tests for helper functions.
Separated cloud attributes from message segment.
They can be applied to multiple segment types and don't belong in typed attributes.
It will make it easier to add to other segments in the future.

Added a function to add the cloud attributes.

Reworked unit tests with new functionality.
@zsistla zsistla force-pushed the feat/sqs_instrumentation branch from b342fc5 to 352f85f Compare December 30, 2024 14:11
@newrelic-php-agent-bot
Copy link

newrelic-php-agent-bot commented Dec 30, 2024

Test Suite Status Result
Multiverse 0/8 passing
SOAK 60/72 passing

@codecov-commenter
Copy link

codecov-commenter commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 44.18605% with 72 lines in your changes missing coverage. Please review.

Please upload report for BASE (feat/message_segment@925b94c). Learn more about missing BASE report.

Files with missing lines Patch % Lines
agent/lib_aws_sdk_php.c 43.08% 70 Missing ⚠️
axiom/nr_segment_message.c 50.00% 2 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                   @@
##             feat/message_segment    #1003   +/-   ##
=======================================================
  Coverage                        ?   78.01%           
=======================================================
  Files                           ?      197           
  Lines                           ?    27421           
  Branches                        ?        0           
=======================================================
  Hits                            ?    21392           
  Misses                          ?     6029           
  Partials                        ?        0           
Flag Coverage Δ
agent-for-php-7.2 78.16% <66.66%> (?)
agent-for-php-7.3 78.18% <66.66%> (?)
agent-for-php-7.4 77.89% <66.66%> (?)
agent-for-php-8.0 77.26% <66.66%> (?)
agent-for-php-8.1 77.75% <44.18%> (?)
agent-for-php-8.2 77.35% <44.18%> (?)
agent-for-php-8.3 77.35% <44.18%> (?)
agent-for-php-8.4 77.37% <44.18%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

agent/lib_aws_sdk_php.h Outdated Show resolved Hide resolved
agent/lib_aws_sdk_php.h Outdated Show resolved Hide resolved
agent/lib_aws_sdk_php.h Outdated Show resolved Hide resolved
* Start a new segment to contain the message segment.
* The new segment is started in a new "before" wrapper.
* If we created a message segment, the segment is ended; otherwise we need to end it before exiting the wrapper.
* Give the calling segment a more meaningful name so we see something like `Aws\\Sqs\\SqsClient::sendMessage` instead of seeing `Aws/AwsClient::__call`
agent/lib_aws_sdk_php.c Outdated Show resolved Hide resolved
@zsistla zsistla requested review from ZNeumann and mfulb January 9, 2025 13:54

/*
* Purpose : The second argument to the Aws/AwsClient::__call function should be
* an array containing an array of argument name:value pairs. Given an argument
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it an array containing an array or is it just an array containing arguments to the command stored as name_of_an_argument => value?

Copy link
Contributor Author

@zsistla zsistla Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an array containing an array of name:value pairs that represent the arguments passed to the original called command_name function.

agent/lib_aws_sdk_php.h Outdated Show resolved Hide resolved
agent/lib_aws_sdk_php.c Outdated Show resolved Hide resolved
zsistla and others added 5 commits January 9, 2025 14:58
Co-authored-by: Michael Fulbright <89205663+mfulb@users.noreply.github.com>
Co-authored-by: Michal Nowacki <mnowacki@newrelic.com>
}

nr_free(command_name_string);
nr_free(real_class_and_command);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this cleanup for real_class_and_command happen in the if block above since it would make it more obvious when reading the code that this allocation is released?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored before reading this comment, but I think the refactor addressed this concern.
2a74246

Research showed comparing strlen first significantly reduced lookup times, especially in non-matching cases.
Refactor to use the already computed strlen in the zval/zend_string.
This involved removing the helper function which took the zval/zend_string and strduped it onto a char*.
Since we weren't helping by converting to string anymore, keeping the helper function would have meant having to do the less optimal options of the calling function remember to nr_php_arg_release an arg it didn't directly get, copy the zval, add a ref to the zval that would need to be freed.
Simpler just to extract the functionality back to the main function.  One positive of helper functions is they are easier to test, but shouldn't be at the detriment of code readability or optimization. nr_php_arg_get was already directly giving us the zval/zend_string availability that we wanted.
* Find the pattern of the AWS queueurl that should immediately precede the
* region.
*/
if (0 != strncmp(queueurl_pointer, "https://sqs.", 12)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our conversation - replace hard coded numbers with a #define and sizeof().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see: 09f10e8

* nr_lib_aws_sdk_php_sqs_parse_queueurl checks for NULL so safe pass
* command_arg_value directly in.
*/
nr_lib_aws_sdk_php_sqs_parse_queueurl(command_arg_value, &message_params,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider an optimization where a copy of command_arg_value is passed in and then the parsing function can modify this buffer in place and return pointers to the fields within this buffer to avoid strdup'ing the 3 fields in the parsing function. Also removes the need to free these later. Just need to document the parse function REQUIRES a copy that can be modified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see: f48df3f

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold off on looking at this, it needs additional rework.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests needed to be updated for this refactor to pass modifiable strings not literals.
0b194c2

@@ -1260,7 +1260,10 @@ static inline void nr_php_execute_segment_add_metric(
if (create_metric) {
nr_segment_add_metric(segment, buf, true);
}
nr_segment_set_name(segment, buf);
if (!segment->name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this related to SQS instrumentation? This is a code path run for all instrumentation so I'm trying to understand why this general a change is required for SQS and if this is not something we've handled differently before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required for just SQS but for all AWS instrumentation.
Because we instrument the Aws/AwsClient::__call and decode the calling command, we are able to give a more meaningful name here.

So, instead of naming the segment Aws//AwsClient::__call we can name it Aws\\Sqs\\SqsClient::sendMessage.

However, when the segment was getting set in php_execute, it wasn't checking if a more meaningful name had already been assigned. This just checks if a name has already been applied to the segment, and if so don't overwrite it; otherwise, continue as normal.

Copy link
Contributor

@mfulb mfulb Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means up till this PR the segment would never have been assigned a name before this point by any existing instrumentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Copy link
Contributor

@mfulb mfulb Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if this line of code was doing something similar for external segments and how it might interplay with this change?

segment->name = nr_string_add(segment->txn->trace_strings, buf);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

External naming will be unaffected by this change (as also seen by the passing multiverse/integration tests).

That line in external is doing the same thing message is doing here.
Datastore, external, and message all manually start a segment, add all the typed attributes and category names, then manually end the segment with the appropriate name so those manually started/stopped segments will never be closed in the way that will try and give it another name.

This will affect generic segments that are closed in func_end that we just grab the function name as a way of naming.

In this particular case in this PR, it's naming the segment that has a manually started/stopped message segment as its child.

Comment on lines +110 to +131
if (command_name_len == AWS_SQS_SEND_MESSAGE_BATCH_COMMAND_LEN
&& 0
== nr_strncmp(AWS_SQS_SEND_MESSAGE_BATCH_COMMAND,
command_name_string,
AWS_SQS_SEND_MESSAGE_BATCH_COMMAND_LEN)) {
message_params.message_action = NR_SPANKIND_PRODUCER;
} else if (command_name_len == AWS_SQS_SEND_MESSAGE_COMMAND_LEN
&& 0
== nr_strncmp(AWS_SQS_SEND_MESSAGE_COMMAND,
command_name_string,
AWS_SQS_SEND_MESSAGE_COMMAND_LEN)) {
message_params.message_action = NR_SPANKIND_PRODUCER;
} else if (command_name_len == AWS_SQS_RECEIVE_MESSAGE_COMMAND_LEN
&& 0
== nr_strncmp(AWS_SQS_RECEIVE_MESSAGE_COMMAND,
command_name_string,
AWS_SQS_RECEIVE_MESSAGE_COMMAND_LEN)) {
message_params.message_action = NR_SPANKIND_CONSUMER;
} else {
/* Nothing to do here so exit. */
return;
}
Copy link
Member

@lavarou lavarou Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has so much indirection and requires jumping back and forth to check if the macros have correct values and what those values actually are. Wouldn't this be easier to read and maintain:

Suggested change
if (command_name_len == AWS_SQS_SEND_MESSAGE_BATCH_COMMAND_LEN
&& 0
== nr_strncmp(AWS_SQS_SEND_MESSAGE_BATCH_COMMAND,
command_name_string,
AWS_SQS_SEND_MESSAGE_BATCH_COMMAND_LEN)) {
message_params.message_action = NR_SPANKIND_PRODUCER;
} else if (command_name_len == AWS_SQS_SEND_MESSAGE_COMMAND_LEN
&& 0
== nr_strncmp(AWS_SQS_SEND_MESSAGE_COMMAND,
command_name_string,
AWS_SQS_SEND_MESSAGE_COMMAND_LEN)) {
message_params.message_action = NR_SPANKIND_PRODUCER;
} else if (command_name_len == AWS_SQS_RECEIVE_MESSAGE_COMMAND_LEN
&& 0
== nr_strncmp(AWS_SQS_RECEIVE_MESSAGE_COMMAND,
command_name_string,
AWS_SQS_RECEIVE_MESSAGE_COMMAND_LEN)) {
message_params.message_action = NR_SPANKIND_CONSUMER;
} else {
/* Nothing to do here so exit. */
return;
}
#define COMMAND_IS(CMD) \
(command_name_len == (sizeof(CMD) - 1) && nr_streq(CMD, command_name_string))
/* Determine if we instrument this command. */
if (COMMAND_IS("sendBatch")) {
message_params.message_action = NR_SPANKIND_PRODUCER;
} else if (COMMAND_IS("sendMessage")) {
message_params.message_action = NR_SPANKIND_PRODUCER;
} else if (COMMAND_IS("receiveMessage")) {
message_params.message_action = NR_SPANKIND_CONSUMER;
} else {
/* Nothing to do here so exit. */
return;
}
#undef IS_COMMAND

Suggested change avoids the redirection, and also avoids defining and maintaining the macros for each command and its length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion is something to consider but is not blocking approval.

Copy link
Contributor Author

@zsistla zsistla Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the macro makes the code more readable, but it means we've got to compute the size of a known string every single time we go through the handle_sqs code.

The defines were both to avoid hardcoding values as well as to avoid having to compute sizeof/strlen for those three values every time we go through the code.

For the case where we match the last string, the numbers for computing the size everytime vs having it already computed is:
use macro duration =979650
precomputed sizes duration =957323

For the case where no strings match:
macro=62778
precomputed=56180


We could combine approaches to improve readability while maintaining the optimization.

The macro could be something similar to:


   #define COMMAND_IS(CMD) \
  (command_name_len == AWS_ ## CMD ## _COMMAND ## _LEN && (nr_streq(AWS_ ## CMD ## _COMMAND, command_name_string)))

then the code could look like:

  /* Determine if we instrument this command. */
  if (COMMAND_IS(SQS_SEND_MESSAGE_BATCH)) {
    message_params.message_action = NR_SPANKIND_PRODUCER;
  } else if (COMMAND_IS(SQS_SEND_MESSAGE)) {
    message_params.message_action = NR_SPANKIND_PRODUCER;
  } else if (COMMAND_IS(SQS_RECEIVE_MESSAGE)) {
    message_params.message_action = NR_SPANKIND_CONSUMER;
  } else {
    /* Nothing to do here so exit. */
    return;
  }
#undef IS_COMMAND

The above improves readability but as seen below this still keeps the performance improvements
macro comparison when len/string matches:
macro that computes lengths each time 924522
macro that uses precomputed lengths 866615

macro comparison when len/string matches:
macro that computes lengths each time 68946
macro that uses precomputed lengths 65715

Comment on lines +112 to +114
== nr_strncmp(AWS_SQS_SEND_MESSAGE_BATCH_COMMAND,
command_name_string,
AWS_SQS_SEND_MESSAGE_BATCH_COMMAND_LEN)) {
Copy link
Member

@lavarou lavarou Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of using nr_strncmp over nr_strncmp or nr_streq? nr_streq makes the code most readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strncmp was needed since the defined strings had an extra char on the end that we could account for if we used the precomputed length instead of the length computed in nr_streq.
This could be made more readable with a macro.

Comment on lines +370 to +415
if (NULL != klass) {
/* Get the arg command_name. */
command_name = nr_php_arg_get(1, NR_EXECUTE_ORIG_ARGS);

if (nr_php_is_zval_non_empty_string(command_name)) {
command_name_string = Z_STRVAL_P(command_name);
klass_len = nr_php_class_entry_name_length(class_entry);

if (klass_len == AWS_SDK_PHP_SQSCLIENT_CLASS_LEN
&& nr_striendswith(klass, klass_len,
AWS_SDK_PHP_SQSCLIENT_CLASS_SHORT,
AWS_SDK_PHP_SQSCLIENT_CLASS_SHORT_LEN)) {
nr_lib_aws_sdk_php_sqs_handle(auto_segment, command_name_string,
Z_STRLEN_P(command_name),
NR_EXECUTE_ORIG_ARGS);
}

if (NR_SEGMENT_CUSTOM == auto_segment->type) {
/*
* We need to end the segment that we started in the 'before' wrapper if
* it wasn't handled and ended by the handling function. Handling
* function would have changed the segment type from from default
* (`NR_SEGMENT_CUSTOM`) if it ended it.
*/
nr_segment_discard(&auto_segment);
}

/*
* Since we have klass and command_name, we can give the calling segment
* a more meaningful name than Aws/AwsClient::__call We can decode it to
* Aws/CALLING_CLASS_NAME/CALLING_CLASS_CLIENT::CALLING_CLASS_COMMAND
*
* EX: Aws\\Sqs\\SqsClient::sendMessage
*/

segment = nr_txn_get_current_segment(NRPRG(txn), NULL);
if (NULL != segment) {
real_class_and_command
= nr_formatf("Custom/%s::%s", klass, command_name_string);
nr_segment_set_name(segment, real_class_and_command);
nr_free(real_class_and_command);
}
}
/* Release the command_name. */
nr_php_arg_release(&command_name);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newrelic-php-agent's code base doesn't seem to have a standard around structuring the code. Golang has the following standard defined in Effective Go:

[...] a common situation where code must guard against a sequence of error conditions. The code reads well if the successful flow of control runs down the page, eliminating error cases as they arise. Since error cases tend to end in return statements, the resulting code needs no else statements.

The readability of the above could could definitely benefit from applying Effective Go standards. Consider:

  if (NULL == klass) {
    return;
  }
  ...
  if (!nr_php_is_zval_non_empty_string(command_name)) {
    nr_php_arg_release(&command_name);
    return;
  }
  

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion is something to consider but is not blocking approval.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nr_lib_aws_sdk_php_sqs_parse_queueurl follows the Effective Go standards and is easy to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but it's not a simple return that can occur since it needs a return value.
I can however, make a goto end so that we'll skip to the bottom then all the logic here can execute for the actual return value.

Add a test case for get_command_arg_value  when queueUrl exists but is not a string.
Also required passing the expected values as NULLs.
*/
nr_free(cloud_attrs.cloud_region);
nr_free(message_params.destination_name);
nr_free(cloud_attrs.cloud_account_id);
}

Copy link
Contributor

@mfulb mfulb Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nr_lib_aws_sdk_php_sqs_parse_queueurl should set the relevant fields in message_params and cloud_attrs to NULL at the start as the function can exit w/o actually storing these values. This could lead to issues when the caller uses these uninitialized values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values are already being initialized here.
All cloud_attrs start as NULL, all message_params not explicitly set are set to NULL during that initialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is a question of expectations - my initial expectation was that this function to NULL those fields out if the queryurl was unparsable to reflect no data was found to fill them in properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those 3 fields are already set to NULL by the caller who NULLs all the unknown fields prior to finding values for them. Caller is responsible for initializing the structs, not parseQ.
parseQ should only modify the fields if it has something meaninful to add; otherwise, it should leave them untouched. At some point in future, we might give any of these values "unknown" or something as default then pass it to parseQ, we wouldn't want parseQ to ovewrite the default with NULL. parseQ is only reponsible setting the values if they are known; otherwise, caller is responsible for setting struct values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants