Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose time spent collecting cycles in gc_status() #11523

Merged
merged 14 commits into from
Jul 16, 2023

Conversation

arnaud-lb
Copy link
Member

@arnaud-lb arnaud-lb commented Jun 24, 2023

See #11327

This adds the following new entries to the gc_status() result:

  • application_time: Total application time (including collector_time)
  • collector_time: Time spend collecting cycles (including destructor_time and free_time)
  • destructor_time: Time spend executing destructors during cycle collection
  • free_time: Time spend freeing values during cycle collection

Times have nanoseconds resolution and use a monotonic clock. They are exposed as a number of seconds in floats, because the int type may be too small for nanoseconds depending on the platform.

application_time could be computed with hrtime(true) - $_SERVER['REQUEST_TIME_FLOAT'], but it seems useful to still expose it because $_SERVER['REQUEST_TIME_FLOAT'] is not monotonic.

@sebastianbergmann
Copy link
Contributor

I cannot comment on the implementation, but this looks like what I asked for in #11327. Thank you for working on this!

@@ -148,6 +148,12 @@ ZEND_FUNCTION(gc_status)
add_assoc_long_ex(return_value, "threshold", sizeof("threshold")-1, (long)status.threshold);
add_assoc_long_ex(return_value, "buffer_size", sizeof("buffer_size")-1, (long)status.buf_size);
add_assoc_long_ex(return_value, "roots", sizeof("roots")-1, (long)status.num_roots);

/* Using double because zend_long may be too small on some platforms */
add_assoc_double_ex(return_value, "application_time", sizeof("application_time")-1, (double) status.application_time / ZEND_NANO_IN_SEC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the init size above should be updated

also are/can these strings be interned for better perf?

what is the time measurement and gc_status() itself overhead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the initial array size, thank you.

Regarding the time measurement overhead:

The synthetic benchmarks here and here show no significant overhead over multiple runs. The number of 10k nodes was chosen because this is the default GC threshold. Applications will typically have more nodes to scan, and their measurement_overhead/actual_work ratio will be lower.

I've seen no significant overhead on a long running application.

Regarding the overhead on gc_status() itself, it would matter if the function was called a high number of times during a request. Do you think it will be the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the overhead on gc_status() itself, it would matter if the function was called a high number of times during a request. Do you think it will be the case?

In PHPUnit 10, gc_status() is called for each event that is emitted. This is "a lot", but IMO perfectly fine in a development context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no slowdown on the PHPUnit 10 test suite. However, gc_status() was called only 128 times. Is there a configuration that would cause more events to be triggered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at the wrong file: gc_status() is actually called more than 19k times during the test suite and the changes in this PR caused no slowdown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PHPUnit's test suite (main branch) currently has 2845 tests. For each test, multiple events are emitted. For each event, gc_status() is called. I am confused that you only see 128 calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! :)

Zend/zend_time.h Outdated

#define ZEND_NANO_IN_SEC 1000000000

typedef uint64_t zend_time_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer this name to keep hr, since this clearly helps to identify the underlying abstraction used.

So zend_hrtime_t maybe, including names of the files etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that zend_time is too general. I've renamed to zend_hrtime. Thank you!

@@ -1,56 +0,0 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to move this? It should be fine to include an ext/standard header, as standard is a required extension.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not absolutely necessary, but it felt right to not have Zend/ depend on ext/ (for no particular reason other than Zend/ being a lower layer than ext/)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving code has a few disadvantages:

  • It makes reviewing harder, because it's difficult to determine what code is new and what code was just moved. Sometimes small changes can sneak into moved code too, which is very hard to catch without using a diff tool.
  • It creates merge conflicts when merging changes to the moved code. This is more of a problem for us than a "standard" code base because we have 3-4 active branches.
  • It breaks git blame.

Of course, not refactoring also has downsides, e.g. leading to big, messy files, or making things hard to find.

I don't mind either way, you can decide what you see fit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your points. In this case I will retain the relocation of the code because, since I've also moved most of it from the .c to the .h for inlining purposes, keeping it in ext/ would not cancel all of these disadvantages.

@@ -439,6 +445,9 @@ static void gc_globals_ctor_ex(zend_gc_globals *gc_globals)

gc_globals->gc_runs = 0;
gc_globals->collected = 0;
gc_globals->collector_time = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is setting activated_at not desired here?

Zend/zend_gc.h Outdated
@@ -24,6 +24,8 @@
# define GC_BENCH 0
#endif

#include "zend_hrtime.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We usually put includes first.

@sebastianbergmann sebastianbergmann linked an issue Jun 27, 2023 that may be closed by this pull request
UPGRADING Outdated Show resolved Hide resolved
UPGRADING Outdated Show resolved Hide resolved
UPGRADING Outdated Show resolved Hide resolved
sebastianbergmann added a commit to sebastianbergmann/phpunit that referenced this pull request Jun 28, 2023
@arnaud-lb arnaud-lb marked this pull request as ready for review June 28, 2023 09:35
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see big problems.
The overhead of the patch is minimal and the collected information may be useful.

sebastianbergmann added a commit to sebastianbergmann/phpunit that referenced this pull request Jul 10, 2023

#endif

zend_result zend_startup_hrtime(void)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value is unused and isn't marked as ZEND_API. Any reason it shouldn't be void zend_startup_hrtime(void)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was existing code I moved from ext/standard/hrtime.c, and I tried to limit the amount of changes. But you are right, the return value is unnecessary, so I've removed it.


#endif

#define ZEND_NANO_IN_SEC 1000000000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would avoid the need to cast this value:

Suggested change
#define ZEND_NANO_IN_SEC 1000000000
#define ZEND_NANO_IN_SEC UINT64_C(1000000000)

return (zend_hrtime_t)mach_absolute_time() * zend_hrtime_timerlib_info.numer / zend_hrtime_timerlib_info.denom;
#elif ZEND_HRTIME_PLATFORM_POSIX
struct timespec ts = { .tv_sec = 0, .tv_nsec = 0 };
if (0 == clock_gettime(CLOCK_MONOTONIC, &ts)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect branch predictors to handle this well, but I don't think this could hurt in this case:

Suggested change
if (0 == clock_gettime(CLOCK_MONOTONIC, &ts)) {
if (EXPECTED(0 == clock_gettime(CLOCK_MONOTONIC, &ts))) {

@arnaud-lb arnaud-lb merged commit d073193 into php:master Jul 16, 2023
12 checks passed
arnaud-lb added a commit to arnaud-lb/php-src that referenced this pull request Jul 16, 2023
* up/master: (571 commits)
  Expose time spent collecting cycles in gc_status() (php#11523)
  Warn when fpm socket was not registered on the expected path
  Implement DOMElement::id
  Fix ?
  Implement DOMParentNode::replaceChildren()
  Implement DOMElement::className
  RFC: Deprecate remains of string evaluated code assertions (php#11671)
  Prevent decimal int precision loss in number_format()
  Implement DOMNode::getRootNode()
  Implement DOMElement::getAttributeNames()
  Refactor dom_node_node_name_read() to avoid double allocation
  Handle fragments consisting out of multiple children without a single root correctly
  Avoid allocations in DOMElement::getAttribute()
  Avoid string allocation in dom_get_dom1_attribute() for as long as possible
  Fix use-of-uninitialized-value when calling php_posix_stream_get_fd (php#11694)
  Reorder list construction in the function php_intpow10 (php#11683)
  proc_open: Use posix_spawn(3) interface on systems where it is profitable
  zend_gdb disable gdb detection for FreeBSD < 11.
  Fix iface const visibility variance check
  Fix missing iface class const inheritance type check
  ...
sebastianbergmann added a commit to sebastianbergmann/phpunit that referenced this pull request Jul 16, 2023
sebastianbergmann added a commit to sebastianbergmann/phpunit that referenced this pull request Jul 18, 2023
sebastianbergmann added a commit to sebastianbergmann/phpunit that referenced this pull request Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose time spent collecting cycles in gc_status()
7 participants