-
-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Systematic testing for the runtime #4140
Conversation
The overall idea and some details of the implementation for systematic testing has been shamelessly stolen from the `Verona` runtime (see: https://github.com/microsoft/verona/blob/master/docs/explore.md#systematic-testing for details). This implementation doesn't include replayable runtime unit tests like `Verona`, but it sets a foundation for allowing replayable runs of programs (and probably tests) for debugging runtime issues such as backpressure/etc.
Hi @dipinhora, The changelog - added label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do. Release notes are added by creating a uniquely named file in the The basic format of the release notes (using markdown) should be:
Thanks. |
Discussed briefly during sync, but I haven't had the chance to review in detail yet. It's exciting stuff. |
@dipinhora please let us know when you think this is ready for review. |
@SeanTAllen sorry, this in a good place in terms of setting the foundation for systematic testing and ready for review. i currently don't have any other changes pending although there can/should probably be follow up PRs to add calls to |
|
||
#if defined(USE_SYSTEMATIC_TESTING) | ||
#if !defined(PLATFORM_IS_WINDOWS) && !defined(USE_SCHEDULER_SCALING_PTHREADS) | ||
pony_static_assert(false, "Systematic testing requires pthreads (USE_SCHEDULER_SCALING_PTHREADS) to be enabled!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Shouldn't it be possible to write this in such a way that it works both with and without scaling enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
Because it was simpler to not worry about signals
and rely only on pthreads
to start with. In theory, it should be possible to extend this to using signals
instead of pthreads
when USE_SCHEDULER_SCALING_PTHREADS
is not defined.
Shouldn't it be possible to write this in such a way that it works both with and without scaling enabled?
i don't understand this question. The current implementation already works whether or not scaling
is disabled or not via the (--ponynoscale
cli option). Can you clarify what you meant by this question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time Joe asked the question, he had forgotten that you can turn on scaling and then one of two options for how the scaling is done. When we reviewed during sync today I reminded him of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for the info.
So at the point this, what level of usability is this at? What could we do with it? I feel like we need some documentation around that somewhere. |
src/libponyrt/asio/asio.c
Outdated
// create wait event objects | ||
running_base.sleep_object = CreateEvent(NULL, FALSE, FALSE, NULL); | ||
#elif defined(USE_SCHEDULER_SCALING_PTHREADS) | ||
// TODO: memtrack accounting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see any TODO's be able to be easily found like...
TODO systematic testing:
Or some unique value and then opening an issue with a checklist of things to add/do that notes what those things are and notes the special todo value to look for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced TODO:
with TODO systematic testing:
. Will open an issue with the checklist/string to search for once this PR is merged.
It functions and allows one to reproduce a deterministic run of a program compiled with the runtime.
As an example, if someone has a test that has an intermittent failure (that is somehow related to timing of how actors are scheduled and run) they could recompile the test with systematic testing enabled and then run the test until it fails and then continually reproduce the failure by re-using the same
More than happy to add it assuming you have some suggestions as to what and where. |
Would you say that right now the systematic testing is geared to the developers of Ponyc or users of Ponyc? That would influence where documentation should go. |
It could be useful to users of |
@dipinhora i'm not sure where docs should go or what they should be. im open to ideas but i think without docs that people can refer to, it will be underused. Perhaps the contributors repo? Perhaps a new top-level document in this repo that gets referenced from BUILD.md. In this repo feels best as it can easily change as the functionality changes. |
@SeanTAllen i am all for adding in documentation for folks to make it easier to use the feature and i will think on what that could look like. However, i would prefer to get the changes merged even without comprehensive documentation to avoid a bitrot type scenario with the changes in this PR. Do you have any suggestions for what would be considered an adequate amount of documentation before this PR can be merged (assuming all other review concerns are also addressed)? |
I think information in BUILD on how to build and a new doc that details the basic of usage. |
i've updated |
@jemc based on Dipin's reply, is there anything you want to see done with this before we merge this initial version? |
@dipinhora seems like the final commit message should be updated? you have a lot of nice content in the release notes etc that suggest we could have a more meaningful commit message. If you agree, please update the first comment and I'll use that. If no, let me know. Either way, ping me and I'll squash and merge this whichever way you decide to go with the final commit message. |
@SeanTAllen first comment updated |
Pony is a concurrent and parallel language. Different actors can be run
at the same time on multiple CPUs. The Pony runtime coordinates all of
this interleaving of actors and contains a fair amount of complexity.
Runtime functionality such as the message queues and the backpressure
system rely on atomic operations which can be tricky to get right across
multiple platforms.
Systematic testing allows for running of Pony programs in a deterministic
manner. It accomplishes this by coordinating the interleaving of the
multiple runtime scheduler threads in a deterministic and reproducible
manner instead of allowing them all to run in parallel like happens
normally. This ability to reproduce a particular runtime behavior is
invaluable for debugging runtime issues.
The overall idea and some details of the implementation for systematic
testing has been shamelessly stolen from the Verona runtime.
This implementation doesn't include replayable runtime unit tests like
Verona, but it sets a foundation for allowing replayable runs of
programs (and probably tests) for debugging runtime issues such as
backpressure/etc. Additionally, while all development and testing was
done on Linux, in theory this systematic testing functionality should
work on other operating systems (Windows, MacOS, Freebsd, etc)
barring issues related to lack of atomics for tracking the active thread
and whether a thread has stopped executing or not (unlikely to be an
issue on MacOS/Freebsd/other
pthread
based threadingimplementations).
An example use case could be if someone has a test that has an
intermittent failure (that is somehow related to timing of how actors are
scheduled and run) they could recompile the test with systematic testing
enabled and then run the test until it fails and then continually reproduce
the failure by re-using the same seed via the
--ponysystematictestingseed <SEED_THAT_CAUSED_FAILURE>
cliargument. Then once the intermittent failure can be reliably reproduced,
it should make it significantly easier to track down the root cause and fix
the bug.
NOTE: While systematic testing could be useful to users of ponyc (like in
the example scenario), we expect it to get more use from developers of
Pony as they enhance the runtime (i.e. changes to backpressure, changes
to the message queue, changes to the objectmap, changes to GC,
changes to the cycle detector, etc).