Systematic testing for the runtime #4140

dipinhora · 2022-06-14T03:11:59Z

Pony is a concurrent and parallel language. Different actors can run
at the same time on multiple CPUs. The Pony runtime coordinates all of
this interleaving of actors and contains a fair amount of complexity.
Runtime functionality such as the message queues and the backpressure
system relies on atomic operations, which can be tricky to get right
across multiple platforms.

Systematic testing allows Pony programs to be run in a deterministic
manner. It accomplishes this by coordinating the interleaving of the
multiple runtime scheduler threads in a deterministic and reproducible
way, instead of letting them all run in parallel as they normally
would. This ability to reproduce a particular runtime behavior is
invaluable for debugging runtime issues.
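To make the idea concrete, here is a minimal C sketch (not the ponyc implementation) of seeded, deterministic interleaving with pthreads. Only one worker thread is ever active, and at each yield point a PRNG seeded from a single value picks which runnable thread goes next. Everything here is hypothetical: the xorshift64 PRNG, `run_with_seed`, and this particular definition of `SYSTEMATIC_TESTING_YIELD`. The real runtime tracks the active thread differently (the description mentions atomics, while this sketch uses a mutex and condition variable), and a real build would presumably compile the yield away when systematic testing is disabled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_THREADS 3
#define STEPS_PER_THREAD 4
#define MAX_TRACE (NUM_THREADS * STEPS_PER_THREAD)

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_changed = PTHREAD_COND_INITIALIZER;
static int active = -1;          /* the only thread allowed to run; -1 = none */
static bool done[NUM_THREADS];   /* threads that have finished all their steps */
static uint64_t rng_state;       /* seeded PRNG that drives every scheduling choice */
static int trace[MAX_TRACE];     /* order in which threads did their "work" */
static int trace_len;

/* xorshift64: the same nonzero seed always yields the same sequence. */
static uint64_t next_random(void) {
  rng_state ^= rng_state << 13;
  rng_state ^= rng_state >> 7;
  rng_state ^= rng_state << 17;
  return rng_state;
}

/* Caller holds `lock`: pick the next runnable thread and wake all waiters. */
static void choose_next(void) {
  int runnable[NUM_THREADS];
  int n = 0;
  for (int i = 0; i < NUM_THREADS; i++)
    if (!done[i]) runnable[n++] = i;
  active = (n == 0) ? -1 : runnable[next_random() % (uint64_t)n];
  pthread_cond_broadcast(&turn_changed);
}

/* Caller holds `lock`: sleep until the scheduler picks thread `id`. */
static void wait_for_turn(int id) {
  while (active != id)
    pthread_cond_wait(&turn_changed, &lock);
}

/* Hypothetical yield point: hand control to a seeded-random runnable thread
   (possibly the caller itself) and block until it is our turn again. */
static void systematic_testing_yield(int id) {
  pthread_mutex_lock(&lock);
  choose_next();
  wait_for_turn(id);
  pthread_mutex_unlock(&lock);
}
#define SYSTEMATIC_TESTING_YIELD(id) systematic_testing_yield(id)

static void* worker(void* arg) {
  int id = (int)(intptr_t)arg;
  pthread_mutex_lock(&lock);
  wait_for_turn(id);             /* nobody runs until the first turn is handed out */
  pthread_mutex_unlock(&lock);
  for (int step = 0; step < STEPS_PER_THREAD; step++) {
    assert(active == id && trace_len < MAX_TRACE);
    trace[trace_len++] = id;     /* safe: only the active thread touches the trace */
    SYSTEMATIC_TESTING_YIELD(id);
  }
  pthread_mutex_lock(&lock);
  done[id] = true;
  choose_next();                 /* pass control on before exiting */
  pthread_mutex_unlock(&lock);
  return NULL;
}

/* Run all workers under `seed`; copy the execution order into `out`. */
static int run_with_seed(uint64_t seed, int out[MAX_TRACE]) {
  pthread_t threads[NUM_THREADS];
  rng_state = seed ? seed : 1;   /* xorshift must not start from 0 */
  active = -1;
  trace_len = 0;
  for (int i = 0; i < NUM_THREADS; i++) done[i] = false;
  for (int i = 0; i < NUM_THREADS; i++)
    pthread_create(&threads[i], NULL, worker, (void*)(intptr_t)i);
  pthread_mutex_lock(&lock);
  choose_next();                 /* first turn only after all threads exist */
  pthread_mutex_unlock(&lock);
  for (int i = 0; i < NUM_THREADS; i++)
    pthread_join(threads[i], NULL);
  for (int i = 0; i < trace_len; i++) out[i] = trace[i];
  return trace_len;
}
```

Because every scheduling decision is drawn from the same seeded PRNG in the same serialized order, calling `run_with_seed(42, ...)` twice produces identical traces, while a different seed will usually produce a different interleaving. The cost is the trade-off described above: the threads are fully serialized, giving up real parallelism in exchange for reproducibility.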

The overall idea and some details of the implementation for systematic
testing have been shamelessly stolen from the Verona runtime (see:
https://github.com/microsoft/verona/blob/master/docs/explore.md#systematic-testing
for details). This implementation doesn't include replayable runtime unit
tests like Verona, but it sets a foundation for allowing replayable runs of
programs (and probably tests) for debugging runtime issues such as
backpressure. Additionally, while all development and testing was done on
Linux, in theory this systematic testing functionality should work on
other operating systems (Windows, macOS, FreeBSD, etc.), barring issues
related to the lack of atomics for tracking the active thread and whether
a thread has stopped executing (unlikely to be an issue on macOS, FreeBSD,
or other pthread-based threading implementations).

As an example use case, suppose someone has a test with an intermittent
failure that is somehow related to the timing of how actors are scheduled
and run. They could recompile the test with systematic testing enabled,
run it until it fails, and then reproduce the failure as often as needed
by re-using the same seed via the
--ponysystematictestingseed <SEED_THAT_CAUSED_FAILURE> CLI argument.
Once the intermittent failure can be reliably reproduced, it should be
significantly easier to track down the root cause and fix the bug.
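The search-then-replay workflow can be illustrated with a small, single-threaded C simulation. It is hypothetical: it does not use the ponyc runtime or the real --ponysystematictestingseed option, and `flaky_test` and `find_failing_seed` are illustrative names. The seed fully determines how two logical threads interleave a non-atomic increment, so a seed that exposes the lost update once will expose it on every replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* xorshift64 PRNG: the whole interleaving is a pure function of the seed. */
static uint64_t xorshift64(uint64_t* s) {
  *s ^= *s << 13;
  *s ^= *s >> 7;
  *s ^= *s << 17;
  return *s;
}

/* Hypothetical flaky test: two logical threads each do a non-atomic
   increment (load counter into a register, then store register + 1).
   The seed decides which thread takes its next step. Returns true on pass. */
static bool flaky_test(uint64_t seed) {
  uint64_t s = seed ? seed : 1;      /* xorshift must not start from 0 */
  int counter = 0;
  int reg[2] = {0, 0};
  int pc[2] = {0, 0};                /* 0 = about to load, 1 = about to store, 2 = done */
  while (pc[0] < 2 || pc[1] < 2) {
    int t;
    if (pc[0] == 2) t = 1;           /* only thread 1 can still run */
    else if (pc[1] == 2) t = 0;      /* only thread 0 can still run */
    else t = (int)(xorshift64(&s) & 1);
    if (pc[t] == 0) reg[t] = counter;    /* load */
    else counter = reg[t] + 1;           /* store */
    pc[t]++;
  }
  return counter == 2;               /* fails when one update is lost */
}

/* Try seeds in order until one fails; return it, or 0 if none failed. */
static uint64_t find_failing_seed(uint64_t max_tries) {
  for (uint64_t seed = 1; seed <= max_tries; seed++)
    if (!flaky_test(seed)) return seed;
  return 0;
}
```

With this particular PRNG, seed 1 happens to pass, while seed 2 schedules both loads before either store, so `find_failing_seed(1000)` returns 2 and `flaky_test(2)` fails every time it is re-run. In the real workflow, the seed that caused the failure would instead be passed back to the recompiled test program via --ponysystematictestingseed.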

NOTE: While systematic testing could be useful to users of ponyc (as in
the example scenario), we expect it to get more use from developers of
Pony as they enhance the runtime (e.g., changes to backpressure, the
message queue, the objectmap, GC, the cycle detector, etc.).


ponylang-main · 2022-06-14T03:14:21Z

Hi @dipinhora,

The "changelog - added" label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do.

Release notes are added by creating a uniquely named file in the .release-notes directory. We suggest you call the file 4140.md to match the number of this pull request.

The basic format of the release notes (using markdown) should be:

## Title

End user description of changes, why it's important,
problems it solves etc.

If this is a breaking change, make sure to include one or more
examples of what code would look like prior to this change
and how to update it to work after this change.

Thanks.

jemc · 2022-06-14T18:23:46Z

Discussed briefly during sync, but I haven't had the chance to review in detail yet.

It's exciting stuff.

SeanTAllen · 2022-06-17T22:37:16Z

@dipinhora please let us know when you think this is ready for review.

dipinhora · 2022-06-18T00:07:26Z

@SeanTAllen sorry, this is in a good place in terms of setting the foundation for systematic testing and is ready for review. i currently don't have any other changes pending, although there can/should probably be follow-up PRs to add calls to SYSTEMATIC_TESTING_YIELD in many more places and other enhancements (some of which are noted in comments as TODO:).

jemc · 2022-06-21T17:48:12Z

src/libponyrt/sched/systematic_testing.h

+
+#if defined(USE_SYSTEMATIC_TESTING)
+#if !defined(PLATFORM_IS_WINDOWS) && !defined(USE_SCHEDULER_SCALING_PTHREADS)
+pony_static_assert(false, "Systematic testing requires pthreads (USE_SCHEDULER_SCALING_PTHREADS) to be enabled!");


Why? Shouldn't it be possible to write this in such a way that it works both with and without scaling enabled?

Why?

Because it was simpler to not worry about signals and rely only on pthreads to start with. In theory, it should be possible to extend this to using signals instead of pthreads when USE_SCHEDULER_SCALING_PTHREADS is not defined.

Shouldn't it be possible to write this in such a way that it works both with and without scaling enabled?

i don't understand this question. The current implementation already works whether or not scaling is disabled (via the --ponynoscale cli option). Can you clarify what you meant by this question?

At the time Joe asked the question, he had forgotten that you can turn on scaling and then choose one of two options for how the scaling is done. When we reviewed during sync today, I reminded him of that.

Got it. Thanks for the info.

SeanTAllen · 2022-06-21T18:32:24Z

So at this point, what level of usability is this at? What could we do with it? I feel like we need some documentation around that somewhere.

SeanTAllen · 2022-06-21T18:34:08Z

src/libponyrt/asio/asio.c

+    // create wait event objects
+    running_base.sleep_object = CreateEvent(NULL, FALSE, FALSE, NULL);
+#elif defined(USE_SCHEDULER_SCALING_PTHREADS)
+    // TODO: memtrack accounting


I'd like any TODOs to be easy to find, for example by marking them like...

TODO systematic testing:

Or with some other unique value, and then opening an issue with a checklist of things to add/do that notes what those things are and the special TODO value to look for.

Replaced TODO: with TODO systematic testing:. Will open an issue with the checklist/string to search for once this PR is merged.

src/libponyrt/sched/systematic_testing.c

src/libponyrt/asio/asio.c

dipinhora · 2022-06-21T22:21:11Z

So at this point, what level of usability is this at?

It functions and allows one to reproduce a deterministic run of a program compiled with the runtime.

What could we do with it?

As an example, suppose someone has a test with an intermittent failure that is somehow related to the timing of how actors are scheduled and run. They could recompile the test with systematic testing enabled, run it until it fails, and then reproduce the failure as often as needed by re-using the same seed via the --ponysystematictestingseed <SEED_THAT_CAUSED_FAILURE> cli argument. Once the intermittent failure can be reliably reproduced, it should be significantly easier to track down the root cause and fix the bug.

I feel like we need some documentation around that somewhere.

More than happy to add it assuming you have some suggestions as to what and where.

SeanTAllen · 2022-06-21T22:23:52Z

Would you say that right now the systematic testing is geared to the developers of Ponyc or users of Ponyc? That would influence where documentation should go.

dipinhora · 2022-06-21T22:28:39Z

It could be useful to users of ponyc (like in the example scenario in my last comment), but i would expect it to get more use from developers of Pony as they enhance the runtime (e.g., changes to backpressure, the message queue, the objectmap, GC, the cycle detector, etc.).

SeanTAllen · 2022-06-21T22:31:48Z

@dipinhora i'm not sure where docs should go or what they should be. I'm open to ideas, but i think without docs that people can refer to, it will be underused. Perhaps the contributors repo? Perhaps a new top-level document in this repo that gets referenced from BUILD.md. This repo feels best, as the docs can easily change as the functionality changes.

dipinhora · 2022-06-21T22:59:44Z

@SeanTAllen i am all for adding documentation to make it easier for folks to use the feature, and i will think about what that could look like. However, i would prefer to get the changes merged even without comprehensive documentation, to avoid a bitrot-type scenario with the changes in this PR. Do you have any suggestions for what would be considered an adequate amount of documentation before this PR can be merged (assuming all other review concerns are also addressed)?

SeanTAllen · 2022-06-21T23:27:26Z

I think information in BUILD.md on how to build it, and a new doc that details the basics of usage.

dipinhora · 2022-06-22T00:05:50Z

i've updated BUILD.md and added a new SYSTEMATIC_TESTING.md. The new document isn't comprehensive but it sets a foundation that can be enhanced and built upon similar to how the rest of this PR sets a foundation for systematic testing that can be enhanced and built upon.

SeanTAllen · 2022-06-26T00:14:34Z

@jemc based on Dipin's reply, is there anything you want to see done with this before we merge this initial version?

SeanTAllen · 2022-06-27T21:38:43Z

@dipinhora seems like the final commit message should be updated? you have a lot of nice content in the release notes etc that suggest we could have a more meaningful commit message. If you agree, please update the first comment and I'll use that. If no, let me know.

Either way, ping me and I'll squash and merge this whichever way you decide to go with the final commit message.

dipinhora · 2022-06-27T22:49:08Z

@SeanTAllen first comment updated

ponylang-main added the "discuss during sync" label (Should be discussed during an upcoming sync) Jun 14, 2022

SeanTAllen added the "changelog - added" label (Automatically add "Added" CHANGELOG entry on merge) Jun 14, 2022

dipinhora added 3 commits June 13, 2022 23:30

Fix mac/bsd failure

826cb2c

Fix windows failure

69b0e3e

Fix linux failure

63e34a4

dipinhora added 3 commits June 14, 2022 17:30

Fix windows check for thread inequality

e9a77cc

Ensure all threads are ready before starting execution

7324462

Add release notes

593b6b4

jemc reviewed Jun 21, 2022

SeanTAllen reviewed Jun 21, 2022

jemc reviewed Jun 21, 2022

dipinhora added 5 commits June 21, 2022 19:34

Track memory allocs/frees for systematic testing

4fb8e92

Replace 'TODO:' with 'TODO systematic testing:'

7435265

Add some basic documentation for systematic testing

a3f8fbb

Fix compile issue for macs

705980c

Fix markdown lint issue

b598b95

Add newline to SYSTEMATIC_TESTING.md

eeb7265

dipinhora added 2 commits June 21, 2022 20:43

Remove trailing space from SYSTEMATIC_TESTING.md

ddd018f

Set code block language in SYSTEMATIC_TESTING.md

545aa8b

jemc approved these changes Jun 26, 2022

SeanTAllen merged commit bbe32c4 into ponylang:main Jun 28, 2022

ponylang-main removed the "discuss during sync" label (Should be discussed during an upcoming sync) Jun 28, 2022

github-actions bot pushed a commit that referenced this pull request Jun 28, 2022

Update CHANGELOG for PR #4140

6db7d50

github-actions bot pushed a commit that referenced this pull request Jun 28, 2022

Updates release notes for PR #4140

76a20d9