Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systematic testing for the runtime #4140

Merged
merged 15 commits into from
Jun 28, 2022
Merged

Conversation

dipinhora
Copy link
Contributor

@dipinhora dipinhora commented Jun 14, 2022

Pony is a concurrent and parallel language. Different actors can be run
at the same time on multiple CPUs. The Pony runtime coordinates all of
this interleaving of actors and contains a fair amount of complexity.
Runtime functionality such as the message queues and the backpressure
system rely on atomic operations which can be tricky to get right across
multiple platforms.

Systematic testing allows for running of Pony programs in a deterministic
manner. It accomplishes this by coordinating the interleaving of the
multiple runtime scheduler threads in a deterministic and reproducible
manner instead of allowing them all to run in parallel like happens
normally. This ability to reproduce a particular runtime behavior is
invaluable for debugging runtime issues.

The overall idea and some details of the implementation for systematic
testing has been shamelessly stolen from the Verona runtime.
This implementation doesn't include replayable runtime unit tests like
Verona, but it sets a foundation for allowing replayable runs of
programs (and probably tests) for debugging runtime issues such as
backpressure/etc. Additionally, while all development and testing was
done on Linux, in theory this systematic testing functionality should
work on other operating systems (Windows, MacOS, Freebsd, etc)
barring issues related to lack of atomics for tracking the active thread
and whether a thread has stopped executing or not (unlikely to be an
issue on MacOS/Freebsd/other pthread based threading
implementations).

An example use case could be if someone has a test that has an
intermittent failure (that is somehow related to timing of how actors are
scheduled and run) they could recompile the test with systematic testing
enabled and then run the test until it fails and then continually reproduce
the failure by re-using the same seed via the
--ponysystematictestingseed <SEED_THAT_CAUSED_FAILURE> cli
argument. Then once the intermittent failure can be reliably reproduced,
it should make it significantly easier to track down the root cause and fix
the bug.

NOTE: While systematic testing could be useful to users of ponyc (like in
the example scenario), we expect it to get more use from developers of
Pony as they enhance the runtime (i.e. changes to backpressure, changes
to the message queue, changes to the objectmap, changes to GC,
changes to the cycle detector, etc).

The overall idea and some details of the implementation for systematic
testing has been shamelessly stolen from the `Verona` runtime (see:
https://github.com/microsoft/verona/blob/master/docs/explore.md#systematic-testing
for details). This implementation doesn't include replayable runtime
unit tests like `Verona`, but it sets a foundation for allowing
replayable runs of programs (and probably tests) for debugging
runtime issues such as backpressure/etc.
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Jun 14, 2022
@SeanTAllen SeanTAllen added the changelog - added Automatically add "Added" CHANGELOG entry on merge label Jun 14, 2022
@ponylang-main
Copy link
Contributor

Hi @dipinhora,

The changelog - added label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do.

Release notes are added by creating a uniquely named file in the .release-notes directory. We suggest you call the file 4140.md to match the number of this pull request.

The basic format of the release notes (using markdown) should be:

## Title

End user description of changes, why it's important,
problems it solves etc.

If a breaking change, make sure to include 1 or more
examples what code would look like prior to this change
and how to update it to work after this change.

Thanks.

@jemc
Copy link
Member

jemc commented Jun 14, 2022

Discussed briefly during sync, but I haven't had the chance to review in detail yet.

It's exciting stuff.

@SeanTAllen
Copy link
Member

@dipinhora please let us know when you think this is ready for review.

@dipinhora
Copy link
Contributor Author

@SeanTAllen sorry, this in a good place in terms of setting the foundation for systematic testing and ready for review. i currently don't have any other changes pending although there can/should probably be follow up PRs to add calls to SYSTEMATIC_TESTING_YIELD in many more places and other enhancements (some of which are in comments as TODO:)..


#if defined(USE_SYSTEMATIC_TESTING)
#if !defined(PLATFORM_IS_WINDOWS) && !defined(USE_SCHEDULER_SCALING_PTHREADS)
pony_static_assert(false, "Systematic testing requires pthreads (USE_SCHEDULER_SCALING_PTHREADS) to be enabled!");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Shouldn't it be possible to write this in such a way that it works both with and without scaling enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Because it was simpler to not worry about signals and rely only on pthreads to start with. In theory, it should be possible to extend this to using signals instead of pthreads when USE_SCHEDULER_SCALING_PTHREADS is not defined.

Shouldn't it be possible to write this in such a way that it works both with and without scaling enabled?

i don't understand this question. The current implementation already works whether or not scaling is disabled or not via the (--ponynoscale cli option). Can you clarify what you meant by this question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time Joe asked the question, he had forgotten that you can turn on scaling and then one of two options for how the scaling is done. When we reviewed during sync today I reminded him of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the info.

@SeanTAllen
Copy link
Member

So at the point this, what level of usability is this at? What could we do with it? I feel like we need some documentation around that somewhere.

// create wait event objects
running_base.sleep_object = CreateEvent(NULL, FALSE, FALSE, NULL);
#elif defined(USE_SCHEDULER_SCALING_PTHREADS)
// TODO: memtrack accounting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see any TODO's be able to be easily found like...

TODO systematic testing:

Or some unique value and then opening an issue with a checklist of things to add/do that notes what those things are and notes the special todo value to look for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced TODO: with TODO systematic testing:. Will open an issue with the checklist/string to search for once this PR is merged.

src/libponyrt/sched/systematic_testing.c Outdated Show resolved Hide resolved
src/libponyrt/asio/asio.c Outdated Show resolved Hide resolved
@dipinhora
Copy link
Contributor Author

So at the point this, what level of usability is this at?

It functions and allows one to reproduce a deterministic run of a program compiled with the runtime.

What could we do with it?

As an example, if someone has a test that has an intermittent failure (that is somehow related to timing of how actors are scheduled and run) they could recompile the test with systematic testing enabled and then run the test until it fails and then continually reproduce the failure by re-using the same seed via the --ponysystematictestingseed <SEED_THAT_CAUSED_FAILURE> cli argument. Then once the intermittent failure can be reliably reproduced, it should make it significantly easier to track down the root cause and fix the bug.

I feel like we need some documentation around that somewhere.

More than happy to add it assuming you have some suggestions as to what and where.

@SeanTAllen
Copy link
Member

Would you say that right now the systematic testing is geared to the developers of Ponyc or users of Ponyc? That would influence where documentation should go.

@dipinhora
Copy link
Contributor Author

It could be useful to users of ponyc (like in the example scenario in my last comment) but i would expect it to get more use from developers of Pony as they enhance the runtime (i.e. changes to backpressure, changes to the message queue, changes to the objectmap, changes to GC, changes to the cycle detector, etc).

@SeanTAllen
Copy link
Member

@dipinhora i'm not sure where docs should go or what they should be. im open to ideas but i think without docs that people can refer to, it will be underused. Perhaps the contributors repo? Perhaps a new top-level document in this repo that gets referenced from BUILD.md. In this repo feels best as it can easily change as the functionality changes.

@dipinhora
Copy link
Contributor Author

@SeanTAllen i am all for adding in documentation for folks to make it easier to use the feature and i will think on what that could look like. However, i would prefer to get the changes merged even without comprehensive documentation to avoid a bitrot type scenario with the changes in this PR. Do you have any suggestions for what would be considered an adequate amount of documentation before this PR can be merged (assuming all other review concerns are also addressed)?

@SeanTAllen
Copy link
Member

I think information in BUILD on how to build and a new doc that details the basic of usage.

@dipinhora
Copy link
Contributor Author

i've updated BUILD.md and added a new SYSTEMATIC_TESTING.md. The new document isn't comprehensive but it sets a foundation that can be enhanced and built upon similar to how the rest of this PR sets a foundation for systematic testing that can be enhanced and built upon.

@SeanTAllen
Copy link
Member

@jemc based on Dipin's reply, is there anything you want to see done with this before we merge this initial version?

@SeanTAllen
Copy link
Member

@dipinhora seems like the final commit message should be updated? you have a lot of nice content in the release notes etc that suggest we could have a more meaningful commit message. If you agree, please update the first comment and I'll use that. If no, let me know.

Either way, ping me and I'll squash and merge this whichever way you decide to go with the final commit message.

@dipinhora
Copy link
Contributor Author

@SeanTAllen first comment updated

@SeanTAllen SeanTAllen merged commit bbe32c4 into ponylang:main Jun 28, 2022
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Jun 28, 2022
github-actions bot pushed a commit that referenced this pull request Jun 28, 2022
github-actions bot pushed a commit that referenced this pull request Jun 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - added Automatically add "Added" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants