Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iron-out edge cases for library use-case, adding extensive real-world test assertions #2309

Merged
merged 29 commits into from
Feb 28, 2024

Conversation

lucksus
Copy link
Contributor

@lucksus lucksus commented Jan 29, 2024

Additions

  1. result assertions for the lib_machine integration tests (which are a log of our ad4m integration tests, copying how ad4m is driving the scryer machine during our test runs).
  2. two regular test cases that are minimal examples of the first failing result assertion from the integration tests.

Context

After merging #1880 we tried to jump to upstream scryer commits/versions but were forced to stay at an old commit that was still using our custom toplevel. Using the fully Rust-based version that got merged we couldn't get our full integration tests to pass.

After double checking our code, I went ahead and completed the integration tests here with the results that our test code expects to get from scryer. (that has not really changed from our old SWI integration over the toplevel based scryer lib_machine).

1. problems with discontiguous

Interpreting the new test results, I think it's clear that the problem only occurs when declaring a predicate with discontiguous. In the new test dont_return_partial_matches we load the program:

:- discontiguous(property_resolve/2).
subject_class("Todo", c).

which is all we need from our integration tests to show the faulty behaviour when running the query:

subject_class("Todo", C), property_resolve(C, "isLiked").

which (wrongly) yields a match for C: "c".

Changing the order of the predicates in the query gives a different result:

property_resolve(C, "isLiked"), subject_class("Todo", C).

=> false

The second new test case dont_return_partial_matches_without_discountiguous shows that a similar case works fine when discontiguous is not used.

fixed in: 7de693e

2. Assertion 50 leaves run_query loop

Assertion 50 fails with only one of 3 expected matches.
Debugging through this case shows a strange behaviour: after the first execution of the loop in run_query (ln. 111) which adds the first match, the loop is exited at the end, not calling dispatch_loop() again. This seems to happen without hitting a break.

Resembling this assertion in its own test case by running all previous consults and then running the query and assertion does not yield this strange behaviour.

Seems like the history of all the queries happening before in the integration test are relevant to replicate this.

Debugger jumps to the first line of src/lib.rs when exiting the loop (and then continues after the loop). I have tried increasing and removing the #![recursion_limit = "4112"] there to no avail.

@lucksus lucksus changed the title Additional tests for lib_machine showing problems with discontiguous Iron-out edge cases for library use-case, adding extensive integration tests Jan 31, 2024
@lucksus
Copy link
Contributor Author

lucksus commented Jan 31, 2024

@mthom after fixing the ordering of the first assertions, I'm now stuck with this:
(also added to the PR description)

Assertion 50 fails with only one of 3 expected matches.
Debugging through this case shows a strange behaviour: after the first execution of the loop in run_query (ln. 111) which adds the first match, the loop is exited at the end, not calling dispatch_loop() again. This seems to happen without hitting a break.

Resembling this assertion in its own test case by running all previous consults and then running the query and assertion does not yield this strange behaviour.

Seems like the history of all the queries happening before in the integration test are relevant to replicate this.

Debugger jumps to the first line of src/lib.rs when exiting the loop (and then continues after the loop). I have tried increasing and removing the #![recursion_limit = "4112"] there to no avail.

@Skgland
Copy link
Contributor

Skgland commented Jan 31, 2024

#![recursion_limit] is an attribute of the rust compiler, it tells the compiler how deep it should be allowed to recurse before failing compilation, so this shouldn't effect the debugging of the program.

Though if removing it doesn't break the build, then it appears to have become superfluous.

@mthom
Copy link
Owner

mthom commented Feb 1, 2024

I fixed the Assertion 50 error and in the process I believe I exposed a redundant fact in the integration test text file causing a solution to be reported twice instead of once as expected. Please check its diff to see it. Now Assertion #57 fails because the contents of the solution are out of order again.

@lucksus
Copy link
Contributor Author

lucksus commented Feb 1, 2024

Alright! We got our ad4m integration test suite passing with commit 53028a9. I've fixed all the orderings in the integration assertions and have all scryer tests pass locally :)

@lucksus lucksus marked this pull request as ready for review February 1, 2024 12:47
@lucksus
Copy link
Contributor Author

lucksus commented Feb 1, 2024

...almost.
Another test suite on our side fails because scryer panics when a query contains a nonexistent predicate, but only if other predicates where defined. I've just added a new test showing this. With the consult call commented-out, this test would pass.

@Skgland
Copy link
Contributor

Skgland commented Feb 1, 2024

As these are supposed to be integration test, based on https://doc.rust-lang.org/cargo/guide/tests.html they should go into the tests/ folder.
Also, that way they can't accidentally use library internals.

@lucksus
Copy link
Contributor Author

lucksus commented Feb 1, 2024

As these are supposed to be integration test, based on https://doc.rust-lang.org/cargo/guide/tests.html they should go into the tests/ folder. Also, that way they can't accidentally use library internals.

Sorry for the confusion, I called the test "integration" because these are assertions coming from our (ad4m) integration tests where we run scryer as library integrated in ad4m. The tests in lib_machine.rs are testing run_query() located in the same file.

@lucksus lucksus changed the title Iron-out edge cases for library use-case, adding extensive integration tests Iron-out edge cases for library use-case, adding extensive real-world test assertions Feb 1, 2024
@lucksus
Copy link
Contributor Author

lucksus commented Feb 1, 2024

Huh, interesting that the last mentioned problem does not occur on wasm and i686 architectures, as seen in CI runs... (but all other archs)

@lucksus
Copy link
Contributor Author

lucksus commented Feb 1, 2024

🥳

src/rcu.rs Outdated
@@ -22,7 +22,7 @@ thread_local! {
// odd value means the current thread is about to access the active_epoch of an Rcu
// a thread has a single epoch counter for all Rcu it accesses,
// as a thread can only access one Rcu at a time
static THREAD_EPOCH_COUNTER: OnceCell<Arc<AtomicU8>> = OnceCell::new();
static THREAD_EPOCH_COUNTER: OnceCell<Arc<AtomicU8>> = const { OnceCell::new() };

This comment was marked as resolved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just noticed in the thread_local! docs:

This macro supports a special const {} syntax that can be used
when the initialization expression can be evaluated as a constant.
This can enable a more efficient thread local implementation that
can avoid lazy initialization.

So, I think this change makes sense.

src/machine/parsed_results.rs Outdated Show resolved Hide resolved
@@ -524,4 +531,82 @@ mod tests {
),]))
);
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the PR description these tests are the result of attempting to use scryer-prolog as a library. As such I would expect these to be integration test that do just that.
On a quick glance these tests also appear to only use the public accessible interface of scryer as a library, so I think these could be just moved to be integration tests in tests/.

Based on you comment at #2309 (comment) you appear to disagree, could you explain why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say this PR (and the ping-pong between me and Mark) demonstrates that the tests here are, first and foremost, covering the function run_query() in this same file. So I would suggest to keep them here because the context of the test be located near run_query() provides import meaning, independent of the visibility of that function.

That said, I'm happy to move it over to tests/ if that's where you want it to be :)
(just explaining my reasoning)

You are talking about all tests in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have invited you to our forked repo so you can push changes to this branch if you like, @Skgland!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I had some misconception/misunderstanding while writing that comment.

While I most of these would make sense as integration tests, as they appear to only use the public interface, I think most are small enough to be fine as is, especially as most already existed here from before this PR.

Only integration_test due to the now huge lib_integration_test_commands.txt (12k+ lines) appears too large.
Maybe the txt file could be split into multiple smaller txt files, so that the files can still be viewed here on Github?
Not sure how intertwined everything in there is.

integration_test could then be split into multiple test functions including the individual txt files which would then call a helper function that is basically the current integration_test but taking code as an argument.

Copy link
Contributor Author

@lucksus lucksus Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's adequate to question if this test should be here at all. It was a pragmatic way for us to delineate where the problems are stemming from as we did some refactoring in our project at the same time we switched from SWI to Scryer, but had our integration tests from before.

Running this big txt file here is a bit opaque, which is also why I tried to extract the failing assertions into the other more understandable test cases. But it did already uncover a problem that you only see if you have multiple consecutive calls to consult followed by queries. This is actually close the reason that made us add this test mechanism in the first place: we saw scryer slowing down query by query when used in our ad4m integration tests (that's why we had the previous version without assertions on the results - it was just making sure Scryer would make it to the end). So I would argue against splitting it up.

But yeah, that does make it feel more like an integration test.
I'll leave it completely up to you guys to decide if you want to keep this test at all, and if so where it should stay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its not worth blocking the whole PR about this, as it can always be moved later.

Co-authored-by: Bennet Bleßmann <bennet.blessmann+github@googlemail.com>
@lucksus
Copy link
Contributor Author

lucksus commented Feb 9, 2024

The style check workflow uses nightly Rust, which even fails the regular tests.

@Skgland
Copy link
Contributor

Skgland commented Feb 9, 2024

The style check workflow uses nightly Rust, which even fails the regular tests.

The problem is apparently that the stdsimd feature was split into sub-features and removed.
This breaks the build of the ahash dependency on nightly.
Something (ahash?) appears to automatically enables nightly features when the use of nightly is detected, which can break the build (and did here) if nightly features change.

@Skgland
Copy link
Contributor

Skgland commented Feb 9, 2024

I think updating ahash from 0.8.6 to 0.8.7 might be sufficient, unless that reveals further broken dependencies.

@lucksus
Copy link
Contributor Author

lucksus commented Feb 9, 2024

Yeah, I mean these things can happen "over night" ;) so my latest commit is a suggestion to not use nightly in CI if not absolutely necessary.. the wasm target build surprisingly works despite nightly. I guess ahash is not build there.. (?)

@Skgland
Copy link
Contributor

Skgland commented Feb 9, 2024

Yeah, I mean these things can happen "over night" ;) so my latest commit is a suggestion to not use nightly in CI if not absolutely necessary.. the wasm target build surprisingly works despite nightly. I guess ahash is not build there.. (?)

Yeah, I think that is good for the style and report CI.

Maybe the nightly CIs can be marked continue-on-error instead of disabling them.

@lucksus
Copy link
Contributor Author

lucksus commented Feb 9, 2024

Ah, I see. So the build and test actually failed for the wasm target (https://github.com/mthom/scryer-prolog/actions/runs/7843777400/job/21404819971) but because of continue-on-error it showed up as green tick anyways... hm, what is the benefit of running these jobs in CI if they won't trigger a failure?

@Skgland
Copy link
Contributor

Skgland commented Feb 9, 2024

Right, I forgot that GitHub Actions doesn't differentiate between success and failed but allowed to fail. This way the original failing CI would be better as the error is at least visible.

- fix nightly build
- not bumping to latest aka. 0.8.8 as that has a msrv of 1.72.0 and we are only at 1.70.0
- currently  `continue-on-error` is not shown in a usefull way on failiure see <https://github.com/orgs/community/discussions/15452>
@Skgland
Copy link
Contributor

Skgland commented Feb 16, 2024

I bumped ahash to 0.8.7 to fix nightly and undid the continue-on-error changes to resolve #2334 (comment)

@Skgland
Copy link
Contributor

Skgland commented Feb 16, 2024

Looks like the transitive wait-timeout dependency isn't compatible with wasm.
Though I don't know why it fails in the build job rather than the test job,
as it should only be a dev-dependency:

> cargo tree -i -p wait-timeout
wait-timeout v0.2.0
├── assert_cmd v1.0.8
│   [dev-dependencies]
│   └── scryer-prolog v0.9.3 (/mnt/c/Users/Bennet/Git/scryer-prolog-1)
└── snapbox v0.4.15
    └── trycmd v0.14.19
        [dev-dependencies]
        └── scryer-prolog v0.9.3 (/mnt/c/Users/Bennet/Git/scryer-prolog-1)

@triska
Copy link
Contributor

triska commented Feb 16, 2024

Is a failing nightly build even worth delaying PRs? It seems that issues with nightly may also be resolved by completely unrelated changes in other crates that will be made according to their own schedule.

@Skgland
Copy link
Contributor

Skgland commented Feb 16, 2024

Is a failing nightly build even worth delaying PRs? It seems that issues with nightly may also be resolved by completely unrelated changes in other crates that will be made according to their own schedule.

The only one that's still failing is wasm32 and the test failure already present on master just ignored.
It appears to be mostly tests that are failing to build as some dev-dependencies don't support wasm32.
I am currently trying to disabling those dev-dependencies for wasm32 and #[cfg]ing out the tests that use them

@lucksus
Copy link
Contributor Author

lucksus commented Feb 26, 2024

What is needed for this PR to get merged? Please let me know if there is something left I can do.

@mthom
Copy link
Owner

mthom commented Feb 28, 2024

Get the nightly tests to pass, I suppose? I'm not sure how important they are ultimately.

@aarroyoc
Copy link
Contributor

Get the nightly tests to pass, I suppose?

They already pass

I'm not sure how important they are ultimately.

It's a good question. Rust should never break compatibility but this rule doesn't apply for nightly, so sometimes it can fail due to mistakes in the Rust side.

@mthom mthom merged commit 84d5ce0 into mthom:master Feb 28, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants