-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds a nickel test
subcommand for testing examples in docs.
#2020
Conversation
|
Report | Thu, September 12, 2024 at 11:40:25 UTC |
Project | nickel |
Branch | 2020/merge |
Testbed | ubuntu-latest |
⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!
- Latency (latency)
Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the--ci-only-thresholds
CLI flag.
Click to view all benchmark results
Benchmark | Latency | Latency Results nanoseconds (ns) |
---|---|---|
fibonacci 10 | ➖ (view plot) | 490,070.00 |
pidigits 100 | ➖ (view plot) | 3,154,400.00 |
product 30 | ➖ (view plot) | 782,240.00 |
scalar 10 | ➖ (view plot) | 1,448,600.00 |
sum 30 | ➖ (view plot) | 763,410.00 |
Bencher - Continuous Benchmarking
View Public Perf Page
Docs | Repo | Chat | Help
Idea: we could do as in the manual, and use a modifier such as I fear a sequence of four or five 2-line fenced code blocks are going to be less readable and quite noisy in the source or in the markdown rendered version. What do you think? |
I gave One possible issue with modifiers is that typos can give confusing errors. I accidentally wrote |
Ok, I've cleaned up the implementation and I think it's ready for review. |
A couple caveats:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very happy to see this going through; we have had our fair share of examples that aren't updated properly in the stdlib, for example.
In general, I found that it lacked a bit of documentation, both general and detailed. First the high-level scheme of the pipeline, which is: you go through a first time by applying the transformation and filling the registry with test entries, then you go through the spine to run the added tests. Additionally, each more specific small function and datatype could be commenter as well, even quickly.
Another remark is that we seem to expose a lot of the internals of Program
and Cache
, which doesn't sound so good - including "transmuting" the main_id
. At this rate, do you think it could also be an alternative to just go with the "lower-level" objects instead (Cache
+ VirtualMachine
), like what the REPL is doing? In the end, Program
is mostly a struct and a few thin wrappers around VirtualMachine
or Cache
methods, at least when there are no overrides (and here the command doesn't allow them). Although there's still eval_record_spine
which is non trivial and useful. Or to just use Program
for that, but to use a stand-alone separate VirtualMachine
to handle the generated snippets. Or to move part of the code that requires the access to the internals to Program
itself? Maybe not, but from a distance we are breaking the abstraction barrier quite a lot here.
cli/src/doctest.rs
Outdated
DocTest { input, expected } | ||
} | ||
|
||
fn code(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method would deserve documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to do this a bit differently, building up the contract application programmatically instead of by modifying the text (which gives slightly weird error messages).
core/src/eval/mod.rs
Outdated
@@ -124,7 +124,7 @@ pub struct VirtualMachine<R: ImportResolver, C: Cache> { | |||
// The call stack, for error reporting. | |||
call_stack: CallStack, | |||
// The interface used to fetch imports. | |||
import_resolver: R, | |||
pub import_resolver: R, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already import_resolver()
and import_resolver_mut()
methods; is this pub
really needed? Although setting a field to be public is maybe simpler, in which case maybe we should get rid of the accessors, but having both looks redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe it was originally for borrow-check reasons (because import_resolver_mut()
borrows the whole VirtualMachine), but it isn't needed anymore.
core/src/cache.rs
Outdated
@@ -1068,6 +1068,18 @@ impl Cache { | |||
self.terms.get(&file_id).map(|TermEntry { term, .. }| term) | |||
} | |||
|
|||
/// Set a new value for a cached term. | |||
pub fn set(&mut self, file_id: FileId, term: RichTerm, state: EntryState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? That is a bit suspicious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is definitely suspicious and I've removed it.
I replaced it with the custom_transform
function, which I'm also not completely thrilled with but I think it's better than the previous poking around in Program
's private fields.
Fundamentally, the problem is that we want to apply a custom transformation pass, but otherwise we want all of the behavior that Program
provides (the input processing, the override mode, the error reporting...)
Thanks for the review, I think this is in a better state now. It's still missing tests and user-facing documentation. I'll try to get to that tomorrow. |
core/src/program.rs
Outdated
@@ -672,7 +701,7 @@ impl<EC: EvalCache> Program<EC> { | |||
result.body.pos, | |||
)) | |||
} | |||
_ => Ok(result.body), | |||
_ => Ok(result.body.closurize(&mut vm.cache, result.env)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if that can break stuff. In particular extract_from_term
is expecting the record spine to be fields with Record
and RecRecord
directly inside, while here it seems we're introducing intermediate Closure
nodes everywhere.
PS: ok, I think I get it. We're introducing closures only when the result isn't a record, which is treated above. It's still a strange discrepancy: it means that if you take a random value somewhere in the field of a record spine, if it's a record and you try to evaluate it further you might get unbound identifiers, but if it's not a record then it's properly closurized and it's evaluating fine.
I have mixed feelings about this, although it doesn't seem to break existing code. Intuitively I would expect evaluate_record_spine
to ensure that the spine is just nested records and terminal values with no Closure
s in-between (something stupid: imagine we want to list all the accessible functions in the future, eval_record_spine
would be a good candidate, but now functions are hidden behind Closure
nodes, for example).
I wonder if we should somehow specialize eval_record_spine
for doc testing to make it clear that it's a strange version that might not make sense for something else, and keep eval_record_spine
with the current semantics (of course, under the hood, we can use a generic common base that just takes an additional boolean-like argument to know if it should closurize non records or something like that). It's just a suggestion, there might be other ways to go about it,, but I have the overall feeling that we try to cram too many different things into one method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it means that if you take a random value somewhere in the field of a record spine, if it's a record and you try to evaluate it further you might get unbound identifiers
I didn't follow this part -- as long as you're encountering records, there's nothing to evaluate and so no unbound identifiers. And then as soon as you encounter a non-record, it's closurized with the right environment, right?
It seems to me like the closurized version is the appropriate version if you're planning to do further evaluation, while the non-closurized version is the right one for any other analysis.
of course, under the hood, we can use a generic common base that just takes an additional boolean-like argument to know if it should closurize non records or something like that
Sure, I'll give that version a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't follow this part -- as long as you're encountering records, there's nothing to evaluate and so no unbound identifiers. And then as soon as you encounter a non-record, it's closurized with the right environment, right?
I think you're right (although that might not be entirely true because contracts are evaluated but not closurized).
It seems to me like the closurized version is the appropriate version if you're planning to do further evaluation, while the non-closurized version is the right one for any other analysis.
I think my intuition about eval_record_spine
was precisely that you don't want to perform further evaluation, but just peek at the term and gather whatever information you want. For example, nickel doc
doesn't currently include default values, but we could imagine doing so in the future. It would be easy with the previous version, but in the new one we now need to get the content of the closure from the cache first (which isn't undoable, but looks a bit like a leaking abstraction IMHO).
However, I see that you could write the specification of your version as all records are evaluated recursively, and environment is pushed toward the "leaf" by closurizing. I didn't see a simple and correct explanation of the updated behaviour of eval_record_spine
, but this may be reasonable. I'm still slightly in favour of separating the two flavour, but I don't have such a strong opinion anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think beside the reservation around eval_record_spine
, it looks good now! It's just missing a bunch of tests indeed. In a second step we will also want to test the stdlib as part of the general Nickel testing suite.
I have some tests and docs now, but
|
@jneem I lost track a bit - is there anything left to do on this PR, either on your side or mine? Is the |
I think stderr/stdout is settled. The only outstanding question IIRC is where the docs should go. |
I see. I wonder if it would nice to do something like the Nix CLI - write the CLI documentation as markdown that can be exported as an mdbook but also as manpages, with a single source of truth. In the meantime I guess a new section of the manual for the CLI should do - note that adding a manual section might require some trivial adaptation of |
@yannham I forgot to bring this up in the weekly, but can we merge this and do the CLI docs as a follow-up? |
This adds a
nickel test
subcommand that extracts all nickel code blocks contained in doc fields, and runs them to make sure they work. The examples are run in the environment of the record they're contained in, so inthe "foo" variable in the doc test will evaluate to
3
.There is rudimentary support for tests that are expected to error, and tests that are expected to evaluate to a specific value: a test ending in
will be wrapped in
(...) | std.contract.Equal [1, 2]
, while a test ending inwill be expected to error, with the main diagnostic message containing the string "foo".
Tests can be ignored, by starting them with
```nickel ignore
This mostly conforms to how the examples in the standard library are formatted, but a few changes are required:
=> expected
bit needs to be on the last line, and in a commentI converted the doctests in
std.array
to the new format, and discovered a couple of typos in the process. If the format is ok, I'll do the rest of the stdlib.