
[RFC007] Improve/simplify record representation in the new AST #2102

Merged
merged 2 commits into master from rfc007/records-as-field-defs
Nov 22, 2024

Conversation

@yannham (Member) commented Nov 21, 2024

Instead of elaborating piecewise definitions (such as {foo.bar = 1, foo.baz = 2}) directly at the parsing stage, this commit makes the new AST closer to the source language by representing a record as a list of field definitions, where the field "name" (the left-hand side of =) can be a sequence of identifiers and dynamic strings. This representation is used internally by the parser; we now make it the default in the new AST, so that the migration of the parser in #2083 won't have to do this elaboration at all. The elaboration is offloaded to the conversion to RichTerm, which happens in the ast::compat module.
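For illustration, here is a minimal sketch of what such a field-definition-based representation can look like; the actual types (in core/src/bytecode/ast/record.rs) presumably carry more information (metadata, positions), so the names and shapes below are only indicative:

```rust
use std::marker::PhantomData;

/// Stand-in for the arena-allocated expression node, elided in this sketch.
pub struct Ast<'ast>(PhantomData<&'ast ()>);

/// One element of the left-hand side of `=`.
pub enum FieldPathElem<'ast> {
    /// A statically known identifier, e.g. `foo` and `bar` in `foo.bar = 1`.
    Ident(&'ast str),
    /// A dynamic field name given by an (interpolated) string expression.
    Expr(Ast<'ast>),
}

/// A single piecewise field definition, kept as written in the source.
pub struct FieldDef<'ast> {
    /// The path on the left-hand side of `=`, e.g. `foo.bar`.
    pub path: &'ast [FieldPathElem<'ast>],
    /// The definition's body, if any (a field can be declared without a value).
    pub value: Option<Ast<'ast>>,
}

/// A record is just the list of its field definitions, without elaboration.
pub struct Record<'ast> {
    pub field_defs: &'ast [FieldDef<'ast>],
}
```

Under such a representation, {foo.bar = 1, foo.baz = 2} is simply stored as two field definitions with paths foo.bar and foo.baz; grouping them under a single foo field only happens later, during the conversion to RichTerm or further down the pipeline.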

This makes the AST closer to the source language.

The first motivation is that it'll be better for the LSP, where some open issues on the tracker are caused by the inability to trace what the LSP gets back to the original piecewise definitions.

The second reason is that we can't actually elaborate a piecewise definition correctly while staying in the new AST as of today: the new AST only has one record variant, which is recursive by default, but this doesn't match the way recursion and scoping work for piecewise definitions. For example, {foo.bar = 1, baz.foo = foo + 1} works fine in today's Nickel (it evaluates to {foo = {bar = 1}, baz = {foo = 2}}), but if we elaborate it naively in the new AST, we get an infinite recursion: {foo = {bar = 1}, baz = {foo = foo + 1}}.

Mainline Nickel currently uses a non-recursive Record for that, but we don't want to introduce such a "runtime dictionary" in the new AST, as it can't be expressed in the source language. Instead, we keep records piecewise defined, without transformation, and do further elaboration only when needed later in the pipeline: during typechecking, future compilation, or, in the meantime, when converting the new AST representation to mainline Nickel.

@yannham requested a review from jneem on November 21, 2024 18:35
Comment on lines +269 to +290
/// Allocate an AST element in the arena.
///
/// [Self] never guarantees that all destructors are going to be run when using such a generic
/// allocation function. We don't want to allocate values that need to be dropped through this
/// method, typically because they own heap-allocated data, such as numbers or parse errors.
/// That's why we use a marker trait to specify which types can be allocated freely. Types that
/// need to be dropped have a dedicated method for allocation.
pub fn alloc<T>(&self, value: T) -> &T {
self.generic_arena.alloc(value)
}

/// Allocate a sequence of AST elements in the arena.
///
/// See [Self::alloc].
pub fn alloc_iter<T, I>(&self, iter: I) -> &[T]
where
I: IntoIterator<Item = T>,
I::IntoIter: ExactSizeIterator,
{
self.generic_arena.alloc_slice_fill_iter(iter)
}

@yannham (Member Author) commented:

Those functions are a bit orthogonal to the matter at hand, but they were introduced in the commit cherry-picked here from #2083 and it was easier to keep them. After #2083 is merged, I'll make a pass on the AstAlloc API to trim it down a bit anyway.

@yannham (Member Author) commented:

That's why I talk about a marker trait that is yet to be written, but will exist in the future.
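For illustration, a minimal sketch of what such a marker trait could look like, assuming the arena is a bumpalo::Bump (as alloc_slice_fill_iter suggests); the trait name Allocable and the impl set are made up here:

```rust
use bumpalo::Bump;

/// Hypothetical marker trait: implemented only by types whose destructor is a
/// no-op, since the bump arena never runs `Drop` for what it allocates.
pub trait Allocable {}

pub struct AstAlloc {
    generic_arena: Bump,
}

impl AstAlloc {
    /// Same shape as `alloc` above, but statically restricted to types that
    /// are safe to leak inside the arena.
    pub fn alloc<T: Allocable>(&self, value: T) -> &T {
        self.generic_arena.alloc(value)
    }
}
```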

}

impl<'ast> FromAst<record::FieldDef<'ast>> for (FieldName, term::record::Field) {
fn from_ast(field: &record::FieldDef<'ast>) -> Self {

@yannham (Member Author) commented:

This is mostly parser::utils::build_record adapted to take a new AST as an input.

@yannham (Member Author) commented:

Sorry, this is rather elaborate_field_def.
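For readers following along, here is a minimal, self-contained sketch (toy types, not the actual FieldDef/RichTerm ones) of the kind of path elaboration elaborate_field_def performs, turning foo.bar = 1 into foo = { bar = 1 }:

```rust
/// Toy term type standing in for the mainline representation.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    Num(i64),
    Record(Vec<(String, Term)>),
}

/// Elaborate a single piecewise definition `path = value` into a top-level
/// field: the last path element binds the value, and every preceding element
/// wraps the result in a fresh single-field record.
fn elaborate_path(path: &[String], value: Term) -> (String, Term) {
    let (last, rest) = path.split_last().expect("field paths are non-empty");
    rest.iter().rev().fold((last.clone(), value), |(name, term), elem| {
        (elem.clone(), Term::Record(vec![(name, term)]))
    })
}

fn main() {
    // `foo.bar = 1` elaborates to `foo = { bar = 1 }`.
    let (name, term) = elaborate_path(&["foo".into(), "bar".into()], Term::Num(1));
    assert_eq!(name, "foo");
    assert_eq!(term, Term::Record(vec![("bar".into(), Term::Num(1))]));
}
```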

Vec<(term::RichTerm, term::record::Field)>,
)
{
fn from_ast(record: &Record<'ast>) -> Self {

@yannham (Member Author) commented:

This is mostly parser::utils::build_record adapted to take a new AST as an input.

/// https://github.com/tweag/nickel/issues/1427.
///
/// This is a helper for the conversion of a record definition to mainline.
fn merge_fields(

@yannham (Member Author) commented:

This was taken and type-adapted from parser::utils as well.
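And a rough sketch of the kind of merging such a helper has to do (again with toy types, not the real Field/RichTerm machinery): when two piecewise definitions elaborate to the same top-level field, as in {foo.bar = 1, foo.baz = 2}, their record parts are combined recursively:

```rust
use std::collections::HashMap;

/// Toy value type standing in for elaborated record fields.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Num(i64),
    Record(HashMap<String, Value>),
}

/// Merge two definitions of the same field. The real `merge_fields` is more
/// involved (it also deals with metadata and with non-record values); this
/// sketch only shows the recursive record case.
fn merge_values(v1: Value, v2: Value) -> Value {
    match (v1, v2) {
        (Value::Record(mut r1), Value::Record(r2)) => {
            for (name, v2) in r2 {
                let merged = match r1.remove(&name) {
                    Some(v1) => merge_values(v1, v2),
                    None => v2,
                };
                r1.insert(name, merged);
            }
            Value::Record(r1)
        }
        // Simplification: the real implementation doesn't just drop the first
        // definition here.
        (_, v2) => v2,
    }
}

fn main() {
    // `{foo.bar = 1, foo.baz = 2}`: both definitions target `foo`.
    let first = Value::Record(HashMap::from([("bar".to_owned(), Value::Num(1))]));
    let second = Value::Record(HashMap::from([("baz".to_owned(), Value::Num(2))]));
    let merged = merge_values(first, second);
    assert_eq!(
        merged,
        Value::Record(HashMap::from([
            ("bar".to_owned(), Value::Num(1)),
            ("baz".to_owned(), Value::Num(2)),
        ]))
    );
}
```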

@yannham added this pull request to the merge queue Nov 22, 2024
Merged via the queue into master with commit f1c826d Nov 22, 2024
5 checks passed
@yannham deleted the rfc007/records-as-field-defs branch November 22, 2024 10:45