Generated Haskell records should have strict fields #97

bitc · 2020-05-09T17:09:23Z

The generated data types are intended to be used in two different ways:

1. External data is received and parsed into the data structure.
2. A data structure is created in code and will soon be serialized.

In both of these cases lazy fields don't make sense, so I recommend that strictness annotations (!) be added to all fields.

Workaround: you can compile the generated code with -XStrictData, but I believe the better approach is to make all the fields strict by default.
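A minimal sketch of the difference between lazy and strict fields. The record names and fields here are hypothetical and are not actual ADL output; the helper `throwsOnConstruction` is also made up for illustration.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical generated record with lazy fields (no annotations).
data PersonLazy = PersonLazy
  { lazyName :: String
  , lazyAge  :: Int
  }

-- The same hypothetical record with strict (!) fields, as proposed.
data PersonStrict = PersonStrict
  { strictName :: !String
  , strictAge  :: !Int
  }

-- Evaluate a value to weak head normal form and report whether doing so
-- raised an ErrorCall (the exception thrown by `error`).
throwsOnConstruction :: a -> IO Bool
throwsOnConstruction x = do
  result <- try (evaluate x)
  pure $ case result of
    Left err -> const True (err :: ErrorCall)
    Right _  -> False
```

Calling `throwsOnConstruction (PersonLazy "Ann" (error "bad age"))` yields False, because the bad field stays an unevaluated thunk, while the same call with `PersonStrict` yields True, because a strict field is forced when the constructor is evaluated. Compiling the lazy declaration with -XStrictData should behave like the strict one.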

timbod7 · 2020-05-10T01:25:23Z

Interesting point.

In what circumstances do you think the lack of strictness annotations will cause problems?

For your case 2, the serialization process will fully evaluate all of the thunks. For case 1, I can see that unevaluated thunks may build up within in-memory data structures (e.g. with this demo project!). In my experience, however, in real-world API examples, one is generally updating some persistent store, and the thunks will indeed be evaluated.
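To illustrate how thunks can build up in a long-lived in-memory value, here is a small sketch. The `StatsLazy` and `StatsStrict` records are hypothetical stand-ins, not ADL-generated code, and the exact behaviour may vary with GHC's optimization level.

```haskell
import Data.List (foldl')

-- Hypothetical in-memory record with a lazy field.
data StatsLazy = StatsLazy { lazyTotal :: Int }

-- The same record with a strict field.
data StatsStrict = StatsStrict { strictTotal :: !Int }

-- Each update stores an unevaluated (+) thunk that refers to the previous
-- record, so a chain of thunks can grow until the field is finally read.
addLazy :: StatsLazy -> Int -> StatsLazy
addLazy s n = s { lazyTotal = lazyTotal s + n }

-- The strict field is evaluated whenever the record itself is evaluated,
-- so no chain of pending additions accumulates.
addStrict :: StatsStrict -> Int -> StatsStrict
addStrict s n = s { strictTotal = strictTotal s + n }

-- foldl' forces only the record constructor at each step, not lazy fields.
totalLazy :: [Int] -> Int
totalLazy = lazyTotal . foldl' addLazy (StatsLazy 0)

totalStrict :: [Int] -> Int
totalStrict = strictTotal . foldl' addStrict (StatsStrict 0)
```

Both functions return the same result; the difference is only in when the additions are performed and how much memory is held in the meantime.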

Rather than try to make a blanket decision on strictness, I think it would be better to have an ADL annotation to specify strictness on an individual field or a whole declaration.

This of course leaves us with the decision as to what the annotation "default" should be. Given that one goal of ADL is to generate idiomatic code in each target language, I'd favor keeping the default to non-strict fields, for consistency with regular Haskell.

bitc · 2020-07-21T13:20:40Z

> Given one goal of ADL is to generate idiomatic code in each target language[...]

In my opinion, the idiomatic Haskell way is to always use strict fields for records that are for storing "business data" (i.e. not generic data structures).

This can be seen in lots of open source Haskell projects. For example, a random file from XMonad: https://github.com/xmonad/xmonad/blob/40466b2be266e50e941f2fcc53b7526f1cfc71be/src/XMonad/Layout.hs#L51

From the well-known What I Wish I Knew When Learning Haskell:

> A lot of industrial codebases have a policy of marking all constructors as strict by default

From the Johan Tibell Haskell Style Guide:

> By default, use strict data types and lazy functions.

Also, from the Kowainik Haskell style guide:

> Fields of data type constructors should be strict.

And here is another quote from some blog post:

> You're probably asking a pretty good question right now: "how do I know if I should use a strictness annotation on my data fields?" This answer is slightly controversial, but my advice and recommended best practice: unless you know that you want laziness for a field, make it strict.

One last example: the Haskell protocol-buffers package has a use case similar to ADL's, and always generates strict fields.

In my experience, and from what I've seen in the Haskell community, there is widespread consensus about this kind of usage of strict fields. The above links have more in-depth explanations.

With regard to performance, strict fields can be a huge win, especially when a record has lots of small Int fields or similar. When such fields are strict, GHC can unbox/unpack them directly into the parent constructor, which is a major performance gain.
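A short sketch of what unpacking looks like in source form. The `Pixel` record is a hypothetical example, not ADL output. The UNPACK pragma asks GHC to store each Int directly in the constructor instead of behind a pointer; it only applies to strict fields, and GHC generally performs unpacking only when optimization is enabled.

```haskell
import Data.List (foldl')

-- Hypothetical record with several small strict fields.
-- Each UNPACK pragma requests that the Int be stored inline in Pixel.
data Pixel = Pixel
  { red   :: {-# UNPACK #-} !Int
  , green :: {-# UNPACK #-} !Int
  , blue  :: {-# UNPACK #-} !Int
  }

-- Sum of the three channels of one pixel.
brightness :: Pixel -> Int
brightness (Pixel r g b) = r + g + b

-- Sum of the brightness of every pixel, using a strict left fold.
totalBrightness :: [Pixel] -> Int
totalBrightness = foldl' (\acc p -> acc + brightness p) 0
```

For example, `totalBrightness [Pixel 1 2 3, Pixel 4 5 6]` adds up all six channel values. Recent GHC versions may unpack small strict fields like these even without the pragma when optimizing, so the pragma here mainly makes the intent explicit.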

> In my experience, however, in real world API examples, one is generally updating some persistent store and the thunks will indeed be evaluated.

Right, I don't think I communicated my thoughts clearly in my initial post. What I meant to say is: If it is certain that your Haskell record will have all of its fields eventually evaluated, then laziness only gives you worse performance for no benefit. In the case of ADL, we are certain that all of the fields of a record will be eventually evaluated:

- When we parse a message, we need to immediately fill in all of the fields and validate them all, so they must obviously be fully evaluated.
- When a user creates a message from scratch, her intention is to eventually serialize it. The serialization will force all of the fields to be evaluated (if they weren't already).
- The only case where laziness would be useful is if I am creating an ADL record inside my code and only passing it around internally, never serializing it. This is not the designed use case for ADL, but even if you are doing this, you almost certainly want strict fields anyway, for the reasons I gave above.

> This of course leaves us with the decision as to what the annotation "default" should be.

It should be clear that my opinion is that the default should be strict :] (In fact, I would go so far as to say that I don't think it is necessary to even have the option for lazy fields). But I genuinely am interested in hearing the opinions of other people on this matter.

timbod7 · 2020-07-22T13:20:22Z

I think you've convinced me that fields should be strict by default. But I don't want to preclude ADL from being used to define potentially lazy data structures, so I'd still like to see the annotation to control this.

timbod7 added the enhancement label May 27, 2020

bitc mentioned this issue Jul 20, 2020: Customizable code generation #105 (Open)

bitc mentioned this issue Jul 21, 2020: 1.0 Release Features #101 (Open, 5 tasks)
