Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML output record fields in order for readability #1187

Open
JohannesRudolph opened this issue Jan 8, 2019 · 14 comments
Open

YAML output record fields in order for readability #1187

JohannesRudolph opened this issue Jan 8, 2019 · 14 comments

Comments

@JohannesRudolph
Copy link

Suppose I have the following record definitions:

{ platform :
    Text
, inputs :
    List ./directory.dhall
, outputs :
    List ./directory.dhall
, params :
    List ./param.dhall
, run :
    { path : Text, args : List Text }
}

When I run such a value through dhall-to-yaml, I get a YAML that looks like this

platform: linux
inputs:
- name: abc
run:
  args:
  - ec
  - |
    cp -r a/* b/
    cd b
outputs:
- name: sources-with-creds

Unfortunately the reordering of yaml fields compared to the record definition makes these files a little less readable. Ideally, dhall-to-yaml would have a mode that allows for preservation of field order to make generated yaml a little more human-friendly. This is important when I want my generated YAML to match established conventions or replace existing hand-built YAML with dhall-generated ones without creating crazy diffs. Of course, yaml key order is not important semantically, so this is just about humans reading it.

@Gabriella439
Copy link
Collaborator

@JohannesRudolph: The main issue here is that the conversion to YAML is conceptually done in two steps:

  • Evaluate the Dhall expression to its normal form
  • Convert that normal form to YAML

... and the standard requires that normalization sorts record fields (See: dhall-lang/dhall-lang#223), so even before the YAML-specific logic begins the record fields are already sorted.

Using the current Haskell API, there is currently no way to preserve field order when normalizing expressions. Fixing the API to support that, is possible, but would be an invasive change and significant departure from the standard so I want to ensure that there is a compelling reason to do so first.

In this particular case, the main thing I want to understand is why the generated YAML file is checked into version control since it's a derived build product from the Dhall configuration. By storing both in version control you permit an invalid state to be representable (where that invalid state is the two files being out of sync).

Or to make an analogy, I believe you would not want to version control the derived YAML file for the same reasons that you don't version control binary artifacts of your build process and you instead only version control the original source code.

@JohannesRudolph
Copy link
Author

@Gabriel439 Thanks for the fast response! I highly appreciate your support here and on SO and admire all the hard work you (and many others!) have put into dhall. It's clearly a smart take on configuration and so far I really like it!

We’re evaluating whether we can use dhall to generate configuration for a bunch of apps + corresponding concourse ci pipelines and tasks. This would add a great deal of safety to our delivery pipelines that need to cover a range of different environment and deployment targets.

Concourse for example is not able to read dhall natively In any way, it only supports yaml stored in a repo for tasks. I agree that it’s not the optimum way and the sync issue is something to beware of. However, I think for configuration the situation is not as clear cut as it is for source code:

  • configuration needs to be „auditable“, i.e. we need to track config changes in either in an artifact store or git. Both do the job but git has better diff and versioning tooling compared to many artifact managers (or at least, it feels more accessible to developers)
  • configuration needs to be "debuggable" and many times it's necessary to debug the effective or actual configuration used at runtime (i.e. the normalized form in Dhall). If we're talking about apps or tools that consume YAML, that better be YAML. With debuggability it's a bit like CoffeScript vs. JavaScript and the like. After all, my dhall scripts may have bugs.
  • there's some intersection between the requirements a) auditable and b) debuggable, i.e. I find auditing effective configuration more useful

I would assume that Dhall will often be used in "brownfield" contexts, i.e. our particular evolution of problems and tooling choices as our product continues growing looked like this:

  1. just a bunch of apps deployed to one target env -> a bunch of plain yaml files, we're happy
  2. crap, our yaml files get too repetitive (non-DRY) -> use anchors and all sorts of yaml magic
  3. crap, we have even more yaml files and they need variation that yaml magic can't represent -> use a "string template" style generator (in our case ejs)
  4. oh crap, we need to generate so many yaml files for deployments etc. and missing config keys is a frequent source of bugs -> we need something that compiles typesafe config, start looking at jsonnet, hcl, dhall etc.

So in the brownfield context being able to "gradually" introduce dhall is somewhat important, e.g. we want to start by replacing some of our old ejs yaml generators with dhall. And it's easy for us to ensure we don't break things in the process when we can just review the generated yaml in our git PRs.

I hope that shed's some light into our experience with dhall so far and gives you a real-world data point on potential users ;-) And I must admit that I really like that I finally have a good excuse to do some haskell-y things at work!

@JohannesRudolph
Copy link
Author

After that long monologue above I realize I actually forgot to get back to the point 😅

So in essence, I consider ordered yaml keys very useful. I bet dhall-kubernetes users will feel similar (e.g. its customary to have metadata fields declared pretty much at the top of a k8s yaml object). My memories of lambda calculus at university have faded a bit... but I think what you're saying is that dhall sorts record fields during normalization, i.e. normalized dhall expressions always have sorted record fields.

A simpler approach is to not specify at all that records are order-insensitive.
Instead, specify that β-normalization sorts fields.

So there must be some sort order defined... right? (btw. I couldn't figure out what sort order it actually is). Why couldn't that normalized sort order be the order defined in the type definition? Of course that implies that changing the sort order in a type definition changes the normalized form and the semantic integrity hash changes? Is that something that would kill compatibility?

@MonoidMusician
Copy link
Contributor

Keys are sorted according to Ord instance in Haskell :) (for unions, union literals, records, and record literals)

In my dhall-purescript project, I essentially have forms of normalization that do change the order of keys, which are used for comparisons (e.g. judgmentallyEqual), while the default normalization does not. (I actually parameterize the AST over the functor used to store key–value pairs, which either preserves order or is unordered.)

What happens in that case is that a {record,union} {literal,type} has each of its fields normalized but its order is not changed. That is, it is simply mapped or traversed to obtain the new value. The order of a record literal going in is its order going out, the type has nothing to do with it. (And when it needs to be sorted, the appropriate transformation is run over the AST to sort those cases.)

Now, you have to decide on an order for operations like // and /\, and that could be complicated – you at least have to decide on where keys should be ordered that appear on both sides. Do you want to largely preserve the order of the left operand, and insert new keys from the right at the end? What about putting left-only keys first, duplicated keys in the middle, and right-only keys last? It's not exactly trivial – some choice has to be made.

Anyhow, I imagine standardizing this behavior would be possible, if the idea is favored.

@f-f
Copy link
Member

f-f commented Jan 8, 2019

@JohannesRudolph at work we went through exactly the same "brownfield" process that you described. To keep the configs "auditable" and "debuggable" we just print the rendered YAML/JSON/etc to the CI logs (so that if we want to take a look and/or replicate we can), but only the Dhall code is versioned.

We do use dhall-kubernetes, but the field ordering has never been a problem because we don't look at the YAML at all - because if it typechecks then it's good to be sent to Kubernetes 🙂

Though in some other cases I felt that ordering the rendered YAML would have been nicer/clearer/etc, but I think that if we want to implement this "rendering order" then there's a simple-but-maybe-not-perfect way; we could:

  • traverse the Dhall expression after parsing it, and for every Record/Union we save in a Map the List of the keys in order (having the Set of the keys as key)
  • we then resolve/typecheck/normalize the expression, and then traverse again, reordering the keys according to the order saved before for each set of keys

This can be done in the Dhall.Yaml part, so no need to standardize this or change behaviour of the core part.
It is not totally accurate because e.g. you could have the same set of keys occuring two times and we'd have to pick one of the orders and apply it for all the other cases. Though IMO this is fine, since it's just for rendering.

@Gabriella439
Copy link
Collaborator

@f-f: That won't work in general because the records might be computed and might not be present in their entirety in the original syntax tree parsed from the file. For example:

let x = { foo = 1 }

let y = { bar = True }

in x  y

The only way to guarantee getting the complete record is to normalize the expression, but that will sort the fields along the way.

However, I think I will go with the approach that @MonoidMusician suggested (to parametrize the expression type on the key-value type constructor), since I will need to do that anyway to fix #772, too.

@antislava
Copy link
Collaborator

antislava commented Apr 3, 2019

I know it probably sounds very naive but wouldn't it be possible to export a normalizeNoSort function from Dhall.Core just to address those (few?) cases when it is badly needed, e.g. when a tool generates or, in particular, updates a Dhall expression subject to the inspection and editing by a human? This function would be used by such a tool if, for example, a switch --no-key-sort is specified by the user. (I am referring to json-to-dhall · Issue #878 · dhall-lang/dhall-haskell here in particular ;)

A straightforward implementation could be to start with normalizeWithM' - same as today's normalizeWithM but also taking a sorting function as its first argument so that normalizeWithM = normalizeWithM' Dhall.Map.sort while normalizeNoSortWithM = normalizeWithM' id. (Here I intentionally simplify things a bit as the sort function for record and union keys takes different type signatures but that is a minor detail.)

@Gabriella439
Copy link
Collaborator

@antislava: The main thing that makes it tricky is the API's support for custom normalizers. You'd have to thread the sorting decision throughout the code in a lot of places. I'm currently trying to clean up how the API supports customization to make this easier to do

@Gabriella439
Copy link
Collaborator

I'm going to move this issue to the dhall-haskell repository since it's specific to the Haskell implementation

@Gabriella439 Gabriella439 transferred this issue from dhall-lang/dhall-lang Aug 2, 2019
@ari-becker
Copy link

@JohannesRudolph saw this once Gabriel bumped the topic by moving it into dhall-haskell. Specifically for the Concourse use-case, you can run fly format-pipeline -c <(dhall-to-yaml --omitNull <<< '/path/to/pipeline.dhall') and Concourse will re-order the fields for you in a canonical way, which would be a good pattern to use even if you were working with straight YAML only.

This is still an issue in the general case, but thankfully not for Concourse.

@sjakobi
Copy link
Collaborator

sjakobi commented Aug 19, 2019

However, I think I will go with the approach that @MonoidMusician suggested (to parametrize the expression type on the key-value type constructor), since I will need to do that anyway to fix #772, too.

A possibly simpler approach would be to perform the field sorting as a separate step after normalizing everything else. I believe that would be fairly simple to implement, but possibly less efficient than the parametrization.

@sjakobi
Copy link
Collaborator

sjakobi commented Oct 20, 2019

I believe that the departure from the existing Dhall-JSON-YAML pipeline discussed in #1435 could help us preserve the order of fields when translating Dhall to YAML.

@erjenkins29
Copy link

wondering if this is in the roadmap-- It would be a helpful function on my end, as I'm hoping to use dhall to yaml for readability/documentation.

@Gabriella439
Copy link
Collaborator

We don't have an official roadmap. It's mainly whenever one of us can get around to this

Part of the delay is that this particular change requires pretty sweeping changes to the codebase to implement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants