Merging sequences (like Merge keys for mappings) #48

Open
perlpunk opened this issue May 27, 2019 · 45 comments

@perlpunk
Member

perlpunk commented May 27, 2019

In #35 a syntax for merging sequences was proposed. That syntax isn't going to happen, but I can understand that a standard way of doing this would be useful.

There are plans to add programmatic features in YAML 1.3, as @ingydotnet mentioned, but they simply don't exist yet.

The question is, can we come up with something simple, that doesn't introduce new syntax?

Here are some suggestions:

  1. @ingydotnet 's suggestion was:
array1: &my_array_alias
- foo
- bar

array2:
- <: *my_array_alias
- baz
  2. My alternative suggestion would be:
array1: &my_array_alias
- foo
- bar

array2:
- <<
- *my_array_alias
- baz

This approach would be very close to the merge key feature, also in terms of implementation (I have just implemented this in my YAML processor).

  3. Another variant would be almost equal to @ingydotnet's suggestion:
array1: &my_array_alias
- foo
- bar

array2:
- <<: *my_array_alias
- baz

I can't think of a reason not to use the same << as for merge keys.

@perlpunk perlpunk mentioned this issue May 27, 2019
@perlpunk
Member Author

perlpunk commented May 27, 2019

Thinking about implementation, IMHO suggestions 1 and 3 will be much harder, because one would have to add additional handling for this on several levels.
Suggestion 2 can be more verbose, especially when you want to merge more than one sequence, but it should be comparatively easy to add to an already existing merge key feature.

@DanySK

DanySK commented Dec 12, 2019

More a question than a suggestion, as I might be overlooking something obvious. Why not just provide a "flattened entry" key?

array1: &my_array_alias
- foo
- bar

array2:
+ *my_array_alias
- baz

This gives + the meaning "flatten".

Possibly, this could work regardless of the list syntax (JSON style or multiline). All these entries should have the same result:

flat: [a, b, c, 1, 2, 3]

classic:
- a
- b
- c
+ 
  - 1
  - 2
  - 3

mixed:
- a
- b
- c
+ [1, 2, 3]

json:
[a, b, c, + [1, 2, 3]]

@perlpunk
Member Author

@DanySK this looks nice, I agree. But it would mean a new syntax element, so more work for parsers, and another thing + which would be disallowed in plain scalars.
I think this should be avoided like explained in #35.
If we get a new syntax element (or more), then it should be something general which will be able to add more programmatic features, and the basic syntax should not get more complex.
I know this feature would be useful and I could use it myself, but I can't think of a simple solution.

@DanySK

DanySK commented Dec 13, 2019

Thanks @perlpunk
Would overloading a symbol that is currently disallowed in plain scalars (e.g., |) cause backward-compatibility issues? As I said, the idea came off the top of my head; I did not consider all the consequences.

Yes, there will be some more work for parsers, and yes, I do understand the YAML specification is already large. However, let me for a moment look at this from a higher perspective: why does YAML exist? Why don't we just use JSON or TOML? I can see two main reasons:

  1. readability is higher wrt JSON, and wrt TOML on complex specifications; but more importantly
  2. support for reuse via anchors, enabling DRY strategies.

From this point of view, I advocate considering further syntax changes, even ones that do not simplify the language, when they lead to clear, unambiguous, YAML-style forms of reuse.

@bbrouwer

What about syntax like this:

array1: &my_array_alias
- foo
- bar

array2:
-*my_array_alias
- baz

I'm not going to argue at length for this, just wanted to propose this syntax. I'm sure this suggestion requires parser and language changes, but it doesn't introduce any new reserved symbols and sort of fits in with the feel of a list by starting with a dash.

@ssbarnea

For sure there is a real need for this feature. We only need to be careful about how we describe its behaviour, because lists are ordered and can contain duplicate entries.

I do not see a need to allow duplicates (in fact, preventing them may count as an advantage). Still, the order of entries may be critical: do we want to merge at the top, bottom, or middle? Which duplicate entry takes priority, the one from the default or the one from the override?

@DanySK

DanySK commented Mar 21, 2020

@ssbarnea switching from lists to sets is madness, you'd lose also JSON compatibility. Let's keep the discussion clean, the only needed feature is a "flatten-inside" operator. But I can't see any way of introducing it without tinkering with the grammar

@ssbarnea

@ssbarnea switching from lists to sets is madness, you'd lose also JSON compatibility. Let's keep the discussion clean, the only needed feature is a "flatten-inside" operator. But I can't see any way of introducing it without tinkering with the grammar

I did not propose any divergence from JSON; it would be insane to do so. I only wanted to state that we need to define the merging logic very well, so that implementations do not behave differently.

I have had lots of cases where I had a default list and wanted to add new entries to it, at the top or bottom. I have also encountered cases where the tool loading the list would choke if it found duplicates (a list of packages to install). This is why I mentioned that set-like behaviour when doing the merge could be desirable.

@DanySK

DanySK commented Mar 21, 2020

Don't mix your specific use cases with the general framework. Performing a set union would, for instance, be irrelevant for most of my uses and deleterious (making the feature useless) for others.
I think the only meaningful merging behaviour for a list is list inserting.

What'd be the output of merging a [2] into a [1, 1] list? [1, 2]? That'd be very surprising to me.
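
The two merging behaviours being debated here can be contrasted with a small Python sketch. The function names are hypothetical and exist only for illustration; nothing like them is part of YAML or of any YAML library.

```python
def concat_merge(base, extra):
    """List insertion: keep every element, in order, duplicates included."""
    return list(base) + list(extra)


def set_like_merge(base, extra):
    """Set-like merge: drop any element that has already been seen."""
    merged = []
    for item in list(base) + list(extra):
        if item not in merged:
            merged.append(item)
    return merged


print(concat_merge([1, 1], [2]))    # [1, 1, 2]
print(set_like_merge([1, 1], [2]))  # [1, 2]
```

The set-like variant silently changes the base list itself ([1, 1] becomes [1]), which is the surprising result questioned above.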

@perlpunk
Member Author

perlpunk commented Mar 21, 2020

@bbrouwer

but it doesn't introduce any new reserved symbols

Well, it does introduce new syntax. The plain scalar -*foo is perfectly fine currently.

If I rewrite your example:

array2:
   -*my_array_alias

then its meaning changes to something which is valid YAML right now.
Also, this would look odd in flow style:

array2: [ -*my_array_alias ]

I still think that introducing new syntax for implementing just one specific programmatic function is wrong. And I bet many other people who actually implemented a YAML parser would agree.

It would also be totally different from merging mapping keys, which happens at the construction level, while a new syntax element introduces a new type of parsing event.

@perlpunk
Member Author

Don't mix your specific use cases with the general framework

I think this is also a good example why introducing a specific programmatic element is not a good idea.

Merge keys are already not trivial to implement (look into pyyaml and try to figure out how to implement forbidding duplicate mapping keys while keeping the merge key behaviour).

And some might rather want a deep merge, which is not what merge keys do. Would you introduce another merge key (e.g. <<<) just to implement deep merging?

The solution is a more generic programmatic syntax which allows functions and parameters.

This is already possible with local tags, for example in AWS CloudFormation files you can use the !Join tag to concatenate list items.
The disadvantage is that you cannot give the result of such a tag function another tag.

@perlpunk
Member Author

@ssbarnea If I added a merge-list feature similar to merge keys, I would just concatenate the lists.

Anything more complicated needs something like templating (Jinja, for example) or a more generic programmatic syntax.

@DanySK

DanySK commented Mar 23, 2020

@perlpunk "merging lists" is not necessarily programmatic. The way I see it, it is purely declarative. The problem with local tags is that they are not a standard solution, hence many use cases, actually the majority (e.g. CI configuration), won't benefit from them.
For my own software, I implement workarounds manually, but still I am convinced that an equivalent of the merge keys for lists would be useful.

Also re-reading your initial post, I believe:

array2:
- <<: *my_array_alias
- baz

looks good, and does not introduce any new syntax element.

@jcpunk

jcpunk commented Aug 31, 2020

I've been thinking generically about the future on this one and thought I might add a suggestion or two....

Since YAML is a human friendly data serialization standard and not a markup language, I'm less worried about the << syntax than the !!merge syntax. When editing the file there isn't really a visible difference between the two. There is, however, a grammar difference between them which seems relevant here.

With a merge I do want to "serialize" the data into this location. I view merges as a data storage/retrieval issue.

Vague thoughts that aren't well fleshed out follow:

Things I really want:

  • simple merge maps
  • merge sequence
  • deep merge

Sequence uniqueness feels like it manipulates the data rather than translating the result of multiple sequences put together. Or phrased differently, if you want the unique values of a sequence after a merge - then you don't want the node as is merged over here - you want to manipulate rather than store the data.

The !! name space is reserved, so can we use it for something fancy? Could we extend the !! name space a tiny bit by adding a !!! space for serialization specific functions? I'd want to limit those functions right up front to ONLY YAML node (scalar, sequence, or mapping) specific serialization.

Pseudo-YAML:

---
&SEQ_A:
  - 1
  - 2
  - 3

&SEQ_B:
  - 4
  - 3
  - 2
  - 1

&DEFAULT_MAP:
  a: "value a set by DEFAULT_MAP"
  b: "value b set by DEFAULT_MAP"
  sub_map:
    - 1
    - 3

&EXTRA_MAP:
  c: "value c set by EXTRA_MAP"
  d: "value d set by EXTRA_MAP"
  sub_map:
    - 2

&REPLACE_MAP:
  b: "value b set by REPLACE_MAP"
  d: "value d set by REPLACE_MAP"
  sub_map:
    a_map: 3

##
## possible simple map merge syntax where the top
##  level map key is just replaced if it exists
##

merge: !!!merge_map_replace([*DEFAULT_MAP, *EXTRA_MAP])
## which would become
## merge:
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by DEFAULT_MAP"
##   c: "value c set by EXTRA_MAP"
##   d: "value d set by EXTRA_MAP"
##   sub_map:
##   - 2

merge: !!!merge_map_replace([*DEFAULT_MAP, *REPLACE_MAP])
## which would become
## merge:
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by REPLACE_MAP"
##   d: "value d set by REPLACE_MAP"
##   sub_map:
##     a_map: 3

merge: !!!merge_map_replace([*DEFAULT_MAP, *EXTRA_MAP, *REPLACE_MAP])
## which would become
## merge: 
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by REPLACE_MAP"
##   c: "value c set by EXTRA_MAP"
##   d: "value d set by REPLACE_MAP"
##   sub_map:
##     a_map: 3

merge:
  !!!merge_map_replace([*DEFAULT_MAP, *EXTRA_MAP, *REPLACE_MAP])
  d: "value d set locally"
  e: "value e set locally"
## which would become
## merge: 
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by REPLACE_MAP"
##   c: "value c set by EXTRA_MAP"
##   d: "value d set locally"
##   e: "value e set locally"
##   sub_map:
##     a_map: 3


##
## possible simple seq merge syntax
##

merge: !!!join_seq([*SEQ_A, *SEQ_B])
## which would become
## merge: 
##   - 1
##   - 2
##   - 3
##   - 4
##   - 3
##   - 2
##   - 1

merge: !!!join_seq([*SEQ_A, *SEQ_B, 5, 6, 7])
## which would become
## merge:
##   - 1
##   - 2
##   - 3
##   - 4
##   - 3
##   - 2
##   - 1
##   - 5
##   - 6
##   - 7


##
## possible deep merge syntax
##  logically I'd build it from the merges above.
##  if the types don't match, replace the node
##    aka, merge_map_replace if I can't put the data together
##

merge: !!!merge_nodes([*DEFAULT_MAP, *EXTRA_MAP])
## which would become
## merge:
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by DEFAULT_MAP"
##   c: "value c set by EXTRA_MAP"
##   d: "value d set by EXTRA_MAP"
##   sub_map:
##     - 1
##     - 3
##     - 2

merge: !!!merge_nodes([*DEFAULT_MAP, *EXTRA_MAP, *REPLACE_MAP])
## which would become
## merge:
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by REPLACE_MAP"
##   c: "value c set by EXTRA_MAP"
##   d: "value d set by REPLACE_MAP"
##   sub_map:
##     a_map: 3

merge:
  !!!merge_nodes([*DEFAULT_MAP, *EXTRA_MAP, *REPLACE_MAP])
  d: "value d set locally"
  e: "value e set locally"
  sub_map:
    b_map: 4
## which would become
## merge:
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by REPLACE_MAP"
##   c: "value c set by EXTRA_MAP"
##   d: "value d set locally"
##   e: "value e set locally"
##   sub_map:
##     a_map: 3
##     b_map: 4

merge:
  !!!merge_nodes([*DEFAULT_MAP, *EXTRA_MAP, *REPLACE_MAP])
  d: "value d set locally"
  e: "value e set locally"
  sub_map:
    - 8
## which would become
## merge:
##   a: "value a set by DEFAULT_MAP"
##   b: "value b set by REPLACE_MAP"
##   c: "value c set by EXTRA_MAP"
##   d: "value d set locally"
##   e: "value e set locally"
##   sub_map:
##     - 8

merge: !!!merge_nodes([*SEQ_A, *SEQ_B])
## which would become
## merge:
##   - 1
##   - 2
##   - 3
##   - 4
##   - 3
##   - 2
##   - 1

merge:
  !!!merge_nodes([*SEQ_A, *SEQ_B])
  - 5
  - 6
  - 7
## which would become
## merge:
##   - 1
##   - 2
##   - 3
##   - 4
##   - 3
##   - 2
##   - 1
##   - 5
##   - 6
##   - 7

merge: 
  !!!merge_nodes([*SEQ_A, *SEQ_B])
  d: "value d set locally"
  e: "value e set locally"
## which would become
## merge:
##   d: "value d set locally"
##   e: "value e set locally"

I'm not sure a "short syntax" (<<) would add anything or make this more readable.

These thoughts aren't fully baked, but hopefully they are interesting....
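
The semantics described in the pseudo-YAML above can be approximated after loading, on plain Python data. This is only a sketch under stated assumptions: the three function names mirror the hypothetical !!! functions from the comment, the rules are inferred from its "which would become" examples, and the map values are shortened stand-ins for the ones used there.

```python
def merge_map_replace(maps):
    """Shallow merge: later maps replace top-level keys of earlier ones."""
    result = {}
    for m in maps:
        result.update(m)
    return result


def join_seq(parts):
    """Concatenate sequences; a non-sequence part is appended as one item."""
    result = []
    for part in parts:
        if isinstance(part, list):
            result.extend(part)
        else:
            result.append(part)
    return result


def merge_nodes(nodes):
    """Deep merge: maps merge recursively, sequences concatenate,
    and mismatched types fall back to replacement by the later node."""
    result = nodes[0]
    for node in nodes[1:]:
        if isinstance(result, dict) and isinstance(node, dict):
            merged = dict(result)
            for key, value in node.items():
                if key in merged:
                    merged[key] = merge_nodes([merged[key], value])
                else:
                    merged[key] = value
            result = merged
        elif isinstance(result, list) and isinstance(node, list):
            result = result + node
        else:
            result = node
    return result


# Shortened stand-ins for the anchored nodes in the comment above.
SEQ_A = [1, 2, 3]
SEQ_B = [4, 3, 2, 1]
DEFAULT_MAP = {"a": "DEFAULT", "b": "DEFAULT", "sub_map": [1, 3]}
EXTRA_MAP = {"c": "EXTRA", "d": "EXTRA", "sub_map": [2]}
REPLACE_MAP = {"b": "REPLACE", "d": "REPLACE", "sub_map": {"a_map": 3}}

print(merge_map_replace([DEFAULT_MAP, EXTRA_MAP])["sub_map"])         # [2]
print(merge_nodes([DEFAULT_MAP, EXTRA_MAP])["sub_map"])               # [1, 3, 2]
print(merge_nodes([DEFAULT_MAP, EXTRA_MAP, REPLACE_MAP])["sub_map"])  # {'a_map': 3}
print(join_seq([SEQ_A, SEQ_B, 5, 6, 7]))  # [1, 2, 3, 4, 3, 2, 1, 5, 6, 7]
```

The "set locally" keys from the examples would simply be one more map (or sequence) appended to the argument list.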

@muuvmuuv

muuvmuuv commented Sep 9, 2020

+1 :) My suggestion is the below which does not introduce new syntax and just uses the existing asterisk for merging.

a: &a
  - 1
  - 2

b: *a
  - 1
  - 3

@perlpunk
Member Author

perlpunk commented Sep 9, 2020

Look, if you suggest a new syntax (and both @jcpunk and @muuvmuuv did that), please implement it in one of the existing YAML parsers first. Discussing how it should look is useless if you have no idea how it would actually be implemented.

@perlpunk
Member Author

perlpunk commented Sep 9, 2020

@muuvmuuv

which does not introduce new syntax

That would mean there are no necessary changes to a YAML parser. But that's wrong.

@perlpunk
Member Author

perlpunk commented Sep 9, 2020

@perlpunk "merging lists" is not necessarily programmatic. The way I see it, it is purely declarative.

Well, whatever this means for you in this context - it is a transformation that has to happen in one of the stages of YAML loading.

Also re-reading your initial post, I believe:

array2:
- <<: *my_array_alias
- baz

looks good, and does not introduce any new syntax element.

This <<: *my_array_alias is simply a mapping with exactly one key, a merge key. This mapping will get transformed in the constructor stage of the loading process. In fact, since it is only one merge key and nothing else, it can be written shorter as:

array2:
- *my_array_alias
- baz

If you intended to show this as an example of a merge sequence, then please explain how the constructor is supposed to know that it is. Please implement it in PyYAML or a constructor of your choice.

@bughit

bughit commented Sep 11, 2020

@perlpunk

- *arr_alias is already valid syntax for adding the aliased object as an element, at least per the ruby parser, so it can't be overloaded for merging.

require 'yaml'

pp YAML.load <<~YAML
  anchors:
    - &arr1
      - a
      - b
    - &arr2
      - c
      - d
  arr_merge:
    - *arr1
    - *arr2
    - e
    - f
YAML
{"anchors"=>[["a", "b"], ["c", "d"]],
 "arr_merge"=>[["a", "b"], ["c", "d"], "e", "f"]}

@perlpunk
Member Author

@bughit I know

@ovangle

ovangle commented Sep 29, 2021

Not wanting to intrude on the conversation, but I'd like to suggest a possible refinement of @DanySK's array flattening idea.

If we assume that data in most real-world arrays will be homogeneously typed, is there any scope for adding a "block sequence style" indicator, similar to the scalar style indicators? This doesn't allow "per-item" array flattening (like the syntax above), nor does it change the syntax of scalar values at all; rather, it applies a schema-based transformation to all sequences in a YAML document, based on the first item in the sequence.

For example,

    x-list: &list
       - bar1
       - bar2

    mylist:
      - <<
      - foo
      - *list
      - baz

Would be parsed (using the current 1.2 syntax) as ['<<', 'foo', ['bar1', 'bar2'], 'baz'] (and similarly for the json-array syntax).

The schema transformation rule would essentially be "if '<<' is the first element of an array, then it is removed from the result and, if any item in the list is an array, it is flattened into the result (nested arrays are left untouched)". This is analogous to the transformation rule about '<<' if it appears in an object.

This transformation would produce the array ['foo', 'bar1', 'bar2', 'baz'], which is I think what [the subset of users who use anchors/aliases] would expect.

The question is how many unexpecting users and real-world yaml files would be affected by '<<' special treatment if it occurs as the first element of an array? There are two ways people could get stung by this -- legacy files which are being converted over to the new format, and people who don't expect ['<<'] == []. But, it should be similar enough to the idea of a "header" token at the start of a scalar value?

Any concerns in this area would be mitigated by choosing a longer (and therefore less likely to collide or be entered unexpectedly) token as the array header e.g. '<<flatten'? Essentially any change made in this area is going to break some theoretically existing yaml files and/or make the contents of the file less literal, so if it is worth doing (and I would err on the side of "yes", making anchors/aliases more expressive and consistent is worth it) then it becomes a question of "what is the least invasive change we could make to the grammar?"

So yeah, after all that, the original option 2 gets my vote, although with slightly different semantics than I assume you initially intended.


Some edge-ish cases, nowhere near complete:

         x-seq1: &list1
            - value1
            - value3
         x-seq2: &list2
            - <<
            - *list1
            - value2
         x-seq3: &list3
            - *list1
            - value4
         # "Escaping" '<<' as the first element of the list
         l1: ['<<', '<<']       #  Expected ['<<']
         
         l2: ['<<', ['a', 'b'], ['a', ['b', 'c']]]    # Expected ['a', 'b', 'a', ['b', 'c']]
         
         # More than one reference in the list
         l3: [*list1, *list2]    # Expected [['value1', 'value3'], ['value1', 'value3', 'value2']]
         l4: ['<<', *list1, *list3]       # Expected ['value1', 'value3', ['value1', 'value3'], 'value4']
         
         # Schema rule applied to nested lists?
         l5: ['<<', ['<<', *list1], *list3]         # Expected [['value1', 'value3'], ['value1', 'value3'], 'value4']

Update 1/10:

  • "what people would expect" is too generic and vague
  • Expand thoughts about version compatibility issues
  • Fixed keys in example
  • Added self-referential "update" section outlining edits to original post.
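
The transformation rule proposed in this comment can be sketched as a post-load step on plain Python lists. This is only an illustration under stated assumptions: the function name is hypothetical, and it handles a single list rather than walking a whole document.

```python
MARKER = "<<"


def apply_flatten_header(seq):
    """If the first element is '<<', drop it and flatten the remaining
    list-valued items exactly one level; otherwise return seq unchanged."""
    if not seq or seq[0] != MARKER:
        return seq
    result = []
    for item in seq[1:]:
        if isinstance(item, list):
            result.extend(item)  # one level only; deeper lists stay nested
        else:
            result.append(item)
    return result


list1 = ["value1", "value3"]
list3 = [list1, "value4"]

print(apply_flatten_header(["<<", "foo", ["bar1", "bar2"], "baz"]))
# ['foo', 'bar1', 'bar2', 'baz']
print(apply_flatten_header(["<<", "<<"]))
# ['<<']
print(apply_flatten_header(["<<", list1, list3]))
# ['value1', 'value3', ['value1', 'value3'], 'value4']
```

Because the sketch is not recursive, it leaves open the question raised in the last edge case, namely whether the rule should also be applied to nested lists before the outer list is flattened.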

@gsmethells

gsmethells commented Oct 11, 2021

The essence of the desire is so basic, I find it hard to believe the language has yet to provide a way to merge sequences. This needs stronger consideration.

Without this feature, DRY is impossible in many, many config files in many, many projects that use YAML.

@gsmethells

@ovangle I recommend you take a step back and consider how you're communicating -- we understand you're passionate about this issue but you don't need to use inflammatory language to get your point across. please be nice -- there are humans on the other side of the cable

@ovangle

ovangle commented Oct 13, 2021

@gsmethells My sincerest apologies -- as you probably noticed I deleted my message immediately after dispatching it (although that doesn't stop it being delivered to anyone subscribed to this). I am not typically that rude, nor am I even particularly passionate about what I was saying. It started out as a simple "you probably should think about why this isn't the easiest thing to change", but yeah, I was a bit overcaffeinated and hit send before taking the time to reflect on what I was saying.

@gsmethells

@ovangle no hard feelings. Thank you for your hard work on this issue.

@ingydotnet
Member

I'm going to chime in here with a few successive comments.

The right way to do all functional transformation in YAML is with tags. Every mapping, sequence and scalar has a tag assigned to it either explicitly or implicitly during the load() process. The tag is associated with a function that controls how the data is processed into a native (Python here) data structure.

In YAML 1.1 the << key is implicitly tagged with a !!merge tag that triggers a merging transformation.

Here's a first pass solution with real code using PyYAML:

#!/usr/bin/env python3

from yaml import *

def join(ldr, node):
    # Build the sequence, splicing in the items of any nested list.
    l = []
    for e in ldr.construct_sequence(node, deep=True):
        if type(e) is list:
            l.extend(e)
        else:
            l.append(e)
    return l
# Attach the function to the local tag !++.
add_constructor('!++', join)

yaml = """\
data:
- &seq1
  - aaa
  - bbb

joined: !++
- foo
- *seq1
- bar
"""

print(load(yaml, Loader))

Which produces:

{'data': [['aaa', 'bbb']], 'joined': ['foo', 'aaa', 'bbb', 'bar']}

This solution uses a local tag !++ to flatten any lists in a container list. It uses an alias to another list that must be stored somewhere in the YAML document.

As you can see we created our own tag and used the punctuation tag characters ++ instead of something like !join. This is just a personal style choice here, that you can decide on yourself.

Then we associated a transformation function with the tag. This whole scenario assumes you have a reasonable YAML framework. PyYAML is a pretty decent YAML framework overall.

@ingydotnet
Member

A problem with the last comment solution is that you can't have lists in the container list that don't get flattened.

Here's a modification where we explicitly mark the list elements that we want to splat.

#!/usr/bin/env python3

from yaml import *

def join(ldr, node):
    v = ldr.construct_sequence(node, deep=True)
    l = []
    for e in v:
        # Only elements marked by splat() are spliced in.
        if type(e) is tuple and e[0] == 'splat':
            l.extend(e[1][0])
        else:
            l.append(e)
    return l
add_constructor('!++', join)

def splat(ldr, node):
    # Mark the wrapped sequence so that join() can recognize it.
    return ('splat', ldr.construct_sequence(node, deep=True))
add_constructor('!*', splat)

yaml = """\
data:
- &seq1
  - aaa
  - bbb

joined: !++
- [foo, yoo: hoo]
- !* [*seq1]
- bar
"""

print(load(yaml, Loader))

which prints:

{'data': [['aaa', 'bbb']], 'joined': [['foo', {'yoo': 'hoo'}], 'aaa', 'bbb', 'bar']}

Here the container list contains one list that we don't flatten and one that we do.
We tag the list we want to splat with a custom local !* tag attached to a splat function.
Since we can't tag an alias in YAML 1.2, we need to wrap it in a sequence.
Not ideal, but not terrible.

Let's see if we can do better...

@ingydotnet
Member

Here we have almost the same thing, but notice we don't have to specify a !++ tag anymore.

#!/usr/bin/env python3

from yaml import *

def join(ldr, node):
    v = ldr.construct_sequence(node, deep=True)
    l = []
    for e in v:
        if type(e) is tuple and e[0] == 'splat':
            l.extend(e[1][0])
        else:
            l.append(e)
    return l
add_constructor('tag:yaml.org,2002:seq', join)

def splat(ldr, node):
    return ('splat', ldr.construct_sequence(node, deep=True))
add_constructor('!*', splat)
    
yaml = """\
data:   
- &seq1     
  - aaa
  - bbb     
    
joined:
- [foo, yoo: hoo]           
- !* [*seq1]
- bar                
"""

print(load(yaml, Loader))

It prints:

{'data': [['aaa', 'bbb']], 'joined': [['foo', {'yoo': 'hoo'}], 'aaa', 'bbb', 'bar']}

same as before.

Note even without the !++ tag we still have the join function. We just attached it to !!seq so every sequence that has a !* splat element works.

@ingydotnet
Member

ingydotnet commented Oct 15, 2021

In this final rendition, we add a couple cool things:

#!/usr/bin/env python3

from yaml import *

data = {
    'seq1': ['aaa', 'bbb'],
    'seq2': ['ccc', 'ddd'],
}

def get_data(ldr, node):
    k = ldr.construct_scalar(node)
    return data.get(k, [])
add_constructor('!$', get_data)

def join(ldr, node):
    l = []
    for e in ldr.construct_sequence(node, deep=True):
        if type(e) is tuple and e[0] == 'splat':
            l.extend(e[1][0])
        elif type(e) is dict and e.get('<', None) is not None:
            l.extend(e['<'])
        else:
            l.append(e)
    return l
add_constructor('tag:yaml.org,2002:seq', join)

def splat(ldr, node):
    return ('splat', ldr.construct_sequence(node, deep=True))
add_constructor('!*', splat)

yaml = """\
seq3: &seq3
- xxx
- yyy
joined:
- !* [ !$ seq1 ]
- [ foo, yoo: hoo ]
- <: !$ seq2
- bar
- <: *seq3
"""

print(load(yaml, Loader))

First off we made a !$ tag to import data from outside of the YAML. It could have been from a file or a database or anything, but here it's just a Python dict. As you can see it was trivial to do.

For the seq1 list we splatted it with !* as before. For seq2 we instead introduce using a special < key, with a value that we want to use. This lets us not have to wrap the value in a sequence as before.

Note that << could not be used here because PyYAML already wants to use that for merging maps.

So now we kind of can merge lists today in YAML 1.2 with a slightly customized PyYAML like:

- string
- <: *list
- a: thing
- [ with, things ]  # not flattened
- <:
  - hello
  - world

@ingydotnet
Member

ingydotnet commented Oct 15, 2021

Now that the YAML language development team has just released the YAML specification revision 1.2.2, we are actively working on the next specification and reference implementations.

I can't say for sure what YAML 1.3 will look like exactly, but I'm pretty certain that a whole host of loading transformations will be specified, including merging sequences. That means you'll be able to do these transformations in the same manner from framework to compliant framework.

We can do most of this without any syntax modifications, but we've been working on dozens of back compat ideas that will make these functional things super slick, while keeping today's YAML working as-is.

If you want to engage directly with us, stop by https://matrix.to/#/#chat:yaml.io

@DanySK

DanySK commented Oct 15, 2021

This is amazing news @ingydotnet. I find all the examples very interesting, yet I believe the whole point of this discussion on merge sequences is standardization: YAML is a de-facto standard for the configuration of many services, including CI, where users do not have control over the interpretation of the configuration file.
Also, I note, different providers may come up with different implementations, generating fragmentation.

@ingydotnet
Member

ingydotnet commented Oct 15, 2021

Without saying anything concretely about 1.3 yet, imagine that:

  • YAML defined a set of useful transformation functions (including import, merge, concat, etc) and the reference implementations encoded those as a standard-library.
  • It was possible to compose new functions (from standard functions) in YAML itself.
  • It was possible to associate functions and tags in YAML itself.

At that point I can imagine a well defined ecosystem where you can do these basics everywhere. Then you extend that by importing complex YAML extension definitions in a single phrase. You'll note that GitHub Actions is based around user defined extensions that you invoke with YAML like - uses: actions/checkout@v2. You might see in the future YAML like %import danysk/merge-utils@v1. ;)

We are well aware of the current fragmentation of YAML implementations and the lack of specific guidance offered by YAML to date. The 1.3 development process involves:

  • The specification of the syntax and data models
  • A comprehensive test suite for everything specified
  • Multiple reference implementations backing up everything
  • Detailed developer guides to explain the best practices
  • Engaging with the YAML Community

We intend to do these things simultaneously as we evolve the YAML data language to better serve all of its use cases.

@osher

osher commented Nov 14, 2021

[Edited: fixed quite a few typos and format errors, sorry for those who get it unedited in the mail :( ]
[Edited2: added a requirement and a test-demo]

Well. I hope we can at least agree on the basics.
(some of the proposals here miss a few, IMHO)

The goal: to allow the author to express lists in a DRY manner.

The requirement (at least as I see it):

  • should support merging of any number of lists into one list
  • should allow to control the order of elements in the resulting list
  • should allow adding explicit items before, after and between merged lists
  • should support merging lists of lists without over flattening
  • should allow to alias the resulting list in a new alias
  • should not be ambiguous with object merges
  • should be as clear in flow-form as in the indent-based form

I hope that helps.

...

Now let's discuss implementation.

Personally I think that there should be consistency in form - i.e - do not provide a different marker for merging lists than for merging maps. A merge is a merge (even it's implementation details vary depending on the target).
I also am reluctant to flex requirements because of implementation considerations, but not totally against it.
IMHO - << could be recognized the same way that - is.

  • While - accepts an in-place item, << expects a list and flattens it exactly one level.
    (expecting a list - i.e - throws an error when the expression to the right resolves to anything but a list)
  • Both imply a list context
    (i.e - an error should be thrown when this definition could not be applied to the current context).

If you want to get petty about the semantics of merge vs expand: I think the distinction between map-merge and list-merge is expressed by the difference between << used as a key (i.e. merge) and << used like - (i.e. expand).

As simple as that :)

i.e:

src: &src [ { from: the } , src-array ]

merged1:
  - foo
  << *src
  - baz
# [ foo, { from: the } , src-array, baz ]

merged2:
  - foo
  << [ *src ]
  - baz
# [ foo, [ { from: the } , src-array ], baz ]
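
The "flatten exactly one level, and fail on anything but a list" rule described above can be sketched outside of YAML. The following is a minimal Python illustration, not an implementation of any YAML loader: `Expand` and `build_list` are hypothetical names, with `Expand(x)` standing in for a `<< x` entry and every other element standing in for a plain `- x` item.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Expand:
    """Hypothetical stand-in for a `<<` entry in a list context."""
    value: Any


def build_list(entries: List[Any]) -> List[Any]:
    """Build a list, splicing Expand entries in exactly one level deep."""
    result: List[Any] = []
    for entry in entries:
        if isinstance(entry, Expand):
            if not isinstance(entry.value, list):
                # The proposal asks for an error when `<<` gets a non-list.
                raise TypeError("'<<' in a list context expects a list")
            result.extend(entry.value)  # flatten one level only
        else:
            result.append(entry)  # plain `-` item, kept as-is
    return result


src = [{"from": "the"}, "src-array"]
merged1 = build_list(["foo", Expand(src), "baz"])
merged2 = build_list(["foo", Expand([src]), "baz"])
print(merged1)
print(merged2)
```

Here `merged1` splices the two items of `src` into the outer list, while `merged2` keeps `src` nested because the extra `[ ]` and the expansion cancel each other.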

Now let's try to work with that:

sources:
  scalars: &scalars [ 1, 2 ]
  objects: &objects [ {a: a}, {b: b} ]
  lists: &lists [ [1], [2] ]
  some-object: &some-object { some: object }

should support merging of any number of lists into one list: &example1
  flow form: [ << *scalars, << *objects, << *lists ]
  indented:
    << *scalars
    << *objects
    << *lists
  expected:  |-
     [ 1, 2, { "a": "a" }, { "b": "b" }, [1], [2] ]

should allow to control the order of elements in the resulting list: *example1 #well.. common

should allow adding explicit items before, after and between merged lists:
  flow form: [ inplace1, << *scalars, inplace2, << *scalars, inplace3 ]
  indented:
    - inplace1
    << *scalars
    - inplace2
    << *scalars
    - inplace3
  expected:  |-
    [ "inplace1", 1, 2 , "inplace2", 1, 2 , "inplace3" ]

should support merging lists of lists without over flattening:
  indented:
    << *lists
    - in-place-string
    << [ [ in-place, expanded-list ] ]
    - << *lists    # TRICKY on purpose, see what it means in the flow form
  flow form: [ << *lists, in-place-string, << [ [ in-place, expanded-list ] ],  [ << *lists ]  ]
  expected: |-
    [
       [ 1 ],
       [ 2 ],
       "in-place-string",
       [ "in-place", "expanded-list" ],    # yeah, the << and the outer [ ] cancel each other... :P
       [ [ 1 ], [ 2 ] ]                    #  \ it's useless for explicit items, but crucial for references
    ]
  
should allow to alias the resulting list in a new alias:
  indented: 
    aliased: &aliased
       << *scalars
       << *scalars
    referenced: *aliased
  flow-form: [ << *scalars, << *scalars ]
  expected:  |-
    { 
       "aliased": [ 1, 2, 1, 2 ],
       "referenced": [ 1, 2, 1, 2 ]       # but as a reference...
    }

should not be ambiguous with object merges:
  indented:
      - *some-object
      << *scalars
      - <<: *some-object
        and: some
        more: attributes
  flow-form: [ *some-object, << *scalars, { <<: *some-object, and: some, more: attributes } ]
  expected: |-
    [
      { "some": "object" },
      1,
      2,
      { "some": "object", "and": "some", "more": "attributes" }
    ]
      
should be as clear in flow-form as in the indent-based form: |
   well, since this is subjective, 
   the points of possible confusion are here for debate :) 

Being pragmatic: having seen such an inflammatory reaction to the discussion of the << thingy at #35, and despite what I said, I can accept that merge and expand can be two such different things that the language would want to express them differently, even if the context could be enough to imply the difference. So I won't be too grumpy about working with a new syntax element dedicated to a strictly list-only expander. Like I said, I just don't think that's necessary, and yet it's better than not having anything for these use cases.

BTW - I'm surprised that nobody suggested the ... expand operator that was accepted in JavaScript - especially now that we're already considering new syntax elements (I just think that's not necessary, but you know, if the flow objects swim with it).
This will simply mean that ... expects a list (referenced or not!) and flattens it one level.
Non-scalar values could be referenced or copied - whatever is consistent with how maps are merged.
(just like what << should mean in a list context in the ideal proposal)

@osher

osher commented Nov 16, 2021

If we're not using <<, then < could be just enough.

On one hand, < keeps the list better aligned.
On the other hand, a) << is more consistent with what we've got, and b) the misalignment calls for attention and tells the reader: hey, something is going on here. Too much attention? I don't know.

I think I'd still vote for <<, even though I kinda like these aesthetics better. They just might be clear enough...

should allow adding explicit items before, after and between merged lists:
  flow form: [ inplace1, < *scalars, inplace2, < *scalars, inplace3 ]
  indented:
    - inplace1
    < *scalars
    - inplace2
    < *scalars
    - inplace3
  expected:  |-
    [ "inplace1", 1, 2 , "inplace2", 1, 2 , "inplace3" ]

@ingydotnet
Member

@osher You are suggesting a YAML syntax change for one function. That's not on the table.

To be clear, and hopefully you understand this, the current << method for declaring a mapping merge function, is not YAML syntax either. It is simply a plain mapping key that some loaders use to indicate that a merge constructor function should be used during a load operation.

YAML already has a general purpose way to invoke any function. It's the !tag system.

key:
  <<: *map1
  foo: bar

is essentially the same as:

key: !merge-maps
- *map1
- foo: bar

So you can concatenate sequences, or apply a grillion other functions, in YAML (now, in 1.2):

key: !concat-seqs
- *seq1
- - foo
  - bar
- *seq2
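
As a rough sketch of how a loader might back these two tags, here is one possible PyYAML setup. It assumes PyYAML is installed; the `FuncLoader` class, the helper names, the sample document, and the choice that later maps override earlier ones in `!merge-maps` are all assumptions made for illustration, not something defined by the YAML spec or by this thread.

```python
import yaml


class FuncLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the extra tags stay local to this loader."""


def _children(node, expected, tag):
    # Both tags are applied to a sequence whose items share one node kind.
    if not isinstance(node, yaml.SequenceNode):
        raise yaml.constructor.ConstructorError(
            None, None, f"{tag} expects a sequence", node.start_mark)
    for child in node.value:
        if not isinstance(child, expected):
            raise yaml.constructor.ConstructorError(
                None, None, f"{tag} got an unexpected item", child.start_mark)
    return node.value


def concat_seqs(loader, node):
    out = []
    for child in _children(node, yaml.SequenceNode, "!concat-seqs"):
        out.extend(loader.construct_sequence(child, deep=True))
    return out


def merge_maps(loader, node):
    out = {}
    for child in _children(node, yaml.MappingNode, "!merge-maps"):
        # Assumption: later maps override earlier ones.
        out.update(loader.construct_mapping(child, deep=True))
    return out


FuncLoader.add_constructor("!concat-seqs", concat_seqs)
FuncLoader.add_constructor("!merge-maps", merge_maps)

DOC = """
seq1: &seq1 [a, b]
seq2: &seq2 [c, d]
map1: &map1 {x: 1, foo: old}
list: !concat-seqs
- *seq1
- - foo
  - bar
- *seq2
map: !merge-maps
- *map1
- foo: bar
"""

data = yaml.load(DOC, Loader=FuncLoader)
print(data["list"])
print(data["map"])
```

Aliases need no special handling here: the composer hands the constructor the anchored node itself, so `*seq1` arrives as an ordinary sequence node.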

What we are hoping to do in YAML 1.3 and beyond, is define a standard library of useful data functions.

We are also working on ways to make the tagging implicit so you don't need to use explicit !function tags all over the place.

Hopefully there will be very few changes to the YAML syntax proper. They would mostly be around making YAML more extensible overall. Certainly not for some particular data manipulation function.

@2colours

Functions in a data description format sound rather problematic to begin with, from design, implementation, and security perspectives alike.

@osher

osher commented Apr 3, 2023

No, I disagree. I'm not proposing a change of a function; the proposal is an addition that answers a real need, as we have a solution for maps and we neglect lists.

If we don't like overloading << by context, then for explicitness we can use
<< for objects and < for lists.

I also do not agree that it's functional or logic; the proposal is purely structural and declarative, as YAML is.

The proposal bypasses the need for merge logic and lets the user compose their value.
It follows the same thing we do with <<, and adjusts it for lists with <.

Let me reiterate:

sources:
  scalars: &scalars [ 1, 2 ]
  objects: &objects [ {a: a}, {b: b} ]
  lists: &lists [ [1], [2] ]
  some-object: &some-object { some: object }

should support concatenating of any number of lists into one list: &example1
  flow form: [ < *scalars, < *objects, < *lists ]
  indented:
    - < *scalars
    - < *objects
    - < *lists
  expected:  |-
     [ 1, 2, { "a": "a" }, { "b": "b" }, [1], [2] ]

should allow to control the order of elements in the resulting list: *example1 #well.. common

should allow adding explicit in-place items before, after and between merged lists:
  flow form: [ inplace1, < *scalars, inplace2, < *scalars, inplace3 ]
  indented:
    - inplace1
    - < *scalars
    - inplace2
    - < *scalars
    - inplace3
  expected:  |-
    [ "inplace1", 1, 2 , "inplace2", 1, 2 , "inplace3" ]

should support merging lists of lists without over flattening:
  indented:
    - < *lists
    - in-place-string
    - < [ [ in-place, expanded-list ] ]
    - [ < *lists ]   # TRICKY on purpose, see what it means in the flow form
  flow form: [ < *lists, in-place-string, < [ [ in-place, expanded-list ] ],  [ < *lists ]  ]
  expected: |-
    [
       [ 1 ],
       [ 2 ],
       "in-place-string",
       [ "in-place", "expanded-list" ],
       [ [ 1 ], [ 2 ] ]
    ]
  
should allow to alias the resulting list in a new alias:
  indented: 
    aliased: &aliased
       - < *scalars
       - < *scalars
    referenced: *aliased
  flow-form: [ < *scalars, < *scalars ]
  expected:  |-
    { 
       "aliased": [ 1, 2, 1, 2 ],
       "referenced": [ 1, 2, 1, 2 ]       # but as a reference...
    }

should not be ambiguous with object merges:
  indented:
      - *some-object
      - < *scalars
      - <<: *some-object
        and: some
        more: attributes
  flow-form: [ *some-object, < *scalars, { <<: *some-object, and: some, more: attributes } ]
  expected: |-
    [
      { "some": "object" },
      1,
      2,
      { "some": "object", "and": "some", "more": "attributes" }
    ]

should support string items with `<` as the first character:
  indented:
    - "<html />"
    - < *scalars
  flow-form: [ "<html />", < *scalars ]
  expected: |-
     [ "<html />", 1, 2 ]
  
      
should be as clear in flow-form as in the indent-based form: |
   well, since this is subjective, 
   the points of possible confusion are here for debate :)

@6543

6543 commented Apr 28, 2023

I created https://codeberg.org/6543/xyaml to get the merging sequences (and later merging maps) functionality on top of this lib ... I'd like to stick 100% to what the official YAML spec says about merging ... so feedback is welcome :)

@ingydotnet
Member

ingydotnet commented Apr 28, 2023

@6543 ,

To be clear, the official YAML spec contains no mention of << or merging of any kind.

So far I have no real problem with https://codeberg.org/6543/xyaml as an extension to a particular YAML framework; which is what it appears to be claiming it is.

From a high level view YAML was created from the beginning to be used in such a way that various projects and usage domains would use a YAML framework configured in a specific way.

We (somewhat sadly) called this configuration "YAML Schema", which imprecisely means the overall configuration of your YAML framework (aka Load and Dump stack). It is not a validation schema like JSON Schema. Naming is HARD. :)

A YAML loader is a stack of transformations from YAML text to language native data structures. In that process (internally) each node (scalar, sequence, mapping) gets assigned a tag, and that tag maps to a transformation (function) that creates the desired native object.
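
A toy model of that last stage, written as a Python sketch: every node carries a tag, and a table (the "schema" in the sense used here) maps each tag to the function that builds the native value. The `Node` class, the `SCHEMA` table, and the `!concat-seqs` entry are illustrative assumptions, not the API of any real YAML library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Node:
    tag: str    # e.g. "!!int", "!!seq", or an application tag
    value: Any  # str for scalars, list of Node for sequences,
                # list of (Node, Node) pairs for mappings


def construct(node: Node, constructors: Dict[str, Callable]) -> Any:
    """Look up the node's tag and run the matching constructor function."""
    try:
        fn = constructors[node.tag]
    except KeyError:
        raise ValueError(f"no constructor registered for tag {node.tag}")
    return fn(node, constructors)


# The "schema": one transformation function per tag.
SCHEMA: Dict[str, Callable] = {
    "!!str": lambda n, c: n.value,
    "!!int": lambda n, c: int(n.value),
    "!!seq": lambda n, c: [construct(x, c) for x in n.value],
    "!!map": lambda n, c: {construct(k, c): construct(v, c) for k, v in n.value},
    # An application-defined tag is just one more entry in the table.
    "!concat-seqs": lambda n, c: [
        item for seq in n.value for item in construct(seq, c)
    ],
}

tree = Node("!!map", [
    (Node("!!str", "key"),
     Node("!concat-seqs", [
         Node("!!seq", [Node("!!int", "1"), Node("!!int", "2")]),
         Node("!!seq", [Node("!!str", "foo")]),
     ])),
])
result = construct(tree, SCHEMA)
print(result)
```

Swapping in a different table changes what the same node tree loads as, which is the sense in which a domain can configure its own schema.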

Effectively what your new project is, is a specific loader schema (configuration) for a particular Go based YAML loader.

Your doc example:

array2:
- <<: *my_array_alias
- baz

could work in YAML 1.2 by assigning (resolving) the tag !merge-key to the << scalar (because it is a plain scalar mapping key of <<), and then assigning the !concat tag to the sequence because it contains a single-pair mapping whose key is tagged !merge-key.
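
The effect of that rule can be approximated on already-loaded data. The Python sketch below assumes a loader that leaves `<<` inside a sequence item as an ordinary string key rather than applying a mapping merge; `concat_merge_items` is a hypothetical helper name, and this is post-processing, not the tag resolution described above.

```python
from typing import Any, List


def concat_merge_items(items: List[Any]) -> List[Any]:
    """Splice every single-pair {"<<": [...]} item into the surrounding list."""
    out: List[Any] = []
    for item in items:
        if isinstance(item, dict) and list(item) == ["<<"]:
            spliced = item["<<"]
            if not isinstance(spliced, list):
                raise TypeError("'<<' inside a sequence expects a sequence")
            out.extend(spliced)
        else:
            out.append(item)  # ordinary item, including multi-key mappings
    return out


my_array = ["foo", "bar"]
# Loaded form of:  array2: [ {<<: *my_array_alias}, baz ]
array2 = concat_merge_items([{"<<": my_array}, "baz"])
print(array2)
```

Only single-pair mappings are treated as splices, so a mapping item that uses `<<` next to other keys is left untouched.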

Using this (new schema) in your Go based domain might become a popular choice. Dunno.

Many people seem to think that YAML is like JSON, where specific plain (unquoted) scalars mean a certain native type, but that's not the intent at all.
The intent was always that each YAML action is processed according to a specific "schema", and your domain can customize those rules to its heart's content.

If your domain is cross-process-communication, the processes need to agree on the exact schema at play. The spec (somewhat poorly imho) attempts to define a default composition schema for that purpose.

The spec really should be more clear on this topic (how schema application works and how to define the schema your application needs) in general, but specific schemas probably shouldn't be defined in the spec.
Certainly the definition of specific functional representation transformations like map-merge and seq-concat are not within the scope of the language spec.

We are working to make the definition of a YAML schema (the full configuration of every stage of both the load and dump stacks) declarative, with transformation capabilities expanding vastly beyond merge/concat to a full standard library level. For any given domain, your desired YAML behavior should be attainable by simply defining it in a declarative YAML schema file.
I see this as attainable without any changes to the 1.2 spec.

@6543

6543 commented Apr 28, 2023

thanks for the clarification and looking into it ☝️

@kristian

kristian commented Apr 5, 2024

Hey @ingydotnet, I took inspiration from your suggestion:

key: !concat-seqs
- *seq1
- - foo
  - bar
- *seq2

And created a !!concat-seqs type for the js-yaml library. It worked flawlessly. Thanks again for the suggestion.

@ingydotnet
Member

ingydotnet commented Apr 5, 2024

@kristian First off congrats. :)

Secondly, thank you. That was the push I needed to get the elusive NodeJS binding for YAMLScript working.

I've been waiting for the right time to share YAMLScript in this particular issue.

Not only can you merge and concat, the standard lib has 100s of functions. You can define your own functions (in your YAML or externally) and you can use many external libraries.

All YAML config files are valid YAMLScript files, and load the same with no code evaluation. Adding a top level !yamlscript/v0 tag enables code evaluation where you need it.

The language is still young, but we have loader libraries for 8 programming languages currently, including Python, Java, Rust and NodeJS. :)


I have to note that anchor / alias support is currently missing, but it's the very next thing on my list (should be in next 1-3 days).
Then you can:

!yamlscript/v0/

key::
  apply concat: !
  - *seq1
  - - foo
    - bar
  - *seq2

But here's a pretty interesting thing you can do today:

$ cat concat.yaml 
---
foo:
- one
- two

--- !yamlscript/v0/

key::
  apply concat::
  - ! $$.foo
  - !
    load: 'other.yaml'
  - - the
    - end
$ cat other.yaml
- buckle
- shoe
$ ys -Y concat.yaml 
key:
- one
- two
- buckle
- shoe
- the
- end

Note how concat.yaml has 2 documents and the 2nd refers to the first with $$.

Also the ReadMe for the NodeJS binding has some pretty interesting functionality:
https://github.com/yaml/yamlscript/tree/main/nodejs#nodejs-usage


Also next up is getting more documentation online.

Here's a past talk and an upcoming talk on YS:

I'm interested in hearing people's thoughts and questions about YAMLScript.

@ingydotnet
Member

ingydotnet commented Apr 9, 2024

Updates...

I've started writing the docs: https://yamlscript.org/doc/

I'm working on aliases now.

One cool thing you'll be able to do in code mode is:

vars: &v
  greeting: Hello
  name: world
greeting:: "$(*v.greeting), $(*v.name)!"

IOW, path off of aliases.
Actually *v is not a real alias but it acts the same way.


Working through this use case I realized that:

!yamlscript/v0/

key::
  apply concat: !
  - *seq1
  - - foo
    - bar
  - *seq2

Is not as nice as:

key: !concat-seqs
- *seq1
- - foo
  - bar
- *seq2

So I decided to add tag function calls. You'll be able to:

!yamlscript/v0/

key: !concat*:
- *seq1
- - foo
  - bar
- *seq2

Note that concat is just one of hundreds of builtin functions that I didn't define.
It takes multiple sequences as arguments. Calling it with the * "splats" the sequence into 3 arguments here.

Compare that to this call which takes 1 sequence argument:

!yamlscript/v0/

key: !count:
- *seq1
- - foo
  - bar
- *seq2

Which would produce {"key": 3}
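
The difference between the splatted call and the plain call has a close analogue in Python's argument unpacking. This is only an analogy and not how YAMLScript is implemented; the `concat` and `count` functions are stand-ins written for the sketch, and the values of `seq1` and `seq2` are assumed.

```python
from itertools import chain


def concat(*seqs):
    """Takes any number of sequences and joins them into one list."""
    return list(chain(*seqs))


def count(seq):
    """Takes exactly one sequence and returns its length."""
    return len(seq)


seq1 = ["a", "b", "c"]  # assumed value for &seq1
seq2 = ["x", "y", "z"]  # assumed value for &seq2
args = [seq1, ["foo", "bar"], seq2]

splatted = concat(*args)  # like !concat*: the list becomes 3 arguments
counted = count(args)     # like !count: the list is passed as 1 argument
print(splatted)
print(counted)
```

With the splat the three inner sequences are joined into one; without it the function sees a single sequence of three items, hence the 3.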


Just to be clear, as magic as all this seems, it's just YAML 1.2 syntax loaded with a very special loader library (your YAMLScript YAML loader).
Effectively YAMLScript has a very detailed YAML Schema that determines its meaning.

If you want to use YAMLScript but can't change your YAML loader for some reason (like if you wanted to use it for Ansible playbooks) you can simply use the ys command as a preprocessor:

$ ys -Y playbook.ys > playbook.yaml

@ingydotnet
Member

Forgot to mention this but YAML doesn't support aliasing anchors defined in another document (in a multi-doc file or stream).

In code mode, YAMLScript does. So this would work fine:

vars: &v
  greeting: Hello
  name: world

--- !yamlscript/v0/
greeting:: "$(*v.greeting), $(*v.name)!"

as would:

--- !yamlscript/v0 &v load('vars.yaml')

--- !yamlscript/v0/
greeting:: "$(*v.greeting), $(*v.name)!"

Or to extend the example at hand:

- &seq1 [a, b, c]
- &seq2 [x, y, z]

--- !yamlscript/v0/

key: !concat*:
- *seq1
- - foo
  - bar
- *seq2

In these examples we never showed where &seq1 and &seq2 were being defined.
In YAML it's problematic because you can't define values for reuse without making the definitions part of the entire result.

With YAMLScript you have lots of places to define those kinds of values, both inline and from external sources.

@ingydotnet
Member

ingydotnet commented Apr 11, 2024

I finally got all this working and released: https://github.com/yaml/yamlscript/releases/tag/0.1.54

Here's a working example of what you can now do, by using a YAMLScript library to load your YAML files:

$ cat example1.yaml
- &seq1 [apple, banana, carrot]

--- !yamlscript/v0
=>: &seq2 load('example2.yaml')
url =: 'https://gist.githubusercontent.com/ingydotnet/e7d907648b78329ab0dfe8c398b0071f/raw/a160335147fdbdfe5739233b796efdf78c836313/example3.yaml'
=>: &seq3 curl(url).yaml/load().pathway.to.my.data

--- !yamlscript/v0/
key1: !concat*:
- *seq1
- - foo
  - bar
- *seq2
- *seq3

key2:: &seq4 concat(*seq1 ['foo' 'bar'] *seq2)

key3: *seq4

key4:: .*seq4.reverse()

key5: !reverse:
- 1
- 2
- 3

$ cat example2.yaml 
- x-ray
- yellow
- zebra

Load this file with ys:

$ ys -J example1.yaml 
{"key1":
 ["apple", "banana", "carrot", "foo", "bar", "x-ray", "yellow",
  "zebra", "I", "like", "pie!"],
 "key2":
 ["apple", "banana", "carrot", "foo", "bar", "x-ray", "yellow",
  "zebra"],
 "key3":
 ["apple", "banana", "carrot", "foo", "bar", "x-ray", "yellow",
  "zebra"],
 "key4":
 ["zebra", "yellow", "x-ray", "bar", "foo", "carrot", "banana",
  "apple"],
 "key5":[3, 2, 1]}

Here's a similar YAML file but with lots of comments: https://gist.github.com/ingydotnet/2b0f5a679ae0837a9afcd37345117438

Try it out for yourself.
Let me know if you have questions.
Either here or come chat with us: https://matrix.to/#/#chat-yamlscript:yaml.io
