Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eval: issues around the function pattern #3711

Open
loisch opened this issue Jan 29, 2025 · 12 comments
Open

eval: issues around the function pattern #3711

loisch opened this issue Jan 29, 2025 · 12 comments
Assignees
Labels
evaluator evalv3 issues affecting only the evaluator version 3 NeedsInvestigation

Comments

@loisch
Copy link

loisch commented Jan 29, 2025

What version of CUE are you using (cue version)?

0.12-rc1

$ cue version
ue version v0.12.0-8dc16dd

go version go1.22.1
      -buildmode exe
       -compiler gc
        -ldflags -X cuelang.org/go/cmd/cue/cmd.version=v0.12.0-8dc16dd
     CGO_ENABLED 1
          GOARCH arm64
            GOOS darwin
             vcs git
    vcs.revision 8dc16dd330172af44bd376d3f51b32c96e56be81
        vcs.time 2025-01-28T14:37:26Z
    vcs.modified false
cue.lang.version v0.12.0

Does this issue reproduce with the latest stable release?

Yes and no. Old and new evaluator both produce the same result with version 0.11.1 but still the test doesn't fail (as it should, in my opinion). The new evaluator in version 0.11.1 produces the same result as the old evaluator.

What did you do?

env CUE_EXPERIMENT=evalv3=1
exec cue vet out.cue
exec cue export out.cue -e test1
cmp stdout foo.golden
env CUE_EXPERIMENT=evalv3=0
exec cue vet out.cue
exec cue export out.cue -e test1
cmp stdout impossible.golden
-- foo.golden --
"foo"
-- impossible.golden --
"impossible"
-- out.cue --
#TypeA: string | {foo: int}

#fun1: {
	in: #TypeA
	out: [
		// either `in` is a string
		if (in & string) != _|_ {
			in
		},
		// or `in` is has field `foo`
		if in.foo != _|_ {
			"foo"
		},
		"impossible" // in & #TypeA != _|_ !!
	][0]
}


//ab: #TypeA ??
ab: {
	foo: 1
	bar: "15"
}

test1: {
	let call = #fun1 & { in: ab }
	call.out
}

What did you expect to see?

A failing test. cue vet should not accept this program because #fun1.in must unify with #TypeA but ab does not unify with #TypeA. Also #fun1.in is used it the calculation of the result (call.out), so it influences the result and it should have _|_ as value and thus call.out should be _|_, too. If my intuition is wrong and this program is indeed correct it should still have the same result in both evaluators.

What did you see instead?

A passing test showing that old and new evaluator produce different results and that a validity constraint (in: #TypeA) is being ignored.

@loisch loisch added NeedsInvestigation Triage Requires triage/attention labels Jan 29, 2025
@myitcv
Copy link
Member

myitcv commented Jan 30, 2025

@loisch thanks for raising this. Taking a look now.

@myitcv myitcv self-assigned this Jan 30, 2025
@myitcv myitcv added evaluator evalv3 issues affecting only the evaluator version 3 and removed Triage Requires triage/attention labels Jan 31, 2025
@loisch
Copy link
Author

loisch commented Jan 31, 2025

Thank's for looking into this @myitcv ! What result should this program have, according to the language specification?
From my intuition _|_ should be correct and both evaluators are wrong, but I'm not sure about the language semantics and I didn't find a description of evaluation semantics concerning order / lazyness in the language specification. I don't understand i if #fun1.in has to unify with ab for #fun1.out to be computed? Does a clear definition and/or description of operational semantics exist? I also don't quite understand what if in.foo != _|_ means. Does this implicitly unify in with {foo:_}? Is it semantically the same as if (in & {foo:_}) != _|_ or does this create a new value that is unified with {foo:_} leaving in untouched? Can this ever make a difference? Can the order of if clauses in the comprehension make a difference preferring one of two disjunct alternatives to another?

@myitcv
Copy link
Member

myitcv commented Feb 1, 2025

This turned out to be a bit of a thorny issue. I'll go through my analysis for the benefit of others following along, and will summarise with next steps, links to specific issues etc.

All analysis as of 5991735.

Given the following repro:

# -- evalv2 --
env CUE_EXPERIMENT=evalv3=0

# in-out function
exec cue export in_out.cue data.cue
cmp stdout stdout.golden

# embedded scalar function
! exec cue export embedded_scalar.cue data.cue
stderr 'bar: field not allowed'

# -- evalv3 --
env CUE_EXPERIMENT=evalv3=1

# in-out function
exec cue export in_out.cue data.cue
cmp stdout stdout.golden

# embedded scalar function
! exec cue export embedded_scalar.cue data.cue
stderr 'bar: field not allowed'


-- data.cue --
package x

examples: eg1: in: "test"
examples: eg2: in: {
	foo: 5
}
examples: eg3: in: {
	foo: 5
	bar: "test"
}
-- embedded_scalar.cue --
package x

#TypeA: string | {foo: int}

#fun1: {
	_in: #TypeA
	[
		// either `in` is a string
		if (_in & string) != _|_ {
			_in
		},
		// or `in` is has field `foo`
		if _in.foo != _|_ {
			"foo"
		},
		"impossible", // in & #TypeA != _|_ !!
	][0]
}

examples: [string]: {
	in: _
	res: #fun1 & {_in: in, _}
}
-- in_out.cue --
package x

#TypeA: string | {foo: int}

#fun1: {
	_in: #TypeA
	out: [
		// either `in` is a string
		if (_in & string) != _|_ {
			_in
		},
		// or `in` is has field `foo`
		if _in.foo != _|_ {
			"foo"
		},
		"impossible", // in & #TypeA != _|_ !!
	][0]
}

examples: [string]: {
	in: _
	res: (#fun1 & {_in: in, _}).out
}
-- stdout.golden --
{
    "examples": {
        "eg1": {
            "in": "test",
            "res": "test"
        },
        "eg2": {
            "in": {
                "foo": 5
            },
            "res": "foo"
        },
        "eg3": {
            "in": {
                "foo": 5,
                "bar": "test"
            },
            "res": "impossible"
        }
    }
}

my expectation is that the testscript should pass (note the inverted exit code expectations via !). However, we actually see the following (note the use of -continue):

$ testscript -continue repro.txtar
# -- evalv2 -- (0.000s)
# in-out function (0.013s)
# embedded scalar function (0.010s)
# -- evalv3 -- (0.000s)
# in-out function (0.038s)
> exec cue export in_out.cue data.cue
[stdout]
{
    "examples": {
        "eg1": {
            "in": "test",
            "res": "test"
        },
        "eg2": {
            "in": {
                "foo": 5
            },
            "res": "foo"
        },
        "eg3": {
            "in": {
                "foo": 5,
                "bar": "test"
            },
            "res": "foo"
        }
    }
}
> cmp stdout stdout.golden
diff stdout stdout.golden
--- stdout
+++ stdout.golden
@@ -15,7 +15,7 @@
                 "foo": 5,
                 "bar": "test"
             },
-            "res": "foo"
+            "res": "impossible"
         }
     }
 }

FAIL: /tmp/testscript3968499483/repro.txtar/script.txtar:17: stdout and stdout.golden differ
# embedded scalar function (0.037s)
> ! exec cue export embedded_scalar.cue data.cue
[stdout]
{
    "examples": {
        "eg1": {
            "in": "test",
            "res": "test"
        },
        "eg2": {
            "in": {
                "foo": 5
            },
            "res": "foo"
        },
        "eg3": {
            "in": {
                "foo": 5,
                "bar": "test"
            },
            "res": "foo"
        }
    }
}
FAIL: /tmp/testscript3968499483/repro.txtar/script.txtar:20: unexpected command success
> stderr 'bar: field not allowed'
FAIL: /tmp/testscript3968499483/repro.txtar/script.txtar:21: no match for `bar: field not allowed` found in stderr

Summarising the result:

  • evalv2 passes as expected
  • evalv3 fails for both the embedded_scalar.cue and in_out.cue cases

Some notes on this repro before my analysis:

  • embedded_scalar.cue uses the pattern of embedded scalars for writing a function.
  • in_out.cue uses the in-out pattern for writing a function, present in the original repro in this issue.

My analysis of the above result is as follows:

  1. evalv2 (CUE_EXPERIMENT=evalv3=0) is behaving as expected. It is correctly failing in the case of embedded_scalar.cue for eg3, because the struct value that results from the function call is in error, and hence the embedded scalar is in error. It is correctly succeeding in the case of in_out.cue for eg3, because the out field is not in error. Note however that this behaviour of the in-out pattern is almost certainly not what the user intended in this case. More on that below.
  2. evalv3 (CUE_EXPERIMENT=evalv3=1) is not behaving as expected. It should match the behaviour of evalv2. This is a bug in evalv3 to my mind. Except for the fact that evalv3 can behave correctly if we set CUE_DEBUG=openinline=0. i.e. the above repro passes as expected if we set CUE_DEBUG=openinline=0. Hence there is not actually a bug in evalv3.

Specifically regarding point 2. I have outlined my reasoning in I have commented in #3688 (comment) as to why I think our default of CUE_DEBUG=openinline=1 is incorrect.

Note also how point 2 builds on point 1. If my assertion in point 1 about evalv2 being correct is itself incorrect, much of what follows needs to be revised.

Therefore, to expand on why I think point 1 holds, specifically the behaviour of evalv2 with respect to in_out.cue. Consider the following:

# set CUE_DEBUG=openinline=0 for correct behaviour. See comment in
# https://github.com/cue-lang/cue/issues/3688#issuecomment-2628814882 for more
# details
env CUE_DEBUG=openinline=0

# -- evalv2 --
env CUE_EXPERIMENT=evalv3=0
exec cue export -e out simple.cue
cmp stdout stdout.golden

# -- evalv3 --
env CUE_EXPERIMENT=evalv3=1
exec cue export -e out simple.cue
cmp stdout stdout.golden

-- simple.cue --
in:  true & false
out: 5
-- stdout.golden --
5

This passes. We can export the value in the out field despite the in field being in error. This short example passing explains my belief that the in_out.cue pattern not failing for evalv2 is correct behaviour: that despite the fact that the out field is declared "in terms of" in (more on that below), its value (the value of the out field) is not in error.

The behaviour demonstrated by cue export -e is a useful property of CUE. It has behaved liked this for some time. It is a property that allows for parts of very large configurations to be evaluated independent of errors in other parts. But it disagrees with the spec:

The result of a unification is bottom (_|_) if any of its defined fields evaluates to bottom, recursively.

Indeed @loisch has just commented to this effect, in the time since I started composing this reply! I have raised #3715 in order to clarify the spec.

That said, what follows assumes that the behaviour of cue export -e is intentional, that the behaviour of evalv2 with respect to in_out.cue is correct, and that we therefore need to fix up the spec.

A brief tangent to confirm that both evalv2 and evalv3 agree (modulo exact error counts and messages) on the following:

# set CUE_DEBUG=openinline=0 for correct behaviour. See comment in
# https://github.com/cue-lang/cue/issues/3688#issuecomment-2628814882 for more
# details
env CUE_DEBUG=openinline=0

# -- evalv2 --
env CUE_EXPERIMENT=evalv3=0
! exec cue eval x.cue
stderr 'b: (2|3) error'
stderr 'd: (2|3) error'
stderr 'e.bar: field not allowed'

# -- evalv3 --
env CUE_EXPERIMENT=evalv3=1
! exec cue eval x.cue
stderr 'b: (2|3) error'
stderr 'd: (2|3) error'
stderr 'e.bar: field not allowed'

-- x.cue --
#TypeA: string | {foo: int}

a: #TypeA & "test"
b: #TypeA & 5
c: #TypeA & {
	foo: 5
}
d: #TypeA & {
	foo: "test"
}
e: #TypeA & {
	foo: 5
	bar: "test"
}

That is to say, in this smaller example, the in field is in error if a struct value is passed with the extraneous bar field, for both evaluators. i.e. closedness checks are working.

Which brings us neatly back to the question of the intent of the user in this issue. In #fun1, out is clearly declared "in terms of" in. It seems odd therefore that out can be in a state of no error, whilst in is in error. Whilst in some cases a reference to in would result in an error in the result in out, where if comprehensions are used, and indeed in general, it is possible to have a situation where out it not in error when in is. Unless I'm missing something, we don't have the machinery to require, as part of the declaration of out, that in must not be in error. This could be fixed by the introduction of must() or similar, such that the following would be possible:

// pseudo code
_in: #TypeA
out: must(_in) & _

I have made this observation in #575 (comment).

Clearly it's awkward to have authors of such "functions" have to annotate with must() in this way. Nonetheless, perhaps this observation provides some interesting insight as to if/how we offer more first-class support for the function pattern. Perhaps something for @mpvl, @rogpeppe and @cuematthew to consider.

Returning to embedded_scalar.cue. Because the repro appears to suggest that this pattern does offer a way forward, whereby it is impossible to produce an "out" value if the parameter field is in error.

Whilst discussing that observation with @rogpeppe, Rog kindly pointed out why the embedded_scalar.cue approach for declaring functions does not work in general (play):

#addn: {
	_in: {
		val!: int
		n!:   int
	}
	_in.val + _in.n
}

x: #addn & {
	_in: {
		val: 2
		n:   5
	}
	_
}
x: #addn & {
	_in: {
		val: 1
		n:   6
	}
	_
}

which gives:

x._in.n: conflicting values 6 and 5:
    ./x.cue:12:8
    ./x.cue:19:8
x._in.val: conflicting values 1 and 2:
    ./x.cue:11:8
    ./x.cue:18:8

There is also the fact that with the embedded_scalar.cue approach, we in general need to resort to declaring parameters via definitions which has the unavoidable consequence of recursively closing the argument when it is used.

That said, a package-level visibility modifier combined with downcasts to "drop" the parameters in the result, might offer a solution.

All of the above tends to point towards some kind of first-class support for function calls being necessary.

Summary

To briefly summarise the points from above, a sort of TL;DR:

@myitcv myitcv changed the title unchecked constraint with v2 and v3 eval: on the function pattern Feb 2, 2025
@myitcv myitcv changed the title eval: on the function pattern eval: issues around the function pattern Feb 2, 2025
@loisch
Copy link
Author

loisch commented Feb 3, 2025

Thank you @myitcv so much this detailed analysis and description! It helped me a lot to understand what's going on. My intuition is a bit off because with lazy evaluation in Haskell we also have bottom / undefined but that's the same as non-termination and nothing you can recover from because it's not a regular value. My intuition problem turned out to be the list comprehension.

It might help others to show "good ways" and "bad ways" to solve problems. The function in this example uses a list comprehension for a case distinction on the type of the input. That's not a good idea. At least it might not do what you intend.

For example I expected that an if-clause with a check like (input & string) != _|_ means that input "is a" string. But that's wrong! input could "turn out" to be an int (technically input will never "be" an int but it might also unify with int).

#values: "string" | 42

// function to convert string or int to string
#fun: {
	input: string | int
	output: [
		if (input & string) != _|_ {input}, // use input string
		if (input & int) != _|_ {"int"},    // use constant string
		null,                               // impossible
	][0]
}

out: (#fun & {input: #values}).output // output always has type string... or not?
out: int                              // this works
out: 42

It's much better, idiomatic, safer and simpler to use a disjunction to define the "case distinction" function:

#fun: {
	input:  string // if the input is a string
	output: input  // then output is just the input
} | {
	input:  int   // if the input is an int
	output: "int" // then output is a constant string
}

The list comprehension also needs the "impossible" case. I don't really understand why I get

#fun.output: index out of range [0] with length 0:

when leaving it out... Which value of input could lead to both if-clauses failing? Where would that value come from?

Still so much to learn!

@myitcv
Copy link
Member

myitcv commented Feb 4, 2025

It might help others to show "good ways" and "bad ways" to solve problems.

Absolutely, and that's what we're looking to do with our worked examples on https://cuelang.org/.

The function in this example uses a list comprehension for a case distinction on the type of the input. That's not a good idea. At least it might not do what you intend.

I'm not convinced it's totally wrong. Per my analysis above, the problem is not that CUE is failing to detect the error in the input field, it's that it's possible for the output field to not be in error in that case. We just don't have the machinery required to express that constraint.

In the meantime your switch statement needs to cover the three possible states of the input field:

  • string
  • int
  • _|_

Hence the need for the "impossible" case. And hence why if that case is missing you get the index out of range error, because the list is empty.

Another alternative would be to write:

	output: [
		if (input & string) != _|_ {input}, // use input string
		if (input & int) != _|_ {"int"},    // use constant string
		_|_,                               // impossible
	][0]

in lieu of the error() builtin, but this is again just papering over the crack that is the absence of must() or similar.

@loisch
Copy link
Author

loisch commented Feb 4, 2025

Thank's for explaining the "main" issue. That's why I propose using the disjunction because the alternative is only valid if the whole struct is valid. I can't get a value for output if input is in error because that invalidates the alternative.

This example from my comment above is different

#values: "string" | 42

// function to convert string or int to string
#fun: {
	input: string | int
	output: [
		if (input & string) != _|_ {input}, // use input string
		if (input & int) != _|_ {"int"},    // use constant string
		null,                               // impossible
	][0]
}

out: (#fun & {input: #values}).output // output always has type string... or not?
out: int                              // this works
out: 42

I don't see where _|_ would come from? #values unifies with string | int so where would _|_ come from?

It again feels like a bug. I think the problem is that x != _|_ is false if x is string although later x might be unified with a concrete string and then the comparison would have been true. When looking at the value latticestring != _|_ should be true because string is distinct from bottom / "above" bottom and therefore unequal to bottom! I'm inclined to report this as a bug but I haven't had a look at the specification. Probably comparison is just defined that (unintuitive) way to only be true for concrete values. It still feels wrong and I think the better behavior would be to create the list comprehension as not concrete instead of producing a concrete value (in this case index error) from a comparison of an unconcrete value to bottom.

I gave the example because many beginners including me would think that it behaves like this Python program

from typing import Union

def fun(input: Union[str, int]) -> str:
    if isinstance(input, str):
        return input  # Use the input string
    elif isinstance(input, int):
        return "int"  # Use the constant string
    else:
        raise ValueError("Impossible type")  # Should never occur

# Test cases
print(fun("hello"))  # Expected: "hello"
print(fun(42))       # Expected: "int"

It seems the function maps int | string to string but it doesn't! If input is a disjunction 42 | "string" the result is not a string.

@myitcv
Copy link
Member

myitcv commented Feb 4, 2025

Thank's for explaining the "main" issue. That's why I propose using the disjunction because the alternative is only valid if the whole struct is valid. I can't get a value for output if input is in error because that invalidates the alternative.

Apologies, for some reason I entirely skipped over reading that part of your reply! My immediate thought is where that pattern applies it probably offers quite an elegant solution for now. Albeit better if we had more first-class function support, so the errors can be more precise.

This example from my comment above is different

A general point to start with: I think the other approach we are discussing, with the list-based switch, is going to fall short in a number of ways, not least because we don't have all the necessary builtins to be precise. See #943 for an explanation of various options that ultimately allow the comparison with _|_ to be eliminated. Which will make for more correct, less ambiguous, and more readable code.

Moving on to the various behaviours we are seeing. (And apologies if anything that follows misses the point you are making - I'm trying to keep up with various threads in various issues!)

This repro passes:

env CUE_EXPERIMENT=evalv3=1
! exec cue export x.cue
stderr 'out: incomplete value "string" | 42'

-- x.cue --
#values: "string" | 42

#fun: {
	input: string | int
	output: [
		if (input & string) != _|_ {input},
		if (input & int) != _|_ {"int"},
		null,
	][0]
}

out: (#fun & {input: #values}).output

This is expected to my mind, because ("string | 42) & string is not an error. It is also not concrete, but that is not what is being expressed in the constraint. Hence the result is "string" | 42, which is incomplete because we cannot distinguish which to take. We can disambiguate the disjunction with out: int, at which point the answer is concrete and the export will succeed:

env CUE_EXPERIMENT=evalv3=1
exec cue export x.cue
cmp stdout stdout.golden

-- x.cue --
#values: "string" | 42

#fun: {
	input: string | int
	output: [
		if (input & string) != _|_ {input},
		if (input & int) != _|_ {"int"},
		null,
	][0]
}

out: (#fun & {input: #values}).output
out: int
-- stdout.golden --
{
    "out": 42
}

We don't have the builtins to express a constraint on input that it be concrete, but we can express the constraint on another field or in a conditional. However, with this approach we leave ourselves open to the problem I described in #3711 (comment).

There is an alternative:

env CUE_EXPERIMENT=evalv3=1
exec cue export x.cue
cmp stdout stdout.golden

-- x.cue --
#values: "string" | 42

#fun: {
	input: string | int
	output: [
		if (input + "" & string) != _|_ {input},
		if (input + 0 & int) != _|_ {"int"},
		null,
	][0]
}

out: (#fun & {input: 42}).output
-- stdout.golden --
{
    "out": "int"
}

Note that this still leaves us in the situation where null can result:

env CUE_EXPERIMENT=evalv3=1
exec cue export x.cue
cmp stdout stdout.golden

-- x.cue --
#values: "string" | 42

#fun: {
	input: string | int
	output: [
		if (input + "" & string) != _|_ {input},
		if (input + 0 & int) != _|_ {"int"},
		null,
	][0]
}

out: (#fun & {input: true}).output
-- stdout.golden --
{
    "out": null
}

So we really need the builtins from #943 in addition to must() or similar to make this work fully to my mind. Or, at least in this situation, we can resort to the embedded scalar approach.

@loisch
Copy link
Author

loisch commented Feb 4, 2025

Oh, thank you so much, @myitcv ! It's like private lessons and I very much appreciate it! It's clear now. :-)

CUE is fascinating! I read #943 and I think these changes (function-call and definition syntax, isconcrete, etc) would be really helpful! They won't make CUE simpler (but also not more complex in it's core) but they will make it a lot easier to get started and "do the right thing". Syntactic sugar introduces the patterns you should use and allows us to map concepts known from other languages into the space of the new language without realising that the underlying execution model is very different. Without the "do" syntax which allows Haskell code to look like imperative language code, Haskell would have died without ever being seen, because this allowed pragmatic programmers to get things done without having to first understand lambdas and monads and the bind operator (>>=). Later you discover the simple but powerful mechanics behind the syntactic sugar and enjoy the fascinating power of the "programmable semicolon".

Keep up the good work! I think CUE is well worth the learning curve! And about getting things done: My configuration generator proof-of-concept is working now flawlessly which v2 and v3 and I'm very happy!

@myitcv
Copy link
Member

myitcv commented Feb 5, 2025

I think CUE is well worth the learning curve! And about getting things done: My configuration generator proof-of-concept is working now flawlessly which v2 and v3 and I'm very happy!

Thanks so much for taking the time to engage so comprehensively here, @loisch.

This issue helped me (at least!) to some very interesting realisations regarding the choices we have for more first-class "function" support. And also helped to reinforce the ideas in #943 and #575, albeit through a different lens.

Once we have the critical bits for evalv3 off the, well, critical-path, we can hopefully return to these points in relatively short order.

@the-nurk
Copy link

the-nurk commented Feb 5, 2025

#454 (comment)

still reading through all the issues mentioned. probably not the greatest take as i was writing in a hand-wavy nature.

this was the cue i made to try to generically handle this issue, but having to call it every function is not performant at all due to lack of structural sharing. it's still pretty performant for one shot use cases, but using it for every function dispatch i was getting 40 second eval times for like 50 lines of json.

        #one_of_many: {
          "": what: """
          expands a single sumtype selection into a list of booleans that describe every case

          this is for reducing a large amount of boilerplate around enumerating the truth tables needed to make decisions in cue layer

          typically used in the `down` section of a unit and fed some choices,options,tags from the `in.pieces` or `in.rules` sections.

          # important :- if options is a list of strings, tags will be inferred from that
              if you want non-human readable, but enum-like options, simply use numbers, not strings
              tags can be human readable and map to options based on array position
              as a result, options that are hashes is probably not a good idea,
                unless a human isn't the consumer of the option list

              i.e. this is operated on as a multi-array list (#mal):
                {
                  options: [...]
                  tags: [...]
                }
              # note :- choices is not considered part of the #mal

          i.e. :-

          choices: [...or(opts)]
          opts: [
            "chatgpt",
            "gemini",
            "claude",
            "pplx",
          ]
          prefix: "use"
          sep: "_"

          becomes

          use_chatgpt: bool
          use_gemini: bool
          use_claude: bool
          use_pplx: bool

          where `bool` is true if the option was present in the choices
          @; list.Contains() comparison pathway is probably a bad idea, so avoiding it
              using single comprehension with union/equivalence comparison
          """

          ii = in: {
            pieces: {

              // the options that were selected
              choice: _

              // the list/map of option values (strings, values, schemas)
              options: [...rules.#shape] | { [string]: rules.#shape }

              // the list of option keys (human-readable name for an option)
              // don't have to provide this in call if options is [...string] or {[string]: rules.#shape}
              tags: [...rules.#basic]
            }

            rules: {
              // sumtype over shapes
              #shape: or(_shapes)
              #basic: or(_basics)
              // kinds of things that we are truthing
              _shapes: [
                #basic,
                {...},
                [...],
              ]
              _basics: [
                string,
                number,
                bool,
                bytes,
              ]

              // thing before your truth's fieldname
              prefix: *"" | string

              // separator between prefix and option tag
              sep: *"_" | string
              debug: bool | *false
            }
          }

          dd = down: {
          // down: {
            let p = ii.pieces
            let r = ii.rules

            prefix: "\(r.prefix)\(r.sep)"
            have_choice: (p.choice != _|_)
            have_choice_str: have_choice && ((p.choice & string) != _|_)
            have_choice_basic: have_choice && ((p.choice & r.#basic) != _|_)
            have_choice_complex: have_choice && !have_choice_basic

            have_shape_map: ((p.options & {[string]: r.#shape}) != _|_)
            have_shape_list: ((p.options & [...{..., _}]) != _|_)
            have_grid_list: ((p.options & [...[...]]) != _|_)
            have_basics: ((p.options & [...r.#basic]) != _|_)
            have_strings: ((p.options & [...string]) != _|_)

            have_tags: (len(p.tags) > 0)

            // @; error handling : options and tags must be same length
            ltags: len(p.tags)
            lopts: len(p.options)
            vet_pieces_mal_length: (ltags == lopts)

            // if we are inferring tags from opts
            mal: [
              if have_shape_map { (u.prime.shapez.mal.from.map & { 
                in: {
                  pieces: p.options
                  rules: {
                    k: "clean_tags"
                    v: "clean_opts"
                  }
                }
                }).out
              },
              {}
            ][0]

          }


          out: uu.truths
          // out: {
          //   down: dd
          //   in: ii
          //   up: uu
          // }

          uu = up: {
          // up: {
            let p = ii.pieces
            // let r = ii.rules

            clean_opts: [
              if dd.have_shape_map { dd.mal.clean_opts },
              if dd.have_basics || dd.have_shape_list || dd.have_grid_list { p.options },
              { [] }
            ][0]

            clean_tags: [
              if dd.have_shape_map { dd.mal.clean_tags },
              if dd.have_strings { p.options },
              if dd.have_tags &&
              (dd.have_basics && !dd.have_strings) ||
              dd.have_shape_list ||
              dd.have_grid_list { p.tags },
              {"# error :- clean_choice cases weren't met."},
            ][0]

            clean_choice: [
              // if have basic and map, choice will be at map's value for choice key
              if dd.have_choice_basic && dd.have_shape_map { p.options["\(p.choice)"] },

              // if have basic and no map, choice must be an option
              if dd.have_choice_basic && dd.have_basics { p.choice & or(p.options) },

              // else just leave choice as is
              if dd.have_choice { p.choice },

              // @; error handling
              {"# error :- clean_choice cases weren't met."},
            ][0]

            vet_clean_mal_length: [
              if (len(clean_opts) == len(clean_tags)) {true},
              {false},
            ][0]

            truths: [
              if dd.have_choice { for x, t in clean_tags {
                // "\(dd.prefix)\(t)": ((clean_choice & matchN(1, clean_opts[x])) != _|_)
                "\(dd.prefix)\(t)": ((clean_choice & clean_opts[x]) != _|_)
              } },
              { }
            ][0]
          }
        }

@the-nurk
Copy link

the-nurk commented Feb 5, 2025

this was my function path flow introspector. categorized, enumerates, and canonicalizes input states and functions that are available as a result of those states and presents results that have the coordinates to their actual objects in the input/output data stored in-band. looping on and matching with the output structure of this builtin should allow the cuelang developer to make the decision about when to render out since out would be able to access this and determine explicitly when it should evaluate.

// an address
// each element in addr says which element in a different array to travel down/open/use
// took this from Cantor Diagonlization
type addr []int64

type Coordinate struct {
  bag_location addr
  item_location addr
  shape_location addr
}

type InputStyle uint8

const (
	bag_item_shape    InputStyle = iota // does this item in the bag match the shape? : bag @ item must match shape (single unifyaccept)
	bag_item_shapes                     // which shapes does the item in the bag match? all shapes? : bag @ item must match shapes (loop on shapes)
	bag_items_shape                     // which items match this shape in this bag? : every item in bag must match the one shape (loop on items)
	bag_items_shapes                    // do all these items in this bag meet these specifications? can they be used with these schemas? : every item in bag must match its corresponding shape element (multi-array list)
	bags_item_shape                     // in which bags are this item,shape? : for each bag, find the item and match against shape (loop on bags)
	bags_item_shapes                    // do these bags have an item that can be used with these shapes? : for each bag, find the item, and match it against all shapes (loop on bags, then shapes)
	bags_items_shape                    // do these bags have items that look like this shape? : for each (bag,item) element-wise pair, match it against the single shape (multi-array list)
	bags_items_shapes                   // which shapes are available in which bags and which items meet them? : for each (bag,item,shape) element-wise tuple, single unifyaccept (multi-array list)
	invalid_style
)

at each depth, ((bag, item), shape) is assessed. if item and shape are both lists/maps and max depth hasn't been reached, dive deeper according to a Coordinate element for the (current depth + 1). if (bag,item) is a list, and shape is a unit, no need to recurse, unit must match all list items. same for the inverse of one (bag,item) and a shape list. once max_depth is reached along some axis (bags: 2, items: 3, shapes: 4) unifyaccept (or whatever match implementation is fired off on the deeper structures). max depth inputs for each domain force bounded loop recursion and act as a chunking or layman's path tracing methodology. i took addr from numpy's stride-based indexing and i'm quite literally trying model how in-game objects are rendered in path tracing algorithms where it fires an observation chain for each pixel that rolls up color and occlusion data for that pixels light path into the world. all matches get sent down a memoizer channel so the output can be built-up in one shot.

a couple things may be off in the schema below

MatchNMK: {
        // functions as matchn but on arbitrary depth multi-dimensional arrays in order to retrieve
        // a 'reference frame' on the data planes to make decisions with in cue pipelines.

        // bags : data planes : individual item, or a map/list of items
	bags: _

        // items : paths in data planes to look at
	items: _ & matchN(1, [string | [...string]])

        // shapes : schemas paths must match
        shapes: _

	// whether shapes must match, or shapes have to not match
	//   essentially : false -> pick operation,  true -> mask operation
	invert: bool | *false

	// whether to fill sparse tables or not since evaluation is sparse
	//   important to note that this will restrict your table styles
	//   only summaries will be returned
	// use this only when you want to loop through the entire dataset to make decisions on cells again in cue
	fill: bool | *false

	// summaries : used to quickly loop over success cases and use the in-band address data to retrieve the value via array indexing:-
	//   matchN-counts:- 2D table indexed on bags with flattened counts of shapes matched per bag set
	//   matchN-oneOf:- same, but oneOf is the rollup function
	//   matchN-anyOf:- same, but anyOf is the rollup function
	//   matchN-allOf:- same, but allOf is the rollup function
	//   matchN-noneOf:- same, but noneOf is the rollup function
	//   flat-shape-map:- 
	//   flat-shape-list:-
	//   sums-bags:- 
	//   sums-items:-
	//   sums-shapes:- 

        boxes: _ // & matchN(-1, [#memos, [...#memos]])
        #memos: or([
          "shape", // input dimensionality state tables storing shapes at their successful match location in the data
          "bool", // same, but storing true/false at their successful match location in the data (alias for "boolean")
          "boolean", // ''
          "bin", // same, but storing 0/1 for SIMD uses instead of bools (alias for "binary")
          "binary", // ''
          "item", // same, but storing the item (path) at the successful match location in the data
          "matchN-counts" // 2D table indexed on bags with flattened counts of shapes matched per bag set
          "matchN-oneOf" // same, but oneOf is the rollup function
          "matchN-anyOf" // same, but anyOf is the rollup function
          "matchN-allOf" // same, but allOf is the rollup function
          "matchN-noneOf" // same, but noneOf is the rollup function
          "flat-item-map" // fully flattened/sparse map of items that matched
          "flat-item-list" // fully flattened/sparse list of items that matched
          "flat-shape-map" // fully flattened/sparse map of shapes that matched
          "flat-shape-list" // fully flattened/sparse list of shapes that matched
          "flat-bag-map" // fully flattened/sparse map of bags that successfully matched
          "flat-bag-list" // fully flattened/sparse list of bags that matched.
          "sum-bags" // integer summation of how many bags had items that matched (or non-matches if `match: false`)
          "sum-items" // integer summation of how many items were matched (or non-matches if `match: false`)
          "sum-shapes" // integer summation of how many shapes were matched (or non-matches if `match: false`)
  ])

  // map of all summaries from function evaluation that are relevant. so manual control of function flow can be handled easier by just looping through flat list or map
  //   will be multi-array list either way
  out: {
        // state tables with same dimensionality as inputs
        boolean: [...[...[...[...]]]]   // True/false results
        binary: [...[...[...[...]]]]    // 1/0 results SIMD
        bags: [...[...[...[...]]]]     // containers that matched  
        items: [...[...[...[...]]]]      // paths that matched
        shapes: [...[...[...[...]]]]     // shemas that matched  

        // Aggregated 2D views
        counts: [...[...int]]    // Totals per dim  
        matches: {
            one: [...[...int]]  // Single match
            any: [...[...int]]  // 1+ match
            all: [...[...intl]]  // All match
            none: [...[...int]] // 0 match
        }
        
       #Memo: {
          bag: [...int]
          item: [...int]
          shape: [...int]
        }

        // Flattened output
        item_list: [...#Memo]    // Matched items
        item_map: [string]: #Memo   // Mapped items
        
        shape_list: [...#Memo]  // Matched patterns
        shape_map: [string]: #Memo // Mapped patterns

        sums: {
            bags: int    // Total bags
            items: int   // Total items 
            shapes: int  // Total shapes
            matches: int  // Total matches
            errors: int   // Total errors
        }
    } 
  }
}

@the-nurk
Copy link

the-nurk commented Feb 6, 2025

maybe

#fn: {
  @in: matchIF(#in_validator, #standardizer, #error_handler)
  // @ says 'unions should have things that slot in here'
  // this is why i think there needs to be syntax or some builtin for handling families of type collisions.
  // and it should probably be based on indexing/looping/assessing constraints at the syntax level.

  _do: _ // do thangs

  *(out: #out_validator )
// signals out is allowed out of the cue flow
// and that it `must()` since the LHS is wrapped too.
// idk how to wrap on the RHS with syntax
}

to me after using cue for a year and just programming in-general if i were to put human words on syntax operators these would be them.

are : speak : change : "{}"
must : require : finish : "()"
grid : arrange : assess : "[]"
take : link : for each : "*"
and : add : append : while : inclusive or : "+"
or : xor : divide : when : "/"
set : associate : ":"
get : create idea : merge : "#"
place : locate : find : "@"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
evaluator evalv3 issues affecting only the evaluator version 3 NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

3 participants