Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datatypes RFC- Clarifying union vs set #309

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oflatt
Copy link
Member

@oflatt oflatt commented Dec 1, 2023

No description provided.

@oflatt oflatt requested a review from a team as a code owner December 1, 2023 20:27
@oflatt oflatt requested review from saulshanabrook and removed request for a team December 1, 2023 20:27
@oflatt oflatt mentioned this pull request Dec 1, 2023
Copy link
Member

@saulshanabrook saulshanabrook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the current behavior would be clear if we didn't allow functions that return eqsorts (any sorts defined in the program) to have merge or default. union could be used on any two eqsorts, whereas set could also be used on functions that return primitives (all non eqsorts). default on a function that returns eqsorts could be rewritten with an initial union, so the real sticking point is how we allow defining merge attributes on functions that return eqsorts.

Is that your understanding as well?

If so, I might find it helpful to start by looking at the use cases so far with merge used, which is mainly in the eggcc example, to see how this drives the desired behavioral design space?

I suggest this because I personally am still unclear from any sort of first principles of the semantics what would be intuitively correct or obvious.

I do think I have a handle now on the current behavior of how this interacts, like by looking at this example:

(sort Math)

(function i (i64) Math)
(function add (Math Math ) Math)

(function f () Math :merge (add old new))
(set (f) (i 10))
;;(union (f (i 3)) (i 1))
(union (f) (i 11))
(set (f) (i 12))
(union (i 12) (f))
(set (f) (i 13))
;(union (f (i 3)) (i 10))

Screenshot 2023-12-01 at 4 47 40 PM

@oflatt
Copy link
Member Author

oflatt commented Dec 1, 2023

Edit: read more closely

Yes, the main difficult part is the combination of merge function and eqsort as the output.
I propose making them different from datatypes to avoid confusion and give guarantees about datatypes

@oflatt
Copy link
Member Author

oflatt commented Dec 1, 2023

I think of functions like f from your example as storing one e-class as the output.
It's not equal to this output in the equivalence relation- it's storing it as the output, and it might change due to the merge function.
So your visualization is confusing because f is in the same eclass as its output

@saulshanabrook
Copy link
Member

I think of functions like f from your example as storing one e-class as the output.
It's not equal to this output in the equivalence relation- it's storing it as the output, and it might change due to the merge function.
So your visualization is confusing because f is in the same class as its output

Yeah, I mean, I think that's where it's odd, it acts like a value when calling set but when using union or extract it acts just like an e-class currently:

(sort Math)

(function i (i64) Math)
(function add (Math Math ) Math)

(function f () Math :cost 100 :merge (add old new))
(set (f) (i 10))
;;(union (f (i 3)) (i 1))
(union (f) (i 11))
(set (f) (i 12))
(union (i 12) (f))
(set (f) (i 13))
;(union (f (i 3)) (i 10))
(extract (f))
(union (f) (i 100))
(add (f) (f))
(extract (add (f) (f)))
(add (i 12) (i 13))
(add (i 100) (i 100))

Screenshot 2023-12-01 at 5 54 59 PM

Hence why I was curious to learn more about how you are using the behavior in the eggcc example.

@oflatt
Copy link
Member Author

oflatt commented Dec 4, 2023

Exactly- that's the confusing part. I'm proposing that we stop treating it the same as datatype for union and extract.

In eggcc, we use an extraction table to store a resulting term that we have extracted. The table is an analysis table, similar to lower-bound

@oflatt
Copy link
Member Author

oflatt commented Dec 4, 2023

For example, in that visualization I would show (f (i 100)) as a separate entry. It's not "equal" to (i 100), it just stores (i 100) as an output

@yihozhang
Copy link
Collaborator

This proposal allows creating functions that have a merge function of union.

(function has-type (Math) Math :merge (union old new))

These functions behave similarly to datatypes, but they never have their own id- they can only be set to a datatype.

I wonder if this creates issues. It is a common practice to have a function like has-type to first have a placeholder id that later gets unioned with an actual "datatype" id. This is similar to unification variables in Prolog. I believe we should allow all functions whose output is an eqsort to make defaults.

@yihozhang
Copy link
Collaborator

On the other hand, the confusion in the eggcc extraction can be solved by having a different kind of datatypes that never union (like normal ADTs in a functional language): TermAndCost forms a lattice and is not supposed to be unioned with each other. Unfortunately, right now it is impossible to describe a (define (smaller (pair e1 c1) (pair e2 c2)) (if (< c1 c2) ... ...)) primitive within egglog due to inexpressiveness.

@saulshanabrook
Copy link
Member

On the other hand, the confusion in the eggcc extraction can be solved by having a different kind of datatypes that never union (like normal ADTs in a functional language): TermAndCost forms a lattice and is not supposed to be unioned with each other

Yeah I was thinking about this as well, if there is a way to define specific constructors as unable to be unioned with each other. Like in the equate-basic example if there would be a way to disallow (union (Num 10) (Num 1)). Like the difference between looking at it by constructor/function and by sort/type.

@oflatt
Copy link
Member Author

oflatt commented Dec 6, 2023

I wonder if this creates issues. It is a common practice to have a function like has-type to first have a placeholder id that later gets unioned with an actual "datatype" id.
@yihozhang and I chatted about this: this proposal still allows specifying a :default for these functions.

I think adding terms and functional programming would certainly help, but would also be a big change. Also, it's unclear how the functional programming would compose with egglog rules- what if I want to put an eq-able term inside a term? Or vice-versa?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants