Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disambiguation based on PartialOrd between choices of a certain node type. #42

Open
eddyb opened this issue Sep 12, 2018 · 1 comment
Open
Labels
enhancement New feature or request

Comments

@eddyb
Copy link
Member

eddyb commented Sep 12, 2018

@eternaleye suggested that instead of having only "validators", that remove "invalid" choices from the SPPF, a more general approach involves PartialOrd, or more specifically, a function taking two Handle<X> and returning an Option<Ordering>, where disambiguation means "find the largest X".

When the ordering between two choices is equal, that means it's still ambiguous (i.e. the default for node types with no user-provided disambiguation logic), while None means neither choice is valid.

Validity can be embedded by mapping all possible pairs of validities to partial orderings:

impl<T: Validate> Disambiguate for T {
    fn disambiguate(a: Handle<Self>, b: Handle<Self>) -> Option<Ordering> {
        match (a.validate(), b.validate()) {
            (false, false) => None,
            (false, true) => Some(Ordering::Less),
            (true, false) => Some(Ordering::Greater),
            (true, true) => Some(Ordering::Equal),
        }
    }
}
@eddyb eddyb added the enhancement New feature or request label Sep 12, 2018
@rljacobson
Copy link

rljacobson commented Sep 26, 2020

I think you might have to project into a partial order from a relation that lacks transitivity, because I feel like a rock-paper-scissors relationship is likely. The ideal properties would be a (upper-)semilattice, which has the additional property that for any two elements x and y, there exists a z such that z > x and z > y. (So z is a l.u.b. of {x,y}.)

If the goal is just to select a likely parse rather than the most likely parse, there probably isn't any harm in extending the partial order arbitrarily to a total order if it's convenient to do so. But that's somehow unsatisfying, isn't it...

A copy+paste of my note to Eddy: I am interested to know how a probabilistic model would perform. So you have a set of parse trees and want to find the "correct"/intended parse tree. Ahead of time you gather statistics about how language structures relate to each other locally by analyzing, say, all of crates.io. Check each parse tree against the statistical model to find the most likely. What I'm describing is syntactic, but it could in principle include other attributes like type, scope, some complexity metric, proximity... And your notion of validators is complementary, not mutually exclusive. In ML we tend to just dump all the factors into one big bucket and study the principle components of the resulting space. We can then remove any factors that appear useless or project to a lower-dimensional space. But, I mean, that's something you'd do, if you do it at all, only after you have a framework to implement any of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants