AST discussion #12

kevinbarabash · 2017-02-07T05:07:22Z

An AST can be produced by input that is syntactic valid. Operations and functions may (or may not) be semantically valid depending on the operands. Moreover, the operands may affect the meaning and or the properties of an operation or function, e.g.

a * b commutes if a and b are scalars whereas A * B does not commute if A and B matrices

Semantic knowledge should be kept out of the AST, but can be used with an AST to for purposes of validating, evaluating, and transforming a mathematical statement.

A mathematical statement could be an expression, equation, or definition.

Multiple mathematical statements can be grouped into a linear sequence to form a computation, manipulation, or proof. These specific types of sequences do not need to be explicitly modeled, only that there should be a way to describe a sequence of statements. Let's call this a "Program" for lack of a better term.

Some design goals for the nodes:

regular, similar mathematical structures should have similar tree structure
avoid one off nodes if possible

Program is a sequence of statements which can be any other node. This

type: "Program"
body: an array of statements which could be any other node.

Operation handles unary, binary, and n-ary operations. Unary minus is used to represent negation.

type: "Operation"
op: a string containing the operation, e.g. "+", "*", "/", "\u00b7", ^, etc.
args: an array of any node other than Relation or whatever we call a sequence

Relation is used for equivalence relations, e.g. =, <, <=, etc., set relations, e.g. "is a subset of", and any other relations that might come up

type "Relation"
rel: =, <, <=, etc. (TODO: figure out subset)
args: an array of two or more nodes other than Statement or

Identifier stores information about variables, constants, and function names. The reason not to have separate nodes for Variable and Constant is that an identifier may be either depending on the context.

type: "Identifier"
name: string, e.g. "x", "pi", "atan2", etc.
subscript: null, Identifier, Number, or Sequence (sequence is useful for matrix indices)

Function can be used to represent either a function definition or function application.

type: "Function" (can be refined to "FunctionDefinition" or "FunctionApplication")
label: Identifier
args: any node other than Relation or Program.

Number

type: "Number"
value: a string representation of the number. It may make sense to convert to some other representation, e.g. number, fraction.js instance, etc. before evaluating or transforming the AST.

Sequence comma separated sequence of non-program nodes

type: "Sequence"
args: one or more non-program nodes, a Sequence of Relations could represent a system of equations, a sequence of numbers could represent an ordered pair or a vector.

Brackets encompasses parentheses as well as other related symbols. Can be used for standard parenthesis, open/closed/half intervals, ordered tuples, sets etc.

type: "Brackets"
left: string, either "(", "[", "{", etc.
right: string, either ")", "]", "}", etc.
content: any non-program node (args: an array of non-program nodes might also make sense)

Weird edge cases:

^-1 for function, trigonometric function, and matrix inverses
^T for transpose
...

Stuff that hasn't been described but should be at some point:

summation, product operators
ellipses to indicate an infinite sequence, series
limits, derivatives, integrals
matrices and column vectors
...

The text was updated successfully, but these errors were encountered:

tkosan · 2017-02-07T16:11:38Z

Since mathsteps is educational in nature, what are your thoughts on a good way to add metadata to the nodes to support visually identifying various parts of a tree like what is shown in the following examples?:

aelnaiem · 2017-02-07T16:53:11Z

This is a great write-up! I appreciate defining relations, programs and sequences. Essentially encapsulating many more types of mathematical statements than we currently have.

Should the value of a Number node be a number or a string? In the first example, there are both:

        {
            "type": "Negation",
            "content": {
                "type": "Number",
                "value": "3"
            }
        },
        {
            "type": "Number",
            "value": -4
        },

Although mathjs also uses string values, I'm curious what the reasoning for using a string is over just a number?

@tkosan not sure if you already knew this but what we currently due is add a changeGroup attribute to the nodes that have changed during certain steps to accomplish that.

kevinbarabash · 2017-02-07T17:17:20Z

@tkosan the plan is to use the same loc format that various JavaScript ASTs use. I was thinking:

loc: {
    start: {
         line: number >= 1, 
         column: number >= 0
    },
    end: {
         line: number >= 1, 
         column: number >= 0
    }
}

@aelnaiem the reason for using string is so that we can represent decimal numbers with the correct precision, e.g. 34.00. Open to suggestions. I need to update the examples.

aelnaiem · 2017-02-07T17:20:38Z

Right, that makes sense!

tkosan · 2017-02-18T21:48:00Z

@kevinbarabash, what are your thoughts on adding support for parsing input that contains LaTeX? A few years ago I obtained a copy of a math parser from the Khan Academy khan-exercises repository on GitHub that included support for parsing LaTeX. It is present near the bottom of the following JavaScript program if you would like to look at it:
Khan Academy math parser

kevinbarabash · 2017-02-19T00:06:09Z

@tkosan the khan-exercises parsers has been extracted into a sub-library called KAS which is still used by KA. I was thinking of having different settings for the parser that produce behavior that was compatible of mathjs, KAS, asciimath, etc. KAS treats abc as implicit differentiation between a, b, and c whereas mathjs treats it as a single identifier. In terms of LaTeX support specifically, KAS only supports a limited number of LaTeX commands, see https://github.com/Khan/KAS/blob/master/src/parser.js#L828.

tkosan · 2017-02-19T05:07:31Z

@kevinbarabash the parser I copied is the one KA used before they switched to the KAS parser. I found that the KAS parser was not suitable for the kind of step-by-step equation solving I was implementing (but I don't remember why).

I like the idea of math-parser having different settings. A setting I would like is one that produced trees that are not flattened. On the LaTeX issue, my thought is having support for even a limited number of LaTeX commands was sufficient for parsing the problems that Ahmed posted.

I have been studying the math-parser code, and it is beautifully written! I am looking forward to having it in mathsteps.

kevinbarabash · 2017-02-19T18:00:13Z

@tkosan I had a look at the file that @aelnaiem posted. Here's the list of all of the LaTeX commands being used:

\frac, \sqrt, \times, \div, \leq, \geq, \cdot

As the capabilities of mathsteps expands, we'll probably also want to support

\pm, \abs, \sin, \cos, \tan, etc.

Regarding, non-flattened trees, is the main use-case generating binary expression trees to show computation order?

kevinbarabash · 2017-02-19T18:09:36Z

I've opened #14 for limited LaTeX support. For further discussion about the AST we want to produce, please see math-ast/blob/master/spec.md and file issues at math-ast/issues.

kevinbarabash mentioned this issue Feb 7, 2017

Use fraction numbers to avoid rounding (wip) google/mathsteps#89

Closed

evykassirer mentioned this issue Feb 17, 2017

create our own parser and node structure google/mathsteps#128

Open

8 tasks

kevinbarabash closed this as completed Feb 19, 2017

tkosan mentioned this issue Feb 19, 2017

Support for non-flattened trees semantic-math/math-ast#10

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AST discussion #12

AST discussion #12

kevinbarabash commented Feb 7, 2017 •

edited

Loading

tkosan commented Feb 7, 2017

aelnaiem commented Feb 7, 2017

kevinbarabash commented Feb 7, 2017

aelnaiem commented Feb 7, 2017

tkosan commented Feb 18, 2017

kevinbarabash commented Feb 19, 2017

tkosan commented Feb 19, 2017

kevinbarabash commented Feb 19, 2017

kevinbarabash commented Feb 19, 2017

AST discussion #12

AST discussion #12

Comments

kevinbarabash commented Feb 7, 2017 • edited Loading

tkosan commented Feb 7, 2017

aelnaiem commented Feb 7, 2017

kevinbarabash commented Feb 7, 2017

aelnaiem commented Feb 7, 2017

tkosan commented Feb 18, 2017

kevinbarabash commented Feb 19, 2017

tkosan commented Feb 19, 2017

kevinbarabash commented Feb 19, 2017

kevinbarabash commented Feb 19, 2017

kevinbarabash commented Feb 7, 2017 •

edited

Loading