Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST discussion #12

Closed
kevinbarabash opened this issue Feb 7, 2017 · 9 comments
Closed

AST discussion #12

kevinbarabash opened this issue Feb 7, 2017 · 9 comments

Comments

@kevinbarabash
Copy link
Member

kevinbarabash commented Feb 7, 2017

An AST can be produced by input that is syntactic valid. Operations and functions may (or may not) be semantically valid depending on the operands. Moreover, the operands may affect the meaning and or the properties of an operation or function, e.g.

  • a * b commutes if a and b are scalars whereas A * B does not commute if A and B matrices

Semantic knowledge should be kept out of the AST, but can be used with an AST to for purposes of validating, evaluating, and transforming a mathematical statement.

A mathematical statement could be an expression, equation, or definition.

Multiple mathematical statements can be grouped into a linear sequence to form a computation, manipulation, or proof. These specific types of sequences do not need to be explicitly modeled, only that there should be a way to describe a sequence of statements. Let's call this a "Program" for lack of a better term.

Some design goals for the nodes:

  • regular, similar mathematical structures should have similar tree structure
  • avoid one off nodes if possible

Program is a sequence of statements which can be any other node. This

  • type: "Program"
  • body: an array of statements which could be any other node.

Operation handles unary, binary, and n-ary operations. Unary minus is used to represent negation.

  • type: "Operation"
  • op: a string containing the operation, e.g. "+", "*", "/", "\u00b7", ^, etc.
  • args: an array of any node other than Relation or whatever we call a sequence

Relation is used for equivalence relations, e.g. =, <, <=, etc., set relations, e.g. "is a subset of", and any other relations that might come up

  • type "Relation"
  • rel: =, <, <=, etc. (TODO: figure out subset)
  • args: an array of two or more nodes other than Statement or

Identifier stores information about variables, constants, and function names. The reason not to have separate nodes for Variable and Constant is that an identifier may be either depending on the context.

  • type: "Identifier"
  • name: string, e.g. "x", "pi", "atan2", etc.
  • subscript: null, Identifier, Number, or Sequence (sequence is useful for matrix indices)

Function can be used to represent either a function definition or function application.

  • type: "Function" (can be refined to "FunctionDefinition" or "FunctionApplication")
  • label: Identifier
  • args: any node other than Relation or Program.

Number

  • type: "Number"
  • value: a string representation of the number. It may make sense to convert to some other representation, e.g. number, fraction.js instance, etc. before evaluating or transforming the AST.

Sequence comma separated sequence of non-program nodes

  • type: "Sequence"
  • args: one or more non-program nodes, a Sequence of Relations could represent a system of equations, a sequence of numbers could represent an ordered pair or a vector.

Brackets encompasses parentheses as well as other related symbols. Can be used for standard parenthesis, open/closed/half intervals, ordered tuples, sets etc.

  • type: "Brackets"
  • left: string, either "(", "[", "{", etc.
  • right: string, either ")", "]", "}", etc.
  • content: any non-program node (args: an array of non-program nodes might also make sense)

Weird edge cases:

  • ^-1 for function, trigonometric function, and matrix inverses
  • ^T for transpose
  • ...

Stuff that hasn't been described but should be at some point:

  • summation, product operators
  • ellipses to indicate an infinite sequence, series
  • limits, derivatives, integrals
  • matrices and column vectors
  • ...
@tkosan
Copy link

tkosan commented Feb 7, 2017

Since mathsteps is educational in nature, what are your thoughts on a good way to add metadata to the nodes to support visually identifying various parts of a tree like what is shown in the following examples?:

@aelnaiem
Copy link

aelnaiem commented Feb 7, 2017

This is a great write-up! I appreciate defining relations, programs and sequences. Essentially encapsulating many more types of mathematical statements than we currently have.

Should the value of a Number node be a number or a string? In the first example, there are both:

        {
            "type": "Negation",
            "content": {
                "type": "Number",
                "value": "3"
            }
        },
        {
            "type": "Number",
            "value": -4
        },

Although mathjs also uses string values, I'm curious what the reasoning for using a string is over just a number?

@tkosan not sure if you already knew this but what we currently due is add a changeGroup attribute to the nodes that have changed during certain steps to accomplish that.

@kevinbarabash
Copy link
Member Author

@tkosan the plan is to use the same loc format that various JavaScript ASTs use. I was thinking:

loc: {
    start: {
         line: number >= 1, 
         column: number >= 0
    },
    end: {
         line: number >= 1, 
         column: number >= 0
    }
}

@aelnaiem the reason for using string is so that we can represent decimal numbers with the correct precision, e.g. 34.00. Open to suggestions. I need to update the examples.

@aelnaiem
Copy link

aelnaiem commented Feb 7, 2017

Right, that makes sense!

@tkosan
Copy link

tkosan commented Feb 18, 2017

@kevinbarabash, what are your thoughts on adding support for parsing input that contains LaTeX? A few years ago I obtained a copy of a math parser from the Khan Academy khan-exercises repository on GitHub that included support for parsing LaTeX. It is present near the bottom of the following JavaScript program if you would like to look at it:
Khan Academy math parser

@kevinbarabash
Copy link
Member Author

@tkosan the khan-exercises parsers has been extracted into a sub-library called KAS which is still used by KA. I was thinking of having different settings for the parser that produce behavior that was compatible of mathjs, KAS, asciimath, etc. KAS treats abc as implicit differentiation between a, b, and c whereas mathjs treats it as a single identifier. In terms of LaTeX support specifically, KAS only supports a limited number of LaTeX commands, see https://github.com/Khan/KAS/blob/master/src/parser.js#L828.

@tkosan
Copy link

tkosan commented Feb 19, 2017

@kevinbarabash the parser I copied is the one KA used before they switched to the KAS parser. I found that the KAS parser was not suitable for the kind of step-by-step equation solving I was implementing (but I don't remember why).

I like the idea of math-parser having different settings. A setting I would like is one that produced trees that are not flattened. On the LaTeX issue, my thought is having support for even a limited number of LaTeX commands was sufficient for parsing the problems that Ahmed posted.

I have been studying the math-parser code, and it is beautifully written! I am looking forward to having it in mathsteps.

@kevinbarabash
Copy link
Member Author

@tkosan I had a look at the file that @aelnaiem posted. Here's the list of all of the LaTeX commands being used:

\frac, \sqrt, \times, \div, \leq, \geq, \cdot

As the capabilities of mathsteps expands, we'll probably also want to support

\pm, \abs, \sin, \cos, \tan, etc.

Regarding, non-flattened trees, is the main use-case generating binary expression trees to show computation order?

@kevinbarabash
Copy link
Member Author

I've opened #14 for limited LaTeX support. For further discussion about the AST we want to produce, please see math-ast/blob/master/spec.md and file issues at math-ast/issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants