Feature: next generation mathJSON #500
Comments
Very LISPy! Some questions that come to mind for me:
|
Yes! They might have been onto something... As it turns out, a number of historically significant computer algebra systems were implemented in Lisp: Macsyma, Reduce, Maxima, Axiom...
The default dictionaries should help with this. They'll cover a broad range of domains. In addition, the expressions can be annotated with some metadata, specifically a wikidata token that should help disambiguate. If you decide to use a dictionary that defines π as 'PI', and I use the default dictionary that defines it as 'pi', they will both have a wikidata token of 'Q167'. I've considered using OpenMathCD as well, but wikidata seems to have better coverage (and the wikidata info often includes the corresponding OpenMath ID, i.e.
The 'translation' dictionary (i.e. the one that maps from/to LaTeX) looks something like this:
[
  { name: 'pi', trigger: { symbol: '\\pi' } },
  {
    name: 'mu-0',
    trigger: { symbol: ['\\mu', '_', '0'] },
  },
  {
    name: 'set',
    trigger: { matchfix: '\\lbrace' },
    separator: ',',
    closeFence: '\\rbrace',
  },
  {
    name: 'divide',
    trigger: { function: '\\frac' },
    emit: '\\frac',
    requiredLatexArg: 2,
  },
  {
    name: 'eq',
    trigger: { infix: '=' },
    associativity: 'right',
    precedence: 260,
  },
  {
    name: 'abs',
    trigger: { matchfix: '|' },
    precedence: 880,
    closeFence: '|',
  },
  {
    name: 'factorial',
    trigger: { postfix: '!' },
    precedence: 880,
  }
]
The 'function' dictionary looks something like this:
{
  pi: {
    wikidata: 'Q167',
    isConstant: true,
    domain: 'R+',
  },
  'mu-0': {
    isConstant: true,
    wikidata: 'Q1515261',
    domain: 'R+',
    value: 1.25663706212e-6,
    unit: ['multiply', 'H', ['power', 'm', -1]],
  },
  multiply: {
    wikidata: 'Q40276',
    isPure: true,
    isCommutative: true,
    isAssociative: true,
  },
  abs: {
    wikidata: 'Q3317982',
    isPure: true,
    isListable: true,
  },
  factorial: {
    wikidata: 'Q120976',
    isPure: true,
  },
  eq: {
    // mathematical relationship asserting that two quantities have the same value
    wikidata: 'Q842346',
    isPure: true,
    isCommutative: true,
  },
}
No, the dictionaries are not included in the expression. The wikidata tokens should be sufficient to do the appropriate mapping, but a reference to the dictionary could be included as well (i.e. a URL pointing to the definition of the dictionary). |
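To illustrate how a shared wikidata token could bridge two dictionaries that name the same concept differently, here is a minimal sketch. The dictionary contents, the `times` name and the helpers `indexByWikidata` and `translate` are all hypothetical and are not part of MathLive or MathJSON; only the tokens 'Q167' and 'Q40276' come from the examples above.

```javascript
// Two hypothetical function dictionaries that use different names
// for the same concepts but share the same wikidata tokens.
const defaultDictionary = {
  pi: { wikidata: 'Q167', isConstant: true },
  multiply: { wikidata: 'Q40276', isCommutative: true },
};
const customDictionary = {
  PI: { wikidata: 'Q167', isConstant: true },
  times: { wikidata: 'Q40276', isCommutative: true },
};

// Build a lookup from wikidata token to the name used in a dictionary.
function indexByWikidata(dictionary) {
  const index = new Map();
  for (const [name, def] of Object.entries(dictionary)) {
    if (def.wikidata) index.set(def.wikidata, name);
  }
  return index;
}

// Rename the symbols and function names of `expr` from the vocabulary of
// dictionary `from` to the vocabulary of dictionary `to`.
// Names without a matching token are left unchanged.
function translate(expr, from, to) {
  const toIndex = indexByWikidata(to);
  const rename = (name) => {
    const token = from[name]?.wikidata;
    return (token && toIndex.get(token)) ?? name;
  };
  const walk = (e) => {
    if (Array.isArray(e)) return [rename(e[0]), ...e.slice(1).map(walk)];
    if (typeof e === 'string') return rename(e);
    return e; // numbers are passed through
  };
  return walk(expr);
}

const translated = translate(['multiply', 2, 'pi'], defaultDictionary, customDictionary);
console.log(JSON.stringify(translated)); // ["times",2,"PI"]
```

The expression itself never carries the dictionary; only the tokens are needed to find the corresponding name on the other side.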
Hello, we appreciate this change as it seems to be better for our project and will fix some bugs. Thanks for this! |
Update: The work is in progress, with the core functionality implemented. This includes support for several "forms", including a canonical form that transforms expressions so they are written as a sum of products, with sorted arguments (for commutative functions) and using a lexdeg sort order for polynomials. What's left to do:
Once this lands, it will be able to handle some pretty gnarly notations that were difficult/impossible to handle before, for example:
|
The WIP code has been committed, but it's not hooked up to anything yet, so the old MathJSON implementation is still used. Coming soon:
|
@arnog Not sure if this is related, but is there a way to customize the output of MathLive? For example, instead of outputting 2^2, could it output Pow(2,2)? |
@BChip Yes! Using the new MathJSON output you would get |
It will be possible to customize the output generated for MathJSON. So, you could indicate that the Latex input However, I'm not sure if that's exactly what you're looking for. It sounds like you might need to customize the input. You might be able to do what you are looking for by providing an inline shortcuts dictionary (with the |
I'm a little late to the game and you have likely already thought about cases like these, but I list them here just in case...
How smart are your rules for going from syntax to semantics going to be? E.g., you have a matchfix rule for '| ... |' which becomes absolute value. But if the contents are a table, then that's not right, and it's probably not right for a capital letter either, as in '|M|'. There are lots of other cases like this, so maybe there needs to be some pattern matching and ordering for your rules, if that's not there already. |
@NSoiffer Good questions!
Yes: there is an option to attach an attribute to an expression that corresponds to the input:
It all depends on how the operators are defined, since the precise definition of the operators can be overridden. If an operator is defined as associative, an n-ary form will be generated. In the default dictionary, "Add" is an associative operator.
There will be several "forms" supported. A "form" is a transformation applied to an expression, usually with the intent to use a canonical representation for easier processing. The "Full Form" is a form that does not apply transformations, and is therefore closer to the input. The "Canonical Form" applies more simplifications. In particular, the canonical form transforms subtractions into additions, and divisions into multiplications. Canonical Form:
The relational operators are currently non-associative, so this would result in a syntax error, but it could make sense to have them do the first option.
Yes, this is supported. The "head" of a function can itself be an expression, so:
Yes.
Yes, double integrals become two nested integrals.
Yes, this will be supported as well by the default dictionary.
Yes, the rules are ordered and they can match on patterns as well (including checking the domain of arguments, etc.). |
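The canonical form described in the answers above (subtractions become additions, divisions become multiplications, associative operators become n-ary) can be sketched roughly as follows. This is only an illustration under assumptions: the lowercase function names `subtract`, `negate`, `divide`, `power`, the `ASSOCIATIVE` set and the `canonical` helper are hypothetical and do not reflect the actual implementation.

```javascript
// Operators treated as associative in this sketch (an assumption).
const ASSOCIATIVE = new Set(['add', 'multiply']);

// Return a canonical version of an expression written in array form.
function canonical(expr) {
  if (!Array.isArray(expr)) return expr; // numbers and symbols are unchanged
  const [head, ...rawArgs] = expr;
  const args = rawArgs.map(canonical);

  // a - b  ->  a + (-b)
  if (head === 'subtract' && args.length === 2) {
    return canonical(['add', args[0], ['negate', args[1]]]);
  }
  // a / b  ->  a * b^(-1)
  if (head === 'divide' && args.length === 2) {
    return canonical(['multiply', args[0], ['power', args[1], -1]]);
  }
  // Flatten nested calls to the same associative operator into one n-ary call:
  // add(add(a, b), c)  ->  add(a, b, c)
  if (ASSOCIATIVE.has(head)) {
    const flat = args.flatMap((a) =>
      Array.isArray(a) && a[0] === head ? a.slice(1) : [a]
    );
    return [head, ...flat];
  }
  return [head, ...args];
}

console.log(JSON.stringify(canonical(['subtract', ['add', 'a', 'b'], 'c'])));
// ["add","a","b",["negate","c"]]
```

A "full form" in this picture would simply skip the call to `canonical` and keep the expression as parsed.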
Great! You've put a lot of thought into this and it sounds like it will be very powerful. |
I've been taking a sneak peek at the upcoming features and wanted to ask if it will be possible to make "non-greedy" postfix operators. Example:
Adding that isn't an issue. However, it means that
will be parsed as |
What would you like (As an aside, it is possible to customize the parsing by providing a |
Ideally, I would like it to get interpreted as Edit:
// Yes, I know, this is probably quite a bit of a hack
dictionary["inequalities"].find((v) => v.name === "Equal").parse = function (lhs, scanner, minPrec, _latex) {
  // '=' has precedence 260: bail out if a higher precedence is required
  if (260 < minPrec) return [lhs, null];
  const rhs = scanner.matchExpression(260);
  // No right-hand side found: produce "Equal" with a single operand
  if (rhs == null) return [null, ["Equal", lhs]];
  return [null, ["Equal", lhs, rhs]];
};
|
OK, yeah, this can kinda work.
But I'm also curious as to why you're trying to do this. Would "a=" really be a valid expression, or are you trying to handle syntax errors in the input? Trying to figure out if the default definitions shouldn't handle your use case. |
Oh, thank you for pointing those things out! Regarding defining both a postfix and an infix trigger, I actually did try that, but no matter what I did, the postfix trigger always got triggered.
{
  name: 'Equal',
  trigger: { infix: '=', postfix: '=' },
  associativity: 'right',
  precedence: 260,
},
Now, as to why I'm doing this, no |
OK, that makes sense. So, yeah, in that case it would probably make sense for the default dictionary to return |
Right now when a missing operand is encountered (as in This behavior could be enhanced with a There could also be a global So, an open question right now is whether there should be (1) a per-dictionary entry option, (2) a global option to control the behavior of missing operands, or (3) both. Another question is should the default dictionary entries produce Any thoughts? |
I like having the option for it not to be an error. The example of |
I also like the option for it to not be an error (though by default it should probably be an error). Another symbol where this would be useful would be an interval symbol. For example, However, it should definitely be customizable per symbol, otherwise catching errors such as |
OK, so right now I'm leaning towards:
|
I'd love it if MathLive were to expose a function to convert between these forms or to apply certain forms.
const expression = latexToMathjson(mathfield.value?.$text("latex-expanded") + "", { form: "full" });
// Do stuff with the full expression
// Now get a "simplified" version of the expression that can, say, be submitted to a CAS backend
const simplified = mathjsonApplyForm(expression, ["canonical-subtract", "canonical-root"]);
|
This function is called And MathJSON will include a CAS engine. |
The MathJSON implementation is being extracted from MathLive and moved to https://github.com/cortex-js/math-json so that it can be used to manipulate expressions without having to load MathLive. |
Hi Arno, I am taking over the work of rmeniche, and would like to know if there is a timeline for the availability of the new functionalities related to mathJSON ASTs? |
Hi @michelLenczner . There is some documentation on the format here: http://cortexjs.io/guides/math-json/ |
Thank you. Of course, I am familiar with this documentation. I take note of this agenda. For us, the work will start again intensively in March. At the moment we are in a design phase before new implementations. |
Hi Arno. First of all, happy new year. I am writing to ask whether you have had time to make progress on the new MathJSON. |
Thank you. Yes, there has been some progress... I'm a bit behind, but I'll try to get something out as quickly as possible. |
I just read through this, and it seems really solid. Great ideas here @arnog :) |
I was wondering if the mathjson format also supports numbers in a different base. The most important ones would be binary and hexadecimal. |
Yes, this can be represented using the In Latex, this is represented as |
Progress update: an implementation is now available at https://github.com/cortex-js/math-json The documentation of the API is lacking, but you can get an idea of how to use it by looking at |
Thank you very much. I will look at it carefully. |
So how will you go about integrating this into mathlive? I'm assuming MASTON is going to be completely superseded by this, so can it be used in mathlive right away? |
It is a bit early to draw conclusions since at this stage I don't really see the general principles guiding the construction. Nevertheless, I would like to know why it is necessary to have a double representation of certain objects in list form and in dictionary form such as |
@saivan yes, MASTON is going to be removed from Mathlive. You can use MathJSON right now by importing the package separately, and using the |
@michelLenczner The form using arrays to represent functions (and numbers to represent numbers, and strings to represent symbols) has the benefit of being more concise. However, the object literal form is necessary to attach metadata to the expressions. It is possible to customize the representation to suit your needs using |
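The relationship between the two representations can be sketched as a pair of conversion helpers. The key names `num`, `sym` and `fn` are an assumption made for this illustration (the thread does not spell them out), and `toObjectForm` / `toShorthand` are hypothetical helpers, not library functions.

```javascript
// Expand the concise form into an object-literal form.
// `metadata` is attached to the top-level node only.
function toObjectForm(expr, metadata = {}) {
  if (typeof expr === 'number') return { num: String(expr), ...metadata };
  if (typeof expr === 'string') return { sym: expr, ...metadata };
  if (Array.isArray(expr)) {
    return { fn: expr.map((e) => toObjectForm(e)), ...metadata };
  }
  return expr; // assume it is already in object form
}

// Collapse an object-literal form back to the concise form,
// dropping any metadata along the way.
function toShorthand(expr) {
  if (Array.isArray(expr)) return expr.map((e) => toShorthand(e));
  if (expr === null || typeof expr !== 'object') return expr;
  if ('num' in expr) return Number(expr.num);
  if ('sym' in expr) return expr.sym;
  if ('fn' in expr) return expr.fn.map((e) => toShorthand(e));
  return expr;
}

// A symbol annotated with its wikidata token needs the object form...
console.log(toObjectForm('pi', { wikidata: 'Q167' }));
// ...while an expression without metadata round-trips to the concise form.
console.log(JSON.stringify(toShorthand(toObjectForm(['add', 1, 'x']))));
// ["add",1,"x"]
```

The concise form stays readable for hand-written expressions, and the object form is only needed on the nodes that actually carry metadata.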
I've added the |
Great, I look forward to trying it out then! |
For our part, the date for using this feature has been moved to April. I don't think I will be able to give feedback before then. |
Progress update
Documentation
The documentation has been significantly beefed up at http://cortexjs.io/guides/math-json/.
New Atomic Type: Dictionary
I have come to the conclusion that adding a fourth atomic element to MathJSON in addition to 'number', 'symbol' and 'function' would be very valuable, namely an element to represent dictionaries (aka associative arrays or maps). This will be added shortly to the documentation and the core library.
Domains: Feedback Requested
I am looking for feedback on the definition of 'domains' in the MathJSON library. Domains are not strictly speaking part of the core MathJSON format, but the default symbol dictionary will make use of them. Domains are analogous to "types" in programming languages. They will allow for optimizations when compiling expressions and make it possible to perform reasoning by inference on expressions (for example:
Have a look at http://cortexjs.io/guides/compute-engine-domains/ and let me know of any domain that should be included (or ones that shouldn't) and of any domain relationship that I may have gotten wrong. This does not need to be an exhaustive list of domains, since it will be possible to dynamically define new domains, but since this is the default dictionary it should be a list of domains that would be frequently convenient to have.
Compute Engine
I have also made progress on the Compute Engine that can evaluate, compile, and otherwise manipulate MathJSON expressions. It's in the same repo, and the documentation is here: http://cortexjs.io/guides/compute-engine/. Still a work in progress, though.
New Language: Cortex
I have also decided to build a new language, Cortex, that will be essentially syntactic sugar on top of MathJSON expressions, so
While it's nice to be able to express math formulas using LaTeX, more 'functional' programming is better represented with a different syntax. I'll shortly add a parser that will generate MathJSON for this syntax as well as serialize from MathJSON to this syntax. |
This is very interesting, but it would be useful to demarcate the ambitions. This kind of difficulty occurs at almost all levels of these types. For example, if we consider functions as relations (which is not done here) then things get complicated. I guess you didn't want to do it to avoid complications. |
MathJSON has been integrated in mathlive@0.68 🥳 To get the value of a mathfield as MathJSON, use The documentation about MathJSON is available here: https://cortexjs.io/math-json/ |
The MathJSON repo has been renamed to |
Introduction
mathJSON (MASTON) has been useful to represent the content of a mathfield as an Abstract Syntax Tree in a format that can be parsed and manipulated. For example, it's used on mathlive.io to power a computation engine that is used to evaluate expressions and plot them.
However, it has some limitations:
For example, some constructs are represented in ways that make them specific to the typesetting of those operations, e.g. exponentiation (i.e. x^2) is represented with a sup property.
For example, it would be desirable to specify how arithmetic operations are performed (using native JavaScript numbers, using BigInt, using a third-party numerics library, etc.).
It would also be desirable to be able to specify the syntactic rules of the LaTeX that can be parsed in order to support custom conventions, for example on how to interpret fences (]-5, +∞)) or other syntactic constructs, including specialized operators, functions and fences.
As another example from #293, \frac{d}{dx} a + b could be interpreted/parsed as:
d is a known variable: "((d / (d * x)) * a) + b"
The 'correct' interpretation is entirely dependent on the context, and there is currently no way to control this.
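The wish listed above to choose how arithmetic is performed could be sketched as an evaluator that takes the numeric backend as a parameter. Everything here is hypothetical: the `evaluate` function, the two backend objects and their method names are assumptions made for this illustration, using the array notation from the proposal below.

```javascript
// Two interchangeable numeric backends (hypothetical interface).
const nativeNumerics = {
  fromLiteral: (n) => n,
  add: (a, b) => a + b,
  multiply: (a, b) => a * b,
  power: (a, b) => a ** b,
};
const bigIntNumerics = {
  // BigInt() throws for non-integer numbers: this backend only handles integers.
  fromLiteral: (n) => BigInt(n),
  add: (a, b) => a + b,
  multiply: (a, b) => a * b,
  power: (a, b) => a ** b,
};

// Evaluate an expression in array form with the given backend.
// `scope` maps symbol names to numeric values.
function evaluate(expr, numerics, scope = {}) {
  if (typeof expr === 'number') return numerics.fromLiteral(expr);
  if (typeof expr === 'string') {
    if (!(expr in scope)) throw new Error(`Unknown symbol: ${expr}`);
    return numerics.fromLiteral(scope[expr]);
  }
  const [head, ...args] = expr;
  const op = numerics[head];
  if (typeof op !== 'function') throw new Error(`Unsupported function: ${head}`);
  // Fold the evaluated arguments from left to right.
  return args
    .map((a) => evaluate(a, numerics, scope))
    .reduce((acc, value) => op(acc, value));
}

const expr = ['add', ['power', 2, 64], 1];
console.log(evaluate(expr, nativeNumerics)); // a double: the trailing +1 is lost to rounding
console.log(evaluate(expr, bigIntNumerics)); // 18446744073709551617n
```

The same expression tree produces a rounded floating-point value or an exact integer depending only on the backend passed in, which is the kind of control the current format does not offer.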
Proposal
Therefore, we propose a new version of mathJSON that will feature the following:
\frac{1}{2}). An option to include parsing of LaTeX commands (but not their interpretation) would result in ["latex", ["\\frac", "1", "2"]]. A default rule would specify that \frac should map to the divide function, in which case the output would be ["divide", 1, 2].
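The two stages described here, an uninterpreted "latex" node and a rule that maps \frac to divide, might be sketched like this. The rule table format and the `interpret` / `interpretLatex` helpers are hypothetical and only meant to illustrate the idea; the treatment of numeric tokens is also an assumption.

```javascript
// Hypothetical default rules mapping LaTeX commands to function names.
const defaultRules = { '\\frac': 'divide' };

// Interpret the content of a ["latex", ...] node using the rules.
function interpretLatex(node, rules) {
  if (typeof node === 'string') {
    // Assumption: tokens that look like numbers become numbers,
    // everything else is kept as a symbol.
    return /^-?\d+(\.\d+)?$/.test(node) ? Number(node) : node;
  }
  const [command, ...args] = node;
  const name = rules[command];
  // No rule for this command: leave the node uninterpreted.
  if (name === undefined) return ['latex', node];
  return [name, ...args.map((a) => interpretLatex(a, rules))];
}

// Walk an expression and replace every ["latex", ...] node
// that a rule knows how to interpret.
function interpret(expr, rules = defaultRules) {
  if (!Array.isArray(expr)) return expr;
  if (expr[0] === 'latex') return interpretLatex(expr[1], rules);
  return expr.map((e) => interpret(e, rules));
}

console.log(JSON.stringify(interpret(['latex', ['\\frac', '1', '2']])));
// ["divide",1,2]
```

Passing a different rule table, for example `{ '\\frac': 'Quotient' }`, changes the interpretation without touching the parsing step, which is the kind of customization the proposal is after.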
Examples
\frac{a}{1+x}
["divide", "a", ["add", 1, "x"]]
e^{\imaginaryI \pi }+1=0
["eq", ["add", ["power", "e", ["multiply", "pi", "i"]], 1], 0]
For comparison, that last expression was represented in the previous mathJSON version as:
Backward Compatibility
The new format is not backward compatible with the previous version of mathJSON. Although a "translator" between the formats could be written, we do not plan to provide one.
Related Issues
This feature will address the following related issues: #437, #396, #380, #379, #293.