-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode math properties #2425
Unicode math properties #2425
Conversation
…ning, ...) associated with a unicode char
…ning, ...) associated with a unicode char
…d with the glyph, after decoding and font adjustment
…il digestsion; use new decodeMathChar API
…lass with any already assigned grammatical role
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually quite like the direction of the PR. Maybe I don't yet understand enough to register the controversial piece?
I left a few minor comments, and I can't say I understand the full details of the TeX interplay yet, but the grammar-near pieces seem encouraging.
I especially like the changes in the tests, where the math XML markup has become significantly more enhanced.
# When $mathglyph is provided, it is the unicode corresponding to the \mathchar of $value | ||
sub new { | ||
my ($class, $cs, $value, $mathglyph, %traits) = @_; | ||
my ($class, $cs, $mode, $value) = @_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make a note of this in the Changefile for anyone who may have used CharDef
in an external binding. This is breaking change to its method signature.
Could you actually add such a note to this PR? Would avoid forgetting about it later (which is easy to do...)
Which is fine, I think we still need the ability to make such changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point; I started a new section of Changes
, as we'll need it eventually.
($local ? Tokens(T_CS('\char'), $value->revert, T_CS('\relax')) : $$self{cs})); } | ||
else { # Else math mode, mathDecode! | ||
my ($glyph, $f, $rev, %props) = LaTeXML::Package::decodeMathChar($nvalue); | ||
if (!defined $props{name}) { # TEMPORARY HACK ????? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything to add to that comment? E.g. what is missing for the hack to go away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified; thanks!
$whatsit->setProperty(font => LookupValue('font')->specialize($glyph)) if $glyph; | ||
my ($glyph, $f, $rev, %props) = decodeMathChar($n); | ||
$whatsit->setProperty(glyph => $glyph) if $glyph; | ||
$whatsit->setProperties(%props); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor, but you could also guard that setter call:
$whatsit->setProperties(%props) if %props;
DefConstructor('\mathclose Digested', "?#1(<ltx:XMWrap role='CLOSE'>#1</ltx:XMWrap>)()", bounded => 1); | ||
DefConstructor('\mathpunct Digested', "?#1(<ltx:XMWrap role='PUNCT'>#1</ltx:XMWrap>)()", bounded => 1); | ||
DefConstructor('\mathinner Digested', "?#1(<ltx:XMWrap role='ATOM'>#1</ltx:XMWrap>)()", bounded => 1); | ||
our %mathclass_subclass = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This nested hash has the feeling of a mini-grammar... Interesting.
How about three other ideas:
PUNCT => { PERIOD => 1 }
ID => { NUMBER => 1 }
BIGOP => { SUMOP => 1, INTOP => 1, LIMITOP => 1, DIFFOP => 1 }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!, yep, that's the idea!
my $font = $curfont->merge(%$fontinfo); | ||
$fontinfo = $defn && $defn->isFontDef; | ||
if ($$basefontinfo{size} != $curfont->getSize) { # If we've gotten an explicit font SIZE change; Adjust! | ||
$fontinfo = {%$fontinfo}; $$fontinfo{size} = $curfont->getSize; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a small theoretical chance $fontinfo
is undef
here, so maybe default it to {}
?
$fontinfo = ($defn && $defn->isFontDef) || {};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
$fontinfo = {%$fontinfo}; $$fontinfo{size} = $curfont->getSize; } } | ||
my $font = $curfont->merge(%$fontinfo); | ||
if ($downsize > 0) { $font = $curfont->merge(scripted => 1); } | ||
if ($downsize > 1) { $font = $curfont->merge(scripted => 1); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These merge calls have the same value? Do you intend it to do the same merge twice when >1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, twice for scriptscript
#====================================================================== | ||
UTF(0x5C) => { role => 'ADDOP', meaning => 'set-minus' }, # \backslash | ||
UTF(0xAC) => { role => 'BIGOP', meaning => 'not' }, # \lnot | ||
UTF(0xAC) => { role => 'BIGOP', meaning => 'not' }, # \neg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0xAC is given 3 separate times here? Unsure how that happened. There may be other cases, I see 0x5C
is also there twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
editing mistake. fixed.
"ker" => { role => 'OPFUNCTION', meaning => 'kernel' }, # \ker # | ||
"lg" => { role => 'OPFUNCTION' }, # \lg # | ||
"lim" => { role => 'LIMITOP', meaning => 'limit', need_scriptpos => 1 }, # \lim # | ||
"lim inf" => { role => 'LIMITOP', meaning => 'limit-infimum', need_scriptpos => 1 }, # \liminf # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, first I assume single characters, then single words (no spaces), but I see some entries even have a space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, these don't actually get fired, yet; it would need to be hooked into the math ligature that combines "s,i,n" into "sin"; and then looked up! I suspect "lim inf" will likely fail that test. I kinda left them all in as a reminder...
Here's an interesting, perhaps controversial, PR: It enhances
decodeMathChar
to also lookup math properties (role, meaning,...
) for the decoded glyph from the Unicode module. This provides some parity between symbols defined "semantically" by bindings and low-level\mathchar
, in the simple cases. This covers most of what TeX's plain defines, as it defines them!