@dpvc
Member

@dpvc dpvc commented Apr 18, 2025

This PR implements a new paradigm for the expression explorer. The main goals are:

  • Work across the major browser/OS/screen-reader combinations.
  • Properly read the full expression when reading the whole page or stepping through the page sentence-by-sentence (or by other units).
  • Automatically enter "focus mode" when the expression is focused via tabbing or clicking
  • Allow control over the description of the math (e.g., "clickable math") used during reading and focusing.
  • A "press H for help" message should be spoken when the math is first focused (but can be turned off by a menu preference).

I tested with 11 combinations (Chrome, Firefox, and Safari on MacOS with VoiceOver; Chrome, Firefox, and Edge on Windows with NVDA and JAWS; and Chrome and Firefox on Linux with Orca) and I believe all these goals are met. I spent several weeks experimenting with these 11 configurations to find out what worked the same among them and what worked differently, in order to find a common approach that worked for all the combinations. Unfortunately, there didn't seem to be a single setup that worked consistently across all 11, but fortunately, within an OS, there was a setup that worked for all the browser/screen-reader combinations, so it is possible to make it work without users having to change settings by hand.

The prior approach (in the current beta.7 version) is to add aria-labels and role="treeitem" to the sub-nodes of the DOM tree and move the focus among these as the expression is explored. This had several problems. First, treeitem nodes must be in a container with role="tree", which was not the case for us. Second, it did not enter "focus mode" automatically. Third, it did not read the full expression when reading the full page in some of the combinations. Fourth, it basically did not work at all in VoiceOver.

In the new approach, rather than focusing the various nodes in the math display tree, when you walk to a new node that should be spoken, the explorer inserts a new temporary node that has the needed aria-label and focuses that while visually highlighting the proper display node. The temporary node is removed when you navigate to the next node and a new one is added for the new speech. The whole display tree has aria-hidden="true" and only the new speech node is visible to the AT. This part works consistently across all the browser/OS/screen-reader combinations tested. The only point of contention is how the top-level container element is handled.
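As a rough illustration of the temporary-node idea (not the actual KeyExplorer code), here is a minimal sketch that uses an in-memory stand-in for DOM elements so it can run outside a browser. The names SketchNode, SpeechNodeSketch, and moveTo are hypothetical, and the old node is removed immediately here rather than after a delay.

```typescript
// Minimal in-memory stand-in for a DOM element (hypothetical, for illustration only).
class SketchNode {
  attributes: Record<string, string> = {};
  children: SketchNode[] = [];
  constructor(public tag: string) {}
  append(child: SketchNode): void {
    this.children.push(child);
  }
  remove(child: SketchNode): void {
    this.children = this.children.filter((c) => c !== child);
  }
}

// Hypothetical explorer state: one container and at most one live speech node.
class SpeechNodeSketch {
  speechNode: SketchNode | null = null;
  focused: SketchNode | null = null;
  constructor(public container: SketchNode) {}

  // Walk to a new node: add a freshly labelled node, focus it, then drop the old one.
  moveTo(label: string): void {
    const old = this.speechNode;
    const node = new SketchNode('mjx-speech');
    node.attributes['role'] = 'math';
    node.attributes['aria-label'] = label;
    this.container.append(node);
    this.focused = node; // stands in for node.focus()
    this.speechNode = node;
    if (old) {
      this.container.remove(old); // the real code is described as delaying this step
    }
  }
}
```

The point of the ordering is that the new node exists and is focused before the old one disappears, so focus never falls back to the container in between.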

In MacOS, there is no "focus mode" versus "browse mode" dichotomy, but in Windows and Unix, we need to trigger the screen reader to enter "focus mode" when the expression is focused. That requires the top-level mjx-container to have a role that does that, such as application or tree. While tree could be used to enter focus mode, this role is designed as a means of making selections, and screen readers speak extra information about where you are in the tree and whether the current item is selected or not, which interferes with the reading of the math itself. (E.g., NVDA would say things like "item 1 of 8, not selected" when moving within the expression.) So it turned out to be impractical to use the tree role. The application role was the only one that worked consistently across browsers and screen readers in Windows. Fortunately, that role also works with Orca in Unix. This role also has the advantage that its aria-label is spoken when reading through the page, whereas other roles would end up not speaking anything, so the math was skipped when reading through the page.

For MacOS, the handling of the various top-level roles varied widely across browsers, and there was not a role that would be read properly in all browsers when the page is being read as a whole. The solution used here is to insert an extra node in the container that holds the speech for the entire equation and give it role="img", since that seems to be one of the few roles whose aria-label will be spoken when the page is being read (this is also needed for JAWS to read the expression while reading the full page, if I recall correctly). The top-level container gets no role or aria-label in this case, and when the expression is focused, the explorer descends to the math node and produces its speech (for Windows and Linux, the focus must remain on the container in order for the screen reader to enter focus mode).
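The platform split described above could be summarized roughly as follows. This is only a sketch: the OsName values, the TopLevelSetup shape, and the function name are assumptions made for illustration and are not the actual values or API used in the PR.

```typescript
// Hypothetical OS names; the real context.os values may differ.
type OsName = 'MacOS' | 'Windows' | 'Unix';

interface TopLevelSetup {
  role: 'none' | 'application'; // role for the top-level mjx-container ('none' = no role)
  descend: boolean;             // true: focus descends to the temporary speech node
}

// MacOS: no role on the container, focus descends to the math node's speech.
// Windows/Unix: role="application" so the screen reader enters focus mode,
// and focus stays on the container.
function topLevelSetupFor(os: OsName): TopLevelSetup {
  if (os === 'MacOS') {
    return { role: 'none', descend: true };
  }
  return { role: 'application', descend: false };
}
```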

In order to handle the "press H for help" message, MacOS includes it in the speech string when the expression is first focused (it is on the temporary node created for the speech). Because the other OSs must keep the top-level expression focused in order to enter focus mode, the help message is added to the aria-roledescription on the fly (technically not legal, but works; this is the only illegal thing being done in this new explorer, though the use of img role is not ideal).

The timing of when the temporary images are created and removed, and when the tabindex is changed, is important and has been carefully managed here.

I will add more comments about the details of the changes below soon, but wanted to get the PR out so that you can be trying it out in the meantime.

@dpvc dpvc requested a review from zorkow April 18, 2025 14:36
@dpvc dpvc added this to the v4.0 milestone Apr 18, 2025
@dpvc
Member Author

dpvc commented Apr 19, 2025

The Details

Now that the semantic-enrich component has had the speech processing moved into the new speech component, the setup for the worker path that was being done in the semantic-enrich component definition file has been moved (and updated) into the speech component file. The updates allow it to work better in node applications without requiring additional configuration.

In order to accommodate the platform-specific configuration defaults, the ts/util/context.ts file now includes information about the current OS as well as the window and document. The test files for the context utility have been updated to test that, and a new one added to handle the Android situation.


In explorer.ts, we add new static properties for the aria role and aria role description to use, and a string to use for the description when it is set to none (the description needs to be non-empty and non-whitespace, so we use a unicode character that is not spoken in any of the tested screen readers). We also add getters for these values in the instances of the explorer MathItem.

We also override the speech MathItem's attachSpeech() and detachSpeech() methods that add or remove the needed attributes and nodes for handling the explorer paradigm described above. These are implemented in the KeyExplorer, described below.

The ExplorerMathDocument gets a new infoIcon property that holds a visible icon for sighted users to get the help dialog when an expression is focused. A new a11y property, help, is added to specify whether or not the "press H for help" message is spoken when an expression is first focused (it is controlled by the menu settings).

We add a number of styles for handling the temporary speech nodes, as well as the help icon. The constructor is modified to add the styles to the document, and to create the needed SVG elements for the info icon. Finally, the role description is set from the menu settings.


In ExplorerPool.ts, an extraneous AddEvents() call is removed, since it is already called in the Attach() method.


The KeyExplorer.ts file is the one with the most changes. Most of this functionality has been completely changed, and I moved some of the functions around to group related ones together. It may be easier to just view this file as a whole rather than try to use the differences.

Here the KeyMapping type describes the functions that are used within the KeyDown event handler.

The selectors for walking are changed, since we no longer use role="treeitem", but instead have marked the walkable nodes with data-speech-node="true".

The hasModifiers() function is moved earlier, and has an extra parameter to decide if it should check for SHIFT or not.
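For reference, a modifier check with an optional SHIFT test might look something like this. It is a sketch only: the ModifierState interface, the function name, and the default value of the second parameter are assumptions, not the actual signature in KeyExplorer.ts.

```typescript
// The subset of KeyboardEvent/MouseEvent fields this sketch needs.
interface ModifierState {
  altKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  shiftKey: boolean;
}

// Returns true if any modifier is pressed; SHIFT only counts when checkShift is true.
// (Defaulting checkShift to true is an assumption.)
function hasModifiersSketch(state: ModifierState, checkShift: boolean = true): boolean {
  return state.altKey || state.ctrlKey || state.metaKey || (checkShift && state.shiftKey);
}
```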

The helpMessage() and helpData are used to construct the help dialog message, most of which is common to all settings, but needs a bit of customization by platform.

The static keyMap gives the mapping for key presses by mapping the key name to a function to perform for each key. These functions return true if the action should not stop the event propagation, false if the honk sound should be given, and void if the action succeeded and should stop propagation.

Some getters are used for easy access to the role and description.

Next come some values used to store the element clicked in a Click event, the node to refocus on after the element regains focus (e.g., after a menu is displayed and removed), the selector string to look for after the expression is rerendered (e.g., by an maction toggle), plus pointers to the active temporary speech node, and the extra top-level speech node with img role. The descend value determines whether focus should remain on the top-level node (when false), or should descend to the temporary speech node for the math element (when true). This will be set to true for MacOS and false otherwise.

Next come the event handlers. The focus-in handler calls Start() unless this is the result of a click event (which already called Start()). The focus-out handler clears the current speech node and calls Stop() unless this focus out is due to the focus being set onto a new temporary speech node. If the document itself has lost focus, we refocus the top-level node so that it will get focus again when the document regains focus.

The keydown handler gets the KeyMapping for the pressed key from the keyMap, and if there is one, it does that action, otherwise it does the undefinedKey() action. If the action returns true, the event is allowed to propagate. Otherwise the event is stopped from propagating, and if the result is false and we are making sounds, then the honk is sounded. The actions are defined below.
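The three-way return convention (true, false, or void) can be sketched like this. The types and the dispatchKey name are hypothetical and the sketch returns a result object instead of touching a real event, so it is only an approximation of how the handler behaves.

```typescript
// An action returns true (let the event propagate), false (failed: honk),
// or nothing (succeeded: stop propagation).
type KeyAction = () => boolean | void;

interface DispatchResult {
  propagate: boolean; // should the event continue to propagate?
  honk: boolean;      // should the honk sound be played?
}

function dispatchKey(
  keyMap: Record<string, KeyAction>,
  key: string,
  undefinedKey: KeyAction,
  sound: boolean
): DispatchResult {
  // Only use mappings defined directly on the keyMap object.
  const own = Object.prototype.hasOwnProperty.call(keyMap, key);
  const mapped = own ? keyMap[key] : undefined;
  const action = mapped ?? undefinedKey;
  const result = action();
  if (result === true) {
    return { propagate: true, honk: false };
  }
  return { propagate: false, honk: result === false && sound };
}
```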

For a mouse down, we check that there are no modifiers (e.g., that would cause selections to be extended), and that the button is the left button (so contextual menu clicks won't be registered, for example). We find the speech element that was clicked, if any, and if it is the info icon, stop propagation and return (it will be handled by the click event later). Otherwise, if the active element is not the top-level node (i.e., we have clicked on a subexpression), then if we have clicked on a rect added by the highlighter for SVG highlighting, mark the clicked element for a refocus, otherwise record the clicked item (so the focus in will not do a Start() until the actual click event).

For the actual click events, if it has modifiers or is not the left button or we have a selection already (e.g., we drag selected and have just un-clicked at the end of that, which still generates a click event), then focus out and let the system handle the selection. Otherwise, find the clicked speech node, and if it is the info icon, stop the event and open the help dialog. If not, then if we either haven't got a click element (in which case we will select the full equation), or the container contains the clicked item, then stop the event and record the clicked item as the one to focus on when the container receives the focus-in event, but if it was a link, start right away (we won't get a focus-in).

The next section is the key actions. These first few are straightforward. The Enter key action checks if we are active, and if so, looks for a link or an maction to activate. If not already active (e.g., we pressed Escape earlier), then we start the explorer again.

The arrow actions look up the proper node, if any, and use the moveTo() method to check if there is an element, and honks if not, or sets the current speech node to the new one.

The depth action gets the depth string and speaks that. The summary speaks the summary string.

The next rules action removes the attached speech marker (so speech will be added again), gets the new rules, and restarts after the new speech has been attached. Similarly, for the next styles action.

The help action creates the help dialog box with its controls, and adds event handlers to deal with closing it, then focuses the dialog.

The next section handles the actual speech nodes. The setCurrent() method first saves the current node if the document has lost focus (so it can be refocused when the document regains focus). Since we are going to modify the contents of the top node, we let the AT know we are busy for a bit. If a node is already highlighted, we remove the highlighting and clear the current node. If we are not setting a new node (i.e., we are focusing out), we remove the speech node here. (If we are setting a new speech node, we need to delay the removal for later in order to get the focusing to come in the right order and to make sure the AT speaks the new speech properly.) Then we set the new current node, and if there is one, we highlight it. If we are supposed to add a new speech node (i.e., we aren't leaving the top node as the focused node and relying on its speech), then we call addSpeech() to add the new speech node. Finally, we let the AT know we are done changing things.

The addSpeech() method first removes the special "img" node that holds the full speech string, and then gets the speech from the prefix, speech, and postfix attributes. If we are adding a role description, we create it from the aria role and its description, and include the help message if we need it. Then we speak the needed string (which is where the actual speech node is created), and finally set the top-level tabIndex to -1 (this must come after creating the new speech node, so that we don't get a focus-out too early).

The removeSpeech() method removes the speech node, replaces the "img" speech node, and resets the top-level tabIndex to 0.

The speak() method causes given speech and braille strings to be spoken. It does this by making a new mjx-speech node with role="math", and adding an aria-label for the speech string, together with the SSML attributes (so they can be picked up by the Update() method), and the aria-braillelabel for the Braille string, if any. The aria role description is added and the node is made focusable, then appended to the container node. We indicate that the next focus event will be for the speech node, and then focus it. Then we call Update() to update the regions with the new speech and braille, and finally remove the old speech after a slight delay, allowing the AT to notice the new speech node before the old one is removed (this was needed for Orca to properly read the new speech).

The attachSpeech() method is called from the explorer MathItem's addSpeech() to set up the container for use by the explorer. It hides the child nodes from AT, marks the container as having speech, sets the label, role and description for the container, if needed, and creates and appends the "img" speech node.

The detachSpeech() method undoes all of that.

The next section is for various utility functions.

The findClicked() method finds the node that was clicked (for mousedown and click events). If the node is the info icon, return that. For CHTML output, get the nearest element with the navigation selector. But for SVG output, we look through the SVG tree for the element (with speech) whose bounding box contains the click coordinates. This allows us to register clicks even when they are not on the ink of the glyph. The loop finds the deepest node containing the click position.
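The bounding-box search for SVG output can be illustrated with a small, DOM-free sketch. The Box and BoxedNode types and the function names are hypothetical; the real method works on SVG elements and their bounding rectangles, and this version simply follows the first child whose box contains the point.

```typescript
interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Hypothetical stand-in for an SVG element with a bounding box.
interface BoxedNode {
  id: string;
  box: Box;
  hasSpeech: boolean;
  children: BoxedNode[];
}

function boxContains(box: Box, x: number, y: number): boolean {
  return box.left <= x && x <= box.right && box.top <= y && y <= box.bottom;
}

// Descend from the root, remembering the deepest node with speech whose box
// contains the click position; returns null if the click misses the root.
function deepestSpeechNodeAt(rootNode: BoxedNode, x: number, y: number): BoxedNode | null {
  let found: BoxedNode | null = null;
  let current: BoxedNode | null = boxContains(rootNode.box, x, y) ? rootNode : null;
  while (current) {
    if (current.hasSpeech) {
      found = current;
    }
    let next: BoxedNode | null = null;
    for (const child of current.children) {
      if (boxContains(child.box, x, y)) {
        next = child;
        break;
      }
    }
    current = next;
  }
  return found;
}
```

Because the test is against the box rather than the glyph outline, a click in the white space inside a character's box still selects that node.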

The focusTop() function focuses the top-level node, but without setting any speech (like when Escape is pressed).

The SsmlAttributes() function returns the array of SSML attributes.

The restartAfter() function waits for a promise to resolve and then reattaches the speech and refocuses the result. This is used when the ruleset or styles change.

The findStartNode() function determines the node to use as the current node during a Start() action. It will be the refocus node, if there is one, or the current node if one is already current, or the node specified by the restarted selector string, if there is one, otherwise it is null (it will be determined later).
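The priority order described here amounts to a simple fallback chain. A minimal sketch, assuming a hypothetical StartState shape in which query() stands in for looking up the restarted selector in the container:

```typescript
// Hypothetical snapshot of the explorer fields that findStartNode() consults.
interface StartState<T> {
  refocus: T | null;                     // node to refocus after regaining focus
  current: T | null;                     // node that is already current, if any
  restarted: string | null;              // selector to look for after a rerender
  query: (selector: string) => T | null; // stand-in for a selector lookup
}

// Refocus node first, then the current node, then the restarted selector, else null.
function findStartNodeSketch<T>(state: StartState<T>): T | null {
  if (state.refocus !== null) {
    return state.refocus;
  }
  if (state.current !== null) {
    return state.current;
  }
  if (state.restarted !== null) {
    return state.query(state.restarted);
  }
  return null; // the caller picks the first selectable node later
}
```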

The Start() method has been redesigned. If the explorer is not attached or active, we do nothing. If the item doesn't have speech yet, we ask for the speech to be attached and wait for that to occur (no need for restarts that way). If this Start() call is coming from focusing in on the new speech node, we stop here. Otherwise we mark the container as active (for CSS selectors) and add the info icon. We get the node to make current, and determine if we need to add a speech node or not, then set the current node to the found node or the first selectable node, indicating if we need to add the speech string, and whether to include the description and help message or not on the top-level node. Then we do the super-class Start(). After that, we check to see if we need to add the help message to the top-level node, and finally show all the needed display regions.

The Stop() action puts back the top-level description, if it was changed, unhighlights the expression, removes the info icon, and closes the display regions before doing the super-class Stop().

The Update() method uses the speech node or the top node rather than the current node (since they have the proper speech attributes).

The Attach() and Detach() methods have been moved here (with the other overridden methods), and Attach() sets up the descend value to be true when the top-level role is none. Detach() removes the added attributes and elements in addition to its previous actions. The NoMove() and AddEvents() methods have also been moved here.

The action and link methods have minor adjustments.

Finally, semanticFocus() is modified to handle refocusing on a normal element or an maction element's first selectable node (so if the maction changes its value, the proper thing is active afterward).

That covers a pretty complicated set of changes. I will add more about the other files later (out of time right now).

@dpvc
Member Author

dpvc commented Apr 19, 2025

More Details

For region.ts, the speech defaults to U+00A0 so that the region will stay full sized if it contains no speech (rather than collapsing to a thin line).

In speech-enrich.ts, these are the changes I mentioned that convert the SRE attributes to the ones used in the new explorer. In particular, role="treeitem" is removed and replaced by data-speech-node="true", and the aria attributes (other than aria-level) are removed. These changes could be made in SRE and removed from here.
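The attribute conversion described here can be sketched on a plain attribute record. This is an illustration only: the function name is hypothetical and the real code operates on DOM nodes through the adaptor rather than on objects.

```typescript
// Convert SRE-style attributes to the ones used by the new explorer:
// role="treeitem" becomes data-speech-node="true", and aria-* attributes
// other than aria-level are dropped. Everything else is kept unchanged.
function convertSreAttributes(attrs: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(attrs)) {
    const value = attrs[key];
    if (value === undefined) {
      continue;
    }
    if (key === 'role' && value === 'treeitem') {
      result['data-speech-node'] = 'true';
      continue;
    }
    if (key.indexOf('aria-') === 0 && key !== 'aria-level') {
      continue;
    }
    result[key] = value;
  }
  return result;
}
```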

In speech.ts, we add a detachSpeech() that calls the web-worker to remove the speech attributes that were added by attachSpeech() when the state is changed to below where speech is added. In attachSpeech() we save the promise for when the speech is attached so that the explorer can find it in its overridden attachSpeech() method. In the SpeechMathDocument, a couple of properties left over from the no-longer used asynchronous speech code are removed. The detachSpeech() call is made in the state() function, when needed.

In GeneratorPool.ts, the MathItem type is replaced by SpeechMathItem in order to access its properties. We add the enableSpeech and enableBraille properties to the options so that we can tell whether speech or Braille should be added to the expression tree. The summary() processing has been simplified and moved to KeyExplorer. The CleanUp() code is no longer needed, since we don't modify any of the tree node's attributes. The last move and last speech code is no longer used (since we don't change the expression nodes' attributes) and is removed. The updateRegions() method is simplified by pushing most of the work into getLabel() and a new getBraille() method. Finally, the depth() action has been simplified and moved to the KeyExplorer.

In SpeechUtil.ts, we add a BRAILLE attribute name for use in obtaining the braille label, similar to the ones for getting the speech.

In WebWorker.ts, we replace the MathItem type with SpeechMathItem in order to access its properties. We pass the enableSpeech and enableBraille options on to the Attach() method so that only the needed attributes are attached to the DOM tree. Since the new explorer paradigm manages aria-label and aria-braillelabel in the KeyExplorer, we don't add those attributes here; instead we add these using the data-semantic attributes where the explorer will pick them up. The setSpeechAttribute() and setSpeechAttributes() methods now add speech and braille only when the options allow it. Finally, we add a Detach() method that undoes the Attach() method using a recursive detachSpeech() method to remove any added attributes from the expression DOM tree. This is called by the SpeechMathItem when the state changes to below where the speech has been added.
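A recursive removal of this kind can be sketched on a simple tree of attribute records. The AttrNode type, the function name, and the attribute names in the usage example are placeholders for illustration; the actual list of attributes removed by Detach() is not spelled out here.

```typescript
// Hypothetical stand-in for a DOM node: attributes plus child nodes.
interface AttrNode {
  attributes: Record<string, string>;
  children: AttrNode[];
}

// Remove the listed attributes from a node and from all of its descendants.
function detachSpeechSketch(node: AttrNode, attrNames: string[]): void {
  for (const attr of attrNames) {
    delete node.attributes[attr];
  }
  for (const child of node.children) {
    detachSpeechSketch(child, attrNames);
  }
}

// Usage with placeholder attribute names:
const detachDemoTree: AttrNode = {
  attributes: { 'data-semantic-speech': 'x plus 1', 'data-semantic-id': '2' },
  children: [{ attributes: { 'data-semantic-speech': 'x', 'data-semantic-id': '0' }, children: [] }],
};
detachSpeechSketch(detachDemoTree, ['data-semantic-speech']);
```

After the call, the speech attributes are gone from both nodes while the unrelated data-semantic-id attributes are left in place.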

The :focus CSS has been removed from output/chtml.ts, since the focus is not placed on the expression nodes themselves, but on the mjx-speech nodes.

The Menu.ts file now includes menu items to control the role description to use (e.g., 'math', 'clickable math', no description, etc.), and a checkbox for whether to add the 'press H for help' message. In addition, in applySettings(), the settings are updated before setting the renderer, so that things like the scaling factor are properly set when the pre-rendering occurs. (Prettier also adjusted the line breaks in the render() method.) Finally, in the contextmenu event handler, we retain the focused node (if the explorer exists and is active) so that if the context menu is used while an expression is being walked, it will refocus the correct node when the menu exits.

In MenuHandler.ts, we add a getMenus render action that causes the menu store to be rebuilt. This was being done in the document's updateDocument() method, but if a single expression was rerendered (e.g., by an maction toggle), the menu store didn't properly include the newly rendered expression, and that caused problems with reactivating the explorer after a menu closed on an expression that includes an maction that was toggled. Because of the new getMenus action, the updateDocument() method no longer needs to be overridden.

Finally, MenuUtil.js now uses the new context.os value to determine whether we are in MacOS or not.

That completes the details about these changes.

@zorkow zorkow mentioned this pull request Apr 23, 2025
Member

@zorkow zorkow left a comment

Some typos.
The help box could be done with the help box from the menu. Not sure if we want to do this at some point or keep it independent.

I'll send you an email regarding testing.

@zorkow
Member

zorkow commented May 13, 2025

There are some issues that I have now discovered (in the update/explorer-misc branch):

  1. Highlight is left on changing rules/styles. Easy fix by adding a this.pool.unhighlight(); before the restartAfter.
  2. The removal of the lastMove and cleanup leads to summary and depth no longer toggling the speech. That is, we can no longer get back to the full speech without making a move. There are effectively two ways to solve this issue:
    1. Reinstate keeping (and clearing) the last move. But that does not really fit well with the current way keydown is handled.
    2. Have a repeat key. Previously (in 2.7), we had used the tab key for repeat, but that does not make sense since we still want to tab out of the expression. We could use r/R
  3. Collapse no longer seems to work as expected. I am still investigating this one, but from what I can see initially it could be related to attributes not being set correctly when maction elements are introduced, leading to incorrect replacement in enrich.
  4. Since this.current is always reset in setCurrent we lose the exploration state. That is, exploration always starts at the top level, not at the node that was visited last. This can be fixed by rewriting the first condition in setCurrent to:
    if (this.current) {
      this.current.classList.remove('mjx-selected');
      this.pool.unhighlight();
      if (!node) {
        this.removeSpeech();
        this.node.removeAttribute('aria-busy');
        return;
      }
    }

@zorkow
Member

zorkow commented May 13, 2025

I've partially solved point (3) by ensuring that the data-speech attribute is copied. Exploration over the collapsed element does not work yet, but I am sure it has a similar solution.

These will be my next steps:

  • PR that removes SRE related functionality no longer needed in MathJax main.
  • PR with the corrections I have mentioned above.

@dpvc in other words, you can merge this PR and we clean up in a few smaller ones.

@dpvc dpvc merged commit 4b141fd into develop May 13, 2025
@dpvc dpvc deleted the update/explorer branch May 13, 2025 18:16
@dpvc
Member Author

dpvc commented May 14, 2025

The issue with collapsing is actually due to a problem with the semanticFocus() method, as well as a timing issue with the fix that I made for Orca (that clones the container node), if you are using the test/explorer branch. I have fixes for both. There is also a problem with the menu not loading the complexity extension on startup, and I have fixed that as well. These should take care of your items 3 and 4.

I have a straightforward fix for item 2. You already indicated how to fix 1.

I have another update to make for some other issues with the explorer/menu interactions, as well as a proper implementation for the container-replacement idea from test/explorer. (That branch was only meant as a quick test of whether that idea worked for you, not as a base for further development, in case that is the branch you are using.)

I will make a PR for these changes, so you don't need to do the second PR you mentioned above.

@zorkow
Member

zorkow commented May 14, 2025 via email
