Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with mirroring workspace #8225

Closed
1 task done
QuirkyCort opened this issue Jun 20, 2024 · 8 comments · Fixed by #8539
Closed
1 task done

Issue with mirroring workspace #8225

QuirkyCort opened this issue Jun 20, 2024 · 8 comments · Fixed by #8539
Assignees
Labels
component: events issue: bug Describes why the code or behaviour is wrong

Comments

@QuirkyCort
Copy link

Check for duplicates

  • I have searched for similar issues before opening a new one.

Description

When all mirroring events from one workspace to another, I expect both workspaces to be identical. But when an "else if" is added to an "if" block, it is possible for the two workspace to differ.

The problem can be reproduced on https://google.github.io/blockly-samples/examples/mirror-demo/, and I have also verified it on my own application where I can confirmed that all events are being mirrored.

Reproduction steps

  1. Open https://google.github.io/blockly-samples/examples/mirror-demo/
  2. Add an "if" block.
  3. Add an "else" to the "if" block.
  4. Add a "print" under the "else".
  5. Add an "else if" to the "if" block.
  6. The "print" remains under the "else" in the first workspace, but is disconnected in the second workspace.

Stack trace

No response

Screenshots

No response

Browsers

No response

@QuirkyCort QuirkyCort added issue: bug Describes why the code or behaviour is wrong issue: triage Issues awaiting triage by a Blockly team member labels Jun 20, 2024
@maribethb
Copy link
Contributor

fwiw this occurs in both v11.0.0 and v11.1.1 so it's not related to the disappearing input blocks.

@cpcallen cpcallen removed the issue: triage Issues awaiting triage by a Blockly team member label Jun 21, 2024
@cpcallen cpcallen self-assigned this Jun 24, 2024
@cpcallen
Copy link
Contributor

See also #7950.

@cpcallen
Copy link
Contributor

cpcallen commented Jun 27, 2024

Root cause analysis (part I)

Initially I knew the following (true) facts:

  • When a BlockChange change is replayed in the secondary workspace, BlockChange.prototype.run handles replaying mutation-element events by calling loadExtraState on the block.
  • The implementation CONTROLS_IF_MUTATOR_MIXIN.loadExtraState calls this.updateShape_().
  • CONTROLS_IF_MUTATOR_MIXIN.updateShape_ does not saving and restoring connections, unlike its sibling .rebuildShape_.

It therefore seemed logical that the solution would be to have loadExtraState instead call this.rebuildShape_() instead.

I had actually begun making this change when I decided to do a little additional refactoring: specifically, pulling the connection-saving code out of rebuildShape_ into a separate method I had named saveChildBlocks_ (to correspond with the existing reconnectChildBlocks_) when I noticed that there is already a saveConnections method that does something similar, but slightly different: instead of returning the connections, it saves each on the corresponding clause blocks in the mutator bubble. It's not called from anywhere in blocks/logic.ts, but is called from the saveConnectionsListener in core/icons/mutator_icon.ts, with the intention of keeping the saved connections up-to-date in the face of changes to blocks connected to the CONTROLS_IF block on the main workspace while the mutator bubble is open.

The purpose of this (saving the connections on the clause blocks, and keeping them updated) is so that rearranging clause blocks in the mutator bubble will correspondingly rearrange the blocks connected to the CONTROLS_IF block.

Relevant to this issue is the fact that such rearrangements cannot be conveyed only via BlockChange events, because the mutation element conveys only:

  • the count of elseif clauses, and
  • the presence (or absence) of an else clause

and no sequence of such mutations can convey information about how to reorder connected blocks, so if such changes are to be mirrored they must be mirrored by another mechanism—and in fact they are.

Playing around with the mirror demo further (after refining its logging code) demonstrated that in fact reconfiguring the CONTROLS_IF block via its mutator normally causes the primary workspace to emit a series of BlockMove move events that—in most circumstances—successfully disconnect and then reconnect the child blocks in the secondary workspace. The reproduction steps reported in this issue are a (fairly rare) exception, where the code in test/index.js sees only a bare BlockChange event emitted, without being preceded and followed by BlockMove events as it would normally be.

In fact, after further playing around with various combinations of mutations—making a longer and longer CONTROLS_IF block by inserting additional elseif clause blocks in the mutator—it appeared that the BlockMove events are reliably emitted in every case except when the reshaping of the CONTROLS_IF happens to result in the one of the (formerly) attached blocks not moving.

This was very surprising, so I traced the execution of the reproduction case from the point that rebuildShape_ calls reconnectChildBlocks_ at step 5 of the reproduction, and observed Connection.prototype.connect_ create a block_move and then .fire() it (and then traced further to ensure that the event was actually being added to ``FIRE_QUEUE`, just for good measure).

Why is this BlockMove not being observed by the change listener set up to implement the mirroring? The most plausible hypothesis I could think of was that for some reason BlockMove events that do not actually move the block (i.e. have zero offset) are being suppressed.

Further tracing revealed an interesting situation: at the time fireNow is called following step 5 of the reproduction, the contents of FIRE_QUEUE are (edited for clarity):

FIRE_QUEUE = [
  /* 0: BlockMove */ {
    blockId: "GHBo}8?anH~0O3/k#)9E",
    newCoordinate: {x: 85.49684143066406, y: 120.89669799804688},
    newInputName: undefined,
    newParentId: undefined,
    oldCoordinate: undefined,
    oldInputName: "ELSE",
    oldParentId: "R((cl0jw+oRvjZJ9JT}l",
    reason: ['disconnect'],
  },
  /* 1: BlockMove */ {
    blockId: "GHBo}8?anH~0O3/k#)9E",
    newCoordinate: undefined,
    newInputName: "ELSE",
    newParentId: "R((cl0jw+oRvjZJ9JT}l",
    oldCoordinate: {x: 85.49684143066406, y: 120.89669799804688},
    oldInputName: undefined,
    oldParentId: undefined,
    reason: ['connect'],
  },
  /* 2: BlockChange */ {
    blockId: "R((cl0jw+oRvjZJ9JT}l",
    element: "mutation",
    name: undefined
    newValue: '{"elseIfCount":1,"hasElse":true}',
    oldValue: '{"hasElse":true}',
  }
}

Well: that's not ideal: I'd expect BlockChange to be between the BlockMove events, not after both. Even if the BlockMove events hadn't disappeared, this order would still result in an incorrect replay in the secondary workspace.

So it looks like the two BlockMove events might be being coalesced into a single event which, seeming to have no effect, is subsequently dropped for being moot. More tracing determined that the answer is yes, with a call to filter(FIRE_QUEUE, true) returning an array containing only a single event—the BlockChange.

So, It appears that the actual cause of the failure reported in this issue is that the BlockMove event(s) reconnecting the blocks are being emitted before the BlockChange event instead of after it—and curiously enough the filter method contains the following code that appears to attempt to kludge a fix for this by reordering the the events in the queue:

  // Move mutation events to the top of the queue.
  // Intentionally skip first event.
  for (let i = 1, event; (event = queue[i]); i++) {
    // AnyDuringMigration because:  Property 'element' does not exist on type
    // 'Abstract'.
    if (
      event.type === CHANGE &&
      (event as AnyDuringMigration).element === 'mutation'
    ) {
      queue.unshift(queue.splice(i, 1)[0]);
    }
  }

This code first appears in 5578458, merged in PR #298. I have not so far been able to establish whether this was actually created to fix the emit-order issue or if it just happend to do so coincidentally.

(The null-event filtering was added later, in PR #1398, attempting to fix issue #1373 that had resulted from changes made in PR #1205 to address the fact that the previous filter routine had been O(n^2) due to use of Array.prototype.splice().)

@cpcallen
Copy link
Contributor

cpcallen commented Jul 2, 2024

Root cause analysis (part II)

The ultimate root cause of this issue, and the reason the BlockChange event is being emitted after the BlockMove events is that this is how the MutatorIcon.prototype.recomposeSourceBlock method is implemented (simplified):

  private recomposeSourceBlock() {
    // ...
    this.sourceBlock.compose(this.rootBlock);
    // ...
    eventUtils.fire(/* BlockChange */,  this.sourceBlock, 'mutation', /* ... */);
    // ...
  }

The block's compose method is run, modifying the shape of the block and causing all the BlockMove events to be emitted, and then the BlockChange event is emitted.

@cpcallen
Copy link
Contributor

cpcallen commented Jul 2, 2024

Possible remedies

There are three principle categories of fixes that could be applied to this issue.

Emitting events in correct order

This would seem to be the most elegant and notionally correct approach. The tricky thing is how to arrange for blocks' compose methods to emit the BlockChange event at the correct time. There are a few possible approaches:

  1. Have compose directly create and fire a single BlockChange event.
    • The fire call would be removed from recomposeSourceBlock.
    • All mutator mixins (ours and external developers) would need to be modified, and unmodified mutator mixins would cease to emit BlockChange events at all, rendering them broken.
  2. Have the block-shape-changing calls each emit BlockChange events, and verify these will be correctly deduped by the filter function. ('post-dedupe')
    • The BlockChange-emitting fire call would be removed from recomposeSourceBlock.
    • This might entail a creating and then deduping a large number of BlockChange events (with a large number of calls to saveExtraState.
    • Would need to instrument all block methods that add/remove/modify inputs.
    • Would need to have a way to disable event emission during block creation (or perhaps to only enable it during mutation).
  3. Have the Block-shape-changing methods set a flag or place a marker in the event queue to indicate that a BlockChange event is pending. ('pre-dedupe')
    • The BlockChange-emitting fire call would be removed from recomposeSourceBlock.
    • The fire function would be modified to check for such a flag/marker and create and, if present, enqueue a single BlockChange before enqueueing the fired event.
    • Needs to be able to handle multiple different blocks changing, for full generality.
    • Would need to have a way to disable flagging/marking during block creation (or perhaps to only enable it during mutation).

In any case it should be possible to remove the existing post-filtering reorder hack from the filter function, though it would be wise to try to fully understand the original reason why it was created in the first place before doing so.

Fix reordering of events between emission and execution

This would entail updating the filter function to put the reordering hack above the main filter routine (probably breaking it out into a separate method.

  • Needs to be able handle multiple different blocks changing simultaneously, for full generality.
  • Not easy to know, outside of the simple case of [disconnect all, reconnect all, change] for a single block, how to correctly insert the BlockChange amongst multiple BlockMove events.

Introduce separate block method for reconnection

Currently compose has to do the reconnections, and only after compose has returned can recomposeSourceBlock emit the BlockChange event. We could achieve correct event ordering without substantial changes to event emitting code by separating the existing connect method into two separate well-known block methods:

  • The compose method would continue to do the disconnections and shape update (and, for legacy block definitions, also the reconnections).
  • A separate method, which might replace or wrap the existing reconnectChildBlocks_ method, would be called by recomposeSourceBlock_ after_ emitting the BlockChange event to do the reconnections.

Advantages:

  • Avoids any breaking changes to core.
  • Developers wanting to guarantee that events would be emitted in the correct order could move reconnect code from compose to this new method.
  • Developers who did not make this change would continue to experience the existing behaviour, which is correct in most cases.

Disadvantages:

  • This introduces an additional well-known block method:
    • Slight increase in API complexity and difficulty of creating mutable blocks.
    • Increases interdependence between MutatorIcon and block definitions—but perhaps in a reasonably appropriate way given the existing saveConnections / saveConnectionsListener mechanism.
  • Allows us to leave the existing dubious reordering code in the filter function.

(This option suggested by @BeksOmega.)

Fix event reordering and introduce separate reconnect method

This would take a belt-and-braces approach by doing both of the two previous options.

  • Because the reordering code would exist mainly to handle mutation of blocks that don't implement the new reconnect method., we could fix the most dubious aspects of the existing reordering code (reordering after dedupe/drop, always putting BlockChange event in second position in queue) without necessarily needing to write a fully-general implementation (one that could e.g. correctly handle overlapping mutations to multiple blocks).
  • Developers (including us, for the library blocks) can easily guarantee correct event ordering with modest additional effort.

Kludgy hacks Workarounds

There are ways to resolve this issue without resolving the underlying issue in Blockly. The simplest is probably to modify the CONTROLS_IF_MUTATOR_MIXIN's loadExtraState method to call this.rebuildShape_ instead of this.updateShape_.

  • This entails no changes to Blockly so carries most minimal danger of unexpected breakage.
  • Requires checking and updating all mutator mixins (both ours and external developers), but failing to update does not result in complete breakage of mutation events.
  • Does not in any way address the underlying problem.

@cpcallen
Copy link
Contributor

cpcallen commented Jul 3, 2024

Summary of discussion in the team meeting:

  • Workarounds don't seem too bad, until you look closely.
  • Emitting events in correct order seems like desirable approach, but:
    • Option 1 is a breaking change so to be avoided.
    • Options 2 and 3 add significant complication and risk.
  • @BeksOmega suggested an alternative approach to fixing event ordering (now added to list above): add an extra block method, to be called after BlockChange event has been emitted, to do the reconnections. Notes that, worst-case, the bug continues to exist for developers who don't implement this method.
  • I liked this proposal but noted that it does not address the questionable reordering code in filter, which we can't delete without making the bug much worse (i.e. would be a breaking change); I proposed adding reconnect block method and making a modest effort to improve the reordering code to be slightly less broken; this would probably eliminate the bug in most cases, even for developers who don't implement the reconnect method.

I propose adding the block method first, since that is the less risky approach and definitely fixes the problem (for the library blocks, at least), and then looking at fixing the reordering code, and then attempting to improve the reordering code (after ensuring it has sufficient test coverage to reduce the likelihood of regression).

@cpcallen
Copy link
Contributor

Making note of some additional discoveries that have been made since I wrote up the above:

It looks like issue #7026 and PR #7069 are related related to this, insofar as they deal with unexpected consequences of the way filtering currently works. There are some key insights there which are relevant to deciding how to proceed here:

  • Undo and redo result in events being re-filtered.
  • The results of filtering depend on what events happen to be in the FIRE_QUEUE at the time filter is called, so microtask boundaries affect the results. This means that events might get filtered differently the second time time through.
  • Filtering actually mutates events, which can have unexpected effects if references to those events are held elsewhere (e.g. in undo/redo queues).

My thoughts about this:

  • In what situations does filtering make a significant difference to the size of the event queue (not counting cases where it breaks things due to the various problems mentioned above)?
  • It might be better to avoid re-filtering events on undo/redo.

I am currently investigating what happens if I remove all of the filtering code.

@cpcallen
Copy link
Contributor

This has been adequately patched for the time being by the (new, higher quality) bandaid in #8539, but the underlying root cause remains and is tracked in #1036.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: events issue: bug Describes why the code or behaviour is wrong
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants