Architectural Refactoring #142

inancgumus · 2021-11-25T14:38:05Z · Maintainer

I've been thinking about this today:

We might need a single event loop instead of multiple event loops scattered around the system.

That way, we could manage the chain of events in a single place and fix problems faster and more robustly. I believe doing so would simplify the codebase and make us more productive over time.
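To make the idea concrete, here is a minimal sketch of a single event loop in Go. Every component posts events into one channel, and a single goroutine handles them in arrival order, so handlers never run concurrently and the chain of events can be followed in one place. The names (Loop, Event, On, Post) are hypothetical and not part of the current codebase.

```go
package main

import "fmt"

// Event is a hypothetical internal event; in real code it would carry the
// decoded CDP payload or an internal notification.
type Event struct {
	Source string // e.g. "network", "frame", "page"
	Name   string // e.g. "requestWillBeSent"
}

// Loop is a minimal single event loop: components post events into one
// channel and a single goroutine handles them sequentially.
type Loop struct {
	events   chan Event
	handlers map[string][]func(Event)
	done     chan struct{}
}

func NewLoop() *Loop {
	return &Loop{
		events:   make(chan Event, 64),
		handlers: make(map[string][]func(Event)),
		done:     make(chan struct{}),
	}
}

// On registers a handler for an event name. Call it before Run.
func (l *Loop) On(name string, h func(Event)) {
	l.handlers[name] = append(l.handlers[name], h)
}

// Post enqueues an event; it is safe to call from any goroutine
// (but not after Close).
func (l *Loop) Post(e Event) { l.events <- e }

// Close stops accepting events; Run returns after draining the queue.
func (l *Loop) Close() { close(l.events) }

// Run handles events one at a time, in the order they were posted.
func (l *Loop) Run() {
	defer close(l.done)
	for e := range l.events {
		for _, h := range l.handlers[e.Name] {
			h(e)
		}
	}
}

// Wait blocks until Run has returned.
func (l *Loop) Wait() { <-l.done }

func main() {
	loop := NewLoop()
	var handled []string
	record := func(e Event) { handled = append(handled, e.Source+":"+e.Name) }
	loop.On("requestWillBeSent", record)
	loop.On("frameNavigated", record)

	go loop.Run()
	loop.Post(Event{Source: "network", Name: "requestWillBeSent"})
	loop.Post(Event{Source: "frame", Name: "frameNavigated"})
	loop.Close()
	loop.Wait()

	fmt.Println(handled) // [network:requestWillBeSent frame:frameNavigated]
}
```

Because only the loop goroutine touches handler state, the handlers themselves need no locks, and a debugger breakpoint in Run shows every event passing through.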

My reasoning is that the current design makes debugging challenging and creates hard-to-grok synchronization issues. I know we follow Playwright's code structure, but we might want a different approach since we are using Go instead of JavaScript, and each language has its own idioms for architecting code.

You can't easily follow the chain of events:

Almost every component in the system has its own goroutine listening for events.
All of these are event listeners: network manager, page, session, connection, worker, browser...
For example, Frame manages its own lifecycle events and also handles its main frame's and child frames' events.
Every component is an executor, so when you're debugging, it feels as if CDP calls each of them almost at random.

For example, when you want to debug something:

You need to look in many places across the system.
It's hard to get a useful stack trace because execution jumps from our stack to the CDP side.
Then CDP takes over and calls our system components. That's why I'm trying to add a verbose logging system.

Here's an experimental project. It's only for experimenting with CDP and learning how it works; I'm sharing it here for reference only.

imiric · 2021-11-25T14:58:29Z

As we discussed over Slack, it would be good to abstract the CDP event loop away into a separate package, so that we can start cleaning up common. Listening for CDP events and dispatching them to the rest of the system should ideally happen from a single place.

I don't think we can move to a single event loop, as we need an internal one for navigation and other events, but extracting the CDP one should be straightforward, conceptually at least if not practically 😅

I'll let Robin assign the priority for this :) I think we should fix the currently open major issues first, then tackle this ASAP, before implementing new features, since it will help us troubleshoot other issues more easily.

robingustafsson · 2021-11-26T13:36:06Z · Maintainer

I support any effort to make the code more easily debuggable and long-term maintainable 🙂

Priority-wise, I'd put this somewhere in the second half of the prio 1 issues... but I still have a bunch of prio 1 issues to create regarding tests and docs, and I'd like us to implement some of those before embarking on bigger changes to the internals (having more integration and end-to-end test coverage should also help before making these kinds of changes).

On a related note, I've also often wished there were a way to separate out a "trace" of log messages belonging to a particular JS API call (e.g. page.goto(...)), and then to visualize the log messages produced in different goroutines (which is why I originally added the goroutine ID to log messages) as part of that trace, like a tree view of logs, color coding, or something similar. Something like what's presented in this paper would be awesome IMO: https://www.cs.ubc.ca/~bestchai/papers/tosem20-shiviz.pdf

imiric · 2022-03-22T18:32:03Z

To reopen this discussion: after working on #281, it became clear that the current way we emit and process internal events is inherently racy with CDP events. Sometimes we don't process events quickly enough (due to internal locking or unreliable timing on CI machines), or we fail to set up waiters for events in time, leading to timeouts.

The system is also very difficult to reason about, as events can be emitted from anywhere in any order, and they sometimes cause complex internal state changes, which, coupled with CDP events, makes issues difficult to troubleshoot.

So I'd like to propose removing this internal event-based system in favor of an event-driven finite-state machine. The events driving the FSM will be the CDP events we have to handle, but the internal state of frames and other entities would be managed by FSMs.

This would have at least a couple of benefits:

Easier to reason about, as every state has predefined previous and next states. For example, we know that a Page.navigate command must be followed by Network.requestWillBeSent, then optionally Network.requestWillBeSentExtraInfo, Network.responseReceived, Network.responseReceivedExtraInfo, several lifecycle events that are also in order (DOMContentLoaded, load, networkAlmostIdle, networkIdle), then Page.frameNavigated, DOM.contentUpdated, Network.loadingFinished, and so on.

Unless I'm mistaken, these events are entirely synchronous in the browser (for a given frame), so the fact that we receive them as CDP events doesn't mean our internal processing of them must be asynchronous too.
Easier to wait for a specific event in a way that avoids race conditions. Hopefully also fewer goroutines and fewer chances of leaks.

If we break it down into packages, it could look something like this:

```mermaid
graph LR
    script([script]) --> js --> state <--> cdp <--> browser([browser])
    state <--> frame 
    subgraph lib
    frame
    page
    ...
    end
```

(The relationships between state, cdp, browser and the lib subpackages should be bidirectional (<-->), but the mermaid.js version GitHub used at the time didn't support it.)

If we can avoid any circular imports, this would break up the current common package nicely, and we can think about how to structure lib properly so that it's not just an "internal common" 😄

The idea is that state changes will come from either the script or CDP, but internally it should always be clear what the current state of the system is.

js would contain all functionality exported to scripts (essentially what #271, "Separate Sobek and k6 logic from the module logic", proposes), which would trigger specific state transitions.
state would contain the FSM and all state definitions.
cdp would contain the CDP event listeners that unpack the CDP messages and trigger specific state transitions.
lib (or internal?) would contain all current logic, which ideally shouldn't interact with CDP directly but only through state; if that turns out to be unfeasible, we might want to reconsider.

We could either build the FSM from scratch or choose from several existing libraries, e.g. https://github.com/looplab/fsm.

Additionally, we might want to drop our cdp package entirely and use a higher-level package like https://github.com/mafredri/cdp. This likely wouldn't give us the flexibility we currently have, but it would avoid many of our current issues and add support for functionality we don't have yet. That's a separate discussion from this rearchitecture one, though.

@inancgumus @olegbespalov @robingustafsson WDYT?
