Support forking from mainnet (or any target network) #625

Open
janewang opened this issue Aug 7, 2024 · 19 comments
@janewang

janewang commented Aug 7, 2024

What problem does your feature solve?

To be able to replicate state and issues seen on another network, or to use that state for testing.

What would you like to see?

Be able to recreate state from mainnet. The node could be forked from the target network at a specific block, or could continuously sync to the target network.

What alternatives are there?

@leighmcculloch
Member

leighmcculloch commented Aug 16, 2024

Internal document we should deliver on the action items for:

@leighmcculloch
Member

leighmcculloch commented Aug 21, 2024

Proposed requirements:

  • Start quickstart, core catches up to a specific ledger (any ledger, not checkpoint) then disconnects from network and quorum and shifts to an unsafe quorum with just itself.
  • Maintains connection with local RPC, and local Horizon, etc. Or, RPC and local Horizon are started after the fork.
  • G account impersonation:
    • Be able to submit txs for existing mainnet G accounts without holding the signers.
    • Be able to submit soroban auths for existing mainnet G accounts without holding the signers.

Ideal requirements:

  • The forked network has a different network passphrase to the original network.
  • That the bulk of the fork functionality is built directly into stellar-core to make it possible to use stellar-core in isolation to connect to and fork a network.
  • C account impersonation:
    • Be able to submit soroban auths for existing mainnet C accounts without executing their __check_auth logic.

I think most of the work for this is adding capabilities to stellar-core, with some small work to expose those capabilities to quickstart. I don't think we could realistically implement this all in quickstart only, because there's no way to stop stellar-core at a specific ledger, and starting and stopping core while swapping out config files is likely to be brittle.

cc @anupsdf @dmkozh @janewang @tomerweller

@dmkozh
Contributor

dmkozh commented Aug 21, 2024

> G account impersonation:
> Be able to submit txs for existing mainnet G accounts without holding the signers.
> Be able to submit soroban auths for existing mainnet G accounts without holding the signers.

I'm not sure if real 'impersonation' is feasible; that seems too cumbersome and risky to maintain in Core. I think we could just disable signature verification if a certain Core config flag is set. This still seems risky, but at least is much easier to control. One can also use this mode to fund an arbitrary number of test accounts and then switch back into 'enforcement' mode (e.g. when they want to set up some sort of integration test).

> The forked network has a different network passphrase to the original network.

I don't think that's a good idea; the network id defines the contract id namespace, so if we change the passphrase, then the network would allow instantiating 2 SAC instances per asset, and that's generally not an operation mode that we'd want to support in any capacity.
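To illustrate why the passphrase matters for contract IDs, here is a minimal sketch. The network ID is the SHA-256 hash of the network passphrase. The `toy_contract_id` function is a deliberately simplified stand-in: the real derivation hashes an XDR preimage that includes the network ID, not a plain concatenation. The fork passphrase and the asset string are made up for the example.

```python
import hashlib

PUBNET_PASSPHRASE = "Public Global Stellar Network ; September 2015"
# Hypothetical passphrase for a forked network, invented for this example.
FORK_PASSPHRASE = "Forked Public Network ; example"


def network_id(passphrase: str) -> bytes:
    """Network ID: the SHA-256 hash of the network passphrase."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


def toy_contract_id(net_id: bytes, asset: str) -> str:
    """Simplified stand-in for contract ID derivation (NOT the real XDR preimage).

    It only demonstrates that the network ID is an input to the hash.
    """
    return hashlib.sha256(net_id + asset.encode("utf-8")).hexdigest()


# Placeholder asset identifier, not a real issuer.
ASSET = "USDC:GISSUER"

pubnet_sac = toy_contract_id(network_id(PUBNET_PASSPHRASE), ASSET)
fork_sac = toy_contract_id(network_id(FORK_PASSPHRASE), ASSET)

# Same asset, different passphrase: the derived IDs no longer match, so state
# copied from pubnet would still live under the old ID on the fork.
print(pubnet_sac != fork_sac)  # True
```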

> That the bulk of the fork functionality is built directly into stellar-core to make it possible to use stellar-core in isolation to connect to and fork a network.

Isn't the bulk of the functionality either already in Core or something that has to be implemented in Core (besides downstream service deps, that is)? I don't think we need to go beyond that: there needs to be some external orchestration, and I don't think it belongs in Core.

> C account impersonation:
> Be able to submit soroban auths for existing mainnet C accounts without executing their __check_auth logic.

Similarly to G-accounts, we could just switch host to recording auth. I wouldn't try to go for more granular control than that.

@MonsieurNicolas
Contributor

wrt requirements above: those seem to be solutions more than actual requirements.

Adding arbitrary overrides/hooks to core seems very brittle (as it's not "the real thing") and will make adding features slow (because now you need to coordinate DevX and core teams on future changes). I also don't see why DevX (or others) would have to write different code depending on whether they're testing against a "real core" or against some arbitrary state (be it the local filesystem for a CLI, or in the client for browser-based solutions).

For background, we actually investigated some of those things as part of stellar/stellar-core#2695 -- this was before Soroban.

Here are a few things to think about:

  • changing the network passphrase is probably not doable as it changes auth, but also all contract IDs (including SAC). So things like proxy contracts and SAC balances will break. It also breaks classic constructs like AMMs, CBs, etc.
  • auth breaking is the "canary": signature verification does not occur exclusively during auth. If people want to test interactions in the context of layer 2/bridge development, they will run into similar issues elsewhere (in some cases it's more that you need to control which role a specific address has, typically stored in some data entry or wrapped in some access token).
  • as you have multiple core nodes running, they all need to behave exactly the same.

I would actually try to flip this work on its head by exposing a much narrower set of functionality in core and let people outside iterate on functionality.

For example, we could add a special native contract (only enabled when a special flag is set) that allows creating/updating/deleting arbitrary ledger entries (in a first version we can limit this to soroban code/data, but other methods could be added in the future to make changes to classic entries, network settings or even TTL entries).

Note that we would still need to do something to allow people to use this contract, so maybe the special flag that enables that functionality would also reset the "network admin account" somehow (so that people can submit transactions with it). For example, GAAZI4TCR3TY5OJHCTJC2A4QSY6CJWJH5IAJTGKIN2ER7LBNVKOCCWN7 on the current public network is "locked" right now and does not have a lot of XLM (see its state in Lab).

With this functionality you can:

  • replace any contract code by anything -- so if I don't like a policy contained in a contract, I can just change it, so for example change the admin check code to "return true", or add new methods to popular contracts. The replacement could also just be a wrapper of sorts that performs some pre-post processing before invoking the "real" wasm.
  • replace ledger entries based on educated guesses (like if you know where a balance is stored) or by using the output of a simulation run (that bypasses all auth)

I could see the same logic built on top of this kind of functionality usable either on top of a "quickstart image" like this, or in a pure client side (browser based or cli where the "host" is not core).

@leighmcculloch
Member

👍🏻 Thanks, this is really helpful feedback.

If we went for the narrower set of functionality in core focused on supporting quickstart coordinating the forking and supporting ledger entry substitution, could we make these two changes in core?

  • Add --stop-at-ledger option to stellar run stellar-core#4427 so that we can start core, catch up to a specific ledger, then exit, change quorum cfg, restart core. Without this we can in theory do this with checkpoints only with the catchup command?
  • A new HTTP endpoint that accepts a ledger entry and overwrites that entry before the next ledger.
    • This can be used to reset the network root account.
    • This can be used to do anything that the contract @MonsieurNicolas suggested could do, but without the need for people to build valid txs, or build contract invocations that require RPC to simulate for costs and footprints.

With those two changes quickstart in fork mode would:

  • catch up core to ledger # (core stops/shuts itself down after catch up)
  • change cfg of core+rpc+horizon to local core instance only quorum
  • start core, rpc, horizon
  • call core's new http endpoint to:
    • reset thresholds of root account so the root's master key works again
    • set balance to u32::MAX for native of root account
  • start friendbot (it uses the root account to issue test accounts)

Then folks can use the fork like any test network, or they can use the new http endpoint to sub any other data.
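As a rough illustration of the ledger entry substitution step, here is a toy model. It is not core's data model and not a real endpoint: the dictionary shapes and the names `bootstrap_overrides` and `apply_overrides` are invented for the example. Only the root account ID and the "u32::MAX" balance come from this thread.

```python
U32_MAX = 2**32 - 1

# Root account of the public network, as mentioned earlier in this thread.
ROOT = "GAAZI4TCR3TY5OJHCTJC2A4QSY6CJWJH5IAJTGKIN2ER7LBNVKOCCWN7"


def bootstrap_overrides(root_account_id: str) -> dict:
    """Entries quickstart would substitute right after the fork (toy shapes)."""
    return {
        root_account_id: {
            "balance": U32_MAX,  # "set balance to u32::MAX"
            "master_weight": 1,  # make the root's master key usable again
            "thresholds": {"low": 0, "med": 0, "high": 0},
        }
    }


def apply_overrides(ledger_entries: dict, overrides: dict) -> dict:
    """Return a new ledger map where each overridden entry replaces the original."""
    merged = dict(ledger_entries)
    merged.update(overrides)
    return merged


# Hypothetical forked state: a locked root account plus one unrelated entry.
forked_state = {
    ROOT: {"balance": 0, "master_weight": 0, "thresholds": {"low": 1, "med": 1, "high": 1}},
    "some-other-entry": {"balance": 42},
}
state = apply_overrides(forked_state, bootstrap_overrides(ROOT))
print(state[ROOT]["balance"])  # 4294967295
```

The same `apply_overrides` shape would cover "sub any other data": the caller supplies whichever entries it wants replaced before the next ledger.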

Technically it wouldn't allow you to do everything you might want to do. You might want to disable auth on a contract without subbing the entire contract and subbing ledger entries wouldn't let you do that. But I think the above would get us 80% there, and then we can add other features as needed such as recording auth like what @dmkozh suggested.


> • changing the network passphrase is probably not doable as it changes auth, but also all contract IDs (include SAC)

I understand the difficulty with contract IDs. It's unfortunate that we tied the IDs to the network passphrase, because it hasn't turned out to be a benefit. Could we separate the network passphrase/id concept so that a network could change its ID for future signatures (txs, auths) while keeping its "original ID" for contract IDs and other uses?

The risk of a tx accidentally being submitted to pubnet exists. Even though txs won't naturally circulate to pubnet, there's a footgun opportunity: someone copies a test tx that they're developing with, pastes it into something like the Lab, and accidentally submits it to pubnet; or someone runs the forked setup in a public CI environment where their private key might be secret but a signed tx is leaked and someone else submits it to pubnet.

@tomerweller

> The risk of a tx accidentally being submitted to pubnet exists.

Just want to emphasize that this is a very real foot gun if we maintain the same passphrase. Developers often jump between networks and often accidentally submit a transaction in the wrong network (happens to me all the time). If we promote a flow in which local debug transactions are valid on mainnet someone will accidentally submit them on mainnet.

@janewang janewang added this to DevX Aug 27, 2024
@github-project-automation github-project-automation bot moved this to Backlog (Not Ready) in DevX Aug 27, 2024
@janewang janewang moved this from Backlog (Not Ready) to Backlog (Ready for Design) in DevX Aug 27, 2024
@MonsieurNicolas
Contributor

yeah the passphrase issue is quite annoying -- changing it "partially" would require adopting this partial switch all over SDKs etc (ie: SDKs today compute the SAC address for example, so they would need to know about this split and only use the new ID when signing payloads).

Going back to what we're trying to do here: do we really need to fork an entire network's state?
What about the original requirement of "continuously syncing to the network"?

Could this work instead be rescoped to just "import and transform" (that can be extended as much as needed with contract specific transforms)? I imagine that the list of entries to import is actually small (and simple to generate) and transforms (like computing different hashes) are also fairly simple to do.

With this paradigm:

  • "forking" a network is a matter of seconds, even on top of pubnet (that would normally require downloading GBs of data). So even "rebasing" some changes on top of the latest network state should be doable + as the overhead is very low, a fork can be run everywhere (laptop, browser, etc)
  • separate passphrase -> no risk of signing something valid on existing networks
  • no need to deal with archived state (something nobody mentioned so far)

to make this work, I think the only core/platform change needed would be to support taking as an argument a file that contains the genesis ledger + its ledger header.

@github-actions github-actions bot added the stale label Oct 30, 2024
@janewang janewang removed the stale label Oct 30, 2024
@janewang janewang assigned janewang and unassigned janewang Oct 30, 2024
@stellar stellar deleted a comment from github-actions bot Nov 25, 2024
@sagpatil sagpatil moved this from Todo (Ready for Dev) to Backlog (Ready for Design) in DevX Dec 3, 2024
@leighmcculloch
Member

There's another use case as well where changing the passphrase would restrict functionality: it's reasonable I think to fork a network, and then to also be able to apply some legit transactions from that network to the fork. While that approach isn't guaranteed to work because state may diverge, I think it's reasonable enough to support it.

We could go a step further and say: what if enabling forking doesn't actually fork immediately? The node keeps running and taking in data from the network, but a command to modify a signer, or something like that, results in it forking, and that fork is tolerated rather than halted on.

Supporting ideas like this would require the network ID to stay the same.

Ideally we also find a way for transaction hashes to stay consistent, which is trickier than the network ID problem, because transaction hashes derived from the TransactionSignaturePayload are what gets signed.

So I think what we need is to signal that a transaction should only be accepted by a development node, or a development fork of a network. That signal needs to be included in the transaction envelope outside the transaction, and must not affect, or be included in, the TransactionSignaturePayload. So it could be signatureless, i.e. a network with forking enabled can execute txns with no signature, rather than a special signature.

Transactions without signatures naturally can't be delivered to a real network. There's actually no need for any special new signal, it's just no signatures are needed.
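A minimal sketch of that acceptance rule, under heavy simplification: the `Envelope` type, the `accepts` function and the `verify` callback are hypothetical, and real signature checking involves signer weights and thresholds rather than "every signature verifies".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Envelope:
    """Toy transaction envelope: an opaque payload plus zero or more signatures."""
    tx: bytes
    signatures: List[bytes] = field(default_factory=list)


def accepts(envelope: Envelope, dev_fork: bool,
            verify: Callable[[bytes, bytes], bool]) -> bool:
    """Decide whether a node would accept the envelope (simplified)."""
    if not envelope.signatures:
        # Signature-less txs are only acceptable on a dev fork.
        return dev_fork
    # Signed txs are checked as usual on both kinds of node, so legit
    # transactions from the original network can still be applied to the fork.
    return all(verify(envelope.tx, sig) for sig in envelope.signatures)


def fake_verify(tx: bytes, sig: bytes) -> bool:
    """Stand-in verifier for the example; not real cryptography."""
    return sig == b"valid"


unsigned = Envelope(tx=b"payment")
print(accepts(unsigned, dev_fork=True, verify=fake_verify))   # True
print(accepts(unsigned, dev_fork=False, verify=fake_verify))  # False
```

The point of the design is visible in the second call: an unsigned envelope copied out of a fork has nothing in it that a real network would accept.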

Signatures are still needed for SorobanAuthorizationEntry's, but we can address "cheatcode" like capabilities there using the pattern @MonsieurNicolas suggested where there's a way to deploy a contract with a transaction that then lets you set any ledger entry and reconfigure contracts.

@leighmcculloch
Member

As a bonus, no signatures is extremely easy to integrate into SDKs. SDKs and wallets don't need special "way to sign" to work with the fork, devs can just build txns without signing them and submit them.

@leighmcculloch
Member

leighmcculloch commented Dec 16, 2024

> • "forking" a network is a matter of seconds

If we take the approach above where we replace "forking" with "allowing forking and enable some cheatcodes", then we don't actually need to focus on fast forking, we can instead focus on fast network joining which would benefit all users, not only forking devs.

If a node can join the network in a matter of seconds, then someone who wants to fork can join the network in a matter of seconds. Their node will continue to follow the network until such time as the fork occurs.

For devs who wish to test actions at a specific ledger, we'd still need some sort of 'start and fork immediately'.

So at a high level I think this looks like:

  1. Project to support ultra fast network joining (goal < 5-10 seconds with good bandwidth) (this work disconnected from forking completely)

  2. Project to support a dev mode start and fork immediately.

  3. Project to support a dev mode start and tolerate a fork when it occurs.

  4. Project to support a dev mode allow signature-less txs

  5. Project to support a dev mode admin interface for writing / deleting any ledger entry, either via a command, or via a transaction invoking an admin contract not normally available

  6. Project to support snapshotting ledger state in a (2) situation, and restoring, so that a node could be brought up at that specific ledger with zero outside world communication for preserving a test scenario for run in CI.

@MonsieurNicolas Thoughts?

@MonsieurNicolas
Contributor

I am not sure I'm totally following this thread. Can you rework it so that we have a clear list of actual problems worth getting solved (decoupled from solutions) and hard requirements?

Like: if we're actually implementing any "forking" while still producing signatures valid on the public network, it seems we still don't have closure on that topic?

The "signature less" idea is interesting -- if we go in this direction, I am not sure how it differs from simulation though (and/or why we can't just solve this whole thing as an extension on top of simulation semantics).

@leighmcculloch
Member

leighmcculloch commented Dec 20, 2024

I'll rework it.

To answer the question on signatureless: simulation doesn't provide a full running network that persists state across simulations, simulation gets fees and costings wrong, and simulation doesn't work with classic operations. But fair point, we could address those gaps. It would make simulation more useful. There is also something comforting about forking because you are running the full system for real.

@leighmcculloch
Member

leighmcculloch commented Jan 17, 2025

Main requirement:

  • Test Stellar network behaviour (txs, accounts, contracts) with pubnet state in a test environment identical* to pubnet.

* Identical meaning observable functionality to the view of a network user, not a validator operator.

Additional requirements we've discussed:

  • Great developer experience so that tooling is accessible.
  • Low resource cost to make tooling available to more devs.
  • Safe, low risk to user self-exploit during use.

@github-actions github-actions bot added the stale label Feb 16, 2025
@stellar stellar deleted a comment from github-actions bot Feb 24, 2025
@leighmcculloch
Member

leighmcculloch commented Feb 24, 2025

Some products to look at for inspiration / comparison in the Ethereum ecosystem. Broken up into two categories:


1‍⃣ Similar to what the Rust SDK provides:

  • foundry

    Tests can run against an internal running shared fork, or each individual test can run its own isolated EVM. The latter is very similar to how the Soroban Rust SDK works, where we can already run tests against an Env preloaded with a ledger snapshot.


2‍⃣ Similar to what is proposed in this issue: support for running a node that operates from forking a network. All the options are simulators. They aren't software you'd use with a production node. But they do all provide JSON-RPC APIs consistent with geth, so you can test the full stack with them.

  • hardhat node --fork ...

    Fast, if the RPC you have has high enough rate limiting. Infura free nodes don't provide enough requests to fork. Merkle does with https://eth.merkle.io.

    By default forks from a recent commit, and pauses, then mines as needed when txs are submitted. Can be configured to mine on an interval.

    Can fork any ledger if the RPC has archival data. Can change which ledger it is forked at without restarting.

    Because it's a simulation of the network, provides a lot of flexibility. In tests using the network can go backward or forwards in time, modify balances of accounts, impersonate accounts.

    Exposes an RPC compatible API, that has extra methods. Extra methods allow for direct modification of: contract code, contract storage, accounts for impersonation.

    Usable from within tests, but those tests are calling out to the running network, therefore tests are using a shared resource.


  • anvil --fork ...

    Essentially same features as hardhat node. Provides extra methods that are compatible with hardhat, and other tooling.


  • ganache-cli --fork ...

    Similar to hardhat. There are test frameworks, e.g. brownie, that can be used with it. Ganache is no longer maintained.

  • geth --fork ... doesn't exist. It was suggested and debated, but it didn't go ahead:


🤔 My takeaways for the moment from a cursory look at the above tools are:

  1. A first priority should be to improve the Soroban Rust SDK so that fork testing requires less from the user, by integrating the fetching of fork data directly into the SDK itself so that a user does not need to use the stellar-cli to prefetch data needed for the test.

    This has some challenges for us, but it looks achievable. We can do the first pass using archives, and future passes using a different data source, but realistically RPC would only work if it could return some pinned data. What we really need is random access of any ledger's state.

    Could we do this with today's RPC? I don't think so. All of the tooling in Ethereum seems to have made the same decision to pin ledgers when forking, which makes sense to me: without doing so there's no way to get data consistency.

    If RPC's data storage was changed so that random access of the 7-day history was supported, the SDK could use the RPC for this instead.

Other things that come to mind on going ahead with adding fork testing to quickstart:

  1. What we propose in this issue, running quickstart as a fork, is unlikely to create a good experience unless the node can start up really fast. We said that already, but never defined exactly how fast that should be. As a baseline, the hardhat node, when using the Merkle RPC, takes ~6 seconds to be operational and responding to requests with meaningful data. Chatting to @graydon about this for ideas.

  2. For what we propose in this issue, we don't need to provide a ton of admin functionality. Some basic foundational pieces, like funding accounts and the ability to overwrite any ledger entry, should suffice, at least as an MVP. We still need to consider what to do about making signatures valid / invalid without changing the network passphrase, tx hashes, hashpreimage hashes.

Other thoughts about making a simulation-style fork testing node, similar to hardhat:

  1. The existing simulation in RPC doesn't suffice. It's stateless, but all of the options in the Ethereum ecosystem are stateful. So I think we'd be looking at using the simulation concept, but reimplementing it into something that could be both stateful, and malleable (resettable).

@leighmcculloch
Member

leighmcculloch commented Feb 26, 2025

In the previous message a missing piece is the ability to quickly retrieve ledger state for a stable ledger. In all fork products listed above the behaviour of tooling is to return data from a fixed ledger when the fork begins. This is possible because Ethereum RPCs can serve historical state requests across some range (small or big depending on the operator).

If quickstart is to support fork testing, core* or some simulator needs a fast data source of a stable ledger.

If we are to make the SDK's fork testing a better experience, it needs a fast data source of a stable ledger. The way we have addressed the need for historical state today in the Soroban Rust SDK and stellar-cli's snapshot testing is for the stellar-cli to download a history archive, which takes considerable time and requires significant bandwidth.

We should explore the possibility of adding the ability to get recent stable ledger entries from the RPC, similar to how the Ethereum JSON-RPC implementations allow requesting a specific ledger inside the RPC's available data. An RPC doesn't need to support requesting all historical state, just recent state, so that a fork test can be set up that can repeatedly get access to ledger data for a recent fixed ledger.

This is an example of what it looks like to request historical state from an Ethereum JSON-RPC implementation:

Documentation: https://ethereum.org/en/developers/docs/apis/json-rpc/#default-block

For example, requesting the latest ledger state:

$ http https://eth.merkle.io/ \
    id=1 \
    jsonrpc=2.0 \
    method=eth_getBalance \
    'params[0]=0xaf2358e98683265cbd3a48509123d390ddf54534'

{
    "id": "1",
    "jsonrpc": "2.0",
    "result": "0x2f0e878afe21c44d"
}

For example, requesting state from a previous ledger:

$ http https://eth.merkle.io/ \
    id=1 \
    jsonrpc=2.0 \
    method=eth_getBalance \
    'params[0]=0xaf2358e98683265cbd3a48509123d390ddf54534' \
    'params[1]='$(printf '0x%x\n' 21908956)

{
    "id": "1",
    "jsonrpc": "2.0",
    "result": "0x3487b9c513159f29"
}
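The same two requests can be sketched in Python as plain payload construction, with no network calls. The helper names `get_balance_request` and `wei_from_result` are made up for the example. The method name, the "latest" block tag and the hex-encoded block number follow the Ethereum JSON-RPC documentation linked above.

```python
import json
from typing import Optional

ADDRESS = "0xaf2358e98683265cbd3a48509123d390ddf54534"


def get_balance_request(address: str, block_number: Optional[int] = None,
                        request_id: int = 1) -> dict:
    """Build an eth_getBalance payload, pinned to a block when one is given."""
    block = "latest" if block_number is None else hex(block_number)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        "params": [address, block],
    }


def wei_from_result(response: dict) -> int:
    """Decode the hex-encoded balance in a JSON-RPC response."""
    return int(response["result"], 16)


# Unpinned: the answer changes as the chain advances.
print(json.dumps(get_balance_request(ADDRESS)))

# Pinned: every call reads the same historical state.
pinned = get_balance_request(ADDRESS, block_number=21908956)
print(pinned["params"][1])  # 0x14e4ddc
```

Pinning is what gives a fork test data consistency: every read during the test carries the same block parameter, so results do not drift between requests.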

I've opened the following issue to explore the possibility that RPC, or an RPC-compatible service, could serve stable recent historical state in a fashion that could be randomly accessed:

* I mention core here, but realise core today is dependent on a full bucket list. A first goal is to support the SDK doing faster fork testing without needing to use history archives, and then follow up goals are to figure out how to serve fork testing in a running-node-like experience, whether that be modifying core, or running a simulation of core classic+soroban (minus parts like offers, liq pools, claimable balances).

@leighmcculloch
Member

leighmcculloch commented Feb 26, 2025

To add more detail, the way that SDK fork testing works today is:

sequenceDiagram
    user->>+cli: create snapshot with<br/>accountIDs and contract IDs
    cli->>+historyarchive: req full ledger
    historyarchive->>-cli: full ledger (GBs)
    cli->>cli: filter ledger down to<br/>accountIDs and contract IDs
    cli->>-user: snapshot (KBs)
    
    user->>+sdk: write test importing snapshot for fork testing
    sdk->>sdk: loads snapshot and uses it as storage starting point
    sdk->>-user: 

The way it would work if it had access to requesting historical state from RPC or some RPC-compatible service:

sequenceDiagram
    user->>+sdk: write test fork testing ledger N
    loop 
    sdk->>sdk: test running
    alt local snapshot exists
    sdk->>sdk: load ledger entry for key K from snapshot
    else local snapshot does not exist
    sdk->>+rpc: req ledger entry for key K ledger N
    rpc->>-sdk: entry for K (Bytes)
    end
    end
    opt if new entries were downloaded
    sdk->>sdk: write local snapshot for starting point
    end
    sdk->>-user: 
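The second flow is essentially a read-through cache in front of a pinned data source. A minimal sketch, assuming a hypothetical `fetch(ledger, key)` callable that stands in for the RPC request; the `ForkStorage` class and its methods are invented for the example and are not an SDK API.

```python
import json
from typing import Callable, Dict, Optional


class ForkStorage:
    """Serve ledger entries from a local snapshot, fetching misses at a fixed ledger."""

    def __init__(self, ledger: int, fetch: Callable[[int, str], bytes],
                 snapshot: Optional[Dict[str, bytes]] = None):
        self.ledger = ledger    # the pinned ledger N
        self.fetch = fetch      # hypothetical RPC call: (ledger, key) -> entry bytes
        self.snapshot = dict(snapshot or {})
        self.dirty = False      # True once new entries have been downloaded

    def get(self, key: str) -> bytes:
        if key not in self.snapshot:
            # Local snapshot miss: request the entry for key K at ledger N.
            self.snapshot[key] = self.fetch(self.ledger, key)
            self.dirty = True
        return self.snapshot[key]

    def dump(self) -> Optional[str]:
        """Serialize the snapshot only if new entries were downloaded."""
        if not self.dirty:
            return None
        return json.dumps({k: v.hex() for k, v in sorted(self.snapshot.items())})


def fake_rpc(ledger: int, key: str) -> bytes:
    """Stand-in data source for the example."""
    return f"{key}@{ledger}".encode()


storage = ForkStorage(ledger=100, fetch=fake_rpc)
storage.get("contract-data-1")  # fetched from the data source
storage.get("contract-data-1")  # served from the local snapshot
print(storage.dump() is not None)  # True
```

Writing the snapshot back after the run means a later run of the same test, for example in CI, needs no outside communication.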

@MonsieurNicolas
Contributor

I suspect that with a fairly small spike you can answer the question of "how hard would it be to pin a ledger" in RPC.

For the quickstart use case: we're really only talking about 1 pin, so should not be a problem, for a hosted RPC it may require some other mechanism.

Maybe the functionality is:

  • add the ability to have "custom pins" to captive-core, where a config setting controls how many custom pin slots can exist at any given time. The interface would let RPC associate one of the slots with the latest ledger, and keep it pinned for as long as needed.
  • core already supports read only snapshots, so this should not be a huge lift. Difference is that maybe we want caching disabled (so that the overhead is only buckets).
  • RPC would need to manage the list of pinned ledgers and how this gets propagated to clients.
    • that's probably where more work is needed as "pinning" or "overriding" would be some privileged operation (if multiple clients are supported at a time)
    • a way to get around the previous point would be to have pinning be managed by policy and not by clients, so clients would have to ask for the most recent pinned ledger. Something along the lines of "ledgers are pinned every 15 minutes" (so if you have 4 pin slots, the lifetime of a pinned ledger would be 1 hour)
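The policy-managed variant can be sketched as a fixed-size ring of pins. This is a toy model, not captive-core or RPC code: the `PinPolicy` class is invented for the example, and the 5-second ledger close time used in the simulation is an assumption.

```python
from collections import deque
from typing import Optional


class PinPolicy:
    """Pin the latest ledger at a fixed interval, evicting the oldest pin when full."""

    def __init__(self, slots: int = 4, interval_secs: int = 15 * 60):
        self.interval_secs = interval_secs
        # Each pin is (pinned_at_secs, ledger_seq); deque drops the oldest when full.
        self.pins = deque(maxlen=slots)

    def lifetime_secs(self) -> int:
        """How long a pin survives before a newer pin evicts it."""
        return self.pins.maxlen * self.interval_secs

    def on_ledger_close(self, now_secs: int, ledger_seq: int) -> None:
        if not self.pins or now_secs - self.pins[-1][0] >= self.interval_secs:
            self.pins.append((now_secs, ledger_seq))

    def most_recent_pin(self) -> Optional[int]:
        """The ledger a client would be told to fork from."""
        return self.pins[-1][1] if self.pins else None


policy = PinPolicy()
# Simulate one hour of ledgers closing every 5 seconds (assumed close time).
for now in range(0, 3600 + 1, 5):
    policy.on_ledger_close(now, ledger_seq=1000 + now // 5)

print(policy.lifetime_secs())            # 3600
print([seq for _, seq in policy.pins])   # [1180, 1360, 1540, 1720]
```

The pin taken at time 0 is evicted exactly when the fifth pin arrives at the one-hour mark, which matches the "4 slots, 15 minutes, 1 hour" arithmetic above.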

@leighmcculloch
Member

+1 I think some spiking is warranted to see if we can get an end-to-end working as we imagine and nut out what else we're missing. There are multiple approaches we could consider, such as core doing the pinning, or RPC itself doing pinning.

Once we solve the problem of how to get random access to historical state for the SDK's existing fork testing functionality, we should have a foundation to be able to return to quickstart / node-simulation and use the same there.

@leighmcculloch
Member

leighmcculloch commented Mar 1, 2025

I'm moving commentary about the improvement to SDK fork testing out of this thread from now on, and tracking SDK fork testing improvements in this issue:

The rest of this thread will continue to focus on the changes to Quickstart specifically to support fork testing within that product.
