Program Runtime v2 - ABI #32154

Closed
Tracked by #28755
Lichtso opened this issue Jun 15, 2023 · 17 comments
Labels
stale [bot only] Added to stale content; results in auto-close after a week.


@Lichtso
Contributor

Lichtso commented Jun 15, 2023

Problem

#27384 is outdated as that was designed for program runtime v1.

Proposed Solution

Removing the concept of virtual address space:

  • No more address translation (which dominates execution cost in our JIT)
  • No need for memory layouts at all (better than direct mapping with ABIv1, which is very hard to implement)
  • No serialization / deserialization of accounts
  • No separate virtual address spaces, references can be directly shared between CPI instructions
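The first bullet's cost can be made concrete with a deliberately simplified model of what an ABIv1-style VM does on every load and store: find the memory region containing the virtual address, bounds-check the access, check write permission, and rebase onto host memory. This is a sketch under assumptions (the `Region` struct, the linear region search and the example addresses are illustrative, not the actual rbpf `MemoryMapping`). With a shared host address space, this lookup would no longer sit on the hot path.

```rust
/// Simplified model of ABIv1-style address translation (illustrative only;
/// not the real rbpf `MemoryMapping`).
#[derive(Debug, Clone, Copy)]
struct Region {
    vm_start: u64,     // start of the region in the VM's virtual address space
    len: u64,          // region length in bytes
    host_start: usize, // where the region begins in host memory (an offset here)
    writable: bool,
}

#[derive(Debug, PartialEq)]
enum AccessError {
    Unmapped,
    ReadOnly,
}

/// Executed for every load and store: region lookup, bounds check,
/// permission check, then rebasing to a host address.
fn translate(regions: &[Region], vm_addr: u64, len: u64, write: bool) -> Result<usize, AccessError> {
    let end = vm_addr.checked_add(len).ok_or(AccessError::Unmapped)?;
    for r in regions {
        if vm_addr >= r.vm_start && end <= r.vm_start + r.len {
            if write && !r.writable {
                return Err(AccessError::ReadOnly);
            }
            return Ok(r.host_start + (vm_addr - r.vm_start) as usize);
        }
    }
    Err(AccessError::Unmapped)
}

/// Hypothetical layout: a read-only program region and a writable heap region.
fn example_regions() -> Vec<Region> {
    vec![
        Region { vm_start: 0x1_0000_0000, len: 16, host_start: 0, writable: false },
        Region { vm_start: 0x3_0000_0000, len: 32, host_start: 16, writable: true },
    ]
}

fn main() {
    let regions = example_regions();
    println!("{:?}", translate(&regions, 0x3_0000_0004, 4, true)); // Ok(20)
    println!("{:?}", translate(&regions, 0x1_0000_0000, 8, true)); // Err(ReadOnly)
}
```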

How:

  • Share the host address space for all programs in a transaction (similar to Native Client)
  • Replace VM nesting by dynamic dispatch using two levels of indirection
  • All methods have type signatures and can be called from other programs directly (many possible entrypoints), this replaces CPI and syscalls. In that sense programs become libraries.
  • To be able to call programs from the network's Message interface, which can only call methods with raw bytes and accounts as parameters, they should still provide an entrypoint that deserializes the instruction data and dispatches it.
  • Accounts will be typed as well and not require ser/de inside programs anymore.
  • Pin the host pointers of account allocations by reserving (without allocating) host address space for account resizing
  • Allocation and lifetime tracking
    • Stack allocation for internal types
      • Normal pointers with memory layout information: Load always possible, store only if pointer is mutable
    • Heap allocation (transaction global) for external types and persistent structures (accounts)
      • Opaque pointers: No load or store possible
      • Runtime provides table of these opaque pointers for programs to lookup their members
  • [Not MVP] How to inline dynamic arrays / vectors into struct type definitions (especially for accounts)?
  • [Optional] Use the page table dirty bit to track which parts of accounts were actually modified and report that back to the accounts DB to allow for a partial write-back to disk. This could be done either using /proc/PID/pagemap or using CPU virtualization.
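"Dynamic dispatch using two levels of indirection" could look roughly like the following sketch. The first lookup maps a program to its table of exported methods. The second maps a method index to a function pointer. Before the call, the runtime checks the caller's arguments against the declared signature. Everything here is hypothetical: real exports would carry full type signatures rather than an argument count, and the uniform `fn(&[u64]) -> u64` calling convention is a simplification.

```rust
/// Simplified calling convention for illustration: all exports take u64 args.
type ExportFn = fn(&[u64]) -> u64;

#[derive(Clone, Copy)]
struct Export {
    arity: usize, // stand-in for a full type signature
    func: ExportFn,
}

#[derive(Debug, PartialEq)]
enum CallError {
    NoSuchProgram,
    NoSuchMethod,
    SignatureMismatch,
}

/// Level 1: program index -> export table. Level 2: method index -> function.
struct ProgramTable {
    programs: Vec<Vec<Export>>,
}

impl ProgramTable {
    fn call(&self, program: usize, method: usize, args: &[u64]) -> Result<u64, CallError> {
        let exports = self.programs.get(program).ok_or(CallError::NoSuchProgram)?;
        let export = exports.get(method).ok_or(CallError::NoSuchMethod)?;
        if export.arity != args.len() {
            return Err(CallError::SignatureMismatch);
        }
        // Direct call in the same address space: no VM nesting, no serialization.
        Ok((export.func)(args))
    }
}

fn add(args: &[u64]) -> u64 {
    args[0].wrapping_add(args[1])
}

fn double(args: &[u64]) -> u64 {
    args[0].wrapping_mul(2)
}

/// Hypothetical table: program 0 exports `add`, program 1 exports `double`.
fn example_table() -> ProgramTable {
    ProgramTable {
        programs: vec![
            vec![Export { arity: 2, func: add }],
            vec![Export { arity: 1, func: double }],
        ],
    }
}

fn main() {
    let table = example_table();
    println!("{:?}", table.call(0, 0, &[2, 3])); // Ok(5)
    println!("{:?}", table.call(1, 0, &[1, 2])); // Err(SignatureMismatch)
}
```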
@dmakarov
Contributor

I don't understand the wording of the first two items in the Proposed Solution above.

For Move programs we need to be able to specify on the client side which entry function an Instruction (or Transaction) invokes. Entry functions can have an arbitrary list of formal parameters of various types. We need to support passing from the client the actual arguments for the entry function.

Move programs can refer to global data objects not held in any account. One possible implementation of global data objects in Solana ledger might be a dedicated account with the account data partitioned between the global objects, assuming the size of the account data can grow dynamically. We need to agree on a convention of indexing and addressing the objects in the dedicated account data, so that the Move compiler can generate correct code for interacting with such global objects.
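One possible indexing convention for such a dedicated account is sketched below, under assumptions of our own (nothing here is an agreed format). The account data starts with a little-endian `u32` object count, followed by one `(offset, len)` pair of `u32`s per object, followed by the object bytes. The compiler would then address a global object by its index, and a lookup is two bounds-checked reads.

```rust
/// Hypothetical layout of a "global objects" account:
/// [count: u32][count x (offset: u32, len: u32)][object bytes...]
/// All integers are little-endian; offsets are relative to the start of the data.
const HEADER_LEN: usize = 4;
const ENTRY_LEN: usize = 8;

fn read_u32(data: &[u8], at: usize) -> Option<u32> {
    let b = data.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Lay out the given objects in the format described above.
fn build_global_account(objects: &[&[u8]]) -> Vec<u8> {
    let mut data = Vec::new();
    data.extend_from_slice(&(objects.len() as u32).to_le_bytes());
    let mut offset = HEADER_LEN + objects.len() * ENTRY_LEN;
    for obj in objects {
        data.extend_from_slice(&(offset as u32).to_le_bytes());
        data.extend_from_slice(&(obj.len() as u32).to_le_bytes());
        offset += obj.len();
    }
    for obj in objects {
        data.extend_from_slice(obj);
    }
    data
}

/// Resolve global object `index` to its bytes, or None if out of range or corrupt.
fn global_object(data: &[u8], index: u32) -> Option<&[u8]> {
    if index >= read_u32(data, 0)? {
        return None;
    }
    let entry = HEADER_LEN + index as usize * ENTRY_LEN;
    let offset = read_u32(data, entry)? as usize;
    let len = read_u32(data, entry + 4)? as usize;
    data.get(offset..offset.checked_add(len)?)
}

fn main() {
    let data = build_global_account(&["abc".as_bytes(), "hello".as_bytes()]);
    println!("{:?}", global_object(&data, 1)); // Some([104, 101, 108, 108, 111])
}
```

Adding an object to this layout means growing the offset table and shifting the object bytes; a fixed-capacity table would avoid that at the cost of reserved space. Which trade-off fits is left open here.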

Move has a notion of scripts, arbitrary Move code that is executed by the VM when a transaction is being processed. Scripts are never published to the blockchain. They are executed as part of the client side transaction preparation. How do we support this?

@Lichtso
Contributor Author

Lichtso commented Aug 30, 2023

Move scripts are dynamically created temporary programs, right? I think they don't really make sense for us because we are compiling programs, not interpreting them (like the other Move implementations do). From the use case side of things, the closest thing we have is the Message. The Message is in a sense also a dynamically created temporary program, but not a BPF program; instead it is something simpler and more limited. In that sense we have to think about how to translate the notion of Move scripts to Solana Messages.

We need to support passing from the client the actual arguments for the entry function.

Yes, we know about this problem, which is why we wrote the second point in "Proposed Solution". We can't change the network layer and Message handling because that is far beyond the scope of the program runtime changes, which are bloating up too much already. Also, even if we could, I don't see a way for the clients to securely create typed Messages.

Move programs can refer to global data objects not held in any account. One possible implementation of global data objects in Solana ledger might be a dedicated account with the account data partitioned between the global objects, assuming the size of the account data can grow dynamically. We need to agree on a convention of indexing and addressing the objects in the dedicated account data, so that the Move compiler can generate correct code for interacting with such global objects.

Agreed, that is what the point "Heap allocation (transaction global)" is about, and the interfaces for managing such heap objects / allocations have to be designed.

@dmakarov
Contributor

Move scripts are dynamically created temporary programs, right? I think they don't really make sense for us because we are compiling programs, not interpreting them (like the other Move implementations do). From the use case side of things, the closest thing we have is the Message. The Message is in a sense also a dynamically created temporary program, but not a BPF program; instead it is something simpler and more limited. In that sense we have to think about how to translate the notion of Move scripts to Solana Messages.

As far as I understand in Solana environment a client creates a transaction that includes a message, and a message can include a list of instructions. The instructions in the message are executed one after another. In Move environment the message can contain arbitrary logic expressed in terms of Move syntax -- this logic is a script. I don't understand what you mean by dynamically created temporary programs in this context. The programs are no more temporary than a transaction that executes the script, they're also not created dynamically, but built into the client application as far as I understand.

@Lichtso
Contributor Author

Lichtso commented Aug 30, 2023

In Move environment the message can contain arbitrary logic expressed in terms of Move syntax -- this logic is a script.

Exactly, the difference is we do not allow BPF in Messages, only this limited notion of "Instructions".

The programs are no more temporary than a transaction that executes the script

Yes, that is what I meant by temporary: They only live as long as the Message (and the Transaction it results in), they are not persistent in the sense that they are not stored in the accounts DB like other programs are.

they're also not created dynamically, but built into the client application as far as I understand

They are dynamically created from the perspective of the program runtime, as it cannot predict what the clients will submit.

@dmakarov
Contributor

In Move environment the message can contain arbitrary logic expressed in terms of Move syntax -- this logic is a script.

Exactly, the difference is we do not allow BPF in Messages, only this limited notion of "Instructions".

The programs are no more temporary than a transaction that executes the script

Yes, that is what I meant by temporary: They only live as long as the Message (and the Transaction it results in), they are not persistent in the sense that they are not stored in the accounts DB like other programs are.

they're also not created dynamically, but built into the client application as far as I understand

They are dynamically created from the perspective of the program runtime, as it cannot predict what the clients will submit.

It's not at all important what you call them -- temporary, dynamically created, or just scripts. They're a big part of the Move developer experience, and are important for client application development. It would be good to check whether any non-trivial Move-based application even exists without using such a script. Just by calling them temporary dynamically created programs you can't diminish their importance for supporting Move. We don't allow many other things that Move needs. This is not a question of what we currently allow or don't, but of how to make the Solana environment support Move properly. And supporting scripts is one such problem.

@Lichtso
Contributor Author

Lichtso commented Aug 30, 2023

diminish their importance

That is certainly not my intention, I just gave this description to verify if my understanding is correct.

This is not a question of what we currently allow or don't, but of how to make the Solana environment support Move properly. And supporting scripts is one such problem.

This has nothing to do with disallowing, but simply the fact that neither our protocol nor our architecture has such a concept and (as I said) this reaches far beyond the scope of the program runtime. In other words, we have to make do with what we have (Messages, Transactions, Instructions) and it won't be exactly like Move scripts. Because again, there is no concept for arbitrary logic outside of programs which are deployed ahead of time into persistent accounts. So instead we should think about how to pass references / values in between Instructions to emulate scripts (minus the Turing completeness part).

@dmakarov
Contributor

diminish their importance

That is certainly not my intention, I just gave this description to verify if my understanding is correct.

This is not a question of what we currently allow or don't, but of how to make the Solana environment support Move properly. And supporting scripts is one such problem.

This has nothing to do with disallowing, but simply the fact that neither our protocol nor our architecture has such a concept and (as I said) this reaches far beyond the scope of the program runtime. In other words, we have to make do with what we have (Messages, Transactions, Instructions) and it won't be exactly like Move scripts.

With the existing architecture it is clear that there is no such concept. No need to repeat the obvious. The issue is not to establish the status quo but to find a solution for a new problem.

Because again, there is no concept for arbitrary logic outside of programs which are deployed ahead of time into persistent accounts. So instead we should think about how to pass references / values in between Instructions to emulate scripts (minus the Turing completeness part).

I don't think this is a valid alternative. Other networks support scripts. We might have to extend our architecture, API and runtime to support Move scripts too.

@lheeger-jump

Still reading, but I would like to raise a few points:

  • Firedancer is opposed to adding any features to the runtime which make specification of efficient C programs overly tedious (it is already tedious). If I can't write assembly for SBPF because Move support makes that too challenging, that's a non-starter.
  • Move increases the runtime complexity and adds features only it needs.
  • Move does not have any serious usage at scale and its verifiers are untested at the scale that Solana needs to attain. The verifiers are buggy and that has been extremely costly for chains and pushes users

It is my opinion that Move support is a distraction when the Solana runtime and developer tooling has fundamental and historical ills which must be quelled before we can do fun things that make it do different things.

@lheeger-jump

I don't think this is a valid alternative. Other networks support scripts. We might have to extend our architecture, API and runtime to support Move scripts too.

This will drastically slow down the network. Let's stick to compiled programs. What other widely used blockchain uses scripts?

@dmakarov
Contributor

dmakarov commented Sep 1, 2023

I don't think this is a valid alternative. Other networks support scripts. We might have to extend our architecture, API and runtime to support Move scripts too.

This will drastically slow down the network.

Is this claim based on some profiling data? I'd like to see the data.

@lheeger-jump

Problem

#27384 is outdated as that was designed for program runtime v1.

Proposed Solution

  • All methods have type signatures and can be called from other programs directly (many possible entrypoints), this replaces CPI and syscalls. In that sense programs become libraries.

Disagree. Consider: allow for exported type signatures & symbols in library accounts and allow for one entrypoint in executable accounts like in any OS.

  • To be able to call programs from the network's Message interface, which can only call methods with raw bytes and accounts as parameters, they should still provide an entrypoint that deserializes the instruction data and dispatches it.

Agreed, I think. Can you elaborate? Do you have something like a method selector, as in the EVM? Basically a weird 4-byte switch statement? If so, I like that.
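A rough sketch of what such an entrypoint could look like follows. It assumes a 4-byte selector prefix followed by little-endian arguments; the selector values, the `transfer`/`close` methods and the error type are all hypothetical. The entrypoint is the only place that parses raw instruction bytes, while the typed methods behind it could also be called directly by other programs.

```rust
/// Hypothetical selectors; an EVM-style design would derive them from a hash
/// of the method signature instead of hard-coding them.
const SEL_TRANSFER: [u8; 4] = [0x01, 0x00, 0x00, 0x00];
const SEL_CLOSE: [u8; 4] = [0x02, 0x00, 0x00, 0x00];

#[derive(Debug, PartialEq)]
enum DispatchError {
    TooShort,
    UnknownSelector([u8; 4]),
    BadArguments,
}

/// Typed methods (hypothetical), callable directly without byte parsing.
fn transfer(amount: u64) -> String {
    format!("transfer {}", amount)
}

fn close() -> String {
    "close".to_string()
}

/// Entrypoint for the Message interface: raw bytes in, typed call out.
fn process_instruction(data: &[u8]) -> Result<String, DispatchError> {
    if data.len() < 4 {
        return Err(DispatchError::TooShort);
    }
    let (sel, args) = data.split_at(4);
    let selector = [sel[0], sel[1], sel[2], sel[3]];
    match selector {
        SEL_TRANSFER => {
            let mut amount = [0u8; 8];
            if args.len() != amount.len() {
                return Err(DispatchError::BadArguments);
            }
            amount.copy_from_slice(args);
            Ok(transfer(u64::from_le_bytes(amount)))
        }
        SEL_CLOSE if args.is_empty() => Ok(close()),
        SEL_CLOSE => Err(DispatchError::BadArguments),
        other => Err(DispatchError::UnknownSelector(other)),
    }
}

fn main() {
    let mut ix = SEL_TRANSFER.to_vec();
    ix.extend_from_slice(&42u64.to_le_bytes());
    println!("{:?}", process_instruction(&ix)); // Ok("transfer 42")
}
```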

  • Accounts will be typed as well and not require ser/de inside programs anymore.

Let's consider doing away with the Rust stdlib. It would let us make location-independent data structures for devs. No runtime changes needed! (a good thing!)

  • Unify the currently separate virtual address spaces and remove address translation at runtime

This is still a very hard problem.

More to come.

@lheeger-jump

I don't think this is a valid alternative. Other networks support scripts. We might have to extend our architecture, API and runtime to support Move scripts too.

This will drastically slow down the network.

Is this claim based on some profiling data? I'd like to see the data.

Interpreting and verifying code at runtime for each transaction will be slower than the current approach. I'm happy to look at an MVP that proves me wrong if you have one.

@ripatel-fd
Contributor

ripatel-fd commented Sep 1, 2023

Apologies for the wall of text that follows. This is a complex topic, and I've been accumulating thoughts over the weeks.

To begin with, I strongly think we should split the PRv2 roadmap into a few main items (those can still be shipped as a combined upgrade). Here are some of the main topics I've identified:

  • Instruction set improvements
  • Runtime-provided ABI compatibility checks for cross-program function calls
  • The Verifier: Moving just-in-time safety checks to ahead-of-time (which depends on the above two items)

TL;DR I don't like additional complexity, but I understand why it might be required

Firedancer is opposed to adding any features to the runtime which make specification of efficient C programs overly tedious (it is already tedious).

@lheeger-jump Most of the proposed changes are additional type safety features that are effectively opt-in. The three main mandatory restrictions are as follows:

  1. Limited data flow: Pointer data may not be exposed outside of the VM (including memory, logs, etc.)
  2. Limited indirect data flow: Pointer data may not be passed to indirect calls (debatable - there might be a way to remove this restriction)
  3. Limited indirect control flow: Indirect calls may only jump to a limited set of program counters
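Restriction 3 can be illustrated with a small sketch (our own, not the actual verifier). The set of permitted indirect-call targets is fixed at deployment, for example from the program's exported function entry points, and validated against the size of the text section. Each indirect call is then a membership test on a sorted list. `CallTargets` and its methods are hypothetical names.

```rust
/// Hypothetical set of program counters that indirect calls may jump to.
struct CallTargets {
    sorted: Vec<u64>,
}

impl CallTargets {
    /// Built once at deployment; rejects targets outside the text section.
    fn from_entry_points(mut pcs: Vec<u64>, text_len: u64) -> Result<Self, u64> {
        pcs.sort_unstable();
        pcs.dedup();
        if let Some(&bad) = pcs.iter().find(|&&pc| pc >= text_len) {
            return Err(bad);
        }
        Ok(CallTargets { sorted: pcs })
    }

    /// Checked on each indirect call: O(log n) membership test.
    fn is_allowed(&self, target: u64) -> bool {
        self.sorted.binary_search(&target).is_ok()
    }
}

fn main() {
    let targets = CallTargets::from_entry_points(vec![40, 0, 12, 12], 100).unwrap();
    println!("{} {}", targets.is_allowed(12), targets.is_allowed(13)); // true false
}
```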

It seems like the strict pointer limitations can be avoided via relative addressing.
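As a sketch of the relative-addressing idea (our illustration, not a proposed format), consider a linked list stored in a byte buffer whose links are offsets from the start of the buffer rather than host pointers. No host address ever appears in the data, and the structure stays valid when the buffer is moved or copied to a different host address. `push_front`, `values` and the 12-byte node layout are hypothetical.

```rust
/// Sentinel for "no next node".
const NIL: u32 = u32::MAX;
/// Node layout: 8-byte little-endian value, then a 4-byte relative link.
const NODE_LEN: usize = 12;

/// Append a node whose link points at the current head; return its offset.
fn push_front(buf: &mut Vec<u8>, head: u32, value: u64) -> u32 {
    let offset = buf.len() as u32;
    buf.extend_from_slice(&value.to_le_bytes());
    buf.extend_from_slice(&head.to_le_bytes());
    offset
}

/// Walk the list from `head`; None if a link is out of bounds or the walk
/// takes more steps than the buffer can hold nodes (corrupt or cyclic data).
fn values(buf: &[u8], mut head: u32) -> Option<Vec<u64>> {
    let mut out = Vec::new();
    while head != NIL {
        if out.len() > buf.len() / NODE_LEN {
            return None;
        }
        let at = head as usize;
        let node = buf.get(at..at.checked_add(NODE_LEN)?)?;
        let mut value = [0u8; 8];
        value.copy_from_slice(&node[..8]);
        let mut next = [0u8; 4];
        next.copy_from_slice(&node[8..]);
        out.push(u64::from_le_bytes(value));
        head = u32::from_le_bytes(next);
    }
    Some(out)
}

fn main() {
    let mut buf = Vec::new();
    let mut head = NIL;
    for v in 1..=3u64 {
        head = push_front(&mut buf, head, v);
    }
    let moved = buf.clone(); // different host address, same contents
    println!("{:?}", values(&moved, head)); // Some([3, 2, 1])
}
```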

There is a valid argument regarding the compiler tooling though: It should remain simple to generate type information to disable type system checks (which would effectively annotate everything as &[u8]). Ideally support for lower-level language frontends (such as no_std unsafe Rust or C) would continue to exist without the need for explicit type annotations. (And IIRC @Lichtso repeatedly mentioned that this is possible for all but the public interface)

If I can't write assembly for SBPF because Move support makes that too challenging, that's a non-starter.

Regarding Move:

I agree. To clarify, development and publishing of on-chain programs is currently out of scope for the Firedancer project, but we are considering it. The runtime should support operation of programs with near-native efficiency, such that we can eventually migrate native programs from Rust to sBPF.

The porting of native programs to sBPF would be highly beneficial to protocol simplicity and security (as the vast majority of runtime complexity resides in native programs). Thus, the PRv2 roadmap should not impede our ability to port such programs through overly restrictive data flow.

But as far as I understand it, Move support would require a more permissive runtime, rather than restrictive.

Because again, there is no concept for arbitrary logic outside of programs which are deployed ahead of time into persistent accounts. So instead we should think about how to pass references / values in between Instructions to emulate scripts (minus the Turing completeness part).

I don't think this is a valid alternative. Other networks support scripts. We might have to extend our architecture, API and runtime to support Move scripts too.

@dmakarov @Lichtso Script support exceeds the scope of the original program runtime v2 project, which already features an impressive scope.

If we design PRv2 well, then we should be able to introduce support for scripting in a future upgrade.

In the interest of keeping practical timelines for PRv2, I would suggest starting with a restrictive verifier that does not allow 'dynamic' interpretation (as in, without prior program deployment).

This will drastically slow down the network.

Is this claim based on some profiling data? I'd like to see the data.

@dmakarov As discussed above, scripts require access to program type information to safely execute.

I can imagine this being implemented in two ways:

  1. Compile-time: As with regular program deployments, run the bytecode verifier over the execution script, then interpret.
  2. Dynamic interpretation: Extend the interpreter to perform just-in-time type system checks at runtime. (e.g. reintroduce memory protection checks on each memory access)

Option 2 is obviously slower than execution of precompiled programs but also seems excessively difficult: It is already quite difficult to maintain compatibility between the sBPFv1 interpreter and JIT compiler. As bytecode/type system verification is a more complex problem than bytecode execution, it does not seem practical to support a mix of ahead-of-time and just-in-time verification.

Option 1 is demonstrably slow, and sBPFv2 acknowledges this by installing a deployment cooldown to limit the amount of verification operations.

Thus, I agree with @lheeger-jump that the added overhead of Move scripts to transaction execution is likely to slow down transaction processing rate (with sufficient community adoption). Personally, I think runtime restrictions should encourage Solana on-chain developers to write simple and efficient programs (in the realm of max ~100 sBPF instructions for a simple vote or token transfer, and ~thousands for an exchange order). IMHO, the Move programming model can be summarized as trading worse complexity for greater flexibility (verifier, scripts, etc). Perhaps, a 'Solana Move' can find a middle ground between both ideologies.

@dmakarov I would be interested in examples of Move scripts solving Solana on-chain issues that could not have been solved with precompiled programs.

@ilya-bobyr
Contributor

ilya-bobyr commented Sep 1, 2023

Option 1 is demonstrably slow, and sBPFv2 acknowledges this by installing a deployment cooldown to limit the amount of verification operations.

Thus, I agree with @lheeger-jump that the added overhead of Move scripts to transaction execution is likely to slow down transaction processing rate (with sufficient community adoption). Personally, I think runtime restrictions should encourage Solana on-chain developers to write simple and efficient programs (in the realm of max ~100 sBPF instructions for a simple vote or token transfer, and ~thousands for an exchange order). IMHO, the Move programming model can be summarized as trading worse complexity for greater flexibility (verifier, scripts, etc). Perhaps, a 'Solana Move' can find a middle ground between both ideologies.

I may not have all the context, but why is this compromise necessary in the first place?
All operations that are requested by the users of the blockchain are supposed to be paid by those users.
So, if there is additional overhead in running a script that was never deployed, it should be reflected in a higher per instruction cost and/or per transaction cost.
And it does not need to have any impact on the execution of any transactions that do not use this feature.

I can absolutely see how it may add complexity. And it also seems like a rather orthogonal addition, that can happen as part of the Move support effort.
If there is a mechanism to verify and run a program, it can be invoked either asynchronously for deployed programs, or synchronously, as part of a program execution.

Also, if there is a constraint on the number of verifications that can be performed in a given period of time, not reflected in the user's fees, it opens up an attack opportunity.
A starvation attack, where someone deploys the cheapest programs possible, depriving everyone else of the ability to deploy their programs in a timely manner.

@lheeger-jump

lheeger-jump commented Sep 5, 2023

All operations that are requested by the users of the blockchain are supposed to be paid by those users.
So, if there is additional overhead in running a script that was never deployed, it should be reflected in a higher per instruction cost and/or per transaction cost.

Just because something could exist does not imply that it should. We need very good reasoning for increasing the validator complexity and slowing down the runtime. Also, it's unclear to me that users actually want Move (how many users are on Move blockchains again?)

So, if there is additional overhead in running a script that was never deployed, it should be reflected in a higher per instruction cost and/or per transaction cost.

I do not think users will pay 10-100x for script executions compared to a normal program.

And it does not need to have any impact on the execution of any transactions that do not use this feature.

It would be highly undesirable to maintain both features. This would mean maintaining non-Move logic, Move logic and then the interop between them. Let's focus on picking one. And it does impact the execution of other transactions not using this feature. There are costs at execution time to increasing the paths a transaction can execute under, and they are not necessarily small. It may also affect transaction scheduling (e.g. because these new Move transactions will take so long to execute, we will have to schedule them differently to complete within the deadline). There is also a huge cost associated with maintaining both with bug-free code.

Again, I see Move as a distraction, especially when there are myriad other issues with Solana with solutions which seek to simplify the validator clients.

@lheeger-jump

Perhaps, a 'Solana Move' can find a middle ground between both ideologies.

I think the BTF work is towards this and has most of the same goals that Move would provide.

@ripatel-fd
Contributor

Perhaps, a 'Solana Move' can find a middle ground between both ideologies.

I think the BTF work is towards this and has most of the same goals that Move would provide.

Indeed. All of the above is referring to extensions to the program loader, executable format, and BTF.

@github-actions bot added the stale label Sep 5, 2024
@github-actions bot closed this as not planned Sep 13, 2024

5 participants