Define detailed costs for accounts #29582

Closed
4 of 14 tasks
tao-stones opened this issue Jan 9, 2023 · 14 comments
Labels
community (Community contribution), stale ([bot only] Added to stale content; results in auto-close after a week)

Comments

@tao-stones
Contributor

tao-stones commented Jan 9, 2023

Problem

The cost model currently only coarsely sets a WRITE_LOCK_UNITS cost for accounts. It should probably be split into multiple features to work toward a more accurate cost model.

Details

Account costs are part of the transaction cost, which is used to help produce parallelizable blocks and also serves as the basis for the transaction base fee (via the fee structure). The costs should be derivable from the compiled transaction without loading accounts or executing instructions.

Costs related to accounts can be categorized into two groups:

  1. costs of accessing accounts, which should be further broken into two categories based on their distinguishing characteristics:
    1.1. The actual computational cost of loading/writing accounts, i.e. the CPU cost of loading an account to make it available to programs and of writing it back to permanent storage when necessary. This metric is an aggregated cluster average, so details such as loading from cold storage vs. loading from cache are not distinguished at the moment. It is statically defined in block_cost_limits.rs and should be updated periodically to reflect cluster characteristics.

    • writable account cost consists of loading and writing the account;
    • readable account cost consists of just loading the account;
    • executable account cost consists of only loading the account; note all loaders have their own cost assigned;
    • address table lookup account cost consists of only loading the account; note the native address lookup program has its own cost assigned.

    1.2 The exclusiveness of read/write locks on accounts; e.g., when an account is write-locked by the TXs in a batch, no other batch can read or write it concurrently. MAX_WRITABLE_ACCOUNT_UNITS is used to limit such linear work in a block.

    For a TX with xyz CUs that writes to account A and reads account B, including it in a block currently contributes xyz CUs toward account A's limit until MAX_WRITABLE_ACCOUNT_UNITS is exceeded; it does not contribute anything to account B. As Read CU's also need to count towards account limits #25594 states, read locks also contribute to this linearity, hence they should be counted toward the account limit as well (a rough sketch of deriving and tracking these access costs follows this list).

  2. costs of storage/space;

    Assigning CU to storage isn't straightforward. IMO, we should avoid doing so. Instead, we could add bytes to the cost limits and define lamports/Kilobyte in the base fee calculation to charge for space as a first step, ultimately getting to a storage market (See proposal). A sketch of the lamports/Kilobyte idea also follows this list.
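
To make 1.1 and 1.2 concrete, here is a minimal sketch assuming illustrative constant values and a simplified tracker shape (the real constants live in block_cost_limits.rs and the real cost model is more involved). It derives a static access cost from the account counts in the compiled message, and charges a transaction's CUs against every locked account, read locks included:

```rust
use std::collections::HashMap;

// Illustrative placeholder values only; the real values are cluster averages
// defined in block_cost_limits.rs and should be updated periodically.
const WRITE_LOCK_UNITS: u64 = 300;                  // load + write back (placeholder)
const READ_LOCK_UNITS: u64 = 150;                   // load only (hypothetical split)
const MAX_WRITABLE_ACCOUNT_UNITS: u64 = 12_000_000; // per-account block limit (placeholder)

/// Static access cost derivable from the compiled message alone:
/// no account loading or instruction execution required.
fn account_access_cost(num_writable: u64, num_readonly: u64) -> u64 {
    num_writable * WRITE_LOCK_UNITS + num_readonly * READ_LOCK_UNITS
}

/// Hypothetical per-block tracker that charges a transaction's CUs against every
/// account it locks, counting write locks and (per #25594) read locks as well.
#[derive(Default)]
struct AccountCostTracker {
    cost_by_account: HashMap<[u8; 32], u64>, // pubkey bytes -> accumulated CUs
}

impl AccountCostTracker {
    /// Returns Err(saturated_key) if adding `tx_cus` would push any locked
    /// account past the per-account limit; otherwise records the cost.
    fn try_add(&mut self, tx_cus: u64, locked_accounts: &[[u8; 32]]) -> Result<(), [u8; 32]> {
        for key in locked_accounts {
            let current = self.cost_by_account.get(key).copied().unwrap_or(0);
            if current.saturating_add(tx_cus) > MAX_WRITABLE_ACCOUNT_UNITS {
                return Err(*key);
            }
        }
        for key in locked_accounts {
            *self.cost_by_account.entry(*key).or_insert(0) += tx_cus;
        }
        Ok(())
    }
}
```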
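For item 2, a sketch of the bytes-based alternative: charging for loaded account space through a hypothetical lamports_per_kb term in the base-fee calculation (the field name and the rounding scheme are assumptions, not existing fee-structure fields):

```rust
/// Hypothetical extension of the fee structure: charge for loaded account
/// bytes directly, rather than converting space into CUs.
struct FeeStructure {
    lamports_per_signature: u64,
    lamports_per_kb: u64, // proposed new field (assumption)
}

/// Base fee = signature fees + space fees, rounding space up to whole kilobytes.
fn base_fee(fee: &FeeStructure, num_signatures: u64, loaded_account_bytes: u64) -> u64 {
    let kb = (loaded_account_bytes + 1023) / 1024;
    num_signatures * fee.lamports_per_signature + kb * fee.lamports_per_kb
}
```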

Proposed Solution

Note:

Most of these changes need to be feature gated after #29595 is activated.

Further Tasks

  • Should probably treat new account allocation differently since it usually costs more
  • Storage market
  • Add lamports_per_kb in fee structure
  • Activate fee structure
  • An RPC call to simplify estimating CUs for a transaction, or perhaps via simulation.
  • To help estimate CUs for complex dapps, provide better introspection into the vm in solana-test-validator, ideally per instruction, so limits can be effectively determined experimentally
@tao-stones tao-stones added the community Community contribution label Jan 9, 2023
@tao-stones tao-stones changed the title Define costs for accounts Define detailed costs for accounts Jan 9, 2023
@tao-stones
Contributor Author

link @carllin's #25594

@tao-stones
Contributor Author

tag @mschneider @apfitzge

@mschneider
Contributor

So initially I only proposed features that can be derived from the compiled transaction. I wonder though, if we can't include more dynamic information to become more precise:

  1. account allocated data length
  2. account accessed data length (in units of aligned 64 bytes, if you read 64MB that should cost 1M times more than reading 1 byte)
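
To illustrate the second point, a rough sketch of pricing accessed data in aligned 64-byte units (the per-unit cost constant is a made-up placeholder):

```rust
/// Hypothetical cost per aligned 64-byte unit of account data accessed.
const COST_PER_64B_UNIT: u64 = 1;

/// Rounds accessed bytes up to 64-byte units, so reading 64MB costs
/// about 1M times more than reading a single byte.
fn accessed_data_cost(accessed_bytes: u64) -> u64 {
    let units = (accessed_bytes + 63) / 64;
    units * COST_PER_64B_UNIT
}

// accessed_data_cost(1) == 1; accessed_data_cost(64 * 1024 * 1024) == 1_048_576
```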

@sakridge
Member

An idea that came up before was making the write locks cost proportional to the potential CU used in a transaction. To be calculated up-front, it would have to be based on the predicted CU of the transaction. This accounts for the transaction holding onto the exclusive write-lock proportional to roughly how long it takes to execute the transaction.

I like the list. As @mschneider pointed out, the total number of read/written bytes would be a good idea; the downside is that you have to actually load the accounts. The transaction could request a limit for these in buckets, similar to how we request a larger amount of instruction CUs.
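
A sketch of that bucketed-request idea, assuming a few illustrative byte buckets in the spirit of how compute-unit limits are requested:

```rust
/// Hypothetical buckets for requested read/written account bytes.
const BYTE_BUCKETS: [u64; 5] = [
    32 * 1024,        // 32 KB
    256 * 1024,       // 256 KB
    1024 * 1024,      // 1 MB
    16 * 1024 * 1024, // 16 MB
    64 * 1024 * 1024, // 64 MB
];

/// Rounds a requested limit up to the nearest bucket; None if it exceeds the largest bucket.
fn bucketed_limit(requested_bytes: u64) -> Option<u64> {
    BYTE_BUCKETS.iter().copied().find(|&b| b >= requested_bytes)
}
```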

@mschneider
Contributor

For highly dynamic programs like CLOBs, it can be very hard to estimate all these limits upfront, at least if you want to be more precise than just assuming that all of the account is rewritten. There's a lot of downside to choosing the size too low. So the more individual dimensions we add, the more cases of "let's multiply by 2 to be safe" will happen.

Basically we had this issue with CU and didn't really solve it; it's also incredibly time-consuming to communicate CU limits across teams developing dapps that CPI into another dapp, as their developers often have a different set of assumptions about how CU behaves.

A good example for this is: mango offering cross-margined limit orders on openbook

  1. depending on the configuration of the mango account (which oracles are used), the risk-engine check uses variable CU. We usually used a maximum value based on synthetic test cases for this part.
  2. depending on the size of the order and the number of orders at the top of the book, openbook needs variable CU for matching. Again, synthetic tests have shown that 200k CU can usually fit 20 matched orders, but that's not always true. Luckily the orderbook is often very dynamic, and so it might just cause an intermittent, hard-to-reproduce failure.

It's definitely possible, but inconvenient. Would be great if we had some way to reduce the burden on the dapp developer here. A lot of them don't even know how CUs are calculated and some even don't know that there are limits. Imagine writing your first solana program and you need to manually specify the CU and read/written memory areas when testing your first instruction.

We might be able to alleviate this issue by providing these two band aids:
For simple dapps: a RPC call like eth_estimateGas
For complex dapps: provide better introspection into the vm in solana-test-validator, ideally per instruction, so limits can be effectively determined experimentally

@tao-stones
Contributor Author

The burden of estimating and requesting is on the dev. This applies to CU and account data size (if implemented). For the storage market, I'm thinking we can do something similar to the local fee market by adding an RPC call to query accounts' (recent) sizes. Added your suggestions to Further Tasks in the description.

@tao-stones
Contributor Author

tao-stones commented Feb 15, 2023

tag @brooksprumo @jeffwashington @apfitzge Issue description updated

@tao-stones
Contributor Author

  1. costs of storage/space;
    Assigning CU to storage isn't straightforward.

Perhaps we can still assign CU to storage by following how heap_cost is set by the compute budget:

    /// Number of compute units per additional 32k heap above the default (~.5
    /// us per 32k at 15 units/us rounded up)
    pub heap_cost: u64,

It's 8 CUs per 32KB; for a tx that loads 64MB of accounts, its storage cost = (8 CU / 32KB) * 64MB ≈ 16,000 CUs. Objections?
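
Spelling that arithmetic out as code, assuming the same 8-CU-per-32KB rate is simply applied to loaded account bytes, rounded up per chunk as the heap_cost comment describes:

```rust
const HEAP_COST: u64 = 8;           // CUs per additional 32KB chunk, per the compute budget comment
const CHUNK_BYTES: u64 = 32 * 1024; // 32KB

/// Converts loaded account bytes into CUs at the heap_cost rate (rounded up).
fn storage_cost_cus(loaded_account_bytes: u64) -> u64 {
    let chunks = (loaded_account_bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
    chunks * HEAP_COST
}

// storage_cost_cus(64 * 1024 * 1024) == 16_384, i.e. roughly the 16,000 CUs quoted above
```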

@tao-stones
Contributor Author

The transaction could request a limit for these in buckets similar to how we do for instruction CU to request a larger amount.

#30366 to add compute budget ix to request limit

@mschneider
Contributor

mschneider commented Feb 17, 2023

If you need to squash everything into CU, converting at 60GB/s (the result of using 8CU) is a good choice.

But I really don't like that we now have a magic constant to convert from account size to CU, as this will slowly lead towards the current eth gas fee design.

For the long term I would prefer an approach where users only pay for priority, but the scheduler packs blocks according to a rough baseline machine spec, defined as limits (in multiple dimensions) for clock cycles (sigverify & vm execution), memory bandwidth (accounts, stack & heap), storage cost (accounts & ledger) and network bandwidth (ledger).

All those limits should be requested by the user directly upfront, similar to a pod spec (a rough sketch follows this list):

  • 30MB accounts loaded
  • 10MB accounts written
  • 10MB of additional heap
  • 200 bogomip of compute
  • 500 bytes of ledger space
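
As a sketch, with made-up field names and units, such an upfront request could look like:

```rust
/// Hypothetical per-transaction resource request, declared upfront by the user
/// and packed by the scheduler against per-dimension block limits.
struct ResourceRequest {
    accounts_loaded_bytes: u64,  // e.g. 30 MB of accounts loaded
    accounts_written_bytes: u64, // e.g. 10 MB of accounts written
    additional_heap_bytes: u64,  // e.g. 10 MB of additional heap
    compute_bogomips: u64,       // e.g. 200 bogomips of sigverify & vm execution
    ledger_bytes: u64,           // e.g. 500 bytes of ledger space
}
```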

@tao-stones
Contributor Author

If you need to squash everything into CU, converting at 60GB/s (the result of using 8CU) is a good choice.

Converting space taken into CU is not the first choice; it doesn't fit naturally, as in the issue description: "Assigning CU to storage isn't straightforward. IMO, we should avoid doing so. Instead, could add bytes into cost limits, and define lamports/Kilobyte in base fee calculation to charge for space as first step." But figuring out "lamports/kilobyte" is still kind of magic. I feel copying the heap_cost-to-CU approach gives us a quicker solution, though I don't feel too strongly about this.

For long-term I would prefer an approach where users only pay for priority

Mmm, you mean not paying extra for CPU (as in CU) and memory (as in account size)? A transaction with a priority fee could then take up all the block limits (clock, memory, etc.) at a relatively low cost, possibly making it easy to keep other txs from landing?

@mschneider
Contributor

the difference would be:

lamports = 5000 + priority * ( cu[cpu] + cu[heap] )
lamports = priority_cpu * limit_cpu + priority_heap * limit_heap

Each of those priority factors and resource limits would need to be set high enough to allow for inclusion in the block. lamports/kb would not be set by the runtime itself but bid by the users. If a lot of users compete for access to heap, the priority fees for heap should slowly increase, while CPU-bound workloads stay the same.
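
A sketch of the two fee models side by side, with hypothetical names for the per-dimension priorities and limits:

```rust
/// Current-style model: a single priority applied to everything squashed into CUs.
fn fee_single_priority(priority: u64, cu_cpu: u64, cu_heap: u64) -> u64 {
    5_000 + priority * (cu_cpu + cu_heap)
}

/// Per-dimension model: each resource carries its own user-bid price,
/// so heap contention can reprice independently of CPU-bound workloads.
fn fee_per_dimension(priority_cpu: u64, limit_cpu: u64, priority_heap: u64, limit_heap: u64) -> u64 {
    priority_cpu * limit_cpu + priority_heap * limit_heap
}
```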

@Lichtso
Contributor

Lichtso commented Feb 23, 2023

About program accounts: please don't use the is_executable flag anymore; we are deprecating it. Use the owner id instead. See: #30371

Also, loading program accounts should be more expensive than loading normal accounts because the program additionally needs to be verified and compiled.

@tao-stones
Contributor Author

tao-stones commented Mar 10, 2023

Do people have thoughts on this approach (#30659) of charging a fee for account size? Is a max of 16_000 CU too cheap?

tag @brooksprumo @jeffwashington @sakridge

@github-actions github-actions bot added the stale ([bot only] Added to stale content; results in auto-close after a week) label Jun 10, 2024
@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 17, 2024