ERC: Compressed Integers #3772
Comments
Why not just use floating point numbers?
Does this make sense?
This is a very exciting avenue to go down. ERC20 transfers account for a significant amount of ETH usage. Another point: a lot of addresses hold the exact same token balance, e.g. most addresses hold 1 token, 5 tokens, 10 tokens, 100 tokens, etc. It's pointless storing these same amounts thousands of times. There is probably some kind of uint8 reference that could point an address to a common token amount if it holds it.
I thought about something very close to this. Myself, I was way more ruthless and thought that compressing values in 16 bits instead (1 byte for shift, 1 byte for the amount) was acceptable in many circumstances. That means you have "256 levels of precision", which isn't great but means you can get up to 99.6% accuracy to your intended value (if you have to floor or ceil) or 99.8% accuracy (if you can choose floor or ceil to get to the closest value). Oh, if you want example projects that compress amounts, you can check out Slot Curate, which never launched but uses |
Compressed Integers
Using lossy compression on uint256 to improve gas costs, ideally by a factor of up to 4x.
Abstract
This document specifies compression of `uint256` to smaller data structures like `uint64`, `uint96`, `uint128`, for optimizing storage costs in Ethereum smart contracts. The smaller data structure (represented as `cintx`) is divided into two parts: the first stores the `significant` bits, and the second stores the number of left `shift`s needed on the significant bits to decompress. This document also includes two specifications for decompression because the compression is lossy, i.e. it causes underflow.
Motivation
Specification
In this specification, a compressed value is represented using `cintx`, where x is the number of bits taken by the entire compressed value. (On the implementation level, a `uintx` can be used for storing a `cintx` value.)
Compression
uint256 into cint64 (up to cint120)
The rightmost (least significant) 8 bits in `cintx` are reserved for storing the shift, and the remaining bits store the significant bits, starting from the first `1` bit in `uintx`.
uint256 into cint128 (up to cint248)
The rightmost (least significant) 7 bits in `cintx` are reserved for storing the shift, and the remaining bits store the significant bits, starting from the first `1` bit in `uintx`.
Examples:
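As an illustration of the `cint64` layout described above, here is a minimal sketch in Python (chosen only because its integers are unbounded, so a `uint256` can be modeled directly). The function name `compress_cint64` and the two constants are assumptions made for this sketch; they are not taken from the reference implementation.

```python
SIGNIFICANT_BITS = 56  # cint64: 64 bits total, upper 56 hold the significant bits
SHIFT_BITS = 8         # cint64: lowest 8 bits hold the shift


def compress_cint64(value: int) -> int:
    """Lossily compress a uint256 into a cint64 (returned as a plain int)."""
    if not 0 <= value < 2**256:
        raise ValueError("value must fit in uint256")
    # Number of low-order bits that do not fit into the 56 significant bits.
    shift = max(value.bit_length() - SIGNIFICANT_BITS, 0)
    significant = value >> shift  # dropping those bits rounds down
    return (significant << SHIFT_BITS) | shift


# A small value fits into 56 bits, so the shift is 0 and nothing is lost.
small = compress_cint64(1000)
# 10**30 needs 100 bits, so its lowest 44 bits are dropped.
large = compress_cint64(10**30)
```

The largest possible shift for a `uint256` input is 256 - 56 = 200, which fits in the 8 bits reserved for it.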
Decompression
Two decompression methods are defined: a normal `decompress` and a `decompressRoundingUp`.
Normal Decompression
The `significant` bits in the `cintx` are moved to a `uint256` space and shifted left by `shift`.
Decompression along with rounding up
The `significant` bits in the `cintx` are moved to a `uint256` space and shifted left by `shift`, and the least significant `shift` bits are set to `1`s.
This specification is to be used by a new smart contract for managing its internal state so that any state-mutating calls to it can be cheaper. These compressed values in a smart contract's state should not be exposed to the external world (other smart contracts or clients). A smart contract should expose a decompressed value if needed.
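The two decompression methods can be sketched as follows, again in Python and assuming the `cint64` layout (shift in the lowest 8 bits). The function names mirror `decompress` and `decompressRoundingUp` from the text, but the code itself is an illustrative model rather than the reference implementation.

```python
SHIFT_BITS = 8  # assumed cint64 layout: lowest 8 bits hold the shift
SHIFT_MASK = (1 << SHIFT_BITS) - 1


def decompress(cint: int) -> int:
    """Move the significant bits into uint256 space and shift them left."""
    significant = cint >> SHIFT_BITS
    shift = cint & SHIFT_MASK
    return significant << shift


def decompress_rounding_up(cint: int) -> int:
    """Same as decompress, but the lowest `shift` bits are all set to 1."""
    shift = cint & SHIFT_MASK
    return decompress(cint) | ((1 << shift) - 1)


# Hand-built cint64: significant bits 0xABCD, shift 4.
cint = (0xABCD << SHIFT_BITS) | 4
low = decompress(cint)               # 0xABCD0
high = decompress_rounding_up(cint)  # 0xABCDF
```

Any original value that compressed to `cint` lies between `low` and `high` inclusive.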
Rationale
- `significant` bits are stored in the most significant part of `cintx` while `shift` bits are stored in the least significant part, to help prevent obvious dev mistakes. For example, a number smaller than 2^56 - 1 would have a compressed `cint64` value equal to itself if the arrangement were the opposite of what is specified. If a developer forgot to decompress a value before using it, that case would still pass because the compressed value is the same as the decompressed value.
- Using `cint64` doesn't render gas savings automatically. The Solidity compiler needs to pack more data into the same storage slot.
- … `uint120` and 7 up to `uint248`).
Backwards Compatibility
There are no known backward-incompatible issues.
Reference Implementation
Using structs for `cintx` does not yet serve the purpose of saving gas as of Solidity 0.8.6 (see solidity#11691; this requires some backward-incompatible changes to the storage layout in Solidity). Hence, on the implementation level, `uint64` can be used directly.
Gist: https://gist.github.com/zemse/0ea19dd9b4922cd68f096fc2eb4abf93
The above gist has `library CompressedMath64`, which contains demonstrative logic for compression, decompression, and arithmetic for `cint64`. However, it uses the `uint64` data structure due to a present storage layout limitation on nested structs in Solidity. The gist also has an example contract that uses the library for demonstration purposes.
Some related work
Some engineers have realized that using `uint256` entirely for storing value/money is not optimal in view of storage costs, and that some of the bits in the slot could be better utilized by storing other data in their place.
- … `uint256` values to `uint128`, to make some room for other data. No lossy compression).
- … `uint96`)
Security Considerations
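The following security considerations are discussed:
1. Effects due to lossy compression
   - Error estimation for `cint64`
   - Handling the error
2. Losing precision due to incorrect use of `cintx`
3. Compressing something other than money `uint256`s
4. Compressing Stable vs Volatile money amounts
1. Effects due to lossy compression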
When a value is compressed, it causes underflow, i.e. some less significant bits are sacrificed. This results in a `cintx` value whose decompressed value is less than or equal to the actual `uint256` value.
Error estimation for cint64
Let's consider a `value` of the order 2^m (less than 2^m and greater than or equal to 2^(m-1)).
For all values such that 2^m - 1 - (2^(m-56) - 1) <= `value` <= 2^m - 1, the compressed value `cvalue` is 2^m - 1 - (2^(m-56) - 1).
The maximum error is 2^(m-56) - 1; approximating it in decimal gives 10^(n-17), since 2^56 is roughly 10^17 (log10(2^56) is about 16.9). Here `n` is the decimal exponent of the value plus 1 (for 10^12, n = 13).
For example, when compressing a value of the order $1,000,000,000,000 (or 1T or 10^12) to `cint64`, the maximum error turns out to be 10^(12+1-17) = $10^-4 = $0.0001. This means the precision after 4 decimal places is lost, or we can say that the uncompressed value is at maximum $0.0001 smaller. Similarly, if someone is storing $1,000,000 into `cint64`, the uncompressed value would be at maximum $0.0000000001 smaller. In comparison, the storage costs are almost $0.8 to initialize and $0.2 to update (20 gwei, 2000 ETHUSD).
Handling the error
Note that compression makes the value slightly smaller (underflow). But there is another operation that does the same: in integer math, division is a lossy operation (causing underflow). For instance,
The result of a division is not always exact; in some cases, as in the above example, it is smaller than the actual value. Most engineers try to reduce this effect by doing all the divisions at the end.
Division has been in use in the wild, and plenty of lossy integer divisions have taken place, causing DeFi users to receive very slightly smaller withdrawal amounts, which they don't even notice. If one is careful, the risk is negligible. Compression is similar, in the sense that it is also a division by 2^shift. If one is careful with this too, the effects are minimized.
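The analogy between compression and integer division can be checked with a short Python sketch. It assumes the `cint64` layout (56 significant bits); the helper `round_trip` and the sample amount are hypothetical and exist only for this illustration.

```python
SIGNIFICANT_BITS = 56  # assumed cint64 layout


def round_trip(value: int):
    """Compress then decompress; return (restored value, shift used)."""
    shift = max(value.bit_length() - SIGNIFICANT_BITS, 0)
    return (value >> shift) << shift, shift


# Plain integer division already rounds down.
assert 10_000_001 // 2 == 5_000_000

value = 123_456_789 * 10**18 + 1  # hypothetical amount with 18 decimals
restored, shift = round_trip(value)

# The round trip equals dividing by 2**shift (rounding down) and scaling back.
assert restored == (value // 2**shift) * 2**shift

# The loss is never negative and always smaller than 2**shift.
loss = value - restored
assert 0 <= loss < 2**shift
```

As with division, the lost part is bounded by the divisor, here 2**shift, which stays tiny relative to the value because 56 significant bits are kept.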
In general, one should follow the rule:
- When the contract pays an amount out to a user, use the rounded-down value (`amount.decompress()`).
- When the contract charges an amount to a user, use the rounded-up value (`amount.decompressUp()`).
The above ensures that the smart contract does not lose money due to the compression; it is the user who receives less funds or pays more funds. The extent of rounding is negligible enough for the user. Also, this rounding up and down pattern is observed in many projects, including UniswapV3.
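A minimal Python sketch of this rounding rule, assuming the `cint64` layout. The names `compress`, `decompress` and `decompress_up` are stand-ins modeled on the calls mentioned above, and the amount `owed` is a made-up value.

```python
SIGNIFICANT_BITS = 56  # assumed cint64 layout
SHIFT_BITS = 8
SHIFT_MASK = (1 << SHIFT_BITS) - 1


def compress(value: int) -> int:
    shift = max(value.bit_length() - SIGNIFICANT_BITS, 0)
    return ((value >> shift) << SHIFT_BITS) | shift


def decompress(cint: int) -> int:
    # Rounded-down value: low `shift` bits are zero.
    return (cint >> SHIFT_BITS) << (cint & SHIFT_MASK)


def decompress_up(cint: int) -> int:
    # Rounded-up value: low `shift` bits are all ones.
    shift = cint & SHIFT_MASK
    return decompress(cint) | ((1 << shift) - 1)


owed = 987_654_321_987_654_321_987  # hypothetical exact amount
stored = compress(owed)

payout = decompress(stored)     # what the contract would pay out
charge = decompress_up(stored)  # what the contract would charge

# The contract never pays more, and never charges less, than the exact amount.
assert payout <= owed <= charge
```

The gap between `charge` and `payout` is 2**shift - 1 base units, which is the most a user can be off by in either direction.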
2. Losing precision due to incorrect use of `cintx`
This is an example where dev errors while using compression can be a problem.
Usual user amounts mostly have a max entropy of 50, i.e. 10^15 (or 2^50) values in use, which is why uint56 is enough for storing the significant bits. However, let's see an example:
The above code results in a serious precision loss. `sharesC` has an entropy of 50, and `priceC` also has an entropy of 50. When we multiply them, we get a value that contains the entropies of both, and hence an entropy of 100. After the multiplication is done, `cmul` compresses the value, which drops the entropy of `amountC` to 56 (as we have uint56 there to store significant bits).
To prevent entropy/precision from dropping, we get out of compression.
Compression is only useful when writing to storage, while arithmetic on compressed values should be done very carefully.
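The effect can be reproduced with a small Python model. This is a sketch under assumptions: `cmul_sketch` is a hypothetical stand-in for a compressed multiply (decompress, multiply, recompress) and may not match how `cmul` in the gist is written, and the share count and price are made-up values.

```python
SIGNIFICANT_BITS = 56  # assumed cint64 layout
SHIFT_BITS = 8
SHIFT_MASK = (1 << SHIFT_BITS) - 1


def compress(value: int) -> int:
    shift = max(value.bit_length() - SIGNIFICANT_BITS, 0)
    return ((value >> shift) << SHIFT_BITS) | shift


def decompress(cint: int) -> int:
    return (cint >> SHIFT_BITS) << (cint & SHIFT_MASK)


def cmul_sketch(a_c: int, b_c: int) -> int:
    """Hypothetical compressed multiply: the product is compressed again."""
    return compress(decompress(a_c) * decompress(b_c))


# Both inputs fit in 56 bits, so compressing them loses nothing.
shares_c = compress(123_456_789_012_345)
price_c = compress(987_654_321_098_765)

# Staying in compressed form: the wide product is squeezed back to 56 bits.
amount_c = cmul_sketch(shares_c, price_c)
lossy_amount = decompress(amount_c)

# Getting out of compression: decompress first, keep the product as uint256.
exact_amount = decompress(shares_c) * decompress(price_c)

lost = exact_amount - lossy_amount  # low-order bits dropped by recompression
```

The product needs far more than 56 bits, so everything below the top 56 bits ends up in `lost`; keeping the intermediate result uncompressed avoids that.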
3. Compressing something other than money `uint256`s.
Compressed Integers is intended to only compress money amounts. Technically there are about 10^77 values that a `uint256` can store, but most of those values have a flat distribution, i.e. the probability is 0 or extremely negligible. (What is the probability that a user would be depositing 1000T DAI or 1T ETH to a contract? In normal circumstances it doesn't happen, unless someone has full access to the mint function.) Only the amounts that people work with have a non-zero distribution ($0.001 DAI to $1T, or 10^15 to 10^30 in uint256). 50 bits are enough to represent this information; to round it up, we use 56 bits for precision.
Using the same method for compressing something else, which has a completely different probability distribution, will likely result in a problem. It's best to just not compress if you're not sure about the distribution of values your `uint256` is going to take. And for things you think you are sure about using compression for, it's better to give more thought to whether compression can result in edge cases (e.g. in the previous multiplication example).
4. Compressing Stable vs Volatile money amounts
Since we have a dynamic `uint8 shift` value that can move around, even if you wanted to represent 1 million SHIBA INU tokens or 0.0002 WBTC (both $10 as of this writing), cint64 will pick its top 56 significant bits, which will take care of the value representation.
It can be a problem for volatile tokens if the coin is extremely volatile with respect to the user's native currency. Imagine a very unlikely case where a coin goes 2^56x up (price went up by about 7 x 10^16 lol). In such cases `uint56` might not be enough, as even its least significant bit is very valuable. If such insanely volatile tokens are to be stored, you should store more significant bits, i.e. use `cint96` or `cint128`.
`cint64` has 56 bits for storing significant bits, when only 50 were required. Hence there are 6 extra bits, which means that it is fine if the $ value of the cryptocurrency stored in cint64 increases by 2^6 or 64x. If the value goes down, it's not a problem.
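To see how the dynamic shift adapts to very different token magnitudes, here is a short Python sketch. It assumes 18 decimals for SHIBA INU and 8 decimals for WBTC (these decimals are assumptions of the sketch, not stated in the text), and `shift_for` is a hypothetical helper.

```python
SIGNIFICANT_BITS = 56  # assumed cint64 layout


def shift_for(value: int) -> int:
    """Shift that cint64 compression would pick for this value."""
    return max(value.bit_length() - SIGNIFICANT_BITS, 0)


shib = 1_000_000 * 10**18  # 1 million tokens in base units (18 decimals assumed)
wbtc = 2 * 10**4           # 0.0002 WBTC in base units (8 decimals assumed)

# Smallest step the compressed form can represent at each magnitude.
shib_step = 1 << shift_for(shib)
wbtc_step = 1 << shift_for(wbtc)
```

The small WBTC amount is stored exactly (step of 1 base unit), while the large SHIBA INU amount gets a coarser step that is still below 2^-55 of the value itself.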