Merge Preparation: Precise Tagging + Enhanced Orthogonal Persistence (64-Bit) #4392

Merged
merged 212 commits into from
Feb 29, 2024

Conversation

luc-blaeser

Preparing to merge #4369 into #4225.

crusso and others added 27 commits February 18, 2024 16:40
## Changelog for motoko-base:
Branch: next-moc
Commits: [dfinity/motoko-base@b772c9e4...520ccf5d](dfinity/motoko-base@b772c9e...520ccf5)

* [`0f14b175`](dfinity/motoko-base@0f14b17) Unused Declaration Cleanup ([dfinity/motoko-base⁠#614](https://togithub.com/dfinity/motoko-base/issues/614))
Replaces `sdk@dfinity.org` with the recently introduced `team-motoko@dfinity.org`.
## Changelog for motoko-base:
Branch: next-moc
Commits: [dfinity/motoko-base@520ccf5d...712d0587](dfinity/motoko-base@520ccf5...712d058)

* [`cba05e81`](dfinity/motoko-base@cba05e8) Publish on Mops ([dfinity/motoko-base⁠#618](https://togithub.com/dfinity/motoko-base/issues/618))
* [`d81f5527`](dfinity/motoko-base@d81f552) Add commit hash to `matchers` dependency ([dfinity/motoko-base⁠#621](https://togithub.com/dfinity/motoko-base/issues/621))
* [`c86d76ff`](dfinity/motoko-base@c86d76f) doc: update `List.mo` ([dfinity/motoko-base⁠#616](https://togithub.com/dfinity/motoko-base/issues/616))
* [`4c2a90e7`](dfinity/motoko-base@4c2a90e) Fix compiler warning in `Array.take()` method ([dfinity/motoko-base⁠#611](https://togithub.com/dfinity/motoko-base/issues/611))
* add flag to enable rtti

* fix bugs in can_tag_i32/i64 tests and sanity checks

* adjust test assert on heap size

* update perf numbers

* revert change

* revert test

* optimized clearing of all-zero tags

* update perf numbers
…4410)

Only passive Wasm data segments are used by the compiler and runtime system. In contrast to ordinary active data segments, passive segments can be explicitly loaded to a dynamic address.

This simplifies two aspects: 
* The generated Motoko code can contain arbitrarily large data segments, which can be loaded to the dynamic heap when needed.
* The IC can simply retain the main memory on an upgrade without needing to patch the active data segments of the new program version to the persistent memory.

However, more specific handling is required for the Rust-implemented runtime system:
The Rust-generated active data segments of the runtime system are changed to passive and loaded to the expected static address at program start (canister initialization and upgrade).
The location and size of the RTS data segments are therefore limited to a defined reserve, see above.
This is acceptable because the RTS only uses a small amount of space for its data segments (e.g. 54KB), independent of the compiled Motoko program.
Housekeeping, largely to reduce the size of the diff in PRs #4416 and  #4377.
# Unused Declaration Detection

Detection of unused program declarations with compiler warnings.

Program example `example.mo`:
```
import Array "mo:base/Array";
import Debug "mo:base/Debug";

actor {
    var variable1 = 0;
    var variable2 = "TEST";

    func testUnusedFunction(parameter1 : Bool, parameter2 : Int) {
        var variable2 = 2;
        var variable3 = 3;
        let variable4 = 4;
        if (variable1 == 0 and variable3 == 3) {
            let variable2 = parameter1;
            Debug.print(debug_show(variable2));
        };
    };
};
```

Compiler messages:
```
example.mo:1.8-1.13: warning [M0194], Unused declaration Array
example.mo:6.9-6.18: warning [M0194], Unused declaration variable2
example.mo:8.10-8.28: warning [M0194], Unused declaration testUnusedFunction
example.mo:8.48-8.58: warning [M0194], Unused declaration parameter2
example.mo:9.13-9.22: warning [M0194], Unused declaration variable2
example.mo:11.13-11.22: warning [M0194], Unused declaration variable4
```

## Coverage

The analysis detects the following unused declarations:
* Variables
* Parameters, including shared context
* Functions
* Classes
* Objects
* Modules
* Imports
* Private fields in objects and classes

Special aspects:
* System functions are considered implicitly used.
* Non-accessed stable variables are considered unused, even if they could be accessed in a future upgraded program version.

## Warnings

The warning of an unused declaration can be suppressed by prefixing the identifier by an underscore.

Example:

```
object Silence {
    public func log(_message: Text) { // Suppress the warning for the unused `_message` parameter.
    }
}
```
## Tweaks from #4407 

* don't warn about unused declarations in code from packages (assuming packages are third party, you can't silence them anyway):
  * annotate LibPath Ast nodes with source package, if any, as tracked and determined during import resolution.
  * predicate unused declaration warnings on package origin.
* don't reject unused declarations in the repl, treating top-level code as belonging to fake package "<top-level>" (a mild hack).
   The repl can't know the rest of the interaction, so any warning is premature and a nuisance.
* change terminology of declarations/variables to bindings/identifiers (for consistency with rest of code)
* add error-code description in M0194.md
* add changelog entry.

Future: we could suppress all warnings, not just unused declarations, from imported package code this way, should we want to. A --lint mode could re-enable them for further auditing. The rationale is that the warnings are of interest to, and actionable only by, the author of the package, not the client.

## Future Work

The following analyses are not yet implemented but would be beneficial to support:
* Unused recursive function calls (direct or indirect recursion).
* Unused type definitions, unused type parameters
* Unused branch labels
* Unused variant options
* Unused public fields: Additional aspects to consider:
    - Accesses via paths outside the declaration scope.
    - Possible usage before declaration.
    - Polymorphism of structural typing.
    - A library module may expose more (directly or indirectly) public declarations than used.
* Write-only mutable variables: Mutable variables that are never read but only written
* Unnecessary mutability of read-only variables: Recommend `let` instead of `var`.
The Motoko runtime representation of values is largely untyped, distinguishing only between scalar and boxed values 
 using a single bit of the 32-bit value representation.  The tagging is only to support garbage collection, not precise runtime type information.

In the existing value encoding, a Motoko value in vanilla form is a  32-bit value that is either:
* false `(0b0)`,
* true `(0b1)`,
* a word-aligned (encoded) pointer to a heap-allocated value,
  encoded by subtracting 1 from the pointer value (ensuring the 2 LSBs are `0b11`),
* null (some well-known skewed pointer),
* a 31-bit scalar value, stored in the top bits of the value with LSB 0.

Scalar values encode `Nat8/16` and `Int8/16` values and chars, and 31-bit subranges of `Nat32`, `Int32`, `Nat64`, `Int64`, `Nat` and `Int`. Large integer values that don't fit  in a 31-bit scalar are boxed on the heap.

Observe that, in Motoko, some types are always scalar (e.g. `Nat8`), some types are always boxed (e.g. `Blob`), and some types have a mixed scalar/boxed representation (e.g. `Nat32` and `Nat`), depending on the size of the value.
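
The existing encoding described above can be sketched roughly as follows. This is only an illustration of the bit layout stated in the list, not the compiler's or RTS's actual code; the function names (`skew`, `unskew`, `is_skewed_pointer`, `encode_scalar`, `decode_scalar`) are hypothetical.

```rust
/// Encode a word-aligned heap address by subtracting 1,
/// so that the two least significant bits become 0b11.
fn skew(addr: u32) -> u32 {
    addr.wrapping_sub(1)
}

/// Recover the heap address from a skewed pointer.
fn unskew(value: u32) -> u32 {
    value.wrapping_add(1)
}

/// A skewed pointer to a word-aligned object has both low bits set.
fn is_skewed_pointer(value: u32) -> bool {
    value & 0b11 == 0b11
}

/// Store a 31-bit scalar in the top bits, leaving LSB 0.
/// Returns None if the payload needs more than 31 bits (it would be boxed instead).
fn encode_scalar(payload: u32) -> Option<u32> {
    if payload < (1u32 << 31) {
        Some(payload << 1)
    } else {
        None
    }
}

/// Read back the 31-bit payload (as raw unsigned bits).
fn decode_scalar(value: u32) -> u32 {
    value >> 1
}
```

For instance, under these assumptions `skew(0x1000)` yields `0x0FFF`, which `is_skewed_pointer` accepts, while `true` (`0b1`) is not mistaken for a pointer because only one of its two low bits is set.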

This PR adds exact runtime type information to all[*] scalar values, making the scalar values self describing.
Making the _entire_ heap fully self-describing requires refining the heap tags used to identify heap objects, distinguishing boxed `Nat32` from boxed `Int32`, `Blob` from `Principal` and `Text`, tuples from (mutable and immutable) arrays etc. That work of refining heap tags will need to be completed in a follow-on or sibling PR, but is hopefully less involved than the changes herein.

To add precise scalar type info, we extend the scalar tagging scheme with a richer set of (inline) type descriptors, using some of the least significant bits of the 31-bit scalar representation.

To avoid dedicating a fixed-length suffix (say 1 byte) to the scalar tag, scalar tags are actually variable length, using shorter tags for larger payload types, and longer tags for shorter payload types. This gives us a reasonable tag space (set of possible tags, some still unused), without reducing the scalar range of mixed representation types too much.

At one extreme, the tag of `Int` (and `Nat`) is just `0b10`, leaving a 30-bit payload for compact `Nat/Int`, losing just `1` bit from the current representation's 31-bit compact range. This is important because `Int`s are common, and `Nat`s are used to index arrays, so we should avoid boxing more than necessary.

In the middle, the tags of `Nat16` and `Int16` are `0b01(0^14)` and `0b11(0^14)` respectively, leaving a 16-bit payload in the MSBs.

At the other extreme, the tag of the unit value, `()`, is the 32-bit `0b01(0^30)`, occupying the entire value.

The primary motivation of this work is to support value, not type driven, serialization of stable values to a precisely typed stable format, without loss of type information, so that upgrades can still accommodate type dependent changes of representation from one in-memory format to another. Secondary motivations are live and post-mortem heap inspection tools and light-weight debugging tools, that can parse values in locals, arguments and on the heap using tags.

[*] There remain some raw, untagged 31-bit scalars whose type is only known to the compiler. These are used to encode the state of text and blob iterators, hidden in dedicated iterator closure environments. Note that these are not stable types, so need not be precisely tagged for stabilization.

# Tagging Scheme

   | Value | Type | Payload bits |
   |-------| ------| --------------|
   | `((O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TBool (* false *) | 0 |
   | `((O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,I))`   | TBool (* true *) | 0 |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,I,I))`                                         | TRef    |    30       |  
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,I,O))`                                       | TNum |  30        |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,O,I,O,O))`                                     | TNat64 | 28       |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,I,I,O,O))`                                       |  TInt64  |  28      |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,O,I,O,O,O))`                                    | TNat32 |  27      |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,I,I,O,O,O))`                                     | TInt32   | 27        |
   | ... unused tags ....                             | ...   | ...       |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (_,_,_,_,_,O,I,O), (O,O,O,O,O,O,O,O))` | TChar | 21 |
  | ... unused tags ....                             | ...   | ...       |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (O,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))`| TNat16 | 16 |
   | `((_,_,_,_,_,_,_,_), (_,_,_,_,_,_,_,_), (I,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TInt16 | 16 |
  | ... unused tags ....                             | ...   | ...       |
   | `((_,_,_,_,_,_,_,_), (O,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TNat8 | 8 |
   | `((_,_,_,_,_,_,_,_), (I,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` | TInt8 | 8 |
  | ... unused tags ....                             | ...   | ...       |
   | `((O,I,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O), (O,O,O,O,O,O,O,O))` |  TUnit | 0 |
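
As a rough illustration of how the table can be read, the sketch below transcribes the tag bit patterns into a lookup table and classifies a 32-bit value by its variable-length tag. It is not the compiler's implementation; the names `Tag`, `SCHEME`, `tag_mask`, `classify` and `encode` are hypothetical, and payloads are returned as raw unsigned bits (sign extension for the `Int*` types is left out).

```rust
/// Value classes from the tagging table; `Unused` covers unassigned tags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Bool, Ref, Num, Nat64, Int64, Nat32, Int32, Char, Nat16, Int16, Nat8, Int8, Unit,
    Unused,
}

/// (tag bits, tag length in bits, class), transcribed from the table.
const SCHEME: [(u32, u32, Tag); 12] = [
    (0b11, 2, Tag::Ref),
    (0b10, 2, Tag::Num),
    (0b0100, 4, Tag::Nat64),
    (0b1100, 4, Tag::Int64),
    (0b01000, 5, Tag::Nat32),
    (0b11000, 5, Tag::Int32),
    (0b010_0000_0000, 11, Tag::Char),
    (0x4000, 16, Tag::Nat16),
    (0xC000, 16, Tag::Int16),
    (0x40_0000, 24, Tag::Nat8),
    (0xC0_0000, 24, Tag::Int8),
    (0x4000_0000, 32, Tag::Unit),
];

/// Mask selecting the low `len` bits (all 32 bits when `len` is 32).
fn tag_mask(len: u32) -> u32 {
    1u32.checked_shl(len).map_or(u32::MAX, |bit| bit - 1)
}

/// Classify a value and return its class together with the raw payload bits.
fn classify(value: u32) -> (Tag, u32) {
    if value == 0 || value == 1 {
        return (Tag::Bool, value); // false and true occupy the whole word
    }
    for &(bits, len, tag) in SCHEME.iter() {
        if (value & tag_mask(len)) == bits {
            return (tag, value.checked_shr(len).unwrap_or(0));
        }
    }
    (Tag::Unused, 0)
}

/// Tag a payload; returns None if the payload does not fit the class's payload width.
fn encode(tag: Tag, payload: u32) -> Option<u32> {
    let &(bits, len, _) = SCHEME.iter().find(|entry| entry.2 == tag)?;
    let payload_bits = 32 - len; // between 0 and 30 for every entry
    if (payload >> payload_bits) != 0 {
        return None;
    }
    Some(payload.checked_shl(len).unwrap_or(0) | bits)
}
```

Because the tags are prefix-free when read from the least significant bit, the order of the entries in `SCHEME` does not affect the result of `classify`.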

# Implementation

The implementation was carried out in a number of precursor PRs:

* #4098: Added 1-byte tags to small values, untagging and retagging on every operation, with many code changes.
* #4278: Made the payload/tag size for scalar values configurable using a fixed compile time constant.
* #4322: Added tags to compact `Nat32/Int32` and `Nat64/Int64`, making the payload size type-dependent.
            The previously untyped _StackReps_ `UnboxedWord32` and `UnboxedWord64` were extended to carry a type 
argument. The argument is used to remember and re-introduce the precise tag on unboxing and boxing. 
            It can also be used to verify the tag on unboxing, for sanity checking.
* #4345: Tag compact Int and Nat (both as Int due to subtyping) 
* #4353: Extended the range of compact `Int/Nat` from 29 to 30-bit, by adjusting the tagging scheme. This is just 1 bit less
            than with the existing scheme (31-bit, untagged scalars).
* #4354: Improved the tagging scheme to use the longest possible tags for the required payload size, upping the ranges of unused tags (for future use)
* #4357: Merged with master and fixed bugs in the sanity checking of tags, as well as bugs revealed by the more stringent sanity checks.
* #4363: Uses the `UnboxedWord32/Word64` stack reps also for untagged, 0-right-padded small tagged values,
           tagging/untagging only on exit to and from the stack.
           This alone reduces the (large) 80% overhead in `bench/nat16.mo` to 55%.
           It also has the advantage of reverting almost all changes to the arithmetic code,
           which can now (again) assume values are right 0-padded, as it did previously.
* #4369: (this PR) makes a small tweak so that mutable locals store small tagged values in untagged form, extending
           the existing optimization done for mutable locals containing unboxed `Nat32`/`Int32` and `Int64`/`Nat64`.
           This reduces the `bench/nat16.mo` overhead from 55% to just 6% (the benchmark uses repeated in-place updates in a tight loop, so it benefits greatly).
           This PR also makes use of the previously unused bit in the compact representation of `Nat32`s and `Nat64`s, which previously had to concur with the representation of `Int32` and `Int64` and could only represent half the unsigned range.
           With the typed StackRep, we now know whether the values are signed or not and can choose distinct compact
           representations for `Nat32` vs `Int32`, and `Nat64` vs `Int64`, rather than shared ones.
           Note, however, that the compact representation for `Nat` cannot recover the missing bit because of subtyping.
           A compact `Nat` **must** have the same representation as a compact `Int` to support non-coercive subtyping.
* #4375 (incoming): rewrite array iter optimization to respect compact bignum representation invariants.
* #4400 : gate feature behind `Mo_config.Flags.rtti (default off)`, avoiding overhead for now.
* added (unadvertised) flag `--experimental-rtti` to enable feature for performance feedback from users.
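
To illustrate the point about the recovered bit for `Nat32`, the following sketch compares the compact ranges of a signed, a shared and a dedicated unsigned representation. It assumes the 27-bit payload width given for `TNat32`/`TInt32` in the tagging table; the function names are hypothetical and the code is not taken from the compiler.

```rust
/// Payload width for Nat32/Int32, as listed in the tagging table.
const PAYLOAD_BITS_32: u32 = 27;

/// Signed compact range: -2^26 <= i < 2^26.
fn fits_compact_int32(i: i32) -> bool {
    let limit = 1i32 << (PAYLOAD_BITS_32 - 1);
    -limit <= i && i < limit
}

/// If Nat32 has to share the signed representation,
/// only the non-negative half is usable: 0 <= n < 2^26.
fn fits_shared_nat32(n: u32) -> bool {
    n < (1u32 << (PAYLOAD_BITS_32 - 1))
}

/// With a distinct unsigned representation, the full payload is usable: 0 <= n < 2^27.
fn fits_compact_nat32(n: u32) -> bool {
    n < (1u32 << PAYLOAD_BITS_32)
}
```

Under these assumptions, the value `1 << 26` would be boxed with the shared representation but stays compact with the dedicated one. The same reasoning does not apply to `Nat`, which keeps sharing its representation with `Int`.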

# Overheads

These are the cycle count and code size differences measured using `test/bench` and  `test/perf`, compared against master (see spreadsheet for perf of interim PRs).

Summarized from:

https://docs.google.com/spreadsheets/d/1zC2Hsl9gGUzJESQmSABPiu-XIsICEw1I3O-JKHNWVQs/edit?usp=sharing


## test/perf

| Benchmark | Master | Widening | Widening vs Master | Gated | Gated vs Master |
| -- | -- | -- | -- | -- | -- |
| gas/assetstorage | 10013950 | 10013950 | 0.00% | 10013950 | 0.00% |
| size/assetstorage | 186455 | 186705 | 0.13% | 186520 | 0.03% |
| gas/dao | 4413634512 | 4413744976 | 0.00% | 4413743944 | 0.00% |
| size/dao | 265797 | 266385 | 0.22% | 265922 | 0.05% |
| gas/qr | 1302744688 | 1305067118 | 0.18% | 1302750018 | 0.00% |
| size/qr | 256049 | 256925 | 0.34% | 256285 | 0.09% |
| gas/reversi | 80920993 | 81019001 | 0.12% | 80927129 | 0.01% |
| size/reversi | 175956 | 176421 | 0.26% | 176084 | 0.07% |
| gas/sha224 | 460197621 | 498978947 | 8.43% |  |  |
| size/sha224 | 191929 | 192859 | 0.48% |  |  |
| gas/sha256 | 14487063673 | 15568532694 | 7.47% | 14486916565 | 0.00% |
| size/sha256 | 179075 | 180167 | 0.61% | 179223 | 0.08% |



## test/bench

| Benchmark | Master | Widening | Widening vs Master | Gated | Gated vs Master |
| -- | -- | -- | -- | -- | -- |
| gas/alloc | 9,243,068,120.00 | 10,350,366,461.00 | 11.98% | 9243068126 | 0.00% |
| size/alloc | 181,066.00 | 180,759.00 | -0.17% | 180464 | -0.33% |
| gas/bignum | 130,604,743.00 | 130,606,013.00 | 0.00% | 130604779 | 0.00% |
| size/bignum | 184,420.00 | 184,093.00 | -0.18% | 183790 | -0.34% |
| gas/heap-32 | 1,610,218,447.00 | 1,695,702,521.00 | 5.31% | 1609469958 | -0.05% |
| size/heap-32 | 182,167.00 | 181,856.00 | -0.17% | 181556 | -0.34% |
| gas/nat16 | 61,393,031.00 | 65,587,813.00 | 6.83% | 61393019 | 0.00% |
| size/nat16 | 181,010.00 | 180,727.00 | -0.16% | 180408 | -0.33% |
| gas/palindrome | 10,131,340.00 | 10,133,866.00 | 0.02% | 10131268 | 0.00% |
| size/palindrome | 185,338.00 | 185,024.00 | -0.17% | 184695 | -0.35% |
| gas/region0-mem | 6,402,149,937.00 | 6,452,495,054.00 | 0.79% | 6402149955 | 0.00% |
| size/region0-mem | 181,898.00 | 181,602.00 | -0.16% | 181281 | -0.34% |
| gas/region-mem | 5,974,331,587.00 | 6,024,676,752.00 | 0.84% | 5974331605 | 0.00% |
| size/region-mem | 181,539.00 | 181,252.00 | -0.16% | 180931 | -0.33% |
| gas/stable-mem | 3,885,566,188.00 | 3,935,898,195.00 | 1.30% | 3885566206 | 0.00% |
| size/stable-mem | 181,896.00 | 181,600.00 | -0.16% | 181279 | -0.34% |
| gas/xxx-nat32 | 57,198,791.00 | 57,199,237.00 | 0.00% | 57198779 | 0.00% |
| size/xxx-nat32 | 181,001.00 | 180,694.00 | -0.17% | 180399 | -0.33% |

@luc-blaeser luc-blaeser merged commit 7996b86 into luc/stable-heap64 Feb 29, 2024
6 checks passed
@luc-blaeser luc-blaeser deleted the luc/stable-heap64-tagging branch February 29, 2024 12:12
luc-blaeser added a commit that referenced this pull request Aug 26, 2024
* Adjust emscripten dependency for nix

* Use latest emscripten from nix unstable channel

* Adjust CI build

* Adjust CI build

* Adjust CI build

* Adjust CI build

* Add latest emscripten via nix `sources.json`

* Adjust emscripten dependency in `sources.json`

* Update sources.json

* Update sources.json

* Disable base library tests

* Adjust build

* Adjust tests, disable benchmark

* Enable random tests on 64-bit

* Bug fix

* Exclude inter-actor quickcheck tests

* Downscale test for CI

* Remove unnecessary clean-up function

* Adjust `is_controller` system call

* Manual merge from master

* Fix direct numeric conversions

* Use `drun` with 64-bit main memory

* Adjust callback signatures

* Adjust ignore callback sentinel value

* Bug fix

* Remove memory reserve feature

* Adjust CI build

* Adjust serialization

* Bug fix

* Bug fix

* Bug fix

* Adjust IC system calls

* Adjust IC system calls

* Bug fix

* Create Cargo.lock

* Adjust region and stable memory accesses

* Fix float format length

* Update nix setup

* Adjust tests

* Adjust nix config

* Adjust stabilization

* Bug fix

* Adjust stable memory and region accesses

* Adjust region RTS calls

* Manual merge RTS tests

* Manual merge of compiler

* Adjust IC call

* Update benchmark

* Adjust test

* Adjust test script

* Adjust tests

* Bug fix

* Adjust tests

* Adjust linker tests

* Minor refactoring

* Adjust test

* Adjust CI build

* Update IC dependency

* Wasm profiler does not support 64-bit

* Test case beyond 4GB

* Update CI test configuration

* Increase partitioned heap to 64GB

* Update IC dependency

* Manual merge, to be continued

* Adjust BigInt literals

* Bug fix

* Adjust tests

* Manual merge conflict resolution

* Code refactoring

* Update IC dependency

* Increase data segment limit

* Adjust test case

* Update migration test case

* Revert "Code refactoring"

This reverts commit 8063f8b.

* Adjust test case

* Update benchmark results

* Update documentation

* Update fingerprint to 64-bit

* Manual merge Rust allocator

* Remove memory reserve

* Test CI build

* Refine memory compatibility check

* Add test case

* Distinguish blob and Nat8 arrays

* Bug fix

* Reformat code

* Update benchmark results

* Distinguish tuple type in memory compatibility check

* Update IC dependency

* Revert "Test CI build"

This reverts commit d4889f9.

* Use 64-bit IC API

* Update IC dependency

* Update benchmark results

* Adjust sanity checks

* Reformat

* Upgrade IC dependency, use persistence flag

* Update IC dependency

* Update IC dependency

* Manual resolution of undetected merge conflicts

* Manual merge conflict resolution

* Resolve merge conflicts

* Manual merge conflict resolution

* Manual merge: Adjust test

* Merge branch 'luc/stable-heap' into luc/stable-heap64

* Update base library dependency

* Manual merge conflict resolution

* Updating nix hashes

* Limit array length because of optimized array iterator

* Code refactoring

* Update benchmark results

* Update motoko base dependency

* Enhanced Orthogonal Persistence: Use Passive Data Segments (64-Bit) (#4411)

Only passive Wasm data segments are used by the compiler and runtime system. In contrast to ordinary active data segments, passive segments can be explicitly loaded to a dynamic address.

This simplifies two aspects: 
* The generated Motoko code can contain arbitrarily large data segments, which can be loaded to the dynamic heap when needed.
* The IC can simply retain the main memory on an upgrade without needing to patch the active data segments of the new program version to the persistent memory.

However, more specific handling is required for the Rust-implemented runtime system:
The Rust-generated active data segments of the runtime system are changed to passive and loaded to the expected static address at program start (canister initialization and upgrade).
The location and size of the RTS data segments are therefore limited to a defined reserve, see above.
This is acceptable because the RTS only uses a small data segment that is independent of the compiled Motoko program.

* Update IC dependency

* Merge Preparation: Precise Tagging + Enhanced Orthogonal Persistence (64-Bit) (#4392)

Preparing merging #4369 in #4225

* Manual merge conflict resolution

* Update Motoko base dependency

* Manual merge conflict resolution

* Manual merge conflict resolution

* Optimization: Object Pooling for Enhanced Orthogonal Persistence (#4465)

* Object pooling

* Update benchmark results

* Optimize further (BigNum pooling)

* Update benchmark results

* Adjust tests

* Optimize static blobs

* Adjust test and benchmark results

* Update documentation

* Manual merge conflict resolution

* Update .gitignore

* Enhanced Orthogonal Persistence: Refactor 64-bit Port of SLEB128 for BigInt (#4486)

* Refactor 64-bit port of SLEB128 for BigInt

* Remove redundant test file

* Adjust data segment loading

To avoid allocation of trap text blob during object pool creation.

* Manual merge conflict resolution

* Manual merge conflict resolution

* Manual merge conflict resolution

* Update benchmark results

* Manual merge conflict resolution

* Update Motoko base dependency

* Manual merge conflict resolution

* Apply the expected shift distance

* Remove redundant code

* Code refactoring: Move constant

* Add a debug assertion

* Code refactoring: Reduce code difference

* Update comment

* Represent function indices as `i32`

* Use pointer compression on Candid destabilization

Candid destabilization remembers aliases as 32-bit pointers in deserialized data. However, the deserialized pointers can be larger than 32-bit due to the 64-bit representation. Therefore, use pointer compression (by 3 bits) to store the 64-bit addresses in the 32-bit alias memo section.
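
A minimal sketch of the idea of pointer compression by 3 bits is given below. It assumes that the compressed addresses are 8-byte aligned and lie below 2^35, so that the shifted value fits in 32 bits; the names `COMPRESSION_SHIFT`, `compress` and `decompress` are hypothetical and not taken from the RTS.

```rust
/// Number of low bits dropped on compression (assumes 8-byte aligned addresses).
const COMPRESSION_SHIFT: u32 = 3;

/// Compress a 64-bit address into 32 bits.
/// Returns None if the address is misaligned or too large to be represented.
fn compress(address: u64) -> Option<u32> {
    if address & ((1u64 << COMPRESSION_SHIFT) - 1) != 0 {
        return None; // low bits would be lost
    }
    let compressed = address >> COMPRESSION_SHIFT;
    if compressed > u32::MAX as u64 {
        None // beyond the 2^35 byte range covered by 32 compressed bits
    } else {
        Some(compressed as u32)
    }
}

/// Restore the original 64-bit address.
fn decompress(compressed: u32) -> u64 {
    (compressed as u64) << COMPRESSION_SHIFT
}
```

With this scheme, an address such as `0x1_0000_0008`, which lies beyond the 4GB boundary, compresses to `0x2000_0001` and decompresses back to the original value.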

* Manual merge conflict resolution

* Fix test case

* Add comment

* Add TODO comment

* Code refactoring: Arithmetics

* Fix boundary check in small `Int` `pow` function

* Code refactoring: `Nat` conversions

* Code refactoring: Remove redundant blank.

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix tagging for `hashBlob`

* Remove redundant shifts for signed bit count operations

* Reenable randomized tests

* Update quickcheck documentation

* Revert unwanted modification in test case

This partially reverts commit ea20c6c.

* Adjust test case to original configuration

* Try to run original `map-upgrades` test

* Tests for wasi stable memory beyond 4GB

* Update expected test result

* Code refactoring: Linker

* Optimizations

* Manual merge conflict resolution

* Use 64-bit version of Tom's math library

* Add benchmark case

* Optimize float to int conversion for 64-bit

* Manual merge conflict resolution

* Experiment Remove `musl`/`libc` dependency from RTS (#4577)

* Remove MUSL/LIBC dependency from RTS
* Update benchmark result

* Manual merge conflict resolution

* Unbounded Number of Heap Partitions for 64-Bit (#4556)

* EOP: Support Unknown Main Memory Capacity in 64-Bit (#4585)

* Tune for unknown memory capacity in 64-bit

* Adjust benchmark results

* Fix debug assertion, code refactoring

* Code refactoring: Improve comments

* Reformat

* Re-enable memory reserve for upgrade and queries

See PR #4158

* Adjust comment

* Fix build

* EOP: Integrating Latest IC with Memory 64 (#4610)

* Adjust to new system API

* Port to latest IC 64-bit system API

* Update to new IC with Wasm64

* Updating nix hashes

* Update IC dependency (Wasm64 enabled)

* Update expected test results

* Fix migration test

* Use latest `drun`

* Adjust expected test results

* Updating nix hashes

* Update expected test results

* Fix `drun` nix build for Linux

* Disable DTS in `drun`, refactor `drun` patches

* Adjust expected test results

---------

Co-authored-by: Nix hash updater <41898282+github-actions[bot]@users.noreply.github.com>

* Enhanced Orthogonal Persistence (64-Bit with Graph Copy) (#4475)

* Graph copy: Work in progress

* Implement stable memory reader writer

* Add skip function

* Code refactoring

* Continue stabilization function

* Support update at scan position

* Code refactoring

* Code refactoring

* Extend unit test

* Continue implementation

* Adjust test

* Prepare memory compatibility check

* Variable stable to-space offset

* Deserialize with partitioned heap

* Prepare metadata stabilization

* Adjust stable memory size

* Stabilization version management

* Remove code redundancies

* Fix version upgrade

* Put object field hashes in a blob

* Support object type

* Code refactoring

* Support blob, fix bug

* Renaming variable

* Adjust deserialization heap start

* Handle null singleton

* Fix version upgrade

* Support regions

* Backup first word in stable memory

* Support additional fields in upgraded actor

* Make unit tests runnable again

* Dummy null singleton in unit test

* Add test cases

* Support boxed 32-bit and 64-bit numbers

* Support more object types

* Support more object types

* Handle `true` bool constant

* Grow main memory on bulk copy

* Update benchmark results

* Support bigint

* Clear deserialized data in stable memory

* Update test results

* Add documentation

* Reformat

* Add missing file

* Update design/GraphCopyStabilization.md

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Update rts/motoko-rts/src/stabilization.rs

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Update rts/motoko-rts/src/stabilization.rs

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Graph Copy: Explicit Stable Data Layout (#4293)

Refinement of Graph-Copy-Based Stabilization (#4286):

Serialize/deserialize in an explicitly defined and fixed stable layout for a long-term perspective.
* Supporting 64-bit pointer representations in stable format, even if main memory currently only uses 32-bit addresses. 

Open aspect: 
* Make `BigInt` stable format independent of Tom's math library.

* Update rts/motoko-rts/src/stabilization.rs

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Update rts/motoko-rts/src/stabilization/layout.rs

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Handle non-stable fields in stable records

* Add object type `Some`

* Add test case

* Adjust stabilization to incremental GC

* Update benchmark results

* Distinguish assertions

* Fix RTS unit test

* Update benchmark results

* Adjust test

* Adjust test

* Fix: Handle all non-stable types during serialization

* Fix typos and complete comment

* Experiment: Simplified Graph-Copy-Based Stabilization (#4313)

# Experiment: Simplified Graph-Copy-Based Stabilization

**Simplified version of #4286, without stable memory buffering and without memory flipping on deserialization.**

Using graph copying instead of Candid-based serialization for stabilization, to save stable variables across upgrades. 

## Goals

* **Stop-gap solution until enhanced orthogonal persistence**: More scalable stabilization than the current Candid(ish) serialization.
* **With enhanced orthogonal persistence**: Upgrades in the presence of memory layout changes introduced by future compiler versions.

## Design

Graph copy of sub-graph of stable objects from main memory to stable memory and vice versa on upgrades.

## Properties
* Preserve sharing for all objects like in the heap.
* Allow the serialization format to be independent of the main memory layout.
* Limit the additional main memory needed during serialization and deserialization.
* Avoid deep call stack recursion (stack overflow).

## Memory Compatibility Check
Apply a memory compatibility check analogous to the enhanced orthogonal persistence, since the upgrade compatibility of the graph copy is not identical to the Candid subtype relation.

## Algorithm
Applying Cheney’s algorithm [1, 2] for both serialization and deserialization:

### Serialization
* Cheney’s algorithm using main memory as from-space and stable memory as to-space: 
* Focusing on stable variables as root (sub-graph of stable objects).
* The target pointers and Cheney’s forwarding pointers denote the (skewed) offsets in stable memory.
* Using streaming reads for the `scan`-pointer and streaming writes for the `free`-pointer in stable memory.

### Deserialization
* Cheney’s algorithm using stable memory as from-space and main memory as to-space: 
* Starting with the stable root created during the serialization process.
* Objects are allocated in main memory using the default allocator.
* Using random read/write access on the stable memory.
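
The following toy sketch shows the general shape of a Cheney-style copy of a reachable sub-graph, with a `scan` position chasing the end of the to-space. It is not the RTS implementation: objects are modelled as a hypothetical `Obj` struct in a `Vec`, pointers are plain indices rather than skewed offsets, and forwarding information is kept in a side `HashMap` instead of inside the from-space objects.

```rust
use std::collections::HashMap;

/// A toy heap object: a payload plus pointer fields (indices into its space).
#[derive(Debug, Clone, PartialEq)]
struct Obj {
    payload: u64,
    fields: Vec<usize>,
}

/// Copy `old` to the to-space unless it was already copied; return its new index.
fn evacuate(
    from_space: &[Obj],
    to_space: &mut Vec<Obj>,
    forward: &mut HashMap<usize, usize>,
    old: usize,
) -> usize {
    if let Some(&new) = forward.get(&old) {
        return new; // already copied: sharing is preserved
    }
    let new = to_space.len(); // the `free` position
    to_space.push(from_space[old].clone()); // fields still hold from-space indices
    forward.insert(old, new);
    new
}

/// Copy the sub-graph reachable from `root` into a fresh to-space.
/// The root ends up at index 0 of the result.
fn cheney_copy(from_space: &[Obj], root: usize) -> Vec<Obj> {
    let mut to_space = Vec::new();
    let mut forward = HashMap::new();
    evacuate(from_space, &mut to_space, &mut forward, root);

    // Scan copied objects in order, evacuating their targets and patching the fields.
    let mut scan = 0;
    while scan < to_space.len() {
        for i in 0..to_space[scan].fields.len() {
            let old = to_space[scan].fields[i];
            let new = evacuate(from_space, &mut to_space, &mut forward, old);
            to_space[scan].fields[i] = new;
        }
        scan += 1;
    }
    to_space
}
```

The loop uses no recursion, so the call stack stays shallow regardless of the depth of the object graph, and unreachable objects are never copied.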

## Stable Format
For a long-term perspective, the object layout of the serialized data in the stable memory is fixed and independent of the main memory layout.
* Pointers support 64-bit representations, even if only 32-bit pointers are used in current main memory address space.
* The Brooks forwarding pointer is omitted (used by the incremental GC).
* The pointers encode skewed stable memory offsets to the corresponding target objects.
* References to the null objects are encoded by a sentinel value.
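
As a rough illustration of the pointer encoding, the sketch below assumes that skewing subtracts one from the offset (modulo 2^64) and uses a made-up sentinel constant. Neither the constant nor the helper names are taken from the RTS.

```python
MASK64 = (1 << 64) - 1
# Hypothetical sentinel for this sketch; it cannot collide with the skewed
# form of an 8-byte-aligned offset, which always ends in binary 111.
NULL_SENTINEL = 0xFFFF_FFFF_FFFF_FFFB


def skew(offset):
    """Assumed skewing: offset - 1, wrapped to 64 bits."""
    return (offset - 1) & MASK64


def unskew(value):
    """Inverse of `skew`."""
    return (value + 1) & MASK64


def encode_ref(offset):
    """Encode a stable memory offset, or None for the null object."""
    return NULL_SENTINEL if offset is None else skew(offset)


def decode_ref(value):
    """Decode to an offset, or None when the sentinel is found."""
    return None if value == NULL_SENTINEL else unskew(value)
```

On deserialization, a decoded `None` would be mapped to the static null singleton of the new program version instead of allocating a new object.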

## Specific Aspects
* The null object is handled specifically to guarantee the singleton property. For this purpose, null references are encoded as sentinel values that are decoded back to the static singleton of the new program version.
* Field hashes in objects are serialized in a blob. On deserialization, the hash blob is allocated in the dynamic heap. Same-typed objects that have been created by the same program version share the same hash blob.
* Stable records can dynamically contain non-stable fields due to structural sub-typing. A dummy value can be serialized for such fields as a new program version can no longer access this field through the stable types.
* For backwards compatibility, old Candid destabilization is still supported when upgrading from a program that used an older compiler version.
* Incremental GC: Serialization needs to consider Brooks forwarding pointers (not to be confused with Cheney's forwarding information), while deserialization can deal with a partitioned heap that can have internal fragmentation (free space at partition ends).

## Complexity
Specific aspects that entail complexity:
* For each object type, not only serialization and deserialization need to be implemented but also the pointer scanning logic of its serialized and deserialized format. Since the deserialization also targets stable memory, the existing pointer visitor logic cannot be used for scanning pointers in its deserialized format.
* The deserialization requires scanning the heap, which is more complicated for the partitioned heap. The allocator must yield monotonically growing addresses during deserialization. Free space gaps are allowed to complete partitions.

## Open Aspects
* Unused fields in stable records that are no longer declared in a new program version should be removed. This could be done during garbage collection, when objects are moved/evacuated.
* The binary serialization and deserialization of `BigInt` entails dynamic allocations (cf. `mp_to_sbin` and `mp_from_sbin` of Tom's math library).

## Related PRs

* Motoko Enhanced Orthogonal Persistence: #4225
* Motoko Incremental Garbage Collector: #3837

## References

[1] C. J. Cheney. A Non-Recursive List Compacting Algorithm. Communications of the ACM, 13(11):677-8, November 1970.

[2] R. Jones and R. Lins. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley 2003. Algorithm 6.1: Cheney's algorithm, page 123.

* Bug fix: Allocations are not monotonically growing in partitioned heap for large objects

* Update benchmark results

* Update benchmark results

* Drop content of destabilized `Any`-typed actor field

* Refactor `is_primitive_type` in Candid parser and subtype check

* Do not use the cache for the main actor type compatibility check

* Update benchmark results

* Increase chunk size for stable memory clearing

* Custom bigint serialization

* Update benchmark results

* Update documentation

* Update documentation

* Optimize array deserialization

* Update benchmark results

* Code refactoring of upgrade version checks

* Remove redundant math functions

* Eliminate size redundancy in the `Object` header

* Also adjust the `Object` header in the compiler

* Revert "Also adjust the `Object` header in the compiler"

This reverts commit f75bb76.

* Revert "Eliminate size redundancy in the `Object` header"

This reverts commit 0fe3926.

* Record the upgrade instruction costs

* Update tests for new `Prim.rts_upgrade_instructions()` function

* Make test more ergonomic

* Incremental Graph-Copy-Based Upgrades (#4361)

# Incremental Graph-Copy-Based Upgrades

Refinement of #4286

Supporting arbitrarily large graph-copy-based upgrades beyond the instruction limit:
* Splitting the stabilization/destabilization into multiple asynchronous messages.
* Limiting the stabilization work units to fit the update or upgrade messages.
* Blocking other messages during the explicit incremental stabilization.
* Restricting the upgrade functionality to the canister owner and controllers.
* Stopping the GC during the explicit incremental upgrade process.
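
The splitting into bounded work units can be pictured with the following sketch. It uses a simplified cost model of one unit per copied item, and the class and function names are hypothetical. The real RTS measures instructions and stable memory accesses instead.

```python
from collections import deque


class IncrementalCopy:
    """Processes a queue of work items in bounded increments."""

    def __init__(self, items, increment_limit):
        self.pending = deque(items)
        self.done = []
        self.increment_limit = increment_limit

    def step(self):
        """Run one increment; return True once all work is completed."""
        used = 0
        while self.pending and used < self.increment_limit:
            self.done.append(self.pending.popleft())
            used += 1  # assumed cost model: one unit per item
        return not self.pending


def run_to_completion(copy):
    """Count how many messages (increments) are needed to finish."""
    messages = 1
    while not copy.step():
        messages += 1
    return messages
```

With 10 items and a limit of 4 per increment, three messages are needed; state is kept in the object between messages, which is why other messages have to stay blocked in the meantime.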

## Usage

For large upgrades:
1. Initiate the explicit stabilization before the upgrade:
    
```
dfx canister call CANISTER_ID __motoko_stabilize_before_upgrade "()"
```

* An assertion first checks that the caller is the canister owner or a canister controller.
* All other messages to the canister will be blocked until the upgrade has been successfully completed.
* The GC is stopped.
* If defined, the actor's pre-upgrade function is called before the explicit stabilization.
* The stabilization runs in possibly multiple asynchronous messages, each with a limited number of instructions.

2. Run the actual upgrade:

```
dfx deploy CANISTER_ID
```

* Run and complete the stabilization if not yet done in advance. 
* Perform the actual upgrade of the canister on the IC.
* Start the destabilization with a limited number of steps to fit into the upgrade message.
* If destabilization cannot be completed, the canister does not start the GC and does not accept messages except step 3.

3. Complete the explicit destabilization after the upgrade:

```
dfx canister call CANISTER_ID __motoko_destabilize_after_upgrade "()"
```

* An assertion checks that the caller is the canister owner or a canister controller.
* All other messages remain blocked until the successful completion of the destabilization.
* The destabilization runs in possibly multiple asynchronous messages, each with a limited number of instructions.
* If defined, the actor's post-upgrade function is called at the end of the explicit destabilization.
* The GC is restarted.

## Remarks

* Steps 1 (explicit stabilization) and/or 3 (explicit destabilization) may not be needed if the corresponding operation fits into the upgrade message.
* Stabilization and destabilization steps are limited to the increment limits:

    Operation | Message Type | IC Instruction Limit | **Increment Limit**
    ----------|--------------|----------------------|--------------------
    **Explicit (de)stabilization step** | Update | 20e9 | **16e9**
    **Actual upgrade** | Upgrade | 200e9 | **160e9**

* The stabilization code in the RTS has been restructured to be less monolithic.
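
The limits in the table can be checked with a little arithmetic. The sketch below only uses the four numbers from the table. The helper names are hypothetical, and the step estimate assumes that all work is done in explicit update messages, ignoring the share completed inside the upgrade message itself.

```python
IC_LIMIT = {"update": 20_000_000_000, "upgrade": 200_000_000_000}
INCREMENT_LIMIT = {"update": 16_000_000_000, "upgrade": 160_000_000_000}


def headroom_percent(message_type):
    """Share of the IC instruction limit left unused by one increment."""
    ic = IC_LIMIT[message_type]
    return 100 * (ic - INCREMENT_LIMIT[message_type]) // ic


def explicit_steps_needed(total_instructions):
    """Ceiling division by the per-update increment limit (rough estimate)."""
    limit = INCREMENT_LIMIT["update"]
    return -(-total_instructions // limit)
```

Both message types keep 20% of the IC limit in reserve, and a hypothetical stabilization costing 40e9 instructions would need three explicit steps under this estimate.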

* Manual merge conflict resolution (work in progress)

* Adjust tests, resolve some merge bugs

* Adjust RTS test case

* Make RTS tests run again

* Add missing function export

* Adjust imports, manual merge conflict resolution

* Manual merge conflict resolution

* Manual merge conflict resolution

* Adjust persistence initialization

* Adjust persistence version management

* Adjust stable memory metadata for enhanced orthogonal persistence

Distinguish enhanced orthogonal persistence from Candid legacy stabilization

* Add comment

* Adjust graph stabilization initialization

* Adjust GC mode during destabilization

* Adjust object visitor for graph destabilization

* Adjust incremental graph destabilization

* Adjust error message

* Adjust tests

* Adjust tests

* Update benchmark results

* Adjust test

* Upgrade stable memory version after graph destabilization

* Adjust memory sanity check

* Clear memory on graph destabilization as first step

* Adjust big int serialization for 64-bit

* Fix: Clear memory on graph destabilization

* Add test case for graph stabilization

* Add test case for incremental graph stabilization

* Add tests for graph stabilization

* Add more tests for graph stabilization

* Add more test cases for graph stabilization

* Add more test cases for graph stabilization

* More conservative persistence version check

* Adjust expected test results

* Adjust test

* Adjust tests

* Adjust tests

* Adjust RTS test for stabilization

* Adjust tests

* Adjust test results

* Remove unwanted binary files

* Adjust comment

* Code refactoring

* Fix merge mistake

* Manual merge conflict resolution

* Add test cases

* Manual merge conflict resolution

* Fix typo in documentation

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix typo in documentation

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Bug fix: Allow stabilization beyond compiler-specified stable memory limit

* Adjustment to RTS unit tests

* Add comments

* Code refactoring

* Fix difference between debug and release test execution

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Delete unused file

* Code refactoring

* Use correct trap for an unreachable case

* Remove dead code

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix typo in function identifier

* Fix indentation

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Removing unused code

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Fix RTS compile error

* Bug fix: Object size lookup during stabilization

* experiment: refactoring of ir extensions in graph-copy PR (#4543)

* refactoring of ir

* fix arrange_ir.ml

---------

Co-authored-by: luc-blaeser <luc.blaeser@dfinity.org>

* Manual merge conflict resolution

* Adjust test case, remove file check

* Manual merge conflict resolution

* Manual merge conflict resolution

* test graph copy of text and blob iterators (#4562)

* Optimize instruction limit checks

* Bug fix graph copy limit on destabilization

* Incremental stable memory clearing after graph copy

* Parameter tuning for graph copy

* Manual merge conflict resolution

* Manual merge conflict resolution

* Remove redundant code

* Manual merge conflict resolution: Remove `ObjInd` from graph-copy stabilization

* Manual merge conflict resolution

* Merge Preparation: Latest IC with Graph Copy (#4630)

* Adjust to new system API

* Port to latest IC 64-bit system API

* Update to new IC with Wasm64

* Updating nix hashes

* Update IC dependency (Wasm64 enabled)

* Update expected test results

* Fix migration test

* Use latest `drun`

* Adjust expected test results

* Updating nix hashes

* Update expected test results

* Fix `drun` nix build for Linux

* Disable DTS in `drun`, refactor `drun` patches

* Update expected test results for new `drun`

* Limiting amount of stable memory accessed per graph copy increment

* Reformat

* Adjust expected test result

---------

Co-authored-by: Nix hash updater <41898282+github-actions[bot]@users.noreply.github.com>

* Message-dependent stable memory access limit

* Graph copy: Fix accessed memory limit during stabilization

* Enhanced Orthogonal Persistence (Complete Integration) (#4488)

* Prepare two compilation targets

* Combined RTS Makefile

* Port classical compiler backend to combined solution

* Adjust nix config file

* Start combined RTS

* Reduce classical compiler backend changes

* Continue combined RTS

* Make RTS compilable for enhanced orthogonal persistence

* Make RTS tests runnable again for enhanced orthogonal persistence

* Adjust compiler backend of enhanced orthogonal persistence

* Unify Tom's math library binding

* Make classical non-incremental RTS compile again

* Make classical incremental GC version compilable again

* Make all RTS versions compile again

* Adjust memory sanity check for combined RTS modes

* Prepare RTS tests for combined modes

* Continue RTS test merge

* Continue RTS tests combined modes

* Continue RTS tests support for combined modes

* Adjust LEB128 encoding for combined mode

* Adjust RTS test for classical incremental GC

* Adjust RTS GC tests

* Different heap layouts in RTS tests

* Continue RTS GC test multi-mode support

* Make all RTS run again

* Adjust linker to support combined modes

* Adjust libc import in RTS for combined mode

* Adjust RTS test dependencies

* Bugfix in Makefile

* Adjust compiler backend import for combined mode

* Adjust RTS import for combined mode

* Adjust region management to combined modes

* Adjust classical compiler backend to fit combined modes

* Reorder object tags to match combined RTS

* Adjust test

* Adjust linker for multi memory during Wasi mode with regions

* Adjust tests

* Adjust bigint LEB encoding for combined modes

* Adjust bigint LEB128 encoding for combined modes

* Adjust test

* Adjust tests

* Adjust test

* Code refactoring: SLEB128 for BigInt

* Adjust tests

* Adjust test

* Reformat

* Adjust tests

* Adjust benchmark results

* Adjust RTS for unit tests

* Reintroduce compiler flags in classical mode

* Support classical incremental GC

* Add missing export for classical incremental GC

* Adjust tests

* Adjust test

* Adjust test

* Adjust test

* Adjust test

* Adjust test

* Adjust test

* Pass `keep_main_memory` upgrade option only for enhanced orthogonal persistence

* Adjust test

* Update nix hash

* Adjust Motoko base dependency

* Adjust tests

* Extend documentation

* Adjust test

* Update documentation

* Update documentation

* Manual merge conflict resolution

* Manual merge refinement

* Manual merge conflict resolution

* Manual merge conflict resolution

* Refactor migration test from classical to new persistence

* Adjust migration test

* Manual merge conflict resolution

* Manual merge conflict resolution

* Adjust compiler reference documentation

* Test CI build

* Test CI build

* Adjust performance comparison in CI build

* Manual merge conflict resolution

* Add test for migration paths

* Adjust test for integrated PR

* Adjust test case

* Manual merge conflict resolution

* Manual merge conflict resolution

* Manual merge conflict resolution

* Manual merge conflict resolution

* Code refactoring

* Fix typo in comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Manual merge conflict resolution

* Add static assertions, code formatting

* Manual merge conflict resolution

* Add test case

* Refine comment

Co-authored-by: Claudio Russo <claudio@dfinity.org>

* Manual merge conflict resolution

* Manual merge conflict resolution

* Code refactoring

* Manual merge conflict resolution

* Adjust test run script messages

* Manual merge conflict resolution

* Manual merge conflict resolution

* Manual merge conflict resolution

* Manual merge conflict resolution

* Merge Preparation: Dynamic Memory Capacity for Integrated EOP (#4586)

* Tune for unknown memory capacity in 64-bit

* Adjust benchmark results

* Fix debug assertion, code refactoring

* Manual merge conflict resolution

* Manual merge conflict resolution

* Code refactoring: Improve comments

* Reformat

* Fix debug assertion

* Re-enable memory reserve for upgrade and queries

See PR #4158

* Manual merge conflict resolution

* Manual merge conflict resolution

* Update benchmark results

* Manual merge conflict resolution

* Manual merge conflict resolution

* Merge Preparation: Latest IC with Integrated EOP  (#4638)

* Adjust to new system API

* Port to latest IC 64-bit system API

* Update to new IC with Wasm64

* Updating nix hashes

* Update IC dependency (Wasm64 enabled)

* Update expected test results

* Fix migration test

* Use latest `drun`

* Adjust expected test results

* Updating nix hashes

* Update expected test results

* Fix `drun` nix build for Linux

* Disable DTS in `drun`, refactor `drun` patches

* Update expected test results for new `drun`

* Limiting amount of stable memory accessed per graph copy increment

* Reformat

* Manual merge conflict resolution

* Manual merge conflict resolution

* Adjust expected test result

---------

Co-authored-by: Nix hash updater <41898282+github-actions[bot]@users.noreply.github.com>

* Manual merge conflict resolution

* Documentation Update for Enhanced Orthogonal Persistence (#4670)

---------

Co-authored-by: Claudio Russo <claudio@dfinity.org>
Co-authored-by: Nix hash updater <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: Claudio Russo <claudio@dfinity.org>
Co-authored-by: Nix hash updater <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: Nix hash updater <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claudio Russo <claudio@dfinity.org>