Cadence Compact Format (CCF) Specification RC and CCF Codec (fully self-describing mode) #2157

fxamacker · 2022-11-21T20:30:51Z

Problem

Currently, Cadence external values (such as events) are encoded using JSON-Cadence Data Interchange Format. The JSON-based format is a human-readable text encoding that is verbose, inefficient, and doesn't define deterministic encoding (canonical format).

Proposed Solution

Design and specify Cadence Compact Format (CCF) as a more compact, efficient, and deterministic alternative to the JSON-based format.

We can leverage an existing data format's data model to reduce complexity, cost, and risks to CCF specs and codecs.

Additionally, we can design and implement a CCF codec by using an existing codec (e.g. the same way COSE codec uses CBOR codec under the hood).

For more info about CCF requirements or design considerations, see

Scope

CCF Specifications only specifies Cadence Compact Format.

It is outside the scope of CCF Specifications to specify individual CCF-based formats or protocols (e.g. events).

Introduction

CCF is a data format that allows compact, efficient, and deterministic encoding of Cadence external values.

Cadence external values (e.g. events, transaction arguments, etc.) have been encoded using JSON-CDC, which is inefficient, verbose, and doesn't define deterministic encoding.

The same FeesDeducted event on the Flow blockchain can encode to:

298 bytes in JSON-CDC (minified).
118 bytes in CCF (fully self-describing mode).
20 bytes in CCF (partially self-describing mode).

CCF defines all requirements for deterministic encoding (sort orders, smallest encoded forms, and Cadence-specific requirements) to allow CCF codecs implemented in different programming languages to produce the same deterministic encodings.

Some requirements (such as "Deterministic CCF Encoding Requirements") are defined as optional. Each CCF-based format or protocol can have its specification state how CCF options are used. This allows each protocol to balance tradeoffs such as compatibility, determinism, speed, encoded data size, etc.

CCF uses CBOR and is designed to allow efficient detection and rejection of malformed messages without creating Cadence objects. This allows more costly checks for validity, etc. to be performed only on well-formed messages.

CBOR is an Internet Standard defined by IETF STD 94. CBOR is designed to be relevant for decades and is used by data formats and protocols such as W3C WebAuthn, C-DNS (IETF RFC 8618), COSE (IETF STD 96), CWT (IETF RFC 8392), etc.

Preliminary Benchmark Comparisons (obsolete)

We are not comparing apples to apples. Prior formats (CBF and JSON-Cadence Data Interchange) didn't specify requirements for validity, sorting, etc.

CCF encoder sorts events data for deterministic encoding.
CCF decoder verifies that events data are well-formed and sorted.

At this time, CCF decoder doesn't include the option to check for "Preferred Serialization" (encoding to smallest size).

These informal and preliminary benchmarks used CCF in fully self-describing mode (see size comparisons above).

$ benchstat bench_json_events_48k.log bench_ccf_events_48k.log 
goos: linux
goarch: amd64
pkg: github.com/onflow/cadence/encoding/ccf
cpu: 13th Gen Intel(R) Core(TM) i5-13600K
                     │ bench_json_events_48k.log │      bench_ccf_events_48k.log       │
                     │          sec/op           │   sec/op     vs base                │
EncodeBatchEvents-20                 96.61m ± 4%   70.73m ± 3%  -26.79% (p=0.000 n=10)
DecodeBatchEvents-20                 647.7m ± 3%   157.5m ± 3%  -75.68% (p=0.000 n=10)
geomean                              250.1m        105.5m       -57.81%

                     │ bench_json_events_48k.log │       bench_ccf_events_48k.log       │
                     │           B/op            │     B/op      vs base                │
EncodeBatchEvents-20                32.45Mi ± 0%   25.82Mi ± 0%  -20.45% (p=0.000 n=10)
DecodeBatchEvents-20               234.97Mi ± 0%   56.16Mi ± 0%  -76.10% (p=0.000 n=10)
geomean                             87.32Mi        38.08Mi       -56.39%

                     │ bench_json_events_48k.log │      bench_ccf_events_48k.log       │
                     │         allocs/op         │  allocs/op   vs base                │
EncodeBatchEvents-20                 756.6k ± 0%   370.4k ± 0%  -51.05% (p=0.000 n=10)
DecodeBatchEvents-20                 4.746M ± 0%   1.288M ± 0%  -72.86% (p=0.000 n=10)
geomean                              1.895M        690.7k       -63.55%

NOTE: These benchmarks used JSON-CDC (298 byte msg) and
CCF in fully self-describing mode (118 byte msg),
instead of CCF in partially self-describing mode (20 byte msg).

TODO

Next steps, such as integration and use of CCF for events, transaction arguments, etc. are outside the scope of this epic.

Edits

2022-11-22 - added Preliminary Size and Benchmark Comparisons
2022-11-23 - update TODO to mention first CCF specs meeting and mark it as done. Also add link to flow-go 3593 and fxamacker/ccf_draft/issues.
2023-03-03 - update TODO and Preliminary Size and Benchmark Comparisons using commit f400c35 in PR Add CCF codec implementing CCF Specification (RC1) #2364.
2023-04-18 - update TODO.
2023-04-27 - add "Introduction" (copied from https://github.com/fxamacker/ccf_draft), remove old size comparisons, mark "Preliminary Benchmark Comparison" as obsolete, and add PR 2467 as completed item.
2023-08-10 - sync with latest Introduction from onflow/ccf (CCF Specs RC2) and update TODO list. For context on timeline, this project was paused in April 2023 by higher priority project, and then resumed on May 26 to implement and deploy CCF events encoding with June 7 testnet deadline, and then paused again.
2023-08-21 - closing this epic and will open issues for remaining todos

The text was updated successfully, but these errors were encountered:

bluesign · 2022-11-22T09:29:59Z

is this related to previous work done in this area? or is it started from scratch?

PS: notion doc is not accessible ( initial requirements )

fxamacker · 2022-11-22T15:25:55Z

EDIT: clarify answer to remove ambiguity, renamed CCF "design" section to "objectives" and replace wall of text with CCF Abstract

Hi @bluesign

is this related to previous work done in this area? or is it started from scratch?

The requirements/objectives are related to previous work but design & implementation is from scratch:

For requirements/objectives, this leverages previous work done in this area. The "Objectives" section of draft CCF specs is mostly from the same requirements doc by @ramtinms, plus some extra items like detection of malformed data, etc.
For CCF design, specs, and codec this is a complete redesign and implementation started from scratch.

Some reasons for CCF replacing previous work (the proposed Cadence Binary Format) are mentioned in onflow/flow-go#3448. E.g., it didn't have all required features (type and data were not separate, so redundant Cadence type info couldn't be eliminated from messages) and it didn't leverage existing data formats.

CCF will close all 15 open issues in onflow/cadence related to Cadence Binary Format (CBF).

BTW, I just updated text of this epic to add Preliminary Size and Benchmark Comparisons section. CCF's preliminary encoding size and benchmark comparisons are at onflow/flow-go#3593.

notion doc is not accessible ( initial requirements )

The initial requirements in notion is captured by the Objectives section of CCF Specifications, which is a superset of initial requirements doc. For example, it adds early detection of malformed data (so CCF codecs don't need to create Cadence objects when attempting to decode malformed data).

See Draft CCF Specification for more details (superset of initial requirements from notion is listed in the Objectives section). For convenience, here's the abstract:

Abstract

Cadence Compact Format (CCF) is a data format designed for compact, efficient, and deterministic encoding of Cadence external values.

Cadence is a resource-oriented programming language that introduces new features to smart contract programming. It's used by Flow blockchain and has a syntax inspired by Swift, Kotlin, and Rust. Its use of resource types maps well to the Move language.

CCF can be used as a hybrid data format. CCF-based messages can be fully self-describing or partially self-describing. Both are more compact than JSON-based messages. CCF-based protocols can send Cadence metadata just once for all messages of that type. Malformed data can be detected without Cadence metadata and without creating Cadence objects.

bluesign · 2022-11-25T07:00:13Z

thanks @fxamacker, I read in detail, it is really well designed and solid.

fxamacker · 2023-03-02T01:11:19Z

Thanks @bluesign for taking a look at the draft specs. I updated my reply (it's still long but less so). 😄

More detailed requirements were added to specs and PR #2364 has the CCF codec implementing CCF Specification (RC1). PR #2364 has more up-to-date info, including size & benchmark comparisons using 48,000+ events from a mainnet transaction.

bluesign · 2023-03-02T13:41:04Z

thanks @fxamacker I just checked the benchmarks, but I feel CCF should be a lot faster. Plugging something like go-json etc, would easily make them similar encoding performances. ( which in those events probably 30-40% time is spent on encoding UFix64 to String for json )

fxamacker · 2023-03-02T17:13:41Z

I just checked the benchmarks, but I feel CCF should be a lot faster. Plugging something like go-json etc, would easily make them similar encoding performances.

Hi @bluesign,

Yes, I also think CCF codec can be faster (it will be 😄). There are tradeoffs to get deterministic encoding, compact size, etc. For the first implementation, I'm focused on correctness because CCF is already faster, uses less memory, and produces smaller encoded size than JSON.

There are possible optimizations to increase encoding performance. I want to check use cases with Cadence and FVM team before implementing them. Some optimizations may be at CCF level, some at CCF-based protocol level. We'll see.

Apples to Apples Speed Comparisons

Speed comparisons need to account for extra requirements such as deterministic encoding, extra data validation, and smaller encoded data size. JSON-Cadence Data Interchange (JSON-CDC) doesn't sort and isn't deterministic.

CCF decoder verifies event data is sorted, etc.
CCF encoder sorts data to encode event data deterministically.
CCF encoder avoids encoding redundant Cadence type information (e.g. not repeatedly encoding same Cadence type for every element of array).

As noted earlier, complying with extra requirements such as determinism, sorting, compact size, etc. requires extra processing and memory but CCF is still faster, uses less memory, and produces smaller encoded size.

After fuzzing is in place, CCF codec and CCF-based protocols can be optimized for even faster speed if needed. 😄

bluesign · 2023-03-03T07:47:38Z

yeah I meant in a good way that there is a big buffer for optimisations. :)

JSON-Cadence Data Interchange (JSON-CDC) doesn't sort and isn't deterministic.

how this can be possible? I don't believe JSON-CDC (encoder) can be non-deterministic, can you explain a bit on that?

CCF decoder verifies event data is sorted, etc.
CCF encoder avoids encoding redundant Cadence type information (e.g. not repeatedly encoding same Cadence type for every element of array).

this shouldn't be a performance bottleneck, as it is not sorting. second one is benefit to performance.

CCF encoder sorts data to encode event data deterministically.

sort here also confused me too ( but probably related to first question )

Btw: Btw I really love CCF, my comments usually sound a bit grumpy or criticising as English is my second language, I am just trying to understand it deeper.

fxamacker · 2023-03-06T16:16:49Z

Hi @bluesign

my comments usually sound a bit grumpy or criticising as English is my second language

Your English is excellent and didn't sound negative at all! 👍

this shouldn't be a performance bottleneck, as it is not sorting. second one is benefit to performance.

Yes, although this isn't a bottleneck for decoder, the encoder does extra work to avoid encoding redundant Cadence type information and it is a tradeoff for encoding less data.

how this can be possible? I don't believe JSON-CDC (encoder) can be non-deterministic, can you explain a bit on that?

I agree with @turbolent's comment last year in issue #2165, that "The JSON encoding is not deterministic".

The JSON-CDC Specifications doesn't mention "sort". That is an indication that it doesn't fully specify how to encode deterministically (e.g. how encoders must sort dictionaries).

JSON-CDC encoder doesn't sort dictionary elements or composite fields. For ordering these, the encoder relies on implementation detail of Cadence (outside of the encoder).

For these reasons, we cannot claim JSON-CDC encoding is deterministic.

fxamacker · 2023-08-21T13:40:13Z

Closing this.

CCF Specs RC2 was imported from https://github.com/fxamacker/ccf_draft -> https://github.com/onflow/ccf and then cleaned up a bit. I would like to make it more concise, etc. but didn't have time this weekend.
CCF Codec currently deployed to mainnet matches CCF Specs RC2. Currently, CCF Codec only supports fully self-describing mode.

Future changes to codec should be fuzzed before being merged.

fxamacker added Epic Performance labels Nov 21, 2022

fxamacker self-assigned this Nov 21, 2022

This was referenced Nov 23, 2022

Evaluate using RFC 8949 (CBOR) to design and define Cadence Compact Format (CCF) to replace JSON and Cadence Binary Format (CBF) onflow/flow-go#3448

Closed

[CCF] Design and publish first draft of Cadence Compact Format (CCF) onflow/flow-go#3593

Closed

turbolent mentioned this issue Nov 30, 2022

Standard on-chain encoding scheme for Cadence values #2165

Open

fxamacker mentioned this issue Dec 18, 2022

Add static type for Cadence external PathLink value #2167

Closed

j1010001 mentioned this issue Jan 31, 2023

Event encoding with CCF #2283

Closed

This was referenced Mar 1, 2023

Replace DRAFT status with RC1 fxamacker/ccf_draft#61

Merged

Add CCF codec implementing CCF Specification (RC1) #2364

Merged

[stream-mode branch] Add StreamEncoder.Close() to recycle buffer fxamacker/cbor#392

Merged

j1010001 added the E&V Team Execution / Verification / Edge Team label Mar 22, 2023

fxamacker mentioned this issue Apr 4, 2023

Update CCF codec to support type parameters for function types #2410

Closed

This was referenced Apr 27, 2023

Support function-purity annotations in CCF #2457

Closed

CCF codec: add API to override default CBOR limits #2449

Closed

fxamacker mentioned this issue May 16, 2023

Update Security Considerations and other text fxamacker/ccf_draft#89

Merged

This was referenced May 30, 2023

Update CCF codec to support AnyStructAttachmentType, AnyResourceAttachementType #2515

Closed

Support 2 new attachment types in CCF codec #2516

Merged

Support cyclic reference value in CCF codec #2521

Merged

fxamacker mentioned this issue Jun 10, 2023

Add support for CCF encoding/decoding modes and options #2559

Merged

8 tasks

fxamacker mentioned this issue Jul 10, 2023

[Exec] Use CCF in self-describing mode to encode events (replaces JSON-CDC) onflow/flow-go#4417

Merged

2 tasks

fxamacker mentioned this issue Aug 4, 2023

Update CCF specs and tag RC2 before it gets hosted at onflow/ccf repo fxamacker/ccf_draft#92

Closed

9 tasks

fxamacker changed the title ~~Cadence Compact Format (CCF) Specification and CCF Codec~~ Cadence Compact Format (CCF) Specification RC and CCF Codec (fully self-describing mode) Aug 21, 2023

fxamacker closed this as completed Aug 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cadence Compact Format (CCF) Specification RC and CCF Codec (fully self-describing mode) #2157

Cadence Compact Format (CCF) Specification RC and CCF Codec (fully self-describing mode) #2157

fxamacker commented Nov 21, 2022 •

edited

Loading

bluesign commented Nov 22, 2022 •

edited

Loading

fxamacker commented Nov 22, 2022 •

edited

Loading

bluesign commented Nov 25, 2022

fxamacker commented Mar 2, 2023

bluesign commented Mar 2, 2023

fxamacker commented Mar 2, 2023 •

edited

Loading

bluesign commented Mar 3, 2023

fxamacker commented Mar 6, 2023

fxamacker commented Aug 21, 2023

Cadence Compact Format (CCF) Specification RC and CCF Codec (fully self-describing mode) #2157

Cadence Compact Format (CCF) Specification RC and CCF Codec (fully self-describing mode) #2157

Comments

fxamacker commented Nov 21, 2022 • edited Loading

Problem

Proposed Solution

Scope

Introduction

Preliminary Benchmark Comparisons (obsolete)

TODO

Edits

bluesign commented Nov 22, 2022 • edited Loading

fxamacker commented Nov 22, 2022 • edited Loading

Abstract

bluesign commented Nov 25, 2022

fxamacker commented Mar 2, 2023

bluesign commented Mar 2, 2023

fxamacker commented Mar 2, 2023 • edited Loading

Apples to Apples Speed Comparisons

bluesign commented Mar 3, 2023

fxamacker commented Mar 6, 2023

fxamacker commented Aug 21, 2023

fxamacker commented Nov 21, 2022 •

edited

Loading

bluesign commented Nov 22, 2022 •

edited

Loading

fxamacker commented Nov 22, 2022 •

edited

Loading

fxamacker commented Mar 2, 2023 •

edited

Loading