CIP-0058? | New CIP for bitwise primitives #268

Closed

goverthrow
Contributor

Abstract

Add primitives for bitwise operations, based on BuiltinByteString, without requiring new data types.

Motivation

Bitwise operations are one of the most fundamental building blocks of algorithms
and data structures. They can be used for a wide variety of applications,
ranging from representing and manipulating sets of integers efficiently, to
implementations of cryptographic primitives, to fast searches. Their wide
availability, law-abiding behaviour and efficiency are the key reasons why they
are widely used, and widely depended on.

At present, Plutus lacks meaningful support for bitwise operations, which
significantly limits what can be usefully done on-chain. While it is possible to
mimic some of these capabilities with what currently exists, and it is always
possible to introduce new primitives for any task, this is extremely
unsustainable, and often leads to significant inefficiencies and duplication of
effort.

We describe a list of bitwise operations, as well as their intended semantics,
designed to address this problem.

Example applications

We provide a range of applications that could be useful or beneficial on-chain,
but are difficult or impossible to implement without some, or all, of the
primitives we propose.

Succinct data structures

Due to the on-chain size limit, many data structures become impractical or
impossible, as they require too much space either for their elements, or their
overheads, to allow them to fit alongside the operations we want to perform on
them. Succinct data structures could serve as a solution to this, as they
represent data in an amount of space much closer to the entropy limit and ensure
only constant overheads. There are several examples of these, and all rely on
bitwise operations for their implementations.

For example, consider wanting to store a set of BuiltinIntegers
on-chain. Given current on-chain primitives, the most viable option involves
some variant on a BuiltinList of BuiltinIntegers; however,
this is unviable in practice unless the set is small. To see why, suppose that
we have an upper limit of $k$ on the BuiltinIntegers we want to store;
this is realistic in practically all cases. To store $n$
BuiltinIntegers under the above scheme requires

$$n \cdot \left( \left\lceil \frac{\log_2(k)}{64} \right\rceil \cdot 64 + c\right) $$

bits, where $c$ denotes the constant overhead for each cons cell of
the BuiltinList holding the data. If the set being represented is dense
(meaning that the number of entries is a sizeable fraction of $k$), this cost
becomes intolerable quickly, especially when taking into account the need to
also store the operations manipulating such a structure on-chain with the script
where the set is being used.
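As a rough worked instance of the formula above (the values $k = 1024$ and $n = 512$ are chosen purely for illustration, and $c$ is left symbolic):

$$512 \cdot \left( \left\lceil \frac{\log_2(1024)}{64} \right\rceil \cdot 64 + c \right) = 512 \cdot (64 + c) = 32768 + 512c $$

bits, which is already four kilobytes before any cons-cell overhead is counted.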

If we instead represented the same set as a bitmap based on
BuiltinByteString, the amount of space required would instead be

$$\left\lceil \frac{k}{8} \right\rceil \cdot 8 + \left\lceil \frac{\log_2(k)}{64} \right\rceil \cdot 64 $$

bits. This is significantly better unless $n$ is small. Furthermore,
this representation would likely be more efficient in terms of time in practice,
as instead of having to crawl through a cons-like structure, we can implement
set operations on a memory-contiguous byte string:

  • The cardinality of the set can be computed as a population count. This
    can have terrifyingly efficient implementations: the Muła-Kurz-Lemire
    algorithm (the current state of the art) can process four kilobytes per loop
    iteration, which amounts to over four thousand potential stored integers.
  • Insertion or removal is a bit set or bit clear respectively.
  • Finding the smallest element is a find-first-one.
  • Testing for membership is a check to see if the bit is set.
  • Set intersection is bitwise and.
  • Set union is bitwise inclusive or.
  • Set symmetric difference is bitwise exclusive or.

A potential implementation could use a range of techniques to make these
operations extremely efficient, by relying on SWAR (SIMD-within-a-register)
techniques if portability is desired, and SIMD instructions for maximum speed.
This would allow both potentially large integer sets to be represented on-chain
without breaking the size limit, and nodes to efficiently compute with such,
reducing the usage of resources by the chain. Lastly, in practice, if
compression techniques are used (which also rely on bitwise operations!), the
number of required bits can be reduced considerably in most cases without
compromising performance: the current state-of-the-art (Roaring Bitmaps) can be
used as an example of the possible gains.

In order to make such techniques viable, bitwise primitives are mandatory.
Furthermore, succinct data structures are not limited to sets of integers, but
all require bitwise operations to be implementable.

Binary representations and encodings

On-chain, space is at a premium. One way that space can be saved is with binary
representations, which can potentially represent something much closer to the
entropy limit, especially if the structure or value being represented has
significant redundant structure. While some possibilities for a more efficient
packing already exist in the form of BuiltinData, it is rather
idiosyncratic to the needs of Plutus, and its decoding is potentially quite
costly.

Bitwise primitives would allow more compact binary encodings to be defined,
where complex structures or values are represented using fixed-size
BuiltinByteStrings. The encoders and decoders for these could also be
implemented more efficiently than currently possible, as there exist numerous
bitwise techniques for this.

Goals

To ensure a focused and meaningful proposal, we specify our goals below.

Useful primitives

The primitives provided should enable implementations of algorithms and data
structures that are currently impossible or impractical. Furthermore, the
primitives provided should have a high power-to-weight ratio: having them should
enable as much as possible to be implemented.

Maintaining as many algebraic laws as possible

Bitwise operations, via Boolean algebras, have a long and storied history of
algebraic laws, dating back to important results by the likes of De Morgan, Post
and many others. These algebraic laws are useful for a range of reasons: they
guide implementations, enable easier testing (especially property testing) and
in some cases much more efficient implementations. To some extent, they also
formalize our intuition about how these operations should work. Thus,
maintaining as many of these laws in our implementation, and being clear about
them, is important.

Allowing efficient, portable implementations

Providing primitives alone is not enough: they should also be efficient. This is
not least of all because many would associate primitive operations with a
notion of being close to the machine, and therefore fast. Thus, it is on us to
ensure that the primitives we provide can be implemented efficiently across a
range of hardware.

Clear indication of failure

While totality is desirable, in some cases, there isn't a sensible answer for us
to give. A good example is a division-by-zero: if we are asked to do such a
thing, the only choice we have is to reject it. However, we need to make it as
easy as possible for someone to realize why their program is failing, by
emitting a sensible message which can later be inspected.

Non-goals

We also specify some specific non-goals of this proposal.

No metaphor-mixing between numbers and bits

A widespread legacy of C is the mixing of treatment of numbers and blobs of
bits: specifically, the allowing of logical operations on representations of
numbers. This applies to Haskell as much as any other language: according to the
Haskell Report, it is in fact required that any type implementing
Bits implement Num first. While GHC Haskell only mandates
Eq, it still defines Bits instances for types clearly meant to
represent numbers. This is a bad choice, as it creates complex situations and
partiality in several cases, for arguably no real gain other than C-like bit
twiddling code.

Even if two types share a representation, their type distinctness is meant to be
a semantic or abstraction boundary: just because a number is represented as a
blob of bits does not necessarily mean that arbitrary bit manipulations are
sensible. However, by defining such a capability, we create several semantic
problems:

  • Some operations end up needing multiple definitions to take this into
    account. Shifts are a good example: instead of simply having left or right
    shifts, we now have to distinguish arithmetic versus logical
    shifts, simply to take into account that a shift can be used on something
    which is meant to be a number, which could be signed. This creates
    unnecessary complexity and duplication of operations.
  • As Plutus BuiltinIntegers are of arbitrary precision, certain
    bitwise operations are not well-defined on them. A good example is bitwise
    complement: the bitwise complement of $0$ cannot be defined sensibly, and in
    fact, is partial in its Bits instance.
  • Certain bitwise operations on BuiltinInteger would have quite
    undesirable semantic changes in order to be implementable. Bitwise rotations
    are a good example: we should be able to decompose a rotation left or
    right by $n$ into two rotations (by $m_1$ and $m_2$ such that $m_1 + m_2 = n$)
    without changing the outcome. However, because trailing zeroes are not
    tracked by the implementation, this can fail depending on the choice of
    decomposition, which seems needlessly annoying for no good reason.
  • Certain bitwise operations on BuiltinInteger would require
    additional arguments and padding to define them sensibly. Consider bitwise
    logical AND: in order to perform this sensibly on BuiltinIntegers
    we would need to specify what length we assume they have, and some policy
    of padding when the length requested is longer than one, or both,
    arguments. This feels unnecessary, and it isn't even clear exactly how we
    should do this: for example, how would negative numbers be padded?

These complexities, and many more besides, are poor choices, owing more to the
legacy of C than any real useful functionality. Furthermore, they feel like a
casual and senseless undermining of type safety and its guarantees for very
small and questionable gains. Therefore, defining bitwise operations on
BuiltinInteger is not something we wish to support.

There are legitimate cases where a conversion from BuiltinInteger to
BuiltinByteString is desirable; this conversion should be provided, and
be both explicit and specified in a way that is independent of the machine or
the implementation of BuiltinInteger, as well as total and
round-tripping. Arguably, it is also desirable to provide built-in support for
BuiltinByteString literals specified in a way convenient to their
treatment as blobs of bytes (for example, hexadecimal or binary notation), but
this is outside the scope of this proposal.

Specification

Proposed operations

We propose several classes of operations. Firstly, we propose two operations for
inter-conversion between BuiltinByteString and BuiltinInteger:

integerToByteString :: BuiltinInteger -> BuiltinByteString

Convert a number to a bitwise representation.


byteStringToInteger :: BuiltinByteString -> BuiltinInteger

Reinterpret a bitwise representation as a number.


We also propose several logical operations on BuiltinByteStrings:

andByteString :: BuiltinByteString -> BuiltinByteString -> BuiltinByteString

Perform a bitwise logical AND on arguments of the same
length, producing a result of the same length, erroring otherwise.


iorByteString :: BuiltinByteString -> BuiltinByteString -> BuiltinByteString

Perform a bitwise logical IOR on arguments of the same
length, producing a result of the same length, erroring otherwise.


xorByteString :: BuiltinByteString -> BuiltinByteString -> BuiltinByteString

Perform a bitwise logical XOR on arguments of the same
length, producing a result of the same length, erroring otherwise.


complementByteString :: BuiltinByteString -> BuiltinByteString

Complement all the bits in the argument, producing a
result of the same length.


Lastly, we define the following additional operations:

shiftByteString :: BuiltinByteString -> BuiltinInteger -> BuiltinByteString

Performs a bitwise shift of the first argument by the
absolute value of the second argument, with padding, the direction being
indicated by the sign of the second argument.


rotateByteString :: BuiltinByteString -> BuiltinInteger -> BuiltinByteString

Performs a bitwise rotation of the first argument by
the absolute value of the second argument, the direction being indicated by
the sign of the second argument.


popCountByteString :: BuiltinByteString -> BuiltinInteger

Returns the number of $1$ bits in the argument.


testBitByteString :: BuiltinByteString -> BuiltinInteger -> BuiltinBool

If the position given by the second argument is not in
bounds for the first argument, error; otherwise, if the bit given by that
position is $1$, return True, and False otherwise.


writeBitByteString :: BuiltinByteString -> BuiltinInteger -> BuiltinBool -> BuiltinByteString

If the position given by the second
argument is not in bounds for the first argument, error; otherwise, set the
bit given by that position to $1$ if the third argument is True,
and $0$ otherwise.


findFirstSetByteString :: BuiltinByteString -> BuiltinInteger

Return the lowest index such that testBitByteString with the first
argument and that index would be True. If no such index exists,
return $-1$ instead.

Semantics

Preliminaries

We define $\mathbb{N}^{+} = \{ x \in \mathbb{N} \mid x \neq 0 \}$. We assume
that BuiltinInteger is a faithful representation of $\mathbb{Z}$. A
bit sequence $s = s_n s_{n-1} \ldots s_0$ is a sequence such that for
all $i \in \{0,1,\ldots,n\}$, $s_i \in \{0, 1\}$. A bit sequence $s = s_n s_{n-1} \ldots s_0$ is a byte sequence if $n = 8k - 1$ for some $k \in \mathbb{N}$. We denote the empty bit sequence (and, indeed, byte sequence
as well) by $\emptyset$.

We intend that BuiltinByteStrings represent byte sequences, with the
sequence of bits being exactly as the description above. For example, given the
byte sequence 0110111100001100, the BuiltinByteString
corresponding to it would be o\f.

Let $i \in \mathbb{N}^{+}$. We define the sequence $\mathtt{binary}(i) = (d_0, m_0), (d_1, m_1), \ldots$ as

  • $m_0 = i \bmod 2$, and $d_0 = \frac{i}{2}$ if $i$ is even, and $\frac{i - 1}{2}$ if it is odd.
  • $m_j = d_{j - 1} \bmod 2$, and $d_j = \frac{d_{j-1}}{2}$ if $d_{j-1}$ is even,
    and $\frac{d_{j-1} - 1}{2}$ if it is odd.
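The sequence defined above is repeated division by two, keeping each quotient and remainder. A minimal Haskell sketch follows, where `binary` mirrors the definition and `lowBits` is a hypothetical helper added for illustration:

```haskell
-- binary i yields (d_0, m_0), (d_1, m_1), ... as an infinite lazy list.
-- For non-negative d, d `divMod` 2 gives the rounded-down half and the
-- remainder, matching the even/odd case split in the definition.
binary :: Integer -> [(Integer, Integer)]
binary i = drop 1 (iterate (\(d, _) -> d `divMod` 2) (i, 0))

-- The first n remainders m_0 .. m_(n-1): the bits from index 0 upwards.
lowBits :: Int -> Integer -> [Integer]
lowBits n = map snd . take n . binary
```

For instance, `take 4 (binary 6)` evaluates to `[(3,0),(1,1),(0,1),(0,0)]`, so the remainders read from index 0 upwards are 0, 1, 1, which is 6 in binary.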

Representation of BuiltinInteger as BuiltinByteString and conversions

We describe the translation of BuiltinInteger into
BuiltinByteString which is implemented as the
integerToByteString primitive. Informally, we represent
BuiltinIntegers with the least significant bit at bit position $0$,
using a two's-complement representation. More precisely, let $i \in \mathbb{N}^{+}$. We represent $i$ as the bit sequence $s = s_n s_{n-1} \ldots s_0$, such that:

  • $\sum_{j \in \{0, 1, \ldots, n\}} s_j \cdot 2^j = i$;
  • $s_n = 0$;
  • Let $\mathtt{binary}(i) = (d_0, m_0), (d_1, m_1), \ldots$. For any $j \in \{0, 1, \ldots, n - 1\}$, $s_j = m_j$; and
  • $n + 1 = 8k$ for the smallest $k \in \mathbb{N}^{+}$ consistent with the previous requirements.

For $0$, we represent it as the sequence 00000000 (one zero byte). We
represent any $i \in \{ x \in \mathbb{Z} \mid x < 0 \}$ as the two's-complement
of the representation of its additive inverse. We observe that any such sequence
is by definition a byte sequence.

To interpret a byte sequence $s = s_n s_{n - 1} \ldots s_0$ as a
BuiltinInteger, we use the following process:

  • If $s$ is 00000000, then the result is $0$.
  • Otherwise, if $s_n = 1$, let $s^{\prime}$ be the two's-complement of $s$. Then the result is the additive inverse of the result of interpreting $s^{\prime}$.
  • Otherwise, the result is $\sum_{i \in \{0, 1, \ldots, n\}} s_i \cdot 2^i$.

The above interpretation is implemented as the byteStringToInteger
primitive. We observe that byteStringToInteger and
integerToByteString form an isomorphism. More specifically:

byteStringToInteger . integerToByteString = 
integerToByteString . byteStringToInteger = 
id
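The two conversions can be sketched as follows. This is a model under stated assumptions rather than the proposed implementation: `[Word8]` stands in for `BuiltinByteString` with the most significant byte first, the names `integerToBytes` and `bytesToInteger` are hypothetical, the decoder uses the usual two's-complement reading (unsigned value minus $2^{8 \cdot \text{length}}$ when the top bit is set), and the empty input is arbitrarily decoded as $0$.

```haskell
import Data.Bits (complement, testBit)
import Data.Word (Word8)

-- Big-endian bytes of a non-negative Integer, without leading zero bytes.
unsignedBytes :: Integer -> [Word8]
unsignedBytes = go []
  where
    go acc 0 = acc
    go acc n = go (fromIntegral (n `mod` 256) : acc) (n `div` 256)

-- Smallest whole-byte encoding whose top bit is 0; zero is one zero byte.
encodeNonNeg :: Integer -> [Word8]
encodeNonNeg n = case unsignedBytes n of
  [] -> [0]
  bs@(b : _)
    | testBit b 7 -> 0 : bs
    | otherwise -> bs

-- Complement every bit, then add one, discarding any carry out of the top.
twosComplement :: [Word8] -> [Word8]
twosComplement = reverse . addOne . reverse . map complement
  where
    addOne [] = []
    addOne (x : xs)
      | x == 0xFF = 0 : addOne xs
      | otherwise = (x + 1) : xs

integerToBytes :: Integer -> [Word8]
integerToBytes i
  | i >= 0 = encodeNonNeg i
  | otherwise = twosComplement (encodeNonNeg (negate i))

bytesToInteger :: [Word8] -> Integer
bytesToInteger [] = 0 -- assumption: the text does not cover the empty input
bytesToInteger bs@(b : _)
  | testBit b 7 = unsigned - 2 ^ (8 * length bs)
  | otherwise = unsigned
  where
    unsigned = foldl (\acc x -> acc * 256 + fromIntegral x) 0 bs
```

Under this reading, $128$ needs two bytes (`[0x00, 0x80]`) because the top bit must be $0$, and $-128$ is encoded as `[0xFF, 0x80]`, the two's-complement of that two-byte sequence.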

Bitwise logical operations on BuiltinByteString

Throughout, let $s = s_n s_{n-1} \ldots s_0$ and $t = t_m t_{m - 1} \ldots t_0$ be two byte sequences. Whenever we
specify a mismatched length error result, its error message must contain
at least the following information:

  • The name of the failed operation;
  • The reason (mismatched lengths); and
  • The lengths of the arguments.

We describe the semantics of andByteString. For inputs $s$ and $t$, if
$n \neq m$, the result is a mismatched length error. Otherwise, the result is
the byte sequence $u = u_n u_{n - 1} \ldots u_0$ such that for all $i \in \{0, 1, \ldots, n\}$ we have

$$u_i = \begin{cases} 1 & s_i = t_i = 1 \\ 0 & \text{otherwise} \end{cases} $$

For iorByteString, for inputs $s$ and $t$, if $n \neq m$, the result is
a mismatched length error. Otherwise, the result is the byte sequence $u = u_n u_{n - 1} \ldots u_0$ such that for all $i \in \{0, 1, \ldots, n\}$ we have

$$u_i = \begin{cases} 1 & s_i = 1 \\ 1 & t_i = 1 \\ 0 & \text{otherwise} \end{cases} $$

For xorByteString, for inputs $s$ and $t$, if $n \neq m$, the result is
a mismatched length error. Otherwise, the result is the byte sequence $u = u_n u_{n-1} \ldots u_0$ such that for all $i \in \{0, 1, \ldots, n\}$ we have

$$u_i = \begin{cases} 0 & s_i = t_i \\ 1 & \text{otherwise} \end{cases} $$

We observe that, for length-matched arguments, each of andByteString,
iorByteString and xorByteString describes a commutative and
associative operation. Furthermore, for any given length $k$, each of these
operations has an identity element: for iorByteString and xorByteString, this
is the bit sequence of length $k$ where each element is $0$, and for
andByteString, this is the bit sequence of length $k$ where each
element is $1$. Lastly, for any length $k$, the bit sequence of length $k$ where
each element is $0$ is an absorbing element for andByteString, and the
bit sequence of length $k$ where each element is $1$ is an absorbing element for
iorByteString.

We now describe the semantics of complementByteString. For input $s$,
the result is the byte sequence $u = u_n u_{n - 1} \ldots u_0$ such that for all
$i \in \{0, 1, \ldots, n\}$ we have

$$u_i = \begin{cases} 1 & s_i = 0 \\ 0 & \text{otherwise} \end{cases} $$

We observe that complementByteString is self-inverting. We also note
that the following equivalences hold, assuming b and b' have the
same length; these are the De Morgan laws:

complementByteString (andByteString b b') = 
iorByteString (complementByteString b) (complementByteString b')

complementByteString (iorByteString b b') = 
andByteString (complementByteString b) (complementByteString b')
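The logical operations and their length check can be modelled as below. This is a sketch only: `[Word8]` stands in for `BuiltinByteString`, errors are modelled as `Left` with a message, the names (`andBytes`, `zipSameLength` and so on) are hypothetical, and the exact message wording is an assumption that merely carries the three required pieces of information.

```haskell
import Data.Bits (complement, xor, (.&.), (.|.))
import Data.Word (Word8)

type Bytes = [Word8]

-- Apply a bytewise operation, or report a mismatched length error that
-- names the operation, the reason, and both argument lengths.
zipSameLength ::
  String -> (Word8 -> Word8 -> Word8) -> Bytes -> Bytes -> Either String Bytes
zipSameLength name op s t
  | length s /= length t =
      Left
        ( name ++ ": mismatched lengths ("
            ++ show (length s) ++ " and " ++ show (length t) ++ ")"
        )
  | otherwise = Right (zipWith op s t)

andBytes, iorBytes, xorBytes :: Bytes -> Bytes -> Either String Bytes
andBytes = zipSameLength "andByteString" (.&.)
iorBytes = zipSameLength "iorByteString" (.|.)
xorBytes = zipSameLength "xorByteString" xor

-- Complement never fails: the result has the same length as the input.
complementBytes :: Bytes -> Bytes
complementBytes = map complement
```

With this model, the first De Morgan law can be checked for given `b` and `b'` of equal length as `fmap complementBytes (andBytes b b') == iorBytes (complementBytes b) (complementBytes b')`, which makes the laws convenient targets for property tests.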

Mixed operations

Throughout this section, let $s = s_n s_{n-1} \ldots s_0$ and $t = t_m t_{m - 1} \ldots t_0$ be byte sequences, and let $i \in \mathbb{Z}$.

We describe the semantics of shiftByteString. Informally, these are logical
shifts, with negative shifts moving away from bit index $0$, and positive
shifts moving towards bit index $0$. More precisely, given the arguments
$s$ and $i$, the result of shiftByteString is the byte sequence
$u_n u_{n - 1} \ldots u_0$, such that for all $j \in \{0, 1, \ldots, n \}$, we have

$$u_j = \begin{cases} s_{j + i} & j + i \in \{0, 1, \ldots, n \} \\ 0 & \text{otherwise} \end{cases} $$

We observe that for $k, \ell$ with the same sign and any bs, we have

shiftByteString (shiftByteString bs k) l = shiftByteString bs (k + l)

We now describe rotateByteString, assuming the same inputs as the
description of shiftByteString above. Informally, the direction of
the rotations matches that of shiftByteString above. More precisely,
the result of rotateByteString on the given inputs is the byte sequence
$u_n u_{n - 1} \ldots u_0$ such that for all $j \in \{0, 1, \ldots, n\}$, we
have $u_j = s_{(j + i) \bmod (n + 1)}$. We observe that for any $k, \ell$, and any
bs, we have

rotateByteString (rotateByteString bs k) l = rotateByteString bs (k + l)

We also note that

rotateByteString bs 0 = shiftByteString bs 0 = bs
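The index arithmetic for shifts and rotations can be sketched directly on a bit-level model. The assumptions here are: a bit sequence is a `[Bool]` listed from index $0$ upwards (so the head is $s_0$), the names `shiftBits` and `rotateBits` are hypothetical, and the in-range condition for shifts is read as applying to the index $j + i$ that is actually looked up.

```haskell
-- A bit sequence listed from index 0 upwards: the head of the list is s_0.
type BitsFromZero = [Bool]

-- u_j = s_(j + i) when j + i is a valid index, and 0 otherwise.
-- Positive i moves bits towards index 0; negative i moves them away.
shiftBits :: BitsFromZero -> Integer -> BitsFromZero
shiftBits s i = [pick (j + i) | j <- [0 .. len - 1]]
  where
    len = toInteger (length s)
    pick k
      | k >= 0 && k < len = s !! fromInteger k
      | otherwise = False

-- u_j = s_((j + i) mod (n + 1)); Haskell's `mod` is non-negative for a
-- positive divisor, so negative rotations need no special case.
rotateBits :: BitsFromZero -> Integer -> BitsFromZero
rotateBits [] _ = []
rotateBits s i = [s !! fromInteger ((j + i) `mod` len) | j <- [0 .. len - 1]]
  where
    len = toInteger (length s)
```

In this model, rotations compose for any pair of amounts, while shifts compose only when both amounts have the same sign, since a shift in one direction discards bits that a later shift in the other direction cannot restore.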

For popCountByteString with argument $s$, the result is

$$\sum_{j \in \{0, 1, \ldots, n\}} s_j $$

Informally, this is just the total count of $1$ bits. We observe that
for any bs and bs', we have

popCountByteString bs + popCountByteString bs' = 
popCountByteString (appendByteString bs bs')

We now describe the semantics of testBitByteString and
writeBitByteString. Throughout, whenever we specify an out-of-bounds error result, its error message must contain at least the
following information:

  • The name of the failed operation;
  • The reason (out of bounds access);
  • What index was accessed out-of-bounds; and
  • The valid range of indexes.

For testBitByteString with arguments $s$ and $i$, if $0 \leq i \leq n$,
then the result is True if $s_i = 1$, and False if $s_i = 0$;
otherwise, the result is an out-of-bounds error. Let b :: BuiltinBool;
for writeBitByteString with arguments $s$, $i$ and b, if $0 \leq i \leq n$, then the result is the byte sequence $u_n u_{n - 1} \ldots u_0$
such that for all $j \in \{0, 1, \ldots, n\}$, we have

$$u_j = \begin{cases}
1 & i = j \text{ and } \mathtt{b} = \mathtt{True} \\
0 & i = j \text{ and } \mathtt{b} = \mathtt{False} \\
s_j & \text{otherwise}
\end{cases}
$$

If $i < 0$ or $i > n$, the result is an out-of-bounds error.

Lastly, we describe the semantics of findFirstSetByteString. Given the
argument $s$, if for all $j \in \{0, 1, \ldots, n \}$, $s_j = 0$, the result is
$-1$; otherwise, the result is $k$ such that all of the following hold:

  • $k \in \{0, 1, \ldots, n\}$;
  • $s_k = 1$; and
  • For all $0 \leq k^{\prime} < k$, $s_{k^{\prime}} = 0$.
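The counting and single-bit operations can be modelled in the same style. As before this is a sketch under assumptions: `[Word8]` stands in for `BuiltinByteString`, bit index $0$ is taken to be the least significant bit of the last byte, out-of-bounds errors are modelled as `Left` with a message whose wording is invented here, and all names are hypothetical.

```haskell
import Data.Bits (clearBit, popCount, setBit, testBit)
import Data.Word (Word8)

type Bytes = [Word8]

popCountBytes :: Bytes -> Integer
popCountBytes = sum . map (toInteger . popCount)

-- Translate a bit index into (byte position from the front, bit in byte),
-- or report the operation, the reason, the index, and the valid range.
locate :: String -> Bytes -> Integer -> Either String (Int, Int)
locate name s i
  | i < 0 || i >= total =
      Left
        ( name ++ ": out of bounds access at index " ++ show i
            ++ "; valid indexes are 0 to " ++ show (total - 1)
        )
  | otherwise =
      Right (length s - 1 - fromInteger (i `div` 8), fromInteger (i `mod` 8))
  where
    total = 8 * toInteger (length s)

testBitBytes :: Bytes -> Integer -> Either String Bool
testBitBytes s i = do
  (byteIx, bitIx) <- locate "testBitByteString" s i
  pure (testBit (s !! byteIx) bitIx)

writeBitBytes :: Bytes -> Integer -> Bool -> Either String Bytes
writeBitBytes s i b = do
  (byteIx, bitIx) <- locate "writeBitByteString" s i
  let update x = if b then setBit x bitIx else clearBit x bitIx
  pure [if k == byteIx then update x else x | (k, x) <- zip [0 ..] s]

-- Lowest index whose bit is 1, or -1 when every bit is 0.
findFirstSetBytes :: Bytes -> Integer
findFirstSetBytes s =
  case [i | i <- [0 .. 8 * toInteger (length s) - 1], testBitBytes s i == Right True] of
    (i : _) -> i
    [] -> -1
```

Taking the byte sequence 0110111100001100 from the preliminaries as `[0x6F, 0x0C]`, this model gives a population count of $8$ and a first set bit at index $2$. The linear scan in `findFirstSetBytes` is written for clarity, not speed.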

Costing

All of the primitives we describe are linear in one of their arguments. For a
more precise description, see the table below.

| Primitive | Linear in |
| --- | --- |
| integerToByteString | Argument (only one) |
| byteStringToInteger | Argument (only one) |
| andByteString | One argument (same length for both) |
| iorByteString | One argument (same length for both) |
| xorByteString | One argument (same length for both) |
| complementByteString | Argument (only one) |
| shiftByteString | BuiltinByteString argument |
| rotateByteString | BuiltinByteString argument |
| popCountByteString | Argument (only one) |
| testBitByteString | BuiltinByteString argument |
| writeBitByteString | BuiltinByteString argument |
| findFirstSetByteString | Argument (only one) |

Primitives and which argument they are linear in

Rationale

Why these operations?

There needs to be a well-defined
interface between the world of BuiltinInteger and
BuiltinByteString. To provide this, we require
integerToByteString and byteStringToInteger, which are designed
to roundtrip (that is, describe an isomorphism). Furthermore, by spelling out a
precise description of the conversions,
we make this predictable and portable.

Our choice of logical AND, IOR, XOR and complement as the primary logical
operations is driven by a mixture of prior art, utility and convenience. These
are the typical bitwise logical operations provided in hardware, and in most
programming languages; for example, in the x86 instruction set, the following
bitwise operations have existed since the 8086:

  • AND: Bitwise AND.
  • OR: Bitwise IOR.
  • NOT: Bitwise complement.
  • XOR: Bitwise XOR.

Likewise, on the ARM instruction set, the following bitwise operations have
existed since ARM2:

  • AND: Bitwise AND.
  • ORR: Bitwise IOR.
  • EOR: Bitwise XOR.
  • ORN: Bitwise IOR with complement of the second argument.
  • BIC: Bitwise AND with complement of the second argument.

Going up a level, the C and Forth programming languages (according to C89 and
ANS Forth respectively) define bitwise AND (denoted & and
AND respectively), bitwise IOR (denoted | and OR
respectively), bitwise XOR (denoted ^ and XOR respectively)
and bitwise complement (denoted ~ and NOT respectively) as
the primitive bitwise operations. This is followed by basically all languages
higher-up than C and Forth: Haskell's Bits type class defines these
same four as .&., .|., xor and complement.

This ubiquity leads most algorithm descriptions that rely on
bitwise operations to assume that these four are primitive, and thus
constant in both time and cost. While we could reduce this number
(and, in fact, due to Post, we know that there exist two sole sufficient
operators), this would be both inconvenient and inefficient. As an example,
consider implementing XOR using AND, IOR and complement: this would translate
$x \text{ XOR } y$ into

$$(\text{COMPLEMENT } x \text{ AND } y) \text{ IOR } (x \text{ AND COMPLEMENT } y) $$

This is both needlessly complex and also inefficient, as it requires copying the
arguments twice, only to throw away both copies.
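The equivalence used in this argument can be checked exhaustively on single bytes. A minimal sketch, using `Data.Bits` on `Word8` and the hypothetical names `xorDerived` and `agreesEverywhere`:

```haskell
import Data.Bits (complement, xor, (.&.), (.|.))
import Data.Word (Word8)

-- XOR rebuilt from AND, IOR and complement, following the formula above.
-- Each call computes two complements and two ANDs before the final IOR.
xorDerived :: Word8 -> Word8 -> Word8
xorDerived x y = (complement x .&. y) .|. (x .&. complement y)

-- True when the derived form matches the primitive on all 65536 byte pairs.
agreesEverywhere :: Bool
agreesEverywhere =
  and
    [ xorDerived x y == xor x y
    | x <- [minBound .. maxBound]
    , y <- [minBound .. maxBound]
    ]
```

The derived form is correct but needs five operations where the primitive needs one; on whole byte strings each of those intermediate results would be a full-length temporary.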

Like our baseline bitwise operations above, shifts and rotations are widely
used, and considered as primitive. For example, x86 platforms have had the
following available since the 8086:

  • RCL: Rotate left.
  • RCR: Rotate right.
  • SHL: Shift left.
  • SHR: Shift right.

Likewise, ARM platforms have had the following available since ARM2:

  • ROR: Rotate right.
  • LSL: Shift left.
  • LSR: Shift right.

While C and Forth both have shifts (denoted with << and >> in
C, and LSHIFT and RSHIFT in Forth), they don't have rotations;
however, many higher-level languages do: Haskell's Bits type class has
rotate, which enables both left and right rotations.

While popCountByteString could in theory be simulated using
testBitByteString and a fold, this is quite inefficient: the best way
to simulate this operation would involve using something similar to the
Harley-Seal algorithm, which requires a large lookup table, making it
impractical on-chain. Furthermore, population counting is important for several
classes of succinct data structure (particularly rank-select dictionaries and
bitmaps), and is in fact provided as part of the SSE4.2 x86 instruction
set as a primitive POPCNT.

In order to usefully manipulate individual bits, both testBitByteString
and writeBitByteString are needed. They can also be used as part of
specifying, and verifying, that other bitwise operations, both primitive and
non-primitive, are behaving correctly. They are also particularly essential for
binary encodings.

findFirstSetByteString is an essential primitive for several succinct
data structures: both Roaring Bitmaps and rank-select dictionaries rely on it
being efficient for much of their usefulness. Furthermore, this operation is
provided in hardware by several instruction sets: on x86, there exist (at least)
BSF, BSR, LZCNT and TZCNT, which allow
finding both the first and last set bits, while on ARM, there exists
CLZ, which can be used to simulate finding the first set bit. The
operation also exists in higher-level languages: for example, GHC's
FiniteBits type class has countTrailingZeros and
countLeadingZeros. The main reason we propose taking finding the first set bit as primitive, rather than counting leading zeroes or counting trailing zeroes, is that finding the first set bit is required specifically for
several succinct data structures.

On-chain vectors

For linear structures on-chain, we are currently limited to BuiltinList
and BuiltinMap, which don't allow constant-time indexing. This is a
significant restriction, especially when many data structures and algorithms
rely on the broad availability of a constant-time-indexable linear structure,
such as a C array or Haskell Vector. While we could introduce a
primitive of this sort, this is a significant undertaking, and would require
both implementing and costing a possibly large API.

While for variable-length data, we don't have any alternatives if constant-time
indexing is a goal, for fixed-length (or limited-length at least) data, there is
a possibility, based on a similar approach taken by the finitary
library. Essentially, given finitary data, we can transform any item into a
numerical index, which is then stored by embedding into a byte array. As the
indexes are of a fixed maximum size, this can be done efficiently, but only if
there is a way of converting indices into bitstrings, and vice versa. Such a
construction would allow using a (wrapper around) BuiltinByteString as
a constant-time indexable structure of any finitary type. This is not much of a
restriction in practice, as on-chain, fixed-width or size-bounded types are
preferable due to the on-chain size limit.

Currently, all the pieces to make this work already exist: the only missing
piece is the ability to convert indices (which would have to be
BuiltinIntegers) into bit strings (which would have to be
BuiltinByteStrings) and back again. With this capability, it would be
possible to use these techniques to implement something like an array or vector
without new primitive data types.
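The packing idea can be illustrated with a small model. It is a sketch only: `[Word8]` stands in for `BuiltinByteString` (so indexing here is not actually constant-time), every element is assumed to be a non-negative index stored in a fixed number of big-endian bytes, and the names `toFixed`, `fromFixed` and `indexPacked` are hypothetical rather than taken from the finitary library.

```haskell
import Data.Word (Word8)

-- Encode a non-negative index into exactly w big-endian bytes, if it fits.
toFixed :: Int -> Integer -> Maybe [Word8]
toFixed w n
  | w <= 0 || n < 0 || n >= 256 ^ w = Nothing
  | otherwise =
      Just [fromInteger ((n `div` (256 ^ k)) `mod` 256) | k <- [w - 1, w - 2 .. 0]]

-- Decode a run of big-endian bytes back into the index it represents.
fromFixed :: [Word8] -> Integer
fromFixed = foldl (\acc b -> acc * 256 + toInteger b) 0

-- Read element i of a packed vector whose elements are w bytes wide.
-- With a real byte array, the offset i * w is what gives constant-time access.
indexPacked :: Int -> [Word8] -> Int -> Maybe Integer
indexPacked w bytes i
  | w <= 0 || i < 0 || (i + 1) * w > length bytes = Nothing
  | otherwise = Just (fromFixed (take w (drop (i * w) bytes)))
```

A vector is then just the concatenation of the fixed-width encodings, for example `fmap concat (traverse (toFixed 2) [1, 258, 65535])`, and the two conversion primitives are what would let the same idea be expressed on-chain.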

@rphair
Collaborator

rphair commented May 27, 2022

@goverthrow in a future editing round could you remove the hard line breaks from within paragraphs?

also re: portability of the document... @KtorZ @SebastienGllmt @crptmppt the inline maths formatting here is brilliant & probably essential... how could we make sure it's preserved in other contexts like https://cips.cardano.org ? So far equations in CIPs have been submitted as code, images, or manually formatted text.

@michaelpj
Contributor

I think pasting the entire content into the PR description was probably not necessary :)


# Abstract

Add primitives for bitwise operations, based on `BuiltinByteString`, without requiring new data types.
Contributor

Suggested change
Add primitives for bitwise operations, based on `BuiltinByteString`, without requiring new data types.
Add primitives for bitwise operations, based on `bytestring`, without requiring new data types.

Contributor

Why these (and other) changes? BuiltinByteString is the name of the corresponding Plutus Core type as far as I'm aware; has this changed? Even if this is the case, bytestring isn't a valid type name.

but are difficult or impossible to implement without some, or all, of the
primitives we propose.

## Succinct data structures
Contributor

Suggested change
## Succinct data structures
### Succinct data structures

Furthermore, succinct data structures are not limited to sets of integers, but
**all** require bitwise operations to be implementable.

## Binary representations and encodings
Contributor

Suggested change
## Binary representations and encodings
### Binary representations and encodings

only constant overheads. There are several examples of these, and all rely on
bitwise operations for their implementations.

For example, consider wanting to store a set of `BuiltinInteger`s
Contributor

Suggested change
For example, consider wanting to store a set of `BuiltinInteger`s
For example, consider wanting to store a set of `integer`s


For example, consider wanting to store a set of `BuiltinInteger`s
on-chain. Given current on-chain primitives, the most viable option involves
some variant on a `BuiltinList` of `BuiltinInteger`s; however,
Contributor

Suggested change
some variant on a `BuiltinList` of `BuiltinInteger`s; however,
some variant on a `list` of `integer`s; however,


We also specify some specific non-goals of this proposal.

### No metaphor-mixing between numbers and bits
Contributor

👍

Performs a bitwise shift of the first argument by the
absolute value of the second argument, with padding, the direction being
indicated by the sign of the second argument.
Contributor

What does "with padding" mean, and how does the argument indicate the direction?
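To make the question concrete, here is one possible reading of the description, modelled in Python rather than Plutus. Everything about the convention is an assumption, since the quoted text does not settle it: a positive second argument shifts toward the most significant bit, a negative one shifts the other way, the length is preserved, and "padding" means filling vacated positions with zero bits. The name `shift_bytes` is hypothetical.

```python
def shift_bytes(data: bytes, shift: int) -> bytes:
    """Shift `data` as one big-endian bit sequence, keeping its length.

    Assumed convention (not confirmed by the proposal): positive `shift`
    moves bits left, negative moves them right, vacated bits become 0.
    """
    width = 8 * len(data)
    if width == 0:
        return b""
    value = int.from_bytes(data, "big")
    if shift >= 0:
        # Mask to the original width so bits shifted out are dropped.
        value = (value << shift) & ((1 << width) - 1)
    else:
        value >>= -shift
    return value.to_bytes(len(data), "big")


print(shift_bytes(b"\x0f\xf0", 4))   # b'\xff\x00'
print(shift_bytes(b"\x0f\xf0", -4))  # b'\x00\xff'
```

Under this reading, shifting by the full bit width or more in either direction yields all zero bytes. Other readings (opposite sign convention, padding with ones) are equally compatible with the sentence as written, which is why the wording needs tightening.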

We intend that `BuiltinByteString`s represent byte sequences, with the
sequence of bits being exactly as the description above. For example, given the
byte sequence `0110111100001100`, the `BuiltinByteString`
corresponding to it would be `o\f`.
Contributor

It looks like you mean "the bytestring corresponding to this string under encoding X", what is X?

Contributor

It would be under the encoding we describe in this section. How could I phrase this more clearly?

Contributor

@michaelpj michaelpj Jun 24, 2022

You don't describe a string encoding here. I guessed that it might be ASCII and it looks like it is, so you could say "The bytestring corresponding to it would be the one corresponding to the ASCII encoding of o\f".
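The ASCII reading can be checked mechanically. The sketch below, in Python rather than Plutus, packs the bit string from the proposal into bytes, eight bits at a time with the most significant bit first (an assumption consistent with the ASCII interpretation), and compares the result with the ASCII bytes for `o` followed by a form feed (`\f`). The helper name `bits_to_bytes` is made up for illustration.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, MSB first."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


encoded = bits_to_bytes("0110111100001100")
# 01101111 is 0x6F ('o'); 00001100 is 0x0C (form feed, '\f').
print(encoded == "o\f".encode("ascii"))  # True
```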

return $-1$ instead.


## Semantics
Contributor

I believe my comments in IntersectMBO/plutus#4252 (comment) still need to be addressed here.

$n \neq m$, the result is a mismatched length error. Otherwise, the result is
the byte sequence $u = u_n u_{n - 1} \ldots u_0$ such that for all $i \in \{0, 1, \ldots, n\}$ we have

$$u_i = \begin{cases}
Contributor

This doesn't render well on GitHub.

@blamario

blamario commented Jun 1, 2022

> Perform a bitwise logical operation on arguments of the same
> length, producing a result of the same length, erroring otherwise

There may be a justification for this choice, but it's not obvious to me and it's not specified in the proposal either. The operations' result length could instead match the length of the longer argument, reproducing its tail. This would simplify certain tasks like clearing and masking of select bits.

@michaelpj
Contributor

> There may be a justification for this choice, but it's not obvious to me and it's not specified in the proposal either.

I think the justification is that we want to be especially fussy in on-chain code about not allowing unexpected behaviour to sneak through. In this proposal, a bitwise AND just does a bitwise AND, rather than possibly also extending one of the arguments. The extending behaviour may be convenient, but it's easy to implement yourself, and opens up opportunities for accidental misuse. Better to be explicit, do one thing, and let the user extend if they need it.
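The trade-off discussed here can be sketched in a few lines. This is a Python model, not Plutus code, and the names `and_strict` and `pad_right` are hypothetical: the strict operation rejects mismatched lengths, and a caller who wants the "reproduce the tail of the longer argument" behaviour gets it by explicitly padding the shorter argument with `0xFF` bytes before the AND.

```python
def and_strict(a: bytes, b: bytes) -> bytes:
    """Bitwise AND that refuses arguments of different lengths."""
    if len(a) != len(b):
        raise ValueError("mismatched length")
    return bytes(x & y for x, y in zip(a, b))


def pad_right(data: bytes, length: int, fill: int) -> bytes:
    """Extend `data` at the end with `fill` bytes up to `length`."""
    return data + bytes([fill]) * max(0, length - len(data))


value = b"\xab\xcd\xef"
mask = b"\x0f"

# Explicit extension: 0xFF is the identity for AND, so the tail of
# `value` passes through unchanged.
print(and_strict(pad_right(mask, len(value), 0xFF), value))  # b'\x0b\xcd\xef'
```

The choice of fill byte is exactly the kind of decision the strict version forces into the open: `0xFF` preserves the tail under AND, whereas OR or XOR would need `0x00` for the same effect.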

@KtorZ KtorZ changed the title from "New CIP for bitwise primitives" to "CIP-0058? | New CIP for bitwise primitives" Jun 7, 2022
@rphair rphair added State: Likely Deprecated Close if confirmed deprecated (or long waiting). and removed Candidate CIP labels Jun 24, 2022
@rphair
Collaborator

rphair commented Jun 24, 2022

@kozross @goverthrow marking this deprecated as per #283 (comment).

@KtorZ
Member

KtorZ commented Jul 5, 2022

Shall we close this PR? @goverthrow

@rphair
Collaborator

rphair commented Sep 14, 2022

@KtorZ I believe the fact that new commits have just been made to superseding version #283 means that this PR is in fact deprecated (not having heard otherwise), so closing as such.

@rphair rphair closed this Sep 14, 2022
Labels
State: Likely Deprecated Close if confirmed deprecated (or long waiting).

6 participants