Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instance of class Bits for byte strings? #383

Closed
kindaro opened this issue May 3, 2021 · 9 comments
Closed

Add instance of class Bits for byte strings? #383

kindaro opened this issue May 3, 2021 · 9 comments

Comments

@kindaro
Copy link
Contributor

kindaro commented May 3, 2021

There is a class in base that implements xor and other bit operations. I wonder if we can add an instance of this class for byte strings. There is already a package xor that defines these operations between byte strings and words — this suggests that the feature is useful and not trivial.

@Bodigrim
Copy link
Contributor

Bodigrim commented May 4, 2021

I'm +0.5 in favor of this, but would like to hear from others.

What kind of behavior for strings of unequal length do you have in mind? Could we just use packZipWith xor?

bitvec package provides Bits instance for vectors of bits and could be a more idiomatic choice in many cases. Does it fit your use case?

@kindaro
Copy link
Contributor Author

kindaro commented May 4, 2021

It is already enough trouble to mediate between character lists (used by some older or simpler libraries) and byte arrays (used by some newer or more sophisticated libraries). I would rather not add another type conversion. All I want is to play Crypto Hack with Haskell rather than Python (as they advise), and it is so messy already. I want convenience.

The behaviour I am looking for is the same as in the xor package: to replicate the shorter byte array enough times that it is not shorter anymore and then zip with xor with the other byte array. I can do the replication arithmetic myself but I think this is the common use case so I would propose the Bits instance behave like that out of the box.

@sjakobi
Copy link
Member

sjakobi commented May 4, 2021

I don't have a good idea of the tradeoffs involved in such an instance. I think a draft implementation would help me make up my mind.

@chessai
Copy link
Member

chessai commented May 4, 2021

this comment describes some reasons and points to some other reasons why I'm -1 on this (particularly @ekmett's comment on the reddit thread).

Furthermore, xor package is for bit masks, not Bits-style interactions between two bytestrings, so they're not really comparable.

@Bodigrim
Copy link
Contributor

Bodigrim commented May 4, 2021

What if we add a function
foo :: FiniteBits b => (forall a. FiniteBits a => a -> a -> a) -> b -> ByteString -> Bytestring?

@Bodigrim
Copy link
Contributor

Bodigrim commented May 4, 2021

Or even more general (and useful) foo :: FiniteBits a => ByteString -> [a] and bar :: FiniteBits a => [a] -> ByteString plus some rules for fusion.

@kindaro
Copy link
Contributor Author

kindaro commented May 5, 2021

This is the argument of Edward Kmett from the Reddit thread mentioned above, for reference:

What do you do when you go to set a bit beyond the end of the bytestring? extend it? Do we renormalize to collapse if we shrink the result? Leave it alone? The zero bytestring would have to be empty. Complementing it would not result in an infinite bytestring of 1s, because we can't do that...

I am going to engage with it presently.

I dislike partial functions as much as the next person, but the reality is that we are going to have some. You cannot set bit №9 of a byte, you cannot divide by 0, and so on. This is life, we should get over it somehow.

The usual interpretation of xor in cryptography is that the shorter string is replicated, so |u xor v| = |u| max |v|. Alternatively, we can pass through infinity and say that u xor v = take (|u| max |v|) (replicate ∞ u xor replicate ∞ v). In order for us to follow this interpretation, we must assign to every finite byte string an infinite correspondent.

So what is the infinite correspondent of an empty byte string? There is no natural definition. I do not see this terrible revelation preventing people from using xor in this way outside of Haskell, and to great good, so I do not think we should let it stop us either. Maybe we can assign empty = [False, False, …] by fiat and call it a day, or we can leave logical connectives undefined for an empty byte string.

On the good side, under this «infinitely looping» definition of logical connectives we have no problem defining the complement operation, since singleton ⊤ is equivalent to an infinite sequence of 1s.

Another way to look at it is like this: logical connectives are defined for bits, so all we have to do is define the shape where bits are located. Finite byte strings have the shape of a sequence indexed by finite ordinals, so for a byte string u the set of indices is {n ∈ ℕ, n ≤ |u|}. At the limit, infinite byte strings are indexed by ℕ. replicate furnishes us with a natural transformation that puts all byte strings into infinite byte strings. It can even be defined directly for lazy byte strings. It also has a natural pseudo-inverse take |u|.

What more to wish for?

@kindaro
Copy link
Contributor Author

kindaro commented May 6, 2021

Unfortunately, there is no way to make operations defined this way associative.

For example, suppose we have a long message m and two keys k₁ of length 2 and k₂ of length 3. Applying these keys to the message with xor is the same as applying a composite key of length 6. This affords a way of making a one time pad of size Π|vᵢ| from a relatively short string of length Σ|vᵢ|, provided that the lengths of all keys vᵢ are co-prime, so it is an important application. Since xor on bits is associative, we would expect to be able to xor the keys together and then apply the resulting key k₃ to the message, but by the definition proposed above k₃ would only have length 3, or generally maximum {vᵢ}. That is to say, m xor (k₁ xor k₂) ≠ (m xor k₁) xor k₂.

This means that the appropriate setting for these computations is infinite byte streams, or perhaps infinite byte streams decorated with length, either of which is out of scope for this package.

@Bodigrim
Copy link
Contributor

I assume this can be closed?.. Feel free to reopen, if I misjudged the conversation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants