docs(frontend): adding a levelled case for XOR distance between one encrypted and one clear vectors #928

Merged
merged 1 commit into from
Jul 17, 2024
@@ -0,0 +1,166 @@
import argparse
import time

import numpy as np

from concrete import fhe


# Hamming weight computation
hw_table_values = [np.binary_repr(x).count("1") for x in range(2**8)]

# fmt: off
assert np.array_equal(hw_table_values, [
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3,
4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2,
3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5,
4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3,
4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3,
3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5,
6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6,
4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5,
6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8]
)
# fmt: on

hw = fhe.LookupTable(hw_table_values)


def mapme(x):
    """Map 0 to -1, and keep 1 as 1."""
    return 2 * x - 1


def dist_in_clear(x, y):
    """Compute the distance in the clear."""
    return np.sum(hw[x ^ y])


def dist_in_fhe(x_mapped, y_mapped):
    """Compute the distance in FHE."""

    # x_mapped is a row tensor, whose 0's have been replaced by -1
    # y_mapped is a column tensor, whose 0's have been replaced by -1
    assert x_mapped.ndim == y_mapped.ndim == 2
    assert x_mapped.shape[0] == y_mapped.shape[1] == 1

    u = np.matmul(x_mapped, y_mapped)[0][0]

    # So, u is a scalar:
    # - bits which are the same in x_mapped and y_mapped (either two -1's or two 1's) count for +1
    # - bits which differ between x_mapped and y_mapped (either (-1, 1) or (1, -1)) count for -1
    # Hence u = len(x) - 2 * HW(x ^ y), and the HW distance is (len(x) - u) / 2
    final_result = np.prod(x_mapped.shape) - u

    # The returned result is twice the distance; we halve it in the clear
    return final_result


def manage_args():
    """Manage user args."""
    parser = argparse.ArgumentParser(
        description="Hamming weight (aka XOR) distance in Concrete, between an encrypted vector and a clear vector."
    )
    parser.add_argument(
        "--nb_bits",
        dest="nb_bits",
        action="store",
        type=int,
        default=120,
        help="Number of bits (better to be a multiple of 12 to test all bitwidths)",
    )
    parser.add_argument(
        "--show_mlir",
        dest="show_mlir",
        action="store_true",
        help="Show the MLIR",
    )
    parser.add_argument(
        "--repeat",
        dest="repeat",
        action="store",
        type=int,
        default=5,
        help="Repeat x times",
    )
    args = parser.parse_args()
    return args


def main():
    """Main function."""
    print()

    # Options by the user
    args = manage_args()

    nb_bits = args.nb_bits

    # Info
    print(
        f"Computing XOR distance on {nb_bits} bits using algorithm dist_in_fhe, using vectors of 1b cells"
    )

    # Compile the circuit
    inputset = [
        (
            mapme(np.random.randint(2**1, size=(1, nb_bits))),
            mapme(np.transpose(np.random.randint(2**1, size=(1, nb_bits)))),
        )
        for _ in range(100)
    ]

    compiler = fhe.Compiler(dist_in_fhe, {"x_mapped": "encrypted", "y_mapped": "clear"})
    circuit = compiler.compile(
        inputset,
        show_mlir=args.show_mlir,
        bitwise_strategy_preference=fhe.BitwiseStrategy.ONE_TLU_PROMOTED,
        multivariate_strategy_preference=fhe.MultivariateStrategy.PROMOTED,
    )

    # Then generate the keys
    circuit.keygen()

    total_time = 0

    nb_samples_for_warmup = 10

    # Then use
    for i in range(nb_samples_for_warmup + args.repeat):
        # Take a random input pair
        x, y = (
            np.random.randint(2**1, size=(1, nb_bits)),
            np.random.randint(2**1, size=(1, nb_bits)),
        )

        x_mapped = mapme(x)
        y_mapped = mapme(np.transpose(y))

        # Encrypt
        encrypted_input = circuit.encrypt(x_mapped, y_mapped)

        # Compute the distance in FHE
        begin_time = time.time()
        encrypted_result = circuit.run(encrypted_input)
        end_time = time.time()

        # Don't count the warmup samples
        if i >= nb_samples_for_warmup:
            total_time += end_time - begin_time

        # Decrypt
        result = circuit.decrypt(encrypted_result)

        # Halve this in the clear, to have the final result
        result /= 2

        # Check
        assert result == dist_in_clear(x, y)

    average_time = total_time / args.repeat
    print(f"Distance between encrypted vectors done in {average_time:.2f} " f"seconds in average")


if __name__ == "__main__":
    main()
46 changes: 39 additions & 7 deletions frontends/concrete-python/examples/xor_distance/xor_distance.md
@@ -4,8 +4,11 @@ We describe how to compute a XOR distance (as known as an Hamming weight distanc
can be useful in particular for biometry use-cases, where privacy is obviously a very desirable
feature.

The full code can be done [here](hamming_distance.py). Execution times of the different functions are given in the
final section.
We present the XOR distance in two contexts, each with its corresponding code:
- the XOR distance between [two encrypted tensors](hamming_distance.py)
- the XOR distance between [one encrypted tensor and one clear tensor](hamming_distance_to_clear.py)

Execution times of the different functions are given in the corresponding sections.

## The Goal

@@ -34,7 +37,9 @@ This is a distance function, which can be used for various purpose, including me
vectors are close to each other. In the context of biometry (or others), it may be very interesting
to compute this function over encrypted `x` and `y` vectors.

## First Implementation
## Distance Between Two Encrypted Tensors

### First Implementation

In the [full code](hamming_distance.py), we use a first implementation, which is

@@ -45,7 +50,7 @@ def dist_in_fhe_directly_from_cp(x, y):

Here, it's a pure copy of the code in Concrete, and it compiles directly into FHE code!

## Second Implementation with `fhe.bits`
### Second Implementation with `fhe.bits`

In the [full code](hamming_distance.py), we use a second implementation, which is

@@ -60,7 +65,7 @@ This function only works for bit-vectors `x` and `y` (as opposed to other functi
`fhe.bits` operator to extract the least-significant bit of the addition `x+y`: indeed, this
least-significant bit is exactly `x ^ y`.
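
The identity behind this implementation can be checked in the clear. The following is a minimal NumPy sketch, not the code from the example file: the helper name `xor_distance_via_lsb` is hypothetical, and plain `& 1` stands in for the `fhe.bits` extraction used in FHE.

```python
import numpy as np


def xor_distance_via_lsb(x, y):
    """Clear-text illustration: for single bits, LSB(x + y) equals x ^ y."""
    x = np.asarray(x)
    y = np.asarray(y)

    # Keep only the least-significant bit of each sum
    lsb_of_sum = (x + y) & 1

    # Sanity check of the identity on these inputs (only valid for 0/1 entries)
    assert np.array_equal(lsb_of_sum, x ^ y)

    # Summing the XOR bits gives the Hamming (XOR) distance
    return int(np.sum(lsb_of_sum))
```

The check only holds when every entry is 0 or 1, which matches the remark that this variant works for bit-vectors only.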

## Third Implementation with Concatenation
### Third Implementation with Concatenation

In the [full code](hamming_distance.py), we use a third implementation, which is

@@ -78,7 +83,7 @@ def dist_in_fhe_with_xor_internal(x, y, bitsize_w):
Here, we concatenate the elements of `x` and `y` (which are of bitsize `bitsize_w`) into a
`2 * bitsize_w` input, and use a `2 * bitsize_w`-bit programmable bootstrapping.
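
To illustrate the concatenation idea outside FHE, here is a clear-text NumPy sketch. It is an assumption about the general technique rather than the body of `dist_in_fhe_with_xor_internal`: the helper names `build_xor_hw_table` and `dist_with_concatenation` are hypothetical, and an ordinary array lookup plays the role of the `2 * bitsize_w`-bit table lookup.

```python
import numpy as np


def build_xor_hw_table(bitsize_w):
    """Table indexed by (a << bitsize_w) | b, holding the Hamming weight of a ^ b."""
    size = 2**bitsize_w
    return np.array([bin(a ^ b).count("1") for a in range(size) for b in range(size)])


def dist_with_concatenation(x, y, bitsize_w):
    """Clear-text XOR distance between vectors of bitsize_w-bit cells."""
    x = np.asarray(x)
    y = np.asarray(y)
    table = build_xor_hw_table(bitsize_w)

    # Concatenate each pair of cells into one 2 * bitsize_w-bit index
    index = (x << bitsize_w) | y

    # One lookup per cell pair, then sum the per-cell Hamming weights
    return int(np.sum(table[index]))
```

The table has `2 ** (2 * bitsize_w)` entries, which is why the cost of this approach grows quickly with the cell bitsize.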

## Fourth Implementation with `fhe.multivariate`
### Fourth Implementation with `fhe.multivariate`

In the [full code](hamming_distance.py), we use a fourth implementation, which is

@@ -90,7 +95,7 @@ def dist_in_fhe_with_multivariate_internal(x, y):

Here, we use `fhe.multivariate`, which evaluates a function of the two inputs `x` and `y`. Under the hood, it is replaced by a `2 * bitsize_w`-bit programmable bootstrapping.

## Execution Time
### Execution Time Between Two Encrypted Tensors

_All of the following timings were measured on an `hpc7a` machine, with Concrete 2.5.1._

@@ -151,3 +156,30 @@ And finally, for 12804-bit vectors, execution times should be:
dist_in_fhe_with_multivariate_tables on 4 bits: 40.89 seconds
```

## Distance Between One Encrypted Tensor and One Clear Tensor

In [this code](hamming_distance_to_clear.py), we propose a simple implementation for the special case
where one of the vectors (here, `y`) is not encrypted. The function `dist_in_fhe` is based on the
following idea: `x` is seen as a row vector of bits, while `y` is seen as a column vector of bits.
Before encryption, `x` and `y` go through a simple transform: 0 bits are mapped to -1, while 1 bits
stay at 1. Then we just compute the scalar product `u` between the mapped `x` and `y`.

Positions where the mapped `x` and `y` are equal hold either (1, 1) or (-1, -1), so each of them
contributes +1 to the scalar product. Conversely, positions where they differ hold (1, -1) or
(-1, 1), so each of them contributes -1. All in all, `u = (n - HW(x^y)) - HW(x^y) = n - 2 HW(x^y)`,
where `n` is the number of bits of `x` (which is the number of bits of `y` too).

In the code, we compute `n - u` in FHE, and we divide by 2 after the decryption, which doesn't reduce
the privacy of the computation.
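
The relation `u = n - 2 HW(x^y)` can be verified in the clear with plain NumPy. This is a minimal sketch with no encryption involved: `mapme` mirrors the helper of the example file, while `distance_via_dot_product` is a hypothetical name introduced here for illustration.

```python
import numpy as np


def mapme(bits):
    """Map 0 to -1, and keep 1 as 1."""
    return 2 * bits - 1


def distance_via_dot_product(x, y):
    """Clear-text XOR distance computed through the scalar product of mapped bits."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = x.size

    # x becomes a row vector, y a column vector, both with entries in {-1, 1}
    x_mapped = mapme(x).reshape(1, n)
    y_mapped = mapme(y).reshape(n, 1)

    # u = (number of equal bits) - (number of differing bits) = n - 2 * HW(x ^ y)
    u = int(np.matmul(x_mapped, y_mapped)[0][0])

    # n - u is twice the distance; the halving is the step done after decryption
    return (n - u) // 2
```

Since `n - u` is always even, the final halving is exact.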

### Execution Time Between One Encrypted Tensor and One Clear Tensor

This case is really fast, since there is no programmable bootstrapping (PBS) in the code. It's a
purely levelled FHE circuit.

For 12804-bit vectors, on an `hpc7a` machine, with Concrete 2.7.0, we have:

```
Computing XOR distance on 12804 bits using algorithm dist_in_fhe, using vectors of 1b cells
Distance between encrypted vectors done in 0.43 seconds in average
```