docs(frontend): 2.7x doc review #917

Merged 4 commits on Jul 4, 2024
51 changes: 25 additions & 26 deletions docs/compilation/common_errors.md
@@ -1,6 +1,6 @@
# Common errors

In this document, we list the most common errors, and mention how the user can fix them.
This document explains the most common errors and provides solutions to fix them.

## 1. Could not find a version that satisfies the requirement concrete-python (from versions: none)

@@ -9,73 +9,72 @@ In this document, we list the most common errors, and mention how the user can f
**Cause**: The installation does not work fine for you.

**Possible solutions**:
- Be sure that you use a supported Python version (currently from 3.8 to 3.11, included)
- Check you have done `pip install -U pip wheel setuptools` before
- Consider adding a `--extra-index-url https://pypi.zama.ai/cpu/`
- Concrete requires glibc>=2.28, be sure to have a sufficiently recent version
- Be sure that you use a supported Python version (currently 3.8 to 3.11, inclusive).
- Check that you have done `pip install -U pip wheel setuptools` before.
- Consider adding a `--extra-index-url https://pypi.zama.ai/cpu/`.
- Concrete requires glibc>=2.28, be sure to have a sufficiently recent version.
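Combining the points above, a typical CPU installation might look like the following (the extra index URL is the one given in the list above):

```shell
# Make sure the packaging toolchain is up to date first
pip install -U pip wheel setuptools
# Then install concrete-python, pulling CPU wheels from Zama's repository
pip install concrete-python --extra-index-url https://pypi.zama.ai/cpu/
```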

## 2. Only integers are supported

**Error message**: `RuntimeError: Function you are trying to compile cannot be compiled` with extra information `only integers are supported`

**Cause**: This error can occur if parts of your program contain graphs which are not from integer to integer
**Cause**: Parts of your program contain graphs that are not from integer to integer

**Possible solutions**:
- It is possible to use floats as intermediate values (see the [documentation](../core-features/floating_points.md#floating-points-as-intermediate-values)) but always, inputs and outputs must be integers. So, consider adding ways to convert to integers, such as `.astype(np.uint64)`
- You can use floats as intermediate values (see the [documentation](../core-features/floating_points.md#floating-points-as-intermediate-values)). However, both inputs and outputs must be integers. Consider converting values to integers, such as `.astype(np.uint64)`
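As an illustration, here is a minimal sketch (plain NumPy, no FHE involved) of the shape such a function should have: the intermediate values are floats, but the `.astype(np.uint64)` cast makes the output an integer again:

```python
import numpy as np

def f(x):
    # Intermediate computation happens in floating point...
    y = np.sqrt(x.astype(np.float64))
    # ...but the result is cast back to integers, as Concrete requires
    return np.floor(y).astype(np.uint64)

print(f(np.array([0, 1, 4, 10])))  # [0 1 2 3]
```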

## 3. No parameters found

**Error message**: `NoParametersFound`

**Cause**: The optimizer was not able to find cryptographic parameters for the circuit, which are both secure and correct
**Cause**: The optimizer can't find cryptographic parameters for the circuit that are both secure and correct.

**Possible solutions**:
- Try to simplify your circuit
- Use smaller weights,
- Add intermediate PBS to reduce the noise, with identity function `fhe.univariate(lambda x: x)`
- Try to simplify your circuit.
- Use smaller weights.
- Add intermediate PBS to reduce the noise, with identity function `fhe.univariate(lambda x: x)`.

## 4. Too long inputs for table lookup

**Error message**: `RuntimeError: Function you are trying to compile cannot be compiled`, with extra information as `this [...]-bit value is used as an input to a table lookup` with `but only up to 16-bit table lookups are supported`

**Cause**: In your program, you use a table lookup where the input is too large, i.e., is more than 16-bits, which is the current limit
**Cause**: The program uses a Table Lookup that contains oversized inputs exceeding the current 16-bit limit.

**Possible solutions**:
- Try to simplify your circuit
- Use smaller weights,
- Look to the MLIR to understand where this too-long input comes from
- Try to simplify your circuit.
- Use smaller weights.
- Look at the MLIR to understand where this oversized input comes from, and ensure that the input size for Table Lookup operations does not exceed 16 bits.

## 5. Impossible to fuse multiple-nodes

**Error message**: `RuntimeError: A subgraph within the function you are trying to compile cannot be fused because it has multiple input nodes`

**Cause**: In your program, you have a subgraph using two nodes or more. It is impossible to fuse such a graph, i.e., to replace it by a table lookup. Concrete will show you where the different nodes are, with some `this is one of the input nodes` printed in the circuit.
**Cause**: A subgraph in your program uses two or more input nodes. It is impossible to fuse such a graph, that is, to replace it with a table lookup. Concrete will indicate the input nodes with `this is one of the input nodes` printed in the circuit.

**Possible solutions**:
- Try to simplify your circuit
- Have a look to `fhe.multivariate`
- Try to simplify your circuit.
- Have a look at `fhe.multivariate`.

## 6. Function is not supported

**Error message**: `RuntimeError: Function '[...]' is not supported`

**Cause**: You are using a function which is not currently supported by Concrete
**Cause**: The function used is not currently supported by Concrete.

**Possible solutions**:
- Try to change your program
- Have a look to the documentation to see if there are ways to implement the function differently
- Ask our community channels
- Try to change your program.
- Check the corresponding documentation to see if there are ways to implement the function differently.
- Post your issue in our [community channels](https://community.zama.ai/c/concrete/7).

## 7. Branching is not allowed

**Error message**: `RuntimeError: Branching within circuits is not possible`

**Cause**: You are using branches in Concrete, it is not allowed in FHE program (typically, if's or
non-constant loops)
**Cause**: Branching operations, such as if statements or non-constant loops, are not supported in Concrete's FHE programs.

**Possible solutions**:
- Change your program
- Consider using tricks to replace ternary-if, as `c ? t : f = f + c * (t-f)`
- Change your program.
- Consider using tricks to replace ternary-if, as `c ? t : f = f + c * (t-f)`.
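The ternary-if trick can be checked in plain Python (no FHE involved): when `c` is 0 or 1, `f + c * (t - f)` selects `t` for `c == 1` and `f` for `c == 0`, with no branching:

```python
def select(c, t, f):
    # Branch-free equivalent of `t if c else f`, valid when c is 0 or 1
    return f + c * (t - f)

assert select(1, 10, 3) == 10  # c == 1 -> t
assert select(0, 10, 3) == 3   # c == 0 -> f
```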



53 changes: 33 additions & 20 deletions docs/compilation/composing_functions_with_modules.md
@@ -1,8 +1,14 @@
# Composing functions with modules

In various cases, deploying a server that contains many compatible functions is important. `concrete-python` can compile FHE modules containing as many functions as needed. More importantly, modules support _composition_ of the different functions. This means the encrypted result of one function execution can be used as input of a different function, without needing to decrypt in between. A module is [deployed in a single artifact](../guides/deploy.md#deployment-of-modules), making as simple to use a single function project.
This document explains how to compile Fully Homomorphic Encryption (FHE) modules containing multiple functions using Concrete.

Here is a first simple example:
Deploying a server that contains many compatible functions is important for some use cases. With Concrete, you can compile FHE modules containing as many functions as needed.

These modules support the composition of different functions, meaning that the encrypted result of one function can be used as the input for another function without needing to decrypt it first. Additionally, a module is [deployed in a single artifact](../guides/deploy.md#deployment-of-modules), making it as simple to use as a single-function project.

## Single inputs / outputs

The following example demonstrates how to create an FHE module:
```python
from concrete import fhe

@@ -17,14 +23,14 @@ class Counter:
        return (x - 1) % 20
```

You can compile the FHE module `Counter` using the `compile` method. To do that, you need to provide a dictionnary of input sets for every function:
Then, you can compile the FHE module `Counter` using the `compile` method. To do that, you need to provide a dictionary of input-sets for every function:

```python
inputset = list(range(20))
CounterFhe = Counter.compile({"inc": inputset, "dec": inputset})
```

After the module has been compiled, we can encrypt and call the different functions in the following way:
After the module is compiled, you can encrypt and call the different functions as follows:

```python
x = 5
@@ -43,7 +49,7 @@ x_dec = CounterFhe.inc.decrypt(x_enc)
assert x_dec == 15
```

## Multi inputs, multi outputs
## Multi inputs / outputs

Composition is not limited to single input / single output. Here is an example that computes the 10 first elements of the Fibonacci sequence in FHE:

@@ -103,9 +109,9 @@ Encrypting initial values
| 9 || 144 | 144 | 233 | 233 |
```
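The FHE module itself is collapsed in the diff above, but the cleartext logic behind each iteration can be sketched in plain Python (starting from the pair `(1, 2)`, as the output table suggests):

```python
def fib_step(n1, n2):
    # One iteration of the sequence: the pair (n1, n2) becomes (n2, n1 + n2)
    return n2, n1 + n2

n1, n2 = 1, 2
for i in range(10):
    n1, n2 = fib_step(n1, n2)

print(n1, n2)  # 144 233
```

In the module version, `fib_step` would run in FHE on encrypted values, while the `for` loop itself stays in the clear.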

## Iteration support
## Iterations

With the previous example we see that to some extent, modules allows to support iteration with cleartext iterands. That is, loops with the following shape :
With the previous example, we see that modules allow iteration with cleartext iterands to some extent. Specifically, loops with the following structure are supported:

```python
for i in some_cleartext_constant_range:
@@ -158,7 +164,7 @@ while is_one_enc is None or not CollatzFhe.collatz.decrypt(is_one_enc):
print(f"| {x_dec:<9} | {x:<9} |")
```

Which prints:
This script prints the following output:

```shell
Compiling `Collatz` module ...
@@ -186,14 +192,14 @@ Encrypting initial value
| 2 | 2 |
| 1 | 1 |
```
In this example, a while loop iterates until the decrypted value equals 1. The loop body is implemented in FHE, but the iteration control must be in cleartext.

Here we use a while loop that keeps iterating as long as the decryption of the running value is different from `1`. Again, the loop body is implemented in FHE, but the iteration control has to be in the clear.
## Runtime optimization

## Optimizing runtimes with composition policies
By default, when using modules, all inputs and outputs of every function are compatible, sharing the same precision and crypto-parameters. This approach applies the crypto-parameters of the most costly code path to all code paths. This simplicity may be costly and unnecessary for some use cases.

By default when using modules, every inputs and outputs of every functions are compatible: they share the same precision and the same crypto-parameters. This means that the most costly crypto-parameters of all code-paths is used for every code paths. This simplicity comes at a cost, and depending on the use case, it may not be necessary.
To optimize runtime, we provide finer-grained control over the composition policy via the `composition` module attribute. Here is an example:

To optimize the runtimes, we provide a finer grained control over the composition policy via the `composition` module attribute. Here is an example:
```python
from concrete import fhe

@@ -212,11 +218,15 @@ class Collatz:
composition = fhe.AllComposable()
```

By default the attribute is set to `fhe.AllComposable`. This policy ensures that every ciphertexts used in the module are compatible. This is the less restrictive, but most costly policy.
You have 3 options for the `composition` attribute:

If one does not need composition at all, but just want to pack multiple functions in a single artifact, it is possible to do so by setting the `composition` attribute to `fhe.NotComposable`. This is the most restrictive, but less costly policy.
1. **`fhe.AllComposable` (default)**: This policy ensures that all ciphertexts used in the module are compatible. It is the least restrictive policy but the most costly in terms of performance.

Hopefully there is no need to choose between one of those two extremes. It is also possible to detail custom policies by using `fhe.Wired`. For instance:
2. **`fhe.NotComposable`**: This policy is the most restrictive but the least costly. It is suitable when you do not need any composition and only want to pack multiple functions in a single artifact.

3. **`fhe.Wired`**: This policy allows you to define custom composition rules. You can specify which outputs of a function can be forwarded to which inputs of another function.

Here is an example:
```python
from concrete import fhe
from concrete.fhe import Wired, Wire, Output, Input
@@ -242,7 +252,7 @@

In this case, the policy states that the first output of the `collatz` function can be forwarded to the first input of `collatz`, but not the second output (which is decrypted every time, and used for control flow).

It is possible to use an `fhe.Wire` between any two functions, it is also possible to define wires with `fhe.AllInputs` and `fhe.AllOutputs` ends. For instance in the previous example:
You can use the `fhe.Wire` between any two functions. It is also possible to define wires with `fhe.AllInputs` and `fhe.AllOutputs` ends. For instance, in the previous example:
```python
composition = Wired(
[
@@ -253,11 +263,13 @@ It is possible to use an `fhe.Wire` between any two functions, it is also possib

This policy would be equivalent to using the `fhe.AllComposable` policy.

## Limitations
## Current limitations

Depending on the functions, composition may add a significant overhead compared to a non-composable version.

Depending on the functions, supporting composition may add a non-negligible overhead when compared to a non-composable version. Indeed, to be composable a function must verify the following condition: Every output which can be forwarded as input (as per the composition policy) must contain a noise refreshing operation.
To be composable, a function must meet the following condition: every output that can be forwarded as input (according to the composition policy) must contain a noise-refreshing operation. Since adding a noise refresh has a noticeable impact on performance, Concrete does not automatically include it.

Since adding a noise refresh has a non negligeable impact on performance, `concrete-python` does not do it in behalf of the user. For instance, to implement a function that doubles an encrypted value, we would write something like:
For instance, to implement a function that doubles an encrypted value, you might write:

```python
@fhe.module()
@@ -266,8 +278,9 @@ class Doubler:
def double(counter):
return counter * 2
```
This function is valid with the `fhe.NotComposable` policy. However, if compiled with the `fhe.AllComposable` policy, it will raise a `RuntimeError: Program cannot be composed: ...`, indicating that an extra Programmable Bootstrapping (PBS) step must be added.

This is a valid function with the `fhe.NotComposable` policy, but if compiled with `fhe.AllComposable` policy, a `RuntimeError: Program can not be composed: ...` error is reported, signalling that an extra PBS must be added. To solve this situation, and turn this circuit into a valid one, one can use the following snippet to add a PBS at the end of the circuit:
To resolve this and make the circuit valid, add a PBS at the end of the circuit:

```python
def noise_reset(x):
    # An identity table lookup forces a PBS, which refreshes the noise
    return fhe.univariate(lambda x: x)(x)
```
57 changes: 42 additions & 15 deletions docs/execution-analysis/gpu_acceleration.md
@@ -1,29 +1,56 @@
# GPU acceleration

This document explains how to use GPU accelerations with Concrete.

Concrete supports acceleration using one or more GPUs.

To use this feature, you need to install a Concrete Python wheel which is built with GPU/CUDA support. This is **not** available on pypi.org/project/concrete-python as we only release wheels with CPU support there. To install a GPU/CUDA wheel, you need to install it from our [Zama public pypi repository](https://pypi.zama.ai), which can be done with the following command line:
{% hint style="info" %}
This version is not available on [pypi.org](https://pypi.org/project/concrete-python), which only hosts wheels with CPU support.
{% endhint %}

To use GPU acceleration, install the GPU/CUDA wheel from our [Zama public PyPI repository](https://pypi.zama.ai) using the following command:

`pip install concrete-python --index-url https://pypi.zama.ai/gpu`.

Once a GPU/CUDA flavor python wheel is installed, FHE program compilation must be [configured](../guides/configure.md) using the **use_gpu** option to enable GPU offloading.
After installing the GPU/CUDA wheel, you must [configure](../guides/configure.md) the FHE program compilation to enable GPU offloading using the `use_gpu` option.

{% hint style="info" %}
Our GPU wheels are built with CUDA 11.8 and should be compatible with higher versions of CUDA.
{% endhint %}

## GPU execution configuration

By default the compiler and runtime will make use of all resources available on the system, to include all CPU cores and GPUs. This can be adjusted by using environment variables.
The following variables are relevant in this context:

* **SDFG_NUM_THREADS**: Integer = Number of hardware threads on the system, including hyperthreading, less number of GPUs in use.
* Number of CPU threads to execute concurrently to GPU for workloads that can be offloaded. As GPU scheduler threads (including CUDA threads and those used within Concrete) are necessary and can be a bottleneck or interfere with worker thread execution, it is recommended to undersubscribe the CPU hardware threads by the number of GPU devices used.
* **SDFG_NUM_GPUS**: Integer = Number of GPUs available
* Number of GPUs to use for offloading. This can be set at any value between 1 and the total number of GPUs on the system.
* **SDFG_MAX_BATCH_SIZE**: Integer = LLONG_MAX (no batch size limit)
* Limit the maximum batch size to offload in cases where the GPU memory is insufficient.
* **SDFG_DEVICE_TO_CORE_RATIO**: Integer = Ratio between the compute capability of the GPU (at index 0) and a CPU core
* Ratio between GPU and CPU used to balance the load between CPU and GPU. If the GPU is starved, this can be set at higher values to increase the amount of work offloaded.
* **OMP_NUM_THREADS**: Integer = Number of hardware threads on the system, including hyperthreading
* Portions of program execution that are not yet supported for GPU offload are parallelized using OpenMP on the CPU.
By default the compiler and runtime will use all available system resources, including all CPU cores and GPUs. You can adjust this by using the following environment variables:

### SDFG_NUM_THREADS
- **Type**: Integer
- **Default value**: The number of hardware threads on the system (including hyperthreading) minus the number of GPUs in use.
- **Description**: This variable determines the number of CPU threads that execute in parallel with the GPU for offloadable workloads. GPU scheduler threads (including CUDA threads and those used within Concrete) are necessary but can block or interfere with worker thread execution. Therefore, it is recommended to undersubscribe the CPU hardware threads by the number of GPU devices used.
- **Required**: No

### SDFG_NUM_GPUS
- **Type**: Integer
- **Default value**: The number of GPUs available.
- **Description**: This value determines the number of GPUs to use for offloading. This can be set to any value between 1 and the total number of GPUs on the system.
- **Required**: No

### SDFG_MAX_BATCH_SIZE

- **Type**: Integer
- **Default value**: LLONG_MAX (no batch size limit)
- **Description**: This value limits the maximum batch size for offloading in cases where the GPU memory is insufficient.
- **Required**: No

### SDFG_DEVICE_TO_CORE_RATIO

- **Type**: Integer
- **Default value**: The ratio between the compute capability of the GPU (at index 0) and a CPU core.
- **Description**: This ratio is used to balance the load between the CPU and GPU. If the GPU is underutilized, set this value higher to increase the amount of work offloaded to the GPU.
- **Required**: No

### OMP_NUM_THREADS

- **Type**: Integer
- **Default value**: The number of hardware threads on the system, including hyperthreading.
- **Description**: This value sets the number of OpenMP threads used on the CPU for the portions of program execution that are not yet supported for GPU offload.
- **Required**: No
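As an illustration, a run on a machine with 64 hardware threads and 2 GPUs might be configured as follows (all values here are hypothetical and depend on your hardware and workload):

```shell
# Undersubscribe CPU threads by the number of GPUs used for offloading
export SDFG_NUM_THREADS=62
export SDFG_NUM_GPUS=2
# Cap the offloaded batch size if GPU memory is limited (hypothetical value)
export SDFG_MAX_BATCH_SIZE=65536
# OpenMP parallelism for the parts that stay on the CPU
export OMP_NUM_THREADS=64
```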