diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 98f0eb5328..df1cb36309 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -13,14 +13,9 @@
 ## Core features
 
 * [Overview](core-features/fhe_basics.md)
-* [Table lookups](core-features/table_lookups.md)
+* [Table lookups (basics)](core-features/table_lookups.md)
 * [Bit extraction](core-features/bit_extraction.md)
-* [Rounding](core-features/rounding.md)
-* [Truncating](core-features/truncating.md)
-* [Floating points](core-features/floating_points.md)
-* [Comparisons](core-features/comparisons.md)
-* [Min/Max operations](core-features/minmax.md)
-* [Bitwise operations](core-features/bitwise.md)
+* [Non-linear operations](core-features/non_linear_operations.md)
 * [Common tips](core-features/workarounds.md)
 * [Extensions](core-features/extensions.md)
 * [Tagging](core-features/tagging.md)
@@ -64,6 +59,14 @@
 ## Explanations
 
 * [Compiler workflow](dev/compilation/compiler_workflow.md)
+* [Compiler internals](dev/compilation/compiler_internals.md)
+  * [Table lookups](core-features/table_lookups_advanced.md)
+  * [Rounding](core-features/rounding.md)
+  * [Truncating](core-features/truncating.md)
+  * [Floating points](core-features/floating_points.md)
+  * [Comparisons](core-features/comparisons.md)
+  * [Min/Max operations](core-features/minmax.md)
+  * [Bitwise operations](core-features/bitwise.md)
 * [Frontend fusing](explanations/fusing.md)
 * [Compiler backend](explanations/backends/README.md)
   * [Adding a new backend](explanations/backends/new_backend.md)
diff --git a/docs/core-features/minmax.md b/docs/core-features/minmax.md
index 89f44152df..d105400b04 100644
--- a/docs/core-features/minmax.md
+++ b/docs/core-features/minmax.md
@@ -160,7 +160,7 @@ What this means is that if we are comparing `uint3` and `uint6`, we need to conv
   (x - y).bit_width <= MAXIMUM_TLU_BIT_WIDTH
   ```
 
-### 1. fhe.ComparisonStrategy.ONE_TLU_PROMOTED
+### 1. fhe.MinMaxStrategy.ONE_TLU_PROMOTED
 
 This strategy makes sure that during bit-width assignment, both operands are assigned the same bit-width, and that bit-width contains at least the amount of bits required to store `x - y`. The idea is:
 
@@ -226,7 +226,7 @@ module {
 }
 ```
 
-### 2. fhe.ComparisonStrategy.THREE_TLU_CASTED
+### 2. fhe.MinMaxStrategy.THREE_TLU_CASTED
 
 This strategy will not put any constraint on bit-widths during bit-width assignment. Instead, operands are cast to a bit-width that can store `x - y` during runtime using table lookups. The idea is:
 
diff --git a/docs/core-features/non_linear_operations.md b/docs/core-features/non_linear_operations.md
new file mode 100644
index 0000000000..9b8dc5395c
--- /dev/null
+++ b/docs/core-features/non_linear_operations.md
@@ -0,0 +1,165 @@
+# Non-linear operations
+
+In Concrete, there are basically two types of operations:
+- linear operations, such as additions, subtractions and multiplications by an integer, which are very fast
+- all other operations, which are performed with table lookups (TLUs).
+
+TLUs are essential: they let Concrete compile any function while preserving the semantics of the user's program. However,
+they can be slow, depending on the bit width of their inputs.
+
+In this document, we briefly explain, from a user's point of view, how non-linear operations such as comparisons, min/max, bitwise operations and shifts work. [The poweruser documentation](../dev/compilation/compiler_internals.md) goes into more detail.
+
+## Changing bit width in the MLIR or dynamically with a TLU
+
+Binary operations often require both operands to have the same bit width. This can be achieved in two ways: either directly in the MLIR, or dynamically (i.e., at execution time) with a TLU. Because neither method is strictly better than the other in the general case, we offer several configurations for the non-linear functions.
+
+The first method has the advantage of not requiring an expensive TLU. However, it may affect other parts of the program: the operand whose bit width is changed may be used elsewhere, which can trigger further bit width changes. In addition, if the modified operands are used in TLUs, the impact may be significant.
+
+The second method has the advantage of being very local: it has no impact elsewhere. However, it is costly, since it uses a TLU.
+
+## Generic Principle for the user
+
+For the non-linear operations below, we offer a number of configurations that apply the two methods to the different operands. In general, it is not easy to know in advance which configuration will be the fastest, although it gets easier with some Concrete experience. We recommend that users try the different configurations and measure which one is best for their circuit.
+
+By running the following programs with `show_mlir=True`, advanced users can inspect the MLIR and see where TLUs are used, where bit widths are changed in the MLIR, and where they are changed dynamically. For most users, however, it is not critical to understand the different flavors: simply try the different configurations and keep the one that best fits your case.
+
+## Comparisons
+
+For comparisons, there are 7 available methods. The generic principle is:
+
+```python
+import numpy as np
+from concrete import fhe
+
+configuration = fhe.Configuration(
+    comparison_strategy_preference=config,
+)
+
+def f(x, y):
+    return x < y
+
+inputset = [
+    (np.random.randint(0, 2**4), np.random.randint(0, 2**4))
+    for _ in range(100)
+]
+
+compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"})
+circuit = compiler.compile(inputset, configuration, show_mlir=True)
+```
+
+where `config` is one of
+- `fhe.ComparisonStrategy.CHUNKED`
+- `fhe.ComparisonStrategy.ONE_TLU_PROMOTED`
+- `fhe.ComparisonStrategy.THREE_TLU_CASTED`
+- `fhe.ComparisonStrategy.TWO_TLU_BIGGER_PROMOTED_SMALLER_CASTED`
+- `fhe.ComparisonStrategy.TWO_TLU_BIGGER_CASTED_SMALLER_PROMOTED`
+- `fhe.ComparisonStrategy.THREE_TLU_BIGGER_CLIPPED_SMALLER_CASTED`
+- `fhe.ComparisonStrategy.TWO_TLU_BIGGER_CLIPPED_SMALLER_PROMOTED`
+
+## Min / Max operations
+
+For min / max operations, there are 3 available methods. The generic principle is:
+
+```python
+import numpy as np
+from concrete import fhe
+
+configuration = fhe.Configuration(
+    min_max_strategy_preference=config,
+)
+
+def f(x, y):
+    return np.minimum(x, y)
+
+inputset = [
+    (np.random.randint(0, 2**4), np.random.randint(0, 2**2))
+    for _ in range(100)
+]
+
+compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"})
+circuit = compiler.compile(inputset, configuration, show_mlir=True)
+```
+
+where `config` is one of
+- `fhe.MinMaxStrategy.CHUNKED` (default)
+- `fhe.MinMaxStrategy.ONE_TLU_PROMOTED`
+- `fhe.MinMaxStrategy.THREE_TLU_CASTED`
+
+## Bitwise operations
+
+For bitwise operations (typically AND, OR, XOR), there are 5 available methods. The generic principle is:
+
+```python
+import numpy as np
+from concrete import fhe
+
+configuration = fhe.Configuration(
+    bitwise_strategy_preference=config,
+)
+
+def f(x, y):
+    return x & y
+
+inputset = [
+    (np.random.randint(0, 2**4), np.random.randint(0, 2**4))
+    for _ in range(100)
+]
+
+compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"})
+circuit = compiler.compile(inputset, configuration, show_mlir=True)
+```
+
+where `config` is one of
+- `fhe.BitwiseStrategy.CHUNKED`
+- `fhe.BitwiseStrategy.ONE_TLU_PROMOTED`
+- `fhe.BitwiseStrategy.THREE_TLU_CASTED`
+- `fhe.BitwiseStrategy.TWO_TLU_BIGGER_PROMOTED_SMALLER_CASTED`
+- `fhe.BitwiseStrategy.TWO_TLU_BIGGER_CASTED_SMALLER_PROMOTED`
+
+## Shift operations
+
+For shift operations, there are 2 available methods. The generic principle is:
+
+```python
+import numpy as np
+from concrete import fhe
+
+configuration = fhe.Configuration(
+    shifts_with_promotion=shifts_with_promotion,
+)
+
+def f(x, y):
+    return x << y
+
+inputset = [
+    (np.random.randint(0, 2**3), np.random.randint(0, 2**2))
+    for _ in range(100)
+]
+
+compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"})
+circuit = compiler.compile(inputset, configuration, show_mlir=True)
+```
+
+where `shifts_with_promotion` is either `True` or `False`.
+
+## Relation with `fhe.multivariate`
+
+All binary operations described in this document can also be implemented with the `fhe.multivariate` function, which is described in [this section](../core-features/extensions.md#fhe.multivariate-function).
+
+```python
+import numpy as np
+from concrete import fhe
+
+
+def f(x, y):
+    return fhe.multivariate(lambda x, y: x << y)(x, y)
+
+
+inputset = [(np.random.randint(0, 2**3), np.random.randint(0, 2**2)) for _ in range(100)]
+
+compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"})
+circuit = compiler.compile(inputset, show_mlir=True)
+```
+
+
+
diff --git a/docs/core-features/table_lookups.md b/docs/core-features/table_lookups.md
index 8f28646d95..c87555ba72 100644
--- a/docs/core-features/table_lookups.md
+++ b/docs/core-features/table_lookups.md
@@ -1,30 +1,14 @@
-# Table lookups
+# Table lookup
 
-One of the most common operations in **Concrete** are `Table Lookups` (TLUs). All operations except addition, subtraction, multiplication with non-encrypted values, tensor manipulation operations, and a few operations built with those primitive operations (e.g. matmul, conv) are converted to Table Lookups under the hood.
+In TFHE, there are mainly two kinds of operations: linear operations (additions, subtractions, multiplications by integers) and everything else. Everything else is done with table lookups (TLUs), so TLUs are used very often. In this document, we briefly explain, from a user's point of view, how TLUs can be used. [The poweruser documentation](../dev/compilation/compiler_internals.md) goes into more detail.
 
-Table Lookups are very flexible. They allow Concrete to support many operations, but they are expensive. The exact cost depends on many variables (hardware used, error probability, etc.), but they are always much more expensive compared to other operations. You should try to avoid them as much as possible. It's not always possible to avoid them completely, but you might remove the number of TLUs or replace some of them with other primitive operations.
+## Performance
 
-{% hint style="info" %}
-Concrete automatically parallelizes TLUs if they are applied to tensors.
-{% endhint %}
-
-## Direct table lookup
-
-**Concrete** provides a `LookupTable` class to create your own tables and apply them in your circuits.
-
-{% hint style="info" %}
-`LookupTable`s can have any number of elements. Let's call the number of elements **N**. As long as the lookup variable is within the range \[-**N**, **N**), the Table Lookup is valid.
-
-If you go outside of this range, you will receive the following error:
-
-```
-IndexError: index 10 is out of bounds for axis 0 with size 6
-```
-{% endhint %}
+Before going into the details of how TLUs are used in Concrete, let us mention the most important parameter for speed: the smaller the bit width of a TLU, the faster the corresponding FHE operation. In general, users should therefore try to reduce the size of the inputs to the tables. At the end of this document, we also present ways to truncate or round inputs, which makes the effective inputs smaller and the corresponding TLUs faster.
 
-### With scalars.
+## Direct TLU
 
-You can create the lookup table using a list of integers and apply it using indexing:
+A direct TLU is an instruction of the form `y = T[i]`, for some table `T` and some index `i`. Use `fhe.LookupTable` to define the table values, then apply the table to scalars or tensors.
 
 ```python
 from concrete import fhe
@@ -44,20 +28,18 @@ assert circuit.encrypt_run_decrypt(2) == table[2] == 3
 assert circuit.encrypt_run_decrypt(3) == table[3] == 0
 ```
 
-### With tensors.
-
-When you apply a table lookup to a tensor, the scalar table lookup is applied to each element of the tensor:
-
 ```python
 from concrete import fhe
 import numpy as np
 
 table = fhe.LookupTable([2, -1, 3, 0])
 
+
 @fhe.compiler({"x": "encrypted"})
 def f(x):
     return table[x]
 
+
 inputset = [np.random.randint(0, 4, size=(2, 3)) for _ in range(10)]
 circuit = f.compile(inputset)
 
@@ -71,36 +53,14 @@ expected_output = [
 ]
 actual_output = circuit.encrypt_run_decrypt(np.array(sample))
 
-for i in range(2):
-    for j in range(3):
-        assert actual_output[i][j] == expected_output[i][j] == table[sample[i][j]]
+assert np.array_equal(actual_output, expected_output)
 ```
 
-### With negative values.
-
-`LookupTable` mimics array indexing in Python, which means if the lookup variable is negative, the table is looked up from the back:
-
-```python
-from concrete import fhe
-
-table = fhe.LookupTable([2, -1, 3, 0])
-
-@fhe.compiler({"x": "encrypted"})
-def f(x):
-    return table[-x]
-
-inputset = range(1, 5)
-circuit = f.compile(inputset)
-
-assert circuit.encrypt_run_decrypt(1) == table[-1] == 0
-assert circuit.encrypt_run_decrypt(2) == table[-2] == 3
-assert circuit.encrypt_run_decrypt(3) == table[-3] == -1
-assert circuit.encrypt_run_decrypt(4) == table[-4] == 2
-```
+`LookupTable` mimics array indexing in Python, which means if the lookup variable is negative, the table is looked up from the back.
 
-## Direct multi-table lookup
+## Multi TLU
 
-If you want to apply a different lookup table to each element of a tensor, you can have a `LookupTable` of `LookupTable`s:
+A multi TLU is an instruction of the form `y = T[j][i]`, for some set of tables `T`, some table index `j` and some index `i`. Use `fhe.LookupTable` to define the table values, then apply the tables to scalars or tensors.
 
 ```python
 from concrete import fhe
@@ -134,83 +94,129 @@ expected_output = [
 ]
 actual_output = circuit.encrypt_run_decrypt(np.array(sample))
 
-for i in range(3):
-    for j in range(2):
-        if j == 0:
-            assert actual_output[i][j] == expected_output[i][j] == squared[sample[i][j]]
-        else:
-            assert actual_output[i][j] == expected_output[i][j] == cubed[sample[i][j]]
+assert np.array_equal(actual_output, expected_output)
 ```
 
-In this example, we applied a `squared` table to the first column and a `cubed` table to the second column.
-
-## Fused table lookup
+### Transparent TLU
 
-**Concrete** tries to fuse some operations into table lookups automatically so that lookup tables don't need to be created manually:
+For many programs, users do not even need to define their own table lookups: Concrete creates
+them automatically.
 
 ```python
 from concrete import fhe
-import numpy as np
 
 @fhe.compiler({"x": "encrypted"})
 def f(x):
-    return (42 * np.sin(x)).astype(np.int64) // 10
+    return x ** 2
 
-inputset = range(8)
-circuit = f.compile(inputset)
+inputset = range(4)
+circuit = f.compile(inputset, show_mlir=True)
 
-for x in range(8):
-    assert circuit.encrypt_run_decrypt(x) == f(x)
+assert circuit.encrypt_run_decrypt(0) == 0
+assert circuit.encrypt_run_decrypt(1) == 1
+assert circuit.encrypt_run_decrypt(2) == 4
+assert circuit.encrypt_run_decrypt(3) == 9
 ```
 
+Note that this kind of TLU is compatible with the TLU options, in particular with rounding and
+truncating, which are explained below.
+
 {% hint style="info" %}
-All lookup tables need to be from integers to integers. So, without `.astype(np.int64)`, **Concrete** will not be able to fuse.
+
+[fhe.univariate](../core-features/extensions.md#fheunivariatefunction) and [fhe.multivariate](../core-features/extensions.md#fhemultivariatefunction) extensions are convenient ways to perform more complex operations as transparent TLUs.
+
 {% endhint %}
 
-The function is first traced into:
+### TLU on the most significant bits
 
-![](../\_static/tutorials/table-lookup/1.initial.graph.png)
+As mentioned at the beginning of this document, the bit width of a TLU's inputs is critical for execution speed: the smaller it is, the faster the TLU. Concrete therefore provides a few mechanisms to reduce this bit width, the main ones being _rounding_ and _truncating_.
 
-**Concrete** then fuses appropriate nodes:
+In many use cases, for example in machine learning, it is possible to replace the table lookup `y = T[i]` with some `y = T'[i']`, where `i'` keeps only the most significant bits of `i` and `T'` is a much shorter table, while still maintaining good accuracy. The benefit of this method is that, since the table `T'` is much smaller, the corresponding TLU is much faster.
 
-![](../\_static/tutorials/table-lookup/3.final.graph.png)
+There are several ways to do this in Concrete. We describe them briefly here and refer the reader to the [poweruser documentation](../dev/compilation/compiler_internals.md) for more explanations.
 
-{% hint style="info" %}
-Fusing makes the code more readable and easier to modify, so try to utilize it over manual `LookupTable`s as much as possible.
-{% endhint %}
+The first possibility is to set `i'` to the truncation of `i`: we simply keep the most significant bits of `i`. This is done with `fhe.truncate_bit_pattern`.
+
+```python
+from concrete import fhe
+import numpy as np
+
+table = fhe.LookupTable([i**2 for i in range(16)])
+lsbs_to_remove = 1
+
+
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return table[fhe.truncate_bit_pattern(x, lsbs_to_remove)]
 
-## Using automatically created table lookup
 
-We refer the users to [this page](extensions.md) for explanations about `fhe.univariate(function)` and `fhe.multivariate(function)` features, which are convenient ways to use automatically created table lookup.
+inputset = range(16)
+circuit = f.compile(inputset)
 
-## Table lookup exactness
+for i in range(16):
+    rounded_i = int(i / 2**lsbs_to_remove) * 2**lsbs_to_remove
 
-TLUs are performed with an FHE operation called `Programmable Bootstrapping` (PBS). PBSs have a certain probability of error: when these errors happen, it results in inaccurate results.
+    assert circuit.encrypt_run_decrypt(i) == rounded_i**2
+```
 
-Let's say you have the table:
+The second possibility is to set `i'` to the rounding of `i`: we keep the most significant bits of `i`, and add 1 if the most significant ignored bit is 1. This is done with `fhe.round_bit_pattern`. It is a bit more complicated, however, since rounding may take us "out" of the original table. Note how the table is enlarged by one index.
 
 ```python
-lut = [0, 1, 4, 9, 16, 25, 36, 49, 64]
+from concrete import fhe
+import numpy as np
+
+table = fhe.LookupTable([i**2 for i in range(17)])
+lsbs_to_remove = 1
+
+
+def our_round(x):
+    float_part = x - np.floor(x)
+    if float_part < 0.5:
+        return int(np.floor(x))
+    return int(np.ceil(x))
+
+
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return table[fhe.round_bit_pattern(x, lsbs_to_remove)]
+
+
+inputset = range(16)
+circuit = f.compile(inputset)
+
+for i in range(16):
+    rounded_i = our_round(i * 1.0 / 2**lsbs_to_remove) * 2**lsbs_to_remove
+
+    assert (
+        circuit.encrypt_run_decrypt(i) == rounded_i**2
+    ), f"Miscomputation {i=} {circuit.encrypt_run_decrypt(i)} {rounded_i**2}"
 ```
 
-And you perform a Table Lookup using `4`. The result you should get is `lut[4] = 16`, but because of the possibility of error, you could get any other value in the table.
+Finally, `fhe.round_bit_pattern` has an `exactness=fhe.Exactness.APPROXIMATE` option, which can make computations even faster, at the price of a few (minor) differences between cleartext and encrypted computations.
 
-The probability of this error can be configured through the `p_error` and `global_p_error` configuration options. The difference between these two options is that, `p_error` is for individual TLUs but `global_p_error` is for the whole circuit.
+```python
+from concrete import fhe
+import numpy as np
 
-If you set `p_error` to `0.01`, for example, it means every TLU in the circuit will have a 99% chance (or more) of being exact. If there is a single TLU in the circuit, it corresponds to `global_p_error = 0.01` as well. But if we have 2 TLUs, then `global_p_error` would be higher: that's `1 - (0.99 * 0.99) ~= 0.02 = 2%`.
+table = fhe.LookupTable([i**2 for i in range(17)])
+lsbs_to_remove = 1
 
-If you set `global_p_error` to `0.01`, the whole circuit will have at most 1% probability of error, no matter how many Table Lookups are included (which means that `p_error` will be smaller than `0.01` if there are more than a single TLU).
 
-If you set both of them, both will be satisfied. Essentially, the stricter one will be used.
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return table[fhe.round_bit_pattern(x, lsbs_to_remove, exactness=fhe.Exactness.APPROXIMATE)]
 
-By default, both `p_error` and `global_p_error` are set to `None`, which results in a `global_p_error` of `1 / 100_000` being used.
 
-Feel free to play with these configuration options to pick the one best suited for your needs! See [How to Configure](../guides/configure.md) to learn how you can set a custom `p_error` and/or `global_p_error`.
+inputset = range(16)
+circuit = f.compile(inputset)
 
-{% hint style="info" %}
-Configuring either of those variables impacts compilation and execution times (compilation, keys generation, circuit execution) and space requirements (size of the keys on disk and in memory). Lower error probabilities result in longer compilation and execution times and larger space requirements.
-{% endhint %}
+for i in range(16):
+    lower_i = np.floor(i * 1.0 / 2**lsbs_to_remove) * 2**lsbs_to_remove
+    upper_i = np.ceil(i * 1.0 / 2**lsbs_to_remove) * 2**lsbs_to_remove
 
-## Table lookup performance
+    assert circuit.encrypt_run_decrypt(i) in [
+        lower_i**2,
+        upper_i**2,
+    ], f"Miscomputation {i=} {circuit.encrypt_run_decrypt(i)} {[lower_i**2, upper_i**2]}"
+```
 
-PBSs are very expensive, in terms of computations. Fortunately, it is sometimes possible to replace PBS by [rounded PBS](rounding.md), [truncate PBS](truncating.md) or even [approximate PBS](rounding.md). These TLUs have a slightly different semantic, but are very useful in cases like machine learning for more efficiency without drop of accuracy.
diff --git a/docs/core-features/table_lookups_advanced.md b/docs/core-features/table_lookups_advanced.md
new file mode 100644
index 0000000000..b10e76d95a
--- /dev/null
+++ b/docs/core-features/table_lookups_advanced.md
@@ -0,0 +1,216 @@
+# Table lookups (advanced)
+
+`Table Lookups` (TLUs) are among the most common operations in **Concrete**. All operations except addition, subtraction, multiplication with non-encrypted values, tensor manipulation operations, and a few operations built with those primitive operations (e.g. matmul, conv) are converted to Table Lookups under the hood.
+
+Table Lookups are very flexible. They allow Concrete to support many operations, but they are expensive. The exact cost depends on many variables (hardware used, error probability, etc.), but they are always much more expensive than other operations. You should try to avoid them as much as possible. It's not always possible to avoid them completely, but you might be able to reduce the number of TLUs or replace some of them with other primitive operations.
+
+{% hint style="info" %}
+Concrete automatically parallelizes TLUs if they are applied to tensors.
+{% endhint %}
+
+## Direct table lookup
+
+**Concrete** provides a `LookupTable` class to create your own tables and apply them in your circuits.
+
+{% hint style="info" %}
+`LookupTable`s can have any number of elements. Let's call the number of elements **N**. As long as the lookup variable is within the range \[-**N**, **N**), the Table Lookup is valid.
+
+If you go outside of this range, you will receive the following error:
+
+```
+IndexError: index 10 is out of bounds for axis 0 with size 6
+```
+{% endhint %}
+
+### With scalars.
+
+You can create the lookup table using a list of integers and apply it using indexing:
+
+```python
+from concrete import fhe
+
+table = fhe.LookupTable([2, -1, 3, 0])
+
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return table[x]
+
+inputset = range(4)
+circuit = f.compile(inputset)
+
+assert circuit.encrypt_run_decrypt(0) == table[0] == 2
+assert circuit.encrypt_run_decrypt(1) == table[1] == -1
+assert circuit.encrypt_run_decrypt(2) == table[2] == 3
+assert circuit.encrypt_run_decrypt(3) == table[3] == 0
+```
+
+### With tensors.
+
+When you apply a table lookup to a tensor, the scalar table lookup is applied to each element of the tensor:
+
+```python
+from concrete import fhe
+import numpy as np
+
+table = fhe.LookupTable([2, -1, 3, 0])
+
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return table[x]
+
+inputset = [np.random.randint(0, 4, size=(2, 3)) for _ in range(10)]
+circuit = f.compile(inputset)
+
+sample = [
+    [0, 1, 3],
+    [2, 3, 1],
+]
+expected_output = [
+    [2, -1, 0],
+    [3, 0, -1],
+]
+actual_output = circuit.encrypt_run_decrypt(np.array(sample))
+
+for i in range(2):
+    for j in range(3):
+        assert actual_output[i][j] == expected_output[i][j] == table[sample[i][j]]
+```
+
+### With negative values.
+
+`LookupTable` mimics array indexing in Python, which means if the lookup variable is negative, the table is looked up from the back:
+
+```python
+from concrete import fhe
+
+table = fhe.LookupTable([2, -1, 3, 0])
+
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return table[-x]
+
+inputset = range(1, 5)
+circuit = f.compile(inputset)
+
+assert circuit.encrypt_run_decrypt(1) == table[-1] == 0
+assert circuit.encrypt_run_decrypt(2) == table[-2] == 3
+assert circuit.encrypt_run_decrypt(3) == table[-3] == -1
+assert circuit.encrypt_run_decrypt(4) == table[-4] == 2
+```
+
+## Direct multi-table lookup
+
+If you want to apply a different lookup table to each element of a tensor, you can have a `LookupTable` of `LookupTable`s:
+
+```python
+from concrete import fhe
+import numpy as np
+
+squared = fhe.LookupTable([i ** 2 for i in range(4)])
+cubed = fhe.LookupTable([i ** 3 for i in range(4)])
+
+table = fhe.LookupTable([
+    [squared, cubed],
+    [squared, cubed],
+    [squared, cubed],
+])
+
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return table[x]
+
+inputset = [np.random.randint(0, 4, size=(3, 2)) for _ in range(10)]
+circuit = f.compile(inputset)
+
+sample = [
+    [0, 1],
+    [2, 3],
+    [3, 0],
+]
+expected_output = [
+    [0, 1],
+    [4, 27],
+    [9, 0]
+]
+actual_output = circuit.encrypt_run_decrypt(np.array(sample))
+
+for i in range(3):
+    for j in range(2):
+        if j == 0:
+            assert actual_output[i][j] == expected_output[i][j] == squared[sample[i][j]]
+        else:
+            assert actual_output[i][j] == expected_output[i][j] == cubed[sample[i][j]]
+```
+
+In this example, we applied a `squared` table to the first column and a `cubed` table to the second column.
+
+## Fused table lookup
+
+**Concrete** tries to fuse some operations into table lookups automatically so that lookup tables don't need to be created manually:
+
+```python
+from concrete import fhe
+import numpy as np
+
+@fhe.compiler({"x": "encrypted"})
+def f(x):
+    return (42 * np.sin(x)).astype(np.int64) // 10
+
+inputset = range(8)
+circuit = f.compile(inputset)
+
+for x in range(8):
+    assert circuit.encrypt_run_decrypt(x) == f(x)
+```
+
+{% hint style="info" %}
+All lookup tables need to be from integers to integers. So, without `.astype(np.int64)`, **Concrete** will not be able to fuse.
+{% endhint %}
+
+The function is first traced into:
+
+![](../\_static/tutorials/table-lookup/1.initial.graph.png)
+
+**Concrete** then fuses appropriate nodes:
+
+![](../\_static/tutorials/table-lookup/3.final.graph.png)
+
+{% hint style="info" %}
+Fusing makes the code more readable and easier to modify, so try to utilize it over manual `LookupTable`s as much as possible.
+{% endhint %}
+
+## Using automatically created table lookup
+
+We refer the users to [this page](extensions.md) for explanations about `fhe.univariate(function)` and `fhe.multivariate(function)` features, which are convenient ways to use automatically created table lookup.
+
+## Table lookup exactness
+
+TLUs are performed with an FHE operation called `Programmable Bootstrapping` (PBS). PBSs have a certain probability of error: when these errors happen, it results in inaccurate results.
+
+Let's say you have the table:
+
+```python
+lut = [0, 1, 4, 9, 16, 25, 36, 49, 64]
+```
+
+And you perform a Table Lookup using `4`. The result you should get is `lut[4] = 16`, but because of the possibility of error, you could get any other value in the table.
+
+The probability of this error can be configured through the `p_error` and `global_p_error` configuration options. The difference between these two options is that, `p_error` is for individual TLUs but `global_p_error` is for the whole circuit.
+
+If you set `p_error` to `0.01`, for example, it means every TLU in the circuit will have a 99% chance (or more) of being exact. If there is a single TLU in the circuit, it corresponds to `global_p_error = 0.01` as well. But if we have 2 TLUs, then `global_p_error` would be higher: that's `1 - (0.99 * 0.99) ~= 0.02 = 2%`.
+
+If you set `global_p_error` to `0.01`, the whole circuit will have at most 1% probability of error, no matter how many Table Lookups are included (which means that `p_error` will be smaller than `0.01` if there are more than a single TLU).
+
+If you set both of them, both will be satisfied. Essentially, the stricter one will be used.
+
+By default, both `p_error` and `global_p_error` are set to `None`, which results in a `global_p_error` of `1 / 100_000` being used.
+
+Feel free to play with these configuration options to pick the one best suited for your needs! See [How to Configure](../guides/configure.md) to learn how you can set a custom `p_error` and/or `global_p_error`.
+
+{% hint style="info" %}
+Configuring either of those variables impacts compilation and execution times (compilation, keys generation, circuit execution) and space requirements (size of the keys on disk and in memory). Lower error probabilities result in longer compilation and execution times and larger space requirements.
+{% endhint %}
+
+## Table lookup performance
+
+PBSs are very expensive in terms of computation. Fortunately, it is sometimes possible to replace a PBS with a [rounded PBS](rounding.md), a [truncated PBS](truncating.md) or even an [approximate PBS](rounding.md). These TLUs have slightly different semantics, but are very useful in cases like machine learning, where they improve efficiency without a drop in accuracy.
diff --git a/docs/dev/compilation/compiler_internals.md b/docs/dev/compilation/compiler_internals.md
new file mode 100644
index 0000000000..703cf39573
--- /dev/null
+++ b/docs/dev/compilation/compiler_internals.md
@@ -0,0 +1,14 @@
+# Concrete internals
+
+This section of the documentation is for more advanced users, who want to go further with the use of the compiler.
+
+In particular, we have subsections for:
+- [Table lookups](../../core-features/table_lookups_advanced.md)
+- [Rounding](../../core-features/rounding.md)
+- [Truncating](../../core-features/truncating.md)
+- [Floating points](../../core-features/floating_points.md)
+- [Comparisons](../../core-features/comparisons.md)
+- [Min/Max operations](../../core-features/minmax.md)
+- [Bitwise operations](../../core-features/bitwise.md)
+
+