Merge devel into main #22

Merged 47 commits on Dec 25, 2023
Commits
691c01a
Simplify AArch64 model
hanno-becker Dec 3, 2023
518e81a
Don't rename locked registers in SSA transform
hanno-becker Dec 8, 2023
cb99841
Modularize application of parsing callbacks during DFG construction
hanno-becker Dec 8, 2023
38ba498
Add support for fusion callbacks, implement eor+eor->eor3 fusion
hanno-becker Dec 8, 2023
2c31555
Fix div-by-0 issue
hanno-becker Dec 8, 2023
864ac28
Handle low-iteration count when preamble+postamble are >1 iterations
hanno-becker Dec 8, 2023
1be6340
Disable split heuristic during fusion
hanno-becker Dec 8, 2023
7627713
Remove clutter in split heuristic
hanno-becker Dec 8, 2023
23eb8b7
Add selfcheck after split heuristic
hanno-becker Dec 8, 2023
5a04ae0
Move selfcheck and preamble/postamble fixup to result class
hanno-becker Dec 8, 2023
7cb52fa
Introduce class for source lines
hanno-becker Dec 8, 2023
169872d
Add vector ldp/stp to AArch64 model
hanno-becker Dec 10, 2023
98c4cd5
Add `transpose` parent class for trn1 and trn2
hanno-becker Dec 10, 2023
f1f5f35
Simplify preprecessing by naive interleaving
hanno-becker Dec 11, 2023
0712ca6
Use tag for no-unfold
hanno-becker Dec 11, 2023
9273e83
Adust use of `is_virtual` which is now a property
hanno-becker Dec 18, 2023
9421154
Add configuration option controlling address fixup
hanno-becker Dec 18, 2023
dd75877
Add support for `after_last` source annotation
hanno-becker Dec 18, 2023
d267623
Keep line metadata during optimization
hanno-becker Dec 11, 2023
2da51bd
Smaller cleanup in helper.py
hanno-becker Dec 11, 2023
9359311
Drop source line tags by default upon optimization
hanno-becker Dec 12, 2023
c16c459
Some more cleanup
hanno-becker Dec 12, 2023
3fcc154
Further smaller improvements
hanno-becker Dec 13, 2023
8154e95
Cleanup imports and directory structure
hanno-becker Dec 15, 2023
b3339d4
Fix example.py
hanno-becker Dec 15, 2023
9131e4a
Some pylint'ing
hanno-becker Dec 16, 2023
3fc7bb3
More pylint'ing
hanno-becker Dec 17, 2023
b571978
Adjust AArch64 parsing callbacks to addition of source line info
hanno-becker Dec 18, 2023
7699dca
Add some experimental batched AES 'virtual' instructions to AArch64
hanno-becker Dec 8, 2023
4361e87
Add FAQ
hanno-becker Dec 18, 2023
17ab097
Fix parsing bug in AArch64 model for instructions affecting flags
hanno-becker Dec 18, 2023
2e567b0
Adjust x25519-aarch64-simple.s to tag and parsing changes
hanno-becker Dec 18, 2023
564aae3
Merge escaped lines during source code parsing
hanno-becker Dec 18, 2023
66a8caf
Print and keep tags by default
hanno-becker Dec 18, 2023
4b25416
Fix link in FAQ
hanno-becker Dec 19, 2023
cb502a9
Fix init.sh
hanno-becker Dec 19, 2023
dc6c6bb
Work around bug https://github.com/google/or-tools/issues/4027
hanno-becker Dec 19, 2023
4543a87
pages: Add back-pointer from FAQ to index
hanno-becker Dec 19, 2023
10d22be
Add or-tools patch working around build issue
hanno-becker Dec 23, 2023
f5baf3c
Experiment: Simplify x25519 optimization script
hanno-becker Dec 23, 2023
62a7d57
More pylint
hanno-becker Dec 23, 2023
95b52f9
Minor changes to github pages
hanno-becker Dec 23, 2023
40af412
Fix slothy imports in ntt helium script
hanno-becker Dec 22, 2023
dfe347c
Add logo to README
hanno-becker Dec 24, 2023
8d7dce8
Update logo
hanno-becker Dec 24, 2023
896757a
Add OR-Tools dependencies to README and setup-ortools.sh
hanno-becker Dec 25, 2023
5819b7a
Revert "Experiment: Simplify x25519 optimization script"
hanno-becker Dec 25, 2023
16 changes: 11 additions & 5 deletions README.md
@@ -1,3 +1,7 @@
<p align="center">
<image src="./docs/slothy_logo.png" width=160>
</p>

**SLOTHY** - **S**uper (**L**azy) **O**ptimization of **T**ricky **H**andwritten assembl**Y** - is an assembly-level superoptimizer
for:
1. Instruction scheduling
@@ -6,7 +10,7 @@ for:

SLOTHY is generic in the target architecture and microarchitecture. This repository provides instantiations for
the Cortex-M55 and Cortex-M85 CPUs implementing Armv8.1-M + Helium, and the Cortex-A55 and Cortex-A72
CPUs implementing Armv8-A + Neon. There is an experimental model for Cortex-X/Neoverse-V cores.

SLOTHY is discussed in [Fast and Clean: Auditable high-performance assembly via constraint solving](https://eprint.iacr.org/2022/1303).

@@ -16,10 +20,10 @@ SLOTHY enables a development workflow where developers write 'clean' assembly by

### How it works

SLOTHY is essentially a constraint solver frontend: It converts the input source into a data flow graph and
builds a constraint model capturing valid instruction schedulings, register renamings, and periodic loop
interleavings. The model is passed to an external constraint solver and, upon success,
a satisfying assignment converted back into the final code. Currently, SLOTHY uses
[Google OR-Tools](https://developers.google.com/optimization) as its constraint solver backend.
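
The pipeline described above (source, data flow graph, constraint model, solver, reordered code) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tuple-based instruction format, the function names, and the "count back-to-back dependent instructions" objective are hypothetical, and a brute-force search over permutations stands in for the constraint solver. It is not SLOTHY's actual model.

```python
from itertools import permutations

# Toy instruction format (hypothetical, not SLOTHY's representation):
# (mnemonic, output register, input registers)
CODE = [
    ("ldr", "a", []),
    ("mul", "b", ["a"]),
    ("ldr", "c", []),
    ("mul", "d", ["c"]),
    ("add", "e", ["b", "d"]),
]

def build_dfg(code):
    """Return the set of (producer index, consumer index) edges."""
    last_writer = {}
    edges = set()
    for idx, (_, out, ins) in enumerate(code):
        for reg in ins:
            if reg in last_writer:
                edges.add((last_writer[reg], idx))
        last_writer[out] = idx
    return edges

def is_valid(order, edges):
    """A schedule is valid if every producer precedes its consumers."""
    pos = {instr: slot for slot, instr in enumerate(order)}
    return all(pos[p] < pos[c] for p, c in edges)

def count_stalls(order, edges):
    """Count consumers scheduled directly after one of their producers."""
    pos = {instr: slot for slot, instr in enumerate(order)}
    return sum(1 for p, c in edges if pos[c] - pos[p] == 1)

def best_schedule(code):
    """Brute-force stand-in for the solver: try all valid orderings."""
    edges = build_dfg(code)
    valid = (o for o in permutations(range(len(code))) if is_valid(o, edges))
    return min(valid, key=lambda o: count_stalls(o, edges))
```

In this sketch the instructions themselves never change; only their order does, which mirrors the fixed-instruction approach described in the README. A real constraint model would additionally cover register renaming and loop interleaving, which the sketch leaves out.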

### Performance
@@ -51,9 +55,11 @@ and build from scratch, e.g. as follows (also available as [submodules/setup-ort
for convenience):

```
% apt install -y git build-essential python3-pip cmake swig
% git submodule init
% git submodule update
% cd submodules/or-tools
% git apply ../0001-Pin-pybind11_protobuf-commit-in-cmake-files.patch
% mkdir build
% cmake -S. -Bbuild -DBUILD_PYTHON:BOOL=ON
% make -C build -j8
@@ -270,4 +276,4 @@ The [examples](examples/naive) directory contains numerous exemplary assembly sn
`python3 example.py --examples={YOUR_EXAMPLE}`. See `python3 example.py --help` for the list of all available examples.

The use of SLOTHY from the command line is illustrated in [scripts/](scripts/) supporting the real-world optimizations
for the NTT, FFT and X25519 discussed in [Fast and Clean: Auditable high-performance assembly via constraint solving](https://eprint.iacr.org/2022/1303).
49 changes: 49 additions & 0 deletions docs/faq.md
@@ -0,0 +1,49 @@
---
layout: default
---

## Frequently asked questions

[back](index.md)

#### Is SLOTHY a peephole optimizer?

No. SLOTHY is a _fixed-instruction_ super-optimizer: It keeps instructions and optimizes
register allocation, instruction scheduling, and software pipelining. It is the developer's or another tool's
responsibility to map the workload at hand to the target architecture.

<!-- #### When should I use SLOTHY?

You may want to use SLOTHY on performance-critical workloads for which precise control over instruction-selection
is beneficial (e.g. because other code-generation techniques do not find ideal instruction sequences) or needed
(e.g. because some instructions or instruction patterns have to be avoided for security). -->

#### Is SLOTHY better than {name your favourite superoptimizer}?

Most likely, they serve different purposes. SLOTHY aims to do one thing well: Optimization _after_ instruction selection.
It is thus independent of and potentially combinable with superoptimizers operating at earlier stages of the code-generation process, such as [souper](https://github.com/google/souper) and [CryptOpt](https://github.com/0xADE1A1DE/CryptOpt).

#### Does SLOTHY support x86?

The core of SLOTHY is architecture- and microarchitecture-agnostic and can accommodate x86. As it stands, however,
there is no model of the x86 architecture. Feel free to build one!

#### Does SLOTHY support RISC-V?

As for x86.

#### Is SLOTHY formally verified?

No. Arguably, that wouldn't be a good use of time. The more relevant question is the following:

#### Is SLOTHY-generated code formally verified to be equivalent to the input code?

Not yet. SLOTHY runs a self-check confirming that input and output have isomorphic data flow graphs,
but pitfalls remain, such as bad user configurations allowing SLOTHY to clobber a register that's not
meant to be reserved. More work is needed for formal verification of the equivalence of input
and output.
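
To make the idea of "isomorphic data flow graphs" concrete, here is a minimal sketch of such a check on toy code. The instruction format, the function names `producers` and `selfcheck`, and the `perm` argument (the position each input instruction ends up at) are assumptions made for illustration; this is not SLOTHY's actual self-check.

```python
# Toy instruction format (hypothetical): (mnemonic, output register, inputs)

def producers(code):
    """For each instruction, list where each input comes from:
    the index of the producing instruction, or the register name
    itself if the value is an external input."""
    last_writer = {}
    result = []
    for idx, (_, out, ins) in enumerate(code):
        result.append([last_writer.get(reg, reg) for reg in ins])
        last_writer[out] = idx
    return result

def selfcheck(old, new, perm):
    """Check that `new` is `old` reordered by `perm` (perm[i] is the
    position of old instruction i in new), up to register renaming."""
    if len(old) != len(new) or sorted(perm) != list(range(len(old))):
        return False
    old_prod, new_prod = producers(old), producers(new)
    for i, j in enumerate(perm):
        if old[i][0] != new[j][0]:  # instructions must be unchanged
            return False
        # Map each producer index through the permutation; external
        # inputs (register names) must stay the same.
        expected = [perm[p] if isinstance(p, int) else p for p in old_prod[i]]
        if expected != new_prod[j]:
            return False
    return True

OLD = [
    ("ldr", "a", ["x0"]),
    ("mul", "b", ["a", "a"]),
    ("ldr", "c", ["x1"]),
    ("mul", "d", ["c", "c"]),
    ("add", "e", ["b", "d"]),
]
# Reordered and renamed version of OLD.
NEW = [
    ("ldr", "t0", ["x0"]),
    ("ldr", "t1", ["x1"]),
    ("mul", "t2", ["t0", "t0"]),
    ("mul", "t0", ["t1", "t1"]),
    ("add", "e", ["t2", "t0"]),
]
# Same as NEW, but the second mul overwrites t2 before the add reads it.
BAD = NEW[:3] + [("mul", "t2", ["t1", "t1"]), ("add", "e", ["t2", "t0"])]
PERM = [0, 2, 1, 3, 4]
```

With these inputs, `selfcheck(OLD, NEW, PERM)` accepts and `selfcheck(OLD, BAD, PERM)` rejects. Note that a check of this kind only sees registers named in the code, so it does not detect the pitfall mentioned above, where the output uses a register that surrounding code relies on.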

#### Why is my question not here?

Ping us! ([GitHub](https://github.com/slothy-optimizer/slothy/issues), or see [paper](https://eprint.iacr.org/2022/1303.pdf) for
contact information).
8 changes: 5 additions & 3 deletions docs/index.md
@@ -12,12 +12,14 @@ super-optimizes:
`SLOTHY` enables a development workflow where developers write 'clean' assembly by hand, emphasizing the logic of the
computation, while `SLOTHY` automates microarchitecture-specific micro-optimizations. Since `SLOTHY` does not change
instructions, and scheduling/allocation optimizations are tightly controlled through configurable and extensible
constraints, the developer keeps close control over the final assembly, while being freed from the most tedious and
readability- and verifiability-impeding micro-optimizations.
constraints, the developer keeps close control over the final assembly, while being freed from tedious
micro-optimizations.

See also [FAQ](faq.md)

#### Architecture/Microarchitecture support

`SLOTHY` is generic in the target architecture and microarchitecture. So far, it supports Cortex-M55 and Cortex-M85
`SLOTHY` is generic in the target architecture and microarchitecture. It currently supports Cortex-M55 and Cortex-M85
implementing Armv8.1-M + Helium, and Cortex-A55 and Cortex-A72 implementing
Armv8-A + Neon. Moreover, there is an experimental model for Cortex-X/Neoverse-V cores.

Binary file modified docs/slothy_logo.png
32 changes: 18 additions & 14 deletions example.py
@@ -25,31 +25,35 @@
# Author: Hanno Becker <hannobecker@posteo.de>
#

import argparse, logging, sys
from io import StringIO
import argparse
import logging
import sys

from slothy.slothy import Slothy
from slothy.core import Config
from slothy import Slothy, Config

import targets.arm_v81m.arch_v81m as Arch_Armv81M
import targets.arm_v81m.cortex_m55r1 as Target_CortexM55r1
import targets.arm_v81m.cortex_m85r1 as Target_CortexM85r1
import slothy.targets.arm_v81m.arch_v81m as Arch_Armv81M
import slothy.targets.arm_v81m.cortex_m55r1 as Target_CortexM55r1
import slothy.targets.arm_v81m.cortex_m85r1 as Target_CortexM85r1

import targets.aarch64.aarch64_neon as AArch64_Neon
import targets.aarch64.cortex_a55 as Target_CortexA55
import targets.aarch64.cortex_a72_frontend as Target_CortexA72
import slothy.targets.aarch64.aarch64_neon as AArch64_Neon
import slothy.targets.aarch64.cortex_a55 as Target_CortexA55
import slothy.targets.aarch64.cortex_a72_frontend as Target_CortexA72

target_label_dict = {Target_CortexA55: "a55",
Target_CortexA72: "a72",
Target_CortexM55r1: "m55",
Target_CortexM85r1: "m85"}

class ExampleException(Exception):
"""Exception thrown when an example goes wrong"""

class Example():
"""Common boilerplate for SLOTHY examples"""

def __init__(self, infile, name=None, funcname=None, suffix="opt",
rename=False, outfile="", arch=Arch_Armv81M, target=Target_CortexM55r1,
**kwargs):
if name == None:
if name is None:
name = infile

self.arch = arch
@@ -61,7 +65,7 @@ def __init__(self, infile, name=None, funcname=None, suffix="opt",
self.outfile = f"{infile}_{self.suffix}_{target_label_dict[self.target]}"
else:
self.outfile = f"{outfile}_{self.suffix}_{target_label_dict[self.target]}"
if funcname == None:
if funcname is None:
self.funcname = self.infile
subfolder = ""
if self.arch == AArch64_Neon:
@@ -1127,8 +1131,8 @@ def run_example(name, debug=False):
if e.name == name:
ex = e
break
if ex == None:
raise Exception(f"Could not find example {name}")
if ex is None:
raise ExampleException(f"Could not find example {name}")
ex.run(debug=debug)

for e in todo:
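
The pylint-driven changes in `example.py` (`== None` replaced by `is None`, and a bare `Exception` replaced by `ExampleException`) can be motivated with a small standalone sketch. `AlwaysEqual` and `find_example` are hypothetical names introduced here for illustration; only `ExampleException` mirrors a name from the diff.

```python
class ExampleException(Exception):
    """Exception thrown when an example goes wrong."""

class AlwaysEqual:
    """Hypothetical class with an overly permissive __eq__."""
    def __eq__(self, other):
        return True

def find_example(examples, name):
    """Return the first example called `name`, or raise ExampleException."""
    ex = None
    for e in examples:
        if e["name"] == name:
            ex = e
            break
    # `is None` tests identity and cannot be fooled by a custom __eq__.
    if ex is None:
        raise ExampleException(f"Could not find example {name}")
    return ex

obj = AlwaysEqual()
equal_to_none = (obj == None)   # True: __eq__ is consulted (pylint flags this)
identical_to_none = obj is None  # False: identity comparison
```

Raising a dedicated exception class also lets callers catch failures from the example driver specifically, rather than catching every `Exception`.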