Add opt_size feature to reduce .text segment size at the cost of perf… #9

Merged (1 commit) on Feb 24, 2022

@cr1901 cr1901 commented Feb 24, 2022

…ormance.

Analogous to jedisct1/rust-hmac-sha256#2, except several months late :D!

Benchmarks

Benchmarks were done targeting an Adafruit Feather RP2040, running at the default 125 MHz. My rustc info is as follows:

rustc 1.60.0-nightly (51126be1b 2022-01-24)
binary: rustc
commit-hash: 51126be1b260216b41143469086e6e6ee567647e
commit-date: 2022-01-24
host: x86_64-pc-windows-gnu
release: 1.60.0-nightly
LLVM version: 13.0.0

I used the following cargo profiles (dev == Debug) and varied the opt-level:

[profile.dev]
codegen-units = 1
debug = 2
debug-assertions = true
incremental = false
overflow-checks = true

[profile.release]
codegen-units = 1
debug = 2
debug-assertions = false
incremental = false
lto = 'fat'
overflow-checks = false

Results

| Profile | opt-level | opt_size? | .text size | uSecs | cargo bloat | .text diff | uSecs diff | bloat diff |
|---|---|---|---|---|---|---|---|---|
| Debug | 0 | No | 422040 | X | 68.0% 254.2KiB | | | |
| Debug | 0 | Yes | 345656 | X | 60.0% 179.7KiB | -18.1% | X | -29.3% |
| Debug | 1 | No | 150456 | 828569 | 68.4% 83.2KiB | | | |
| Debug | 1 | Yes | 139728 | 857174 | 65.2% 71.9KiB | -7.13% | +3.45% | -13.6% |
| Debug | 2 | No | 136508 | 717383 | 70.6% 79.1KiB | | | |
| Debug | 2 | Yes | 126848 | 771214 | 67.6% 69.0KiB | -7.08% | +7.50% | -12.77% |
| Debug | 3 | No | 196944 | 801755 | 80.8% 138.5KiB | | | |
| Debug | 3 | Yes | 128576 | 852571 | 68.4% 71.2KiB | -34.71% | +6.33% | -48.59% |
| Debug | "s" | No | 121396 | 685952 | 64.7% 62.2KiB | | | |
| Debug | "s" | Yes | 112180 | 732091 | 60.9% 52.8KiB | -7.59% | +6.73% | -12.64% |
| Debug | "z" | No | 124028 | 716984 | 63.2% 61.2KiB | | | |
| Debug | "z" | Yes | 112532 | 859824 | 58.3% 49.8KiB | -9.27% | +19.92% | -18.62% |
| Release | 1 | No | 111640 | 694665 | 61.8% 59.3KiB | | | |
| Release | 1 | Yes | 99444 | 744662 | 56.2% 47.1KiB | -12.26% | +7.20% | -20.57% |
| Release | 2 | No | 91116 | 785331 | 71.4% 59.1KiB | | | |
| Release | 2 | Yes | 79992 | 775112 | 67.0% 48.1KiB | -12.21% | -1.30% | -18.61% |
| Release | 3 | No | 149816 | 825196 | 84.5% 118.6KiB | | | |
| Release | 3 | Yes | 79848 | 864777 | 69.8% 50.2KiB | -46.70% | +4.80% | -57.67% |
| Release | "s" | No | 73420 | 659434 | 67.3% 44.0KiB | | | |
| Release | "s" | Yes | 62788 | 703593 | 61.1% 33.5KiB | -14.48% | +6.70% | -23.86% |
| Release | "z" | No | 84700 | 766108 | 40.6% 28.6KiB | | | |
| Release | "z" | Yes | 72916 | 797148 | 31.6% 18.6KiB | -13.91% | +4.05% | -34.97% |

  • An "X" means "USB enumeration timed out, so I couldn't get a measurement".
  • The cargo bloat field represents the cumulative .text and Size reported by running: cargo bloat --release --example ed22519-bench --filter ed25519_compact [--features=opt_size].
  • .text size is in bytes.
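
The three diff columns appear to be the plain relative change of each opt_size row against the row directly above it. A minimal sketch of that arithmetic, assuming that reading of the table (the helper name `pct_change` is hypothetical and not part of the PR):

```rust
/// Relative change from `base` to `new`, in percent.
/// Negative means `new` is smaller (or faster) than `base`.
fn pct_change(base: f64, new: f64) -> f64 {
    (new - base) / base * 100.0
}

fn main() {
    // Debug, opt-level 0: .text size in bytes without and with opt_size.
    let text_diff = pct_change(422_040.0, 345_656.0);
    // Debug, opt-level 1: run time in microseconds without and with opt_size.
    let time_diff = pct_change(828_569.0, 857_174.0);

    println!(".text diff: {:+.2}%", text_diff);
    println!("uSecs diff: {:+.2}%", time_diff);
}
```

The same function covers the bloat diff column when fed the two KiB values.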

Out of curiosity, I also wanted to see what would happen if I used a different opt-level for ed25519_compact compared to the other deps.

opt-level=3, ed25519_compact opt-level="s"

| opt_size? | .text size | uSecs | cargo bloat | .text diff | uSecs diff | bloat diff |
|---|---|---|---|---|---|---|
| No | 73764 | 659047 | 66.9% 44.1KiB | | | |
| Yes | 63132 | 725665 | 60.7% 33.7KiB | -14.41% | +10.11% | -23.58% |

opt-level="s", ed25519_compact opt-level="z"

| opt_size? | .text size | uSecs | cargo bloat | .text diff | uSecs diff | bloat diff |
|---|---|---|---|---|---|---|
| No | 72272 | 672932 | 66.1% 42.2KiB | | | |
| Yes | 60504 | 803764 | 58.6% 30.6KiB | -16.28% | +19.44% | -27.49% |
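
The PR does not say how the mixed opt-level runs were configured. One way to get a different opt-level for a single dependency is Cargo's per-package profile override; the fragment below is a sketch that assumes the release profile and the first combination above (opt-level=3 overall, "s" for ed25519-compact):

```toml
# Everything is built for speed by default...
[profile.release]
opt-level = 3

# ...except ed25519-compact, which is built for size.
[profile.release.package.ed25519-compact]
opt-level = "s"
```

The override key is the package name (with a hyphen), not the Rust crate identifier ed25519_compact.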

My takeaway from these results is that the opt_size feature can be a net win for both time and space (see Release, opt-level 2), but you need to run benchmarks to be sure. opt-level=3 does not appear to be a good idea for constrained environments, period.

Benchmark Code

I provide this for completeness only; I suggest using a project template and pasting the code below into it.

Source

#![no_std]
#![no_main]

use core::fmt::{self, Write};

use adafruit_feather_rp2040 as bsp;
use bsp::hal::{self,
    clocks::init_clocks_and_plls,
    pac,
    watchdog::Watchdog,
};
use cortex_m_rt::entry;
use defmt::*;
use defmt_rtt as _;
use ed25519_compact::{KeyPair, Seed, Noise};
use getrandom::register_custom_getrandom;
use panic_probe as _;
use usb_device::{class_prelude::*, prelude::*};
use usbd_serial::SerialPort;

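// Counter state for the dummy RNG below.
// Safety: no interrupts are used in this demo, so access is single-threaded.
static mut STATE: u8 = 0;

pub fn dummy_rng(buf: &mut [u8]) -> Result<(), getrandom::Error> {
    let mut iter_u8 = unsafe { STATE };

    for b in buf {
        *b = iter_u8;
        // Wrap instead of panicking when overflow-checks are enabled.
        iter_u8 = iter_u8.wrapping_add(1);
    }

    // `(unsafe { STATE }) = iter_u8;` does not compile: a block is a value
    // expression, so it yields a copy of STATE rather than an assignable place.
    unsafe { STATE = iter_u8 };
    Ok(())
}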

register_custom_getrandom!(dummy_rng);

// Slices don't implement fmt::Write, so use a newtype just for this application.
struct NumFormatter32([u8; 10]);

impl fmt::Write for NumFormatter32 {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        if s.len() > 10 {
            Err(fmt::Error)
        } else {
            let bytes = s.as_bytes();
            let correct_size_slice = &mut self.0[..s.len()];

            correct_size_slice.copy_from_slice(bytes);
            Ok(())
        }
    }
}

#[entry]
fn main() -> ! {
    // info!("Program start");
    let mut pac = pac::Peripherals::take().unwrap();
    let mut watchdog = Watchdog::new(pac.WATCHDOG);

    // External high-speed crystal on the board is 12 MHz
    let external_xtal_freq_hz = 12_000_000u32;
    let clocks = init_clocks_and_plls(
        external_xtal_freq_hz,
        pac.XOSC,
        pac.CLOCKS,
        pac.PLL_SYS,
        pac.PLL_USB,
        &mut pac.RESETS,
        &mut watchdog,
    )
    .ok()
    .unwrap();

    // Set up the USB driver
    let usb_bus = UsbBusAllocator::new(hal::usb::UsbBus::new(
        pac.USBCTRL_REGS,
        pac.USBCTRL_DPRAM,
        clocks.usb_clock,
        true,
        &mut pac.RESETS,
    ));

    // Set up the USB Communications Class Device driver
    let mut serial = SerialPort::new(&usb_bus);

    // Create a USB device with a fake VID and PID
    let mut usb_dev = UsbDeviceBuilder::new(&usb_bus, UsbVidPid(0x16c0, 0x27dd))
        .manufacturer("Fake company")
        .product("Serial port")
        .serial_number("TEST")
        .device_class(2) // from: https://www.usb.org/defined-class-codes
        .build();

    let timer = hal::Timer::new(pac.TIMER, &mut pac.RESETS);
    let start = timer.get_counter();

    // A message to sign and verify.
    let message = b"test";

    // Generates a new key pair using a random seed.
    // A given seed will always produce the same key pair.
    let key_pair = KeyPair::from_seed(Seed::default());

    // Computes a signature for this message using the secret part of the key pair.
    let signature = key_pair.sk.sign(message, Some(Noise::default()));

    // Verifies the signature using the public part of the key pair.
    crate::assert!(key_pair.pk.verify(message, &signature).is_ok());

    let end = timer.get_counter();

    let mut formatted_num = NumFormatter32([0; 10]);
    let _ = core::write!(&mut formatted_num, "{}", end - start);

    // We are done! Init USB and write results. Yes, this is a hack!
    let mut wrote_msg = false;
    loop {
        usb_dev.poll(&mut [&mut serial]);

        if !wrote_msg && timer.get_counter() >= 2_000_000 {
            wrote_msg = true;
            serial.write(b"ed22519 test took ").unwrap();
            serial.write(&formatted_num.0).unwrap();
            // The RP2040 timer counts microseconds, not CPU cycles.
            serial.write(b" microseconds \r\n").unwrap();
        }
    }
}
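
NumFormatter32 above copies every write_str call to offset 0 and the whole 10-byte array is sent over serial, so unused slots go out as NUL bytes. A possible variant that tracks how many bytes have been written is sketched below for a host build; `FixedBuf` and `as_bytes` are hypothetical names and are not part of the benchmark code.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-capacity text buffer that appends instead of overwriting.
struct FixedBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    fn new() -> Self {
        Self { bytes: [0; N], len: 0 }
    }

    /// Only the bytes written so far, without trailing zero padding.
    fn as_bytes(&self) -> &[u8] {
        &self.bytes[..self.len]
    }
}

impl<const N: usize> fmt::Write for FixedBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Reject writes that would run past the end of the buffer.
        let end = self.len.checked_add(s.len()).ok_or(fmt::Error)?;
        if end > N {
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

fn main() {
    let mut buf = FixedBuf::<10>::new();
    // Format a measurement the same way the benchmark does.
    write!(buf, "{}", 828_569u64).unwrap();
    assert_eq!(buf.as_bytes(), &b"828569"[..]);
    println!("{} bytes used", buf.as_bytes().len());
}
```

In the benchmark, passing such a slice to serial.write would send only the digits.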

Cargo.toml Deps

[dependencies]
cortex-m = "0.7.3"
cortex-m-rt = "0.7.0"
embedded-hal = { version = "0.2.5", features=["unproven"] }
embedded-time = "0.12.0"

defmt = "0.3.0"
defmt-rtt = "0.3.0"
panic-probe = { version = "0.3.0", features = ["print-defmt"] }
adafruit-feather-rp2040 = "0.1.0"
usb-device = "0.2.8"
usbd-serial = "0.1.1"
usbd-hid = "0.5.1"
getrandom = { version = "0.2.4", features=["custom"] }
ed25519-compact = { version = "1.0.11", default_features = false, features=["random"], path = "C:\\msys64\\home\\William\\Projects\\MSP430\\rust-ed25519-compact"}

[features]
opt_size = ["ed25519-compact/opt_size"]

@jedisct1
Owner

Awesome, thank you!
