Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code size optimizations #257

Merged
merged 2 commits into from
Nov 23, 2020
Merged

Code size optimizations #257

merged 2 commits into from
Nov 23, 2020

Conversation

Dirbaio
Copy link
Contributor

@Dirbaio Dirbaio commented Nov 14, 2020

  • Replace the dyn Write with an extra Write function in Logger trait. This saves quite a bit of space because now many Formatter methods no longer require access to self, which allows nice optimizations. -- MOVED from this PR into Replace Write trait with a write fn in Logger. #258
  • All logs do a timestamp call, then write the log string with istr, then write the timestamp with leb64. Extract this to a single header function. Replaces 3 calls with 1, though most of the savings here come from #[inline(never)] because the compiler was inlining timestamp.
  • Add #[inline(never)] to acquire and release because the compiler was inlining them.

With these changes, defmt code size goes from 62kb to 16kb In my project's firmware. This is a 73,8% code size reduction.


 total  defmt   savings  sav %   
114824                            No defmt (baseline)
176868  62044                     defmt main
159780  44956    17088   27.5%    write
148956  34132    10824   24.1%    write, header
131100  16276    17856   52.3%    write, header, noinline
                 45768   73.8%    total

(numbers are flash bytes, .text + .rodata)

build env:
rustc 1.49.0-nightly (cf9cf7c92 2020-11-10)
opt-level='s' lto='fat' debug-assertions=false codegen-units=1
--target thumbv7em-none-eabi -Zbuild-std=core -Zbuild-std-features=panic_immediate_abort

Generated code for this function:

fn foo() {
    info!("Foo!");
}

Before:

    0003d302 f0 b5            push                 { r4, r5, r6, r7, lr }
    0003d304 03 af            add                  r7,sp,#0xc
    0003d306 2d e9 00 0b      push                 { r8, r9, r11  }
    0003d30a 84 b0            sub                  sp,#0x10
    0003d30c 4d f2 60 71      movw                 r1,#0xd760
    0003d310 c2 f2 01 01      movt                 r1,#0x2001
    0003d314 08 68            ldr                  r0,[r1,#0x0]=>embassy::time::CLOCK::h6bf0607dab71a52c      = ??
    0003d316 00 28            cmp                  r0,#0x0
    0003d318 43 d0            beq                  LAB_0003d3a2
    0003d31a 49 68            ldr                  r1,[r1,#0x4]=>h6bf0607dab71a52c+4                          = null
    0003d31c c9 68            ldr                  r1,[r1,#0xc]
    0003d31e 88 47            blx                  r1
    0003d320 4d f2 58 76      movw                 r6,#0xd758
    0003d324 04 46            mov                  r4,r0
    0003d326 ef f3 10 80      mrs                  r0,primask
    0003d32a 72 b6            cpsid                i
    0003d32c c2 f2 01 06      movt                 r6,#0x2001
    0003d330 0d 46            mov                  r5,r1
    0003d332 31 78            ldrb                 r1,[r6,#0x0]=>_ZN16defmt_rtt_target5TAKEN17h1f0c4be8515c   = ??
    0003d334 00 f0 01 00      and                  r0,r0,#0x1
    0003d338 09 06            lsl                  r1,r1,#0x18
    0003d33a 02 d0            beq                  LAB_0003d342
    0003d33c 68 bb            cbnz                 r0,LAB_0003d39a
    0003d33e 62 b6            cpsie                i
    0003d340 2b e0            b                    LAB_0003d39a
                             LAB_0003d342                                    XREF[1]:     0003d33a(j)  
    0003d342 4d f2 59 79      movw                 r9,#0xd759
    0003d346 01 21            mov                  r1,#0x1
    0003d348 80 f0 01 00      eor                  r0,r0,#0x1
    0003d34c c2 f2 01 09      movt                 r9,#0x2001
    0003d350 31 70            strb                 r1,[r6,#0x0]=>_ZN16defmt_rtt_target5TAKEN17h1f0c4be8515c   = ??
    0003d352 0d f1 04 08      add.w                r8,sp,#0x4
    0003d356 89 f8 00 00      strb.w               r0,[r9,#0x0]=>_ZN16defmt_rtt_target17INTERRUPTS_ACTIVE17   = ??
    0003d35a 4f f4 00 60      mov.w                r0,#0x800
    0003d35e 03 90            str                  r0,[sp,#local_24]
    0003d360 42 f2 50 00      movw                 r0,#0x2050
    0003d364 c0 f2 05 00      movt                 r0,#0x5
    0003d368 40 f2 6d 01      movw                 r1,#0x6d
    0003d36c 02 90            str                  r0=>CORE_PERIPHERALS,[sp,#local_28]                        = 
    0003d36e 42 f2 50 00      movw                 r0,#0x2050
    0003d372 c0 f2 05 00      movt                 r0,#0x5
    0003d376 c0 f2 00 01      movt                 r1,#0x0
    0003d37a 01 90            str                  r0=>CORE_PERIPHERALS,[sp,#local_2c]                        = 
    0003d37c 40 46            mov                  r0,r8
    0003d37e 0a f0 62 f9      bl                   defmt::Formatter::istr::h511835e8c9f38eb5                  undefined h511835e8c9f38eb5()
    0003d382 40 46            mov                  r0,r8
    0003d384 22 46            mov                  r2,r4
    0003d386 2b 46            mov                  r3,r5
    0003d388 0a f0 bd f8      bl                   defmt::Formatter::leb64::h0268c27e9787be3a                 undefined h0268c27e9787be3a()
    0003d38c 00 20            mov                  r0,#0x0
    0003d38e 30 70            strb                 r0,[r6,#0x0]=>_ZN16defmt_rtt_target5TAKEN17h1f0c4be8515c   = ??
    0003d390 99 f8 00 00      ldrb.w               r0,[r9,#0x0]=>_ZN16defmt_rtt_target17INTERRUPTS_ACTIVE17   = ??
    0003d394 00 06            lsl                  r0,r0,#0x18
    0003d396 00 d0            beq                  LAB_0003d39a
    0003d398 62 b6            cpsie                i
                             LAB_0003d39a                                    XREF[3]:     0003d33c(j), 0003d340(j), 
                                                                                          0003d396(j)  
    0003d39a 04 b0            add                  sp,#0x10
    0003d39c bd e8 00 0b      pop.w                { r8, r9, r11 }
    0003d3a0 f0 bd            pop                  { r4, r5, r6, r7, pc }
                             LAB_0003d3a2                                    XREF[1]:     0003d318(j)  
    0003d3a2 fe de            udf                  #0xfe
    0003d3a4 fe               ??                   FEh
    0003d3a5 de               ??                   DEh

After:

    00037842 80 b5            push                 { r7, lr }
    00037844 6f 46            mov                  r7,sp
    00037846 08 f0 ff fb      bl                   defmt::export::acquire::h8ae560551824f2bc                  undefined h8ae560551824f2bc()
    0003784a 58 b1            cbz                  r0,LAB_00037864
    0003784c 40 f2 6a 00      movw                 r0,#0x6a
    00037850 c0 f2 00 00      movt                 r0,#0x0
    00037854 08 f0 cc fc      bl                   defmt::Formatter::istr::h8532c58e5c1eb886                  undefined h8532c58e5c1eb886()
    00037858 08 f0 09 fd      bl                   defmt::Formatter::timestamp::h6f4ec153907f6d4d             undefined h6f4ec153907f6d4d()
    0003785c bd e8 80 40      pop.w                { r7, lr }=>local_8
    00037860 08 f0 0c bc      b.w                  defmt::export::release::h57dd438e17359598                  undefined h57dd438e17359598()
                             -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
                             LAB_00037864                                    XREF[1]:     0003784a(j)  
    00037864 80 bd            pop                  { r7, pc }

@Dirbaio Dirbaio force-pushed the codesize branch 3 times, most recently from d5f003e to 5e30729 Compare November 14, 2020 02:04
@jonas-schievink
Copy link
Contributor

Really nice! We might want to hold off on the breaking changes though, since we probably want to wait a bit until releasing 0.2.0, so if you can split them into a separate PR that'd be great!

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Nov 15, 2020

Done!

@@ -421,10 +421,8 @@ fn log(level: Level, ts: TokenStream) -> TokenStream {
if #logging_enabled {
match (#(&(#args)),*) {
(#(#pats),*) => {
let ts = defmt::export::timestamp();
if let Some(mut _fmt_) = defmt::export::acquire() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make the acquire part of shared code, by doing something like:

impl Formatter {
    pub fn start_frame(s: &Str) -> Option<Self> {
        let ts = timestamp();
        let mut fmt = acquire();
        if let Some(fmt) = &mut fmt {
            fmt.istr(s);
            fmt.leb64(ts);
        }

        fmt
    }
}

As a bonus, this keeps the order of operations like it was before (timestamp, then acquire).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably we'd want to move the #[inline(never)] to start_frame, since we should only have a single call site to acquire remaining.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea. What do we do with winfo! then? It needs to encode the header but doesn't create its own Formatter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, good question. Ideally I'd like to get rid of it. I might look into doing that tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this PR as-is for now – we can always land further improvements later.

@jonas-schievink jonas-schievink added the pr waits on: review Pull Request is waiting on review label Nov 16, 2020
@Lotterleben Lotterleben requested review from jonas-schievink and removed request for Lotterleben November 23, 2020 12:05
@jonas-schievink jonas-schievink merged commit b22578f into knurling-rs:main Nov 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr waits on: review Pull Request is waiting on review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants