Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Write trait with a write fn in Logger. #258

Closed
wants to merge 1 commit into from

Conversation

Dirbaio
Copy link
Contributor

@Dirbaio Dirbaio commented Nov 15, 2020

The "remove dyn Write" part of of #257 contained breaking changes, so it's extracted here as a separate PR. See that PR for details.

@Dirbaio Dirbaio mentioned this pull request Nov 15, 2020
Dirbaio added a commit to akiles/defmt-rtt-target that referenced this pull request Nov 15, 2020
@jonas-schievink jonas-schievink added the status: blocked Blocked on another issue, or on upstream libraries label Nov 16, 2020
@Dirbaio Dirbaio force-pushed the codesize2 branch 2 times, most recently from e0f0c9e to dc01395 Compare November 23, 2020 21:16
@japaric japaric added this to the v0.2.0 milestone Nov 24, 2020
@japaric
Copy link
Member

japaric commented Nov 24, 2020

triage: added to the 0.2.0 milestone.
status: blocked on deciding a set of new features and changes for the 0.2.0 release.
These breaking changes need to be evaluated against other changes that will be included in the 0.2.0 release

@japaric japaric added the breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version label Nov 27, 2020
@japaric japaric removed this from the v0.2.0 milestone Nov 27, 2020
@Dirbaio Dirbaio force-pushed the codesize2 branch 2 times, most recently from 84930c2 to f553e86 Compare December 1, 2020 16:06
@japaric
Copy link
Member

japaric commented Dec 1, 2020

I was wondering if it's still possible to implement a lock-free logger with the proposed trait changes and it seems possible but these changes may impact performance negatively (don't know by how much though).

One way to implement such a logger on Cortex-M is to have one RTT channel per priority level. With this design interrupts running at different priorities don't need to lock (disable interrupts / cause priority inversion) a single logger, instead they grab their interrupt-priority-local (similar to a thread-local) logger.

With the current trait, the implementation of that lock-free logger looks like this (off the top of my head: may not be correct / accurate):

// an RTT channel
struct Channel { taken: AtomicBool, /* .. */ }
impl defmt::Write for Channel { /* .. */ }

static mut CHANNELS: [Channel; N] = [/* .. */];

unsafe fn grab() -> Option<&mut Channel> {
    let static_priority = get_static_priority();
    // omitted: scaling priority down to a logical index
    let channel = CHANNELS.get_mut(index);
    // omitted: check `taken` flag; return `None` if already taken
}

impl defmt::Logger for Logger {
    fn acquire() -> Option<NonNull<dyn defmt::Write>> {
        let channel = unsafe { acquire() };
        // omitted: set `taken` flag; cast into trait object
    }
    fn release(channel: Option<NonNull<dyn defmt::Write>>) {
        // omitted: downcast `channel` to `Channel` then clear `taken`
    }
}

With the proposed Logger trait changes, the grab operation (or something like it?) needs to be called on each Logger::write call because Logger::write doesn't get a handle to the interrupt-priority-local logger (the dyn Write trait object).

impl defmt::Logger for Logger {
    unsafe fn write(bytes: &[u8]) {
        // no handle passed as an argument; logger needs to be retrieved again
        if let Some(channel) = grab() {
            channel.write() // Channel::write
        }
    }
}

So that's extra overhead per write operation.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Dec 1, 2020

Indexing into CHANNELS should be just a handful of asm instructions, which I think is comparable to the overhead of doing a write call through a vtable.

Aside from that, this new API allows for inlining write, plus in many cases it allows completely optimizing away the entire Formatter.

I'd be very surprised if it's slower all things considered.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Dec 1, 2020

Also, your example can be optimized a bit more: it's enough to check TAKEN in acquire instead of every write.

EDIT added code

static mut TAKEN: [bool; N] = [/* .. */];
static mut CHANNELS: [UpChannel; N] = [/* .. */];

unsafe fn logical_priority() -> usize {
    let static_priority = get_static_priority();
    // omitted: scaling priority down to a logical index
    let logical_priority = ...
    logical_priority
}

impl defmt::Logger for Logger {
    fn acquire() -> bool {
        let p = logical_priority();
        if TAKEN[p] {
            return false;
        }
        TAKEN[p] = true;
        true
    }

    fn release() {
        let p = logical_priority();
        TAKEN[p] = false;
    }

    unsafe fn write(bytes: &[u8]) {
        let p = logical_priority();
        CHANNELS[p].write(bytes);
    }
}

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Dec 1, 2020

Ran the numbers again on my project. This is still 54% code size savings!

               total  defmt
no defmt      115922
defmt main    151746  35824
defmt nowrite 132378  16456  

@jonas-schievink
Copy link
Contributor

Thinking about this, could we make write a safe function by passing a &mut self to it and returning Option<Self> from acquire?

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Jan 20, 2021

That would be nearly the same as the current code in main branch, only folding Writer and Logger into the same trait: defmt would have to hold a dyn Logger instead of a dyn Writer. I don't think it'd realize any code-size savings.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Mar 2, 2021

This is now heavily bitrotten. I was hoping this could be in 0.2 but nope :(

Is there still interest in doing this? @japaric @jonas-schievink

@jonas-schievink
Copy link
Contributor

Yeah, I do think we want this, or something like this. I don't currently have the time to come up with the best solution here though.

@Dirbaio Dirbaio force-pushed the codesize2 branch 2 times, most recently from fc39c14 to 143616e Compare March 2, 2021 18:00
@Dirbaio
Copy link
Contributor Author

Dirbaio commented Mar 2, 2021

Rebased (rewrote) on latest master. Here are the numbers. This time reduction is only 27%. I still haven't investigated why. Maybe it's other kind of bloat introduced in 0.2 that doesn't get removed by this? For example the finalize call is new, but it's probably something else.

                total  defmt  win
no defmt       120636
defmt main     166556  45920
defmt nowrite  154356  33720 -27%
#[inline(never)]
fn lol(x: u32) {
    info!("defmt test {=u32}", x);
}

before:

                             _ZN11application3sys3lol17hcf2450099b262064E    XREF[2]:     0003a3aa(c), 
                             application::application::sys::lol                           .debug_frame::00001e6c(*)  
    00039c04 b0 b5            push                 { r4, r5, r7, lr }
    00039c06 02 af            add                  r7,sp,#0x8
    00039c08 8a b0            sub                  sp,#0x28
    00039c0a 01 ad            add                  r5,sp,#0x4
    00039c0c 28 46            mov                  r0,r5
    00039c0e 0c f0 3e f9      bl                   defmt::export::acquire                                     Option<defmt--InternalFormatter>
    00039c12 9d f8 0e 00      ldrb.w               r0,[sp,#local_2a]
    00039c16 02 28            cmp                  r0,#0x2
    00039c18 1b d0            beq                  LAB_00039c52
    00039c1a 04 ac            add                  r4,sp,#0x10
    00039c1c 95 e8 0e 00      ldm.w                r5=>local_34,{ r1, r2, r3 }
    00039c20 20 46            mov                  r0,r4
    00039c22 0e c0            stmia                r0!=>local_28,{  r1, r2, r3 }
    00039c24 20 46            mov                  r0,r4
    00039c26 0c 49            ldr                  r1,[DAT_00039c58]                                          = 00000062h
    00039c28 0c f0 66 fa      bl                   defmt::InternalFormatter::header                           void header(InternalFormatter * 
    00039c2c 7b 20            mov                  r0,#0x7b
    00039c2e 04 22            mov                  r2,#0x4
    00039c30 07 90            str                  r0,[sp,#local_1c]
    00039c32 dd e9 04 01      ldrd                 r0,r1,[sp,#local_28]
    00039c36 cb 68            ldr                  r3,[r1,#0xc]
    00039c38 07 ad            add                  r5,sp,#0x1c
    00039c3a 29 46            mov                  r1,r5
    00039c3c 98 47            blx                  r3
    00039c3e 20 46            mov                  r0,r4
    00039c40 0c f0 54 fa      bl                   defmt::InternalFormatter::finalize                         void finalize(InternalFormatter 
    00039c44 94 e8 0e 00      ldm.w                r4=>local_28,{ r1, r2, r3 }
    00039c48 28 46            mov                  r0,r5
    00039c4a 0e c0            stmia                r0!=>local_1c,{  r1, r2, r3 }
    00039c4c 28 46            mov                  r0,r5
    00039c4e 0c f0 23 f9      bl                   defmt::export::release                                     void release(InternalFormatter *
                             LAB_00039c52                                    XREF[1]:     00039c18(j)  
    00039c52 0a b0            add                  sp,#0x28
    00039c54 b0 bd            pop                  { r4, r5, r7, pc }

after:

                             _ZN11application3sys3lol17h66aa1823a336ffe0E    XREF[2]:     0003880c(c), 
                             application::sys::lol::h66aa1823a336ffe0                     .debug_frame::00001ddc(*)  
    000380dc dc b5            push                 { r2, r3, r4, r6, r7, lr }
    000380de 04 af            add                  r7,sp,#0x10
    000380e0 0b f0 0b f9      bl                   defmt::export::acquire::h4f7b9aa9db4e8941                  undefined h4f7b9aa9db4e8941()
    000380e4 98 b1            cbz                  r0,LAB_0003810e
    000380e6 00 20            mov                  r0,#0x0
    000380e8 09 49            ldr                  r1,[DAT_00038110]                                          = 00000062h
    000380ea 8d f8 06 00      strb.w               r0,[sp,#local_12]
    000380ee 4f f4 00 60      mov.w                r0,#0x800
    000380f2 ad f8 04 00      strh.w               r0=>DAT_00000800,[sp,#local_14]
    000380f6 01 ac            add                  r4,sp,#0x4
    000380f8 20 46            mov                  r0,r4
    000380fa 0b f0 fa f9      bl                   defmt::InternalFormatter::header::h2ca9585eed7f8b9c        undefined h2ca9585eed7f8b9c()
    000380fe 7b 20            mov                  r0,#0x7b
    00038100 0b f0 70 f9      bl                   defmt::InternalFormatter::u32::h2b2e0abc8a8ec278           undefined h2b2e0abc8a8ec278()
    00038104 20 46            mov                  r0,r4
    00038106 0b f0 ee f9      bl                   defmt::InternalFormatter::finalize::h3d434c2c09091648      undefined h3d434c2c09091648()
    0003810a 0b f0 f8 f8      bl                   defmt::export::release::hfb47944ac52b8ac4                  undefined hfb47944ac52b8ac4()
                             LAB_0003810e                                    XREF[1]:     000380e4(j)  
    0003810e dc bd            pop                  { r2, r3, r4, r6, r7, pc }

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Jun 11, 2021

Closing in favor of #505

@Dirbaio Dirbaio closed this Jun 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version status: blocked Blocked on another issue, or on upstream libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants