Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can I tell when a deferred file read is finished? #9

Open
kcsmnt0 opened this issue Jan 22, 2024 · 14 comments
Open

How can I tell when a deferred file read is finished? #9

kcsmnt0 opened this issue Jan 22, 2024 · 14 comments

Comments

@kcsmnt0
Copy link

kcsmnt0 commented Jan 22, 2024

I'm all up and running with the core and having fun putting together a little Slint app, but I've hit a roadblock with the deferred file loading API. This might be a silly question, but I've been staring at the docs and experimenting on my Pocket for a couple days and I'm still stuck on it.

I'm trying to use deferred loading to read individual parts of a file that might be too big to fit in the core's RAM all at once. My slot in data.json looks like this:

{
  "name": "file",
  "id": 1,
  "required": true,
  "parameters": "0x89",
  "deferload": true
}

The parameters value sets flag 0 (user-reloadable), flag 3 (read-only), and flag 7 (restart of core before/after loading).

My Rust code to read a piece of data starts like this:

// bridge: APF_BRIDGE
// buffer: &mut [u8]
// offset: u32
bridge.ram_data_address.write(|w| w.ram_data_address().variant(buffer as *mut [u8] as *mut u8 as u32));
bridge.transfer_length.write(|w| w.transfer_length().variant(buffer.len() as u32));
bridge.data_offset.write(|w| w.data_offset().variant(offset);
bridge.request_read.write(|w| w.request_read().bit(true));

If I try println!("{:?}", buffer) immediately after setting request_read, I get the original unmodified buffer contents; if I have the program do some arbitrary pointless work for long enough before inspecting buffer, then it has the file contents that I expect. That seems like a race condition, which makes sense with my understanding of what's going on under the hood, but I can't figure out how to tell when the program has waited long enough.

None of these seem to work reliably as signals to block on:

  • status is cleared on read, so reading a 0 from it means that either no read is in progress or a read I've already observed is still in progress. I haven't figured out any way to distinguish between these two possibilities.
  • command_result_code is always 0 unless an invalid read happens, so if my code never does an invalid read, then command_result_code never changes at all. It also seems to update before a read is finished: I've tried deliberately doing an invalid read to set command_result_code to a nonzero value before I do the actual read, but waiting for command_result_code to return to 0 still didn't resolve my race condition.
  • current_address seems to update when the bridge starts to copy a chunk of the data, not when it finishes copying the chunk of data.

Right now I'm running into this problem in a larger project with some graphics and input code that are irrelevant to this particular issue, but I can boil it down to a minimal code example demonstrating the race condition if that's helpful. I've pored over the Analogue target command docs and I don't see anything relevant in there that isn't mirrored in the Rust API for this core, so I'm guessing I'm probably just missing something about how this is all supposed to fit together? Thanks in advance!

@agg23
Copy link
Owner

agg23 commented Jan 22, 2024

Unless I misunderstood what you said about status, you were on the right track:

pub fn check_op_complete() -> bool {
unsafe {
let peripherals = pac::Peripherals::steal();
peripherals.APF_BRIDGE.status.read().bits() == 1
}
}
pub fn block_op_complete() {
while !File::check_op_complete() {
// Loop
}
}

The status bit goes high on completion of an operation, and it’s cleared on read. So if you poll that register, and receive a 1, you know that whatever IO operation you started has now completed.

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 22, 2024

Hm, waiting for status to read 1 also doesn't reliably resolve the race condition for me. The core docs say that status reads 1 when it's currently transferring a file and 0 when it's not transferring - I took that to mean that it should be 0 when the IO operation has completed?

@agg23
Copy link
Owner

agg23 commented Jan 22, 2024

I'm sorry, that comment is wrong (looks like maybe I forgot a word?). complete_trigger is pulsed high for one cycle when target_dataslot_done pulses high (which is directly from the APF sample code, but it's sent when the transfer is complete).

.apf_bridge_complete_trigger(target_dataslot_done_s && ~prev_target_dataslot_done_s),

When complete_trigger is pulsed high, status is set high, and it's only cleared once read, so it should act as a definitive way to notify your software of read complete.

openfpga-litex/litex/csr.py

Lines 105 to 109 in a3fbabc

If(
bridge_pins.complete_trigger & ~self.prev_complete_trigger,
# Push status high
self.status.status.eq(1),
),

Unfortunately I haven't released my MP3 player code, but I could show you that I'm using block_op_complete and it seems to be working just fine for me. If you could give me more information about your race condition, even just a few lines where you're requesting the read and waiting for status, I might be able to help more.

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 23, 2024

Thanks for that clarification! I think I might have found my answer in your code, but just in case, here's a minimal test program that's giving me unexpected results, with the same data.json as in my previous comment:

#![no_std]
#![no_main]

use alloc::vec;
use core::panic::PanicInfo;
use core::mem::MaybeUninit;
use embedded_alloc::Heap;
use litex_openfpga::*;
use litex_pac::Peripherals;
use riscv_rt::entry;

extern crate alloc;

pub const HEAP_SIZE: usize = 32 * 1024 * 1024;
pub static mut HEAP_MEM: [MaybeUninit<u8>; HEAP_SIZE] = [MaybeUninit::uninit(); HEAP_SIZE];

#[global_allocator]
pub static HEAP: Heap = Heap::empty();

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("Panic:");
    println!("{info}");
    loop {}
}

#[entry]
fn main() -> ! {
    unsafe { HEAP.init(HEAP_MEM.as_ptr() as usize, HEAP_SIZE) };

    println!("start");

    let bridge = unsafe { Peripherals::steal().APF_BRIDGE };
    bridge.slot_id.write(|w| w.slot_id().variant(1));

    let length = bridge.file_size.read().bits();
    let mut buffer = vec![0u8; length as usize];

    bridge.ram_data_address.write(|w| w.ram_data_address().variant(buffer.as_mut_slice() as *mut [u8] as *mut u8 as u32));
    bridge.transfer_length.write(|w| w.transfer_length().variant(length));
    bridge.data_offset.write(|w| w.data_offset().variant(0));
    bridge.request_read.write(|w| w.request_read().bit(true));

    while !bridge.status.read().status().bit() {
        println!("waiting");
    }

    println!("buffer contents: {:X?}", buffer);

    loop { }
}

This shows the buffer as being full of zeroes regardless of what file I select. If I throw in some time-wasting loops, I can end up printing a buffer that's half file data and half zeroes.

I think I misdiagnosed the race condition, though: I thought it was a race on reading bytes from the file, but I notice that you're blocking on the slot change, which I wasn't doing in my code. When I add that into this test code, it seems to work correctly. (I haven't had time to try it out in my larger project yet.) I suppose the half-full buffers I was observing might be caused by the data slot switching mid-read, not by observing the buffer before the read is finished. Just to be precise, does reading from slot_id automatically block until a pending write to slot_id has finished?

I should admit that I didn't notice the File API in the litex_openfpga crate before you pointed it out here, which is why I've been trying to hack this together from raw parts 🤦‍♀️ I've been looking over all the relevant internet-hosted docs, but I forgot to check the local generated docs for other APIs. It's been an informative detour for me, at least - thanks again for your patience helping me out with this!

@agg23
Copy link
Owner

agg23 commented Jan 23, 2024

Just to be precise, does reading from slot_id automatically block until a pending write to slot_id has finished?

The CPU is a single core, so it will prevent a read from slot_id until the prior write has completed. What happens in this case is a write to slot_id updates the address of a single cycle RAM in the FPGA, which is storing the current size information for that slot. It takes a few cycles between the write and the read for the data to be active (and the read could even get pipelined), so inserting an additional read is a bit of a hack to give us enough time to guarantee everything has completed.

I probably should go back and fix that.

Because of that, you might not be reading the correct file_size in your example, depending on what slot your core is starting with.


My working file streaming example (for the MP3 player) is the simple construction you would expect from my File API:

File::request_read(0, 0x100000 as u32, MP3_BANK_ADDRESS_A as u32, 1);

File::block_op_complete();

That being said, I don't really understand why your code isn't working, unless I missed something...

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 23, 2024

Yeah, I'm still not quite sure why that example code doesn't seem to read anything - slot 0 has the program binary in it, so even if slot_id fails to update from its initial value of 0 in time, I should be getting something read into the buffer. But without blocking on the slot switch, it seems to read nothing at all - the contents of the buffer are just whatever I initially put in there.

In case it matters at all, I'm testing this on a pink Pocket from the drop a couple months ago, running OS 2.0 (just haven't gotten around to updating to 2.1 yet). I'm happy to run any test code that you think might be helpful to figure out if something weird is going on with this particular device.

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 23, 2024

Well, I'm still running into this problem in the bigger project. I went ahead and uploaded the code in progress in case you want to look at it - sorry that it's still messy!

Here's my current file reading code:

https://github.com/kcsmnt0/pocket-knife/blob/70eec25db72025c481a762d279cf95cd966e8946/backend-pocket/src/pocket.rs#L105-L129

I'm definitely not 100% sure that the code around this is bug-free, but it logs what I expect right up until the buffer turns out to be unchanged. The file size seems to read correctly, it's just the content of the file that doesn't load.

I'm using a big slow microSD card - I wonder if that's relevant? I'm going to test it out with some other microSD cards when I get a chance.

Quick edit: I should clarify that the codebase I linked does compile and run correctly except for the deferred file loading stuff; it's unpolished and there might be a few bugs lurking, but it's mostly "finished" as a minimal proof of concept. On my Pocket, the file loading issue makes it panic almost immediately, but if I hack around that problem with hardcoded data the rest of the app seems to work fine.

@agg23
Copy link
Owner

agg23 commented Jan 25, 2024

You have your data_offset and storage_address values mixed up. data_offset is the offset in the file on disk, and storage_address is where we're reading or writing to. So you should have:

File::request_read(
    *self.seek_position.borrow(),
    buffer.len() as u32,
    buffer as *mut [u8] as *mut u8 as u32,
    1,
);

Beware also that you probably want to take into consideration the length of the file when setting the read_length. buffer.len() + self.seek_position might go off the end of the file.

Your buffer pointer cast seems unnecessarily complicated, but I don't want to think about it now :P

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 25, 2024

Ah, thanks for catching that! I'll try it out when I get home tonight but that'll presumably fix my issue. I was deliberately ignoring the possibility of running off the end of the file just because I've been testing with known-good files - there's pretty much no real error handling in the codebase yet, despite all the error-related boilerplate. I sure hope that pointer cast can be simplified, that's part of the polishing that I haven't done yet :) I'm still a bit new to Rust, just trying to hack something out that works before I go and try to make it all pretty!

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 25, 2024

Nope... That was clearly a bug, but apparently it's not the bug. I'm still seeing an unmodified buffer:

read: buffer address 0x43fffeec, length 20, data address 0
read requested
buffer bytes: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

The actual first 20 bytes of the file are the bytes of the string "Pocket Knife Archive" (just double-checked in a hex editor). I think I did have the arguments in the right place when I was writing to the APF_BRIDGE registers directly instead of using the File API, so I guess I'm still running into whatever was going wrong in that previous minimal example (the one that didn't block on the slot change).

I've pushed the argument order fix to the repo. Let me know if you have any thoughts about what else to try!

@agg23
Copy link
Owner

agg23 commented Jan 27, 2024

For some reason I can't get your project to build (litex-pac is complaining about the externals TIMER0 and UART not existing). Odd because I have structured my own projects in the same way. I didn't care to look into this too much.

I tested your basic read scenario and it worked fine for me. I validated that 20 bytes works (I was worried that it would be too small), then noticed again the thing I first noticed and worried about; you declare a stack managed buffer (&mut [u8]) and have the core directly write to it.

Sure enough, when I attempt to write to the buffer (the nicer pointer type is simply output_slice.as_ptr() as u32 by the way), I only get 0s out.

From Rust's perspective, this slice is created (probably with all 0s), never touched, then read. This can be optimized out to just print/read 0 wherever. As far as I know there's no way to mark a slice read volatile, i.e. force reading this memory address, instead of optimizing it out. Unless you can teach Rust how to do that, you have to statically declare a chunk of memory as your "transfer" buffer, and call from_raw_parts to get a usable slice.

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 27, 2024

Ahh, that totally makes sense - it's not a race condition that's been giving me unpredictable results, it's an overeager compiler optimization either firing or not firing depending on the code around the file read. I can work with that! Thanks again for talking through all this with me :)

@kcsmnt0 kcsmnt0 closed this as completed Jan 27, 2024
@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 29, 2024

Yep, reading into an unsafe static buffer seems to work reliably for me. This also works for reading into a local &mut [u8]:

File.request_read(/* args */);
File.block_op_complete();
for addr in buffer.iter_mut() {
    *addr = unsafe { core::ptr::read_volatile(addr as *const u8) };
}

It's not a great solution, but it seems like at worst it trades off a little unnecessary time for a little unnecessary space (compared to a static transfer buffer). I haven't found a way to get effect of read_volatile for the whole slice without going byte-by-byte.

@kcsmnt0
Copy link
Author

kcsmnt0 commented Jan 29, 2024

Nope... Sorry about this, I should have tested more thoroughly before drawing a conclusion. I'm getting more reliable reads, but still missing a lot of data.

Here's another minimal example that's not doing what I expect, where I'm trying to use a static mutable transfer buffer as unsafely as possible:

#![no_std]
#![no_main]

extern crate alloc;

use core::alloc::Layout;
use core::panic::PanicInfo;
use core::mem::MaybeUninit;
use embedded_alloc::Heap;
use litex_openfpga::*;
use riscv_rt::entry;

pub const HEAP_SIZE: usize = 32 * 1024 * 1024;
pub static mut HEAP_MEM: [MaybeUninit<u8>; HEAP_SIZE] = [MaybeUninit::uninit(); HEAP_SIZE];

#[global_allocator]
pub static HEAP: Heap = Heap::empty();

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("Panic:");
    println!("{info}");
    loop { }
}

const BUFFER_SIZE: u32 = 1024;
static mut BUFFER: *mut u8 = 0 as *mut u8;

#[entry]
fn main() -> ! {
    unsafe { HEAP.init(HEAP_MEM.as_ptr() as usize, HEAP_SIZE) };
    unsafe { BUFFER = alloc::alloc::alloc(Layout::array::<u8>(BUFFER_SIZE as usize).unwrap()) };

    println!("start");

    let size = File::size(1);
    println!("size: {}", size);

    let mut offset = 0;
    while offset < size {
        unsafe {
            core::slice::from_raw_parts_mut(BUFFER, BUFFER_SIZE as usize).copy_from_slice(&[0; BUFFER_SIZE as usize]);
        }
        let length = BUFFER_SIZE.min(size - offset);
        File::request_read(offset, length, unsafe { BUFFER } as u32, 1);
        File::block_op_complete();
        println!("buffer: {:X?}", unsafe { core::slice::from_raw_parts(BUFFER, length as usize) });
        offset += length;
    }

    loop { }
}

The filesize of the file I'm testing with is 7597590, so the loop goes through plenty of iterations. The first time the buffer gets filled, all 1024 bytes have the right contents. Every time after that, it seems like exactly the first 88 bytes of the buffer have the right contents, and the rest of the buffer is all the zeroes that were put in there at the start of the loop iteration. If I throw a couple extra print statements at the top of the loop, it becomes the first 20 bytes of the buffer instead of the first 88 bytes. That really sounds like a race condition, but I'm totally confused at this point.

@kcsmnt0 kcsmnt0 reopened this Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants