veblush
Collaborator

@veblush veblush commented Sep 26, 2025

A recent QEMU upgrade appears to have exposed a latent unaligned memory access on Cortex-M3. It crashes two tests with "uncaught target signal 7 (Bus error)": TestRecordsPersistentTfLiteTensorData and TestRecordsPersistentTfLiteTensorQuantizationData, both of which were temporarily disabled in PR #3206.

The crash occurs when accessing the test model's Flatbuffer memory, as shown by the following callstack:

#0  0x00008e14 in flatbuffers::Vector<long long, unsigned long>::Get (this=<optimized out>, i=110405) at tensorflow/lite/micro/tools/make/downloads/flatbuffers/include/flatbuffers/vector.h:178
#1  0x00008edc in tflite::internal::InitializeTfLiteTensorFromFlatbuffer (persistent_buffer_allocator=0x407fdc80, non_persistent_buffer_allocator=0x407fdc7c, allocate_temp=false, flatbuffer_tensor=..., 
    buffers=0x15d29 <kTestConvModelData+116>, result=0x407fdbb4) at tensorflow/lite/micro/micro_allocator.cc:285
#2  0x000093c8 in tflite::MicroAllocator::PopulateTfLiteTensorFromFlatbuffer (this=this@entry=0x407fdbd4, model=model@entry=0x15cd9 <kTestConvModelData+36>, tensor=tensor@entry=0x407fdbb4, tensor_index=tensor_index@entry=0, 
    subgraph_idx=<optimized out>, subgraph_idx@entry=0, allocate_temp=<optimized out>, allocate_temp@entry=false) at tensorflow/lite/micro/micro_allocator.cc:1110
#3  0x000098ac in tflite::RecordingMicroAllocator::PopulateTfLiteTensorFromFlatbuffer (this=0x407fdbd4, model=0x15cd9 <kTestConvModelData+36>, tensor=0x407fdbb4, tensor_index=0, subgraph_index=0, allocate_temp=false)
    at tensorflow/lite/micro/recording_micro_allocator.cc:236
#4  0x0000910e in tflite::MicroAllocator::AllocatePersistentTfLiteTensor (this=0x407fdbd4, model=0x15cd9 <kTestConvModelData+36>, subgraph_allocations=0x0, tensor_index=0, subgraph_index=0) at tensorflow/lite/micro/micro_allocator.cc:814
#5  0x00008484 in main (argc=<optimized out>, argv=<optimized out>) at tensorflow/lite/micro/recording_micro_allocator_test.cc:169
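
The addresses in the backtrace already hint at the cause. gdb prints model=0x15cd9 as kTestConvModelData+36 and buffers=0x15d29 as kTestConvModelData+116, so the array's base address can be recovered by subtraction. The sketch below only redoes that arithmetic; the helper names are made up for illustration, and the numbers are copied from frames #1 and #2.

```cpp
#include <cassert>
#include <cstdint>

// Recover a symbol's base address from a gdb annotation such as
// "0x15cd9 <kTestConvModelData+36>". Hypothetical helper, not TFLM code.
constexpr std::uint32_t BaseAddress(std::uint32_t address, std::uint32_t offset) {
  return address - offset;
}

// True when `address` is a multiple of `alignment`.
constexpr bool IsAligned(std::uint32_t address, std::uint32_t alignment) {
  return address % alignment == 0;
}

// Values copied from the backtrace (frames #2 and #1).
constexpr std::uint32_t kBaseFromModel = BaseAddress(0x15cd9, 36);
constexpr std::uint32_t kBaseFromBuffers = BaseAddress(0x15d29, 116);

// Both annotations agree on the same base address.
static_assert(kBaseFromModel == 0x15cb5, "base address from `model`");
static_assert(kBaseFromModel == kBaseFromBuffers, "both frames agree");

// The base address is odd, so it is not even 2-byte aligned.
static_assert(!IsAligned(kBaseFromModel, 2), "array starts on an odd address");

// Runtime variant of the same checks, for use from a test body.
inline void CheckBacktraceArithmetic() {
  assert(kBaseFromModel == 0x15cb5);
  assert(!IsAligned(kBaseFromModel, 8));
}
```

An array that starts at 0x15cb5 cannot satisfy the 8-byte alignment that a Vector<long long> element read would normally expect, which is consistent with the fault in frame #0.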

My initial hypothesis was that the pre-generated test model file had an alignment defect, but it turned out that adding alignas(16) to the model variable was enough to fix the tests.
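
As a rough illustration of that fix, here is a minimal sketch of an alignas(16) byte array together with an alignment check. The array name, its placeholder bytes, and both helper functions are assumptions made for this example; they are not the contents of the real test model or the exact change in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a generated model array. The bytes are
// placeholders, not real model data. alignas(16) asks the toolchain to place
// the array on a 16-byte boundary instead of wherever a char array would fit.
alignas(16) const unsigned char kExampleModelData[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

// Returns true when `ptr` is a multiple of `alignment` bytes.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Guard a test could run before handing the buffer to code that performs
// 64-bit loads directly on the flatbuffer memory.
inline void CheckModelAlignment() {
  assert(IsAligned(kExampleModelData, 16));
  assert(IsAligned(kExampleModelData, alignof(std::int64_t)));
}
```

Without the specifier, an unsigned char array only has an alignment requirement of 1, so the linker is free to place it at an odd address. A check like CheckModelAlignment() in the test setup would turn a silent misplacement into an immediate assertion failure rather than a bus error deep inside the allocator.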

BUG=TestFix

@veblush veblush added the ci:run label Sep 26, 2025
@TFLM-bot TFLM-bot removed the ci:run label Sep 26, 2025
@veblush veblush added the ci:run label Sep 26, 2025
@TFLM-bot TFLM-bot removed the ci:run label Sep 26, 2025
@veblush veblush changed the title Test conv model Regenerate test_conv_model.cc Sep 26, 2025
@veblush veblush added the ci:run label Sep 26, 2025
@TFLM-bot TFLM-bot removed the ci:run label Sep 26, 2025
@ddavis-2015
Member

@veblush All the required corrections are in PR #3197

@veblush
Collaborator Author

veblush commented Oct 2, 2025

> @veblush All the required corrections are in PR #3197

Great! I'll try capturing your changes in this PR if your PR is not directly related to fixing this bug.

@veblush veblush added the ci:run label Oct 2, 2025
@TFLM-bot TFLM-bot removed the ci:run label Oct 2, 2025
@veblush veblush added the ci:run label Oct 2, 2025
@TFLM-bot TFLM-bot removed the ci:run label Oct 2, 2025
@veblush veblush marked this pull request as ready for review October 2, 2025 20:20
@veblush veblush requested a review from a team as a code owner October 2, 2025 20:20
@veblush
Collaborator Author

veblush commented Oct 2, 2025

@ddavis-2015 I'd like to merge this separately from #3197 to ensure the two changes are atomic. I've ported over the relevant fixes from your PR to address the issue here. Thanks for your work on that!

@veblush veblush added the ci:run label Oct 2, 2025
@TFLM-bot TFLM-bot removed the ci:run label Oct 2, 2025
@veblush veblush added the ci:run label Oct 2, 2025
@TFLM-bot TFLM-bot removed the ci:run label Oct 2, 2025
@veblush veblush changed the title Regenerate test_conv_model.cc Reenable recording_micro_allocator_test Oct 2, 2025
@veblush
Collaborator Author

veblush commented Oct 2, 2025

> @ddavis-2015 I'd like to merge this separately from #3197 to ensure the two changes are atomic. I've ported over the relevant fixes from your PR to address the issue here. Thanks for your work on that!

As described above, regenerating the test model turned out to be unnecessary, so I removed that part from the PR.

@ddavis-2015
Member

ddavis-2015 commented Oct 2, 2025

> @ddavis-2015 I'd like to merge this separately from #3197 to ensure the two changes are atomic. I've ported over the relevant fixes from your PR to address the issue here. Thanks for your work on that!
>
> As described above, it doesn't seem to need to regenerate a test model. So I removed that part from the PR.

@veblush
Oh! Duh. Of course that is the problem. Thanks for finding this. I'll change my PR accordingly.

@veblush veblush merged commit 4186e7d into tensorflow:main Oct 3, 2025
94 of 96 checks passed