Use std::align_alloc in file_data_loader #10660

Open
wants to merge 1 commit into
base: main
from

Conversation

lucylq
Contributor

@lucylq lucylq commented May 2, 2025

Summary:
Issue with aligned buffers: P1800967583

The requested alignment is 16, and alignof(std::max_align_t) is also 16, so the code assumes it does not need to pad the size to meet the alignment.

However, the buffer we get from malloc is aligned to 8, not 16. When we try to align the buffer to 16, we overflow the original buffer (as it wasn't padded) and error out.

It seems malloc is not guaranteed to return 8- or 16-byte-aligned buffers, which also makes this hard to test definitively. So far we've only seen this when the buffer size is small (sizes 2 and 4).

The malloc(), calloc(), realloc(), and reallocarray() functions
return a pointer to the allocated memory, which is suitably
aligned for any type that fits into the requested size or less.
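
To make the overflow concrete, here is a minimal sketch of the failure mode. It uses std::align as a stand-in for the loader's own alignment step, which is an assumption, and the names `fits_after_alignment` and `Misaligned8` are hypothetical. An 8-byte-aligned start with only 2 bytes of capacity cannot hold a 16-byte-aligned 2-byte object, while the same start with `alignment - 1` bytes of padding can.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: returns true if a `size`-byte object aligned to
// `alignment` fits inside the `capacity` bytes that begin at `start`.
// std::align returns nullptr when the aligned object would not fit.
inline bool fits_after_alignment(
    void* start, std::size_t capacity, std::size_t alignment, std::size_t size) {
  void* ptr = start;
  std::size_t space = capacity;
  return std::align(alignment, size, ptr, space) != nullptr;
}

// Hypothetical stand-in for a malloc result that is 8-byte aligned but not
// 16-byte aligned: the storage is 16-byte aligned, and start() is 8 bytes in.
struct Misaligned8 {
  alignas(16) unsigned char storage[48];
  void* start() { return storage + 8; }
};
```

With these helpers, `fits_after_alignment(m.start(), 2, 16, 2)` is false for a `Misaligned8 m`, because aligning moves the pointer 8 bytes forward and leaves no room, whereas a capacity of `2 + 16 - 1` succeeds.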

Use std::aligned_alloc (C++17) to ensure the buffer is aligned.

Differential Revision: D74041198
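
As a rough illustration of the approach described above (not the code in this PR), the sketch below allocates through std::aligned_alloc. The helper names `round_up`, `alloc_aligned`, and `is_aligned` are hypothetical. The size is rounded up to a multiple of the alignment because some implementations expect that of std::aligned_alloc, and the alignment is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: round `size` up to a multiple of `alignment`.
// Assumes `alignment` is a power of two.
inline std::size_t round_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: allocate at least `size` bytes aligned to `alignment`.
// The result must be released with std::free.
inline void* alloc_aligned(std::size_t alignment, std::size_t size) {
  return std::aligned_alloc(alignment, round_up(size, alignment));
}

// Hypothetical helper: true if `ptr` is a multiple of `alignment`.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

For the small sizes mentioned above, `alloc_aligned(16, 2)` requests 16 bytes and should return a 16-byte-aligned pointer on platforms that provide std::aligned_alloc.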


pytorch-bot bot commented May 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10660

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 48db257 with merge base 46a18cb (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 2, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D74041198

@lucylq lucylq changed the title Fix alignment in file_data_loader Use std::align_alloc in file_data_loader May 2, 2025
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 2, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from e3b1227 to 96bf3d8 Compare May 2, 2025 17:54
Contributor

@larryliu0820 larryliu0820 left a comment

Is it possible to add some tests?

lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10660

Reviewed By: larryliu0820

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from 96bf3d8 to 89b0d90 Compare May 2, 2025 17:58
@lucylq
Contributor Author

lucylq commented May 2, 2025

Is it possible to add some tests?

Alignment is covered by file_data_loader_tests:

EXPECT_ALIGNED(fb->data(), alignment());

I think we shouldn't hit this error now that we've moved to aligned_alloc (hopefully); the main thing is probably to make sure OSS CI passes on macOS.

Are you thinking of a different test though?
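
One possible shape for a more targeted check, sketched under assumptions rather than taken from the test suite: sweep the small sizes where the problem was observed and count allocations that come back null or misaligned. The function name `count_misaligned` is hypothetical, and the sketch calls std::aligned_alloc directly instead of going through the loader.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical check: for every size in [1, max_size], allocate with
// std::aligned_alloc and count results that are null or not aligned.
// Assumes `alignment` is a power of two supported by the platform.
inline std::size_t count_misaligned(std::size_t alignment, std::size_t max_size) {
  std::size_t misaligned = 0;
  for (std::size_t size = 1; size <= max_size; ++size) {
    // Pad the request to a multiple of the alignment.
    const std::size_t padded = (size + alignment - 1) / alignment * alignment;
    void* ptr = std::aligned_alloc(alignment, padded);
    if (ptr == nullptr ||
        reinterpret_cast<std::uintptr_t>(ptr) % alignment != 0) {
      ++misaligned;
    }
    std::free(ptr);  // free(nullptr) is a no-op
  }
  return misaligned;
}
```

A test could then assert that `count_misaligned(16, 64)` is zero, which covers the size 2 and size 4 cases from the summary.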

lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 2, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from 89b0d90 to 2654ddc Compare May 2, 2025 18:20
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
Reviewed By: larryliu0820, mcr229

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from 2654ddc to bcab1f2 Compare May 3, 2025 00:04
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from bcab1f2 to eb2794b Compare May 3, 2025 00:19
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from eb2794b to 3ca3e07 Compare May 3, 2025 00:23
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 3, 2025
Summary:
|...

Issue with aligned buffers: P1800967583

The alignment requested is 16, and std::max_align_t is also 16. This means we do not need to pad the size to meet any alignment.

However, the buffer we get from malloc is aligned to 8, not 16. When we try to align the buffer, we overflow and error out.

Seems like malloc is not guaranteed to return 8 or 16 byte-aligned buffers, so also a bit hard to test definitively. So far we've only seen this when the buffer size is small (size 2, 4)
```
The malloc(), calloc(), realloc(), and reallocarray() functions
return a pointer to the allocated memory, which is suitably
aligned for any type that fits into the requested size or less.
```

Use std::aligned_alloc (C++17) to ensure buffer is aligned.

For systems that do not have aligned_alloc (or posix_memalign), fall back to malloc.

The malloc implementation is similar to what file_data_loader.cpp does, except that we do not have a custom free function from FreeableBuffer, so we store an offset just before the aligned pointer in order to free the original buffer.

1. Allocate via malloc: buffer = malloc(size + sizeof(uint16_t) + alignment - 1)
- size: the size requested.
- sizeof(uint16_t): a place to store the offset between the aligned buffer and the original pointer.
- alignment - 1: extra padding to allow for alignment.
2. Align (buffer + sizeof(uint16_t)) up to alignment. This moves the pointer forward by up to `alignment - 1` bytes.
3. Store the offset (aligned_ptr - buffer) in the uint16_t slot just before aligned_ptr.

The memory will look like this:
| buffer start | maybe padding | offset (aligned_buffer - buffer) | aligned_buffer start | maybe padding | buffer end |

The gap before the aligned buffer is at least offset_size (2 bytes) and at most about one alignment block (alignment + 1 bytes, following the steps above).

https://embeddedartistry.com/blog/2017/02/22/generating-aligned-memory/
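
The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the names `fallback_aligned_alloc` and `fallback_aligned_free` are hypothetical, the alignment is assumed to be a power of two, and it is capped so that the stored offset fits in a uint16_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical fallback allocator: malloc with room for a uint16_t offset
// and alignment padding, then hand out an aligned pointer inside the buffer.
inline void* fallback_aligned_alloc(std::size_t alignment, std::size_t size) {
  // Assumption: power-of-two alignment, small enough that the offset
  // (at most alignment + 1) fits in a uint16_t.
  if (alignment == 0 || (alignment & (alignment - 1)) != 0 || alignment > 32768) {
    return nullptr;
  }
  // Step 1: size + offset slot + padding.
  const std::size_t total = size + sizeof(std::uint16_t) + alignment - 1;
  if (total < size) {
    return nullptr;  // size_t overflow
  }
  auto* buffer = static_cast<unsigned char*>(std::malloc(total));
  if (buffer == nullptr) {
    return nullptr;
  }
  // Step 2: align (buffer + sizeof(uint16_t)) up to `alignment`.
  const auto base = reinterpret_cast<std::uintptr_t>(buffer);
  const std::uintptr_t start = base + sizeof(std::uint16_t);
  const std::uintptr_t aligned =
      (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
  unsigned char* aligned_ptr = buffer + (aligned - base);
  // Step 3: store the offset in the two bytes just before aligned_ptr.
  const auto offset = static_cast<std::uint16_t>(aligned_ptr - buffer);
  std::memcpy(aligned_ptr - sizeof(offset), &offset, sizeof(offset));
  return aligned_ptr;
}

// Hypothetical counterpart: read the offset back and free the original buffer.
inline void fallback_aligned_free(void* ptr) {
  if (ptr == nullptr) {
    return;
  }
  auto* aligned_ptr = static_cast<unsigned char*>(ptr);
  std::uint16_t offset = 0;
  std::memcpy(&offset, aligned_ptr - sizeof(offset), sizeof(offset));
  std::free(aligned_ptr - offset);
}
```

Pointers returned by `fallback_aligned_alloc` must be released with `fallback_aligned_free` rather than `free`, since the aligned pointer is not the address malloc returned. The offset is copied with memcpy so the sketch does not rely on the slot itself being suitably aligned.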

Reviewed By: larryliu0820, mcr229

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from 3ca3e07 to 049a032 Compare May 3, 2025 01:32
@lucylq lucylq force-pushed the export-D74041198 branch from 049a032 to 00fea9e Compare May 5, 2025 19:40
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 5, 2025
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 5, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from 00fea9e to acb9e6d Compare May 5, 2025 23:40
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 5, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from acb9e6d to f6779ca Compare May 5, 2025 23:46
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 6, 2025
NOTE: this increases the binary size (linux, clang) by 8 bytes, so the corresponding size threshold is raised as well.

Reviewed By: larryliu0820, mcr229

Differential Revision: D74041198
@lucylq lucylq force-pushed the export-D74041198 branch from f6779ca to f4391a7 Compare May 6, 2025 00:31
@lucylq lucylq force-pushed the export-D74041198 branch from f4391a7 to a4f65dc Compare May 6, 2025 16:16
lucylq added a commit to lucylq/executorch-1 that referenced this pull request May 6, 2025
@lucylq lucylq force-pushed the export-D74041198 branch from a4f65dc to 48db257 Compare May 7, 2025 00:51
@swolchok
Contributor

swolchok commented May 7, 2025

Seems like malloc is not guaranteed to return 8 or 16 byte-aligned buffers

it is supposed to be guaranteed, but reality does not always conform to standards :\

Comment on lines +190 to +203
#elif defined(__APPLE__)
#include <stdlib.h> // For posix_memalign and free
inline void* et_apple_aligned_alloc(size_t alignment, size_t size) {
  void* ptr = nullptr;
  // The address of the allocated memory must be a multiple of sizeof(void*).
  if (alignment < sizeof(void*)) {
    alignment = sizeof(void*);
  }
  if (posix_memalign(
          &ptr, alignment, (size + alignment - 1) & ~(alignment - 1)) != 0) {
    return nullptr;
  }
  return ptr;
}
It would be nice to include citations for why we need all these varied solutions despite std::aligned_alloc being required in C++17. https://gitlab.com/gromacs/gromacs/-/issues/3968 is one example, but I'm curious where you got this from.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported topic: not user facing
4 participants