GH-43254: [C++] Always prefer mimalloc to jemalloc (#40875)
### Rationale for this change

As discussed [on the mailing-list](https://lists.apache.org/thread/dts9ggvkthczfpmd25wrz449mxod76o2), this PR switches the default memory pool to mimalloc for all platforms. This should have several desirable effects:

* less variability between platforms
* mimalloc generally has a nicer, more consistent API and is easier to work with (in particular, jemalloc's configuration scheme is slightly abstruse)
* potentially better performance, or at least not significantly worse, than the status quo

### Are these changes tested?

Yes, by existing CI configurations.

### Are there any user-facing changes?

Behavior should not change. Performance characteristics of some user workloads might improve or regress, but this is something we cannot predict in advance.
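
To make the change of default concrete, here is a minimal sketch of the selection rule before and after this PR. It is a toy model written for illustration, not the code in `cpp/src/arrow/memory_pool.cc`; the function names `preference_order` and `default_backend` are hypothetical, and the model assumes the default is simply the first backend in the preference list that was enabled at build time.

```python
# Illustrative model only; the real logic lives in cpp/src/arrow/memory_pool.cc.
# All function names here are hypothetical.


def preference_order(after_change, is_apple=False):
    """Return backend names from most to least preferred."""
    if after_change or is_apple:
        # New behavior on every platform (and the old behavior on Apple).
        return ["mimalloc", "jemalloc", "system"]
    # Old behavior on non-Apple platforms.
    return ["jemalloc", "mimalloc", "system"]


def default_backend(compiled_in, after_change, is_apple=False):
    """Pick the first preferred backend that was enabled at build time."""
    # The system allocator is always available as a last resort.
    available = set(compiled_in) | {"system"}
    for name in preference_order(after_change, is_apple):
        if name in available:
            return name


if __name__ == "__main__":
    both = {"jemalloc", "mimalloc"}
    print(default_backend(both, after_change=False))
    print(default_backend(both, after_change=True))
```

In this model, only builds that enable both allocators on a non-Apple platform see a different default; builds with a single allocator, or with neither, are unaffected.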

* GitHub Issue: #43254

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
pitrou and kou authored Jul 16, 2024
1 parent a43950a commit 36fe1da
Showing 5 changed files with 18 additions and 18 deletions.
18 changes: 8 additions & 10 deletions cpp/src/arrow/memory_pool.cc
@@ -85,19 +85,17 @@ struct SupportedBackend {

 const std::vector<SupportedBackend>& SupportedBackends() {
   static std::vector<SupportedBackend> backends = {
-  // ARROW-12316: Apple => mimalloc first, then jemalloc
-  //              non-Apple => jemalloc first, then mimalloc
-#if defined(ARROW_JEMALLOC) && !defined(__APPLE__)
-    {"jemalloc", MemoryPoolBackend::Jemalloc},
-#endif
+  // mimalloc is our preferred allocator for several reasons:
+  // 1) it has good performance
+  // 2) it is well-supported on all our main platforms (Linux, macOS, Windows)
+  // 3) it is easy to configure and has a consistent API.
 #ifdef ARROW_MIMALLOC
-    {"mimalloc", MemoryPoolBackend::Mimalloc},
+      {"mimalloc", MemoryPoolBackend::Mimalloc},
 #endif
-#if defined(ARROW_JEMALLOC) && defined(__APPLE__)
-    {"jemalloc", MemoryPoolBackend::Jemalloc},
+#ifdef ARROW_JEMALLOC
+      {"jemalloc", MemoryPoolBackend::Jemalloc},
 #endif
-    {"system", MemoryPoolBackend::System}
-  };
+      {"system", MemoryPoolBackend::System}};
   return backends;
 }

2 changes: 2 additions & 0 deletions dev/archery/archery/benchmark/runner.py
@@ -123,6 +123,8 @@ def default_configuration(**kwargs):
             with_csv=True,
             with_dataset=True,
             with_json=True,
+            with_jemalloc=True,
+            with_mimalloc=True,
             with_parquet=True,
             with_python=False,
             with_brotli=True,
2 changes: 1 addition & 1 deletion dev/tasks/linux-packages/github.linux.yml
@@ -64,7 +64,7 @@ jobs:
         run: |
           set -e
           pushd arrow/dev/tasks/linux-packages
-          rake version:update
+          rake version:update ARROW_RELEASE_TIME="$(date --iso-8601=seconds)"
           rake docker:pull || :
           rake --trace {{ task_namespace }}:build BUILD_DIR=build
           popd
6 changes: 3 additions & 3 deletions docs/source/cpp/memory.rst
@@ -139,9 +139,9 @@ Default Memory Pool

 The default memory pool depends on how Arrow C++ was compiled:

-- if enabled at compile time, a `jemalloc <http://jemalloc.net/>`_ heap;
-- otherwise, if enabled at compile time, a
-  `mimalloc <https://github.com/microsoft/mimalloc>`_ heap;
+- if enabled at compile time, a `mimalloc <https://github.com/microsoft/mimalloc>`_
+  heap;
+- otherwise, if enabled at compile time, a `jemalloc <http://jemalloc.net/>`_ heap;
 - otherwise, the C library ``malloc`` heap.

 Overriding the Default Memory Pool
8 changes: 4 additions & 4 deletions docs/source/python/memory.rst
@@ -110,12 +110,12 @@ the buffer is garbage-collected, all of the memory is freed:
    pa.total_allocated_bytes()

 Besides the default built-in memory pool, there may be additional memory pools
-to choose (such as `mimalloc <https://github.com/microsoft/mimalloc>`_)
-from depending on how Arrow was built. One can get the backend
-name for a memory pool::
+to choose from (such as `jemalloc <http://jemalloc.net/>`_)
+depending on how Arrow was built. One can get the backend name for a memory
+pool::

    >>> pa.default_memory_pool().backend_name
-   'jemalloc'
+   'mimalloc'

 .. seealso::
    :ref:`API documentation for memory pools <api.memory_pool>`.
