Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NUMA support and optimizations #11871

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

tlichwala
Copy link

Summary:
This pull request adds support for Non-Uniform Memory Access (NUMA) to Apache Traffic Server, enhancing performance on NUMA systems.

Key Changes:

  • Added CMake options to enable NUMA support and debugging.
  • Introduced new configuration options for NUMA optimizations.
  • Enhanced thread and memory management to be NUMA-aware.
  • Added RamCacheContainer for cache duplication across NUMA nodes.
  • Integrated NUMA debugging utilities.

Performance testing has shown increased throughput, reduced CPU usage, improved latency, decreased UPI bus load

These changes aim to optimize memory access patterns and reduce latency on NUMA systems.

To build the application with NUMA support, use the following CMake command:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_MIMALLOC=ON -DENABLE_NUMA=ON -DENABLE_HWLOC=ON -Dmimalloc_DIR=""

@apache apache deleted a comment from ezelkow1 Nov 15, 2024
@zwoop
Copy link
Contributor

zwoop commented Nov 15, 2024

Nice! The PR uses the old Debug() statements, which needs to be changed to the new Dbg / Ctl features:

../src/iocore/net/Server.cc:247:3: error: use of undeclared identifier 'Debug'
  Debug("numa", "[Server::listen] Attempting to create socket with family: %d, type: %d, protocol: %d", addr.sa.sa_family,
  ^
../src/iocore/net/Server.cc:256:3: error: use of undeclared identifier 'Debug'
  Debug("numa", "[Server::listen] Attempting to set up fd for listen with non_blocking: %d, options: %d", non_blocking, opt);
  ^

@zwoop zwoop added this to the 10.1.0 milestone Nov 15, 2024
Copy link
Contributor

@cmcfarlen cmcfarlen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this PR! It looks really interesting. Just a few comments to get ci happy. Mainly migrating Debug->Dbg. Looking forward to trying this out!

{
return details::splitter<Cript::string_view>(input, delim);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this chunk, perhaps a merge conflict issue.

int my_thread_id = this_ethread()->id;
int my_numa_node = this_ethread()->get_numa_node();

Debug("numa_sequencer", "[NUMASequencer] Thread %d (NUMA node %d) entered run_sequential.", my_thread_id, my_numa_node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug has been removed, please use the Dbg macro instead.

if (opt.f_mptcp) {
Dbg(dbg_ctl_connection, "Define socket with MPTCP");
prot = IPPROTO_MPTCP;
}

// Create the socket
Debug("numa", "[Server::listen] Attempting to create socket with family: %d, type: %d, protocol: %d", addr.sa.sa_family,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Debug lines were maybe for your debug? Could they be removed?

}

RamCache *
RamCacheContainer::get_cache(unsigned int my_node, unsigned int node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RamCacheContainer::get_cache(unsigned int my_node, unsigned int node)
RamCacheContainer::get_cache(unsigned int /* my_node ATS_UNUSED */, unsigned int node)


// returns true if consistent
static bool
check_pages_consistency(void *data, size_t size, const char *name = "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is unused unless NUMA_CONSISTENCY_CHECK is defined. Please add precompile check here.

}

static void
move_pages_to_current_numa_zone(void *data, size_t size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is unused


// If all threads have been added (assuming their number is equal to eventProcessor.net_threads), sort the thread IDs and set
// ready_to_run to true
if (thread_ids.size() == eventProcessor.net_threads) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a signed/unsigned compare error for this line.

Copy link
Contributor

@cmcfarlen cmcfarlen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build is failing for systems without numa.h. I noted one place, but there could be other similar issues.

include/iocore/eventsystem/EThread.h Show resolved Hide resolved
@tlichwala
Copy link
Author

Hi all,
I've addressed the feedback and made the necessary changes. However, the automated tests are still showing build errors on some operating systems. Could someone please help review these issues?
Thank you!

@cmcfarlen
Copy link
Contributor

It looks like its still related to when numa is not available:

../src/iocore/net/QUICPacketHandler.cc: In member function 'virtual void QUICPacketHandlerIn::_recv_packet(int, UDPPacket*)':
../src/iocore/net/QUICPacketHandler.cc:270:65: error: no matching function for call to 'EventProcessor::assign_thread(const int&)'
  270 |     eth                           = eventProcessor.assign_thread(ET_NET);
      |                                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
In file included from ../src/iocore/net/../eventsystem/P_EventSystem.h:46,
                 from ../src/iocore/net/P_Net.h:92,
                 from ../src/iocore/net/QUICPacketHandler.cc:25:
../src/iocore/net/../eventsystem/P_UnixEventProcessor.h:54:1: note: candidate: 'EThread* EventProcessor::assign_thread(EventType, int)'
   54 | EventProcessor::assign_thread(EventType etype, int numa_node)
      | ^~~~~~~~~~~~~~
../src/iocore/net/../eventsystem/P_UnixEventProcessor.h:54:1: note:   candidate expects 2 arguments, 1 provided

#include <sched.h>
#if TS_USE_HWLOC
#include <hwloc.h>
#include <hwloc/glibc-sched.h>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to reproduce the compilation error that occurs when including the header file <hwloc/glibc-sched.h> on freebsd. The error message is: "Please make sure to include sched.h before including glibc-sched.h, and define _GNU_SOURCE before any inclusion of sched.h". I added the undesirable definition of the _GNU_SOURCE macro in the code, as shown below, to ensure that it is defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm not super familiar with freebsd, but I think this should be added to CXXFLAGS as part of the build setup, not in the source code. Is Freebsd able to find the numa library? This feature could be disabled for that OS for now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _GNU_SOURCE macro is defined in the CMakeLists.txt file as follows:
list(APPEND CMAKE_REQUIRED_DEFINITIONS -D_GNU_SOURCE -DATS_BUILD)
I added the macro in the source code only for testing purposes to ensure that it is indeed defined.
It appears that the error I am encountering is not related to the numa.h library but rather to hwloc.
Thank you for your suggestion. If you have any further insights or recommendations, I would appreciate it.

@cmcfarlen
Copy link
Contributor

You might need to add more precompiler conditions for TS_USE_NUMA to keep the errors down for systems that don't support this.

@tlichwala tlichwala force-pushed the Numa-awareness-patch branch 4 times, most recently from 3b23f05 to 5f061f2 Compare December 13, 2024 11:45
@cmcfarlen
Copy link
Contributor

Unfortunately CMakeLists.txt is conflicting. You can just accept both the new and old line when resolving.

@tlichwala tlichwala force-pushed the Numa-awareness-patch branch from 5f061f2 to 9712d65 Compare January 9, 2025 08:51
 - Enable NUMA optimizations via CMake options
 - Introduce NUMA-aware thread assignment and memory allocation
 - Add NUMA debugging utilities
 - Enhance cache management for NUMA systems
@tlichwala tlichwala force-pushed the Numa-awareness-patch branch from 9712d65 to 91a6cc6 Compare January 9, 2025 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants