Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: use _init as the start of large pages #31947

Closed

Conversation

gabrielschulhof
Copy link
Contributor

The assumption that the .text section starts at the mapping following
the one that contains __executable_start is not valid on Skylake CPUs
where the whole code resides in one mapping.

OTOH, The symbol _init is usually located at the beginning of the
.text section and it does not need the assumption that the next
mapping has the .text section.

We also rename the section into which we place the remapping code to
lpstub. This causes the linker to produce symbols __start_lpstub
and __stop_lpstub, the latter of which we do not use. Still,
__start_lpstub helps us find the end of the .text section because on
Skylake this section is inserted before the end of the mapping, so we
use __start_lpstub as the end instead of the end of the mapping.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Feb 25, 2020
@gabrielschulhof
Copy link
Contributor Author

@suresh-srinivas @uttampawar PTAL!

@nodejs-github-bot
Copy link
Collaborator

@bnoordhuis
Copy link
Member

clang++ seems pretty averse to emitting _init and _fini symbols, probably because they've been deprecated for I don't know how long. Perhaps you can use main instead?

on Skylake CPUs where the whole code resides in one mapping

That's interesting, do you know why that is? The current parser can probably be made to accommodate that without switching to a different symbol.

@gabrielschulhof
Copy link
Contributor Author

gabrielschulhof commented Feb 25, 2020

@bnoordhuis I tried keeping __executable_start while retaining the MIN(__start_lpstub, end) on SKX, but it crashed. I'm pretty sure __executable_start never worked on Skylake. I don't know why it maps differently. Might be because this particular Skylake machine is running Ubuntu 18.04. I'll have a look where main resides within the .text section.

@gabrielschulhof
Copy link
Contributor Author

@bnoordhuis just looking at nm -v ./node I find that main comes pretty late in the game – I stopped looking when both openssl and v8 preceeded it.

@gabrielschulhof
Copy link
Contributor Author

gabrielschulhof commented Feb 25, 2020

@bnoordhuis got any particular combination of Linux version/clang version where clang doesn't produce _init? I just tested clang 8 locally and it produces _init.

@bnoordhuis
Copy link
Member

I'm testing with clang 9 but even after grepping through the llvm+clang+lld source tree I haven't really narrowed down what causes clang to emit them or not. 🤷‍♂

At any rate, does it matter where main is located? As long as it's somewhere in the text segment that you want to relocate, right?

@suresh-srinivas
Copy link
Contributor

Could you remind me why we are not using the linker script?

The default linker script with ld --verbose shows that it provides etext. We want to provide a symbol right at the start of .text.

@gabrielschulhof werent you trying an alternate way to use an assembly file to provide this as part of the first node build file?

  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);

@suresh-srinivas
Copy link
Contributor

I'm testing with clang 9 but even after grepping through the llvm+clang+lld source tree I haven't really narrowed down what causes clang to emit them or not.

At any rate, does it matter where main is located? As long as it's somewhere in the text segment that you want to relocate, right?

Yes it matters where main it is located, if it is located towards the end, then we may not relocate any of the .text. We want to relocate as much of the .text segment so finding a symbol that is at the very beginning is what we need.

@bnoordhuis
Copy link
Member

Could you remind me why we are not using the linker script?

They're hard to keep portable across ld.bfd, ld.gold and ld.ldd.

We want to relocate as much of the .text segment so finding a symbol that is at the very beginning is what we need.

We scan /proc/self/maps for a mapping that contains a symbol. I don't see why it matters if that symbol is at the start, the end or somewhere in the middle of that segment; it's copied whole.

@suresh-srinivas
Copy link
Contributor

Could you remind me why we are not using the linker script?

They're hard to keep portable across ld.bfd, ld.gold and ld.ldd.

I thought node is build only with gcc and ld. Are all these supported as valid linkers?

We want to relocate as much of the .text segment so finding a symbol that is at the very beginning is what we need.

We scan /proc/self/maps for a mapping that contains a symbol. I don't see why it matters if that symbol is at the start, the end or somewhere in the middle of that segment; it's copied whole.

The symbol is used to determine the start address of the region that we are copying from. The original code with the ld script was using it like this.


        uintptr_t ntext = reinterpret_cast<uintptr_t>(&__nodetext);
        if (ntext >= start && ntext < end) {
          char* from = reinterpret_cast<char*>(hugepage_align_up(ntext));
          char* to = reinterpret_cast<char*>(hugepage_align_down(end));


@gabrielschulhof
Copy link
Contributor Author

@suresh-srinivas I was, based on https://bugzilla.redhat.com/show_bug.cgi?id=927573, but it's really difficult to get the object file to be the very first file passed to the linker, and even then I was unable to get the symbol to go all the way to the beginning of the .text section.

@gabrielschulhof
Copy link
Contributor Author

gabrielschulhof commented Feb 25, 2020

We scan /proc/self/maps for a mapping that contains a symbol. I don't see why it matters if that symbol is at the start, the end or somewhere in the middle of that segment; it's copied whole.

Actually, we cannot copy the whole thing. For example, on Skylake there is a significant distance between __executable_start and _init, and __start_lpstub comes right before the end of the mapping. If we relocate from the start of the mapping (which is the same as __executable_start), we get a segfault.

The mapping may include multiple executable sections. We only want to relocate the .text section.

@gabrielschulhof
Copy link
Contributor Author

@bnoordhuis another way of looking at this is that it's not so much a question of whether or not clang generates _init as much as it is a question of whether our official binaries for Linux/x64 contain such a symbol. If they do, there should be no problem with the approach taken in this PR, because --use-largepages is a runtime option, after all, so it should work on any distro if people use the official release.

I also checked and it is distro-related rather than processor-related. I ran Ubuntu 18.04 on my laptop in a VM and it also produces the same kind of mapping as on the Skylake machine I ran it on earlier.

@gabrielschulhof
Copy link
Contributor Author

@bnoordhuis @devnexen @lundibundi I changed the implementation to not use _init statically, but rather to attempt to retrieve it using dlsym() so, if absent, it will do what it does now. Otherwise, it uses _init as the starting point and the minimum of end and __stop_lpstub as the ending point.

The assumption that the .text section starts at the mapping following
the one that contains `__executable_start` is not valid on 64-bit
Ubuntu 18.04 where the whole code resides in one mapping.

OTOH, The symbol `_init` is usually located at the beginning of the
.text section and it does not need the assumption that the next
mapping has the .text section. Thus, we use the symbol, if available,
to perform the mapping on Ubuntu 18.04.

We also rename the section into which we place the remapping code to
`lpstub`. This causes the linker to produce symbols `__start_lpstub`
and `__stop_lpstub`, the latter of which we do not use. Still,
`__start_lpstub` helps us find the end of the .text section because on
Ubuntu 18.04 this section is inserted before the end of the sole
mapping, so we use `__start_lpstub` as the end instead of the end of
the mapping.
@nodejs-github-bot
Copy link
Collaborator

@bnoordhuis
Copy link
Member

f we relocate from the start of the mapping (which is the same as __executable_start), we get a segfault.

You have to walk me through this. Ultimately, at least on Linux, the goal is to call madvise(MADV_HUGEPAGE) on the right memory range, isn't it? Consider this change:

diff --git a/src/node_main.cc b/src/node_main.cc
index e92c0df942..1a163ec2a6 100644
--- a/src/node_main.cc
+++ b/src/node_main.cc
@@ -75,6 +75,7 @@ int wmain(int argc, wchar_t* wargv[]) {
 // UNIX
 #ifdef __linux__
 #include <elf.h>
+#include <sys/mman.h>
 #ifdef __LP64__
 #define Elf_auxv_t Elf64_auxv_t
 #else
@@ -94,6 +95,12 @@ extern bool linux_at_secure;
 }  // namespace node
 
 int main(int argc, char* argv[]) {
+#if defined(__linux__)
+  constexpr uintptr_t k2MB = 1 << 21;
+  void* const addr =
+      reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(&main) & -k2MB);
+  if (-1 == madvise(addr, k2MB, MADV_HUGEPAGE)) perror("madvise");
+#endif  // __linux__
 #if defined(__POSIX__) && defined(NODE_SHARED_MODE)
   // In node::PlatformInit(), we squash all signal handlers for non-shared lib
   // build. In order to run test cases against shared lib build, we also need

Apart from the lack of sophistication, isn't that sufficient to turn on THP?

@gabrielschulhof
Copy link
Contributor Author

@bnoordhuis currently the Node.js .text section takes up about 14-20 MiB, depending on whether you're using clang or gcc. The order in which the functions appear in the .text section is uncertain. We would like to capture as much of the .text section as possible, while leaving out the section containing the code that does the remapping itself. main() could be preceded by 1, 2, or even 12 pages of .text consisting of openssl, v8, and all the other stuff that's statically linked, while the rest of it could be following main().

If we do not capture as much of .text as possible, we risk leaving the hot parts of the code mapped onto 4KiB pages, thereby forfeiting the benefits of re-mapping to large pages.

I believe, and @suresh-srinivas can give more detail, that we cannot simply madvise() a memory range without having previously allocated it ourselves without a recent kernel patch. That's why the code copies the relevant code to a temporary region and then copies it back to a region that's been madvise()-ed to be backed by huge pages.

At any rate, please also have a look at #31981 which is an alternative implementation. Basically, instead of a linker script, it uses an object file compiled from an assembly file. The assembly introduces a symbol into the text section, much as the linker script did before. Since we cannot use a linker script to ensure that the symbol is placed at the beginning of the .text section, we must ensure that the object file resulting from the assembly file is the first item to be linked on the linker's command line. Assuming the linker places the code more or less in the order it encounters the object/archive files ones gives it, the symbol should end up at or near the beginning of the .text section and still give us a good chunk of it for remapping.

@gabrielschulhof
Copy link
Contributor Author

Closing in favour of #31981.

@gabrielschulhof gabrielschulhof deleted the text-start-via-_init branch March 13, 2020 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants