Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault with Intel Compiler #58

Closed
dakep opened this issue Nov 1, 2019 · 10 comments
Closed

Segmentation Fault with Intel Compiler #58

dakep opened this issue Nov 1, 2019 · 10 comments

Comments

@dakep
Copy link
Contributor

dakep commented Nov 1, 2019

I came across a very strange problem in tinyformat v2.2.0. Even the most trivial of examples

#include "tinyformat.h"

int main(int argc, char** argv) {
  auto str = tfm::format("Got %d arguments", argc);
  return str.size();
}

causes a segfault when compiling with icpc -std=gnu++11 -O0. There is no segfault when compiling with any optimization level greater than 0.
The backtrace is

#0  0x00007fffffff9808 in ?? ()
#1  0x000000000040216f in tinyformat::detail::FormatArg::format (this=0x7fffffff9768, out=..., fmtBegin=0x404738 "%d arguments", fmtEnd=0x40473a " arguments", ntrunc=-1) at tinyformat.h:513
#2  0x0000000000403994 in tinyformat::detail::formatImpl (out=..., fmt=0x404738 "%d arguments", formatters=0x7fffffff9768, numFormatters=1) at tinyformat.h:812
#3  0x0000000000403ec6 in tinyformat::vformat (out=..., fmt=0x404734 "Got %d arguments", list=...) at tinyformat.h:956
#4  0x0000000000403fb5 in tinyformat::format<int> (out=..., fmt=0x404734 "Got %d arguments", args=@0x7fffffff99a8: 1) at tinyformat.h:966
#5  0x0000000000403f49 in tinyformat::format<int> (fmt=0x404734 "Got %d arguments", args=@0x7fffffff99a8: 1) at tinyformat.h:975
#6  0x00000000004018e2 in main (argc=1, argv=0x7fffffff9ac8) at tfm_bug.cc:4

I tried with two different versions of icpc

  • icpc (ICC) 18.0.3 20180410 on RHEL7 (3.10.0-957.27.2.el7.x86_64)
  • icpc (ICC) 19.0.5.281 20190815 RHEL7 (3.10.0-1062.1.2.el7.x86_64)
@c42f
Copy link
Owner

c42f commented Nov 1, 2019

This is weird. Can you compile with debug info (-g I presume on icc), run with gdb and show the results of info locals and info args? I'm not sure what to make of that instruction pointer at 0x00007fffffff9808 either. That sure does seem more like a data address when compared to the value of this etc...

@dakep
Copy link
Contributor Author

dakep commented Nov 5, 2019

Yes, the address looks more like a data address. Here's what I get from gdb:

Breakpoint 1, tinyformat::detail::FormatArg::format (this=0x7fffffffa658, out=..., fmtBegin=0x404ae8 "%d arguments", fmtEnd=0x404aea " arguments", ntrunc=-1) at tinyformat.h:517
517	            TINYFORMAT_ASSERT(m_value);
(gdb) info locals
No locals.
(gdb) info args
this = 0x7fffffffa658
out = @0x7fffffffa6f8: <incomplete type>
fmtBegin = 0x404ae8 "%d arguments"
fmtEnd = 0x404aea " arguments"
ntrunc = -1
(gdb) p *this
$4 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}
(gdb) p tinyformat::detail::FormatArg::formatImpl<int>
$3 = {void (std::ostream &, const char *, const char *, int, const void *)} 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>

The correct formatImpl function is compiled into the program, but for whatever reason, the address is not properly stored in m_formatImpl. Another weird thing is that the address stored in m_value seems to point to a function?!

When I step through the program, the addresses seem to be correct inside the constructors tinyformat::detail::FormatArg::FormatArg and tinyformat::detail::FormatListN<1>::FormatListN:

(gdb) bt
#0  tinyformat::detail::FormatArg::FormatArg<int> (this=0x7fffffffa658, value=@0x7fffffffa898: 1) at tinyformat.h:512
#1  0x00000000004022fe in tinyformat::detail::FormatListN<1>::FormatListN<int> (this=0x7fffffffa648, args=@0x7fffffffa898: 1) at tinyformat.h:966
#2  0x0000000000404337 in tinyformat::makeFormatList<int> (args=@0x7fffffffa898: 1) at tinyformat.h:1015
#3  0x00000000004042f7 in tinyformat::format<int> (out=..., fmt=0x404ae4 "Got %d arguments", args=@0x7fffffffa898: 1) at tinyformat.h:1051
#4  0x00000000004042a5 in tinyformat::format<int> (fmt=0x404ae4 "Got %d arguments", args=@0x7fffffffa898: 1) at tinyformat.h:1060
#5  0x00000000004018e2 in main (argc=1, argv=0x7fffffffa9b8) at tfmbug.cc:4
(gdb) p *this
$12 = {m_value = 0x7fffffffa898, m_formatImpl = 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>, 
  m_toIntImpl = 0x402298 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}
(gdb) s
tinyformat::detail::FormatListN<1>::FormatListN<int> (this=0x7fffffffa648, args=@0x7fffffffa898: 1) at tinyformat.h:967
967	        { static_assert(sizeof...(args) == N, "Number of args must be N"); }
(gdb) p *this
$15 = {<tinyformat::FormatList> = {m_args = 0x7fffffffa658, m_N = 1}, m_formatterStore = {{m_value = 0x7fffffffa898, 
      m_formatImpl = 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>, 
      m_toIntImpl = 0x402298 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}}}

Once I step out of the tinyformat::detail::FormatListN constructor, however, something seems off. Inside tinyformat::vformat I have

(gdb) p *list.m_args
$18 = {m_value = 0x7fffffffa898, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}

and in the next step in tinyformat::detail::formatImpl I get

(gdb) p *args
$17 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}

I don't know why, but something seems to be going wrong when assigning to FormatListRef. When I debug using -O1 optimizations, everything works as expected and the value of args in tinyformat::detail::formatImpl is correct:

(gdb) print *args
$2 = {m_value = 0x7fffffffa8a8, m_formatImpl = 0x403010 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>, 
  m_toIntImpl = 0x4035e4 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}

@c42f
Copy link
Owner

c42f commented Nov 5, 2019

Wow, great sleuthing. That's exactly the information we need (I can't track this down as I don't have an icc compiler license, though I applied for one.)

To me this is looking rather like a bizarre compiler bug. The other option is that I'm committing some obscure form of UB somewhere by creating my own quasi-vtable-like thing with the m_formatImpl pointer.

At this point I think the best option is to progressively minimize tinyformat.h by deleting functionality until you've got a minimal reproduction. Then we might discover exactly what triggers this.

Yes, the address looks more like a data address. Here's what I get from gdb:

...
(gdb) p *this
$4 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}
(gdb) p tinyformat::detail::FormatArg::formatImpl<int>
$3 = {void (std::ostream &, const char *, const char *, int, const void *)} 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>

It looks like the pointer m_value contains the pointer to m_formatImpl here??

Another option for debugging here might be to just printf the values of these pointers from inside Format::format(). (Are we sure that gdb is not lying to us here :-) ?)

@dakep
Copy link
Contributor Author

dakep commented Nov 7, 2019

I created a small example (I don't know if it's actually minimal but it should be close) to recreate the problem:

class FormatArg {
 public:
  FormatArg(const int& value)
    : m_value(static_cast<const void*>(&value)), m_formatImpl(&formatImpl) {}

  void format() const {
      m_formatImpl(m_value);
  }

 private:
  static void formatImpl(const void* value) {
    const int true_val = *static_cast<const int*>(value);
  }

  const void* m_value;
  void (*m_formatImpl)(const void* value);
};

class FormatList {
 public:
  FormatList(FormatArg* args) : m_args(args) {}
  friend int main(int argc, char** argv);

 private:
  const FormatArg* m_args;
  int m_N;
};

class FormatList1 : public FormatList {
 public:
  FormatList1(const int& arg) : FormatList(&m_formatterStore[0]),
      m_formatterStore{ FormatArg(arg) } {}

 private:
  FormatArg m_formatterStore[1];
};

FormatList1 makeFormatList(const int& arg) {
  return FormatList1(arg);
}

int main(int argc, char** argv) {
  // This gives a segfault when using icpc
  makeFormatList(argc).m_args[0].format();
  // This would be fine...
  // FormatList1(argc).m_args[0].format();
  return 0;
}

The segfault is introduced in the makeFormatList function. I assume that return value optimization "fixes" the problem for icpc, that's why there's no segfault for -O1 and up.

I can't find any UB in this example, but that doesn't mean anything. However, a further indication of a compiler problem is that when you omit FormatList::m_N from the class definition, there's no segfault.

@dakep dakep closed this as completed Nov 7, 2019
@dakep dakep reopened this Nov 7, 2019
@c42f
Copy link
Owner

c42f commented Nov 7, 2019

I can't see any problem with this code either. I think this is minimal enough that you could report it upstream to icc and see what happens.

when you omit FormatList::m_N from the class definition, there's no segfault

If you can find a workaround you can put behind a small #ifdef __INTEL_COMPILER (maybe injecting some dummy arguments into the class or some such) I'd be happy to merge that. If not, I'm not sure what else we can do here.

@dakep
Copy link
Contributor Author

dakep commented Nov 7, 2019

I posted the issue on the Intel Developer Forum and someone spotted the problem.
The function makeFormatList creates a temporary object on the stack. The returned copy still holds a pointer to the memory allocated for the temporary object, which is invalid when the temporary goes out of scope.

To fix the issue, detail::FormatListN simply needs a copy constructor. I created PR #59.

c42f pushed a commit that referenced this issue Nov 8, 2019
When copying a detail::FormatListN object, the m_formatterStore must also be copied, otherwise the pointer in the parent FormatList class are possibly invalid.

This addresses the segfault explained in #58. The issue arises when makeFormatList returns an object which holds a pointer to memory created for the temporary object on the stack.
@c42f
Copy link
Owner

c42f commented Nov 8, 2019

Great! A big thanks to whoever pointed that out, and thanks for communicating it back here.

I tried clang's -fsanitize=undefined, but unfortunately it can't see the error because the constructor is elided even with -O0. It's necessary to use the -fno-elide-constructors option which then reproduces this issue with both g++ and clang++. -fsanitize=undefined reports the problem as a stack overflow though which it isn't. According to the g++ docs, C++17 compilers are now actually required to elide the copy constructor so icc may also change the default in the future.

@c42f c42f closed this as completed Nov 8, 2019
@c42f
Copy link
Owner

c42f commented Nov 8, 2019

Actually there's another way we could have debugged this which I should have thought of - it's possible to get valgrind to report the read of uninitialized memory. (Though for g++ only if -fno-elide-constructors is passed explicitly, which require knowing what the error was.)

g++ -Wall -O0 -g -fno-elide-constructors -fno-omit-frame-pointer tinyformat_test.cpp -o a.out
valgrind --tool=memcheck --track-origins=yes ./a.out

@dakep
Copy link
Contributor Author

dakep commented Nov 8, 2019

Thanks for merging the fix. At least now I've learned that gcc & clang elide constructors even if setting -O0! Probably something I'll check for more carefully in the future.

@c42f
Copy link
Owner

c42f commented Nov 9, 2019

Yes, that was surprising to me. It seems with C++17 that the language has moved to this being considered a feature of the language rather than an optimization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants