hyper scan library crash #20

Closed
TidyHuang opened this issue May 10, 2016 · 17 comments

@TidyHuang

TidyHuang commented May 10, 2016

Hello,

I have been developing a very simple app that compiles a pattern and then matches against it using the Hyperscan library.

I compiled the Hyperscan library with the following options: -DCMAKE_BUILD_TYPE=debug -DBOOST_ROOT=${BOOST_ROOT} -DBUILD_SHARED_LIBS=1 -DCMAKE_INSTALL_PREFIX=${USR_LIB_PATH}.

The app runs as expected on my build machine [model name : Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz]. I then copied the app and libhs.so to another machine [model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz] (also x86 64-bit).
There, however, it crashes every time in the pattern load function when I execute the app.

The build machine and the running machine report different cpuinfo; I am not sure whether that is related.

Could someone help me out? Thanks.

Here is the crash info:

rgdb.sh core.simplegrep.14441 
file: compiled magic version [521] does not match with shared library magic version [524]
gdb ./simplegrep -c core.simplegrep.14441
GNU gdb (GDB) Amazon Linux (7.6.1-64.33.amzn1)
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-amazon-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /home/david/simplegrep...done.
    [New LWP 14441]
    Missing separate debuginfo for /usr/lib64/libstdc++.so.6
    Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/0a/90c35d3174805453ea67a785446d628e298b59.debug
    Missing separate debuginfo for /lib64/libgcc_s.so.1
    Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/00/fa2883fb47b1327397bbf167c52f51a723d013.debug
    Core was generated by `./simplegrep outside ss.list'.
    Program terminated with signal 4, Illegal instruction.
#0  set_range (to=111, from=111, this=0x7ffee99b7910) at /home/tidy/work/app/hyperscan/hyperscan/src/util/bitfield.h:154
    154 /home/tidy/work/app/hyperscan/hyperscan/src/util/bitfield.h: No such file or directory.
    Missing separate debuginfos, use: debuginfo-install glibc-2.17-106.167.amzn1.x86_64
    (gdb) bt
#0  set_range (to=111, from=111, this=0x7ffee99b7910) at /home/tidy/work/app/hyperscan/hyperscan/src/util/bitfield.h:154
#1  setRange (to=111 'o', from=111 'o', this=0x7ffee99b7910) at /home/tidy/work/app/hyperscan/hyperscan/src/util/charreach.h:106
#2  CharReach (to=111 'o', from=111 'o', this=0x7ffee99b7910) at /home/tidy/work/app/hyperscan/hyperscan/src/util/charreach.h:62
#3  ue2::AsciiComponentClass::add (this=0x1c3aca0, c=111) at /home/tidy/work/app/hyperscan/hyperscan/src/parser/AsciiComponentClass.cpp:124
#4  0x00007fe4c5fddcb1 in ue2::getLiteralComponentClass (c=c@entry=111 'o', nocase=<optimized out>)
        at /home/tidy/work/app/hyperscan/hyperscan/src/parser/ComponentClass.cpp:421
#5  0x00007fe4c5e0b1cc in ue2::addLiteral (currentSeq=currentSeq@entry=0x1c3ac50, c=111 'o', mode=...)
            at /home/tidy/work/app/hyperscan/hyperscan/src/parser/Parser.rl:181
#6  0x00007fe4c5fe78ca in ue2::parse (c_ptr=c_ptr@entry=0x7ffee99ba6ca "outside", globalMode=...)
                at /home/tidy/work/app/hyperscan/hyperscan/src/parser/Parser.rl:1823
#7  0x00007fe4c5e1b7b1 in ue2::ParsedExpression::ParsedExpression (this=0x7ffee99b8570, index_in=<optimized out>, expression=0x7ffee99ba6ca "outside", flags=2, 
            actionId=<optimized out>, ext=0x0) at /home/tidy/work/app/hyperscan/hyperscan/src/compiler/compiler.cpp:116
#8  0x00007fe4c5e1d0c4 in ue2::addExpression (ng=..., index=index@entry=0, expression=0x7ffee99ba6ca "outside", flags=2, ext=0x0, id=0)
    at /home/tidy/work/app/hyperscan/hyperscan/src/compiler/compiler.cpp:236
#9  0x00007fe4c5e19305 in ue2::hs_compile_multi_int (expressions=expressions@entry=0x7ffee99b8a98, flags=flags@entry=0x7ffee99b8a94, ids=ids@entry=0x7ffee99b8aac, 
            ext=ext@entry=0x0, elements=elements@entry=1, mode=mode@entry=1, platform=platform@entry=0x0, db=db@entry=0x7ffee99b8bf0, comp_error=comp_error@entry=0x7ffee99b8bf8, g=...)
    at /home/tidy/work/app/hyperscan/hyperscan/src/hs.cpp:228
#10 0x00007fe4c5e19a13 in hs_compile (expression=expression@entry=0x7ffee99ba6ca "outside", flags=flags@entry=2, mode=mode@entry=1, platform=platform@entry=0x0, 
            db=db@entry=0x7ffee99b8bf0, error=error@entry=0x7ffee99b8bf8) at /home/tidy/work/app/hyperscan/hyperscan/src/hs.cpp:285
#11 0x0000000000400d31 in main (argc=<optimized out>, argv=<optimized out>) at /home/tidy/exam/simplegrep.c:163
(gdb) q

Compiling machine cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 69
model name : Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
stepping : 1
microcode : 0x17
cpu MHz : 2300.092
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida arat epb pln pts dtherm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt
bogomips : 4600.18
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:

Running machine cpuinfo:

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping : 4
microcode : 0x415
cpu MHz : 2494.028
cache size : 25600 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs :
bogomips : 4988.05
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

-Tidy

@mdb256

mdb256 commented May 10, 2016

Yes, the difference is that the two machines support different instruction set extensions.

The machine you used to compile the HS library uses a 4th Gen i7 (codename Haswell). This machine supports AVX2 and BMI1/2 instructions, and when we compile with -march=native GCC will emit AVX2/BMI2 instructions. These instructions will not work on the Xeon you have (codename Ivy Bridge), as it supports AVX, but not AVX2 or BMI2. This is the cause of the SIGILL.

To fix this, you'll either need to compile Hyperscan on the Xeon, or configure the build on the first machine by passing in the correct -march=<xx> flags for your compiler. Different versions of GCC support differing arguments to this flag.
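
One quick way to see the mismatch described here is to diff the `flags` lines of the two /proc/cpuinfo dumps from the report. The following is a minimal sketch and not part of Hyperscan; the function names `parse_flags` and `missing_flags` are made up for illustration, and the two flag strings are abbreviated from the dumps posted in this issue.

```python
def parse_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of a /proc/cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(build_cpuinfo, target_cpuinfo):
    """Flags the build machine has but the target machine lacks, sorted by name."""
    return sorted(parse_flags(build_cpuinfo) - parse_flags(target_cpuinfo))


# Abbreviated from the two dumps in this issue (assumption: only a subset of
# the flags is reproduced here to keep the example short).
HASWELL = "flags : sse sse2 ssse3 sse4_1 sse4_2 popcnt avx fma movbe bmi1 avx2 bmi2"
IVY_BRIDGE = "flags : sse sse2 ssse3 sse4_1 sse4_2 popcnt avx"

if __name__ == "__main__":
    # Anything listed here may be emitted by -march=native on the build
    # machine and then fault with SIGILL on the target machine.
    print(missing_flags(HASWELL, IVY_BRIDGE))
```

On a real system the two inputs would be the full contents of /proc/cpuinfo from each machine, and any non-empty result suggests that a -march=native build should not be copied across.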

@crazy-william

Hi mdb256, I work with TidyHuang.

I have a question: where do you find the generation and supported features of a CPU type? When we use the -march flag, we should pass the CPU type of the running machine, is that right? Is there a way to make the program run on all Intel x86_64 CPUs? I could not find one in the gcc manpage.
Thank you very much!

Our gcc version is:

gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

From "man gcc", I got this:

           i686
               When used with -march, the Pentium Pro instruction set is used, so the code runs on all i686 family chips.  When used with -mtune, it has the same meaning as
               generic.

           pentium2
               Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set support.

           pentium3
           pentium3m
               Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction set support.

           pentium-m
               Intel Pentium M; low-power version of Intel Pentium III CPU with MMX, SSE and SSE2 instruction set support.  Used by Centrino notebooks.

           pentium4
           pentium4m
               Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support.

           prescott
               Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support.

           nocona
               Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support.

           core2
               Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.

           corei7
               Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 instruction set support.

           corei7-avx
               Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support.

           core-avx-i
               Intel Core CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C instruction set support.

           core-avx2
               Intel Core CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C
               instruction set support.

           atom
               Intel Atom CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.

Thanks a lot.

@mdb256

mdb256 commented May 11, 2016

Yes, you should choose the architecture of the machine that you plan to run Hyperscan on.

The list from GCC 4.8 is somewhat confusing - and I should note it has changed in newer versions of GCC - but the minimum feature set required for Hyperscan is core2. Your machines very likely support more than the minimum, most likely SSE4.1/4.2 as well, which this list covers under the confusingly named corei7.

The Xeon v2 that was mentioned earlier is covered by core-avx-i - the extra features that this includes do allow some performance improvements in Hyperscan over the baseline of core2.

The Haswell that you first built Hyperscan on would be using the feature sets from core-avx2. Again, there are performance improvements from using more recent features, but if they aren't available on the machines you will be using, then you cannot build the library with these instructions.
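
The mapping described in this comment can be written down as a small lookup. This is a rough sketch under stated assumptions: the feature sets are transcribed from the GCC 4.8 man page excerpt quoted earlier in the thread, translated to /proc/cpuinfo spellings (SSE3 appears there as `pni`, PCLMUL as `pclmulqdq`, RDRND as `rdrand`, BMI as `bmi1`, 64-bit extensions as `lm`). The helper names `best_march` and `common_march` are hypothetical, and newer GCC versions use different -march names.

```python
# Feature sets per -march value, from the GCC 4.8 man page excerpt above.
CORE2 = {"lm", "mmx", "sse", "sse2", "pni", "ssse3"}
COREI7 = CORE2 | {"sse4_1", "sse4_2"}
COREI7_AVX = COREI7 | {"avx", "aes", "pclmulqdq"}
CORE_AVX_I = COREI7_AVX | {"fsgsbase", "rdrand", "f16c"}
CORE_AVX2 = CORE_AVX_I | {"movbe", "avx2", "fma", "bmi1", "bmi2"}

# Ordered from most to least demanding.
MARCH_LEVELS = [
    ("core-avx2", CORE_AVX2),
    ("core-avx-i", CORE_AVX_I),
    ("corei7-avx", COREI7_AVX),
    ("corei7", COREI7),
    ("core2", CORE2),
]


def best_march(cpu_flags):
    """Most demanding -march value whose whole feature set is in cpu_flags."""
    flags = set(cpu_flags)
    for name, required in MARCH_LEVELS:
        if required <= flags:
            return name
    return None  # below core2, the minimum mentioned for Hyperscan


def common_march(machines):
    """Most demanding -march value that every machine (at least one) can run."""
    common = set.intersection(*[set(flags) for flags in machines])
    return best_march(common)


# Abbreviated flag lists for the two machines in this issue.
XEON_V2 = ("mmx sse sse2 lm pni pclmulqdq ssse3 sse4_1 sse4_2 popcnt "
           "aes avx f16c rdrand fsgsbase").split()
HASWELL = XEON_V2 + "fma movbe bmi1 avx2 bmi2".split()

if __name__ == "__main__":
    print(best_march(HASWELL), best_march(XEON_V2), common_march([HASWELL, XEON_V2]))
```

Taking the intersection of the flag sets first is a deliberate choice: a library shipped to several machines has to target what all of them share, not what the build host happens to support.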

@crazy-william

OK, thanks again for the quick reply! It's clear now.

@TidyHuang
Author

TidyHuang commented May 11, 2016

Hi Matt,

Thanks for your kind and detailed answer.
Based on your suggestion, I've partially fixed my crash issue. I then ran several experiments.
My project has one executable that depends on several dynamic libraries, e.g. lib1.so, lib2.so, lib3.so ... libn.so and libhs.so. These libraries are independent of each other, and lib1.so and lib2.so are preinstalled (built without extra compile flags) on the running VM.
1). The first time, I built my project on a 4th Gen i7 (codename Haswell) with CMAKE_C_FLAGS and CMAKE_CXX_FLAGS set to -march=core-avx-i -march=generic for the executable and for the dependent libraries lib3.so ... libn.so and libhs.so. The program then runs well on the Xeon (codename Ivy Bridge).
2). Then I built my project with CMAKE_C_FLAGS and CMAKE_CXX_FLAGS set to -march=core-avx-i -march=generic for only some of the libraries or only the program. In that case the program crashes as before, which is very weird to me.

My questions:
a) Why does the first build run without crashing, given that in all experiments the project depends on the preinstalled dynamic libraries lib1.so and lib2.so?
b) What is the scope of the -march=core-avx-i -march=generic flags? That is, must the program that uses Hyperscan and all of its dependent libraries be compiled with them?

-Tidy

@TidyHuang TidyHuang reopened this May 16, 2016
@mdb256

mdb256 commented May 17, 2016

I'm not quite sure I understand. Firstly I suspect you mean "-march=core-avx-i -mtune=generic". Specifying -march twice will usually mean the second one overrides the first. Also generic is not a valid argument for -march=.

If you compile Hyperscan with -march=core-avx-i it should not affect any other library. Is it possible there is still a version of the Hyperscan library built on the Haswell with -march=native in the dynamic library path?
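
The "second one overrides the first" behaviour mentioned here can be modeled in a few lines. This is only an illustrative sketch: it does not invoke GCC, it does not validate the values, and the function name `effective_march` is made up.

```python
def effective_march(flags):
    """Return the value of the last -march= option in a flag string, or None.

    Models the usual 'later option wins' rule; it does not check whether
    the value is one the compiler would accept.
    """
    march = None
    for token in flags.split():
        if token.startswith("-march="):
            march = token[len("-march="):]
    return march


if __name__ == "__main__":
    # The flags as first posted: the second -march replaces core-avx-i.
    print(effective_march("-march=core-avx-i -march=generic"))
    # The flags as intended: -mtune does not touch the -march value.
    print(effective_march("-march=core-avx-i -mtune=generic"))
```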

@TidyHuang
Author

TidyHuang commented May 18, 2016

Thanks Matt, that was a typo for -mtune=generic. In theory, there should be no existing library built with -march=native. Two of us have done such testing. I'll use a clean VM to test and separate the dependent libs one by one.

@starius
Contributor

starius commented Jul 7, 2016

Can hyperscan detect CPU features at runtime?

When I compile hyperscan with -mavx2 and then disable AVX2 in the hs_platform_info passed to hs_compile, I still get SIGILL in function getMask on a machine without AVX2 support. Is this expected behaviour? If so, what is the purpose of the hs_platform_info structure?

@mdb256

mdb256 commented Jul 8, 2016

The hs_platform_info structure is for the Hyperscan compiler, and allows the HS compiler to determine which engines should be chosen while it builds the pattern database. Modifying hs_platform_info is the hs_compile equivalent of cross-compiling.

Compiling the Hyperscan lib with -mavx2 means that the C/C++ compiler is free to generate AVX/2 instructions, VEX encoded SSE instructions, and use ymm registers - and these can occur at any time during execution, and not in any way that Hyperscan could detect or avoid on non-AVX2 platforms.

@starius
Contributor

starius commented Jul 8, 2016

@mdb256, is it possible to enable -mavx2 only for the files that use AVX2? I see *_avx2.c files in the hyperscan source tree. If only the *_avx2.c files are compiled with -mavx2, it becomes possible to build a universal libhs.so and delegate the decision to use AVX2 to hs_platform_info rather than to cmake options. One could then put this universal libhs.so on 2 types of machines (with and without AVX2), compile the regular expressions into 2 bytecodes (AVX2-enabled and AVX2-disabled) using hs_platform_info, and select at runtime which bytecode to use based on whether AVX2 is available on that machine.

Note: there are actually more than 2 instruction sets, so replace 2 with actual number.

Note 2: it would be even better if hyperscan had cross-platform bytecode.

@starius
Contributor

starius commented Jul 9, 2016

Something like this:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 6710979..2a398e4 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -184,10 +184,10 @@ else()

     if (NOT CMAKE_C_FLAGS MATCHES .*march.*)
         message(STATUS "Building for current host CPU")
-        set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -march=native -mtune=native")
+        set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -mssse3")
     endif()
     if (NOT CMAKE_CXX_FLAGS MATCHES .*march.*)
-        set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -march=native -mtune=native")
+        set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -mssse3")
     endif()

     if(CMAKE_COMPILER_IS_GNUCC)
@@ -540,13 +540,16 @@ set (hs_exec_SRCS
     src/database.h
 )

-if (HAVE_AVX2)
     set (hs_exec_SRCS
         ${hs_exec_SRCS}
         src/fdr/teddy_avx2.c
         src/util/masked_move.c
         )
-endif ()
+    set_source_files_properties(
+        src/fdr/teddy_avx2.c
+        src/util/masked_move.c
+        PROPERTIES COMPILE_FLAGS -mavx2
+        )


 SET (hs_SRCS

@mdb256

mdb256 commented Jul 10, 2016

Unfortunately it isn't as simple as just building some of the files with avx2 - those two files are only required for avx2 builds, but there are many more places we use avx2 instructions where they are available.

Similarly in the Hyperscan lib we use a mix of other microarch additions where we can, like sse4.2 (crc32), popcnt, bmi2 (pext, pdep), and more.

We have looked at building a "fat binary", or as you say a universal lib that supports as many different microarchitectures as required - but it is going to take some time, and has portability problems. Plus we need to be careful about mixing SSE and AVX instructions, as switching between them can incur expensive performance penalties.

@sadegh01

sadegh01 commented Aug 20, 2016

Which flag makes it executable across the full range of hardware?

@starius
Contributor

starius commented Aug 28, 2016

@sadegh01, -DCMAKE_C_FLAGS="-march=core2" -DCMAKE_CXX_FLAGS="-march=core2" works for me.

@StefanBruens

As the hs core is written in C++ (as far as I can see), wouldn't use of function multiversioning https://gcc.gnu.org/wiki/FunctionMultiVersioning be applicable here?

@mdb256

mdb256 commented Dec 12, 2016

FMV seems to be a popular topic lately. I spent a while trying to make it work for Hyperscan, but it wasn't the right fit.

We have a working version of the fat runtime that I mentioned above - it is still a bit experimental, but I'll be pushing the commits soon. It works by building n copies of the runtime code (the C, not the C++) and uses the indirect function attribute to dispatch the right API function based on what the host platform supports.
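
The dispatch idea can be modeled in a few lines. This is only an analogy written in Python: the real fat runtime is C and relies on the indirect function attribute, and the variant names, the required flags, and the function `resolve_scan` below are all hypothetical. The thread does not say which variants Hyperscan actually builds.

```python
def scan_core2(data):
    """Stand-in for the baseline copy of the runtime."""
    return ("core2", len(data))


def scan_avx2(data):
    """Stand-in for the AVX2 copy of the runtime."""
    return ("avx2", len(data))


# Most capable variant first, each paired with the CPU flags it needs
# (hypothetical choices for illustration).
VARIANTS = [
    ({"avx2"}, scan_avx2),
    ({"ssse3"}, scan_core2),
]


def resolve_scan(host_flags):
    """Pick one implementation for this host, as a resolver would at load time."""
    flags = set(host_flags)
    for required, impl in VARIANTS:
        if required <= flags:
            return impl
    raise RuntimeError("host CPU lacks the minimum feature set")


if __name__ == "__main__":
    # Resolve once, then every call goes straight to the chosen copy.
    scan = resolve_scan({"ssse3", "avx"})
    print(scan(b"outside"))
```

Resolving once and reusing the result mirrors the point of the design: the choice is made per host, not per call, so no build of the library has to assume features the machine may lack.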

@mdb256

mdb256 commented Jan 20, 2017

Hyperscan v4.4 includes the fat runtime work for Linux, and this issue is becoming a collection of somewhat related items.

I'm going to close this issue, but please open a new issue or contact us directly if there are any problems.
