Illegal instruction issue on older CPUs #3

matted-zz · 2016-08-25T16:11:20Z

Reported by Robert Küffner:

Illegal instruction (core dumped)
...
Traceback (most recent call last):
  File "rundeepsea.py", line 27, in <module>
    check_call(["luajit 2_DeepSEA.lua -test_file_h5 "+tempdir+"/infile.fasta.ref.h5"],shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['luajit 2_DeepSEA.lua -test_file_h5 /tmp/tmpk_5oxX/infile.fasta.ref.h5']' returned non-zero exit status 132
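
For context, exit status 132 is how a shell reports a child killed by a signal: 128 plus the signal number, and 132 - 128 = 4, which is SIGILL on Linux. A minimal sketch of decoding such a status, assuming Python 3.5 or later (the helper name describe_exit_status is hypothetical and not part of rundeepsea.py):

```python
import signal


def describe_exit_status(status):
    """Return the signal name encoded in a shell-style exit status (128 + N), else None."""
    if status > 128:
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return signal.Signals(status - 128).name
        except ValueError:
            # The number does not correspond to a known signal on this platform.
            return None
    return None


print(describe_exit_status(132))
```

On Linux this prints SIGILL, which matches the "Illegal instruction (core dumped)" message at the top of the report.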

Our CPU, according to cat /proc/cpuinfo:

vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
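
The model name alone does not say which SIMD extensions the CPU supports; the flags line of /proc/cpuinfo does. A minimal sketch, assuming a Linux x86 host, that lists which of a few extensions are absent (the function names and the chosen flag list are illustrative, not taken from this project):

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags, wanted=("ssse3", "sse4_1", "sse4_2", "avx", "avx2")):
    """Return the names in `wanted` that are not present in `flags`, in order."""
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        # Not a Linux host, or /proc is unavailable: nothing to parse.
        text = ""
    print("missing:", missing_extensions(parse_cpu_flags(text)))
```

PCMPEQQ belongs to SSE4.1, so sse4_1 is the flag to look at for the faulting instruction shown later in this thread.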


matted-zz · 2016-08-25T16:14:27Z

I looked into this and initially thought it was an OpenBLAS installation issue, based on hints here and here.

However, it seems like the illegal instruction is coming from Torch itself:

=> 0x00007ffff62394d2 <+210>:   pcmpeqq %xmm7,%xmm0

in:

#0  0x00007ffff62394d2 in THRandom_nextState () from /root/torch/install/lib/libTH.so.0
#1  0x00007ffff62396af in THRandom_random () from /root/torch/install/lib/libTH.so.0
#2  0x00007ffff6239714 in THRandom_uniform () from /root/torch/install/lib/libTH.so.0
#3  0x00007ffff6087578 in THDoubleTensor_uniform () from /root/torch/install/lib/libTH.so.0
#4  0x00007ffff6720230 in m_torch_DoubleTensor_uniform () from /root/torch/install/lib/lua/5.1/libtorch.so
#5  0x000000000047db8a in lj_BC_FUNCC ()
#6  0x00007ffff64837c4 in luaT_cmt__call () from /root/torch/install/lib/libluaT.so.0
#7  0x000000000047db8a in lj_BC_FUNCC ()
#8  0x000000000046baed in lj_cf_package_require ()
#9  0x000000000047db8a in lj_BC_FUNCC ()
#10 0x000000000046baed in lj_cf_package_require ()
#11 0x000000000047db8a in lj_BC_FUNCC ()
#12 0x000000000046d12d in lua_pcall ()
#13 0x0000000000406f4f in pmain ()
#14 0x000000000047db8a in lj_BC_FUNCC ()
#15 0x000000000046d1a7 in lua_cpcall ()
#16 0x0000000000404f04 in main ()

I'm investigating ways to make the Torch build process less aggressive about the instruction sets it uses (no AVX/AVX2 would be better, I think). This may require changes to the upstream image, or rebuilding Torch inside this image.

matted-zz · 2016-08-29T17:35:57Z

I added a workaround (and documentation) in b0f5451. I couldn't figure out how to use environment variables to control the Torch build process, which would have let us simply maintain multiple concurrent auto-built images. This should be a rare problem, though, so asking users to rebuild locally seems fine.

matted-zz self-assigned this Aug 25, 2016

matted-zz closed this as completed Aug 29, 2016
