Illegal instruction issue on older CPUs #3

Closed
matted-zz opened this issue Aug 25, 2016 · 2 comments

@matted-zz
Member

Reported by Robert Küffner:

Illegal instruction (core dumped)
...
Traceback (most recent call last):
  File "rundeepsea.py", line 27, in <module>
    check_call(["luajit 2_DeepSEA.lua -test_file_h5 "+tempdir+"/infile.fasta.ref.h5"],shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['luajit 2_DeepSEA.lua -test_file_h5 /tmp/tmpk_5oxX/infile.fasta.ref.h5']' returned non-zero exit status 132
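Exit status 132 is consistent with the "Illegal instruction" message: a shell reports a child killed by a signal as 128 plus the signal number, and 132 - 128 = 4 is SIGILL on Linux. A minimal sketch of decoding such a status follows. It assumes Python 3.5+ (for `signal.Signals`), and the helper name `signal_from_returncode` is hypothetical rather than part of rundeepsea.py. It also handles the negative return codes that `subprocess` uses when the direct child itself dies from a signal.

```python
import signal


def signal_from_returncode(returncode):
    """Return the signal name encoded in a return code, or None.

    Handles two conventions:
      * negative values, which subprocess uses when the direct child
        was killed by a signal (e.g. -4);
      * values above 128, which a shell uses to report that a command
        it ran was killed by a signal (e.g. 132 = 128 + 4).
    """
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Not a signal number known on this platform.
        return None
```

Calling `signal_from_returncode(err.returncode)` inside an `except subprocess.CalledProcessError as err:` block would let a wrapper script print a more specific hint than the raw status.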

Our CPU, according to cat /proc/cpuinfo:

vendor_id    : GenuineIntel
cpu family    : 6
model        : 15
model name    : Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
@matted-zz
Member Author

I looked into this and initially thought it was an OpenBLAS installation issue, based on hints here and here.

However, it seems like the illegal instruction is coming from Torch itself:

=> 0x00007ffff62394d2 <+210>:   pcmpeqq %xmm7,%xmm0

in:

#0  0x00007ffff62394d2 in THRandom_nextState () from /root/torch/install/lib/libTH.so.0
#1  0x00007ffff62396af in THRandom_random () from /root/torch/install/lib/libTH.so.0
#2  0x00007ffff6239714 in THRandom_uniform () from /root/torch/install/lib/libTH.so.0
#3  0x00007ffff6087578 in THDoubleTensor_uniform () from /root/torch/install/lib/libTH.so.0
#4  0x00007ffff6720230 in m_torch_DoubleTensor_uniform () from /root/torch/install/lib/lua/5.1/libtorch.so
#5  0x000000000047db8a in lj_BC_FUNCC ()
#6  0x00007ffff64837c4 in luaT_cmt__call () from /root/torch/install/lib/libluaT.so.0
#7  0x000000000047db8a in lj_BC_FUNCC ()
#8  0x000000000046baed in lj_cf_package_require ()
#9  0x000000000047db8a in lj_BC_FUNCC ()
#10 0x000000000046baed in lj_cf_package_require ()
#11 0x000000000047db8a in lj_BC_FUNCC ()
#12 0x000000000046d12d in lua_pcall ()
#13 0x0000000000406f4f in pmain ()
#14 0x000000000047db8a in lj_BC_FUNCC ()
#15 0x000000000046d1a7 in lua_cpcall ()
#16 0x0000000000404f04 in main ()

I'm investigating ways to make the Torch build less aggressive in the instruction sets it uses. The faulting pcmpeqq is an SSE4.1 instruction, which this Core 2 CPU does not support, so the build would need to avoid SSE4.1 and anything newer (including AVX/AVX2), I think. That may involve changes to the upstream image, or rebuilding Torch inside this image.
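Since pcmpeqq belongs to SSE4.1, one way to spot affected machines before launching luajit is to look for the `sse4_1` flag that x86 Linux kernels list in /proc/cpuinfo. The sketch below is only an illustration under that assumption: the function names are hypothetical, it is not the workaround from the commit, and it does not apply to non-x86 systems, whose cpuinfo uses different field names.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("sse4_1",)):
    """List the required flags that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a Linux host, `missing_features(read_cpuinfo())` returns an empty list when SSE4.1 is available. A non-empty result suggests that a prebuilt Torch using SSE4.1 would hit the same illegal instruction.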

@matted-zz matted-zz self-assigned this Aug 25, 2016
@matted-zz
Member Author

I added a workaround (and documentation) in b0f5451. I couldn't figure out how to use environment variables to control the Torch build process, which would let us simply have multiple concurrent auto-built images. This should be a rare problem, though, so asking the user to rebuild locally seems fine.
