Unable to Have More Than 1024 File Descriptors at Once #197
Comments
Uhm... I find it puzzling that this is only triggered if you output the
extras files -- I don't see how that should relate to the sockets side of
the story. Is it possible that you're trying to use 1024 instances of the
driver? That would not really be necessary, as i-PI does some scheduling and
can use the same driver for more than one bead - which is a good idea
unless you have 1024 processors on a single node.
…On 24 August 2017 at 20:37, heindelj ***@***.***> wrote:
Hi,
For some context, I have hooked up a potential to the driver code which
comes with i-PI as this is probably the easiest way to use a personal
potential as far as I can see. This works fine, but I need to print a
property for each bead (from the extras), and am also using more than 1024
beads in some simulations at very low temperatures.
When this is done, the following error is given:
Exception in thread poll_driver:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/hein071/research/i-pi-dev/ipi/engine/forcefields.py", line 166, in _poll_loop
    self.poll()
  File "/Users/hein071/research/i-pi-dev/ipi/engine/forcefields.py", line 260, in poll
    self.socket.poll()
  File "/Users/hein071/research/i-pi-dev/ipi/interfaces/sockets.py", line 674, in poll
    self.pool_update()
  File "/Users/hein071/research/i-pi-dev/ipi/interfaces/sockets.py", line 514, in pool_update
    readable, writable, errored = select.select([self.server], [], [], searchtimeout)
ValueError: filedescriptor out of range in select()
To the best of my knowledge, this error is independent of whether a unix
or inet socket is used, but I have noted that more than 1024 beads are
possible if the extras files are not opened. I do not know if this is only a
problem when using the driver interface, or if using e.g. LAMMPS for forces
would have the same problem.
After doing some googling, this is a known limitation of select.select()
<https://docs.python.org/2/library/select.html>. I don't think it is
mentioned in the Python documentation I just linked, but it is noted in the
NOTES section of the select(2) man page
<http://man7.org/linux/man-pages/man2/select.2.html>. Specifically,
FD_SETSIZE is 1024 on Linux systems, so select() can only monitor file
descriptors numbered below 1024.
That being said, the problem can apparently be fixed with minimal changes
by using select.poll() rather than select.select(), but I do not know if I
can fix this properly myself, so I thought I would mention the problem
here. I believe the only real changes needed are that whenever a new file
descriptor is set, it needs to be registered using poll.register(), and then
the poll object's poll() method needs to be called rather than
select.select().
To be clear, this is not a bug in i-PI but a limitation of Python's
select.select() (and hence of the underlying C select() call), but there is
a solution which can be implemented in i-PI with only minor changes using
poll(). Unfortunately, the details have prevented me from being able to fix
this myself.
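For illustration, here is a minimal sketch of what a poll()-based wait could look like in place of the `select.select([self.server], [], [], searchtimeout)` call from the traceback. The helper name `wait_readable` is hypothetical and not part of i-PI; the one detail that is easy to miss is that `poll()` takes its timeout in milliseconds, whereas `select()` takes seconds.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if `sock` becomes readable within `timeout_s` seconds.

    Hypothetical helper, not part of i-PI. It uses select.poll(), which has
    no FD_SETSIZE limit on the descriptor number.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() expects milliseconds; select() expects seconds.
    events = poller.poll(int(timeout_s * 1000))
    return any(flags & select.POLLIN for _fd, flags in events)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(wait_readable(a, 0))    # nothing has been sent yet
    b.send(b"x")
    print(wait_readable(a, 1.0))  # data is now waiting on `a`
    a.close()
    b.close()
```

Creating a new poll object on every call keeps the sketch short; a real fix would more likely register the server socket once and reuse the poll object.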
|
I just checked again by running a simulation in which I attempt to print the extras associated with 1536 beads but only run 64 instances of the driver (2 nodes with 32 cores), and the same exception is raised. I believe this is because the error is not associated with the number of sockets open, but with the total number of open file descriptors. Perhaps because all the file descriptors are handled by the i-PI instance, and the drivers never actually do any writing? (This is a guess as to what happens, so sorry if this is incorrect.) So, from experience, I can run as many instances of the driver code as I want, 1 per replica, but I cannot write to an arbitrary number of files. I thought I might just have the ulimit set too low, but sadly that is not the problem. See, for instance, the NOTES documentation I linked above or this SO thread. |
Hi, you can check the limits defined by your operating system using |
Hi @grhawk, I could reproduce this and I think that @heindelj is right: this is not ulimit-related. I don't understand why this only gets triggered when printing extras though - that has nothing to do with the socket machinery. Now, @heindelj, honestly I do not see us fixing, in the very near future, a bug that is only triggered above 1024 beads (we're kind of focusing on non-PIMD use cases), but if you think you can substitute the select() call with a poll(), I'd be very happy to review the bugfix and merge it. |
@ceriottm That's understandable. Honestly it's not a big deal because I only need to compute averages from what the extras prints so there's really no need to have all the files printed and I can just use fewer beads in the average. I believe I have seen you do this in a paper as well (the one with Felix Uhl). I will have some free time in the next couple weeks and I'll see if I can fix this, even though it's really not much of an issue. And thanks for the tip on pyFFTW. I have noticed a deterioration and was unsure of the cause. |
Something that would be hyper-useful, and perhaps not too hard to implement, is being able to specify a range of beads in the trajectory outputs. I mean, you can already say <trajectory bead="0" ...> but it would be fantastic to give a range of beads instead and have it do the right thing. Fancy some coding :-) ? |
I encountered this very thing yesterday! Instead I just used a loop on the command line and printed the same line 256 times with different bead numbers. Not the prettiest input file :) I'm sure I could find a way to add that functionality. |
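The workaround described above (one trajectory line per bead) can also be scripted. Below is a rough sketch that generates the lines; only the `bead` attribute is taken from this thread, while the `filename`, `stride` and property values are assumed placeholders that should be replaced with whatever your input file actually uses. The optional `step` argument is an illustrative extra for printing only every k-th bead.

```python
def trajectory_lines(n_beads, step=1, filename="extras", stride=1, prop="extras"):
    """Build one <trajectory> line per selected bead.

    The attribute values (filename, stride, prop) are placeholders chosen for
    illustration; adapt them to your own i-PI input.
    """
    template = "<trajectory filename='{f}' stride='{s}' bead='{b}'> {p} </trajectory>"
    return [
        template.format(f=filename, s=stride, b=bead, p=prop)
        for bead in range(0, n_beads, step)
    ]


if __name__ == "__main__":
    # Paste the printed lines into the output section of the input file.
    for line in trajectory_lines(256):
        print(line)
```

Calling `trajectory_lines(256, step=4)` would give 64 lines, which also keeps the number of open output files down.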
Something like a stride for imaginary time, like the one we have for real
time, could be nice and very useful. The cost of printing xyz traj files also
kicks in for a very large number of beads, which could be alleviated by this
option.
…On Aug 29, 2017 11:36 PM, "heindelj" ***@***.***> wrote:
I encountered this very thing yesterday! Instead I just used a loop on the
command line and printed the same line 256 times with different bead
numbers. Not the prettiest input file :)
I'm sure I could find a way to add that functionality.
|
Let me summarize the discussion and confirm the problem. I could reproduce the issue by first increasing the ulimit:
Note that the ulimit must be greater than 1024 to get this error; otherwise, a "too many open files" error is raised instead. So there are two tasks related to the issue:
|
Some more input about the bug (1.). The error is triggered in The file descriptor is over the limit because this example uses a large number of output files. These take up the low descriptor numbers, so the socket ends up with a large descriptor number, which is over the limit that Python's select() accepts. Given my limited knowledge of the socket machinery, I have not been able to fix the problem in the short time available. As reported, a suggested solution is to use poll() instead of select(). |
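The explanation above can be checked in isolation with a small sketch: a single socket whose descriptor number is 1024 or higher is enough to make select() fail, while poll() accepts it. The function name `demo_high_fd` is made up for this example, and the code assumes a Unix-like system where the soft open-files limit can be raised to 2048; if it cannot, the function returns None.

```python
import fcntl
import os
import resource
import select
import socket


def demo_high_fd():
    """Compare select() and poll() on one descriptor numbered >= 1024.

    Returns (fd, select_ok, poll_events), or None if the open-files limit
    cannot be raised far enough to create such a descriptor.
    """
    wanted = 2048
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        if hard != resource.RLIM_INFINITY and hard < wanted:
            return None
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            return None

    a, b = socket.socketpair()
    # Duplicate the socket onto the lowest free descriptor number >= 1024,
    # mimicking a socket opened after many output files.
    high_fd = fcntl.fcntl(a.fileno(), fcntl.F_DUPFD, 1024)
    try:
        try:
            select.select([high_fd], [], [], 0)
            select_ok = True
        except ValueError:
            # "filedescriptor out of range in select()"
            select_ok = False
        poller = select.poll()
        poller.register(high_fd, select.POLLIN)
        poll_events = poller.poll(0)  # nothing was sent, so no events
    finally:
        os.close(high_fd)
        a.close()
        b.close()
    return high_fd, select_ok, poll_events


if __name__ == "__main__":
    print(demo_high_fd())
```

Only two sockets are open here, which suggests that the descriptor number, rather than the count of monitored sockets, is what triggers the error.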