Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling /proc/mounts with long lines #142

Closed
hjelmn opened this issue Jan 6, 2016 · 6 comments
Closed

Error handling /proc/mounts with long lines #142

hjelmn opened this issue Jan 6, 2016 · 6 comments

Comments

@hjelmn
Copy link
Member

hjelmn commented Jan 6, 2016

I have a system with unusually long lines in /proc/mounts (larger than 512 characters). This is causing parsing errors in hwloc_find_linux_cpuset_mntpnt. Ideally this could be fixed by using setmntent/getmntent/etc. Is there a reason why (as the comment states) that hwloc needs a relative open here? From what I understand the behavior of open and openat is identical when given an absolute path (such as /proc/mounts).

@bgoglin
Copy link
Contributor

bgoglin commented Jan 6, 2016

Hello

I knew this code would break one day :/

We need the relative openat() so that we can load sys/proc filesystems gathered by hwloc-gather-topology for offline regression testing, instead of the local sys/proc files. openat() makes that code generic and simple everywhere in the Linux backend. That said, in yout case, we always open /proc/mounts, so the generic openat() code isn't that useful. I'll try to just setmnt(filesystemroot+"/proc/mounts") directly.

For the record, what makes your /proc/mounts line so long?

@hjelmn
Copy link
Member Author

hjelmn commented Jan 6, 2016

The lines are long because of the way Cray mounts their system images. Here is an example:

/var/opt/cray/imps /opt/intel dvs ro,relatime,blksize=32768,nodename=c0-0c1s9n3:c0-1c1s9n3:c0-2c1s9n3:c0-3c1s9n3:c0-4c1s9n3:c1-0c1s9n3:c1-1c1s9n3:c1-2c1s9n3:c1-3c1s9n3:c1-4c1s9n3:c2-0c1s9n3:c2-1c1s9n3:c2-2c1s9n3:c2-3c1s9n3:c2-4c1s9n3:c3-0c1s9n3:c3-1c1s9n3:c3-2c1s9n3:c3-3c1s9n3:c3-4c1s9n3:c4-0c1s9n3:c4-1c1s9n3:c4-2c1s9n3:c4-3c1s9n3:c4-4c1s9n3:c5-0c1s9n3:c5-1c1s9n3:c5-2c1s9n3:c5-3c1s9n3:c5-4c1s9n3:c6-0c1s9n3:c6-1c1s9n3:c6-2c1s9n3:c6-3c1s9n3:c7-0c1s9n3:c7-1c1s9n3:c7-2c1s9n3:c7-3c1s9n3:c8-0c1s9n3:c8-1c1s9n3:c8-2c1s9n3:c8-3c1s9n3:c9-0c1s9n3:c9-1c1s9n3:c9-2c1s9n3:c9-3c1s9n3:c10-0c1s9n3:c10-1c1s9n3:c10-2c1s9n3:c10-3c1s9n3:c11-0c1s9n3:c11-1c1s9n3:c11-2c1s9n3:c11-3c1s9n3,nodefile=/proc/fs/dvs/mounts/1/nodenames,attrcache_timeout=14400,nodwfs,nomultifsync,cache,nodatasync,noclosesync,retry,failover,userenv,clusterfs,killprocess,noatomic,nodeferopens,no_distribute_create_ops,no_ro_cache,loadbalance,maxnodes=1,nnodes=54,hash=fnv-1a 0 0

@bgoglin
Copy link
Contributor

bgoglin commented Jan 6, 2016

Ok. Which hwloc or OMPI are you using? (to make sure my upcoming patch will apply)

@hjelmn
Copy link
Member Author

hjelmn commented Jan 6, 2016

Both ompi master and v2.x will need to be patched.

bgoglin added a commit to bgoglin/hwloc that referenced this issue Jan 6, 2016
setmntent() doesn't support root_fd, but manual parsing of /proc/mounts is fragile,
and actually buggy for very long mount lines
(see open-mpi#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate the fsroot_path
and use setmntent().

Thanks to Nathan Hjelm for the report.

Fixes open-mpi#142.

(cherry picked from commit 41833e7)
@bgoglin
Copy link
Contributor

bgoglin commented Jan 6, 2016

Here's a patch against hwloc v1.11.2 (pretty close to what's in OMPI iirc)
bgoglin@d2d07b9

@bgoglin bgoglin closed this as completed in 41833e7 Jan 7, 2016
hjelmn added a commit to hjelmn/ompi that referenced this issue Jan 7, 2016
setmntent() doesn't support root_fd, but manual parsing of
/proc/mounts is fragile, and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate
the fsroot_path and use setmntent().

Thanks to Nathan Hjelm for the report.

(Cherry-picked from open-mpi/hwloc@d2d07b9)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hjelmn added a commit to hjelmn/ompi that referenced this issue Jan 7, 2016
setmntent() doesn't support root_fd, but manual parsing of
/proc/mounts is fragile, and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate
the fsroot_path and use setmntent().

Thanks to Nathan Hjelm for the report.

(Cherry-picked from open-mpi/hwloc@d2d07b9)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hjelmn added a commit to hjelmn/ompi-release that referenced this issue Jan 7, 2016
setmntent() doesn't support root_fd, but manual parsing of
/proc/mounts is fragile, and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate
the fsroot_path and use setmntent().

Thanks to Nathan Hjelm for the report.

(Cherry-picked from open-mpi/hwloc@d2d07b9)
(cherry picked from open-mpi/ompi@15007b4)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
ronawho pushed a commit to ronawho/chapel that referenced this issue Mar 12, 2016
setmntent() doesn't support root_fd, but manual parsing of /proc/mounts is fragile,
and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate the fsroot_path
and use setmntent().

Thanks to Nathan Hjelm for the report.

Fixes chapel-lang#142.

(cherry picked from commit 41833e7ac317d80d7cd7ce31f43d9508cea58522)
ronawho pushed a commit to ronawho/chapel that referenced this issue Mar 13, 2016
setmntent() doesn't support root_fd, but manual parsing of /proc/mounts is fragile,
and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).

Since we only openat("/proc/mounts") there, just manually concatenate the fsroot_path
and use setmntent().

Thanks to Nathan Hjelm for the report.

Fixes chapel-lang#142.

(cherry picked from commit 41833e7ac317d80d7cd7ce31f43d9508cea58522)
ronawho added a commit to chapel-lang/chapel that referenced this issue Mar 14, 2016
Fix hwloc /proc/mounts overflow

[ok'd by @gbtitus]

On some newer versions of CLE we were encountering an issue where long
/proc/mounts entries were overflowing a 512 byte buffer. It turns out this
issue was already reported and fixed on hwloc/master, but it won't make it into
an official version before our release:

  open-mpi/hwloc#142

This just cherry-picks that commit and updates our hwloc README to mention it.
@bgoglin
Copy link
Contributor

bgoglin commented Jul 7, 2016

This fix breaks thread-safety because of getmntent(). By the way getmntent() is actually internally limited to 4kB lines. getmntent_r() is thread-safe and supports larger buffers, but it cannot report truncation errors anyway. See #194 for more details.

bgoglin added a commit that referenced this issue Jul 8, 2016
getmntent() isn't thread safe, it uses a static internal struct mntent.
Two concurrent hwloc_topology_load() would break if they call getmntent()
(for finding the cgroup directory) at the same time.

Use getmntent_r() instead. However we have to specify a buffer for
storing mntent strings. And getmntent_r() doesn't actually report an
error if the buffer is too small, it silently truncates output strings,
so we can't dynamically realloc that buffer.

The getmntent() that we used before this commit was internally limited
to 4kB. And Linux actually limits mount options to 3 pages (during mount,
not when reading /proc/mounts). So use 4 pages to be above both.

Thanks to Corentin Rossignon for reporting the issue
(using gcc -fsanitize=thread)

Fixes #194

Refs #142
bgoglin added a commit that referenced this issue Jul 8, 2016
getmntent() isn't thread safe, it uses a static internal struct mntent.
Two concurrent hwloc_topology_load() would break if they call getmntent()
(for finding the cgroup directory) at the same time.

Use getmntent_r() instead. However we have to specify a buffer for
storing mntent strings. And getmntent_r() doesn't actually report an
error if the buffer is too small, it silently truncates output strings,
so we can't dynamically realloc that buffer.

The getmntent() that we used before this commit was internally limited
to 4kB. And Linux actually limits mount options to 3 pages (during mount,
not when reading /proc/mounts). So use 4 pages to be above both.

Thanks to Corentin Rossignon for reporting the issue
(using gcc -fsanitize=thread)

Fixes #194

Refs #142

(cherry picked from commit 2337361)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants