[SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py #19998

Closed
kiszk wants to merge 4 commits

Conversation

kiszk
Member

@kiszk kiszk commented Dec 16, 2017

What changes were proposed in this pull request?

In environments where /usr/sbin/lsof does not exist, running ./dev/run-tests.py with Maven fails with the error below. The current ./dev/run-tests.py only tries /usr/sbin/lsof and aborts immediately if it does not exist.

This PR changes the script to check for either lsof or /usr/sbin/lsof.

/bin/sh: 1: /usr/sbin/lsof: not found

Usage:
 kill [options] <pid> [...]

Options:
 <pid> [...]            send signal to every <pid> listed
 -<signal>, -s, --signal <signal>
                        specify the <signal> to be sent
 -l, --list=[<signal>]  list all signal names, or convert one to a name
 -L, --table            list all signal names in a nice table

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see kill(1).
Traceback (most recent call last):
  File "./dev/run-tests.py", line 626, in <module>
    main()
  File "./dev/run-tests.py", line 597, in main
    build_apache_spark(build_tool, hadoop_version)
  File "./dev/run-tests.py", line 389, in build_apache_spark
    build_spark_maven(hadoop_version)
  File "./dev/run-tests.py", line 329, in build_spark_maven
    exec_maven(profiles_and_goals)
  File "./dev/run-tests.py", line 270, in exec_maven
    kill_zinc_on_port(zinc_port)
  File "./dev/run-tests.py", line 258, in kill_zinc_on_port
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/sbin/lsof -P |grep 3156 | grep LISTEN | awk '{ print $2; }' | xargs kill' returned non-zero exit status 123
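
A hedged reading of the log above: the shell reports the exit status of the last command in a pipeline, so a missing lsof at the front does not fail the command by itself. The status 123 most likely comes from xargs, which ran kill with no PIDs (kill printed its usage text). The sketch below demonstrates only the pipeline behaviour. The path `/nonexistent/lsof-placeholder` is a made-up name that is assumed not to exist; it is not taken from run-tests.py.

```python
import subprocess

# Hypothetical path that is assumed not to exist on this machine.
MISSING = "/nonexistent/lsof-placeholder"

# In a pipeline, sh returns the exit status of the LAST command (cat),
# so the missing first command goes unnoticed.
pipeline_status = subprocess.call(MISSING + " 2>/dev/null | cat", shell=True)

# Run on its own, the same missing command yields the shell's
# "command not found" status.
alone_status = subprocess.call(MISSING + " 2>/dev/null", shell=True)

print(pipeline_status, alone_status)
```

This is why the existence of lsof has to be checked explicitly rather than inferred from the pipeline's exit status.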

How was this patch tested?

Manually tested.

@kiszk kiszk changed the title from "[SPARK-22377][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py" to "[SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py" on Dec 16, 2017
@kiszk
Member Author

kiszk commented Dec 16, 2017

@srowen @HyukjinKwon could you please review this?

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM otherwise

dev/run-tests.py Outdated
cmd = ("lsof -P |grep %s | grep LISTEN "
"| awk '{ print $2; }' | xargs kill") % zinc_port
subprocess.check_call(cmd, shell=True)
except:
Member

Could we catch the explicit exception?

Also, I think we could do this like:

cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
...
lsof = "lsof"
subprocess.check_call(cmd % (lsof, zinc_port), shell=True)
...
lsof = "/usr/sbin/lsof"
subprocess.check_call(cmd % (lsof, zinc_port), shell=True)
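
One way to read the request to catch an explicit exception: `subprocess.check_call` raises `subprocess.CalledProcessError` on a non-zero exit status, so the fallback can catch just that class instead of using a bare `except:`. A minimal sketch follows. The helper name `run_first_success` and the `exit 1` / `exit 0` commands are illustrative stand-ins, not code from this PR.

```python
import subprocess


def run_first_success(commands):
    """Run shell commands in order and return the first one that exits with 0.

    Hypothetical helper: only CalledProcessError is caught, so unrelated
    errors (for example KeyboardInterrupt) still propagate. The last failure
    is re-raised if no command succeeds.
    """
    last_error = None
    for command in commands:
        try:
            subprocess.check_call(command, shell=True)
            return command
        except subprocess.CalledProcessError as error:
            last_error = error
    if last_error is None:
        raise ValueError("no commands given")
    raise last_error


# Stand-in commands: the first fails, the second succeeds.
chosen = run_first_success(["exit 1", "exit 0"])
print(chosen)
```

Re-raising the last failure keeps the `check_call` guarantee that the caller learns when nothing worked.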

Member Author

@kiszk kiszk Dec 16, 2017

Yes, if the command does not exist, the failure surfaces as an exception that we can catch, so we can run whichever of the two cases works.

Yes, using a cmd template like that is fine.

dev/run-tests.py Outdated
try:
    subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
except:
    subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
Member

Maybe, subprocess.call -> subprocess.check_call?

Member Author

I intentionally used subprocess.call so that execution continues even if neither lsof nor /usr/sbin/lsof exists. Failing to kill zinc is acceptable for the other steps.

WDYT?

Member

@HyukjinKwon HyukjinKwon Dec 16, 2017

Hm, but that changes what kill_zinc_on_port originally does, because it is no longer guaranteed to kill zinc. I see the point, but let's stick to the original behaviour for now.

Member Author

I see. Since I don't have a strong preference for this change, I will revert it to keep the original behavior.
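
For readers following the subprocess.call versus subprocess.check_call exchange above, here is a small sketch of the behavioural difference. The `exit 3` command is a stand-in for a failing pipeline and is not taken from run-tests.py.

```python
import subprocess

FAILING = "exit 3"  # stand-in for a shell command that fails

# subprocess.call returns the exit status and does not raise on non-zero codes,
# so the caller continues unless it inspects the value.
status = subprocess.call(FAILING, shell=True)

# subprocess.check_call raises CalledProcessError for the same command,
# which aborts the caller unless the exception is handled.
try:
    subprocess.check_call(FAILING, shell=True)
    raised = False
except subprocess.CalledProcessError as error:
    raised = True
    returncode = error.returncode

print(status, raised)
```

Keeping check_call preserves the original contract of kill_zinc_on_port: a failed kill stops the run instead of being silently ignored.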

cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
"| awk '{ print $2; }' | xargs kill") % zinc_port
subprocess.check_call(cmd, shell=True)
cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
Member

Ah, @kiszk, I think we can actually use sparktestsupport.shellutils.which("...") too like what we do for java:

return java_exe if java_exe else which("java")

So, something like:

cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
lsof_exe = which("lsof") 
subprocess.check_call(cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port), shell=True)

I just double checked:

>>> lsof_exe = which("lsof")
>>> cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"
>>> lsof_exe = which("lsof")
>>> cmd % (lsof_exe if lsof_exe else "/usr/bin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"
>>> lsof_exe = which("foo")
>>> cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"
>>> lsof_exe = which("bar")
>>> cmd % (lsof_exe if lsof_exe else "/usr/bin/lsof", zinc_port)
"/usr/bin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"

Member Author

@kiszk kiszk Dec 16, 2017

Great catch, I also checked it.

>>> print(which("lsof"))
/usr/bin/lsof
>>> 
% ls /usr/bin/lsof /usr/sbin/lsof
ls: cannot access '/usr/sbin/lsof': No such file or directory
/usr/bin/lsof
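
The which-based fallback discussed in this thread can be sketched in a self-contained form. This is an illustration under assumptions, not the merged code: it uses Python 3's `shutil.which` as a stand-in for `sparktestsupport.shellutils.which`, and the names `KILL_TEMPLATE`, `build_kill_command`, and the `lookup` parameter are hypothetical.

```python
import shutil

# Same pipeline shape as the cmd template quoted in the review.
KILL_TEMPLATE = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"


def build_kill_command(port, lookup=shutil.which):
    """Build the shell command, preferring an lsof found on the PATH.

    `lookup` is injectable (an assumption made for testability); it should
    return the executable's path, or None when it is not found.
    """
    lsof_exe = lookup("lsof")
    return KILL_TEMPLATE % (lsof_exe if lsof_exe else "/usr/sbin/lsof", port)


# Example: simulate a machine where lsof is not on the PATH.
print(build_kill_command(1234, lookup=lambda name: None))
```

Building the string separately from running it makes the path selection easy to check without killing any process.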

@SparkQA

SparkQA commented Dec 16, 2017

Test build #84990 has finished for PR 19998 at commit 969bc22.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 16, 2017

Test build #84992 has finished for PR 19998 at commit b384336.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 16, 2017

Test build #84993 has finished for PR 19998 at commit 964e5ff.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 16, 2017

Test build #84997 has finished for PR 19998 at commit 6c29a11.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Dec 16, 2017

retest this please

@SparkQA

SparkQA commented Dec 16, 2017

Test build #84999 has finished for PR 19998 at commit 6c29a11.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Dec 16, 2017

Should we retest this again?

@HyukjinKwon
Member

Let's wait for a bit. I'll restart this once builds start passing again. It seems to be failing globally in R's CRAN check.

@kiszk
Member Author

kiszk commented Dec 16, 2017

I see, thank you very much.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Dec 17, 2017

Test build #85026 has finished for PR 19998 at commit 6c29a11.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Dec 18, 2017

Test build #85050 has finished for PR 19998 at commit 6c29a11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 3a07eff Dec 18, 2017