Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testssl.sh 3.0 starts significantly slower than the previous version #1489

Closed
timsummerer opened this issue Feb 2, 2020 · 22 comments
Closed

Comments

@timsummerer
Copy link

Since updating testssl.sh to 3.0 on my Thinkpad T450 running Archlinux, every testssl command takes several minutes before it actually starts to do anything. On the version before 3.0, it only took seconds to start.

testssl prints "No engine or GOST support via engine with your /usr/bin/openssl" and then nothing happens for several minutes after which testssl behaves normally. If it helps, I was able to see a openssl process start, stop and start a new instance repeatedly in task manager while testssl seemed to do nothing.

  1. uname -a
    Linux T450 5.4.15-arch1-1 Heartbleed for STARTTLS #1 SMP PREEMPT Sun, 26 Jan 2020 09:48:50 +0000 x86_64 GNU/Linux

  2. testssl version from the banner:
    3.0 from https://testssl.sh/

  3. git log | head -1 (if running from git repo)
    running binary from ArchLinux repos

  4. openssl version used by testssl.sh:
    Using "OpenSSL 1.1.1d 10 Sep 2019" [~80 ciphers]
    on T450:/usr/bin/openssl
    (built: "Nov 13 16:09:29 2019", platform: "linux-x86_64")

  5. steps to reproduce: testssl.sh or docker command line, if possible incl. host
    e.g. testssl -p google.com (it doesn't really matter as long as an argument is given to testssl)

  6. what exactly was happening, output is needed
    see above

  7. what did you expect instead?
    see above

@drwetter
Copy link
Collaborator

drwetter commented Feb 3, 2020

testssl prints "No engine or GOST support via engine with your /usr/bin/openssl" and then nothing happens for several minutes after which testssl behaves normally.

Several minutes??

There are some pre-checks which save you time later (e.g. in the protocol section) but that's a matter of seconds.

Please try the following first: date ; trap "date" SIGINT; ./testssl.sh testssl.sh, when SSLv2 shows on the screen then you press ^C. It is 3-4 seconds for me.

Also please try date ; trap "date" SIGINT; ./testssl.sh --nodns=none --ip=81.169.166.184 testssl.sh

and see whether there;'s a difference.

@dcooper16
Copy link
Collaborator

testssl prints "No engine or GOST support via engine with your /usr/bin/openssl" and then nothing happens for several minutes after which testssl behaves normally. If it helps, I was able to see a openssl process start, stop and start a new instance repeatedly in task manager while testssl seemed to do nothing.

Nothing at all happens after "No engine or GOST support via engine with your /usr/bin/openssl" is printed? The next step should be printed the banner:

      ###########################################################
          testssl.sh       3.0 from https://testssl.sh/
          (0252316 2020-01-31 08:46:37 -- )

            This program is free software. Distribution and
                   modification under GPLv2 permitted.
            USAGE w/o ANY WARRANTY. USE IT AT YOUR OWN RISK!

             Please file bugs @ https://testssl.sh/bugs/

      ###########################################################

After "No engine or GOST support via engine with your /usr/bin/openssl" is printed, but before the banner is printed, OpenSSL is called many times in order to find out what capabilities it has (e.g., SSLv2, SSLv3, TLS 1.3, etc.). This code was recently changed to address #1418, so that is one possible source of the problem.

Try running

time /usr/bin/openssl s_client -ssl2 -connect invalid.

This should be almost instantaneous. Here are the results I got:

> time /usr/bin/openssl s_client -ssl2 -connect invalid.
s_client: Option unknown option -ssl2
s_client: Use -help for summary.
0.006u 0.000s 0:00.00 0.0%      0+0k 0+0io 0pf+0w

The output you see may be different, but if the amount of time this operation takes to run isn't almost zero, then that is the source of the problem.

@timsummerer
Copy link
Author

timsummerer commented Feb 3, 2020

date ; trap "date" SIGINT; testssl testssl.sh took 3:50min until SSLv2 appeared

date ; trap "date" SIGINT; testssl --nodns=none --ip=81.169.166.184 testssl.sh took 3:27min until SSLv2 appeared

The output from time /usr/bin/openssl s_client -ssl2 -connect invalid. was:

s_client: Option unknown option -ssl2
s_client: Use -help for summary.
/usr/bin/openssl s_client -ssl2 -connect invalid.  0,00s user 0,00s system 92% cpu 0,003 total

By the way, this problem only affects the locally installed testssl from the ArchLinux repositories and not the version from Github with a separate openssl version.

@dcooper16
Copy link
Collaborator

Just to be sure I understand. Are you seeing the banner before or after the 3 minute wait? Do you see "No engine or GOST support via engine with your /usr/bin/openssl" and then it is 3 minutes later before the banner is printed, or is some other text printed between when "No engine or GOST support via engine with your /usr/bin/openssl" is printed and when the 3 minute wait begins?

What happens if you use the version from Github, but include --openssl=/usr/bin/openssl in the command line?

@drwetter
Copy link
Collaborator

drwetter commented Feb 3, 2020

Could you pls paste the output from testssl.sh -p testssl.sh | /usr/bin/ts -i '%.T' ? (redact the hostname if you mind and press ^C when the line w SSLv2 has been reached).

ts comes from moreutils, double check whether it's installed / the path is correct.

@timsummerer
Copy link
Author

No engine or GOST support via engine with your /usr/bin/openssl is printed directly after issuing a testssl command and after 3 minutes the testssl banner is printed.

@dcooper16 the latest version from Github on 3.1dev with --openssl=/usr/bin/openssl is also affected.

@drwetter testssl -p testssl.sh | /usr/bin/ts -i '%.T' ouput:

00:00:00.232905 
00:00:00.001345 No engine or GOST support via engine with your /usr/bin/openssl
00:02:56.712826 
00:00:00.000058 ###########################################################
00:00:00.001391     testssl       3.0 from https://testssl.sh/
00:00:00.000171 
00:00:00.000069       This program is free software. Distribution and
00:00:00.000034              modification under GPLv2 permitted.
00:00:00.000026       USAGE w/o ANY WARRANTY. USE IT AT YOUR OWN RISK!
00:00:00.000024 
00:00:00.001896        Please file bugs @ https://testssl.sh/bugs/
00:00:00.000072 
00:00:00.001798 ###########################################################
00:00:00.000066 
00:00:00.006323  Using "OpenSSL 1.1.1d  10 Sep 2019" [~80 ciphers]
00:00:00.000182  on T450:/usr/bin/openssl
00:00:00.000078  (built: "Nov 13 16:09:29 2019", platform: "linux-x86_64")
00:00:00.000033 
00:00:00.000286 
00:00:01.335657  Start 2020-02-03 18:32:13        -->> 81.169.166.184:443 (testssl.sh) <<--
00:00:00.000109 
00:00:00.010263  Further IP addresses:   2a01:238:4308:a920:1000:0:e571:51 
00:00:00.038766  rDNS (81.169.166.184):  testssl.sh.
00:00:01.782164  Service detected:       HTTP
00:00:00.000064 
00:00:00.323106 
00:00:00.003374  Testing protocols via sockets except NPN+ALPN 
00:00:00.000074 
00:00:00.193955  SSLv2      not offered (OK)

@dcooper16
Copy link
Collaborator

Hi @summerer,

Thanks for the information. It seems that

  • the problem occurs after "No engine or GOST support via engine with your /usr/bin/openssl" is printed but before the banner is printed;
  • the problem occurs with /usr/bin/openssl but not ./bin/openssl.Linux.x86_64; and
  • the problem occurs with testssl.sh version 3.0 but not 2.9.5

The only guess I have is that the problem is happening in find_openssl_binary() during the tests of OpenSSL's capabilities. However, given that /usr/bin/openssl s_client -ssl2 -connect invalid. ran without problems, I can't think of what could be causing the problem.

Perhaps Dirk has a better idea, but the next step I can think of is to try running in debug mode with timing information. As described in https://github.com/drwetter/testssl.sh/blob/67598e824fdcc0a7a25e0e85fb79e29c3866365b/doc/testssl.1.md, try running:

DEBUGTIME=true bash -x testssl.sh -p testssl.sh
paste /tmp/testssl-* > timing.txt

The resulting timing.txt file will be large, but it will show the time at which each step in the program was executed. Presumably somewhere in find_openssl_binary there will be places with large gaps in time between the execution of two steps.

@timsummerer
Copy link
Author

timing.txt
Here you go, hope this is helpful.

@dcooper16
Copy link
Collaborator

Thanks @summerer,

That helps a lot in finding the source of the problem (but, unfortunately, not in understanding why it is happening.

Here is the first thing I noticed:

|16956> 	find_openssl_binary(): /usr/bin/openssl s_client -ssl2 -connect invalid.	1580759110.162418262
|16956> 	find_openssl_binary(): grep -aq 'unknown option'	1580759110.162620224
|16959> 	find_openssl_binary(): /usr/bin/openssl s_client -ssl3 -connect invalid.	1580759110.166255249
|16959> 	find_openssl_binary(): grep -aq 'unknown option'	1580759110.166438823
|16962> 	find_openssl_binary(): /usr/bin/openssl s_client -tls1_3 -connect invalid.	1580759110.170110068
|16962> 	find_openssl_binary(): grep -aq 'unknown option'	1580759110.170295636
|16963> 	find_openssl_binary(): HAS_TLS13=true	1580759120.504800321
|16965> 	find_openssl_binary(): /usr/bin/openssl genpkey -algorithm X448 -out -	1580759120.506590377
|16965> 	find_openssl_binary(): grep -aq 'not found'	1580759120.507149845
|16966> 	find_openssl_binary(): HAS_X448=true	1580759120.515452383
|16968> 	find_openssl_binary(): /usr/bin/openssl genpkey -algorithm X25519 -out -	1580759120.516128589
|16968> 	find_openssl_binary(): grep -aq 'not found'	1580759120.516308256
|16969> 	find_openssl_binary(): HAS_X25519=true	1580759120.519664589
|16971> 	find_openssl_binary(): /usr/bin/openssl s_client -no_ssl2 -connect invalid.	1580759120.520293201
|16971> 	find_openssl_binary(): grep -aq 'unknown option'	1580759120.520423317
|16974> 	find_openssl_binary(): /usr/bin/openssl s_client -noservername -connect invalid.	1580759120.524043035
|16974> 	find_openssl_binary(): grep -aq 'unknown option'	1580759120.524215608
|16975> 	find_openssl_binary(): HAS_NOSERVERNAME=true	1580759121.075276114
|16977> 	find_openssl_binary(): /usr/bin/openssl s_client -ciphersuites -connect invalid.	1580759121.077042188
|16977> 	find_openssl_binary(): grep -aq 'unknown option'	1580759121.077644912
|16978> 	find_openssl_binary(): HAS_CIPHERSUITES=true	1580759121.084068328
|16980> 	find_openssl_binary(): /usr/bin/openssl s_client -comp -connect invalid.	1580759121.084889867
|16980> 	find_openssl_binary(): grep -aq 'unknown option'	1580759121.085093769
|16981> 	find_openssl_binary(): HAS_COMP=true	1580759126.085304103
|16983> 	find_openssl_binary(): /usr/bin/openssl s_client -no_comp -connect invalid.	1580759126.087177537
|16983> 	find_openssl_binary(): grep -aq 'unknown option'	1580759126.087776149
|16984> 	find_openssl_binary(): HAS_NO_COMP=true	1580759127.158002484

Testing for SSLv2 and SSLv3 (which /usr/bin/openssl does not support) was very fast, but testing for TLS 1.3 took about 10 seconds. This might suggest that calls of the form

/usr/bin/openssl s_client <option> -connect invalid.

take a long time if <option> is a supported option. However, it is not that simple. Both -comp and -no_comp are supported, but testing for -no_comp was fairly fast while testing for -comp took 5 seconds.

Testing which elliptic curves OpenSSL supports took 171 seconds (so about 5.7 seconds for each curve tested)! After these tests, there don't seem to be any more delays.

These tests in find_openssl_binary() that are running slowly were not in find_openssl_binary() in version 2.9.5, so that would explain the difference. However, I cannot explain why these tests are running so slowly while others (e.g., -no_comp, -noservername) are not.

@drwetter
Copy link
Collaborator

drwetter commented Feb 3, 2020

As described in https://github.com/drwetter/testssl.sh/blob/67598e824fdcc0a7a25e0e85fb79e29c3866365b/doc/testssl.1.md, try running:

LOL. I was looking for the external documentation for this debug mode earlier and I wasn't remembering that I already documented it

@drwetter
Copy link
Collaborator

drwetter commented Feb 3, 2020

Thanks David!

@summerer : please try /usr/bin/openssl s_client -comp -connect invalid. and /usr/bin/openssl s_client -comp -connect x and let us know whether there's a run time difference

@timsummerer
Copy link
Author

time /usr/bin/openssl s_client -comp -connect invalid.        1 ↵
140106500756736:error:2008F002:BIO routines:BIO_lookup_ex:system lib:crypto/bio/b_addr.c:724:Temporary failure in name resolution
connect:errno=11
/usr/bin/openssl s_client -comp -connect invalid.  0,01s user 0,00s system 0% cpu 14,535 tota

circa 15 seconds

time /usr/bin/openssl s_client -comp -connect x               1 ↵
140455564260608:error:2008F002:BIO routines:BIO_lookup_ex:system lib:crypto/bio/b_addr.c:724:Name or service not known
connect:errno=6
/usr/bin/openssl s_client -comp -connect x  0,01s user 0,01s system 0% cpu 2,318 total

circa 2 seconds

@drwetter
Copy link
Collaborator

drwetter commented Feb 3, 2020

Thanks @summerer . Actually both seem too high to me

prompt:~|% time /usr/bin/openssl s_client -comp -connect invalid.  
s_client: must not provide both -connect option and target parameter
s_client: Use -help for summary.
/usr/bin/openssl s_client -comp -connect invalid. �  0.00s user 0.00s system 94% cpu 0.006 total
prompt:~|1% 
prompt:~|1%  time /usr/bin/openssl s_client -comp -connect x 
139801312719296:error:2008F002:BIO routines:BIO_lookup_ex:system lib:crypto/bio/b_addr.c:724:Name or service not known
connect:errno=0
/usr/bin/openssl s_client -comp -connect x  0.03s user 0.00s system 37% cpu 0.076 total
prompt:~|1%

Smells like a DNS related problem.

@dcooper16
Copy link
Collaborator

Smells like a DNS related problem.

Perhaps. But, I can't understand why it is slow with one version of OpenSSL but not another, or why it is slow with

/usr/bin/openssl s_client -comp -connect invalid.

but not

/usr/bin/openssl s_client -no_comp -connect invalid.

@timsummerer
Copy link
Author

My DNS setup is as follows:
NetworkManager will use dnsmasq which serves as a cache. dnsmasq will then use stubby to actually forward the DNS request to Cloudflare through DNS-over-TLS.

The other thing I noticed was that invalid. returned a failure in name resolution and x returned name or service not known. Might openssl 1.1 parse URLs differently compared to openssl 1.0?

The dot at the end might have made a difference.

@drwetter
Copy link
Collaborator

drwetter commented Feb 3, 2020

It looks to me something is broken in your setup or, less likely, openssl.

invalid. was defined because it is supposed to return sooner, see https://tools.ietf.org/html/rfc6761#section-6.4

   4.  Caching DNS servers SHOULD recognize "invalid" names as special
       and SHOULD NOT attempt to look up NS records for them, or
       otherwise query authoritative DNS servers in an attempt to
       resolve "invalid" names.  Instead, caching DNS servers SHOULD
       generate immediate NXDOMAIN responses for all such queries.  This
       is to avoid unnecessary load on the root name servers and other
       name servers.

I can't tell for sure but I assume that every request goes without caching to cf. x seems be be different but still costs too much time IMO.

The dot in the end is needed because of multicast DNS resolvers (#1421 ).

Is there a difference between /usr/bin/openssl s_client <PARAM> -connect invalid. when using PARAM=-comp, PARAM=-no_comp, or PARAM=""?

@timsummerer
Copy link
Author

timsummerer commented Feb 4, 2020

time openssl s_client -comp -connect invalid.                 1 ↵
140475744273664:error:2008F002:BIO routines:BIO_lookup_ex:system lib:crypto/bio/b_addr.c:724:Temporary failure in name resolution
connect:errno=11
openssl s_client -comp -connect invalid.  0,01s user 0,00s system 0% cpu 10,120 total

First run with -comp took 15 seconds then 10 seconds, thrid run 5 seconds.

time openssl s_client -no_comp -connect invalid.              1 ↵
140173421843712:error:2008F002:BIO routines:BIO_lookup_ex:system lib:crypto/bio/b_addr.c:724:Temporary failure in name resolution
connect:errno=11
openssl s_client -no_comp -connect invalid.  0,01s user 0,01s system 0% cpu 5,494 total

PARAM=-no_comp also sometimes finishes instantly and sometimes 5-10 seconds

time openssl s_client -connect invalid.                       1 ↵
139894759830784:error:2008F002:BIO routines:BIO_lookup_ex:system lib:crypto/bio/b_addr.c:724:Temporary failure in name resolution
connect:errno=11
openssl s_client -connect invalid.  0,01s user 0,01s system 0% cpu 5,908 total

PARAM="" once took almost no time on first run and 5 seconds on the second run.

Runtime seems to vary between 0 and 15 seconds for all commands.

By the way, the problem seems to persist even if I set my DNS directly to Cloudflare in NetworkManager.

@drwetter
Copy link
Collaborator

drwetter commented Feb 4, 2020

By the way, the problem seems to persist even if I set my DNS directly to Cloudflare in NetworkManager

So dnsmasq is off the hook.

What's left is either a local DNS issue (Arch Linux DNS system setup) or one from CF 1.

Out of curiosity I tried here host invalid.. That ends up on my local DNS, it doesn't go upstream. I used strace -f locally and tcpdump on my DNS server, the latter one might be partly difficult for your setup CF.

Can you try to analyze this yourself pls, with host invalid. and with any openssl s_client -connect commands and let us know.

1 In a sense of an error like "we don't know this host, please hold". The response time otherwise is way too long

@timsummerer
Copy link
Author

@drwetter Alright, I will do some digging into this and report back if I am able to pinpoint the issue further.

@timsummerer
Copy link
Author

@drwetter I was able to identify the issue. Disabling DNSSEC in stubby seems to have fixed the issue.

@drwetter
Copy link
Collaborator

DNSSEC was enabled on the net upstream resolver? I reopend this for the time being bc of #1812 and #1738. Those seemed to be platform depended though

@drwetter
Copy link
Collaborator

drwetter commented Oct 2, 2021

Closing this for housekeeping reasons. I'll be happy to reopen if needed.

#1988 will partly help.

@drwetter drwetter closed this as completed Oct 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants