Skip to content
This repository has been archived by the owner on May 7, 2020. It is now read-only.

Proxy time-outs #3184

Closed
kgoderis opened this issue Mar 26, 2017 · 16 comments
Closed

Proxy time-outs #3184

kgoderis opened this issue Mar 26, 2017 · 16 comments

Comments

@kgoderis
Copy link
Contributor

kgoderis commented Mar 26, 2017

I am not sure whether this is related to #2734, but I recently moved to a new linux-based OH2 runtime environment, and suddenly, proxied images are not longer displayed, and in fact, time out. I think this is related to OH2/ESH and not nginx in front of OH2, but this is what I get in the OH logs (never mind the exact URL, I have 6 or so images that fail to load):

2017-03-26 14:43:52.755 [DEBUG] [web.service.spi.model.ServletModel.7] - 925904774 rewriting: http://supermicro.kalipedia.be/proxy?sitemap=kalideux.sitemap&widgetId=070204&t=1490532232744 -> http://192.168.0.62/cgi-bin/image.jpg?imgprof=Right
2017-03-26 14:43:52.756 [DEBUG] [web.service.spi.model.ServletModel.7] - 509713573 rewriting: http://supermicro.kalipedia.be/proxy?sitemap=kalideux.sitemap&widgetId=070203&t=1490532232744 -> http://192.168.0.61/cgi-bin/image.jpg?imgprof=Left
2017-03-26 14:43:52.756 [DEBUG] [web.service.spi.model.ServletModel.7] - 925904774 proxying to upstream:
GET /proxy?sitemap=kalideux.sitemap&widgetId=070204&t=1490532232744 HTTP/1.0
Accept: */*
X-Forwarded-Proto: https
Connection: close
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8
Referer: https://supermicro.kalipedia.be/classicui/app?sitemap=kalideux
X-Forwarded-For: 81.164.174.126
Host: supermicro.kalipedia.be
Accept-Language: en-us
Accept-Encoding: gzip, deflate
X-Real-IP: 81.164.174.126

HttpRequest[GET /cgi-bin/image.jpg HTTP/1.0]@542ed396

, and then in the nginx logs:

2017/03/26 14:42:34 [error] 9179#9179: *31 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 81.164.174.126, server: 192.168.0.6, request: "GET /proxy?sitemap=kalideux.sitemap&widgetId=070206&t=1490532034751 HTTP/1.1", upstream: "http://127.0.0.1:8080/proxy?sitemap=kalideux.sitemap&widgetId=070206&t=1490532034751", host: "supermicro.kalipedia.be", referrer: "https://supermicro.kalipedia.be/classicui/app?sitemap=kalideux"

anyone any idea what is going on? I literally copied over the configs from my old server to the new ones, so it puzzles me even more.

@kgoderis
Copy link
Contributor Author

kgoderis commented Mar 27, 2017

For all completeness, nginx is running on the same host as OH, and the upstream URL work fine when used directly in a browser

Mental self-note: check if related to openhab/openhab-distro#423

@maggu2810
Copy link
Contributor

Can you test without nginx?
If the problem still occurs the best approach IMHO would be to attach a debugger.

@kgoderis
Copy link
Contributor Author

@maggu2810 Did a quick check without nginx and the outcome is the same. So it is not nginx related. I will try to attach a debugger

@kgoderis
Copy link
Contributor Author

kgoderis commented Mar 28, 2017

@watou @maggu2810 I am really wondering if the above issue is introduced by #2753?

  • disabling nginx does nog change a thing
  • recreating the http request by hand in an ordinary browser does work
  • when attaching a debugger, it seems that the AsyncProxyServlet is fired up, but then it times out
  • going via http (just to make sure that this is not an issue with https, as the upstream server sending the image can only talk http) yields the same timeout
vice.jetty.internal.HttpServiceResponseWrapper@782ba181
2017-03-28 19:59:15.054 [DEBUG] [web.service.spi.model.ServletModel.7] - 28932942 rewriting: http://192.168.0.6:8080/proxy?sitemap=kalideux.sitemap&widgetId=070204&t=1490532232744 -> http://192.168.0.62/cgi-bin/image.jpg?imgprof=Right
2017-03-28 19:59:15.054 [DEBUG] [web.service.spi.model.ServletModel.7] - 28932942 proxying to upstream:
GET /proxy?sitemap=kalideux.sitemap&widgetId=070204&t=1490532232744 HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8
Connection: keep-alive
Host: 192.168.0.6:8080
Accept-Language: en-us
Accept-Encoding: gzip, deflate

HttpRequest[GET /cgi-bin/image.jpg HTTP/1.1]@20f45ffb

This goes well beyond my understanding of the underlying Jetty proxy classes, so I leave it up to you guys to figure out what is going on.

@kgoderis
Copy link
Contributor Author

For completeness, I have tried to load an image coming from another type of appliance : same result....

@kgoderis
Copy link
Contributor Author

@maggu2810 were you able to reproduce the problem?

@maggu2810
Copy link
Contributor

@kgoderis I edit your two posts, so the logs are marked as pre-formatted text -- much better to read.

I haven't tried to reproduce the problem, yet.
I can try to do it, if you give me instructions how to setup a runtime to reproduce the problem.

@kgoderis
Copy link
Contributor Author

@maggu2810 The installation I tried it on is a plain-vanilla "Ubuntu apt-get" installation, adjusted for my own configs and with nginx.

@maggu2810
Copy link
Contributor

@kgoderis Sorry, but I don't know how I can setup a demo configuration to reproduce the problem.
Shouldn't it be possible to use the Eclipse SmartHome launch configuration that is shipped in the source tree, add some bundles, add a specific configuration, ... to reproduce it?

@kgoderis
Copy link
Contributor Author

kgoderis commented Apr 22, 2017

@kaikreuzer @maggu2810 I spend the last week investigating the issue, it was quite an adventure, and I made some progress but no resolution whatsoever. The problem is impossible to replicate. I installed a fresh new Ubuntu in a VM, and installed the latest SNAPSHOT OH2, and everything runs quite OK.

Then I made sure that my own environment was identical to this running version (even dit literal file content comparisons between the 2 instances). That is , same SNAPSHOT, same JVM, same configs for OH2 (apart for bindings). Still my own environment was not proxying.

I nailed the issue to the SocketAddressResolver at https://github.com/eclipse/jetty.project/blob/jetty-9.2.19.v20160908/jetty-client/src/main/java/org/eclipse/jetty/client/HttpClient.java#L533 not being able to "resolve" the host address. This SocketAddressResolver does nothing more than creating a SocketAddress but it goes through an Executor and Jetty's own QueuedThreadPool implementation (https://github.com/eclipse/jetty.project/blob/jetty-9.2.19.v20160908/jetty-util/src/main/java/org/eclipse/jetty/util/SocketAddressResolver.java#L133) . In short, the ThreadPool or the Executor of that SocketAddressResolver is not working.

If I modify the ESH Proxy to use only BlockingProxyServlet everything works. If I create my own version of the SocketAddressResolver everything works (see below). So, for some @#$#% reason, the Executor/QTP is failing.

Do you know of anything that could influence the functioning of Executors in the runtime? I mean, they are running fine for other bindings/places where they are used, but not here.

   // this works - simple test of an http client
    HttpClient client = new HttpClient();
    client.start();
    ContentResponse cr = client.GET("http://192.168.0.14/enu/camera320x240.jpg");
        // this works as well
        QueuedThreadPool threadPool = new QueuedThreadPool();
        threadPool.setName("proxytest");
        threadPool.start();
        Executor executor = threadPool;
        Scheduler scheduler = new ScheduledExecutorScheduler("proxytest-scheduler", false);
        scheduler.start();

        Async resolver = new Async(executor, scheduler, 15000);
        logger.trace("Starting resolve");
        resolver.resolve("192.168.0.14", 80, new Promise<SocketAddress>() {
            @Override
            public void succeeded(SocketAddress socketAddress) {
                Map<String, Object> context = new HashMap<>();
                logger.trace("Resolve Succeeded");
            }

            @Override
            public void failed(Throwable x) {
                logger.trace("Resolve Failed {}", x.getMessage(), x);
            }
        });
        logger.trace("Ending resolve");

     public class Async {
        private final Logger LOG = LoggerFactory.getLogger(UIActivator.class);

        private final Executor executor;
        private final Scheduler scheduler;
        private final long timeout;

        public Async(Executor executor, Scheduler scheduler, long timeout) {
            this.executor = executor;
            this.scheduler = scheduler;
            this.timeout = timeout;
        }

        public Executor getExecutor() {
            return executor;
        }

        public Scheduler getScheduler() {
            return scheduler;
        }

        public long getTimeout() {
            return timeout;
        }

        public void resolve(final String host, final int port, final Promise<SocketAddress> promise) {
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    Scheduler.Task task = null;
                    final AtomicBoolean complete = new AtomicBoolean();
                    if (timeout > 0) {
                        final Thread thread = Thread.currentThread();
                        task = scheduler.schedule(new Runnable() {
                            @Override
                            public void run() {
                                if (complete.compareAndSet(false, true)) {
                                    promise.failed(new TimeoutException());
                                    thread.interrupt();
                                }
                            }
                        }, timeout, TimeUnit.MILLISECONDS);
                    }

                    try {
                        long start = System.nanoTime();
                        InetSocketAddress result = new InetSocketAddress(host, port);
                        long elapsed = System.nanoTime() - start;
                        LOG.debug("Resolved {} in {} ms", host, TimeUnit.NANOSECONDS.toMillis(elapsed));

                        if (complete.compareAndSet(false, true)) {
                            if (result.isUnresolved()) {
                                promise.failed(new UnresolvedAddressException());
                            } else {
                                promise.succeeded(result);
                            }
                        }
                    } catch (Throwable x) {
                        if (complete.compareAndSet(false, true)) {
                            promise.failed(x);
                        }
                    } finally {
                        if (task != null) {
                            task.cancel();
                        }
                        // Reset the interrupted status before releasing the thread to the pool
                        Thread.interrupted();
                    }
                }
            });
        }
    }

I noticed that we are using jetty.util 9.2.19, it seems a bit dated (e.g. 9.4.x is available). For some reason in Eclipse the TP always resolves to 9.2.12, and in that version the SocketAddressResolver has a different signature.

@kgoderis
Copy link
Contributor Author

kgoderis commented Apr 23, 2017

@maggu2810 @kaikreuzer I did the following test: I installed the latest snapshot (#889) on the machine that has proxying issues by merely downloading the .zip from cloudbees. Installed it, no modification whatsoever, and I put in demo mode after startup: proxying is not working for the demo.sitemap "multimedia" items

Is there anything that possibly could be influencing the JVM threading functionality? I presume that even with jetty it is mapping Threads to underlying system/OS threads, no?

This is by far the weirdest problem I have ever encountered with OH2/ESH

@kgoderis
Copy link
Contributor Author

kgoderis commented Apr 24, 2017

@maggu2810 @kaikreuzer Found it! The culprit is https://github.com/eclipse/smarthome/blob/3ae5d7b1b284f2bde6199bfe8dad950fd38cae39/bundles/ui/org.eclipse.smarthome.ui/src/main/java/org/eclipse/smarthome/ui/internal/proxy/ProxyServletService.java#L71 in combination with org.eclipse.jetty.client.http.HttpClientTransportOverHTTP creating exactly Math.max(1, Runtime.getRuntime().availableProcessors() / 2) selector threads, the QueuedThreadPool maximum of 8 threads are filled up with the 8 selectors on my 8-Core/16-Thread Xeon-based server, thereby not allowing anymore threads to be created to handle HTTP connections themselves. It explains why an exact copy of the environment is perfectly running on a Virtual Machine with for example 2 virtual Cores.

Now, when I add "org.eclipse.smarthome.proxy:maxThreads=16" to for example runtime.cfg this does not seem to have any effect. How can I change the Map<String, Object> config that is passed onto the ProxyServletService's activate() method?

kgoderis added a commit to kgoderis/smarthome that referenced this issue Apr 25, 2017
… at least the double of the number of selector threads that are created by the HttpClient, so that the QueuedThreadPool is not clogged with blocking “selector” threads

Signed-off-by: Karel Goderis <karel.goderis@me.com>
kaikreuzer pushed a commit that referenced this issue Apr 26, 2017
…uble of the number of selector threads that are created by the HttpClient, so that the QueuedThreadPool is not clogged with blocking “selector” threads (#3311)

Signed-off-by: Karel Goderis <karel.goderis@me.com>
@kaikreuzer
Copy link
Contributor

Congrats on figuring out that issue! Your servers clearly have too many cores :-D

How can I change the Map<String, Object> config that is passed onto the ProxyServletService's activate() method?

I would expect this to work like this.
Maybe this line needs to be changed to have name="org.eclipse.smarthome.proxy (i.e. remove the "ui.") and maybe also add a configuration-policy="optional" to it.

@kgoderis
Copy link
Contributor Author

kgoderis commented Apr 26, 2017 via email

@maggu2810
Copy link
Contributor

Congrats @kgoderis 👍

@maggu2810
Copy link
Contributor

I assume this is fixed by #3311 now

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants