Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpredictable errors when OkHttpClient.Builder instance is reused #445

Closed
cantide5ga opened this issue May 4, 2018 · 16 comments
Closed
Assignees
Milestone

Comments

@cantide5ga
Copy link

influxdb-java version 2.10

There are two similar stack traces throwing different exceptions:

Caused by: java.lang.StackOverflowError
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:240)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:160)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:144)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:213)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:213)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)

and

Caused by: java.net.SocketTimeoutException: timeout
	at okio.Okio$4.newTimeoutException(Okio.java:232)
	at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
	at okio.RealBufferedSource.indexOf(RealBufferedSource.java:355)
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:227)
	at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
	at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:213)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:213)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:213)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)

Common parts of the two is:

04 May 2018 08:49:09.354184 <190>1 2018-05-04T12:49:09.289207+00:00 app worker3.1 - - at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42)
04 May 2018 08:49:09.354183 <190>1 2018-05-04T12:49:09.289209+00:00 app worker3.1 - - at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
04 May 2018 08:49:09.354183 <190>1 2018-05-04T12:49:09.289211+00:00 app worker3.1 - - at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
04 May 2018 08:49:09.354183 <190>1 2018-05-04T12:49:09.289212+00:00 app worker3.1 - - at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:144)
04 May 2018 08:49:09.354183 <190>1 2018-05-04T12:49:09.289214+00:00 app worker3.1 - - at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
04 May 2018 08:49:09.354183 <190>1 2018-05-04T12:49:09.289216+00:00 app worker3.1 - - at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)

Sometimes I get one, sometimes the other. Sometimes not at all. Maybe a race depending on if the stack is exhausted before the timeout?

Here are the details of my code:

InfluxDb client = InfluxDBFactory.connect(url, writerUser, writerPassword, httpClient);
client.query(new Query(someUpdateQuery, myDb));

httpClient is a singleton, instantiated like so:

return new OkHttpClient.Builder()
            .writeTimeout(55, TimeUnit.SECONDS);

I've also .setLogLevel(InfluxDB.LogLevel.FULL) to gather more information, but immediately preceding the exception, I only get logs on the HTTP request and nothing abnormal.

I found this related issue that didn't ever seem to get closure: #379

Again, this is often, but not always. I'm trying to find ways that I can consistently reproduce in order to be more helpful. The StackOverflowError and recursive trace only started happening when I had to explicitly configure the influx timeout.

@majst01
Copy link
Collaborator

majst01 commented May 7, 2018

Have you tried to disable gzip compression ?

@fmachado
Copy link
Contributor

fmachado commented May 7, 2018

@majst01 my guess (and this is a loooooooong shot) is: a race condition that could happen when you reuse a custom OkHttpClient.Builder. To be more precise, this is the offending line from InfluxDBImpl: multiple threads changing the interceptor List.

Checking the OkHttpClient javadoc documentation, you see that OkHttpClient should be shared (as it's thread-safe) but I don't think the same is applied to the Builder.

@majst01
Copy link
Collaborator

majst01 commented May 7, 2018

so should we create the okhttpClient instance inside a synchronized block in a helper function and pass that Retrofit.Builder instead? @cantide5ga Can you reproduce this Stackoverflow ? And try a version from a branch ?

@cantide5ga
Copy link
Author

@majst01 absolutely - let me know the branch. And as mentioned, the Stackoverflow comes and goes, but I think I can lock it down with a heavy load.

re: gzip, I've already disabled and the result is the same.

One thing a colleague and I were talking about is how InfluxDB itself seems to be giving a different instance and the constructor is adding interceptors to okHttpClient each and everytime. Is it recommended to use a shared instance of InfluxDB as well? I'm going to look more into it when I get a chance shortly, but might be a red herring.

@majst01
Copy link
Collaborator

majst01 commented May 8, 2018

You should share and use the same instance of InfluxDB, if you are creating many of them, this might be the root cause. Would you please try with a single instance of influxDB ?

@majst01
Copy link
Collaborator

majst01 commented May 8, 2018

If this is the case we should at least document this fact, and bonus try to prevent the creation of multiple influxdb instances to the same endpoint.

@cantide5ga
Copy link
Author

I thought this might be the case. This is an addition to some older code, so I'll need to determine how heavy an impact this refactor is. Standby folks, thanks for the help.

@majst01
Copy link
Collaborator

majst01 commented May 13, 2018

Hi @cantide5ga any news here ?

@fmachado
Copy link
Contributor

fmachado commented May 23, 2018

Apologies for being late here.

IMO there is no way to fix the reported issue without removing the OkHttpClient.Builder passed as parameter because users are allowed to create multiple InfluxDB instances (e.g. per request). Of course we could work to educate them but, IMHO, this makes the client a bit fragile.

Errors will happen if we share a single OkHttpClient.Builder between multiple clients or if we synchronize the access.

A quick-and-dirty PoC showing the collateral effects:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;

import okhttp3.OkHttpClient;

public class Main {

	public static void main(String[] args) throws InterruptedException, ExecutionException {
		ExecutorService executor = Executors.newFixedThreadPool(100);
		
		final int maxCallables = 10_000;
		List<Callable<String>> callableList = new ArrayList<>(maxCallables);
		for (int i = 0; i < maxCallables; i++) {
			callableList.add(new Callable<String>() {
				@Override
				public String call() throws Exception {
					MyInfluxDBBean myBean = new MyInfluxDBBean();
					return myBean.connectAndDoNothing1(); //pick one
					return myBean.connectAndDoNothing2(); //pick one
				}
			});
		}
		System.out.println("Invoking all callableList (size()=" + callableList.size() + ")");
		executor.invokeAll(callableList);
		System.out.println("Shutting down...");
		executor.shutdown();
		System.out.println("Shutdown requested and waiting for termination...");
		if (!executor.awaitTermination(20_000, TimeUnit.SECONDS)) {
			executor.shutdownNow();
	    } 
	}
	
	public static class MyInfluxDBBean {

		static final AtomicInteger COUNT = new AtomicInteger(0);
		static final OkHttpClient.Builder OKHTTP_BUILDER = new OkHttpClient.Builder();
		
		InfluxDB influxClient;

		/**
		 * DO NOT USE THIS CODE! This method will throw a <pre>java.lang.IllegalStateException: Null interceptor</pre>.
		 */
		String connectAndDoNothing1() {
			try {
				influxClient = InfluxDBFactory.connect("http://127.0.0.1:8086", "admin", "supersecretpassword", OKHTTP_BUILDER); //LINE56
			} catch (Exception e) {
				e.printStackTrace();
				System.exit(1); // OPS!
			}
			return null;
		}

		/**
		 * DO NOT USE THIS CODE!
		 */
		String connectAndDoNothing2() {
			try {
				synchronized (OKHTTP_BUILDER) {
					influxClient = InfluxDBFactory.connect("http://127.0.0.1:8086", "admin", "supersecretpassword", OKHTTP_BUILDER);
				}
				if (COUNT.incrementAndGet() % 1000 == 0) {
					System.out.println("connectAndDoNothing2: check how many interceptors the builder will use: " + OKHTTP_BUILDER.interceptors().size());
				}
			} catch (Exception e) {
				e.printStackTrace();
				System.exit(1); // OPS!
			}
			return null;
		}
	}
}

Output when executing myBean.connectAndDoNothing1():

Invoking all callableList (size()=10000)
java.lang.IllegalStateException: Null interceptor: [okhttp3.logging.HttpLoggingInterceptor@118d46f0, org.influxdb.impl.GzipRequestInterceptor@2c4d3503, null, okhttp3.logging.HttpLoggingInterceptor@1ec4a5f, org.influxdb.impl.GzipRequestInterceptor@3b8b9cde, org.influxdb.impl.GzipRequestInterceptor@59f01b5d, okhttp3.logging.HttpLoggingInterceptor@2b160cf, org.influxdb.impl.GzipRequestInterceptor@56bb085a, okhttp3.logging.HttpLoggingInterceptor@540aed38, org.influxdb.impl.GzipRequestInterceptor@7397180, okhttp3.logging.HttpLoggingInterceptor@4219b949, org.influxdb.impl.GzipRequestInterceptor@4c6e8088, okhttp3.logging.HttpLoggingInterceptor@4f0ec710, org.influxdb.impl.GzipRequestInterceptor@3770a00, okhttp3.logging.HttpLoggingInterceptor@cab38d6, org.influxdb.impl.GzipRequestInterceptor@34f9830b]
	at okhttp3.OkHttpClient.<init>(OkHttpClient.java:275)
	at okhttp3.OkHttpClient$Builder.build(OkHttpClient.java:935)
	at org.influxdb.impl.InfluxDBImpl.<init>(InfluxDBImpl.java:94)
	at org.influxdb.InfluxDBFactory.connect(InfluxDBFactory.java:84)
	at com.fmachado.test445.Main$MyInfluxDBBean.connectAndDoNothing1(Main.java:56)
	at com.fmachado.test445.Main$1.call(Main.java:29)
	at com.fmachado.test445.Main$1.call(Main.java:1)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Output when executing myBean.connectAndDoNothing1():

Invoking all callableList (size()=10000)
connectAndDoNothing2: check how many interceptors the builder will use: 2000
connectAndDoNothing2: check how many interceptors the builder will use: 4000
connectAndDoNothing2: check how many interceptors the builder will use: 6000
connectAndDoNothing2: check how many interceptors the builder will use: 8000
connectAndDoNothing2: check how many interceptors the builder will use: 10000
connectAndDoNothing2: check how many interceptors the builder will use: 12000
connectAndDoNothing2: check how many interceptors the builder will use: 14000
connectAndDoNothing2: check how many interceptors the builder will use: 16000
connectAndDoNothing2: check how many interceptors the builder will use: 18000
connectAndDoNothing2: check how many interceptors the builder will use: 20000
Shutting down...
Shutdown requested and waiting for termination...

What do you think, guys?

@lxhoan
Copy link
Contributor

lxhoan commented May 25, 2018

I agree, seems multiple loggingInterceptor and gzipRequestInterceptor is likely to result in a super deep call stack (and StackOverflowError consequently). I am not sure if it's also the root cause of the SocketTimeotException, however when the system goes unstable (such as running out of memory), many kind of weird errors can happen. So we first should address that redundancy of loggingInterceptor and gzipRequestInterceptor to isolate the problems

So how about these approaches

  1. Clone the builder
OkHttpClient.Builder clone = client.build().newBuilder();
this.retrofit = new Retrofit.Builder().baseUrl(url)
               .client(clone.addInterceptor(loggingInterceptor).addInterceptor(gzipRequestInterceptor).build())
                .addConverterFactory(MoshiConverterFactory.create())
                .build();
  1. Or check if client.interceptors() has contained an instance of HttpLoggingInterceptor or GzipRequestInterceptor, if yes then replace them by the new ones

@fmachado
Copy link
Contributor

@lxhoan:

  1. Clone the builder

This will create a Builder instance having the same properties of existing OkHttpClient, including the interceptors. Check this source file for more details.

  1. Or check if client.interceptors() has contained an instance of HttpLoggingInterceptor or GzipRequestInterceptor, if yes then replace them by the new ones

This may work. Personally I don't like this approach because:

  1. We will have to serialize the access to the interceptor list for reads and writes;
  2. Interceptors don't implement hashCode and equals so you'll have to manually iterate on the list checking for instance of;

IMO, this "temporary fix" should be clearly stated in the source code (maybe a comment with link to this ticket). We don't want to scary anyone checking the source for the first time and finding this "workaround" without a good explanation. :)

For the next major version, we should also remove the Builder parameter because the correct approach (the one recommended by OkHttpClient library) is not about sharing the Builder.

@lxhoan
Copy link
Contributor

lxhoan commented May 28, 2018

Hi @fmachado,

This will create a Builder instance having the same properties of existing OkHttpClient, including the interceptors. Check this source file for more details.

This is to keep the builder untouched so even this builder is going to be reused multiple times, the InfluxDBImpl constructors will not pile up its interceptor list

Actually I prefer this approach most because it's still unclear these constructors can modify the builder (add more interceptors) or not, so try to keep it immutable and we can prevent bad side effects for user's code

So I believe cloning the builder is not a workaround but a real fix.

I am not sure by now if for next major release we should remove the Builder parameter or not because it gives users some level of customization for their very own network config (such as proxy, key store ...).

@fmachado
Copy link
Contributor

@lxhoan OK. Let's move forward with (1).

@cantide5ga
Copy link
Author

cantide5ga commented Jun 7, 2018

@majst01 et al, sorry for the month long MIA. We migrated anything impacted here to use a single instance and I haven't seen the problem since.

Single instance of OkHttpClient.Builder is already documented. In addition to the above (which makes sense) should use of a single instance of InfluxDb still be pretty clear in the docs? What are the caveats of using multiple instances after this fix? Something should be loud about it if multiple instances are detected.

Willing to help in this aspect.

Regarding the SocketTimeoutException, I'm starting to think this might be unrelated. I've noticed this might be related to the Influx provider (we use a few depending on the environment).

@lxhoan lxhoan added this to the 2.12 milestone Jul 12, 2018
@lxhoan
Copy link
Contributor

lxhoan commented Jul 13, 2018

@cantide5ga ,
Singleton pattern can be considered if a reuse of OkHttpClient.Builder is detected. However I think going with option 1 (keep OkHttpClient.Builder immutable - see the discussion above) seems enough right now, in the scope of a minor release 2.12

lxhoan added a commit to bonitoo-io/influxdb-java that referenced this issue Jul 25, 2018
@lxhoan lxhoan self-assigned this Jul 25, 2018
lxhoan added a commit to bonitoo-io/influxdb-java that referenced this issue Jul 26, 2018
lxhoan added a commit to bonitoo-io/influxdb-java that referenced this issue Jul 26, 2018
lxhoan added a commit to bonitoo-io/influxdb-java that referenced this issue Jul 26, 2018
lxhoan added a commit to bonitoo-io/influxdb-java that referenced this issue Jul 27, 2018
majst01 added a commit that referenced this issue Jul 27, 2018
@lxhoan lxhoan closed this as completed Jul 27, 2018
@lxhoan lxhoan changed the title StackOverflowError when using with OkHttpClient Unpredictable errors when OkHttpClient.Builder instance is reused Jul 27, 2018
@lxhoan
Copy link
Contributor

lxhoan commented Jul 27, 2018

I change the title to reflect the real discussion and the final fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants