PreparedStatementHandle leak leading to OOM #2264
Comments
I think your sample code is hard to follow; you should include all looping logic and batchAdd functionality, and try to use only the minimum number of columns possible for readability. If you add prepared statements to a batch, I think it will keep them all open until you close the batch (potentially even the connection?).
Hi @nicolaslledo, We'll look at this, but the above comment is right: this sample code is hard to follow, and if it can be simplified, that would be very helpful.
I have edited the code. It is generated by Talend Open Studio, an ETL tool recently acquired by Qlik. batchAdd is not used; Talend generates the code that way.
I have a workaround for now. When using version 6 of the MSSQL driver, the process became extremely slow.
It looks like a loop that reuses a prepared statement (an update and an insert statement) which does not get closed within the loop iterations. Maybe that's enough to reproduce the problem while statement pooling is in use.
The repro code above does not use the MSSQL JDBC driver. Are you able to provide appropriate repro code for our driver?
I tried to remove as much unwanted detail as I could.
What I saw:
My two cents on this. Two possible problems:
That’s a good analysis. I think the variation in signatures is (not only for the client) something that can be optimized a bit, for example by using 3 digits of precision and only more if needed? (I don’t think the types can be removed from the hash.) But how many different hashes does that create in your case? The second point sounds more severe. By the way: maybe it makes sense to chunk your batches to something like 10k; that also makes the transactions smaller (and would allow streaming/parallel preparation).
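The fixed-precision idea from this comment can be sketched on the client side: if every BigDecimal is normalized to one agreed scale before binding, the parameter type definition (and hence the statement hash) stays constant across iterations. This is only an illustration; `ScaleNormalizer` and its `normalize` helper are hypothetical names, not part of the driver or of Talend's generated code:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleNormalizer {
    // Hypothetical helper: force every value to a fixed scale so the driver
    // sees one parameter type definition instead of one per distinct scale.
    static BigDecimal normalize(BigDecimal v) {
        return v == null ? null : v.setScale(3, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1");   // scale 0
        BigDecimal b = new BigDecimal("1.1"); // scale 1
        // After normalization both values share scale 3, so a statement
        // cached per parameter signature can be reused instead of re-prepared.
        System.out.println(normalize(a).scale()); // 3
        System.out.println(normalize(b).scale()); // 3
    }
}
```

Whether 3 digits are enough depends on the target column definitions; the point is only that a constant scale yields a single cache entry per statement text.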
Hi @nicolaslledo, My confusion comes from this line:
Are you not using the Oracle JDBC driver in this example? Has this problem been replicated with the MSSQL JDBC driver? Thank you for the breakdown; we'll take a look at this further if this is a MSSQL JDBC issue, but if not, you would need to reach out to the appropriate team.
I just found that 1.2.15 might reduce this part, as it no longer uses variable scale by default, as seen here: #2248
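As a small illustration of why variable scale multiplies the signatures (assuming the parameter scale feeds into the statement hash, as discussed above): a BigDecimal's scale depends heavily on how the value was constructed.

```java
import java.math.BigDecimal;

public class ScaleDemo {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(1).scale());     // 0
        System.out.println(new BigDecimal("1.1").scale()); // 1
        // Constructing from a double keeps the exact binary expansion of
        // 1.1, which produces a much larger scale than the string form:
        System.out.println(new BigDecimal(1.1).scale());
    }
}
```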
Sorry for the confusion. Data is read on Oracle then inserted/updated in SQL Server using ms sql driver. |
I upped the pool to 10K to delay the memory leak, and it eventually came, so I presume it's more. It doesn't only involve BigDecimal and DECIMAL but also INT / BIGINT and DATETIME2 / DATE. Most columns may be null too. As you point out, the main problem is the leak, even if the hashing is the cause. You're right, the developer was a bit rough with the transaction size and duration. ^^'
Hi @nicolaslledo, We may have a solution to this issue, if you are able to test. We're still not able to replicate this issue on our end, but the thought is that discarded prepared statement handles are not cleaned up often enough, only being cleaned on connection close. This becomes a problem when running UPDATEs and INSERTs on the scale that you are doing above. We have moved the cleanup code to the start of every execute, which should, if this theory is correct, resolve the issue. The changes are in #2272. In the meantime, we will continue to try to replicate this on our end.
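A rough sketch of the theory behind that change (hypothetical class and method names, not the actual driver code): discarded handles accumulate in a queue and are drained eagerly at the start of every execute, rather than only at connection close.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class HandleCleanupSketch {
    // Stand-in for a server-side prepared statement handle.
    static final class Handle {
        boolean closed;
        void serverClose() { closed = true; } // stands in for sp_unprepare
    }

    private final ConcurrentLinkedQueue<Handle> discarded = new ConcurrentLinkedQueue<>();

    // Evicted cache entries are queued rather than closed immediately.
    void discard(Handle h) { discarded.add(h); }

    int pendingCount() { return discarded.size(); }

    // In the proposed change this runs at the start of every execute,
    // so discarded handles no longer pile up until connection close.
    void drainDiscarded() {
        Handle h;
        while ((h = discarded.poll()) != null) {
            h.serverClose();
        }
    }

    public static void main(String[] args) {
        HandleCleanupSketch s = new HandleCleanupSketch();
        s.discard(new Handle());
        s.discard(new Handle());
        System.out.println(s.pendingCount()); // 2
        s.drainDiscarded();
        System.out.println(s.pendingCount()); // 0
    }
}
```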
I have a reproducer; the following test class creates the problem:

```java
package net.eckenfels.test;

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

import org.testcontainers.containers.Container.ExecResult;
import org.testcontainers.utility.DockerImageName;
import org.testcontainers.utility.MountableFile;
import org.testcontainers.containers.MSSQLServerContainer;

public class Main {
    public static void main(String[] args) throws SQLException
    {
        MSSQLServerContainer dbContainer = new MSSQLServerContainer<>(/*
                 * DockerImageName.parse(
                 * "mcr.microsoft.com/mssql/server:2017-CU12")
                 */).acceptLicense();
        dbContainer.start();
        String url = dbContainer.getJdbcUrl() + ";disableStatementPooling=false;statementPoolingCacheSize=1000";
        String user = dbContainer.getUsername();
        String pass = dbContainer.getPassword();
        try (Connection c = DriverManager.getConnection(url, user, pass))
        {
            createTable(c);
            c.setAutoCommit(false); // or true, doesn't change the outcome
            try (PreparedStatement ps = c.prepareStatement(
                    "UPDATE tab SET c1=?, c2=?, c3=?, c4=?, c5=?, c6=?, c7=?, c8=?, c9=?, c10=?, c11=?, c12=?, c13=?, c14=?, c15=?, c16=?, c17=?, c18=?, c19=?, c20=? WHERE cKey=?"))
            {
                for (int i = 0; i < 10_000_000; i++) {
                    setArguments(i, ps);
                    ps.executeUpdate();
                    if (i % 100_000 == 0)
                        System.out.println(" " + i);
                }
            }
            c.commit();
        }
    }

    private static void setArguments(int i, PreparedStatement ps) throws SQLException
    {
        ps.setString(21, "key");
        for (int c = 1; c < 21; c++)
        {
            // for each iteration use a different combination of DECIMAL scales,
            // encoding the iteration number in binary
            boolean bit = (i & (1 << (c - 1))) != 0;
            BigDecimal num = bit ? new BigDecimal(1.1) : new BigDecimal(1);
            ps.setBigDecimal(c, num);
        }
    }

    private static void createTable(Connection c) throws SQLException
    {
        try (Statement s = c.createStatement())
        {
            s.execute("CREATE TABLE tab (cKey VARCHAR(100), c1 DECIMAL, c2 DECIMAL, c3 DECIMAL,"
                + "c4 DECIMAL, c5 DECIMAL, c6 DECIMAL, c7 DECIMAL, c8 DECIMAL, c9 DECIMAL,"
                + "c10 DECIMAL, c11 DECIMAL, c12 DECIMAL, c13 DECIMAL, c14 DECIMAL, c15 DECIMAL,"
                + "c16 DECIMAL, c17 DECIMAL, c18 DECIMAL, c19 DECIMAL, c20 DECIMAL)");
            s.execute("INSERT INTO tab(cKey) VALUES('key')");
        }
    }
}
```

with the following pom:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>net.eckenfels.test</groupId>
    <artifactId>mssql-leaktest</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.release>11</maven.compiler.release>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <version.slf4j>2.0.9</version.slf4j>
        <version.testcontainers>1.19.2</version.testcontainers>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.microsoft.sqlserver</groupId>
            <artifactId>mssql-jdbc</artifactId>
            <version>12.4.2.jre11</version>
        </dependency>
        <!-- testcontainers/docker has conflicting versions -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${version.slf4j}</version>
        </dependency>
        <!-- make slf4j not complain (configure: -Dorg.slf4j.simpleLogger.defaultLogLevel=info) -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>${version.slf4j}</version>
        </dependency>
        <!-- testcontainers modules for testing various RDBMS -->
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>testcontainers</artifactId>
            <version>${version.testcontainers}</version>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>mssqlserver</artifactId>
            <version>${version.testcontainers}</version>
        </dependency>
    </dependencies>
</project>
```

and the following command to start it:
This shows up quickly in the histogram (after 100,000 updates):
The idea here is that for each execution I use a different combination of scales for the 20 decimals (basically binary-encoding the iteration number). But this could also happen with other types like NULL, DateTime, etc. (I guess). In the default case with no cache, it's totally fine and does not leak.
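To make the arithmetic in that reproducer explicit: toggling each of the 20 DECIMAL parameters between two scales yields 2^20 distinct parameter signatures, which dwarfs the configured cache size of 1000, so the pool keeps evicting and re-preparing handles.

```java
public class SignatureCount {
    public static void main(String[] args) {
        int columns = 20;            // DECIMAL parameters in the UPDATE
        long combos = 1L << columns; // each column toggles between two scales
        System.out.println(combos);  // 1048576 distinct parameter signatures
        // statementPoolingCacheSize=1000 can never hold them all, so
        // discarded handles accumulate until they are actually closed.
    }
}
```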
Driver version
mssql-jdbc-12.4.2.jre8.jar
SQL Server version
Microsoft SQL Server 2019 (RTM-CU21) (KB5025808) - 15.0.4316.3 (X64) Jun 1 2023 16:32:31 Copyright (C) 2019 Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: ) (Hypervisor)
Client Operating System
Windows Server 2019
JAVA/JVM version
version 1.8.0_362, vendor Amazon.com Inc.
Table schema
Problem description
PreparedStatementHandle instances are leaking. A batch is updating or inserting around 4 million rows in a single table.
For an unknown reason, although there are only two prepared statements, the cache defined with setStatementPoolingCacheSize is rapidly exhausted (in my case 1000), and the number of references to com.microsoft.sqlserver.jdbc.SQLServerConnection$PreparedStatementHandle and com.microsoft.sqlserver.jdbc.SQLServerConnection$CityHash128Key instances is constantly increasing. This leads to an OutOfMemoryError.
Expected behavior
Only two prepared statement handles in the connection cache.
Actual behavior
Error message/stack trace
Any other details that can be helpful
The Java code generated by Talend Open Studio
JDBC trace logs