For Oracle, change the data type for BLOB column to support storing up to 2GB #3070
Conversation
@EnabledIf("isDb2OrOracle")
@ParameterizedTest()
@MethodSource("provideBlobSizes")
public void put_largeBlobData_ShouldWorkCorrectly(int blobSize, String humanReadableBlobSize)
I added an integration test that verifies that inserting and then getting a 100MB blob works correctly.
This test fails for the other storages for various reasons that will take time to investigate. Since this is not of the highest priority, I will look into those issues later.
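For reference, a minimal sketch of what the `provideBlobSizes` method source backing this parameterized test could look like; the concrete sizes and labels below are assumptions for illustration, not the actual test data:

```java
import java.util.stream.Stream;
import org.junit.jupiter.params.provider.Arguments;

// Hypothetical method source for the parameterized test above.
static Stream<Arguments> provideBlobSizes() {
  return Stream.of(
      Arguments.of(1024, "1KB"),
      Arguments.of(32766, "32766B"), // upper bound of the JDBC DIRECT mode
      Arguments.of(32767, "32767B"), // first size handled by the workaround
      Arguments.of(100 * 1024 * 1024, "100MB"));
}
```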
options {
  systemProperties(System.getProperties().findAll { it.key.toString().startsWith("scalardb") })
}
maxHeapSize = "4g"
I needed to increase the heap size because the newly added large-BLOB integration tests for Oracle were failing with the default setting.
  case BLOB:
-   return "RAW(2000)";
+   return "BLOB";
Updates the ScalarDB BLOB data type mapping for Oracle from RAW(2000), which allows a maximum of 2000 bytes, to BLOB, which allows a maximum of 2GB.
Because of limitations of the Oracle BLOB data type, a ScalarDB BLOB column can no longer be used as:
- a primary key
- a clustering key
- a secondary index
- a column of a condition in a cross-partition scan, i.e. ScanAll, operation
- an ordering column in a cross-partition scan
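To illustrate the new limitation, here is a rough sketch contrasting table metadata that remains valid on Oracle with metadata that would now be rejected; the table layout and the exact point where rejection happens (table creation vs. operation check) are assumptions for illustration:

```java
import com.scalar.db.api.TableMetadata;
import com.scalar.db.io.DataType;

class OracleBlobMetadataExample {
  // Still valid on Oracle: BLOB used as a regular, non-key, non-indexed column.
  static TableMetadata supportedTable() {
    return TableMetadata.newBuilder()
        .addColumn("id", DataType.INT)
        .addColumn("payload", DataType.BLOB)
        .addPartitionKey("id")
        .build();
  }

  // No longer accepted on Oracle after this change: BLOB as a clustering key
  // or as a secondary index (presumably rejected by the admin/operation checks).
  static TableMetadata unsupportedTable() {
    return TableMetadata.newBuilder()
        .addColumn("id", DataType.INT)
        .addColumn("payload", DataType.BLOB)
        .addPartitionKey("id")
        .addClusteringKey("payload")
        .addSecondaryIndex("payload")
        .build();
  }
}
```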
public void bindBlobColumnToPreparedStatement(
    PreparedStatement preparedStatement, int index, byte[] bytes) throws SQLException {
  // When writing to the BLOB data type with a BLOB size greater than 32766 bytes using a MERGE
  // INTO statement, an internal error ORA-03137 occurs on the server side, so we needed a
  // workaround. This has been confirmed to be a limitation by AWS support.
  // Below is a detailed explanation of the workaround.
  //
  // Depending on the byte array size, the JDBC driver automatically chooses one of the following
  // modes to transfer the BLOB data to the server:
  // - DIRECT: the most efficient mode. It's used when the byte array length is less than 32767.
  // - STREAM: this mode is less efficient. It's used when the byte array length is greater than
  //   32766.
  // - LOB BINDING: this mode is the least efficient. It's used when an input stream is provided
  //   without specifying its length.
  //
  // When the driver selects the STREAM mode, the ORA-03137 error occurs. So, we work around the
  // issue by making sure to use the driver in a way that never selects the STREAM mode.
  // For more details about the modes, see the following documentation:
  // https://docs.oracle.com/en/database/oracle/oracle-database/23/jjdbc/LOBs-and-BFiles.html#GUID-8FD40D53-8D64-4187-9F6F-FF78242188AD
  if (bytes.length <= 32766) {
    // The DIRECT mode is used to send BLOB data of small size
    preparedStatement.setBytes(index, bytes);
  } else {
    // The LOB BINDING mode is used to send BLOB data of large size
    InputStream inputStream = new ByteArrayInputStream(bytes);
    preparedStatement.setBinaryStream(index, inputStream);
  }
As the comments explain, there is a limitation when a ScalarDB Put operation carrying a large BLOB is translated into a MERGE INTO SQL statement, which requires special handling to put the BLOB data.
I added several integration tests in core/src/integration-test/java/com/scalar/db/storage/jdbc/JdbcDatabaseColumnValueIntegrationTest.java to cover the values that could be problematic for the implemented workaround.
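As an illustration of the kind of values worth covering, here is a minimal sketch of a parameterized test exercising sizes around the 32766-byte threshold where the workaround switches from setBytes() to setBinaryStream(); the helper methods put() and get() are placeholders, not the actual test utilities:

```java
import static org.assertj.core.api.Assertions.assertThat;

import java.util.Random;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class OracleBlobBoundaryTest {
  // Sizes just below and above the threshold where the workaround switches
  // from the DIRECT mode (setBytes) to the LOB BINDING mode (setBinaryStream).
  @ParameterizedTest
  @ValueSource(ints = {32765, 32766, 32767, 32768})
  void putThenGet_blobAroundDirectModeThreshold_shouldRoundTrip(int size) throws Exception {
    byte[] expected = new byte[size];
    new Random(size).nextBytes(expected);

    // put(...) and get(...) stand in for the actual ScalarDB operations used
    // by the integration tests; they are placeholders for this sketch.
    put("blob_col", expected);
    byte[] actual = get("blob_col");

    assertThat(actual).isEqualTo(expected);
  }

  private void put(String column, byte[] value) { /* placeholder */ }

  private byte[] get(String column) { /* placeholder */ return new byte[0]; }
}
```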
/gemini review
Code Review
This pull request changes the default data type for BLOB columns in Oracle from RAW(2000) to BLOB to support storing up to 2GB of data. This change aligns Oracle's BLOB handling with that of Db2 and addresses previous concerns about storage limitations. As a result of using the native BLOB type, several limitations are introduced for Oracle: BLOB columns can no longer be used as primary keys, clustering keys, secondary indexes, or in conditions and orderings for cross-partition scans. The changes include updating the RdbEngine for Oracle, adding new error codes, and modifying integration tests to respect these new limitations. A specific workaround for handling large BLOBs (>32KB) in Oracle has also been implemented. The changes are thorough and well-tested. I have one minor suggestion to improve a comment's clarity.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Pull Request Overview
This PR updates the Oracle JDBC adapter to change the default data type mapping for BLOB columns from RAW(2000) to the native Oracle BLOB type, which can store up to 2GB instead of just 2KB. This addresses the restrictive storage limitation of the previous mapping.
- Changes the Oracle BLOB column mapping from RAW(2000) to BLOB for increased storage capacity
- Introduces restrictions preventing BLOB columns from being used as keys, secondary indexes, or in cross-partition scan conditions/ordering
- Updates integration tests to disable BLOB-related functionality where Oracle no longer supports it
Reviewed Changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| RdbEngineOracle.java | Implements new BLOB data type mapping and validation methods for Oracle limitations |
| JdbcOperationChecker.java | Adds cross-partition scan condition checking for BLOB columns |
| Multiple test files | Updates integration tests to handle Oracle BLOB limitations and adds new BLOB size testing |
| CoreError.java | Adds error messages for Oracle BLOB limitations |
| Multiple admin/storage test files | Updates to disable BLOB column features where Oracle doesn't support them |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
LGTM, thank you!
protected void checkConjunctions(Selection selection, TableMetadata metadata) {
  super.checkConjunctions(selection, metadata);
  if (selection instanceof ScanAll) {
    rdbEngine.throwIfCrossPartitionScanConditionOnBlobColumnNotSupported(
Why do we check conjunctions only for cross-partition scan operations? Shouldn’t we also check conjunctions for partition scan and get operations?
LGTM other than what Toshi pointed out.
Thank you!
Description
This changes the default mapping for the Oracle BLOB column to a data type that allows storing up to 2GB. The former data type RAW(2000) allowed a maximum data size of 2000 bytes. This change addresses a concern similar to the one raised for Db2, where the original data type only allowed storing up to 2KB, which was deemed too restrictive.
The new Oracle data type is named BLOB, and its maximum capacity of 2GB coincides with the maximum capacity of the byte[] backing a BlobColumn object.
Because of limitations of the Oracle BLOB data type, a ScalarDB BLOB column can no longer be used as:
- a primary key
- a clustering key
- a secondary index
- a column of a where condition in a cross-partition scan, i.e. ScanAll, operation
- an ordering column in a cross-partition scan
Related issues and/or PRs
Changes made
- Changed the default data type mapping for the Oracle BLOB column from RAW(2000) to BLOB
Checklist
Additional notes (optional)
Release notes
When using Oracle, the default data type used for the ScalarDB BLOB column is changed from RAW(2000) to BLOB to allow storing data up to 2GB. This brings new limitations: a BLOB column can no longer be used as a partition key, clustering key, secondary index, condition column, or ordering column in a cross-partition scan, i.e. ScanAll, operation.