Conversation


@Torch3333 Torch3333 commented Oct 17, 2025

Description

This changes the default mapping for the Oracle BLOB column to a data type that allows storing up to 2GB. The former data type, RAW(2000), allowed a maximum data size of 2000 bytes. This change addresses a concern similar to the one raised for Db2, where the original data type only allowed storing up to 2KB, which was deemed too restrictive.

The new Oracle data type is named BLOB and its maximum capacity is 2GB, which coincides with the maximum capacity that the byte[] backing a BlobColumn object can hold.

Because of limitations of the Oracle BLOB data type, a ScalarDB BLOB column can no longer be used as any of the following (an illustrative example follows the list):

  • primary key
  • clustering key
  • secondary index
  • a column of a where condition in a cross-partition scan, i.e., ScanAll, operation
  • an ordering column in a cross-partition scan
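
For illustration, a schema like the following, which uses a BLOB column as the partition key, is expected to be rejected on Oracle after this change. This is only a sketch using ScalarDB's TableMetadata builder; the table and column names are made up.

import com.scalar.db.api.TableMetadata;
import com.scalar.db.io.DataType;

class BlobPartitionKeyExample {
  // Hypothetical schema: after this change, a BLOB partition key is rejected on Oracle.
  static TableMetadata blobPartitionKeyMetadata() {
    return TableMetadata.newBuilder()
        .addColumn("image", DataType.BLOB) // the large binary payload
        .addColumn("name", DataType.TEXT)
        .addPartitionKey("image") // no longer allowed on Oracle
        .build();
  }
}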

Related issues and/or PRs

Changes made

  • Changed the default mapping for the ScalarDB BLOB column in Oracle from RAW(2000) to BLOB
  • Disabled some integration tests that can no longer be run on a BLOB column because of the limitations mentioned above
  • Added an integration test for putting and reading a 100MB BLOB for Oracle and Db2

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

Provide any additional information or notes that may be relevant to the reviewers or stakeholders.

Release notes

When using Oracle, the default data type for the ScalarDB BLOB column is changed from RAW(2000) to BLOB to allow storing data up to 2GB. This introduces new limitations: a BLOB column can no longer be used as a partition key, clustering key, or secondary index, nor as a condition or ordering column in a cross-partition scan, i.e., ScanAll, operation.

@Torch3333 Torch3333 self-assigned this Oct 17, 2025
Comment on lines +101 to +104
@EnabledIf("isDb2OrOracle")
@ParameterizedTest()
@MethodSource("provideBlobSizes")
public void put_largeBlobData_ShouldWorkCorrectly(int blobSize, String humanReadableBlobSize)
@Torch3333 Torch3333 Oct 20, 2025


I added an integration test that verifies that inserting and then getting a 100MB blob works correctly.
This test fails for other storages for various reasons that require time to investigate. Since this is not of the highest priority, I will look into those issues at a later time.
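
For reference, the shape of the @MethodSource provider might look like the sketch below; the sizes and labels are illustrative, not the actual values used by the test.

import java.util.stream.Stream;
import org.junit.jupiter.params.provider.Arguments;

// Illustrative provider; the real provideBlobSizes() may use different sizes and labels.
private static Stream<Arguments> provideBlobSizes() {
  return Stream.of(
      Arguments.of(1024, "1KB"),
      Arguments.of(1024 * 1024, "1MB"),
      Arguments.of(100 * 1024 * 1024, "100MB"));
}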

options {
    systemProperties(System.getProperties().findAll { it.key.toString().startsWith("scalardb") })
}
maxHeapSize = "4g"
Contributor Author

I needed to increase the heap size because the newly added large BLOB integration tests for Oracle were failing otherwise.

Comment on lines 245 to +253
  case BLOB:
-   return "RAW(2000)";
+   return "BLOB";
@Torch3333 Torch3333 Oct 20, 2025


Updates the ScalarDB BLOB data type mapping for Oracle from RAW(2000) (allows a max of 2000 bytes) to BLOB (allows a max of 2GB).

Because of limitations of the Oracle BLOB data type, a ScalarDB BLOB column can no longer be used as:

  • primary key
  • clustering key
  • secondary index
  • a column of a condition in a cross-partition scan, i.e., ScanAll, operation (see the simplified check sketch after this list)
  • an ordering column in a cross-partition scan
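
To make the enforcement concrete, here is a simplified, self-contained sketch of the kind of check added for conditions on BLOB columns in cross-partition scans. The actual implementation lives in RdbEngineOracle and JdbcOperationChecker and operates on ScalarDB's internal Scan and TableMetadata types, so the names and signatures below are assumptions for illustration only.

import java.util.Map;
import java.util.Set;

class BlobConditionCheckSketch {
  enum DataType { TEXT, INT, BLOB }

  // Simplified stand-in for the new check: reject a cross-partition scan (ScanAll)
  // whose WHERE conditions reference a BLOB column on Oracle.
  static void throwIfConditionOnBlobColumn(
      Set<String> conditionColumnNames, Map<String, DataType> columnTypes) {
    for (String column : conditionColumnNames) {
      if (columnTypes.get(column) == DataType.BLOB) {
        throw new IllegalArgumentException(
            "A condition on a BLOB column is not supported in a cross-partition scan on Oracle: "
                + column);
      }
    }
  }
}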

Comment on lines 507 to 534
public void bindBlobColumnToPreparedStatement(
    PreparedStatement preparedStatement, int index, byte[] bytes) throws SQLException {
  // When writing to the BLOB data type with a BLOB size greater than 32766 bytes using a
  // MERGE INTO statement, an internal error ORA-03137 occurs on the server side, so we need a
  // workaround. This has been confirmed to be a limitation by AWS support.
  // Below is a detailed explanation of the workaround.
  //
  // Depending on the byte array size, the JDBC driver automatically chooses one of the following
  // modes to transfer the BLOB data to the server:
  // - DIRECT: the most efficient mode. It is used when the byte array length is less than 32767.
  // - STREAM: this mode is less efficient. It is used when the byte array length is greater than
  //   32766.
  // - LOB BINDING: this mode is the least efficient. It is used when an input stream is passed
  //   without specifying its length.
  //
  // When the driver selects the STREAM mode, the ORA-03137 error occurs. So, we work around the
  // issue by making sure to use the driver in a way such that it never selects the STREAM mode.
  // For more details about the modes, see the following documentation:
  // https://docs.oracle.com/en/database/oracle/oracle-database/23/jjdbc/LOBs-and-BFiles.html#GUID-8FD40D53-8D64-4187-9F6F-FF78242188AD
  if (bytes.length <= 32766) {
    // The DIRECT mode is used to send BLOB data of small size.
    preparedStatement.setBytes(index, bytes);
  } else {
    // The LOB BINDING mode is used to send BLOB data of large size.
    InputStream inputStream = new ByteArrayInputStream(bytes);
    preparedStatement.setBinaryStream(index, inputStream);
  }
Contributor Author

As the comments explain, there is a limitation with large BLOB values when executing a ScalarDB Put operation that is translated to a MERGE INTO SQL statement; this limitation requires special handling when putting BLOB data.
I added several integration tests in core/src/integration-test/java/com/scalar/db/storage/jdbc/JdbcDatabaseColumnValueIntegrationTest.java to cover the values that could be problematic for the implemented workaround.
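
As a rough illustration of what those boundary cases look like, the interesting values straddle the 32766-byte cutoff between the DIRECT and LOB-binding paths. The sketch below is self-contained and only illustrative; the helper putAndGet is hypothetical and stands in for the storage round trip the real integration test performs.

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

class BlobBoundaryRoundTripSketch {
  // Sizes straddling the 32766-byte cutoff between the DIRECT and LOB-binding paths.
  static final int[] BOUNDARY_SIZES = {32765, 32766, 32767, 32768};

  static void roundTripAll() {
    for (int size : BOUNDARY_SIZES) {
      byte[] original = new byte[size];
      ThreadLocalRandom.current().nextBytes(original);
      byte[] readBack = putAndGet(original); // hypothetical helper: put the value, then get it back
      if (!Arrays.equals(original, readBack)) {
        throw new AssertionError("BLOB round trip failed for size " + size);
      }
    }
  }

  // Placeholder for the put-then-get round trip performed by the real test.
  static byte[] putAndGet(byte[] value) {
    return value;
  }
}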

@Torch3333

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request changes the default data type for BLOB columns in Oracle from RAW(2000) to BLOB to support storing up to 2GB of data. This change aligns Oracle's BLOB handling with that of Db2 and addresses previous concerns about storage limitations. As a result of using the native BLOB type, several limitations are introduced for Oracle: BLOB columns can no longer be used as primary keys, clustering keys, secondary indexes, or in conditions and orderings for cross-partition scans. The changes include updating the RdbEngine for Oracle, adding new error codes, and modifying integration tests to respect these new limitations. A specific workaround for handling large BLOBs (>32KB) in Oracle has also been implemented. The changes are thorough and well-tested. I have one minor suggestion to improve a comment's clarity.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Torch3333 Torch3333 marked this pull request as ready for review October 20, 2025 00:27
@Copilot Copilot AI review requested due to automatic review settings October 20, 2025 00:27
Copilot AI left a comment


Pull Request Overview

This PR updates the Oracle JDBC adapter to change the default data type mapping for BLOB columns from RAW(2000) to native Oracle BLOB type, which can store up to 2GB instead of just 2KB. This addresses the restrictive storage limitation of the previous mapping.

  • Changes Oracle BLOB column mapping from RAW(2000) to BLOB for increased storage capacity
  • Introduces restrictions preventing BLOB columns from being used as keys, secondary indexes, or in cross-partition scan conditions/ordering
  • Updates integration tests to disable BLOB-related functionality where Oracle no longer supports it

Reviewed Changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.

Summary per file:

  • RdbEngineOracle.java: Implements the new BLOB data type mapping and validation methods for Oracle limitations
  • JdbcOperationChecker.java: Adds cross-partition scan condition checking for BLOB columns
  • Multiple test files: Updates integration tests to handle Oracle BLOB limitations and adds new BLOB size testing
  • CoreError.java: Adds error messages for Oracle BLOB limitations
  • Multiple admin/storage test files: Updates to disable BLOB column features where Oracle doesn't support them


Torch3333 and others added 3 commits October 20, 2025 09:30
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Torch3333 Torch3333 requested review from a team, brfrn169, feeblefakie and komamitsu and removed request for a team October 22, 2025 01:49
@komamitsu komamitsu left a comment


LGTM, thank you!

protected void checkConjunctions(Selection selection, TableMetadata metadata) {
  super.checkConjunctions(selection, metadata);
  if (selection instanceof ScanAll) {
    rdbEngine.throwIfCrossPartitionScanConditionOnBlobColumnNotSupported(
Collaborator


Why do we check conjunctions only for cross-partition scan operations? Shouldn’t we also check conjunctions for partition scan and get operations?

@feeblefakie feeblefakie left a comment


LGTM other than what Toshi pointed out.
Thank you!
