- Authorization: Introduced the `credential-java` authorization package, now supporting authentication with `AlibabaCloudCredentialsProvider`.
- StreamUploadSession: Added awareness of Slot updates and automatic retry logic.
- table-api: Introduced the `TableRetryHandler` class, adding retry logic to the `table-api`.
- udf: The `InputSplitter` now includes the method `setLimit`.
- TypeInfo: The `StructTypeInfo` class now includes the method `getTypeName(boolean quote)`. In version `0.51.0-public (rc0)`, `StructTypeInfo` defaulted to quoting field names with backticks. We suspect that this change may affect users, so we decided to revert to the original behavior (not quoting by default). Users can now call `getTypeName(true)` when quoting is needed.
- TypeInfo: Field names will now be correctly escaped when quoted with backticks.
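
To make the quoting and escaping notes above concrete, here is a minimal sketch of backtick quoting. The `quoteIdentifier` helper is hypothetical, and doubling an embedded backtick is an assumed escaping rule used for illustration; it is not confirmed to be what the SDK does internally.

```java
final class BacktickQuoting {
    // Hypothetical helper (not an SDK method): wraps a field name in backticks
    // and doubles any backtick inside the name. The doubling rule is an assumption.
    static String quoteIdentifier(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        System.out.println(quoteIdentifier("user name"));
        System.out.println(quoteIdentifier("odd`name"));
    }
}
```

Without the escaping step, a field name that itself contains a backtick would end the quoted identifier early and produce an invalid type string.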
- MCQA2: Fixed an issue where the `getRawTaskResults` interface call in MCQA2 jobs could not retrieve results.
- MapReduce: Supports multi-pipeline output.
- VolumeBuilder: Added the `accelerate` method to speed up the download process using Dragonfly when the external volume is too large.
- Table: Introduced `TableType OBJECT_TABLE` and the method `isObjectTable` to check for it.
- Project: The `list` method now includes a filter condition `enableDr` to filter projects based on whether data disaster recovery is enabled.
- Cluster: New fields added: `clusterRole`, `jobDataPath`, and `zoneId`.
- TableBatchReadSession: The `predicate` class variable is now set to transient.
- Attribute: Added escaping logic and will no longer double quote.
- SQLTask: Restored the `SQLTask.run(Odps odps, String project, String sql, String taskName, Map<String, String> hints, Map<String, String> aliases, int priority)` method removed in version 0.49.0 to resolve potential interface conflicts when users' MR jobs depend on older versions of the SDK.
- Table.changeOwner: Fixed SQL spelling error.
- Instance.getTaskSummary: Removed unreasonable debug logging introduced since version 0.50.2.
- TruncTime: Uses backticks to quote `columnName` during table creation/toString.
Note: This version also includes all changes from "0.51.0-public.rc0" and "0.51.0-public.rc1".
- Logview: Added support for Logview V2. For details, see "November 14, 2024 (UTC+8): Notice on the security upgrade for MaxCompute LogView". It can be created using `new Logview(odps, 2)`, and SQLExecutor specifies the version through the `logviewVersion` method.
- Column: `ColumnBuilder` adds a new `withGenerateExpression` method for constructing auto-partition columns.
- TableSchema
  - Added the `generatePartitionSpec` method, used to generate partition information from a `Record`.
  - The `setPartitionColumns` method now accepts `List<Column>` instead of `ArrayList<Column>`.
- TableCreator
  - Added support for `GenerateExpression` and introduced the method `autoPartitionBy`, which allows for the creation of AutoPartition tables.
  - Added support for `ClusterInfo`, enabling the creation of Hash/Range Cluster tables.
  - Added the option to specify `TableFormat`, allowing for the creation of tables in `APPEND`, `TRANSACTION`, `DELTA`, `EXTERNAL`, and `VIEW` formats.
  - Introduced the `selectStatement` parameter for `create table as` and `create view as` scenarios.
  - Added the `getSql` method to obtain the SQL statement for table creation.
  - Now quotes all `Comment` parameters to support those that contain special characters.
  - Integrated DataHub-related table creation parameters (`hubLifecycle`, `shardNum`) into `DataHubInfo`.
  - Renamed the `withJars` method to `withResources` to indicate it can use resources other than JAR files.
  - Renamed the `withBucketNum` method to `withDeltaTableBucketNum` to indicate this method is for Delta Tables only.
  - Modified the logic of the `withHints`, `withAlias`, `withTblProperties`, and `withSerdeProperties` methods, which now overwrite previous values instead of merging.
  - Removed the `createExternal` method; you can now use the `create` method instead.
- Table
  - Introduced the `getSchemaVersion` method, allowing users to retrieve the current schema version of the table. The version number is updated each time a Schema Evolution occurs, and this field is primarily used to specify the version when creating a StreamTunnel.
  - Added the `setLifeCycle`, `changeOwner`, `changeComment`, `touch`, `changeClusterInfo`, `rename`, `addColumns`, and `dropColumns` methods to support modification of table structure.
- StreamTunnel: Modified the initialization logic; if `allowSchemaMismatch` is set to `false`, it will automatically retry until the latest version of the table structure is used (with a timeout of 5 minutes).
- GenerationExpression: Fixed the issue where an exception would be thrown when the `TruncTime` was uppercase during table creation and reloading the table.
- TypeInfoParser: Can now correctly handle `Struct` types, with fields quoted using backticks in `TypeInfo`.
- GenerateExpression: Added support for generating expression lists for partition columns, along with the first generated expression `TruncTime`. For usage, please refer to Example.
- UpsertStream: Supports writing values with primary keys of type `TIMESTAMP_NTZ`.
- Table: Added new methods for querying CDC-related data: `getCdcSize()`, `getCdcRecordNum()`, `getCdcLatestVersion()`, `getCdcLatestTimestamp()`.
- SQLExecutor: MCQA 2.0 jobs support retrieving InstanceProgress information.
- TypeInfo: Added backticks for quoting names in Struct type TypeInfo and other methods that assemble SQL.
- AutoCloseable: To remind users to close resources properly, added corresponding `close()` methods to the following resource classes:
  - `UpsertStream` in the `odps-sdk-core` package
  - `LocalOutputStreamSet`, `ReduceDriver.ReduceContextImpl`, `MapDriver.DirectMapContextImpl`, `LocalRecordWriter` in the `odps-sdk-impl` package
  - `VectorizedOutputer`, `VectorizedExtractor`, `RecordWriter`, `RecordReader`, `Outputer`, `Extractor` in the `odps-sdk-udf` package
- TableAPI: Added retry logic for errors in network requests that can be safely retried, improving the stability of the interface. A new configuration option, `retryWaitTimeInSeconds`, has been added to `RestOptions` to specify the retry wait time.
- SQLTask: Added an overload of the `run` method that supports passing in the `mcqaConnHeader` parameter for submitting MCQA 2.0 jobs.
- SQLExecutor: Now supports specifying the `odps.task.wlm.quota` hint to set the interactive quota when submitting MCQA 2.0 jobs.
- RestClient: Introduced a new `retryWaitTime` parameter along with corresponding getter and setter methods to configure the retry wait time for network requests.
- Configuration: Added a new `socketRetryTimes` parameter with corresponding getter and setter methods to configure the retry wait time for Tunnel network requests. If not set, it will use the configuration in `RestClient`; otherwise, this configuration will be used.
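
The retry-related options above all follow the same general pattern: attempt a request, and on a retryable failure wait for a fixed time before trying again. The sketch below shows that pattern with plain JDK classes. `callWithRetry`, `maxRetries`, and `waitMillis` are hypothetical names chosen for illustration and do not correspond to SDK methods or settings; a real client would also retry only errors that are known to be safe to retry.

```java
import java.util.function.Supplier;

final class RetryWithWait {
    // Hypothetical helper: runs the call, retrying up to maxRetries more times
    // and sleeping waitMillis between attempts. For brevity this sketch treats
    // every RuntimeException as retryable.
    static <T> T callWithRetry(Supplier<T> call, int maxRetries, long waitMillis) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure in case all attempts fail
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```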
- Instances: Removed the overloaded `get` method `get(String projectName, String id, String quotaName, String regionId)`, which was added in version `0.50.2-public` to retrieve MCQA 2.0 instances. Users no longer need to distinguish whether a job is an MCQA 2.0 job when using the `get` method, so this overload has been removed. Users can directly use the `get(String projectName, String id)` method to retrieve instances.
- Table.read: Fixed an issue where the configured network-related parameters (such as timeout and retry logic) did not take effect correctly during data preview.
- Streams: Fixed an issue where specifying the `version` in the `create` method would cause an error. A default value of `1` has also been added for `version`, indicating the initial version of the table.
- PartitionSpec: Added a new constructor `(String, boolean)` that uses a boolean parameter to specify whether to trim partition values. This caters to scenarios (such as using char type as a partition field) where users may not want to trim partition values.
- Instance: The OdpsException thrown when calling the stop method will no longer be wrapped a second time.
- SQLExecutor
  - Fixed an issue in MCQA 1.0 mode where the user-specified `fallbackPolicy.isFallback4AttachError` did not take effect correctly.
  - Fixed an issue in MCQA 2.0 mode where the `cancel` method threw an exception when the job failed.
  - Fixed an issue in MCQA 2.0 mode where using instanceTunnel to fetch results resulted in an error when the isSelect check was incorrect.
- Table: Fixed an issue with the `getPartitionSpecs` method that trimmed partition values, causing the retrieval of non-existing partitions.
- SQLExecutor: In MCQA 1.0 mode, custom fallback policies are now allowed; added the subclass `FallbackPolicy.UserDefinedFallbackPolicy`.
- SQLExecutor: Enhanced MCQA 2.0 functionality:
  - `isActive` will return false, indicating that there are no active Sessions in MCQA 2.0 mode.
  - Added a `cancel` method to terminate ongoing jobs.
  - `getExecutionLog` now returns a deep copy of the current log and clears the current log, preventing duplicates.
  - New `quota` method in `SQLExecutorBuilder` allows reusing an already loaded `Quota`, reducing load times.
  - New `regionId` method in `SQLExecutorBuilder` allows specifying the region where the quota is located.
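
The copy-and-clear behavior described for `getExecutionLog` can be sketched with a small buffer class. `ExecutionLogBuffer`, `append`, and `drain` are hypothetical names for illustration only; the sketch shows why returning a copy and then clearing the buffer keeps a caller from seeing the same log line twice.

```java
import java.util.ArrayList;
import java.util.List;

final class ExecutionLogBuffer {
    private final List<String> log = new ArrayList<>();

    // Adds one log line to the internal buffer.
    synchronized void append(String line) {
        log.add(line);
    }

    // Returns a copy of the buffered lines and clears the buffer, so the next
    // call only returns lines appended after this one. Strings are immutable,
    // so copying the list is sufficient here.
    synchronized List<String> drain() {
        List<String> copy = new ArrayList<>(log);
        log.clear();
        return copy;
    }

    public static void main(String[] args) {
        ExecutionLogBuffer buffer = new ExecutionLogBuffer();
        buffer.append("job submitted");
        buffer.append("job running");
        System.out.println(buffer.drain()); // both lines
        System.out.println(buffer.drain()); // empty list, nothing is repeated
    }
}
```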
- Quotas: Added `getWlmQuota` method with `regionId` parameter to fetch quota for a specified regionId.
- Quota: Introduced `setMcqaConnHeader` method to allow users to override quota using a custom McqaConnHeader, supporting MCQA 2.0.
- Instances: Added `get` method applicable for MCQA 2.0 jobs, requiring additional parameters for QuotaName and RegionId.
- Instance: Further adapted for MCQA 2.0 jobs.
- TableSchema: The `basicallyEquals` method will no longer strictly check for identical Class types.
- SQLExecutor: The `run` method's hints will now be deep-copied, preserving the user-provided Map and supporting immutable types (e.g., `ImmutableMap`).
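
Copying the hints before modifying them is what allows an immutable map to be passed in. The following is a minimal sketch of that idea using only the JDK. `prepareHints` and the key `internal.example.flag` are hypothetical and stand in for whatever the SDK adds internally. For a `Map<String, String>`, copying the map is effectively a deep copy because strings are immutable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class HintsCopy {
    // Hypothetical stand-in for the start of a run method: works on a private
    // copy so the caller's map is never mutated, even if it is immutable.
    static Map<String, String> prepareHints(Map<String, String> userHints) {
        Map<String, String> copy =
            userHints == null ? new HashMap<>() : new HashMap<>(userHints);
        copy.put("internal.example.flag", "true"); // hypothetical internal setting
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> userHints =
            Collections.unmodifiableMap(Collections.singletonMap("user.key", "user.value"));
        Map<String, String> effective = prepareHints(userHints);
        System.out.println("user map size: " + userHints.size());
        System.out.println("effective map size: " + effective.size());
    }
}
```

Writing directly into `userHints` in this example would throw `UnsupportedOperationException`, which is the failure mode the copy avoids.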
- Stream: Fixed potential SQL syntax errors in the `create` method.
- TableAPI: Fixed an issue where `ArrayRecord` could not correctly invoke `toString` when using `SplitRecordReaderImpl` to retrieve results.
- TableAPI: Fixed an issue where a `get` operation would throw an array index out of bounds exception when the number of `Records` corresponding to a `Split` is 0 while using `SplitRecordReaderImpl` to retrieve results.
- TableAPI: Fixed an issue with composite predicates (`CompositePredicate`) that could lead to an additional operator being added when encountering an empty predicate.
- Added `SchemaMismatchException`: This exception will be thrown when using `StreamUploadSession` if the Record structure uploaded by the user does not match the table structure. This exception will additionally carry the latest schema version to assist users in rebuilding the Session and performing retry operations.
- Added `allowSchemaMismatch` method in `StreamUploadSession.Builder`: This method specifies whether to tolerate mismatches between the user's uploaded Record structure and the table structure without throwing an exception. The default value is `true`.
- Fixed an issue where specifying `tunnelEndpoint` in Odps was ineffective when using `StreamUploadSession`.
- Fixed a potential NPE issue in `TunnelRetryHandler`.
- SQLExecutor added `isUseInstanceTunnel` method:
  - Used to determine whether to use instanceTunnel to obtain results
- Fixed an issue where, when using SQLExecutor to execute MCQA 2.0 jobs, executing the CommandApi task would affect the next job, causing an NPE to be thrown when retrieving results.
- SQLExecutor supports submitting MCQA 2.0 jobs
  - SQLExecutorBuilder adds method `enableMcqaV2`
  - SQLExecutorBuilder adds getter methods for fields
- SQLExecutor adds `getQueryId` method:
  - For offline jobs and MCQA 2.0 jobs, it returns the currently executing job's InstanceId
  - For MCQA 1.0 jobs, it returns the InstanceId and SubQueryId
- TableAPI adds `SharingQuotaToken` parameter in `EnvironmentSettings` to support sharing quota resources during job submission
- Quotas introduces `getWlmQuota` method:
  - Allows retrieval of detailed quota information based on projectName and quotaNickName, including whether it belongs to interactive quotas
- Quota class adds `isInteractiveQuota` method to determine if a quota belongs to interactive quotas (suitable for MCQA 2.0)
- Adds `getResultByInstanceTunnel(Instance instance, String taskName, Long limit, boolean limitEnabled)` method:
  - Allows unlimited retrieval of results via instanceTunnel (lifting restrictions requires higher permissions)
- UpsertSession.Builder adds `setLifecycle` method to configure the session lifecycle
- Fixed the issue where specifying `limitEnabled` had no effect when using SQLExecutor to execute offline jobs
- Modified the SQLExecutor so that the `getQueryId` method returns the job's instanceID instead of null when executing offline jobs
- Using instanceTunnel to retrieve results no longer throws an exception on non-select statements; it now falls back to non-tunnel logic
- Fixed an issue where one record was lost when DownloadSession hit an error while the read count equaled the number of records to be read minus one
- The `clone` method of the Odps class now correctly clones other fields, including `tunnelEndpoint`
- The Instance's `getRawTaskResults` method no longer makes multiple requests when processing synchronous jobs
- OdpsRecordConverter Enhancement: Now supports converting data to SQL-compatible formats. For example, for the `LocalDate` type, data can be converted to `"DATE 'yyyy-mm-dd'"` format. Additionally, for the `Binary` type, hex representation format is now supported.
- Enhanced Predicate Pushdown for Storage Constants: Improved the behavior of the `Constant` class and added the `Constant.of(Object, TypeInfo)` method. Now, when setting or identifying types as time types, the conversion to SQL-compatible format can be done correctly (enabling correct pushdown of time types). Other type conversion issues have been fixed; an `IllegalArgumentException` will be thrown during session creation when conversion to SQL-compatible mode is not possible.
- UpsertSession Implements Closeable Interface: Notifies users to properly release local resources of the UpsertSession.
- SQLExecutorBuilder New Method `offlineJobPriority`: Allows setting the priority of offline jobs when a job rolls back.
- New Method in Table Class `getLastMajorCompactTime`: Used to retrieve the last time the table underwent major compaction.
- New Method in Instance Class `create(Job job, boolean tryWait)`: When the `tryWait` parameter is true, the job will attempt to wait on the server for a period of time to obtain results more quickly.
- Resource Class Enhancement: Now able to determine if the corresponding resource is a temporary resource.
- CreateProjectParma class enhancement: Added `defaultCtrlService` parameter to specify the default control cluster of the project.
- UpsertStream NPE Fix: Fixed an issue where an NPE was thrown during flush when a local error occurred, preventing a proper retry.
- Varchar/Char type fix: Fixed an issue where the length of a `Varchar/Char` value was counted twice for special characters such as Chinese symbols or emoticons.
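
One common way a character ends up counted twice in Java is UTF-16 surrogate pairs: `String.length()` counts code units, so an emoji outside the Basic Multilingual Plane counts as two. Whether this was the exact cause of the Varchar/Char issue above is an assumption; the sketch below only illustrates the difference between counting code units and counting code points.

```java
final class CharLength {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, which matches the number of characters a user sees
    // for text without combining sequences.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "ok\uD83D\uDE00"; // "ok" followed by U+1F600, stored as a surrogate pair
        System.out.println("code units:  " + utf16Units(text));
        System.out.println("code points: " + codePoints(text));
    }
}
```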
- Introduced internal validation of compound predicate expressions, fixed logic when handling invalid or always true/false predicates, enhanced test coverage, and ensured stability and accuracy in complex query optimization.
- TableTunnel Configuration Optimization: Introduced the `tags` attribute to `TableTunnel Configuration`, enabling users to attach custom tags to tunnel operations for enhanced logging and management. These tags are recorded in the tenant-level `information schema`.

```java
Odps odps; // assumed to be initialized elsewhere
Configuration configuration =
    Configuration.builder(odps)
        .withTags(Arrays.asList("tag1", "tag2")) // Utilize Arrays.asList for code standardization
        .build();
TableTunnel tableTunnel = odps.tableTunnel(configuration);
// Proceed with tunnel operations
```

- Instance Enhancement: Added the `waitForTerminatedAndGetResult` method to the `Instance` class, integrating optimization strategies from versions 0.48.6 and 0.48.7 for the `SQLExecutor` interface, enhancing operational efficiency. Refer to `com.aliyun.odps.sqa.SQLExecutorImpl.getOfflineResultSet` for usage.
- SQLExecutor Offline Job Processing Optimization: Significantly reduced end-to-end latency by enabling immediate result retrieval after critical processing stages of offline jobs executed by `SQLExecutor`, without waiting for the job to fully complete, thus boosting response speed and resource utilization.
- TunnelRetryHandler NPE Fix: Rectified a potential null pointer exception issue in the `getRetryPolicy` method when the error code (`error code`) was `null`.
- Serializable Support:
  - Key data types like `ArrayRecord`, `Column`, `TableSchema`, and `TypeInfo` now support serialization and deserialization, enabling caching and inter-process communication.
- Predicate Pushdown:
  - Introduced `Attribute` type predicates to specify column names.
- Tunnel Interface Refactoring:
  - Refactored Tunnel-related interfaces to include seamless retry logic, greatly enhancing stability and robustness.
  - Removed `TunnelRetryStrategy` and `ConfigurationImpl` classes, which are now replaced by `TunnelRetryHandler` and `Configuration` respectively.
- SQLExecutor Optimization:
  - Improved performance when executing offline SQL jobs through the `SQLExecutor` interface, reducing one network request per job to fetch results, thereby decreasing end-to-end latency.
- Decimal Read in Table.read:
  - Fixed issue where trailing zeroes in the `decimal` type were not as expected in the `Table.read` interface.
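
Serialization support, as listed above, is what lets a value be cached or passed between processes as bytes. The round trip below is a minimal sketch using standard Java serialization. `ColumnInfo` is a hypothetical stand-in class, not an SDK type, and the helper names are chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationRoundTrip {
    // Hypothetical stand-in for a serializable schema element.
    static final class ColumnInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String type;

        ColumnInfo(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Serializes a value to a byte array, e.g. for a cache entry.
    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores a value from bytes produced by toBytes.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ColumnInfo restored = (ColumnInfo) fromBytes(toBytes(new ColumnInfo("id", "bigint")));
        System.out.println(restored.name + " " + restored.type);
    }
}
```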
- Added the `getPartitionSpecs` method to the `Table` interface. Compared to the `getPartitions` method, this method does not require fetching detailed partition information, resulting in faster execution.
- Removed the `isPrimaryKey` method from the `Column` class. This method was initially added to support users in specifying certain columns as primary keys when creating a table. However, it was found to be misleading in read scenarios, as it does not communicate with the server, so it is not suitable for determining whether a column is a primary key. Moreover, for table creation, primary keys should be table-level fields (since primary keys are ordered), and this method neglected the order of primary keys, leading to a flawed design. Hence, it has been removed in version 0.48.5. For read scenarios, users should use the `Table.getPrimaryKey()` method to retrieve primary keys. For table creation, users can now use the `withPrimaryKeys` method in the `TableCreator` to specify primary keys.
- Fixed an issue in the `RecordConverter` where formatting a `Record` of type `String` would throw an exception when the data type was `byte[]`.
- Writing MaxCompute tables with `table-api` now supports the `JSON` and `TIMESTAMP_NTZ` types
- `odps-sdk-udf` functions continue to be improved
- When the Table.read() interface encounters the Decimal type, it will currently remove the trailing 0 by default (but will not use scientific notation)
- Fixed the problem that ArrayRecord does not support the getBytes method for JSON type
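
The Decimal behavior described above (trailing zeros removed, no scientific notation) corresponds to a well-known `BigDecimal` idiom, sketched here. Whether `Table.read()` uses exactly these calls internally is an assumption; `DecimalFormatting.format` is a hypothetical helper. Note that `stripTrailingZeros()` alone can yield a value such as `1E+2`, which is why `toPlainString()` is needed.

```java
import java.math.BigDecimal;

final class DecimalFormatting {
    // Removes trailing zeros and renders the value without an exponent.
    static String format(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(new BigDecimal("1.2300")));
        System.out.println(format(new BigDecimal("100.00")));
        // For comparison: toString() after stripping may switch to scientific notation.
        System.out.println(new BigDecimal("100.00").stripTrailingZeros().toString());
    }
}
```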
- Support for passing `retryStrategy` when building `UpsertSession`.
- The `onFlushFail(String, int)` interface in `UpsertStream.Listener` has been marked as `@Deprecated` in favor of the `onFlushFail(Throwable, int)` interface. This interface will be removed in version 0.50.0.
- Default compression algorithm for Tunnel upsert has been changed to `ODPS_LZ4_FRAME`.
- Fixed an issue where data couldn't be written correctly in Tunnel upsert when the compression algorithm was set to something other than `ZLIB`.
- Fixed a resource leak in `UpsertSession` that could persist for a long time if `close` was not explicitly called by the user.
- Fixed an exception thrown by Tunnel data retrieval interfaces (`preview`, `download`) when encountering invalid `Decimal` types (such as `inf`, `nan`) in tables; will now return `null` to align with the `getResult` interface.
- Fixed the issue of relying on the user's local time zone when bucketing primary keys of DATE and DATETIME types during Tunnel upsert. This may lead to incorrect bucketing and abnormal data query. Users who rely on this feature are strongly recommended to upgrade to version 0.48.2.
- `Table` adds a method `getTableLifecycleConfig()` to obtain the lifecycle configuration of hierarchical storage.
- `TableReadSession` now supports predicate pushdown
Arrow and ANTLR Libraries: Added new includes to the Maven Shade Plugin configuration for better handling and packaging of specific libraries. These includes ensure that certain essential libraries are correctly packaged into the final shaded artifact. The newly included libraries are:
- org.apache.arrow:arrow-format:jar
- org.apache.arrow:arrow-memory-core:jar
- org.apache.arrow:arrow-memory-netty:jar
- org.antlr:ST4:jar
- org.antlr:antlr-runtime:jar
- org.antlr:antlr4:jar
- org.antlr:antlr4-runtime:jar
Shaded Relocation for ANTLR and StringTemplate: The configuration now includes updated relocation rules for org.antlr and org.stringtemplate.v4 packages to prevent potential conflicts with other versions of these libraries that may exist in the classpath. The new shaded patterns are:
- org.stringtemplate.v4 relocated to com.aliyun.odps.thirdparty.org.stringtemplate.v4
- org.antlr relocated to com.aliyun.odps.thirdparty.antlr
- Introduced `odps-sdk-udf` module to allow batch data reading in UDFs for MaxCompute, significantly improving performance in high-volume data scenarios.
- `Table` now supports retrieving `ColumnMaskInfo`, aiding in data desensitization scenarios and relevant information acquisition.
- Support for setting proxies through the use of `odps.getRestClient().setProxy(Proxy)` method.
- Implementation of iterable `RecordReader` and `RecordReader.stream()` method, enabling conversion to a Stream of `Record` objects.
- Added new parameters `upsertConcurrentNum` and `upsertNetworkNum` in `TableAPI RestOptions` for more detailed control for users performing upsert operations via the TableAPI.
- Support for `Builder` pattern in constructing `TableSchema`.
- Support for `toString` method in `ArrayRecord`.
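
An iterable reader with a `stream()` method, as mentioned in the list above, can be built on the JDK's `StreamSupport`. The sketch below uses a hypothetical `IterableReader` over plain strings in place of the SDK's `RecordReader` and `Record`; it shows the general technique only and makes no claim about how the SDK implements it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for a reader that yields rows one at a time.
final class IterableReader implements Iterable<String> {
    private final List<String> rows;

    IterableReader(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public Iterator<String> iterator() {
        return rows.iterator();
    }

    // Exposes the same rows as a sequential Stream via the default spliterator.
    Stream<String> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    public static void main(String[] args) {
        IterableReader reader = new IterableReader(Arrays.asList("r1", "r2", "r3"));
        for (String row : reader) {
            System.out.println(row);
        }
        List<String> upper = reader.stream().map(String::toUpperCase).collect(Collectors.toList());
        System.out.println(upper);
    }
}
```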
- `UploadSession` now supports configuration of the `GET_BLOCK_ID` parameter to speed up session creation when the client does not need `blockId`.
- Enhanced table creation method using the `builder` pattern (`TableCreator`), making table creation simpler.
- Fixed a bug in `Upsert Session` where the timeout setting was configured incorrectly.
- Fixed the issue where `TimestampWritable` computed one second less when nanoseconds were negative.
- Support for new Stream type that enables incremental queries.
- Added the `preview` method to the `TableTunnel` for data preview purposes.
- Added `OdpsRecordConverter` for parsing and formatting records.
- Enhancements to the `Projects` class with `create` and `delete` methods now available, and `update` method made public. Operations related to the `group-api` package are now marked as deprecated.
- Improved `Schemas` class to support filtering schemas with `SchemaFilter`, listing schemas, and retrieving detailed schema metadata.
- `DownloadSession` introduces new parameter `disableModifiedCheck` to bypass modification checks and `fetchBlockId` to skip block ID list retrieval.
- `TableWriteSession` supports writing `TIMESTAMP_NTZ` / `JSON` types and adds a new parameter `MaxFieldSize`.
- `TABLE_API` adds `predicate` related classes to support predicate pushdown in the future.
- The implementation of the `read` method in the `Table` class is now replaced with `TableTunnel.preview`, supporting new types in MaxCompute and time types switched to Java 8 time types without timezone.
- The default `MapWritable` implementation switched from `HashMap` to `LinkedHashMap` to ensure order.
- `Column` class now supports creation using the Builder pattern.
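
The `HashMap` to `LinkedHashMap` switch noted above matters because only `LinkedHashMap` guarantees that iteration follows insertion order. A short JDK-only sketch (the class and method names here are illustrative, not SDK code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderedMapDemo {
    // Returns the keys in the map's iteration order.
    static List<String> keysInOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> ordered = new LinkedHashMap<>();
        ordered.put("zeta", 1);
        ordered.put("alpha", 2);
        ordered.put("mid", 3);
        // LinkedHashMap iterates in insertion order; HashMap gives no such guarantee.
        System.out.println(keysInOrder(ordered)); // prints [zeta, alpha, mid]
    }
}
```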
- `TableReadSession` now introduces new parameters `maxBatchRawSize` and `splitMaxFileNum`.
- `UpsertSession` enhancements:
  - Supports writing partial columns.
  - Allows setting the number of Netty thread pools with the default changed to 1.
  - Enables setting maximum concurrency with the default value changed to 16.
- `TableTunnel` now supports setting `quotaName` option.