branch-3.1: [opt](hive) Speed up Hive insert on partition tables using cache #58166 #58606 #58748 #58886

morningman · 2025-12-10T03:24:58Z

…che#58166)

For Hive tables with massive partitions (10K+), INSERT operations are extremely slow because:

- FE fetches all partition metadata from HMS directly (expensive RPC calls)
- Full table cache invalidation after each insert (unnecessary)

Problem Summary:

1. **Use cache for partition metadata in INSERT**
   - FE now fetches partition info from cache instead of directly querying HMS when preparing INSERT
   - Avoid expensive HMS RPC calls for every INSERT operation
2. **Selective cache refresh after commit**
   - Only invalidate affected partitions instead of full table cache
   - Based on partition update info from BE (NEW/APPEND/OVERWRITE)
   - Significantly reduces cache invalidation overhead
3. **Handle cache inconsistency gracefully**
   - When BE marks partition as NEW but it already exists in HMS (cache miss)
   - FE detects this by checking HMS and treats it as APPEND instead of failing
   - Prevents `AlreadyExistsException` errors

For tables with partitions:

- **Before**: HMS calls per INSERT + full cache invalidation
- **After**: cache lookup + selective partition refresh
- Expected speedup: 10x-100x for partition metadata fetching phas
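The commit-time logic described above (selective refresh plus the NEW-to-APPEND downgrade) can be sketched roughly as follows. This is only an illustration under stated assumptions: every class and method name here (`UpdateMode`, `PartitionUpdate`, `CommitPlan`, `SelectiveRefreshSketch.plan`) is hypothetical and not taken from the Doris FE code, and the HMS existence check is modeled as a plain `Set` lookup rather than a real metastore call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// All names below are hypothetical; this is not the Doris FE code.
enum UpdateMode { NEW, APPEND, OVERWRITE }

// One entry of the per-partition update info that the BE reports after a write.
final class PartitionUpdate {
    final String partitionName;
    final UpdateMode mode;

    PartitionUpdate(String partitionName, UpdateMode mode) {
        this.partitionName = partitionName;
        this.mode = mode;
    }
}

// Result of planning a commit: which partitions to create and which to refresh.
final class CommitPlan {
    final List<String> partitionsToCreate = new ArrayList<>();
    final List<String> partitionsToRefresh = new ArrayList<>();
}

final class SelectiveRefreshSketch {
    // partitionsInMetastore stands in for an existence check against HMS (assumption).
    static CommitPlan plan(List<PartitionUpdate> updates, Set<String> partitionsInMetastore) {
        CommitPlan plan = new CommitPlan();
        for (PartitionUpdate update : updates) {
            UpdateMode mode = update.mode;
            // The BE decided NEW from a possibly stale cache. If the partition
            // already exists, treat the write as APPEND instead of failing.
            if (mode == UpdateMode.NEW && partitionsInMetastore.contains(update.partitionName)) {
                mode = UpdateMode.APPEND;
            }
            if (mode == UpdateMode.NEW) {
                plan.partitionsToCreate.add(update.partitionName);
            } else {
                // APPEND and OVERWRITE touch existing partitions: refresh only these
                // cache entries rather than invalidating the whole table.
                plan.partitionsToRefresh.add(update.partitionName);
            }
        }
        return plan;
    }
}
```

In this sketch, untouched partitions never appear in either list, which is the point of the selective refresh: the cost of a commit scales with the number of partitions written, not with the size of the table.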

…les (apache#58606)

### Problem

Reproduction Steps: Create a Hive Catalog, create an unpartitioned table, then insert data. The following failure occurs.

```
copy file failed: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404,
```

The BE mistakenly treats non-partitioned tables as partitioned ones. For partitioned tables, the system always appends a folder suffix for each partition, organizing data into partition directories. However, non-partitioned tables do not require partition information. In this case, the BE incorrectly added a partition folder suffix for non-partitioned tables, causing the insert operation to fail.

### Solution

- Skip setting partition information for non-partitioned tables in the BE.
- Maintain current behavior for partitioned tables, including folder suffix handling.

### Result

- Inserts into non-partitioned object storage tables succeed.
- Partitioned tables continue to work as expected.

This issue was introduced in apache#58166
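The rule in the Solution section can be illustrated with a small path-building sketch. The actual fix lives in the BE; the Java below is only a stand-in for the idea, with hypothetical names (`WritePathSketch.targetDir`) and an assumed Hive-style `column=value` directory layout for partition folders.

```java
import java.util.List;

// Hypothetical helper; the actual fix is in the BE and is not shown here.
final class WritePathSketch {
    // Returns the directory a write should target.
    static String targetDir(String tableLocation,
                            List<String> partitionColumns,
                            List<String> partitionValues) {
        // Normalize a trailing slash so joined paths do not contain "//".
        String base = tableLocation.endsWith("/")
                ? tableLocation.substring(0, tableLocation.length() - 1)
                : tableLocation;
        // Non-partitioned table: no partition info, so no folder suffix.
        if (partitionColumns.isEmpty()) {
            return base;
        }
        if (partitionColumns.size() != partitionValues.size()) {
            throw new IllegalArgumentException("partition columns and values differ in length");
        }
        // Partitioned table: append one "column=value" folder per partition column
        // (assumed Hive-style layout).
        StringBuilder path = new StringBuilder(base);
        for (int i = 0; i < partitionColumns.size(); i++) {
            path.append('/')
                .append(partitionColumns.get(i))
                .append('=')
                .append(partitionValues.get(i));
        }
        return path.toString();
    }
}
```

With this shape, a non-partitioned table always resolves to the table location itself, so the final copy step looks for files where they were actually written.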

…apache#58748)

### What problem does this PR solve?

Follow-up to apache#58166.

With apache#58166, the edit log needs to record "modified partitions" and "new partitions" separately so that non-master FEs can update their partition cache correctly. Otherwise, some new partitions cannot be queried on a non-master FE after an insert.
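To make the distinction concrete, here is a minimal sketch of a log record that carries the two lists separately and of how a non-master FE might replay it. All names (`PartitionRefreshLog`, `FollowerPartitionCache`) are hypothetical, and the cache is reduced to a name set plus a details map; the real Doris edit log format and cache structures are not shown in this PR text.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical edit log payload: the two kinds of partitions are kept apart.
final class PartitionRefreshLog {
    final List<String> newPartitions;
    final List<String> modifiedPartitions;

    PartitionRefreshLog(List<String> newPartitions, List<String> modifiedPartitions) {
        this.newPartitions = newPartitions;
        this.modifiedPartitions = modifiedPartitions;
    }
}

// Hypothetical stand-in for the partition cache held by a non-master FE.
final class FollowerPartitionCache {
    private final Set<String> partitionNames = new HashSet<>();
    private final Map<String, String> partitionDetails = new HashMap<>();

    void put(String name, String details) {
        partitionNames.add(name);
        partitionDetails.put(name, details);
    }

    boolean knows(String name) {
        return partitionNames.contains(name);
    }

    boolean hasCachedDetails(String name) {
        return partitionDetails.containsKey(name);
    }

    void replay(PartitionRefreshLog log) {
        // New partitions must be added to the name list, otherwise queries on
        // this FE would not see them.
        partitionNames.addAll(log.newPartitions);
        // Modified partitions are already listed; drop only their stale details
        // so they are reloaded on next access.
        for (String name : log.modifiedPartitions) {
            partitionDetails.remove(name);
        }
    }
}
```

If the log carried a single undifferentiated list, the replay step could not tell whether to register a name or merely invalidate an entry, which matches the symptom described above.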

hello-stephen · 2025-12-10T03:25:04Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

morningman · 2025-12-10T03:29:35Z

run buildall

hello-stephen · 2025-12-10T04:52:11Z

FE UT Coverage Report

Increment line coverage 24.48% (35/143) 🎉
Increment coverage report
Complete coverage report

morningman · 2025-12-10T07:11:38Z

LGTM

zy-kkk and others added 3 commits December 10, 2025 11:23

morningman requested a review from morrySnow as a code owner December 10, 2025 03:24

zy-kkk approved these changes Dec 11, 2025

morningman changed the title ~~[opt](hive) Speed up Hive insert on partition tables using cache (#58166)(#58606)(#58748)~~ branch-3.1: [opt](hive) Speed up Hive insert on partition tables using cache (#58166)(#58606)(#58748) Dec 15, 2025

morrySnow changed the title ~~branch-3.1: [opt](hive) Speed up Hive insert on partition tables using cache (#58166)(#58606)(#58748)~~ branch-3.1: [opt](hive) Speed up Hive insert on partition tables using cache #58166 #58606 #58748 Dec 15, 2025

morrySnow approved these changes Dec 15, 2025

morrySnow merged commit a5ce97a into apache:branch-3.1 Dec 15, 2025
25 of 26 checks passed

morrySnow mentioned this pull request Dec 24, 2025

3.1.4 Release Notes #59325

Open

zy-kkk added a commit to zy-kkk/doris that referenced this pull request Dec 25, 2025

Revert "branch-3.1: [opt](hive) Speed up Hive insert on partition tab…

3819549

…les using cache apache#58166 apache#58606 apache#58748 (apache#58886)" This reverts commit a5ce97a.

morrySnow pushed a commit that referenced this pull request Dec 25, 2025

branh-3.1: Revert "branch-3.1: [opt](hive) Speed up Hive insert on pa…

d3177ba

…rtition tables using cache #58166 #58606 #58748 (#58886)" (#59348) revert #58932

5 participants