
MDEV-34703 LOAD DATA INFILE using Innodb bulk load aborts #3751

Merged 1 commit into 10.11 from 10.11-MDEV-34703 on Jan 16, 2025

Conversation

Thirunarayanan (Member)

  • The Jira issue number for this PR is: MDEV-34703

Description

Problem:

  • During a LOAD statement, the InnoDB bulk operation relies on the temporary directory and crashes when tmpdir is exhausted.

Solution:

  • For a LOAD statement, the InnoDB bulk operation does the following (a sketch follows this list):
  1. Avoids creating a temporary file for the clustered index.
  2. Uses normal insert operations for the clustered index.
  3. Writes the undo log only for the first insert operation.
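
As a rough illustration, here is a minimal, self-contained C++ sketch of the behaviour described in the list above. Every name in it (BulkLoadSketch, Row, and so on) is a hypothetical stand-in; this is not the actual InnoDB code, only a toy model of "clustered index built row by row, secondary keys buffered and sorted, undo logged once":

```cpp
// Hypothetical model only -- not InnoDB code.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Row { int pk; std::string payload; };

struct BulkLoadSketch {
  std::vector<Row> clustered;           // built one record at a time
  std::vector<int> secondary_keys;      // buffered, sorted once at the end
  bool first_insert_logged = false;

  void insert(const Row &row) {
    if (!first_insert_logged) {
      // Per the PR description, undo is written for the first insert only.
      std::cout << "undo-log first insert, pk=" << row.pk << '\n';
      first_insert_logged = true;
    }
    clustered.push_back(row);           // no temporary file, no merge sort
    secondary_keys.push_back(row.pk);   // secondary index keeps the bulk path
  }

  void finish() {
    // Only the secondary index goes through the sort step.
    std::sort(secondary_keys.begin(), secondary_keys.end());
  }
};

int main() {
  BulkLoadSketch load;
  load.insert({2, "b"});
  load.insert({1, "a"});
  load.finish();
  std::cout << "clustered rows: " << load.clustered.size() << '\n';
}
```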

How can this PR be tested?

./mtr innodb.bulk_load

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.


@Thirunarayanan Thirunarayanan requested a review from dr-m January 13, 2025 05:45
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@vaintroub (Member)

I gave it a try using the test case from the bug report https://jira.mariadb.org/browse/MDEV-34703. The results are good:
prior to the patch, the bulk optimization was slowing the import down by approximately 50 percent (i.e. on Windows, where it did not crash, and with good IO).

Now the optimization actually works and cuts the load time by a factor of about 2:

No bulk      Bulk (before the patch)      Bulk (patched)
234.634      373.120                      129.285

Comment on lines 1173 to 1177
/* Avoid using bulk buffer for load statement */
if (index->is_clust() && it->second.load_stmt())
return nullptr;
Contributor

The load in the name of the member function or data member is not descriptive. It’s really about skipping the merge sort, and possibly also skipping the building of the index one page at a time. The name needs to reflect that.

Based on the changes to row_merge_bulk_t::bulk_insert_buffered(), it seems that skip_sort would be an appropriate name.

Shouldn’t it suffice to test this flag only, and to add ut_ad(index->is_primary())? In that way, we’d execute fewer conditions in a release build.
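
For illustration, a self-contained toy version of what that suggestion could look like (the struct names and the helper are hypothetical stand-ins, not the real InnoDB declarations):

```cpp
#include <cassert>

// Hypothetical stand-ins for dict_index_t and the per-table bulk state.
struct IndexSketch { bool primary; bool is_primary() const { return primary; } };
struct BulkStateSketch { bool skip_sort; };

// Return nullptr when the clustered index should bypass the bulk buffer.
void *pick_bulk_buffer(const IndexSketch &index, const BulkStateSketch &state,
                       void *buffer)
{
  if (state.skip_sort)            // only one condition in a release build
  {
    assert(index.is_primary());   // debug-only invariant, analogous to ut_ad()
    return nullptr;               // build the clustered index without the buffer
  }
  return buffer;
}

int main() { return pick_bulk_buffer({true}, {true}, nullptr) ? 1 : 0; }
```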

trx->error_info= index;
else if (ind.is_primary() && index->table->persistent_autoinc)
btr_write_autoinc(index, 1);
err= btr_bulk.finish(err);
Contributor

Should this be avoided if we got err!=DB_SUCCESS from the previous call?

@Thirunarayanan (Member Author) Jan 14, 2025

BtrBulk::finish() is called even when an error exists. It internally calls pageAbort(page_bulk);
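
As a rough, self-contained illustration of that pattern (hypothetical names, not the real BtrBulk class): the finish step runs on both paths and, when an error is passed in, releases whatever pages were partially built instead of completing them.

```cpp
#include <memory>
#include <vector>

enum class Err { SUCCESS, TOO_BIG };

struct PageSketch { /* a partially built index page */ };

struct BtrBulkSketch {
  std::vector<std::unique_ptr<PageSketch>> page_bulks;

  // Called on both the success and the error path.
  Err finish(Err err) {
    if (err != Err::SUCCESS) {
      page_bulks.clear();   // analogue of pageAbort(): drop unfinished pages
      return err;
    }
    // ... on success the remaining pages would be completed and committed ...
    return Err::SUCCESS;
  }
};

int main() {
  BtrBulkSketch bulk;
  bulk.page_bulks.push_back(std::make_unique<PageSketch>());
  return bulk.finish(Err::TOO_BIG) == Err::TOO_BIG ? 0 : 1;
}
```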

@dr-m (Contributor) left a comment

Thank you. This is rather simple. I would suggest some clarification to the comments.

Comment on lines 5362 to 5367
/** During load bulk, InnoDB does build the clustered index
by one record at a time and doesn't use bulk buffer.
So skip the clustered index while applying the buffered
bulk operation */
if (i)
index= UT_LIST_GET_NEXT(indexes, index);
Contributor

The comment should start with /* and not /**. It would be good to mention load_one_row() here, as well as row_ins_index_entry(), which will insert subsequent rows by invoking row_ins_clust_index_entry().

While I was figuring out the logic from the code, I noticed that row_ins_index_entry() is not ideal:

	if (index->is_btree()) {
		if (auto t= trx->check_bulk_buffer(index->table)) {
			/* MDEV-25036 FIXME:
			row_ins_check_foreign_constraint() check
			should be done before buffering the insert
			operation. */
			ut_ad(index->table->skip_alter_undo
			      || !trx->check_foreigns);
			return t->bulk_insert_buffered(*entry, *index, trx);
		}
	}

	if (index->is_primary()) {
		return row_ins_clust_index_entry(index, entry, thr, 0);

is_clust() should be equivalent to is_primary() here, because the code is not being invoked for ibuf.index. Also, is_primary() should imply is_btree(). We seem to be unnecessarily traversing trx->mod_tables for each index. The check had better be done in row_ins(), once for all indexes, and then for the primary clustered index separately. However, refactoring this is outside the scope of this ticket. Can you please file a separate MDEV for this, and only revise the comment here?
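
To make the suggested refactoring concrete, here is a self-contained toy sketch (hypothetical names, not the actual row0ins.cc code): look up the bulk buffer once per row in a row_ins()-like loop, instead of walking trx->mod_tables again inside every per-index call.

```cpp
#include <iostream>
#include <vector>

struct IndexSketch { bool primary; };
struct BulkBufferSketch { int buffered = 0; };

struct TrxSketch {
  BulkBufferSketch *bulk = nullptr;
  // In the real code this would search trx->mod_tables; do it once per row.
  BulkBufferSketch *check_bulk_buffer() { return bulk; }
};

static void insert_directly(const IndexSketch &) { /* normal insert path */ }

void row_ins_sketch(TrxSketch &trx, const std::vector<IndexSketch> &indexes)
{
  BulkBufferSketch *buf = trx.check_bulk_buffer();    // looked up once
  for (const IndexSketch &index : indexes) {
    if (index.primary || !buf)
      insert_directly(index);   // clustered index path, or no bulk buffer
    else
      ++buf->buffered;          // buffer the secondary-index entry
  }
}

int main() {
  BulkBufferSketch buf;
  TrxSketch trx;
  trx.bulk = &buf;
  row_ins_sketch(trx, {{true}, {false}, {false}});
  std::cout << "buffered secondary entries: " << buf.buffered << '\n';
}
```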

Verified: this commit was created on GitHub.com and signed with GitHub's verified signature (the key has expired).
Problem:
=======
- During a LOAD statement, the InnoDB bulk operation relies on the
temporary directory and crashes when tmpdir is exhausted.

Solution:
========
During bulk insert, the LOAD statement builds the clustered index
one record at a time instead of one page at a time. By doing this,
InnoDB
1) avoids creating a temporary file for the clustered index, and
2) writes the undo log only for the first insert operation.
@Thirunarayanan Thirunarayanan merged commit 2d42e9f into 10.11 Jan 16, 2025
13 of 14 checks passed
@Thirunarayanan Thirunarayanan deleted the 10.11-MDEV-34703 branch January 16, 2025 08:21