
@jepett0 commented Aug 8, 2024

#7608

This PR enables restoring indices from backups with the original partition split boundaries.

This feature is mainly needed to speed up restoration from backups. In my tests (the TPC-H lineitem table at 156x scale on an 8-node cluster with the cpu80_soc2_mem512G_net25G_4ssd preset), restoring a 150 GiB table with a single 100 GiB index, the BuildIndexes stage of the import/s3 operation went from 829 seconds to 568 seconds. This is a 31% reduction in BuildIndexes time! 🎉 Total restore-from-backup time went from 1259 seconds to 995 seconds, a 21% reduction.

The C++ SDK is also changed slightly so that users can create a table with an index that has specific partitioning settings: either a uniform partition count or explicit split boundaries. Letting TTableBuilder add an index from its full description makes it as capable as session.AlterTable already was. The added test shows how this can be helpful.

@jepett0 force-pushed the IndexBackupRestore.1 branch from 7abf265 to 781ca4f (August 9, 2024 09:56)
@jepett0 jepett0 requested review from MBkkt and ijon August 9, 2024 10:57
@jepett0 jepett0 marked this pull request as ready for review August 9, 2024 10:57
MBkkt previously approved these changes Aug 9, 2024
@jepett0 force-pushed the IndexBackupRestore.1 branch from 781ca4f to 0196b02 (August 13, 2024 12:14)
This method was never called, so this bug was hidden.
+ Add a dedicated option that controls whether the indexImplTable boundaries are included in the main table description. This should help minimize network traffic and IO CPU pool usage.
These are needed for a clean test. However, these functions might also be useful to users: apart from direct gRPC calls, they are currently the only way to create a table with an index that has predefined split boundaries.
@jepett0 force-pushed the IndexBackupRestore.1 branch from 33900bb to 5798b1c (August 16, 2024 20:36)
@jepett0 jepett0 merged commit e56caba into ydb-platform:main Aug 21, 2024
stanislav-shchetinin pushed a commit to stanislav-shchetinin/ydb that referenced this pull request Aug 30, 2024

4 participants