Resolve possible space inflation issues with DDL and insert into select (#366) #1493

Closed
wants to merge 3 commits into from

Conversation

konghaiya
Collaborator

@konghaiya konghaiya commented Mar 30, 2023

Cause of the problem:

  1. To support multiple versions, Tianmu first copies the original pack when performing a DML operation,
    modifies the copy, and then writes it to the DATA file with either an overwrite or an append.
    (If the DATA file contains invalid space that can hold the current pack, the write overwrites that space; otherwise it appends.)
    After the latest pack is written to the file, the latest entry in the version chain points to the address that was last written.
    The current TianmuAttr::LoadData logic has a problem: every call to TianmuAttr::LoadData writes data to disk,
    so a transaction that writes multiple rows leaves multiple copies of the data in the file.
    Because the current transaction has not been committed, the space used by the earlier repeated writes has not been released, so the overwrite path is never reached.
    Only the append path is taken, which is the root cause of the space inflation.
    A particularly large multi-row write transaction therefore leads to a space explosion.
    Moreover, disk I/O is performed once per loaded row, which also degrades insert performance.
    Solution:
    Change TianmuAttr::LoadData so that, before saving changes, it checks whether the pack is full,
    that is, whether it has reached 65536 rows. If it has, the pack is written to disk at that point.
    If it has not, the write is deferred to the commit phase.
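
The difference between the two behaviours can be sketched with a small model. This is an illustration under stated assumptions, not Tianmu's code: `DataFile`, `LoadPerRow`, and `LoadDeferred` are hypothetical names, each row is assumed to occupy one unit of space, and space from earlier copies is assumed to be unusable until commit, as described above.

```cpp
#include <cstddef>

// Hypothetical model of a column DATA file; not Tianmu's real classes.
constexpr std::size_t kPackRows = 65536;  // rows in a full pack, per the description

struct DataFile {
  std::size_t size = 0;         // units appended to the file so far
  std::size_t write_calls = 0;  // number of disk writes issued

  // Append a copy of a pack holding `rows` rows. Before commit, the space of
  // earlier copies is not released, so every write lands at the end of the file.
  void AppendPack(std::size_t rows) {
    size += rows;
    ++write_calls;
  }
};

// Behaviour described as the problem: every loaded row writes a fresh copy
// of the whole pack it belongs to.
DataFile LoadPerRow(std::size_t total_rows) {
  DataFile file;
  std::size_t rows_in_pack = 0;
  for (std::size_t i = 0; i < total_rows; ++i) {
    ++rows_in_pack;
    file.AppendPack(rows_in_pack);
    if (rows_in_pack == kPackRows) rows_in_pack = 0;  // next row starts a new pack
  }
  return file;
}

// Behaviour described as the solution: write only when the pack is full,
// and flush any partially filled pack once in the commit phase.
DataFile LoadDeferred(std::size_t total_rows) {
  DataFile file;
  std::size_t rows_in_pack = 0;
  for (std::size_t i = 0; i < total_rows; ++i) {
    ++rows_in_pack;
    if (rows_in_pack == kPackRows) {
      file.AppendPack(rows_in_pack);
      rows_in_pack = 0;
    }
  }
  if (rows_in_pack > 0) file.AppendPack(rows_in_pack);  // commit-phase flush
  return file;
}
```

In this model a 1000-row transaction costs `LoadPerRow` 1000 writes and 500500 units of file space (1 + 2 + ... + 1000), while `LoadDeferred` issues a single write of 1000 units at commit. The real numbers depend on pack encoding and compression, which the model ignores.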

Summary about this PR

Issue Number: close #366
Issue Number: close #1079
Issue Number: close #689

Tests Check List

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Changelog

  • New Feature
  • Bug Fix
  • Performance Improvement
  • Build/Testing/CI/CD
  • Documentation
  • Not for changelog (changelog entry is not required)

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features

…on issues(stoneatom#366)

@mergify
Contributor

mergify bot commented Mar 30, 2023

This pull request's title should follow the requirements below. @konghaiya, please check it 👇.

Valid format:

fix(vc): fix sth..... (#3306)
  ^         ^---------^  ^----^
  |         |            |
  |         +            +-> your issue ID.
  |         |
  |         +-> Summary in present tense.
  |
  +-------> Type: feat, fix, docs, workflow, style, refactor, test, website, chore

Valid types:

  • feat: new feature for stonedb
  • fix: bug fix for stonedb
  • docs: changes to the documentation
  • workflow: ci/cd in .github
  • perf: changes to improve code performance
  • refactor: refactoring production code, e.g. renaming a variable
  • style: formatting, missing semicolons, etc; no production code change
  • test: adding missing tests, refactoring tests; no production code change
  • website
  • chore: updating grunt tasks etc; no production code change
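
As an illustration, the format above could be checked with a regular expression. The pattern is an assumption inferred from the bot's message, not the repository's actual mergify rule, and `IsValidTitle` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical checker: type(scope): summary (#issue).
// The pattern is inferred from the bot's message, not taken from any real config.
bool IsValidTitle(const std::string& title) {
  static const std::regex kPattern(
      R"(^(feat|fix|docs|workflow|perf|refactor|style|test|website|chore))"
      R"(\(\w+\): .+ \(#[0-9]+\)$)");
  return std::regex_match(title, kPattern);
}
```

Under this pattern the bot's own sample `fix(vc): fix sth..... (#3306)` passes, while a title without a leading type and scope does not. A title such as `fix(tianmu): fix space inflation on insert into select (#366)` would also pass; the `tianmu` scope is only a guess at a suitable value.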

@codecov

codecov bot commented Mar 30, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +42.95 🎉

Comparison is base (05db04d) 0.00% compared to head (d7c177c) 42.95%.

Additional details and impacted files
@@                 Coverage Diff                  @@
##           stonedb-5.7-dev    #1493       +/-   ##
====================================================
+ Coverage                 0   42.95%   +42.95%     
====================================================
  Files                    0     1838     +1838     
  Lines                    0   397379   +397379     
====================================================
+ Hits                     0   170712   +170712     
- Misses                   0   226667   +226667     
Impacted Files                          Coverage Δ
storage/tianmu/core/column_share.cpp    85.60% <ø> (ø)
storage/tianmu/core/tianmu_attr.cpp     69.55% <100.00%> (ø)
storage/tianmu/core/tianmu_table.cpp    46.54% <100.00%> (ø)

... and 1835 files with indirect coverage changes


@konghaiya konghaiya closed this Mar 30, 2023
@konghaiya konghaiya deleted the dev_lhj_366 branch March 30, 2023 03:24