Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storages: Fix obtaining incorrect column information when there are virtual columns in the query #9189

Merged
merged 3 commits into from
Jul 9, 2024

Conversation

JinheLin
Copy link
Contributor

@JinheLin JinheLin commented Jul 5, 2024

What problem does this PR solve?

Issue Number: close #9188

Problem Summary:
When parsing some column expressions, it may need to use column index to obtain column information, but since virtual columns are filtered out before being sent to the storage layer, the original columns and the columns for storage to read are
inconsistency.

What is changed and how it works?

1. Use the original columns  in `query_info.dag_query` instead of `columns_to_read` when building `RSOperator`.
2. For runtime filters, creating the `DM::Attr` object in `StorageDeltaMerge::read`, so it doesn't need to rely on `column_to_read`.
  1. Currently, if a query contains virtual columns, hash join will not be pushed down to TiFlash. So in fact, runtime filters will not occur when a query contains virtual columns. But in order to keep the relevant codes consistent, make them not rely on columns_to_read, the way that runtime filters creating DM::Attr object is modified.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix obtaining incorrect column information when there are virtual columns in the query.

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 5, 2024
@JinheLin JinheLin changed the title WIP: Fix column to read WIP: Storages: Fix the issue of obtaining incorrect column information when there are virtual columns in the query Jul 5, 2024
query_info.dag_query = std::make_unique<DAGQueryInfo>(
filter_conditions->conditions,
empty_pushed_down_filters, // Not care now
mockColumnInfosToTiDBColumnInfos(table_schema_for_delta_merge[table_id]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only local const references prolong the lifespan a temporary.
ref: https://stackoverflow.com/questions/2784262/

As a const reference member of DAGQueryInfo, source_columns cannot prolong the lifespan of a temporary.
DAGQueryInfo::source_columns will be a dangle reference after the contructor of DAGQueryInfo exits.

@@ -99,10 +99,14 @@ PushDownFilterPtr PushDownFilter::build(
has_cast)
{
NamesWithAliases project_cols;
for (size_t i = 0; i < columns_to_read.size(); ++i)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

casted_columns align with table_scan_column_info one by one.

@@ -301,6 +303,8 @@ class StorageDeltaMerge
Context & global_context;

LoggerPtr log;

friend class MockStorage;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MockStorage will call getStoreColumnDefines().

@JinheLin JinheLin changed the title WIP: Storages: Fix the issue of obtaining incorrect column information when there are virtual columns in the query Storages: Fix the issue of obtaining incorrect column information when there are virtual columns in the query Jul 8, 2024
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Jul 8, 2024
Copy link
Contributor

@Lloyd-Pottiger Lloyd-Pottiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jul 8, 2024
@JinheLin JinheLin changed the title Storages: Fix the issue of obtaining incorrect column information when there are virtual columns in the query Storages: Fix obtaining incorrect column information when there are virtual columns in the query Jul 8, 2024
Copy link
Member

@CalvinNeo CalvinNeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

ti-chi-bot bot commented Jul 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CalvinNeo,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 9, 2024
Copy link
Contributor

ti-chi-bot bot commented Jul 9, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-08 03:16:38.57049833 +0000 UTC m=+241095.805732442: ☑️ agreed by Lloyd-Pottiger.
  • 2024-07-09 03:23:24.00387575 +0000 UTC m=+327901.239109864: ☑️ agreed by CalvinNeo.

@ti-chi-bot ti-chi-bot bot merged commit e6fc04a into pingcap:master Jul 9, 2024
5 checks passed
@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. labels Jul 9, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 9, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #9206.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 9, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #9207.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #9208.

JinheLin added a commit to ti-chi-bot/tiflash that referenced this pull request Jul 10, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this pull request Jul 11, 2024
…irtual columns in the query (#9189) (#9206)

close #9188

1. Use the original columns  in `query_info.dag_query` instead of `columns_to_read` when building `RSOperator`.
2. For runtime filters, creating the `DM::Attr` object in `StorageDeltaMerge::read`, so it doesn't need to rely on `column_to_read`.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

Co-authored-by: jinhelin <linjinhe33@gmail.com>
@JinheLin JinheLin deleted the fix_column_to_read branch July 12, 2024 09:51
JinheLin added a commit that referenced this pull request Jul 29, 2024
…irtual columns in the query (#9189) (#9206)

close #9188

1. Use the original columns  in `query_info.dag_query` instead of `columns_to_read` when building `RSOperator`.
2. For runtime filters, creating the `DM::Attr` object in `StorageDeltaMerge::read`, so it doesn't need to rely on `column_to_read`.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

Co-authored-by: jinhelin <linjinhe33@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DB::TiFlashException: Column index out of bound: 4, should in [0,3)
4 participants