Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy vector has smaller loaded size than expected #2237

Closed
rui-mo opened this issue Aug 9, 2022 · 6 comments
Closed

Lazy vector has smaller loaded size than expected #2237

rui-mo opened this issue Aug 9, 2022 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@rui-mo
Copy link
Collaborator

rui-mo commented Aug 9, 2022

Hi,

We are debugging one issue on lazy vector inside a dictionary vector, which causes invalid access to the loadedVector. It would be great if you could provide us with some advice.

We printed some logs and found below clues.
In FilterProject, rows has the same size as input (assume it is size_a), and is marked as all selected. During the call of project(*rows, &evalCtx), error occurs in isNullAt function. In this case, the lazy vector is in a dictionary vector, and the size of dictionaryValues_(the lazy vector) is also size_a (isNullAt). But the actual loadedVector in this lazy vector has smaller size than size_a, so when the index becomes larger than the loaded vector size, invalid access occurs (LazyVector.h).

In this case, looks like the actual loaded vector in the lazy vector is smaller than the expected size. Should they be the same, or the loaded vector can to be smaller than expected? If so, should the selective vector in project(*rows, &evalCtx) be changed accordingly?

In below test, I manually simulated above case in which lazy vector has smaller real size than the size in constructor. It fails during copy->copy process.

TEST_F(LazyVectorTest, lazyBetweenDictionaries) {
  static constexpr int32_t kInnerSize = 320;
  static constexpr int32_t kOuterSize = 10000;

  auto makeWrapped = [&](BufferPtr firstDicNulls, BufferPtr secondDicNulls) {
    auto firstDic = BaseVector::wrapInDictionary(
        std::move(firstDicNulls),
        makeIndices(kInnerSize, [](auto row) { return row; }),
        kInnerSize,
        makeFlatVector<int32_t>(kInnerSize, INTEGER()));

    auto lazy = std::make_shared<LazyVector>(
        pool_.get(),
        INTEGER(),
        kOuterSize,
        std::make_unique<SimpleVectorLoader>([=](RowSet /*rows*/) {
          return firstDic;
        }));

    return BaseVector::wrapInDictionary(
        std::move(secondDicNulls),
        makeIndices(kOuterSize, [](auto row) { return row; }),
        kOuterSize,
        lazy);
  };

  auto wrapped = makeWrapped(nullptr, nullptr);

  SelectivityVector rows(kOuterSize);

  auto copy =
      BaseVector::create(wrapped->type(), wrapped->size(), wrapped->pool());
  copy->copy(wrapped.get(), rows, nullptr);
}

Thanks!

@Yuhta Yuhta self-assigned this Aug 9, 2022
@Yuhta Yuhta added the bug Something isn't working label Aug 9, 2022
@Yuhta
Copy link
Contributor

Yuhta commented Aug 9, 2022

Can you reproduce it using PlanBuilder? It's not clear under which case lazy vector would have a size larger than the base vector.

@rui-mo
Copy link
Collaborator Author

rui-mo commented Aug 10, 2022

Can you reproduce it using PlanBuilder? It's not clear under which case lazy vector would have a size larger than the base vector.

Hi Yuhta, I raised a PR on this issue and it can solve the failure in our test. I would be nice if you could help take a review. Also tried to reproduce this issue locally with PlanBuilder, but the test did not trigger this error. I guess this issue can only occur on certain case. Here is the link if you also want to take a look: test. Thanks.

@Yuhta
Copy link
Contributor

Yuhta commented Aug 10, 2022

When you create lazy in the test, why the length of lazy is claimed to be kOuterSize but the underlying vector is only kInnerSize ? That is the cause of the issue, and that inconsistency is what we need to find out to fix.

@rui-mo
Copy link
Collaborator Author

rui-mo commented Aug 11, 2022

When you create lazy in the test, why the length of lazy is claimed to be kOuterSize but the underlying vector is only kInnerSize ? That is the cause of the issue, and that inconsistency is what we need to find out to fix.

This is just for illustration of the issue we found. Do you suggest the loaded size should be the same as expected? And we need to find out why they are different and solve the bug to make them consistent.

@rui-mo
Copy link
Collaborator Author

rui-mo commented Aug 11, 2022

Hi @Yuhta, I noticed this PR for fixing the missing rows in lazy vector. Tried it locally, and found our issue could be solved. What we are testing is also conditional expressions as below, so the case can be similar.

sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales,
sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales,

Closing this issue. Thanks for your time.

@rui-mo rui-mo closed this as completed Aug 11, 2022
@Yuhta
Copy link
Contributor

Yuhta commented Aug 11, 2022

Do you suggest the loaded size should be the same as expected?

Yes that is the expected behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants