[FIX](resize) fix array and map offsets resize with default value #25669

Merged: 2 commits, Oct 24, 2023

amorynan
Contributor

Proposed changes

Issue Number: close #xxx
Fix array and map offsets resize with default value.
If we call create_column() and then resize(), the offsets in array/map columns may be filled with the wrong default value.
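
To illustrate why a wrong fill value matters, here is a minimal sketch that assumes offsets are cumulative row ends stored in a plain `std::vector`. The names `Offsets`, `row_length`, and `offsets_are_consistent` are hypothetical and are not part of Doris. A `std::vector` zero-fills on a plain `resize`; whatever the real offsets container leaves in the new slots, the point is only that it is not the last offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical alias: one cumulative end offset per row.
using Offsets = std::vector<std::uint64_t>;

// Length of row i, derived from two neighbouring cumulative offsets.
std::uint64_t row_length(const Offsets& offsets, std::size_t i) {
    assert(i < offsets.size());
    const std::uint64_t begin = (i == 0) ? 0 : offsets[i - 1];
    return offsets[i] - begin;  // wraps around if offsets[i] < begin
}

// Offsets must never decrease, and the last one must equal the nested data size.
bool offsets_are_consistent(const Offsets& offsets, std::size_t data_size) {
    std::uint64_t prev = 0;
    for (std::uint64_t off : offsets) {
        if (off < prev) {
            return false;
        }
        prev = off;
    }
    return prev == data_size;
}
```

With offsets `{2, 4, 6}` grown to five rows by a plain resize, the new rows end at 0, the check fails, and `row_length` of the first new row wraps around. Growing with the last offset as the fill value keeps the check true and gives the new rows length 0.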

Contributor

@github-actions bot left a comment

clang-tidy made some suggestions

// specific language governing permissions and limitations
// under the License.

#include <gtest/gtest-message.h>
Contributor

warning: 'gtest/gtest-message.h' file not found [clang-diagnostic-error]

#include <gtest/gtest-message.h>
         ^

return dataTypes;
}

TEST(ResizeTest, ScalaTypeTest) {
Contributor

warning: all parameters should be named in a function [readability-named-parameter]

Suggested change
- TEST(ResizeTest, ScalaTypeTest) {
+ TEST(ResizeTest /*unused*/, ScalaTypeTest /*unused*/) {

}
}

TEST(ResizeTest, ArrayTypeTest) {
Contributor

warning: all parameters should be named in a function [readability-named-parameter]

Suggested change
- TEST(ResizeTest, ArrayTypeTest) {
+ TEST(ResizeTest /*unused*/, ArrayTypeTest /*unused*/) {

}
}

TEST(ResizeTest, ArrayTypeWithValuesTest) {
Contributor

warning: all parameters should be named in a function [readability-named-parameter]

Suggested change
- TEST(ResizeTest, ArrayTypeWithValuesTest) {
+ TEST(ResizeTest /*unused*/, ArrayTypeWithValuesTest /*unused*/) {

col_a->insert(af);
col_a->insert(af);

col_a->resize(10);
Contributor

warning: 10 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    col_a->resize(10);
                  ^

MutableColumnPtr b = a->create_column();
b->insert_range_from(*col_a, 0, 10);
EXPECT_EQ(b->size(), 10);
ColumnMap* col_map = reinterpret_cast<ColumnMap*>(b.get());
Contributor

warning: use auto when initializing with a cast to avoid duplicating the type name [modernize-use-auto]

Suggested change
- ColumnMap* col_map = reinterpret_cast<ColumnMap*>(b.get());
+ auto* col_map = reinterpret_cast<ColumnMap*>(b.get());

}
}

TEST(ResizeTest, StructTypeTest) {
Contributor

warning: all parameters should be named in a function [readability-named-parameter]

Suggested change
- TEST(ResizeTest, StructTypeTest) {
+ TEST(ResizeTest /*unused*/, StructTypeTest /*unused*/) {


DataTypePtr a = std::make_shared<DataTypeStruct>(dataTypes);
auto col_a = a->create_column();
col_a->resize(10);
Contributor

warning: 10 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    col_a->resize(10);
                  ^

auto col_a = a->create_column();
col_a->resize(10);
MutableColumnPtr b = a->create_column();
b->insert_range_from(*col_a, 0, 10);
Contributor

warning: 10 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    b->insert_range_from(*col_a, 0, 10);
                                    ^

col_a->resize(10);
MutableColumnPtr b = a->create_column();
b->insert_range_from(*col_a, 0, 10);
EXPECT_EQ(b->size(), 10);
Contributor

warning: 10 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    EXPECT_EQ(b->size(), 10);
                         ^

@amorynan
Contributor Author

run buildall

@@ -422,7 +422,8 @@ void ColumnArray::reserve(size_t n) {

 //please check you real need size in data column, because it's maybe need greater size when data is string column
 void ColumnArray::resize(size_t n) {
-    get_offsets().resize(n);
+    auto last_off = get_offsets().back();
+    get_offsets().resize_fill(n, last_off);
     get_data().resize(n);
Member

Is this get_data().resize(n) correct? For example, if the array contains 3 elements, [1, 2], [1, 2], [1, 2], the inner data size is 6, not 3. After calling ::resize(3), the inner data size becomes 3 and 3 nested elements are lost. Are these semantics right?

Contributor

I think it should not call get_data().resize(n). Just append offsets with the same value.
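
The semantics discussed in this thread can be sketched with a toy model, assuming a flat data buffer plus one cumulative end offset per row. `ToyArrayColumn` and its members are hypothetical and are not the Doris `ColumnArray` class. The shrinking branch in particular is an assumption made for the sketch, not something this PR states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an array column, not the Doris ColumnArray class.
struct ToyArrayColumn {
    std::vector<int> data;               // all nested elements, flattened
    std::vector<std::uint64_t> offsets;  // offsets[i] = end of row i in data

    std::size_t size() const { return offsets.size(); }

    // Append one row of nested values.
    void insert(const std::vector<int>& row) {
        data.insert(data.end(), row.begin(), row.end());
        offsets.push_back(data.size());
    }

    // Copy row i back out of the flat buffer.
    std::vector<int> row(std::size_t i) const {
        assert(i < offsets.size());
        const std::size_t begin = (i == 0) ? 0 : static_cast<std::size_t>(offsets[i - 1]);
        const std::size_t end = static_cast<std::size_t>(offsets[i]);
        return std::vector<int>(data.begin() + begin, data.begin() + end);
    }

    // Resize in rows, never in nested elements.
    void resize(std::size_t n) {
        if (n < offsets.size()) {
            // Shrinking (assumed behaviour): drop rows and the values they owned.
            offsets.resize(n);
            data.resize(n == 0 ? 0 : static_cast<std::size_t>(offsets.back()));
        } else {
            // Growing: repeat the last offset so each new row is empty.
            const std::uint64_t last = offsets.empty() ? 0 : offsets.back();
            offsets.resize(n, last);
        }
    }
};
```

With three rows of `[1, 2]`, calling `resize(3)` on this model leaves all 6 nested values in place, and `resize(10)` adds seven empty rows without touching the data buffer.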

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.08% (8309/22409)
Line Coverage: 29.20% (66744/228598)
Region Coverage: 27.84% (34639/124407)
Branch Coverage: 24.43% (17591/72018)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e4709fd4bff8637b4422595677e2b236a8ae9d05_e4709fd4bff8637b4422595677e2b236a8ae9d05/report/index.html

@xiaokang added the usercase (Important user case type label), p0_c, and dev/2.0.3 labels on Oct 20, 2023
@@ -422,7 +422,8 @@ void ColumnArray::reserve(size_t n) {

 //please check you real need size in data column, because it's maybe need greater size when data is string column
 void ColumnArray::resize(size_t n) {
-    get_offsets().resize(n);
+    auto last_off = get_offsets().back();
+    get_offsets().resize_fill(n, last_off);
Contributor

We need to check whether resize_fill modifies the existing offset values.

Contributor Author

@amorynan commented on Oct 23, 2023

if (n > old_size) {
    this->reserve(n);
    std::fill(t_end(), t_end() + n - old_size, value);
}
this->c_end = this->c_start + this->byte_size(n);
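
The quoted logic only writes to the slots past the old end, so existing offsets stay as they were. A minimal sketch of the same idea on `std::vector` follows. The helper name `resize_fill_sketch` is hypothetical and this is not the real container's implementation. The fill value is taken by copy here so that passing `v.back()` stays safe if the vector reallocates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the quoted logic on a std::vector.
template <typename T>
void resize_fill_sketch(std::vector<T>& v, std::size_t n, T value) {
    const std::size_t old_size = v.size();
    v.resize(n);  // keeps the first min(old_size, n) elements unchanged
    if (n > old_size) {
        // Overwrite only the newly added tail with the fill value.
        std::fill(v.begin() + static_cast<std::ptrdiff_t>(old_size), v.end(), value);
    }
    assert(v.size() == n);
}
```

Growing offsets `{2, 4, 6}` to six entries with the last offset as the fill value gives `{2, 4, 6, 6, 6, 6}`, and shrinking simply truncates.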

@@ -450,7 +450,8 @@ void ColumnMap::reserve(size_t n) {
 }

 void ColumnMap::resize(size_t n) {
-    get_offsets().resize(n);
+    auto last_off = get_offsets().back();
+    get_offsets().resize_fill(n, last_off);
     keys_column->resize(n);
Contributor

This should not call resize on the keys and values columns.

Contributor Author

done!

@@ -422,7 +422,8 @@ void ColumnArray::reserve(size_t n) {

//please check you real need size in data column, because it's maybe need greater size when data is string column
void ColumnArray::resize(size_t n) {
Contributor

Is there the same problem for ColumnStruct?

Contributor Author

Struct has no offsets array, so it does not have this problem.

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.54 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17162350704 Bytes

@hello-stephen
Contributor

run p1

@amorynan
Contributor Author

run buildall

@amorynan
Contributor Author

run compile

@amorynan
Contributor Author

run beut

@amorynan
Contributor Author

run clickbench

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.88 seconds
stream load tsv: 553 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162131278 Bytes

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.05% (8304/22410)
Line Coverage: 29.18% (66661/228417)
Region Coverage: 27.77% (34633/124696)
Branch Coverage: 24.42% (17585/72010)
Coverage Report: http://coverage.selectdb-in.cc/coverage/741457aa1ebb3122d447f07b2fa2551a89347aad_741457aa1ebb3122d447f07b2fa2551a89347aad/report/index.html

@amorynan
Contributor Author

run p0

@amorynan
Contributor Author

run p1

@amorynan
Contributor Author

run external

Member

@airborne12 left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

Member

@eldenmoon left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Oct 24, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@xiaokang left a comment

LGTM

@xiaokang xiaokang merged commit 9160834 into apache:master Oct 24, 2023
25 of 28 checks passed
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Oct 30, 2023
wsjz pushed a commit to wsjz/incubator-doris that referenced this pull request Nov 19, 2023
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.0.3-merged, p0_c, reviewed, usercase (Important user case type label)
6 participants