v2 segment support string encode(#1766) #1816

Merged
merged 21 commits into from
Sep 30, 2019

Conversation

@wangbo wangbo commented Sep 17, 2019

#1766
major change

  1. change the data format of the binary dict page: append (dict page data) and (dict page offset) to the binary dict page
  2. add a new decoding method for the new binary dict page format
  3. add UT for the segment test
  4. zero-initialize the newly allocated array when calling arena.AllocateNewBlock
  5. hard-code the choice of dict encoding for string columns
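
As a rough illustration of item 1, a dictionary page can be modeled as the concatenated dictionary strings plus their offsets, while data pages store only integer codes. This is a minimal sketch, not the Doris implementation: `DictPage`, `DictEncoder`, and `decode` are hypothetical names, and the real on-disk layout is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory model of a dictionary page: all distinct strings
// concatenated in `data`, with offsets[i] marking where entry i starts.
struct DictPage {
    std::string data;
    std::vector<uint32_t> offsets;

    size_t size() const { return offsets.size(); }

    // Return dictionary entry `code` by slicing `data` between two offsets.
    std::string entry(uint32_t code) const {
        assert(code < offsets.size());
        uint32_t begin = offsets[code];
        uint32_t end = (code + 1 < offsets.size())
                               ? offsets[code + 1]
                               : static_cast<uint32_t>(data.size());
        return data.substr(begin, end - begin);
    }
};

// Hypothetical encoder: maps each distinct string to a small integer code.
class DictEncoder {
public:
    uint32_t add(const std::string& value) {
        auto it = _codes.find(value);
        if (it != _codes.end()) return it->second;  // already in the dictionary
        uint32_t code = static_cast<uint32_t>(_page.offsets.size());
        _page.offsets.push_back(static_cast<uint32_t>(_page.data.size()));
        _page.data += value;
        _codes.emplace(value, code);
        return code;
    }
    const DictPage& page() const { return _page; }

private:
    DictPage _page;
    std::unordered_map<std::string, uint32_t> _codes;
};

// Decode a data page (a list of codes) back into strings.
std::vector<std::string> decode(const DictPage& dict,
                                const std::vector<uint32_t>& codes) {
    std::vector<std::string> out;
    out.reserve(codes.size());
    for (uint32_t c : codes) out.push_back(dict.entry(c));
    return out;
}
```

Repeated values cost one small code each in the data page, and the string bytes are stored once in the dictionary.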

0919 commit major change

  1. change the dict file format: when saving a binary dict page, separate the dict page from the data pages, so one dict page may serve multiple data pages; when reading a binary dict page, one ColumnReader keeps one dict page
  2. load the dict when calling column_reader._read_page
  3. roll back BinaryDictPage
  4. no longer use memset(0) to initialize column_zonemap.max_value
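
The lazy loading described in items 1 and 2 can be sketched as follows. This is an assumption-laden illustration only: `LazyDictColumnReader`, `DictLoader`, and `load_count` are hypothetical and do not mirror the actual ColumnReader interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader that owns one dictionary shared by all data pages
// of a column and loads it lazily on the first page read.
class LazyDictColumnReader {
public:
    using DictLoader = std::function<std::vector<std::string>()>;

    explicit LazyDictColumnReader(DictLoader loader) : _loader(std::move(loader)) {}

    // Decode one data page of dictionary codes. The dictionary is loaded
    // only if it has not been loaded yet.
    std::vector<std::string> read_page(const std::vector<uint32_t>& codes) {
        if (!_dict) {
            _dict = std::make_unique<std::vector<std::string>>(_loader());
            ++_load_count;
        }
        std::vector<std::string> out;
        out.reserve(codes.size());
        for (uint32_t c : codes) out.push_back(_dict->at(c));
        return out;
    }

    int load_count() const { return _load_count; }

private:
    DictLoader _loader;
    std::unique_ptr<std::vector<std::string>> _dict;
    int _load_count = 0;
};
```

Reading several data pages through the same reader triggers the loader once, which is the point of keeping one dict page per ColumnReader.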

0926 17 commit major change

  1. initialize the data array of the column_zone_map min-value slice;
    set the size of the char/varchar column_zone_map max value to 0
  2. add ut for char column zone map query hit/miss

0929 10 commit major change

  1. allocate memory for column_zone_map's max and min values
  2. copy content directly into column_zone_map's max and min values
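
To make the zone map changes concrete, here is a minimal sketch of a string zone map that owns copies of its min and max values and answers hit/miss queries. `StringZoneMap` and `may_contain` are hypothetical names chosen for illustration; the real ColumnZoneMapBuilder works on Slice values and is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical zone map for a string column. It owns copies of the min and
// max values, so they stay valid after the source rows are released.
class StringZoneMap {
public:
    void add(const std::string& value) {
        if (!_has_value) {
            _min = value;
            _max = value;
            _has_value = true;
            return;
        }
        if (value < _min) _min = value;
        if (value > _max) _max = value;
    }

    // True if a row equal to `value` could exist in the page (a "hit");
    // false means the page can be skipped (a "miss").
    bool may_contain(const std::string& value) const {
        return _has_value && _min <= value && value <= _max;
    }

private:
    std::string _min;
    std::string _max;
    bool _has_value = false;
};
```

A hit only means the page must be read; the value may still be absent. A miss is definitive, which is what makes the zone map safe for pruning.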

for (int j = 0; j < 4; ++j) {
    auto cell = row.cell(j);
    cell.set_not_null();
    set_column_value_by_type(tablet_schema->_cols[j]._type, i * 10 + j, (char*)cell.mutable_cell_ptr(), tablet_schema->_cols[j]._length);
Contributor

I'm confused, how can you access private field _cols and _type?

Contributor Author

Me too; it is neither FRIEND_TEST nor "#define private public".

} else if (_type_info->type() == OLAP_FIELD_TYPE_CHAR) {
    Slice *min_value = (Slice *)_zone_map.min_value;
    min_value->data = _max_char_value;
    min_value->size = OLAP_CHAR_MAX_LENGTH;
Contributor

Why set the char type's max length to 255? I think it should be equal to OLAP_STRING_MAX_LENGTH.

Contributor Author

FE defines the max char and varchar lengths (see ScalarType). This is a temporary definition; later I will use the column's real length.

Contributor

yes, for char type, it is a bit complex

@gaodayue gaodayue left a comment

LGTM generally. One thing I'm not sure about is the change to varchar's set_to_max, since it is used by segment v1's zonemap. Would it break backward compatibility? @kangpinghuang Could you verify it?

@@ -587,13 +604,19 @@ struct FieldTypeTraits<OLAP_FIELD_TYPE_VARCHAR> : public FieldTypeTraits<OLAP_FI
    }
    static void set_to_max(void* buf) {
        auto slice = reinterpret_cast<Slice*>(buf);
        slice->size = 1;
        memset(slice->data, 0xFF, 1);
        memset(slice->data, 0xFF, slice->size);
Contributor

should first reset slice->size to OLAP_STRING_MAX_LENGTH

Contributor Author

👌

2. interpret not cast
3. remove useless var
4. keep set_to_min/max consistent with master
@kangpinghuang
Contributor

I have checked set_to_max in olap. I think the original code has a bug and this PR's code fixes it.
set_to_max is used in two places in v1:

  1. In zone map building. For char/varchar types there is no zone map, so this has no effect.
  2. In split_range for queries. The new max is larger than the old one, so I think it will not cause problems.

LGTM
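
A small sketch of why a longer run of 0xFF bytes is a larger upper bound than a single 0xFF byte. It assumes binary strings are ordered by byte-wise comparison with length as the tie-breaker; `compare_bytes` is a hypothetical helper written for this illustration, not the comparison function used in Doris.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Byte-wise comparison with length as the tie-breaker (assumed ordering).
// Returns a negative value, 0, or a positive value.
int compare_bytes(const std::string& a, const std::string& b) {
    size_t n = std::min(a.size(), b.size());
    int r = std::memcmp(a.data(), b.data(), n);
    if (r != 0) return r;
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}
```

Under this ordering, a one-byte max of 0xFF compares smaller than any longer value that starts with 0xFF, so it is not a true upper bound, while a longer 0xFF run compares greater than the one-byte value.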

kangpinghuang previously approved these changes Sep 30, 2019

LGTM

gaodayue previously approved these changes Sep 30, 2019

LGTM

@wangbo wangbo closed this Sep 30, 2019
@wangbo wangbo reopened this Sep 30, 2019
wangbo commented Sep 30, 2019

The current commit uses the old version: in varchar's set_to_max, the slice size is set to 1,
because setting it to max_string_value causes delete_handler_test to core dump. I will look into this case today.

Core dump stack

#0 SLL_Next (t=0xffffffffffffffff) at src/linked_list.h:45
#1 SLL_TryPop (rv=, list=0x436dce0) at src/linked_list.h:69
#2 TryPop (rv=, this=0x436dce0) at src/thread_cache.h:220
#3 Allocate (oom_handler=0x25e5700 <tcmalloc::cpp_throw_oom(unsigned long)>, cl=5, size=64, this=) at src/thread_cache.h:379
#4 malloc_fast_path<tcmalloc::cpp_throw_oom> (size=64) at src/tcmalloc.cc:1855
#5 tc_new (size=64) at src/tcmalloc.cc:1976
#6 0x0000000000c21672 in doris::WrapperField::WrapperField (this=0x4e07d40, rep=, variable_len=,
is_string_type=) at /home/wangbo/incubator-doris/be/src/olap/wrapper_field.cpp:84
#7 0x0000000000c21b9e in doris::WrapperField::create (column=..., len=, len@entry=0)
at /home/wangbo/incubator-doris/be/src/olap/wrapper_field.cpp:51
#8 0x0000000000cb88e1 in doris::ColumnDataWriter::init (this=0x4bc4780) at /home/wangbo/incubator-doris/be/src/olap/rowset/column_data_writer.cpp:72
#9 0x0000000000ca273c in doris::AlphaRowsetWriter::_init (this=this@entry=0x4bf83c0)
at /home/wangbo/incubator-doris/be/src/olap/rowset/alpha_rowset_writer.cpp:275
#10 0x0000000000ca3694 in doris::AlphaRowsetWriter::init (this=0x4bf83c0, rowset_writer_context=...)
at /home/wangbo/incubator-doris/be/src/olap/rowset/alpha_rowset_writer.cpp:73
#11 0x0000000000c91b03 in doris::RowsetFactory::create_rowset_writer (context=..., output=output@entry=0x7fffcdf75318)
at /home/wangbo/incubator-doris/be/src/olap/rowset/rowset_factory.cpp:50
#12 0x0000000000bc9b3f in doris::TabletManager::_create_inital_rowset (this=this@entry=0x4d5a180, tablet=..., request=...)
at /home/wangbo/incubator-doris/be/src/olap/tablet_manager.cpp:1250
#13 0x0000000000bcc919 in doris::TabletManager::_internal_create_tablet (this=this@entry=0x4d5a180, alter_type=alter_type@entry=doris::SCHEMA_CHANGE,
request=..., is_schema_change_tablet=is_schema_change_tablet@entry=false, ref_tablet=..., data_dirs=...)
at /home/wangbo/incubator-doris/be/src/olap/tablet_manager.cpp:371
#14 0x0000000000bcdd72 in doris::TabletManager::create_tablet (this=this@entry=0x4d5a180, request=..., stores=...)
at /home/wangbo/incubator-doris/be/src/olap/tablet_manager.cpp:318
#15 0x0000000000b82642 in doris::StorageEngine::create_tablet (this=, request=...)
at /home/wangbo/incubator-doris/be/src/olap/storage_engine.cpp:760
#16 0x0000000000b4fac3 in doris::TestDeleteConditionHandler::SetUp (this=0x4e08460)
at /home/wangbo/incubator-doris/be/test/olap/delete_handler_test.cpp:173
#17 0x0000000002530f37 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#18 0x000000000252ba9c in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#19 0x0000000002510df7 in testing::Test::Run() ()
#20 0x00000000025116ce in testing::TestInfo::Run() ()
#21 0x0000000002511d25 in testing::TestCase::Run() ()
#22 0x0000000002518899 in testing::internal::UnitTestImpl::RunAllTests() ()
#23 0x0000000002531eb3 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#24 0x000000000252c7a2 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#25 0x0000000002517570 in testing::UnitTest::Run() ()
#26 0x00000000009fb316 in RUN_ALL_TESTS () at /home/wangbo/incubator-doris/thirdparty/installed/include/gtest/gtest.h:2233
#27 main (argc=, argv=0x7fffcdf75f88) at /home/wangbo/incubator-doris/be/test/olap/delete_handler_test.cpp:951

The core dump happens in wrapper_field.cpp:

line:80 _string_content.reset(new char[slice->size]);
Here slice->size is 16, which seems reasonable.
The logic at this point is that we want to create a varchar wrapper_field.

Before that, in column_data_writer.cpp:

line:68 WrapperField::create // create a varchar wrapper_field first
line:70 _zone_maps[i].first->set_to_max();
...
line:72 WrapperField::create // core dump here, when we create another varchar wrapper_field

At line:68 we get a char[64].
At line:70, varchar's set_to_max runs memset(char[64], 0xff, 65536), which writes far past the end of the 64-byte buffer. That would explain why the next allocation crashes inside tcmalloc.
This is clearly wrong; the correct call would be "memset(char[64], 0xff, strlen(char[64]))", that is, bounded by the buffer's real length.
So I think we should call set_to_max with the correct column length.

I plan to solve this in a new PR that allocates ColumnZoneMapBuilder's max/min values with the real column length.

@wangbo wangbo dismissed stale reviews from gaodayue and kangpinghuang via e0d6ca0 September 30, 2019 05:03
wangbo commented Sep 30, 2019

@imay please review again

  1. add comment for dict lazy init
  2. merge master branch to resolve conflict

@imay imay left a comment

LGTM

@imay imay merged commit 8aa8e08 into apache:master Sep 30, 2019