Big strings in string columns #154
Conversation
Conflicts: src/tightdb/array_string.hpp
…eral). Also got rid of Column::NodeGetOffsets(), Column::NodeGetRefs(), Array::GetSubArray(), and the const-violating attempt at a moving constructor for Array. Also, Table::to_dot() and friends had a major overhaul. Also, no more warnings from GCC 4.4 on 32/64-bit (Ubuntu 10.04).
Conflicts: src/tightdb/column_mixed_tpl.hpp test/test_array_blob.cpp test/test_table.cpp
I think that it will be most effective for you to do the merge, as you have the best overview of how the b-tree handling has been changed. My changes should be limited to string columns.
@@ -133,6 +151,7 @@ class AdaptiveStringColumn: public ColumnBase {
private:
    static const size_t short_string_max_size = 15;
    static const size_t long_string_max_size = 63;
How did you come up with "63" as the maximum long string size?
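For context, here is a minimal sketch (not the actual TightDB code; the names and the terminator assumption are mine) of how the three storage tiers implied by these constants could be selected. The values 15 and 63 suggest 16- and 64-byte element widths with one byte reserved for a zero terminator, though that is a guess:

```cpp
#include <cstddef>

// Hypothetical tier names, mirroring the constants in the diff above.
enum class StringEncoding { Short, Long, Big };

constexpr std::size_t short_string_max_size = 15; // 16-byte slot minus terminator (assumed)
constexpr std::size_t long_string_max_size  = 63; // 64-byte slot minus terminator (assumed)

// Pick a storage tier for a string of the given byte length.
StringEncoding choose_encoding(std::size_t size)
{
    if (size <= short_string_max_size)
        return StringEncoding::Short;
    if (size <= long_string_max_size)
        return StringEncoding::Long;
    return StringEncoding::Big; // stored in its own Array, per this PR
}
```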
Conflicts: test/test_column.cpp test/test_column_mixed.cpp
Sorry to say, but Valgrind finds some issues. See attached. On Mon, Sep 23, 2013 at 4:04 AM, astigsen notifications@github.com wrote:
These memory bugs are relative to my last commit to your branch, where I …
As far as I can tell, you have created a few more branches based on top of …
I'll continue to attempt a merge of "big_blobs" and "node-consolidation", but …
By the way, "big_blobs" also has a memory leak reported by Valgrind. Since …
==27916== 640 bytes in 5 blocks are definitely lost in loss record 1 of 1
Note that none of these memory errors occur in master.
On Mon, Sep 23, 2013 at 4:47 PM, Kristian Spangsege ks@tightdb.com wrote:
Conflicts: src/tightdb/column_basic.hpp src/tightdb/column_basic_tpl.hpp src/tightdb/column_tpl.hpp
Conflicts: src/tightdb/column_string.cpp src/tightdb/column_string.hpp
…lobs Conflicts: src/tightdb/array_blobs_big.cpp
I have merged node-consolidation into your branch. Here is the status: ArrayBigBlobs::to_dot() needs to be implemented. There is an error in the query engine; this testcase gets you to the ASSERT below
Please pull my changes into your other branches that are based on this one.
I have attached the latest from Valgrind against the current state of your …
Note that the memory errors might be related to the mentioned assert-bug in …
Everything except the leaks comes from test_query.cpp.
On Mon, Sep 23, 2013 at 5:32 PM, Kristian Spangsege ks@tightdb.com wrote:
I have fixed the issues in the unit tests. I did try to merge it into the big_blobs_binary branch, but there were too many btree-related conflicts that require an understanding of your btree changes. I think you are the most appropriate person to do this merge.
Move checksum to front of Array header
Merged into https://github.com/Tightdb/tightdb/tree/breaking-updates This concludes this pull request.
Note: Contains everything from https://github.com/Tightdb/tightdb/tree/breaking-updates
Here is what is really in this pull request: https://github.com/astigsen/tightdb/compare/Tightdb:breaking-updates...big_blobs
This adds support for big strings in string columns. Previously, each leaf in a string column contained MAX_LIST_SIZE strings, which made the leaves very big if the individual strings were big.
This caused problems, both in terms of performance, since these big leaves had to be copied-on-write on every change, and by putting a relatively small limit on the sizes of the strings. Because all these strings were packed together in one leaf, their total length could not exceed what fits in the size field of the header (3 bytes).
This PR adds a new tier for "big" strings which puts each string in its own Array. This fixes the performance issues and extends the possible size of individual strings to what fits in the Array's size field (this limitation will be fixed in a future PR).
This same approach will also be used for binary columns.
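To make the 3-byte size-field limit concrete, here is a small illustrative calculation (the constant and function names are mine, not from the codebase): a header that stores its size in 3 bytes can address at most 2^24 bytes, so under the old layout the *sum* of all strings in a leaf had to stay below 16 MiB, whereas giving each big string its own Array applies that bound per string instead:

```cpp
#include <cstdint>

// An Array header stores the payload size in a 3-byte field, so one leaf
// holding all strings of a node is capped at 2^24 bytes (16 MiB).
constexpr std::uint64_t size_field_bits = 3 * 8;
constexpr std::uint64_t max_array_bytes = 1ull << size_field_bits; // 16 MiB

// Check whether a given total payload still fits in a single Array.
constexpr bool fits_in_one_array(std::uint64_t total_bytes)
{
    return total_bytes < max_array_bytes;
}
```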
NOTE: This is a non-breaking change in the file format.
NOTE: This is effectively a breaking change in the file format.