forked from blugelabs/ice
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
First go at shrinking segment sizes, all by adjusting how locations are stored

Position, start and end offset are all stored as deltas from the last location. End is stored as a delta from start, so it is the length of the original token; in many cases, however, it is not stored at all, as described below.

Every block of locations associated with a term has its first integer stored as a set of flags, which in practice will end up as a single byte in the file. There is currently one flag, which says whether the fields in the locations are explicitly stored. If they are not stored, all fields in the locations are the same as the field that the associated dictionary refers to.

The position delta is actually stored shifted left by one bit, with the bottom bit indicating whether the token length (which gives the end of the token) is the same as the length of the term associated with the list of tokens. If the length is the same, the bit is set and the end field is omitted.
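The layout described in this message can be sketched roughly as follows. This is an illustration only, not the code in this commit: the `Location` type, the `flagFieldsStored` constant and its value, the function names, and the use of unsigned varints from `encoding/binary` are all assumptions, as is the premise that locations arrive sorted so that every delta is non-negative.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Location is a hypothetical in-memory form of one term occurrence.
type Location struct {
	Field, Pos, Start, End uint64
}

// flagFieldsStored is an assumed flag value: set when each location
// carries its own field instead of inheriting the dictionary's field.
const flagFieldsStored = 1

// putUvarint appends v to buf as an unsigned varint.
func putUvarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// encodeLocations writes a flags integer followed by delta-coded locations.
// Assumes locs are sorted so Pos and Start never decrease.
func encodeLocations(locs []Location, dictField, termLen uint64) []byte {
	var flags uint64
	for _, l := range locs {
		if l.Field != dictField {
			flags |= flagFieldsStored
		}
	}
	buf := putUvarint(nil, flags)
	var prevPos, prevStart uint64
	for _, l := range locs {
		if flags&flagFieldsStored != 0 {
			buf = putUvarint(buf, l.Field)
		}
		length := l.End - l.Start
		v := (l.Pos - prevPos) << 1 // position delta, shifted left one bit
		if length == termLen {
			v |= 1 // bottom bit set: length equals term length, end omitted
		}
		buf = putUvarint(buf, v)
		buf = putUvarint(buf, l.Start-prevStart)
		if length != termLen {
			buf = putUvarint(buf, length) // end stored as delta from start
		}
		prevPos, prevStart = l.Pos, l.Start
	}
	return buf
}

// decodeLocations reverses encodeLocations.
func decodeLocations(data []byte, dictField, termLen uint64) ([]Location, error) {
	next := func() (uint64, error) {
		v, n := binary.Uvarint(data)
		if n <= 0 {
			return 0, errors.New("truncated or malformed varint")
		}
		data = data[n:]
		return v, nil
	}
	flags, err := next()
	if err != nil {
		return nil, err
	}
	var locs []Location
	var prevPos, prevStart uint64
	for len(data) > 0 {
		var v, startDelta uint64
		field, length := dictField, termLen
		if flags&flagFieldsStored != 0 {
			if field, err = next(); err != nil {
				return nil, err
			}
		}
		if v, err = next(); err != nil {
			return nil, err
		}
		if startDelta, err = next(); err != nil {
			return nil, err
		}
		if v&1 == 0 { // bit clear: an explicit length follows
			if length, err = next(); err != nil {
				return nil, err
			}
		}
		pos, start := prevPos+(v>>1), prevStart+startDelta
		locs = append(locs, Location{field, pos, start, start + length})
		prevPos, prevStart = pos, start
	}
	return locs, nil
}
```

Under these assumptions, a location whose token has the same length as the term and whose field matches the dictionary's field costs only two small integers (the tagged position delta and the start delta), which is where the size saving would come from.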
waddyano committed Feb 22, 2022
1 parent 3993755, commit a5ffbee
Showing 7 changed files with 251 additions and 38 deletions.