Code to read Jimmy Lin's Common Index File Format files without using protobuf

andrewtrotman/CommonIndexFileFormat

CommonIndexFileFormat

The format is described by the following protocol buffer definition:

```proto
syntax = "proto3";
package io.anserini.cidxf;

message Posting {
  int32 docid = 1;
  int32 tf = 2;
}

message PostingsList {
  string term = 1;
  int64 df = 2;
  int64 cf = 3;
  repeated Posting posting = 4;
}
```
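To show what reading this format "without using protobuf" involves, here is a minimal Python sketch (not the code in this repository) that decodes one serialized `PostingsList` by hand using the protobuf wire format rules: every field starts with a varint key `(field_number << 3) | wire_type`, the integer fields are varints, and `term` and each embedded `Posting` are length-delimited. All function and class names here (`parse_postings_list`, `parse_posting`, `read_varint`, and so on) are hypothetical. Because proto3 does not write a field whose value is zero, a missing `docid` or `tf` has to be read as 0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Posting:
    # proto3 omits fields equal to 0 on the wire, so 0 is the fallback
    docid: int = 0
    tf: int = 0


@dataclass
class PostingsList:
    term: str = ""
    df: int = 0
    cf: int = 0
    postings: List[Posting] = field(default_factory=list)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 70:  # 64-bit values need at most 10 bytes
            raise ValueError("varint too long")


def to_signed(value: int) -> int:
    # int32/int64 store negative numbers as 64-bit two's complement
    return value - (1 << 64) if value >= (1 << 63) else value


def read_bytes(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Wire type 2: a varint length followed by that many bytes."""
    length, pos = read_varint(buf, pos)
    if pos + length > len(buf):
        raise ValueError("truncated length-delimited field")
    return buf[pos:pos + length], pos + length


def skip_field(buf: bytes, pos: int, wire_type: int) -> int:
    """Step over a field this reader does not know about."""
    if wire_type == 0:
        return read_varint(buf, pos)[1]
    if wire_type == 2:
        return read_bytes(buf, pos)[1]
    if wire_type in (1, 5):  # fixed 64-bit / 32-bit values
        return pos + (8 if wire_type == 1 else 4)
    raise ValueError(f"unsupported wire type {wire_type}")


def parse_posting(buf: bytes) -> Posting:
    posting, pos = Posting(), 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7  # key = (field << 3) | type
        if wire_type == 0 and number == 1:
            value, pos = read_varint(buf, pos)
            posting.docid = to_signed(value)
        elif wire_type == 0 and number == 2:
            value, pos = read_varint(buf, pos)
            posting.tf = to_signed(value)
        else:
            pos = skip_field(buf, pos, wire_type)
    return posting


def parse_postings_list(buf: bytes) -> PostingsList:
    plist, pos = PostingsList(), 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 2 and number in (1, 4):
            raw, pos = read_bytes(buf, pos)
            if number == 1:  # term: UTF-8 string
                plist.term = raw.decode("utf-8")
            else:  # posting: one embedded Posting message
                plist.postings.append(parse_posting(raw))
        elif wire_type == 0 and number in (2, 3):  # df, cf
            value, pos = read_varint(buf, pos)
            if number == 2:
                plist.df = to_signed(value)
            else:
                plist.cf = to_signed(value)
        else:
            pos = skip_field(buf, pos, wire_type)
    return plist
```

Calling `parse_postings_list` on the bytes of one record returns a `PostingsList` whose `postings` list holds the decoded `Posting` objects. Unknown fields are skipped rather than rejected, which is also how protobuf parsers treat fields they do not recognise.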

Each postings list is written in the protobuf delimited format: the serialized message is preceded by its length, encoded as a varint. This means you don't need to read the entire file into memory to process it, although this program does.
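The delimited framing is what makes streaming possible. The following Python sketch, assuming the file is nothing but a sequence of length-prefixed `PostingsList` records, reads one record at a time from any binary stream instead of loading the whole file. The names `read_size` and `iter_delimited` are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, Optional


def read_size(stream: BinaryIO) -> Optional[int]:
    """Read one varint length prefix; return None at a clean end of file."""
    result, shift = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None  # EOF exactly between two records
            raise EOFError("file ends inside a length prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        if shift >= 35:  # protobuf writes the size as a varint32 (<= 5 bytes)
            raise ValueError("length prefix too long")


def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each serialized message in turn, holding only one in memory."""
    while True:
        size = read_size(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("file ends inside a record")
        yield payload


if __name__ == "__main__":
    # Two hand-made records: a 3-byte payload and an empty one.
    demo = io.BytesIO(bytes([3]) + b"abc" + bytes([0]))
    for record in iter_delimited(demo):
        print(len(record), record)
```

With a real index you would pass `open(path, "rb")` as the stream, where `path` is a placeholder for the index file; each yielded `bytes` object is one serialized `PostingsList`, ready to be decoded field by field. An empty payload is legal: it is a `PostingsList` with every field at its default.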
