Skip to content
This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Exposed helper methods for reading index bytes. #492

Merged

Conversation

bwplotka
Copy link
Contributor

@bwplotka bwplotka commented Jan 2, 2019

Changes:

  • ReadSymbols, TOC and ReadOffsetTable are not public functions (used by Thanos).
  • decbufXXX are now functions.
  • More verbose errors.
  • Removed unused crc32 field.
  • Some var name changes to make it more verbose:
    • symbols -> allocatedSymbols
    • symbolsSlice -> symbolsV1
    • symbols -> symbolsV2
  • Pre-calculate symbolsTableSize.
  • Initialized symbols for Symbols() method with valid length.
  • Added test for Symbol method.
  • Made Decoder LookupSymbol method public. Kept Decode public as it is useful as helper from index package.

Signed-off-by: Bartek Plotka bwplotka@gmail.com

Signed-off-by: Bartek Plotka bwplotka@gmail.com

@bwplotka
Copy link
Contributor Author

bwplotka commented Jan 2, 2019

Circle CI does not look right... I guees it is some migration period to CircleCI and it is being set up, right?

@brian-brazil
Copy link
Contributor

That was an internal method that should never have existed (it was a layering violation), nor are there any public methods that take offsets. I don't think this makes sense as part of the API.

Made Decoder LookupSymbol method public, otherwise there is no point in having Decoder struct public.

Decoder should probably be private.

@bwplotka
Copy link
Contributor Author

bwplotka commented Jan 2, 2019

Thanks for taking a look @brian-brazil ! (: I can see totally your concerns with limiting API, but we have IMO valid use case: Thanos as consumer of TSDB library wants to be able use helpers for decoding index bytes:

  • index.Decoder
  • index.Reader.symbolsSlice (or symbols depends on index version)

This is to be able to implement index cache, so storing part of index (version, symbol, values, posting index) in different place than rest (actual postings and series). Which allows us to use S3 API to get_range bytes we need from index.

The truth is that TSDB format (including index format) is an API itself, so why not exposing useful helper methods for accessing it?

I agree that on IndexReader interface you don't want to expose offsets (layering violation), that's why I did not add this method to interface. However, exposing helpers for decoding index bytes on lower level for the reuse purpose is healthy both for Thanos and TSDB (we basically maintain same piece of code!). (:

If you don't agree with reusability pattern, do you have any suggestion how in Thanos we should overcome lack of such helpers?

@brian-brazil
Copy link
Contributor

Looking through it, I think the right way to do it would be to split the parsing logic from the API logic. This would make readSymbols and friends available, rather than changing the API.

@bwplotka
Copy link
Contributor Author

bwplotka commented Jan 3, 2019

Sound Good, will give it a try.

@bwplotka
Copy link
Contributor Author

bwplotka commented Jan 4, 2019

PTAL @brian-brazil . Updated description with summary of changes.

@bwplotka bwplotka changed the title Added back SymbolTable for exposing actual offsets of symbols in index. Exposed helper methods for reading index bytes. Jan 4, 2019
// after offset to hold the big endian encoded content length, followed by the contents and the expected
// checksum.
func newDecbufAt(bs ByteSlice, off int) decbuf {
if bs.Len() < off+4 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

straight copy of r.newDecbufAt just with ByteSlice passed by argument.

// decbufUvarintAt returns a new decoding buffer. It expects the first bytes
// after offset to hold the uvarint-encoded buffers length, followed by the contents and the expected
// checksum.
func newDecbufUvarintAt(bs ByteSlice, off int) decbuf {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@bwplotka bwplotka force-pushed the index-reader-symbols branch 5 times, most recently from ba20032 to e944b59 Compare January 4, 2019 19:38
index/index.go Outdated
res[s] = struct{}{}
}
for _, s := range r.symbolSlice {
for _, s := range r.symbolsV2 {
res[s] = struct{}{}
}
return res, nil
}

// SymbolTableSize returns the symbol table that is used to resolve symbol references.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is out of date

index/index.go Outdated
labelIndicesTable uint64
postings uint64
postingsTable uint64
// TOC represents index Table Of Content that states were each section of index starts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where

Changes:
* Make `NewReader` method useful. It was impossible to use it, because closer was always nil.
* ReadSymbols, TOC and ReadOffsetTable are not public functions (used by Thanos).
* decbufXXX are now functions.
* More verbose errors.
* Removed unused crc32 field.
* Some var name changes to make it more verbose:
  * symbols -> allocatedSymbols
  * symbolsSlice -> symbolsV1
  * symbols -> symbolsV2
  *
* Pre-calculate symbolsTableSize.
* Initialized symbols for Symbols() method with valid length.
* Added test for Symbol method.
* Made Decoder LookupSymbol method public. Kept Decode public as it is useful as helper from index package.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
@bwplotka
Copy link
Contributor Author

bwplotka commented Jan 7, 2019

Comments addressed.

@bwplotka
Copy link
Contributor Author

kind PTAL @brian-brazil

@brian-brazil brian-brazil merged commit c065fa6 into prometheus-junkyard:master Jan 11, 2019
@brian-brazil
Copy link
Contributor

Thanks!

@bwplotka bwplotka deleted the index-reader-symbols branch January 11, 2019 17:31
radeklesniewski pushed a commit to SabreOSS/tsdb that referenced this pull request Jan 18, 2019
)

Changes:
* Make `NewReader` method useful. It was impossible to use it, because closer was always nil.
* ReadSymbols, TOC and ReadOffsetTable are not public functions (used by Thanos).
* decbufXXX are now functions.
* More verbose errors.
* Removed unused crc32 field.
* Some var name changes to make it more verbose:
  * symbols -> allocatedSymbols
  * symbolsSlice -> symbolsV1
  * symbols -> symbolsV2
  *
* Pre-calculate symbolsTableSize.
* Initialized symbols for Symbols() method with valid length.
* Added test for Symbol method.
* Made Decoder LookupSymbol method public. Kept Decode public as it is useful as helper from index package.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants