-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic structure of bloom gateways #10782
Conversation
f415580
to
67647f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some suggestions but approving 👍
|
||
// NewRingManager instantiates a new RingManager instance. | ||
// The other functions will assume the RingManager was instantiated through this function. | ||
func NewRingManager(mode ManagerMode, cfg Config, logger log.Logger, registerer prometheus.Registerer) (*RingManager, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: probably a good thing to refactor into independent code. This looks to be repeated in both the scheduler & index-gw
BloomGatewayShardSize(tenantID string) int | ||
} | ||
|
||
type ShardingStrategy interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this also looks copy+pasted from the index-gw, can't we reuse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will consolidate both the ring manager scaffolding and the sharding strategy in a separate PR after merging this one.
3ba9e8e
to
aec3d7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor comments/questions. LGTM.
| | ||
bloomshipper.Shipper | ||
| | ||
bloomshipper.BloomFileClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drawing the flow like this is a really nice practice to follow, i'll adopt it. nit tab tab, alignment on this line :)
// Since we set the shard size to replication factor if shard size is 0, this | ||
// can only happen if both the shard size and the replication factor are set | ||
// to 0. | ||
if shardSize <= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can shardSize be negative? why not ==
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shardSize
is an int
. And the config would allow setting a negative value.
type ForEachBlockCallback func(bq BlockQuerier) error | ||
|
||
type ReadShipper interface { | ||
ForEachBlock(ctx context.Context, tenant string, from, through time.Time, fingerprints []uint64, callback ForEachBlockCallback) error | ||
} | ||
|
||
type WriteShipper interface { | ||
} | ||
|
||
type Shipper interface { | ||
ReadShipper | ||
WriteShipper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this block also have bloom-compactor in mind? otherwise why is WriteShipper needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could remove it, because it's not needed for the purposes of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome. just one important comment that I believe we need to address before merging it
rs, err = subRing.Get(uint32(blockRef.FromFp), BlocksOwnerSync, bufDescs, bufHosts, bufZones) | ||
if err != nil { | ||
return nil, err | ||
} | ||
// Include the block only if it belongs to this bloom gateway shard. | ||
if rs.Includes(s.instanceID) { | ||
filteredBlockRefs = append(filteredBlockRefs, blockRef) | ||
continue | ||
} | ||
|
||
rs, err = subRing.Get(uint32(blockRef.ThroughFp), BlocksOwnerSync, bufDescs, bufHosts, bufZones) | ||
if err != nil { | ||
return nil, err | ||
} | ||
// Include the block only if it belongs to this bloom gateway shard. | ||
if rs.Includes(s.instanceID) { | ||
filteredBlockRefs = append(filteredBlockRefs, blockRef) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we include the block by two factors: 1) if FromFp
belongs to this instance 2) if ThroughFp belongs to this instance, it means that two separate bloomGw might own the same block, one by FromFp
another one by ThroughFp
.
We should check only FromFp
, it's enough to identify the owner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks cool
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
on top of existing shipper code The existing bloom_shipper.Shipper will be renamed to bloom_shipper.BloomClient since it is mainly responsible for file handling from/to object storage and has a lower level interface than what the bloom gateway consumes. Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Provide no-op implementation of the shipper to define boundary of this pull request. Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
6daeb41
to
796154a
Compare
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks awesome, but I found some cases when isOutsideRange()
does not work
|
||
blockRefs, err := s.getActiveBlockRefs(ctx, tenantID, startTimestamp, endTimestamp, minFingerprint, maxFingerprint) | ||
level.Debug(s.logger).Log("msg", "ForEachBlock", "tenant", tenantID, "from", from, "through", through, "fingerprints", fingerprints) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be thousands of fingerprints, lets skip printing it in the logs
} | ||
return i | ||
} | ||
return len(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must return len(s) - 1
otherwise it will be index out of range error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually possible that the index is out of range: in case the value v
is greater than the last element of s
.
idx := getPosition[[]uint64](fingerprints, b.MinFingerprint) | ||
return b.MaxFingerprint < fingerprints[idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check should give a true
if no one item from fingerprints []uint64
is greater or equal to b.MinFingerprint
and less or equal to b.MaxFingerprint
.
however, this method returns false
for the block b.MinFingerprint: 3 , b.MaxFingerprint: 5
if fingerprints []uint64 is :
[2]
- should be true because the block does not cover this fingerprint[6]
- should be true because the block does not cover this fingerprint
could you please add more test cases to cover this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These cases are actually covered in line 163-166
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
**What this PR does / why we need it**: The errors below were introduced during adding initial service structure for BloomGateway #10782 and BloomCompactor #10748 components. ``` docker-loki-read-3 | panic: runtime error: invalid memory address or nil pointer dereference docker-loki-read-3 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x9e6d90] docker-loki-read-3 | docker-loki-read-3 | goroutine 1 [running]: docker-loki-read-3 | github.com/grafana/dskit/kv.createClient({_, _}, {_, _}, {{{0x22f6128, 0xe}, {{0x0, 0x0}}, 0x4a817c800, 0x0, ...}, ...}, ...) docker-loki-read-3 | /src/loki/vendor/github.com/grafana/dskit/kv/client.go:158 +0x360 docker-loki-read-3 | github.com/grafana/dskit/kv.NewClient({{0x22ec444, 0xa}, {0x22ee95d, 0xb}, {{{0x22f6128, 0xe}, {{...}}, 0x4a817c800, 0x0, 0x3ff0000000000000, ...}, ...}, ...}, ...) ``` This PR initialies new components as part of MemberlistKV. **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: Tested on local via docker workflow **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. <!-- TODO(salvacorts): Add example PR -->
### Summary This pull requests adds the basic structure for the new bloom gateway component. - Adds new `bloom-gateway` target that runs with multiple instances joined by a ring - Adds a querier and client component on the index gateway to filter chunk refs - Adds the gRPC protobuf definitions for commication between index gateways and bloom gateways - Adds a store component used on the bloom gateways to query binary bloom files ``` Querier Query Frontend | | ................................... service boundary | | +----+------+ | indexgateway.Gateway** | bloomgateway.BloomQuerier | bloomgateway.GatewayClient | logproto.BloomGatewayClient | ................................... service boundary | bloomgateway.Gateway | bloomshipper.Store | bloomshipper.Shipper | bloomshipper.BloomFileClient** | ObjectClient** | ................................... service boundary | object storage ** not part of this PR ``` This PR still contains a lot of TODOs and possibilities for optimisations, which will be addressed in subsequent pull requests. Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**: The errors below were introduced during adding initial service structure for BloomGateway grafana#10782 and BloomCompactor grafana#10748 components. ``` docker-loki-read-3 | panic: runtime error: invalid memory address or nil pointer dereference docker-loki-read-3 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x9e6d90] docker-loki-read-3 | docker-loki-read-3 | goroutine 1 [running]: docker-loki-read-3 | github.com/grafana/dskit/kv.createClient({_, _}, {_, _}, {{{0x22f6128, 0xe}, {{0x0, 0x0}}, 0x4a817c800, 0x0, ...}, ...}, ...) docker-loki-read-3 | /src/loki/vendor/github.com/grafana/dskit/kv/client.go:158 +0x360 docker-loki-read-3 | github.com/grafana/dskit/kv.NewClient({{0x22ec444, 0xa}, {0x22ee95d, 0xb}, {{{0x22f6128, 0xe}, {{...}}, 0x4a817c800, 0x0, 0x3ff0000000000000, ...}, ...}, ...}, ...) ``` This PR initialies new components as part of MemberlistKV. **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: Tested on local via docker workflow **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](grafana@d10549e) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. <!-- TODO(salvacorts): Add example PR -->
Summary
This pull requests adds the basic structure for the new bloom gateway component.
bloom-gateway
target that runs with multiple instances joined by a ringThis PR still contains a lot of TODOs and possibilities for optimisations, which will be addressed in subsequent pull requests.