Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Loki modules to services #1804

Merged
merged 21 commits into from
Apr 23, 2020
Merged

Convert Loki modules to services #1804

merged 21 commits into from
Apr 23, 2020

Conversation

pstibrany
Copy link
Member

cortexproject/cortex#2166, but for logs.

This PR converts loki modules to act as "services", which are objects with operational state with methods to start/stop the service, observable state and listening methods for state transitions. Please check Cortex PR 2166 for details and proposal document.

What it means for Loki:

  • All components like Distributor or Lifecycler now begin their life in "New" state, at which point they are not running yet.
  • After all components are initialized, Loki creates a service manager to track them, and starts them all concurrently
  • Services returned by modules are wrapped in via newModuleServiceWrapper function, which adds waiting for dependencies: eg. Ingester doesn't fully start until all its dependencies are started. Ring doesn't stop until all services using it stop first.
  • Server service waits until all other services stop, before stopping
  • Lifecycler being a service now doesn't stop the whole process, but enters Failed state on failures. Components using a Lifecycler must react on it (typically by entering Failed state on their own)
  • When Loki detects that any service has failed, it shuts down and reports error via exit code
  • there is now single /ready handler, which reports 200 only if all services are in Running state. If any service is starting, stopping, terminated or failed, it returns 503 (StatusServiceUnavailable) state.
  • Services can perform startup tasks in their Starting states. This is useful for replaying WALs or fetching extra data to handle queries (TSDB index headers in Cortex)

Special notes for your reviewer: Please familiarize yourself with Service interface and perhaps BasicService implementation.

Checklist

  • Documentation added
  • Tests updated

@pstibrany
Copy link
Member Author

This PR builds on top of #1799, but still somehow pulls more files into vendor. I'll rebase when #1799 is merged, I think number of changes will go down still.

@pstibrany
Copy link
Member Author

ARM build currently fails because of thanos-io/thanos#2268.

@cyriltovena
Copy link
Contributor

Ping me for a review when everything is ready.

@pstibrany
Copy link
Member Author

Ping me for a review when everything is ready.

Thanks Cyril! Thanos build is now fixed, but we need new Thanos in Cortex first. That’s the next step.

@pstibrany
Copy link
Member Author

Rebased on top of #1869 to get latest Thanos with 32-bit compilation fix in.

@pstibrany
Copy link
Member Author

32bit ARM build is working now! Let's focus on #1869 first though.

@pstibrany
Copy link
Member Author

With #1869 merged, I've rebased on top of Loki master. I think this is ready for review now.

@cyriltovena
Copy link
Contributor

cyriltovena commented Apr 23, 2020

Going to need a bit of rebase, I'm starting the review.

If you're busy let me know I can try to take over.

@pstibrany
Copy link
Member Author

I will try to rebase.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@cyriltovena
Copy link
Contributor

cortexproject/cortex#2166, but for logs.

🤣

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
It now starts all background tasks in Starting state.
Stopping needs little work, as does reacting on lifecycler errors.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
It now doesn't call os.Exit, but shuts down gracefully and enters Failed state.
That triggers Loki to shutdown completely.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
This is a signal that Loki should stop.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
It checks the state of all services, and asks ingester for its own check as well.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@pstibrany
Copy link
Member Author

Rebased, with tests passing.

@codecov-io
Copy link

Codecov Report

Merging #1804 into master will decrease coverage by 0.32%.
The diff coverage is 48.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1804      +/-   ##
==========================================
- Coverage   64.69%   64.37%   -0.33%     
==========================================
  Files         125      131       +6     
  Lines        9539     9995     +456     
==========================================
+ Hits         6171     6434     +263     
- Misses       2941     3081     +140     
- Partials      427      480      +53     
Impacted Files Coverage Δ
pkg/chunkenc/facade.go 2.70% <0.00%> (-0.16%) ⬇️
pkg/ingester/transfer.go 65.00% <0.00%> (ø)
pkg/loki/loki.go 0.00% <0.00%> (ø)
pkg/loki/module_service_wrapper.go 0.00% <0.00%> (ø)
pkg/loki/status.go 0.00% <0.00%> (ø)
pkg/loki/modules.go 15.78% <9.63%> (+3.00%) ⬆️
pkg/storage/store.go 61.86% <13.04%> (-11.82%) ⬇️
pkg/ingester/instance.go 59.44% <40.00%> (+0.62%) ⬆️
pkg/storage/stores/local/shipper.go 53.65% <53.65%> (ø)
pkg/storage/stores/local/uploads.go 55.55% <55.55%> (ø)
... and 20 more

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
pkg/loki/loki.go Outdated Show resolved Hide resolved
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena cyriltovena merged commit e60164b into grafana:master Apr 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants