Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created Shard Manager Service #6297

Merged
merged 6 commits into from
Oct 1, 2024

Conversation

jakobht
Copy link
Member

@jakobht jakobht commented Sep 23, 2024

What changed?

  • Create new service cadence-shard-manager with a simple config and some initial files.
  • Created a resource factory so we can inject the resource instead of having to test the whole resource initialisation in each service
  • Created obligatory dynamic config properties that are necessary to initialise the Resource struct.

Why?
This is the first step to creating a new shard manager service. This is the first step of the skeleton of the service, to build upon

How did you test it?
Unit tests and local runs

Potential risks

Release notes

Documentation Changes

Copy link

codecov bot commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 82.60870% with 8 lines in your changes missing coverage. Please review.

Project coverage is 73.43%. Comparing base (dc51527) to head (a8709af).
Report is 25 commits behind head on master.

Files with missing lines Patch % Lines
common/resource/resourceImpl.go 0.00% 4 Missing ⚠️
common/service/metrics.go 0.00% 2 Missing ⚠️
service/shardmanager/service.go 95.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
common/service/name.go 100.00% <ø> (ø)
common/service/metrics.go 0.00% <0.00%> (ø)
service/shardmanager/service.go 95.00% <95.00%> (ø)
common/resource/resourceImpl.go 2.28% <0.00%> (-0.04%) ⬇️

... and 63 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 811ffe7...a8709af. Read the comment docs.


s.Resource.Start()

<-s.stopC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line for?
Awe we blocking on Start() call?

Copy link
Member Author

@jakobht jakobht Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I think it's a pretty questionable design, it's the same for all the services, e.g. matching (but also all the others):
https://github.com/uber/cadence/blob/6108a443a82b7998582fc8870e012a21ae0a353c/service/matching/service.go#L121

We cannot just remove it, this code depends on it blocking: https://github.com/uber/cadence/blob/fb1324877829f126af9d7e9760636240a1f92bcc/cmd/server/cadence/server.go#L338-L342

I don't know why we track the life cycle in two different places. I think the right way to fix is migrate to fx.

service/shardmanager/service_test.go Show resolved Hide resolved
assert.Equal(t, int32(common.DaemonStatusStopped), atomic.LoadInt32(&service.status))
}

func TestStartAndStopReturnsImmediatelyWhenAlreadyStopped(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it is a bit misleading name ...ReturnsImmediatelyWhenAlreadyStopped because you're not really checking any side-effects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to TestStartAndStopDoesNotChangeStatusWhenAlreadyStopped

Comment on lines +64 to +67
go service.Start()

time.Sleep(100 * time.Millisecond) // The code assumes the service is started when calling Stop
assert.Equal(t, int32(common.DaemonStatusStarted), atomic.LoadInt32(&service.status))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Sleep requires to leverage Start() background activity.
Why Start() is not just blocking? Is that an approach we use for other services?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment for the first comment :-) - #6297 (comment)

- Renamed test
@jakobht jakobht merged commit 45d5403 into cadence-workflow:master Oct 1, 2024
20 checks passed
@mantas-sidlauskas
Copy link
Contributor

Just curious, is there more context available to read on this change?

@jakobht
Copy link
Member Author

jakobht commented Oct 7, 2024

Just curious, is there more context available to read on this change?

There is currently no publicly available context. The idea is to have a service that manages the shards in Cadence instead of relying on ringpop.
This is still very much experimental, but the hope is to centralise the shard management in it's own service. This should improve latency and availability drops during deployments, as well as observability into shard ownership.
We also hope this will give a platform for doing more changes such as shard load balancing.

@pabssen1
Copy link

pabssen1 commented Oct 10, 2024

Screenshot 2024-10-10 at 12 37 48 PM

Hi, getting this error on both local docker setup and k8s helm setup.

Interim Resolution: Use ubercadence/server:v1.2.13-auto-setup

Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 15, 2024
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 15, 2024
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 15, 2024
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 15, 2024
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 16, 2024
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 16, 2024
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 16, 2024
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants