Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade gogo/protobuf to 1.3.0 #2055

Merged
merged 22 commits into from
Feb 20, 2020

Conversation

cyriltovena
Copy link
Contributor

@cyriltovena cyriltovena commented Jan 30, 2020

Signed-off-by: Cyril Tovena cyril.tovena@gmail.com

What this PR does:

This PR upgrades gogo/protobuf generation tools from 1.2.1 to 1.3. I've dig into their changelog and this is a single version jump, only bug fixes (gogo/protobuf@v1.2.1...v1.3.0). The reason I'm upgrading is because there is an API break between 1.2.1 and 1.3.0 generated pb files, which means I can't import any proto files in the project I'm working on (Loki).

While you can't import generated 1.2 file with generated 1.3 file (compilation issue), they are although still backward compatible (two applications can communicate by using 1.2 and 1.3 respectively)

go vendoring was already using 1.3.0.

Checklist

  • Tests updated
  • Documentation added
  • [ x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@cyriltovena
Copy link
Contributor Author

Ho no ! weaveworks needs to also be updated for the frontend.

import "github.com/weaveworks/common/httpgrpc/httpgrpc.proto";

The build error is the same problem I'm hitting in Loki. (MarshalToSizedBuffer undefined)

@cyriltovena
Copy link
Contributor Author

see https://github.com/gogo/protobuf#release-v130

With this new release comes a new internal library version. This means any newly generated *pb.go files generated with the v1.3.0 library will not be compatible with the old library version (v1.2.1). However, current *pb.go files (generated with v1.2.1) should still work with the new library.

Please make sure you manage your dependencies correctly when upgrading your project. If you are still using v1.2.1 and you update your dependencies, one of which could include a new *pb.go (generated with v1.3.0), you could get a compile time error.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cyriltovena! Let's wait for weaveworks/common#177 so that we can proceed with this PR.

CHANGELOG.md Outdated Show resolved Hide resolved
@bboreham
Copy link
Contributor

still backward compatible (two applications can communicate by using 1.2 and 1.3 respectively)

Who has tested this? What was the process and results?

(We could add it to the "integration tests" in this repo)

@cyriltovena
Copy link
Contributor Author

cyriltovena commented Jan 31, 2020

@bboreham wrote a test, found a bug with the custom type, I guess I have to regen those too, I'm working on it.

Everything else looks good will share more soon. I don't think there is a compatibility issue, I think I didn't regen code inside https://github.com/cortexproject/cortex/blob/master/pkg/ingester/client/timeseries.go

@cyriltovena
Copy link
Contributor Author

@bboreham I added a full ingesters tests and there were only issues with custom type. Once regenerated everything works fine, I can assure you now that the wire format has not changed.

Not sure if you want to keep those tests they are very long (took me a day), but at least you have the proof, I'm happy to remove them if you want.

We're now blocked by weaveworks/common#177 which has no custom type usage so this will also be backward compatible.

@bboreham
Copy link
Contributor

bboreham commented Feb 1, 2020

I don't think there is a compatibility issue

With respect, what you think matters a lot less than what the tests show us.

Are the three CircleCI failures something you can fix?

@bboreham
Copy link
Contributor

bboreham commented Feb 1, 2020

I don't understand why we have extra data structure code for integration tests: why don't we just run the image from Cortex 0.6.0 ?

@cyriltovena
Copy link
Contributor Author

cyriltovena commented Feb 1, 2020

Are the three CircleCI failures something you can fix?

The CI fails because of weaveworks/common#177. see my comment here #2055 (comment)

With respect, what you think matters a lot less than what the tests show us.

This is the test I wrote.
https://github.com/cortexproject/cortex/blob/f60de80bce45ed7e3f1b6633356b3bbaa65954d9/integration-tests/proto/proto_test.go

It uses the two different versions of the code (1.2.1 is master and 1.3.0 is this PR). It was more simple to experiment with proto services, I believe writing a full integration test on a CI that I don't control will be way more complex and will probably cover less (the feedback loop was way faster with go tests).

Those tests that I did actually covers the full ingesters proto service which is the most complex we have, I also tests marshalling of custom proto from 1.2 to 1.3 and vise versa because those were having issues (forgot to regenerate that copied code) so I put more effort into those. (see https://github.com/cortexproject/cortex/blob/f60de80bce45ed7e3f1b6633356b3bbaa65954d9/integration-tests/proto/proto_test.go)

I'm basically giving a proof of compatibility with that test, you can try it yourself and we can get rid of it once we all are happy.

@bboreham
Copy link
Contributor

bboreham commented Feb 1, 2020

Can you put an override in go.mod so this PR builds off your branch in weaveworks/common?

I still don't get what's so complicated about running an integration test with two different image versions. E.g. 0.6.0 ingester against current distributor. We have push tests already; why isn't it a matter of copy-pasting that with a different image tag?

@cyriltovena
Copy link
Contributor Author

I'll put the override, and give a go to integration tests.

@cyriltovena
Copy link
Contributor Author

I removed my go test (although it allowed me to discover a bug in the query path) as this was a simple proof of backward compatibility.

I merged your PR.

Let me know.

@cyriltovena
Copy link
Contributor Author

cyriltovena commented Feb 1, 2020

Looks like we need to update the CI with the new image, but I would need access or someone to do it for me.

check-protos runs against the old version which fails.

@bboreham
Copy link
Contributor

bboreham commented Feb 2, 2020

The CI is controlled from .circleci/config.yml; you can change that as part of a PR.

Probably you can’t push an image under the same name, but for the purpose of testing you can use a different name.

@cyriltovena
Copy link
Contributor Author

This is ready to go, I'll rebase to upstream/master once I have the green light.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cyriltovena for your patience addressing feeback! 🙏

In terms of backward compatibility within Cortex, I feel quite comfortable with this PR. I've picked the new integration tests (which include backward compatibility tests) from the PR #1839, merged them into this PR's branch, rebuilt the Docker image and run the integration tests: passed. See details below.

$ hub pr checkout 2055
$ git merge --squash setup-integration-tests-with-docker-compose-and-golang

# Then rebuilt cortex:latest image before running the integration tests:
$ go test -timeout 60s -v ./integration

=== RUN   TestBackwardCompatibilityWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port:	8000
dynamodb: InMemory:	true
dynamodb: DbPath:	null
dynamodb: SharedDb:	true
dynamodb: shouldDelayTransientStatuses:	false
dynamodb: CorsParams:	*
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: 'a3ea2a25-0474-c943-9df7-8d2b4c017195'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
consul: ==> Newer Consul version available: 1.6.3 (currently running: 0.9.4)
Starting distributor
Starting ingester-2
Stopping ingester-1
Starting querier
querier: 2020/02/06 08:12:09 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping querier
Starting querier
querier: 2020/02/06 08:12:12 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping querier
Killing ingester-2
ingester-2: level=warn ts=2020-02-06T08:12:13.4828736Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing distributor
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestBackwardCompatibilityWithChunksStorage (19.51s)
=== RUN   TestIngesterFlushWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port:	8000
dynamodb: InMemory:	true
dynamodb: DbPath:	null
dynamodb: SharedDb:	true
dynamodb: shouldDelayTransientStatuses:	false
dynamodb: CorsParams:	*
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: '2b8d9200-616b-ba16-cd27-02ea7c9f4fb7'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting querier
Starting distributor
querier: 2020/02/06 08:12:25 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping ingester-1
Killing distributor
Killing querier
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestIngesterFlushWithChunksStorage (12.73s)
=== RUN   TestIngesterHandOverWithBlocksStorage
Starting minio
Starting consul
minio: Attempting encryption of all config, IAM users and policies on MinIO backend
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: '33ddd101-4865-8bf9-77c5-44c9e8a5158c'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting ingester-1
Starting querier
Starting distributor
Starting ingester-2
Stopping ingester-1
ingester-1: level=warn ts=2020-02-06T08:12:37.2829326Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer blocks to: no pending ingesters" attempt=1 max_retries=10
Killing ingester-2
ingester-2: level=warn ts=2020-02-06T08:12:38.1423973Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer blocks to: no pending ingesters" attempt=1 max_retries=10
Killing distributor
Killing querier
Killing consul
Killing minio
minio: Exiting on signal: TERMINATED
--- PASS: TestIngesterHandOverWithBlocksStorage (11.86s)
=== RUN   TestIngesterHandOverWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port:	8000
dynamodb: InMemory:	true
dynamodb: DbPath:	null
dynamodb: SharedDb:	true
dynamodb: shouldDelayTransientStatuses:	false
dynamodb: CorsParams:	*
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: 'f339f5e6-fdbc-0041-38b3-783506cb222d'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting querier
Starting distributor
querier: 2020/02/06 08:12:49 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Starting ingester-2
Stopping ingester-1
ingester-1: level=warn ts=2020-02-06T08:12:50.7051847Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing ingester-2
ingester-2: level=warn ts=2020-02-06T08:12:51.521813Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing distributor
Killing querier
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestIngesterHandOverWithChunksStorage (13.90s)
PASS
ok  	github.com/cortexproject/cortex/integration	58.028s

/cc @bboreham

CHANGELOG.md Outdated Show resolved Hide resolved
.circleci/config.yml Outdated Show resolved Hide resolved
docs/configuration/config-file-reference.md Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
integration-tests/test-backcompat.sh Outdated Show resolved Hide resolved
pkg/ingester/client/timeseries.go Show resolved Hide resolved
pkg/ingester/client/timeseries.go Show resolved Hide resolved
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cyriltovena! CI tests pass. I also did another round of the new integration tests (PR #1839) against this PR and passed.

cyriltovena and others added 8 commits February 18, 2020 16:12
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Here we test pushing from the new distributor to an older ingester,
and hand-over from the old ingester to a new ingester.

The backwards-compatibility version is pinned at v0.6.0.

Signed-off-by: Bryan Boreham <bryan@weave.works>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

go mod check

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

update circle-ci image

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

Rebased to master

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nit. I don't think this should be in the changelog.

CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with one thought about a test.
The changelog line can be removed when we do the actual release; I think we've listed smaller changes there.

pkg/ingester/client/timeseries_test.go Show resolved Hide resolved
@cyriltovena
Copy link
Contributor Author

Merge squash when you're ready. (Those commits are not useful)

@pracucci pracucci merged commit c63a4f2 into cortexproject:master Feb 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants