Setting up some lightweight automation for publishing docker images #1

Open
wants to merge 661 commits into master

Conversation

@Groxx Groxx (Owner) commented Apr 24, 2023

We have this in an internal wiki, but it's still quite manual, and there are a lot of steps and docker arguments and whatnot.

Since I got tired of following it by hand, and we seem to have missed some in the past, it seems worth doing some basic automation.
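
For a rough sense of what the automation needs to script, the manual flow is essentially build, tag, and push per image. Below is a minimal sketch of that flow in Go; the registry, image name, and `RELEASE_VERSION` variable are hypothetical placeholders, not the actual steps from the wiki or from this PR.

```go
// publish.go: a minimal sketch of scripting "build, tag, push" for one image.
// Everything concrete here (registry, repo name, env var) is a placeholder.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

// run executes a command and streams its output, failing loudly on error.
func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	version := os.Getenv("RELEASE_VERSION") // e.g. "1.2.3"; assumed to be set by the caller
	if version == "" {
		log.Fatal("RELEASE_VERSION must be set")
	}
	image := "example-registry/cadence-server" // placeholder registry/repo

	steps := [][]string{
		{"docker", "build", "-t", fmt.Sprintf("%s:%s", image, version), "."},
		{"docker", "tag", fmt.Sprintf("%s:%s", image, version), image + ":latest"},
		{"docker", "push", fmt.Sprintf("%s:%s", image, version)},
		{"docker", "push", image + ":latest"},
	}
	for _, s := range steps {
		if err := run(s[0], s[1:]...); err != nil {
			log.Fatalf("step %v failed: %v", s, err)
		}
	}
}
```

In practice something like this would run in CI on a tag or release event rather than by hand, which is the point of the automation.
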

vytautas-karpavicius and others added 30 commits May 10, 2022 15:30
* Fill domainID for backwards compatibility

* Added unit test
* Log error fields as tags

* Update common/log/loggerimpl/logger.go

Co-authored-by: Steven L <stevenl@uber.com>

* Fix syntax error

* Use zap ObjectMarshaler for nested fields

Co-authored-by: Steven L <stevenl@uber.com>
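
The ObjectMarshaler commit above uses zap's standard mechanism for structured nested fields. A minimal sketch of the pattern, with a hypothetical taskInfo type and illustrative field names rather than the real log fields:

```go
package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// taskInfo is a hypothetical nested field type. Implementing
// zapcore.ObjectMarshaler lets zap encode it as a structured object
// instead of falling back to reflection-based encoding.
type taskInfo struct {
	DomainID   string
	WorkflowID string
	Attempt    int64
}

func (t taskInfo) MarshalLogObject(enc zapcore.ObjectEncoder) error {
	enc.AddString("domain-id", t.DomainID)
	enc.AddString("wf-id", t.WorkflowID)
	enc.AddInt64("attempt", t.Attempt)
	return nil
}

func main() {
	logger, _ := zap.NewProduction()
	defer logger.Sync()
	// zap.Object nests the struct under the "task" key in the JSON output.
	logger.Info("processing task", zap.Object("task", taskInfo{
		DomainID:   "d1",
		WorkflowID: "wf1",
		Attempt:    3,
	}))
}
```
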
* Add logs for domain failover (uber#2359)

* Add operation name tag for domain update (uber#2359)

* Add error logs for domain update (uber#2359)

* Update logs to reuse the logger (uber#2359)

We have a user desiring this, and in general it seems like a good idea.
Activities are generally assumed to be "high cost" to lose, or at least potentially so.

Longer term, we should probably consider making this a per-domain config,
rather than something that is hardcoded for a whole cluster.  Nothing
about this seems like it would be cluster-bound.
The suspicion is that this is not actually a transient, retriable error, so it should be handled differently.
* Simplify history builder

* Removed unused methods
* Removing target-domain-not-active special-case handling

The suspicion is that this is not actually a transient, retriable error, so it should be handled differently.

* Fixing remaining non-retriable error

* Fix test
* Decouple domain cache entry from cluster metadata

* Addressing comments

* Fixing test
subhash-veluru and others added 30 commits March 13, 2023 13:33
Service name should be `worker` not `workers`

Co-authored-by: Zijian <Shaddoll@users.noreply.github.com>
…lds are filtered (uber#5151)

* add unit tests for the filter-PII functions to check for bugs and errors when cloning

* handle nil pointers to avoid bugs and errors

* resume the changes from previous reverted branch

* use json tags to filter PII instead of hard copies (see the sketch after this list)

* Create a new struct in the unit test that contains only PII, which makes the filtered result much clearer to see.

* some clean up
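
The json-tag approach boils down to driving the filtering off struct tags rather than maintaining hand-written copies of each type. A minimal sketch of the idea, assuming a hypothetical piiTags set and request type (not the real Cadence types or the actual mechanism in uber#5151):

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// piiTags is a hypothetical set of json tag names treated as PII.
var piiTags = map[string]bool{"email": true, "identity": true}

// filterPII returns a copy of the input struct with every field whose json
// tag is listed as PII zeroed out; the original value is left untouched.
func filterPII[T any](in T) T {
	out := in
	v := reflect.ValueOf(&out).Elem()
	if v.Kind() != reflect.Struct {
		return out
	}
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		tag := strings.Split(t.Field(i).Tag.Get("json"), ",")[0]
		if piiTags[tag] && v.Field(i).CanSet() {
			v.Field(i).Set(reflect.Zero(v.Field(i).Type()))
		}
	}
	return out
}

// request is an illustrative type; only the json tags matter to the filter.
type request struct {
	WorkflowID string `json:"workflow_id"`
	Email      string `json:"email"`
}

func main() {
	safe := filterPII(request{WorkflowID: "wf1", Email: "someone@example.com"})
	b, _ := json.Marshal(safe)
	fmt.Println(string(b)) // {"workflow_id":"wf1","email":""}
}
```
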
add remaining persistence stuff that goes to a shard
* added and update consistent query per shard metric

* testing pershard metric

* move sample logger into persistence metric client for cleanliness

* fix test

* fix lint

* fix test again

* fix lint

* sample logging with workflowid tag

* added domain tag to logger

* metric completed

* addressing comments

* fix lint

* Revert "fix lint"

This reverts commit 1e96944.

* fix lint second attempt

---------

Co-authored-by: Allen Chen <allenchen2244@uber.com>
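
The per-shard metric work above amounts to tagging a counter with the shard (and domain) before emitting it. A minimal sketch using uber-go/tally, with illustrative metric and tag names rather than Cadence's real ones:

```go
package main

import (
	"fmt"

	"github.com/uber-go/tally"
)

func main() {
	// A test scope stands in for the real persistence metrics client here.
	scope := tally.NewTestScope("persistence", nil)

	emitConsistentQueryPerShard := func(shardID int, domain string) {
		scope.Tagged(map[string]string{
			"shard_id": fmt.Sprintf("%d", shardID),
			"domain":   domain,
		}).Counter("consistent_query_per_shard").Inc(1)
	}

	emitConsistentQueryPerShard(7, "sample-domain")

	// The snapshot shows the counter carrying its per-shard and domain tags.
	for _, c := range scope.Snapshot().Counters() {
		fmt.Println(c.Name(), c.Tags(), c.Value())
	}
}
```
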
* ES: single interface for different ES/OpenSearch versions

* make fmt
* Elasticsearch: reduce code duplication

* address comments

---------

Co-authored-by: Zijian <Shaddoll@users.noreply.github.com>
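
The single-interface idea is to hide the Elasticsearch v6/v7 and OpenSearch client differences behind one Go interface so callers never branch on the backend version. A sketch with illustrative method names, not the real client API:

```go
package visibility

import (
	"context"
	"encoding/json"
)

// GenericClient sketches the single-interface idea: the Elasticsearch (v6/v7)
// and OpenSearch implementations all satisfy it, so callers never care which
// backend is configured. Method names here are illustrative.
type GenericClient interface {
	CreateIndex(ctx context.Context, index string) error
	PutMapping(ctx context.Context, index string, mapping map[string]interface{}) error
	Search(ctx context.Context, index string, query map[string]interface{}) ([]json.RawMessage, error)
}

// CountOpenWorkflows depends only on the interface, so it works unchanged no
// matter which client implementation is plugged in.
func CountOpenWorkflows(ctx context.Context, c GenericClient, index string) (int, error) {
	docs, err := c.Search(ctx, index, map[string]interface{}{
		"query": map[string]interface{}{"term": map[string]interface{}{"CloseStatus": -1}},
	})
	if err != nil {
		return 0, err
	}
	return len(docs), nil
}
```
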
* Set poll interval for filebased dynamic config if not set

* update unit test
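
The poll-interval change is essentially defaulting an unset value. A sketch of the shape of it, with an illustrative default and assumed field names rather than the real dynamic config client:

```go
package dynamicconfig

import "time"

// defaultPollInterval is illustrative; the real default lives in the
// file-based dynamic config client.
const defaultPollInterval = 10 * time.Second

// FileBasedClientConfig mirrors the idea only; field names are assumptions.
type FileBasedClientConfig struct {
	Filepath     string
	PollInterval time.Duration
}

// applyDefaults fills in the poll interval when the config leaves it unset.
func (c *FileBasedClientConfig) applyDefaults() {
	if c.PollInterval <= 0 {
		c.PollInterval = defaultPollInterval
	}
}
```
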
* Initial checkin for pinot config files
… dropping queued tasks (uber#5164)

What changed?

When the domain cache returns an entity-not-found error, don't drop queued tasks, to be more conservative.

Why?

In cases when the cache is dubious, we shouldn't drop the queued tasks.
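
A sketch of the behavioural change, with a stand-in error value rather than the real persistence error type:

```go
package taskprocessing

import "errors"

// errEntityNotFound stands in for the "entity not found" error returned by the
// domain cache; the real error type lives in Cadence's persistence package.
var errEntityNotFound = errors.New("entity not found")

// resolveDomainLookupError sketches the change described above: when the
// domain cache returns entity-not-found, the queued task is now retried
// instead of being dropped.
func resolveDomainLookupError(err error) (drop, retry bool) {
	switch {
	case err == nil:
		return false, false
	case errors.Is(err, errEntityNotFound):
		return false, true // previously: drop = true
	default:
		return false, true
	}
}
```
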
* add support for TLS connections by Canary, add development config for Canary with TLS

* update README to include new config option

* remove testing config

---------

Co-authored-by: David Porter <david.porter@uber.com>
Co-authored-by: Shijie Sheng <shengs@uber.com>
Co-authored-by: Zijian <Shaddoll@users.noreply.github.com>
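
On the Go side, enabling TLS for the canary comes down to building a tls.Config from the configured CA, certificate, and key files. A sketch with illustrative parameter names; the real canary config keys may differ:

```go
package canary

import (
	"crypto/tls"
	"crypto/x509"
	"os"
)

// newTLSConfig builds a client TLS config from PEM files on disk. File
// locations would come from the canary's TLS config options.
func newTLSConfig(caFile, certFile, keyFile string) (*tls.Config, error) {
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12,
	}, nil
}
```
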
* Remove misleading type check, Add more detailed log message

* removing debugging logging

* Handle nil update edge case

---------

Co-authored-by: allenchen2244 <102192478+allenchen2244@users.noreply.github.com>
Co-authored-by: Zijian <Shaddoll@users.noreply.github.com>
Co-authored-by: David Porter <david.porter@uber.com>
* Adds a small test to catch issues with deadlocks
* Add thin ES clients
… (uber#5185)

* remove validation & test for add search attribute with no advanced config

- Remove validation for the Advanced Visibility Store
- Add an Advanced Visibility Config check before updating the ElasticSearch/OpenSearch mapping
- Remove the related test for 'no advanced config'

* Update CHANGELOG.md

Update CHANGELOG.md

* Add a warn-level message if we skip updating the OpenSearch/ElasticSearch mapping

* Add a warn-level message and add validSearchAttributes in development.yaml

---------

Co-authored-by: Quanzheng Long <prclqz@gmail.com>
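
A sketch of the resulting flow, with illustrative names rather than the real admin handler API: skip the mapping update and warn when advanced visibility isn't configured, otherwise proceed.

```go
package admin

import "go.uber.org/zap"

// addSearchAttributeMapping sketches the flow described above; the function
// and parameter names are illustrative, not the real admin handler API.
func addSearchAttributeMapping(advancedVisibilityConfigured bool, updateMapping func() error, logger *zap.Logger) error {
	if !advancedVisibilityConfigured {
		logger.Warn("advanced visibility store is not configured, skipping search attribute mapping update")
		return nil
	}
	return updateMapping()
}
```
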
* add shardid tag to log

* remove counter for overall scope

* fix lint
What changed?
Added a sharding layer to the NoSQL persistence stack so that Cadence can use multiple Cassandra clusters at once in a physically sharded manner.

Cadence is a heavily storage-bound system, so the load a single Cadence cluster can handle is strictly limited by the underlying storage system. Given the massive adoption of Cadence at Uber, this scale limitation forces us to create more Cadence clusters than we want to operate. This capability will let us run Cadence clusters one to two orders of magnitude larger than we have today.

Note that this feature only enables bootstrapping a brand-new cluster with multiple databases behind it. Resharding is designed but not implemented yet.

Why?
So that a Cadence cluster can be bootstrapped with multiple Cassandra clusters powering it.

How did you test it?
Added unit tests. Ran samples and tested bench tests in a staging environment.

Potential risks
Since this change significantly changes the low-level persistence logic, it can cause data loss if something goes terribly wrong.

Release notes
The change is backward compatible. Existing Cadence cluster configurations can be updated, if desired, to use the sharded NoSQL config format. However, they must continue having a single shard since Cadence still doesn't have the ability to reshard data.

Documentation Changes
There is a sample config file included in this PR that shows how to make use of the feature in a new cluster.
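
The routing idea behind the sharding layer can be sketched as mapping each history shard to one of a fixed set of database connections; the types below are illustrative, not the real Cadence persistence interfaces.

```go
package sharded

// Connection stands in for a handle to one physical Cassandra cluster/keyspace.
type Connection interface {
	Execute(query string, args ...interface{}) error
}

// shardedNoSQL holds a fixed set of database connections, with each Cadence
// history shard mapped to exactly one of them.
type shardedNoSQL struct {
	connections []Connection
}

// connectionFor maps a history shard to a physical database. Any change to
// this mapping is effectively a reshard, which is why resharding needs its
// own (not yet implemented) migration path.
func (s *shardedNoSQL) connectionFor(historyShardID int) Connection {
	return s.connections[historyShardID%len(s.connections)]
}
```
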
…ts (uber#5218)

* add tasklist traffic metrics for decision task

* add logger, remove tasklistID

* add taskListCombined

* add more fields

* add forward metric and source

* fix nil

* add tlMgr metrics

* add more metrics

* remove tlMgr metric

* only emit metrics if not sticky and not forwarded (see the sketch after this list)

* create new metrics name for better distinction

* add new emitted info

* change nil to empty string

* add domain and tasklist name tags

* add metrics for forwarded tasklist

* new metrics for activity tasks, rename metrics to allow aggregation of both types of tasks

* clean up logging

* clean up changes in emitInfoOrDebugLog()

* resolve comments

* improve some logic

* fix small error
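
A sketch of the "only emit metrics if not sticky and not forwarded" rule referenced above, using uber-go/tally with illustrative metric and tag names:

```go
package matching

import "github.com/uber-go/tally"

// emitTaskRequestMetric sketches the emit rule: sticky and forwarded requests
// are skipped, everything else is counted per domain and tasklist.
func emitTaskRequestMetric(scope tally.Scope, domain, taskList string, isSticky, isForwarded bool) {
	if isSticky || isForwarded {
		return
	}
	scope.Tagged(map[string]string{
		"domain":   domain,
		"tasklist": taskList,
	}).Counter("request_per_tasklist").Inc(1)
}
```
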
We have this in an internal wiki, but it's still quite manual, and there
are a lot of steps.
Some lightweight automation seems worth adopting.