Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(warehouse): degraded workspace id #2627

Merged
merged 27 commits into from
Nov 8, 2022
Merged

Conversation

lvrach
Copy link
Member

@lvrach lvrach commented Oct 27, 2022

Description

Feature

Degradation is a common concept in rudder-stack infrastructure, where a service runs in a limited way allowing some operations to happen safely. In the warehouse domain, degraded mode means:

  • No processing or database updates are taking place.
  • Will not accept any new request. Most APIs are not registered.

Workspace degradation is a feature that enables applying the degradation mode for specific workspaceIDs.

Motivation

  1. Migrate to multi-tenant cluster without downtime for customers that are not being migrated but are residents in the source or destination of migration.
  2. Isolate problematic workspaceIDs if the need arises/
  3. Set the foundations for a multi-master warehouse system.

Implementation

A new component is introduced, multitenant.Manager is responsible for:

  • be the source of truth for degraded workspaceIDs
  • filter degraded workspaceIDs from backend config
  • map sourceID to workspaceID

Places that have been affected:

  1. HTTP API handlers will not respond with http.StatusServiceUnavailable, if a workspace id is degraded.
  2. archiver will not archive degraded workspaceID
  3. WatchConfig has replace backendconfig.Subcribe so that workers related to degraded mode ids won't start.

Archive refactoring

To properly test the changes of the archiver:

  • I have added assertions for minio contents and db states
  • Added table test for workspaceId functionality
  • Convert from gingko to go native tests, as I had trouble setting up table tests
  • Small refactor of the archive, introducing struct and DI (surely further work remains)

limitation and out-of-scope

  • Degraded workspaces are statically defined. A service restart will be required every time we change them. The manager is design is a way to easily support dynamic workspace ids in the future.

  • The manager is not properly injected in all the cases. To use proper DI for the manager significant refactoring will be required.

  • async framework is not in the scope of this PR.

Notion Ticket

https://www.notion.so/rudderstacks/workspace-degraded-mode-5f231fa3899b4a79bf230d2ee12e85ca

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@lvrach lvrach changed the title exp: PoC feat(warehouse): degraded workspace id Oct 31, 2022
@lvrach lvrach marked this pull request as ready for review November 3, 2022 16:53
@lvrach lvrach marked this pull request as draft November 3, 2022 17:20
@codecov
Copy link

codecov bot commented Nov 4, 2022

Codecov Report

Base: 45.60% // Head: 45.44% // Decreases project coverage by -0.15% ⚠️

Coverage data is based on head (f324099) compared to base (0d061ff).
Patch coverage: 47.46% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2627      +/-   ##
==========================================
- Coverage   45.60%   45.44%   -0.16%     
==========================================
  Files         287      289       +2     
  Lines       47790    47928     +138     
==========================================
- Hits        21793    21783      -10     
- Misses      24619    24762     +143     
- Partials     1378     1383       +5     
Impacted Files Coverage Δ
warehouse/archive/cron.go 0.00% <0.00%> (ø)
warehouse/identities.go 1.04% <0.00%> (ø)
warehouse/warehouse.go 8.42% <2.59%> (-0.26%) ⬇️
warehouse/api.go 70.41% <42.85%> (ø)
warehouse/archive/archiver.go 65.63% <50.81%> (ø)
warehouse/upload.go 18.45% <58.33%> (+0.09%) ⬆️
warehouse/multitenant/manager.go 100.00% <100.00%> (ø)
warehouse/retry.go 87.58% <100.00%> (ø)
warehouse/table_upload.go 93.92% <100.00%> (ø)
processor/stash/stash.go 41.56% <0.00%> (-24.28%) ⬇️
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@lvrach lvrach marked this pull request as ready for review November 4, 2022 07:56
@achettyiitr
Copy link
Member

achettyiitr commented Nov 4, 2022

@lvrach
Since we have added an archiver component, Should we consider moving it to a separate package?

@lvrach
Copy link
Member Author

lvrach commented Nov 4, 2022

@lvrach Since we have added an archiver component, Should we consider moving it to a separate package?

Yeah, I was considering this as well. I will give it a try and see how it goes.

@lvrach
Copy link
Member Author

lvrach commented Nov 4, 2022

@lvrach Since we have added an archiver component, Should we consider moving it to a separate package?

Yeah, I was considering this as well. I will give it a try and see how it goes.

@achettyiitr archiver has its package now, but it has some changes to upload types to avoid circular imports.

}

// Run is a blocking function that executes manager background logic.
func (m *Manager) Run(ctx context.Context) error {
Copy link
Member

@achettyiitr achettyiitr Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are always returning nil. Should we remove the error it?

}

m.sourceIDToWorkspaceID = make(map[string]string)
m.excludeWorkspaceIDMap = make(map[string]struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not using this anywhere?

Copy link
Member

@achettyiitr achettyiitr Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we are using degradedWorkspaceIDs as excludeWorkspace. Do we still need this?

@achettyiitr
Copy link
Member

Kudos @lvrach. It was really nice going through this and learning a lot from it.

Copy link
Contributor

@abhimanyubabbar abhimanyubabbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. 👍🏼

func backupRecords(args backupRecordsArgs) (backupLocation string, err error) {
pkgLogger.Infof(`Starting backupRecords for uploadId: %s, sourceId: %s, destinationId: %s, tableName: %s,`, args.uploadID, args.sourceID, args.destID, args.tableName)
type Archiver struct {
DB *sql.DB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having a separate package for archiver and explicitly listing down the dependencies. For having a better DI, should we expose a NewArchiver function so that the system doesn't allowing creating archiver with nil values. Wdyt ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this approach for packages that are used as internal components. Let me explain,

If we were to introduce a NewArchive method, is going to need 5 args, and more will be added in the future. In this case, we could use opts fn or a separate struct for the options that still require validation.

In general, is common in Go to initialise struct like this (http.Server, for example).

If we need validation we can do it lazily as part of .Run method for example, but personally, for internal packages, I find it overkill.

@@ -0,0 +1,228 @@
//go:build !warehouse_integration

package archive_test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job with the test case ! Moreee tests !! 🥳 🎉

@lvrach lvrach merged commit a0da8f7 into master Nov 8, 2022
@lvrach lvrach deleted the feat.degraded-workspace-id branch November 8, 2022 11:18
atzoum added a commit that referenced this pull request Dec 1, 2022
feat(warehouse): added support for bigquery custom partition for workspaceIDs (#2679)
chore: by default enable max concurrent gw request limit. (#2648)
doc: create SECURITY.md (#2656)
chore: use bugsnag in router and processor goroutines (#2686)
test: using arm64 compatible images if necessary (#2670)
chore: regulation worker avoid panic in case of timeout (#2657)
feat(warehouse): degraded workspace id (#2627)
Release-As: 1.4.0
atzoum added a commit that referenced this pull request Dec 1, 2022
feat(warehouse): added support for bigquery custom partition for workspaceIDs (#2679)
chore: by default enable max concurrent gw request limit. (#2648)
doc: create SECURITY.md (#2656)
chore: use bugsnag in router and processor goroutines (#2686)
test: using arm64 compatible images if necessary (#2670)
chore: regulation worker avoid panic in case of timeout (#2657)
feat(warehouse): degraded workspace id (#2627)
Release-As: 1.4.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants