Skip to content
This repository has been archived by the owner on Jan 26, 2023. It is now read-only.

Bulk paginated api for backfilling #29

Merged
merged 15 commits into from
Sep 19, 2019
Merged

Bulk paginated api for backfilling #29

merged 15 commits into from
Sep 19, 2019

Conversation

ensary
Copy link
Contributor

@ensary ensary commented Sep 9, 2019

Asking for early feedback.
To be updated with tests and possibly changes to queries once I look into performance.

@ensary ensary requested a review from a team as a code owner September 9, 2019 17:18
@codecov
Copy link

codecov bot commented Sep 9, 2019

Codecov Report

Merging #29 into master will decrease coverage by 1.87%.
The diff coverage is 85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #29      +/-   ##
==========================================
- Coverage   88.39%   86.52%   -1.88%     
==========================================
  Files           9        9              
  Lines         431      579     +148     
==========================================
+ Hits          381      501     +120     
- Misses         30       51      +21     
- Partials       20       27       +7
Impacted Files Coverage Δ
pkg/domain/storage.go 0% <ø> (ø) ⬆️
pkg/handlers/v1/cloud_fetch.go 89.43% <81.7%> (-10.57%) ⬇️
pkg/storage/db.go 84.97% <92.1%> (-0.46%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b408c3c...1507f33. Read the comment docs.

main.go Outdated Show resolved Hide resolved
@@ -60,6 +60,35 @@ const latestStatusQuery = "WITH latest_candidates AS ( " +
" aws_resources ON " +
" latest.aws_resources_id = aws_resources.id;"

// This query is used to retrieve all the 'active' resources (i.e. those with assigned IP/Hostname) for specific date
// TODO - run performance analysis, possibly do something else on SQL level (optimize query or adjust schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@zlozano
Copy link
Contributor

zlozano commented Sep 9, 2019

LGTM so far

@gcase555
Copy link
Contributor

lgtm so far as well

}
success: '{"status": 200, "bodyPassthrough": true}'
error: >
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: After playing around with this syntax a bunch, I personally find something like this to be a little easier to read, since it separates the JSON response from the template syntax. There's more duplication here though, so I have no strong feelings either way.

            #! if eq .Response.Body.errorType "NotFound" !#
            { "status": 404 }
            #! else if eq .Response.Body.errorType "DependencyFailure" !#
            { "status": 502, "body": { "code": 502, "status": "Bad Gateway", "reason": "#!.Response.Body.errorMessage!#" }}
            #! else !#
            { "status": 500, "body": { "code": 500, "status": "Internal Server Error", "reason": "#!.Response.Body.errorMessage!#" }}
            #! end !#

api.yaml Outdated Show resolved Hide resolved
pkg/handlers/v1/cloud_fetch.go Outdated Show resolved Hide resolved
pkg/handlers/v1/cloud_fetch.go Outdated Show resolved Hide resolved
api.yaml Show resolved Hide resolved
api.yaml Outdated Show resolved Hide resolved
api.yaml Outdated Show resolved Hide resolved
api.yaml Outdated Show resolved Hide resolved
zlozano
zlozano previously approved these changes Sep 16, 2019
@zlozano
Copy link
Contributor

zlozano commented Sep 16, 2019

Approving this in the WIP state. Still would like integration tests.

@@ -138,8 +138,8 @@ paths:
request: >
{
"time": "#!index .Request.Query.time 0!#",
"count": #!if .Request.Query.count !# #!index .Request.Query.count 0!# #! else !# 100 #! end !# #!if .Request.Query.type !# ,
"type": "#!index .Request.Query.type 0!#" #! end !#
"count": #!if .Request.Query.count !# #!index .Request.Query.count 0!# #! else !# 100 #! end !# ,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not really a suggestion, mostly posing the question: would this default value be better if embedded as a value inside the app? If in the app, it can be used immediately by anyone who uses this service. If left here in the api descriptor, it must be duplicated by anyone who uses it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a good example? It looks like a good idea to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of. One approach would be to have an attribute inside your handler for a default page size which is populated from an environment variable.

Count *uint `json:"count"`
Offset *uint `json:"offset"`
Count uint `json:"count"`
Offset uint `json:"offset"`
Copy link
Contributor

@zlozano zlozano Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task: AFAICT this isn't used anymore NVM. I see that it's returned by the conversion functions.

@ensary ensary changed the title WIP: first pass on bulk paginated api for backfilling Bulk paginated api for backfilling Sep 19, 2019
@@ -546,48 +545,49 @@ func (db *DB) runQuery(ctx context.Context, query string, args ...interface{}) (
var timestamp time.Time

err = rows.Scan(&row.ARN, &ipAddress, &hostname, &isPublic, &isJoin, &timestamp, &row.AccountID, &row.Region, &row.ResourceType, &metaBytes)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 thanks!

@ensary ensary requested a review from zlozano September 19, 2019 14:34
Copy link
Contributor

@gcase555 gcase555 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐️

@ensary ensary merged commit 11d8fd0 into master Sep 19, 2019
}
}
found = false
var ipAddresses *[]string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I can see that this was already being handled as a pointer to a slice. However, I don't see any reason why this is being handled this way. All usage of the variable after this dereferences the pointer value. I believe this should just be a slice type and not a pointer to a slice type. In general, it's odd to ever see a pointer to a slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's outside this PR (the change is indentation) but I will look into fixing this in the next one.

if err != nil {
return "", err
}
token := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(js)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why base32 instead of base64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional. Base32 is case insensitive, which makes use in path (or part of URL) safer. Not exactly a big deal in our case, but the thinking was to make this as portable as possible.

Type string `json:"type"`
}

func (p *CloudAssetFetchAllByTimestampParameters) toNextPageToken() (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Personally, I prefer not having behaviors/methods attached to data containers. My recommendation is generally that 1) interfaces define behaviors but do not contain data, 2) structs contain data but do not implement behaviors, and 3) a struct may implement and interface but then it must not contain data. The phrasing of "contains data" here means that structs attributes are exported and we expect people to use them directly. The goal in that philosophy is to ensure that we don't fall into traditional class patterns where we're trying to model state and ways of manipulating that state. It's not perfect and there are no "hard lines" in practice but I find it useful.

For example, instead of func(p *CloudFetchAllbyTimestampParameters) toNextPageToken() I'd suggest func NextPageToken(p CloudFetchAllByTimestampParameters). Practically speaking, these are both the same. The receiver pointer notation like func(*Something) is intended to replace old C-style struct methods that eventually became class/instance methods. This is really more of a philosophical change where receiver functions operate on the state of the instance they are attached to and the non-receiver style is a pure function that operates on a given input.

Again, they aren't that different in most cases. I just find it more comfortable to avoid any accidental state that might show up in class-style methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent explanation, esp the "receiver functions operate on the state of the instance they are attached to and the non-receiver style is a pure function that operates on a given input" part.

return extractOutput(assets), nil
nextPageToken, e := input.toNextPageToken()
if e != nil {
logger.Error(logs.StorageError{Reason: e.Error()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why do we continue here instead of returning the storage error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current page of results is valid, and the case where we can not generate the token from valid parameters is unlikely (if at all possible), but I did not want to ignore the possible error condition that we did not foresee. So the logic currently is - return valid results (that's what client requested), log any issue with generating token for the next page, and let the next request fail.
One clear action item - I should not use StorageError for this, as it has nothing to do with storage, but I am open to failing loudly if we can't generate token for the next page even when we have valid results for the current one.

}
if len(assets) == 0 {
return CloudAssets{}, NotFound{ID: "any"}
return PagedCloudAssets{}, NotFound{ID: "any"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This was not introduced in this PR, but we would usually keep custom error types in the domain package since they are used to communicate across interface boundaries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if an error is used in the handler layer only? Should we still put those in the domain package?

}

//generic error to report to caller to avoid exposing the internal token structure NB, the specific error is still logged
tokenError := errors.New("malformed pageToken")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This could be a package constant/variable and re-used rather than redefined on each call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point

@ensary ensary deleted the add-bulk-fetch branch September 9, 2020 18:56
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants