RHINENG-14146 Optimize Get Recommendations Query #282

Merged

@saltgen saltgen commented Dec 9, 2024

Why do we need this change? 💭

The current SQL query is expensive because there are a lot of records to sift through.
Analysis details are present in the JIRA card.

That said, some important details are provided in the Additional section below.

Documentation update? 📝

  • Yes
  • No

Security Checklist 🔒

Upon raising this PR please go through RedHatInsights/secure-coding-checklist

💂‍♂️ Checklist 🎯

  • Does this change depend on specific version of Kruize
    • If yes what is the version no:
    • Is that image available in production or needs deployment?
  • Bugfix
  • New Feature
  • Refactor
  • Unittests Added
  • DRY code
  • Dependency Added
  • DB Migration Added

Additional 📣

Indexes added

clusters (last_reported_at)
recommendation_sets (workload_id)
workloads (cluster_id)

Other Optimizations

  • Dropped the use of PostgreSQL's DATE() function in queries
  • queryParams object is now a native interface

Query Diffs

GetRecommendationSet

Old

SELECT "recommendation_sets"."id","recommendation_sets"."workload_id","recommendation_sets"."container_name","recommendation_sets"."monitoring_start_time","recommendation_sets"."monitoring_end_time","recommendation_sets"."recommendations","recommendation_sets"."updated_at" FROM "recommendation_sets" 
                        JOIN workloads ON recommendation_sets.workload_id = workloads.id
                        JOIN clusters ON workloads.cluster_id = clusters.id
                        JOIN rh_accounts ON clusters.tenant_id = rh_accounts.id
                 WHERE rh_accounts.org_id = '3340851' AND DATE(recommendation_sets.monitoring_end_time) >= '1970-01-01' AND DATE(recommendation_sets.monitoring_end_time) <= '2024-12-02' ORDER BY clusters.last_reported_at LIMIT 10;

New

SELECT recommendation_sets.id, recommendation_sets.container_name AS container, workloads.namespace AS project, workloads.workload_name as workload, workloads.workload_type, clusters.source_id, clusters.cluster_uuid, clusters.cluster_alias, clusters.last_reported_at AS last_reported, recommendation_sets.recommendations FROM "recommendation_sets" 
			JOIN workloads ON recommendation_sets.workload_id = workloads.id
			JOIN clusters ON workloads.cluster_id = clusters.id
			JOIN rh_accounts ON clusters.tenant_id = rh_accounts.id
		 WHERE rh_accounts.org_id = '3340851' AND recommendation_sets.monitoring_end_time >= '1970-01-01 00:00:00' AND recommendation_sets.monitoring_end_time <= '2024-12-09 12:07:47.503' ORDER BY clusters.last_reported_at DESC LIMIT 10

GetRecommendationSetByID

Old

SELECT "recommendation_sets"."id","recommendation_sets"."workload_id","recommendation_sets"."container_name","recommendation_sets"."monitoring_start_time","recommendation_sets"."monitoring_end_time","recommendation_sets"."recommendations","recommendation_sets"."updated_at" FROM "recommendation_sets" JOIN workloads ON recommendation_sets.workload_id = workloads.id JOIN clusters ON workloads.cluster_id = clusters.id JOIN rh_accounts ON clusters.tenant_id = rh_accounts.id WHERE rh_accounts.org_id = '3340851' AND recommendation_sets.id = '4514582e-a156-47ae-8cc7-705a1dd3207b' ORDER BY "recommendation_sets"."id" LIMIT 1

New

SELECT recommendation_sets.id, recommendation_sets.container_name AS container, workloads.namespace AS project, workloads.workload_name as workload, workloads.workload_type, clusters.source_id, clusters.cluster_uuid, clusters.cluster_alias, clusters.last_reported_at AS last_reported, recommendation_sets.recommendations FROM "recommendation_sets" 
			JOIN workloads ON recommendation_sets.workload_id = workloads.id
			JOIN clusters ON workloads.cluster_id = clusters.id
			JOIN rh_accounts ON clusters.tenant_id = rh_accounts.id
		 WHERE rh_accounts.org_id = '3340851' AND recommendation_sets.id = '32a5d36c-aec1-41d7-b19f-5fef015a930a' ORDER BY "recommendation_sets"."id" LIMIT 1

Contributor Author

saltgen commented Dec 9, 2024

I'll fix the unit test soon; please feel free to go through the other changes in the meantime.

Collaborator

@kgaikwad kgaikwad left a comment

Changes look good to me. 👍 I will continue reviewing the latest changes specific to the date field.
Added just two inline suggestions. Please feel free to skip them if they don't make sense.
Thanks!

internal/api/utils.go
  if endDateStr == "" {
- endDate = now
+ endTimestamp = now
Collaborator

Suggested change
- endTimestamp = now
+ endTimestamp = time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, time.UTC)

Contributor Author

This would work in case we don't expect any values for the query param end_date.
Not sure if I'm missing something.

Contributor Author

I think we should keep the time portion in the timestamp, especially when the end_date query param has not been provided.

internal/api/utils.go Outdated Show resolved Hide resolved
@upadhyeammit
Contributor

Please rebase! The IQE tests are running most of the time now.

@saltgen saltgen force-pushed the refactor/get-recommendations-query branch from d96ab53 to bd8f4da Compare December 18, 2024 11:27
@upadhyeammit
Contributor

/retest

Contributor

@upadhyeammit upadhyeammit left a comment

The query var in GetRecommendationSets and GetRecommendationSetByID is exactly the same apart from the filter Where("recommendation_sets.id = ?", recommendationID). This is a good candidate to consolidate into a common function.

- queryparam now uses truncated ts
- Added test case for start,end date
- Parallelized tests
@saltgen saltgen force-pushed the refactor/get-recommendations-query branch from bd8f4da to 7e745bc Compare December 23, 2024 08:21
@upadhyeammit
Contributor

/retest

@upadhyeammit
Contributor

On IQE I can see it's failing because of a syntax error, but I don't see this locally. Hmm, is it because I am on Go 1.22 locally? Need to check.

{"file":"/go/src/app/internal/api/handlers.go:118","func":"github.com/redhatinsights/ros-ocp-backend/internal/api.GetRecommendationSetList","level":"error","msg":"unable to fetch records from databaseERROR: syntax error at or near \")\" (SQLSTATE 42601)","service":"rosocp-api","time":"2024-12-24T07:31:53Z"}
{"time":"2024-12-24T07:31:53.233083919Z","id":"","remote_ip":"10.131.11.30","host":"ros-ocp-backend-api.ephemeral-xbkkoa.svc:8000","method":"GET","uri":"/api/cost-management/v1/recommendations/openshift?cluster=ros_ocp_cluster_caqjsNZzY&start_date=2024-12-17","user_agent":"OpenAPI-Generator/1.0.0/python","status":200,"error":"","latency":11045462,"latency_human":"11.045462ms","bytes_in":0,"bytes_out":361}

@saltgen
Contributor Author

saltgen commented Dec 24, 2024

I'll take a look

@saltgen
Contributor Author

saltgen commented Jan 7, 2025

@upadhyeammit This should be fixed now. I have also pushed some DRY refactoring for the GetRecommendationSet DB method.

@@ -97,6 +97,8 @@ func MapQueryParameters(c echo.Context) (map[string]interface{}, error) {
log.Error("error parsing end_date:", err)
return queryParams, err
}
// Inclusive user-provided end_date timestamp
endTimestamp = endTimestamp.Add(24 * time.Hour)
Collaborator

@upadhyeammit, I would like to have your opinion on this as well.

Contributor

@upadhyeammit upadhyeammit Jan 13, 2025

It took me a bit of time to understand what's happening here. I can see we are adding 24 hours (a day) to the user-provided date so that recommendations up to that date are returned. However, the filter condition already uses <=, so do we need it?

queryParams["recommendation_sets.monitoring_end_time <= ?"] = endTimestamp

edit:
And if the reason is that the DB stores a date plus a time, so the filter needs a timestamp as well, then do we also need -1 day for the start date?

Current time  2025-01-13 13:03:28.909723655 +0530 IST m=+0.000029727
User time  2025-01-14
Parsed time  2025-01-14 00:00:00 +0000 UTC
Time after addition  2025-01-15 00:00:00 +0000 UTC

Contributor Author

Removed the = from queryParams["recommendation_sets.monitoring_end_time <= ?"] = endTimestamp, so the filter is now a strict <.
Regarding start_date, I think we should be fine; the 24-hour delta is added only for a user-provided end_date.

func (r *RecommendationSet) GetRecommendationSets(orgID string, orderQuery string, limit int, offset int, queryParams map[string][]string, user_permissions map[string][]string) ([]RecommendationSet, int, error) {

var recommendationSets []RecommendationSet
func (r *RecommendationSet) GetRecommendationSet(orgID string, user_permissions map[string][]string, opts ...GetRecommendationOptions,
Contributor

We should keep separate funcs for GetRecommendationSet and GetRecommendationSetByID; right now I see one bigger method doing both.

So it should be:

  1. Separate GetRecommendationSet and GetRecommendationSetByID
  2. A func for the common query
  3. Another func if anything else is shared. There may well be a few more funcs rather than just one, as long as each has a set responsibility.

Contributor Author

Change pushed

@saltgen
Contributor Author

saltgen commented Jan 22, 2025

/retest

Collaborator

@kgaikwad kgaikwad left a comment

Codewise changes look good to me now 👍
@saltgen, could you please check why konflux pipeline is not happy?

@kgaikwad
Collaborator

kgaikwad commented Jan 27, 2025

/retest

@saltgen
Contributor Author

saltgen commented Jan 28, 2025

/retest

@saltgen
Contributor Author

saltgen commented Jan 28, 2025

/retest

@upadhyeammit
Contributor

/retest

@upadhyeammit
Contributor

/retest

@saltgen saltgen merged commit be2ac47 into RedHatInsights:main Jan 28, 2025
7 of 9 checks passed