
feat: add job refresh #227

Merged
merged 45 commits into main from refresh-jobs on Apr 8, 2022
Conversation


@arinda-arif arinda-arif commented Mar 17, 2022

#180

Acceptance criteria:

  • Should be able to refresh:
    • selected jobs, or
    • all jobs in selected namespaces, or
    • all jobs in a project
  • After a refresh, ALL jobs in the requested project will be deployed
  • Dependencies should be correctly persisted
  • If dependency resolution fails for a job, it should not fail the whole refresh process
  • Users should know which jobs failed on refresh or deploy
  • Metrics for every step of the refresh process should be pushed

@arinda-arif arinda-arif marked this pull request as draft March 17, 2022 02:51
@arinda-arif arinda-arif linked an issue Mar 17, 2022 that may be closed by this pull request

arinda-arif commented Mar 17, 2022

raystack/proton#114


coveralls commented Mar 17, 2022

Pull Request Test Coverage Report for Build 2112938619

  • 270 of 397 (68.01%) changed or added relevant lines in 12 files are covered.
  • 17 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 74.688%

Changes missing coverage (file / covered lines / changed or added lines / %):

  ext/scheduler/airflow2/airflow.go      3     4    75.0%
  job/priority_resolver.go               1     2    50.0%
  job/dependency_resolver.go           101   103    98.06%
  job/deployer.go                       40    42    95.24%
  models/project.go                      0     3     0.0%
  job/service.go                        92    97    94.85%
  models/job.go                         13    21    61.9%
  api/handler/v1beta1/observer.go        0    47     0.0%
  models/progress.go                     0    58     0.0%

Files with coverage reduction (file / new missed lines / %):

  models/job.go                         17          47.06%

Totals coverage status:
  Change from base Build 2100149915: 0.2%
  Covered Lines: 5872
  Relevant Lines: 7862

💛 - Coveralls

@arinda-arif arinda-arif marked this pull request as ready for review March 22, 2022 03:40
store/store.go Outdated (resolved)
job/service.go Outdated (resolved)
job/service.go Outdated
var (
errDependencyResolution = fmt.Errorf("dependency resolution")

resolveDependencyFailureGauge = promauto.NewGauge(prometheus.GaugeOpts{
Contributor:

we can have a single gauge with two values success & failure

Contributor Author:

Have changed it to a single gauge; the metric values are differentiated with a status label.

job/service.go Outdated

type AssetCompiler func(jobSpec models.JobSpec, scheduledAt time.Time) (models.JobAssets, error)

// DependencyResolver compiles static and runtime dependencies
type DependencyResolver interface {
Resolve(ctx context.Context, projectSpec models.ProjectSpec, jobSpec models.JobSpec, observer progress.Observer) (models.JobSpec, error)
ResolveAndPersist(ctx context.Context, projectSpec models.ProjectSpec, jobSpec models.JobSpec, observer progress.Observer) error
Contributor:

Instead of ResolveAndPersist, let's reuse Resolve and create a new function for Persist.

Contributor Author:

Agree, I think it will reduce the complexity. However, the existing Resolve function includes a resolveHookDependencies step, which is not needed here (as we are going to store the requested dependencies and fetch them all afterwards). Should we also decouple it from Resolve, or keep it until we refactor job Sync?

Contributor Author:

As discussed, Resolve is reused and a new Persist function is introduced instead of ResolveAndPersist.

job/service.go Outdated

// Refresh fetches all the requested jobs, resolves its dependencies, assign proper priority weights,
// compile all jobs in the project and upload them to the destination store.
func (srv *Service) Refresh(ctx context.Context, projectSpec models.ProjectSpec, namespaceJobNamePairs []models.NamespaceJobNamePair,
Contributor:

Why not work with the *pb.RefreshJobsRequest only and fetch all the relevant job specs? I don't see a need for this model.

Contributor Author:

I was limiting the usage of the protobuf model to the handler only and not passing it to the service layer. This namespaceJobNamePairs is only used to get all of the jobSpecs for dependency resolution. Should we do it early in the handler? What do you think?

Contributor Author:

As discussed, we still use NamespaceJobNamePair to avoid using an external contract in the service layer. However, we now pass NamespaceName instead of NamespaceSpec (and fetch the spec just before it is used).

job/service.go Outdated
// Resolve dependency
if err = srv.resolveAndPersistDependency(ctx, projectSpec, namespaceJobNamePairs, progressObserver); err != nil {
// if err is caused by dependency resolution, ignore this as error.
var merrs *multierror.Error
Contributor:

I don't see a need for this; just return the error if it fails.

Contributor Author:

There are several possible errors in that particular function: a multierror (errors from dependency resolution), asset compilation errors, or errors when fetching the job specs. If the error comes from dependency resolution, we should skip it, as we don't want the whole refresh process to be canceled. It looks like this is not clear enough; I will refactor it.

Contributor Author:

Have modified this so it no longer checks for multierror; it checks the error type instead.

job/service.go Outdated
func (srv *Service) Refresh(ctx context.Context, projectSpec models.ProjectSpec, namespaceJobNamePairs []models.NamespaceJobNamePair,
progressObserver progress.Observer) (err error) {
// Resolve dependency
if err = srv.resolveAndPersistDependency(ctx, projectSpec, namespaceJobNamePairs, progressObserver); err != nil {
Contributor:

As mentioned earlier, we can do these steps:

  1. fetch all job specs
  2. resolve dependencies
  3. persist dependencies

Contributor Author:

Agree, will change.

Contributor Author:

Modified this; however, persisting dependencies and resolving dependencies are still in the same parallel runner. Please check this.

}

func (r *dependencyResolver) FetchJobDependencies(ctx context.Context, projectSpec models.ProjectSpec,
observer progress.Observer) (map[uuid.UUID][]models.JobSpecDependency, error) {
Contributor:

return type to be map[JobID]

Contributor Author:

As discussed, we are still using uuid for now, to avoid having a very big change in one PR.

job/service.go Outdated
}

// Fetch dependency and enrich
jobDependencies, err := srv.dependencyResolver.FetchJobDependencies(ctx, projectSpec, progressObserver)
Contributor:

Similarly here: as you are refetching the dependencies that were constructed above, I believe your intention is to move this to a separate place later. It would be better to do that right now.

  1. FetchAllJobsAgain
  2. FetchAllJobDependencies
  3. EnrichJobsWithJobDependencies
  4. EnrichJobsWithHookDependencies // This needs to be optimized by not making a call to the plugin inside enrichment; rather, we cache the dependencies between plugins during bootstrap and use pluginService for this kind of purpose.
  5. PriorityResolution
  6. Grouping Namespaces // This step too: why do we keep passing the projectJobSpecRepo and fetching jobs? Let's just enrich the JobSpec model with Project & NamespaceSpec, or store ProjectID and NamespaceID in the JobSpec, which can be used for grouping.

I expect all the data fetches from datasources or third-party integrations to happen only once, and this data can be cached for optimization purposes. Here we are passing jobSpecRepo everywhere and fetching job specs again and again, which can be avoided.

Contributor Author:

Updated. Please help to recheck.
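The enrichment idea behind the grouping step above (storing the namespace on the JobSpec so grouping needs no extra repository calls) could be sketched as follows; the field names are assumptions for illustration:

```go
package main

import "fmt"

// JobSpec enriched with its namespace, as the reviewer suggests, so
// grouping requires no further fetches. Field names are assumed.
type JobSpec struct {
	Name          string
	NamespaceName string
}

// groupByNamespace buckets already-enriched specs in a single pass,
// instead of fetching jobs per namespace from a repository.
func groupByNamespace(specs []JobSpec) map[string][]JobSpec {
	grouped := map[string][]JobSpec{}
	for _, s := range specs {
		grouped[s.NamespaceName] = append(grouped[s.NamespaceName], s)
	}
	return grouped
}

func main() {
	specs := []JobSpec{
		{Name: "job-a", NamespaceName: "ns-1"},
		{Name: "job-b", NamespaceName: "ns-2"},
		{Name: "job-c", NamespaceName: "ns-1"},
	}
	grouped := groupByNamespace(specs)
	fmt.Println(len(grouped["ns-1"]), len(grouped["ns-2"])) // prints: 2 1
}
```

This is the caching principle from the comment in miniature: fetch (and enrich) once, then every later step works from the in-memory specs.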

job/service.go Outdated
return jobSpecs, nil
}

func (srv *Service) prepareJobSpecs(ctx context.Context, projectSpec models.ProjectSpec,
Contributor:

prepare is generic; better to have specific function names so readers can easily understand what's happening.

Contributor Author:

Renamed this to fetchJobSpecs.


// resolve specs in parallel
runner := parallel.NewRunner(parallel.WithTicket(ConcurrentTicketPerSec), parallel.WithLimit(ConcurrentLimit))
for _, jobSpec := range jobSpecs {
Contributor:

If possible, we can look for a mechanism to abstract out how the parallel running happens, keeping things declarative.

Contributor Author:

Decided not to refactor this part in this PR, as abstracting this out is quite tricky; will look for a better way to do this.
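One declarative shape such an abstraction could take is a generic helper that hides the ticketing and limiting details. This is a sketch under assumed names, not the repo's actual parallel package:

```go
package main

import (
	"fmt"
	"sync"
)

// runParallel applies fn to every item with at most limit concurrent
// goroutines and returns results in input order. It is a minimal
// stand-in for the parallel.Runner used in the PR.
func runParallel[T, R any](items []T, limit int, fn func(T) R) []R {
	results := make([]R, len(items))
	sem := make(chan struct{}, limit) // bounded concurrency
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		go func(i int, item T) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i] = fn(item)
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	jobs := []string{"job-a", "job-b", "job-c"}
	resolved := runParallel(jobs, 2, func(name string) string {
		return name + ":resolved" // stand-in for dependency resolution
	})
	fmt.Println(resolved) // prints: [job-a:resolved job-b:resolved job-c:resolved]
}
```

The call site then only states what to run and how wide, which is the declarative style the reviewer is asking about.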

return nil
}

func (r *dependencyResolver) FetchJobDependencies(ctx context.Context, projectSpec models.ProjectSpec,
Contributor:

I believe FetchJobDependencies can be simplified: the logic of fetching and then updating with inter/intra job dependencies.

Contributor Author:

Have modified this: FetchJobDependencies will just fetch the dependencies; adapting to the required model is handled in the enrichment part (in the deployer).

cmd/job_refresh.go (resolved)
if !resp.GetSuccess() {
deployFailedCounter++
if verbose {
l.Info(coloredError(fmt.Sprintf("%d. %s failed to be deployed: %s", deployCounter, resp.GetJobName(), resp.GetMessage())))
Contributor:

Errors can be warn messages.

Contributor Author:

updated

if !resp.GetSuccess() {
refreshFailedCounter++
if verbose {
l.Info(coloredError(fmt.Sprintf("error '%s': failed to refresh dependency, %s", resp.GetJobName(), resp.GetMessage())))
Contributor:

Failures should be considered warn logs.

Contributor Author:

updated

cmd/job_refresh.go (resolved)

namespaceJobNamePairs := sv.prepareNamespaceJobNamePairs(req.NamespaceJobs)

if err := sv.jobSvc.Refresh(respStream.Context(), req.ProjectName, namespaceJobNamePairs, observers); err != nil {
Contributor:

Can you split this into multiple lines? It will be more readable.

Contributor:

Can you change the proto request to accept a request for a project refresh, a bunch of namespaces in a project, or a bunch of jobs? Project is mandatory; the namespace and job lists are optional.
We can have different functions for refreshing all jobs of a project, all jobs of given namespaces, or a bunch of jobs in a project.

Contributor Author:

Is this what you meant?

message RefreshJobsRequest {
  string project_name = 1;
  repeated string namespace_names = 2;
  repeated string job_names = 3;
}

Currently, to refresh a bunch of jobs in a project, we need the namespace info of where the jobs belong (for auth). We can go with what you proposed, but it will still require the namespace to be provided in namespace_names (which is not straightforward), or we can avoid requiring the namespace for this case.

Contributor Author:

Updated; the proto PR: raystack/proton#127

@@ -16,3 +16,8 @@ type NamespaceSpec struct {
}

const AllNamespace = "*"

type NamespaceJobNamePair struct {
Contributor:

we don't need this struct, we can avoid this.

Contributor Author:

Removed this, as we no longer need the namespace to refresh specific jobs.

return srv.deployer.Deploy(ctx, projectSpec, progressObserver)
}

func (srv *Service) fetchJobSpecs(ctx context.Context, projectSpec models.ProjectSpec,
Contributor:

fetchJobSpecs should be split into 4 functions: fetchAllForAProject, fetchAllForGivenNamespaces, and fetchSpecsForGivenJobNames, with fetchJobSpecs as the wrapper function.

Contributor Author:

updated
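The wrapper-plus-specific-functions split the reviewer describes might be sketched like this; the in-memory data, precedence rules, and exact signatures are illustrative assumptions only:

```go
package main

import "fmt"

// JobSpec is a toy stand-in for models.JobSpec.
type JobSpec struct {
	Name          string
	NamespaceName string
}

// allSpecs plays the role of the project's job spec repository.
var allSpecs = []JobSpec{
	{Name: "job-a", NamespaceName: "ns-1"},
	{Name: "job-b", NamespaceName: "ns-2"},
	{Name: "job-c", NamespaceName: "ns-1"},
}

func fetchAllForProject() []JobSpec { return allSpecs }

func fetchAllForNamespaces(namespaces []string) []JobSpec {
	wanted := map[string]bool{}
	for _, ns := range namespaces {
		wanted[ns] = true
	}
	var out []JobSpec
	for _, s := range allSpecs {
		if wanted[s.NamespaceName] {
			out = append(out, s)
		}
	}
	return out
}

func fetchForJobNames(names []string) []JobSpec {
	wanted := map[string]bool{}
	for _, n := range names {
		wanted[n] = true
	}
	var out []JobSpec
	for _, s := range allSpecs {
		if wanted[s.Name] {
			out = append(out, s)
		}
	}
	return out
}

// fetchJobSpecs is the wrapper: explicit job names win over namespaces,
// and an empty request means the whole project (assumed precedence).
func fetchJobSpecs(namespaces, jobNames []string) []JobSpec {
	switch {
	case len(jobNames) > 0:
		return fetchForJobNames(jobNames)
	case len(namespaces) > 0:
		return fetchAllForNamespaces(namespaces)
	default:
		return fetchAllForProject()
	}
}

func main() {
	fmt.Println(
		len(fetchJobSpecs(nil, nil)),                // whole project
		len(fetchJobSpecs([]string{"ns-1"}, nil)),   // by namespace
		len(fetchJobSpecs(nil, []string{"job-b"})), // by job name
	) // prints: 3 2 1
}
```

Each specific function stays small and self-describing, which addresses the earlier "prepare is generic" comment as well.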

job/service.go Outdated
defer resolveDependencyHistogram.Observe(time.Since(start).Seconds())

// compile assets before resolving in parallel
for i, jSpec := range jobSpecs {
Contributor:

Compiling assets can also be part of the parallel run; any reason not to do this? Also, we decided not to fail on a failure of a single job spec and to proceed with the others, but here we are returning.

Contributor Author:

updated

job/deployer.go Outdated
return deployError
}

func (d *deployer) enrichJobSpecWithJobDependencies(ctx context.Context, jobSpecs []models.JobSpec, jobDependencies []models.JobIDDependenciesPair) ([]models.JobSpec, error) {
Contributor:

We can refactor this method:

  1. createJobSpecMap can be dissolved into fetchJobSpecsForDependentJobs, which can go into the main function call; the map creation will then just create a JobSpecMap given jobSpecs and dependentJobSpecs. Here we can optimize by grouping the jobs by dependent project and fetching only once. Or why not fetch the job specs through a foreign key constraint? Then we can avoid this fetch altogether.
  2. We will have a function to enrich a single jobSpec with JobDependencies, which will be reused here. Whenever there is nesting, it's better to break it down into multiple functions.

Contributor Author:

Updated as discussed: we are not fetching the job specs through a foreign key constraint (to avoid a performance issue); jobSpecMap creation and enrichment are now done when fetching job specs with job dependencies.

job/service.go Outdated
// resolve dependency and persist
if err := srv.resolveDependency(ctx, projectSpec, jobSpecs, progressObserver); err != nil {
// if err is caused by other than asset compilation, ignore the error.
if err != nil {
Contributor:

Why fail in case of asset compilation?

Contributor Author:

This should have been removed by now. Fixing.

Contributor Author:

removed

}
)

cmd.Flags().StringVarP(&projectName, "project", "p", projectName, "Optimus project name")
Contributor:

Looks like we are accepting the flags but they don't work, as we are always reading from config. @deryrahman is working on fixing the loading of specs from a single place; let's do the same here.

Member:

Yes, it's a known issue @sravankorumilli; it will be fixed in #274. Let's merge this once that PR is approved.

Contributor Author:

For now I have removed the project flag.

@arinda-arif arinda-arif merged commit 52f15fc into main Apr 8, 2022
@arinda-arif arinda-arif deleted the refresh-jobs branch April 8, 2022 04:02
Successfully merging this pull request may close these issues.

Support Recompiling & Refreshing of Jobs in the scheduler.