Spark ETL FLEX jobs - L2 Constructs #2
Conversation
command: {
  name: JobType.ETL,
  scriptLocation: this.codeS3ObjectUrl(props.script),
  pythonVersion: PythonVersion.THREE,
Is this required given that it is a Scala Flex job?
Removed it...
export interface PySparkFlexEtlJobProps extends JobProperties {

  /**
Are these in the RFC? Cannot see them
These are in the Props interface; they are not required to be passed during invocation.
const args: { [key: string]: string } = {};
args['--job-language'] = JobLanguage.PYTHON;

// TODO: Confirm with Glue service team what the mapping is from extra-x to job language, if any
Is this TODO still valid?
unsure...
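For context, the pattern in the hunk above seeds the job's default-arguments map with the job language. A minimal self-contained sketch (the `JobLanguage` enum values here are stand-ins, not necessarily the CDK source's):

```typescript
// Stand-in for the CDK's JobLanguage enum; values assumed for illustration.
enum JobLanguage { PYTHON = 'python', SCALA = 'scala' }

// Builds the base Glue default-arguments map, keyed by the reserved
// '--job-language' argument, as in the reviewed hunk.
function baseJobArgs(language: JobLanguage): { [key: string]: string } {
  const args: { [key: string]: string } = {};
  args['--job-language'] = language;
  return args;
}
```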
  return args;
}

private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) {
Can we add some comments here describing the function's purpose, including the arguments and return types?
done
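The requested documentation style might look like the following sketch. The types are stand-ins (not the real `iam.IRole`/`SparkUIProps`), and the body is a hypothetical simplification, shown only to illustrate a TSDoc comment covering purpose, arguments, and return type:

```typescript
// Stand-in types for illustration; the real CDK interfaces differ.
interface SparkUIProps { prefix?: string; }
interface SparkUILoggingLocation { bucketName: string; prefix?: string; }

/**
 * Resolves where the job writes Spark UI event logs.
 *
 * @param bucketName - name of the S3 bucket receiving Spark UI event logs
 * @param sparkUiProps - Spark UI configuration, e.g. an optional key prefix
 * @returns the resolved Spark UI logging location
 */
function setupSparkUI(bucketName: string, sparkUiProps: SparkUIProps): SparkUILoggingLocation {
  return { bucketName, prefix: sparkUiProps.prefix };
}
```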
 * S3 URL where additional python dependencies are located
 * @default - no extra files
 */
readonly extraPythonFiles?: string[];
Should this be in this class? This is for Scala
Removed it from the Scala props interface.
  return args;
}

private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) {
Can we add a comment describing what this function does?
Done
role: iam_role,
});

/*new glue.PySparkFlexEtlJob(stack, 'BasicPySparkFlexEtlJobv3', {
Does this need to be commented out?
This is in the test file. I thought of leaving it there since it can be used for future test cases.
A few small questions / removals
...this.checkNoReservedArgs(props.defaultArguments),
};

/*if ((!props.workerType && props.numberOrWorkers !== undefined) || (props.workerType && props.numberOrWorkers === undefined)) {
Remove. We're enforcing contractual defaults so both will always be set whether they're overridden or not.
Yes, removed it.
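The "contractual defaults" approach mentioned above replaces cross-field validation with unconditional defaulting. A minimal sketch, assuming hypothetical prop names and default values (`G.1X`, 10 workers):

```typescript
// Assumed worker-type values for illustration.
type WorkerType = 'G.1X' | 'G.2X';

interface FlexJobWorkerProps {
  workerType?: WorkerType;
  numberOfWorkers?: number;
}

// Both fields always resolve to a value, whether overridden or not, so no
// "one set without the other" validation branch is needed.
function resolveWorkerConfig(props: FlexJobWorkerProps): { workerType: WorkerType; numberOfWorkers: number } {
  return {
    workerType: props.workerType ?? 'G.1X',      // contractual default
    numberOfWorkers: props.numberOfWorkers ?? 10, // contractual default
  };
}
```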
//args['--extra-py-files'] = props.extraPythonFiles.map(code => this.codeS3ObjectUrl(code)).join(',');
}

// if (props.extraJars && props.extraJars?.length > 0) {
Clarify whether extraFiles is used for all languages, but definitely don't need extraJars in the pyspark job.
Removed extraJars from the PySpark job. Added --extra-py-files and --extra-files for the PySpark Flex ETL job. The glueVersion property is defined in both job-execution.ts (SharedJobExecutableProps) and job.ts (JobProperties); the two definitions conflict, so the SharedJobExecutableProps interface could not be used.
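A sketch of how the optional extra-file props described above could map to Glue default arguments. The prop names and the direct use of S3 URL strings are assumptions for illustration (the real construct resolves `Code` objects to S3 URLs first):

```typescript
// Assumed prop shape: S3 URLs passed directly, for illustration only.
interface PySparkExtraFileProps {
  extraPythonFiles?: string[]; // S3 URLs of extra .py dependencies
  extraFiles?: string[];       // S3 URLs of extra data/config files
}

// Emits Glue's '--extra-py-files' / '--extra-files' arguments only when the
// corresponding optional prop is present and non-empty.
function extraFileArgs(props: PySparkExtraFileProps): Record<string, string> {
  const args: Record<string, string> = {};
  if (props.extraPythonFiles && props.extraPythonFiles.length > 0) {
    args['--extra-py-files'] = props.extraPythonFiles.join(',');
  }
  if (props.extraFiles && props.extraFiles.length > 0) {
    args['--extra-files'] = props.extraFiles.join(',');
  }
  return args;
}
```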
}

/**
copy-paste?
Fixed.
public readonly sparkUILoggingLocation?: SparkUILoggingLocation;

/**
 * PySparkFlexEtlJob constructor
copy-paste?
Fixed.
numberOfWorkers: props.numberOrWorkers ? props.numberOrWorkers : 10,
maxRetries: props.maxRetries,
executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined,
//notificationProperty: props.notifyDelayAfter ? { notifyDelayAfter: props.notifyDelayAfter.toMinutes() } : undefined,
Why is this commented out?
It is optional. It was commented out in the PySpark ETL class; I have now uncommented it, so it is an optional parameter to pass.
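The now-uncommented `notificationProperty` line reduces to a conditional mapping from an optional duration to Glue's notification property. A self-contained sketch, using a stand-in `Duration` class instead of the CDK one:

```typescript
// Minimal stand-in for aws-cdk-lib's Duration, for illustration only.
class Duration {
  private constructor(private readonly mins: number) {}
  static minutes(m: number): Duration { return new Duration(m); }
  toMinutes(): number { return this.mins; }
}

// Returns a notificationProperty object only when notifyDelayAfter is given,
// mirroring the ternary in the reviewed hunk.
function notificationProperty(notifyDelayAfter?: Duration): { notifyDelayAfter: number } | undefined {
  return notifyDelayAfter ? { notifyDelayAfter: notifyDelayAfter.toMinutes() } : undefined;
}
```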
//args['--extra-py-files'] = props.extraPythonFiles.map(code => this.codeS3ObjectUrl(code)).join(',');
}

// if (props.extraJars && props.extraJars?.length > 0) {
Confirm extraFiles requirements
Added --extra-jars, --extra-files, and --user-jars-first to the ScalaSpark Flex job; all are optional. There is a conflict for the glueVersion property between the interfaces in job-execution.ts (SharedJobExecutableProps) and job.ts (JobProperties), so the properties from job-execution.ts could not be used.
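The three optional Scala-side arguments mentioned above can be mapped the same way as the PySpark extras. A sketch with assumed prop names and direct S3 URL strings (the real construct resolves `Code` objects first):

```typescript
// Assumed prop shape for illustration; not the actual CDK interface.
interface ScalaSparkExtraProps {
  extraJars?: string[];    // S3 URLs of extra JAR dependencies
  extraFiles?: string[];   // S3 URLs of extra files
  userJarsFirst?: boolean; // put user JARs ahead of Glue's on the classpath
}

// Emits the optional '--extra-jars', '--extra-files', and '--user-jars-first'
// Glue arguments only when the corresponding prop is set.
function scalaExtraArgs(props: ScalaSparkExtraProps): Record<string, string> {
  const args: Record<string, string> = {};
  if (props.extraJars?.length) args['--extra-jars'] = props.extraJars.join(',');
  if (props.extraFiles?.length) args['--extra-files'] = props.extraFiles.join(',');
  if (props.userJarsFirst) args['--user-jars-first'] = 'true';
  return args;
}
```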
Issue # (if applicable)
https://app.asana.com/0/1204099321156022/1204826115759041/f
Description of changes
Per RFC#498, added L2 constructs for the Spark Flex ETL job (Python & Scala).
Added integration and unit tests for both constructs (Python & Scala Flex ETL jobs).
Refer to the RFC for more details (Spark ETL FLEX jobs) - aws/aws-cdk-rfcs#497
Description of how you validated changes
Validated through Integration & Unit testing
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license