Commit 2a44cbf

Merge pull request #44 from aws-samples/dockerfilefixes
Dockerfilefixes
2 parents 789a230 + a480489 commit 2a44cbf

File tree

2 files changed: +9 -6 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ Here is a summary of the main steps in the script:
 1. The lambda_handler function is the entry point for the Lambda function. It receives an event object and a context object as parameters.
 2. The s3_bucket_script and input_script variables are used to specify the Amazon S3 bucket and object key where the Spark script is located.
 3. The boto3 module is used to download the Spark script from Amazon S3 to a temporary file on the Lambda function's file system.
-4. The os.environ dictionary is used to set the PYSPARK_SUBMIT_ARGS environment variable, which is required by the Spark application to run.
+4. The os.environ dictionary is used to store any arguments passed via the Lambda event.
 5. The subprocess.run method is used to execute the spark-submit command, passing in the path to the temporary file where the Spark script was downloaded. The event payload received by the Lambda is passed on to the Spark application via the event argument.
 Overall, this script enables you to execute a Spark script in AWS Lambda by downloading it from an S3 bucket and running it using the spark-submit command. The script can be configured by setting environment variables, such as the PYSPARK_SUBMIT_ARGS variable, to control the behavior of the Spark application. </p>
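The new step 4 can be sketched in isolation. The loop below mirrors the behavior this commit introduces in sparkLambdaHandler.py; the S3 URIs and the helper name are placeholders, not from this repo:

```python
import os

def apply_event_to_env(event: dict) -> None:
    # Mirror the commit's change: copy every key/value pair from the
    # Lambda event into the process environment so that spark-submit
    # (and the Spark script it launches) can read them.
    # Note: os.environ only accepts string keys and values, so a real
    # Lambda payload would need to contain only strings for this to work.
    for key, value in event.items():
        os.environ[key] = value

# Hypothetical event payload; the bucket paths are illustrative only.
apply_event_to_env({"INPUT_PATH": "s3://example-bucket/in/",
                    "OUTPUT_PATH": "s3://example-bucket/out/"})
```

With this change, any key in the invocation payload (not just PYSPARK_SUBMIT_ARGS, INPUT_PATH, and OUTPUT_PATH) becomes visible to the Spark job as an environment variable.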

sparkLambdaHandler.py

Lines changed: 8 additions & 5 deletions
@@ -34,14 +34,17 @@ def spark_submit(s3_bucket_script: str,input_script: str, event: dict)-> None:
     Submits a local Spark script using spark-submit.
     """
     # Set the environment variables for the Spark application
-    pyspark_submit_args = event.get('PYSPARK_SUBMIT_ARGS', '')
-    # Source input and output if available in event
-    input_path = event.get('INPUT_PATH','')
-    output_path = event.get('OUTPUT_PATH', '')
+    # pyspark_submit_args = event.get('PYSPARK_SUBMIT_ARGS', '')
+    # # Source input and output if available in event
+    # input_path = event.get('INPUT_PATH','')
+    # output_path = event.get('OUTPUT_PATH', '')
+
+    for key, value in event.items():
+        os.environ[key] = value
     # Run the spark-submit command on the local copy of the script
     try:
         logger.info(f'Spark-Submitting the Spark script {input_script} from {s3_bucket_script}')
-        subprocess.run(["spark-submit", "/tmp/spark_script.py", "--event", json.dumps(event)], check=True)
+        subprocess.run(["spark-submit", "/tmp/spark_script.py", "--event", json.dumps(event)], check=True, env=os.environ)
     except Exception as e:
         logger.error(f'Error Spark-Submit with exception: {e}')
         raise e
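On the receiving side, the downloaded Spark script is expected to pick the payload back up from the --event flag that the hunk above appends to the spark-submit command. A minimal sketch of that parsing, assuming an argparse-based script (the repo's actual sample scripts may handle arguments differently):

```python
import argparse
import json

# Hypothetical parser for the "--event" flag that sparkLambdaHandler.py
# passes on the spark-submit command line.
parser = argparse.ArgumentParser()
parser.add_argument("--event", default="{}",
                    help="JSON-encoded Lambda event payload")

# Simulate the argv that spark-submit would forward to the script;
# the payload contents here are illustrative only.
args = parser.parse_args(["--event",
                          json.dumps({"INPUT_PATH": "s3://example-bucket/in/"})])
event = json.loads(args.event)
```

Because the commit also copies the event into os.environ and passes env=os.environ to subprocess.run, a script could equivalently read the same values via os.environ instead of parsing --event.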
