Automatically clean up AWS Batch temporary folder #1450
Comments
This looks like a duplicate of #452, for which there's no quick solution, though the plan is to tackle it in a more general manner at some point. Have you taken into consideration using an S3 lifecycle policy to clean up the bucket? It works beautifully: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
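(For reference, such a rule can be attached to the work bucket from the command line. A minimal sketch, assuming a hypothetical bucket named my-nextflow-bucket and a work/ prefix; substitute your own names and retention period.)

# expire everything under the work/ prefix after 7 days (illustrative values)
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-nextflow-work",
      "Filter": { "Prefix": "work/" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-nextflow-bucket \
  --lifecycle-configuration file://lifecycle.json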
My apologies; I should explain how this is really a different issue and concern. There is a group of files staged to the working instance in an ephemeral scratch directory, which is distinct from the S3 bucket used for the working directory. That's the set of files I want to target in this issue. Does that help explain?
If it were possible to write this code in the process, it would constitute a reasonable solution:
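(The snippet being referred to is not preserved in this thread; what follows is only an illustrative sketch of that kind of per-task cleanup, assuming the ephemeral scratch directory is exposed to the task as NXF_SCRATCH.)

# illustrative only: clear the ephemeral scratch directory once the task has
# finished; the ${NXF_SCRATCH:?} guard aborts if the variable is unset, so the
# command can never expand to rm -rf /*
rm -rf "${NXF_SCRATCH:?}"/* || true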
Note that this would not impact any files in the work directory.
I see. But that's already done. If you look at the command launcher you will see:

on_exit() {
    exit_status=${nxf_main_ret:=$?}
    printf $exit_status | /home/ec2-user/miniconda/bin/aws --region eu-west-1 s3 cp --only-show-errors - s3://nf-course/work/ad/84c59e22b4b0d4dd038b35e9885a05/.exitcode || true
    set +u
    [[ "$tee1" ]] && kill $tee1 2>/dev/null
    [[ "$tee2" ]] && kill $tee2 2>/dev/null
    [[ "$ctmp" ]] && rm -rf $ctmp || true
    rm -rf $NXF_SCRATCH || true
    exit $exit_status
}

Oh, that's great to know! Does it only get invoked on job success, or whenever a process finishes for any reason?
It should be invoked in all cases.
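(For anyone who wants to confirm this on their own runs: one way, sketched with a placeholder bucket and task hash, is to stream the generated launcher back from the work directory and look for the on_exit handler.)

# illustrative: print the on_exit handler from a task's generated launcher;
# replace <bucket> and <task-hash> with values from your own run
aws s3 cp s3://<bucket>/work/<task-hash>/.command.run - | grep -A 10 'on_exit()'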
OK, in that case I'm all set! Thank you for your quick and helpful response!
New feature
When using the awsbatch executor, there is no reason to keep files in the temporary directory after a process has finished execution. I suggest that Nextflow automatically delete all of the files in the ephemeral temporary directory after execution.

The accumulation of large files in the temporary directory of AWS Batch workers can cause big problems when they fill up the partition, effectively blocking an entire workflow. While it is possible to write a workflow which fixes this with afterScript "rm -r *", such a solution is entirely incompatible with any executor which uses a shared filesystem (local, SLURM, etc.).

Usage scenario
The main use case is a user running the awsbatch executor. The desired scenario is that when they run a workflow, the partition used for scratch space is kept to a minimum, storing only those files which are being used by actively running tasks. The current scenario is that long-running workflows accumulate files in the scratch partition, eventually filling it up and completely locking up those workers.

For the developer, this improvement would also mean that I can take out the afterScript "rm -r *" command in my processes, which will make them easily compatible with local execution modes, and which will also protect me against running out of space in the scratch partition on AWS Batch.

Suggested implementation
I regret to say that I do not understand the Nextflow codebase well enough to suggest the most efficient implementation of this idea.
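(Until such a feature exists, one interim workaround, sketched here on the assumption that NXF_SCRATCH points at the ephemeral scratch directory when one is in use and is empty or unset otherwise, is to make the afterScript cleanup a no-op on shared-filesystem executors.)

# hypothetical afterScript body: only clear the directory when an ephemeral
# scratch dir is in use, so shared-filesystem executors (local, SLURM, etc.)
# are left untouched
if [[ -n "${NXF_SCRATCH:-}" ]]; then
    rm -rf "${NXF_SCRATCH:?}"/* || true
fi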