
Error when using file.size() to request memory for job on cluster #922

Closed

PhilPalmer opened this issue Nov 12, 2018 · 10 comments

@PhilPalmer commented Nov 12, 2018

Bug report

Expected behavior and actual behavior

For the nf-core/deepvariant dev branch, memory is set as follows in base.config:

memory = { bam.size() < 1000000000 ? 4.GB : check_max( ( bam.size()/1024/1024/1024) * 10.GB * task.attempt, 'memory')}

When the pipeline is run on a computing cluster, the job fails to submit even though the requested resources are within the limits (the requested node has a maximum of 28 cores, 48 hours of wall time and 128 GB RAM).
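For context, here is a plain-Groovy sketch of the arithmetic in that rule, using a hypothetical BAM size (not taken from this report) chosen to reproduce the fractional request seen in the log below:

  long bamSize = 6_163_000_000L                // hypothetical size, ~6.16e9 bytes
  def factor = bamSize / 1024 / 1024 / 1024    // Groovy division yields a BigDecimal, ~5.74
  println factor * 10                          // ~57.4 -> emitted in the PBS directive as mem=57.4gb

Because the division does not produce a whole number, the memory request carries a decimal point.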

Program output

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  qsub -N nf-make_example .command.run

Command exit status:
  168

Command output:
  qsub: submit error (Illegal attribute or resource value for Resource_List.mem)

Work dir:
  /beegfs/work/iiipe01/2018-10-21_DeepVariant_Test/work/d7/78ed3a18fda374310837f6ce93b9e7

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit

Requested resources from .command.run:

#PBS -N nf-make_example
#PBS -o /beegfs/work/iiipe01/2018-10-21_DeepVariant_Test/work/d7/78ed3a18fda374310837f6ce93b9e7/.command.log
#PBS -j oe
#PBS -q short
#PBS -l nodes=1:ppn=20
#PBS -l walltime=10:00:00
#PBS -l mem=57.4gb
cd /beegfs/work/iiipe01/2018-10-21_DeepVariant_Test/work/d7/78ed3a18fda374310837f6ce93b9e7

The issue only appeared after using file.size() to set the memory; see here for the original issue/thread.

Steps to reproduce the problem

@apeltzer please feel free to add more information, such as any steps and environment details, if possible. Thanks!

Environment

  • Nextflow version: [?]
  • Java version: [?]
  • Operating system: [macOS, Linux, etc]
@rsuchecki (Contributor)

Not convinced this is a NF issue. Can mem be a float on PBS?

@apeltzer (Contributor)

Nextflow Version is 18.10.1, Java 8u152, Linux (CentOS7).

We never had that issue before; PBS was fine with the jobs being submitted, and this only came up with that specific feature. According to https://www.osc.edu/supercomputing/batch-processing-at-osc/pbs-directives-summary, one can specify the request in bytes, MB or GB.

Not sure whether we can have 57.4GB though...

@pditommaso (Member)

What executor is this? PBS or SGE?

@apeltzer (Contributor)

PBS/Torque

@pditommaso (Member)

Somehow this is expected, because the result of the dynamic rule you have specified is taken as-is:

result << "-l" << "mem=${task.config.memory.toString().replaceAll(/[\s]/,'').toLowerCase()}"

Maybe NF should always use MiB to avoid the decimal. For now you should round to giga units.
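One hypothetical way to do that rounding (plain Groovy integer division, not a Nextflow-specific API):

  def wholeGb = bam.size().intdiv(1024L * 1024 * 1024)   // integer division drops the fraction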

@apeltzer (Contributor)

.toGiga() works for that I suppose?
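For reference, .toGiga() is a method of nextflow.util.MemoryUnit rather than of the raw long returned by file.size(), so the byte count would first need wrapping. An untested sketch:

  new nextflow.util.MemoryUnit(bam.size()).toGiga()   // whole gigabytes as a long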

@pditommaso (Member)

I think the problem is (bam.size()/1024/1024/1024), which returns a decimal. If you replace it with bam.size() >> 30 (a shift operation) it should work.
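A sketch of the base.config rule rewritten along those lines (untested; check_max is the pipeline's own helper, as in the original rule above):

  memory = { bam.size() < 1000000000 ? 4.GB : check_max( (bam.size() >> 30) * 10.GB * task.attempt, 'memory' ) }

The right-shift by 30 divides the byte count by 2^30 using integer arithmetic, so the multiplier is a whole number and the rendered request has no decimal point.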

@PhilPalmer (Author)

Thanks @pditommaso. Unfortunately, while the job can now be submitted, the pipeline fails on the make_examples process with an error I had previously:

Caused by:
  nextflow.util.MemoryUnit cannot be cast to java.base/java.lang.Long

Exception in thread "Task submitter" java.lang.IllegalArgumentException: Cannot compare java.lang.Long with value '3,925,783' and nextflow.util.MemoryUnit with value '1 GB'

I think this is because the types of bam.size() and 1 GB are different. Do you know how I might solve this?
One way might be to change the code back to how it was and convert the result of (bam.size()/1024/1024/1024) to an int. Is there a .toInt() function or something similar? E.g.:

memory = { bam.size() < 1000000000 ? 4.GB : check_max( (bam.size()/1024/1024/1024).toInt() * 10.GB * task.attempt, 'memory')}
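For illustration, a guess at the kind of mix-up behind that exception (not the actual pipeline code): a raw byte count and a MemoryUnit literal cannot be compared directly.

  // bam.size() returns a plain long (bytes); 1.GB is a nextflow.util.MemoryUnit
  // bam.size() < 1.GB            // throws: cannot compare Long with MemoryUnit
  // bam.size() < 1.GB.toBytes()  // compares bytes with bytes instead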

@pditommaso (Member)

I'm lost here; I don't see the use of bam.size() >> 30 in your example, and surely .toInt() does not exist.

@PhilPalmer (Author)

I couldn't get bam.size() >> 30 to work, but realised I had made a mistake. No, .toInt() doesn't exist; I was hoping you might know of a similar method, but it doesn't matter now. Thanks
