Running cromwell.jar on HPC opens too many file handles #7571

Open · AlsoATraveler opened this issue Oct 14, 2024 · 0 comments

Hi,
I am running 2 Cromwell jobs, and each job opens a large number of file handles. I checked with "lsof | grep username | awk '{print $1"\t"$2}' | sort | uniq -c | sort -nr". If I run 50 jobs, I can no longer log in to the machine over SSH. How can I reduce the number of file handles each job opens? Thanks. (A diagnostic sketch follows the output below.)

   6533 cromwell-       4751
   5687 cromwell-       2381
    940 pool-6-th       4751
    940 pool-6-th       2381
    940 pool-5-th       2381
    705 pool-5-th       4751
    611 GC      4751
    611 GC      2381
    470 pool-9-th       4751
    470 pool-9-th       2381
    470 pool-8-th       4751
    470 pool-8-th       2381
    470 pool-7-th       4751
    470 pool-7-th       2381
    470 pool-10-t       4751
    470 pool-10-t       2381
    282 G1      4751
    282 G1      2381
    188 blaze-tic       4751
    188 blaze-tic       2381
     94 VM      4751
     94 VM      2381
     94 java    4751
     94 java    2381
     94 db-9    4751
     94 db-9    2381
     94 db-8    4751
     94 db-8    2381
     94 db-7    4751
     94 db-7    2381
     94 db-6    4751
     94 db-6    2381
     94 db-5    4751
     94 db-5    2381
     94 db      4751
     94 db-4    4751
     94 db-4    2381
     94 db-3    4751
     94 db-3    2381
     94 db-2    4751
     94 db      2381
     94 db-2    2381
     94 db-20   4751
     94 db-20   2381
     94 db-19   4751
     94 db-19   2381
     94 db-18   4751
     94 db-18   2381
     94 db-17   4751
     94 db-17   2381
     94 db-16   4751
     94 db-16   2381
     94 db-15   4751
     94 db-15   2381
     94 db-1    4751 ...
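
For reference, a minimal sketch of how to verify the real descriptor counts and the limits involved (assuming a Linux host with /proc mounted; substitute your own PIDs for 4751 and 2381). Note that lsof prints one line per thread even though threads share a descriptor table, so the totals above likely overstate the true count; the repeated 94 for single-instance threads (java, VM, db-N) suggests each JVM actually holds on the order of 94 descriptors:

# Soft and hard open-file limits for the current session
ulimit -Sn
ulimit -Hn

# Descriptors actually held by one Cromwell JVM (one entry per open fd)
ls /proc/4751/fd | wc -l

# System-wide ceiling on open file handles
cat /proc/sys/fs/file-max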

This is my java command:

java -Xms10M  -Xmx125M -Dconfig.file=SGE.conf -jar cromwell-86.jar run xxx.wdl --inputs xxx.json
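
One way to shrink the per-workflow footprint is to run a single long-lived Cromwell server and submit workflows to it over the REST API, rather than starting a separate JVM (each with its own thread pools and database connection pool) per run invocation. A hedged sketch, assuming Cromwell's default webservice port of 8000 and the same config file; the heap sizes here are placeholders to tune:

# One long-lived server instead of one JVM per workflow
java -Xms256M -Xmx2G -Dconfig.file=SGE.conf -jar cromwell-86.jar server

# Submit each workflow to the running server
curl -X POST http://localhost:8000/api/workflows/v1 \
  -F workflowSource=@xxx.wdl \
  -F workflowInputs=@xxx.json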

SGE.conf file:

# Documentation:
# https://cromwell.readthedocs.io/en/stable/backends/SGE

backend {
  default = SGE

  providers {
    SGE {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {

        # Limits the number of concurrent jobs
        concurrent-job-limit = 5

        # If an 'exit-code-timeout-seconds' value is specified:
        # - 'check-alive' will be run at this interval for every job
        # - if a job is found to be not alive, and no RC file appears after
        #   this interval, then the job will be marked as Failed
        # Warning: if set, Cromwell will run 'check-alive' for every job at this interval

        exit-code-timeout-seconds = 120

        runtime-attributes = """
        Int cpu = 1
        Float? memory_gb
        String? sge_queue = "xxx"
        String? sge_project = "xxx"
        """

        submit = """
        qsub \
        -terse \
        -V \
        -b y \
        -N ${job_name} \
        -wd ${cwd} \
        -o ${out}.qsub \
        -e ${err}.qsub \
        ${"-l num_proc=" + cpu + ",virtual_free=" + memory_gb + "g"} \
        ${"-q " + sge_queue} \
        ${"-P " + sge_project} \
        -binding ${"linear:" + cpu} \
        /usr/bin/env bash ${script}
        """

        kill = "qdel ${job_id}"
        check-alive = "qstat -j ${job_id}"
        job-id-regex = "(\\d+)"
      }
    }
  }
}

call-caching {
    enabled = true
    invalidate-bad-cache-results = true
}
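
The db-1 through db-20 threads in the lsof output look like the 20-thread Slick/HikariCP connection pool Cromwell creates for its metadata database (an in-memory HSQLDB by default). If each JVM must stay small, the pool can be narrowed in the same config file; a minimal sketch, where numThreads is a standard Slick setting and the value 5 is an assumption to tune:

database {
  db {
    # Shrink the Slick connection pool from its default of 20
    numThreads = 5
  }
}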
