March 27, 2018, Tuesday
Liya Wang edited this page Mar 29, 2018
- Issues
  - Fixed issues related to displaying folder output in the right panel (e.g. FastQC, MultiQC, JBrowse_out, etc.)
  - Index files are not an issue (the bam file will be retrieved before the bam.bai file for the workflow diagram)
  - Inputs and parameters are now reset after job submission
  - Removed file uploading support
  - Converted all apps back to direct visualization and reconfigured Apache (in both places in the file)
  - Still need to fix the sci_data issue: https://pods.iplantcollaborative.org/jira/browse/DS-256
- Server setup
  - Set up the public site on the master branch
  - Set up de on the maizecode branch
  - Rebuilding all workflows for both sites
  - Need to fix brebiou as well, which has MySQL installed with the same password
- Improvements
  - Mermaid: https://github.com/knsv/mermaid/issues/580
    - Not sure how to add it to our setup
    - In the online editor (https://mermaidjs.github.io/mermaid-live-editor) you can add `"flowchart": { "curve": "basis" }` to the configuration
  - Maximal number of concurrent data transfers: https://unix.stackexchange.com/questions/416498/set-a-limit-on-concurrent-ssh-sftp-connections-to-2-per-user
    - Need to discuss with Rion and Peter
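The server-side approach from the linked Unix.SE thread can be sketched in `sshd_config`. This is a hedged, untested config fragment; the user name `agave` is a placeholder for whatever account performs the transfers:

```
# /etc/ssh/sshd_config -- sketch based on the linked Unix.SE approach.
# "agave" is a placeholder account name.
Match User agave
    # Cap multiplexed sessions per network connection. Note this is not a
    # strict per-user total across separate TCP connections.
    MaxSessions 2
```

`MaxStartups` can additionally throttle unauthenticated connection attempts globally, but neither option alone gives an exact per-user limit, which is part of what the thread discusses.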
  - Agave actually only does a server-side copy when both the host and the user account are the same (so it may not be necessary to have compute and storage on the same system? Need to test)
    - We observed a lot of NFS traffic on ascutney, brie7, and wildcat via bmon
    - The bottleneck is the NFS mount; a better setup would be to purchase ascutney with plenty of storage space so all copies happen locally (when Agave stages data from wildcat to wildcat on brie7) instead of relying on NFS transfers
    - The pipeline idea also involves ascutney in transferring data
  - Possible improvements given the current setup
    - Keep input data on brie7? It would need to be mounted on wildcat for h5ai, so data would be copied from brie7 to wildcat during staging, which could be ~2x faster since it is not reading and writing on the same server
      - Benchmark results (only minor improvements for moving ~24 GB of data from brie7):
```
[liyawang@brie7 agave]$ time cp ../tacc/jobs_cshl/maizecode/test/* .
real    4m39.879s
user    0m0.042s
sys     0m57.115s

[liyawang@brie7 agave]$ time cp *.gz *.fastq ../tacc/jobs_cshl/maizecode/test/.
real    7m9.410s (5m57s)
user    0m0.042s
sys     1m1.983s

[liyawang@brie7 agave]$ time cp ../tacc/jobs_cshl/maizecode/test/* ../tacc/jobs_cshl/maizecode/test1/.
real    9m27.909s
user    0m0.047s
sys     1m4.079s

[liyawang@brie7 agave]$ time cp *.gz *.fastq test1/.
real    2m58.937s
user    0m0.163s
sys     0m42.421s

[liyawang@wildcat maizecode]$ time cp test/* test1/.
real    0m30.816s
user    0m0.125s
sys     0m30.696s
```
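The same `time cp src/* dst/.` pattern can be rerun on any host with a small synthetic dataset to sanity-check a mount. A minimal sketch; the paths and 1 MB file sizes are placeholders, not the real ~24 GB inputs:

```shell
# Create a small synthetic dataset standing in for the fastq/gz inputs,
# then time a local copy the same way as the benchmarks above.
mkdir -p /tmp/nfs_bench/src /tmp/nfs_bench/dst
for i in 1 2 3; do
  dd if=/dev/zero of=/tmp/nfs_bench/src/file$i.fastq bs=1M count=1 2>/dev/null
done
time cp /tmp/nfs_bench/src/* /tmp/nfs_bench/dst/.
ls /tmp/nfs_bench/dst
```

Pointing `src` at an NFS-mounted path and `dst` at local disk (or vice versa) reproduces the comparisons above.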
      - Possibly useful links
        - https://serverfault.com/questions/547604/transfer-between-two-nfs-shares-on-the-same-machine
        - Also try Googling 'optimizing NFS mount'
    - Keep index files in a fixed location on wildcat so no staging is needed
    - Limit the maximum number of jobs that can run concurrently, maybe to 6? Ideally we would limit it per half-hour interval or so; need to check SLURM
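If SLURM accounting (slurmdbd) is enabled, the simple concurrency cap might be set as below; this is a hedged sketch, with `liyawang` and `normal` as placeholder user/QOS names, and it does not cover the half-hour-window idea, which has no direct SLURM equivalent:

```
# Cap concurrent running jobs for one user (association limit):
sacctmgr modify user liyawang set MaxJobs=6

# Or cap per-user concurrency for everyone on a QOS:
sacctmgr modify qos normal set MaxJobsPerUser=6
```

`MaxSubmitJobs` is the related knob if queued (not just running) jobs should also be limited.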
  - Interface improvements discussed with Jerry
    - Speeding up workflow loading
      - Just use the workflow JSON
      - Don't load jobs or search for output until the user clicks to expand a job (might need to add a loading spinner?)
    - Speeding up diagram loading
      - Just use the workflow JSON and load jobs for status and the output folder
      - Link output files to the folder instead of to individual files (so there is no longer a need to search for files)
      - Only display output IDs (no longer try to match them with individual files)
    - Add a tutorial column/link for public workflows