March 27, 2018, Tuesday
Liya Wang edited this page Mar 29, 2018
- Issues
  - Fixed issues related to displaying folder output in the right panel (e.g. FastQC, MultiQC, JBrowse_out, etc.)
  - Index files are not an issue (the bam file will be retrieved before the bam.bai file for the workflow diagram)
  - Inputs and parameters are now reset after job submission
  - Removed file uploading support
  - Converted all apps back to direct visualization and reconfigured Apache (in both places in the file)
  - Still need to fix the sci_data issue: https://pods.iplantcollaborative.org/jira/browse/DS-256
- Server setup
  - Set up the public site on the master branch
  - Set up de on the maizecode branch
  - Rebuilding all workflows for both sites
  - Need to fix brebiou as well, which has MySQL installed with the same password
- Improvements
  - Mermaid: https://github.com/knsv/mermaid/issues/580
    - Not sure how to add it to our setup
    - In the online editor (https://mermaidjs.github.io/mermaid-live-editor) you can add `"flowchart": { "curve": "basis" }` to the configuration
  - Maximal number of concurrent data transfers: https://unix.stackexchange.com/questions/416498/set-a-limit-on-concurrent-ssh-sftp-connections-to-2-per-user
    - Need to discuss with Rion and Peter
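The server-side approach from the linked Unix.SE thread can be sketched in `sshd_config`. This is a hedged, untested config fragment; the user name `agave` is a placeholder for whatever account performs the transfers:

```
# /etc/ssh/sshd_config -- sketch based on the linked Unix.SE approach.
# "agave" is a placeholder account name.
Match User agave
    # Cap multiplexed sessions per network connection. Note this is not a
    # strict per-user total across separate TCP connections.
    MaxSessions 2
```

`MaxStartups` can additionally throttle unauthenticated connection attempts globally, but neither option alone gives an exact per-user limit, which is part of what the thread discusses.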
  - Agave actually only does a server-side copy when both the host and the user account are the same (so it may not be necessary to have compute and storage on the same system? Need to test)
    - We observed a lot of NFS traffic on ascutney, brie7, and wildcat via bmon
    - The bottleneck is the NFS mount; a better setup would be to purchase ascutney with plenty of storage space so all copies happen locally (when Agave stages data from wildcat to wildcat on brie7) instead of relying on NFS transfers
    - The pipeline idea also involves ascutney in transferring data
  - Possible improvements given the current setup
    - Keep input data on brie7? It would need to be mounted on wildcat for h5ai, so data would be copied from brie7 to wildcat during staging, which could be ~2x faster since it is not reading and writing on the same server
      - Benchmark results (only minor improvements for moving ~24 GB of data from brie7):
```
[liyawang@brie7 agave]$ time cp ../tacc/jobs_cshl/maizecode/test/* .
real    4m39.879s
user    0m0.042s
sys     0m57.115s

[liyawang@brie7 agave]$ time cp *.gz *.fastq ../tacc/jobs_cshl/maizecode/test/.
real    7m9.410s (5m57s)
user    0m0.042s
sys     1m1.983s

[liyawang@brie7 agave]$ time cp ../tacc/jobs_cshl/maizecode/test/* ../tacc/jobs_cshl/maizecode/test1/.
real    9m27.909s
user    0m0.047s
sys     1m4.079s

[liyawang@brie7 agave]$ time cp *.gz *.fastq test1/.
real    2m58.937s
user    0m0.163s
sys     0m42.421s

[liyawang@wildcat maizecode]$ time cp test/* test1/.
real    0m30.816s
user    0m0.125s
sys     0m30.696s
```
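The same `time cp src/* dst/.` pattern can be rerun on any host with a small synthetic dataset to sanity-check a mount. A minimal sketch; the paths and 1 MB file sizes are placeholders, not the real ~24 GB inputs:

```shell
# Create a small synthetic dataset standing in for the fastq/gz inputs,
# then time a local copy the same way as the benchmarks above.
mkdir -p /tmp/nfs_bench/src /tmp/nfs_bench/dst
for i in 1 2 3; do
  dd if=/dev/zero of=/tmp/nfs_bench/src/file$i.fastq bs=1M count=1 2>/dev/null
done
time cp /tmp/nfs_bench/src/* /tmp/nfs_bench/dst/.
ls /tmp/nfs_bench/dst
```

Pointing `src` at an NFS-mounted path and `dst` at local disk (or vice versa) reproduces the comparisons above.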
      - Possibly useful links
        - https://serverfault.com/questions/547604/transfer-between-two-nfs-shares-on-the-same-machine
        - Also try Googling 'optimizing NFS mount'
    - Keep index files in a fixed location on wildcat so no staging is needed
    - Limit the maximum number of jobs that can run concurrently, maybe to 6? Ideally we would limit it per half-hour interval or so; need to check SLURM
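If SLURM accounting (slurmdbd) is enabled, the simple concurrency cap might be set as below; this is a hedged sketch, with `liyawang` and `normal` as placeholder user/QOS names, and it does not cover the half-hour-window idea, which has no direct SLURM equivalent:

```
# Cap concurrent running jobs for one user (association limit):
sacctmgr modify user liyawang set MaxJobs=6

# Or cap per-user concurrency for everyone on a QOS:
sacctmgr modify qos normal set MaxJobsPerUser=6
```

`MaxSubmitJobs` is the related knob if queued (not just running) jobs should also be limited.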
  - Interface improvements discussed with Jerry
    - Speeding up workflow loading
      - Just use the workflow JSON
      - Don't load jobs or search for output until the user clicks to expand a job (might need to add a loading spinner?)
    - Speeding up diagram loading
      - Just use the workflow JSON and load jobs for status and the output folder
      - Link output files to the folder instead of to individual files (so there is no longer a need to search for files)
      - Only display output IDs (no longer try to match them with individual files)
    - Add a tutorial column/link for public workflows