
update_bedgraph

dytk2134 edited this page Sep 12, 2018 · 1 revision

Introduction

Update the sequence IDs and coordinates of a bedGraph file using an alignment file generated by the fasta_diff program.

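The coordinates are converted with the following formulas, where match_old_start and match_new_start come from the match.tsv alignment whose old region contains the bedGraph line:

  • bedGraph_new_start = bedGraph_old_start - match_old_start + match_new_start
  • bedGraph_new_end = bedGraph_old_end - match_old_start + match_new_start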
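Both formulas add the same offset, match_new_start - match_old_start, to the two ends, so the length of each interval is preserved. A minimal Python sketch of that arithmetic; the function and parameter names are illustrative and are not taken from the update_bedgraph source:

```python
from typing import Tuple


def convert_interval(old_start: int, old_end: int,
                     match_old_start: int, match_new_start: int) -> Tuple[int, int]:
    """Map a bedGraph interval from old to new coordinates.

    Both ends are shifted by the same offset, so the interval length is kept.
    """
    offset = match_new_start - match_old_start
    return old_start + offset, old_end + offset


# Line "Scaffold4139 3299 3324 1" with the CASE2 alignment
# (match_old_start = 2368, match_new_start = 0):
print(convert_interval(3299, 3324, 2368, 0))  # (931, 956)
```

The result matches the first updated line in CASE2 (JHOM01041610.1 931 956).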

A line in the bedGraph file is removed in either of the following situations:

  • the interval from bedGraph_old_start to bedGraph_old_end is not fully contained within match_old_start to match_old_end of any alignment for that sequence
  • the sequence name is not found in the match.tsv file (output from the fasta_diff program)
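The two removal rules and the conversion can be combined into one lookup. The sketch below is a hypothetical Python version: it scans the alignments for the line's sequence name and keeps the line only if a single aligned block contains the whole interval. The Alignment type and update_interval name are assumptions made for illustration, both files are treated as 0-based, half-open coordinates (also an assumption), and the real tool may index the alignments differently.

```python
from typing import List, NamedTuple, Optional, Tuple


class Alignment(NamedTuple):
    """One row of match.tsv (hypothetical in-memory representation)."""
    old_id: str
    old_start: int
    old_end: int
    new_id: str
    new_start: int
    new_end: int


def update_interval(chrom: str, start: int, end: int,
                    alignments: List[Alignment]) -> Optional[Tuple[str, int, int]]:
    """Return (new_chrom, new_start, new_end), or None if the line is removed."""
    for aln in alignments:
        if aln.old_id != chrom:
            continue
        # The whole interval must lie inside this one aligned block.
        if aln.old_start <= start and end <= aln.old_end:
            offset = aln.new_start - aln.old_start
            return aln.new_id, start + offset, end + offset
    # Either the sequence name is absent from match.tsv,
    # or no block for it contains the interval.
    return None


alignments = [Alignment("Scaffold4139", 2368, 8532, "JHOM01041610.1", 0, 6164)]
print(update_interval("Scaffold4139", 3299, 3324, alignments))  # ('JHOM01041610.1', 931, 956)
print(update_interval("Scaffold4139", 340, 3299, alignments))   # None: starts before the block
print(update_interval("Scaffold5211", 2790, 2865, alignments))  # None: name not in match.tsv
```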

bedGraph format

chromA  chromStartA  chromEndA  dataValueA
chromB  chromStartB  chromEndB  dataValueB

Only chrom, chromStart, and chromEnd are updated; dataValue is copied unchanged.
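Because only the first three columns change, a line can be rewritten by replacing those fields and copying the rest verbatim. A minimal sketch, assuming tab-separated data lines (track or browser header lines are not handled here); rewrite_line is a hypothetical helper name:

```python
def rewrite_line(line: str, new_chrom: str, new_start: int, new_end: int) -> str:
    """Swap in the new chrom/chromStart/chromEnd; dataValue is copied unchanged."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"not a 4-column bedGraph data line: {line!r}")
    fields[0:3] = [new_chrom, str(new_start), str(new_end)]
    return "\t".join(fields) + "\n"


print(repr(rewrite_line("Scaffold1\t28\t35\t1\n", "KK245166.1", 28, 35)))
# 'KK245166.1\t28\t35\t1\n'
```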

Usage:

update_bedgraph -a match.tsv example_file/example.bedGraph

Message Example:

INFO     Reading alignment data from: match.tsv...
INFO       Alignments: 7081
INFO     Processing BedGraph file: ctrlF-BRN_S18.sorted.BedGraph...
INFO       Updated lines: 7072732
INFO       Removed lines: 11

Result Example:

CASE1: 100% match

  • Information in match.tsv
old_id old_start old_end new_id new_start new_end
Scaffold1 0 3368518 KK245166.1 0 3368518
  • original Bedgraph file
Scaffold1       28      35      1
Scaffold1       35      36      2
Scaffold1       36      37      3
  • updated Bedgraph file
KK245166.1      28      35      1
KK245166.1      35      36      2
KK245166.1      36      37      3

CASE2: New sequence is a substring of the old sequence with 100% match

  • Information in match.tsv
old_id old_start old_end new_id new_start new_end
Scaffold4139 2368 8532 JHOM01041610.1 0 6164
  • original Bedgraph file
Scaffold4139    265     337     3
Scaffold4139    337     340     1
Scaffold4139    340     3299    0
Scaffold4139    3299    3324    1
Scaffold4139    3324    3325    4
Scaffold4139    3325    3326    5
  • updated Bedgraph file
JHOM01041610.1  931     956     1
JHOM01041610.1  956     957     4
JHOM01041610.1  957     958     5
  • removed Bedgraph file
Scaffold4139    265     337     3
Scaffold4139    337     340     1
Scaffold4139    340     3299    0

CASE3: Part of the old sequence was converted into Ns

  • Information in match.tsv
old_id old_start old_end new_id new_start new_end
Scaffold1688 0 390 KK246853.1 0 390
Scaffold1688 2775 4110 KK246853.1 2775 4110
Scaffold1688 4670 5814 KK246853.1 4670 5814
Scaffold1688 8337 8871 KK246853.1 8337 8871
Scaffold1688 10333 11477 KK246853.1 10333 11477
  • original Bedgraph file
Scaffold1688    5735    5738    1
Scaffold1688    5738    5784    0
Scaffold1688    5784    5807    1
Scaffold1688    5807    6909    0
Scaffold1688    6909    6910    1
Scaffold1688    6910    6911    3
  • updated Bedgraph file
KK246853.1      5735    5738    1
KK246853.1      5738    5784    0
KK246853.1      5784    5807    1
  • removed Bedgraph file
Scaffold1688    5807    6909    0
Scaffold1688    6909    6910    1
Scaffold1688    6910    6911    3
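CASE3 has several aligned blocks for one sequence, separated by gaps where the old sequence became Ns. When a sequence has many blocks, a linear scan can be replaced by a binary search over the block start positions. The sketch below uses Python's bisect module on the Scaffold1688 rows; it assumes the blocks for one sequence do not overlap in old coordinates, and it is an illustrative alternative, not necessarily how update_bedgraph performs the lookup.

```python
import bisect
from typing import List, Optional, Tuple

# Aligned blocks for one old sequence, sorted by old_start:
# (old_start, old_end, new_start). Values are the Scaffold1688 rows from CASE3.
BLOCKS: List[Tuple[int, int, int]] = [
    (0, 390, 0), (2775, 4110, 2775), (4670, 5814, 4670),
    (8337, 8871, 8337), (10333, 11477, 10333),
]
STARTS = [b[0] for b in BLOCKS]


def locate(start: int, end: int) -> Optional[Tuple[int, int]]:
    """Return the new (start, end) if [start, end) lies inside one block, else None."""
    # Last block whose old_start is <= start; with non-overlapping blocks,
    # it is the only block that could contain the interval.
    i = bisect.bisect_right(STARTS, start) - 1
    if i < 0:
        return None
    old_start, old_end, new_start = BLOCKS[i]
    if end > old_end:
        return None  # interval ends past this block (it crosses into, or lies in, a gap)
    shift = new_start - old_start
    return start + shift, end + shift


for s, e in [(5735, 5738), (5738, 5784), (5784, 5807),
             (5807, 6909), (6909, 6910), (6910, 6911)]:
    print(s, e, locate(s, e))
```

The first three intervals are kept and the last three return None, matching the updated and removed lines above.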

CASE4: Sequence name not found in match.tsv

  • original Bedgraph file
Scaffold5211    2790    2865    1
Scaffold5211    2926    2962    1
Scaffold5211    2963    3001    1
  • removed Bedgraph file
Scaffold5211    2790    2865    1
Scaffold5211    2926    2962    1
Scaffold5211    2963    3001    1

Running the program with -h prints the following help:

update_bedgraph -h

usage: update_bedgraph [-h] [-a ALIGNMENT_FILE] [-u UPDATED_POSTFIX]
                       [-r REMOVED_POSTFIX] [-v]
                       BedGraph_FILE [BedGraph_FILE ...]

Update the sequence id and coordinates of a BedGraph file using an alignment file generated by the fasta_diff program.
Updated Line are written to a new file with '_updated'(default) appended to the original BedGraph file name.
Line that can not be updated, due to the id being removed completely or the line contains regions that
are removed or replaced with Ns, are written to a new file with '_removed'(default) appended to the original BedGraph file name.

Example:
    fasta_diff example_file/old.fa example_file/new.fa | update_bedgraph example_file/example.bedGraph

positional arguments:
  BedGraph_FILE         List one or more BedGraph files to be updated

optional arguments:
  -h, --help            show this help message and exit
  -a ALIGNMENT_FILE, --alignment_file ALIGNMENT_FILE
                        The alignment file generated by fasta_diff, a TSV file
                        with 6 columns: old_id, old_start, old_end, new_id,
                        new_start, new_end (default: STDIN)
  -u UPDATED_POSTFIX, --updated_postfix UPDATED_POSTFIX
                        The filename postfix for updated features (default:
                        "_updated")
  -r REMOVED_POSTFIX, --removed_postfix REMOVED_POSTFIX
                        The filename postfix for removed features (default:
                        "_removed")
  -v, --version         show program's version number and exit
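For readers who want to script against the same interface, the options above can be mirrored with Python's argparse, together with a reader for the 6-column alignment file. This is a hypothetical re-creation based only on the help text, not the tool's own source: build_parser and read_alignments are illustrative names, -v/--version is omitted, and skipping a header row is an assumption (the match.tsv excerpts on this page show one, but raw fasta_diff output may not).

```python
import argparse
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (old_start, old_end, new_id, new_start, new_end) for each aligned block
Block = Tuple[int, int, str, int, int]


def read_alignments(lines: Iterable[str]) -> Dict[str, List[Block]]:
    """Group alignment rows by old_id, skipping blank, header, or malformed lines."""
    blocks: Dict[str, List[Block]] = defaultdict(list)
    for line in lines:
        cols = line.split()
        if len(cols) != 6 or not cols[1].isdigit():
            continue  # e.g. "old_id old_start old_end new_id new_start new_end"
        old_id, old_start, old_end, new_id, new_start, new_end = cols
        blocks[old_id].append((int(old_start), int(old_end), new_id,
                               int(new_start), int(new_end)))
    return dict(blocks)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="update_bedgraph")
    parser.add_argument("bedgraph_files", metavar="BedGraph_FILE", nargs="+",
                        help="List one or more BedGraph files to be updated")
    # With no -a, alignments are read from STDIN, as in the fasta_diff pipe example.
    parser.add_argument("-a", "--alignment_file", type=argparse.FileType("r"),
                        default=sys.stdin)
    parser.add_argument("-u", "--updated_postfix", default="_updated")
    parser.add_argument("-r", "--removed_postfix", default="_removed")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    alignments = read_alignments(args.alignment_file)
    print(f"Alignments: {sum(len(v) for v in alignments.values())}")
```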