Parse2: created #96

Closed
wants to merge 5 commits into from

agalitsyna (Member) commented Mar 9, 2021

  • docs
  • parse 'all' policy removed
  • parse2 command
  • parse2 coordinate-system option
  • tests

Parse2
-------------------------

If your Hi-C has long reads, you may want to report all the alignments in the reads with ``pairtools parse2``.
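The idea of reporting every alignment of a long read can be illustrated with a minimal Python sketch. It assumes that a read with several alignments (a walk) is turned into pairs of neighbouring alignments, ordered by their offset within the read. `Alignment`, its fields and `walk_to_pairs` are hypothetical names, not the pairtools API.

```python
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not a pairtools data structure)."""
    chrom: str
    pos5: int         # 5' end of the alignment on the reference
    strand: str       # "+" or "-"
    read_start: int   # offset of the alignment within the read


def walk_to_pairs(alignments: List[Alignment]) -> List[Tuple[Alignment, Alignment]]:
    """Pair each alignment with the next one along the read.

    Assumption: only neighbouring alignments (in read order) form pairs.
    """
    ordered = sorted(alignments, key=lambda a: a.read_start)
    # zip stops at the shorter sequence, so n alignments give n - 1 pairs
    return list(zip(ordered, ordered[1:]))


if __name__ == "__main__":
    walk = [
        Alignment("chr1", 1000, "+", 0),
        Alignment("chr2", 5000, "-", 60),
        Alignment("chr1", 2000, "+", 120),
    ]
    for left, right in walk_to_pairs(walk):
        print(left.chrom, left.pos5, right.chrom, right.pos5)
```

Under this assumption, a walk of n alignments yields n - 1 pairs, and a read with a single alignment yields none.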
Member

Maybe specify the read length at which it becomes beneficial? E.g.:
If your Hi-C has long reads (>50 bp).

Would this work on Nanopore reads? It may be worth mentioning somewhere.

Member Author

With 150 bp reads, this lets you save about 10% of a simple DpnII Hi-C library. It won't work on Nanopore reads for now. Good points, will mention!

Member Author

Resolved by #99

'first column lists scaffold names. Any scaffolds not listed will be '
'ordered lexicographically following the names provided.')
@click.option(
"-o", "--output",
Member

Would it be clearer if this option were renamed to output-file?

Member Author

Okay, good suggestion

Member Author

Resolved in PR update: #99

@agalitsyna (Member Author) commented Mar 9, 2021

  • IPython notebook on MC-3C reads with parse2 (TestWalks.ipynb update)
  • Parsing of the SAM header (breaks when the options are not as expected)
  • Arima or other Hi-C reads of 150-300 bp as a proof of concept
  • [-] Pore-C as an additional test (impossible for now, as there are no public Pore-C FASTQs)
  • minimap output?

@agalitsyna agalitsyna marked this pull request as draft March 21, 2021 18:11
@agalitsyna agalitsyna closed this Mar 21, 2021
@agalitsyna agalitsyna deleted the origin/parse_all branch March 21, 2021 18:48
@agalitsyna (Member Author):

This PR has been moved to drafts; the branch is renamed to "parse2".

agalitsyna added a commit to agalitsyna/pairtools that referenced this pull request Apr 9, 2021
@agalitsyna agalitsyna mentioned this pull request Apr 10, 2021
4 tasks
@agalitsyna (Member Author):

Updated version of this PR: #99

agalitsyna added a commit that referenced this pull request Dec 8, 2021
* Parse2: created. Improved version of parse2 with resolved comments from the previous PR: #96

Major changes:

* Single-end mode of parse2 added, --single-end option. Tested on minimap2 output for MC-3C.

* parse2 now has three possible coordinate systems for reporting: read, walk and pair (described in the docstring). Default coord system "read" tested.

* demo notebook with MC-3C and Arima datasets

* simplified code of parse2, e.g. push_pair function added instead of repetitive code; improved docstrings

* Max molecule size replaced with max fragment size.  

* parse2(docs): Documentation improved, #96 (comment) resolved.

* Option to report 5' or 3' ends added.
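For the 5'/3' option, which coordinate counts as the 5' or 3' end depends on the strand of the alignment. A minimal sketch, assuming 1-based inclusive reference coordinates with `ref_start <= ref_end`; `report_position` is a hypothetical helper, not the parse2 implementation:

```python
def report_position(ref_start: int, ref_end: int, strand: str, end: str = "5") -> int:
    """Return the 5' or 3' end of an alignment in reference coordinates.

    Hypothetical helper. ref_start <= ref_end are the leftmost and rightmost
    aligned reference positions; strand is "+" or "-"; end is "5" or "3".
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {strand!r}")
    if end not in ("5", "3"):
        raise ValueError(f"end must be '5' or '3', got {end!r}")
    forward = strand == "+"
    if end == "5":
        # A forward read starts at its leftmost base, a reverse read at its rightmost.
        return ref_start if forward else ref_end
    # The 3' end is the opposite side of the alignment.
    return ref_end if forward else ref_start


if __name__ == "__main__":
    for strand in ("+", "-"):
        print(strand, report_position(100, 150, strand, "5"), report_position(100, 150, strand, "3"))
```

Keeping the strand logic in one place means the same helper can serve both parse and parse2, whichever end the user asks for.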
@agalitsyna agalitsyna mentioned this pull request Dec 8, 2021
agalitsyna added a commit that referenced this pull request Apr 11, 2022
Improved version of parse2 with resolved comments from the previous PR: #96

- Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks but always reports pair orientation and the outer positions of 5' ends, as if each pair were read independently in paired-end mode. Parse2 is specifically designed for long walks and has the options --report-position and --report-orientation, which can be used to report junctions, reads, or walks.

- Parse2 has an option to parse single-end reads, --single-end option, tested on minimap2 output for MC-3C.

- Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.

- Recent update simplifies the code: a single _parse library is used by both parse and parse2,

- a number of functions reduce repetitive code, e.g. the push_pair function,

- docstrings and a documented structure of the _parse library.

- Both parse and parse2 have options to report 5' or 3' ends and to flip alignments according to chromosome coordinates.

- Both parse and parse2 have a pysam backend.

- Improved tests for parse and parse2.

- Documentation includes description of various --report-orientation and --report-position cases.
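Flipping can be sketched as ordering the two sides of a pair by (chromosome rank, position), so that side 1 always comes first. This assumes chromosome ranks come from a known order (for example, the chromosome list in the header); `flip_pair` and `chrom_order` are hypothetical names, and each strand travels with its side when the sides are swapped.

```python
from typing import Dict, Tuple

# Hypothetical representation of one side of a pair: (chrom, pos, strand).
Side = Tuple[str, int, str]


def flip_pair(side1: Side, side2: Side, chrom_order: Dict[str, int]) -> Tuple[Side, Side, bool]:
    """Swap the sides if needed so that side1 precedes side2.

    Ordering is by (chromosome rank, position); chrom_order maps a chromosome
    name to its rank (an assumption for this sketch). Returns the ordered sides
    and whether a swap happened.
    """
    key1 = (chrom_order[side1[0]], side1[1])
    key2 = (chrom_order[side2[0]], side2[1])
    if key1 > key2:
        # Swap whole sides, so each strand stays with its own position.
        return side2, side1, True
    return side1, side2, False


if __name__ == "__main__":
    order = {"chr1": 0, "chr2": 1}
    print(flip_pair(("chr2", 10, "+"), ("chr1", 500, "-"), order))
```

In a real .pairs record, any other per-side columns (e.g. mapping quality) would have to be swapped together with the sides; in this sketch, a chromosome missing from `chrom_order` simply raises `KeyError`.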
agalitsyna added a commit that referenced this pull request Jun 1, 2022
* Separate cli and lib

* pairtools flip fix for unannotated chromosomes, resolving #91

* handle empty chromosomes, resolved #76

* fixed rfrags indexing and first rfrag omission, resolved #73

* resolved or deprecated suggestions in #16

* merge improvements, header merge fixed

- resolved merge without arguments: #61

- option to add only the first header in merge, resolved #18

* in merge, added an option to concatenate instead of merging sorted inputs, resolving #23

* merge now checks that columns of inputs are the same

* I/O improvements

- auto_open defaults to stdin/stdout when the path evaluates to False, resolved #48

- auto_open defaults to stdin/stdout when the path is "-"

- if the stream is optional, it's controlled by the module itself

* Parse2 update (#99) (#109)

Improved version of parse2 with resolved comments from the previous PR: #96

- Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks but always reports pair orientation and the outer positions of 5' ends, as if each pair were read independently in paired-end mode. Parse2 is specifically designed for long walks and has the options --report-position and --report-orientation, which can be used to report junctions, reads, or walks.

- Parse2 has an option to parse single-end reads, --single-end option, tested on minimap2 output for MC-3C.

- Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.

- Recent update simplifies the code: a single _parse library is used by both parse and parse2,

- a number of functions reduce repetitive code, e.g. the push_pair function,

- docstrings and a documented structure of the _parse library.

- Both parse and parse2 have options to report 5' or 3' ends and to flip alignments according to chromosome coordinates.

- Both parse and parse2 have a pysam backend.

- Improved tests for parse and parse2.

- Documentation includes description of various --report-orientation and --report-position cases.

* Merge pairlib into pairtools.lib.

* CLI for scalings added.

* stats output in YAML format

* Header CLI (#121)

- new module called by `pairtools header`
- submodules: 
  - generate : Generate the header
  - set-columns : Add the columns to the .pairs/pairsam file
  - transfer : Transfer the header from one pairs file to another
  - validate-columns : Validate the columns of the .pairs/pairsam file
- resolves #119 
- option remove-columns for `pairtools select`: Remove the columns from .pairs/pairsam file

* pairtools phase critical update (#114)

* important fixes: cython dedup with no parent ID: forgotten counter reset; Sphinx doc update (added pysam); header: warning if empty, and error when trying to add a field to an empty one

* Add summaries (#105)

* Add functions for duplication tile and complexity

* Make dedup stats!

* Benchmarks finalization

* [WIP] Stats split by filters (#132)

* Markasdup lib removed; markasdup CLI explanation improved

* dedup filter stats added and tested

Co-authored-by: Aleksandra Galitsyna <agalitzina@gmail.com>
Co-authored-by: Ilya Flyamer <flyamer@gmail.com>
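The column checks mentioned above (`pairtools header validate-columns`, and merge verifying that its inputs share the same columns) rely on the `#columns:` line of the .pairs header. A minimal sketch of both checks, assuming tab-separated data rows and `#`-prefixed header lines as in the .pairs format; `parse_columns`, `validate_columns` and `same_columns` are hypothetical helpers, not pairtools functions:

```python
from typing import Iterable, List, Optional


def parse_columns(header_lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: column names from the '#columns:' header line."""
    for line in header_lines:
        if line.startswith("#columns:"):
            return line[len("#columns:"):].split()
    raise ValueError("no '#columns:' line in header")


def validate_columns(lines: Iterable[str]) -> List[int]:
    """Hypothetical helper: 1-based line numbers of data rows whose
    tab-separated field count differs from the header's column count."""
    header: List[str] = []
    bad: List[int] = []
    columns: Optional[List[str]] = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("#"):
            header.append(line)
            continue
        if columns is None:
            # The header is complete once the first data row is reached.
            columns = parse_columns(header)
        if len(line.split("\t")) != len(columns):
            bad.append(lineno)
    return bad


def same_columns(*headers: List[str]) -> bool:
    """Hypothetical helper: do all headers (e.g. merge inputs) declare identical columns?"""
    first = parse_columns(headers[0])
    return all(parse_columns(h) == first for h in headers[1:])


if __name__ == "__main__":
    pairs_file = [
        "## pairs format v1.0\n",
        "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2\n",
        "r1\tchr1\t10\tchr2\t20\t+\t-\n",
        "r2\tchr1\t10\n",
    ]
    print(validate_columns(pairs_file))
```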