This issue was moved to a discussion.

CLI to add header #119

Closed
Phlya opened this issue Apr 8, 2022 · 3 comments · Fixed by #121

Comments

@Phlya
Member

Phlya commented Apr 8, 2022

Since we are starting to rely on the presence of a header, and of column names in particular, it would be nice to provide a tool that adds a header to a headerless file.
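To make the request concrete, here is a minimal Python sketch of what such a tool could do. It assumes the .pairs convention of a `## pairs format v1.0` first line followed by a `#columns:` line. The function name `add_header` and the default column list are hypothetical and only for illustration; this is not the pairtools API.

```python
import io

# Assumed default column names; adjust to match the actual file.
DEFAULT_COLUMNS = ["readID", "chrom1", "pos1", "chrom2", "pos2", "strand1", "strand2"]


def add_header(instream, outstream, columns=DEFAULT_COLUMNS):
    """Copy a headerless pairs stream to outstream, prepending a minimal header."""
    outstream.write("## pairs format v1.0\n")
    outstream.write("#columns: " + " ".join(columns) + "\n")
    for line in instream:
        # A line starting with '#' means the input was not headerless after all.
        if line.startswith("#"):
            raise ValueError("input already contains header lines")
        outstream.write(line)


if __name__ == "__main__":
    body = io.StringIO("r1\tchr1\t100\tchr2\t200\t+\t-\n")
    out = io.StringIO()
    add_header(body, out)
    print(out.getvalue(), end="")
```

A real tool would probably also want chromosome sizes and other header fields; the sketch only covers the column names discussed here.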

@Phlya Phlya mentioned this issue Apr 8, 2022
@agalitsyna
Member

Also, because pairtools restrict and pairtools dedup can now generate additional columns, some sequences of commands become irreversible. E.g., files with extra columns cannot be merged with files without such columns, and there is no way to add the restriction-site annotation, select on it, and then remove the unnecessary fields.
Ilya and I thought of some pairtools headerops utils that would:

  • add columns to the header (pairtools headerops add-header?)
  • remove unnecessary fields from the whole table (pairtools headerops select?)
  • check if the header complies with the content (pairtools headerops verify?)
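As a rough illustration of the verify and select ideas in the list above, here is a small Python sketch. It assumes tab-separated body lines and a `#columns:` header line; the function names `parse_columns`, `verify_columns`, and `select_columns` are hypothetical and do not correspond to any existing pairtools function.

```python
def parse_columns(header_lines):
    """Return column names from the '#columns:' header line, or None if absent."""
    for line in header_lines:
        if line.startswith("#columns:"):
            return line[len("#columns:"):].split()
    return None


def verify_columns(lines):
    """Check that every body line has as many fields as the header declares."""
    header = [l for l in lines if l.startswith("#")]
    body = [l for l in lines if not l.startswith("#")]
    columns = parse_columns(header)
    if columns is None:
        return False
    return all(len(l.rstrip("\n").split("\t")) == len(columns) for l in body)


def select_columns(lines, keep):
    """Keep only the named columns and rewrite the '#columns:' line to match."""
    columns = parse_columns([l for l in lines if l.startswith("#")])
    if columns is None:
        raise ValueError("no '#columns:' line in the header")
    idx = [columns.index(name) for name in keep]  # ValueError if a name is missing
    out = []
    for l in lines:
        if l.startswith("#columns:"):
            out.append("#columns: " + " ".join(keep) + "\n")
        elif l.startswith("#"):
            out.append(l)  # other header lines pass through unchanged
        else:
            fields = l.rstrip("\n").split("\t")
            out.append("\t".join(fields[i] for i in idx) + "\n")
    return out
```

With helpers like these, a file annotated with extra columns could be reduced back to the standard columns before merging it with files that never had them.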

This is not so much an issue as a discussion point that might be important to settle before the 1.0.0 release.

@agalitsyna
Member

@Phlya Can you check whether this draft is similar to what might be useful to you?
#121

Any comments or modifications are much appreciated!

agalitsyna added a commit that referenced this issue Apr 14, 2022
- new module called by `pairtools header`
- submodules: 
  - generate : Generate the header
  - set-columns : Add the columns to the .pairs/pairsam file
  - transfer : Transfer the header from one pairs file to another
  - validate-columns : Validate the columns of the .pairs/pairsam file
- resolves #119 
- option remove-columns for `pairtools select`: Remove the columns from .pairs/pairsam file
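For readers unfamiliar with the idea behind the transfer submodule listed above, the operation can be sketched in a few lines of Python. This is an assumption-laden illustration, not the code from #121: `split_header` and `transfer_header` are hypothetical names, and the header is taken to be the leading run of lines starting with `#`.

```python
def split_header(lines):
    """Split a list of lines into (header, body) at the first non-'#' line."""
    header = []
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            return header, lines[i:]
        header.append(line)
    return header, []


def transfer_header(source_lines, target_lines):
    """Return the target body preceded by the source header.

    Any header already present in the target is dropped.
    """
    source_header, _ = split_header(source_lines)
    _, target_body = split_header(target_lines)
    return source_header + target_body
```

A practical version would presumably also check that the transferred column names match the number of fields in the target body.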
@agalitsyna
Member

Okay, I haven't heard anything about this suggestion for over a week, so I will convert it to a discussion. The module is ready for testing/improvements, feel free to add ideas!

@open2c open2c locked and limited conversation to collaborators Apr 20, 2022
@agalitsyna agalitsyna converted this issue into discussion #123 Apr 20, 2022
agalitsyna added a commit that referenced this issue Jun 1, 2022
* Separate cli and lib

* pairtools flip fix for unannotated chromosomes, resolving #91

* handle empty chromosomes, resolved #76

* fixed rfrags indexing and first rfrag omission, resolved #73

* resolved or deprecated suggestions in #16

* merge improvements, header merge fixed

- resolved merge without arguments: #61

- option to add only the first header in merge, resolved #18

* in merge, added option to concatenate instead of merge sorted inputs, resolving: #23

* merge now checks that columns of inputs are the same

* I/O improvements

- auto_open defaults to stdin/stdout when path evaluates to False, resolved #48

- auto_open defaults to stdin/stdout when the path is "-"

- if the stream is optional, it's controlled by the module itself

* Parse2 update (#99) (#109)

Improved version of parse2 with resolved comments from the previous PR: #96

- Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks but always reports pair orientation and the outer positions of 5'-ends, as if each pair had been read in paired-end mode independently. Parse2 is specifically designed for long walks, and has options --report-position and --report-orientation, which can be used to report junctions, reads, or walks.

- Parse2 has an option to parse single-end reads (--single-end), tested on minimap2 output for MC-3C.

- Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.

- The recent update simplifies the code: a single _parse library is used by both parse and parse2,

- a number of functions that reduce repetitive code, e.g. push_pair function,

- docstrings and documented structure of _parse library.

- Both parse and parse2 have options to report 5' or 3' ends and to flip alignments according to chromosome coordinate.

- Both parse and parse2 have the pysam backend

- Improvements of the tests for parse and parse2

- Documentation includes description of various --report-orientation and --report-position cases.

* Merge pairlib into pairtools.lib.

* CLI for scalings added.

* stats output in yaml format

* Header CLI (#121)

- new module called by `pairtools header`
- submodules: 
  - generate : Generate the header
  - set-columns : Add the columns to the .pairs/pairsam file
  - transfer : Transfer the header from one pairs file to another
  - validate-columns : Validate the columns of the .pairs/pairsam file
- resolves #119 
- option remove-columns for `pairtools select`: Remove the columns from .pairs/pairsam file

* pairtools phase critical update (#114)

* important fixes: - cython dedup with no-parent id forgotten counter reset; - sphinx doc update (added pysam); - header warning if empty and error when trying to add a field to an empty one

* Add summaries (#105)

* Add functions for duplication tile and complexity

* Make dedup stats!

* Benchmarks finalization

* [WIP] Stats split by filters (#132)

* Markasdup lib removed; markasdup CLI explanation improved

* dedup filter stats added and tested

Co-authored-by: Aleksandra Galitsyna <agalitzina@gmail.com>
Co-authored-by: Ilya Flyamer <flyamer@gmail.com>
