Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tool to split, merge and rearrange pages of notebooks #14

Closed
wants to merge 3 commits into from
Closed

Tool to split, merge and rearrange pages of notebooks #14

wants to merge 3 commits into from

Conversation

matteodelabre
Copy link
Collaborator

(This would close #1, close #3.)

A small proposal for a new tool which enables splitting, merging and rearranging pages of several notebooks. It is loosely inspired from pdftk’s syntax. Here is the intended usage gist:

# Merge notebooks A and B into C
arrange A - B -o C

# Reverse the order of pages in notebook A
arrange A last-first

# Split notebook A into B and C
arrange A first-10 -o B
arrange A 11-last -o C

# Remove pages 3 and 8 from notebook A
arrange A first-2 4-7 9-last

I’d love to have your opinion on several design choices I made:

  • because the CLI syntax is non-trivial for this tool, I thought it was better to delegate some of the CLI parsing to an external library. I used CLI11 because it is quite small and well-supported (and easily generates a nice help page!), but we can easily migrate this to another tool if you prefer;
  • I used a Git submodule to pull in the dependency, and a small snipped in the CMake listfile to automatically download submodules at configure time;
  • however, there is still some ad-hoc parsing involved, specifically for parsing ranges. I figured out the grammar was not complex enough to justify pulling in another dependency for parser generation. However, as a result, the parsing code is quite brittle. The formal grammar is as such:
args ::= notebook ('-' notebook)*
notebook ::= string range*
range ::= page'-'page
page ::= 'first' | 'last' | number
  • by default, if no output is specified, the output overwrites the last notebook specified. I’m in two minds about this: it is quite convenient for when you operate on a single notebook (as seen in the previous examples), but I feel this could bite some users if they are not careful enough (eg. arrange A 1-2 and you lose all pages except 1 and 2).

What do you think?

@matteodelabre
Copy link
Collaborator Author

matteodelabre commented Aug 5, 2018

Mmh, unfortunately GCC generates a lot of variable shadowing warnings while compiling the CLI11 library, which causes trouble in Travis because of the -Werror flag. We would have to find a way to contain the warnings generation to this library’s own code, and not generate any warning for third-party code. This is tricky because it is a header-only library. Not sure how to fix this.

Edit: I’m able to workaround this problem by forcing CMake to treat CLI11’s included directories as SYSTEM directories, which instructs GCC to not emit warnings for those files. However, this requires poking around the internals of the library, which is not that clean:

get_target_property(CLI11_include_dirs CLI11 INTERFACE_INCLUDE_DIRECTORIES)
set_target_properties(CLI11 PROPERTIES INTERFACE_SYSTEM_INCLUDE_DIRECTORIES "${CLI11_include_dirs}")

@ax3l ax3l self-requested a review August 6, 2018 09:23
@ax3l ax3l added the tools label Aug 6, 2018
@ax3l
Copy link
Owner

ax3l commented Aug 6, 2018

It is loosely inspired from pdftk’s syntax.

That's just excellent, I would have done the same!

CLI11

Love it, wanted to use it for a long time. I can make sure it's placed at the right point and integrated well in our build system. Is it gcc 4.8 compatible?

Git submodule

not a big fan, will place a pinned version with cmake control for external lib if requested. similar to how we do it here: https://github.com/openPMD/openPMD-api

by default, if no output is specified

how does pdftk handle this situation?

shadowing warnings while compiling the CLI11

hm, why aren't they already included as system libs anyway?

Anyway, we could also add a gcc/clang pragma pushing/popping the warning around the include.
Also, we should report his upstream. Shadowing is subideal.

@matteodelabre
Copy link
Collaborator Author

CLI11’s README claims compatibility with GCC ≥4.7. I’ve managed to compile it inside a Docker container with GCC 4.8. The shadowing warnings are almost exclusively for constructor arguments that shadow class member variables, and it looks like the folks from Clang decided not to report this use as shadowing.

Not sure about using a pragma for this, because it would imply having to surround all includes of this library, should we use it somewhere else in the future (although I guess we could make an internal header for this?).

I’m far from a CMake expert, but as far as I understand CMake treats included files as system headers only if the SYSTEM flag is passed to target_include_directories. Because we’re using the exported target from CLI11 instead of manually including the right folder, it uses the target_include_directories from CLI11’s CMake listfile, which does not have the SYSTEM flag.

Alright about submodules, I was not sure how you’d prefer including the dependency. Would you like me to change this or can you take care of it? I think you have write access to my branch if needed.

Regarding the syntax, pdftk aborts with an error if no output is specified. This is inconvenient if you want the output to be the same as the input, especially with lines files whose file names are long, because you have to type it twice. Furthermore, pdftk does not allow outputting to a file used as input—I can’t see any good reason for that, it just makes in-place modification harder. Do you think we should strive for a fully pdftk-compatible syntax? This would surely confuse users less.

@ax3l
Copy link
Owner

ax3l commented Aug 10, 2018

If you like, we could also use CMake's ExternalProject(_Add) (or FetchContent) for the dependency to download it on the fly to the temporary build dir. It's an internal dependency, so that should work well. We can later on add an option to skip the download and use a locally available version as well.

@ax3l
Copy link
Owner

ax3l commented Sep 21, 2018

hi @matteodelabre, sorry for being so inactive in the last months. I am still quite busy with my work...

That said, is it ok if I acknowledge you in a public talk at "Datenspuren" tomorrow? :)
Event (German): https://www.datenspuren.de/2018/fahrplan.html
Talk: https://www.datenspuren.de/2018/fahrplan/events/9324.html

@matteodelabre
Copy link
Collaborator Author

Hi Axel, no worries, I’ve been quite busy too, but I should be able to make progress on this soon. About the talk, no problem; on the contrary, thanks for the acknowledgement! I hope this will convince more people into contributing to the project, and I wish you good luck for the talk.

@matteodelabre
Copy link
Collaborator Author

matteodelabre commented Mar 6, 2020

Hi @ax3l! So, it turns out “soon” was quite a long time in the end. During the hiatus, the file format changes have made this kind of manipulation much easier, not requiring manipulation of the binary format anymore. Plus, moving pages around is now built-in in the tablet’s UI. All in all, I’m no longer convinced of the usefulness of this PR. What’s your take on this?

@ax3l
Copy link
Owner

ax3l commented Jun 11, 2020

That's no problem, we can close it here, which still keeps it for informational purposes for readers if you like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tool: Reorder Pages in a Notebook Tool: Split/Merge Notebooks
2 participants