
Look for filter executables in data-dir if referenced by name #3127

Closed
nichtich opened this issue Sep 23, 2016 · 22 comments

Comments

@nichtich
Contributor

Currently, filters referred to with --filter must be executables on the PATH, which has some disadvantages:

  • if you have many filters, you get just as many executables in your PATH that are never used directly on the command line anyway
  • possible naming conflicts between filters and other executables
  • there is no standard location to install filters

I propose to let Pandoc first look in --data-dir if a filter is referenced by name (instead of by path), so we can collect filter executables there. Temporarily prepending the value of --data-dir to PATH should do the job.
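A minimal Haskell sketch of the "temporarily prepend to PATH" idea follows. `withDirOnPath` is a hypothetical helper, not pandoc code. It assumes the process-wide PATH may be modified, which is not safe if other threads launch programs at the same moment.

```haskell
import Control.Exception (bracket)
import System.Environment (lookupEnv, setEnv, unsetEnv)
import System.FilePath (searchPathSeparator)

-- | Hypothetical helper: run an action with 'dir' prepended to PATH, then
-- restore the previous value (or unset PATH if it was unset), even if the
-- action throws an exception.
withDirOnPath :: FilePath -> IO a -> IO a
withDirOnPath dir action =
  bracket (lookupEnv "PATH") restore $ \old -> do
    -- Put 'dir' first so it wins over every existing PATH entry.
    setEnv "PATH" (maybe dir (\p -> dir ++ [searchPathSeparator] ++ p) old)
    action
  where
    restore (Just p) = setEnv "PATH" p
    restore Nothing  = unsetEnv "PATH"
```

Wrapping the existing lookup, for example `withDirOnPath filtersDir (findExecutable name)`, would leave the lookup code untouched. As the later discussion points out, however, this only finds *executable* files.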

@jgm
Owner

jgm commented Sep 23, 2016

This seems reasonable. I'd like to get further comment on
this, though. Perhaps you could raise it on pandoc-discuss.

@jkr
Collaborator

jkr commented Sep 23, 2016

Maybe $DATA_DIR/bin to reduce clutter, and provide better symmetry to $DATA_DIR/templates and the XDG dirs?

@jgm
Owner

jgm commented Sep 23, 2016

Yes, good idea to use bin subdirectory.

@nichtich
Contributor Author

Then $DATA_DIR/filters seems more appropriate.

@jgm
Owner

jgm commented Sep 25, 2016

Yes, probably filters would make more sense. What do you think @jkr?

@jkr
Collaborator

jkr commented Sep 25, 2016

No real preference. filters fits more with pandoc conventions, bin fits more with unix conventions (I think it would be the only thing in my $PATH that didn't end with a bin). So I'm not sure how the principle of least surprise would work here. But really, people will figure out either one. So I guess if I had to vote, I'd say bin, but I'd be fine with either.

@jgm
Owner

jgm commented Sep 26, 2016

I think the idea is not that this will go in your path, but
that pandoc would temporarily add it to the path when
looking for the filter.

@jkr
Collaborator

jkr commented Sep 26, 2016

Ah, right -- @nichtich did say that. Sorry I missed that. In that case, filters definitely seems better, since it distinguishes it from the path.

@jgm
Owner

jgm commented Sep 26, 2016 via email

@jkr
Collaborator

jkr commented Sep 26, 2016

So, looks like the easiest way is to replace findExecutable with something like

paths <- splitSearchPath <$> getEnv "PATH"
let paths' = FILTER_DIR : paths
listToMaybe <$> findExecutablesInDirectories paths' filterName

I haven't tried it yet, but something like that. Two issues:

  1. This might be weird on Windows, which has its own Win32 version of findExecutable
  2. Should the filter dir jump to the head of the line ahead of local bin? Probably, right?

@jgm
Owner

jgm commented Sep 26, 2016

Actually it's a bit more complicated than I had thought.
Currently, if a filter isn't executable, we automatically supply the right interpreter (e.g. python or runhaskell). We should make sure that the new feature allows this too, so that you can put a non-executable Haskell file in ~/.pandoc/filters and it will be usable as a filter. So maybe, instead of temporarily adding it to the path, we should include the new filters directory in the logic that is used when the executable isn't found in the path.

@nichtich
Contributor Author

Is the executable determined based on the file extension, or how is it done? I am writing a filter that executes other filters, so the same logic should apply there (even for older versions of Pandoc that do not know about ~/.pandoc/filters yet).

@jkr
Collaborator

jkr commented Sep 27, 2016

@nichtich: when you call a filter with the --filter/-F flag, pandoc

  1. checks to see if it's a path.
  2. if not, looks in the $PATH (and, in the future, in the filters subdir)
  3. once it has the file, checks to see if it is executable. If so, executes it.
  4. if not, looks to see if it has a recognized file extension (.py, .pl, .rb, .hs, etc.). If so, uses the associated program to run it.

If you call another filter from within your filter, we could conceivably alter the path env the filter is run under, allowing the filter to look under pandoc/filters. But I don't see how we could handle step 4. So the question is whether it's worthwhile to go partway, or just make the rule that filters run in the OS, and should be handled accordingly.
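The lookup half of the flow listed above (steps 1 and 2, including the proposed filters subdirectory) might look roughly like this sketch. `isPathLike`, `locateFilter` and the `filtersDir` parameter are hypothetical names, not pandoc's actual code. Searching the filters directory first, and accepting non-executable files there, follows what is agreed later in this thread. Steps 3 and 4 would then run on the returned path.

```haskell
import System.Directory (doesFileExist, findExecutable, findFile)
import System.FilePath (takeFileName)

-- Step 1: an argument with a directory component (e.g. "./foo.py") is a
-- path; a bare "foo.py" is a name to be looked up.
isPathLike :: FilePath -> Bool
isPathLike f = takeFileName f /= f

-- Step 2: find the file a --filter argument refers to.
-- 'filtersDir' is a hypothetical stand-in for the proposed filters subdir.
locateFilter :: Maybe FilePath -> FilePath -> IO (Maybe FilePath)
locateFilter filtersDir name
  | isPathLike name = do
      exists <- doesFileExist name
      pure (if exists then Just name else Nothing)
  | otherwise = do
      -- Any file in the filters directory counts, executable or not ...
      inFilters <- maybe (pure Nothing) (\dir -> findFile [dir] name) filtersDir
      case inFilters of
        Just path -> pure (Just path)
        -- ... otherwise fall back to an executable found on $PATH.
        Nothing -> findExecutable name
```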

@jgm
Owner

jgm commented Sep 27, 2016

First we check to see if the filter is executable. If it is, we
simply execute it. If not, we check the extension, and run
it with the following programs:

.py -> python
.hs -> runhaskell
.pl -> perl
.rb -> ruby
.php -> php
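Steps 3 and 4 as described here (run an executable directly, otherwise pick an interpreter by extension) can be sketched as follows. `interpreterFor` and `filterCommand` are hypothetical names. The table is the list above, and matching extensions case-insensitively is an assumption of this sketch.

```haskell
import Data.Char (toLower)
import System.Directory (executable, getPermissions)
import System.FilePath (takeExtension)

-- Interpreter for a non-executable filter, chosen by file extension
-- (the extension is lower-cased first: an assumption, not confirmed pandoc behavior).
interpreterFor :: FilePath -> Maybe String
interpreterFor f = lookup (map toLower (takeExtension f)) table
  where
    table =
      [ (".py", "python"), (".hs", "runhaskell"), (".pl", "perl")
      , (".rb", "ruby"), (".php", "php") ]

-- Program and argument list for running the existing filter file at 'path'.
-- 'args' are the arguments every filter receives (e.g. the target format).
filterCommand :: FilePath -> [String] -> IO (Maybe (String, [String]))
filterCommand path args = do
  isExe <- executable <$> getPermissions path
  pure $ if isExe
    then Just (path, args)  -- executable: run it directly
    else fmap (\prog -> (prog, path : args)) (interpreterFor path)
```

A `Nothing` result means the file is neither executable nor of a known type, so there is no way to run it.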

@jkr
Collaborator

jkr commented Sep 27, 2016

@jgm: should pandoc/filters have priority over $PATH? (i.e., if we have ~/.pandoc/filters/foo.py and /usr/local/bin/foo.py, and pandoc is called with -F foo.py, we would run the one in ~/.pandoc/filters.) I would say yes.

Looks like we're going to have to add an opts parameter to externalFilter to give it access to optDataDir.

@jkr
Collaborator

jkr commented Sep 27, 2016

I'll give this a try this morning and see how it goes, unless you're already on it.

@jkr
Collaborator

jkr commented Sep 27, 2016

Actually @jgm, there is a further subtlety. Right now, we prioritize executables. It only looks for extension/runtime if it doesn't find an executable. What do we want to do if there's a non-executable ~/.pandoc/filters/foo.py and an executable /usr/local/bin/foo.py? I think the clearest thing to do would still be to prioritize the filters script.

We're changing the expectation that non-executables will only be used if they're specified by full path. So it would require changing the flow a bit.

@jgm
Owner

jgm commented Sep 27, 2016

> should pandoc/filters have priority over $PATH?

I'd say yes.

> Looks like we're going to have to add an opts parameter to externalFilter to give it access to optDataDir.

No problem, it's not exported. But probably we SHOULD export it, so filters can use it themselves.

@jgm
Owner

jgm commented Sep 27, 2016

> What do we want to do if there's a non-executable ~/.pandoc/filters/foo.py and an executable /usr/local/bin/foo.py? I think the clearest thing to do would still be to prioritize the filters script.

Yes, I think this makes sense.

@jkr
Collaborator

jkr commented Sep 27, 2016

okay, this seems to do it:

jkr@110e254

It runs an expansion function prior to externalFilter. If the filter argument is not a path, it tries to expand it into a $DATADIR/filters/ path. If it can, it passes that to the externalFilter function as a path. If it can't, it just passes along the filter name. Since externalFilter receives it as a path, it will do the right extension/interpreter things.

Seems to work with my tests (same name in path, path specified, etc). If this sounds good to you, I'll push it after the Travis tests are done.
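A minimal sketch of the expansion step described above might look like the block below. It assumes the data directory arrives as a `Maybe FilePath` (roughly the role of optDataDir mentioned earlier). `expandFilterPath` is a hypothetical name, and this is not the code in jkr@110e254.

```haskell
import System.Directory (doesFileExist)
import System.FilePath (takeFileName, (</>))

-- Hypothetical step run before externalFilter: expand a bare filter name to
-- <datadir>/filters/<name> when that file exists. Otherwise (argument is
-- already a path, no data dir, or no such file) return it unchanged.
expandFilterPath :: Maybe FilePath -> FilePath -> IO FilePath
expandFilterPath mbDataDir name
  | takeFileName name /= name = pure name  -- already a path: leave it alone
  | otherwise = case mbDataDir of
      Nothing -> pure name
      Just dataDir -> do
        let candidate = dataDir </> "filters" </> name
        exists <- doesFileExist candidate
        pure (if exists then candidate else name)
```

A hit in the filters directory comes back as a full path, so $PATH is never consulted for it. This gives the filters directory the priority agreed on above, even for a non-executable script.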

@jkr
Collaborator

jkr commented Sep 27, 2016

BTW, the question of filters using these functions seems like it's part of another larger issue: refactoring the big monolith in pandoc.hs, and putting the functions in Text.Pandoc, or Text.Pandoc.Executable or something. That would allow for a lot of nice things (input files in mixed formats, filters being aware of the given options, better "include" filters because of the previous). Plus it would allow us to test the binary.

I got about 80% of the way through it this summer, but the last 20% is, of course, the hardest.

@jgm
Owner

jgm commented Sep 27, 2016

> refactoring the big monolith in pandoc.hs, and putting the functions in Text.Pandoc, or Text.Pandoc.Executable or something.

Yes, this is definitely needed. It would also make it easier to add a GUI or web interface to pandoc.

3 participants