Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: --glob-input relative path error #1655

Open
felix9743 opened this issue Sep 15, 2023 · 2 comments
Open

BUG: --glob-input relative path error #1655

felix9743 opened this issue Sep 15, 2023 · 2 comments
Assignees
Labels
C-bug Category: bug

Comments

@felix9743
Copy link

I use the Windows binary to convert the images extracted from a PDF but when I give a relative path, it tells me that it does not find it.

image

It works properly with --input, so the problem probably comes from --glob-input.

It however works correctly if I give an absolute path.

@foresterre foresterre added the C-bug Category: bug label Sep 16, 2023
@foresterre
Copy link
Owner

I've confirmed that this is indeed an issue. Thanks!

@foresterre foresterre self-assigned this Oct 3, 2023
@foresterre
Copy link
Owner

foresterre commented Oct 5, 2023

I've been looking into this issue, and found lots of interesting questions:

  • globs are tricky
  • this bug is either caused by my misuse of the globwalk crate used by sic, or a bug or unsupported expansion in globwalk
    • A few days ago I tried and compared various different glob libraries in the Rust ecosystem, and found that each has its own set of quirks
  • for example: not all (rust) glob libraries support the same syntax
    • glob (unmaintained) and nu_glob (maintained) support a smaller subset than most others
    • globwalk, the one used by this crate combines globset via ignore (as alternative to using globset and walkdir directly), ignore can add glob overrides which affect directory traversal
    • wax supports an extensive set of syntax, and is very well tested, but only supports forward slashes, and flatly rejects drive letters (for windows), requiring some pre-processing and post-processing in the name of cross-platform support
  • a question every glob library has to ask: should the glob expansion support filesystem features, like .. (directory up) and . (current directory), or drive letters (windows drive mounting points)
    • globwalk, the library I currently use does not support these, but I already work around this issue for ./path relative paths in the current codebase
    • if yes, should they only be supported at the start of the path, or also between path components beyond the first, like hello/../world/*.png or hello/./world/*.png, and what should hello//world mean?
    • if no, can a dependent which does access filesystems infer these directories where possible, e.g. a reasonable
    • if you're going to evaluate . (current directory) beyond the first path component (first path component: ./some, beyond: hello/./some), then you may run into issues like BUG: Partial conversion with the Glob method #1656
  • we can differentiate between a glob matcher libraries, like globset and a libraries which also resolves actual file system paths, like globwalk and glob
    • should only the latter support file system operations like .. and .?
      • if yes, should a resolving library "lower" glob patterns which include such syntax to paths which do not include this syntax, to be ingested by glob matcher libraries?
        • if yes, aren't we really still writing two glob parsers in one?
  • is it viable to support file system syntax, or can it make glob syntax ambiguous?

The plan

  • globwalk is unmaintained
    • I forked it into globwalker
    • This fork will aims to be cross platform and practical
    • First I will move the local workarounds from sic to the globwalker crate
    • Then I will consider fixing the bug reported here
    • Then I will consider supporting (or not) other file name

I'm also still considering dropping absolute path support; then a pre-processed version of wax may be preferable (like I've implemented here)

Libraries

Still also happy to use/replace the glob library used. A list of libraries I'm aware of follows:

No filesystem walker

Includes filesystem walker

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug
Projects
None yet
Development

No branches or pull requests

2 participants