Add option for natural ordering (a la -v in ls) #324

Closed
ErichDonGubler opened this issue Sep 13, 2018 · 19 comments · Fixed by #556

@ErichDonGubler

It would be really, really handy for scripting to have a flag to arrange output using natural ordering.

Given the following tree:

├── file-1
├── file-11
├── file-12
├── file-2
├── file-22
└── file-3

0 directories, 6 files

The output of ls -1/fd is currently:

file-1
file-11
file-12
file-2
file-22
file-3

The proposed output for natural order (ls -1v, proposed to be something like fd -v) would be:

file-1
file-2
file-3
file-11
file-12
file-22

If you need a dependency for this, rust-natord is small and seems like it could fit the bill.
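To make the difference between the two listings concrete, here is a minimal sketch of a natural-order sort key. It is not how `ls -v`, `sort -V`, or rust-natord are implemented (their rules for versions, suffixes, and leading zeros may differ); `natural_key` is a hypothetical helper that only treats runs of digits as numbers.

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so digit runs compare numerically."""
    # re.split with a capturing group keeps the digit runs in the result:
    # even indices are text (possibly empty), odd indices are digit runs.
    chunks = re.split(r"(\d+)", name)
    return [int(chunk) if index % 2 else chunk for index, chunk in enumerate(chunks)]


names = ["file-1", "file-11", "file-12", "file-2", "file-22", "file-3"]

lexical = sorted(names)                   # plain string comparison
natural = sorted(names, key=natural_key)  # digit runs compared as integers

print(lexical)
print(natural)
```

With the six names from the tree above, `lexical` reproduces the current listing and `natural` reproduces the proposed one.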

@tmccombs
Collaborator

as a workaround you can use fd | sort -V

@sharkdp
Owner

sharkdp commented Sep 13, 2018

@ErichDonGubler Thank you for your feedback.

I think this is better left to external tools, as @tmccombs suggests. Another way is to use xargs to do the sorting via ls -1v:

▶ fd -0 | xargs -0 ls -1v
file-1
file-2
file-3
file-11
file-12
file-22

@sharkdp
Owner

sharkdp commented Sep 13, 2018

Also, see #196 and #159.

@ErichDonGubler
Author

ErichDonGubler commented Sep 13, 2018

@sharkdp: I don't really have anything to add after seeing the discussion you've linked. You definitely call the shots! I wonder if there IS a case where there's significant enough gain by using internal sorting (which would NOT be the default, of course -- I agree with the opinion you expressed there in #159). Let me see if I can find some numbers that form a convincing case -- if I can't find something in the next few days, I'll happily close this. :)

@sharkdp
Owner

sharkdp commented Sep 16, 2018

@ErichDonGubler Thank you for the feedback.

I'd definitely be interested in hearing use cases for such a feature! However, I am still following the "80% of the use cases" philosophy with fd, as mentioned in the README.

@sharkdp
Owner

sharkdp commented Oct 5, 2018

I'm going to close this for now. Feel free to comment here and I can reopen the ticket.

@sharkdp sharkdp closed this as completed Oct 5, 2018
@sergeevabc

Windows user here to complain about order inconsistency between launches.

Let’s say I want to compute hashes like this: fd -tf -d 1 -x rhash --sha256

Expected order  Launch 1        Launch 2        Launch 3
AUTHORS         AUTHORS         AUTHORS         AUTHORS
ccguess.1       ccguess.1       ccguess.1       ccguess.1
ccguess.html    ccguess.html    ccguess.html    ccrypt.1
ccrypt.1        ccrypt.1        ccrypt.1        ccguess.html
ccrypt.html     ccrypt.html     ccrypt.html     ChangeLog
ChangeLog       ChangeLog       ChangeLog       ccrypt.html
COPYING         COPYING         COPYING         COPYING
cygwin1.dll     cypfaq01.txt    cypfaq01.txt    cypfaq01.txt
cypfaq01.txt    NEWS            NEWS            cygwin1.dll
NEWS            ps-ccrypt.el    cygwin1.dll     NEWS
ps-ccrypt.el    cygwin1.dll     ps-ccrypt.el    ps-ccrypt.el
ps-ccrypt.elc   ps-ccrypt.elc   ps-ccrypt.elc   ps-ccrypt.elc
README          README          README          README-WIN
README-WIN      README-WIN      README-WIN      README

@ErichDonGubler
Author

@sergeevabc: Was there something in the documentation that gave you the impression that a certain order of output was guaranteed? AFAIK fd doesn't make any.

@sergeevabc

@ErichDonGubler, some kind of processing order is usually expected from CLI file-related utils (archivers like 7zip and Zstandard, backup managers like Duplicacy and Restic, defrag managers like Contig, duplicate killers like Jdupes, hash calculators like Rhash, even media encoders like LAME and FLAC gave me that impression).

@ErichDonGubler
Author

ErichDonGubler commented Mar 9, 2019

So, first, let me see if I can address your immediate problem by asking a question: can your environment be expected to have common POSIX tools like sort and xargs? If it can and I understand how you want to use rhash, then you can do something like:

fd -tf -d 1 | sort -V | xargs -I {} rhash --sha256 {}

With regard to fd itself, perhaps the best way to handle your complaint is to make the lack of an order guarantee explicit in the documentation? How do you feel about that? EDIT: see @sharkdp's suggestion below. This is probably the solution to add to the documentation.

Second, it's true that applications can (and often do) enforce a specific order of file walking results -- even if it is only defined by the filesystem implementation. However, not all applications or tools guarantee it, particularly those that traverse file trees asynchronously and without a cleanup or sorting pass. fd is one of those tools.

To illustrate my point, let's analyze where asynchronous operations happen in the relevant paths of fd's source by stepping through manually:

  1. main enters walk::scan.
  2. A channel sender and receiver pair is created in walk::scan. It acts as the work queue for printing results to stdout later, and the results are sent by a parallel directory walker that is constructed here and used further on. This introduces at least two places where async conditions (which are effectively non-deterministic) will affect the order of results.
  3. walk::scan enters walk::spawn_receiver, where the thread receiving results to print is born. If we're executing the invocation with job execution you referenced above (fd <expr> -x <job_template>), the passed FdConfig has a command and it's not a batch command, so a pool of threads is spun up, each of which runs exec::job.
  4. Once a file is found and pulled by a job worker thread in exec::job, it calls exec::CommandTemplate::execute_command, which calls exec::command::execute_command.
  5. execute_command finally executes the job command and locks a printing mutex, first printing the command's stdout and then stderr. This means that even if a command starts first, if it ends AFTER another command then the second command will still print first.
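The effect described in the last step can be reproduced with a toy model. This is only a sketch in Python, not fd's Rust code: `run_jobs` and the sleep durations are hypothetical stand-ins for the worker pool and for commands of different running times, and the lock plays the role of the printing mutex.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_jobs(jobs, workers):
    """Run (path, seconds) jobs on a thread pool; return paths in the order they printed."""
    print_lock = threading.Lock()
    printed = []

    def run_one(path, seconds):
        time.sleep(seconds)   # stand-in for running the external command
        with print_lock:      # output is written only after the command has ended
            printed.append(path)
            print(path)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, seconds in jobs:
            pool.submit(run_one, path, seconds)
    # Leaving the `with` block waits for every submitted job to finish.
    return printed


# Jobs are submitted in the order a, b, c, but a takes the longest.
jobs = [("a.txt", 0.3), ("b.txt", 0.1), ("c.txt", 0.2)]
parallel_order = run_jobs(jobs, workers=3)
serial_order = run_jobs(jobs, workers=1)
```

With three workers the paths are printed in completion order (the job that starts first prints last here), while a single worker in this toy model prints them in submission order. Real commands have no fixed durations, which is why the parallel order varies between launches.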

I'll let @sharkdp correct me if I'm wrong here about the intent of the code, but my assumption is that it's optimized for speed: don't add another pass, keep work between file discovery and printing output as simple as possible.

@sharkdp
Owner

sharkdp commented Mar 9, 2019

@ErichDonGubler is correct. You can use --threads 1 / -j 1 if you want to have a deterministic output order.

@sergeevabc

sergeevabc commented Mar 19, 2019

@sharkdp, indeed, -j 1 fixes the issue of output sorting.
Consider adding a remark about sorting both to the docs here and next to that switch (via -h and --help).

@ErichDonGubler, your bio says ‘dedicated to building software for other humans’. Being an average human with calloused hands, I’m looking for tools that first and foremost deliver predictable output based on previous experience. A human-friendly tool is expected to have a name and version, licence and author’s contact data, and a manual with command explanations and usage examples. But above all, its tangible visual part should resemble the behaviour of other tools from the same niche (unless the author is some kind of revolutionary who believes that customs are obsolete or ineffective). For example, ag, grep, pt, ripgrep, and sift are made to search files for patterns; ripgrep is the fastest among them and it delivers that speed without quirks: switches are mostly kept intact for the sake of consistency, so users need not be retrained, and the output looks like what a user rooted in (pioneering) Western digital culture expects to see (e.g. left-to-right, a-z). The other way round inevitably leads to lengthy justifications about ‘asynchronicity’ and other peculiarities under the hood, which might impress enthusiasts and the academic milieu, but would likely confuse and alienate our human.

@ErichDonGubler
Author

ErichDonGubler commented Mar 19, 2019

@sergeevabc: I see the value in having a reproducible order with the tools we're discussing here, and I'm glad you are teaching me about it! You're the first human I've encountered that has A) expressed a preference for a reliable order and B) has actually taken time to write about it. I would imagine that many humans might also not care or prefer speed to that ordering (because they may not have the same previous experience as you!) -- so I don't consider your point generally applicable, but I do think it's a valuable perspective to keep in mind.

sharkdp added a commit that referenced this issue Apr 2, 2020
Add a new `-l`/`--list` option to show more details about the search results. This is basically
an alias for `--exec-batch ls -l` with some additional `ls` options.
This can be used in order to:
    * see metadata like permissions, owner, file size, modification times (#491)
    * see symlink targets (#482)
    * achieve a deterministic output order (#324, #196, #159)
    * avoid duplicate search results when multiple search paths are given (#405)
@sharkdp
Owner

sharkdp commented Apr 3, 2020

This is now supported (in a particular way) by the new -l/--list-details option, see #556.

@sharkdp
Owner

sharkdp commented Apr 16, 2020

This has now been released in fd v8.0.

@sergeevabc

Hmm.

Windows 7 x64, FD 9.0.0.

$ fd -g *.jpg -tf -j 1 -x xxhsum {}
\879b2d9894fda9fd  .\\thumbs.jpg
\44c472c9d6f50bf8  .\\DSC_6953.jpg
\9e0e685cb71d658e  .\\DSC_6947.jpg
\b21dfab7d945fc8c  .\\DSC_6945.jpg
\e507ebc868c72df5  .\\DSC_6943.jpg
\d13e17e56d68c251  .\\DSC_6942.jpg
\fc5313fefea68b02  .\\DSC_6923.jpg
\9e87379e55f0c7d4  .\\DSC_6907.jpg
\23703ed86e11a3e6  .\\DSC_6906.jpg
\8e8ee2a826c7e045  .\\DSC_6905.jpg
\31939f6304f099b6  .\\DSC_6904.jpg
\323f7c57871e27e6  .\\DSC_6903.jpg
\d1dfab7f948a3dc1  .\\DSC_6902.jpg
\563c70cda89a1737  .\\DSC_6901.jpg
\8d1ab1076d4b4cd7  .\\DSC_6900.jpg
\6fbfac6f39669c1b  .\\DSC_6899.jpg
\b7607f0b98a92bf4  .\\DSC_6898.jpg
\73ed628cb434a733  .\\DSC_6897.jpg
\8380dc1a51b5972a  .\\DSC_6896.jpg
\24fa50966a7b913c  .\\DSC_6895.jpg
\46cb55b63ff71972  .\\DSC_6894.jpg

However, the following output was expected

$ xxhsum *.jpg
46cb55b63ff71972  DSC_6894.jpg
24fa50966a7b913c  DSC_6895.jpg
8380dc1a51b5972a  DSC_6896.jpg
73ed628cb434a733  DSC_6897.jpg
b7607f0b98a92bf4  DSC_6898.jpg
6fbfac6f39669c1b  DSC_6899.jpg
8d1ab1076d4b4cd7  DSC_6900.jpg
563c70cda89a1737  DSC_6901.jpg
d1dfab7f948a3dc1  DSC_6902.jpg
323f7c57871e27e6  DSC_6903.jpg
31939f6304f099b6  DSC_6904.jpg
8e8ee2a826c7e045  DSC_6905.jpg
23703ed86e11a3e6  DSC_6906.jpg
9e87379e55f0c7d4  DSC_6907.jpg
fc5313fefea68b02  DSC_6923.jpg
d13e17e56d68c251  DSC_6942.jpg
e507ebc868c72df5  DSC_6943.jpg
b21dfab7d945fc8c  DSC_6945.jpg
9e0e685cb71d658e  DSC_6947.jpg
44c472c9d6f50bf8  DSC_6953.jpg
879b2d9894fda9fd  thumbs.jpg

As you can see, FD a) got it in reverse and b) added some odd slashes.

@ErichDonGubler
Author

ErichDonGubler commented May 10, 2024

@sergeevabc: If you consider that a bug, filing a separate issue is likely to be more fruitful than posting in a (tangentially related) resolved feature request that was originally filed ~5.5 years ago. 😉

@tmccombs
Collaborator

Huh, I can reproduce this as well. It appears, at least for relatively small numbers of results, that if you use -j 1 with --exec, fd runs the commands in the reverse of the order the file system gives you. I don't know why though, maybe some weird behavior with crossbeam_channel, or possibly the ignore crate?

As for the slashes... that is very strange. fd doesn't do any transformation on the command output, so I have no idea what is causing that.

@tmccombs
Collaborator

After some more investigation it does appear that this is the result of ignore giving us results in the opposite order from what the filesystem does.

I don't know why that is; probably the implementation uses a stack and pops items off of it. If we switched to using Walk instead of WalkParallel in the -j1 case then it might use a more expected order, at the cost of additional code complexity.
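The stack hypothesis is easy to picture with a small sketch. It does not reflect the actual internals of the ignore crate, which the comment above only guesses at; `drain_as_stack` and `drain_as_queue` are hypothetical helpers that show how the choice of work list alone can reverse the order of the entries.

```python
from collections import deque

# Entries in the order the filesystem is assumed to return them.
entries = ["DSC_6894.jpg", "DSC_6895.jpg", "DSC_6896.jpg", "thumbs.jpg"]


def drain_as_stack(items):
    """Push every entry, then pop from the end (last in, first out)."""
    pending = list(items)
    visited = []
    while pending:
        visited.append(pending.pop())
    return visited


def drain_as_queue(items):
    """Push every entry, then take from the front (first in, first out)."""
    pending = deque(items)
    visited = []
    while pending:
        visited.append(pending.popleft())
    return visited


print(drain_as_stack(entries))  # visits thumbs.jpg first
print(drain_as_queue(entries))  # visits DSC_6894.jpg first
```

A walker that collects a directory's entries and then works through them last-in-first-out would produce the reversed listing reported above, without any threading involved.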

Note that while using -j1 will give you a deterministic ordering, it won't necessarily give you a sorted order, even if we ran exec on the results in the same order we got them, because depending on the filesystem you could get the results in a variety of different orders (for example, in creation order, based on hash values of the file names, alphabetically, etc.).

Another option could be to refactor our optimistic sorting (which applies when we get results quickly) so that it works for --exec as well. I'm not sure how difficult that would be to do.
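As a rough illustration of what "optimistic sorting" could look like, the sketch below assumes it means: buffer results for a short time, sort them if the search finishes within that window, and otherwise fall back to streaming. The function name `emit_results`, the `max_buffer_seconds` parameter, and the timings are hypothetical, and this is not fd's actual code.

```python
import time


def emit_results(results, max_buffer_seconds, sink):
    """Sort the output if the source finishes before the deadline, else stream it.

    Returns True when the output was sorted. The deadline is only checked when
    an item arrives; a real receiver would more likely use a receive timeout.
    """
    deadline = time.monotonic() + max_buffer_seconds
    buffered = []
    iterator = iter(results)
    for item in iterator:
        buffered.append(item)
        if time.monotonic() > deadline:
            # Too slow: flush what we have unsorted, then stream the rest.
            for early in buffered:
                sink(early)
            for late in iterator:
                sink(late)
            return False
    # The source finished in time, so sorting costs no noticeable latency.
    for item in sorted(buffered):
        sink(item)
    return True


def slow_source():
    """Stand-in for a search that takes a while to produce each result."""
    for name in ["b", "a", "c"]:
        time.sleep(0.05)
        yield name


fast_out, slow_out = [], []
fast_sorted = emit_results(["b", "a", "c"], 1.0, fast_out.append)
slow_sorted = emit_results(slow_source(), 0.01, slow_out.append)
```

Applying the same idea to --exec would mean handing the sorted buffer to the job workers instead of to a printing sink, which is presumably where the refactoring effort would go.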
