-
-
Notifications
You must be signed in to change notification settings - Fork 658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider buffered algorithm to handle recursion more gracefully #82
Comments
So for some reason I had it in my head that I'd already replied to this when I assigned it to myself. Looking back, that is not the case! So sorry for the late reply to this.
Heh, believe me, I'm aware! When I was testing the Unfortunately, there's one thing that's standing in the way of having nice fluent output, and that's that exa has to format its output into a table before it's able to output anything. You're right that this could be fixed for just the The For example, when I run My (probably incomplete) list of partial solutions to this is, so far:
In any case, you have correctly identified that the Also, a specific reply as to your implementation:
Right now each file is queried in parallel (at least with The reason it has to be sorted afterwards is that reading the data about the files isn't guaranteed to be done sequentially, and there's a chance that certain calls will take more time than others, and the results will be sent down the channel out-of-order. So sorting has to be done afterwards, which means there's no point doing it beforehand as well. The joys of parallel programming... |
Hmm some good points, I might respond a bit asynchronously as I come up with solutions, so sorry in advanced... By using human readable sizes you could force the file size to always fit reasonably in a fixed width size. With 6 characters, you could do this.
I think 6 is the minimum, but bumping up characters you could get more granularity (a flag perhaps!?) I think the next peculiarity is the username/group. You can get a listing of all groups and users before-hand by looking at the administrative database with
to:
Which would still allow them to be uniquely identified, but might keep them shorter. Again, I think this solution may be in need of real data to test how often it triggers. I think file name is the easiest one. When wrapping, simply include in indent to the next line so the line easily stands out as a continuation rather than the next entry. Some other considerationsIt may be possible to emulate the docker cli, which has far too many columns to fit in normal console windows, so it intentionally staggers them onto the next line for easy parsing. Another option might be to keep all fields that you can control at the left while moving the variable length fields to the right. You'll end up with 2-3 fields on the right floating around a bit, but with proper coloration it may still be readable without being aligned perfectly. If a user needs to see a better alignment, it might be wise to jump out of tree mode. Just a thought. Thanks for the response! |
Don't worry, I do this, like, all the time.
I wasn't sold on truncating too, but I get what you mean. The annoying thing is that the only times you'll want to specifically see the names are when you're trying to track down some kind of "why can't I write to this file" issue, and it'd be essential to not hide any file information from the user in times like that. My computer has a bunch of users with names like
It is, because the filename column doesn't have any sizing/padding done to it: it'll always be on the right, and never gets padded!
Ooooh, could you find a screenshot of this? I've never used Docker so I'm not sure what I'm meant to be looking for! exa already allows you to jump out of tree mode and list every folder with separate column withs ( |
Looking at the docker output now. Its an interesting case study, but I'm not sure how helpful it is. All of their columns have defined row sizes. They truncate the variable rows. I'll get a screen shot in a bit... gotta scrub the image first. |
Here's a potential mockup. The idea behind it is that the user is specified as part of the tree "wire". The width of the block is determined per-level. Each level is completed before beginning to print it. |
I think I see how things are aligned there: every time we go into a new sublevel, we get a new set of column widths, and when we go out of that level back to the original, we revert back to the previous set? |
Yes. It's a little odd to follow from the right, since the user/group name length fluctuates by level, but I think it should still be fairly easy to follow from the left where the tree is drawn. It's also important to note that the tree is (assuming black background) white text while the user or group is orange, so it should be easy to tell when nesting begins. |
|
This commit merges in a *lot* of refactoring work done to the tree, details, and grid-details views. The main one is that the concept of a “table” has been completely separated from the details view. Previously, if you wanted to use the long view without drawing the table (such as with --tree), exa used a “table” with 0 columns in it behind the scenes. This is now reversed: the table is now the optional component in the details view, and a tree-only view is just a details view without a table! Doing this has paved the way for all sorts of code cleanups: now, we only calculate values needed for the table if one’s going to be displayed. Some of these were rather expensive to compute (such as the user’s time zone, or locale) This refactoring is not fully complete; there are still several things that can be done, including having the errors and xattrs in the tree view use the same TreeParams constructor as the others, and separating the column widths from the table so fewer mutable values need to be passed around. Finally, this also merges in a lot of debuffering. There were at least two places in the code where values were collected into a vector before immediately being iterated over, instead of just using those values as they were generated! For example, when displaying the table, each row was rendered into a set of cells for displaying: but then it all went into a vector, and was only displayed at the very end, because that was what I needed for the grid-details view. Now, exa is smart enough to not do that. Basically, even if exa doesn’t actually *get* that much faster, it should at least display its first line of output quicker. Fixes #90, but also see #82.
Hi again! It’s been a while, but I did end up working on exa some more. The branch I just merged in doesn’t give exa streaming capabilities, but it does land two nice features:
So while it’s not streaming yet, it’s certainly faster, and the code improvements mean that implementing the hybrid breadth-depth search that you originally talked about is now a possibility. |
Awesome. I'll update my provisioning scripts and check it out with my coworkers who are loving exa. Thanks! |
Currently, exa does a depth first tree traversal when operating in recursive/tree mode and waits until full recursion is completed before printing anything. This algorithm can have a negative impact on performance both in terms of runtime and memory use. For larger trees (
exa / -lTL4
) there can be a substantial wait and memory impact will be proportional to total files.I suggest instead that exa use a hybrid of breadth and depth that will give it enough information so that it can begin printing, then buffer that information to output before clearing the memory. For example: suppose we have a standard linux tree and we run
exa / -lTL4
. The first thing we should do is create the root of a tree,/
then gather the/
level so that we can begin adding those nodes. We'll getbin boot cdrom dev etc home lib media mnt ...
. We sort this level (by either time or name) then place it in the tree. We then move to the most highly sorted item in that level. For alphabetic sort, this is/bin
. As soon as we move the recursion down to/bin
we will also format a line for bin and pass it to a print queue.The print queue should be checked and printed to asynchronously within processing to ensure that as soon as output is available at some interval, we initiate printing it. This will improve performance for remote clients, reduce in-memory footprint of exa, and improve the perceived performance by the user. This asynchronous printing could be done by another thread if you wanted, but I think that just convolutes the solution. As long as the printing is initiated somewhere before the program completes it's good. In all honesty, a queue is a very high level abstraction for this. You could literally print each line as it becomes available to print if you wanted.
So, onto
/bin
-/bin
has no nested directories, when we gather this level, we end up withbash bunzip busyybox ...
etc. We begin walking this directory and discoverbash
is not directory, so we add it to the print queue, which I mentioned above. Since we're now done withbash
we can pop it from the tree -- this is where we begin to see our memory savings. We then do the same forbunzip
,busybox
etc until all of/bin
is done. Then we start this process all over again on/boot
.The idea behind the tree-traversal model is that you can know the depth (and thus, how to print it) so if you visit nodes in the order that they will be printed, you can immediately print them and then allow the program to forget about them.
Let me know what you think about this. I've been trying to work my way around the code and see how this might be implemented, but I'm still pretty green when it comes to rust. :)
The text was updated successfully, but these errors were encountered: