Sorting order #5

Closed
SolitudeSF opened this issue Jul 6, 2022 · 8 comments


@SolitudeSF
Contributor

sometimes the sorting order is subtly broken

Featurettes
Futurama (1999) - S07E16 - T. The Terrestrial (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E20 - Calculon 2.0 (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E01 - The Bots and the Bees (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E02 - A Farewell to Arms (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E03 - Decision 3012 (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E04 - The Thief of Baghead (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E05 - Zapp Dingbat (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E06 - The Butterjunk Effect (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E07 - The Six Million Dollar Mon (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E08 - Fun on a Bun (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E09 - Free Will Hunting (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E10 - Near-Death Wish (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E11 - Viva Mars Vegas (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E12 - 31st Century Fox (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E13 - Naturama (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E14 - 2-D Blacktop (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E15 - Fry and Leela's Big Fling (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E17 - Forty Percent Leadbelly (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E18 - The Inhuman Torch (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E19 - Saturday Morning Fun Pit (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E21 - Assie Come Home (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E22 - Leela and the Genestalk (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E23 - Game of Tones (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E24 - Murder on the Planet Express (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E25 - Stench and Stenchibility (1080p BluRay x265 RCVR).mkv
Futurama (1999) - S07E26 - Meanwhile (1080p BluRay x265 RCVR).mkv

episodes 16 and 20 for some reason are at the top

@c-blake
Owner

c-blake commented Jul 7, 2023

Believe it or not, I only just saw this issue minutes ago! Deepest apologies for not seeing it and responding sooner.

I have reproduced the behavior with the order = "0134EN" of the default config.

It works as you expect with a much simpler lc -of (which is the easy workaround here - that default order is quite fancy).

@c-blake
Owner

c-blake commented Jul 7, 2023

A more thorough explanation - this is not an actual bug, but perhaps poor documentation and/or a limitation of a (documented in the --help) heuristic.

The issue is that the "." (in "T." and "2.0") makes the extension sorting (the 'E' part) think the extension is not ".mkv", but whatever follows that first ".".

So, if you run lc -o0134eN then you get the results you were expecting.

I made the default config use 'E' as that seemed what I would usually want. That longest extension ('E') will put things like .tar.gz and .tar.bz together, apart from other .gz & .bz files. The shortest ('e') will put all the .gz's and .bz's together, independently of whatever precedes them. The "longest" rule, however, is fragile to other periods in the name.
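To make the two rules concrete, here is a minimal Python sketch. It assumes 'E' means "everything after the first period" and 'e' means "everything after the last period", and that ties fall back to the plain name; `longest_ext` and `shortest_ext` are hypothetical helpers for illustration, not lc's actual code.

```python
def longest_ext(name):
    """Everything after the FIRST '.' (assumed reading of 'E')."""
    _, dot, ext = name.partition(".")
    return ext if dot else ""


def shortest_ext(name):
    """Everything after the LAST '.' (assumed reading of 'e')."""
    _, dot, ext = name.rpartition(".")
    return ext if dot else ""


names = [
    "Futurama (1999) - S07E01 - The Bots and the Bees (1080p BluRay x265 RCVR).mkv",
    "Futurama (1999) - S07E16 - T. The Terrestrial (1080p BluRay x265 RCVR).mkv",
    "Futurama (1999) - S07E20 - Calculon 2.0 (1080p BluRay x265 RCVR).mkv",
]

# Sort by extension first, then by name to break ties.
by_longest = sorted(names, key=lambda n: (longest_ext(n), n))
by_shortest = sorted(names, key=lambda n: (shortest_ext(n), n))

for n in by_longest:
    print(repr(longest_ext(n)), "<-", n)
```

Under these assumptions the "longest" keys for episodes 16 and 20 start with a space and a "0", both of which compare below "mkv", so those two files float to the top, while the "shortest" key is "mkv" for every file.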

It's worth saying that it takes near-human-like linguistic intelligence to know that "T. " is some "initial" inside a title and not anything related to a file type, or that "2.0" is embedded and not part of, say, a software version number.

That caveat being made, really, lc just provides both first & last which might be "the simplest", but there may be a better even fancier idea here -- "multi-level extension comparison from the end". I.e., first compare the .bz and then the .tar and then any other "component". This would at least put .tar.bz and .tar.gzs in separate "blocks" within the .bz and .gz files. And this fancier idea might be a better default or even a better binding for capital 'E'.
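A rough Python sketch of that "multi-level extension comparison from the end" idea follows. It is only one possible reading of the proposal: `ext_key_from_end` is a hypothetical helper, and splitting on every period is an assumption.

```python
def ext_key_from_end(name):
    """Extension components, last one first: 'a.tar.gz' -> ('gz', 'tar')."""
    parts = name.split(".")
    return tuple(reversed(parts[1:]))


names = ["b.gz", "a.tar.bz", "c.bz", "a.tar.gz", "d.tar.gz"]

# Compare the final component first, then each earlier one, then the name.
ordered = sorted(names, key=lambda n: (ext_key_from_end(n), n))
print(ordered)
```

With this key the .bz files form one block and the .gz files another, and inside each block the .tar.* files sit together. A name with a stray period still gets an extra component, so it would sort apart from its siblings.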

I'm open to a PR along those lines, if you want to try. { I'd guesstimate that it's about 10X simpler than that hyperlink PR you did.. although it is also about 50X harder than just changing the case of E in your default config. ;-) }

@c-blake c-blake closed this as completed Jul 7, 2023
@c-blake
Owner

c-blake commented Jul 7, 2023

I should also have said that even with that newly mentioned multi-level-from-the-end feature, your specific example would still break since your "."s are just not part of a hierarchical extension syntax and so require the user to know 'e' not 'new E'. This is why I closed the issue - a true full solution is "AI-Complete" (like NP complete).

But I fully acknowledge for many users lowercase 'e' is likely to be a less confusing default. We can switch the default config to that, if you want. What I personally do is just toss on a -of in situations like that right on the command-line, though, as it's pretty quick to keystroke.

c-blake added a commit that referenced this issue Jul 7, 2023
@SolitudeSF
Contributor Author

SolitudeSF commented Jul 7, 2023

im fine with changing my config. problem for me is that im just changing magic symbols, i still have no idea what the order string means and what can i even set it to. i guess the real issue here was that i have no clue what im doing or what is the true or possible meaning of the config, even if its the only file listing program i've been using for 99.(9)% of the time since whenever i found it. No offense, but the readme doesnt pass as a tutorial/documentation as it reads more like working notes.

Maybe in some other situation i would just man up and read the source code, but that seems like a disproportional time investment for a file listing program.

And im not trying to coerce you into making it 100% approachable and user friendly. If this is a tool you hacked for your personal use, im still fine with whatever functionality i can comprehend myself.

@c-blake
Owner

c-blake commented Jul 7, 2023

First - I am very glad to hear you use lc for 99+% of your cases. Me, too! :-) I have a couple dozen wrapper scripts set up and a lot of muscle-memory for various stylized/customized listings because even lc -sx feels too long to type compared to "dx". Lol.

It's really an outgrowth of ideas in a 500-line Python thing I did back around y2k that was about 50% defined in environment variables Python could eval (!). So, I've kind of been playing with these abstractions for almost a quarter century by now and they are probably not the only ones and probably do require a lot more explanation.

By "documentation" I meant the part of the help message I just "clarified", but it is true that I do not elaborate on "multi-level order/sorting", but I also added a comment in the config.

Granted, that will not much help someone who just copies my config without understanding and never understands admittedly terse --help output for admittedly terse sub-syntaxes.. and given the system complexity that is not so unexpected/invalid a way to try out the system. I am very happy to work with you on trying to make it more approachable. { I think 100% is impossible as there are just too many new ideas and in my experience 50+% of people will not pay attention to anything that doesn't interest them immediately, which will then cause trouble with interacting features. So, realistic expectations and all that.. } A person who is both confused and motivated is solid gold to the project of better approachability. Also, I have contemplated using the cligen/strUt.MacroCall templating system to move away from all those 1-letter idents (and also in procs which I also use dozens to hundreds of times/day..).

On the topic at hand, multi-level sorts are pretty common in databases (like SQL "order by") or even in the GNU sort command. The idea is that you have a key that is compared like a Nim tuple -- (field1, field2, field3, ...) and you compare corresponding fields of two tuples only until a pair differs, with that difference deciding the overall comparison. So, all those 1-letter codes in the --help message for "order keys" are possibilities for specifying those field1, field2, .. in the just above mentioned tuple. You can play around with lc --help|less in one terminal and lc -oXYZ in another on some interesting directory, such as /dev/ or something.
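The tuple idea can be sketched in Python, whose tuples also compare field by field. The field names ("ext", "size", "name") and the `make_key` helper are made up for illustration and do not correspond to lc's 1-letter codes.

```python
def make_key(fields):
    """Build a sort key that compares the listed fields in order."""
    def key(entry):
        return tuple(entry[f] for f in fields)
    return key


entries = [
    {"name": "notes.txt", "ext": "txt", "size": 120},
    {"name": "b.mkv", "ext": "mkv", "size": 900},
    {"name": "a.mkv", "ext": "mkv", "size": 900},
    {"name": "z.mkv", "ext": "mkv", "size": 100},
]

# Later fields only matter when all earlier fields are equal.
by_ext_name = sorted(entries, key=make_key(["ext", "name"]))
by_ext_size_name = sorted(entries, key=make_key(["ext", "size", "name"]))

print([e["name"] for e in by_ext_name])
print([e["name"] for e in by_ext_size_name])
```

Changing the list of fields plays the same role as changing the letters in an order string: each extra field is only a tie-breaker for the ones before it.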

Part of the complexity surely derives from that default order = "0134eN" using 2 user-defined "kind/type" categories: the "01" goes with the part of the help which says simply "tK 0-2" while the "34" goes with what the help calls "3-5, ... fK, ..." which is obviously.. crazy pants dense. Further, users themselves can define (with integer codes) what order means across their types.

The order system could all definitely use some elaboration beyond the comments in/example of the config/(kind|style) stuff, though that is a good starting point. { It could also maybe use a more user-friendly specification system, such as just listing the things in order, say, instead of assigning integer codes, though that might need some new CL option like -O,--orders (but I am not trying to assign work or anything).. }

The inspiring use-case was "dot file directories" being in a block at the start, then "dot file non-dirs", then dirs, then non-dirs. This was more important to me in $HOME in days before XDG, but I still like that organization. A lot of people "hate files/dirs like X", which is fair enough, but if they are around anyway I want some tool to wrangle & present whatever organization can be found/be helpful. Basically, so I "know where to look" with order & color highlighting reinforcing each other.
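That four-block layout amounts to sorting on an integer "kind" code before the name. Here is a small Python sketch; the codes 0 to 3 and the `kind_code` helper are invented for this example and are not the integer codes from the shipped config.

```python
def kind_code(name, is_dir):
    """Hypothetical ordering code: lower codes are listed first."""
    hidden = name.startswith(".")
    if hidden and is_dir:
        return 0  # dot-file directories
    if hidden:
        return 1  # dot-file non-directories
    if is_dir:
        return 2  # ordinary directories
    return 3      # everything else


# (name, is_dir) pairs standing in for directory entries.
entries = [
    ("src", True),
    (".bashrc", False),
    ("README", False),
    (".config", True),
    ("bin", True),
    (".profile", False),
]

ordered = sorted(entries, key=lambda e: (kind_code(*e), e[0]))
print([name for name, _ in ordered])
```

The kind code is the leading field of the sort key, so names are only compared inside each block.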

The README is tilted toward "What other stuff is missing" rather than "How to use lc", though both relate to "Why should I care having clicked some link taking me here?"... I am super-duper open to crafting a more wordy elaboration on how the configs work and how features compose. My goal with the help message was to try to keep it down to one screenful, but there is no real "backup doc" and that is surely sorely needed, maybe with its URL in the --help output..

One good starting point might be an expanded --help and all the comments in the shipped config. Maybe a github Wiki type thing like I began over on cligen? Open to suggestions on format/venue.

I'm also very open, if you want, to getting you up to speed enough to have your very own configs! And distributing them under configs/sf0 (or 1 or 2 if that ever happens, or whatever name you want). It might help to have some more synchronous private conversation over Element/Matrix/whatever if you are in a compatible timezone. We can email to set something up if desired.

c-blake added a commit that referenced this issue Jul 8, 2023
exhibited in #5

I do doubt "User-DefinedFileKindOrder0" would be much less inscrutable
than '0' for the uninitiated :-(  Instead this needs real how-to-config
documentation, also covering complexities of dual personality dirents..
aka "symbolic links" which definitely do not help. }
c-blake added a commit that referenced this issue Jul 8, 2023
…ented"

above in table of 1-letter codes *shared* as formatting codes), in further
service of #5 (though that confusion
was really caused by a classic "delimiter inside DSV" problem/eE "docs").
c-blake added a commit that referenced this issue Jul 8, 2023
fair to assume that `lc` users also have ~/.config/cligen colors set up
and it (maybe) makes the litany of sub-syntaxes/keywords thereof more
digestible.  Should probably embellish more of the per-option strings.
(This is but one step on a road #5
discussion suggests.)
@SolitudeSF
Contributor Author

SolitudeSF commented Jul 8, 2023

after observing configs more closely and playing around a bit, situation got a bit clearer, but there are still fundamental gaps i cant fully grasp, like dimensionality, and what constitutes dimensions and some other things.

and small things like where in all example Style definitions i dont see closing double-quote in format-strings.

"My goal with the help message was to try to keep it down to one screenful"

i can see the point, but i dont believe in its practicality and i believe that its one of the big blocks in the barrier of entry. as it stands the help doesnt fit on one screen on my machine, so it isnt really a cheatsheet. and when i need to consult a help page, i either read it as it comes or grep it to find the relevant info. i get the abbreviations are almost part of your identity, but they obfuscate already compressed info to the point of noise.

@c-blake
Owner

c-blake commented Jul 8, 2023

Well, I am working on formatting it less densely with highlighting, at least (..which I assume you have set up for cligen helps? @Vindaar and @kaushalmodi have both mentioned colorized help being one of the big cg draws). It may be hard to get it to "easy greppability", since that relies upon guessing "what will people grep for...".

Really a "Guide To Configuring" is probably what is needed, with various worked-through examples.

I may be able to just help2man the thing and begin a less dense actual man page. lc is pretty much Unix-only and man is the Unix idea for more thorough than --help docs.

Detail is a tricky thing, of course - one wants just the right amount for a given audience, but audiences vary a lot.

@c-blake
Owner

c-blake commented Jul 8, 2023

Re: the unclosed quotes - that was the only way I could get std/parsecfg to work. While it's possible I am misusing it in cligen, I kinda expect it's just a weird glitch. Even now, if you put a 4th quote it fails with "unexpected token". Meanwhile, it works if there is a space between the closing " and the closing triple-" (but I didn't want that extra space causing line wraps just to appease std/parsecfg).

This is just a generic thing that would affect any cligen program. I don't recall ever submitting a Nim bug report about this, but you can if you like. If it works with newer Nim's I'd happily update those configs. If you hate it, you can actually just compile lc with -d:cgCfgToml today (for years, even). You will need to update all your configs, but you might enjoy such. :)


As I was editing that --help output, I was really reminding myself how good a cheat sheet it was for me writing my configs/etc. So, I think the answer is some more elaborate documentation (man page or markdown) to get other people to the point where that (or some future version of it) can be a cheat sheet for them doing similar.


It is true that 5 or 6 of my 1-letter codes are not very related to their "thing" and a few others are just "user-defined", but the vast majority are just the 1st letter of their thing which is not particularly "worse" than GNU ls with -[mHNopux]. Many (but not all!) of those GNU ls things have --long-options. You don't usually go too far with 1-letter before something becomes "a stretch", and it's not just me. strftime has a similar set of issues. As I said, moving from 1-letter codes to several letter short codes with cligen/strUt.MacroCall more f-string interpolation-like syntax is a possibility. That's probably even do-able in a backward compatible way with the short names just as aliases for the long.

To expand on that, what I was thinking is using the same string as used for the "header row" could be the %code used in the formatting with a % and %{} if you need them back-to-back with no space. Almost all sorting codes are shared with formatting codes, and those could be the same, but we could just also accept a comma-separated list as the more verbose syntax and distinguish backward compatibly with the 1-letter variant by a leading comma or % in the new one, maybe. I'm not against any of this - I just did what was easiest first because no one complained (until now). It will need to expand the help cheat sheet. There may also be tension between "overwide columns from long header idents" (e.g. "%o" for occupancy of address space with data blocks - % not "holes", and also since that uses the meta-character %).

There are things in this space that are already hyper abbreviated, though, like "i-node" which was originally for "index node", but if you called it "index node" many people would struggle until they realized "oh, he means i-node!". [abcm]time are kind of other examples where you might well have to read man pages to know which syscalls update which file times. I cannot do much about that stuff... { "vtime" is my own term, though..most would say max(c,m)-time instead of coining "version time", if I am even the first to do that.. Anyway, you can blame me for vtime and copying printf/strftime more than f-strings. }.
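For readers who have not met the term, "vtime" as described here is just the later of a file's ctime and mtime. A tiny Python sketch, assuming a Unix-like system where `st_ctime` is the inode change time; the function name `vtime` is borrowed from the comment above, not from lc's source.

```python
import os
import tempfile


def vtime(path):
    """'Version time': the later of ctime and mtime, i.e. max(c,m)-time."""
    st = os.stat(path)
    return max(st.st_ctime, st.st_mtime)


# Demonstrate on a throwaway file.
with tempfile.NamedTemporaryFile() as f:
    st = os.stat(f.name)
    v = vtime(f.name)
    print(v >= st.st_mtime and v >= st.st_ctime)
```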

c-blake added a commit that referenced this issue Jul 10, 2023
of under-explained &| impenetrable help message discussion surrounding
#5 .

This really is incomplete.  A short list of failures is:
  - Manual rather than automatic man-macro aligns in option sub-sections
  - almost everything should have some little example near it
  - abbreviation and time formats (at least!) are under-explained
but something is better than nothing.
c-blake added a commit that referenced this issue Jul 12, 2023
…t of

#5 (comment) .

This change is just about builtin file type/kind names.  Since name match
has always been by unique prefix, this can be ~100% back-compatible. (The
only compatibility risk is users might have defined things with colliding
same-prefixes like "regDir" which will now error out with ambiguity on
use of "reg" rather than taking the exact match.  This seems unlikely,
but I mention it here to remember for release notes..).
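The compatibility risk described above can be sketched in Python. The `resolve` helper is hypothetical; the "exact match wins, otherwise require a unique prefix" rule is inferred from this commit message rather than taken from lc's code.

```python
def resolve(query, names):
    """Exact match wins; otherwise query must prefix exactly one name."""
    if query in names:
        return query
    hits = [n for n in names if n.startswith(query)]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise KeyError(f"no kind matches {query!r}")
    raise KeyError(f"ambiguous {query!r}: {sorted(hits)}")


# Old builtin name: "reg" is an exact match even next to a user's "regDir".
print(resolve("reg", ["reg", "regDir"]))

# New builtin name alone: "reg" is still a unique prefix of "regular".
print(resolve("reg", ["regular", "exec"]))

# New builtin name plus a user-defined "regDir": "reg" is now ambiguous.
try:
    resolve("reg", ["regular", "regDir"])
except KeyError as err:
    print("error:", err)
```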

Specifically:
    reg    -> regular
    bdev   -> blockDevice
    cdev   -> charDevice
    exec   -> executable
    hard   -> hardLinks (with 's' since, technically, n=1 is a hard link
              of sorts.  So, think of it more as "has >1 hard links".)
    tmpD   -> tmpDir
    worldW -> worldWritable
    unR    -> unReadable
    odd    -> oddPerm
    +-sym  -> +-symlink (I still like +- here since people can never
                         settle on "broken", "orphaned", "stale", etc.)
    CAP    -> CAPABILITY

Retained bdev & cdev as aliases since for those two only the new name is
not prefixed by the old name.  Retained some other things as not so bad.
"suid" seems almost as standard an abbreviation as "symlink" (e.g. mount
uses "nosuid" and `/tmp` is the prototypical "tmpDir" and almost all ACL
tools just abbreviate "access control list", like `getfacl`).

Also, change the type add template `tAdd` to wrap `t` in bool().

I left old abbreviations in the `lc --help` message since the whole style
of the help is to be very brief (both here & just in general for --help).