Feature Request: Optimized select for selecting the binary #67

Open
breml opened this issue Mar 20, 2021 · 18 comments
Labels
enhancement New feature or request

Comments

@breml
Collaborator

breml commented Mar 20, 2021

When presenting the user with a list of candidate files from which to select the correct binary, the following changes could improve the user experience:

  • Filter out files with extensions that are unlikely to be executables, or keep only files with extensions that are likely executables (e.g. .exe, .sh, and of course no extension)
  • Filter by elements provided in the install URL (e.g. the repo name in the case of the GitHub provider)
  • Evaluate the FileInfo() information from the archive header (e.g. in tar and zip) to find files with the executable bit set
  • Order by relevance
  • Invert the order so that the files with the highest relevance are at the end of the list (and therefore closest to the prompt). This is especially important if the archive contains many files and the list might span multiple pages.
  • Optional: add a fuzzy search like https://github.com/junegunn/fzf or https://github.com/ktr0731/go-fuzzyfinder
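
The executable-bit idea from the list above can be sketched with the standard library alone. This is a minimal illustration, not bin's actual code: `executableEntries` and `buildSampleTar` are hypothetical helper names, and the sample archive is built in memory purely for demonstration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// executableEntries returns the names of regular files in a tar stream
// whose header mode has at least one executable bit set.
// (Hypothetical helper, not part of bin.)
func executableEntries(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		mode := hdr.FileInfo().Mode()
		if mode.IsRegular() && mode.Perm()&0o111 != 0 {
			names = append(names, hdr.Name)
		}
	}
}

// buildSampleTar creates a small in-memory archive for the demonstration.
func buildSampleTar() (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	tw := tar.NewWriter(buf)
	files := []struct {
		name string
		mode int64
	}{
		{"README.md", 0o644},
		{"LICENSE", 0o644},
		{"bin/tool", 0o755},
	}
	for _, f := range files {
		body := []byte("demo")
		hdr := &tar.Header{
			Name:     f.name,
			Mode:     f.mode,
			Size:     int64(len(body)),
			Typeflag: tar.TypeReg,
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	buf, err := buildSampleTar()
	if err != nil {
		panic(err)
	}
	names, err := executableEntries(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

For zip archives the same check could read the mode from `zip.FileHeader.Mode()`. Archives created on systems without Unix permissions may not carry this bit, so it is probably best treated as one signal among several.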
@breml breml changed the title Optimized select for selecting the binary Feature Request;Optimized select for selecting the binary Mar 20, 2021
@breml breml changed the title Feature Request;Optimized select for selecting the binary Feature Request: Optimized select for selecting the binary Mar 20, 2021
@sirlatrom
Collaborator

The current approach uses a very minimal scoring method, which includes checking for the repo's name in the binary name or the URL's basename. This should already give priority to files that at least include the repo name. However, the other suggestions sound interesting, and looking at FileInfo() in .tar and .zip files and other archives that support the executable bit sounds like a quick win.

@sirlatrom sirlatrom added the enhancement New feature or request label Mar 20, 2021
@cristiand391
Contributor

Finding binaries by reading their MIME type would be awesome; every time I update a binary I have to manually select it from among ~10-20 files.

Is this open to contributions?

@marcosnils
Owner

Is this open to contributions?

Definitely!

For updating binaries I believe there's actually something better we can do. We could save the name of the file originally selected and, whenever an update is triggered, check whether that same file exists and fetch it without asking the user again. Additionally, I suggest we do what @sirlatrom suggests and improve the scoring method so that it targets files better, which could remove the selection step altogether.

I guess we can implement this in several steps

  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file
  • Save the selected file the first time and use the same name upon updates.

@akhan4u

akhan4u commented Mar 31, 2021

  • Save the selected file the first time and use the same name upon updates.

This will help a lot as it's difficult to check out the same binary when updating. Sometimes I have ended up downloading the checkgen instead of the executable.

@sirlatrom
Collaborator

  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file
  • Save the selected file the first time and use the same name upon updates.

The first point is already handled for .tar(.*) and .zip archives as the same filtering/selection mechanism is used there as for 'top level' files/assets.

I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?

@marcosnils
Owner

I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?

Hmm, maybe I'm missing something here? What I had in mind is:

  • Install a .tar binary and keep the original final file name in the tar file (regardless of the final binary name) on the bin config
  • When performing an update, check the tar files again and look for a match on the initially saved file. If yes, just use that same file.

Not sure if I'm missing something, Sune, since I didn't quite understand the "indefinitely long chain" part.

@sirlatrom
Collaborator

sirlatrom commented Mar 31, 2021

  • When performing an update, check the tar files again and look for a match on the initially saved file. If yes, just use that same file.

Not sure if I'm missing something, Sune, since I didn't quite understand the "indefinitely long chain" part.

@marcosnils Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.

Practically speaking, we should at least remember which top level asset was chosen, and if it's an archive then which file was chosen within that archive.
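
One way to picture remembering each choice along the chain is an ordered list of names, one per nesting level. The sketch below is only an illustration under that assumption: `pickSaved`, the `saved` slice, and the file names are all hypothetical and not taken from bin's config format.

```go
package main

import "fmt"

// pickSaved returns the remembered choice for the given nesting depth
// if it is still present among the current candidates.
// depth 0 is the top-level asset, depth 1 a file inside that archive, etc.
func pickSaved(saved []string, depth int, candidates []string) (string, bool) {
	if depth < 0 || depth >= len(saved) {
		return "", false
	}
	for _, c := range candidates {
		if c == saved[depth] {
			return c, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical saved chain: the top-level asset, then the file inside it.
	saved := []string{"tool.tar.gz", "bin/tool"}

	assets := []string{"tool.zip", "tool.tar.gz", "checksums.txt"}
	if name, ok := pickSaved(saved, 0, assets); ok {
		fmt.Println("reuse asset:", name)
	}

	entries := []string{"README.md", "bin/tool"}
	if name, ok := pickSaved(saved, 1, entries); ok {
		fmt.Println("reuse entry:", name)
	}
}
```

When `pickSaved` reports no match at some depth, the caller would fall back to the normal scoring and prompt for that level only.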

@breml
Collaborator Author

breml commented Mar 31, 2021

I guess we can implement this in several steps

  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file

I would like to emphasize once again that I do not like the scoring part about OS and Arch. There is really no value in ever presenting the user with a file that does not match the OS or the Arch, even if such a file has the highest score, for example because no file is available for the user's OS/Arch. I had this situation once, where bin installed a Windows exe on my Linux machine, and in my opinion this should never happen. So I propose filtering by OS and Arch and applying the scoring only to the files that remain as options after the filtering has been applied.
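
A rough sketch of such a filter-first step follows. It assumes OS and arch can be detected by simple substring matching against short token lists; `knownOS`, `knownArch`, `mentionsOther`, and `filterByPlatform` are hypothetical names, and real asset names would likely need more aliases (e.g. x86_64, macos).

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical token lists; a real implementation would need more aliases.
var (
	knownOS   = []string{"linux", "darwin", "windows"}
	knownArch = []string{"amd64", "arm64", "386"}
)

// mentionsOther reports whether name contains a known token other than want.
func mentionsOther(name string, known []string, want string) bool {
	lower := strings.ToLower(name)
	for _, k := range known {
		if k != want && strings.Contains(lower, k) {
			return true
		}
	}
	return false
}

// filterByPlatform drops every name that mentions a different OS or arch.
// Names that mention neither are kept, since not every repo encodes both.
func filterByPlatform(names []string, goos, goarch string) []string {
	var kept []string
	for _, n := range names {
		if mentionsOther(n, knownOS, goos) || mentionsOther(n, knownArch, goarch) {
			continue
		}
		kept = append(kept, n)
	}
	return kept
}

func main() {
	assets := []string{
		"tool_linux_amd64.tar.gz",
		"tool_windows_amd64.zip",
		"tool_linux_arm64.tar.gz",
		"checksums.txt",
	}
	// In practice the host values would come from runtime.GOOS and runtime.GOARCH.
	fmt.Println(filterByPlatform(assets, "linux", "amd64"))
}
```

Scoring would then run only on the names that survive the filter, so a Windows asset can never win on a Linux host, whatever its score.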

@marcosnils
Owner

marcosnils commented Mar 31, 2021

So I propose filtering by OS and Arch and applying the scoring only to the files that remain as options after the filtering has been applied.

I agree with this approach. I believe we're on the same page here and we're mostly discussing semantics. Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).

Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.

Now I understand your original concern. I guess we can save the whole file chain; that doesn't seem very difficult to do. However, I still haven't come across a scenario with multiple nested archives to ultimately get a binary. Not sure how often this comes up in practice, since it's not very standard, right?

@sirlatrom
Collaborator

Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).

Currently, any asset containing the repo name gets a score of 1 to begin with, and additional points for matching the OS/arch/OS specific extension (.exe/.appimage). I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

Not sure how often this comes up in practice, since it's not very standard, right?

Agreed. We would still need to save two choices, though: Which archive, and which binary within the archive.

@marcosnils
Owner

I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

My basic scoring proposal:

  • All files start with score 0
  • If the file name has an arch and/or OS and it doesn't match the bin host, subtract 1
  • If the file name has an arch and/or OS and it does match the bin host, add 1

Given scores:

  • Single high score file, install automatically
  • Multiple files with score >= 0: prompt the user, ordered by score descending
  • Files with score < 0 don't prompt the user
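
The proposal above could look roughly like this in Go. This is a sketch under several assumptions: the +1/-1 is applied once for the OS and once for the arch, detection is plain substring matching, and "single high score file" is read as "exactly one file holds the top score". All identifiers and file names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical token lists used for detection.
var (
	knownOS   = []string{"linux", "darwin", "windows"}
	knownArch = []string{"amd64", "arm64", "386"}
)

type scored struct {
	Name  string
	Score int
}

// dimensionScore returns +1 if the wanted token is present, -1 if only
// other known tokens are present, and 0 if none is mentioned.
func dimensionScore(lower string, known []string, want string) int {
	matched, other := false, false
	for _, k := range known {
		if !strings.Contains(lower, k) {
			continue
		}
		if k == want {
			matched = true
		} else {
			other = true
		}
	}
	switch {
	case matched:
		return 1
	case other:
		return -1
	default:
		return 0
	}
}

// score starts at 0 and adds the OS and arch contributions.
func score(name, goos, goarch string) int {
	lower := strings.ToLower(name)
	return dimensionScore(lower, knownOS, goos) + dimensionScore(lower, knownArch, goarch)
}

// rank drops negative scores and sorts the rest, highest score first.
func rank(names []string, goos, goarch string) []scored {
	var out []scored
	for _, n := range names {
		if s := score(n, goos, goarch); s >= 0 {
			out = append(out, scored{n, s})
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

// autoPick returns the top candidate only if no other file shares its score.
func autoPick(ranked []scored) (string, bool) {
	if len(ranked) == 0 {
		return "", false
	}
	if len(ranked) > 1 && ranked[1].Score == ranked[0].Score {
		return "", false
	}
	return ranked[0].Name, true
}

func main() {
	assets := []string{
		"tool_linux_amd64.tar.gz",
		"tool_linux_arm64.tar.gz",
		"tool_windows_386.exe",
		"checksums.txt",
	}
	ranked := rank(assets, "linux", "amd64")
	if name, ok := autoPick(ranked); ok {
		fmt.Println("install automatically:", name)
	} else {
		fmt.Println("prompt the user with:", ranked)
	}
}
```

If `autoPick` reports a tie, the ranked slice is what would be shown in the prompt; files with a negative score never reach it.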

I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.

@sirlatrom
Collaborator

* match the `bin` host

What does that mean? Do you mean the repo name? That's what we already do, but I suppose we can hold off on giving that point until we've found at least one of the OS/arch matches first.

@marcosnils
Owner

marcosnils commented Apr 1, 2021 via email

@breml
Collaborator Author

breml commented Apr 1, 2021

I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

My basic scoring proposal:

  • All files start with score 0
  • If the file name has an arch and/or OS and it doesn't match the bin host, subtract 1
  • If the file name has an arch and/or OS and it does match the bin host, add 1

Given scores:

  • Single high score file, install automatically
  • Multiple files with score >= 0: prompt the user, ordered by score descending
  • Files with score < 0 don't prompt the user

I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.

In general I like the above proposal. One downside I see is that a file with the correct OS but the wrong arch will still get a score of 0 (+1 -1) and therefore remains a candidate. So I guess that, to be considered a candidate, a file must achieve a score > 0.

Additionally, I would like to work towards an algorithm that in most cases picks an archive and performs a successful installation on its own, so that the user needs to select an archive only in very few exceptional cases. One step in this direction would be to put the different archive types into a list ordered by priority. This would allow us to install the binary even if multiple archive types are available (e.g. .tar.gz and .zip).
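
The ordered-list idea might be sketched as follows. The concrete order in `archivePriority` is purely an assumption for illustration, as are the helper name `preferredArchive` and the asset names.

```go
package main

import (
	"fmt"
	"strings"
)

// archivePriority lists archive extensions from most to least preferred.
// The order here is a hypothetical choice, not something bin prescribes.
var archivePriority = []string{".tar.gz", ".tgz", ".tar.xz", ".zip"}

// preferredArchive returns the first name matching the highest-priority
// extension that is present among the candidates.
func preferredArchive(names []string) (string, bool) {
	for _, ext := range archivePriority {
		for _, n := range names {
			if strings.HasSuffix(strings.ToLower(n), ext) {
				return n, true
			}
		}
	}
	return "", false
}

func main() {
	candidates := []string{"tool_linux_amd64.zip", "tool_linux_amd64.tar.gz"}
	if name, ok := preferredArchive(candidates); ok {
		fmt.Println("selected:", name)
	}
}
```

With this in place, two archives that differ only in format no longer force a prompt, because the list breaks the tie deterministically.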

I have an additional idea that I feel is worth exploring: check whether the repo contains a .goreleaser.yml file. I know this only targets Go, but I feel that goreleaser is becoming the de facto standard for releasing binaries in the Go ecosystem. The huge advantage of considering this file is that we no longer need to guess whether arch / OS are present in the file name, because based on the replacements section we know which is the correct file to download.

Example from bin:

archives:
- replacements:
    darwin: Darwin
    linux: Linux
    windows: Windows
    386: i386
    amd64: x86_64

It might be worth figuring out whether there is something similar for, e.g., Rust.
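
To illustrate how the replacements above could be used, the sketch below copies them into a Go map by hand and checks whether an asset name contains the replaced OS and arch tokens for a given host. Parsing the YAML file itself is left out, and `replaced`, `matchesHost`, and the asset names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements mirrors the replacements section shown above,
// transcribed by hand rather than parsed from .goreleaser.yml.
var replacements = map[string]string{
	"darwin":  "Darwin",
	"linux":   "Linux",
	"windows": "Windows",
	"386":     "i386",
	"amd64":   "x86_64",
}

// replaced returns the release-name token for a Go OS or arch value,
// falling back to the value itself when no replacement is configured.
func replaced(token string, repl map[string]string) string {
	if v, ok := repl[token]; ok {
		return v
	}
	return token
}

// matchesHost reports whether an asset name contains both replaced tokens.
func matchesHost(name, goos, goarch string, repl map[string]string) bool {
	return strings.Contains(name, replaced(goos, repl)) &&
		strings.Contains(name, replaced(goarch, repl))
}

func main() {
	// Hypothetical asset names for illustration.
	assets := []string{"bin_0.1.0_Linux_x86_64", "bin_0.1.0_Darwin_x86_64", "bin_0.1.0_Windows_i386.exe"}
	for _, a := range assets {
		fmt.Println(a, matchesHost(a, "linux", "amd64", replacements))
	}
}
```

Because the mapping comes from the project's own release config, the match no longer depends on guessing which spelling of the OS or arch a repo uses.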

@breml
Collaborator Author

breml commented Apr 1, 2021

I did a quick test with my ~50 binaries managed with bin. For a little bit more than 1/3, I found a .goreleaser.yml.

@breml
Collaborator Author

breml commented Apr 4, 2021

Just for reference, this site lists the valid combinations of arch/os supported by the Go compiler: https://gist.github.com/asukakenji/f15ba7e588ac42795f421b48b8aede63

marcosnils added a commit that referenced this issue Apr 10, 2021
Fixes #93

Temporary fix until we properly implement #67
sirlatrom pushed a commit that referenced this issue Apr 10, 2021
* Don't filter when there's a single asset

Fixes #93

Temporary fix until we properly implement #67

* Add dry-run mode to update command
@schnatterer

I'd like to contribute some more example test cases that could affect this issue:

This issue might overlap with #102.

@pataquets

I'd like to add the case where there are alternate binaries matching your platform, such as:

  • statically/dynamically linked
  • libc/musl

7 participants