Choco Daily All-Package Extract #2144
Thanks! I think you are right that, ultimately, a local cache could achieve a similar goal, but would it include the important Description fields for every package, so that I could do free-form searches for packages matching certain criteria in the package name or Description? And would it include the FirstPackageReleaseDate, so that I can quickly ask "what completely brand-new packages appeared in the last week?" (quite distinct from packages that have been updated, which is a very different thing)? If it could achieve these things, that would be great, but from what I see in #820, it will do none of them... :-( (please correct me if I am wrong!?).

Also, since this has been on the roadmap for over 4 years, what do you think of just running that query once per day on your side, collecting this information into a simple CSV published somewhere on the Chocolatey site (that I can grab with Invoke-WebRequest)? This would turn CLI package discovery from a death-rattle crawl of running scripts that take 10-15 minutes just to pull the most basic package discovery information (as per my script above) into a joy and a pleasure, making the command line usable for all queries to find useful packages from the repo! (And I would appreciate this so much <all-of-my-fingers-and-toes-crossed!>)
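To illustrate what I mean: if such a daily CSV existed, finding brand-new packages from the last week would reduce to a few lines of PowerShell. (The URL and column names below are hypothetical — no such extract is published today; this is just a sketch of the workflow I am asking for.)

```powershell
# Hypothetical URL and column names - no such daily extract exists yet.
$csvUrl = 'https://community.chocolatey.org/extracts/all-packages.csv'
Invoke-WebRequest -Uri $csvUrl -OutFile packages.csv

# Brand-new packages (first release within the last 7 days),
# as opposed to packages that were merely updated.
Import-Csv packages.csv |
    Where-Object { [datetime]$_.FirstPackageReleaseDate -gt (Get-Date).AddDays(-7) } |
    Sort-Object { [datetime]$_.FirstPackageReleaseDate } -Descending |
    Format-Table Name, Version, FirstPackageReleaseDate, Description
```

One download per day, then every query after that runs locally in milliseconds.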
@roysubs why not simply use the currently available RSS feed? http://feeds.feedburner.com/Chocolatey
Thanks for that, I was not aware of it. I've had a look now, but it seems that this doesn't help with package discovery? e.g. Would the RSS feed allow me to take a query (say "aws"), list every package that matches it in its package name or Description fields, and see information about the size of the package, FirstReleaseDate, LastPublishedDate, AppVersion etc., making it super-easy to pin down packages, find new packages, and so on? It is useful to see newly published packages, and I might be wrong, but this does not seem to help (much) with package discovery. (i.e. a "good user experience" would be being able to very quickly crawl the available packages via an offline CSV in seconds, instead of the 10-15 minute torture that I have to endure on the CLI just to return basic package information, as described above.)
A bit more on my situation, why so many people are in the same situation as me, and why this is useful: like most people, my company has not (yet) bought Chocolatey, and so, like the vast majority of Chocolatey users, I am exclusively using the community-driven CCR ( chocolatey.org/packages ). This is primarily what I am referring to, since 99+% of what I will access and install will come from there (until my company hopefully purchases a license). So all new users to Chocolatey (that have not started working in an organisation with a paid license) will be focused on the CCR as the central location from which they will a) search for packages, and b) install packages.

It is this package discovery process that I am most interestedted in, and the "user experience" here is, frankly, awful. The website is great, the tools are great, but if I want to do a quick search on some search term, the process is so slow and awkward that I feel it massively hurts the experience for new users to Chocolatey and turns them away. I've always found this package discovery aspect incredibly frustrating, and everyone that I've shown it to finds it very slow and awkward. I'm just trying to ask for some way that the CCR can be quickly queried - I feel that this would greatly improve the user experience. Maybe there is a better way overall to achieve this.

Sorry to describe the user experience of this aspect in such stark terms.
You can use a custom repository without purchasing Chocolatey, and you can purchase Chocolatey and still use the chocolatey.org repository; purchasing Chocolatey is not linked to using a custom repository. Most NuGet repositories work fine with Chocolatey: Nexus, Artifactory, ProGet, and MyGet all work.
If you already know the name of the software you are looking for, I've found it to be fine. If you are looking to discover new pieces of software by finding the Chocolatey packages for that software, then yes, I would agree that it is not ideal. But in this case, I'm not sure that is the intent of the search/list command. I think this would need to be part of a larger discussion about whether Chocolatey.org should also be a discovery platform for software, not just a repository.
Exactly, I'm not talking about when I already know the software. Don't you think that "discovery" of available packages is not only a nice-to-have, but actually a core function, so that people can find more packages that might be of interest to them?

In turn, the ability to discover packages in really quick succession, with 20 searches for various different things, also massively increases the overall reach of lesser-known Chocolatey packages, making it a) a ton more appealing and enjoyable to use, and b) bringing some of those less-downloaded packages within reach of people doing random searches and tinkering: "oh, ok, that minor package that is not downloaded a lot looks really interesting for something that I'm working on..." - and they install it and find that it really helps them with some task. It seems such an important oversight to just neuter people's ability to quickly do random searches for things. Are these not worthwhile user experience benefits? And at such a low cost, because it's just allowing people to cache the metadata, or to have a CSV with the fields that I mentioned above that I would happily open in Excel or use PowerShell to do regex searches through. Searching the metadata instantaneously would be a real pleasure: you could whip through tons of random searches in seconds, instead of (quite literally!) 30+ minutes to do 5 or 10 searches and then still have to tie each package name to a meaningful description.

I know what you mean, but repositories are big places; discovery is the means to get a handle on what is in a repository, and that just seems a natural match to me. It currently feels like having a database but no easy way to search that information (well, of course, that's exactly what it is! lol). i.e. shouldn't "easily searchable" be a core component of a repository?
This is what community.chocolatey.org/packages can be used for. Until there is a local index to pull that information from, I don't see how it would be feasible for every search. So I think you're really looking at this coming down to package indexing.
When you say "This is what community.chocolatey.org/packages can be used for", I completely agree, but consider this: if I am on Linux (say Ubuntu), I can search the locally cached package index with apt in an instant, without touching the server at all. I completely agree that this is about package indexing, but just the metadata would be enough; a full clone of the community repository on my hard disk is somewhat overkill.
I appreciate that Chocolatey is a package manager in the same way as apt, yum, pacman etc., and that the FOSS version, like any open-source project, requires us to be a bit more hands-on and roll our own DIY solutions for some things. But it occurs to me that package discovery is a needlessly frustrating, slow, and awkward user experience (and as DevOps / SysAdmin types are the main users of this great package manager, it seems odd to cripple the main users). e.g. there is no way to find newly released packages (as in brand new, newly created, as opposed to updated), and tying a package name to its package description is really awkward, since the output from `choco list` says nothing about the package (great if you already know what a package name is for, terrible if you have to repeatedly go through `choco list --description` to resolve it). This firstly made me write a simple PowerShell script to try to interrogate the server, but it is very slow and wasteful to repeatedly query the server over and over, so after discussion with Paul Broadwith, he suggested that I propose a feature request for discussion.

My proposal is very simple: once per day, a query is run on the DB to extract the following information:

Name,Version,PackageSize,DateOfPackageCreation,DateOfLastPublish,DateOfLastApproval,StatusOfLastApproval,Description(multiple fields?),LinkToGithubProject,LinkToWebPage,Maintainer,OtherUsefulMetaData1,OtherUsefulMetaData2,...etc
This information can then be made available to download as CSV, XML, or JSON. (In addition, publish the output as a simple web page, sortable by column, with the Name field being a hyperlink to the Chocolatey page for that package, and the Status or Maintainer field being a hyperlink to the GitHub project - I think such a page would become a super-efficient resource for finding packages.)
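Assuming an extract with the fields above existed (all column names here are hypothetical, taken from the proposed schema), local free-form package searches would reduce to ordinary PowerShell filtering - this is a sketch of the intended usage, not an existing feature:

```powershell
# Load the hypothetical daily dump, downloaded once per day.
$packages = Import-Csv packages.csv

# Free-form search across Name and Description, e.g. for "aws".
$term = 'aws'
$packages |
    Where-Object { $_.Name -match $term -or $_.Description -match $term } |
    Select-Object Name, Version, DateOfLastPublish, Description |
    Format-Table -AutoSize
```

Swapping `$term` lets you run dozens of searches in seconds, each against the local file rather than the server.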
After that, I do not need to interrogate the server at all while I am hunting for packages, as I can just use PowerShell tools like Select-String to interrogate the CSV - only once I have found the exact packages that I want do I go straight to a `choco install` to get what I need.

Pros:
• Normally, if I am hunting for packages, I might make 20 queries against the server (firstly `choco list <searchterm>`, then multiple `choco list <searchterm> --description` to get some idea of what each package is - and each step of this is really quite slow to come back with information...). This is tedious and frustrating. With the above, on any day that I want to query the DB, I would just use `curl` (or maybe a switch in `choco list` also) to grab the daily dump from any site hosting it and query that.
• The result is massively faster retrieval of information, allowing users (both FOSS and paid subscribers) to very quickly discover new and interesting packages. i.e. instead of interrogating the server (a waste of server resources!) I would just repeatedly interrogate the CSV/XML/JSON, and this would take milliseconds instead of the quite slow process that I have to go through today.
• Overall queries on the server should be reduced. This might not be a huge factor, but anything that can reduce stress on the DB is probably desirable in general. Additionally, the daily extract could be hosted in any other location, thus spreading query load away from the DB server.
Cons:
• None that I can see. Setting up the daily dump would be trivially simple: query server-side once per day to create the CSV / XML / JSON etc., and then all users can query that dump lightning-fast instead of making the frustrating and slow `choco list` requests over and over. Having that one query available as a daily CSV would make usability / package discovery a dream compared to the current slog-fest.

Why this would benefit all Chocolatey users:
• I have been told that there is a daily email with new/updated package information but that it doesn't contain a good overview of the information.
• The website is also not the best way to get overview information, as we cannot easily use tools to interrogate it for the packages that we want; DevOps / CLI users are the core user base of Chocolatey, and helping those users can only help the spread of the tool.
• It would give people a much faster holistic overview of the totality of the available packages, which lets the community more easily test and find issues with broken or redundant packages, helping keep the environment clean. And the more easily FOSS users can manipulate that totality view of the package list, the easier it is for them to compile useful summaries for management showing how a paid subscription could piggy-back off the huge maintained package list (giving techies the ability to quickly compile such a justification is good).
My own workaround to search for a term is a super-clunky, nasty kludge and horribly slow, but I can finally extract something useful with it: https://gist.github.com/roysubs/09454e20d54adeb805f14a4ec89deaa9
e.g. `choco-index azure aws git` will compile all packages that match these search terms, present them in an easy-to-read format, and save the output. It is very slow, and it is a real shame to have to do this, when a single daily query from the server, published as a CSV, would make this all doable in a fraction of a second instead of the 15-30 minutes that it takes to collect the above - and so much simpler for Chocolatey users to discover more of the great packages that are in the repository.

I would very much appreciate consideration of this, as I think it would be trivially simple to implement (just a simple query run once per day), with only upsides for all users: it makes package discovery massively easier than very slow CLI `choco list --description` runs or having to go to a website and doing search after search (not exactly how DevOps / SysAdmins, who like to work fast and automate, prefer to work!), and just makes for a much faster and more efficient user experience. (i.e. my command just to find packages matching azure/aws/git takes 7 minutes to run... surely a daily extract is a better way to return such basic information, allowing Chocolatey users to do many searches instantly instead of the equivalent 7 minutes per run as above?)