Add language support for Liquidsoap #6565
Given its age, usage is surprisingly low on GitHub at the moment, but that could be due to old inactive repos that haven't been indexed yet (the new search will get to them one day).
Other than the suggestion, your three samples suppressed in the diff are too big (probably because of the extensive comments). Please replace these with smaller yet diverse samples with fewer comments.
As popularity isn't sufficient for inclusion right now, I'll label it as such and will review popularity with each release.
Thanks for the review @lildude. I have applied the changes. I am confused by the popularity assessment. I read the contribution guidelines carefully before engaging in all the work required to implement the grammar and the other requirements, so this comes as a surprise. The contribution guidelines state that:
Considering the poor results provided by the current search this was later amended in #5756, stating:
In light of this, the search results only show 5 tabs with 20 entries each. That represents only the first
Considering this, I have no doubt that, should the search return a satisfactory view of the actual data, it would show an even distribution of well over 200 unique repositories. This is the assumption I made before starting this work. We have engaged in this work because, on the user side, we have identified that, since most of our users are not developers, better tooling support around the language is really important to help them understand and learn it. This is part of a larger push that includes a tree-sitter grammar, a prettier plugin and a VS Code extension (linked to this PR). Support for the language on GitHub would be a very valuable tool to help our users report issues and understand suggestions and responses to them, particularly since they are very new to programming. The search results for issues report about 3k of them mentioning the project, plus 1k pull requests. Discussions is a fairly new feature but there are already almost 500 of them. A very quick Google search on Stack Overflow reveals at least 1k entries. Our user community is very large, with over one million pulls of the most popular Docker image. The project is used by multiple large-scale organizations (Radionomy, owner of Winamp and SHOUTcast, Live365, Radio France with millions of daily listeners) as well as a great number of open source projects (AzuraCast, with about 30k radio stations all running liquidsoap scripts, LibreTime, etc). Lastly, but very importantly for us, multiple smaller community radios and communities around the world rely on the tool to communicate. See for instance some of the presentations during our 3rd liquidshop here: http://www.liquidsoap.info/liquidshop/3. The project was also represented at the last two FOSDEM open-source conferences in the media devrooms, with solid interest from the audience.
Thus I would like to kindly ask if it would be possible to reconsider the popularity threshold in light of these details as I do believe that the language does meet the documented threshold for inclusion. Thank you for your consideration! |
de19397 to 3e0f282
Hi @lildude ! The indexer has caught up to |
No, because most of those files are owned by a single user, the creator of the language, and thus they have an undue influence on the count. Excluding them drops things quite dramatically. Note, I reevaluate popularity whenever I make a new release (approx every 3-4 months) so there's no need for pings to check. As an aside, this PR has conflicts that would need to be addressed first anyway. |
@lildude I asked the question because I do not want to have to update the PR constantly without a clear understanding of whether or not it will be considered for inclusion. Your policy has a distinction between one file per repository and states a threshold of 200 and multiple files per repository and states a threshold of How do you decide which case applies? Can you list examples of languages that fall under the one file per repository policy? For instance, does the one file per repository policy apply to Lastly, and as I asked previously: how did you assess that the liquidsoap language does not qualify as one that fits under that specific policy? As noted in my previous comment, most repositories are single-script users because the language is mostly used to define a single stream script, not to define libraries and large code-base projects. Evidently, With all due respect, I find this temporary popularity assessment policy, documented only in an "FYI" pull request, vague and a disservice to the open source community at large. If its purpose is to ensure that only real languages, i.e. languages that are used in the wild, are considered for inclusion, then I have to admit that this whole thread is a complete failure. Users of this platform, a lot of them relying on it for professional and real-life applications, deserve better due process and clarity about how decisions are made, as those decisions impact their projects and the community at large. Thus I would like to kindly ask: does this position represent the official GitHub platform position? If so, what are the appropriate channels to file a complaint about it? Thanks. |
There's no need to constantly update the PR. Master will always be merged in prior to merging so there's no point in continually merging in master. Resolving conflicts can be done as you notice them or I'll ping you to resolve before merging if I can't resolve them myself.
This is based on how the file/extension is commonly used. If a file/extension is generally only expected to have a single file per repo, for example Makefile or Dockerfile, then the former applies. If a repo is expected to reasonably, and commonly, contain multiple files of the language, the latter applies.
Yes.
The search query you placed in the PR template currently returns ~1.7k results: This is already less than the threshold for the multiple files per-repo scenario, but might qualify for the single file per-repo scenario (yes, I know this figure fluctuates so things are a little precarious on the 2k border). So I look at the directory for the top result and a) note that this is the language creator and b) the library contains many As this is the language creator, they're most likely to be the largest user and promoter of the language, so I remove them from the search results to see how much of an influence they have on the figures and whether their usage disproportionately swings things in their favour. It does in this case: as soon as we remove them, the number of files drops significantly to: This is a dramatic reduction so I stop my analysis at this point. In some cases I might filter out a few more heavy users to be sure things aren't being unduly influenced.
Most might be, but there is clear evidence that this is not always the case from a quick look at several of the repos returned by the search results. From a more lengthy analysis than I normally do, the first page of results, once I exclude the language creator, returns 20 results. Of those, 3 are clearly not Liquidsoap. 10 of the remaining 17 contain more than 1
No. This is the policy of this project alone and has been the policy for 10+ years. The even more vague "In most cases we prefer that languages already be in use in hundreds of repositories before supporting them in Linguist." was first documented in the CONTRIBUTING.md file back in Nov 2014 but the policy existed before then. Back then it was possible to assess the number of repos a language was used in. As the sole maintainer of this project, working on it in my spare time, I can't be expected to know how every language is used, so I have to rely on this imperfect analysis that is hobbled by the limitations of GitHub's Search. The best I can do is be consistent in how I implement this, which I try to do, hence I've documented the basic process I follow. If you can come up with a more reliable method of assessing the number of files and unique I know this is far from perfect, but there isn't a perfect solution. |
So lemme get this straight. You'll only support a language if the files it uses are either present by the hundreds per repo or hundreds of repos using a single file, completely disregarding whether the repos that contain the file have thousands of stars, forks and other metrics that show a vast interest in the project? That's a pretty bad assessment of use. Whether or not something has widespread adoption would be a better metric. With liquidsoap being the quasi-default system for building multimedia streams, it has found its way into many projects that have thousands of downloads, stars and forks. I'm frankly surprised it is not being considered, given that the projects it drives are plentiful enough that even your resident AI of choice has heard of it and can hallucinate some syntax for it. I agree that it might not be the most common language, owing in part to it being complex to learn and master and serving only one type of use case (streaming and multimedia applications), but the same can be said for a lot of other things. Haskell and COBOL only serve very few projects, but they run the mainframes that power the entire financial sector. The metric of whether something is worth adding should never be just an arbitrary file count; it should reflect how useful the language actually is and how much of an impact it makes. Another metric worth looking at is the nature of the language itself. A repo containing a docker or composer file can happily run without it. I regularly implement these things natively without them being used, but liquidsoap practically drives the projects it's part of. Pulling out the file would break the execution of the project, and so much depends on it that replacing it with something else is usually impossible as well. It would be like ripping out a dll. I get it. Adding a language and maintaining support for it is an additional burden.
Languages show up and go nowhere plenty of times, so one must set some rules for what's worthy of the time. People re-invent the wheel constantly to fix old languages, so some metric is needed to avoid having to follow that trend all the time. However, that metric should be based on the maturity of a language in terms of its adoption in projects and what functionality it provides rather than a simple count. |
From a developer and user standpoint, I must say Liquidsoap isn’t too easy to learn (exactly one reason to have grammar/highlighting support!) and its real distribution is probably not so easy to determine, since it is "buried" in many other projects with very high distribution, like Centovacast, AirTime, LibreTime, AzuraCast, Radio France, the Live365 network and many others. Also, it has been around for almost 20 years, and the developers are very active, which might explain the "creator overhang". A typical radio station will also mostly write its own code, adapt things, and not necessarily publish, but still wish to be able to read and write code better. There are hundreds of thousands of them, including myself, who have been using Liquidsoap for ~15 years now and have always missed grammar/highlighting support from the "big players" like GitHub, VSCode, etc. The assessment count also, I believe, doesn’t include Gists, that often have Liquidsoap definitely is a language with not too great visibility, but that’s mainly because it’s the working backbone in many other high-volume projects. And it is typically so stable that it can run for years, which might also explain the low assessment counts. Liquidsoap definitely is not a hyped mayfly that’ll soon vanish, like many other so-called "languages" that evaporate quickly. It’s a serious business backbone used in hundreds of thousands, if not millions, of installations. Both developers and users have high hopes for GitHub adding this language. I mean, it’s all prepared, PR’ed, checked, etc. Thank you for considering, and hopefully adding, the language! |
Not at all. Stars, forks, downloads and other similar metrics are indicators of the popularity of a project/repo, not the widespread usage of the language. Any project/repo that makes it to the front page of Hacker News instantly jumps in all of those metrics, but in the case of a new language, the actual usage of the language in new projects doesn't see similarly rapid growth. Mojo is one such language, though it saw quicker growth than most so it didn't take long to reach the required usage levels. One repo with a single unique language with a million stars, forks and downloads is not indicative that people are actually using the language; it only indicates that the repo is popular. Linguist's requirement is all about usage on GitHub.com (added emphasis mine):
You've just agreed with how we're measuring things to meet Linguist's requirements 😁
That would involve rewriting Linguist as it currently only looks at files in isolation. We assess usage in a similar manner.
That's what we're trying to gauge, specifically "its adoption in [public] projects [on GitHub]". |
Thank you for these clarifications, I genuinely appreciate them. Ultimately, I still do not understand the policy and its purpose. If the purpose is to prevent frivolous languages from being included, then I think that we have provided ample evidence here that this is not one. Materially, it does not make sense to arbitrarily limit the number of languages either. Your contributors come with all the hard work already done, and if your project or the platform cannot handle a large number of languages then that surely is a conceptual issue. At the end of the day, the policy's purpose still hasn't been explicitly stated and its application definitely seems capricious at best. Obviously, we are not going to come to an agreement here. Since you are not affiliated with the platform, I will have to reach out to file a complaint and see if they can find a solution that better fits their users and community. |
Usage still isn't quite there, but you've now got conflicts. Please resolve these so when things meet usage reqs we can go ahead and merge.
Hi, I have fixed the conflicts as they were quite trivial. However, I do not think that continuously asking contributors to do work that is not likely to be included is appropriate and, as I said earlier, the process used to evaluate popularity is wholly inadequate. |
@lildude Your search excluding savonet files has doubled in 6 months, from 700 files to 1.4k files now: https://github.com/search?type=code&q=NOT+is%3Afork+path%3A*.liq+NOT+user%3Asavonet Not only does that show the futility of this metric, but at this point it also clearly calls the accuracy of these numbers into question. I love our project but I don't think adoption has doubled in those 6 months. I do believe, however, that its usage is widespread. |
Also worth noting that the ratio is now 2k including savonet to 1.4k excluding it. It was 1.9k vs. 700 6 months ago, which indicates that your assumption that the driver for those numbers is "the largest user and promoter of the language" is wrong. |
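For reference, the shift in those figures can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the rounded counts quoted in this thread (1.9k vs. 700 six months ago, 2k vs. 1.4k now). The function name `creator_share` is hypothetical, and the inputs are GitHub search estimates that fluctuate, so the results are indicative at best.

```python
def creator_share(total_files: int, files_without_creator: int) -> float:
    """Return the fraction of matching files attributed to the excluded user.

    Both arguments are approximate result counts from GitHub code search:
    one for the plain query, one for the same query with the user excluded.
    """
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    if not 0 <= files_without_creator <= total_files:
        raise ValueError("files_without_creator must be between 0 and total_files")
    return (total_files - files_without_creator) / total_files


# Rounded counts quoted in the comments above (assumed, not re-measured).
six_months_ago = creator_share(1900, 700)
now = creator_share(2000, 1400)

print(f"six months ago: {six_months_ago:.0%} of files under the excluded user")
print(f"now:            {now:.0%} of files under the excluded user")
```

With these inputs, the excluded user's share of matching files goes from roughly 63% to 30%.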
Description
This PR adds support for the liquidsoap language. The language has existed since ~2005 and is widely used to run media streaming applications. Although its original scope is specialized, the language itself is a general-purpose scripting language that is functional and statically typed with inferred types.
Checklist:
#990066