Add language support for Liquidsoap #6565

Open: wants to merge 4 commits into base: main

@toots commented Oct 5, 2023

Description

This PR adds support for the Liquidsoap language. The language has existed since ~2005 and is widely used to run media streaming applications. Although its original scope is specialized, the language itself is a general-purpose, functional, statically typed scripting language with type inference.

Checklist:

@toots requested a review from a team as a code owner on October 5, 2023 06:31
@lildude (Member) left a comment

Given its age, usage is surprisingly low on GitHub at the moment, but that could be due to old inactive repos that haven't been indexed yet (the new search will get to them one day).

Other than the suggestion, your three samples (suppressed in the diff) are too big, probably because of the extensive comments. Please replace them with smaller yet diverse samples with fewer comments.

As popularity isn't sufficient for inclusion right now, I'll label it as such and will review popularity with each release.

Review comment on vendor/licenses/git_submodule/vscode-liquidsoap.dep.yml (outdated, resolved)

@toots (Author) commented Oct 5, 2023

Thanks for the review @lildude. I have applied the changes.

I am confused by the popularity assessment. I read the contribution guidelines carefully before undertaking all the work required to implement the grammar and meet the requirements, so this comes as a surprise.

The contribution guidelines state that:

each new file extension [should] be in use in at least 200 unique :user/:repo repositories

Because of the poor results provided by the current search, this was later amended in #5756, which states:

  • at least 2000 files per extension indexed in the last year (the number you see at the top of the search results), unless the extension is expected to only occur once per repo, then 200 files.
  • with a reasonable distribution across unique :user/:repo combinations assessed by manually and randomly clicking through the results.
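Read literally, the interim criteria reduce to a count check, and the threshold depends on whether the extension is expected to appear only once per repo. The sketch below is a minimal illustration of that reading. The function name is hypothetical, it is not Linguist's code, and it deliberately leaves out the manual distribution check, which is a human judgement.

```python
def meets_interim_threshold(files_last_year: int, single_file_per_repo: bool) -> bool:
    """Return True if an indexed file count clears the interim #5756 bar.

    Simplified sketch (hypothetical helper): the "reasonable distribution
    across unique :user/:repo combinations" criterion is assessed manually
    and is NOT modelled here.
    """
    threshold = 200 if single_file_per_repo else 2000
    return files_last_year >= threshold


# The ~1.7k figure discussed in this thread, under each interpretation.
print(meets_interim_threshold(1700, single_file_per_repo=False))  # False
print(meets_interim_threshold(1700, single_file_per_repo=True))   # True
```

Under this reading, the same count fails the multi-file bar and clears the single-file one. That is why the discussion turns on which case applies.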

In light of this, 1.7k files is already pretty close to the 2k threshold in and of itself. Furthermore, Liquidsoap is a specialized programming language. Most of our users are not developers, and most projects consist of a single Liquidsoap script that defines the whole streaming setup, alongside e.g. a Docker Compose file, a CMS, or a database and website application.

The search results only show 5 pages with 20 entries each. That is only the first 100 results, or about 5% of the total. Even from these very limited results, however, it is clear that:

  • Most repositories have between 1 and 5 files
  • A lot of them are large projects with backend and front end for a radio-related website and only a handful of liquidsoap files

Considering this, I have no doubt that, if the search returned a satisfactory view of the actual data, it would show an even distribution across well over 200 unique repositories. This is the assumption I made before starting this work.

We have engaged in this work because we have found that, since most of our users are not developers, better tooling around the language is really important to help them understand and learn it. This is part of a larger push that includes a tree-sitter grammar, a prettier plugin and a VS Code extension (linked to this PR).

Support for the language on GitHub would be a very valuable tool to help our users report issues and understand the suggestions and responses they receive, particularly since many of them are very new to programming.

A search of issues returns about 3k that mention the project, plus 1k pull requests. Discussions is a fairly new feature, but there are already almost 500 of them. A very quick Google search of Stack Overflow reveals at least 1k entries.

Our user community is very large, with over one million pulls of the most popular Docker image.

The project is used by multiple large-scale organizations (Radionomy, owner of Winamp and SHOUTcast; Live365; Radio France, with millions of daily listeners) as well as a great number of open source projects (AzuraCast, with about 30k radio stations all running Liquidsoap scripts, LibreTime, etc.).

Lastly, but very importantly for us, multiple smaller community radios and communities around the world rely on the tool to communicate. See for instance some of the presentations during our 3rd liquidshop here: http://www.liquidsoap.info/liquidshop/3.

The project was also represented at the last two FOSDEM open-source conferences, in the media devroom, with solid interest from the audience.

Thus I would like to kindly ask if it would be possible to reconsider the popularity threshold in light of these details as I do believe that the language does meet the documented threshold for inclusion. Thank you for your consideration!

@toots force-pushed the master branch 2 times, most recently from de19397 to 3e0f282 on October 5, 2023 13:15

@toots (Author) commented Jun 26, 2024

Hi @lildude! The indexer has caught up to 2k files now. Is it time to reconsider this inclusion?

@lildude (Member) commented Jun 26, 2024

Hi @lildude! The indexer has caught up to 2k files now. Is it time to reconsider this inclusion?

No, because most of those files are owned by a single user, the creator of the language, and thus they have an undue influence on the count. Excluding them drops things quite dramatically.

Note, I reevaluate popularity whenever I make a new release (approx every 3-4 months) so there's no need for pings to check.

As an aside, this PR has conflicts that would need to be addressed first anyway.

@toots (Author) commented Jun 26, 2024

@lildude I asked the question because I do not want to have to update the PR constantly without a clear understanding of whether or not it will be considered for inclusion.

Your policy distinguishes between one file per repository, with a threshold of 200, and multiple files per repository, with a threshold of 2k.

How do you decide which case applies?

Can you list examples of languages that fall under the one-file-per-repository policy?

For instance, does the one-file-per-repository policy apply to Dockerfile and the language associated with it? If so, do you accept that most repositories will have only one such file, but that many others will have a handful of them as well?

Lastly, and as I asked previously: how did you assess that the liquidsoap language does not qualify as one that fits under that specific policy?

As I said in my previous comment, most repositories contain a single script because the language is mostly used to define a single stream script, not libraries and large code bases.

Evidently, 1.2k files spread across unique repositories, most of which are expected to have one or a handful of files, largely surpass the 200-file threshold.

With all due respect, I find this temporary popularity assessment policy, documented only in an "FYI" pull request, vague and a disservice to the open source community at large.

If its purpose is to ensure that only real languages, i.e. languages that are used in the wild, are considered for inclusion, then I have to admit that this whole thread is a complete failure.

Users of this platform, a lot of them relying on it for professional and real-life applications, deserve a better due process and clarity about how decisions are made as those impact their projects and community at large.

Thus I would like to kindly ask: does this position represent the official github platform position? If so, what are the appropriate channels to file a complaint about it?

Thanks.

@lildude (Member) commented Jun 26, 2024

@lildude I asked the question because I do not want to have to update the PR constantly without a clear understanding of whether or not it will be considered for inclusion.

There's no need to constantly update the PR. Master will always be merged in prior to merging so there's no point in continually merging in master. Resolving conflicts can be done as you notice them or I'll ping you to resolve before merging if I can't resolve them myself.

Your policy distinguishes between one file per repository, with a threshold of 200, and multiple files per repository, with a threshold of 2k.

How do you decide which case applies?

This is based on how the file/extension is commonly used. If a file/extension is generally only expected to have a single file per repo, for example Makefile or Dockerfile, then the former applies. If a repo is expected to reasonably, and commonly, contain multiple files of the language, the latter applies.

For instance, does the one-file-per-repository policy apply to Dockerfile and the language associated with it? If so, do you accept that most repositories will have only one such file, but that many others will have a handful of them as well?

Yes.

Lastly, and as I asked previously: how did you assess that the liquidsoap language does not qualify as one that fits under that specific policy?

The search query you placed in the PR template currently returns ~1.7k results:

[Screenshot of search results: 1.7k results]

This is already less than the threshold for the multiple files per-repo scenario, but might qualify for the single file per-repo scenario (yes, I know this figure fluctuates so things are a little precarious on the 2k border). So I look at the directory for the top result and a) note that this is the language creator and b) the library contains many .liq files so the single file per-repo scenario does not apply.

As this is the language creator, they're most likely to be the largest user and promoter of the language, so I remove them from the search results to see how much influence they have on the figures and whether their usage disproportionately swings things in their favour. It does in this case: as soon as we remove them, the number of files drops significantly to:

[Screenshot of search results: 785 results]

This is a dramatic reduction so I stop my analysis at this point. In some cases I might filter out a few more high users to be sure things aren't being unduly influenced.
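The creator-exclusion step comes down to measuring what fraction of the total disappears when one user is filtered out. This is a minimal sketch. It assumes both counts come from the same search, run once with and once without a `NOT user:` clause. The helper name is hypothetical, and the inputs are the approximate figures quoted above.

```python
def share_from_excluded(total: int, without_top_user: int) -> float:
    """Fraction of matching files that disappear when one user is excluded.

    Hypothetical helper; both counts are assumed to come from the same
    code search, with and without a NOT user:<name> qualifier.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - without_top_user) / total


# Figures quoted above: ~1.7k results overall, 785 with the creator excluded.
share = share_from_excluded(1700, 785)
print(f"{share:.0%}")  # 54%
```

Because the search figures fluctuate, the percentage is only as precise as the rounded counts fed into it.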

As I said in my previous comment, most repositories contain a single script because the language is mostly used to define a single stream script, not libraries and large code bases.

Most might be, but a quick look at several of the repos returned by the search shows clear evidence that this is not always the case. I did a lengthier analysis than I normally do: once I exclude the language creator, the first page of results returns 20 results. Of those, 3 are clearly not Liquidsoap, and 10 of the remaining 17 contain more than one .liq file. Whilst this is only a small sample (I don't have the time to manually analyse every repo), I think it's fair to conclude that it's quite common to have more than one file per repo.
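The sampling argument can be expressed as the fraction of hand-checked repositories that hold more than one `.liq` file. This is a minimal sketch with a hypothetical helper name. The per-repo counts in the usage lines are placeholders chosen only to reproduce the reported 10-of-17 split. They are not the actual file counts.

```python
from typing import Sequence


def multi_file_fraction(liq_files_per_repo: Sequence[int]) -> float:
    """Fraction of sampled repos containing more than one .liq file.

    Repos whose .liq files turned out not to be Liquidsoap are assumed
    to have been dropped from the sample before calling this.
    """
    if not liq_files_per_repo:
        raise ValueError("empty sample")
    multi = sum(1 for n in liq_files_per_repo if n > 1)
    return multi / len(liq_files_per_repo)


# Placeholder counts: 10 multi-file repos and 7 single-file repos.
sample = [2] * 10 + [1] * 7
print(f"{multi_file_fraction(sample):.0%}")  # 59%
```

A sample of 17 gives only a rough estimate. It is enough to show that multi-file repos are not rare, but not to pin down the true proportion.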

Thus I would like to kindly ask: does this position represent the official github platform position? If so, what are the appropriate channels to file a complaint about it?

No. This is the policy of this project alone and has been for 10+ years. The even vaguer "In most cases we prefer that languages already be in use in hundreds of repositories before supporting them in Linguist." was first documented in the CONTRIBUTING.md file back in November 2014, but the policy existed before then. Back then it was possible to assess the number of repos a language was used in.

As the sole maintainer of this project, in my spare time, I can't be expected to know how every language is used so have to rely on this imperfect analysis that is hobbled by the limitations of GitHub's Search. The best I can do is be consistent in how I implement this, which I try to do, hence I've documented the basic process I follow. If you can come up with a more reliable method of assessing the number of files and unique :user/:repo using a language, I'm happy to explore adopting that method. I used to have a script for this, but GitHub's Search changed such that that method is no longer possible to use.

I know this is far from perfect, but there isn't a perfect solution.

@Tampa commented Jun 26, 2024

So lemme get this straight. You'll only support a language if the files it uses are either present by the hundreds per repo or hundreds of repos using a single file, completely disregarding whether the repos that contain the file have thousands of stars, forks and other metrics that show a vast interest in the project? That's a pretty bad assessment of use.

Whether or not something has widespread adoption would be a better metric. Since Liquidsoap is the quasi-default system for building multimedia streams, it has found its way into many projects that have thousands of downloads, stars and forks. I'm frankly surprised it is not being considered, given the projects it drives are so plentiful that even your resident AI of choice has heard of it and can hallucinate some syntax for it.

I agree that it might not be the most common language, owing in part to it being complex to learn and master and serving only one type of use case (streaming and multimedia applications), but the same can be said of a lot of other things. Haskell and COBOL only serve very few projects, but they run the mainframes that power the entire financial sector. Whether something is worth adding should never be decided by an arbitrary file count alone. It should reflect how useful the language actually is and how much of an impact it makes.

Another metric worth looking at is the nature of the language itself. A repo containing a Docker or Composer file can happily run without it; I regularly implement these things natively without using them. Liquidsoap, however, practically drives the projects it's part of. Pulling out the file would break the execution of the project, and so much depends on it that replacing it with something else is usually impossible as well. It would be like ripping out a DLL.

I get it. Adding a language and maintaining support for it is an additional burden. Languages show up and go nowhere plenty of times, so one must set some rules for what's worthy of the time. People reinvent the wheel constantly to fix old languages, so some metric is needed to avoid having to follow that trend all the time. However, that metric should be based on the maturity of a language in terms of its adoption in projects and the functionality it provides, rather than a simple count.

@Moonbase59 commented

From a developer and user standpoint, I must say Liquidsoap isn’t too easy to learn (exactly one reason to have grammar/highlighting support!), and its real distribution is probably not easy to determine, since it is "buried" in many other projects with very wide distribution, like Centovacast, AirTime, LibreTime, AzuraCast, Radio France, the Live365 network and many others. Also, it has been around for almost 20 years, and the developers are very active, which might explain the "creator overhang".

A typical radio station will also mostly write and adapt its own code without necessarily publishing it, but still wishes to be able to read and write code better. There are hundreds of thousands of them, including me: I have been using Liquidsoap for ~15 years now and have always missed grammar/highlighting support in the "big players" like GitHub, VSCode, etc.

The assessment count also, I believe, doesn’t include Gists, which often contain .liq code.
Though I do very well understand that a "one-person show" has its limits, I’d wish more factors would be taken into account, like @Tampa said, maybe stars, forks, download counts.

Liquidsoap definitely is a language with limited visibility, but that’s mainly because it’s the working backbone of many other high-volume projects. It is also typically so stable that it can run for years, which might also explain the low assessment counts.

Liquidsoap definitely is not a hyped mayfly that’ll soon vanish, like many other so-called "languages" that evaporate quickly. It’s a serious business backbone used in hundreds of thousands, if not millions of installations.

Both developers and users set high hopes for GitHub adding this language. I mean, it’s all prepared, PR’ed, checked, etc.

Thank you for considering, and hopefully adding the language!

@lildude (Member) commented Jun 26, 2024

So lemme get this straight. You'll only support a language if the files it uses are either present by the hundreds per repo or hundreds of repos using a single file, completely disregarding whether the repos that contain the file have thousands of stars, forks and other metrics that show a vast interest in the project? That's a pretty bad assessment of use.

Not at all. Stars, forks, downloads and other similar metrics are indicators of the popularity of a project/repo, not the widespread usage of the language. Any project/repo that makes it to the front page of Hacker News instantly jumps in all of those metrics, but in the case of a new language, the actual usage of the language in new projects doesn't see a similar rapid growth. Mojo is one such language, though it saw quicker growth than most, so it didn't take long to reach the required usage levels. One repo with a single unique language with a million stars, forks and downloads is not indicative that people are actually using the language; it only indicates that the repo is popular.

Linguist's requirement is all about usage on GitHub.com (added emphasis mine):

We try only to add new extensions once they have some usage on GitHub. In most cases we prefer that each new file extension be in use in at least 200 unique :user/:repo repositories before supporting them in Linguist (but see #5756 for a temporary change in the criteria).

Whether or not something has widespread adoption would be a better metric.

You've just agreed with how we're measuring things to meet Linguist's requirements 😁

Another metric worth looking at is the nature of the language itself.

That would involve rewriting Linguist as it currently only looks at files in isolation. We assess usage in a similar manner.

That metric should be based on the maturity of a language in terms of its adoption in projects and what functionality it provides rather than a simple count.

That's what we're trying to gauge, specifically "its adoption in [public] projects [on GitHub]".

@toots (Author) commented Jun 27, 2024

Thank you for these clarifications; I genuinely appreciate them.

Ultimately, I still do not understand the policy and its purpose.

If the purpose is to prevent frivolous languages from being included, then I think that we have provided ample evidence against that concern here.

Materially, it does not make sense to arbitrarily limit the number of languages either. Your contributors come with all the hard work already done, and if your project or the platform cannot handle a large number of languages, then that surely is a conceptual issue.

At the end of the day, the policy's purpose still hasn't been explicitly stated and its application definitely seems capricious at best.

Obviously, we are not gonna come to an agreement here. Since you are not affiliated with the platform, I will have to reach out to file a complaint and see if they can find a solution that better fits their users and community.

@lildude (Member) left a comment

Usage still isn't quite there, but you've now got conflicts. Please resolve these so when things meet usage reqs we can go ahead and merge.

@toots (Author) commented Sep 8, 2024

Hi,

I have fixed the conflicts as they were quite trivial.

However, I do not think that continuously asking contributors to do work that is not likely to be included is appropriate and, as I said earlier, the process used to evaluate popularity is wholly inadequate.

@Alhadis changed the title from "Add language: liquidsoap" to "Add language support for Liquidsoap" on Sep 11, 2024
Review comment on lib/linguist/languages.yml (outdated, resolved)
@toots requested a review from @Alhadis on September 12, 2024 17:21
@toots (Author) commented Nov 13, 2024

@lildude Your search excluding savonet files has doubled in 6 months, from 700 files to 1.4k files now: https://github.com/search?type=code&q=NOT+is%3Afork+path%3A*.liq+NOT+user%3Asavonet

Not only does that show the futility of this metric, but at this point it also clearly calls into question the accuracy of these numbers. I love our project, but I don't think adoption has doubled in those 6 months.
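The exclusion query can be reproduced by URL-encoding it for GitHub's web code search. This is a minimal sketch that uses only the Python standard library. The helper name is hypothetical. The unfiltered variant assumes the comparison count comes from the same query without the `NOT user:savonet` clause. Counts on the resulting pages will still fluctuate as indexing progresses.

```python
from urllib.parse import urlencode

BASE = "https://github.com/search"


def code_search_url(query: str) -> str:
    """Build a GitHub web code-search URL for the given query string."""
    # safe="*" leaves the glob in path:*.liq unescaped; spaces become "+"
    # and ":" becomes %3A via the default quote_plus encoding.
    return BASE + "?" + urlencode({"type": "code", "q": query}, safe="*")


# Assumed unfiltered variant, and the exclusion query linked above.
with_creator = code_search_url("NOT is:fork path:*.liq")
without_creator = code_search_url("NOT is:fork path:*.liq NOT user:savonet")
print(without_creator)
```

Generating both URLs from one helper keeps the two searches identical apart from the exclusion clause. Any difference in counts can then be attributed to that clause alone.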

I do believe, however, that its usage is widespread.

@toots (Author) commented Nov 13, 2024

Also worth noting: the ratio is now 2k including savonet to 1.4k excluding it. It was 1.9k vs. 700 six months ago, which indicates that your assumption that the driver for those numbers is "the largest user and promoter of the language" is wrong.
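The ratio argument can be checked against the approximate counts given above. This is a minimal sketch. The figures are the rounded numbers quoted in this thread, not fresh measurements, and the helper name is hypothetical.

```python
def creator_share(with_creator: int, without_creator: int) -> float:
    """Fraction of the total search count attributable to the excluded user."""
    return (with_creator - without_creator) / with_creator


# Approximate counts quoted in this thread (search figures fluctuate).
snapshots = {
    "six months earlier": (1900, 700),
    "Nov 2024": (2000, 1400),
}
for label, (total, excluded) in snapshots.items():
    print(f"{label}: {creator_share(total, excluded):.0%} from savonet")
# six months earlier: 63% from savonet
# Nov 2024: 30% from savonet
```

On these numbers, the excluded account's share falls by roughly half while the non-savonet count doubles. This is the shift the comment points to.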

6 participants