License change from GPL to MPL #2456
I approve the license change.
I approve
Codecov Report
@@ Coverage Diff @@
## master #2456 +/- ##
=======================================
Coverage 91.58% 91.58%
=======================================
Files 62 62
Lines 12028 12028
=======================================
Hits 11016 11016
Misses 1012 1012
Thanks Matt, this is really great! I understand that Alan was studying this move in great detail, so I wonder if he can post answers to a few more questions for the FAQ:
@st-pasha Yes, the fact that R's headers are LGPL makes the difference. That's why a very wide range of licenses (including MPL, Apache and MIT) is available for packages acceptable to CRAN, as listed in https://svn.r-project.org/R/trunk/share/licenses/license.db which is linked from CRAN policies and R-exts. As for Rversion.h, I doubt anyone considers that a version number needs to be licensed. There isn't any code to be protected in a version number. Yes, Alan (our in-house H2O lawyer) reviewed my email to contributors before it was sent out. If you need him to reply here about these supplemental questions, please ask him.
Approved. Hoping this means a wider audience for data.table and more :)
approve
Just curious, will |
@dselivanov Yes
… check removal of Rversion.h for #2456 was ok.
Everyone has approved. Thank you! As the thumbs-up pop-up only shows the first 10 ids "plus 5 more", I need to post a final list in one place. It seems non-project members can add an approving review even if they haven't been requested, so that's better than thumbs-up if there's ever a next time. I've heard from everyone via email too. NB: the sum(commits) here is 2,871 and excludes merge commits, as stated at the top of the graph in Insights->Contributors. The 3,175 commits displayed at the top left of the code tab is 304 larger because that includes merge commits.
You may add additional accurate notices of copyright ownership.

Exhibit B - "Incompatible With Secondary Licenses" Notice
The LICENSE file contains Exhibit B and I couldn't find any other copyright notice, which makes it unclear whether data.table is allowed to be combined with "Secondary Licenses" or not, because the License states that, in the absence of a copyright notice, the LICENSE file should be used as the copyright notice.
As the relicensing was carried out to make data.table compatible with closed-source software, I believe including this Exhibit in the LICENSE file is not recommended, as it makes data.table incompatible with software under licenses other than MPL 2.0.
@alamit If you still believe it is recommended to remove Exhibit B from the LICENSE file, please point me to a third party recommendation. I believe the LICENSE file is best left unchanged: it is best to be a verbatim copy of MPL-2.0 including its exhibits so there can be no doubt it is verbatim MPL-2.0; e.g. as confirmed by the file size in bytes and a hash. I hope this answers every aspect of your comment and that you are able to proceed satisfactorily. If there is anything else I can answer in more detail or anything that I can change, please let me know.
A little more on Exhibit A (not Exhibit B). This text appears just after Exhibit A at the bottom of the LICENSE file:
So, yes, I see no need to place the license, or Exhibit A, at the top of each and every source file. I know many projects do, but I saw no need. I'm relying on this explicit paragraph in the license (GPL has a similar one) and I've called this file LICENSE in the root directory with that in mind, as is common in many other projects too. This saves having to check we've remembered to place Exhibit A at the top of each source file in the correct way. Also, when we changed from GPL to MPL we didn't need to go and touch every single source file. I've seen Exhibit A or similar in every source file in projects which have a mixture of licenses. So having one LICENSE file in the root of data.table conveys clearly and easily that there is no mixed licensing in data.table: it's all MPL 2.0. That is my current thinking anyway. Happy to hear further comments and suggestions.
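The verbatim check mentioned above (file size in bytes plus a hash) could be sketched roughly as follows. This is only an illustration in Python: the function names `fingerprint` and `is_verbatim_copy` are made up here, and a reference copy of the MPL-2.0 text is assumed to have been obtained separately.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Return (size in bytes, SHA-256 hex digest) of the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def is_verbatim_copy(candidate, reference):
    """True if both files have the same size and the same SHA-256 digest."""
    return fingerprint(candidate) == fingerprint(reference)
```

For example, `is_verbatim_copy("LICENSE", "MPL-2.0.txt")`, where `MPL-2.0.txt` is a hypothetical local copy of the reference text. Note that a difference in line endings alone would be enough to change the hash.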
From https://www.mozilla.org/en-US/MPL/2.0/FAQ/ :
related to #4140 - approving
I approve this licensing change.
This pull request is a verbatim copy of an email I sent to all 24 contributors of code to the project. All 24 have now approved with either a review or a thumbs-up. Thanks everyone!
Dear contributor to data.table,
Since its creation in 2006, data.table has been GPL. You are receiving this email because you have contributed either R or C code to the data.table project (even one line) and are considered a license holder. Contributors to documentation-only have not been included. Arun and I (as the biggest contributors) have tentatively agreed, subject to all your approvals and further comments, that it was never our intention to prevent closed-source products from using data.table, and that we should change to the Mozilla Public License (MPL). We think of data.table as a library.
I believed that closed-source products can already call interpreted R code, such as :
It was never our intention to prevent usage like that by closed-source products. I believed that R code was "data" and not considered linking. The reason I believed that was this GNU FAQ :
https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL
The first paragraph is very clear :
So, the GPL does not apply to interpreted code. However, it has been brought to my attention that the same answer continues with 3 further paragraphs :
So, it is my new understanding that data.table's GPL license could be interpreted as preventing closed-source products from using data.table, even at R-only level. However strange that sounds, I'm willing to agree to that definition, for the avoidance of doubt.
What we intended was that we contributed free software in the open under one condition: that if you improve data.table itself and distribute your improved data.table as a product, then that product must be open-source too. We were concerned about the code inside data.table itself, not use of data.table. This is why, for data.table, we never agreed with MIT or Apache, and still don't. We do not like those licenses for data.table because we think them unfair to the contributors. We are concerned about, say, data.table.PRO, a closed-source improvement of data.table being created based on our free contributions. We are not concerned about using data.table as a library by closed-source software. In fact, we are now concerned about that being perceived as prevented.
Whether or not we agree with the way this GNU FAQ is being interpreted, we are happy in principle to agree to that interpretation. This is not a change in policy, but a change in the license to match what we intended anyway, for the avoidance of doubt.
The natural first thought was LGPL. But that has some restrictions that are opposed:
https://softwareengineering.stackexchange.com/q/221365
The MPL (Mozilla Public License) does not have those restrictions. The MPL is even lesser than the LGPL. It is the lowest we can go without throwing the code to the wind and going with a lax license like MIT or Apache, which would allow closed-source data.table.PRO to be created.
I didn't want to put anyone under public pressure. So this is an email first. If nobody disagrees then I will create a pull request making the license change. All project members will be reviewers and all must approve. I can't add non-project members as reviewers because GitHub doesn't allow that, so non-project members will need to please add their vote to 'thumbs-up'. Any single one of you can veto the change and that will be respected. Even if all project members agree, a single contributor who isn't a project member can still veto the change and that will be respected too. The pull request will basically be this email copied in. Further discussion could take place inside the PR before you approve; you don't have to approve now by email unless you want to. This is an opportunity to apply your veto privately via email to me, before the public PR is created.
To put some examples to it, if we change to MPL, here is what will be ok and what will not be ok.
What is ok under MPL
closed-source products can use data.table via any mechanism of their choosing. That includes R-only usage of data.table. Linking to datatable.so and calling its C API in a closed-source product will now be ok under MPL but was not ok under GPL (if anyone wants to do that!).
improving data.table's implementation and distributing that improvement privately within your company, for profit or not, even across your international offices is ok by MPL and was ok by GPL too. It's only when the package is distributed outside a company (think data.table.PRO) that any improvements to data.table have to be released open-source. Those improvements don't have to be contributed back to data.table but they do have to be open-source in public and licensed as MPL to ensure the code stays open-source. Nothing compels private changes to GPL, LGPL or MPL code to be released. Distributing within a company is not distributing. However, the MPL's wording is simpler and more explicit than the GPL in this regard.
Re-implementing data.table in a "clean-room" independently from scratch: there's nothing we can do to prevent that. For example, TERR is a closed-source re-implementation of R by TIBCO which is distributed to its customers. We'd all prefer if TERR was open-source so we could benefit from it but there's nothing we can do because they did it independently with new code without looking at R's source code. That could, presumably, be done for data.table too. If we changed to MIT or Apache, then the task of improving data.table and making closed-source data.table.PRO would be much easier, which we don't think is fair to data.table's contributors.
What is not ok under MPL
Matt Dowle creating closed-source data.table.PRO. I am not the license holder. All the contributors (i.e. you) are jointly the license holders. As soon as you contributed to data.table you are a license holder and you can then veto any future license changes. I have prevented myself from making closed-source data.table.PRO via the choice of GPL. Moving to MPL will not change this, I will still be prevented. If I am prevented, I don't see why we should change to Apache or MIT to let someone else create data.table.PRO.
anyone creating data.table.PRO by starting from the code inside data.table, making it better and releasing that as a closed-source product. Or, the other way too: releasing the changes under an Apache, MIT, or similarly lax license. Because the lax license opens the door to closed-source data.table.PRO from there. Our intention is to prevent our free contributions from helping a closed-source improvement of data.table be created. We don't want to be disrespected in that way or taken advantage of. We don't mind if a closed-source product uses data.table.
Anticipated questions
Q: What does Matt and Arun's 'author' status mean?
A: We get mentioned in citation("data.table") because we're, currently, the biggest contributors. It's unrelated to ownership or licensing.
Q: What does the 'contributor' status mean in DESCRIPTION?
A: Again, it's not to do with licensing or ownership. The contributors' names are listed on the CRAN page. It's kudos.
Q: Why is there no license holder in DESCRIPTION?
A: Because, currently, we like that there isn't. There still won't be under MPL. All the contributors own the project jointly and the license can't be changed without permission from all of them. If we changed to MIT or Apache we would then have to name a license holder (that's a requirement of those licenses). Who would the license holder be in our case? From that point, your contributions would be given to the license holder. They could change the license later (say, to closed-source, or a PRO version with enhancements). We could pick Apache and assign the Apache Foundation as license holder. But that is a long process to incubate. For now, we prefer the simpler MPL and leave the door open to consider Apache in the future.
Q: If we decide MPL was a mistake, can we change back to GPL?
A: Yes. We would all have to agree again. The MPL is the lowest and simplest we can go while retaining this "it-belongs-to-all-of-us" feature. However, once v1.11.0 was released as MPL we could not take that back. Any subsequent change back to GPL would apply from v1.11.2 onwards.
Q: Could we change to a lax license, like MIT, Apache or BSD?
A: Yes. We would all have to agree again. To change back from those licenses, though, your permission would no longer be needed. It would be the license holder who could decide that by themselves. This is one reason I have ruled out those licenses for data.table, for now, subject to your comments. The other reason being that they permit closed-source data.table.PRO.
Q: Why is it Matt that's writing?
A: Just because I'm the biggest contributor and current maintainer of the package on CRAN. I'm merely acting as an administrator/maintainer.
Q: What's driving the change?
A: H2O (my employer) has created a closed-source product called Driverless-AI. It uses a Python package pydatatable which is a port of data.table to Python. With data.table being GPL, pydatatable needs to be GPL, and therefore the concern is that Driverless-AI can't be closed-source because it would call a Python GPL package, even just interpreted Python. Changing R's data.table to MPL allows pydatatable to be MPL which would then allow closed-source Driverless-AI to use it. I see it as a good compromise between these two very differently licensed communities.
Q: Has Matt been asked about license changes before?
A: Yes. One person in the Julia community asked me last year if they could use fread.c in Julia. At the time I declined because Julia is MIT and allows, for example, JuliaPRO to be created. I didn't want to contribute for free to closed-source JuliaPRO. If data.table changed to MPL, it would be easier for Julia to use fread.c. They could, if they wished, take fread.c and include it in Julia with the MPL license. Any improvements to fread.c could not be made in JuliaPRO, however, without open-sourcing those improvements. Which I think is fair to fread's contributors. fread.c has already been separated/agnostified from the R API (freadR.c) so it should be easier to hook up into Julia. It is already hooked up into pydatatable.
Q: Has Matt ever been asked to change data.table to Apache?
A: Yes, recently. I discussed with Arun and we declined. I continued negotiations, discovered MPL, agreed with Arun and am now putting this forward to you all.
Q: Can any contributor make a pull request to change the license?
A: Yes. As long as you gain approval from all contributors, the change would be made.
Q: You said MPL is the "lowest we can go" and is lesser than the LGPL. Is there an independent table or something to look at?
A: See https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses. Concentrating on the first 3 columns of colored boxes (headings: Linking, Distribution, Modification), observe:
Apache & MIT : all 3 boxes green -- we think that is too lax
LGPL : all 3 boxes blue -- the "With restrictions" blue status in the first column for Linking is opposed (clauses 4d0 and 4d1 of LGPL-3)
MPL : first box green, other two blue.
No other license on the list has green, blue, blue. So MPL uniquely matches our intentions.
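The comparison above can be written down as a tiny lookup. This sketch encodes only the four rows quoted in this email, not the full Wikipedia table, and the names `LICENSE_BOXES` and `licenses_matching` are invented for the illustration.

```python
GREEN = "green"
BLUE = "blue"  # shown as "With restrictions" in the table

# Box colours per license, in column order: (Linking, Distribution, Modification).
# Only the rows mentioned in the email are included.
LICENSE_BOXES = {
    "Apache": (GREEN, GREEN, GREEN),
    "MIT": (GREEN, GREEN, GREEN),
    "LGPL": (BLUE, BLUE, BLUE),
    "MPL": (GREEN, BLUE, BLUE),
}


def licenses_matching(wanted):
    """Return the names of licenses whose three boxes equal `wanted`."""
    return [name for name, boxes in LICENSE_BOXES.items() if boxes == wanted]


# Free linking, but restrictions on distribution and modification.
print(licenses_matching((GREEN, BLUE, BLUE)))  # ['MPL']
```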
Q: Is there any debate about which version of MPL to choose?
A: No. v2 of MPL is clear. Unlike GPL and LGPL which have their quirks about versions and combinations of versions.
Q: What about the license of data.table's dependencies?
A: data.table does not have any dependencies other than R itself so this isn't an issue. If data.table depended on any GPL packages then we would not be free to choose MPL.
Q: Is Mozilla Public License (MPL) recognized by CRAN as an acceptable license for a package?
A: Yes, it is listed in https://svn.r-project.org/R/trunk/share/licenses/license.db with the acronym MPL-2.
Q: Why will the license field in DESCRIPTION contain "MPL-2 | LICENSE"?
A: Google lawyers do not accept CRAN's acronyms or links. They require the actual license file to be present. The LICENSE file is a verbatim copy of the MPL-2. If we put just the LICENSE file then people might think it was a special unique license. So we've ended up with both the acronym and the file.
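To make the "both the acronym and the file" point concrete, here is a small Python sketch that reads the License field from DESCRIPTION-style "Key: value" text and splits it on the vertical bar. The sample text and the helper names `read_field` and `license_parts` are hypothetical; only the field value "MPL-2 | LICENSE" is taken from the question above.

```python
def read_field(description_text, field):
    """Return the value of `field` from 'Key: value' text, or None if absent.

    Continuation lines (lines starting with whitespace) are joined
    onto the field they follow.
    """
    value = None
    collecting = False
    for line in description_text.splitlines():
        if line[:1].isspace():
            if collecting:
                value += " " + line.strip()
            continue
        key, sep, rest = line.partition(":")
        collecting = bool(sep) and key == field
        if collecting:
            value = rest.strip()
    return value


def license_parts(description_text):
    """Split the License field on '|' into its individual entries."""
    value = read_field(description_text, "License")
    if value is None:
        return []
    return [part.strip() for part in value.split("|")]


# Illustrative sample only, not the real DESCRIPTION file.
SAMPLE = """Package: data.table
License: MPL-2 | LICENSE
"""

print(license_parts(SAMPLE))  # ['MPL-2', 'LICENSE']
```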
Q: What if Matt dies?
A: CRAN maintainers would ask the next largest contributor if they would like to be maintainer: currently, Arun.
Q: What if a contributor cannot be contacted?
A: That is one reason why this is an email first. To see if it is even possible to achieve 100% approval.
Please reply to indicate if you're ok in principle for the pull request to be created and your permission to be asked for in public there. I need a reply from everyone please to know whether full agreement is possible.
Thanks!
Matt