Handling of posterior licenses #230

Open
MansMeg opened this issue Mar 5, 2021 · 11 comments

Comments

@MansMeg
Collaborator

MansMeg commented Mar 5, 2021

Is the intention to accept data with closed licenses? If not, the
CONTRIBUTING.md text should be clarified as to what licenses other
than BSD-3 are acceptable. If there are multiple licenses, there
needs to be a master list and they all need to be compatible if you're
going to put them into the same package as very few licenses are as
compatible with other licenses as BSD-3.

@MansMeg changed the title from "Handling of model licenses" to "Handling of posterior licenses" on Mar 5, 2021
@ahartikainen
Collaborator

Should all models here use the same permissive license?

@MansMeg
Collaborator Author

MansMeg commented Mar 5, 2021

@avehtari will check whether the Aalto data support team can help us resolve this, including whether we should have different licences for data and code.

@bob-carpenter
Contributor

Was there ever a licensing decision? I want to import the models elsewhere, but can't do that without a license. Also, this repo needs to cite the source and license of any model retrieved from elsewhere. This might be hard as I think some of them came from the example-models directory somewhere on stan-dev, but that was never released with a license as far as I know.

P.S. The reason I'm asking is that we are building out a database of models at Flatiron Institute with just the model implementation, data, and draws, but we can't do that without a license in place. The main motivation is that (a) we can distribute big data sets through our cluster, and (b) we can strip down the complexity of the R and Python packages so that the distribution is just the Stan programs, data, and draws. This is another thing I don't want to do through the Stan project because I don't want to have to compromise with a bunch of people on the goals or contents.

@MansMeg
Collaborator Author

MansMeg commented Oct 19, 2023

So, I opened up this discussion with the Aalto lawyers in 2020, but the pandemic struck, and everything stopped. We need to know how to do this in a good way. As you say, most models are from within the community, and for some data I have gotten an okay to put it in posteriordb, but we didn't discuss the licence.

Do we have someone who knows about licensing of data and this type of thing?

@bob-carpenter
Contributor

I know the basics, but we also have access to IP lawyers for free through NumFOCUS if there are more complicated issues.

Copyright is automatically assigned to whoever writes code or text. The author can reassign copyright. For example, most faculty contracts at American universities stipulate that all copyright for code is reassigned to the university, but all text is owned by the author. I have no idea what contracts in Sweden are like.

The copyright owner can choose to distribute their copyrighted work with a license. Once the copyright holder does that, you can use it according to the license. For example, other projects can use stan-dev/stan and stan-dev/math and stan-dev/stanc3 code under the BSD-3 license without further permission from the Stan team. (Our name and logo are trademarked, which is a different branch of IP law.)

When you redistribute the copyrighted works of others, you are legally required to respect the licensing terms. They almost always require you to cite the copyright holder and the license under which the copyrighted work is used.

If you try to combine code with multiple licenses into a single project, there's an issue of license compatibility and copyleft. Some licenses are fundamentally incompatible, like Apache 2 and GPL 2, but others are compatible, like GPL 3 and BSD-3. If all you do is redistribute each contribution under its own license, it makes it harder for people to use the project (they have to scan the license for everything they use), but it's otherwise OK to do that. If you have a more complicated question, we'll have to get help from a real lawyer.

@MansMeg
Collaborator Author

MansMeg commented Oct 19, 2023

So the idea I think @avehtari had was to set the licence for each model and dataset in the database. So maybe the easiest would be to do that. Then you could filter for everything that has a licence you are OK with?
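
A minimal sketch of what such filtering could look like, assuming each entry carries a `license` field. The records, entry names, and field names below are hypothetical and are not posteriordb's actual schema:

```python
# Hypothetical metadata records. The "license" field and the entry names are
# assumptions for illustration, not posteriordb's actual schema.
ENTRIES = [
    {"name": "model_a", "kind": "model", "license": "BSD-3-Clause"},
    {"name": "data_a", "kind": "data", "license": "CC0-1.0"},
    {"name": "model_b", "kind": "model", "license": "GPL-2.0-only"},
    {"name": "data_b", "kind": "data", "license": None},  # license not yet known
]


def filter_by_license(entries, accepted):
    """Return the entries whose license is in the accepted set."""
    accepted = set(accepted)
    return [e for e in entries if e.get("license") in accepted]


ok = filter_by_license(ENTRIES, {"BSD-3-Clause", "CC0-1.0"})
print([e["name"] for e in ok])
```

Entries with no recorded license are excluded by default, so a downstream user never picks up unlicensed material by accident.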

@bob-carpenter
Contributor

You can create a repo with each component licensed under its own license. There just can't be an overall license unless you find one that's compatible. I'm not sure what that would mean relative to your writing Python or R code that compiles against those models. That's something I'd ask the NumFOCUS IP attorneys.

@MansMeg
Collaborator Author

MansMeg commented Oct 20, 2023

OK. You mean our code? That's mainly written by me, so that's no problem.

So do I understand you correctly that you are happy with clear licences per model/posterior?

@bob-carpenter
Contributor

Yes, I mean the R and Python code. It has to be released under some license, but I don't know what the implications would be of it using a bunch of Stan code under different licenses. This is where license compatibility becomes an issue and also a reading of what it means to be a derivative product. If all of the models you are using are licensed under GPL v3 or BSD-3 or Apache 2 or MIT license or similar, you're OK with going with a BSD-3 license for your code. As soon as you try to include something with an incompatible license (e.g. homebrew "academic only" use license or GPL v2), I would urge you to ask NumFOCUS lawyers.
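
The rule of thumb above could be sketched as a simple check. This is an illustration only, not legal advice: the set contains just the four licenses named in this comment, written as SPDX-style identifiers, and the function and variable names are hypothetical:

```python
# SPDX-style identifiers for the licenses named in the comment above.
# Treating exactly this set as "safe" is an illustrative assumption,
# not a legal determination.
KNOWN_OK_WITH_BSD3_CODE = {"GPL-3.0-only", "BSD-3-Clause", "Apache-2.0", "MIT"}


def needs_legal_review(model_licenses):
    """Return the sorted licenses that fall outside the rule-of-thumb set."""
    return sorted(set(model_licenses) - KNOWN_OK_WITH_BSD3_CODE)


flagged = needs_legal_review(
    ["MIT", "BSD-3-Clause", "GPL-2.0-only", "academic-only"]
)
print(flagged)
```

Anything the function returns is a candidate for a question to the NumFOCUS lawyers rather than an automatic rejection.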

@avehtari
Collaborator

The posteriordb repo has R code only in the tests directory, and it seems that directory could also be removed. The posteriordb repo has Python code for one PyMC model, but that is the model code. So it seems posteriordb just needs a clear license per model code and per dataset. I don't think the posterior draws are under copyright, and a data license for them also seems a bit silly as they can be regenerated.

There are separate repositories, posteriordb-r and posteriordb-python, that have useful utilities for accessing data, model code, and reference posteriors from anywhere on the internet. The license for that code doesn't need to match the licenses in posteriordb, just as web browser code doesn't need to match the licenses of the material the browser downloads.

The different model codes in posteriordb are not linked together, and it is just a code repository in the same way as CRAN. CRAN contains packages with different licenses, including restrictive licenses that are incompatible with each other, but in general the CRAN packages are not combined, and in the same way the posteriordb models are independent of each other. Naturally, all the code needs to have a license that allows us to distribute it via the repository.

We do need to state the license for each piece of code in the repository, and remove those for which the licence is not clear and we can't get the original author to license it under something suitable.
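
As a rough sketch of that audit step, assuming each model or dataset has a metadata record with an optional `license` field (the records and field names here are hypothetical, not the repository's real layout), the entries needing follow-up could be listed like this:

```python
def split_by_license_status(entries):
    """Split entry names into those with a stated license and those without."""
    licensed, unclear = [], []
    for entry in entries:
        # Treat a missing, None, or blank license as "unclear".
        lic = (entry.get("license") or "").strip()
        (licensed if lic else unclear).append(entry["name"])
    return licensed, unclear


# Hypothetical records for illustration only.
records = [
    {"name": "model_a", "license": "BSD-3-Clause"},
    {"name": "model_b", "license": ""},
    {"name": "model_c"},
]
licensed, unclear = split_by_license_status(records)
print("licensed:", licensed)
print("contact author or remove:", unclear)
```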

@bob-carpenter
Contributor

That makes sense, @avehtari. I hadn't realized posteriordb-r and posteriordb-python had been split out. If they're in different repos and don't distribute anyone else's copyrighted code and don't include posteriordb as a submodule, then I think you should be OK.

I think distributing a repo with a bunch of separately licensed codes is OK. I would personally stay away from anything other than the standard open source licenses, but that's your call.

I haven't thought about copyright on draws. You can only copyright things produced by a human, but I don't know the status of things produced by tools that a human runs.

4 participants