Open source software communities have shown the power of open collaboration by building some of the world’s most important software assets together. Some communities are now looking to build datasets collaboratively, sharing and developing them under a model very similar to that of software. For example, machine learning and AI systems require vast amounts of training data, and governments are looking for ways to establish public-private sharing of data. The challenge is that intellectual property systems around the world treat data differently than software, so our common OSI-approved licenses do not work well when applied to data.
Our communities wanted to develop data license agreements that could enable sharing of data similar to what we have with open source software. The result is a large-scale collaboration on two licenses for sharing data under a legal framework that we call the Community Data License Agreement (CDLA).
There are two initial CDLA licenses. The CDLA-Sharing license was designed to embody the principles of copyleft in a data license. In general, if someone shares their data, the CDLA-Sharing agreement puts terms in place to ensure that downstream recipients can use and modify that data, and are also required to share their changes to the data. The CDLA-Permissive agreement is similar to permissive open source licenses in that the publisher of data allows anyone to use, modify and do what they want with the data with no obligations to share any of their changes or modifications.
These two licenses establish a framework for the collaborative sharing of data, following a model that has proven to work in open source software communities. The context document should be helpful for understanding the framework for applying the CDLA. We encourage communities and organizations seeking to share data to review the Community Data License Agreements and see whether they fit their needs and use cases.
Please visit https://cdla.io/ for additional details and resources.