Apache Incubation Proposal

gsergiu edited this page Jun 9, 2016 · 22 revisions

This is an in-progress draft proposal to the Apache Software Foundation Incubator. If accepted, it would start the process by which Annotator could become, once incubation is complete, a top-level Apache project.

We are currently discussing this draft on the annotator-dev@ mailing list.

Additionally, we've submitted this draft to the ASF Incubator list and posted it to the Apache Incubator wiki, which should now be considered the canonical version. It will be updated from this one as needed (i.e., prior to final voting).

If you have existing Annotator-related code and would like to join this proposed project, please post an introduction to the annotator-dev@ list highlighting what you've built and whether you are willing to bring your code and/or expertise to the ASF (see the Apache iCLA and CCLA for the related legalese).


Based on the Apache Incubator Proposal Template. Block quotes are descriptions of the section pulled from the proposal template.

Abstract

A short descriptive summary of the project. A short paragraph, ideally one sentence in length.

Annotation enabling code for browsers, servers, and humans.

Proposal

A lengthier description of the proposal.

The Annotator community seeks to build a foundational set of libraries under a liberal license providing the pieces necessary for developers to add annotation to their projects.

Background

Provides context for those unfamiliar with the problem space and history.

Annotator.js was originally created by Open Knowledge (formerly The Open Knowledge Foundation) to provide annotation over works by Shakespeare. Since that time, Annotator has found its way into a wide range of browser-based annotation systems such as Hypothes.is, LacunaStories.com, and various academic, publishing, and scientific research projects.

Sadly, this increased usage has primarily happened in forks of the main code or through copyleft-licensed plugins whose terms prevent many community members from using them.

However, the community remains interested in collaborating and in building a foundational future for annotation, in browsers as well as on servers and in desktop and mobile applications.

Rationale

Explains why this project needs to exist and why it should be adopted by Apache.

Annotation is often implemented in projects in ad hoc ways with developers often re-solving problems well known to the Annotator community. The Annotator community works to provide knowledge and code to help developers more quickly implement or improve annotation within their projects.

We believe bringing the Annotator community into the Apache Software Foundation will allow for wider recognition of the annotation problem space, help more developers find their way to solving this shared problem, provide increased cohesion for our own somewhat fractured community, and increase the use of commonly shared code within a wide range of projects.

Initial Goals

  • create a collaborative space for the existing Annotator contributors and community
  • further ignite interest and activity around annotation
  • build foundational libraries for annotation
  • implement code to support the Web Annotation Data Model, Protocol, and other annotation-related specifications
  • potentially re-license Annotator under the Apache License 2.0
    • Annotator is currently licensed under a combination of the MIT and GPL licenses
  • consolidate (where possible) community activity around building add-ons, annotation storage providers, and use-case specific feature sets
  • grow interest and activity in annotation

Current Status

Meritocracy

Apache is a meritocracy.

The project is in transition from a primarily BDFL-based model to one with a more diverse set of committers. There are 36 known committers to Annotator. Three of them have done the bulk of the coding and decision making, and two of those three act as project leadership.

However, the community is much larger and more diverse when the various forks and plugin authors are considered.

We intend to invite and include participants from a wide array of annotation problem spaces to collaborate in this new shared space.

Community

Apache is interested only in communities.

Community calls have been held every 3-6 months, with a report of each call's outcome posted to the mailing list and the annotatorjs.org website.

Most activity within the project happens on the mailing list. There is also a relatively inactive #annotator channel on irc.freenode.net. The website is primarily for promotion and includes promotion of community plugins and showcases projects using Annotator. Documentation is published on readthedocs.org and linked to from the website.

There are many projects on GitHub related to Annotator and the W3C Annotation Data Model. Our objective would be to invite these communities to join this collaborative community, with the hope of greater stability and community longevity.

Core Developers

Apache is composed of individuals.

The 3 primary committers to the project are Nick Stenning of The Hypothes.is Project, Randall Leeds of Medal, and Aron Carroll of Dropbox, Inc. Nick Stenning is the original creator of Annotator. Randall Leeds is an Apache CouchDB committer. Aron Carroll has been a frequent contributor. All three have been members of The Hypothes.is Project in past years.

Other currently active community members include:

  • Andrew Magliozzi of FinalsClub.org
    • Andrew drives the scheduling of community calls, is active on the mailing list, and encourages progress within the project and community
  • Benjamin Young of Wiley (also formerly of The Hypothes.is Project)
    • an Apache CouchDB committer
    • co-editor of the Web Annotation Data Model
  • Oliver Sauter of WorldBrain
    • active advocate for Annotator and the growth of the annotation community

Other committers have contributed significant amounts of code, content, or issues and discussions, but have been less active on the project in the last 3-6 months. However, recent annotation-related conferences showed that the scale of plugin, fork, and ancillary project activity is much higher than activity on the main Annotator mailing list suggests. This gap is due in part to community fracturing, which we hope to fix by joining the ASF.

A full list of Annotator contributors can be seen here: https://github.com/openannotation/annotator/graphs/contributors

Alignment

Describe why Apache is a good match for the proposal.

The Annotator community believes that the Apache Software Foundation promotes and enforces the sort of community that will best serve the future of the project. It is also believed that Annotator can serve the ASF by providing its tools to bring annotation into various Apache projects and eventually to the apache.org site, project documentation, and other tools within the ASF.

The priority is on increasing community involvement, defining--via the Apache Way--how we will code and collaborate going forward, and upon creating the best possible annotation solution born out of that collaboration.

Known Risks

An exercise in self-knowledge. Risks don't mean that a project is unacceptable. If they are recognized and noted then they can be addressed during incubation.

Orphaned products

A public commitment to future development.

The majority of the core committers were formerly from The Hypothes.is Project, which used an earlier version of Annotator within its annotation web service and BSD-licensed h annotation software. However, Hypothesis and most other organizations and projects using Annotator have forked the main code base or created unique plugins which only exist within their projects and have not been contributed upstream.

The fracturing of the community and the previous single-entity contribution have greatly inhibited collaboration and growth of the community. Concurrently, interest in and growth of annotation projects from a wide range of constituents has increased, though around a much wider array of code and projects. The hope is that the creation of a collaborative space built for discussion and sharing of these tools would provide the opportunity to reach a common core to be shared among the many diverse players.

As such, the Annotator project has begun the process of becoming an Apache project to establish a development and community process that encourages diversity and cross-organization collaboration.

Inexperience with Open Source

Annotator was established as an Open Source project in 2011, with its first release, v0.0.1, made on January 1st of that year: https://github.com/openannotation/annotator/releases/tag/v0.0.1

The project has continued since that time as an open source project developed on GitHub. The community has grown in diversity since that time and was moved into a separate "openannotation" GitHub organization (from the original "okfn" GitHub organization) in 2014 in an effort to increase community involvement and diversity.

Each of the core committers has worked on and created open source software, for themselves or for various organizations, for more than 5 years. Two of the contributors mentioned above also have more than 5 years of contributor experience at the ASF, and both are now core committers to a top-level project (Apache CouchDB).

Homogeneous Developers

Healthy projects need a mix of developers. Open development requires a commitment to encouraging a diverse mixture. This includes the art of working as part of a geographically scattered group in a distributed environment.

Active community members as well as plugin and compatible annotation storage system builders are from a diverse, though scattered, range of organizations and individually driven projects.

The Annotator community is seeking to combine its efforts into a core group of committers to more effectively encourage a shared foundation as well as continue the growth in diversity of the community.

Geographically, the Annotator community is widely distributed across Germany, Hungary, the East and West coasts of the US, and Australia.

Additionally, the wide range of annotation-related projects that may be considered as input for this project's code explorations vary in size, contributor diversity, and growth.

Reliance on Salaried Developers

A project dominated by salaried developers who are interested in the code only whilst they are employed to do so risks its long term health.

In the past, contributors to the Annotator project were solely from The Hypothes.is Project and their activity was driven primarily by the needs of that project. However, the diversity of interested participants has greatly increased. There is an additional hope of creating an aggregated community from various projects (including Annotator, Hypothesis' h code, and various related libraries and plugins), as well as of exploring the creation of new tools, not only for the browser, to further widen the interest and activity around annotation.

Relationships with Other Apache Projects

Apache projects should be open to collaboration with other open source projects both within Apache and without. Candidates should be willing to reach outside their own little bubbles.

The Annotator community also provides an annotation storage system ("annotator-store") built upon ElasticSearch. There are compatible implementations of that API built on various storage systems (including Apache CouchDB), and the community would encourage the creation of other compatible storage systems built upon other Apache storage projects.

Additionally, Annotator is a JavaScript library which could serve any of the various CMS projects within Apache.

The roadmap for Annotator also includes compatibility with the Web Annotation Data Model which is a JSON-LD serialization of an RDF-based data model for annotation. The growing number of RDF-focused Apache projects could take advantage of and contribute to the creation of these features.
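As a rough illustration of what such a JSON-LD serialization looks like, the following minimal sketch builds one annotation as a plain dictionary and prints it as JSON. It is written in Python purely for illustration (Annotator itself is JavaScript) and is not code from Annotator. The helper name `make_annotation`, the example URL, and the sample text are hypothetical. The field names follow the W3C Web Annotation Data Model as published by the W3C and may differ from the draft that was current when this proposal was written.

```python
import json


def make_annotation(target_url, exact, comment, prefix="", suffix=""):
    """Build a minimal Web Annotation as a plain dict (hypothetical helper)."""
    # A TextQuoteSelector identifies the annotated text by quoting it,
    # optionally with some surrounding context to disambiguate repeats.
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {"source": target_url, "selector": selector},
    }


# Example values below are made up for illustration.
annotation = make_annotation(
    "http://example.org/hamlet.html",
    "To be, or not to be",
    "Opening line of the soliloquy.",
    suffix=", that is the question",
)
print(json.dumps(annotation, indent=2))
```

Because the result is ordinary JSON, any storage system that can hold JSON documents could in principle store it, while RDF-aware tools can interpret the same document through its `@context`.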

The W3C Annotation Working Group is also creating multiple related deliverables around Web Annotation, including a Linked Data Platform-based Protocol specification, a note about selector systems, and future notes for various serialization and integration opportunities for the Web Annotation Data Model. Apache Marmotta is one project within the ASF which has native support for LDP and may have an interest in collaborating around implementation of the Web Annotation Protocol.

Lastly, Apache UIMA can currently generate Open Annotation Data Model annotations as an output of its Natural Language Processing system. These annotations could be displayed via code written within this new Apache project, which could further leverage user interaction with those NLP-based annotations (such as confirmation, rejection, or modification of the annotations made by Apache UIMA's NLP process). There are other NLP projects within the ASF which could similarly benefit from these explorations and the code generated here.

An Excessive Fascination with the Apache Brand

Concerns have been raised in the past that some projects appear to have been proposed just to generate positive publicity for the proposers. This is the right place to convince everyone that is not the case.

The Annotator community acknowledges the value and recognition that the Apache brand would bring to the Annotator project. However, the primary interest is in the community building process and long-term stability that the Apache Software Foundation provides for its projects.

We do hope for increased recognition of and contribution to an array of annotation code projects built within this community. However, we primarily hope for community aggregation driven by building a core set of tools for our shared set of needs which are now scattered across various annotation endeavors.

Integrating those developers into this new community and adding them as contributors is seen as a much higher priority than increasing awareness through branding.

Documentation

References to further reading material.

Websites:

Documentation:

Mailing List:

Code:

Annotator plugin index:

Initial Source

Describes the origin of the proposed code base. If the initial code arrives from more than one source, this is the right place to outline the different histories.

The original Annotator code base was created by Nick Stenning while at the Open Knowledge Foundation. The code has been in development since before 2011 with the first public release (v0.0.1) happening on January 1st, 2011 on GitHub.

The example annotation storage system (which works with Annotator's stock Store plugin) had its first release on February 21, 2011 and was originally built for Apache CouchDB. The contributor list of annotator-store is similar, but the license is simply MIT (rather than MIT & GPL). The stated copyright is 2010-2012 Open Knowledge Foundation.

Additionally, there is a growing list of forks, plugins, and related tooling created by the community in various places, often embedded within larger projects. The Annotator Plugins index lists some such possible inputs to this project's code. The W3C specifications are also being implemented, and the growing number of projects available around those specifications would also be considered as possible inputs. Most specifically, Randall Leeds (also a contributor to Annotator) has built a set of libraries focused on implementing the W3C selectors. These libraries could serve as an initial foundation for a core library for browsers or JavaScript-based server code.
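To give a sense of the problem these selector libraries address, here is a minimal sketch of resolving a text quote selector (an exact quote plus optional prefix and suffix context) against a plain string. It is written in Python for illustration only and is not the algorithm used by the dom-anchor-* libraries, which operate on DOM ranges. The function name `find_quote` and the sample text are hypothetical, and only exact matching is attempted.

```python
def find_quote(text, exact, prefix="", suffix=""):
    """Return (start, end) offsets of the first occurrence of `exact`
    whose surrounding context matches, or None if there is no match.

    Hypothetical helper: exact string matching only, no fuzzy matching.
    """
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        # Accept this occurrence only if the text immediately before and
        # after it matches the requested context (empty context always matches).
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return (start, end)
        start = text.find(exact, start + 1)
    return None


doc = "the cat sat; the cat ran"
first = find_quote(doc, "the cat")                 # no context: first occurrence
second = find_quote(doc, "the cat", suffix=" ran")  # context picks the second one
missing = find_quote(doc, "the dog")
```

The prefix and suffix exist precisely because a quoted phrase may occur more than once in a document; a production library would also need to cope with documents that have changed since the annotation was made.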

Source and Intellectual Property Submission Plan

Complex proposals (typically involving multiple code bases) may find it useful to draw up an initial plan for the submission of the code here. Demonstrate that the proposal is practical.

Our primary goal is to aggregate communities that center around annotation. We intend to focus our initial work on a JavaScript-based library built from Randall Leeds's dom-anchor-* libraries (single owner copyright; MIT licensed) and potentially reusing code from Annotator (mixed owner copyright; MIT & GPL dual-licensed).

The Annotator community has a stated copyright owner of "The Annotator Community." All contributions are believed to have been made "in kind", with the copyright owned by the various contributors. The three primary committers have stated a willingness to donate their contributions to the Apache Software Foundation, and the minimal parts with copyright owned by others will likely be rewritten, though we also hope to engage these individuals to join the combined efforts being made at the ASF.

The annotator-store project is under a clearer, single BSD license. The copyright holder is stated to be the Open Knowledge Foundation with the years 2010-2012. It is likely that this code will only be used for reference or via library inclusion and not directly developed upon within the ASF.

An earlier process was undertaken to collect re-licensing permission from known contributors via the existing mailing list and GitHub issues, using a model similar to Twitter's when it relicensed Bootstrap. General agreement was reached, but no decisive action was taken because many contributors of smaller amounts of code were no longer reachable.

We hope to engage the various plugin and fork authors, along with similar annotation projects, to pursue future work under a shared license and developed within The Apache Way. The contribution of specific code to this project or its future deliverables will be handled individually by the community over the course of the project.

One core goal of bringing the community to the ASF is to avoid this confused licensing situation in the future.

External Dependencies

Annotator depends on the following JavaScript modules from NPM:

  • backbone-extend-standalone - MIT
  • browserify-shim - MIT
  • clean-css - MIT
  • enhance-css - MIT
  • es6-promise - MIT
  • insert-css - MIT
  • jquery - MIT
  • through - MIT / Apache License 2.0
  • xpath-range - MIT + GPL-3.0+ Dual License

annotator-store depends on the following Python modules:

  • elasticsearch - Apache License 2.0
  • PyJWT - MIT
  • iso8601 - MIT
  • six - MIT

MongoServer (a Web Annotation Platform implementation) is a single owner project currently licensed under the Apache License 2.0.

Randall Leeds's dom-anchor-* libraries are all MIT licensed and include these dependencies:

  • dom-anchor-fragment - MIT
    • no dependencies
  • dom-anchor-text-position - MIT
    • node-iterator-shim - MIT
    • dom-seek - MIT
  • dom-anchor-text-quote - MIT
    • dom-anchor-text-position - MIT
    • diff-match-patch - Apache License 2.0

Required Resources

Mailing Lists
  • private@
  • dev@
  • commits@

Note: the Annotator community currently uses a single list hosted by Open Knowledge at: https://lists.okfn.org/mailman/listinfo/annotator-dev

Git Repository

Note: the Annotator community hosts its code on GitHub as part of the "openannotation" organization. Randall Leeds also uses GitHub for his dom-anchor-* libraries as does Rob Sanderson for his Web Annotation Protocol implementation. These are all potential code inputs to be considered for reuse or continuation by this community.

Issue Tracking

The Annotator community would prefer to continue using GitHub Issues if that is a possibility.

Other Resources
  • static website hosting for annotatorjs.org

Initial Committers

Affiliations

  • Nick Stenning of The Hypothes.is Project
  • Randall Leeds of Medal
  • Benjamin Young of Wiley
  • Oliver Sauter of WorldBrain.io
  • Aron Carroll of Dropbox, Inc.
  • Andrew Magliozzi of AdmitHub.com
  • Mariano Giagante of WorldBrain.io
  • Luke Murphy of WorldBrain.io
  • T B Dinesh of Janastu.org
  • Rob Sanderson of J. Paul Getty Trust
  • Rebecca Sutton Koeser of Emory University Library & IT Services, Emory University
  • Robert Casties of Max Planck Institute for the History of Science
  • Amanda Visconti of Purdue University
  • Nils Rethmeier of WorldBrain.io, formerly Fraunhofer FOKUS
  • Giulio Andreini of Pundit
  • Raffaele Masotti of Pundit
  • Sergiu Gordea of AIT - Austrian Institute of Technology GmbH

Sponsors

Champion

Daniel Gruno aka humbedooh

Nominated Mentors

TBD

Sponsoring Entity

The Sponsor is the organizational unit within Apache taking responsibility for this proposal. The sponsoring entity can be: the Apache Board, the Incubator, another Apache project

The Incubator