Contributors

Contributor Agreements

If you are interested in being a contributor, we will need you to complete a contributor agreement. Please contact the project managers for more details.

Wiki

To have wiki edit permission, you must have write access to the blazegraph/database GitHub repository.

Developer Discussion

Please use GitHub Issues for project discussion.

The previous SourceForge developer list is archived and can be searched using this link:

http://sourceforge.net/mailarchive/forum.php?forum_name=bigdata-developers

Development

The main development branch should remain stable at all times. Releases are tagged as branches for maintenance. Major change sets should be created in branches (see below). Discussion regarding the project should take place on the developers list so everyone can participate and benefit.

Consistency and coherence in the architecture and the implementation are critical for database correctness and performance. Coordinate with component owners before making changes to those components. When in doubt, ask first on the developers list. Final resolution for questions concerning the database architecture will be made by the project administrators.

Maintaining Tickets

Issues are maintained on JIRA.

Developers must:

file an issue on jira.blazegraph.com for any planned work;
accept the issue before making changes; and
update the status for accepted issues at least weekly (Friday morning).

This gives everyone oversight of planned and active change sets via the JIRA dashboard and makes it easier to minimize conflicts in the code base.

Pull Requests (GIT)

The proper process for getting changes into the code base is:

Discuss the feature on GitHub. Do this first to make sure that the concept has traction with the developer community. Make sure that you are subscribed to the mailing list first since it will not accept email if you are not subscribed.
Create a ticket for the feature.
Create a feature branch.
Do your work in that branch.
Make sure that you have not broken the tests.
Create a pull request.

Do not commit directly to master. Changes will be merged to master from the pull request by one of the project maintainers.

Eclipse EGit Plugin

There is an extremely nice feature in the EGit integration where you can hover over a line of code to see who last modified it. Make sure that EGit is installed. Configure the Git perspective to point to your local git repository. Right click on an editor and select Team => Show Annotations.

Branching and Merging (GIT)

We strongly recommend taking an hour to work through a Git tutorial:

https://www.atlassian.com/git/tutorials/using-branches

Branching and merging is much, much easier under git. If you want to create your own new branch:

 git checkout -b my_branch

If you want to check out someone else's branch:

 git checkout --track origin/daves_branch

To switch back to master:

 git checkout master

Note: The following will put you in a detached HEAD state: you are no longer on a local branch, so any commits you make will not belong to a branch.

 git checkout origin/master

This is generally undesirable. To recover, run:

 git checkout master

To check out a tagged release, do the following:

 git checkout tags/BIGDATA_RELEASE_1_5_0

Again, this puts you in a detached HEAD state, so run

 git checkout master

to get back to master.

Pulling changes up from master

If your feature branch is behind master, fetch the latest state of the remote and then merge it into your branch:

 git fetch origin
 git merge origin/master

Private Branches (Sandboxes)

Individual developers interested in exploring new concepts may create a private branch to serve as a sandbox in which they can explore those ideas without introducing changes into the trunk.

To discard all changes and revert to a previous commit, first find the commit point to restore (or just look at GitHub):

 git log

Reset to that commit point. Note that this permanently discards any uncommitted changes in your working tree:

 git reset --hard FULL-COMMIT-HASH

Continuous Integration (CI)

CI results are available at [2]. You can download the results of test suite runs. There is an additional artifact for analyzing the logs for the HA CI test suite. If you need a specific branch to be entered into CI, please contact one of the project admins or, if you have GitHub access, try the guide below.

Jenkins Configuration

Jenkins is accessed using your GitHub credentials. It is configured to pull automatically from GitHub and to spawn up to four EC2 instances dynamically to handle the CI workload. In general, you should not need to create new Jenkins jobs, since CI should be run through the GitHub Pull Request integration.

Known Good JVM Settings

 ANT_OPTIONS="-XX:MaxPermSize=256m -Dfile.encoding=UTF-8 -Xmx8g -server -Dsun.jnu.encoding=UTF-8"

The maven options are set in the global Jenkins configuration, but are also included here for reference.

 MAVEN_OPTIONS="-XX:MaxPermSize=256m -Dfile.encoding=UTF-8 -Xmx8g -server -Dsun.jnu.encoding=UTF-8"

Getting thread dumps

To get a thread dump, you must have your SSH public key installed on the Jenkins SSH Slave EC2 image. Create a JIRA ticket to make this request and include your public ssh key. Then, determine the SLAVE IP that ran the job and ssh directly to the machine to grab the thread dump.

Generating an SSH Public Key

 ssh-keygen -t rsa -b 2048 -f ~/.ssh/blazegraph

Hit enter twice for a blank passphrase, or choose one. Then print the public key:

 cat ~/.ssh/blazegraph.pub

Include the output of your public key in the JIRA ticket.

Initiating a CI Run from GitHub

To initiate a CI run from GitHub, first create a Pull Request (PR). The CI job will run automatically. If you need to retest without a code change, include the following text in a comment on the PR: Computer, please test this. This will trigger an automatic CI run in Jenkins. The results will be posted back to the PR, and you can take appropriate action based on the outcome of the CI run (Success, Failure, Error).

CI Job Naming Conventions

* (CI job for master for github-module). The exception is the bigdata master, which is called GIT_DEVELOPMENT_MAVEN.

*-PR-tester (Pull request tester for github-module), e.g., bigdata-github-maven-PR-tester

Unit tests

Bigdata has a large and growing test suite. Whether you code the unit tests first or after, do not commit code without writing a test suite for that code and verifying the test suite for your changes plus any affected modules. When in doubt, ask or run the entire test suite. After you commit, please review the CI results to see if you have broken anything.

Proxy Test Suites

Some of the test suites use a "proxy" pattern to allow the same test suite to execute against different implementations or parameterizations of a given implementation. This feature is heavily used to:

exercise different backend storage models (the RWStore, MemStore, etc.);
run (nearly identical) test suites in triples vs RDR vs quads modes; and
exercise the REST API test suite against both embedded and scale-out architectures.

You can specify the delegate for the proxy using

 -DtestClass=fully-qualified-class-name

You may need to hunt around a little (typically in the TestAll suite) to find the proxy class names that can be used with a given proxy test suite. Some of the common ones are:

TestBigdataSailWithQuads
TestLocalTripleStore
TestRWJournal
TestWORMStrategy
TestNanoSparqlServerWithProxyIndexManager
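
To make the proxy idea concrete, the following is a minimal sketch of how one shared test body can be run against a delegate chosen by a system property. It is not the project's actual proxy test framework: the StoreFixture interface, the two fixture classes, and the helper methods are all hypothetical names invented for illustration. Only the -DtestClass property name comes from the text above.

```java
final class ProxyDelegateSketch {

    /** Hypothetical contract that every delegate implements. */
    interface StoreFixture {
        String mode();
    }

    /** Hypothetical delegate standing in for a triples-mode configuration. */
    static final class TriplesFixture implements StoreFixture {
        public String mode() { return "triples"; }
    }

    /** Hypothetical delegate standing in for a quads-mode configuration. */
    static final class QuadsFixture implements StoreFixture {
        public String mode() { return "quads"; }
    }

    /** Instantiates the delegate named by a fully qualified class name. */
    static StoreFixture newDelegate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (StoreFixture) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Not a usable delegate: " + className, e);
        }
    }

    /** The shared "test": the same logic runs whichever delegate is supplied. */
    static String runSharedTest(StoreFixture delegate) {
        return "ran against " + delegate.mode();
    }

    public static void main(String[] args) {
        // Reads -DtestClass=...; falls back to the triples fixture if unset.
        String name = System.getProperty("testClass", TriplesFixture.class.getName());
        System.out.println(runSharedTest(newDelegate(name)));
    }
}
```

The point of the design is that the test logic is written once, and the choice of backend or mode is pushed into a small delegate class that can be swapped from the command line.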

Running the test suite with maven

You can run the entire test suite using:

 mvn clean package

You can run the tests in an individual class in the test suite using:

 mvn clean package -Dtest=com.bigdata.journal.jini.ha.TestHA1GroupCommit

An example using the proxy test suite from the command line:

 mvn -DtestClass=com.bigdata.rdf.sail.webapp.TestNanoSparqlServerWithProxyIndexManager -Dtest=Test_REST_ASK test

Running the test suite with Eclipse

Many of the test suites can be run directly under Eclipse. However, some of the test suites depend on external services that must be running before the tests are executed:

The external text index feature depends on SOLR

These external resources are set up through the maven POM associated with the appropriate projects. You can also start these resources yourself; there are examples of how to do this for HA/scale-out at the bottom of this page.

Hunting resource leaks in CI

Add this to Manage Jenkins => Configure under "Global Properties", "Environment variables". The specific path depends on the version of YourKit that is installed on the CI node.

 LD_LIBRARY_PATH /nas/install/yjp-2014-build-14100/bin/linux-x86-64

Add this to the Advanced options for the Jenkins project configuration, e.g., where it says "-server -ea" etc. This specific command starts the profiler with everything disabled. Once you connect to the process, you can then selectively enable things. Replace port=XXXXX with something like port=10001. This is the port that you will use to connect to YourKit. See here for the background on setting this up.

 -agentlib:yjpagent=disableexceptiontelemetry,disablestacktelemetry,port=XXXXX

Set up local port forwarding for the CI machine and ssh into it. Again, replace XXXXX with the specific port.

# ~/.ssh/config
Host ci.bigdata.com
#...
LocalForward XXXXX localhost:XXXXX

You can then start YourKit locally and connect to the running CI job (if any).

Intellectual Property

Everyone who is a contributor is bound by a signed contributor license agreement (CLA).

Your own work

Your contributions MUST be your own work. DO NOT incorporate code from other projects or other sources. There MUST be an explicit contribution made by the copyright holders before 3rd party intellectual property may be incorporated into the project. Please refer any such matters to the project administrators.

Dependencies

The choice of a dependency is very important and must be made in consultation with the project administrators. In addition to choosing technically sound dependencies, there are also a number of legal rules that must be followed to properly acknowledge the copyright for the dependency and a number of administrative tasks that must be performed to ensure that the dependency is correctly integrated into development, CI, and the various deployment environments.

Adding a dependency

You MUST NOT add a dependency without contacting the project administrators.

The following all need to be addressed when adding a dependency:

File or directory	Required change
build.properties	The dependency version number needs to be declared.
build.xml	The dependency needs to be integrated into the WAR, stage, bundleJar, and javadoc (external links), and various other deployment targets. This is both tricky and vital.
pom.xml	The dependency needs to be declared.
Depends.java	The dependency needs to be declared. This is responsible for generating the list of dependencies at runtime as part of the banner.
bigdata-XXX/lib	The dependency needs to be placed into an appropriate library directory with the correct bigdata module. The choice of the module depends on the scope in which the dependency will be used.
bigdata-XXX/LEGAL	The license for the dependency must be placed into the LEGAL directory within the module in which the dependency is housed. The name of the license should include the name of the dependency. E.g., "jetty-license.txt". Many dependencies have the same license, but a separate license file MUST be present for each dependency.
bigdata/NOTICE	This file must include any text from a NOTICE file associated with the dependency. This is a requirement of the Apache license!

Updating a Dependency

You MUST verify that the license associated with a dependency has not changed BEFORE updating that dependency.

You MUST NOT update a dependency if there has been a license change. Instead, refer the matter to the project administrators.

Coding Style, Copyright comment blocks, and related matters.

Head of file comment block

The correct comment block for the head of each source file is the GPL license block as follows:

/*

Copyright (C) SYSTAP, LLC 2006-2014.  All rights reserved.

Contact:
     SYSTAP, LLC
     4501 Tower Road
     Greensboro, NC 27410
     licenses@bigdata.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/

Author Tags

Author tags should be provided on each class you create and on each class where you make major changes. This helps us to track who are the most knowledgeable people for a given class.

TODOs

Please use the following tags to mark todos in the code:

FIXME - Encouraged for more important tasks.
TODO - Encouraged for minor tasks or possible future directions in the code.
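
As an illustration of the author tag and the two todo markers together, here is a small sketch. The class, the method, and the author name and address are hypothetical placeholders and do not come from the code base.

```java
/**
 * Hypothetical example class, shown only to illustrate the tags.
 *
 * @author <a href="mailto:jane.doe@example.org">Jane Doe</a>
 */
final class RetryDelaySketch {

    /**
     * Returns an exponentially growing delay for the given attempt number.
     */
    static long delayMillis(int attempt) {
        // FIXME: the shift overflows for large attempt values; the delay
        // needs an upper bound before this is used anywhere important.
        // TODO: consider adding random jitter to spread out retries.
        return 100L << attempt;
    }

    public static void main(String[] args) {
        System.out.println(delayMillis(3));
    }
}
```

The FIXME marks a defect that should be resolved, while the TODO records a possible improvement that is not urgent.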

Margins and Code Formatting

Please:

Set margins to 80 columns.
Wrap comments and code at the margin.
Set display width of tabs to 4 spaces and set editor to convert tabs to spaces (4). In Eclipse, to set tabs to spaces, there are two settings that must be updated:
Preferences => Java => Code Style => Formatter => Indentation => Tab Policy := Spaces Only
Preferences => General => Editors => Text Editors => [x] Insert spaces for tabs
Please do not broadly reformat existing code, especially code for which you are not the primary maintainer, since that makes it significantly more difficult to handle merges.

Conditional Logging

Each class which will have log output should declare its own logger. Loggers should be private, static, and final. Logging at INFO, DEBUG, or TRACE MUST be conditional, using the pattern:

if (log.isInfoEnabled()) {
   log.info(...);
}

Conditional logging is critical for performance. Generating log messages (when they are not directly given strings such as "Hello") produces a tremendous amount of heap churn from String concatenation. Heap churn is evil and must be avoided for performance. Hence, the conditional logging pattern.
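
The effect of the guard can be seen in a small self-contained sketch. To avoid an external dependency it uses java.util.logging from the JDK, where the equivalent guard is isLoggable(Level.INFO); this is a stand-in for the isInfoEnabled() call in the pattern above, not the project's logging setup. The describe() helper and the build counter are hypothetical and exist only to make the cost of message construction visible.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ConditionalLoggingSketch {

    // One logger per class: private, static, and final.
    private static final Logger log =
            Logger.getLogger(ConditionalLoggingSketch.class.getName());

    // Counts how many times a log message was actually constructed.
    static int messagesBuilt = 0;

    // Hypothetical helper standing in for costly String concatenation.
    static String describe(int[] values) {
        messagesBuilt++;
        StringBuilder sb = new StringBuilder("values=");
        for (int v : values) {
            sb.append(v).append(',');
        }
        return sb.toString();
    }

    static void process(int[] values) {
        // Guarded: describe() only runs when INFO is enabled.
        if (log.isLoggable(Level.INFO)) {
            log.info(describe(values));
        }
    }

    // Runs process() the given number of times at the given logger level
    // and reports how many messages were built.
    static int buildsAt(Level level, int calls) {
        log.setUseParentHandlers(false); // keep the demo quiet on the console
        log.setLevel(level);
        messagesBuilt = 0;
        for (int i = 0; i < calls; i++) {
            process(new int[] {1, 2, 3});
        }
        return messagesBuilt;
    }

    public static void main(String[] args) {
        System.out.println("built at WARNING: " + buildsAt(Level.WARNING, 1000));
        System.out.println("built at INFO: " + buildsAt(Level.INFO, 3));
    }
}
```

With the logger at WARNING, the guarded call builds no messages at all, however often process() is invoked; at INFO, one message is built per call. Without the guard, the message would be built on every call regardless of the level.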

System.out and System.err

Do NOT use either System.out or System.err in anything other than a main() routine. It is very difficult to locate the code where such output is being produced, and unconditional output not only churns the heap but also clogs the CI servers, since CI buffers the output of the test suite in memory during the test run.

Eclipse-based developers can obtain colorization of their output using grep-console. The defaults colorize java.util.logging output. They can be edited (by removing the square brackets) to also colorize log4j output.

Introduction