https://data.sba.gov is the home of open data at the Small Business Administration. The platform of choice for hosting this data is called CKAN. Within this repository, you will find a set of Docker image configurations that make up the services required to run CKAN in a containerized cloud environment.
Custom Docker Images
CKAN is currently being stood up in a new AWS account. Data will need to be migrated from the old sba.gov AWS account to the new CKAN account. To migrate the data, perform the following steps:
- Use pg_dump to dump the database in custom format (so that pg_restore can read it), then copy the dump to S3
pg_dump -U ckan_default -h ${HOST} -Fc -d ckan_default -f ckan.dump
aws s3 cp ckan.dump s3://230968663929-us-east-1-ckan-migration/ckan.dump
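- Download the dump from S3 and restore the database into the new environment
aws s3 cp s3://230968663929-us-east-1-ckan-migration/ckan.dump ckan.dump
pg_restore -U ckan_default -h ${HOST} --clean --if-exists -d ckan_default ckan.dump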
- Copy the assets from S3 to EFS
aws s3 cp s3://230968663929-us-east-1-ckan-migration/resources/ resources/ --recursive
aws s3 cp s3://230968663929-us-east-1-ckan-migration/storage/ storage/ --recursive
- Set the following ownership on the CKAN EFS mount (run from the root of the mount):
chown -R 92:root .
- Connect to the CKAN container and perform a DB upgrade
ckan -c /srv/app/ckan.ini db upgrade
- Reindex search
ckan -c /srv/app/ckan.ini search-index rebuild
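The data-moving steps above can be collected into a small script. This is a minimal sketch, assuming HOST is exported for whichever environment you are working in and that the AWS CLI is already authenticated; the function names are hypothetical and not part of this repository. It dumps in PostgreSQL's custom format (-Fc) because pg_restore cannot read the plain SQL file that pg_dump writes by default.

```shell
#!/usr/bin/env bash
# Sketch of the migration steps above. Assumes HOST is set to the database
# host of the environment being worked on and the AWS CLI is authenticated.
set -euo pipefail

BUCKET="s3://230968663929-us-east-1-ckan-migration"

# Old environment: dump in custom format (-Fc) so pg_restore can read it,
# then upload the dump to the migration bucket.
dump_to_s3() {
  pg_dump -U ckan_default -h "${HOST}" -Fc -d ckan_default -f ckan.dump
  aws s3 cp ckan.dump "${BUCKET}/ckan.dump"
}

# New environment: download the dump and restore it, dropping existing
# objects first (--clean --if-exists).
restore_from_s3() {
  aws s3 cp "${BUCKET}/ckan.dump" ckan.dump
  pg_restore -U ckan_default -h "${HOST}" --clean --if-exists -d ckan_default ckan.dump
}

# New environment: run from the root of the CKAN EFS mount, then set
# ownership as described in the chown step above.
copy_assets_from_s3() {
  aws s3 cp "${BUCKET}/resources/" resources/ --recursive
  aws s3 cp "${BUCKET}/storage/" storage/ --recursive
}
```

For example, run dump_to_s3 against the old account, then restore_from_s3 and copy_assets_from_s3 in the new one, followed by the chown, db upgrade, and search-index rebuild steps.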
The following plugins are not native to CKAN and require additional installation steps.
- xloader: Loads CSV and similar data into CKAN's DataStore. Designed to replace DataPusher. GitHub
- google-analytics: Puts the Google Analytics asynchronous tracking code into your page headers. GitHub
- s3filestore: Uses Amazon S3 as a filestore for resources. GitHub
- ckanext-dcat_usmetadata: Provides a new dataset form for inventory.data.gov, tailored to managing metadata that meets the DCAT-US Schema. GitHub
- ckanext-datajson: Provides the datajson harvester, which imports datasets from other remote /data.json files. See below for setup instructions. GitHub
- ckanext-usmetadata: Expands CKAN to offer a number of custom fields related to the DCAT-US Schema. GitHub
Note: envvars is now native to CKAN 2.10.
- ckanext-envvars: This CKAN extension checks for environment variables that conform to an expected format and updates the corresponding CKAN config settings with their values. GitHub
The following extensions are native to CKAN and can be enabled in the CKAN config file.
- datastore: Provides an ad hoc database for storing structured data from CKAN resources. Data can be pulled out of resource files and stored in the DataStore. Documentation
- stats: Analyzes your CKAN database and displays several tables and graphs with statistics about your site. Documentation
- text_view: Displays files in XML, JSON, or plain-text formats with syntax highlighting. Documentation
- recline_view: Deprecated and should not be used. If a replacement is needed, the React Data Explorer is recommended. Documentation
All dependencies are currently managed in /ckan/requirements.txt. If separate installation is required, please review the documentation links above or follow the install instructions below.
- Install the plugin
pip install ckanext-googleanalytics
- Add googleanalytics to CKAN__PLUGINS in .env
CKAN__PLUGINS="datastore datapusher stats text_view recline_view envvars googleanalytics"
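Every plugin in this section is enabled the same way, by appending its name to CKAN__PLUGINS in .env. The helper below is a hypothetical sketch (add_ckan_plugin is not part of this repository) that makes that edit idempotent; it assumes .env holds a single CKAN__PLUGINS line in the quoted, space-separated form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a plugin name to CKAN__PLUGINS in an .env file.
# The file is left unchanged if the plugin is already listed, and the line
# is created if it does not exist yet.
set -euo pipefail

add_ckan_plugin() {
  local plugin="$1" env_file="${2:-.env}" tmp
  tmp="$(mktemp)"
  awk -v p="$plugin" '
    /^CKAN__PLUGINS=/ {
      seen = 1
      # Strip the key and optional surrounding quotes to get the plugin list.
      val = $0
      sub(/^CKAN__PLUGINS="?/, "", val)
      sub(/"$/, "", val)
      n = split(val, names, " ")
      found = 0
      for (i = 1; i <= n; i++) if (names[i] == p) found = 1
      if (!found) val = (val == "" ? p : (val " " p))
      print "CKAN__PLUGINS=\"" val "\""
      next
    }
    { print }
    END { if (!seen) print "CKAN__PLUGINS=\"" p "\"" }
  ' "$env_file" > "$tmp"
  # Copy back with cat (rather than mv) to keep the original file permissions.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}
```

Usage: add_ckan_plugin googleanalytics .env. The CKAN container most likely needs to be recreated (for example with docker-compose up -d) before it picks up the new value.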
- Install the plugin
pip install ckanext-dcat_usmetadata
- Add dcat_usmetadata to CKAN__PLUGINS in .env
CKAN__PLUGINS="datastore datapusher stats text_view recline_view envvars dcat_usmetadata"
- Confirm ckanext-dcat_usmetadata is installed by running the following command:
ckan dcat_usmetadata --help
** This plugin appears to create a problem with Google Analytics **
- Install the plugin
pip install ckanext-usmetadata
- Add usmetadata to CKAN__PLUGINS in .env
CKAN__PLUGINS="datastore datapusher stats text_view recline_view envvars usmetadata"
- Confirm ckanext-usmetadata is installed by creating a dataset. The form should display the following message at the top:
The following fields are required metadata for each dataset in an agency’s inventory (per Section 202 of the OPEN Government Data Act). For more information about the form fields, consult the DCAT-US Schema.
** This plugin appears to create a problem with Google Analytics and other plugins **
- Install the plugin, ckanext-harvest, and the other packages required for harvesting (listed in requirements.txt)
pip install ckanext-datajson
pip install -e git+https://github.com/ckan/ckanext-harvest.git@2e5ac42f3ba58dd4bcb1e69a783e155828ff4b89#egg=ckanext-harvest
pip install -r requirements.txt
- Add the datajson and harvest plugins to CKAN__PLUGINS in .env
CKAN__PLUGINS="datastore datapusher stats text_view recline_view envvars datajson harvest datajson_harvest datajson_validator"
- Confirm ckanext-datajson is installed by checking that the validation endpoint is accessible at:
http://domain.com/dcat-us/validator
This section demonstrates how to create user accounts, reset passwords, and, if necessary, promote a user to sysadmin status using the ckan command line utility from a running container. It assumes that you are using the provided docker-compose solution and that CKAN services are already running on your local machine, or that you have a running AWS Fargate service.
Please make sure that the present working directory is the root of this project and that the following software has been installed and configured:
- docker
- docker-compose
- awscli
- jq
If you are running CKAN using the provided docker-compose solution, you can gain shell access to the running container with the following command.
$ docker-compose exec ckan /bin/bash
If you are running CKAN as a Fargate service in AWS, you can gain shell access to the running container, provided the service has execute-command enabled. The root of this repository contains a shell script named fargate-service-list.sh that generates the awscli command needed to connect to a running service task.
Run the script, which will prompt for some information:
- Choose an ECS Cluster
- Then choose the ECS Service found on that cluster
- Then choose the ECS task running under the provision of that service
It will then generate a command to copy and paste which will look like the following example:
aws ecs execute-command --interactive --cluster production \
--task 11223344556677889900 --container ckan --command '/bin/bash'
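If you already know the cluster and service names, the prompts can be skipped. This is a sketch, not the behavior of fargate-service-list.sh: the function name is hypothetical, it simply picks the first running task returned by aws ecs list-tasks, and, like any execute-command session, it requires ECS Exec to be enabled on the service and the Session Manager plugin to be installed locally.

```shell
#!/usr/bin/env bash
# Sketch: open a shell in the first running task of an ECS service without
# the interactive prompts. Cluster and service names are passed as arguments.
set -euo pipefail

ckan_exec_shell() {
  local cluster="$1" service="$2" task
  # list-tasks returns RUNNING tasks by default; take the first ARN.
  task="$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --query 'taskArns[0]' --output text)"
  if [[ -z "$task" || "$task" == "None" ]]; then
    echo "no running task found for service $service" >&2
    return 1
  fi
  aws ecs execute-command --interactive --cluster "$cluster" \
    --task "$task" --container ckan --command '/bin/bash'
}
```

Usage: ckan_exec_shell production <service-name>, where <service-name> is whatever the ECS service is called in that cluster.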
Display a list of users.
$ ckan user list
Creating a new user.
# with prompt
$ ckan user add 'username'
# without prompt
$ ckan user add 'username' email='email' password='password'
Reset a user's password.
$ ckan user setpass 'username'
Remove a user.
$ ckan user remove 'username'
Display a list of sysadmin users.
$ ckan sysadmin list
Promote a user to sysadmin.
$ ckan sysadmin add 'username'
Demote a sysadmin user.
$ ckan sysadmin remove 'username'
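When scripting a local setup, the user and sysadmin commands can be combined. A minimal sketch, assuming the docker-compose service is named ckan as in the exec command earlier in this section; create_sysadmin is a hypothetical helper. Passing the password as an argument leaves it in your shell history, so prefer the prompting form of ckan user add for real accounts.

```shell
#!/usr/bin/env bash
# Sketch: create a user and promote it to sysadmin in the docker-compose setup.
set -euo pipefail

create_sysadmin() {
  local user="$1" email="$2" password="$3"
  # -T disables pseudo-TTY allocation so this also works from a script.
  docker-compose exec -T ckan ckan user add "$user" email="$email" password="$password"
  docker-compose exec -T ckan ckan sysadmin add "$user"
}
```

Usage: create_sysadmin 'username' 'email' 'password'.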
Requirements are:
- docker installed
- docker-compose installed
This solution also requires an entry in your hosts file that maps sba.ckan.com to 127.0.0.1:
127.0.0.1 sba.ckan.com
The hosts file is located at the following path, depending on your OS:
- Windows:
c:\windows\system32\drivers\etc\hosts
- Linux:
/etc/hosts
# On Linux this is easy! (sudo tee is needed because a ">>" redirect would not run as root)
$ echo "127.0.0.1 sba.ckan.com" | sudo tee -a /etc/hosts
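Running the append command twice adds a duplicate line. A minimal sketch of an idempotent version follows; add_hosts_entry is a hypothetical helper, and it only recognizes an existing entry written exactly as 127.0.0.1 sba.ckan.com on its own line.

```shell
#!/usr/bin/env bash
# Sketch: add the sba.ckan.com entry only if it is not already present.
set -euo pipefail

add_hosts_entry() {
  local entry="127.0.0.1 sba.ckan.com" hosts_file="${1:-/etc/hosts}"
  if grep -qxF "$entry" "$hosts_file"; then
    echo "entry already present in $hosts_file"
  else
    # tee runs under sudo, so the append itself has root privileges.
    echo "$entry" | sudo tee -a "$hosts_file" > /dev/null
  fi
}
```

Usage: add_hosts_entry (defaults to /etc/hosts).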
Note:
This entry is necessary because, in a production setting, the CKAN_SITE_URL variable must resolve. When a dataset is uploaded, CKAN tracks the file in Solr as a fully qualified URI and triggers a DataPusher job, which fetches that URI to process the file. If the URI cannot resolve, the DataPusher job fails and the preview option for that dataset is unavailable in the browser.
This docker-compose solution uses a custom bridge network in which each service is assigned a static IPv4 address. This lets the extra_hosts option of the DataPusher service map sba.ckan.com to the static IPv4 address assigned to CKAN, so the name resolves both on your local machine and inside the DataPusher container.
- Open a command line shell
- Run docker-compose build to build all images in the solution
- Run docker-compose up once the images have been built
- Wait for the services to come online and the databases to be initialized
- Interface with the following services via a web browser:
  - CKAN @ http://sba.ckan.com
  - FakeEmail @ http://sba.ckan.com:1080
  - Solr @ http://sba.ckan.com:8983
- Open another command line shell
- Run docker-compose -f docker-compose.sysadmin.yaml run --rm sysadmin to create the ckanadmin user
- Log in to CKAN using ckanadmin as both the username and password
- Log in to FakeEmail using fake as both the username and password
- Enjoy!
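Instead of watching the logs to decide when the services are ready, you can poll CKAN's status_show action API. This is a sketch: wait_for_ckan is a hypothetical helper, and the 5-second interval and default of 60 attempts are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: poll CKAN's status_show API action until the site responds.
set -euo pipefail

wait_for_ckan() {
  local url="${1:-http://sba.ckan.com}" attempts="${2:-60}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl fail on HTTP errors, so only a successful response counts.
    if curl -fsS -o /dev/null "${url}/api/3/action/status_show"; then
      echo "CKAN is up at ${url}"
      return 0
    fi
    sleep 5
  done
  echo "CKAN did not respond after ${attempts} attempts" >&2
  return 1
}
```

Usage: run wait_for_ckan in the second shell before creating the ckanadmin user.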
- Open a command line shell
- Run one of the following:
  - docker-compose down to remove all containers and networks
  - docker-compose down -v to remove all containers, volumes, and networks
  - docker-compose down -v --rmi all to remove all containers, volumes, networks, and images
- Open a command line shell
- Run one of the following:
  - docker-compose -f docker-compose-new.yaml down to remove all containers and networks
  - docker-compose -f docker-compose-new.yaml down -v to remove all containers, volumes, and networks
  - docker-compose -f docker-compose-new.yaml down -v --rmi all to remove all containers, volumes, networks, and images
This project is built with CircleCI, and its configuration is kept in this repository.
When a new branch is pushed to GitHub, CircleCI will:
- Test the Docker builds
- Run terraform fmt
- Run a Snyk scan on the built image
No jobs other than the feature branch jobs run without a tag. When a Staging or Production tag is pushed, the following jobs run:
- ckan-solr-build-push
- ckan-datapusher-build-push
- ckan-build-push
- test-terraform-plan
- deploy-services-${env}
To trigger a tag based deployment please see the instructions below.
To trigger a build/deploy workflow for a specific environment, the following git tags can be used for their respective environments:
- Staging -> rc-vX.X.X
- Production -> vX.X.X
Staging Example:
git tag rc-v1.0.0 && git push origin rc-v1.0.0
Production Example:
git tag v1.0.0 && git push origin v1.0.0
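Because only tags in these two formats trigger a deployment, it can help to check a tag before pushing it. A minimal sketch, assuming each X is a non-negative integer; tag_environment is a hypothetical helper and is not how the CircleCI configuration itself filters tags.

```shell
#!/usr/bin/env bash
# Sketch: map a git tag to the environment it deploys to, following the
# rc-vX.X.X (staging) and vX.X.X (production) conventions above.
set -euo pipefail

tag_environment() {
  local tag="$1"
  if [[ "$tag" =~ ^rc-v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo staging
  elif [[ "$tag" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo production
  else
    echo "unrecognized tag: $tag" >&2
    return 1
  fi
}
```

Usage: tag_environment rc-v1.0.0 prints staging; a malformed tag such as v1.0 is rejected with a non-zero exit status.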
We welcome contributions. To contribute please read our CONTRIBUTING document.
All contributions are subject to the license and in no way imply compensation for contributions.
We strive for a welcoming and inclusive environment for all SBA projects.
Please follow these guidelines in all interactions:
- Be Respectful: use welcoming and inclusive language.
- Assume best intentions: seek to understand others' opinions.
Please do not submit an issue on GitHub for a security vulnerability. Instead, contact the development team through HQVulnerabilityManagement. Be sure to include all pertinent information.
The agency reserves the right to change this policy at any time.