Releases: gwu-libraries/sfm-docker

Version 2.5.0

01 Nov 19:49
  • See changes to example.docker-compose.yml, example.prod.docker-compose.yml, and smoketests.docker-compose.yml reflecting upgraded images in this release (using Python 3.8/Debian 10) (sfm-ui #1071).
  • Upgrades the processing and smoketests images to use Python 3.8.
  • Upgrades dependencies in those images for use with Python 3.8.

Version 2.4.0

07 Jul 14:05

Changes in this release:

  • Introduces support for hosting data volumes on different filesystems, rather than as subdirectories in a single sfm-data directory (#30). This allows storage for RabbitMQ, Postgres, and SFM data for exports, containers, and collection sets to be separately configured. Thank you, @SvenLieber, for code contributions to add this feature! Existing SFM instances upgrading to 2.4.0 should read the notes below carefully for required new environment variables and changes to docker-compose.yml.
  • Updates dependencies in the processing container, including Twarc and JWAT Tools. (#31)
  • Adds configuration for AWS Elastic Load Balancer. (#22) Thanks, @justinlittman!

New configurations for existing SFM instances:

IMPORTANT: Because of the numerous changes required for existing .env and docker-compose.yml files, we strongly recommend either:

  • Re-copying the new example.env to .env and the new example.prod.docker-compose.yml to docker-compose.yml, and re-customizing.
  • Updating .env and docker-compose.yml based on the new example.env and example.prod.docker-compose.yml files in version 2.4.0.

Existing SFM instances will need to set the following new variables in their .env to point to their existing data directories:

  • DATA_VOLUME_MQ
  • DATA_VOLUME_DB
  • DATA_VOLUME_EXPORT
  • DATA_VOLUME_CONTAINERS
  • DATA_VOLUME_COLLECTION_SET

Example: The value should have the path to your data on the host filesystem on the left of the : and the path inside the container on the right, as in DATA_VOLUME_MQ=/some-local-path/sfm-mq-data:/sfm-mq-data. If your data is currently on the filesystem in /sfm-data, use the following values:

  • DATA_VOLUME_MQ=/sfm-data:/sfm-mq-data
  • DATA_VOLUME_DB=/sfm-data:/sfm-db-data
  • DATA_VOLUME_EXPORT=/sfm-data:/sfm-export-data
  • DATA_VOLUME_CONTAINERS=/sfm-data:/sfm-containers-data
  • DATA_VOLUME_COLLECTION_SET=/sfm-data:/sfm-collection-set-data

In order for SFM to find WARCs and exports at former internal paths, configure in .env:

  • DATA_VOLUME_FORMER_COLLECTION_SET
  • DATA_VOLUME_FORMER_EXPORT

Example:

  • DATA_VOLUME_FORMER_EXPORT=/sfm-data/export:/sfm-data/export
  • DATA_VOLUME_FORMER_COLLECTION_SET=/sfm-data/collection_set:/sfm-data/collection_set

In docker-compose.yml, these data volumes need to be uncommented at the end of the data container definition.
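
Before restarting SFM, it can help to confirm that none of the new variables listed above was overlooked. The following is a minimal sketch, assuming the variables are written as plain NAME=value lines in the env file; missing_vars is a hypothetical helper name, not part of SFM.

```shell
#!/bin/sh
# Hypothetical helper (not part of SFM): print each variable from the
# 2.4.0 notes that has no uncommented NAME= line in the given env file.
missing_vars() {
    env_file=$1
    for name in DATA_VOLUME_MQ DATA_VOLUME_DB DATA_VOLUME_EXPORT \
                DATA_VOLUME_CONTAINERS DATA_VOLUME_COLLECTION_SET \
                DATA_VOLUME_FORMER_COLLECTION_SET DATA_VOLUME_FORMER_EXPORT; do
        # A line starting with "NAME=" counts as set; "#NAME=" does not.
        grep -q "^${name}=" "$env_file" || echo "$name"
    done
}
```

Running missing_vars .env from the sfm-docker directory prints one variable name per line for anything still missing, and prints nothing when all seven are present.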

Monitoring space usage on a shared filesystem

With data volumes now configurable to live on separately mounted filesystems, SFM monitors the space usage of each volume, and a threshold that triggers warning emails can be set for each volume. Most existing SFM instances, however, store all data on a single filesystem. For monitoring to be meaningful in that case, existing SFM instances must set the new shared-filesystem environment variables (see example.env):

  • In .env: set DATA_SHARED_USED to True and set DATA_SHARED_DIR to the path of the parent directory on the filesystem, e.g. /sfm-data.
  • In .env: Provide a threshold for space usage warning emails to be sent by updating DATA_THRESHOLD_SHARED.
  • In docker-compose.yml: uncomment the volumes section in the ui container definition so that the DATA_SHARED_DIR is accessible to SFM for monitoring. See example.prod.docker-compose.yml and example.docker-compose.yml for these configurations.
  • SFM instances which are not using a shared filesystem for data and which are making use of the new option to store data volumes on mounted filesystems should:
    • In .env: set DATA_SHARED_USED to False and comment out DATA_SHARED_DIR
    • In docker-compose.yml, comment out the ui container's volumes section which refers to DATA_SHARED_DIR.
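
Which of the two configurations applies depends on whether your data directories sit on one filesystem. The following is a minimal sketch for checking that with POSIX df; fs_of and same_fs are hypothetical helper names, not part of SFM.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SFM).

# Print the filesystem (device) that holds the given path.
# df -P prints a header line followed by one data line per path.
fs_of() {
    df -P "$1" | awk 'NR==2 {print $1}'
}

# Succeed when both paths are reported on the same filesystem.
same_fs() {
    [ "$(fs_of "$1")" = "$(fs_of "$2")" ]
}
```

For example, same_fs /sfm-data/export /sfm-data/collection_set && echo "shared filesystem" reports whether those two directories would be covered by a single DATA_SHARED_DIR.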

Changes to Postgres and RabbitMQ environment variable names

There are several new environment variables that must be included in docker-compose.yml container definitions. See example.prod.docker-compose.yml and example.docker-compose.yml for these updates. Note in particular:

  • The db container's reference to POSTGRES_PASSWORD is changed to POSTGRES_PASSWORD=${SFM_POSTGRES_PASSWORD}.
  • The mq container has changes to environment variables.

Version 2.3.0

04 May 14:27

This release requires an upgrade of the Postgres database. See required upgrade steps below.

Changes in this version include:

  • Upgrade of Postgres database from 9.4 to 9.6.
  • Optional cookie consent pop-up (#1009). Instructions for enabling below.
  • Optional GW footer (#1003). Instructions for enabling below.

For a complete list of tickets, see sfm-ui milestone 2.3.0.

Upgrading Postgres

Stop SFM and bring up only the database container

  1. Stop containers
    docker-compose stop -t 180 twitterstreamharvester
    docker-compose stop -t 45

  2. Bring up just the database container
    docker-compose up -d db

Create a backup

  1. Before doing the upgrade, we recommend you first create a backup of the database, using the following command, where pgdump is the name of the backup file:
    docker exec sfm_db_1 pg_dumpall -U postgres > pgdump

  2. You can then review the dumpfile:
    cat pgdump | less
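
Paging through the dump by eye can miss a truncated file. As a rough additional check, the sketch below tests that the file is non-empty and contains the trailer comment that pg_dumpall normally writes at the end of a cluster dump. The helper name dump_looks_complete is hypothetical, and the exact marker text is an assumption about your Postgres version's output.

```shell
#!/bin/sh
# Hypothetical helper (not part of SFM): succeed when the dump file is
# non-empty and contains the trailer comment pg_dumpall is assumed to
# write at the end of a finished cluster dump.
dump_looks_complete() {
    [ -s "$1" ] && grep -q 'PostgreSQL database cluster dump complete' "$1"
}
```

For example, dump_looks_complete pgdump || echo "pgdump may be incomplete" flags a backup worth re-running before the old database container is removed.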

Upgrade the database

  1. Remove the existing database container
    docker-compose stop db
    docker-compose rm -v db

  2. Create an initial Postgres 9.6 database in a new directory alongside the existing postgres database.
    Use the path for your /sfm-data/postgresql directory as the first element of the volume parameter in the docker run command. Substitute your actual postgres password (this is in your .env file) for password. For example, if your existing database is within /sfm-data/postgresql (it is probably in /sfm-data/postgresql/data) and your password is password123, the command would look like:

docker run --name postgres -d -v /sfm-data/postgresql/9.6/data:/var/lib/postgresql/data \
        -e POSTGRES_PASSWORD=password123 postgres:9.6

  3. Stop and remove the postgres container:
    docker stop postgres
    docker rm -v postgres

  4. Run the postgres upgrade image, changing the sfm-data path to match yours:

docker run --rm \
    -v /sfm-data/postgresql/data:/var/lib/postgresql/9.4/data \
    -v /sfm-data/postgresql/9.6/data:/var/lib/postgresql/9.6/data \
    tianon/postgres-upgrade:9.4-to-9.6
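
One way to confirm which Postgres major version a data directory belongs to is to read the PG_VERSION file that Postgres keeps at the top of each data directory. The following is a minimal sketch; pg_data_version is a hypothetical helper name, and the paths are the example paths used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SFM): print the major version recorded
# in a Postgres data directory's PG_VERSION file.
pg_data_version() {
    cat "$1/PG_VERSION"
}
```

For example, pg_data_version /sfm-data/postgresql/9.6/data shows the version of the new directory, and pg_data_version /sfm-data/postgresql/data shows the version of the old one. The data directories are usually readable only by the container's postgres user, so this may need to be run with elevated privileges.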

Proceed with the rest of the SFM upgrade
Continue with the SFM upgrade, following step 2 of the upgrade instructions, "Make a copy of your existing docker-compose.yml and .env files".

Cookie consent popup

Version 2.3.0 adds a new configurable cookie consent popup. The user's consent remains valid for a maximum of 365 days, or until the user clears their browser cookies. This feature is disabled by default.

To enable and configure the cookie consent popup, you will need to modify two files in your sfm-docker directory:

  1. docker-compose.yml. View 2.2.0...2.3.0 to see the new lines added to example.prod.docker-compose.yml and example.docker-compose.yml. Apply the same changes to your docker-compose.yml.
  2. .env (environment settings file). View 2.2.0...2.3.0 to see the new lines added to example.env. Copy these new lines into your .env file. Configure the new variables as follows:
    1. Set SFM_ENABLE_COOKIE_CONSENT to True.
    2. Modify SFM_COOKIE_CONSENT_HTML to your institution's preferred message text. Note that the text may include HTML tags; for example, you may wish to use <a href> to link to your institution's privacy policy.
    3. If desired, modify SFM_COOKIE_CONSENT_BUTTON_TEXT to change the wording on the button that closes the message banner. The default wording is I consent.

GW footer

Version 2.3.0 adds a new, GW-specific footer which is disabled by default. When enabled, the GW footer appears below the standard footer. If you opt to use this footer, you will need to modify two files in your sfm-docker directory:

  1. In your .env file, set SFM_ENABLE_GW_FOOTER to True. View 2.2.0...2.3.0 to see the new lines added to example.env.
  2. In your docker-compose.yml file, add SFM_ENABLE_GW_FOOTER to the environment variables for the ui container. View 2.2.0...2.3.0 to see the new lines added to example.prod.docker-compose.yml and example.docker-compose.yml.

Release notes for specific components:

2.2.0...2.3.0

Version 2.2.0

08 Sep 03:14

Bump version in example files.

Version 2.1.0

03 Jan 15:51

This release adds the SFM_EMAIL_FROM environment variable. It is optional, unless you are using AWS SES (Simple Email Service) to send notification emails.

2.0.2...2.1.0

Version 2.0.2

13 Aug 21:19

Various minor tweaks:

  • Fixed serialization / deserialization and other management commands.
  • Fixed display issue with credentials on collection detail page.
  • Made SFM UI queue length configurable.

See release notes for 2.0.0 for relevant information. As an alternative to a full upgrade, only SFM UI and SFM UI Consumer can be set to 2.0.2.

Version 2.0.1

07 Aug 17:59

Patch for warcprox threading bug.

See release notes for 2.0.0 for relevant information.

Version 2.0.0

24 Jul 15:28

Major improvements in SFM in version 2.0.0:

  • Upgraded to Python 3.
  • Upgraded to Django 2.
  • Upgraded to warcprox 2.
  • Upgraded most other dependencies to latest.
  • Replaced deprecated IA WARC library with warcio.

Known issues:

  • All existing scheduled harvests are removed. See deployment notes for how to handle.

Release notes for specific components:

For a complete list of tickets, see sfm-ui milestone 2.0.0.

To upgrade to this version of SFM, follow the general upgrade instructions.

Because of changes in apscheduler (which is used to schedule harvests), all scheduled jobs are purged during the upgrade. To fix this, all collections that are turned on (excluding Twitter filter and sample stream collections) must be turned off and turned on.

A collection that must be rescheduled is on, but is not scheduled. On the collection detail page, this is indicated by a red Turn off button on the upper right with no blue notification that says "Next harvest scheduled for ...".

If you press the Turn off button and then the Turn on button, the collection will be scheduled, as indicated by the blue notification that says "Next harvest scheduled for ...".

After this upgrade, monitor your collections to make sure harvesting is occurring properly.

Version 1.12.1

15 Jun 16:49

Installed python3 in processing container.

To use this patch, change the version of processingcontainer to 1.12.1 in docker-compose.yml.

Version 1.12.0

12 Jun 13:55

Major improvements in SFM in version 1.12.0:

  • Deprecated web harvester. This will be replaced by other approaches in future releases that involve sending URLs to external web archives. Existing web harvests will not be deleted, but no new web harvests will be performed.
  • Deprecated ELK. This is replaced by TweetSets, which provides a more scalable approach for indexing social media posts in ElasticSearch. An existing ELK instance can continue to run, but no new social media posts will be loaded.
  • To improve citability of datasets, added public links field to collections and citation guidance to documentation.
  • Added automatic, configurable seed deletion for seeds that have been suspended, deleted, protected, etc.
  • Added support for deactivating credentials, for credentials which are no longer valid.
  • Removed pinning of transitive dependencies to assist with managing dependency change.
  • Worked to enable clean shutdown (status code 0) of containers.
  • Switched to using Twarc's Json2Csv for exporting tweets.

Changes in sfm-docker:

  • Upgraded processing containers to newer Ubuntu and added / upgraded tools.
  • Removed ELK and web harvester.

Changes in docs:

  • Fixed links to Twitter docs.
  • Added citation guidance page.
  • Updated processing container docs to reflect changes / additions.
  • Corrected smoke test instructions.
  • Deprecated web harvester and ELK.
  • Updated Twitter data dictionary to reflect change in Twitter export.
  • Updated Export documentation to add detail about time zones.

Known issues:

  • No significant known issues

Release notes for specific components:

For a complete list of tickets, see sfm-ui milestone 1.12.0.

To upgrade to this version of SFM, follow the general upgrade instructions. In your .env file, remove the WEB HARVESTER CONFIGURATION SECTION and the WEB_REQS line.

Also, change the versions of twitterrestexporter and twitterstreamexporter to 1.12.1 in your docker-compose.yml file.

After SFM is upgraded, execute docker system prune -a.