Releases: gwu-libraries/sfm-docker
Version 1.4.1
Version 1.4.1 includes the following bug fixes:
- Fix to incremental Tumblr harvesting.
- Fix to creating export manifests and serializing collections that require unicode.
Changes in sfm-docker:
- For production deployments, the version is now controlled by the SFM_VERSION variable in the .env file.
Known issues:
Release notes for specific components:
- sfm-ui
- sfm-utils
- sfm-twitter-harvester
- sfm-flickr-harvester
- sfm-weibo-harvester
- sfm-web-harvester
- sfm-tumblr-harvester
- sfm-elk
For a complete list of tickets, see sfm-ui milestone 1.4.0.
To upgrade to this version of SFM, follow the general upgrade instructions except:
- In step 5, make the following changes to the .env file:
In the common configuration section add:
SFM_VERSION=1.4.1
In the development section add:
# Set to false to skip pip upgrading requirements. This will allow SFM to start up faster
# and work offline, but will not get the latest requirements.
UPGRADE_REQS=True
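After editing, it can help to confirm that the new variables are actually defined in .env. The following is a minimal sketch, assuming a POSIX shell. The helper missing_env_vars is hypothetical and is not part of SFM or docker-compose.

```shell
# Hypothetical helper (not part of SFM).
# missing_env_vars FILE KEY...: print each KEY that has no "KEY=" line in FILE.
# Returns non-zero if any key is missing. Commented-out lines ("# KEY=...")
# and lines with leading whitespace are treated as missing.
missing_env_vars() {
    file=$1
    shift
    rc=0
    for key in "$@"; do
        if ! grep -q "^${key}=" "$file"; then
            echo "$key"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (production): missing_env_vars .env SFM_VERSION
# Usage (development): missing_env_vars .env SFM_VERSION UPGRADE_REQS
```

The function only checks that a key is present. It does not check which section of the file the line is in or whether the value is sensible.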
This shows the changes made to docker configuration files in this release.
Version 1.4.0
Major improvements in SFM in version 1.4.0:
- Export enhancements include:
  - README.txt generated for each export. The README.txt contains documentation on the export and the collection.
  - Large exports are segmented into multiple files.
- Work to prevent harvest congestion includes:
  - When selecting a credential, information is provided on other collections that the credential is used for.
  - Harvests will be skipped when the previous harvest has not been completed.
- SFM components are run as sfm user inside of containers instead of root.
- When admins delete database records, the corresponding files are deleted as well.
Changes in sfm-docker:
- The data container now creates the collection set, container, and export directories and changes ownership.
Changes to documentation:
- Added data dictionary for Twitter exports. (Other data dictionaries forthcoming.)
- Added better description of storage.
- Added (limited) guidance on server sizing.
- Provided an explanation of skipping harvests.
- Added documentation on administering SFM, including the Admin Interface.
Known issues:
Release notes for specific components:
- sfm-ui
- sfm-utils
- sfm-twitter-harvester
- sfm-flickr-harvester
- sfm-weibo-harvester
- sfm-web-harvester
- sfm-tumblr-harvester
- sfm-elk
For a complete list of tickets, see sfm-ui milestone 1.4.0.
To upgrade to this version of SFM, follow the general upgrade instructions except:
- In step 5, make the following changes to the .env file:
In the volumes section add:
# Group id for sfm group
SFM_GID=990
# User id for sfm user
SFM_UID=990
In the development section add:
# This adds a 100 item export option for testing.
HUNDRED_ITEM_SEGMENT=True
- After step 5, execute the following:
docker-compose run data chown -R 990:990 /sfm-data/collection_set
docker-compose run data chown -R 990:990 /sfm-data/containers
docker-compose run data chown -R 990:990 /sfm-data/elk
docker-compose run data chown -R 990:990 /sfm-data/export
docker-compose run data chown -R 990:990 /sfm-data/heritrix-data
It is possible that some of these directories don't exist and an error is returned. That's OK.
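The five chown commands above can also be run as a loop that keeps going when a directory is missing. This is a minimal sketch, assuming a POSIX shell. The function name chown_sfm_dirs is hypothetical, and the command prefix is passed as arguments so the loop can be tried with echo first.

```shell
# Hypothetical helper (not part of SFM).
# chown_sfm_dirs PREFIX...: run "chown -R 990:990" on each directory from the
# upgrade step through the given command prefix, continuing past failures.
chown_sfm_dirs() {
    failed=""
    for dir in collection_set containers elk export heritrix-data; do
        if ! "$@" chown -R 990:990 "/sfm-data/$dir"; then
            failed="$failed $dir"
        fi
    done
    if [ -n "$failed" ]; then
        # A failure usually just means the directory does not exist.
        echo "chown failed (directory may not exist):$failed" >&2
    fi
    return 0
}

# Dry run:  chown_sfm_dirs echo
# Real run: chown_sfm_dirs docker-compose run data
```

The 990:990 ids match the SFM_UID and SFM_GID values shown above. If different ids are set in .env, the chown argument would need to change accordingly.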
This shows the changes made to docker configuration files in this release.
Version 1.3.1
Version 1.3.1 was a bug fix release. Some of the changes include:
- Email notifications sent to admins when a harvest or export fails with unknown errors.
- Fixes to deserialization (import) of collections.
- Fixes to SFM-ELK container startup and restarts. Loaded data will now persist when SFM-ELK containers are killed or removed.
- Upgrade JQ to 1.5 in processing container.
- Handle blank lines correctly when iterating over tweets from a Twitter stream.
- media_url column has been added to export of Twitter data.
Release notes for specific components:
- sfm-ui
- sfm-utils
- sfm-twitter-harvester
- sfm-flickr-harvester
- sfm-weibo-harvester
- sfm-web-harvester
- sfm-tumblr-harvester
- sfm-elk
For a complete list of tickets, see sfm-ui milestone 1.3.0.
To upgrade to this version of SFM, follow the general upgrade instructions. If you are using an SFM-ELK container, make sure to set the hostname as shown in the example.
Version 1.3.0
Major improvements in SFM in version 1.3.0:
- Collection portability, allowing collections to be moved to other SFM instances and to other environments (e.g., a repository).
- Support for monitoring harvesters/exporters and queues.
- Support for monitoring free space.
- Additional option for running harvest once (instead of a repeating schedule).
Changes in sfm-docker:
- Moved processing volume into its own data container.
- Increased Elasticsearch memory in ELK container.
Changes to documentation:
- Documentation supporting new collection portability feature.
- General instructions for upgrading SFM.
- Documentation for new monitoring features.
Known issues:
- Scheduled serializations are disabled
- Problem with Excel export. Use CSV instead.
Release notes for specific components:
- sfm-ui
- sfm-utils
- sfm-twitter-harvester
- sfm-flickr-harvester
- sfm-weibo-harvester
- sfm-web-harvester
- sfm-tumblr-harvester
- sfm-elk
For a complete list of tickets, see sfm-ui milestone 1.3.0.
To upgrade to this version of SFM, follow the general upgrade instructions except stop all Twitter filter stream collections first. (They can be restarted after upgrade.) The following should be added to the VOLUME CONFIGURATION section of your .env file:
# sfm-data free space threshold to send notification emails,only ends with MB,GB,TB. eg. 500MB,10GB,1TB
DATA_VOLUME_THRESHOLD=10GB
# sfm-processing free space threshold to send notification emails,only ends with MB,GB,TB. eg. 500MB,10GB,1TB
PROCESSING_VOLUME_THRESHOLD=10GB
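The comments above say that threshold values end with MB, GB, or TB. A value can be checked against that pattern before it goes into .env. This is a sketch only. It assumes the numeric part is a whole number, as in the examples, and valid_threshold is a hypothetical helper, not SFM's own validation.

```shell
# Hypothetical helper (not part of SFM).
# valid_threshold VALUE: succeed if VALUE is digits followed by MB, GB, or TB.
# Assumption: whole numbers only, as in the examples 500MB, 10GB, 1TB.
valid_threshold() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(MB|GB|TB)$'
}

# Usage:
#   valid_threshold 10GB && echo "ok"
#   valid_threshold 10G  || echo "rejected: unit must be MB, GB, or TB"
```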
This shows the changes made to docker configuration files in this release.
Version 1.2.0
Major improvements in SFM in version 1.2.0:
- Harvesters and exporters better handle expected and unexpected shutdowns.
- Harvesters are more resilient to errors that occur during harvesting.
- Better feedback on the status of harvests within SFM UI.
- Users will be sent regular emails with harvest updates.
Changes in sfm-docker:
- Restricted Docker log size.
Changes to documentation:
- Added page on collection types providing guidance on the different types of social media harvests that can be performed.
- Updated limitations and known issues.
- Updated installation instructions to recommend scaling Twitter REST Harvesters.
Release notes for specific components:
- sfm-ui
- sfm-utils
- sfm-twitter-harvester
- sfm-flickr-harvester
- sfm-weibo-harvester
- sfm-web-harvester
- sfm-tumblr-harvester
- sfm-elk
For a complete list of tickets, see sfm-ui milestone 1.2.0.
To upgrade to this version of SFM:
- mv docker-compose.yml old.docker-compose.yml
- Get the example.prod.docker-compose.yml file and name it docker-compose.yml.
- Add Docker log configuration, Harvester configuration, and Warcprox debug configuration to your .env file.
- docker-compose pull
- Turn off all running collections.
- docker-compose -f old.docker-compose.yml stop
- docker-compose up -d
- It is now recommended to scale the number of Twitter REST harvester containers: docker-compose scale twitterrestharvester=2
- Test that SFM is up and running correctly.
- Turn on stopped collections.
Version 1.1.0
Major improvements in SFM in version 1.1.0:
- Added support for Tumblr. This includes harvesting and exporting.
- Significant refactoring to Docker configuration. This will make deployments easier, with most configuration performed in a .env properties file. Documentation for both production deployment and development has been updated to reflect the changes.
- Significant performance improvements to extracting social media data from WARCs.
For sfm-docker in version 1.1.0:
- Refactoring of docker images and example docker-compose.yml files.
- Added parallel to processing containers.
- Made processing containers' access to /sfm-data read-only and added /sfm-processing for processing files.
Release notes for specific components:
- sfm-ui
- sfm-utils
- sfm-twitter-harvester
- sfm-flickr-harvester
- sfm-weibo-harvester
- sfm-web-harvester
- sfm-tumblr-harvester
- sfm-elk
For a complete list of tickets, see sfm-ui milestone 1.1.0.
The update to Docker configuration requires a special upgrade process:
- mv docker-compose.yml old.docker-compose.yml
- Get the example.prod.docker-compose.yml file and name it docker-compose.yml.
- Get the example.env file and name it .env.
- Update configuration in .env to match the previous secrets.env and docker-compose.yml. Pay particular attention to data volume, hostname and ports, and passwords.
- docker-compose pull
- Turn off all running collections.
- mv .env new.env
- docker-compose -f old.docker-compose.yml stop
- mv new.env .env
- docker-compose up -d
- Test that SFM is up and running correctly.
- Turn on stopped collections.
At some later point:
- mv .env new.env
- docker-compose -f old.docker-compose.yml rm -v --force
- mv new.env .env
- rm old.docker-compose.yml; rm secrets.env
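The process above twice moves .env to new.env, runs a command against old.docker-compose.yml, and then moves the file back. The release notes do not say why. One reading is that this keeps the old configuration from picking up the new .env. That move, run, and restore pattern can be wrapped so the file is restored even when the command fails. This is a minimal sketch, assuming a POSIX shell. The helper with_env_hidden is hypothetical and not part of SFM.

```shell
# Hypothetical helper (not part of SFM).
# with_env_hidden CMD...: move .env to new.env, run CMD, move new.env back,
# and return CMD's exit status. Works on the current directory.
with_env_hidden() {
    mv .env new.env || return 1
    status=0
    "$@" || status=$?
    # Restore .env whether or not CMD succeeded.
    mv new.env .env || return 1
    return "$status"
}

# Usage:
#   with_env_hidden docker-compose -f old.docker-compose.yml stop
#   with_env_hidden docker-compose -f old.docker-compose.yml rm -v --force
```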
Version 1.0.0
And after version 0.6.1 comes version 1.0.0. Work focused on:
- Usability improvements to the UI.
- New documentation including an overview of SFM, a user quick start, and exploring social media data with ELK.
- Adding a Flickr WARC iterator, a Weibo WARC iterator, and additional tools to the processing container.
Release notes for specific components:
- sfm-ui
- sfm-utils
- sfm-twitter-harvester
- sfm-flickr-harvester
- sfm-weibo-harvester
- sfm-web-harvester
- sfm-elk
For a complete list of tickets, see sfm-ui milestone 1.0.0.
Version 0.6.1
In version 0.6.1, collection was renamed to collection set and seed set was renamed to collection. It also includes a few minor bug fixes.
Release notes for specific components:
For a complete list of tickets, see sfm-ui milestone 0.6.1.
Known defects:
- Restarting sfm-ui container running apache fails (#291)
Note that upgrading from previous versions to this version has not been tested and may be problematic. (This is the last release where upgrading will be untested.)
Version 0.6.0
Version 0.6.0 is where it all comes together! The most significant improvements include:
- UI improvements to just about every page. Some of the highlights include:
  - Added visualizations of items collected over time on the home page.
  - Customization of harvest options and seed entry to the social media platform.
  - Added UI support of requesting and viewing exports.
  - Added display of harvest history.
  - Improved display of change history for credentials, collections, seed sets, and seeds.
- Added support for pausing/resuming harvests, including for stream harvests.
- Added the Weibo and web harvesters.
- Added exporters for Weibo, Twitter REST, and Twitter stream.
- Added a processing container to support processing/analysis of collected data.
- New/improved documentation for:
  - Deployment on an Amazon EC2 instance.
  - Processing social media data.
  - Authentication
  - Credentials
  - Docker
Release notes for specific components:
For a complete list of tickets, see sfm-ui milestone 0.6.0.
Known defects:
- Restarting sfm-ui container running apache fails (#291)
Want to give it a try? Just choose to do a local installation (simple) or install on an Amazon EC2 instance (super-simple).
v0.5.0
Version 0.5.0 is focused on:
- Continued work on SFM UI, including:
  - adding screens for credentials
  - refinement of seed set and seed screens
  - tracking changes made by users to collections, seed sets, seeds, and credentials
- Keeping records for WARC files that are created during harvesting.
- Adding the Sina Weibo harvester.
- Enhancements to the Twitter harvester, including user timelines and the sample stream.
- Initial support for export. In this release, export to CSV, Excel, JSON, and others is supported for Flickr photos and exports must be requested using the Admin interface. (Export for other types of social media content and SFM UI screens for requesting and retrieving exports are forthcoming.)
Release notes for specific components:
For a complete list of tickets, see sfm-ui milestone 0.5.0.
Steps to give it a try:
- Get a set of Twitter API keys
- Get a set of Flickr API keys.
- Bring up an instance of SFM using prod.docker-compose.yml.
- Log into the SFM UI (http://localhost/) as "testuser" using password "password".
- Using the "My Credentials" link, create a new credential. Select “testuser” as the user and enter “twitter” as the platform. For token, enter:
  {
    "key": "<YOUR KEY>",
    "secret": "<YOUR SECRET>"
  }
- Click My Collections and then the Add New Collections button. Enter a collection name and select the group you created as the group. Click Save.
- Click on the collection you just created and then the Add Seedset button. Enter a name and select "Flickr user" for harvest type. For harvest options, enter {"sizes": ["Thumbnail", "Original"]}. Select the credentials you created as the credentials. Click Save.
- To create a seed, go to the Admin site and click "Add Seed". Select the seed set you just created as the seed set. Either enter a Flickr username as the token.
- Update the seed set, but don't change anything.
- Wait
- Go back to the Admin site and look at the list of harvests. The expected result is a harvest record for each harvest.
- From the command line, execute docker exec -it sfmdocker_sfmflickrharvester_1 find /tmp/collection -name "*.warc.gz". The expected result is the path of a WARC file.