Skip to content

shimoku-tech/dynamodb-garbage-collector

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DynamoDB Garbage Collector

Version

The DynamoDB Garbage Collector is a Python library that allows you to delete garbage items in DynamoDB tables.

Table of Contents

Installation

To install the DynamoDB Garbage Collector, use pip:

$ pip install dynamodb-garbage-collector

Usage

The DynamoDB Garbage Collector currently provides a single function called purge_orphan_items, which allows you to delete orphan items in a child table that reference a non-existent item in a parent table. If optional timestamp attributes are provided only will be delete orphan items earlier than a specified maximum time (by default, one hour ago).

To use purge_orphan_items, you need to provide the following parameters:

  • logger: a logger object to log messages during the execution of the function.
  • region: the AWS region where the parent and child tables are located.
  • parent_table: the name of the parent table.
  • child_table: the name of the child table.
  • key_attribute: the name of the key attribute for both tables.
  • child_reference_attribute: the name of the reference attribute in the child table.
  • max_workers (optional): the maximum number of workers to use for concurrent operations. If not provided, a default value of 100 will be used.
  • timestamp_attribute (optional): the name of the attribute that contains the timestamp of the records in the child table. If not provided, timestamp will not be taken into account when deleting items.
  • timestamp_format (optional): the format of the timestamp attribute. If not provided, timestamp will not be taken into account when deleting items.

Here is an example of how to use the purge_orphan_items function:

import logging
from dynamodb_garbage_collector import purge_orphan_items

# Set up the logger
logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Set the AWS region where the parent and child tables are located
region = 'eu-west-1'

# Set the names of the parent and child tables, and the key and reference attributes
parent_table = 'ParentTable'
child_table = 'ChildTable'
key_attribute = 'id'
child_reference_attribute = 'parentId'

# Set the maximum number of workers
max_workers = 50

# Set the name of the timestamp attribute and the timestamp format
timestamp_attribute = 'createdAt'
timestamp_format = '%Y-%m-%dT%H:%M:%S.%fZ'

# Call the function
purge_orphan_items(logger, region, parent_table, child_table, key_attribute, child_reference_attribute, max_workers, timestamp_attribute, timestamp_format)

Contributing

We welcome contributions to the DynamoDB Garbage Collector. To contribute, please fork the repository and create a pull request with your changes.

License

The DynamoDB Garbage Collector is released under the MIT License.