Parallel write is a Python module that distributes writes across an arbitrary number of open file-like objects.
Features:
- Distributes each call on the proxy object to every wrapped file object, so all of them stay in the same state
- Writes run in a thread pool of configurable size, so if some of the underlying objects are slow, their latencies don't add up
- Compares the return values of the methods, so despite the name, you can actually read from many objects at once and fail if any of them returns different data
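The fan-out idea behind the feature list can be sketched in a few lines. This is not the module's actual API; the class and parameter names (`ParallelProxy`, `pool_size`) are illustrative assumptions, using only the standard library:

```python
# Sketch only: a proxy that fans out every method call to several
# file-like objects via a thread pool, and raises if the underlying
# objects return different results (so mismatched read()s are caught).
import io
from concurrent.futures import ThreadPoolExecutor


class ParallelProxy:
    def __init__(self, *files, pool_size=4):
        self._files = files
        self._pool = ThreadPoolExecutor(max_workers=pool_size)

    def __getattr__(self, name):
        def method(*args, **kwargs):
            # Submit the same call to every wrapped object in parallel.
            futures = [
                self._pool.submit(getattr(f, name), *args, **kwargs)
                for f in self._files
            ]
            results = [fut.result() for fut in futures]
            # Fail if the wrapped objects disagree.
            if any(r != results[0] for r in results[1:]):
                raise ValueError(f"{name}() returned different results")
            return results[0]
        return method


# Usage: both buffers receive identical data through one call.
a, b = io.BytesIO(), io.BytesIO()
proxy = ParallelProxy(a, b)
proxy.write(b"same data everywhere")
proxy.flush()
assert a.getvalue() == b.getvalue() == b"same data everywhere"
```

In the real use case the two targets would be a local file and an S3 upload stream rather than in-memory buffers.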
We often write the same data both to local disk (for later caching) and to a remote store (S3, for persistence). The files must be identical, but the tool we use may produce binary-different output on two subsequent writes (whether because PYTHONHASHSEED shuffles things or because timestamps end up in the compressed output's metadata doesn't matter).
We could write the file locally first and then copy it to S3, but that would cost extra time and code complexity. It's simpler to write both at the same time.
See the documentation.
As usual with any GitHub-based project, raise an issue if you find a bug or see room for improvement.
v0.0.10