Feat: Implement a PersistentDatastore by adding DiskUsage method (#27)
* Feat: Implement a PersistentDatastore by adding DiskUsage method

This adds DiskUsage(). This datastore would take a big performance hit if we walked the filesystem to calculate disk usage every time. Therefore I have opted to keep track of current disk usage by walking the filesystem once during "Open" and then adding/subtracting file sizes on Put/Delete operations.

On the plus:
  * Small perf impact
  * Always up-to-date values
  * No chance that race conditions will leave DiskUsage with wrong values

On the minus:
  * Slower Open(): it runs Stat() on all files in the datastore
  * Size does not match real size if a directory grows large (at least on ext4 systems). We don't track directory-size changes, only use the creation size.

* Update .travis.yml: latest go

* DiskUsage: cache diskUsage on Close()

Avoids walking the whole datastore when a clean shutdown happened. The file is removed on read, so a datastore that was not shut down cleanly will not find an outdated file later.

* Manage diskUsage with atomic.AddInt64 (no channel). Use tmp file + rename.

* Remove redundant comments

* Address race conditions when writing/deleting the same key concurrently

This improves diskUsage book-keeping when writing and deleting the same key concurrently. However, it means that existing values in the datastore cannot be replaced without an explicit delete (before put). A new test checks that there are no double counts in a put/delete race condition environment. This holds when sync is enabled; without sync, deleting files concurrently with a put causes a small over-count.

* Document that datastore Put does not replace values

* Comment TestPutOverwrite

* Implement locking and discard for concurrent operations on the same key

This implements the approach suggested by @Stebalien in #27. Write operations (delete/put) to the same key are tracked in a map which provides a shared lock. Concurrent operations to that key will share that lock. If one operation succeeds, it removes the lock from the map and the others using it automatically succeed. If one operation fails, it lets the others waiting for the lock try. New operations to that key will request a new lock.

A new test for putMany (batching) has been added.

Worth noting: a concurrent Put+Delete on a non-existing key always yields Put as the winner (Delete fails if it comes first, or is skipped if it comes second).

* Do fewer operations in tests (travis fails on mac)

* Reduce counts again

* DiskUsage: address comments. Use sync.Map.

* Add rw and rwundo rules to Makefile

* DiskUsage: use one-off locks for operations

Per @Stebalien's suggestion.

* DiskUsage: write checkpoint file when du changes by more than 1 percent

That is, if the current value differs from the value in the checkpoint file by more than one percent, we write a new checkpoint.

* Fix tests so they ignore disk usage cache file

* Rename: update disk usage when rename fails too

* Improve rename comment and be less explicit on field initialization

* Do not use filepath.Walk, use Readdir instead.

* Estimate diskUsage for folders with more than 100 files

This estimates disk usage when folders have more than 100 files in them. Files that are not processed are assumed to have the average size of the processed ones.

* Select file randomly when there are too many to read

* Fix typo

* Fix tests

* Set time deadline to 5 minutes.

This provides a deadline for disk usage estimation: we stat() as many files as possible until we run out of time. If that happens, the rest is calculated as an average. The user is informed of the slow operation and, if we ran out of time, about how to obtain better accuracy.