Skip to content

Conversation

@gammazero
Copy link
Contributor

@gammazero gammazero commented Nov 14, 2025

Continued from #136
Credit to @requilence for most of this PR

Problem

When uploading large files through UnixFS or other IPLD structures, the current batch implementation keeps everything in memory and writes blocks individually. If the application crashes mid-process, partial blocks remain in FlatFS, making garbage collection extremely difficult since there's no way to identify which blocks belong to incomplete operations.

Solution

This PR rewrites the batch mode to use a temporary directory approach:

  1. Atomic batch operations: All blocks in a batch are written to a temporary directory first, then renamed to their final locations only on commit. If the process crashes, no partial data remains in the main datastore - the temp directory is cleaned on restart.
  2. Memory efficiency: Instead of keeping all data in memory, blocks are written directly to temp files, enabling large file uploads without high RAM usage.
  3. Read operations in batch: Added BatchReader interface implementing Get/Has/GetSize methods. This is essential for IPLD structures which need to verify block links during construction. Without batch reads, you cannot build complex IPLD structures that reference blocks added earlier in the same batch. This follows standard database transaction semantics where reads within a transaction see uncommitted writes.

Implementation Details

  • Each batch creates a unique temp directory under .temp/
  • Files are written without sharding in temp
  • On commit, files are atomically renamed to their sharded destinations
  • On discard or crash, temp files are cleaned up (on the next startup). This code was already existing
  • Thread-safe with mutex protection
  • Batch can be reused after commit/discard (because this behavior of Batch was before my changes)

requilence and others added 2 commits November 14, 2025 11:10
* Rewrite batch mode to use temp directory

Write batch data to temp directory instead of memory to avoid high RAM usage with large files. Each batch creates its own temp directory, writes blocks directly to disk, and renames all files atomically on commit.

* Use slice instead of map for batch puts
* Add DiscardableBatch interface with Discard method
* add read operations for batch iface
* fix: remove unused function and variable to fix CI checks
* Removed unused tempFileOnce() method
* Replaced unused 'done' variable with '_' in Commit method
* remove some local files
* perf: optimize batch read operations to skip filesystem checks

Only check temp directory for keys that exist in puts slice

* feat: implement batch Query with temp directory merging
* make batch Put operations async

Address PR review feedback

* fix Batch init
* Update flatfs.go
@lidel lidel mentioned this pull request Nov 18, 2025
34 tasks
@hsanjuan hsanjuan requested review from hsanjuan and lidel November 25, 2025 15:24
@gammazero
Copy link
Contributor Author

Triage: Open PR in kubo and test on kubo staging.

gammazero added a commit to ipfs/kubo that referenced this pull request Dec 4, 2025
Upgrade go-ds-flatfs to version that uses uses temproary files to store items added to batches.

See: ipfs/go-ds-flatfs#142
- add godoc for maxConcurrentPuts explaining why 16 was chosen
- fix incorrect O(n) claims for Get/Has/GetSize (actually O(1) via putSet map)
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, as long we address two things:

  1. Updated godoc (bd468ee) to reflect the latest version, but unsure about one thing – see inline.
  2. IIUC two concurrent goroutines calling os.Create() and Write() on the same file can error or corrupt data – see inline.

gammazero and others added 2 commits December 8, 2025 19:43
Co-authored-by: Marcin Rataj <lidel@lidel.org>
flatfs.go Outdated
return
}

file, err := os.Create(tempFile)
Copy link
Member

@lidel lidel Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit worried about this on Windows: i suspect concurrent async writes (multiple ipfs add) may cause file handle issues, things like golang/go#34681 – flatfs uses https://github.com/alexbrainman/goissue34681 for a reason, we likely need similar wrapper on windows here (old code used tempFileOnce from util_windows.go).

@gammazero too late on my end to write code, but we probably need to create createFile equivalent that does something similar (pseudocode):

util_unix.go:

  func createFile(name string) (*os.File, error) {
  	return os.Create(name)
  }

util_windows.go:

  func createFile(name string) (*os.File, error) {
  	return goissue34681.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0666)
  }

Then finally here:

  file, err := createFile(tempFile)

Windows concurrency is janky, unsure if we need a retry wrapper, maybe can go without it, but in case, it could be something like:

util_windows.go:

  func createFile(name string) (*os.File, error) {
  	var file *os.File
  	var err error
  	for i := 0; i < RetryAttempts; i++ {
  		file, err = createFileOnce(name)
  		if err == nil || !isTooManyFDError(err) {
  			break
  		}
  		time.Sleep(time.Duration(i+1) * RetryDelay)
  	}
  	return file, err
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, without retry wrapper.

@hsanjuan
Copy link
Contributor

hsanjuan commented Dec 9, 2025

Triage:

  • Should test windows well. Perhaps stress-test with a test in this repo. Potential problem with file with duplicate blocks writing in parallel etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants