GoDoc Build Status

Writing an HDFS clone in Go to learn more about Go and the nitty-gritty of distributed systems.

Features/TODO

  • MetaDataNode/DataNode handle uploads
  • MetaDataNode/DataNode handle downloads
  • DataNode dynamically registers with MetaDataNode
  • DataNode tells MetaDataNode its blocks on startup
  • MetaDataNode persists file->blocklist map
  • DataNode pipelines uploads to other DataNodes (see the pipelining sketch after this list)
  • MetaDataNode can restart and DataNode will re-register (heartbeats; see the heartbeat sketch after this list)
  • Tell DataNodes to re-register if MetaDataNode doesn't recognize them
  • Drop DataNodes when they go down (heartbeats)
  • DataNode sends size of data directory (heartbeat)
  • MetaDataNode obeys replication factor
  • MetaDataNode balances based on current reported space
  • MetaDataNode balances based on expected new blocks
  • MetaDataNode handles not enough DataNodes for replication
  • Have MetaDataNode manage the block size (in HDFS, clients can change this per-file)
  • Re-replicate blocks when a DataNode disappears
  • Drop over-replicated blocks when a DataNode comes up
  • DataNode utilization calculations should take DeletionIntents and ReplicationIntents into account (see the balancing sketch after this list)
  • Grace period for replicating new blocks
  • MetaDataNode balances blocks as it runs!
  • Record a hash digest of each block, reject the send if the hash is wrong (see the checksum sketch after this list)
  • DataNode needs to keep track of blocks it's receiving / deleting / checking so that the integrity checker can run only on real blocks
  • Remove blocks if checksum doesn't match
  • Run a cluster in a single process for testing
  • Structure things better
  • Resiliency to weird protocol stuff (run the RPC loop manually?)
  • Command line parser doesn't work that well (try "main datanode -help")
  • Events from servers for testing
  • Better configuration handling (defaults)
  • Allow decommissioning nodes
  • Better logging, so that warnings which are normally non-fatal can be made fatal for tests (two levels: warn that this process broke, and warn that a peer we're communicating with broke)
  • Don't need to wait around to delete blocks, just prevent any new reads and we'll come back to them
  • DataNode should do its setup on startup and then spawn workers, rather than just spawning everything at once (avoids race conditions with the address and data directories)
  • Support multiple MetaDataNodes somehow (DHT? Raft? Get rid of MetaDataNodes and use Gossip?)
  • Keep track of MoveIntents (subtract from predicted utilization of node), might fix the volatility when re-balancing
  • HashiCorp claims heartbeats are inefficient (work grows linearly with the number of nodes). Use Gossip?
  • Don't force a long-running connection for creating a file, give the client a lease and let them re-connect
  • If a client tries to upload a block and every DataNode in its list is down, it needs to get more from the MetaDataNode.
  • Keep track of blocks as we're creating a file, if the client bails before committing then delete the blocks.
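The upload-pipelining item maps nicely onto Go's `io` package. Here is a minimal sketch, assuming blocks arrive as a raw byte stream and the local path layout is up to the caller; none of the names below come from this repo:

```go
package main

import (
	"io"
	"log"
	"net"
	"os"
)

// receiveAndForward writes an incoming block to the local data directory
// while simultaneously forwarding the same bytes to the next DataNode in
// the pipeline (if any), so the block never has to be buffered in memory.
func receiveAndForward(src io.Reader, localPath, nextAddr string) error {
	f, err := os.Create(localPath)
	if err != nil {
		return err
	}
	defer f.Close()

	writers := []io.Writer{f}
	if nextAddr != "" { // the last node in the pipeline has nobody to forward to
		conn, err := net.Dial("tcp", nextAddr)
		if err != nil {
			return err
		}
		defer conn.Close()
		writers = append(writers, conn)
	}

	// MultiWriter duplicates every write: one copy lands on disk, one goes
	// downstream to the next DataNode.
	_, err = io.Copy(io.MultiWriter(writers...), src)
	return err
}

func main() {
	// Toy usage: read a "block" from stdin, store it locally, forward nowhere.
	if err := receiveAndForward(os.Stdin, "blk-0001", ""); err != nil {
		log.Fatal(err)
	}
}
```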
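A minimal sketch of the heartbeat/re-registration loop, assuming `net/rpc` and made-up `HeartbeatRequest`/`HeartbeatResponse` types and a `MetaDataNode.Heartbeat` method; the real protocol in this repo may differ:

```go
package main

import (
	"log"
	"net/rpc"
	"time"
)

// Hypothetical heartbeat payload: the DataNode reports its address and how
// much space its data directory currently uses.
type HeartbeatRequest struct {
	Addr      string
	UsedBytes int64
}

// Hypothetical reply: the MetaDataNode asks an unrecognized DataNode to
// re-register (e.g. after the MetaDataNode restarted and lost its state).
type HeartbeatResponse struct {
	NeedsRegistration bool
}

func heartbeatLoop(client *rpc.Client, addr string, used func() int64, interval time.Duration) {
	for range time.Tick(interval) {
		req := HeartbeatRequest{Addr: addr, UsedBytes: used()}
		var resp HeartbeatResponse
		if err := client.Call("MetaDataNode.Heartbeat", req, &resp); err != nil {
			log.Printf("heartbeat failed: %v", err)
			continue // keep trying; the MetaDataNode may come back
		}
		if resp.NeedsRegistration {
			// The MetaDataNode doesn't recognize us: re-register and
			// re-report our block list (omitted here).
			log.Printf("re-registering with MetaDataNode")
		}
	}
}

func main() {
	client, err := rpc.Dial("tcp", "localhost:8001") // assumed MetaDataNode address
	if err != nil {
		log.Fatal(err)
	}
	heartbeatLoop(client, "localhost:8002", func() int64 { return 0 }, 3*time.Second)
}
```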
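One way the balancing-with-intents items could fit together, as a sketch: predicted utilization is the last reported size plus pending ReplicationIntents minus pending DeletionIntents, and new blocks go to the least-utilized nodes. The `nodeState` fields are assumptions for illustration, not the actual structures in this repo:

```go
package main

import (
	"fmt"
	"sort"
)

// nodeState is a hypothetical view the MetaDataNode keeps per DataNode: the
// space the node last reported plus in-flight intents that haven't landed
// on disk yet.
type nodeState struct {
	Addr             string
	ReportedBytes    int64 // from the latest heartbeat
	ReplicationBytes int64 // blocks it has been told to fetch (ReplicationIntents)
	DeletionBytes    int64 // blocks it has been told to drop (DeletionIntents)
}

// predictedUtilization folds pending intents into the reported size, so a
// node that's about to receive many blocks doesn't look artificially empty.
func (n nodeState) predictedUtilization() int64 {
	return n.ReportedBytes + n.ReplicationBytes - n.DeletionBytes
}

// pickTargets returns the `replication` least-utilized DataNodes for a new
// block, using predicted rather than reported utilization.
func pickTargets(nodes []nodeState, replication int) ([]string, error) {
	if len(nodes) < replication {
		return nil, fmt.Errorf("only %d DataNodes available, need %d", len(nodes), replication)
	}
	sort.Slice(nodes, func(i, j int) bool {
		return nodes[i].predictedUtilization() < nodes[j].predictedUtilization()
	})
	addrs := make([]string, replication)
	for i := range addrs {
		addrs[i] = nodes[i].Addr
	}
	return addrs, nil
}

func main() {
	nodes := []nodeState{
		{Addr: "dn1:8002", ReportedBytes: 10 << 20, ReplicationBytes: 64 << 20},
		{Addr: "dn2:8002", ReportedBytes: 40 << 20, DeletionBytes: 32 << 20},
		{Addr: "dn3:8002", ReportedBytes: 20 << 20},
	}
	targets, _ := pickTargets(nodes, 2)
	fmt.Println(targets) // [dn2:8002 dn3:8002]
}
```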
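And a sketch of the hash-digest item: hash the block as it streams to disk and throw it away on a mismatch. SHA-256 and a hex-encoded digest are assumptions here, not necessarily what the repo uses:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// storeBlock writes a block to disk while hashing it, and removes the file
// if the computed digest doesn't match what the sender claimed.
func storeBlock(src io.Reader, path, expectedHex string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	// TeeReader hashes the bytes as they pass through on their way to disk.
	if _, err := io.Copy(f, io.TeeReader(src, h)); err != nil {
		os.Remove(path)
		return err
	}

	if got := hex.EncodeToString(h.Sum(nil)); got != expectedHex {
		os.Remove(path) // reject the send: the data is corrupt
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, expectedHex)
	}
	return nil
}

func main() {
	data := "hello block"
	sum := sha256.Sum256([]byte(data))
	err := storeBlock(strings.NewReader(data), "blk-0001", hex.EncodeToString(sum[:]))
	fmt.Println("stored:", err == nil)
}
```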