This repository has been archived by the owner on Feb 16, 2023. It is now read-only.
Extend Grid to take advantage of IPFS file sharding #181
Labels
Type: New Feature ➕
Introduction of a completely new addition to the codebase
IPFS has a max block size of 1MB for security reasons. They've implemented sharding as a way to store larger files/directories on IPFS (see ipfs/notes#76, ipfs/kubo#3042, and also https://github.com/ipfs/js-ipfs-unixfs#usage for an example of how it's used in JS).
This becomes a problem for us, since we'll often want to send tensor objects that contain more than 1MB of data. For example, a 50-dimensional word embedding over a vocabulary of 100,000 words, at 32 bits per value, would normally require sending an embedding matrix of at least `50*100000*32/1000000/8 = 20MB`. Training a matrix like this presents a range of challenges, but even freezing it and sending it once would be feasible and useful for users, so this is definitely something we want to be able to do to allow a larger class of architectures to be trained on Grid.

The goal here would be to figure out a way to do JSON sharding with py-ipfs-api, and then to integrate those changes into Grid.
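To make the size estimate and the sharding idea concrete, here is a minimal sketch in plain Python. It is not the py-ipfs-api sharding mechanism and not Grid's code: the helper names (`embedding_size_mb`, `shard_json`, `unshard_json`) and the `MAX_BLOCK_BYTES` constant are hypothetical, and the chunks are only held in a list rather than written to IPFS. It assumes the simplest approach, which is to serialize the object to JSON, split the bytes into pieces no larger than the block limit, and join them again on the receiving side.

```python
import json

# Assumed limit, taken from the 1MB block size mentioned above.
MAX_BLOCK_BYTES = 1_000_000


def embedding_size_mb(dim, vocab_size, bits_per_value=32):
    """Size of a dense dim x vocab_size matrix in megabytes (10^6 bytes)."""
    return dim * vocab_size * bits_per_value / 8 / 1_000_000


def shard_json(obj, max_bytes=MAX_BLOCK_BYTES):
    """Serialize obj to JSON and split the bytes into chunks of at most max_bytes."""
    payload = json.dumps(obj).encode("utf-8")
    return [payload[i:i + max_bytes] for i in range(0, len(payload), max_bytes)]


def unshard_json(chunks):
    """Reassemble chunks (in their original order) and decode the JSON."""
    return json.loads(b"".join(chunks).decode("utf-8"))


if __name__ == "__main__":
    print(embedding_size_mb(50, 100_000))  # the example from this issue

    # A small stand-in for a tensor, sharded with a tiny limit for demonstration.
    tensor = {"shape": [2, 3], "data": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}
    chunks = shard_json(tensor, max_bytes=16)
    assert unshard_json(chunks) == tensor
```

In a real integration each chunk would be added to IPFS separately and an index of the resulting hashes would be shared so the receiver can fetch and reorder them; how that maps onto py-ipfs-api is exactly the open question here.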