Hub improvement checklist #424
Unanswered
edogrigqv2 asked this question in Ideas
Replies: 1 comment 2 replies
@davidbuniat @mynameisvinn I would like to hear your opinions on these points. New suggestions are welcome as well.
This is a list of points that need to be improved before Hub can be considered a polished gem.
A `MutableMapping` interface to interact with the filesystem. `gcsfs` and `s3fs` are part of the `fsspec` ecosystem and provide such an interface. However, those libraries are slow and we need to replace them with manually written ones. Aside from the `MutableMapping` functionality, we also need ls, rm, mv, cd, exists and other basic fs functionality. So, given the `url` and `credentials` as input, we should output a `MutableMapping` with all of those functionalities.

Right now there are 2 interfaces, `AbstractFileSystem` and `MutableMapping`. These have been forced by `fsspec`. Either we should start contributing to them and improve their performance to meet our needs, or we should go completely our own way and combine those 2 interfaces into 1, as we don't need 2. The reason `s3fs` and `gcsfs` are slow is that the contributors have not optimized the code properly. With a proper implementation, those 2 libraries can reach optimal speeds and we won't need to keep our own version. There is also optional `listdir` and `rmdir` functionality for zarr. Those should be added as well.

This 2-way decoupling should make it easier to use the Dataset class. Next, we need to get rid of the class constructor in the API: using a function like `hub.dataset(...)` is much better than `hub.Dataset`. With the latter, the return type is fixed, while with the function-based API we can be flexible. There is also a lot of optional functionality, like to_pytorch or to_tensorflow, implemented inside this class. We need to move those implementations into separate "optional" files; that way it will be easier to manage everything.

`DynamicTensor`. This will make those parts of our code more manageable.
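To make the first point more concrete, here is a rough sketch of what a single combined interface could look like: one object that is a `MutableMapping` and also offers ls, rm, mv and exists. Everything named here (`MemoryStorage`, `open_storage`, the `memory://` scheme) is hypothetical and only for illustration; it is not Hub's or fsspec's API, and an in-memory dict stands in for S3/GCS so the example stays self-contained.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator, List, Optional
from urllib.parse import urlparse


class MemoryStorage(MutableMapping):
    """Hypothetical store: key-value mapping plus filesystem-style helpers."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    @staticmethod
    def _norm(path: str) -> str:
        # Keys are slash-separated paths without leading/trailing slashes.
        return path.strip("/")

    # --- MutableMapping interface ---
    def __getitem__(self, key: str) -> bytes:
        return self._data[self._norm(key)]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._data[self._norm(key)] = bytes(value)

    def __delitem__(self, key: str) -> None:
        del self._data[self._norm(key)]

    def __iter__(self) -> Iterator[str]:
        return iter(sorted(self._data))

    def __len__(self) -> int:
        return len(self._data)

    # --- filesystem-style helpers ---
    def exists(self, path: str) -> bool:
        p = self._norm(path)
        return p in self._data or any(k.startswith(p + "/") for k in self._data)

    def ls(self, path: str = "") -> List[str]:
        # Immediate children of `path`, similar in spirit to a listdir.
        p = self._norm(path)
        prefix = p + "/" if p else ""
        return sorted(
            {k[len(prefix):].split("/", 1)[0] for k in self._data if k.startswith(prefix)}
        )

    def rm(self, path: str) -> None:
        # Remove one key or everything under a prefix, similar in spirit to rmdir.
        p = self._norm(path)
        for k in [k for k in self._data if k == p or k.startswith(p + "/")]:
            del self._data[k]

    def mv(self, src: str, dst: str) -> None:
        s, d = self._norm(src), self._norm(dst)
        for k in [k for k in self._data if k == s or k.startswith(s + "/")]:
            self._data[d + k[len(s):]] = self._data.pop(k)


def open_storage(url: str, credentials: Optional[dict] = None) -> MutableMapping:
    """Hypothetical factory: url + credentials in, MutableMapping out."""
    scheme = urlparse(url).scheme
    if scheme == "memory":
        return MemoryStorage()  # credentials are not needed for the in-memory backend
    raise NotImplementedError(f"no backend sketched for scheme {scheme!r}")
```

A real S3 or GCS backend would implement the same methods on top of the cloud client, so callers would only ever see this one interface.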