PseudoFS with extra contracts #70
Comments
The metafile format assumes that all sectors in a file are stored on the same set of hosts, and that every host has all of the shards. So the format would need to be changed to accommodate this functionality. This definitely seems in the same vein as #69, which would also likely require a change to the metafile format. Perhaps this is the right catalyst to pursue it.
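As a rough illustration only, here is one hypothetical shape a metafile could take if the host set were recorded per chunk instead of once per file. This is not the actual us metafile format; every name below (HostKey, Chunk, MetaFile) is an assumption made up for the sketch.

```go
// Hypothetical sketch, NOT the real us metafile format: recording the host
// set per chunk is what would allow "heterogeneous chunks".
package meta

import "time"

// HostKey stands in for a host public key (e.g. hostdb.HostPublicKey in us).
type HostKey string

// Chunk describes one erasure-coded chunk and the hosts that store its shards.
type Chunk struct {
	Hosts     []HostKey  // Hosts[i] stores Shards[i]; may differ between chunks
	Shards    [][32]byte // Merkle roots of the sectors holding each shard
	MinShards int        // shards needed to reconstruct this chunk
	Length    int64      // plaintext bytes covered by this chunk
}

// MetaFile is a file-level header plus per-chunk host sets.
type MetaFile struct {
	Version  int
	Filesize int64
	ModTime  time.Time
	Chunks   []Chunk // each chunk may reference a different subset of hosts
}
```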
Since the number of files we have is not huge, we wouldn't mind if the metafile format changes. But migrating from …
The desired change -- maintaining a "buffer" of extra hosts, using only e.g. 30 of 35 hosts for each chunk, and allowing each chunk of a file to be stored on a different host set ("heterogeneous chunks") -- is significant. I'm not sure it can be achieved in a short timeframe. Here are some alternatives:
I will take a stab at a new metafile format that supports heterogeneous chunks. If I feel it can be done within a week or two, I'll go for it. If not, we should choose another option; option 1 seems to be the most viable.
Thanks for the extra feedback Luke, much appreciated. As an immediate solution, especially for the Clear Center Integration, decreasing the host-set size will relieve some stress on the system. That said, it's not a permanent or comfortable solution, but we can do it temporarily while we work on (4). I think (1) could work as well; however, it does not tolerate any more host failures after the working contract set is formed, so it won't be as resilient as (4), which, if I'm correct, would allow more than one host to fail mid-upload. That said, it might get us halfway there. I'll ask Junpei to provide some more feedback as well. Please keep us posted and let us know if you need any help. Thanks!
To my understanding, (1) means we create a PseudoFS with, say, 35 hosts, and when we store a file, PseudoFS selects 30 live hosts to upload the first sector. Then it uses the same hosts to upload the rest of the sectors. When we store another file, it re-selects 30 hosts that are alive at that moment. Is that correct? If so, I think (1) is a good option for now. We could migrate to PseudoKVS at some point, but we wouldn't be able to make it by the next release schedule.
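For concreteness, a minimal sketch of that per-file selection step might look like the following. The names are hypothetical: `Host`, `Ping`, and `selectLiveHosts` are not real us APIs, just stand-ins for "check which contracts are currently usable and take the first n of them".

```go
// Rough sketch of option (1): when a file is opened for writing, pick n live
// hosts out of the full contract set and keep that subset for the whole file.
package pseudofs

import (
	"errors"
	"time"
)

// Host is a stand-in for a session/contract handle in the real code.
type Host interface {
	Ping(timeout time.Duration) error // assumed liveness check, not a real us API
}

// selectLiveHosts returns the first n hosts that respond within timeout.
func selectLiveHosts(all []Host, n int, timeout time.Duration) ([]Host, error) {
	live := make([]Host, 0, n)
	for _, h := range all {
		if h.Ping(timeout) == nil {
			live = append(live, h)
			if len(live) == n {
				return live, nil
			}
		}
	}
	return nil, errors.New("not enough live hosts")
}
```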
Hmm, unfortunately I think (1) is less viable than I originally thought.
I think that is not a problem.
It means we could upload at least two sectors to the host set, and then we might be able to upload another sector to them. If the upload of the previous two sectors finished, say, an hour ago, we should assume that some hosts may have gone offline. However, since we call PseudoFile.Sync and PseudoFS.Close frequently, we can expect the host set to still be working.
Currently, PseudoFS requires all underlying hosts to be accessible in order to write a file and close the filesystem. However, that can rarely be expected, especially as the number of hosts increases. Since this is a network application, it is not a good idea to assume that all remote hosts are always reachable.
I'd propose that PseudoFS hold some extra hosts so that it can select live hosts for each operation (`flushSectors`). For example, if we need to upload files to 30 hosts, we create a PseudoFS with 35 contracts (hosts); `PseudoFS.flushSectors` then tries to acquire 30 hosts and can skip any hosts that don't respond. Also, if `h.Append(sector)` fails with a host, it can replace that host with one of the extra hosts. Then we can keep using the PseudoFS even if some hosts become temporarily unreachable. As a result, each sector may be stored on a different host set, but that isn't a problem because the metafile records which hosts have each sector. This doesn't seem to require a significant change, but it would increase reliability.
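As a minimal sketch of the host-replacement idea, assuming a hypothetical `Uploader` interface and `uploadWithSpares` helper (the real `flushSectors` in PseudoFS is more involved):

```go
// Sketch only: upload a sector's shards to `need` hosts out of a larger pool,
// falling back to a spare host whenever Append fails. The caller records the
// resulting (possibly different) host set for this sector in the metafile.
package pseudofs

import "errors"

// Sector and Uploader are stand-ins for the real us types.
type Sector [1 << 22]byte

type Uploader interface {
	Append(sector *Sector) ([32]byte, error)
}

// uploadWithSpares tries hosts from pool in order until `need` shards are stored.
func uploadWithSpares(pool []Uploader, shards []*Sector, need int) (map[int]Uploader, error) {
	if len(shards) < need {
		return nil, errors.New("not enough shards")
	}
	placed := make(map[int]Uploader, need) // shard index -> host that stored it
	next := 0                              // next candidate host in the pool
	for i := 0; i < need; i++ {
		stored := false
		for next < len(pool) {
			h := pool[next]
			next++
			if _, err := h.Append(shards[i]); err == nil {
				placed[i] = h
				stored = true
				break // shard i stored; move on to the next shard
			}
			// host failed or didn't respond: fall through and try a spare
		}
		if !stored {
			return nil, errors.New("ran out of spare hosts")
		}
	}
	return placed, nil
}
```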
It is related to #69.