Store multiple DataObjs for the same Block #12

Closed
jakobvarmose opened this issue Aug 18, 2016 · 12 comments

Comments

@jakobvarmose

Currently only one DataObj will be stored for any given Block. I think it's important to be able to store multiple DataObjs for the same Block.

Example:

Assume we add two files (alpha and beta) to the filestore, but only the DataObj for beta is stored. Later we modify or remove beta (either from the file system, or directly from the filestore). Now the data is gone even though alpha still contains the data we want, and it could be served to other peers. It also seems weird that a DataObj will be removed if you add another one that happens to contain the same data.

In practice we would probably not add two files that are completely identical, but they might still have some identical blocks. Or we might add two directories with some files appearing in both. Both of these scenarios seem realistic, and would cause the same problems.
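
To make the scenario concrete, here is a minimal Go sketch of an index that keeps every DataObj for a block instead of only the most recent one. The `DataObj` fields, the `Index` type, and the method names are hypothetical and simplified for illustration; they are not the filestore's actual types.

```go
package main

// DataObj is a simplified, hypothetical stand-in for a filestore entry:
// it records where a block's bytes live on disk.
type DataObj struct {
	FilePath string
	Offset   uint64
	Size     uint64
}

// Index maps a block hash to every known on-disk location of that block.
type Index map[string][]DataObj

// Add records a location for hash, skipping exact duplicates.
func (idx Index) Add(hash string, obj DataObj) {
	for _, existing := range idx[hash] {
		if existing == obj {
			return
		}
	}
	idx[hash] = append(idx[hash], obj)
}

// RemoveFile drops every entry that points into path and returns the
// hashes that no longer have any backing file.
func (idx Index) RemoveFile(path string) (orphaned []string) {
	for hash, objs := range idx {
		kept := objs[:0]
		for _, o := range objs {
			if o.FilePath != path {
				kept = append(kept, o)
			}
		}
		if len(kept) == 0 {
			delete(idx, hash)
			orphaned = append(orphaned, hash)
		} else {
			idx[hash] = kept
		}
	}
	return orphaned
}
```

With an index like this, removing beta leaves a shared block reachable through alpha; only blocks that existed solely in beta lose their backing.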

@kevina

kevina commented Aug 18, 2016

I somewhat agree. The main problem is that it complicates the implementation. In fact my add-dir (https://github.com/ipfs-filestore/go-ipfs/blob/master/filestore/examples/add-dir.sh) script ran into this issue due to only storing one DataObj per block.

@kevina

kevina commented Aug 18, 2016

There is also the issue of what to do when the same file is added under different paths. I don't want to force a physical path with no symbolic links, as sometimes a path with symbolic links is more robust than the physical location. The easiest thing would be to not care and just allow them both.

@kevina

kevina commented Aug 25, 2016

The reason I default to replacing the DataObj is so that invalid blocks can easily be fixed. For example, if you move a file you can fix the invalid blocks by just re-adding it. I am considering making this behavior configurable as one of "insert", "auto", or "replace", where "insert" will never overwrite a DataObj for a given hash, "auto" will only overwrite it if the block is invalid, and "replace" will always overwrite.
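
As a rough illustration of the three modes, the decision could be sketched in Go as below. The `Mode` type, the constant names, and `ShouldStore` are hypothetical names chosen for this sketch; how the filestore actually checks whether a block is valid is not shown here and is reduced to a boolean.

```go
package main

// Mode selects what happens when a DataObj already exists for a hash.
// The names mirror the "insert", "auto" and "replace" options above.
type Mode int

const (
	Insert  Mode = iota // never overwrite an existing DataObj
	Auto                // overwrite only if the existing block is invalid
	Replace             // always overwrite
)

// ShouldStore reports whether a new DataObj should be written.
// exists says whether a DataObj is already stored for the hash;
// existingValid says whether that stored entry still verifies.
func ShouldStore(mode Mode, exists, existingValid bool) bool {
	if !exists {
		return true // nothing to overwrite, so always store
	}
	switch mode {
	case Insert:
		return false
	case Auto:
		return !existingValid
	default: // Replace
		return true
	}
}
```

Under "auto", re-adding a moved file would repair the invalid entry, while a still-valid entry from another file would be left alone.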

kevina added this to the 0.20 milestone Aug 28, 2016
@kevina

kevina commented Oct 24, 2016

@jakobvarmose I decided to go ahead and try to implement this. Each filestore DataObj will now be uniquely identified by HASH/FILENAME//OFFSET, for example QmVr26fY1tKyspEJBniVhqxQeEjhF78XerGiqWAwraVLQH//somedir/hello.txt//0. This will add a lot of complexity to filestore operations, but I decided it was worth it.
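
The HASH/FILENAME//OFFSET layout can be sketched in Go as follows. This is only an illustration of the format as described: the `Key` type and `ParseKey` are hypothetical, and the sketch assumes FILENAME is an absolute path (which is why the example key has a double slash right after the hash) and that the hash itself contains no slash.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Key identifies one DataObj: a block hash plus the file and byte
// offset where that block's data is stored.
type Key struct {
	Hash     string
	FilePath string // assumed to be an absolute path
	Offset   uint64
}

// String renders the key as HASH/FILENAME//OFFSET.
func (k Key) String() string {
	return fmt.Sprintf("%s/%s//%d", k.Hash, k.FilePath, k.Offset)
}

// ParseKey splits a HASH/FILENAME//OFFSET string back into its parts.
// The hash ends at the first "/", and the offset follows the last "//".
func ParseKey(s string) (Key, error) {
	i := strings.Index(s, "/")
	j := strings.LastIndex(s, "//")
	if i < 0 || j <= i {
		return Key{}, fmt.Errorf("malformed key %q", s)
	}
	off, err := strconv.ParseUint(s[j+2:], 10, 64)
	if err != nil {
		return Key{}, fmt.Errorf("bad offset in key %q: %w", s, err)
	}
	return Key{Hash: s[:i], FilePath: s[i+1 : j], Offset: off}, nil
}
```

Because the hash comes first, all entries for one block sort next to each other, which keeps lookup by hash cheap even when several files contain the same block.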

The implementation is in progress; expect some code to test in a few days.

@kevina

kevina commented Oct 25, 2016

#23 is the start of an implementation. There is still more work to do. The aim is to have this ready by the end of the week.

@kevina

kevina commented Oct 29, 2016

@jakobvarmose the implementation in pull request #23 should be stable now and could use testing and feedback. Thanks.

@jakobvarmose
Author

jakobvarmose commented Oct 29, 2016

I want to say first that I have come to the realization that IPFS is too resource intensive for my purposes, so I have started to implement a leaner alternative. But I decided to try this out anyway, and I have some comments.

Why do you have to enable filestore before adding files? Can't it be enabled automatically or always be enabled?

It seems odd that you have to specify the hash instead of the filename when removing files.

And when you remove a file by its hash, only one of the matching blocks/files is removed, and which one seems to be chosen at random. This was definitely not what I expected.

It would also be good if you could specify a relative filename, but I'm not sure if this is technically possible.

@kevina

kevina commented Oct 29, 2016

Thanks for your feedback.

> Why do you have to enable filestore before adding files? Can't it be enabled automatically or always be enabled?

I specifically added this because it is an experimental feature.

> It seems odd that you have to specify the hash instead of the filename when removing files.

This is unavoidable unless you do a full database scan to search for blocks with the filename. I may still support it as it would be useful.

> And when you remove a file by its hash, only one of the matching blocks is removed, and which one seems to be chosen at random. This was definitely not what I expected.

This needs to be documented better. I will update the README.

> It would also be good if you could specify a relative filename, but I'm not sure if this is technically possible.

If you are referring to adding, not really, that is why I provide options to make absolute paths from relative ones.

For searching for files, I could provide that, but again it will be a full database scan.
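
A small Go sketch of why the two lookups differ in cost, assuming keys follow the HASH/FILENAME//OFFSET layout described earlier in this thread and are kept in sorted order. The function names are hypothetical and a plain string slice stands in for the real database.

```go
package main

import (
	"sort"
	"strings"
)

// FindByHash returns all keys for one block. Because keys start with
// the hash, a binary search finds the first match and the scan stops
// as soon as the prefix no longer matches.
func FindByHash(sortedKeys []string, hash string) []string {
	prefix := hash + "/"
	start := sort.SearchStrings(sortedKeys, prefix)
	var out []string
	for _, k := range sortedKeys[start:] {
		if !strings.HasPrefix(k, prefix) {
			break
		}
		out = append(out, k)
	}
	return out
}

// FindByFile returns all keys that refer to path. The filename sits in
// the middle of the key, so every key has to be inspected.
func FindByFile(keys []string, path string) []string {
	var out []string
	for _, k := range keys {
		i := strings.Index(k, "/")      // end of the hash
		j := strings.LastIndex(k, "//") // start of the offset suffix
		if i >= 0 && j > i && k[i+1:j] == path {
			out = append(out, k)
		}
	}
	return out
}
```

A separate filename-to-hash index would avoid the scan, at the cost of keeping a second structure consistent with the first.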

@jakobvarmose
Author

> This is unavoidable unless you do a full database scan to search for blocks with the filename. I may still support it as it would be useful.

Don't you have a list of all the root nodes? Then you can just iterate over the children, as they should be stored in the block.

> If you are referring to adding, not really, that is why I provide options to make absolute paths from relative ones.

Yes, I mainly meant when adding files.

@kevina

kevina commented Oct 29, 2016

> Don't you have a list of all the root nodes? Then you can just iterate over the children, as they should be stored in the block.

Not at the moment; that is something I might add in the future.

> > If you are referring to adding, not really, that is why I provide options to make absolute paths from relative ones.
>
> Yes, I mainly meant when adding files.

The reason I require the --physical or --logical option is that I want to give the user control over the path used to add the file. It may seem that it would be best to just follow all links (i.e. physical), but this may not create the most stable path for the file. For example, on my system /home/kevina/Videos points to /aux/media/Videos, but that location might change, so /home/kevina/Videos is the more stable location.

@kevina

kevina commented Oct 31, 2016

@jakobvarmose I rewrote the filestore rm command and added documentation. It should now behave more like you would expect.

Let me know what you think. Thanks.

@kevina

kevina commented Nov 7, 2016

This is now done. Closing.
