Deterministic photo links (e.g. hash / filename) #308

Open
twatzl opened this issue Jul 24, 2019 · 10 comments
Labels
enhancement (New feature or request), Project for volunteers (The team has no plans to work on it, e.g. lack of time, but an external contribution is accepted)

Comments


twatzl commented Jul 24, 2019

Hello,

I want to use Lychee as a photo library for my photo blog. I am going to host the blog in a dockerized environment at some service provider, but I would like to save the effort of having to back up the database. My idea was to just keep the photos in a folder on my PC, and if something happens to the server I would upload them again.

However, I have noticed that the links do not seem to be deterministic. If I start Lychee, upload some photos, delete everything, start a new Lychee instance and upload the same photos, the links will be different from the first time.

Of course this is a problem if I embed the photos and the links change afterwards.

Is there any way to configure Lychee so that the links given to the photos are deterministic? Is it, for example, possible to include the original picture name in the link (the names are mostly unique in my photo collection) or to have 'human readable' links?

Contributor

d7415 commented Jul 24, 2019

At the moment the photo URLs are based on the photo ID, which in turn is based on the upload time. It would be possible to change the ID to e.g. a file hash, but this would have an impact all over the codebase. A static link is more likely. As such I'll alter the title and move this to the Lychee-Laravel repository, where development is ongoing. Soon we will be migrating to that version for the v4 release.

d7415 transferred this issue from LycheeOrg/Lychee-v3 on Jul 24, 2019
d7415 changed the title from "Are the photo links deterministic or can they be configured to be deterministic?" to "Deterministic photo links (e.g. hash / filename)" on Jul 24, 2019
d7415 added the "enhancement" and "Project for volunteers" labels on Jul 24, 2019
Author

twatzl commented Jul 25, 2019

Thank you. I might look into it if I have time, but I don't want to promise anything.

Member

ildyria commented Jul 26, 2019

What you can do is:

  1. Take the hash of the file.
  2. Truncate it to its first 16 hexadecimal characters (64 bits). Because this is a plain truncation, it does not affect the randomness of the bits that are kept.
  3. Convert the hexadecimal string to an integer and use it as the ID.

Beware: after uploading about 2^32 pictures (~4 000 000 000) you have a high risk of collisions (two images having the same ID).

I would suggest you add a setting (disabled by default) which decides whether to use the time or the hash to generate the ID.

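// get the hash (random input for demonstration; in practice, hash the file)
$hash = sha1(rand());

// truncate to the first 16 hex characters (64 bits)
$v = substr($hash, 0, 16);

// convert to a decimal string
// note: base_convert() may lose precision on numbers this large
$va = base_convert($v, 16, 10);

// print hash (substr)
echo $v;
echo "||";
// print int
echo $va;
echo "||";
// check that it fits in 64 bits
echo log($va, 2);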
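
The three steps above can also be sketched in Python rather than Lychee's PHP, hashing the actual file bytes instead of a random number. This is only an illustration: the function name `photo_id_from_bytes` is hypothetical and not part of Lychee, and SHA-1 is used only because the snippet above uses it.

```python
import hashlib


def photo_id_from_bytes(data: bytes) -> int:
    """Derive a deterministic 64-bit ID from file contents (hypothetical helper)."""
    digest_hex = hashlib.sha1(data).hexdigest()  # 40 hexadecimal characters
    truncated = digest_hex[:16]                  # first 16 hex characters = 64 bits
    return int(truncated, 16)                    # exact: Python ints do not lose precision


first = photo_id_from_bytes(b"example image bytes")
second = photo_id_from_bytes(b"example image bytes")
print(first == second)            # identical content yields the identical ID
print(first.bit_length() <= 64)   # the ID fits in 64 unsigned bits
```

The result is an unsigned 64-bit value, so it can exceed the range of a signed 64-bit integer; storing it in a signed column would need extra care.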

Author

twatzl commented Jul 28, 2019

Slowly but surely I think I understand the solution. The basic idea is simple to understand, but I am trying to make sure that I also understand the details. However what I don't understand is why you would think that after 4 mio. pictures there would be a high risk of collisions?
Is this a rule of thumb that after 50 percent of the keys are used the risk of collision gets higher?

On the other hand how likely is it for someone to really have 4 mio pictures? Or that Lychee would still be able to handle that much.

Contributor

d7415 commented Jul 29, 2019

> after 4 mio. pictures there would be a high risk of collisions?

4 billion

> Is this a rule of thumb that after 50 percent of the keys are used the risk of collision gets higher?

This might be a good start. According to that, at 4 billion photos the risk of a collision is about 50%.

> On the other hand how likely is it for someone to really have 4 ~~mio~~ billion pictures?

Not very, but worth considering.

> Or that Lychee would still be able to handle that much.

Depending on the resources available and how they were distributed, I could see this working. I'm a little tempted to work out some stress test once the CLI import is ready and I'm familiar with it...

Member

ildyria commented Jul 29, 2019

> Is this a rule of thumb that after 50 percent of the keys are used the risk of collision gets higher?

To complete what @d7415 said: it is not 50% of the keys. The number of possible keys is 2^64 and 4 billion is about 2^32, so it is half the exponent, not half the key space. Half the key space would be 2^63. ;)
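
To put rough numbers on this, here is a small Python sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)). It assumes the IDs behave like uniformly random 64-bit values; the function name `collision_probability` is made up for this example. Under that approximation the chance of at least one collision at 2^32 photos comes out at roughly 39%, the same order of magnitude as the "about 50%" quoted above, and it passes 50% at around 5 billion photos.

```python
import math


def collision_probability(n_items: int, id_bits: int) -> float:
    """Approximate chance that at least two of n_items random IDs collide."""
    space = 2 ** id_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N))
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


for n in (10**6, 2**32, 2**33):
    print(n, collision_probability(n, 64))
```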

Contributor

kamil4 commented Jul 29, 2019

> Depending on the resources available and how they were distributed, I could see this working. I'm a little tempted to work out some stress test once the CLI import is ready and I'm familiar with it...

Well, we do get complaints from people with 1000+ photos in an album but my guess is that it's the front end that's the bottleneck...

Author

twatzl commented Jul 29, 2019

Oh yeah sorry. Miscounted a few of the zeros yesterday. So if it is 4 billion then it is even less likely. I think I am taking photos now since 2013. I am taking many but i have not yet reached the 1 mio. mark.

However I think regardless of the count there would have to be some mechanism to detect and avoid identical hashes. What do you think?

Member

ildyria commented Jul 29, 2019

> However I think regardless of the count there would have to be some mechanism to detect and avoid identical hashes. What do you think?

You mean like this one? 😆
https://github.com/LycheeOrg/Lychee-Laravel/blob/master/app/ModelFunctions/PhotoFunctions.php#L491

Member

ildyria commented Jul 29, 2019

> Oh yeah sorry. Miscounted a few of the zeros yesterday. So if it is 4 billion then it is even less likely. I think I am taking photos now since 2013. I am taking many but i have not yet reached the 1 mio. mark.

You may assume a multi-user set-up. With a lot of users you can rack up a lot of pictures pretty quickly. :)

Also note that we are safe because we are using a 64-bit index. If we used a normal 32-bit one, the threshold would drop to 65 536 pictures...
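
The two thresholds mentioned in this thread follow the same rule of thumb: collisions become likely at around the square root of the key space, i.e. 2^(bits/2) items. A tiny Python sketch of that arithmetic (the function name `birthday_threshold` is only for illustration):

```python
def birthday_threshold(id_bits: int) -> int:
    """Rule-of-thumb item count at which collisions become likely: 2^(bits/2)."""
    return 2 ** (id_bits // 2)


print(birthday_threshold(32))  # 65536
print(birthday_threshold(64))  # 4294967296
```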

4 participants