Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update popularity.php #1436

Closed
wants to merge 1 commit into from
Closed

Conversation

chrisjung-dev
Copy link

@chrisjung-dev chrisjung-dev commented Apr 30, 2018

Reflect GDPR and store hashed IPs instead of the real IP

Reflect GDPR and store hased IPs instead of  the real IP
@OleVik
Copy link
Contributor

OleVik commented Apr 30, 2018

I've not read the entirety of the GDPR or it's legal provisions, but does it specify that personal identifiable data is acceptable once obscured? Prior EU law has only required that websites inform visitors about what data is gathered, typically through listing it on a privacy policy page.

@chrisjung-dev
Copy link
Author

You may extend my humble patch, but AFAIK, there is also room to do pseudonymization, which would be done with that.

You could add some salt as well. I'm just not able to to this at this point

@OleVik
Copy link
Contributor

OleVik commented Apr 30, 2018

A court ruling by the CJEU maintains that IP addresses are personal information, and Recital 26 of the GDPR confirms that pseudonymization would be sufficient.

Though the effect of hashing an IP through Sha512 won't be noticeable on small to medium sized sites, on a large size it is an ill-afforded inefficiency. This will of course also vary by the server running Grav. Since many of the available algorithms in PHP are unnecessary slow or insecure, I ran a test with MD5, SHA1, SHA256, and SHA512 - running 100.000 iterations over the string "255.255.255.255". On PHP 7.0 this results:

md5: eea88cd0d9a7ba26282fc786713bbbb6
0.061107873916626 sec
sha1: d6a4eed80594f0806043581292d64483ebc5d34f
0.084007978439331 sec
sha2 (256): f45462bf3cd12ea2b347f32f6c4d0a0d36e01694de332b307af90d42951c5bd6
0.12620997428894 sec
sha2 (512): 45bc52e2289ad3244f923867001c4eef57668eb49a68087fc89348f5ec135b0fd34b1ab6eb650b12ecd424b0fa80eb27def49befa1672edc8f108bc873d7dce9
0.16685509681702 sec

So at the very least the end-user should have a choice between a handful of algorithms, if Popularity is enabled, but I agree with the sentiment @campino2k - this would legally be required under EU law.

Given the nature of the data, defaulting to a lower-level hash such as MD5 or even MD4 - though both are cryptographically insecure, they perform well with a more comprehensive test on the same system - could be considered adequate anonymization where decrypting a large amount of them would in any case be inefficient.

@chrisjung-dev
Copy link
Author

MD5 is out of the game, since collisions are found at similar inputs (which IPv4 is). sha1 is technically broken and the current recommendateion is to use at least 256-hashing.

@mahagr
Copy link
Member

mahagr commented May 4, 2018

I think that SHA1 would be good enough because of there's really no benefit if you could guess the IP address. I see no benefit of using longer hashes to waste disk space. :)

@chrisjung-dev
Copy link
Author

What about using SHA1( IP + date("ymd")) as basis?

@rhukster
Copy link
Member

I think SHA1 is plenty good enough for this pupose.

@rhukster
Copy link
Member

Updated manually.. thanks.

@rhukster rhukster closed this May 15, 2018
rhukster added a commit that referenced this pull request May 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants