Protected web archive shows email addresses in search despite the "only show gecos" parameter #1312

Closed
SansPseudoFix opened this issue Jan 6, 2022 · 4 comments · Fixed by #1314

@SansPseudoFix

Version

6.2.60

Expected behavior

Email addresses shouldn't be displayed on search result pages.

Actual behavior

With "Protect web archive against spam harvesters" set to "only show gecos", searching for a name in the archive advanced search displays email addresses.

If you browse the archives (without searching), email addresses are replaced as expected.

@ldidry
Contributor

ldidry commented Jan 6, 2022

The problem seems to be that the search runs against the archived source files of the messages, which are not anonymised.

I don't know which solution would be best: anonymising the message sources, or only anonymising the message snippets returned by the search. What do you think?
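
To make the second option concrete, here is a minimal sketch in Python of masking addresses in a snippet before it is shown. It is only an illustration, not Sympa's actual (Perl) code: the function name `anonymise_snippet`, the placeholder text, and the deliberately simple address pattern are all assumptions.

```python
import re

# Simplified pattern for illustration only; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymise_snippet(snippet: str, placeholder: str = "[address hidden]") -> str:
    """Return the snippet with anything that looks like an email address replaced."""
    return EMAIL_RE.sub(placeholder, snippet)


print(anonymise_snippet("From: Jane Doe <jane.doe@example.org>"))
# prints: From: Jane Doe <[address hidden]>
```

With this approach the search would still match against the unmodified sources, and only the displayed text would change.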

@ikedas
Member

ikedas commented Jan 7, 2022

I guess:

  • If the archives are restricted, i.e. login is required to browse them, the web_archive_spam_protection setting may be omitted, because the archives are already protected.
  • On the other hand, if the archives are not restricted and the protection setting is applied, email addresses in the archives should also not be searchable.

@SansPseudoFix, is my understanding correct?

@SansPseudoFix
Author

Well, I'm not personally affected (it's an issue reported by a Framalistes user), but I guess the person would prefer to be totally anonymous (even if the archives are restricted).

@ikedas
Member

ikedas commented Jan 12, 2022

Currently, the visibility of email addresses in web archives is defined as follows.

| web_archive_spam_protection | visibility of emails in web archives | code |
| --- | --- | --- |
| cookie (default) | Always, however if the user hasn't logged in, they must confirm. | [1][2] |
| javascript | Always, however emails are obfuscated. | [3] |
| at | Always, however "@" characters are replaced. | [3] |
| concealed (6.2.62 and later) | Never: emails are completely removed. | [3] |
| none | Always. | [3] |

concealed is the only setting that makes addresses completely invisible.
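
For reference, the table above can be written down as a small lookup in Python. This is just a transcription for illustration, not Sympa's code; the names `VISIBILITY` and `settings_hiding_addresses` are made up for this sketch.

```python
# Setting -> (visibility, caveat), transcribed from the table above.
VISIBILITY = {
    "cookie": ("always", "users who are not logged in must confirm"),
    "javascript": ("always", "emails are obfuscated"),
    "at": ("always", '"@" characters are replaced'),
    "concealed": ("never", "emails are completely removed"),
    "none": ("always", ""),
}


def settings_hiding_addresses() -> list[str]:
    """List the settings under which addresses are never shown."""
    return [mode for mode, (visibility, _) in VISIBILITY.items() if visibility == "never"]


print(settings_hiding_addresses())
# prints: ['concealed']
```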

Below is my suggestion.


I think concealed should probably hide addresses only from users who are not logged in (just as cookie asks only users who are not logged in to confirm). I think we should do the same for searching emails in archives: email addresses should be unsearchable only when the user is not logged in.
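
The suggested rule could be sketched as follows in Python. This is a rough illustration of the proposal, not the code that was eventually merged. The function names are hypothetical, and the sketch assumes the search rule applies only under concealed, which the comment does not spell out.

```python
def address_visible(mode: str, logged_in: bool) -> bool:
    """Proposed rule: 'concealed' hides addresses only from users who are not logged in."""
    if mode == "concealed":
        return logged_in
    # Every other setting shows addresses in some form (possibly obfuscated).
    return True


def address_searchable(mode: str, logged_in: bool) -> bool:
    """Assumption: searching follows the same rule as display."""
    return address_visible(mode, logged_in)


print(address_searchable("concealed", logged_in=False))  # prints: False
print(address_searchable("concealed", logged_in=True))   # prints: True
```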

ikedas closed this as completed in 56a03cd Mar 24, 2022
ikedas added a commit that referenced this issue Mar 24, 2022
…ve-search by ldidry & ikedas

Protect email addresses in archive search (#1312)
ikedas added this to the 6.2.70 milestone Jun 9, 2022

3 participants