Some ads, like these two:

http://pagead2.googlesyndication.com/pagead/imgad?id=CICAgKDj2__aTRABGAEyCG3qbJztYSV0
https://tpc.googlesyndication.com/pagead/imgad?id=CICAgKDj2__aTRABGAEyCG3qbJztYSV0

are actually the same image served from different sources (with different URLs). We want them stacked (as duplicates) to prevent cases like this:

We might try to compare them first by dimensions, then by file size, possibly even by their ID (as in this example, where the subdomain differs but the ID is identical), and finally, if none of these is reliable enough, compare them bitwise to make sure they're one and the same.
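A minimal sketch of that cascade, assuming the raw bytes of both images have already been fetched (Pillow is used only to read dimensions, and the function names are made up for illustration):

```python
import io
import urllib.parse

from PIL import Image  # pip install Pillow


def imgad_id(url):
    """Extract the id query parameter (e.g. from googlesyndication imgad URLs)."""
    query = urllib.parse.urlparse(url).query
    return urllib.parse.parse_qs(query).get("id", [None])[0]


def same_ad(url_a, bytes_a, url_b, bytes_b):
    # Cheap checks first, in the order proposed above.
    if Image.open(io.BytesIO(bytes_a)).size != Image.open(io.BytesIO(bytes_b)).size:
        return False
    if len(bytes_a) != len(bytes_b):
        return False
    # Identical IDs across different subdomains are a strong hint.
    id_a, id_b = imgad_id(url_a), imgad_id(url_b)
    if id_a is not None and id_a == id_b:
        return True
    # Otherwise fall back to a bitwise comparison to be certain.
    return bytes_a == bytes_b
```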
If we are going to store all of the ads locally anyway, why don't we just hash each image during storage and add the hash to the metadata, along with the site linked to and the site it was found on? Then we can check new ads against existing ones as they are added to the database. Alternatively, we could have users report duplicates and maintain a server-side list of URLs that are equivalent to each other, plus auto-generated rules (e.g. [http://*]=[https://*],
[pagead2.googlesyndication.*]=[tpc.googlesyndication.*])
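The hash-on-storage idea could look something like this rough sketch (standard-library hashlib only; the storage layer and field names are invented for illustration, not taken from the extension):

```python
import hashlib

ads_by_hash = {}  # content hash -> record for the first ad stored with it


def store_ad(image_bytes, site_linked_to, site_found_on):
    """Stack an incoming ad onto an existing record if its bytes are already stored."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    record = ads_by_hash.get(digest)
    if record is not None:
        # Same image already in the vault: stack it as a duplicate.
        record["found_on"].add(site_found_on)
        return record
    record = {
        "hash": digest,
        "linked_to": site_linked_to,
        "found_on": {site_found_on},
        "image": image_bytes,
    }
    ads_by_hash[digest] = record
    return record
```

The rule-based alternative would work the same way, except each URL would first be rewritten to a canonical form (e.g. http to https, pagead2 to tpc) before being compared.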
I have little to no actual programming experience (I can solve simple logical tasks in Python, but not build a fully-fledged application), so I was just suggesting a few ways to accomplish your suggestion. Given that some ads are actually the same but come in different shapes and sizes, or even with different text/pictures (see attachments), I think adding a button for users to mark adverts as duplicates would be the most useful solution: it would save the processing power needed to hash the images, while letting those who view their vault frequently (and so actually care about it; many users are no doubt not interested in viewing ads on websites or in their own free time) tidy it up.
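Purely to illustrate the "mark as duplicates" button idea: user-confirmed pairs could be kept in a union-find structure so that whole groups of ads stack together once linked. This is only a sketch; the ad IDs and function names are hypothetical, not anything from the actual codebase.

```python
# Hypothetical sketch: group user-marked duplicates with union-find.
parent = {}  # ad id -> parent ad id (roots point to themselves)


def find(ad_id):
    """Return the representative ad of ad_id's duplicate group."""
    parent.setdefault(ad_id, ad_id)
    while parent[ad_id] != ad_id:
        parent[ad_id] = parent[parent[ad_id]]  # path halving
        ad_id = parent[ad_id]
    return ad_id


def mark_duplicates(ad_a, ad_b):
    """Called when the user flags two vault entries as the same ad."""
    parent[find(ad_a)] = find(ad_b)


def stacked_together(ad_a, ad_b):
    return find(ad_a) == find(ad_b)
```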