This repository has been archived by the owner on Nov 27, 2018. It is now read-only.

Sybil attack to get user share revenue #1

Closed
dimitri-xyz opened this issue Apr 18, 2016 · 9 comments

@dimitri-xyz

The current version of the documentation is (in my view) not yet sufficiently clear. I had a very hard time making sense of it. So, this is a preliminary assessment. A more formal outline would be useful.

This is a design bug. A malicious adversary can make 1000 "fake" ad-replacement personas that only browse the web once a month and see only 1 impression each. Through the current proposed revenue share model and those personas, the adversary would get 1000 times the revenue of the average user.

This attack shows that dividing the user revenue on a "per user" basis is subject to manipulation. I understand that the underlying motivation for this model is that:

In order to enhance privacy, the payment to each ad-replacement persona are calculated independently of the actual impressions served to each persona -- Brave Software does not keep track of which personas are served which impressions.

However, I believe the principle can still be achieved by aggregation that is not subject to this attack. One could consider a "per impression" rather than "per user" revenue share model where the number of impressions is bundled in multiples of 128 (for example) to still allow for anonymity.
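To make the difference concrete, here is a minimal Python sketch comparing the two models. The function names, the pool size, and the choice to round each persona's impressions down to whole bundles of 128 are illustrative assumptions, not part of the Brave specification.

```python
def per_user_shares(pool, impressions_by_persona):
    """Split the pool equally across personas, ignoring impression counts."""
    n = len(impressions_by_persona)
    return {persona: pool / n for persona in impressions_by_persona}


def per_impression_shares(pool, impressions_by_persona, bundle=128):
    """Split the pool in proportion to impressions, counted in whole bundles.

    Assumption: impressions are rounded down to a multiple of `bundle`,
    so a persona with fewer than `bundle` impressions earns nothing.
    """
    bundles = {p: count // bundle for p, count in impressions_by_persona.items()}
    total = sum(bundles.values())
    if total == 0:
        return {p: 0.0 for p in bundles}
    return {p: pool * b / total for p, b in bundles.items()}


# 100 honest personas with 1280 impressions each, plus 1000 fake personas
# that each see a single impression (the attack described above).
honest = {f"honest-{i}": 1280 for i in range(100)}
sybils = {f"sybil-{i}": 1 for i in range(1000)}
personas = {**honest, **sybils}

per_user = per_user_shares(1100.0, personas)
per_impression = per_impression_shares(1100.0, personas)

attacker_per_user = sum(per_user[p] for p in sybils)
attacker_per_impression = sum(per_impression[p] for p in sybils)
```

Under these assumed numbers, the per-user split pays the attacker 1000 times what a single honest persona receives, while the bundled per-impression split pays the fake personas nothing.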

@burdges

burdges commented Apr 18, 2016

Just a related aside:

Anonize seemingly does not provide post-quantum anonymity, while a simple blind signature scheme does. That's certainly no show stopper, but it should give you pause when you notice it touches such a huge dataset as "browsing history". Impressions might let you use blind signatures, which reduces complexity anyway.

It's subtle though: if you needed, say, Brands' blind signatures to prevent double spending asynchronously, then Anonize is probably better. It's far more likely that Brands' double spending protection would deanonymize users on the spot, maybe even in an attackable way, than that someone will build a quantum computer in 20 years and read everyone's old browsing histories from Anonize logs.

I suppose blind signatures require infrastructure you cannot realistically deploy initially, but maybe worth keeping in mind longer-term.

@mrose17 mrose17 self-assigned this Apr 22, 2016
@mrose17
Member

mrose17 commented Apr 22, 2016

@dimitri-xyz - i agree with your meta-comment that the documentation is somewhat unclear, but i'm not sure what a "more formal outline" would be. could you sketch that out? you may want to submit a new issue just on the clarity issue...

thanks for the comment. i think it is a "non-issue" though, and here's why: in order for a user to take money out of the system, they have to have a verified bitcoin wallet, and the ledger allows only one user to make use of a particular phone number and/or a particular email address. the text is missing the word "unique", sorry!

@burdges - keep in mind that it's a browsing summary not a browsing history, i have updated the current version of the specification to make this clearer (i hope).

@abhvious - could you comment on @burdges note on post-quantum anonymity? i'm way out of my depth on that!

@dimitri-xyz & @burdges - please keep those comments coming!

my thanks!

/mtr

@dimitri-xyz
Author

@mrose17 - I will create a separate issue on the documentation clarity or make a pull request with a few suggestions then. It sounds appropriate to split them up.

On the Sybil attack, it seems your argument hinges on the assumption that email addresses and/or phone numbers are expensive to get. I am not sure about phone numbers, but it is very cheap to generate a large number of email addresses.

I have any username at my domains (e.g. dimitriexample.com) being forwarded to a single large junk inbox (that I never really check). But this means that I can generate email addresses on the fly simply by generating random strings and concatenating the suffix '@dimitriexample.com'. This should show that requiring multiple email addresses will not prevent this attack.
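A minimal Python sketch of that point, assuming a catch-all domain like the one above. Both function names are hypothetical, and `passes_uniqueness_check` only stands in for a "one persona per email address" rule; it is not the ledger's actual check.

```python
import secrets


def catch_all_addresses(domain, count):
    """Return `count` distinct addresses under a catch-all domain."""
    addresses = set()
    while len(addresses) < count:
        # Random local part; the catch-all inbox accepts any username.
        addresses.add(f"{secrets.token_hex(8)}@{domain}")
    return sorted(addresses)


def passes_uniqueness_check(addresses):
    """Hypothetical rule: every persona must use a different email address."""
    return len(set(addresses)) == len(addresses)


addresses = catch_all_addresses("dimitriexample.com", 1000)
all_unique = passes_uniqueness_check(addresses)
```

Every generated address is distinct, so a uniqueness rule on its own is satisfied at essentially zero cost to the attacker.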

If you plan to require every user to be reachable by a distinct phone number and the cost of phone numbers is high to the attacker (the cost of the phone numbers must be higher than the expected Brave reward, otherwise it may still be profitable), then you may thwart this attack. But I am worried this solution might just be a temporary hack and that requiring a phone number might exclude a significant number of real users.
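The break-even condition in that paragraph can be written out as a short sketch. The function names and all amounts are made-up illustrations (in cents); neither the real cost of an SMS-capable phone number nor the real Brave reward is known here.

```python
def sybil_profit(personas, reward_per_persona, cost_per_identity):
    """Attacker's net gain: each fake persona earns a reward and costs one identity."""
    return personas * (reward_per_persona - cost_per_identity)


def attack_is_profitable(personas, reward_per_persona, cost_per_identity):
    """The attack pays off only while the reward exceeds the identity cost."""
    return sybil_profit(personas, reward_per_persona, cost_per_identity) > 0


# Assumed amounts in cents: a 50-cent reward per persona.
with_free_emails = sybil_profit(1000, 50, 0)        # identities cost nothing
with_paid_numbers = sybil_profit(1000, 50, 200)     # each phone number costs 200
```

With free email addresses the assumed attacker nets the full reward for every persona; once each identity costs more than the reward, the same attack loses money.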

I hope this helps! Thanks for the initiative :-)

@burdges

burdges commented Apr 22, 2016

There is one Sybil attack that goes roughly as follows:

  • Create millions of personas using crap email addresses or VoIP providers.
  • Browse in ad-replacement to build up tiny balances.
  • Switch them into ad-free mode and browse only your own sites.

All ad networks face click fraud so this is nothing unusual and gets priced into ad costs. If anything, Brave seemingly makes click fraud slightly more complex and detectable.

Just spec out the plausible Sybil attacks. And build bots to detect weird Sybil-ish behavior like this once real money starts changing hands.
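As one possible starting point for such a bot, here is a rough Python sketch of a single heuristic: flag publishers that are funded by many personas who fund nobody else. The function name, the input format, and both thresholds are assumptions made for illustration, and whether the ledger could even see this data without weakening the privacy goals is an open question.

```python
from collections import defaultdict


def flag_publishers(contributions, min_personas=100, min_exclusive_share=0.9):
    """Flag publishers funded mostly by personas that fund no one else.

    `contributions` is an iterable of (persona_id, publisher_id) pairs.
    """
    publishers_by_persona = defaultdict(set)
    for persona, publisher in contributions:
        publishers_by_persona[persona].add(publisher)

    funders = defaultdict(set)
    for persona, publishers in publishers_by_persona.items():
        for publisher in publishers:
            funders[publisher].add(persona)

    flagged = []
    for publisher, personas in funders.items():
        # Personas whose only funded publisher is this one.
        exclusive = sum(1 for p in personas if len(publishers_by_persona[p]) == 1)
        if len(personas) >= min_personas and exclusive / len(personas) >= min_exclusive_share:
            flagged.append(publisher)
    return sorted(flagged)


# 1000 fake personas fund one site; 200 ordinary personas fund two sites each.
contributions = (
    [(f"sybil-{i}", "attacker.example") for i in range(1000)]
    + [(f"user-{i}", "news.example") for i in range(200)]
    + [(f"user-{i}", "blog.example") for i in range(200)]
)
flagged = flag_publishers(contributions)
```

A real detector would need more signals than this, since a small legitimate site with loyal readers would look similar; the sketch only shows the shape of the check.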

If it ever gets really messy, then do not let personas withdraw or convert from ad-replacement funds to ad-free funds, but only let ad-replacement personas contribute their funds to charities, rights organizations like the EFF, etc.

Afaik, there is no reason to require that personas have an email, phone, etc. either, just deal with it like everyone else deals with click fraud.

@mrose17
Member

mrose17 commented Apr 23, 2016

@dimitri-xyz & @burdges - the tension is that there actually is a cost (not a lot, but not insignificant) to having "control" of a phone number that does SMS, and it is effectively a requirement of AML/KYC (in the US, at least). email addresses can be amortized to the extent that they are actually free, but not phone numbers.

the cost varies, we won't know for sure until we try it, etc., but it is currently believed that it will not be cost effective for a "bad guy" to participate. (famous last words, obviously... but expect that these assumptions will be fine-tuned and watched very, very carefully).

the current thinking is that if you don't verify, you can "plow" the amounts back to the publishers of your choice, but that if you want to take funds out, you have to verify. that's what i think the spec says. if i am wrong on that, please let me know!

best,

/mtr

@dimitri-xyz
Author

@mrose17 Doesn't that then give the attacker the final step he needs? Rather than pulling the money out (thus being subject to KYC and requiring access to 1000 phone numbers), he makes a single fake publisher profile and then plows all the money into this single publisher. Now he does not need the phone numbers and can get the money. Voilà!

@mrose17
Member

mrose17 commented Apr 26, 2016

except that publishers have a stricter verification process than others... viz.,

potentially large amounts are transferred to publishers -- so verification is more extensive, depending on the size and frequency of payments, e.g., similar to the verification spectrum seen for DV, OV, and EV certificates.

@dimitri-xyz
Author

I don't want to belabor the point. So, I'll just make a couple more comments:

  1. The attacker's publisher is "legit" (like any small blogger), and the extra verification costs would have to be very significant to outweigh the 1000x multiplier.
  2. I do believe we can deal with this situation if it arises. At the same time, I still think that a "per user" revenue share model is more likely to see this attack than a "per impression" model.

@mrose17
Member

mrose17 commented Apr 27, 2016

hi. those are fair points. i agree that an impression-based model is more accurate (and has better auditing capabilities), but the concern is that achieving privacy is much more difficult. an earlier version of the specification was based on impressions, but it requires too many levels of "double-blinding" to be satisfying with respect to being both accurate/auditable and private. i suspect that we'll have to revisit this later this year...

@mrose17 mrose17 closed this as completed Aug 19, 2016
mrose17 added a commit that referenced this issue Sep 9, 2016

3 participants