
Recommendation Algorithm Manipulation via mass blocks #1386

Open
redknightlois opened this issue Apr 2, 2023 · 12 comments

Comments

@redknightlois

The current implementation allows coordinated groups to damage an account's reputation without recourse. More generally, all of the global penalties are prone to being gamed. At another time I would have reported this through a vulnerability channel, but since this is already common knowledge there is no point in doing so.

The reason is that there is nothing a user can do to get rid of it because:

  • The user can't know that they are being penalized.
  • The user can't revert the penalty, because avoiding it is not within their control.
  • The penalties accumulate and outlive the actual tweet.
  • No matter how much you boost, with enough people applying enough signals (there are many) the multiplier gets incredibly low (a toy sketch of this compounding follows below).
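To make the compounding concrete, here is a toy Scala illustration. This is not Twitter's scoring code and the per-signal discounts are made-up numbers; it only shows how even small multiplicative penalties, applied per block/mute/report, drive a visibility multiplier toward zero once a coordinated group piles on.

// Toy illustration only: hypothetical per-signal discounts, not values from this repo.
object PenaltyToy {
  val discountPerBlock = 0.995  // assumed discount applied per block received
  val discountPerMute = 0.998   // assumed discount applied per mute received
  val discountPerReport = 0.990 // assumed discount applied per abuse/spam report

  def visibilityMultiplier(blocks: Int, mutes: Int, reports: Int): Double =
    math.pow(discountPerBlock, blocks) *
      math.pow(discountPerMute, mutes) *
      math.pow(discountPerReport, reports)

  def main(args: Array[String]): Unit = {
    // 500 coordinated blocks, 500 mutes and 100 reports:
    println(visibilityMultiplier(500, 500, 100)) // ≈ 0.01, i.e. effectively invisible
  }
}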

To Reproduce
Organize a botnet or a group of people with known similar views.
Request your followers to block someone for 'reasons' (it doesn't matter here whether the reasons are valid or not). This is already exploited by political parties, group-think, etc. Now that this is also public knowledge, the vulnerability is plainly obvious.

Examples (included to show the behavior does exist, not to single out these users; I had plenty to choose from):

https://twitter.com/BlockTheBlue
https://twitter.com/ayybeary/status/1642280442047995906
https://twitter.com/Kaptain_Kobold/status/1642379706925477888
https://twitter.com/MAYBEEELI/status/1642300879649792004
https://twitter.com/glenda_aus/status/1642282010462007296

There are apps that allow you to build/organize/weaponize this behavior.

While it has already been shut down, these are some of the stats for BlockTogether:

  • 303k registered users.
  • 198k users subscribing to at least one list.
  • 4.5k users offering a list, with at least one subscriber.
  • 3.7B actions.

Steps to reproduce the behavior:

  1. Organize a group with a few friends (I have groups with 40+).
  2. Find a target, and execute the following tasks in order.
  3. They should follow in preparation; a few days later, unfollow first [just doing this in 90-day intervals also hurts].
  4. Then they will report a few "borderline" posts.
  5. Then they will mute.
  6. Then they will block.

Expected behavior
No global penalty should be applied, because global penalties can be gamed quite easily; all penalties (if any) should be applied at the content level (a hypothetical sketch of what that could look like follows below).
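As one hypothetical way to express this, penalties could be keyed by the offending tweet and expire with it instead of accumulating on the author's account. This is only a sketch of the idea, not code from this repo; the types and the 90-day window are assumptions.

// Hypothetical sketch: negative signals attach to a tweet and age out with it,
// leaving the author's account-level reputation untouched.
case class NegativeSignal(tweetId: Long, kind: String, ageDays: Int)

def tweetPenalty(signals: Seq[NegativeSignal]): Double = {
  // only signals from the last 90 days (assumed window) still count
  val live = signals.filter(_.ageDays <= 90)
  1.0 / (1.0 + live.size) // toy damping applied to this tweet only
}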

@jbauernberger

Someone did a CVE already. What a time to be alive.
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-29218

@chaoscode

So we are opening CVEs for cancel culture now?

@YellowAfterlife

YellowAfterlife commented Apr 5, 2023

So we are opening CVEs for cancel culture now?

A little different - currently the algorithm weights (by how much?) discovery based on the number of times a user has been blocked and seemingly without an "expiration" (unlike unfollows), meaning that

  1. Users may permanently tank an account's visibility by muting/blocking it en masse, which is particularly interesting since the target might never notice (especially with mutes).
    [issue text and CVE describe this]
  2. Unless you only post the most lukewarm content¹, your visibility may decline "organically" over the years.
    You told someone that they had their facts wrong in 2011 and they muted/blocked you in response? This will cost you, even a decade later.

¹ Though even then, you might mute an account simply because Twitter suggests it to you too often in the algorithmic feed. Have I unknowingly contributed to the downfall of several "funny animal pictures" accounts? Oh no

I can think of problems that the current implementation solves, but if this is really how the code behaves, it's probably going to be abused to hell by interested parties.

Relevant code:

// note: blocks, mutes and abuse/spam reports are read with no age filter here (unlike unfollows below)
val blocks: SCollection[InteractionGraphRawInput] =
  GraphUtil.getFlockFeatures(
    readSnapshot(FlockBlocksEdgesScalaDataset, sc),
    FeatureName.NumBlocks,
    endTs)
val mutes: SCollection[InteractionGraphRawInput] =
  GraphUtil.getFlockFeatures(
    readSnapshot(FlockMutesEdgesScalaDataset, sc),
    FeatureName.NumMutes,
    endTs)
val abuseReports: SCollection[InteractionGraphRawInput] =
  GraphUtil.getFlockFeatures(
    readSnapshot(FlockReportAsAbuseEdgesScalaDataset, sc),
    FeatureName.NumReportAsAbuses,
    endTs)
val spamReports: SCollection[InteractionGraphRawInput] =
  GraphUtil.getFlockFeatures(
    readSnapshot(FlockReportAsSpamEdgesScalaDataset, sc),
    FeatureName.NumReportAsSpams,
    endTs)
// we only keep unfollows in the past 90 days due to the huge size of this dataset,
// and to prevent permanent "shadow-banning" in the event of accidental unfollows.
// we treat unfollows as less critical than above 4 negative signals, since it deals more with
// interest than health typically, which might change over time.
val unfollows: SCollection[InteractionGraphRawInput] =
  GraphUtil
    .getSocialGraphFeatures(
      readSnapshot(SocialgraphUnfollowsScalaDataset, sc),
      FeatureName.NumUnfollows,
      endTs)
    .filter(_.age < 90)
// group all features by (src, dest)
val allEdgeFeatures: SCollection[Edge] =
  getEdgeFeature(SCollection.unionAll(Seq(blocks, mutes, abuseReports, spamReports, unfollows)))
val negativeFeatures: SCollection[KeyVal[Long, UserSession]] =
  allEdgeFeatures
    .keyBy(_.sourceId)
    .topByKey(maxDestinationIds)(Ordering.by(_.features.size))
    .map {
      case (srcId, pqEdges) =>
        val topKNeg =
          pqEdges.toSeq.flatMap(toRealGraphEdgeFeatures(hasNegativeFeatures))
        KeyVal(
          srcId,
          UserSession(
            userId = Some(srcId),
            realGraphFeaturesTest =
              Some(RealGraphFeaturesTest.V1(RealGraphFeaturesV1(topKNeg)))))
    }
// save to GCS (via DAL)
negativeFeatures.saveAsCustomOutput(
  "Write Negative Edge Label",
  DAL.writeVersionedKeyVal(
    dataset = RealGraphNegativeFeaturesScalaDataset,
    pathLayout = PathLayout.VersionedPath(opts.getOutputPath),
    instant = Instant.ofEpochMilli(opts.interval.getEndMillis),
    writeOption = WriteOptions(numOfShards = Some(3000))
  )
)
// save to BQ
val ingestionDate = opts.getDate().value.getStart.toDate
val bqDataset = opts.getBqDataset
val bqFieldsTransform = RootTransform
  .Builder()
  .withPrependedFields("dateHour" -> TypedProjection.fromConstant(ingestionDate))
val timePartitioning = new TimePartitioning()
  .setType("DAY").setField("dateHour").setExpirationMs(21.days.inMilliseconds)
val bqWriter = BigQueryIO
  .write[Edge]
  .to(s"${bqDataset}.interaction_graph_agg_negative_edge_snapshot")
  .withExtendedErrorInfo()
  .withTimePartitioning(timePartitioning)
  .withLoadJobProjectId("twttr-recos-ml-prod")
  .withThriftSupport(bqFieldsTransform.build(), AvroConverter.Legacy)
  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
  .withWriteDisposition(
    BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE
  ) // we only want the latest snapshot
allEdgeFeatures
  .saveAsCustomOutput(
    s"Save Recommendations to BQ interaction_graph_agg_negative_edge_snapshot",
    bqWriter
  )

@jamesdigid

jamesdigid commented Apr 5, 2023

The muting aspect should be revised, if not removed altogether, as part of the weighting. I agree with the OP that we should be applying penalties at the content level and let people construct their artificial echo chambers.

I suppose the spirit of this is getting ahead of harmful content algorithmically by leveraging 'muting' and 'blocks' as a general signal. I think the ethical thing to do here would be to implement some type of time-based decay ("time entropy") with helpful feedback to the user; a rough sketch follows below. I get the idea of wanting to get ahead of content people don't want to see, but not having the ability for any redemption, ever, is a major flaw.
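Purely as an assumption (the 90-day half-life and the helper names are made up, this is not code from the repo), the decay could look something like this:

// Hypothetical "time entropy": weight each negative signal by an exponential
// decay of its age, so old blocks/mutes stop counting instead of penalizing forever.
val halfLifeDays = 90.0 // assumed half-life; a real value would need tuning

def decayedWeight(ageDays: Double, baseWeight: Double = 1.0): Double =
  baseWeight * math.exp(-math.log(2) * ageDays / halfLifeDays)

decayedWeight(1.0)    // ≈ 0.99: yesterday's block still counts fully
decayedWeight(4000.0) // ≈ 0.0: a block from 2011 no longer matters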

What could be done is to de-rank posts for people who are in the segment that blocked/muted the account originally, so as not to 're-offend' the community segment that originally blocked/muted.

There really should be an algorithm for 'intelligent community clustering.' I haven't finished reading all the code yet so perhaps there is.

@igorbrigadir

There really should be an algorithm for 'intelligent community clustering.' I haven't finished reading all the code yet so perhaps there is.

This is exactly the goal of SimClusters btw, but Mutes and Blocks are used for this: https://github.com/twitter/the-algorithm/tree/main/src/scala/com/twitter/simclusters_v2

@jamesdigid

jamesdigid commented Apr 5, 2023

There really should be an algorithm for 'intelligent community clustering.' I haven't finished reading all the code yet so perhaps there is.

This is exactly the goal of SimClusters btw, but Mutes and Blocks are used for this: https://github.com/twitter/the-algorithm/tree/main/src/scala/com/twitter/simclusters_v2

Perfect. Is there a higher abstraction that maps negative signals onto the community space? Something like a gamification abstraction layer, or insights into 'community collision' signals: for example, whether a mass block/mute event is the result of two community spaces interacting or of a coordinated attack? I haven't found this logic yet.

I'm just curious; surely Twitter has deeper insights into why mass block/mute events occur and how to differentiate between an artificial coordinated attack and a community collision.

@redknightlois
Author

redknightlois commented Apr 5, 2023

From my experience in reinforcement learning, I have come to the realization that negative signals, especially global ones, are especially tricky to get right. The reason is that algorithms are quick to identify how to game negative feedback, leading to rapid convergence into local minima. This pattern is evident when you add up all the negative feedback and your reputation starts declining towards zero.

I have observed that negative feedback can be easily exploited by an adversary to push the system into such a state much faster. As mentioned by @YellowAfterlife, interestingly, in the limit everyone stabilizes around a zero reputation score. And given that the source code is now visible to everyone, errors arising from those negative signals carry a much bigger price.

I think the ethical thing to do here would be to implement some type of time-based decay ("time entropy") with helpful feedback to the user; I get the idea of wanting to get ahead of content people don't want to see, but not having the ability for any redemption, ever, is a major flaw.

@jamesdigid it is not so easy. Even time-based decay can be gamed. Let's assume for simplicity that we use the 90-day decay that is currently applied to unfollows.


Essentially, you can generate follows and unfollows in 90-day intervals and keep the user in a deboosted state forever using a finite resource (bot accounts). This tactic cannot be used on small accounts, as the behavior would be visible and raise suspicion; on larger accounts, however, it can be done with the help of a botnet-type attack.

Here's how it works:

On day 1, you follow the account with 1000 bot accounts.
On day 2, you follow it with another 1000 accounts.
On day 90, you follow it with yet another 1000 accounts.
On day 91, you unfollow with the first 1000 accounts.
On day 92, you unfollow with the second 1000 accounts and re-follow with the first 1000, and so on.

The only user who sees this behavior is the account owner, who notices some number of follows and some number of unfollows. However, follows do not affect the account's reputation, while unfollows do.

This tactic can be repeated indefinitely, so the account permanently suffers the reputation hit of roughly 90,000 extra unfollows inside the rolling window (the toy numbers below illustrate this). Since people seldom unfollow others, this is a massive signal unless you weight it down to oblivion. I still believe the proper way of handling this is to use only positive signals and achieve population segmentation by content.
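For anyone who wants to check the arithmetic, here is a toy Scala sketch of the rolling window under this cycling attack (the bot counts are the hypothetical numbers from above, not data from the repo):

// Toy model: 1000 bots unfollow per day once the cycling starts; each unfollow
// only "counts" for 90 days, yet the rolling total never drops.
val botsPerDay = 1000
val windowDays = 90

// unfollows still inside the 90-day window, `day` days after the cycling starts
def unfollowsInWindow(day: Int): Int = math.min(day, windowDays) * botsPerDay

unfollowsInWindow(30)  // 30,000 while ramping up
unfollowsInWindow(365) // 90,000, held indefinitely with a finite pool of bots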

@jamesdigid

jamesdigid commented Apr 6, 2023

Right, so timed coordinated attacks are almost exclusively conducted by a botnet, as opposed to something like a "call-to-arms" type attack conducted through social influence. My point is that there should be signals capturing this somewhere, which I'm not finding. There ought to be efforts to spotlight bad-faith actors carrying out timed attacks against people of influence.

The timed attack you mentioned would have a host of unnatural behaviors indicating that gaming is occurring.

I'm certain the type of abstraction I'm talking about exists in Twitter's codebase somewhere; I've tried searching some keywords without success.

@jamesdigid

jamesdigid commented Apr 6, 2023

This issue seems related to #127

If not related, it is at least connected, in the sense of being another way of mitigating mass block/mute attacks.

@redknightlois
Author

redknightlois commented Apr 6, 2023 via email

@jamesdigid

So the crux of this problem is spotlighting botnet activity. I thought the notion of the blue check mark was to mitigate that risk by pushing the cost of botnets beyond the rewards?

@PaulNewton

Related pull #660 Limit penalization on blocks / mutes for a cooldown of 180 days
For issue #658 Excessively penalizing accounts for being blocked or reported
