The goal of bsky-net is to benchmark belief dynamics models, assessing their accuracy in predicting actual beliefs and enabling comparisons between different models.
Concretely, bsky-net is a temporal graph dataset of user connections, communications, and beliefs over time, built on real data from the Bluesky social network. These three components enable, respectively, more accurate depictions of network structure, more accurate timing of belief updates, and measurement of model accuracy.
bsky-net uses a newly available, nearly complete record of over 1 billion interactions on the Bluesky social network. As of 8/27/24, the data includes:
- 6.2M users
- >300M posts (including quotes, replies)
- >1B likes
- >100M follows
- Other events, such as reposts and blocks
This volume and granularity of raw data allow for a more accurate model of information flow through the network, including:
- Inherent separation between others' actual beliefs and their perceived beliefs
- Anchoring belief updates to actual interactions, in contrast to arbitrary update intervals
Nevertheless, there are several nontrivial challenges to address, which will be discussed later in this overview.
To illustrate how it works in practice, we'll start with a simplified example of how the data is structured and processed, as well as how it can be used for belief dynamics modeling and validation.
The temporal graph aggregates user activity over configurable intervals (e.g., hourly, daily, weekly, or monthly). Within each interval, all user activities (posts, likes, follows, etc.) are collected and processed together.
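As a rough illustration of that aggregation step, the sketch below groups raw events into fixed-width intervals. It assumes each event has already been reduced to a hypothetical `(timestamp, user, kind)` tuple; the tuple layout and the `bucket_events` name are illustrative, not bsky-net's actual schema:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event layout: (timestamp, user, kind), where kind is e.g. "post", "like", "follow".
Event = tuple[datetime, str, str]


def bucket_events(events: list[Event], start: datetime, interval: timedelta) -> dict[int, list[Event]]:
    """Group events into consecutive intervals of width `interval` beginning at `start`.

    Returns a mapping from interval index t (0, 1, 2, ...) to the events that fall in it.
    """
    buckets: dict[int, list[Event]] = defaultdict(list)
    for event in events:
        timestamp = event[0]
        if timestamp < start:
            continue  # outside the observation window
        t = (timestamp - start) // interval  # timedelta // timedelta yields an int
        buckets[t].append(event)
    return dict(buckets)


# Toy events (made up for illustration), bucketed daily.
START = datetime(2024, 8, 1)
EVENTS = [
    (datetime(2024, 8, 1, 9), "A", "post"),
    (datetime(2024, 8, 1, 17), "B", "like"),
    (datetime(2024, 8, 2, 8), "B", "post"),
]
daily = bucket_events(EVENTS, START, timedelta(days=1))
```

Here `daily[0]` holds A's post and B's like, and `daily[1]` holds B's post. Passing `timedelta(hours=1)` or `timedelta(weeks=1)` changes the granularity without touching anything else.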
The example below uses a generic time unit, $t$, for each interval.
Using Bluesky's data on follows, we'll first initialize a basic follow graph. As usual, each node in the graph represents a user on the network, and directed arrows represent follows; e.g., at time
To more accurately model the actual flow of information through the network, we can add an additional "communication layer" to the network using Bluesky's data on post creations. For example, take the data point:
- At time $t=0$, User A creates a post
Updating the graph from above based on this information, we get:
This additional information reveals a more accurate picture of the information flow on the network: despite the existence of other connections per the follow graph at time
Continuing with this idea, let's say:
- At time $t=1$, User B creates a post
- At time $t=2$, both Users A and B create posts
Which results in the following graph:
At
The communication layer for each timestamp is constructed dynamically, based on both the follow graph and the post activity during that specific timestamp.
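A minimal sketch of how the communication layer could be derived for each timestamp, assuming follows are stored as hypothetical `(follower, followee)` pairs. The follow graph below is made up because the original figure isn't reproduced here. Only the post timings come from the example above, and for simplicity the follow graph is held fixed across timestamps:

```python
# Hypothetical follow graph as (follower, followee) pairs; these edges are made up.
# For simplicity the follow graph is held fixed across timestamps.
FOLLOWS = {("B", "A"), ("C", "A"), ("A", "B"), ("D", "B")}

# Which users posted at each timestamp, following the example in the text.
POSTERS = {0: ["A"], 1: ["B"], 2: ["A", "B"]}


def communication_layer(follows, posters):
    """Return directed edges (poster, follower): who could have seen a post in this interval."""
    active = set(posters)
    return {(followee, follower) for follower, followee in follows if followee in active}


layers = {t: communication_layer(FOLLOWS, posters) for t, posters in POSTERS.items()}
# layers[0] == {("A", "B"), ("A", "C")}: only A's followers receive anything at t=0.
```

In a real pipeline, the `follows` argument would be the follow-graph snapshot for timestamp `t`, so follows and unfollows between intervals would change the layer too.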
Each node's belief state can be represented by continuous or discrete values, on any scale. Assuming a discrete scale of
Note: The Bluesky dataset doesn't inherently contain internal belief measurements, but we'll discuss that in Belief inference. For now, consider these measurements as given.
At
In the continued spirit of more accurately modeling the flow of information, the internal belief(s) of a user do not directly influence the beliefs of their followers. Rather, the beliefs expressed by their posts are the sources of influence:
At time
At time
At time
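The separation between internal and expressed beliefs can be sketched as a function that only ever reads the belief values attached to posts. This assumes a binary belief scale (1 = support, 0 = oppose) and a made-up follow graph and post values. `exposures` is a hypothetical helper, not part of bsky-net:

```python
from collections import defaultdict

# Hypothetical follow graph (follower, followee) and posts per timestamp as (author, expressed belief),
# on an assumed binary scale where 1 = support and 0 = oppose.
FOLLOWS = {("B", "A"), ("C", "A"), ("A", "B"), ("D", "B")}
POSTS = {
    0: [("A", 1)],
    1: [("B", 0)],
    2: [("A", 1), ("B", 1)],
}


def exposures(follows, posts):
    """Map each user to the expressed beliefs of the posts they could have seen in one interval.

    Internal beliefs are deliberately not an input: only what authors express in posts
    can influence their followers.
    """
    followers_of = defaultdict(set)
    for follower, followee in follows:
        followers_of[followee].add(follower)
    seen = defaultdict(list)
    for author, expressed in posts:
        for follower in followers_of[author]:
            seen[follower].append(expressed)
    return dict(seen)


exposed = {t: exposures(FOLLOWS, posts) for t, posts in POSTS.items()}
```

Because `exposures` only reads post values, a user whose internal belief differs from what they post influences followers through the post alone.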
Using the ground-truth internal belief states mentioned in the prior section, we can measure the accuracy of a proposed dynamics model at each timestamp.
For example, let's simulate a basic majority model on our graph. At time
At each timestamp, we can compare our majority model's predicted belief states against the ground-truth internal belief states from the follow graph. Various metrics could be used for this comparison, such as accuracy, mean squared error, or AUC.
Regardless of the specific metric chosen, the goal is to quantitatively assess how well our model predicts the true belief states over time. By applying the same metric(s) to different belief dynamics models, we can evaluate and compare their performance.
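The simulate-then-score loop can be sketched as below. The specific majority rule is an assumption: a user adopts a value only if it is held by a strict majority of the posts they saw in that interval, and keeps their belief otherwise. `no_update_step` is a hypothetical baseline added only to show two models being compared under one metric. All inputs are toy values:

```python
from collections import Counter

# Toy inputs (hypothetical, binary beliefs): initial beliefs, the expressed beliefs each
# user saw per timestamp, and "ground-truth" internal beliefs per timestamp.
INITIAL = {"A": 1, "B": 0, "C": 0, "D": 0}
EXPOSURES = {
    0: {"B": [1], "C": [1]},
    1: {"A": [0], "D": [0]},
    2: {"A": [1], "B": [1], "C": [1], "D": [1]},
}
TRUTH = {
    0: {"A": 1, "B": 1, "C": 0, "D": 0},
    1: {"A": 1, "B": 1, "C": 0, "D": 0},
    2: {"A": 1, "B": 1, "C": 1, "D": 0},
}


def majority_step(beliefs, exposures):
    """Assumed majority rule: adopt a value held by a strict majority of the posts seen
    this interval; otherwise, or with no exposures, keep the current belief."""
    updated = dict(beliefs)
    for user, seen in exposures.items():
        if not seen:
            continue
        value, count = Counter(seen).most_common(1)[0]
        if 2 * count > len(seen):
            updated[user] = value
    return updated


def no_update_step(beliefs, exposures):
    """Hypothetical baseline model: nobody ever changes their belief."""
    return dict(beliefs)


def accuracy(predicted, truth):
    """Fraction of users whose predicted belief equals their ground-truth belief."""
    return sum(predicted[user] == belief for user, belief in truth.items()) / len(truth)


def evaluate(step, initial, exposures_by_t, truth_by_t, metric=accuracy):
    """Run a dynamics model forward and score it against ground truth at each timestamp."""
    beliefs, scores = dict(initial), {}
    for t in sorted(exposures_by_t):
        beliefs = step(beliefs, exposures_by_t[t])
        if t in truth_by_t:
            scores[t] = metric(beliefs, truth_by_t[t])
    return scores


for name, model in [("majority", majority_step), ("no-update", no_update_step)]:
    print(name, evaluate(model, INITIAL, EXPOSURES, TRUTH))
```

On these made-up inputs this prints `majority {0: 0.75, 1: 0.5, 2: 0.75}` and `no-update {0: 0.75, 1: 0.75, 2: 0.5}`. The numbers carry no meaning beyond showing that both models are scored identically. Another measure, such as mean squared error, could be passed as `metric`.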
As mentioned in Individual beliefs, the Bluesky dataset doesn't inherently contain internal belief measurements or the belief values expressed in posts. This is by far the biggest risk to this whole idea.
However, it's plausible that the dataset contains enough information to infer both internal belief states and expressed opinions. This section is very much a work in progress; I've added some thoughts below:
The complete activity data on a user (what they post, which posts they like, users they block, etc.) might make it possible to infer their internal belief states. Importantly, we don't need belief measurements for all users at all times; instead, having accurate belief "checkpoints" for a subset of the graph across some time steps could still allow for meaningful validation.
In an optimal (but uncommon) scenario, we might encounter a user who consistently posts and likes content supporting environmental conservation, frequently blocks users who deny climate change, and regularly shares scientific articles about global warming. This clear pattern of behavior could allow us to infer with high confidence that the user has a strong pro-environment internal belief state. However, such unambiguous cases are rare, and most users would likely be much more nuanced and challenging to interpret.
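The belief "checkpoint" idea implies a metric that only scores the `(t, user)` pairs where an inferred belief exists. A minimal sketch, assuming predictions and checkpoints are both stored as hypothetical `t -> {user: belief}` dictionaries on a binary scale:

```python
from typing import Optional


def checkpoint_accuracy(predictions, checkpoints) -> Optional[float]:
    """Score predictions only where a belief checkpoint exists.

    predictions: t -> {user: predicted belief} for every simulated user
    checkpoints: t -> {user: inferred belief} for a sparse subset of (t, user) pairs
    Returns None when there is nothing to score against.
    """
    hits = total = 0
    for t, observed in checkpoints.items():
        predicted_t = predictions.get(t, {})
        for user, belief in observed.items():
            if user not in predicted_t:
                continue  # user wasn't simulated at t; skip rather than guess
            total += 1
            hits += predicted_t[user] == belief
    return hits / total if total else None


# Hypothetical model output and sparse checkpoints (binary beliefs).
PREDICTIONS = {
    0: {"A": 1, "B": 1, "C": 1, "D": 0},
    1: {"A": 0, "B": 1, "C": 1, "D": 0},
    2: {"A": 1, "B": 1, "C": 1, "D": 1},
}
CHECKPOINTS = {0: {"B": 1}, 1: {"A": 1}, 2: {"C": 1, "D": 0}}

print(checkpoint_accuracy(PREDICTIONS, CHECKPOINTS))  # 0.5
```

Only 4 of the 12 simulated `(t, user)` pairs are scored here. This is the sense in which a subset of checkpoints could still support meaningful validation.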
Inferring the beliefs expressed in posts is more tractable than inferring internal belief states, but still nontrivial. This task requires both identifying the topic of the post ("is this post related to environmental conservation?") and classifying the support of that post for that belief ("does this post support or oppose environmental conservation?").
For example, a post stating "We need to act now to reduce carbon emissions" would be classified as related to environmental conservation and expressing support for it. Conversely, a post saying "Climate change is a hoax" would be classified as related to environmental conservation but expressing opposition to it.
There's a large body of literature on this sort of task (topic modeling, opinion mining, sentiment analysis, etc.). Large language models could be particularly useful here.
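The two-step structure (topic first, then stance) can be sketched as below. The keyword lists are a deliberately crude, hypothetical stand-in for a proper classifier or LLM call. The 1/0 output reuses an assumed binary belief scale:

```python
from typing import Optional

# Hypothetical keyword lists standing in for a real topic model / stance classifier (e.g. an LLM).
TOPIC_TERMS = ("climate", "carbon", "emission", "environment", "global warming")
OPPOSE_TERMS = ("hoax", "scam")


def is_on_topic(text: str) -> bool:
    """Step 1: is this post about environmental conservation?"""
    lowered = text.lower()
    return any(term in lowered for term in TOPIC_TERMS)


def expressed_belief(text: str) -> Optional[int]:
    """Step 2: 1 = supports conservation, 0 = opposes it, None = off-topic."""
    if not is_on_topic(text):
        return None
    lowered = text.lower()
    return 0 if any(term in lowered for term in OPPOSE_TERMS) else 1


for post in [
    "We need to act now to reduce carbon emissions",
    "Climate change is a hoax",
    "Nice weather for a picnic today",
]:
    print(repr(post), "->", expressed_belief(post))
```

Keeping the two steps as separate functions means the keyword heuristics could later be replaced by a model-based classifier without changing any callers.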
Beyond the challenges discussed in Belief inference, there are several other limitations and risks to this approach. Here's my running list:
- Almost-completeness: when reconstructing the Bluesky network's history, we can't recover deleted records/accounts.
  - e.g., if User A followed User B on Feb. 10 and unfollowed them on Feb. 15, our dataset would show no connection between them at any point in time.
  - e.g., if a user deletes their account, we would have no record of their account or their activity on Bluesky.
  - Real-time monitoring of the network's firehose can log record/account deletions, but retroactive processing is unreliable.
- Source of posts: not all posts a user sees come from accounts they follow. How can we model this in the network structure?
- Selection bias: the Bluesky user base may not be representative of the general population.
- Irregular browsing behavior: how do we handle, e.g., a user viewing a post outside the timestamp during which it was posted, or "stalking" a profile?
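The almost-completeness gap can be illustrated by replaying follow events. Assume real-time firehose monitoring produced a hypothetical log of `(day, action, follower, followee)` tuples, with an arbitrary year added to the Feb. 10 and Feb. 15 dates. In that case the follow from the example above is recoverable. A retroactive snapshot holds no record of the deleted follow, so there is nothing to replay:

```python
from datetime import date

# Hypothetical firehose log of follow-record events: (day, action, follower, followee).
# The year is arbitrary; only the Feb. 10 / Feb. 15 days come from the example above.
FIREHOSE_LOG = [
    (date(2024, 2, 10), "create", "A", "B"),
    (date(2024, 2, 15), "delete", "A", "B"),
]
# A retroactive snapshot only contains records that still exist, so the deleted follow is absent.
SNAPSHOT_LOG = []


def follows_on(day, log):
    """Replay create/delete events up to and including `day`; return active (follower, followee) edges."""
    active = set()
    for event_day, action, follower, followee in sorted(log):
        if event_day > day:
            break
        if action == "create":
            active.add((follower, followee))
        elif action == "delete":
            active.discard((follower, followee))
    return active


print(follows_on(date(2024, 2, 12), FIREHOSE_LOG))  # {('A', 'B')}
print(follows_on(date(2024, 2, 12), SNAPSHOT_LOG))  # set()
```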