feat: public open collective dataset (#2319)
* add: `open-collective` dataset overview

* add: `javi` as an author

* add: open collective transactions dataset `blog`

* add: analytics hub link
Jabolol authored Oct 8, 2024
1 parent 244ba5b commit 12756b2
Showing 8 changed files with 148 additions and 0 deletions.
@@ -0,0 +1,106 @@
---
slug: open-collective-transactions-datasets
title: "Introducing new Open Collective transactions datasets"
authors: [jabolol]
tags: [data-science, open-collective]
image: ./total_amount_donated_oc.png
---

[Open Collective](https://opencollective.com) is a platform that enables groups
to collect and disburse funds transparently. It is used by many open-source
projects, communities, and other groups to fund their activities. Notable
projects include [`Open Web Docs`](https://openwebdocs.org) (maintainers of
[MDN Web Docs](https://developer.mozilla.org/en-US/)),
[`Babel`](https://babel.dev), and [`Webpack`](https://webpack.js.org).

At Open Source Observer, we have been working on collecting and processing Open
Collective data to make it available for analysis. This includes **all
transactions** made on the platform, such as donations, expenses, and transfers.
Datasets are updated weekly.

<!-- truncate -->

## Datasets

Transactions are the core of Open Collective data. They represent the flow of
funds between entities on the platform. We currently provide two datasets for
Open Collective transactions: `expenses` and `deposits`. These datasets contain
all transactions of the respective type.

Both datasets follow the same schema, which includes key fields such as the
transaction's id, group, type, and amount. For the full reference, see the
BigQuery schema.

## Use cases

The Open Collective transactions datasets can be used for a variety of
applications. Here are some examples:

### Donor contribution patterns

One use case is to analyze overall donor contribution patterns. This involves
examining the frequency of donations, the amounts donated, and the growth of
donations over time.

For example, the plot below shows how often donations of different amounts
occur. The most common donation is around **$1,500**, with approximately
**600,000** donations at this level. Because donors give this amount more often
than any other, a donation tier at around $1,500 may be worth considering.

![Frequency of donation quantities](./donation_amount_vs_contribution_frequency.png)
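The frequency plot above is essentially a histogram of donation amounts. Below is a minimal Python sketch of that binning, assuming the amounts have already been extracted as USD floats (for example via `JSON_VALUE(amount, "$.value")`). The helper names and the 500 USD default bin width are hypothetical choices, and where the most common bin lands depends on that width.

```python
from collections import Counter


def donation_frequency(amounts, bin_width=500.0):
    """Count donations per fixed-width amount bin (hypothetical helper).

    Returns a dict mapping each bin's lower edge to the number of
    donations in that bin, sorted by lower edge.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Floor each amount to the lower edge of its bin, e.g. 1520 -> 1500.0.
    counts = Counter((amount // bin_width) * bin_width for amount in amounts)
    return dict(sorted(counts.items()))


def most_common_bin(amounts, bin_width=500.0):
    """Return (lower_edge, count) for the most frequent bin."""
    frequency = donation_frequency(amounts, bin_width)
    if not frequency:
        raise ValueError("no amounts given")
    # On ties, max() keeps the first (lowest) bin it encounters.
    return max(frequency.items(), key=lambda item: item[1])


# Hypothetical sample amounts in USD.
sample = [5, 20, 1480, 1500, 1520, 1700]
print(donation_frequency(sample))  # {0.0: 2, 1000.0: 1, 1500.0: 3}
print(most_common_bin(sample))     # (1500.0, 3)
```

Narrower bins can move the peak, so it is worth comparing a few bin widths before reading too much into a single one.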

### Donation trends and retention

Another use case is to analyze donation trends and retention: how donations
have changed over time, and how many donors have continued to give.
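Retention can be measured as the share of one month's donors who give again in the next month. The sketch below assumes each deposit can be attributed to a donor identifier. This post does not name the column that holds it, so the `donors_by_month` input shape and the function name are hypothetical.

```python
def month_over_month_retention(donors_by_month):
    """Share of each month's donors who also donate in the next recorded month.

    `donors_by_month` maps a sortable month key such as "2024-01" to the set
    of donor identifiers seen that month (a hypothetical input shape). Months
    are paired in sorted order, so a missing month is skipped rather than
    treated as a month with zero donors.
    """
    months = sorted(donors_by_month)
    retention = {}
    for previous, current in zip(months, months[1:]):
        earlier = donors_by_month[previous]
        if not earlier:
            # Retention is undefined when the earlier month had no donors.
            retention[current] = None
            continue
        returning = earlier & donors_by_month[current]
        retention[current] = len(returning) / len(earlier)
    return retention


# Hypothetical donor IDs per month.
donors = {
    "2024-01": {"a", "b", "c", "d"},
    "2024-02": {"a", "b", "e"},
    "2024-03": {"b", "e", "f"},
}
print(month_over_month_retention(donors))
# {'2024-02': 0.5, '2024-03': 0.6666666666666666}
```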

This plot illustrates the total donations made to the
[`Pandas`](https://pandas.pydata.org) project over time. Notably, donations
reached an all-time high in April 2022 but have since declined. As of September
2024, there has been a slight rebound in donations; however, they remain at less
than one-eighth of their peak average.

![Pandas donations](./total_amount_donated_pandas.png)
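A comparison such as "less than one-eighth of the peak" can be computed from a project's monthly totals. The sketch below smooths the series with a trailing rolling mean and then compares the latest value with the peak. The three-month window and the function names are assumptions, not necessarily how the figures in this post were produced.

```python
def trailing_mean(values, window):
    """Rolling mean; output[j] averages values[j : j + window]."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[start : start + window]) / window
        for start in range(len(values) - window + 1)
    ]


def latest_vs_peak(monthly_totals, window=3):
    """Compare the latest smoothed monthly total with its all-time peak.

    `monthly_totals` maps a sortable month key to the amount donated that month.
    Returns (peak_month, peak_value, latest_value, ratio), where peak_month is
    the last month of the peak averaging window.
    """
    months = sorted(monthly_totals)
    values = [monthly_totals[month] for month in months]
    smoothed = trailing_mean(values, window)
    if not smoothed:
        raise ValueError("need at least `window` months of data")
    peak_index = max(range(len(smoothed)), key=smoothed.__getitem__)
    peak_month = months[peak_index + window - 1]  # window ends at this month
    peak_value = smoothed[peak_index]
    latest_value = smoothed[-1]
    ratio = latest_value / peak_value if peak_value else float("nan")
    return peak_month, peak_value, latest_value, ratio


# Hypothetical monthly totals in USD.
totals = {"2022-02": 100, "2022-03": 400, "2022-04": 700,
          "2022-05": 40, "2022-06": 10, "2022-07": 40}
print(*latest_vs_peak(totals))  # 2022-04 400.0 30.0 0.075
```

Smoothing keeps a single unusually large month from defining the peak. A `window` of 1 compares raw monthly totals instead.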

On the other hand, the plot below shows the total amount donated to the
[`Babel`](https://babeljs.io) project over time. One interesting observation is
that donations spiked at the beginning of 2021. This spike may have been driven
by the project's widespread use during the global pandemic, when many developers
turned to open-source tools to build applications.

![Babel donations](./total_amount_donated_babel.png)

## Getting started

To get started with the Open Collective transactions datasets, you can access
them directly in BigQuery. The datasets are available in the
`opensource-observer` project. For instance, the following query returns the
total amount of USD donations per month since January 2020:

```sql
SELECT
  DATE_TRUNC(DATE(created_at), MONTH) AS donation_month,
  SUM(CAST(JSON_VALUE(amount, "$.value") AS FLOAT64)) AS total_amount
FROM
  `opensource-observer.open_collective.deposits`
WHERE
  JSON_VALUE(amount, "$.currency") = "USD"
  AND DATE(created_at) >= DATE("2020-01-01")
GROUP BY
  donation_month
ORDER BY
  donation_month;
```

This data can then be visualized to reveal donation patterns and trends.
Donations on Open Collective have increased sharply over the last year, with
almost **8M USD** donated in recent months.

![Total donations per month](./total_amount_donated_oc.png)
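For prototyping outside BigQuery, the same monthly aggregation can be reproduced on exported rows. This minimal sketch assumes each row is a dict with an ISO-8601 `created_at` string and an `amount` JSON string holding `value` and `currency`, as the query's `JSON_VALUE` calls imply. The function name and the exact row shape are hypothetical.

```python
import json
from collections import defaultdict
from datetime import date, datetime, timezone


def monthly_usd_totals(rows, since=date(2020, 1, 1)):
    """Sum USD deposits per calendar month, mirroring the BigQuery query above.

    Each row is assumed (hypothetically) to be a dict with an ISO-8601
    `created_at` string and an `amount` JSON string such as
    '{"value": 25.0, "currency": "USD"}'. Returns a dict mapping the first
    day of each month to that month's total, sorted by month.
    """
    totals = defaultdict(float)
    for row in rows:
        amount = json.loads(row["amount"])
        # Equivalent of JSON_VALUE(amount, "$.currency") = "USD".
        if amount.get("currency") != "USD":
            continue
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards.
        created = datetime.fromisoformat(row["created_at"].replace("Z", "+00:00"))
        if created.tzinfo is not None:
            # BigQuery's DATE(timestamp) uses UTC unless a time zone is given.
            created = created.astimezone(timezone.utc)
        created_day = created.date()
        if created_day < since:
            continue
        # Equivalent of DATE_TRUNC(DATE(created_at), MONTH).
        # float() also handles values stored as strings, like CAST(... AS FLOAT64).
        totals[created_day.replace(day=1)] += float(amount["value"])
    return dict(sorted(totals.items()))


# Hypothetical exported rows.
rows = [
    {"created_at": "2024-03-05T10:00:00Z", "amount": '{"value": 25.0, "currency": "USD"}'},
    {"created_at": "2024-04-01T00:00:00Z", "amount": '{"value": 5, "currency": "EUR"}'},
]
print(monthly_usd_totals(rows))  # {datetime.date(2024, 3, 1): 25.0}
```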

These are just some examples of the insights that can be gained from the Open
Collective transactions datasets. We hope that these datasets will be useful for
researchers, data scientists, and anyone interested in understanding the flow of
funds in the open-source ecosystem.
6 changes: 6 additions & 0 deletions apps/docs/blog/authors.yml
@@ -15,3 +15,9 @@ ravenac95:
  title: Engineer
  url: https://github.com/ravenac95
  image_url: https://github.com/ravenac95.png

jabolol:
  name: Javier Ríos
  title: Engineer
  url: https://github.com/jabolol
  image_url: https://github.com/jabolol.png
36 changes: 36 additions & 0 deletions apps/docs/docs/integrate/overview/index.mdx
@@ -15,6 +15,7 @@ import GitcoinLogo from "./gitcoin.png";
import OpenrankLogo from "./openrank.png";
import ArbitrumLogo from "./arbitrum.png";
import EasLogo from "./eas.png";
import OcLogo from "./open-collective.png";

First, you need to set up your BigQuery account. You can do this by going to the
[Get Started](../../get-started/index.mdx)
@@ -570,6 +571,41 @@ where

**Remember to replace 'YOUR_PROJECT_NAME' with the name of your project in the query.**

### Open Collective Data

<img src={OcLogo} width="200" />

<Button
  size={"compact"}
  color={"blue"}
  target={"_blank"}
  link={
    "https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/open_collective_1926d37f24d"
  }
  children={"Subscribe on BigQuery"}
/>{" "}

- [Reference documentation](https://models.opensource.observer/#!/source_list/open_collective)
- [Updated weekly](https://dagster.opensource.observer/assets/open_collective)

[Open Collective](https://opencollective.com/) is a platform that provides transparent finances and governance for open-source projects.

The Open Collective datasets contain all transactions made on the platform since its inception. Separate datasets are available for **expenses** and **deposits**.

For example, you can get the total amount of donations in `USD` made to the [**pandas**](https://pandas.pydata.org) project:

```sql
select
  SUM(CAST(JSON_VALUE(amount, "$.value") as FLOAT64)) as total_amount
from
  `YOUR_PROJECT_NAME.open_collective.deposits`
where
  JSON_VALUE(amount, "$.currency") = "USD"
  and JSON_VALUE(to_account, "$.id") = "ov349mrw-gz75lpy9-975qa08d-jeybknox"
```

**Remember to replace 'YOUR_PROJECT_NAME' with the name of your project in the query.**

## Subscribe to a dataset

### 1. Data exchange listings