Retrieval Event DB #873

jcace · 2023-01-13T19:55:24Z

We need a database to store the stream of retrieval event data from which we compute reputation scores. We need to define (1) the schema for the DB and (2) the underlying database technology to use.

Bedrock has defined a great schema that we can base this on: https://www.notion.so/Retrieval-Reputation-Schema-edcf4e8b89674343a45f62215c6e6ea9

Database Technologies

First option - Pando

Pando is a custom database solution designed for Filecoin reputation data. Bedrock is planning to use it to store retrieval statistics data which will be used to compute reputation.

Explore how we can integrate Autoretrieve stats into Pando. Investigate what pushing data in and pulling data out looks like.

Second option - Event DB built on top of Postgres

Prefer an existing stats DB such as https://github.com/application-research/estuary-metrics.

Third option - Timeseries DB

TimescaleDB (Postgres) https://github.com/timescale/timescaledb
InfluxDB (open-source time-series DB) https://github.com/influxdata/influxdb

jcace · 2023-01-17T00:27:56Z

Just discovered we already have a database called estuary-metrics. This could serve as a nice place to store all these raw metrics.

I think it might make sense to tweak the schema of estuary-metrics: remove retrieval_success_records and retrieval_failure_records, and instead combine them into a single retrieval_events table. This new table would look mostly like retrieval_success_records, with a failed flag to capture the failure events.

Since we need both success and failure counts in our reputation calculation, I think this would be quite ergonomic for us. We could query it once (aggregating by matching SP within a given timestamp window),
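
A minimal sketch of the combined-table idea, to show what a single query for success and failure counts per SP could look like. It uses an in-memory SQLite database as a stand-in for Postgres. The column names (sp_id, cid, failed, event_time) and the helper names are assumptions for illustration only; they are not taken from estuary-metrics or from the Bedrock schema.

```python
import sqlite3


def create_events_table(conn):
    # Hypothetical combined table replacing the separate success/failure tables.
    conn.execute(
        """
        CREATE TABLE retrieval_events (
            id         INTEGER PRIMARY KEY,
            sp_id      TEXT    NOT NULL,           -- storage provider (assumed name)
            cid        TEXT    NOT NULL,           -- content requested (assumed name)
            failed     INTEGER NOT NULL DEFAULT 0, -- 1 = failure event, 0 = success
            event_time TEXT    NOT NULL            -- ISO 8601 UTC timestamp
        )
        """
    )


def counts_by_sp(conn, start, end):
    # One pass over the table: success and failure counts per SP
    # for events with start <= event_time < end.
    rows = conn.execute(
        """
        SELECT sp_id,
               SUM(CASE WHEN failed = 0 THEN 1 ELSE 0 END) AS successes,
               SUM(failed) AS failures
        FROM retrieval_events
        WHERE event_time >= ? AND event_time < ?
        GROUP BY sp_id
        ORDER BY sp_id
        """,
        (start, end),
    ).fetchall()
    return {sp: (successes, failures) for sp, successes, failures in rows}


conn = sqlite3.connect(":memory:")
create_events_table(conn)
conn.executemany(
    "INSERT INTO retrieval_events (sp_id, cid, failed, event_time) VALUES (?, ?, ?, ?)",
    [
        ("f01000", "cid-a", 0, "2023-01-16T10:00:00Z"),
        ("f01000", "cid-b", 1, "2023-01-16T11:00:00Z"),
        ("f02000", "cid-a", 0, "2023-01-16T12:00:00Z"),
        ("f02000", "cid-c", 0, "2023-01-20T09:00:00Z"),  # outside the window below
    ],
)
result = counts_by_sp(conn, "2023-01-16T00:00:00Z", "2023-01-17T00:00:00Z")
```

With the separate success and failure tables, the same result would need two queries or a join, which is the ergonomic cost the combined table avoids. Comparing timestamps as strings works here only because they share one fixed-width UTC format; a real Postgres table would presumably use a timestamp column instead.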

jcace · 2023-01-17T00:30:20Z

Also look at the approach in https://github.com/filecoin-project/cidtravel

jcace added the "New Feature" label (Issues that we will work on with people or ourselves) Jan 13, 2023

jcace added this to the Incentivized Retrievals Q1 milestone Jan 13, 2023

jcace self-assigned this Jan 13, 2023

jcace changed the title ~~Prove out Pando DB~~ Retrieval Event DB Jan 13, 2023

jcace mentioned this issue Jan 17, 2023

Autoretrieve Stats - Data Ingestion #872

Open

jcace added this to Estuary Retrieval Improvements Jan 19, 2023

jcace moved this to In Progress in Estuary Retrieval Improvements Jan 26, 2023
