What goes in to a Bluesky or atproto SDK? #2415

bnewbold · 2024-04-17T03:16:33Z

Maintainer

What does a "complete" SDK for Bluesky and atproto look like? Bluesky (the company) maintains reference implementations in TypeScript and Go, but even these still have some gaps.

The purpose of this document is to categorize components of the atproto specs (https://atproto.com), and point out which pieces are helpful for different kinds of software projects.

SDK Components

Basics

Almost all atproto software is likely to need these, often as a dependency of other packages.

HTTP API Client: handles authentication, session refresh, HTTP headers (proxying, labels). Works for PDS connections+proxying, or direct connections to other services. Preferably follows generic good practices such as HTTP retries with back-off and meaningful errors. Will need to include OAuth client support.
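
As a rough illustration of the "retries with back-off" part, here is a minimal Python sketch. It is not taken from any existing SDK: `TransientError`, `backoff_delays`, and `call_with_retries` are hypothetical names, and the choice of what counts as retryable (timeouts, HTTP 5xx, 429) is an assumption left to the caller.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 5xx, 429)."""


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Exponential delays: base, 2*base, 4*base, ... each capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_retries(
    request: Callable[[], T],
    retries: int = 4,
    base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on TransientError with jittered back-off."""
    for delay in backoff_delays(retries, base):
        try:
            return request()
        except TransientError:
            # Full jitter: sleep a random amount up to the computed delay.
            sleep(random.uniform(0, delay))
    # Final attempt: let any error propagate to the caller.
    return request()
```

A real client would wrap its HTTP call in `request` and raise `TransientError` only for responses that are safe to repeat. The `sleep` parameter is injectable so the behavior can be tested without waiting.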

Lexicon Types: native language data types (structs, classes, whatever) for com.atproto and app.bsky record types and API endpoint requests/responses. Often code-generated, and should work with the API client.

Identifier Syntax: validators, and optionally types, for NSID, DID, Handle, Record Key, AT-URI, TID, CID, etc
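
A minimal sketch of what identifier validators could look like, in Python. The regular expressions and length limits below are transcribed from my reading of the syntax specs on atproto.com and should be checked against the specs before use; the function names are hypothetical. Only DID, handle, and TID are covered here.

```python
import re

# Assumption: patterns and limits follow the atproto syntax specs as I understand them.
DID_RE = re.compile(r"did:[a-z]+:[a-zA-Z0-9._:%-]*[a-zA-Z0-9._-]")
HANDLE_RE = re.compile(
    r"([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"[a-zA-Z]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
)
# TIDs are 13 characters of "base32-sortable"; the first character is restricted.
TID_RE = re.compile(r"[234567abcdefghij][234567abcdefghijklmnopqrstuvwxyz]{12}")


def is_valid_did(value: str) -> bool:
    """Syntax check only; does not resolve the DID."""
    return len(value) <= 2048 and DID_RE.fullmatch(value) is not None


def is_valid_handle(value: str) -> bool:
    """Syntax check only; does not verify the handle resolves to a DID."""
    return len(value) <= 253 and HANDLE_RE.fullmatch(value) is not None


def is_valid_tid(value: str) -> bool:
    """Syntax check for a timestamp identifier (TID)."""
    return TID_RE.fullmatch(value) is not None
```

`fullmatch` is used instead of `^...$` anchors because `$` in Python also matches just before a trailing newline.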

Bluesky Client

Things specific to Bluesky. Very helpful for apps and bots.

Post Helpers: parse text and extract facets. Compute length in UTF-8 bytes and Unicode Grapheme Clusters. Resolve link cards and quote posts. Upload and include images as blobs, possibly with image resizing. Superset of “Twitter.js” functionality.
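
To show why byte lengths matter here: facets in app.bsky.richtext.facet index into the post text by UTF-8 byte offsets, not by character positions. The Python sketch below extracts link facets under that assumption. The URL regex is deliberately naive (it will swallow trailing punctuation, for example), the helper names are hypothetical, and grapheme cluster counting is left out because the Python standard library has no segmenter for it.

```python
import re
from typing import Dict, List

# Naive URL matcher, for illustration only.
URL_RE = re.compile(r"https?://\S+")


def utf8_len(text: str) -> int:
    """Length of `text` in UTF-8 bytes."""
    return len(text.encode("utf-8"))


def link_facets(text: str) -> List[Dict]:
    """Build link facets whose index uses UTF-8 byte offsets."""
    facets = []
    for match in URL_RE.finditer(text):
        # Convert the character offset of the match to a byte offset.
        byte_start = utf8_len(text[: match.start()])
        byte_end = byte_start + utf8_len(match.group(0))
        facets.append(
            {
                "index": {"byteStart": byte_start, "byteEnd": byte_end},
                "features": [
                    {"$type": "app.bsky.richtext.facet#link", "uri": match.group(0)}
                ],
            }
        )
    return facets
```

For the text `"✨ see https://example.com"`, the link starts at character 6 but at byte 8, because the emoji takes three bytes in UTF-8.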

Social Graph Helpers: helpers to manage follow/like/block/mute relationships: creating, reading, and removing. Handles de-duplication.

Label Behaviors: given config/preferences, labeler declarations, and labels on content (and accounts), compute behaviors (“blur”, etc). Does not include signature validation.

Preferences: helpers for managing private config and preferences

Protocol and Data Structures

Getting further into protocol details. Simple clients probably don’t need most of these, but advanced clients, tooling, integrations, and services all might.

Keys and Cryptography: public key cryptography (signing and validation) for all the supported cryptographic systems. Key generation, de/serialization, hashing.

MST and Repository: ability to parse and work with binary repository data and MST data structure. Reading and writing CAR files. Note that this is used by firehose consumers.

Data Model: transforming between JSON and CBOR. Checking hard limits, validating that arbitrary JSON or CBOR is or isn’t atproto data (eg, floats). Extract $type, enumerate all blobs.
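
A small Python sketch of the JSON side of these checks: rejecting floats, extracting `$type`, and enumerating blobs. It assumes the JSON blob representation is an object with `"$type": "blob"`, and it does not cover CBOR or the hard size limits. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def walk(value: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every node in a parsed JSON document."""
    yield path, value
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from walk(child, f"{path}[{index}]")


def find_floats(data: Any) -> List[str]:
    """Paths of all float values; atproto data must not contain floats."""
    return [path for path, value in walk(data) if isinstance(value, float)]


def list_blobs(data: Any) -> List[Dict]:
    """All blob objects, assumed to be dicts carrying "$type": "blob"."""
    return [
        value
        for _, value in walk(data)
        if isinstance(value, dict) and value.get("$type") == "blob"
    ]


def record_type(data: Any) -> Optional[str]:
    """The top-level $type of a record, if present."""
    return data.get("$type") if isinstance(data, dict) else None
```

Note that `json.loads` parses `2.5` and `1e2` as Python floats but `2` as an int, so `find_floats` works on its output directly.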

Lexicon Validation: ability to take arbitrary JSON or CBOR, and a Lexicon schema JSON file, and validate at runtime (aka, not using codegen)

Identity Resolution: resolution of all supported DID types, and all handle resolution mechanisms.
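
As a sketch of where resolution requests go, the Python helpers below only compute lookup targets and perform no network I/O. They assume the two DID methods currently used in atproto (did:plc and hostname-level did:web) and the two handle mechanisms (a DNS TXT record and an HTTPS well-known path). The function names are hypothetical, and the default `plc_host` is an assumption that callers may want to override.

```python
from typing import Dict
from urllib.parse import unquote


def did_document_url(did: str, plc_host: str = "https://plc.directory") -> str:
    """URL from which the DID document for `did` would be fetched."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did":
        raise ValueError(f"not a DID: {did!r}")
    method, ident = parts[1], parts[2]
    if method == "plc":
        return f"{plc_host}/{did}"
    if method == "web":
        if ":" in ident:
            # Path-based did:web identifiers are assumed to be out of scope for atproto.
            raise ValueError("only hostname-level did:web is supported")
        # A port, if any, is percent-encoded in the DID (%3A).
        return f"https://{unquote(ident)}/.well-known/did.json"
    raise ValueError(f"unsupported DID method: {method}")


def handle_lookup_targets(handle: str) -> Dict[str, str]:
    """The DNS name and HTTPS URL to check when resolving a handle to a DID."""
    return {
        # TXT record whose value looks like "did=did:plc:..."
        "dns_txt": f"_atproto.{handle}",
        # Plain-text response body containing the DID
        "https": f"https://{handle}/.well-known/atproto-did",
    }
```

A full resolver would also verify the result bidirectionally, checking that the DID document claims the same handle.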

Stream Client: consume events from a subscription (WebSocket), parse frames and messages, validate against Lexicon. Handle both minimal streams, and streams with cursors (keep track of cursor processing, re-connects with last cursor).
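
The cursor-tracking part can be sketched independently of WebSockets. In the Python sketch below, `connect` is a hypothetical callable that opens a subscription from a given cursor and yields already-decoded events, each assumed to carry an integer `seq` field; frame parsing and Lexicon validation are left out.

```python
from typing import Callable, Dict, Iterable, Optional

Event = Dict  # hypothetical decoded message with an integer "seq" field


def consume(
    connect: Callable[[Optional[int]], Iterable[Event]],
    handle: Callable[[Event], None],
    max_reconnects: int = 3,
) -> Optional[int]:
    """Process events in order, reconnecting from the last handled cursor."""
    cursor: Optional[int] = None
    for _ in range(max_reconnects + 1):
        try:
            for event in connect(cursor):
                if cursor is not None and event["seq"] <= cursor:
                    continue  # replayed event that was already handled
                handle(event)
                # Advance the cursor only after the event was handled.
                cursor = event["seq"]
            return cursor  # stream ended cleanly
        except ConnectionError:
            continue  # reconnect, resuming from `cursor`
    return cursor
```

Advancing the cursor only after `handle` returns means a crash mid-event leads to a replay rather than a gap, which is usually the safer failure mode. A production client would also persist the cursor and add back-off between reconnects.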

Service Auth: creation and validation of service auth JWTs (separate from OAuth)

Lexicon Codegen: related to having native language data types, the ability to generate new types (for records, other objects, subscription messages, HTTP clients, and HTTP servers)
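
To make the codegen idea concrete, here is a toy Python generator that turns a record Lexicon into dataclass source code. It is a sketch only: it handles just the `string`, `integer`, and `boolean` property types, falls back to `Any` for everything else, and the `com.example.blog.entry` schema is a made-up example, not a real Lexicon.

```python
from typing import Dict

# Assumption: a tiny subset of Lexicon field types, mapped to Python type names.
LEXICON_TO_PY = {"string": "str", "integer": "int", "boolean": "bool"}

# Hypothetical application Lexicon, for illustration only.
EXAMPLE_LEXICON = {
    "lexicon": 1,
    "id": "com.example.blog.entry",
    "defs": {
        "main": {
            "type": "record",
            "key": "tid",
            "record": {
                "type": "object",
                "required": ["title", "content"],
                "properties": {
                    "title": {"type": "string"},
                    "content": {"type": "string"},
                    "draft": {"type": "boolean"},
                },
            },
        }
    },
}


def class_name(nsid: str) -> str:
    """com.example.blog.entry -> ComExampleBlogEntry"""
    return "".join(part[:1].upper() + part[1:] for part in nsid.split("."))


def generate_dataclass(lexicon: Dict) -> str:
    """Return Python source for a dataclass matching the main record definition."""
    record = lexicon["defs"]["main"]["record"]
    required = set(record.get("required", []))
    props = record["properties"]
    lines = ["@dataclass", f"class {class_name(lexicon['id'])}:"]
    # Required fields first: dataclass fields without defaults must precede those with.
    for name in sorted(props, key=lambda n: n not in required):
        py_type = LEXICON_TO_PY.get(props[name]["type"], "Any")
        if name in required:
            lines.append(f"    {name}: {py_type}")
        else:
            lines.append(f"    {name}: Optional[{py_type}] = None")
    return "\n".join(lines) + "\n"
```

The generated source expects `dataclass`, `Any`, and `Optional` to be imported in the file it is written to. Real codegen also has to deal with refs, unions, arrays, blobs, and API endpoint definitions, which is where most of the work is.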

PLC Operations: basic ability to create, sign, and parse native PLC operations. Does not need ability to audit complete PLC logs (eg, resolve “forks”).

OAuth Backend: server-side implementation of OAuth. Might include DPoP functionality for clients/integrations with a server component?

Service Components

These are mostly useful for implementing infrastructure and backend services.

HTTP Server: optionally, generic helpers for creating servers for atproto services

Identity Directory: building on simple identity resolution, a persistent cache of identities, with automatic cache expiration as well as manual purge/refresh behaviors.
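
The caching behavior can be sketched as a thin wrapper around any resolver. The Python class below is an in-memory illustration only (a real directory would persist entries), and `IdentityDirectory`, `resolve`, and the one-hour default TTL are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class IdentityDirectory:
    """In-memory identity cache with expiry, purge, and refresh."""

    def __init__(
        self,
        resolve: Callable[[str], Any],
        ttl: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve  # hypothetical resolver: DID -> identity document
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def lookup(self, did: str) -> Any:
        """Return the cached identity, resolving again if missing or expired."""
        now = self._clock()
        hit = self._cache.get(did)
        if hit is not None and hit[0] > now:
            return hit[1]
        doc = self._resolve(did)
        self._cache[did] = (now + self._ttl, doc)
        return doc

    def purge(self, did: str) -> None:
        """Drop a cached entry, for example after an identity event."""
        self._cache.pop(did, None)

    def refresh(self, did: str) -> Any:
        """Force a fresh resolution and cache the result."""
        self.purge(did)
        return self.lookup(did)
```

The clock is injectable so expiry can be tested deterministically. Caching failed lookups (negative caching) is a common extension but is not shown here.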

Repository Storage: some mechanism for storing complete repositories in persistent storage (on disk). Best if this is an interface/abstraction allowing multiple storage implementations.

Stream Server: generic mechanism to have stream messages reliably sequenced, and persisted for a configurable backfill window. Presumably built on some datastore.
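
A minimal in-memory Python sketch of the sequencing and backfill idea. Assumptions: the backfill window is bounded by event count rather than by time, nothing is persisted, and `Sequencer` and its methods are hypothetical names.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class Sequencer:
    """Assigns increasing sequence numbers and keeps a bounded backfill window."""

    def __init__(self, backfill: int = 1000) -> None:
        # The deque silently drops the oldest event once `backfill` is exceeded.
        self._events: Deque[Tuple[int, Any]] = deque(maxlen=backfill)
        self._next_seq = 1

    def append(self, payload: Any) -> int:
        """Store a new event and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._events.append((seq, payload))
        return seq

    def since(self, cursor: Optional[int]) -> List[Tuple[int, Any]]:
        """Events after `cursor` still in the window; no cursor means live-only."""
        if cursor is None:
            return []
        return [(seq, payload) for seq, payload in self._events if seq > cursor]

    def oldest_seq(self) -> Optional[int]:
        """Oldest sequence number still available for backfill, if any."""
        return self._events[0][0] if self._events else None
```

A subscriber whose cursor is older than `oldest_seq()` has fallen outside the window; a server built on this would need to signal that to the client rather than silently skipping events.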

SDK Progress (April 2024)

Component	TypeScript (`atproto`)	Golang (`indigo`)
Basics	—	—
API Client	🟢	🟡
Lexicon Types	🟢	🟢
Identifier Syntax	🟢	🟢
Bluesky Client	—	—
Post Helpers	🟢	⭕
Graph Helpers	🟢	⭕
Label Behaviors	🟢	⭕
Preferences	🟡/❓	⭕
Protocol + Data	—	—
Keys and Crypto	🟢	🟢
MST and Repo	🟢	🟢
Data Model	🟡/❓	🟢
Lex Validation	🟢	🚧
Identity Resolution	🟢	🟢
Stream Client	🟢	🟡
Service Auth	🟢	❓
Lex Codegen	🟢	🟡
PLC Operations	🟢	🚧
OAuth Backend	🚧	⭕
Service Pieces	—	—
HTTP Server	🟢	🟡
Identity Directory	🟢	🟢
Repo Storage	🟢	🟢
Stream Server	🟢	🟡

✅ - great! complete, documented, examples, accessible to new devs with no atproto experience
🟢 - decent. mostly implemented, could point experienced devs at it
🟡 - partial progress: incomplete, undocumented, not ergonomic
🚧 - early work in progress, but not usable yet
⭕ - nothing started
🟣 - something exists; not assessed
❓ - unknown (need to check status)

Bossett · 2024-04-17T04:56:28Z

This is great - is there an intent that both Go & TypeScript will have feature parity (either short or long term)?

3 replies

bnewbold Apr 17, 2024
Maintainer Author

yup, that is the goal! with basic interoperability demonstrated between them.

what isn't as clear is which other languages need completeness. mobile-oriented languages like Kotlin and Swift would want to prioritize client ergonomics. having a relatively complete, modular, and clean implementation in Python could be great for learning how the protocol works. having a compiled C ABI-compatible library for the binary protocol+data bits could be wrapped by higher-level languages. JVM ecosystem is its own thing. etc, etc.

orual Apr 17, 2024

Exactly, working on Morpho (Android/multiplatform Kotlin), there's a lot of this I just wouldn't end up needing because I'm focused essentially entirely on the client side.

I'm attempting to split out a Kotlin Bluesky SDK from the app itself just in case others want it, and debating how much on top of the basic bindings is good to add to the SDK, versus keeping within the application code. What's been the approach there in Typescript? I'm not familiar enough with it to get a good idea just from reading the app source code, but it seems like you're advocating for a pretty complete solution rather than mostly language bindings/data structures.

bnewbold Apr 17, 2024
Maintainer Author

We use TypeScript both for front-end / client development, and also a lot of backend services (node.js). It was also the first and primary language used for atproto development, so that SDK is the most mature and complete. The client-oriented language SDKs are mostly coming from the community; the intent of this post is to try and untangle what that means!

MasterJ93 · 2024-04-17T06:07:27Z

Is this something that should be used when making your own third-party API library? Or is there more to it for API library developers?

And a separate question:

Lexicon Codegen

Is this basically asking for the models and methods to be automatically generated with a script, or does this mean something else?

1 reply

bnewbold Apr 17, 2024
Maintainer Author

This is mostly just laying out how we think about language SDKs; folks building on their own are of course free to structure things as they see fit.

Yeah, Lexicon Codegen is the ability to automatically generate types (or "models") from a machine-readable Lexicon schema (JSON). This is useful for application developers. For example https://whtwnd.com/ is a blogging system built on atproto. A generic atproto SDK might decide to build-in support for the bsky.app Lexicons (they are pretty popular), but it isn't reasonable to expect SDKs to support every Lexicon schema out in the wild. Instead, it would be nice if projects like whtwnd could take a generic SDK, run codegen, and release an application-specific SDK in that language to work with their app.
