Self-Hosting and Moderation in AT Protocol’s Personal Data Servers (PDS)
by @memeology

Too Long; Didn't Read

Personal Data Servers (PDS) in the AT Protocol host user repositories and media; users can self-host or rely on a professional provider. Each PDS exposes an HTTP API and a real-time WebSocket stream of updates, which indexers subscribe to. Users can migrate their repositories between PDSes, or restore them from a backup, without losing their posts or social graph. Moderation is primarily handled by labelers and feed generators, while PDS operators delete illegal content. At the time of writing, Bluesky indexes only its own PDS instances; indexing of third-party PDSes is planned for early 2024.

Authors:

(1) Martin Kleppmann, University of Cambridge, Cambridge, UK;

(2) Paul Frazee, Bluesky Social PBC, United States;

(3) Jake Gold, Bluesky Social PBC, United States;

(4) Jay Graber, Bluesky Social PBC, United States;

(5) Daniel Holmgren, Bluesky Social PBC, United States;

(6) Devin Ivy, Bluesky Social PBC, United States;

(7) Jeromy Johnson, Bluesky Social PBC, United States;

(8) Bryan Newbold, Bluesky Social PBC, United States;

(9) Jaz Volpert, Bluesky Social PBC, United States.

Abstract and 1 Introduction

2 The Bluesky Social App

2.1 Moderation Features

2.2 User Handles

2.3 Custom Feeds and Algorithmic Choice

3 The AT Protocol Architecture

3.1 User Data Repositories

3.2 Personal Data Servers (PDS)

3.3 Indexing Infrastructure

3.4 Labelers and Feed Generators

3.5 User Identity

4 Related Work

5 Conclusions, Acknowledgments, and References

3.2 Personal Data Servers (PDS)

A PDS stores repositories and associated media files, and allows anybody to query the data it hosts via an HTTP API. Moreover, a PDS provides a real-time stream of updates for the repositories it hosts via a WebSocket. Indexers (see Section 3.3) subscribe to this stream in order to find out about new or deleted records (posts, likes, follows, etc.) with low latency. This architecture is illustrated in Figure 3.
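The two roles described above (answering queries and broadcasting changes) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyPDS`, `UpdateEvent`, and their methods are hypothetical names invented for this example, plain function calls stand in for the HTTP API and the WebSocket stream, and nothing here reflects the actual atproto wire format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class UpdateEvent:
    """One entry in the update stream (hypothetical shape, not the real format)."""
    seq: int                 # increasing sequence number assigned by the PDS
    account: str             # whose repository changed
    action: str              # "create" or "delete"
    key: str                 # record key within the repository
    record: Optional[dict]   # record body, or None for a deletion


class ToyPDS:
    """In-memory stand-in for a PDS: stores records and broadcasts changes."""

    def __init__(self) -> None:
        self._repos: Dict[str, Dict[str, dict]] = {}
        self._subscribers: List[Callable[[UpdateEvent], None]] = []
        self._seq = 0

    def subscribe(self, callback: Callable[[UpdateEvent], None]) -> None:
        # Stands in for an indexer opening the WebSocket stream.
        self._subscribers.append(callback)

    def get_record(self, account: str, key: str) -> Optional[dict]:
        # Stands in for a public query over the HTTP API.
        return self._repos.get(account, {}).get(key)

    def put_record(self, account: str, key: str, record: dict) -> None:
        self._repos.setdefault(account, {})[key] = record
        self._emit(account, "create", key, record)

    def delete_record(self, account: str, key: str) -> None:
        # Only announce a deletion if the record actually existed.
        if self._repos.get(account, {}).pop(key, None) is not None:
            self._emit(account, "delete", key, None)

    def _emit(self, account: str, action: str, key: str,
              record: Optional[dict]) -> None:
        self._seq += 1
        event = UpdateEvent(self._seq, account, action, key, record)
        for callback in self._subscribers:
            callback(event)


# Usage: an "indexer" that simply collects every event it is sent.
pds = ToyPDS()
indexer_log: List[UpdateEvent] = []
pds.subscribe(indexer_log.append)
pds.put_record("alice", "post/1", {"text": "hello"})
pds.delete_record("alice", "post/1")
print([(e.seq, e.action, e.key) for e in indexer_log])
```

The point of the push-based design is that an indexer learns about a new or deleted record as soon as the PDS commits it, rather than having to poll every repository.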


Hosting a PDS for a small number of users requires only modest computing resources, even if those users have a large number of followers. Users who wish to self-host their own PDS can therefore do so on a cheap virtual machine in the cloud, or even on a Raspberry Pi connected to their home internet router. However, we expect that most users will sign up for an account on a shared PDS run by a professional hosting provider, either Bluesky Social PBC or another company.


Compared to choosing a Mastodon server, the user’s choice of PDS hosting provider is fairly inconsequential. The PDS URL is internal to the system, and is not normally visible to users. It makes no difference whether two users are on the same PDS or different PDSes, since interaction between users goes via the indexing infrastructure in any case. A user can migrate from one PDS to another by simply copying their repository and media files to the new PDS, and pointing their account ID at the new PDS URL (see Section 3.5). Even if a PDS shuts down without warning, users can upload a backup of their repository to a new PDS, and thus recover their account without losing any of their posts or their social graph.
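The migration and recovery steps described above amount to two operations: copy the data, then repoint the account. The sketch below models them in memory. All names (`PDSHost`, `HostedAccount`, `migrate`, `restore_from_backup`) are hypothetical, the `directory` dictionary is an assumed stand-in for the identity mechanism of Section 3.5, and the account ID `did:example:alice` is a placeholder.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HostedAccount:
    """Everything a PDS holds for one user in this toy model."""
    records: Dict[str, dict] = field(default_factory=dict)   # posts, follows, ...
    media: Dict[str, bytes] = field(default_factory=dict)    # images etc.


@dataclass
class PDSHost:
    url: str
    accounts: Dict[str, HostedAccount] = field(default_factory=dict)


def migrate(account_id: str, old: PDSHost, new: PDSHost,
            directory: Dict[str, str]) -> None:
    """Move an account from one PDS to another."""
    # Step 1: copy the repository and media files to the new PDS.
    new.accounts[account_id] = deepcopy(old.accounts[account_id])
    # Step 2: point the account ID at the new PDS URL.
    directory[account_id] = new.url
    # Optional clean-up on the old host (an assumption of this sketch).
    del old.accounts[account_id]


def restore_from_backup(account_id: str, backup: HostedAccount,
                        new: PDSHost, directory: Dict[str, str]) -> None:
    """Recover an account after its PDS disappeared, using a user-held backup."""
    new.accounts[account_id] = deepcopy(backup)
    directory[account_id] = new.url


# Usage: alice moves from old.example to new.example.
old_pds = PDSHost("https://old.example")
new_pds = PDSHost("https://new.example")
alice = "did:example:alice"
old_pds.accounts[alice] = HostedAccount(
    records={"post/1": {"text": "hello"}, "follow/1": {"subject": "bob"}},
    media={"avatar": b"\x89PNG"},
)
directory = {alice: old_pds.url}

migrate(alice, old_pds, new_pds, directory)
print(directory[alice], sorted(new_pds.accounts[alice].records))
```

Because other users address the account by its ID rather than by the PDS URL, nothing they hold needs to change when the pointer in `directory` is updated.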


PDS operators will generally want to perform some basic moderation by deleting any illegal content hosted on their servers. However, PDS-level moderation is much less important than server-level moderation in Mastodon, because in atproto, the primary moderation role is taken on by separate actors in the system: the labelers and feed generators (see Section 3.4). This allows different sets of people to offer server hosting and moderation services, respectively; we believe this separation is valuable since operating a server and moderating a community require largely disjoint sets of skills [46].
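The separation of hosting from moderation can be made concrete with a toy model in which the PDS operator only deletes content from its own storage, while labels are issued by independent parties. Everything here is an assumption made for illustration: the `Label` shape, the function names, and in particular the idea that filtering is applied on the reader's side against a chosen set of labelers.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Label:
    """A moderation judgement issued outside the PDS (hypothetical shape)."""
    labeler: str      # who issued the label
    record_key: str   # which record it applies to
    value: str        # e.g. "spam"


def delete_illegal(hosted: Dict[str, dict], illegal_keys: Set[str]) -> None:
    """PDS-level moderation: the operator removes records it must not host."""
    for key in illegal_keys:
        hosted.pop(key, None)


def visible_records(hosted: Dict[str, dict], labels: List[Label],
                    trusted_labelers: Set[str],
                    hidden_values: Set[str]) -> List[str]:
    """Labeler-level moderation: hide records flagged by labelers the reader trusts."""
    hidden = {
        label.record_key
        for label in labels
        if label.labeler in trusted_labelers and label.value in hidden_values
    }
    return [key for key in hosted if key not in hidden]


# Usage: hosting and labeling are handled by different parties.
hosted = {"post/1": {"text": "a"}, "post/2": {"text": "b"}, "post/3": {"text": "c"}}
labels = [
    Label("labeler-a", "post/2", "spam"),
    Label("labeler-b", "post/3", "spam"),
]

delete_illegal(hosted, {"post/1"})   # the operator's narrow responsibility
print(visible_records(hosted, labels, {"labeler-a"}, {"spam"}))
```

In this model the operator never needs to decide what counts as spam, and a labeler never needs access to the server, which mirrors the disjoint skill sets the paragraph above refers to.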


At the time of writing, Bluesky’s indexing infrastructure (see Section 3.3) only indexes repositories on PDS instances hosted by Bluesky Social PBC itself; this restriction is in place to limit infrastructure load and abuse problems during the beta period. In that sense, Bluesky is not yet fully decentralized. Support for third-party PDS operators is already implemented and enabled in Bluesky’s sandbox (testing) environment, and a PDS implementation suitable for self-hosting is already open source [8]. We plan for the Bluesky indexing infrastructure to begin indexing repositories on other PDS operators (indicated by dashed arrows in Figure 3) in early 2024.


This paper is available on arXiv under a CC BY 4.0 license.